Starbase: Frequently Asked Questions
Introducing Starbase
Q.1. What is Starbase?
Q.2. Why might I be interested in it?
Q.3. Where can Starbase be obtained?
Q.4. How much does it cost?
Getting Started with Starbase
Q.5. What do I need to know to get started?
Q.6. What is the format of a table file?
Q.7. What are the minimum commands I need to know to get started?
Q.8. What operators can be used with the row command?
Q.9. What is a regular expression? How do I use it?
Q.10. Are there any tutorials on Starbase and databases available?
Troubleshooting
Q.11. How do I find out about reported bugs in starbase programs?
Q.12. How can I tell the difference between spaces and TAB characters in my tables?
Q.13. Emacs keeps screwing up the TABs in my table! How can I make it behave?
Q.14. Whenever I try to use the column command, I get a "No such file or directory" error.
Introducing Starbase
Q.1. What is Starbase?
Starbase is a simple relational database system developed by John Roll
that is especially well suited for managing tables of astronomical data.
It is made up of a collection of UNIX programs that make use of standard
UNIX features and tools. Its basic table-manipulation features are
similar to those of the /rdb system (though Starbase does not require
/rdb). Extra support for astronomical applications was implemented at
the Smithsonian Astrophysical Observatory. The package also contains
tools from the Star Link Project.
Q.2. Why might I be interested in it?
Starbase's simple design makes it particularly well suited for
manipulating "small" tables (< 100,000 rows?) containing astronomical
data. With a little advance preparation, Starbase can also handle very
large tables (hundreds of millions of rows). Here are some reasons to
consider it:
- The tables are plain ASCII with TABs delimiting the columns.
  This means you can:
  - view them with cat, more, less, etc.
  - edit them using your favorite editor
  - integrate them easily into your own programs and scripts.
- Starbase understands the sexagesimal format used for things like
  right ascension and declination.
- Starbase provides functions useful to astronomers, such as
  coordinate precession, FK4-FK5 conversions, and searches by
  distance from a sky position.
- Starbase requires no special shell, so it can interact with
  standard UNIX tools such as grep, awk, and sed.
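If sexagesimal notation is unfamiliar, here's a minimal Python sketch (not Starbase code) of how such values map to decimal numbers. The function name sexagesimal_to_decimal is hypothetical, and the sketch assumes colon-separated fields, each worth 1/60 of the field before it (hours:minutes:seconds or degrees:arcminutes:arcseconds).

```python
def sexagesimal_to_decimal(value: str) -> float:
    """Convert a colon-separated value such as "12:30:00" to decimal.

    Hypothetical helper. Assumes each field after the first is 1/60 of
    the one before it and that a leading sign applies to the whole value.
    A plain number such as "15" or "-30.0" comes back unchanged.
    """
    text = value.strip()
    negative = text.startswith("-")
    if text.startswith(("+", "-")):
        text = text[1:]
    total = 0.0
    for power, field in enumerate(text.split(":")):
        total += float(field) / 60 ** power
    return -total if negative else total


# Values taken from the example table in Q.6.
for raw in ["0:0:0", "12:00", "15", "-30.0", "60:00:30.4"]:
    print(raw, "->", sexagesimal_to_decimal(raw))
```

Applying the sign to the whole value, rather than only to the first field, keeps a declination such as -0:30:00 negative.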
Q.3. Where can Starbase be obtained?
Starbase can be downloaded from the Starbase home page.
Q.4. How much does it cost?
Nothing--it's free.
Getting Started with Starbase
Q.5. What do I need to know to get started?
Check Q.6 for a quick overview of table format and
then Q.7 for an introduction to the most basic
commands.
Starbase comes with HTML documentation. A good place to start is the
main Starbase page (the starbase man page), which lists and briefly
describes the programs that make up Starbase. The extensions that have
been added at SAO are covered in the tawk extensions page.
Q.6. What is the format of a table file?
A Starbase table is a plain ASCII file with the following components:
- The first line is taken as the title of the table.
- A section of comments made up of free-format text. (Note that no
  comment character is needed.)
- A two-line table header. The first line gives the names of the
  columns, separated by single TAB characters. The second line is a
  line of dashes, one group of dashes for each column, separated by
  TAB characters. The dash line signals the start of the table data.
- Rows of table data. Each line is a table record in which the values
  for the columns are separated by single TAB characters.
Here's an example:
Table1
This table is named Table1
This is a text comment in the header of a table. This portion may
ramble on and on but should not contain any line with ONLY dash (-)
and tab characters. The dash line is the indication that the data
table is about to begin.
RA Dec
-- ---
0:0:0 0:0:0
12:00 -30.0
15 60:00:30.4
If you use an editor to create a table, it's a good idea to look it
over to make sure that there is exactly one TAB character between
adjacent columns in each row. To see where the TABs are, run
cat -tv table | more
where table is the name of the file containing the table (see also
Q.12).
See the format man page for more
examples of tables.
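To make the layout rules concrete, here's a hedged Python sketch that splits a table into its parts exactly as described above. The function parse_starbase_table is a hypothetical helper, not part of Starbase; it treats the first line after the title made up only of dashes and TABs as the dash line, and it rejects any row whose field count differs from the header's, which is what a stray extra TAB would cause.

```python
def parse_starbase_table(text: str):
    """Split a Starbase-style table into (title, comments, columns, rows).

    Hypothetical helper following the layout above: title line, free-form
    comments, TAB-separated header, dash line, TAB-separated data rows.
    """
    lines = text.splitlines()
    title = lines[0]
    # The dash line contains only dashes and TABs; the line just before
    # it holds the column names, so it cannot be the title itself.
    dash_index = next(
        (i for i, line in enumerate(lines)
         if i > 1 and "-" in line and set(line) <= {"-", "\t"}),
        None,
    )
    if dash_index is None:
        raise ValueError("no dash line found")
    columns = lines[dash_index - 1].split("\t")
    comments = lines[1:dash_index - 1]
    rows = []
    for line in lines[dash_index + 1:]:
        values = line.split("\t")
        # Exactly one TAB between columns means exactly len(columns) fields.
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields: {line!r}")
        rows.append(dict(zip(columns, values)))
    return title, comments, columns, rows


# The example table from above, with its TABs written as \t.
sample = (
    "Table1\n"
    "This table is named Table1\n"
    "RA\tDec\n"
    "--\t---\n"
    "0:0:0\t0:0:0\n"
    "12:00\t-30.0\n"
    "15\t60:00:30.4\n"
)
title, comments, columns, rows = parse_starbase_table(sample)
print(columns)
print(rows[1])
```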
Q.7. What are the minimum commands I need to know to get started?
Refer to the starbase man page for a short summary of all the Starbase
commands. For starters, though, here's a quick explanation of the three
commands you might use the most.
Before you do anything interesting with a table, you often want to
just look at it. A good way to do this is with the justify command,
which neatens up a table so that all the columns are aligned. Try:
justify < in.tbl | more
where in.tbl is the input table, or equivalently:
justify -i in.tbl | more
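The basic idea behind aligning columns is to pad every value to the width of the widest entry in its column. Here's a display-only Python sketch of that idea; the function name justify just mirrors the command, and the real justify command may well format its output differently (for example, in how it separates columns).

```python
def justify(columns, rows):
    """Pad every column to a common width for easier reading.

    Display-only illustration: columns are separated by two spaces here,
    which is an arbitrary choice, not necessarily what justify does.
    """
    widths = [
        max([len(name)] + [len(row[i]) for row in rows])
        for i, name in enumerate(columns)
    ]

    def fmt(values):
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [fmt(columns), fmt(["-" * w for w in widths])]
    lines.extend(fmt(row) for row in rows)
    return lines


for line in justify(["RA", "Dec"],
                    [["0:0:0", "0:0:0"], ["12:00", "-30.0"], ["15", "60:00:30.4"]]):
    print(line)
```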
Tables are usually read into a Starbase command either with the -i
option or with the UNIX input redirection operator, <. The output is
usually either redirected to a file (using >) or piped into another
command, such as more or another Starbase command.
The most common function of a database is to extract the records that
match certain conditions. With Starbase, this is done with the row
command. If the table looks like the one given in Q.6 and is called
in.tbl, one can extract all rows where the RA is greater than or equal
to 12 hours with:
row 'RA>=12' < in.tbl > out.tbl
Or,
row 'RA>11:59' < in.tbl | justify | more
See Q.8 for a summary of all the operators one can
use to select rows of data. Check also the
file format man page to see how one can define special table
variables that can be used by the row command.
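Conceptually, row tests each record against the condition and passes through only the records for which it is true. The Python sketch below models that idea on the example table from Q.6. It is not how Starbase is implemented, the helper names to_decimal and select_rows are hypothetical, and it assumes sexagesimal RA values are compared as decimal hours, which is consistent with RA>=12 and RA>11:59 selecting the same rows here.

```python
def to_decimal(value: str) -> float:
    """Colon-separated sexagesimal (or plain number) to decimal (assumed rule)."""
    text = value.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    fields = text.lstrip("+-").split(":")
    return sign * sum(float(f) / 60 ** i for i, f in enumerate(fields))


def select_rows(rows, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [row for row in rows if predicate(row)]


table = [
    {"RA": "0:0:0", "Dec": "0:0:0"},
    {"RA": "12:00", "Dec": "-30.0"},
    {"RA": "15", "Dec": "60:00:30.4"},
]

# Rough equivalents of row 'RA>=12' and row 'RA>11:59'.
ge_12 = select_rows(table, lambda r: to_decimal(r["RA"]) >= 12)
gt_1159 = select_rows(table, lambda r: to_decimal(r["RA"]) > to_decimal("11:59"))
print([r["RA"] for r in ge_12])
print(ge_12 == gt_1159)
```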
Often one is only interested in seeing certain columns from a table
(especially if the rows are very long). Using the example table from
Q.6, in.tbl, one can extract just the RA and Dec columns with:
row 'RA>11:59' < in.tbl | column RA Dec > out.tbl
Or,
column RA Dec < in.tbl | row 'RA>11:59' > out.tbl
Refer to question Q.14 if you get an error like
column: RA: No such file or directory.
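Because row filters records and column drops fields, the two pipelines above should give the same result as long as the row condition only refers to columns that column keeps. This hypothetical Python sketch (project and select are made-up names, and the data are invented) models both orders on a small table stored as lists of values.

```python
def project(columns, rows, wanted):
    """Keep only the named columns, in the order given (like column RA Dec)."""
    missing = [name for name in wanted if name not in columns]
    if missing:
        raise KeyError(f"no such column(s): {missing}")
    positions = [columns.index(name) for name in wanted]
    return list(wanted), [[row[p] for p in positions] for row in rows]


def select(columns, rows, predicate):
    """Keep rows for which predicate(dict of values) is true (like row)."""
    return columns, [row for row in rows if predicate(dict(zip(columns, row)))]


columns = ["Name", "RA", "Dec"]
rows = [["src1", "11", "5"], ["src2", "13", "-5"], ["src3", "14", "20"]]

# row 'RA>12' | column RA Dec
a = project(*select(columns, rows, lambda r: float(r["RA"]) > 12), ["RA", "Dec"])
# column RA Dec | row 'RA>12'
b = select(*project(columns, rows, ["RA", "Dec"]), lambda r: float(r["RA"]) > 12)
print(a == b)
```

The order would matter if the condition used a column that column removes: a condition on Name, for example, only works before the Name column is dropped.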
Other useful commands
The header and headeroff commands allow you to extract or remove the
header portion of a table. The compute command calculates new values
for a column from other values in the same row. The jointable command
combines rows from two tables that have matching values in a column.
For a complete list, see the starbase man page.
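The idea behind jointable is the classic relational join. As an illustration only, here's a Python hash-join sketch: join_tables is a hypothetical name and the sample data are invented. It indexes the second table on the shared column, then looks up each row of the first table, keeping only rows with a match (an inner join). It does not reproduce jointable's actual options, input requirements, or output format.

```python
from collections import defaultdict


def join_tables(left_cols, left_rows, right_cols, right_rows, key):
    """Inner join on `key`: pair each left row with every right row whose
    `key` value is equal. A sketch of the idea, not jointable itself.
    """
    right_key = right_cols.index(key)
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)

    left_key = left_cols.index(key)
    extra = [i for i, name in enumerate(right_cols) if name != key]
    out_cols = left_cols + [right_cols[i] for i in extra]
    out_rows = []
    for row in left_rows:
        for match in index.get(row[left_key], []):
            out_rows.append(row + [match[i] for i in extra])
    return out_cols, out_rows


positions = (["Name", "RA"], [["src1", "0:10:00"], ["src2", "5:35:00"]])
types = (["Name", "Type"], [["src1", "galaxy"], ["src3", "nebula"]])
print(join_tables(*positions, *types, "Name"))
```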
Q.8. What operators can be used with the row command?
The row and compute commands make use of the operators supported by
the UNIX awk command. Here's a list of those used for comparing values
and combining conditions:
Operation | Operator | Example
Equal | == | x == 40, s == "string"
Greater than | > | x > 40, s > "string"
Greater than or equal | >= | x >= 40
Less than | < | x < 40
Less than or equal | <= | x <= 40, s <= "string"
String contains | ~ | s ~ /string/
String does not contain | !~ | s !~ /string/
Logical AND | && | x > 5 && y < 40
Logical OR | || | s ~ /big/ || s ~ /large/
Logical NOT | ! | ! x
Logical grouping | ( ) | (x > 5 && x < 11) || (x == 0 && y == 1)
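One subtlety of these comparisons, assuming row and compute follow awk's usual rules: values that look like numbers are compared numerically, while other values are compared as strings, so "9" vs "10" gives different answers depending on which rule applies. The Python sketch below approximates that rule for two field values; the name awk_like_greater and the NUMBER pattern are hypothetical, and real awk may also use locale collation for strings, whereas Python compares by code point.

```python
import re

# Rough test for "looks like a number" (hypothetical; not awk's exact rule).
NUMBER = re.compile(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?\s*")


def awk_like_greater(a: str, b: str) -> bool:
    """Approximate awk's `a > b` for two values read from a table."""
    if NUMBER.fullmatch(a) and NUMBER.fullmatch(b):
        return float(a) > float(b)  # both numeric: compare as numbers
    return a > b                    # otherwise: compare as strings


print(awk_like_greater("9", "10"))       # False: numeric comparison
print("9" > "10")                         # True: plain string comparison
print(awk_like_greater("star", "galaxy")) # True: string comparison
```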
Q.9. What is a regular expression? How do I use it?
A regular expression is a way of representing a pattern of characters.
Such patterns are used to search for substrings within a string or to
check whether a string matches a particular pattern. In Starbase, one
would use a regular expression with the row command to search for
records in which the text in a column matches a certain pattern. For
example, in
row 'type ~ /gala/' < in.tbl > out.tbl
"gala" is the regular expression. This command returns all records in
which the string in the "type" column contains the substring "gala",
so values such as galaxy, galaxies, and extragalactic would all match.
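The same substring test can be tried outside Starbase. Here's a small Python sketch using the standard re module; Python's regular expressions are not awk's, but for a plain pattern like this they behave the same. The list of values is invented for illustration.

```python
import re

values = ["galaxy", "galaxies", "extragalactic", "Galaxy", "star"]

# Like row 'type ~ /gala/': keep values containing "gala" anywhere.
# The match is case-sensitive, so "Galaxy" is not selected.
matches = [v for v in values if re.search(r"gala", v)]
print(matches)
```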
The advantage of regular expressions is their ability to describe
patterns symbolically via metacharacters. Here are some of the
metacharacters you might use:
. | matches any single character
* | matches zero or more of the previous character
+ | matches one or more of the previous character
? | matches zero or one of the previous character
^ | when at the beginning of a regular expression, matches the beginning of a value
$ | when at the end of a regular expression, matches the end of a value
[...] | matches any one of the characters between the brackets
[^...] | matches any one character not among those between the brackets
\ | escapes the special meaning of the next character, e.g. \. matches a literal period
Here are some sample uses:
type ~ /proto.*env/ | matches when type contains a substring beginning with "proto" and ending with "env", such as "protostellar environment" or "protogalactic environment"
type ~ /[gG]ala/ | matches "gala" or "Gala" appearing anywhere in type
title ~ /^Study/ | matches when title begins with "Study"
type ~ /^[gG]alaxy$/ | matches only when type is exactly "galaxy" or "Galaxy"
Note that some metacharacters are also interpreted by the UNIX shell.
That's why it is a good idea to enclose search clauses in single quotes
when using the row command. (Without the quotes, for example, the shell
would treat the > in RA>=12 as output redirection.)
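The sample patterns can be checked quickly with Python's re module. Python's regular-expression dialect differs from the POSIX extended expressions that awk uses in some areas (Python adds things like \d and lazy quantifiers), but for these four patterns the results should be the same. The test values and the helper name matches are made up for illustration.

```python
import re

# Patterns from the sample table, each with values that should and
# should not match (values invented for illustration).
SAMPLES = {
    r"proto.*env": (["protostellar environment", "protogalactic environment"],
                    ["environment of protostars"]),
    r"[gG]ala": (["galaxy", "Galaxy cluster"], ["gas cloud"]),
    r"^Study": (["Study of galaxy clusters"], ["A Study of galaxies"]),
    r"^[gG]alaxy$": (["galaxy", "Galaxy"], ["galaxies", "Galaxy cluster"]),
}


def matches(value: str, pattern: str) -> bool:
    """Like awk's `value ~ /pattern/`: true if pattern matches anywhere."""
    return re.search(pattern, value) is not None


for pattern, (hits, misses) in SAMPLES.items():
    for value in hits + misses:
        print(f"{value!r:30} ~ /{pattern}/ -> {matches(value, pattern)}")
```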
Q.10. Are there any tutorials on Starbase and databases available?
Because of Starbase's similarity to the commercial /rdb package, the
book UNIX Relational Database Management by Manis, Schaffer, and
Jorgensen (Prentice Hall, ~$70) is often recommended. Chapters 1, 2,
3, 4, and 6 will bring an astronomer new to relational databases up to
speed.
Troubleshooting
Q.11. How do I find out about reported bugs in starbase programs?
The most up-to-date list of bugs can be accessed from
http://cfa-www.harvard.edu/~john/starbase/BUGS. The Starbase
distribution also comes with a list of bugs in a file called
BUGS.
Q.12. How can I tell the difference between spaces and TAB characters in my tables?
The most important thing to keep in mind when manipulating Starbase
tables is to make sure that exactly one TAB character appears between
adjacent columns. Normally, one only needs to worry about this when
editing tables by hand. It's easy to accidentally insert spaces where
a TAB should go, or to insert multiple TABs where only one belongs.
(By default, the Emacs editor will often replace spaces with TABs
without your knowing it; see Q.13 for more details.)
Thus, if you've been editing a Starbase table by hand, you may wish to
check it over to ensure that the TABs are where they belong. To do
this you can use the UNIX cat command with the "-vt" options:
cat -vt in.tbl | more
TAB characters in the table will appear as "^I". You might find the
output a bit of a jumble. If so, you might try sending the table
through the justify command first:
justify -i in.tbl | cat -vt | more
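If you'd rather have a script point out the suspicious lines, the check can be automated: a row with a space where a TAB belongs has too few fields, and a row with a doubled TAB has too many. Here's a minimal Python sketch of that idea; find_tab_problems and show_tabs are hypothetical names, and the sketch assumes the header line sits directly above the dash line, as described in Q.6.

```python
def show_tabs(line: str) -> str:
    """Render TABs as ^I, as cat -vt does."""
    return line.replace("\t", "^I")


def find_tab_problems(text: str):
    """Return (line_number, line) pairs whose field count differs from the header's."""
    lines = text.splitlines()
    dash = next(
        (i for i, line in enumerate(lines)
         if i > 1 and "-" in line and set(line) <= {"-", "\t"}),
        None,
    )
    if dash is None:
        raise ValueError("no dash line found")
    expected = lines[dash - 1].count("\t") + 1
    problems = []
    # Check the header, the dash line and every data row (1-based numbers).
    for number, line in enumerate(lines[dash - 1:], start=dash):
        if line.count("\t") + 1 != expected:
            problems.append((number, show_tabs(line)))
    return problems


# Line 5 has a space instead of a TAB; line 6 has two TABs in a row.
table = "T\nRA\tDec\n--\t---\n1\t2\n3 4\n5\t\t6\n"
for number, line in find_tab_problems(table):
    print(number, line)
```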
Q.13. Emacs keeps screwing up the TABs in my table! How can I make it behave?
If you use a recent version of the Emacs editor to edit a Starbase
table, you may find that it sometimes corrupts the TAB structure of
the table. This is usually due to a default feature of the Emacs Text
Mode: when you hit the TAB key, not only will Emacs insert a TAB
character, it will also replace as many of the preceding spaces with
TAB characters as possible while still keeping the same alignment.
To make your Emacs compatible with Starbase, you need to tell Emacs to
insert only a single TAB when you hit the TAB key. This can be
accomplished by placing the following into your .emacs file (normally
in your home directory):
(defun my-text-mode-hook ()
  (define-key text-mode-map "\t" 'self-insert-command))
(add-hook 'text-mode-hook 'my-text-mode-hook)
Q.14. Whenever I try to use the column command, I get a "No such file or directory" error.
If when you issue the command:
column RA < in.tbl > out.tbl
you get the error message column: RA: No such file or
directory, you may need to adjust your command search path. Some
UNIX systems come with another command called "column" (e.g. Linux
has /usr/bin/column) that is not related to Starbase. To make sure the
Starbase version gets called, place the directory containing the
Starbase commands closer to the beginning of your search path. For
example, if the starbase commands are located in
/usr/local/starbase/bin, you can (if you are using the C-shell) type:
set path = (/usr/local/starbase/bin $path)
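The reason moving the directory earlier helps is that the shell tries the directories in the search path in order and runs the first matching executable it finds. This Python sketch models that lookup; first_on_path is a hypothetical helper, and in practice the standard library's shutil.which("column"), or which column at the shell prompt, will tell you which column command actually gets run.

```python
import os


def first_on_path(command, path_dirs):
    """Return the first executable file named `command` in path_dirs, or None.

    Directories are tried in order, so whichever one comes first wins,
    mirroring the basic command search the shell performs.
    """
    for directory in path_dirs:
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Which "column" would run with the current search path?
print(first_on_path("column", os.environ.get("PATH", "").split(os.pathsep)))
```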