SAO Instrument Data Storage over the Internet

An automatic MMTO-to-Cambridge data transfer system is available at the MMTO for instruments that use the packrat computer system.

Transfer from MMTO to Cambridge

Moving data to its archive location

Generating a transfer activity summary

The user interface model is a line printer queue. The user (or, in this case, the data system) uses the lp command to ``print'' a file to a remote ``printer'', which actually transfers it by ftp to a disk in Cambridge, MA. Each file is accompanied by a control file giving useful information such as the checksum, the original file path, the original file timestamp, and any options supplied with -o <options> on the lp command line.
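The kind of information carried by the control file can be sketched in Python. This is a minimal sketch under stated assumptions: the key=value layout, the field names, the helper name make_control_record, and the use of MD5 as the checksum are all hypothetical, because the actual control-file format and checksum algorithm are not described here.

```python
import hashlib
import os
from typing import Dict


def make_control_record(path: str, options: Dict[str, str]) -> str:
    """Build a HYPOTHETICAL key=value control record for one queued file.

    Field names and the MD5 checksum are illustrative assumptions only.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large data files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    lines = [
        f"checksum={digest.hexdigest()}",
        f"path={os.path.abspath(path)}",          # original file path
        f"mtime={int(os.path.getmtime(path))}",   # original timestamp, Unix seconds
    ]
    # Options given with -o on the lp command line, in sorted order.
    lines += [f"{key}={value}" for key, value in sorted(options.items())]
    return "\n".join(lines) + "\n"
```

Sorting the options simply makes the record deterministic; nothing in this document says the real control file is ordered.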

Various options affect the transmission and eventual archiving of the data file.

The queue name can be set by the sysadmin; currently we are using archive. In general, the form of the command is:

lp -d archive [-o <keyword=value>] filename

as with any printer command. The lpstat -o archive command will show any files queued but not yet transmitted.

At the other end of the link, here called the ``archive site'', one or more ``lpsweeper'' processes copy the files from the ftp incoming directory to their instrument-specific destination.

Queue options

The general form for options is keyword=value, where multiple options can be added as separate -o arguments (e.g.: -o inst=minicam -o maxftp=0) or a comma-separated list (i.e. no whitespace) to a single -o flag (e.g.: -o inst=minicam,maxftp=60).

All the options except those below beginning with "rem" (i.e., remhost, etc.) are put into the control file sent to the archive site for possible use by the sweeper programs.
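The option rules above (separate -o flags or one comma-separated list, and rem* settings kept out of the control file) can be sketched as follows. This is an illustrative sketch, not the interface script itself; the function names are hypothetical, and it assumes that a keyword given twice keeps its last value and that instrument is simply an alias for inst.

```python
from typing import Dict, List, Tuple

# "instrument" and "inst" name the same keyword.
ALIASES = {"instrument": "inst"}


def parse_lp_options(o_args: List[str]) -> Dict[str, str]:
    """Merge the values of one or more -o flags into a keyword -> value dict."""
    opts: Dict[str, str] = {}
    for arg in o_args:
        for item in arg.split(","):   # comma-separated list, no whitespace
            key, sep, value = item.partition("=")
            if not sep or not key:
                raise ValueError(f"expected keyword=value, got {item!r}")
            # ASSUMPTION: a repeated keyword keeps its last value.
            opts[ALIASES.get(key, key)] = value
    return opts


def split_control_options(opts: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate the rem* transport settings from options sent in the control file."""
    remote = {k: v for k, v in opts.items() if k.startswith("rem")}
    control = {k: v for k, v in opts.items() if not k.startswith("rem")}
    return control, remote
```

For example, -o inst=minicam,maxftp=60 and -o inst=minicam -o maxftp=60 both yield {"inst": "minicam", "maxftp": "60"}.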

Keywords

instrument or inst     Default: inst=undef
Sets the instrument name, e.g. inst=minicam. This selects which archive will receive the data back in Cambridge. The default is settable by the sysadmin.

maxftp     Default: maxftp=600
Sometimes, due to network problems, the ftp session simply hangs without returning an exit status. To prevent an indefinite hang, the interface script allows the ftp session to run for at most this number of seconds (per file) before aborting and issuing an error. Setting maxftp=0 removes the limit.

dir     Default: <none>
This is passed on to the remote side as a hint about which directory this file should be archived to. See the lpsweeper documentation.

remhost     Default: remhost=tdc-ftp.harvard.edu
Sets the host to transmit the data to. Should not be changed under normal circumstances.

rempath     Default: rempath=/incoming
Sets the directory to put the data into. Should not be changed under normal circumstances.

remuser     Default: remuser=tdcin
Sets the remote user ftp account. Should not be changed under normal circumstances.

rempw     Default: rempw=********
Sets the password of the remote ftp account. Actually, this is a restricted ftp guest account, and is pretty well protected, so announcing the password ought not to be a problem. Should not be changed under normal circumstances.
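The per-file maxftp limit behaves like a subprocess timeout. In this sketch the transfer command is a placeholder argument list (the interface script's real ftp invocation is not shown in this document), the helper names are hypothetical, and maxftp=0 is mapped to "no timeout".

```python
import subprocess
import sys
from typing import Optional, Sequence


def ftp_timeout(maxftp: float) -> Optional[float]:
    """Translate maxftp into a timeout in seconds; 0 means wait forever."""
    if maxftp < 0:
        raise ValueError("maxftp must be >= 0")
    return None if maxftp == 0 else float(maxftp)


def run_transfer(cmd: Sequence[str], maxftp: float = 600) -> str:
    """Run one (placeholder) per-file transfer command under the maxftp limit.

    Returns "ok", "failed" (nonzero exit status) or "timeout".
    subprocess.run kills the child before re-raising TimeoutExpired.
    """
    try:
        result = subprocess.run(cmd, timeout=ftp_timeout(maxftp))
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Placeholder "transfer" that sleeps longer than the 1-second limit.
    print(run_transfer([sys.executable, "-c", "import time; time.sleep(5)"], maxftp=1))
```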

Moving data to its archive location - LPSWEEPER

Once the data arrives on the disk in Cambridge, it must be moved to its destination. This is accomplished by a program called lpsweeper, located in /data/oir/bin.

lpsweeper, directed by its command-line arguments, takes all files of a given instrument, checks their validity, and moves them to a destination directory, setting each file's timestamp to match the creation time of the source file at the instrument end. There is also an option to sweep out all files that do not belong to a recognized instrument, moving them to a miscellaneous directory. This keeps the ftp incoming directory and its partition free to accept more data.
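The per-file steps (check validity, move, restore the original timestamp) can be sketched as below. This is a minimal sketch, not lpsweeper's code: the helper names are hypothetical, the expected checksum and original timestamp stand in for values read from the control file, and MD5 is an assumed checksum algorithm.

```python
import hashlib
import os
import shutil
from pathlib import Path


def md5sum(path: Path) -> str:
    """Checksum of a file; MD5 is an ASSUMED algorithm for illustration."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep_file(src: Path, dest_dir: Path, expected_sum: str, orig_mtime: float) -> Path:
    """Verify, move and re-timestamp one data file (hypothetical helper).

    expected_sum and orig_mtime stand in for values read from the control file.
    """
    if md5sum(src) != expected_sum:
        # Leave a damaged file where it is rather than archiving it.
        raise ValueError(f"checksum mismatch for {src}; leaving it in place")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.move(str(src), str(dest))
    # Give the archived copy the timestamp of the original at the instrument end.
    os.utime(dest, (orig_mtime, orig_mtime))
    return dest
```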

lpsweeper is expected to be run from a crontab set up by each instrument, at a reasonable interval such as every 10 minutes.

Options

-i  instrument     Default: <none>
Tells lpsweeper to look for files from the instrument named instrument. There is no default for this argument; lpsweeper will abort if an instrument name is not supplied.

-d  instdir     Default: <none>
Tells lpsweeper to put files from the instrument into the directory instdir, or a dated directory under it. There is no default for this argument; lpsweeper will abort if an instrument directory is not supplied.

-L  logdir     Default: instdir/xferlogs
Use this path as the directory for writing log files. A new logfile is generated for each day, unless an explicit logfile is given. If this path is relative, the current working directory is prepended to it. The default log directory is instdir/xferlogs.

-l  logfile     Default: YYYYMMDD.tdcin.log
Use this file to hold the logging information instead of a name derived from the current date. If this is an absolute path, the logdir option is ignored; otherwise the full path is logdir/logfile. The default logfile name has the form YYYYMMDD.tdcin.log, where the date part is the UTC date of the lpsweeper run.

-f  input-directory     Default: /data/mc4/tdc-ftp/incoming
Directory to scan for input files. Default is /data/mc4/tdc-ftp/incoming. This would normally only be changed to sweep files out of a ``miscellaneous'' directory (see the -A option, below).

-o  options
This supplies options of the form optname=value for processing exactly as if supplied by the original lp command on the instrument computer. Not all options are processed, however. Those that are:

dir=dirname
Supplies a directory under which to store the files instead of a directory derived from the original path or the queuing time.

-D
Store files under a dated subdirectory (see Operation). This is the default.

+D
Do not store files under a dated subdirectory (see Operation).

-A
Sweep ``ANY'' files not known to be instrument files. With this option, there is a default destination directory, currently /data/mc4/wyatt/lparchive. In this case only, both the control file and the data file are copied unchanged to the destination, so that it will be possible to later run lpsweeper on the destination to move the files to their correct destination.

This option should be used by only one agreed-upon user. Otherwise unaffiliated files may get lost, because they would wind up in different miscellaneous directories depending on which sweep happened to run while the file was in the incoming directory. Currently, I (Bill Wyatt) am running lpsweeper -A once per day at 1802 hours ET, and putting the unaffiliated files into dated directories under:

/data/oirperm/wyatt/lparchive/ANY

-n ignore-list-additions
Add one or more names to the list of instruments to be ignored by the -A mode, above. The names can be separated by commas, or by whitespace if the list is enclosed in quotes. See ignore-list for the default list.

+n ignore-list
This option replaces the default ignore list with the one given on the command line. The current default list of instruments to ignore is:

minicam megacam hectospec hectochelle fast ircam afoe 4shooter
Dangerous; perhaps too much so to be useful.
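How the -n and +n lists decide what the -A mode sweeps can be sketched as follows. The default names come from the list above; the function names are hypothetical, and the way -n additions combine with a +n replacement list is an assumption, since this document does not say.

```python
import re
from typing import Iterable, List, Optional

# Default ignore list as quoted in this document.
DEFAULT_IGNORE = ["minicam", "megacam", "hectospec", "hectochelle",
                  "fast", "ircam", "afoe", "4shooter"]


def parse_names(arg: str) -> List[str]:
    """Split a -n/+n argument on commas and/or whitespace."""
    return [name for name in re.split(r"[,\s]+", arg) if name]


def build_ignore_list(additions: Iterable[str] = (),
                      replacement: Optional[str] = None) -> List[str]:
    """+n replaces the default list; each -n adds names (ASSUMED: applied after +n)."""
    names = parse_names(replacement) if replacement is not None else list(DEFAULT_IGNORE)
    for arg in additions:
        for name in parse_names(arg):
            if name not in names:
                names.append(name)
    return names


def swept_by_any_mode(inst: str, ignore: List[str]) -> bool:
    """In -A mode, a file is swept unless its instrument is on the ignore list."""
    return inst not in ignore
```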

Examples

Example 1, run as user mcam:
lpsweeper -i minicam -d /data/mc4/mcam/archive/rawdata/data/ARCHMINICAM
This would be a typical command an instrument crontab could execute every 10 or 15 minutes to move its files out of the FTP directory. If, for example, the files were originally in directory /h/ftphome/ARCHMINICAM/2002.0523 at the MMTO, then they wind up in Cambridge under /data/mc4/mcam/archive/rawdata/data/ARCHMINICAM/2002.0523.
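Example 1 keeps the dated directory name from the original MMTO path. One plausible way to choose the destination, given the -D/+D and dir= descriptions above, is sketched below. The precedence order (dir= first, then +D, then a YYYY.MMDD parent directory, then the queuing date) is an assumption, as is the helper name destination_dir.

```python
import posixpath
import re
from datetime import datetime
from typing import Optional

# Dated directory names look like 2002.0523 (YYYY.MMDD).
DATED = re.compile(r"^\d{4}\.\d{4}$")


def destination_dir(instdir: str, orig_path: str, queued: datetime,
                    dated: bool = True, dir_opt: Optional[str] = None) -> str:
    """Pick an archive directory for one file (ASSUMED precedence, see lead-in)."""
    if dir_opt:                      # dir=dirname overrides the derived directory
        return posixpath.join(instdir, dir_opt)
    if not dated:                    # +D: no dated subdirectory
        return instdir
    parent = posixpath.basename(posixpath.dirname(orig_path))
    if DATED.match(parent):          # reuse the dated directory from the MMTO path
        return posixpath.join(instdir, parent)
    # Otherwise fall back to the queuing date.
    return posixpath.join(instdir, queued.strftime("%Y.%m%d"))
```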

Example 2, run as user wyatt:

lpsweeper -A -n wyatt -d /data/mc4/wyatt/lparchive

Sweep all files, including control files, except those of the instruments on the default ignore list and those identified as ``wyatt'', into /data/mc4/wyatt/lparchive/YYYY.MMDD. The transfer log will be /data/mc4/wyatt/misc/xferlogs.

Example 3, run as user wyatt:

lpsweeper -i wyatt +D -d $HOME/lparchive -l $HOME/lplogs

Sweep any files for ``wyatt'' into /home/wyatt/lparchive without dated subdirectories, and write the transfer log data to the file /home/wyatt/lplogs.
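Example 3 passes an absolute path to -l, so the log directory is not used. The -L/-l rules described in the Options list can be sketched as a small path-resolution function; the function name is hypothetical and the sketch assumes POSIX paths.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def resolve_logfile(instdir: str, logdir: Optional[str] = None,
                    logfile: Optional[str] = None,
                    now: Optional[datetime] = None,
                    cwd: Optional[str] = None) -> str:
    """Full path of the lpsweeper-style log file (hypothetical helper)."""
    if logfile is not None and os.path.isabs(logfile):
        return logfile                          # absolute -l: -L is ignored
    if logdir is None:
        logdir = os.path.join(instdir, "xferlogs")   # default log directory
    if not os.path.isabs(logdir):
        logdir = os.path.join(cwd if cwd is not None else os.getcwd(), logdir)
    if logfile is None:
        if now is None:
            now = datetime.now(timezone.utc)
        # Daily name YYYYMMDD.tdcin.log, using the UTC date of the run.
        logfile = now.astimezone(timezone.utc).strftime("%Y%m%d") + ".tdcin.log"
    return os.path.join(logdir, logfile)
```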

Generating a transfer activity summary - LPLOGSUM

The lplogsum program generates a summary of transfer activity for one or all instruments over an arbitrary period of time.

Output example from the command:

lplogsum -i minicam -B 2003-04-01 -D 30 -n -z MST
        Internet Data Storage Report

Period covered: 2003-04-01 12:00:00 MST  to
                2003-05-01 12:00:00 MST  or 30 days, 0 hours

Instrument: minicam      689 files       0 errors

FTP performance for 689 files:
  Files sizes:  Min =   5497920 bytes
                Max =  41670700

  Median rate:  0.92 MB/s   Median Time:  13 seconds
     Min        0.47           Min         5
     Max        2.20           Max        75

   356  at  0.0 < rate <= 1.0 MB/sec
   320  at  1.0 < rate <= 2.0
    13  at  2.0 < rate <= 4.0
     0  at  4.0 < rate <= 8.0
     0  at  8.0 < rate
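The rate statistics and histogram in the report above can be reproduced from per-file sizes and transfer times with a short sketch. It is illustrative only: the function names are hypothetical, and treating 1 MB as 10**6 bytes is an assumption, since the report does not say which megabyte it uses.

```python
from statistics import median
from typing import Dict, List

# Upper bin edges in MB/s, matching the report; the last bin is open-ended.
EDGES = [1.0, 2.0, 4.0, 8.0]


def rate_histogram(rates: List[float]) -> List[int]:
    """Count rates into <=1, (1,2], (2,4], (4,8] and >8 MB/s bins."""
    counts = [0] * (len(EDGES) + 1)
    for r in rates:
        for i, edge in enumerate(EDGES):
            if r <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


def summarize(sizes_bytes: List[int], seconds: List[float]) -> Dict[str, object]:
    """Report-style summary (ASSUMES 1 MB = 10**6 bytes and nonzero times)."""
    if not sizes_bytes or len(sizes_bytes) != len(seconds):
        raise ValueError("need one transfer time per file")
    rates = [size / t / 1e6 for size, t in zip(sizes_bytes, seconds)]
    return {
        "files": len(sizes_bytes),
        "min_size": min(sizes_bytes),
        "max_size": max(sizes_bytes),
        "median_rate": median(rates),
        "min_rate": min(rates),
        "max_rate": max(rates),
        "median_time": median(seconds),
        "histogram": rate_histogram(rates),
    }
```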


Options

-i  instrument     Default: <none>
Tells lplogsum to look for logs from the instrument named instrument. There is no default for this argument; lplogsum will abort if an instrument name is not supplied.

-l  sweeper-logdir     Default: <see text>
Tells lplogsum to scan for lpsweeper transfer logs in the directory sweeper-logdir. If the instrument name above is one of spec, chelle, mega, or minicam, this defaults to the appropriate value for that instrument; otherwise, it must be supplied.

-f  ftp-logfile     Default: /data/tdc-ftp-log/xferlog
Tells lplogsum to scan the file ftp-logfile for data on transfer times and rates.

-B  yyyy-mm-dd     Default: current date
Gives the base day for the starting search time.

-b  hours     Default: <none>
Gives the offset in hours from midnight of the base day at which to start the search. If -b is omitted, the search starts at the current time of day, so to start at midnight an hours value of zero must be given.

-n  or  -noon
Sets the search start time in the base day to 12:00 noon. This is identical to -b 12.

-D  days     Default: -1 day
Gives the offset in days from the base day to the end of the search time. Note that if this is negative, the search is into the past relative to the base day. If the -h and -D options are both given, the last one on the command line is used.

-h  hours     Default: -24 hours
Gives the offset in hours from the base day to the end of the search time. Note that if this is negative, the search is into the past relative to the base day. If the -h and -D options are both given, the last one on the command line is used.

-z  timezone     Default: local time zone, e.g. EST5EDT
This sets the search times and offsets to be in the specified time zone.

-v  or  +v   Default: 1
Controls the verbosity of the output. Consistent with the non-intuitive Unix conventions, -v increases the level to 2, while +v sets it to zero.

v = 0  ( +v )
Shows the number of files and reported errors for the search period. Nothing is printed unless data has been transferred during the search interval.

v = 1  (default)
Shows the above plus information on the file sizes and transfer rates during the search period. The summary shows nothing if no data has been transferred during the search interval.

v = 2  ( -v )
Shows the above, but even if no data have been transferred, at least the search period is printed.
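The way -B, -b/-n, -D/-h and -z combine into a search period can be sketched as a small window calculation. This is an illustrative sketch, not lplogsum's code: search_window is a hypothetical name, MST is modeled as a fixed UTC-7 offset, and looking up zone names such as EST5EDT (for example with Python's zoneinfo module) is left out.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Optional, Tuple

# Fixed-offset Mountain Standard Time (UTC-7, no daylight saving), for illustration.
MST = timezone(timedelta(hours=-7), "MST")


def search_window(base_day: date, tz: tzinfo, start_hour: Optional[float] = None,
                  offset: timedelta = timedelta(days=-1),
                  now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (earliest, latest) bounds of an lplogsum-style search period.

    start_hour plays the role of -b (-n is start_hour=12); None keeps the
    current time of day. offset plays the role of -D/-h; a negative offset
    searches into the past from the starting time.
    """
    if start_hour is None:
        if now is None:
            now = datetime.now(tz)
        start = datetime.combine(base_day, now.astimezone(tz).time(), tzinfo=tz)
    else:
        start = datetime.combine(base_day, time(0), tzinfo=tz) + timedelta(hours=start_hour)
    end = start + offset
    return (start, end) if start <= end else (end, start)
```

With -B 2003-04-01 -D 30 -n -z MST this gives 2003-04-01 12:00 MST to 2003-05-01 12:00 MST, matching the period in the example report. With a daylight-saving zone, Python's aware-datetime arithmetic adds wall-clock time, which may differ by an hour from elapsed time across a transition.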

Examples

  1. Check the Megacam transfers in the last 24 hours:
    lplogsum -i mega
  2. Check the Hectospec transfers in the last 48 hours.
    lplogsum -i spec -h -48

  3. Check the Hectospec transfers in the last 24 hours from MST noon, when new data files go into the new day's directory:
    lplogsum -i spec -n -z MST
            or
    lplogsum -i spec -nz MST

    Note: if the current MST time is earlier than 12:00, the period covered extends into the future!

  4. Check the October 2003 Hectochelle transfers starting at midnight. Note that the spaces after the option letters are not needed:
    lplogsum -ichelle -B2003-10-01 -D31 -b0 -zMST
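Examples 3 and 4 rely on getopt-style parsing, where single-letter flags can be grouped (-nz MST) and values can follow the letter directly (-ichelle). The sketch below parses a subset of the lplogsum options with Python's standard getopt module; whether lplogsum itself works this way is an assumption, the function name is hypothetical, and the -noon and +v spellings are not handled.

```python
import getopt
from datetime import timedelta
from typing import Dict, List

# Options that simply store a string value.
STRING_FLAGS = {"-i": "instrument", "-l": "sweeper_logdir", "-f": "ftp_logfile",
                "-B": "base_day", "-z": "timezone"}


def parse_lplogsum_args(argv: List[str]) -> Dict[str, object]:
    """Parse a subset of the lplogsum options with getopt (an assumption).

    The -noon and +v spellings are not getopt-style and are not handled here.
    """
    opts, extra = getopt.getopt(argv, "i:l:f:B:b:nD:h:z:v")
    if extra:
        raise getopt.GetoptError(f"unexpected arguments: {extra}")
    cfg: Dict[str, object] = {"offset": timedelta(days=-1), "verbosity": 1}
    for flag, value in opts:
        if flag in STRING_FLAGS:
            cfg[STRING_FLAGS[flag]] = value
        elif flag == "-b":
            cfg["start_hour"] = float(value)
        elif flag == "-n":
            cfg["start_hour"] = 12.0                       # same as -b 12
        elif flag == "-D":
            cfg["offset"] = timedelta(days=float(value))   # last of -D/-h wins
        elif flag == "-h":
            cfg["offset"] = timedelta(hours=float(value))
        elif flag == "-v":
            cfg["verbosity"] = 2
    return cfg
```

getopt takes the next argument as an option value even when it starts with a dash, so -h -48 in Example 2 parses as intended.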
