The user interface model is a line printer queue. The user (or, in
this case, the data system) uses the lp command to ``print'' a file
to a remote ``printer'', which actually ftp's it to a disk in
Cambridge, MA.
A control file is sent along with each file. It gives useful information
such as the checksum, the original file path, the original file timestamp,
and any options supplied using -o <options> with the
lp command.
Various options
affect the transmission and eventual archiving of the data file.
The queue name is settable by the sysadmin; we currently use archive.
In general, the form of the command is:
lp -d archive [-o <keyword=value>] filename
as with any printer command. The lpstat -o archive command will show
any files queued but not yet transmitted.
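As a minimal sketch of the command form above, the following Python helper assembles the argument list for one file. The function name build_lp_command and the file name image0001.fits are hypothetical, chosen only for illustration; the queue name archive and the -d and -o flags come from the text.

```python
def build_lp_command(filename, queue="archive", options=None):
    """Return the argv list for queuing one file on the archive queue."""
    argv = ["lp", "-d", queue]
    # Each keyword=value pair becomes its own -o argument.
    for keyword, value in (options or {}).items():
        argv += ["-o", f"{keyword}={value}"]
    argv.append(filename)
    return argv


if __name__ == "__main__":
    # image0001.fits is a made-up file name.
    cmd = build_lp_command("image0001.fits", options={"inst": "minicam", "maxftp": 60})
    print(" ".join(cmd))
```

The list can be handed to subprocess.run on a machine where the queue is configured.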
At the other end of the link, here called the ``archive site'', one or more
``lpsweeper'' processes copy the files
from the ftp incoming directory to their
instrument-specific destination.
Queue options
The general form for options is keyword=value. Multiple
options can be given as separate -o arguments
(e.g.: -o inst=minicam -o maxftp=0)
or as a comma-separated
list (i.e. no whitespace) in a single -o argument
(e.g.: -o inst=minicam,maxftp=60).
All the options except those below beginning with "rem" (i.e.,
remhost, etc.) are put into the control file sent to the
archive site for possible use by the sweeper programs.
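The two equivalent option forms, and the rule that rem* keywords stay out of the control file, can be sketched as follows. This is not the interface script itself: the function names are hypothetical, and letting a later value override an earlier one for the same keyword is an assumption.

```python
def parse_queue_options(o_args):
    """Merge the values of one or more -o arguments into a dict."""
    options = {}
    for arg in o_args:
        # A single -o value may hold several comma-separated pairs.
        for item in arg.split(","):
            keyword, sep, value = item.partition("=")
            if not sep:
                raise ValueError(f"expected keyword=value, got {item!r}")
            options[keyword] = value  # assumption: the last value wins
    return options


def control_file_options(options):
    """Drop the rem* keywords, which are not sent to the archive site."""
    return {k: v for k, v in options.items() if not k.startswith("rem")}
```

Both -o inst=minicam -o maxftp=60 and -o inst=minicam,maxftp=60 parse to the same dictionary.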
Keywords
- instrument or inst
Default: inst=undef
- Sets the instrument name, e.g. inst=minicam. This selects which
archive will receive the data back in Cambridge. The default is settable by
the sysadmin.
- maxftp
Default: maxftp=600
- Sometimes, due to network problems, the ftp session simply hangs without
returning an exit status. To avoid waiting indefinitely, the interface script
allows each ftp session to last at most this number of seconds (per file)
before aborting and reporting an error. Setting maxftp=0 disables the
limit.
- dir
Default: <none>
- This is passed on to the remote side as a hint about which directory the
file should be archived to. See the lpsweeper
documentation.
- remhost
Default: remhost=tdc-ftp.harvard.edu
- Sets the host to transmit the data to. Should not be changed under normal
circumstances.
- rempath
Default: rempath=/incoming
- Sets the directory to put the data into. Should not be changed under normal
circumstances.
- remuser
Default: remuser=tdcin
- Sets the remote user ftp account. Should not be changed under normal
circumstances.
- rempw
Default: rempw=********
- Sets the password of the remote ftp account. Actually, this is a
restricted ftp guest account, and is pretty well protected, so announcing the
password ought not to be a problem.
Should not be changed under normal circumstances.
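The maxftp behavior described above (a per-file time limit, with 0 meaning no limit) can be illustrated with Python's subprocess timeout. This is a sketch under the assumption that the transfer runs as a child process; the function names are hypothetical and the real interface script may work differently.

```python
import subprocess
import sys


def transfer_timeout(maxftp):
    """Map the maxftp option to a subprocess timeout; 0 means no limit."""
    seconds = int(maxftp)
    return None if seconds == 0 else seconds


def run_with_limit(argv, maxftp=600):
    """Run one transfer command; return its exit status, or None on timeout."""
    try:
        return subprocess.run(argv, timeout=transfer_timeout(maxftp)).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None
```

A caller would treat a None result as the "abort and issue an error" case.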
Moving data to its archive location - LPSWEEPER
Once the data arrives on the disk in Cambridge, it must be moved to its
destination. This is accomplished by a program called
lpsweeper, located in /data/oir/bin.
lpsweeper, instructed by command line arguments, takes all
files of a given instrument, checks their validity, then moves them to a
destination directory, timestamping each one to agree with the creation time
of the source file at the instrument end. There is also an option to sweep out all
files that are not of a recognized instrument, moving these to a
miscellaneous directory. This keeps the ftp incoming directory and partition
free to accept more data.
lpsweeper is expected to be run from a crontab set
up for each instrument, at a reasonable interval such as every 10 minutes.
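The move-and-timestamp step can be sketched in Python as below. This is only an illustration: the function name is hypothetical, the original timestamp is assumed to have been read from the control file already (its format is not described here), and the validity check is omitted.

```python
import os
import shutil
import tempfile


def archive_file(src, dest_dir, original_timestamp):
    """Move src into dest_dir and restore the instrument-side timestamp."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.move(src, dest)
    # Set both access and modification time to the original file's timestamp.
    os.utime(dest, (original_timestamp, original_timestamp))
    return dest


if __name__ == "__main__":
    # Demonstrate on a scratch directory rather than a real incoming area.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "frame.dat")
        open(src, "w").close()
        print(archive_file(src, os.path.join(tmp, "archive"), 1022112000))
```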
Options
- -i instrument
Default: <none>
- Tells lpsweeper to look for files from the instrument named
instrument. There is no default for this argument;
lpsweeper will abort if an instrument name is not
supplied.
- -d instdir
Default: <none>
- Tells lpsweeper to put
files from the instrument into the directory instdir, or a dated
directory under it. There is no default for this argument;
lpsweeper will abort if an instrument directory is not
supplied.
- -L logdir
Default: instdir/xferlogs
- Use this path as the directory for writing log files. A new logfile
for each day is generated, unless an explicit logfile is given. If this
path does not start from root, the current working directory is prepended
to it. The default log directory is instdir/xferlogs.
- -l logfile
Default: YYYYMMDD.tdcin.log
- Use this file to hold the logging information instead of making one up
based on the current date. If this is a path from root, the logdir
option is ignored; otherwise the full path is logdir/logfile.
The default logfile name is of the form YYYYMMDD.tdcin.log,
where the date part is the UTC time of the lpsweeper
execution.
- -f input-directory
Default: /data/mc4/tdc-ftp/incoming
- Directory to scan for input files.
Default is /data/mc4/tdc-ftp/incoming. This would
normally only be changed to sweep files out of a ``miscellaneous'' directory
(see the -A option, below).
- -o options
- This supplies options of the form optname=value for processing
exactly as if supplied by the original lp command on the
instrument computer. Not all options are processed, however. Those that are:
- dir=dirname
- Supplies a directory under which to store the files instead of a
directory derived from the original path or the queuing time.
- -D
- Store files under a dated subdirectory (see Operation). This is the
default.
- +D
- Do not store files under a dated subdirectory (see Operation).
- -A
- Sweep ``ANY'' files not known to be
instrument files. With this option, there is a default destination directory,
currently /data/mc4/wyatt/lparchive. In this case only,
both the control
file and the data file are copied unchanged to the destination, so that it
will be possible to later run lpsweeper on the destination to
move the files to their correct destination.
This option should only be used by one
agreed-upon user, or unaffiliated files may get lost as they would wind
up in different miscellaneous directories depending on which sweep ran
at the time the file was in the incoming directory. Currently, I (Bill Wyatt)
am running lpsweeper -A once per day at 1802 hours ET, and
putting the unaffiliated files into dated directories under:
/data/oirperm/wyatt/lparchive/ANY
- -n ignore-list-additions
- Add one or more names
to a list of instruments to be ignored by the -A mode, above. The list
can be comma-separated, or whitespace-separated if enclosed in quotes. See
+n ignore-list, below, for the default list.
- +n ignore-list
- This option replaces the default ignore list with the one given on the
command line. The current default list of instruments to ignore is:
minicam megacam hectospec hectochelle fast ircam afoe 4shooter
Dangerous. Maybe too much so to be useful.
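How -n and +n shape the ignore list can be sketched as follows. The default names are copied from the list above; the function names are hypothetical, and allowing both flags to be combined (replace first, then add) is an assumption.

```python
DEFAULT_IGNORE = ["minicam", "megacam", "hectospec", "hectochelle",
                  "fast", "ircam", "afoe", "4shooter"]


def split_names(text):
    """Split a comma- or whitespace-separated list of instrument names."""
    return text.replace(",", " ").split()


def ignore_list(additions=None, replacement=None):
    """Return the instruments that the -A mode should leave alone."""
    # +n replaces the default list; otherwise start from the defaults.
    names = split_names(replacement) if replacement is not None else list(DEFAULT_IGNORE)
    # -n appends names that are not already present.
    for name in split_names(additions or ""):
        if name not in names:
            names.append(name)
    return names
```

For example, ignore_list(additions="wyatt") yields the eight defaults followed by wyatt.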
Examples
Example 1, run as user mcam:
lpsweeper -i minicam -d /data/mc4/mcam/archive/rawdata/data/ARCHMINICAM
This would be a typical command an instrument crontab could execute every 10
or 15 minutes to move its files out of the FTP directory. If, for example,
the files were originally in directory
/h/ftphome/ARCHMINICAM/2002.0523 at the MMTO, then they
wind up in Cambridge under
/data/mc4/mcam/archive/rawdata/data/ARCHMINICAM/2002.0523.
Example 2, run as user wyatt:
lpsweeper -A -n wyatt -d /data/mc4/wyatt/lparchive
Sweep all files, including control files, except those belonging to the
default ignore-list instruments and those identified as ``wyatt'',
into /data/mc4/wyatt/lparchive/YYYY.MMDD.
The transfer log will be
/data/mc4/wyatt/misc/xferlogs.
Example 3, run as user wyatt:
lpsweeper -i wyatt +D -d $HOME/lparchive -l $HOME/lplogs
Sweep any files for ``wyatt'', putting them, without dated
subdirectories, in /home/wyatt/lparchive and writing the transfer log data
to the file /home/wyatt/lplogs.
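Example 3 gives -l a path from root, so the log directory is ignored. The path rules from the -L and -l descriptions can be sketched as below. The function name and the now parameter are hypothetical, and the sketch assumes POSIX-style paths.

```python
import os
from datetime import datetime, timezone


def log_path(instdir, logdir=None, logfile=None, now=None):
    """Resolve the transfer log path from the -d, -L and -l values."""
    if logfile is None:
        # Default name: YYYYMMDD.tdcin.log, dated in UTC.
        now = now or datetime.now(timezone.utc)
        logfile = now.strftime("%Y%m%d") + ".tdcin.log"
    if os.path.isabs(logfile):
        return logfile  # a -l path from root overrides -L
    if logdir is None:
        logdir = os.path.join(instdir, "xferlogs")
    # os.path.abspath prepends the current working directory to relative paths.
    return os.path.join(os.path.abspath(logdir), logfile)
```

With only an instrument directory given, the result is instdir/xferlogs/YYYYMMDD.tdcin.log.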
Generating a transfer activity summary - LPLOGSUM
The lplogsum program generates a summary of transfer activity for
one or all instruments over an arbitrary period of time.
Output example from the command:
lplogsum -i minicam -B 2003-04-01 -D 30 -n -z MST
Internet Data Storage Report
Period covered: 2003-04-01 12:00:00 MST to
2003-05-01 12:00:00 MST or 30 days, 0 hours
Instrument: minicam 689 files 0 errors
FTP performance for 689 files:
Files sizes: Min = 5497920 bytes
Max = 41670700
Median rate: 0.92 MB/s Median Time: 13 seconds
Min 0.47 Min 5
Max 2.20 Max 75
356 at 0.0 < rate <= 1.0 MB/sec
320 at 1.0 < rate <= 2.0
13 at 2.0 < rate <= 4.0
0 at 4.0 < rate <= 8.0
0 at 8.0 < rate
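The rate bins at the end of the report can be reproduced with a few lines of Python. This is a sketch of the binning only, not lplogsum's code: the function names are hypothetical, and treating 1 MB as 10**6 bytes is an assumption.

```python
from statistics import median

# Upper edges of the rate bins in the report, in MB/s; the last bin is open-ended.
BIN_EDGES = [1.0, 2.0, 4.0, 8.0]


def transfer_rate(size_bytes, seconds):
    """Rate in MB/s, taking 1 MB as 10**6 bytes (an assumption)."""
    return size_bytes / 1e6 / seconds


def rate_histogram(rates):
    """Count rates in the bins (0,1], (1,2], (2,4], (4,8] and above 8."""
    counts = [0] * (len(BIN_EDGES) + 1)
    for rate in rates:
        for i, edge in enumerate(BIN_EDGES):
            if rate <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # faster than the last edge
    return counts


if __name__ == "__main__":
    rates = [0.47, 0.92, 2.20]
    print(median(rates), rate_histogram(rates))
```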
Options
- -i instrument
Default: <none>
- Tells lplogsum to look for logs from the instrument named
instrument. There is no default for this argument;
lplogsum will abort if an instrument name is not
supplied.
- -l sweeper-logdir
Default: <see text>
- Tells lplogsum to scan for lpsweeper transfer logs
from the directory sweeper-logdir. If the instrument name above is
one of spec, chelle, mega, or minicam, this defaults to the
appropriate value for that instrument; otherwise, it must be supplied.
- -f ftp-logfile
Default: /data/tdc-ftp-log/xferlog
- Tells lplogsum to scan the file ftp-logfile for data
on transfer times and rates.
- -B yyyy-mm-dd
Default: current date
- Gives the base day for the starting search time.
- -b hours
Default: <none>
- Gives the offset in hours from midnight for the base day to start the
search. For a search to start at midnight instead of the current time, an
hours value of zero must be given.
- -n or -noon
- Sets the search start time in the base day to 12:00 noon. This is
identical to -b 12.
- -D days
Default: -1 day
- Gives the offset in days from the base day to the end of the search time.
Note that if this is negative, the search is into the past relative to the
base day. If both the -h and -D options are given together,
the last one on the command line is used.
- -h hours
Default: -24 hours
- Gives the offset in hours from the base day to the end of the search time.
Note that if this is negative, the search is into the past relative to the
base day. If both the -h and -D options are given together,
the last one on the command line is used.
- -z timezone
Default: local time zone, e.g. EST5EDT
- This sets the search times and offsets to be in the specified time zone.
- -v or +v
Default: 1
- Controls the verbosity of the output. Consistent with the non-intuitive
Unix conventions, -v increases the level to 2, while
+v sets it to zero.
- v = 0 ( +v )
- Shows the number of files and reported errors for the search period.
Nothing is printed unless data has been transferred
during the search interval.
- v = 1 (default)
- Shows the above plus information on the file sizes and transfer rates
during the search period. The summary shows nothing if no data has been
transferred during the search interval.
- v = 2 ( -v )
- Shows the above, but even if no data have been transferred, at least
the search period is printed.
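The way -B, -b (or -n), -D and -h combine into a search period can be sketched as follows. This is an illustration under simplifying assumptions: the function name is hypothetical, time zones are ignored (naive datetimes), the start offset is always given explicitly, and when both days and hours are passed the sketch prefers days, whereas lplogsum uses whichever came last on the command line.

```python
from datetime import datetime, timedelta


def search_window(base_day, start_hours, days=None, hours=None):
    """Return (begin, end) for the search, with begin <= end."""
    # -b / -n: offset in hours from midnight of the base day.
    start = base_day + timedelta(hours=start_hours)
    if days is not None:
        offset = timedelta(days=days)      # -D
    elif hours is not None:
        offset = timedelta(hours=hours)    # -h
    else:
        offset = timedelta(hours=-24)      # documented default
    other = start + offset
    # A negative offset searches into the past, so order the two ends.
    return (min(start, other), max(start, other))


if __name__ == "__main__":
    # Corresponds to -B 2003-04-01 -n -D 30 in the output example.
    print(search_window(datetime(2003, 4, 1), 12, days=30))
```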
Examples
- Check the Megacam transfers in the last 24 hours:
lplogsum -i mega
- Check the Hectospec transfers in the last 48 hours:
lplogsum -i spec -h -48
- Check the Hectospec transfers in the last 24 hours from MST noon,
when new data files go into the new day's directory:
lplogsum -i spec -n -z MST
or
lplogsum -i spec -nz MST
Note: if it is not yet 12:00 in the MST time zone, the period covered
extends into the future!
- Check the October, 2003 Hectochelle transfers starting at midnight,
and note that the spaces after the option letters are not needed:
lplogsum -ichelle -B2003-10-01 -D31 -b0 -zMST