retriever: the Metcast client application
The retriever's only goal is to establish a connection to a web application server (directly or via a proxy), communicate a request for products, and deliver the reply to a local client. It is up to content helpers to make sense of the received data (which are tagged with a MIME type and a JMV area name).
See Met-Cast-HTTP.html for a much more detailed discussion of HTTP file/transport within Metcast.
The retriever is a faceless (32-bit console, in Windows parlance) application: it accepts parameters from the command line and writes log and error messages to the standard error stream (stderr), which can be redirected into a file. The retriever can be launched from a command line (a DOS-like prompt in Win95/NT), from a batch file or shell script, or from another application using the system() or fork()/exec() POSIX calls.
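As an illustration of launching the retriever from another application, here is a minimal Python sketch. It assumes the command-line synopsis `retriever config-file [req-file-mime-type request-file]` and that the executable is on the search path under the name `retriever`. The helper names `build_retriever_command` and `launch_retriever` are hypothetical and not part of Metcast.

```python
import subprocess


def build_retriever_command(config_file, mime_type=None, request_file=None,
                            executable="retriever"):
    """Build the argument list: retriever config-file [req-file-mime-type request-file]."""
    # The MIME type and the request file are either both present or both omitted.
    if (mime_type is None) != (request_file is None):
        raise ValueError("mime_type and request_file must be given together")
    argv = [executable, config_file]
    if mime_type is not None:
        argv += [mime_type, request_file]
    return argv


def launch_retriever(argv, log_path):
    """Run the retriever, appending its stderr messages to log_path.

    Returns the process exit status; its interpretation is not described
    in this document, so the caller should not assume any particular meaning.
    """
    with open(log_path, "ab") as log:
        return subprocess.run(argv, stderr=log, check=False).returncode
```

A caller would first build the argument list, for example `build_retriever_command("retriever.conf", "text/x-mbl", "request-syn.mbl")`, and then pass it to `launch_retriever` together with the path of a log file.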
The retriever currently runs under UNIX, Linux, and Windows 95/NT. It is compiled using gcc 2.8.1/libstdc++ 2.8.1 on HP-UX, gcc 2.7.2/libg++ 2.7.2 on Linux, and Visual C++ 5.0 on WinNT, from the same code base.
retriever config-file [req-file-mime-type request-file]
Thus the retriever is called (usually by a JMV shell or a similar module) with two file names: the name of a configuration file and the path to a request file that contains the request phrases themselves. The req-file-mime-type argument gives the MIME type of the request file:
text/x-request-mfr
text/x-mbl
application/x-www-form-urlencoded -- a POST-ed form. In this case, the retriever emulates a typical Web browser.
The configuration file contains all the information necessary to establish a connection to an HTTP server and, if necessary, to authenticate ourselves. The configuration file also tells us whether this request is a one-time deal or whether we have to camp and check for updates. In the former case, we submit the request, listen for a reply, close the connection, and exit. Otherwise, the module sleeps for a specified period of time, establishes a new connection with the server, and checks again to see whether any of the requested products have changed in the meantime. The module repeats this check over and over again (until killed by the user).
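The two modes of operation can be sketched as a small loop. This is only an illustration in Python, not the retriever's actual code: the names `run_transactions`, `poll_interval`, and `max_rounds` are hypothetical (the configuration parameter that selects the polling period is not named in this section), and `max_rounds` exists only so that the sketch can terminate in a test.

```python
import time


def run_transactions(do_transaction, poll_interval=None, max_rounds=None,
                     sleep=time.sleep):
    """Run one transaction, or keep polling every poll_interval seconds.

    do_transaction stands for "connect, submit the request, read the
    reply, close the connection". With poll_interval=None the request is
    a one-time deal. Returns the number of transactions performed.
    """
    rounds = 0
    while True:
        do_transaction()          # each round uses a fresh connection
        rounds += 1
        if poll_interval is None:
            break                 # one-time request: exit after the reply
        if max_rounds is not None and rounds >= max_rounds:
            break                 # test hook; the real module runs until killed
        sleep(poll_interval)      # camp until the next check
    return rounds
```

Passing the `sleep` function as a parameter keeps the sketch testable without real delays.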
Both the req-file-mime-type and request-file command-line parameters may be omitted if the SERVER_URL parameter in the configuration file specifies a URL with the FILE: scheme. In that case, no remote server is contacted, no request is submitted, and the specified FILE: is processed as if it were a server's response. See below for more details.
The helpers (which actually process the content data) are specified through a mailcap file, which is loaded at startup. The path to the file is taken from the environment variable MAILCAP; by default, a file named "mailcap" in the current directory is assumed.
Some errors that may occur during an HTTP transaction are hard (for example, an error in the syntax of the request file). In that case, the retriever quits immediately. Some errors (generally assertion failures, the system being out of memory, or wrong file permissions) are fatal and crash the retriever. Still other errors are soft: for example, connection time-outs, a server being too busy, etc. These errors may go away if we repeat the transaction. Thus, any transaction that ends with a soft error is automatically restarted, up to a specified number of times (see the RETRY_ON_ERROR configuration parameter). If the error still persists, it becomes a hard error, and we quit.
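The retry policy for soft errors might look roughly like the following Python sketch. The exception classes `SoftError` and `HardError` and the function `run_with_retries` are hypothetical names used for illustration. The sketch also assumes that RETRY_ON_ERROR counts restarts, so that a transaction is attempted at most RETRY_ON_ERROR + 1 times; the document does not spell out this detail.

```python
class SoftError(Exception):
    """A transient failure, e.g. a connection time-out or a busy server."""


class HardError(Exception):
    """A failure that repeating the transaction will not fix."""


def run_with_retries(transaction, retry_on_error):
    """Run transaction(), restarting it after soft errors.

    retry_on_error plays the role of the RETRY_ON_ERROR parameter: the
    maximum number of restarts. A soft error that persists beyond that
    limit is promoted to a hard error.
    """
    restarts = 0
    while True:
        try:
            return transaction()
        except SoftError as err:
            restarts += 1
            if restarts > retry_on_error:
                raise HardError(
                    f"soft error persisted after {retry_on_error} restarts"
                ) from err
```

Hard errors raised by the transaction itself are not caught here, so they end the run immediately, matching the behavior described above.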
retriever retriever.conf text/x-request-mfr NORF/request.mfr
retriever retriever.conf text/x-mbl request-syn.mbl
retriever retriever.conf application/x-www-form-urlencoded web-form.urlencoded
# The following trick is used to process a channel activation record
echo "SERVER_URL=FILE://$file_name" > /tmp/r.conf
retriever /tmp/r.conf
The retriever can handle multi-part MIME messages, that is, several products (datatypes) packed into a single message. A user can ask a Metcast server for a number of products of various types (data grids, satellite images, and real-time synoptic observations, to name a few) in a single request. The Metcast server returns everything in one single message. The client then unpacks the composite reply into separate product files and calls the corresponding helpers to make sense of the data. A composite message may in turn contain other composite messages; the retriever can process all of these. In fact, I strove to make the retriever handle every MIME-1.0 compliant message (as described in [RFC2045]); furthermore, I relaxed a few rules dealing with CR/LF line termination.
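To make the unpacking of nested composite messages concrete, here is a short sketch that uses Python's standard `email` package instead of the retriever's own C++ parser. The function name `unpack_products` is hypothetical. It only lists the leaf entities in order; writing product files and calling helpers is left out.

```python
from email import message_from_bytes


def unpack_products(raw_message):
    """Return (content_type, payload_bytes) for every leaf entity, in order.

    Composite (multipart) entities are descended into rather than
    returned, so nested composite messages are flattened.
    """
    message = message_from_bytes(raw_message)
    products = []
    for part in message.walk():       # depth-first over all entities
        if part.is_multipart():
            continue                  # a container, not a product
        products.append((part.get_content_type(),
                         part.get_payload(decode=True)))
    return products
```

Each returned pair corresponds to one product that would then be handed to the helper registered for its MIME type.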
FILE:// -- retrieving from a local file
The retriever can also handle the FILE: URL scheme. A server URL in the retriever's configuration file may be specified simply as
SERVER_URL = file:///users/oleg/HTTP-Retrieve/conus.mime
In this case, no HTTP server is contacted: the retriever takes this file as an already received reply and processes it in the regular way. The file must contain a valid MIME message, which may have been prepared by a Metcast server, received via e-mail, or composed in any other way.
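The decision between contacting a server and reading a local file can be sketched as follows. This is an illustrative Python fragment under the assumption that the SERVER_URL value is an ordinary URL; the function name `local_reply_path` is hypothetical.

```python
from urllib.parse import unquote, urlparse


def local_reply_path(server_url):
    """Return the local file path for a file: URL, or None for any other scheme.

    urlparse lowercases the scheme, so FILE:, File:, and file: are
    treated alike.
    """
    parts = urlparse(server_url.strip())
    if parts.scheme != "file":
        return None               # a real server has to be contacted
    return unquote(parts.path)    # path of the already received reply
```

With the example above, `local_reply_path("file:///users/oleg/HTTP-Retrieve/conus.mime")` yields the path `/users/oleg/HTTP-Retrieve/conus.mime`, whose contents would then be processed as a server reply.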
This ability of the retriever to handle composite messages can be used in a somewhat unexpected way. A composite message may be considered an archive -- an ordered collection of entities. The retriever can then act as an un-archiver, extracting each entity and passing it to a MIME helper for further processing. For example, one can pack a set of directories and files into a single message and later have the retriever unpack this archive and restore the original directory tree. Unlike tar, however, the retriever can perform more advanced processing of archived entities than simply creating directories and copying content into files. It all depends on the current set of MIME helpers, as specified by a mailcap. With another mailcap, the same retriever can feed the content into a database rather than store it in files. Or it can launch specific applications once the content is extracted.
The current version of the retriever is 5.1, as of Apr 22, 1998.
The present version of the retriever is backward compatible with the previous one, version 4.1 as of Jan 16, 1998.
The present version is much more lenient towards line termination in MIME headers. In keeping with common practice, the retriever now accepts header lines in single or composite MIME entities terminated with a single CR, a single LF, or a CRLF combination. Note that the MIME standard [RFC2045] provides only for the last option.
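One way to implement such lenient line splitting is shown below. This is a minimal Python sketch and not the retriever's own code; the function name `split_header_lines` is hypothetical, and header folding (continuation lines) is deliberately not handled.

```python
import re

# CRLF is listed first so that it counts as one terminator, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_header_lines(header_block):
    """Split a header block on CR, LF, or CRLF line terminators."""
    lines = _LINE_END.split(header_block)
    if lines and lines[-1] == "":
        lines.pop()   # drop the empty piece after the final terminator
    return lines
```

A regular expression is used instead of `str.splitlines()` because the latter also breaks on several other characters, such as form feed, which are not line terminators here.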
The retriever will now attempt a partial match with wildcard mailcap entries (like text/* and */*) when the exact match fails.
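The fallback from an exact match to wildcard entries can be illustrated with a small lookup function. This is a sketch under stated assumptions: the mailcap file is taken to be already parsed into a dictionary from lowercase type patterns to helper commands, the more specific wildcard (such as text/*) is tried before */*, and the name `find_helper` is hypothetical. The helper command strings in the usage note are made up for illustration.

```python
def find_helper(content_type, mailcap_entries):
    """Look up a helper: exact type first, then type/*, then */*.

    mailcap_entries maps lowercase patterns such as "text/plain",
    "text/*", or "*/*" to helper commands. Returns None if nothing matches.
    """
    content_type = content_type.strip().lower()
    major_type = content_type.split("/", 1)[0]
    for pattern in (content_type, major_type + "/*", "*/*"):
        if pattern in mailcap_entries:
            return mailcap_entries[pattern]
    return None
```

For instance, with entries for `text/*` and `*/*` only, a `text/x-mbl` entity would go to the `text/*` helper, while an `image/gif` entity would fall through to the `*/*` helper.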
The retriever can now send an Authorization: request header if enabled by the user. This led to some changes in the retriever's configuration file.
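This section does not say which authentication scheme the retriever uses. Purely as an illustration, and assuming the HTTP Basic scheme, an Authorization: request header could be built as in the following Python sketch. The function name `basic_authorization_header` is hypothetical.

```python
import base64


def basic_authorization_header(user, password):
    """Build an Authorization header line, ASSUMING the HTTP Basic scheme.

    The credentials are joined with a colon and base64-encoded; this is
    an encoding, not encryption.
    """
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"
```

The resulting line would be sent together with the other request headers when the user has enabled authentication.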
The new retriever is also faster, and its source code is better structured.