`retriever`: the Metcast client application

This module implements the HTTP file layer of Metcast: it submits a request to an HTTP server, listens for the reply, receives the content data, and launches one or more helpers to process the data.

The retriever's only goal is to establish a connection to a web application server (directly or via a proxy), communicate a request for products, and deliver the reply to a local client. It is up to the content helpers to make sense of the received data (which are tagged with a MIME type and a JMV area name). See Met-Cast-HTTP.html for a much more detailed discussion of HTTP file/transport within Metcast.

The retriever is a faceless (32-bit console, in Windows parlance) application: it accepts parameters from the command line, and writes log and error messages to the standard error stream (stderr), which can be redirected into a file. The retriever can be launched from a command line (a DOS-like prompt in Win95/NT), from a batch file or shell script, or from another application using the system() or fork()/exec() calls.
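As an illustration of launching the retriever from another application, here is a minimal Python sketch. The function names build_command and run_retriever and the default log file name are hypothetical; only the command-line layout (see the Synopsis) and the use of stderr for log messages come from this page.

```python
import subprocess


def build_command(config_file, mime_type=None, request_file=None,
                  executable="retriever"):
    """Build argv for: retriever config-file [req-file-mime-type request-file]."""
    # The two optional parameters only make sense as a pair.
    if (mime_type is None) != (request_file is None):
        raise ValueError("mime_type and request_file must be given together")
    argv = [executable, config_file]
    if mime_type is not None:
        argv += [mime_type, request_file]
    return argv


def run_retriever(config_file, mime_type=None, request_file=None,
                  log_path="retriever.log"):
    """Run the retriever and append its stderr (log and error messages) to log_path."""
    with open(log_path, "ab") as log:
        completed = subprocess.run(
            build_command(config_file, mime_type, request_file),
            stderr=log,
        )
    # The meaning of the exit status is not documented on this page;
    # it is returned unchanged for the caller to interpret.
    return completed.returncode
```

A caller would then write, for instance, run_retriever("retriever.conf", "text/x-mbl", "request-syn.mbl").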

The retriever currently runs under UNIX, Linux, and Windows 95/NT. It is compiled using gcc 2.8.1/libstdc++ 2.8.1 on HP-UX, gcc 2.7.2/libg++ 2.7.2 on Linux, and Visual C++ 5.0 on WinNT, from the same code base.

Synopsis:

retriever config-file [req-file-mime-type request-file]

Thus the retriever is called (usually by a JMV shell or a similar module) with two file names: the name of a configuration file, and the path to a request file that contains the request phrases themselves. The req-file-mime-type parameter gives the MIME type of the request file:

text/x-request-mfr: if the request file is in the deprecated request.mfr format
text/x-mbl: for a request in the new Metcast request language
application/x-www-form-urlencoded: if the request file contains the URL-encoded contents of a POST-ed form. In this case, the retriever emulates a typical Web browser.
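For the application/x-www-form-urlencoded case, a request file could be prepared along the following lines. This is only a sketch: the function names and the form field names ("product", "area") are made up for illustration, since the fields a Metcast server expects are not described on this page.

```python
from urllib.parse import urlencode


def encode_form(fields):
    """URL-encode a sequence of (name, value) pairs the way a browser encodes a POST-ed form."""
    return urlencode(fields)


def write_request_file(path, fields):
    """Write the URL-encoded form contents to path and return the encoded text."""
    body = encode_form(fields)
    with open(path, "w", encoding="ascii") as request_file:
        request_file.write(body)
    return body


# Hypothetical field names, for illustration only.
example_fields = [("product", "grid"), ("area", "North Atlantic")]
```

The resulting file would then be passed to the retriever as the request-file parameter, with application/x-www-form-urlencoded as the req-file-mime-type.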

The configuration file contains all the information necessary to establish a connection to an HTTP server and, if necessary, to authenticate ourselves. The configuration file also tells us whether this request is a one-time deal or whether we have to camp and check for updates. In the former case, we submit the request, listen for a reply, close the connection, and exit. In the latter case, the module sleeps for a specified period of time, establishes a new connection with the server, and checks again to see whether any of the requested products have changed in the meantime. The module repeats this check over and over again (until killed by the user).
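The two modes of operation can be sketched as a small loop. This is an illustration, not the retriever's code: the names run_transactions, poll_interval, and max_rounds are hypothetical (the configuration parameter that sets the sleep period is not named on this page), and max_rounds exists only so that the sketch can terminate.

```python
import time


def run_transactions(transact, poll_interval=None, max_rounds=None,
                     sleep=time.sleep):
    """Run transact() once, or repeatedly with a pause when poll_interval is set.

    Returns the number of transactions performed.
    """
    rounds = 0
    while True:
        # One full transaction: connect, submit the request,
        # read the reply, close the connection.
        transact()
        rounds += 1
        if poll_interval is None:
            break  # one-time request: exit after the first reply
        if max_rounds is not None and rounds >= max_rounds:
            break  # demo-only cap; the retriever itself runs until killed
        sleep(poll_interval)
    return rounds
```

Note that every round opens a fresh connection; nothing is kept open while sleeping.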

Both the req-file-mime-type and request-file command-line parameters may be omitted if the SERVER_URL parameter in the configuration file specifies a URL with the FILE: scheme. In that case, no remote server is contacted, no request is submitted, and the specified file is processed as if it were a server's response. See below for more details.

The helpers (which actually process the content data) are specified through a mailcap file, which is loaded at startup. The path to the file is taken from the environment variable MAILCAP; by default, a file named "mailcap" in the current directory is assumed.
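The lookup of the mailcap file can be sketched as follows. The function names are hypothetical, treating an empty MAILCAP value as unset is an assumption, and the parser is deliberately simplified: it reads "type; command" entries but ignores continuation lines, escaped semicolons, and the optional flags that the mailcap format allows.

```python
import os


def mailcap_path(environ=None):
    """Return the mailcap path: $MAILCAP if set, else "mailcap" in the current directory."""
    environ = os.environ if environ is None else environ
    # Assumption: an empty MAILCAP value is treated like an unset one.
    return environ.get("MAILCAP") or "mailcap"


def parse_mailcap(text):
    """Return a list of (content_type, helper_command) pairs from mailcap text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(";")
        if len(fields) < 2:
            continue  # not a "type; command" entry
        entries.append((fields[0].strip().lower(), fields[1].strip()))
    return entries
```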

Errors that may occur during an HTTP transaction fall into three classes. Some are hard (for example, a syntax error in the request file); in that case the retriever quits immediately. Some (generally assertion failures, the system running out of memory, or wrong file permissions) are fatal and crash the retriever. Others are soft: connection time-outs, a server being too busy, and so on. Soft errors may go away if the transaction is repeated, so any transaction that ends with a soft error is automatically restarted, up to a specified number of times (see the RETRY_ON_ERROR configuration parameter). If the error still persists, it becomes a hard error, and the retriever quits.
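The retry policy for soft errors can be sketched like this. The exception classes and the function name are hypothetical, and the sketch assumes that RETRY_ON_ERROR counts restarts, so that a transaction is attempted at most RETRY_ON_ERROR + 1 times; the retriever's exact counting may differ.

```python
class SoftError(Exception):
    """A transient failure such as a time-out or a busy server."""


class HardError(Exception):
    """A failure that makes the retriever give up."""


def run_with_retries(transact, retry_on_error):
    """Call transact(), restarting it after a soft error up to retry_on_error times."""
    restarts = 0
    while True:
        try:
            return transact()
        except SoftError as err:
            restarts += 1
            if restarts > retry_on_error:
                # The soft error persisted: promote it to a hard error.
                raise HardError(
                    f"still failing after {retry_on_error} restarts: {err}"
                ) from err
```

Hard errors raised by transact() itself are not caught, so they end the run immediately, as described above.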

Examples

  retriever retriever.conf text/x-request-mfr NORF/request.mfr

  retriever retriever.conf text/x-mbl request-syn.mbl

  retriever retriever.conf application/x-www-form-urlencoded web-form.urlencoded

  
  # The following trick is used to process a channel activation record
  echo "SERVER_URL=FILE://$file_name" > /tmp/r.conf
  retriever /tmp/r.conf

Delivering several products in a single message

The retriever can handle multi-part MIME messages, that is, several products (datatypes) packed into a single message. A user can ask a Metcast server for a number of products of various types (data grids, satellite images, and real-time synoptic observations, to name a few) in a single request. The Metcast server then returns everything in one single message. The client unpacks the composite reply into separate product files, and calls the corresponding helpers to make sense of the data. A composite message may in turn contain other composite messages; the retriever can process all of these. In fact, I strove to make the retriever handle every MIME-1.0 compliant message (as described in [RFC2045]); furthermore, I relaxed a few rules dealing with CR/LF line termination.
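The unpacking of a composite reply, including nested composite messages, can be illustrated with Python's standard email package. This is a sketch of the idea rather than the retriever's own parser; the function name unpack_products is hypothetical, and a real client would go on to save each payload and launch the helper registered for its content type.

```python
from email import message_from_bytes


def unpack_products(raw):
    """Return (content_type, payload_bytes) for every leaf entity, depth first."""
    message = message_from_bytes(raw)
    products = []
    # walk() visits the message and all nested parts in depth-first order.
    for part in message.walk():
        if part.is_multipart():
            continue  # a container: its children are visited on their own
        # decode=True undoes any Content-Transfer-Encoding (base64, etc.).
        products.append((part.get_content_type(), part.get_payload(decode=True)))
    return products
```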

`FILE://` -- retrieving from a local file

Besides HTTP, the retriever can also handle the FILE: URL scheme. A server URL in the retriever's configuration file may be specified simply as

SERVER_URL = file:///users/oleg/HTTP-Retrieve/conus.mime

In this case, no HTTP server is contacted: the retriever takes this file as an already received reply and processes it in the regular way. The file must contain a valid MIME message, which may have been prepared by a Metcast server, received via e-mail, or composed in any other way.
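Deciding between the two modes amounts to looking at the scheme of the server URL. A minimal sketch, using Python's standard URL parser (the function name local_reply_path is hypothetical, and the retriever's own URL parsing may differ in corner cases such as relative paths):

```python
from urllib.parse import unquote, urlparse


def local_reply_path(server_url):
    """Return the local file path for a file: URL, or None for any other scheme."""
    parts = urlparse(server_url)
    # urlparse lowercases the scheme, so FILE: and file: are treated alike.
    if parts.scheme != "file":
        return None
    return unquote(parts.path)
```

When the result is not None, the named file is read and handed to the same reply-processing code that an HTTP response would go through.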

This ability of the retriever to handle composite messages can be used in a somewhat unexpected way. A composite message may be considered an archive -- an ordered collection of entities. The retriever can then act as an un-archiver, extracting each entity and passing it to a MIME helper for further processing. For example, one can pack a set of directories and files into a single message and later have the retriever unpack this archive and restore the original directory tree. Unlike tar, however, the retriever can perform more advanced processing of archived entities than simply creating directories and copying content into files. It all depends on the current set of MIME helpers, as specified by a mailcap. With another mailcap, the same retriever can feed the content into a database rather than store it in files. Or it can launch specific applications once the content is extracted.

History

The current version of the retriever is 5.1, as of Apr 22, 1998.

The present version of the retriever is backward compatible with the previous one, version 4.1 as of Jan 16, 1998.

The present version is much more lenient towards line termination in MIME headers. In line with common practice, the retriever now accepts header lines in single or composite MIME entities terminated with a single CR, a single LF, or a CRLF combination. Note that the MIME standard [RFC2045] provides only for the last option.
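The lenient treatment of line terminators can be sketched with a single regular expression. This is an illustration only: the function name is hypothetical, and the sketch does not unfold header lines that continue on the next line.

```python
import re

# CRLF must be tried first so that it counts as one terminator, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_header_lines(data):
    """Return the header lines of an entity, stopping at the first empty line."""
    lines = []
    for line in _LINE_END.split(data):
        if line == "":
            break  # an empty line separates the headers from the body
        lines.append(line)
    return lines
```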

The retriever now attempts a partial match with wildcard mailcap entries (like text/* and */*) when an exact match fails.
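The fallback can be sketched as follows. The function name find_helper is hypothetical, and the order in which the wildcards are tried (text/* before */*) is an assumption; this page only says that wildcard entries are consulted after the exact match fails.

```python
def find_helper(entries, content_type):
    """Look up a helper command in entries, a dict of mailcap type -> command.

    Tries the exact type first, then type/*, then */*.
    Returns None if nothing matches.
    """
    content_type = content_type.lower()
    major = content_type.split("/", 1)[0]
    for candidate in (content_type, major + "/*", "*/*"):
        if candidate in entries:
            return entries[candidate]
    return None
```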

The retriever can now send an Authorization: request header if enabled by the user. This led to some changes in the retriever's configuration file.

The new retriever is also faster, and its source code is better structured.
