This document explains how to store annotated pieces of information into a Metcast Channel. Once published, the data will be delivered to all subscribers of that channel when they ask for an update.
w-shove into a Metcast Channel
To publish a product, you submit its content to a Metcast Taker using the HTTP protocol. You can write the publishing code in any language that lets you open network connections and manipulate text strings. That covers almost any programming language, FORTRAN, JavaScript and VB included. Alternatively, you may find it more convenient to use an application called w-shove, or its Perl "clone" uptow.pl.
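Because any language with networking and string handling will do, here is a minimal Python sketch of a publishing client. It assumes something this document does not confirm: that the taker accepts the product as the body of an HTTP POST and reads the annotations from the Content-Type header, in the same form as the w-shove command line below. The function name build_publish_request is hypothetical. The request is only built here, not sent.

```python
import urllib.request


def build_publish_request(server_url, content_type, body):
    """Build (but do not send) a request that submits one product.

    Assumption: the product goes out as an HTTP POST, and the annotated
    MIME type (cid, fname, ...) travels in the Content-Type header.
    """
    request = urllib.request.Request(server_url, data=body, method="POST")
    request.add_header("Content-Type", content_type)
    return request


if __name__ == "__main__":
    req = build_publish_request(
        "http://metcast.host/cgi-bin/oleg/taker",
        'text/plain; cid=MEMO;fname=memo1.txt;topic="How to publish";author=Dave',
        b"Contents of memo1.txt\n",
    )
    print(req.get_method(), req.full_url)
    print(req.get_header("Content-type"))
    # urllib.request.urlopen(req) would actually transmit the product.
```

urllib stores header names capitalized ("Content-type"), so get_header is called with that spelling. HTTP header names are case-insensitive, so this does not matter on the wire.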
For example, suppose you want to publish a file /tmp/memo1.txt into the channel with the ID MEMO and annotate the stored data with three attributes (fname, topic and author). You run the following command line:

w-shove w-shove.conf 'text/plain; cid=MEMO;fname=memo1.txt;topic="How to publish";author=Dave' /tmp/memo1.txt

where the w-shove.conf file is as follows:

# HTTP uploader's configuration
SERVER_URL = http://metcast.host/cgi-bin/oleg/taker
PROXY_ENABLE = FALSE
PROXY_NAME = proxy.host
PROXY_PORT = 80

The SERVER_URL given above is valid only for publishing into channels that accept submissions from everyone and do not require any authorization.
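The configuration file is a list of KEY = VALUE lines with # comments. A minimal Python reader for it might look as follows. This sketch rests on assumptions about how w-shove itself parses the file: that blank lines are ignored, and that boolean values are case-insensitive (the examples in this document write both FALSE and true). The names parse_shove_conf and as_bool are hypothetical.

```python
SAMPLE_CONF = """\
# HTTP uploader's configuration
SERVER_URL = http://metcast.host/cgi-bin/oleg/taker
PROXY_ENABLE = FALSE
PROXY_NAME = proxy.host
PROXY_PORT = 80
"""


def parse_shove_conf(text):
    """Return a dict of KEY -> VALUE from w-shove-style configuration text.

    Assumed format: one KEY = VALUE per line, '#' starts a comment line,
    blank lines are ignored. Only the first '=' separates key from value.
    """
    conf = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: expected KEY = VALUE, got {raw!r}")
        conf[key.strip()] = value.strip()
    return conf


def as_bool(value):
    """Interpret TRUE/true/FALSE/false case-insensitively (assumption)."""
    return value.strip().lower() == "true"


if __name__ == "__main__":
    conf = parse_shove_conf(SAMPLE_CONF)
    print(conf["SERVER_URL"], as_bool(conf["PROXY_ENABLE"]))
```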
To publish into a restricted channel, a different URL has to be specified:

SERVER_URL = http://metcast.host/cgi-bin/oleg/rest/taker

With this URL, a user must provide credentials. The w-shove configuration file has to define

AUTH_ENABLE = true
AUTH_CREDENTIAL = user-credential

The user-credential must match the one stored on the HTTP server. Authentication with the HTTP server is not enough, however: the Metcast Channels database must also permit the authenticated user to write into the requested channel. If any of these security checks fails, the user gets an HTTP error message (Authentication required, Authentication failed, Forbidden). Needless to say, all publishing attempts are logged by the HTTP server itself. In addition, the Metcast Taker keeps its own log of database transactions. See the Authentication and Access Restrictions section for further discussion.
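For a restricted channel, the client adds its credentials to the request and has to handle the error responses listed above. This Python sketch rests on assumptions the document does not state. First, it assumes AUTH_CREDENTIAL holds a user:password pair sent as HTTP Basic authentication. Second, it assumes the errors arrive as status 401 (authentication required or failed) and 403 (forbidden). The helpers auth_headers and explain_status are hypothetical.

```python
import base64


def auth_headers(conf):
    """Extra request headers for a restricted taker.

    Assumption: AUTH_CREDENTIAL is 'user:password' and the HTTP server
    expects Basic authentication.
    """
    if conf.get("AUTH_ENABLE", "false").strip().lower() != "true":
        return {}
    raw = conf["AUTH_CREDENTIAL"].encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def explain_status(status):
    """Map the status codes a publisher is likely to see to a short hint."""
    hints = {
        401: "authentication required or failed: check AUTH_CREDENTIAL",
        403: "forbidden: the channel database does not let this user write here",
    }
    if 200 <= status < 300:
        return "accepted"
    return hints.get(status, f"unexpected HTTP status {status}")


if __name__ == "__main__":
    conf = {"AUTH_ENABLE": "true", "AUTH_CREDENTIAL": "dave:secret"}
    print(auth_headers(conf))
    print(explain_status(403))
```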
When you publish into a channel that was defined to contain at most one product, the old content is removed when the new product is accepted. Thus the channel always contains the latest version of the product. If the channel has a capacity larger than 1, the new product is added to the ones published previously. When a user subscribes to such a channel, all or some of the published products are delivered, depending on the cut-off modification time the user gives. If a channel has already been filled up to the administrator-specified maximum, the earliest added product(s) are removed before the newly received content is added.
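The replacement policy described above behaves like a bounded queue. The following Python model only illustrates that policy and is not the MChannels schema. Several details are assumptions: the class name ToyChannel, the (mtime, content) pairs, and the rule that "newer than the cut-off" means strictly greater.

```python
from collections import deque


class ToyChannel:
    """In-memory model of a channel holding at most `capacity` products."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # A deque with maxlen silently drops its oldest element when full.
        self.products = deque(maxlen=capacity)

    def publish(self, mtime, content):
        """Add a product, evicting the earliest-added one if the channel is full."""
        self.products.append((mtime, content))

    def updates_since(self, cutoff):
        """Products a subscriber with this cut-off time would receive."""
        return [content for mtime, content in self.products if mtime > cutoff]


if __name__ == "__main__":
    memo = ToyChannel(capacity=1)
    memo.publish(1, "draft")
    memo.publish(2, "final")
    print(list(memo.products))  # only the latest version remains
```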
When you submit new data to a channel of limited capacity (e.g., one), somebody may be reading from the channel at that time. You may then ask how your submission affects the reader. Will the reader receive the complete content? Will it be the old version of the channel data, the new version, or both? When you shove a file into a channel of capacity one, the server opens a transaction, removes the old data, creates a new row in a product table, and commits the transaction. Until the transaction is committed, all the work the server does is invisible to other clients. Atomicity and isolation are hallmarks of database systems, so you can count on Channels operations being atomic. Suppose client A starts a read transaction and client B starts uploading new content some time later. Client A then gets one response, with the old content. If B starts its transaction before A does, then A gets either the new or the old content, depending on the precise timing of lock acquisitions. In either case client A gets only one response, carrying one consistent version of the content.
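The atomic replacement can be observed with any transactional database. The sketch below uses SQLite from the Python standard library as a stand-in for the MChannels database; this is an assumption for illustration only, and the table name product and its columns are made up. A second connection that reads while the writer's transaction is open still sees the old content. It sees the new content only after the commit.

```python
import os
import sqlite3
import tempfile


def read_channel(path, cid):
    """Open a fresh connection, read the channel's products, close it."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit reads
    try:
        cur = conn.execute("SELECT body FROM product WHERE cid = ?", (cid,))
        bodies = [row[0] for row in cur.fetchall()]
        cur.close()
        return bodies
    finally:
        conn.close()


def replace_while_reading():
    """Return what a reader sees during and after a replace transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "channels.db")
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("CREATE TABLE product (cid TEXT, body TEXT)")
        writer.execute("INSERT INTO product VALUES ('MEMO', 'old version')")

        # Shoving into a channel of capacity one: delete + insert, atomically.
        writer.execute("BEGIN")
        writer.execute("DELETE FROM product WHERE cid = 'MEMO'")
        writer.execute("INSERT INTO product VALUES ('MEMO', 'new version')")
        during = read_channel(path, "MEMO")  # transaction not yet committed
        writer.execute("COMMIT")
        after = read_channel(path, "MEMO")
        writer.close()
        return during, after


if __name__ == "__main__":
    print(replace_while_reading())
```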
The published product is immediately available for distribution. To verify that the product has indeed been published, you may check the status of the channel by asking a Metcast server to describe the channel of interest. See the Metcast Table of Contents for more detail.
A virtual channel is one that stores its content in a place other than a BLOB of the MChannels database, or that requires special processing of incoming data. For example, one may set up a virtual channel for tropical cyclone warnings, synoptic reports, or satellite imagery. These products are not normally stored in the MChannels database. The satellite imagery is processed and inserted into a special mdimg database, while synoptic reports are handed over to a decoder. A publisher, however, uses the same w-shove to push this "virtual" content. The publisher does not even need to know how the incoming content is handled.
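One way a taker could keep real and virtual channels indistinguishable to the publisher is to look up a handler by channel ID and fall back to ordinary BLOB storage. The Python sketch below is hypothetical throughout and does not reproduce the actual taker. The channel IDs and handler names are made up, and the in-memory log stands in for the mdimg loader, the synoptic decoder and the MChannels insert.

```python
def store_blob(cid, content_type, body, log):
    log.append(("mchannels", cid, len(body)))  # ordinary (real) channel


def load_into_mdimg(cid, content_type, body, log):
    log.append(("mdimg", cid, len(body)))  # hypothetical image loader


def hand_to_decoder(cid, content_type, body, log):
    log.append(("decoder", cid, len(body)))  # hypothetical synoptic decoder


# Hypothetical channel IDs; each site decides which of its channels are virtual.
VIRTUAL_CHANNELS = {
    "SATIMG": load_into_mdimg,
    "SYNOP": hand_to_decoder,
}


def take(cid, content_type, body, log):
    """Accept one submission; the publisher's request looks the same either way."""
    handler = VIRTUAL_CHANNELS.get(cid, store_blob)
    handler(cid, content_type, body, log)


if __name__ == "__main__":
    log = []
    take("MEMO", "text/plain; cid=MEMO", b"memo", log)
    take("SATIMG", "image/gif; cid=SATIMG", b"GIF89a...", log)
    print(log)
```

The dispatch table is per-site configuration. The same channel ID could therefore be routed to a special handler at one installation and to plain storage at another.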
Even when a product is to be stored in and distributed from the MChannels database, the incoming content may need to be processed before it can be served. For example, a publisher may shove a product in a compressed or encoded form. The content then has to be decoded or decompressed before distribution. A publisher may send a plain text file or a PowerPoint document, yet want it served as an HTML document. A virtual channel may also take a product and publish only the differences between this new product and some reference content. The latter is especially useful for automatically generating software update reports and patches.
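Two of the transformations mentioned above take only a few lines each with the Python standard library: decompressing an incoming product, and publishing only its differences from a reference. This sketch assumes the product arrives gzip-compressed and that the diff case deals with text. The function names are hypothetical, and a real virtual channel might use other formats or converters.

```python
import difflib
import gzip


def decompress_product(body):
    """Undo gzip compression applied by the publisher (assumed encoding)."""
    return gzip.decompress(body)


def diff_product(reference, new, name="product"):
    """Return a unified diff between the reference text and the new product."""
    return "".join(
        difflib.unified_diff(
            reference.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"{name} (reference)",
            tofile=f"{name} (new)",
        )
    )


if __name__ == "__main__":
    shoved = gzip.compress(b"version 1.1\nfixed bug 17\n")
    text = decompress_product(shoved).decode("ascii")
    print(diff_product("version 1.0\n", text, name="release-notes"))
```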
As mentioned above, a user (publisher) will never notice any difference between publishing into a real and a virtual channel: he uses the same w-shove program in the same way. This transparency offers several advantages. For example, a satellite imagery channel may be configured as a virtual channel at one site and as a real channel at another. The first site will take the incoming satellite data and either store them in a special mdimg database or include them in a web page. The other installation will simply accumulate imagery without any processing, and one can subscribe to that channel and pull the latest product(s). Thus MChannels can accommodate both synchronous (push) and asynchronous (poll) publishing strategies.
An interesting application of Virtual Metcast Channels is uploading files. Content pushed into a virtual channel does not have to be inserted into a MChannels database. The data may just as well be stored in a file on a target computer. This scenario indeed looks similar to uploading files via FTP. A Metcast upload, however, is far more secure than an FTP upload. For one thing, the client has to authenticate itself to an HTTP daemon and be authorized by both the daemon and the Metcast taker. Usernames and other client identification used in this authorization process have nothing to do with login names on the target computer. Furthermore, the taker alone decides where the content is stored. The uploading client has no idea where the data will end up. Unlike an FTP client, a Metcast shover cannot browse directories on the target computer or see any of its files, let alone alter or delete them. See below for more advantages of the Metcast upload over FTP (and over DPSR, which is FTP-based).
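The point that the taker, not the client, decides where an upload lands can be made concrete. This minimal Python sketch is an assumption, not the feed taker's actual logic. Any file name the client suggests is reduced to its last path component and checked against a conservative character set, so a name like ../../etc/passwd cannot escape the designated directory. The name target_path and the directory names are hypothetical.

```python
import posixpath
import re

SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def target_path(upload_root, channel_dir, suggested_name):
    """Choose the storage path for an upload inside the taker's own tree.

    The client never supplies a directory: only the final component of its
    suggested name is kept, and it must consist of safe characters.
    """
    base = posixpath.basename(suggested_name.replace("\\", "/"))
    if base in ("", ".", "..") or not SAFE_NAME.fullmatch(base):
        raise ValueError(f"refusing file name {suggested_name!r}")
    return posixpath.join(upload_root, channel_dir, base)


if __name__ == "__main__":
    print(target_path("/sample", "observation", "MTR50.SRVR"))
    print(target_path("/sample", "observation", "../../etc/passwd"))
```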
As an example, I created a special virtual channel to push feed files to Metcast decoders. Normally the decoders are fed by DPSR. However, setting DPSR up may take weeks, and we needed a quicker solution (which took only a few hours to implement).
The feed channel is a virtual channel: the published content is not inserted into a Metcast Channels database. Instead, it is deposited into one of the directories where Metcast decoders look for their feed files, for example /sample/images/, /sample/observation/ or /sample/grids/. Publishing is done with the familiar w-shove. A command line to publish a GRIB feed file may look like

w-shove w-shove.conf "application/octet-stream; cid=21" /w/dpsr/grids/B541048GRB

A WMO feed file may be published as

w-shove w-shove.conf "application/octet-stream; cid=21" /w/dpsr/observation/MTR50.SRVR
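When a whole directory of feed files has to be pushed, the command lines above can simply be generated. The Python sketch below builds and runs exactly the documented w-shove invocation, once per file. It assumes that w-shove is on the PATH and exits with a non-zero status on failure. The helper names are hypothetical, and each invocation is a separate upload.

```python
import subprocess
from pathlib import Path


def feed_files(directory):
    """Regular files in `directory`, in a stable (sorted) order."""
    return [str(p) for p in sorted(Path(directory).iterdir()) if p.is_file()]


def shove_commands(conf_file, cid, paths):
    """Build one documented w-shove command line per feed file."""
    content_type = f"application/octet-stream; cid={cid}"
    return [["w-shove", conf_file, content_type, path] for path in paths]


def shove_all(conf_file, cid, directory):
    """Run w-shove per file; check=True raises CalledProcessError on failure."""
    for argv in shove_commands(conf_file, cid, feed_files(directory)):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in shove_commands("w-shove.conf", "21", ["/w/dpsr/grids/B541048GRB"]):
        print(argv)
```

Passing the arguments as a list, rather than through a shell, sidesteps quoting problems with the spaces and semicolons in the Content-Type argument.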
We have shown that we can distribute data from one center to SPAWAR labs and other centers, transparently through several firewalls. It must be stressed that this transparency does not compromise security. For one thing, a publisher must have proper authentication and authorization to submit data. For another, access to a remote computer is restricted to creating files within a specially designated directory tree, which nobody but the receiving site knows about. There is no need to punch holes in firewalls, and upload directory names are not exposed. A taker may check disk utilization before accepting content, and refuse the upload if the disk is already too full. See the feed vtaker's source code for more details.
This facility has already been used to upload satellite imagery to Metcast servers.
There are no TEMPxxx files in the upload directory: the content is received elsewhere and then moved into the target folder. Thus the upload directory's modification timestamp does not change until the content has been received entirely and is ready for processing. The absence of TEMPxxx files makes directory monitoring scripts simpler.
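The receive-elsewhere-then-move technique and the disk check mentioned above fit in one small function. This Python sketch is an assumption, not the feed vtaker's code. It refuses the upload when free space would drop below a threshold. Otherwise it writes the content into a staging directory and renames the finished file into the target directory with os.replace. That rename is atomic only when both directories are on the same filesystem. The function name and parameters are hypothetical.

```python
import os
import shutil
import tempfile


def deliver(body, staging_dir, target_dir, name, min_free_bytes):
    """Store `body` as target_dir/name without a partial file ever appearing there."""
    free = shutil.disk_usage(target_dir).free
    if free - len(body) < min_free_bytes:
        raise OSError(f"only {free} bytes free; upload refused")

    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(body)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        final_path = os.path.join(target_dir, name)
        os.replace(tmp_path, final_path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Because the target directory changes only at the final rename, a monitoring script never sees a half-written file there.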
To see that the features above really do make a great difference, let us take a realistic example. Suppose we have 100 data files of 100 MB each. We need to send these files from computer A to computer B and load them into a database.
With FTP, we can upload the files from A to B one by one. That means 100 connections must be opened and closed, and their resources allocated and released. While a file is being uploaded, its data are written into a temporary file, which later has to be renamed. We have to do that 100 times, which means more load on the system to read and write i-nodes. Host B must run an application that constantly monitors the upload directory and processes the files it finds. Needless to say, this is inefficient: the monitoring application must incessantly rescan the directory, and most of the time the rescan finds nothing new.
We may also choose to tar all 100 files on host A into a single archive and FTP this one archive. This requires only two FTP connections, one for control and one for data. However, both systems A and B must have an extra 10 GB of scratch disk space to store the archive. Transmission cannot begin until tar finishes creating the archive on host A. Likewise, host B cannot start processing the incoming files until all 10 GB of data have been received and untarred.
If we use HTTP, however, we can transmit all 100 files through a single connection, for example in one multi-part message. The other end can start processing content as it is received. No temporary files need to be created, no renaming needs to be done, and no directory needs to be scanned repeatedly. Each file can be loaded into the database as soon as it has been received, so the receiving end does not need to wait for the whole transmission to finish. The content can optionally be compressed or encoded -- transparently, without creating any temporary files at all.
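The single-connection alternative can be illustrated with a generator that produces one multipart/mixed body piece by piece. It reads each file in chunks, so nothing is staged in temporary files, and its output can feed a streaming (chunked) HTTP request. This Python sketch assumes the receiving taker accepts such a multi-part message. The helper names are hypothetical, and the standard email parser is used only to check the result. The random UUID boundary makes it practically impossible for file content to contain the delimiter.

```python
import io
import uuid
from email.parser import BytesParser


def iter_multipart(parts, boundary, chunk_size=64 * 1024):
    """Yield a multipart/mixed body incrementally.

    `parts` is an iterable of (content_type, binary_file) pairs; each file is
    read in chunks, so no part is ever held in memory or copied to disk whole.
    """
    for content_type, fileobj in parts:
        yield f"--{boundary}\r\nContent-Type: {content_type}\r\n\r\n".encode("ascii")
        while chunk := fileobj.read(chunk_size):
            yield chunk
        yield b"\r\n"
    yield f"--{boundary}--\r\n".encode("ascii")


def new_boundary():
    return "metcast-" + uuid.uuid4().hex


if __name__ == "__main__":
    boundary = new_boundary()
    files = [
        ("text/plain; cid=MEMO; fname=a.txt", io.BytesIO(b"first memo")),
        ("text/plain; cid=MEMO; fname=b.txt", io.BytesIO(b"second memo")),
    ]
    body = b"".join(iter_multipart(files, boundary))
    header = f"Content-Type: multipart/mixed; boundary={boundary}\r\n\r\n".encode("ascii")
    message = BytesParser().parsebytes(header + body)
    for part in message.get_payload():
        print(part.get_param("fname"), part.get_payload())
```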
Only a client that is authorized to publish products into a channel may delete products from that channel.
Normally (obsolete) products are deleted implicitly and automatically, to make room for newly received products. For example, suppose a channel administrator specified that a particular channel can hold at most 5 products. If a new product arrives when the channel already holds 5 products, the oldest product will be deleted before the new one is inserted. If a channel is set to hold only one product, publishing into it always replaces the current content of the channel.
Sometimes it may be necessary to explicitly delete a specific product. The product to delete has to be identified by its OID, and the client has to submit a DELETE HTTP request. For example, suppose a client wants to delete the product with OID=123 from the channel MEMO on the server https://metcast.host/cgi-bin/oleg/taker. The client has to establish a TLS/SSL connection to metcast.host and submit the following request:

DELETE /cgi-bin/oleg/taker/channels/MEMO/123 HTTP/1.1
Host: metcast.host
(an empty line)

If the deletion is successful, the server replies with response code 204 (No Content).
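The same request can be issued programmatically with Python's standard http.client module. The helper names in this sketch are hypothetical. The path layout channels/<cid>/<oid> is taken from the example above. The optional headers argument is where the credentials required for a restricted channel would go (see the authentication discussion above).

```python
import http.client
from urllib.parse import quote, urlsplit


def delete_path(taker_url, cid, oid):
    """Request path for deleting product `oid` from channel `cid`."""
    base = urlsplit(taker_url).path.rstrip("/")
    return f"{base}/channels/{quote(cid, safe='')}/{quote(str(oid), safe='')}"


def delete_product(taker_url, cid, oid, headers=None, timeout=30):
    """Send DELETE over TLS; True if the server answers 204 (No Content)."""
    parts = urlsplit(taker_url)
    conn = http.client.HTTPSConnection(parts.hostname, parts.port or 443, timeout=timeout)
    try:
        # http.client adds the Host: header automatically.
        conn.request("DELETE", delete_path(taker_url, cid, oid), headers=headers or {})
        response = conn.getresponse()
        response.read()
        return response.status == 204
    finally:
        conn.close()


if __name__ == "__main__":
    print(delete_path("https://metcast.host/cgi-bin/oleg/taker", "MEMO", 123))
    # delete_product("https://metcast.host/cgi-bin/oleg/taker", "MEMO", 123)
```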
To publish a product into a channel we need to know that channel's ID. We can find it in the channels' table of contents, the list of all available channels. The table of contents can be obtained, for example, by sending the following request to a Metcast server:
(things (products (Describe (channels))))
To delete a product from a channel we need to know the OID of the product, in addition to the channel ID. If we have requested and received products from the Metcast Channels, we already know the channel ID and the OIDs of the received products. These identifiers are part of the product's metadata, given to us in the Content-Type: MIME header. See the section Request and delivery of things for more details.
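Extracting the identifiers from a received product's Content-Type header is ordinary MIME parameter parsing, which Python's email package already handles, quoted values included. The attribute names cid and oid in this sketch are assumptions modeled on the cid= annotation used for publishing. The section referenced above gives the actual names.

```python
from email.message import Message


def product_metadata(content_type):
    """Return (mime_type, attributes) parsed from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    params = msg.get_params()  # [(mime_type, ''), (name, value), ...]
    return params[0][0], dict(params[1:])


def product_ids(content_type):
    """(channel ID, OID) of a delivered product; attribute names are assumed."""
    _, attrs = product_metadata(content_type)
    return attrs.get("cid"), attrs.get("oid")


if __name__ == "__main__":
    header = 'text/plain; cid=MEMO; oid=123; topic="How to publish"'
    print(product_metadata(header))
    print(product_ids(header))
```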
Alternatively, the OIDs and attributes of all products in a particular channel (e.g., the channel with the ID MEMO) can be determined through an MBL query:

(things (products (Describe (Channel MEMO))))
A lighter version of the HTTP upload: Perl/
For more details, see the comments in the Metcast Taker source code,
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/taker.scm
in the Feed Virtual Channel taker code,
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/Feed-vtaker.pl
and in the w-shove code,
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/w-shove.cc
w-shove is available in source form,
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/w-shove.cc
and as a compiled executable for Sun/Solaris, WinNT/9x, HP-UX and Linux:
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/Solaris/w-shove
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/HP-UX/w-shove
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/Linux/w-shove
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/WinNT/w-shove.exe