Publishing into Metcast Channels

This document explains how to store annotated pieces of information into a Metcast Channel. Once published, the data will be delivered to all subscribers of that channel when they ask for an update.

Contents

w-shove into a Metcast Channel
Publishing into a Virtual Channel
Uploading files via a Metcast Channel
- Metcast Channels vs DPSR as file uploaders
Deleting products from a Metcast Channel
Determining a Channel ID or an OID
References

`w-shove` into a Metcast Channel

To publish a product, you submit its content to a Metcast Taker according to the HTTP protocol. You can write the publishing code in any language that allows opening or exploiting network connections and manipulating text strings -- that is, in almost any programming language, FORTRAN, JavaScript and VB included. On the other hand, you may find it more convenient to use an application called w-shove, or its Perl "clone" uptow.pl.

For example, to publish a file /tmp/memo1.txt into a channel with an id MEMO and annotate the stored data with three attributes, you will run the following command line:

     w-shove w-shove.conf  'text/plain; cid=MEMO;fname=memo1.txt;topic="How to publish";author=Dave' /tmp/memo1.txt

where w-shove.conf file is as follows:

     # HTTP uploader's configuration
     SERVER_URL = http://metcast.host/cgi-bin/oleg/taker
     PROXY_ENABLE = FALSE
     PROXY_NAME = proxy.host
     PROXY_PORT = 80

The SERVER_URL given above is valid only for publishing into channels that accept submissions from everyone and do not require any authorization.
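
For readers who would rather script the upload than run w-shove, the annotated content type from the example above can be assembled programmatically. The following Python sketch is not the w-shove implementation. It assumes that the Taker accepts the product as the body of an HTTP POST whose Content-Type header carries the annotations; this document does not name the HTTP method, so treat POST as a guess. The helper names are made up for illustration.

```python
import re
import urllib.request

# Characters allowed in an unquoted MIME parameter value (an RFC 2045 "token").
_TOKEN = re.compile(r"^[A-Za-z0-9!#$%&'*+.^_`|~-]+$")

def annotated_content_type(mime_type, cid, **attrs):
    """Build a Content-Type value that names the channel and its attributes."""
    params = [("cid", cid)] + list(attrs.items())
    parts = []
    for name, value in params:
        value = str(value)
        if not _TOKEN.match(value):
            # Quote values that contain spaces or other special characters.
            value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{name}={value}")
    return f"{mime_type}; " + ";".join(parts)

def build_publish_request(server_url, content_type, body):
    """Prepare (but do not send) the upload request. POST is an assumption."""
    return urllib.request.Request(
        server_url, data=body,
        headers={"Content-Type": content_type}, method="POST")

ctype = annotated_content_type(
    "text/plain", "MEMO",
    fname="memo1.txt", topic="How to publish", author="Dave")
request = build_publish_request(
    "http://metcast.host/cgi-bin/oleg/taker", ctype, b"memo text\n")
# urllib.request.urlopen(request) would actually send it.
```

Building the request separately from sending it makes it easy to inspect the header before anything goes over the network.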

To publish into a restricted channel a different URL has to be specified:

     SERVER_URL = http://metcast.host/cgi-bin/oleg/rest/taker

With this URL, a user must provide credentials: the w-shove configuration file has to define AUTH_ENABLE = true and AUTH_CREDENTIAL = user-credential. The user-credential must match the one stored on the HTTP server. Authentication with the HTTP server is not enough, however: the Metcast Channels database must also permit the authenticated user to write into the requested channel. If any of these security checks fails, the user gets an HTTP error message (Authentication required, Authentication failed, Forbidden). All publishing attempts are logged by the HTTP server itself; in addition, the Metcast Taker keeps its own log of database transactions. See the Authentication and Access Restrictions section for further discussion.
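
A script that replaces w-shove may want to reuse the same configuration file. The sketch below reads the KEY = value format as it appears in the samples in this section. The exact rules w-shove applies (case sensitivity, quoting, the form of the credential) are not described here, so the parsing choices are assumptions, and the function names are hypothetical.

```python
def parse_shove_conf(text):
    """Parse 'KEY = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY = value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings

def is_enabled(settings, key):
    """Treat TRUE/true as on; anything else (or a missing key) as off."""
    return settings.get(key, "").lower() == "true"

SAMPLE = """\
# HTTP uploader's configuration
SERVER_URL = http://metcast.host/cgi-bin/oleg/rest/taker
PROXY_ENABLE = FALSE
AUTH_ENABLE = true
AUTH_CREDENTIAL = user-credential
"""
conf = parse_shove_conf(SAMPLE)
```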

When publishing into a channel that was defined to contain at most one product, the old content is removed when the new product is accepted. Thus the channel always contains the latest version of the thing. If the channel has a capacity larger than 1, the new product is added to the ones published previously. When a user subscribes to such a channel, all or some of the published products would be delivered, depending on the cut-off modification time given by the user. If a channel has already been filled up to the administrator-specified maximum, the earliest added product(s) will be removed before a newly received content is added.
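
The capacity rule can be pictured with a toy in-memory model. This is only an illustration of the behavior described above, not the Metcast storage code; the class and method names are invented, and comparing against the cut-off with a strict "later than" is an assumption.

```python
from collections import deque

class Channel:
    """Toy in-memory channel holding at most `capacity` products."""

    def __init__(self, capacity):
        # A deque with maxlen drops its oldest item when a new one overflows it.
        self._products = deque(maxlen=capacity)

    def publish(self, mtime, content):
        self._products.append((mtime, content))

    def updates_since(self, cutoff):
        """Return products modified strictly after the subscriber's cut-off."""
        return [c for (t, c) in self._products if t > cutoff]

latest = Channel(capacity=1)
latest.publish(100, "memo v1")
latest.publish(200, "memo v2")         # replaces v1

history = Channel(capacity=3)
for t in (1, 2, 3, 4):
    history.publish(t, f"report {t}")  # report 1 is evicted by report 4
```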

When you submit new data to a channel of limited capacity (e.g., one), somebody may be reading from the channel at that time. How will your submission affect the reader? Will the reader receive the complete content, the old version of the channel data, the new version, or both? When you shove a file into a channel of length one, the server opens a transaction, removes the old data, creates a new row in a product table, and commits the transaction. Until the transaction is committed, all the work the server does is invisible to other clients. Atomicity and isolation are hallmarks of database systems, so you can count on Channels operations being atomic. If client A initiated a read transaction and client B started uploading new content some time later, client A will get one response, with the old content. If B starts its transaction before A does, then depending on the precise timing of lock acquisitions, A will get either the new or the old content. In both cases client A will get only one response.
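
The isolation argument can be tried out on a small scale. The sketch below uses SQLite purely as a stand-in, since this document does not say which database engine backs Metcast Channels, and the table layout is invented. A writer replaces the single product of a channel inside one transaction while a second connection reads.

```python
import os
import sqlite3
import tempfile

def replace_while_reading():
    """Return what a reader sees before and after a writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "channel.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE product (cid TEXT, content TEXT)")
            writer.execute("INSERT INTO product VALUES ('MEMO', 'old')")
            writer.commit()

            # Writer: remove the old row and add the new one, uncommitted.
            writer.execute("DELETE FROM product WHERE cid = 'MEMO'")
            writer.execute("INSERT INTO product VALUES ('MEMO', 'new')")

            query = "SELECT content FROM product WHERE cid = 'MEMO'"
            before = reader.execute(query).fetchall()  # pending work is invisible
            writer.commit()
            after = reader.execute(query).fetchall()   # now only the new row
            return before, after
        finally:
            reader.close()
            writer.close()

before, after = replace_while_reading()
```

In neither read does the reader observe an empty channel or both versions at once, which is the property the paragraph above relies on.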

The published product is immediately available for distribution. To verify that the product has indeed been published, you may check the status of the channel, by asking a Metcast server to describe the channel of interest. See the Metcast Table of Contents for more detail.

Publishing into a Virtual Channel

A virtual channel is one that stores its content somewhere other than a BLOB of the MChannels database, or that requires special processing of incoming data. For example, one may set up a virtual channel for tropical cyclone warnings, synoptic reports, or satellite imagery. These products are not normally stored in the MChannels database. The satellite imagery is processed and inserted into a special mdimg database, while synoptic reports are handed over to a decoder. A publisher, however, uses the same w-shove to push this "virtual" content; the publisher does not even need to be aware of the precise way the incoming content is handled.

Even when a product is to be stored in and distributed from the MChannels database, the incoming content may need to be processed before it can be served. For example, a publisher shoves a product in a compressed or encoded form; the content has to be decoded or decompressed prior to distribution. A publisher may send a plain text file or a PowerPoint document, yet wishes it to be served as an HTML document. A virtual channel may take a product and publish only differences between this new product and some reference content. The latter is especially useful for automatically generating software update reports and patches.

As was mentioned above, a user (publisher) will never notice any difference when publishing into a real vs. a virtual channel. He will use the same w-shove program in the same way. This transparency offers several advantages. For example, a satellite imagery channel may be configured as a virtual channel at one site, and as a real channel at another. The first site will take the incoming satellite data and store them in a special mdimg database, or include on a web page. The other installation will simply accumulate imagery without any processing. One can subscribe to that channel and pull the latest product(s). Thus MChannels can accommodate both synchronous (push) and asynchronous (poll) publishing strategies.
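
One way to picture this transparency is a taker that looks up the channel ID and either stores the product or hands it to a channel-specific handler. The sketch below is a toy model under that assumption. It does not reflect how the actual Metcast Taker is organized, and every name in it (Taker, define_virtual, the SATIMG channel ID) is hypothetical.

```python
class Taker:
    """Toy taker: routes an incoming product by channel ID."""

    def __init__(self):
        self.database = {}          # stands in for the MChannels database
        self.virtual_handlers = {}  # cid -> callable(content, attrs)

    def define_virtual(self, cid, handler):
        self.virtual_handlers[cid] = handler

    def accept(self, cid, content, attrs=None):
        handler = self.virtual_handlers.get(cid)
        if handler is not None:
            # Virtual channel: the handler decides what happens to the data.
            handler(content, attrs or {})
        else:
            # Real channel: store the product as-is.
            self.database.setdefault(cid, []).append(content)

decoded = []
site_a = Taker()
site_a.define_virtual("SATIMG", lambda content, attrs: decoded.append(len(content)))
site_b = Taker()  # same channel ID, configured as a real channel here

for site in (site_a, site_b):
    site.accept("SATIMG", b"\x00" * 16)  # the publisher's call is identical
```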

Uploading files via a Metcast Channel

An interesting application of Virtual Metcast Channels is uploading of files. Content pushed into a virtual channel does not have to be inserted into an MChannels database -- the data may as well be stored in a file on a target computer. This scenario looks similar to uploading files via FTP. A Metcast upload, however, is far more secure than an FTP one. For one thing, the client has to authenticate itself to an HTTP daemon and be authorized by both the daemon and the Metcast taker. Usernames and other client identification used in this authorization process have nothing to do with login names on the target computer. Furthermore, the location where the content will be stored is entirely under the control of the taker: the uploading client has no idea where the data will end up. Unlike an FTP client, a Metcast shover cannot browse directories on the target computer and cannot see any of its files, let alone alter or delete them. See below for more advantages of the Metcast upload over FTP (and over DPSR, which is FTP-based).

As an example, I created a special virtual channel to push feed for Metcast decoders. Normally the decoders are fed by DPSR; however it may take weeks to set DPSR up. We needed a quicker solution (which took only a few hours to implement).

The feed channel is a virtual channel: The published content is not inserted into a Metcast Channels database. Rather, it is deposited into one of the directories where Metcast decoders look for their feed files: for example, /sample/images/, /sample/observation/, /sample/grids/. Publishing is done with the familiar w-shove. A command line to publish a GRIB feed file may look like

     w-shove w-shove.conf  "application/octet-stream; cid=21" \
             /w/dpsr/grids/B541048GRB

A WMO feed file may be published as

     w-shove w-shove.conf "application/octet-stream; cid=21" \
             /w/dpsr/observation/MTR50.SRVR

We have proven that we can distribute data from one center to SPAWAR labs and other centers, transparently through several firewalls. It has to be stressed that this transparency does not compromise security. For one thing, a publisher must have proper authentication and authorization to submit data. For another, access to a remote computer is restricted only to creation of files within a specially designated directory tree (which nobody but the receiving site knows about). There is no need to set up holes in firewalls. Upload directory names are not exposed. A taker may check for disk utilization before accepting content -- and refuse the upload if the disk is too full already. See the feed vtaker's source code for more details.
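
The disk utilization check mentioned above could look roughly like the following. This is a sketch, not the feed vtaker's code; the function name and the 10% reserve are arbitrary values chosen for illustration.

```python
import shutil

def has_room(directory, incoming_bytes, min_free_fraction=0.10):
    """True if storing `incoming_bytes` keeps at least the given fraction free."""
    usage = shutil.disk_usage(directory)
    free_after = usage.free - incoming_bytes
    return free_after >= usage.total * min_free_fraction
```

A taker would call such a check with the declared size of the upload before reading the body, and refuse the upload if it returns False.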

This facility has already been used to upload satellite imagery to Metcast servers.

Metcast Channels vs DPSR as file uploaders

- As with DPSR, publishing into a Feed Channel can propagate transparently through firewalls. Unlike DPSR, we do not need a special hole in the firewall for an uploading host, nor an account on the target computer for each publishing host. A receiving host can change the feed directories at will, without notifying anybody or asking to update DPSR tables. Unlike DPSR, where publishing, storing and distributing are tightly coupled, a Metcast Channel publisher and receiver are related only by protocol. Their internal directory structure and their producing and processing of messages remain their internal business -- solely under the control of their local administrators. No one has to be notified when a directory is renamed or a subtree is moved.
- Like DPSR, the Feed Channel can guarantee atomicity of a transfer. The Feed Channel, however, does not create TEMPxxx files in the upload directory. The content is received elsewhere and then moved into the target folder. Thus the upload directory's modification timestamp is not changed until the content has been received entirely and is ready for processing. The absence of TEMPxxx files makes directory monitoring scripts simpler.
- A Feed Channel can launch a processing application the moment a new data file arrives. Moreover, a Feed Channel may launch an application and have it process feed data while they arrive. There is no need to save incoming content to a file and then feed it into a decoder.
- HTTP/1.1 offers persistent connections: it is possible to execute many requests through a single connection. Even with HTTP/1.0 we can transmit a great number of files through a single connection. FTP can do that too, provided that you make a tar archive of the files first. See the example below for more details.
- An HTTP server delivers not only the content of a file but metadata as well: for example, a last-modified timestamp and an MD5 digest of the content. An FTP server is not extensible in this respect.
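
The "receive elsewhere, then move" technique behind the atomicity guarantee can be sketched as follows. This is not the Feed vtaker's code. The function name is hypothetical, and the sketch assumes the staging and target directories sit on the same filesystem, since a rename is atomic only in that case.

```python
import os
import tempfile

def deposit(content_chunks, staging_dir, target_dir, name):
    """Receive into staging_dir, then move the finished file into target_dir."""
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in content_chunks:
                out.write(chunk)
        final_path = os.path.join(target_dir, name)
        # os.replace is a single rename; it is atomic when both directories
        # are on the same filesystem, which this sketch assumes.
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Leave no partial file behind if the transfer fails midway.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A program watching the target directory never sees a partially written file: the file appears there only after the last chunk has been written.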

To clearly see that the features above make indeed a great difference, let us take a realistic example. Suppose we have 100 data files 100 MB each. We need to send these files from computer A to computer B and load them into a database.

With FTP, we can upload the files from A to B one-by-one. That means, 100 connections must be opened and closed, and their resources allocated and released. When we are uploading a file, its data are written into a temporary file, which later has to be renamed. We have to do that 100 times: more load on the system to read and write i-nodes. Host B must run an application that constantly monitors the upload directory and processes files it finds. Needless to say this is inefficient: a monitoring application must incessantly scan a directory; most of the time this re-reading will not produce any new results.

We may also choose to tar all 100 files on host A into a single archive, and FTP this one archive. This will require only two FTP connections, data and control. However, both systems A and B must have an extra 10 GB of scratch disk space (100 files of 100 MB each) to store the archive. Transmission of the data cannot commence until tar finishes creating the archive on host A. Likewise, host B cannot start processing the incoming files until all 10 GB of data are received and untarred.

If we use HTTP however, we will transmit all 100 files through a single connection -- in one multi-part message, for example. The other end can start processing content as it is received. No temporary files need to be created -- no renaming needs to be done whatsoever, no repeated scanning of any directory. A file will be loaded into the database right after it was received. The receiving end does not need to wait for the whole transmission to finish. The content can be optionally compressed or encoded -- transparently, without creating any temporary files at all.

Deleting products from a Metcast Channel

Only a client that is authorized to publish products into a channel may delete products from that channel.

Normally (obsolete) products are deleted implicitly and automatically, to make room for newly received products. For example, suppose a channel administrator specified that a particular channel can hold at most 5 products. If a new product arrives when the channel already holds 5 products, the oldest product will be deleted before the new one is inserted. If a channel is set to hold only one product, publishing into it always replaces the current content of the channel.

Sometimes it may be necessary to explicitly delete a specific product. The product to delete has to be identified by its OID. The client has to submit a DELETE HTTP request. For example, to delete a product with OID=123 from a channel MEMO on a server https://metcast.host/cgi-bin/oleg/taker, a client has to establish a TLS/SSL connection to metcast.host and submit the following request:

    DELETE /cgi-bin/oleg/taker/channels/MEMO/123 HTTP/1.1
    Host: metcast.host
    empty-line

If the deletion is successful, the server will reply with a response code 204 (No Content).
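
The same request can be issued from a script. The sketch below uses Python's standard http.client and mirrors the request shown above. The function names are hypothetical, and the sketch leaves out the credentials a restricted channel would require, because this section does not describe the authentication scheme.

```python
import http.client
from urllib.parse import quote

def delete_path(taker_path, cid, oid):
    """Build the request path <taker>/channels/<cid>/<oid>."""
    base = taker_path.rstrip("/")
    return f"{base}/channels/{quote(str(cid), safe='')}/{quote(str(oid), safe='')}"

def delete_product(host, taker_path, cid, oid, timeout=30):
    """Send the DELETE over TLS; return True on 204 (No Content)."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("DELETE", delete_path(taker_path, cid, oid))
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status == 204
    finally:
        conn.close()

path = delete_path("/cgi-bin/oleg/taker", "MEMO", 123)
```

Calling delete_product("metcast.host", "/cgi-bin/oleg/taker", "MEMO", 123) would then send the request shown earlier.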

Determining a Channel ID or an OID

To publish a product into a channel we need to know that channel's ID, which we can determine from the list of all available channels (the channels' table of contents). The list can be obtained, for example, by sending the following request to a Metcast Server:

        (things
         (products
           (Describe (channels))))

To delete a product from a channel we need to know the OID of the product, in addition to the channel ID. If we have requested and received products from the Metcast Channels, we already know the Channel ID and the OIDs of the received products. These identifiers are part of the product's metadata, given to us in the Content-Type: MIME header. See the section Request and delivery of things for more details.
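
Pulling the identifiers out of a received Content-Type header is a matter of ordinary MIME parameter parsing. The sketch below uses Python's standard email package. This section does not show the exact name of the parameter that carries the OID, so the "oid" name in the sample header is a guess made for illustration only.

```python
from email.message import Message

def product_metadata(content_type_header):
    """Split a Content-Type value into its MIME type and parameter dict."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    params = dict(msg.get_params()[1:])  # first entry is the MIME type itself
    return msg.get_content_type(), params

# The "oid" parameter name below is a guess made for illustration.
mime, meta = product_metadata(
    'text/plain; cid=MEMO;oid=123;fname=memo1.txt;topic="How to publish"')
```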

Alternatively, OIDs and attributes of all products in a particular channel (e.g., with the channel ID MEMO) can be determined through an MBL query

        (things
         (products
           (Describe (Channel MEMO))))
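
When such queries are generated by a program, the channel ID can be substituted into the expression. The helpers below are a minimal sketch. They assume that whitespace between MBL tokens is not significant, as is usual for parenthesized notations but is not confirmed in this section, and the function names are hypothetical. How the query text is delivered to the Metcast Server is outside the scope of this example.

```python
def describe_channels_query():
    """MBL request for the table of contents of all channels."""
    return "(things (products (Describe (channels))))"

def describe_channel_query(cid):
    """MBL request for the products of one channel."""
    if not cid or any(ch.isspace() or ch in "()" for ch in str(cid)):
        raise ValueError(f"unsuitable channel ID: {cid!r}")
    return f"(things (products (Describe (Channel {cid}))))"
```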

References

A lighter version of the HTTP upload:

Perl/

For more details, see the comments to the Metcast Taker source code,

http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/taker.scm

to the Feed Virtual Channel taker code,

http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/Feed-vtaker.pl

and to the w-shove code,

http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/w-shove.cc

w-shove is available in the source form

http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/w-shove.cc

and as a compiled executable for Sun/Solaris, WinNT/9x, HP-UX and Linux:

http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/Solaris/w-shove
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/HP-UX/w-shove
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/Linux/w-shove
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/Code/WinNT/w-shove.exe

History

$Id: Publishing.html,v 3.8 2002/04/26 23:45:21 oleg Exp oleg $
