Metcast Channels

Introduction and Overview
Request and delivery of things
Publishing products into a channel [a separate document]
Metcast Taker [a separate document]
1. Virtual Taker Interface
2. Metcast Taker's logs
Metcast Channels vs. ...
Usage scenarios
1. ...channels and deltas: distributing software updates
Metcast Channels' Table of Contents
Authentication and Access Restrictions
Metcast Channels Administration Protocol [a separate document]
Implementation Details
References: URLs, source code, executables, documentation
Notes on an early version of Metcast Channels [a separate document]

Introduction and Overview

Metcast Channels is a communication system to distribute arbitrary annotated pieces of information: things. The system accepts data and accompanying attributes from publishers, and lets clients retrieve the information or subscribe to updates. The pieces of data are literally things. They can be anything: a satellite image, a product grid, a synoptic report, a software update, a PowerPoint presentation, a database schema, a software distribution, a FAQ, a survey form, a survey result, this year's budget datasheet, etc. Besides data, each thing has a set of associated metadata, at the very least the media type: image, text file, html document, sound file, etc. A publisher can attach other metadata to the content, as arbitrary name-value pairs, content attributes. The attributes can carry, for example, the identity of the publisher, a signed message digest of the content, public key fingerprint, or the original file name. Pieces of data are grouped into channels. Items in a channel have a common set of attributes and do not have separate identities. Besides these two constraints, grouping of things into channels is based on convenience and appropriateness for a publisher or subscribers.

The names of all content attributes must be declared when a channel is created. A channel administrator also assigns each attribute one of three usage classes and, optionally, a default value. A required attribute does not have a default value: the value of such an attribute must be set by a publisher. An implied attribute may have a default value; if the publisher does not specify the attribute, the default value is associated with the published piece of data. A fixed attribute is similar to an implied one, except that the publisher may not override its default value: if a publisher chooses to supply the value of a fixed attribute, the value must be identical to the one declared for the attribute. In this respect content attributes resemble XML element attributes, whose DTD declarations use the analogous #REQUIRED, #IMPLIED and #FIXED defaults.
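The rules for the three usage classes can be sketched in Python as follows. This is only an illustration of the behaviour described above, not the Metcast implementation: the names `AttrDecl` and `resolve_attributes` are hypothetical, and rejecting attributes that were never declared is an assumption drawn from the requirement that all attribute names be declared in advance.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AttrDecl:
    """Declaration of one content attribute (hypothetical structure)."""
    usage: str                     # "required", "implied" or "fixed"
    default: Optional[str] = None  # never set for "required"


def resolve_attributes(decls: Dict[str, AttrDecl],
                       supplied: Dict[str, str]) -> Dict[str, str]:
    """Return the attributes stored with a published thing, or raise ValueError."""
    for name in supplied:
        if name not in decls:
            # Assumption: attributes not declared for the channel are rejected.
            raise ValueError(f"undeclared attribute: {name}")

    result: Dict[str, str] = {}
    for name, decl in decls.items():
        if decl.usage == "required":
            # The publisher must always set a required attribute.
            if name not in supplied:
                raise ValueError(f"required attribute missing: {name}")
            result[name] = supplied[name]
        elif decl.usage == "implied":
            # The publisher's value wins; otherwise fall back to the default.
            if name in supplied:
                result[name] = supplied[name]
            elif decl.default is not None:
                result[name] = decl.default
        elif decl.usage == "fixed":
            # Assumption: a fixed attribute always has a declared value.
            if decl.default is None:
                raise ValueError(f"fixed attribute {name} has no declared value")
            if name in supplied and supplied[name] != decl.default:
                raise ValueError(
                    f"fixed attribute {name} must equal {decl.default!r}")
            result[name] = decl.default
        else:
            raise ValueError(f"unknown usage class: {decl.usage}")
    return result
```

With a channel that declares a fixed mime-type of text/plain, a required author and an implied lang defaulting to "en", publishing with only `{"author": "oleg"}` yields all three attributes, while supplying a different mime-type is rejected.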

When a new channel is created, an administrator associates with the channel a descriptive string and a channel id, a short alphanumeric identifier. The administrator also declares the channel's attributes. A channel is known to publishers and subscribers by its id. A Channel Table of Contents lists channel ids, descriptions, declared attributes and other appropriate data for all or selected channels. The table of contents provides the users of the system with information about available channels and their ids.

A channel contains zero, one or several timestamped documents. Some channels always contain at most one document, for example, the latest survey form. Whenever a new version of such a document is published, it replaces the old one. On the other hand, a "polar satellite channel" may hold several images taken at different points in time; you can retrieve all of them, or only those that were recorded since a specified moment. You can also poll the channel periodically, that is, subscribe to receive updates as they become available. Metcast provides the mechanisms without undue restrictions; it is up to the maintainer of a channel to set the policy that is most appropriate for that particular channel.
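Periodic polling can be sketched as follows. The sketch assumes a hypothetical callable `fetch_since` that stands in for a real time-conditional Metcast request, and it represents a product as a (timestamp, payload) pair; neither is part of Metcast itself. The point is only that the subscription state, the timestamp of the newest product seen so far, is kept by the client.

```python
from typing import Callable, List, Tuple

# A product is represented here as (timestamp, payload). This shape is an
# assumption made for the sketch, not the Metcast wire format.
Product = Tuple[int, bytes]


def poll_once(fetch_since: Callable[[int], List[Product]],
              last_seen: int) -> Tuple[List[Product], int]:
    """Fetch products newer than last_seen; return them with the new mark."""
    # Filter again on the client side in case the source returns products
    # stamped exactly at last_seen.
    fresh = [p for p in fetch_since(last_seen) if p[0] > last_seen]
    newest = max((ts for ts, _ in fresh), default=last_seen)
    return fresh, newest
```

A subscriber would call `poll_once` on its own schedule, feeding the returned mark into the next call.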

One attribute that a publisher always defines for a product is the product's MIME type. The mime-type attribute must be declared when the channel is created, as outlined above. The usage class of this attribute can be either fixed or implied/required. If it is fixed, the channel accepts products of a single MIME type only, the one set for the channel (e.g., text/plain). If the mime-type attribute is not declared as fixed, the channel is polymorphic: you can publish products of several types into it -- e.g., image/tiff, image/gif -- and retrieve them with their original MIME types.

The Metcast Channels system is built upon the existing Metcast Communication system, which consists of:

a client (e.g., a retriever) that polls a server
a Metcast server that takes a request, parses it and sends zero, one or several products in reply
a database of products the server can deliver, with "things" being one of them.

A client and a server communicate through an HTTP pipe, as described in more detail in the technical note Met-Cast-HTTP.html.

Request and delivery of things

A user can request products from a Metcast Channel by the values of their attributes. In other words, Metcast Channels offer associative access to the pieces of data stored in the channels. This access method is similar to (yet more flexible than) that of Linda or of the Java Message Service.

A client is always the one who initiates a transaction, by submitting a request formulated in a special request language, MBL. The language is described in the technical note Request-Lang.html on this site.

To receive a piece of data from a Metcast Channel, a client should send a request for a product named Channel with the id of the channel of interest. For example:

        (MetcastCh
                (products (Channel GOES-VIS)))

You may supply an IF-MODIFIED-SINCE HTTP header to make the request time-conditional.
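As a small illustration, the value of such a header can be produced from a UTC epoch timestamp with Python's standard library. The function name `if_modified_since_header` is hypothetical; how the MBL request itself is submitted over HTTP is covered by Met-Cast-HTTP.html and is deliberately not shown here.

```python
from email.utils import formatdate
from typing import Dict


def if_modified_since_header(epoch_seconds: int) -> Dict[str, str]:
    """Build the conditional request header from a UTC epoch timestamp."""
    # usegmt=True produces the HTTP date form that ends in "GMT".
    return {"If-Modified-Since": formatdate(epoch_seconds, usegmt=True)}
```

The resulting dictionary can be merged into the headers of whatever HTTP client library carries the request.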

In more detail, a request for a product from a Metcast Channel has the following format:

        (Channel channel-id attr-constraint ...)

An MBL request may have several Channel requests. A Channel request may list one or several attr-constraints. Each attribute constraint has one of the following forms:

(attr= name val): selects only those products that are annotated by an attribute name with the value val
(not-attr= name val): the opposite of the above selection
(attr-like name val): selects only those "things" that are annotated by an attribute name with the value that matches val, which may contain SQL-style wildcards: _ and %
(not-attr-like name val): the opposite of the above selection

If a Channel request has several attribute constraints, they are implicitly ANDed together. The language of attribute constraints is a subset of LDAP Search Filters, see RFC 1558 and RFC 1487. The absent OR connective can be easily emulated by repeated Channel requests.
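The meaning of the four constraint forms and of their implicit conjunction can be sketched in Python. This is an illustration, not the server's code: the names `like_to_regex` and `satisfies` are hypothetical, and the treatment of a product that lacks the named attribute (positive forms fail, negated forms succeed) is an assumption, since the text above does not specify it.

```python
import re
from typing import Dict, List, Tuple

# (operator, attribute name, value), e.g. ("attr-like", "lib", "co%")
Constraint = Tuple[str, str, str]


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate SQL-style wildcards: _ is one character, % is any run."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def satisfies(attrs: Dict[str, str], constraints: List[Constraint]) -> bool:
    """True if the product's attributes meet every constraint (logical AND)."""
    for op, name, val in constraints:
        actual = attrs.get(name)
        if op in ("attr=", "not-attr="):
            hit = actual == val
        elif op in ("attr-like", "not-attr-like"):
            hit = (actual is not None
                   and like_to_regex(val).fullmatch(actual) is not None)
        else:
            raise ValueError(f"unknown constraint form: {op}")
        if op.startswith("not-"):
            hit = not hit
        if not hit:
            return False
    return True
```

Emulating OR with repeated Channel requests then amounts to selecting a product when it satisfies any one of the constraint lists, for instance `any(satisfies(attrs, c) for c in requests)`.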

The pieces of data from Metcast Channels are delivered to a requesting client like any other Metcast product (along with any other products requested together with the Channel). The product's metadata, its content attributes, become part of the product's Content-Type. The value of the mime-type attribute becomes the media type of the MIME entity that carries the product. In addition to the name-value pairs set explicitly by the publisher, the product is annotated with the values of fixed and implied attributes for which default values were declared. Furthermore, the Content-Type: header sent with the MIME entity envelope contains four more implicit attributes:

AREA=: an opaque alpha-numeric identifier of the request as set by the client
CLASS=: the channel id
OID=: an integer that uniquely identifies the product in its channel
timestamp=: the timestamp of publishing the product, in UTC epoch seconds.

A Metcast client -- for example, a Metcast retriever, or a web browser -- will pass the Content-Type: attributes to a helper application, usually as directed by an appropriate mailcap entry.

Let us consider an example of a channel that stores incremental Vector Data (VDU) updates to Digital Naval Charts (DNC). As the Channel Table of Contents shows, the channel has an id of VDU-INCR and the following attributes:

mime-type: with a fixed value application/x-vdu-incr
os-type: OS type the update is prepared for: UNIX or Win
region: DNC region number
from-v: DNC version to which this update applies
to-v: after the update applies, the local DNC configuration will have this version number
lib: DNC library identifier, e.g., h1718082 (for a Harbor library).

The values of these attributes are set when a particular VDU file is published into the channel.

The following is a sample MBL request:

        (vdu-req
          (products
            (Channel VDU-INCR (attr= os-type "UNIX") (attr= region "18"))
            (Channel VDU-INCR (attr= region "17") (attr= from-v "12")
                              (attr-like lib "co%"))
          ))

It asks for two sets of incremental VDU updates. The first Channel request selects all VDUs for the UNIX platform and DNC region 18. The second Channel request selects all updates from version 12 for the coastal library of region 17. The percent sign % in the second request is an SQL wildcard character.

Both sets of VDU data are delivered to a client in a single multi-part message. Each part is a VDU product that satisfies the first or the second selection. The MIME parts may have, for example, the following Content-type headers:

Content-type: application/x-vdu-incr; OID=123; AREA=vdu-req; CLASS=VDU-INCR;
    timestamp=996291071; os-type=UNIX; region=18;
    from-v=12; to-v=13; lib=approach
Content-type: application/x-vdu-incr; OID=456; AREA=vdu-req; CLASS=VDU-INCR;
    timestamp=996291071; os-type=UNIX; region=17;
    from-v=12; to-v=13; lib="coast17a"
Content-type: application/x-vdu-incr; OID=457; AREA=vdu-req; CLASS=VDU-INCR;
    timestamp=996291071; os-type=UNIX; region=17;
    from-v=12; to-v=14; lib="coast17a"

A MIME handler (in mailcap) can use the values of these attributes when processing the delivered content.
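A handler written in Python could recover these attributes with the standard `email` package, as in the sketch below. The function name `parse_product_type` is hypothetical, and the way the header value reaches the handler (here, as a plain string) is an assumption.

```python
from email.message import Message
from typing import Dict, Tuple


def parse_product_type(header_value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into the media type and its attributes."""
    msg = Message()
    # Collapse folded continuation lines into a single line first.
    msg["Content-Type"] = " ".join(header_value.split())
    # Element 0 of get_params() is the media type itself; the rest are the
    # name-value pairs, with surrounding quotes already removed.
    params = msg.get_params()[1:]
    return msg.get_content_type(), {k.lower(): v for k, v in params}
```

Attribute names are normalized to lower case, so the implicit attributes are looked up as `oid`, `area` and `class`; the `timestamp` value can be converted with `int()` to obtain the publication time in epoch seconds.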

Thus the associative access lets us store all VDUs within one channel -- and request only the ones that fit our criteria.

Comparison of Metcast Channels with similar technologies

Metcast Channels are somewhat similar to the Java Message Service (JMS), IBM's MQSeries messaging service, or CORBA's event channels. Metcast Channels combine the benefits of these three technologies without the restrictions of each. Metcast Channels do not require a publisher or a subscriber to use a particular programming language or API. Metcast Channels are lighter: easier to set up and to use. Metcast Channels are based on the HTTP protocol. Therefore, they can take full advantage of the existing Web infrastructure, in particular, authentication, caching, and proxying. Unlike JMS, the server of data is stateless and therefore highly scalable. As a subscription "session" is maintained entirely by a client, the client is able to change subscription parameters at any time.

Metcast Channels and CORBA's event service

An article "Dynamic Logging & The Corba Notification Service" by Tarak Modi, published in the March 2001 issue of Dr. Dobb's Journal, pp. 42-47, provides material for the comparison. The article states:

The CORBA Event service provides a loosely coupled method of communication between the providers of events (that is, publishers) and consumers of events (subscribers). This is achieved through the Event channel that handles the registering/unregistering of publishers/subscribers. In this model publishers do not care how many subscribers are waiting for events, or even if there are any subscribers at all. The same goes for subscribers. Publishers/subscribers of events can operate in push/pull mode, or can be combined in any combination on a channel.
Despite these benefits, several major shortcomings in the Event service have been identified:

The Event service lacks explicit quality of service (QoS) control. There are no policies ... to be able to specify a maximum queue size, delivery order, or how to discard events in case of a queue overflow...
All event data in the Event service is of type any. The channel has no way of performing any type checks as the event passes through it. The typed consumers and suppliers addendum to the Event service is rarely supported in Event service implementations due to its complexity.
The Event service lacks event filtering...
The Event service does not support sharing of event types being published or subscribed to a channel...

Like the Event service, Metcast Channels support the publish/subscribe mode of communication among loosely coupled entities. Metcast Channels can emulate both logical push and pull. Unlike the CORBA Event service, Metcast Channels have precise semantics of event (i.e., product) delivery and of dropping events in case of a queue overflow: the earliest events are dropped first. An administrator of Metcast Channels specifies the enforceable maximum queue size. All products submitted to a channel are typed, with at least a MIME type. The MIME type can be specialized with an arbitrary number of name-value pairs: attributes. A Metcast channel can be polymorphic, monomorphic, or polymorphic with regard to only certain attributes. Publishing to a channel succeeds only when all the type checks associated with the channel pass. Since Metcast Channels are typed, to any required degree, support for event filtering is straightforward. Clients are able to request products of specific types only.
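The overflow policy, earliest products dropped first under an administrator-set maximum, behaves like a bounded queue. The sketch below models it in memory with Python's `collections.deque`; the class name `BoundedChannel` is hypothetical, and the real system keeps its products in a database rather than in memory.

```python
from collections import deque
from typing import Any, List


class BoundedChannel:
    """In-memory model of a channel with an enforced maximum queue size."""

    def __init__(self, max_size: int) -> None:
        # A deque with maxlen discards items from the opposite end on append,
        # so the earliest published product is dropped first.
        self._items: deque = deque(maxlen=max_size)

    def publish(self, product: Any) -> None:
        self._items.append(product)

    def products(self) -> List[Any]:
        """Products currently held, oldest first."""
        return list(self._items)
```

A maximum size of one gives the behaviour of a channel that always holds only the latest document: each new publication replaces the previous one.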

Tarak Modi's article goes on to describe a CORBA Notification Service, which is intended to eventually replace the CORBA Event service. The major change, the article says, is the introduction of a "structured event type". Structured event data consist of a header and a body; the header is made of a sequence of name-value pairs. The similarity to the typed products of Metcast Channels is striking. A typed Metcast Channels product is a MIME entity, whose header carries the type information of the payload data in the form of standard and user-defined name-value pairs. There are important differences, however. CORBA's structured event is a binary datatype, which is not standard yet. Metcast Channels employ MIME encapsulation of binary data. MIME is a rather mature, standard protocol. Tools to build and decompose MIME entities are built into every modern e-mail client. Since MIME headers are textual, you can view, add or change metadata in any text editor.

Finally, the coupling between producers and consumers of data in the CORBA services is not as loose as desired. A publisher must obtain an object reference to a proxy consumer; the process of publishing is the invocation of special methods on the proxy. In Metcast Channels, producers and consumers are truly unaware of each other's presence, interfaces, and states. Therefore, the amount of state the system has to maintain is minimal. In fact, the Metcast Channels server and taker are stateless. This feature makes the system highly scalable. Furthermore, the complete decoupling of producers from consumers avoids covert channels and bodes well for security.

Metcast Channels and FTP-based services (DPSR)

Both Metcast and DPSR are push services. DPSR is the one that genuinely pushes data: it actively establishes a connection to a client and transfers data, via FTP, to a specified directory. Metcast delivers data through a client pull: it is the client that opens a connection, asks for the data and puts them wherever it wants to. Therefore:

Metcast places a far lighter burden on a server administrator. A Metcast server does not need to know how to log in to a client, which directory to put the data into, etc. The client has full control over its own computer, over the update schedule, and over its own file system. The client can change download parameters at will without any need to notify the server or its system administrator.
A Metcast server is stateless, which makes it easier to administer. It does not need a tangle of configuration files and tables.
Metcast provides more security, as the server does not need to log in to a client computer and so does not need a special account there.
Metcast can work transparently through firewalls with the help of ordinary web proxies; no special arrangements are necessary.
Metcast uses HTTP, which is a more efficient protocol than FTP, on which DPSR is based. With HTTP, requests and replies are transmitted over a single connection; FTP requires two connections (control and data) to be established.
Metcast's retriever executes a user program to process the incoming data the moment the data are received. You do not need any daemons watching over DPSR directories waiting for files to show up. Thus Metcast is much lighter on the client, too, as well as more flexible.
HTTP can upload or download dynamic content. For example, you can send data directly into a database, without creating an intermediate file. Likewise, with HTTP you can do compression, differencing, encoding, and any other translation or processing of the content on the fly. FTP is limited to static content.
An HTTP server delivers not only the content of a file but metadata as well: for example, the last-modified timestamp, an MD5 digest of the content, etc. An FTP server is not extensible in this respect.

The Metcast introduction document lists other advantages.

See also a comparison between Metcast Channels and DPSR as far as uploading of files is concerned.

Metcast Channels vs other Push services

Metcast is the closest to BackWeb.
Unlike Pointcast (and similar push services), Metcast can distribute things of any kind; they are not restricted to HTML documents (as in Pointcast) or to software updates.
Metcast Channels are akin to the EventChannels of CORBA, with a Push mode for a supplier and a Pull mode for a consumer.

Usage scenarios

Channels and Deltas: distributing JMV software updates

The Metcast Channels architecture is flexible enough to permit distribution of updates (deltas) for a product. Let us consider a JMV software installation. One channel may contain the entire installation package; this channel should be restricted to hold at most one product: the latest one. Another channel may accumulate updates: specific deltas. A user may choose to receive the whole distribution, or may opt to download updates from a specific date on. The user may also subscribe to the update channel to receive patches as they become available.

It is possible to set up the two channels in such a way as to make publishing a new software distribution remarkably convenient. A publisher merely needs to send a new version of the software. This version is stored in its entirety in the corresponding channel. At the same time, a virtual channel will compute the difference between the new and the previous versions, create a patch, and push it into the other channel.

Metcast Channels provide storage and distribution mechanisms; neither the database nor the server is concerned with the particular rules of deriving deltas or applying patches -- these issues have to be solved on different layers, by different agents or people. Metcast Channels do not impose any restrictions in this respect.
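The two-channel arrangement can be modelled by a small agent that sits outside the channel store. The Python sketch below is purely illustrative: the class `DistributionPublisher` is hypothetical, both channels are plain in-memory variables, and a textual unified diff from the standard `difflib` module stands in for whatever delta format a real software distribution would use.

```python
import difflib
from typing import List, Optional


class DistributionPublisher:
    """Keeps the latest full version and accumulates deltas (illustrative)."""

    def __init__(self) -> None:
        self.full: Optional[str] = None  # channel holding at most one product
        self.deltas: List[str] = []      # channel accumulating patches

    def publish(self, new_version: str) -> None:
        """Store the new version; if one existed before, also record a delta."""
        if self.full is not None:
            diff = difflib.unified_diff(
                self.full.splitlines(keepends=True),
                new_version.splitlines(keepends=True),
                fromfile="previous", tofile="current")
            self.deltas.append("".join(diff))
        # The full-distribution channel keeps only the latest product.
        self.full = new_version
```

The publisher only ever sends complete versions; deriving the patch is the agent's concern, in keeping with the separation of layers described above.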

Metcast Channels' Table of Contents

A Channel Table of Contents is an XML document that lists channel ids, descriptions, declared attributes and other appropriate data for all or selected channels. A Metcast client can either browse through the table of contents, or search for a channel that is annotated by particular attributes, that stores content of a specific MIME type, or whose description contains particular keywords. Once the client has found the ids of the channels of interest, it can subscribe to them as outlined elsewhere in this document. The Channel Table of Contents thus provides associative access to Metcast Channels.

The Channel Table of Contents is a part of a larger Metcast Table of Contents (MTOC), which a Metcast server sends in response to a Describe query. This query implements a reflexive facility of the server, which a client may use to find out the products the server can deliver and how to ask for them.

In more detail, the Channel Table of Contents is described in [CTOC].

Authentication and Access Restrictions

Getting access to a channel is a two-stage process: a regular HTTP authorization mechanism followed by an additional, finer-grain authorization performed by the Metcast server or a publishing application. Permission to retrieve a product or to publish a new one is granted only when both stages allow it.

The first stage is a regular HTTP authorization/authentication applied to the Metcast server or taker, which are ordinary executable resources (URLs) from the point of view of the HTTP server. A number of HTTP server configuration directives restrict access to the server's resources based on the client's host name, domain name, IP address, network, etc. In addition, an HTTP server may require client authentication, using HTTP Basic or another authentication scheme. Furthermore, a server may be configured to accept only secure (TLS, SSL) connections and to demand a client certificate. If a particular client is denied access to a resource representing a Metcast taker (taker URL), the client may not publish into any of the channels.

When a Metcast server is told of a client's request, the server knows for a fact that the client successfully passed all standard HTTP access restrictions if they were imposed. If the HTTP daemon insisted on user authentication and was satisfied with it, the Metcast server is told that user's name. The server then checks to make sure that the requested channel permits the specified operation -- reading from or publishing into -- for this particular user, or for the public. Only when the second check passes is the request processed.

Whenever access is denied by an HTTP daemon or a Metcast server/publisher, the client receives the standard HTTP error code and the corresponding message.

The second stage of access restrictions to a channel is controlled by a dedicated table in a Metcast Channels database. The table defines who may read or write into which channel. Permission to read or publish to a channel may be granted to a particular authenticated user, to several users, or to '*' (meaning everyone, or public). Only the database administrator has authority to modify or even to read this table. A Metcast client positively cannot find out which other users are granted permissions for which channels.

If a channel is not publicly readable, a request for that channel's products from an unauthenticated or unauthorized user will return nothing. To an unauthorized user, a restricted channel will appear empty.
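The second-stage check can be sketched as a lookup in a permission table. In the Python sketch below the table layout, the operation names "read" and "publish", and the function names are all hypothetical; only the rules themselves (per-user grants, '*' for the public, and a restricted channel appearing empty to an unauthorized reader) come from the description above.

```python
from typing import Dict, List, Optional, Set, Tuple

# (channel id, operation) -> set of user names; "*" stands for the public.
Acl = Dict[Tuple[str, str], Set[str]]


def is_permitted(acl: Acl, channel: str, op: str,
                 user: Optional[str]) -> bool:
    """user is the name passed on by the HTTP daemon, or None if anonymous."""
    allowed = acl.get((channel, op), set())
    return "*" in allowed or (user is not None and user in allowed)


def read_channel(acl: Acl, store: Dict[str, List[str]],
                 channel: str, user: Optional[str]) -> List[str]:
    """Unauthorized readers see a restricted channel as empty."""
    if not is_permitted(acl, channel, "read", user):
        return []
    return list(store.get(channel, []))
```

Because an unauthorized read returns an empty list rather than an error, a client cannot distinguish a restricted channel from an empty one.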

Implementation Details

Metcast Channels' data and metadata reside in their own self-contained database -- called MChannels -- managed by INFORMIX-OnLine Dynamic Server Version 9.2. The database consists of a set of tables describing channels, declared attributes, published "things", and their attributes. All the published content is stored in the database itself, as BLOBs. There are no external files to worry about; deleting a row from the DiscreteThings table un-publishes the corresponding product and removes its content. The database system guarantees atomicity, consistency, isolation and durability of all publishing and servicing transactions.

The source code (see below) contains extensive comments that discuss implementation issues in much more detail.

References

[CTOC] The Metcast Channels Table of Contents (CTOC)
<http://www.metnet.navy.mil/Metcast/XML/CTOC.html>

Source code

    http://www.metnet.navy.mil/Metcast/Code/

Contains the source code of a Metcast server and a taker.

History

$Id: Metcast-Channels.html,v 2.5 2003/02/26 04:58:22 oleg Exp oleg $

oleg-at-acm.org
Your comments, problem reports, questions are very welcome!