Date: Wed, 29 Aug 2001 19:13:29 -0700
Reply-To: oleg@pobox.com
From: oleg@pobox.com (oleg@pobox.com)
Newsgroups: comp.lang.scheme
Subject: Re: s-expressions as a file format?
References: <3NVg7.17243$4b5.432365@news6.giganews.com> <3b861fe0.9527635@news.minvenw.nl> <16Jh7.22354$4b5.558222@news6.giganews.com> <7eb8ac3e.0108271310.39fb628b@posting.google.com>
Message-ID: <7eb8ac3e.0108291813.3770e349@posting.google.com>

S-expression-based files are handy not only for authoring Web pages; they can also be used to build XML documents. The following is a non-contrived and less trivial example of that.

It is straightforward to convert <tag>data</tag> into (tag "data") and vice versa; the SSAX parser and the SXML manipulation tools do that easily. However, exporting _relational_ sources into XML often runs into an impedance mismatch: XML is by its nature a hierarchical database. We will show an example where generating XML from s-expressions involves denormalization, "table joins" and third-order tags. The s-expression format turns out to be not only more understandable and insightful but also more concise, by a factor of four.

The background: DISA (www.disa.mil) is in charge of maintaining an XML registry for the defense community. The registry accepts submissions of XML formats as collections of DTD/Schema documents, textual descriptions, sample code, etc. Every submission package must have a Manifest.xml file, which describes all the submitted documents as well as every element ("tag") and its attributes. The following is a representative sample. It describes one example XML document, one element and one of its attributes. The XML snippets below are not a joke: that is what Manifest.xml must really look like. BTW, I don't work for DISA; the Manifest.xml format wasn't my idea. The snippets are taken from the actual manifest, http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/Manifest.xml and edited for brevity.
Still, they look ugly, and I apologize for that. We will see shortly how this ugly code can be represented in a concise and pleasing way.

Description of a sample XML document:

    27 February 2001
    OMF Example: METAR/SYNOP/SPECI
    MET OMF-sample.xml OMF2.2
    OMF-sample.xml
    MET OMF-SYNOP.html OMF2.2

Description of one XML element (element BTSC) within the submitted collection:

    12 April 2000
    an observation report on temperature, salinity and currents at one particular location on the ocean surface, or in subsurface layers
    MET BTSC OMF4.1
    MET
    MET BTLEVELS OMF4.1
    MET TStamp OMF4.1
    MET Depth OMF4.1
    MET OMF-BATHY.html OMF1.4

Description of an attribute ('TStamp'), which annotates a BTSC element:

    12 April 2000
    Time Stamp
    MET TStamp OMF4.1
    10 second
    MET OMF.html OMF2.2

The analysis of the DISA documentation and its sample code shows that a Manifest.xml file is a collection of resources, which are described in particular ways and can be related to one another. It seemed logical, then, to make such a representation explicit. The following is a snippet from a Manifest.scm file, http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/Manifest.scm, which corresponds to a superset of the above XML snippets.

    (Resource "OMF.html"
              "Weather Observation Definition Format (OMF) Document"
              "10 March 2000" "2.2")
    (DescribeDoc "OMF.html")
    (Resource "OMF-SYNOP.html"
              "Surface Weather Reports from land and sea stations"
              "12 April 2000" "2.2")
    (DescribeDoc "OMF-SYNOP.html")
    (Resource "BTSC"
              "an observation report on temperature, salinity and currents at one particular location on the ocean surface, or in subsurface layers"
              "12 April 2000" "4.1")
    (Resource "BTID"
              "identification and position data, which constitute Section 1 of FM 62 - 64."
              "12 April 2000" "4.1")
    (Resource "BTLEVELS"
              "a sequence of BTLEVEL elements for each particular (sub)surface level described in a whole BTSC report"
              "12 April 2000" "4.1")
    (XMLElement "BTSC"
                (DTContainer "BTID" "BTCODE" "BTLEVELS")
                "OMF-BATHY.html"
                (Attlist "TStamp" "LatLon" "BId" "SName" "Title" "Depth"))
    (XMLElement "BTID"
                (DTString 40)
                "OMF-BATHY.html"
                (Attlist "DZ" "Rec" "WS" "Curr-s" "Curr-d" "AV-T" "AV-Sal" "AV-Curr" "Sal"))
    (Resource "TD" "The dew-point temperature" "12 April 2000" "4.1")
    (Resource "TRange" "Time Interval" "12 April 2000" "4.1")
    (Resource "TStamp" "Time Stamp" "12 April 2000" "4.1")
    (XMLAttr "TD" (DTFloat 6 2 "deg C") #f "OMF-SYNOP.html")
    (XMLAttr "TRange" (DTString 30) #f "OMF.html")
    (XMLAttr "TStamp" (DTInt 10 "second") #f "OMF.html")

An element 'Resource' describes one resource: its name, description, modification date and version. Within the whole DISA repository, a resource is referenced by its name, namespace id and version. Within a single Manifest document, the namespace id must be the same for all resources, and the version string should be the same. Therefore, we can refer to a resource simply by its name.

Let's consider two s-expressions in more detail:

    (Resource "OMF-SYNOP.html"
              "Surface Weather Reports from land and sea stations"
              "12 April 2000" "2.2")
    (DescribeDoc "OMF-SYNOP.html")

If you search Manifest.xml for "<Resource>" or "<DescribeDoc>" tags, you will find neither. The 'Resource' s-expression merely declares a resource and serves as a container for its attributes. During the XML transformation, the 'Resource' tag expands to nothing, as the following stylesheet (rewriting) rule indicates:

    (Resource . ,(lambda (tag name title date version) '()))  ; null expansion

Resources are described in different ways, depending on their type. For example, (DescribeDoc "OMF-SYNOP.html") says that "OMF-SYNOP.html" is a textual document. Here is a stylesheet rule for 'DescribeDoc':

    (DescribeDoc                ; Describe a document resource
     . ,(lambda (tag name)
          (generate-XML
           `(AddTransaction
             (Resource-descr ,name)
             (InformationResourceTypeDocument
              (InformationResourceLocation ,name))))))

The rule expands 'DescribeDoc' into a set of DISA elements distinguished by their unwieldy names (e.g., 'InformationResourceTypeDocument'), plus a (Resource-descr "OMF-SYNOP.html"). During the second pass, the stylesheet transformer expands Resource-descr according to the rule

    ; Locate a named resource and expand into its full description.
    ; We prepend a string OMF to the version string, to make the
    ; resource reference unique to OMF within the MET namespace.
    (Resource-descr
     . ,(lambda (tag name)
          (let-values* (((name title date version) (lookup-res name)))
            (generate-XML
             (list (list 'EffectiveDate date)
                   (list 'Definition title)
                   '(Namespace "MET")
                   (list 'InformationResourceName name)
                   (list 'InformationResourceVersion "OMF" version))))))

The end result is the difficult-to-read XML fragment shown at the beginning of this article. The expansion of (Resource-descr "OMF-SYNOP.html") scans Manifest.scm for '(Resource "OMF-SYNOP.html" ...)' and uses its attributes to generate the proper resource description. In database terms, the expansion of '(DescribeDoc "OMF-SYNOP.html")' involves a join of two tables. 'InformationResourceName' is a primary, DISA-defined tag. 'Resource-descr', whose expansion contains 'InformationResourceName', is then a second-order tag. 'DescribeDoc', whose expansion in turn contains 'Resource-descr', is therefore a third-order tag.

Expansion of an XMLElement is rather similar. Here is a stylesheet rule for it:

    (XMLElement                 ; Describe an XML element
     . ,(lambda (tag name content descr-by . attlist)
          (generate-XML
           `(AddTransaction
             (Resource-descr ,name)
             (InformationResourceTypeXMLElement
              ,content
              (Relationships
               ,@attlist
               (DescribedBy (Resource-ref ,descr-by))))))))

The stylesheet applicator, an SXML transformer, is an applicative-order evaluator: it rewrites the children of a node before the node itself.
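To make the applicative order and the two-pass expansion more concrete, here is a minimal sketch in R7RS Scheme. It is not the SXML transformer and not the code in Manifest.scm: the rewrite function, the resource-table list, the body of lookup-res and the Resource-ref rule are all hypothetical stand-ins. rewrite processes a node's children first and only then applies the rule for the node's tag. The first pass uses the null-expansion 'Resource' rule and the 'Attlist' rule from the article; the second pass resolves each Resource-ref by looking it up in (that is, joining it with) a table of (Resource ...) rows taken from the Manifest.scm snippet. The namespace/name/version shape of a resolved reference is an assumption modeled on the "MET TStamp OMF4.1" triples in the XML sample.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical applicative-order rewriter (a sketch, NOT the SXML
;; transformer's API): rewrite all children first, then apply the
;; rule for this node's tag, if any, to the already-rewritten children.
(define (rewrite tree rules)
  (cond
   ((and (pair? tree) (symbol? (car tree)))        ; element: (tag child ...)
    (let ((kids (map (lambda (kid) (rewrite kid rules)) (cdr tree)))
          (rule (assq (car tree) rules)))
      (if rule
          (apply (cdr rule) (car tree) kids)
          (cons (car tree) kids))))
   ((pair? tree)                                  ; nodelist produced by a rule
    (map (lambda (node) (rewrite node rules)) tree))
   (else tree)))                                  ; strings and '() pass through

;; Embedded "table" of (Resource name title date version) rows, copied
;; from the Manifest.scm snippet; kept in a separate list for simplicity.
(define resource-table
  '((Resource "TD" "The dew-point temperature" "12 April 2000" "4.1")
    (Resource "TRange" "Time Interval" "12 April 2000" "4.1")
    (Resource "TStamp" "Time Stamp" "12 April 2000" "4.1")))

;; Hypothetical implementation of lookup-res: return a row's four fields.
(define (lookup-res name)
  (let loop ((rows resource-table))
    (cond ((null? rows) (error "unknown resource" name))
          ((string=? (cadr (car rows)) name) (apply values (cdr (car rows))))
          (else (loop (cdr rows))))))

;; Pass 1: 'Resource' expands to nothing; 'Attlist' is the article's rule.
(define pass1
  (list
   (cons 'Resource (lambda (tag name title date version) '()))
   (cons 'Attlist
         (lambda (tag . attr-names)
           (map (lambda (attr-name)
                  (list 'IsQualifiedByAttribute (list 'Resource-ref attr-name)))
                attr-names)))))

;; Pass 2: a hypothetical 'Resource-ref' rule that joins against the
;; table to produce a namespace/name/version reference.
(define pass2
  (list
   (cons 'Resource-ref
         (lambda (tag name)
           (let-values (((name title date version) (lookup-res name)))
             (list '(Namespace "MET")
                   (list 'InformationResourceName name)
                   (list 'InformationResourceVersion
                         (string-append "OMF" version))))))))

(define doc
  '(Manifest
    (Resource "TStamp" "Time Stamp" "12 April 2000" "4.1")
    (XMLElement "BTSC" (Attlist "TStamp" "TRange"))))

(define result (rewrite (rewrite doc pass1) pass2))
(write result)
(newline)
```

Because the first pass is post-order, 'Attlist' is already expanded by the time its parent XMLElement is visited, but the Resource-ref nodes it produces are not revisited in the same pass, hence the second pass. The sketch uses R7RS let-values rather than let-values*, and it folds "OMF" and the version into one string, whereas the Resource-descr rule above emits them as two text nodes.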
Thus, by the time 'XMLElement' in the expression

    (XMLElement "BTSC"
                (DTContainer "BTID" "BTCODE" "BTLEVELS")
                "OMF-BATHY.html"
                (Attlist "TStamp" "LatLon" "BId" "SName" "Title" "Depth"))

is about to be expanded, the nested s-expression '(Attlist "TStamp" "LatLon" "BId" "SName" "Title" "Depth")' has already been processed by the rule

    (Attlist
     . ,(lambda (tag . attr-names)
          (map (lambda (attr-name)
                 (list 'IsQualifiedByAttribute
                       (list 'Resource-ref attr-name)))
               attr-names)))

The second-order tag 'Resource-ref' will be handled by the recursive invocation of generate-XML. The result will be the XML fragment at the beginning of the article, with resource references in the DISA format.

A set of (Resource ...) s-expressions in Manifest.scm represents an embedded (relational) table. During the transformations, this table is joined with the others to produce the final XML document (which, as it happens, will be parsed at DISA, normalized, and stored in a set of Oracle tables). IMHO, Manifest.scm describes the collection of OMF resources in a more readable and understandable way. It is instructive to compare the file sizes of Manifest.scm and Manifest.xml:

    Manifest.scm: 25831 bytes (of which 9220 bytes are the transformation stylesheet and the related code)
    Manifest.xml: 90377 bytes

The OMF XML format has been submitted to DISA, and Manifest.scm was a part of the submission. You can search for "OMF" or "Manifest.scm" using the form
    http://diides.ncr.disa.mil/xmlreg/user/search.cfm?adv=yes
To be precise, the Manifest.scm URL is
    http://diides.ncr.disa.mil/xmlreg/package_docs/Public/MET/OMF_package/997725059053/Manifest.scm
The stylesheet (which effects the transformation into XML) is included at the end of Manifest.scm. Yes, there is Scheme code in the disa.mil domain...

Ray Blaak wrote in message news:...
> I suggest:
> (Date-Revision (year 2001) (month 08) (day 14))
> (Date-Creation (year 2001) (month 04) (day 16))
> (keywords "XML" "GRIB" "grid" "WMO" "FM92")
> Client processing is then straight forward and robust.

A good idea, thank you. The expressiveness and the power of S-expressions are immense indeed; I'm still discovering them.
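Ray Blaak's point is easy to demonstrate with a minimal sketch in R7RS Scheme. It assumes dates written exactly in his suggested (Date-... (year Y) (month M) (day D)) shape; the helpers date-field, date->manifest-string and date-before? are hypothetical names, not part of Manifest.scm. Once the date is structured, rendering it in the "12 April 2000" style used elsewhere in the manifest, or ordering two dates, needs no string parsing at all.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper: look up (key value) in a structured date such as
;; (Date-Revision (year 2001) (month 08) (day 14)).
(define (date-field date key)
  (let ((entry (assq key (cdr date))))
    (if entry
        (cadr entry)
        (error "missing date field" key date))))

(define month-names
  #("January" "February" "March" "April" "May" "June" "July"
    "August" "September" "October" "November" "December"))

;; Render a structured date in the "12 April 2000" style of the Manifest.
(define (date->manifest-string date)
  (string-append (number->string (date-field date 'day)) " "
                 (vector-ref month-names (- (date-field date 'month) 1)) " "
                 (number->string (date-field date 'year))))

;; #t if date a precedes date b; compares year, then month, then day.
(define (date-before? a b)
  (let loop ((keys '(year month day)))
    (if (null? keys)
        #f
        (let ((x (date-field a (car keys)))
              (y (date-field b (car keys))))
          (cond ((< x y) #t)
                ((> x y) #f)
                (else (loop (cdr keys))))))))

(define revised '(Date-Revision (year 2001) (month 08) (day 14)))
(define created '(Date-Creation (year 2001) (month 04) (day 16)))

(write (date->manifest-string revised))   ; → "14 August 2001"
(newline)
(write (date-before? created revised))    ; → #t
(newline)
```

The Scheme reader already turns 08 and 04 into the numbers 8 and 4, so the client never sees the zero-padded text; the (keywords ...) form is handled just as easily, since (cdr kw) is already a list of strings.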