From oleg@pobox.com Tue May 22 21:31:34 2001
Date_sent: Tue, 22 May 2001 21:31:32 -0700 (PDT)
From: oleg@pobox.com
Newsgroups: comp.text.xml,comp.lang.scheme
Date: Wed, 23 May 2001 04:26:31 +0000 (UTC)
Message-Id: <200105230431.VAA03776@adric.cs.nps.navy.mil>
To: comp.lang.scheme@mailgate.org, comp.text.xml@mailgate.org
Subject: Literate XML/DTD programming
Keywords: XML, SXML, literate programming, higher-order tags, JMGRIB
Summary: SXML as a tool for literate programming
Reply-to: oleg@pobox.com
Status: OR

Designing an XML format for a particular domain involves more than just
creating a DTD for a collection of elements, attributes and entities. An
important part of the format is the documentation. Ideally it is a
hyperlinked document that explains the background, the motivation, and
the design principles; describes and cross-references each tag and
attribute; and shows representative examples of the proposed markup.
This article will demonstrate a tool that helps design a specification
and its documentation at the same time -- and keep them consistent.

A literate design document should permit a transformation into a well
laid-out, easy-to-read hyperlinked user manual. A literate design
document should be easy to write. And yet the user manual should be
precise enough to allow automatic extraction of a formal specification.
We will demonstrate that SXML [1] is rather suitable for literate XML
programming. Writing SXML is similar to writing TeX. SXML
transformations do the job of "weaving" a document type specification
and of "typesetting" the user manual.

A DTD offers poor tools for self-documentation. For one thing, content
constraints in a DTD are crude. It is impossible to specify that the
value of a particular attribute is an integer within a certain range.
All this meta-information has to be described in comments. Comments,
however, are allowed only between ELEMENT, ATTLIST, etc. declarations,
not within them. It's hard to document tokens in a large enumerated list
of choices. It's hard to display a DTD in an off-the-shelf browser,
especially in a hyperlinked format.

The XML Schema Recommendation alleviates some of the above documentation
problems, but it does not solve them all. The Recommendation provides
for an 'annotation' element, which may include human documentation
strings and hints for applications. However, an XSD file by itself is
not user-friendly by any means. The markup is distracting; hyperlinks
that may be included in documentation strings are not live.

The XML Schema Recommendation "XML Schema Part 1: Structures" [2] makes
it apparent why an XSD file still requires human documentation. The
Recommendation's web page [2] is a reader-friendly hyperlinked
description of a Schema for XML Schema, which also includes as an
appendix a formal XSD specification. Although the XSD document contains
quite a number of annotations, it is clearly not a replacement for the
human-readable manual. The HTML page "XML Schema Part 1: Structures" [2]
is generated from a master XML file, structures.xml [3]. The appendix
with an XSD specification is typeset as follows:

And that is the end of the schema for &XSP1;.

That is, the format definition of the XML Schema that is shown to a user
is pulled from a separate source. There is a chance therefore that the
formal and the human-readable descriptions can get out of sync. It would
be ideal if a formal specification could be generated from the user
manual itself.

This ideal is achievable. We will use the example of JMGRIB [4], a
(draft) family of XML formats for gridded binary data. The formats are
currently being considered by a Joint METOC (Meteorology and
Oceanography) data format working group. JMGRIB.html [4] is a
(hopefully) detailed hyperlinked user manual -- with motivations,
explanations, illustrations, and proper references. The manual is
accompanied by a formal specification, an external DTD subset [5]. Both
the user manual and the DTD are the result of applying appropriate
"stylesheets" to the master document, an SXML file JMGRIB.scm [6].

SXML is an instance of XML Infoset as S-expressions, an Abstract Syntax
Tree of an XML document. The latter property makes SXML roughly twice as
easy to compose as raw XML. Authoring SXML has been straightforward
indeed. The JMGRIB.scm document was written in Emacs. The standard Emacs
scm mode took care of proper indentation and color-coding. Higher-order
SXML "tags" that automatically build metadata elements, navigation bars,
and a hierarchical table of contents turned out quite helpful too.

Here's a definition of a JMGRIB element excerpted from JMGRIB.scm:

    (Section 3 "Parameter element")

    (DTD
     (comment "An environmental parameter")
     (element "Parameter" "EMPTY"))

    (DTD
     (comment "Attributes define a parameter, see Table 2 of GRIB")
     (attlist "Parameter"
      (attr "code" "char(4)" "ID" #t
       "The code that denotes a type of the environmental parameter."
       (note "The code is formed by prepending a letter 'P' to the code in the WMO code table 0291. This code is carried in PDS Octet 9."
        (br) "ddds counter: 36443")
       (example "P11 -- stands for temperature")
       (domain-expr (code "P1 .. P247")))
      (attr "description" "char(256)" "CDATA" #t
       "The text that describes the environmental parameter"
       )
      (attr "units" "char(35)" "NMTOKEN" #t
       "The units of measure"
       (example (code "m") ", " (code "K") ", " (code "Pa"))
       )))

A higher-order tag 'attr' names an attribute, defines its content
constraints, and includes a line of comments as well as more descriptive
notes, examples, and the domain expression. The content constraints are
given in two formats, a Data Definition Language one and a DTD one (for
the 'code' attribute, "char(4)" and "ID" respectively); the former
correlates with a Joint GRIB data segment. #t is a flag that the
attribute is required.

A "stylesheet" JMGRIB-html.scm [7] converts the above SXML fragment into
the following HTML code. It is excerpted from the JMGRIB user manual [4]
and simplified for brevity:

Parameter element

An environmental parameter

XML DTD
<!ELEMENT Parameter EMPTY>

Attributes define a parameter, see Table 2 of GRIB

Attributes

code
    The code that denotes a type of the environmental parameter.  char(4)  ID
    The code is formed by prepending a letter 'P' to the code in the WMO
    code table 0291. This code is carried in PDS Octet 9.
    ddds counter: 36443
    Example: P11 -- stands for temperature
    Domain expr: P1 .. P247
    The attribute is required
description
    The text that describes the environmental parameter  char(256)  CDATA
    The attribute is required
units
    The units of measure  char(35)  NMTOKEN
    Example: m, K, Pa
    The attribute is required
<!ATTLIST Parameter
	code ID #REQUIRED
	description CDATA #REQUIRED
	units NMTOKEN #REQUIRED
>
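
The ATTLIST declaration above can be derived mechanically from the
(attlist ...) form of the SXML fragment. The following is a minimal
sketch of that idea in portable Scheme; it is not the code of
JMGRIB-dtd.scm [7]. The names attr->dtd-line and attlist->dtd are
hypothetical, the notes and examples are left out of the sample data for
brevity, and emitting #IMPLIED for an attribute that is not flagged with
#t is an assumption (the fragment shows only required attributes).

```scheme
;; Sample data: the attlist form from the fragment, with the
;; descriptive notes and examples omitted for brevity.
(define parameter-attlist
  '(attlist "Parameter"
     (attr "code" "char(4)" "ID" #t
           "The code that denotes a type of the environmental parameter.")
     (attr "description" "char(256)" "CDATA" #t
           "The text that describes the environmental parameter")
     (attr "units" "char(35)" "NMTOKEN" #t
           "The units of measure")))

(define nl (string #\newline))

;; Format one (attr name ddl-type dtd-type required? doc ...) form as
;; a line of an ATTLIST declaration. The DDL type and all the
;; documentation are simply skipped.
;; ASSUMPTION: a non-required attribute is declared #IMPLIED.
(define (attr->dtd-line attr)
  (let ((name      (list-ref attr 1))
        (dtd-type  (list-ref attr 3))
        (required? (list-ref attr 4)))
    (string-append "  " name " " dtd-type " "
                   (if required? "#REQUIRED" "#IMPLIED")
                   nl)))

;; Format a whole (attlist element-name attr ...) form.
(define (attlist->dtd attlist)
  (let ((element-name (cadr attlist))
        (attrs        (cddr attlist)))
    (string-append "<!ATTLIST " element-name nl
                   (apply string-append (map attr->dtd-line attrs))
                   ">" nl)))

(display (attlist->dtd parameter-attlist))
```

Evaluating the last expression should print an ATTLIST declaration for
'Parameter' with the three required attributes, indented with spaces
rather than a tab. Because the documentation strings and the formal
constraints sit in the same 'attr' form, an extractor of this kind only
has to ignore the parts it does not need.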

It must be stressed that the attribute list for a 'Parameter' element
appears *twice* in the user manual. The first time it is shown with all
the descriptions and comments. The second time it appears just as it
will look in the DTD. Both forms are produced from the _single_ SXML
fragment. Unlike the XML Schema manual [3], there is no need to refer
the user to a separate file, which may get out of sync.

Another SXML "stylesheet" -- JMGRIB-dtd.scm [7] -- extracts the formal
parts of the sample SXML fragment and formats them as a DTD: the same
ELEMENT and ATTLIST declarations for 'Parameter' that appear in the
manual excerpt above. The complete DTD is given in [5].

The goal of literate XML/DTD programming is indeed achievable -- and
practically helpful. JMGRIB is being discussed at this moment.

References:

[1] SXML Specification
    http://pobox.com/~oleg/ftp/Scheme/SXML.html

[2] XML Schema Part 1: Structures. W3C Recommendation 2 May 2001
    http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/structures.html

[3] XML Schema Part 1: Structures. W3C Recommendation 2 May 2001
    (in XML)
    http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/structures.xml

[4] JMGRIB: Joint METOC XML format for grid data, User Manual
    http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.html

[5] JMGRIB: Joint METOC XML format for grid data DTD
    http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.dtd

[6] JMGRIB Master file
    http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.scm

[7] JMGRIB.scm conversion stylesheets
    http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB-dtd.scm
    http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB-html.scm