From oleg@pobox.com Tue May 22 21:31:34 2001
Date_sent: Tue, 22 May 2001 21:31:32 -0700 (PDT)
From: oleg@pobox.com
Newsgroups: comp.text.xml,comp.lang.scheme
Date: Wed, 23 May 2001 04:26:31 +0000 (UTC)
Message-Id: <200105230431.VAA03776@adric.cs.nps.navy.mil>
To: comp.lang.scheme@mailgate.org, comp.text.xml@mailgate.org
Subject: Literate XML/DTD programming
Keywords: XML, SXML, literate programming, higher-order tags, JMGRIB
Summary: SXML as a tool for literate programming
Reply-to: oleg@pobox.com
Designing an XML format for a particular domain involves more than
just creating a DTD for a collection of elements, attributes and
entities. An important part of the format is the documentation.
Ideally it is a hyperlinked document that explains the background, the
motivation, and the design principles; describes and cross-references
each tag and attribute; and shows representative examples of the
proposed markup. This article will demonstrate a tool that helps
design a specification and its documentation at the same time -- and
keep them consistent.
A literate design document should be easy to write, and it should be
transformable into a well-laid-out, easy-to-read hyperlinked user
manual. And yet the user manual should be precise enough to allow
automatic extraction of a formal specification. We will demonstrate
that SXML [1] is rather suitable for literate XML programming. Writing
SXML is similar to writing TeX. SXML transformations do the job of
"tangling" a document type specification out of the master document
and of "weaving" (typesetting) the user manual.
A DTD offers poor tools for self-documentation. For one thing, content
constraints in a DTD are crude: it is impossible to specify, for
example, that the value of a particular attribute is an integer within
a certain range. All such meta-information has to be described in
comments. Comments, however, are allowed only between ELEMENT, ATTLIST,
etc. declarations, not within them, so it is hard to document the
individual tokens of a large enumerated list of choices. Finally, it is
hard to display a DTD in an off-the-shelf browser, especially in a
hyperlinked format.
The XML Schema Recommendation alleviates some of the above
documentation problems, but it does not solve them all. The
Recommendation provides for an 'annotation' element, which may include
human-readable documentation strings and hints for applications.
However, an XSD file by itself is by no means user-friendly: the markup
is distracting, and hyperlinks that may be included in documentation
strings are not live. The Recommendation "XML Schema Part 1:
Structures" [2] itself makes it apparent why an XSD file still requires
human-readable documentation. The Recommendation's web page [2] is a
reader-friendly hyperlinked description of the Schema for XML Schema,
which also includes a formal XSD specification as an appendix. Although
that XSD document contains quite a number of annotations, it is clearly
not a replacement for the human-readable manual.
The HTML page "Schema Part 1: Structures" [2] is generated from a
master XML file, structures.xml [3]. The appendix with an XSD
specification is typeset as follows:
And that is the end of the schema for &XSP1;.
That is, the formal definition of the XML Schema shown to the user is
pulled from a separate source. There is therefore a chance that the
formal and the human-readable descriptions get out of sync. It would
be ideal if a formal specification could be generated from the user
manual itself. This ideal is achievable.
We will use the example of JMGRIB [4], a (draft) family of XML formats
for gridded binary data. The formats are currently being considered by
a Joint METOC (Meteorology and Oceanography) data format working
group. JMGRIB.html [4] is a (hopefully) detailed hyperlinked user
manual -- with motivations, explanations, illustrations, and proper
references. The manual is accompanied by a formal specification, an
external DTD subset [5].
Both the user manual and the DTD are the result of applying
appropriate "stylesheets" to the master document, an SXML file
JMGRIB.scm [6]. SXML is an instance of the XML Infoset as
S-expressions, that is, an abstract syntax tree of an XML document. The
latter property makes SXML roughly twice as easy to compose as raw
XML. Authoring SXML has indeed been straightforward. The JMGRIB.scm
document was written in Emacs; the standard Emacs Scheme mode took care
of proper indentation and color-coding. Higher-order SXML "tags" that
automatically build metadata elements, navigation bars, and a
hierarchical table of contents turned out to be quite helpful too.
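The correspondence between SXML and XML can be illustrated with a tiny serializer. This is a minimal sketch in Python rather than the Scheme used by the actual tools. It assumes SXML nodes are modeled as nested Python lists: a tag name first, then an optional ['@', [name, value], ...] attribute list (mirroring SXML's (tag (@ (name "value")) child ...) form), then children. The function name sxml_to_xml is hypothetical; the real SXML tools are described in [1].

```python
from xml.sax.saxutils import escape, quoteattr

def sxml_to_xml(node):
    """Serialize an SXML-like nested list to an XML string.

    A node is either a string (character data) or a list
    [tag, ['@', [name, value], ...] (optional), child, ...].
    Assumption: this list encoding stands in for real Scheme S-expressions.
    """
    if isinstance(node, str):
        return escape(node)  # escape &, < and > in character data
    tag, rest = node[0], node[1:]
    attrs = ""
    # An optional attribute list comes first, marked by '@' as in SXML
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        attrs = "".join(f" {name}={quoteattr(value)}" for name, value in rest[0][1:])
        rest = rest[1:]
    if not rest:
        return f"<{tag}{attrs}/>"  # no children: emit an empty-element tag
    body = "".join(sxml_to_xml(child) for child in rest)
    return f"<{tag}{attrs}>{body}</{tag}>"

# A Parameter instance: EMPTY content, three attributes
param = ["Parameter", ["@", ["code", "P11"],
                            ["description", "temperature"],
                            ["units", "K"]]]
print(sxml_to_xml(param))
# <Parameter code="P11" description="temperature" units="K"/>
```

Since the tree is already parsed, the serializer only walks it; escaping and quoting are handled in one place rather than by the author typing angle brackets.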
Here's a definition of a JMGRIB element excerpted from JMGRIB.scm:
  (Section 3 "Parameter element")
  (DTD
    (comment "An environmental parameter")
    (element "Parameter" "EMPTY"))
  (DTD
    (comment "Attributes define a parameter, see Table 2 of GRIB")
    (attlist "Parameter"
      (attr "code" "char(4)" "ID" #t
        "The code that denotes a type of the environmental parameter."
        (note "The code is formed by prepending a letter 'P' to the
               code in the WMO code table 0291. This code is carried
               in PDS Octet 9."
          (br) "ddds counter: 36443")
        (example "P11 -- stands for temperature")
        (domain-expr (code "P1 .. P247")))
      (attr "description" "char(256)" "CDATA" #t
        "The text that describes the environmental parameter")
      (attr "units" "char(35)" "NMTOKEN" #t
        "The units of measure"
        (example (code "m") ", " (code "K") ", " (code "Pa")))))
A higher-order tag 'attr' names an attribute, defines its content
constraints, and includes a one-line comment as well as more
descriptive notes, examples, and a domain expression. The content
constraints are given in two formats: in the Data Definition Language
(e.g., "char(4)"), which correlates with a Joint GRIB data segment,
and as a DTD attribute type (e.g., "ID"). The flag #t indicates that
the attribute is required.
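The positional layout of 'attr' in the excerpt can be captured as a record: name, DDL type, DTD type, required flag, one-line comment, and then optional notes, examples, and a domain expression. Below is a minimal Python sketch that assumes SXML is modeled as nested Python lists. AttrSpec and parse_attr are hypothetical names and are not part of JMGRIB.scm or its stylesheets.

```python
from dataclasses import dataclass, field

@dataclass
class AttrSpec:
    name: str          # attribute name, e.g. "code"
    ddl_type: str      # Data Definition Language type, e.g. "char(4)"
    dtd_type: str      # DTD attribute type, e.g. "ID", "CDATA", "NMTOKEN"
    required: bool     # the #t / #f flag
    comment: str       # one-line description
    extras: list = field(default_factory=list)  # note, example, domain-expr nodes

def parse_attr(node):
    """Destructure ['attr', name, ddl, dtd, required, comment, extra...].

    Assumption: the field order is read off the JMGRIB.scm excerpt.
    """
    tag, name, ddl_type, dtd_type, required, comment, *extras = node
    if tag != "attr":
        raise ValueError(f"expected an attr node, got {tag!r}")
    return AttrSpec(name, ddl_type, dtd_type, required, comment, extras)

code_attr = ["attr", "code", "char(4)", "ID", True,
             "The code that denotes a type of the environmental parameter.",
             ["example", "P11 -- stands for temperature"],
             ["domain-expr", ["code", "P1 .. P247"]]]
spec = parse_attr(code_attr)
print(spec.name, spec.ddl_type, spec.dtd_type, spec.required)  # code char(4) ID True
```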
A "stylesheet", JMGRIB-html.scm [7], converts the above SXML fragment
into HTML. The rendered result below is excerpted from the JMGRIB user
manual [4] and simplified for brevity:
  Parameter element

    An environmental parameter

    XML DTD:
      <!ELEMENT Parameter EMPTY>

    Attributes define a parameter, see Table 2 of GRIB

    Attributes:
      code         The code that denotes a type of the environmental parameter.
                   char(4) ID
                   The code is formed by prepending a letter 'P' to the code
                   in the WMO code table 0291. This code is carried in PDS
                   Octet 9.
                   ddds counter: 36443
                   Example: P11 -- stands for temperature
                   Domain expr: P1 .. P247
                   The attribute is required

      description  The text that describes the environmental parameter
                   char(256) CDATA
                   The attribute is required

      units        The units of measure
                   char(35) NMTOKEN
                   Example: m, K, Pa
                   The attribute is required

    <!ATTLIST Parameter
        code         ID       #REQUIRED
        description  CDATA    #REQUIRED
        units        NMTOKEN  #REQUIRED
    >
It must be stressed that the attribute list for a 'Parameter' element
appears *twice* in the user manual. The first time it is shown with
all the descriptions and comments. The second time it appears just as
it will look in the DTD. Both forms are produced from the _single_ SXML
fragment. Unlike the XML Schema manual [3], there is no need to refer
the user to a separate file, which may get out of sync.
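The idea that both renderings come from one fragment can be sketched as two functions over the same 'attr' node. One keeps everything a reader needs. The other keeps only what a DTD can express. This is a Python sketch that assumes SXML is modeled as nested Python lists. The names attr_html_row and attr_dtd_line are hypothetical, as is the table-row markup. The handling of a #f flag is also an assumption (rendered as "optional" and #IMPLIED), because the excerpt shows only required attributes.

```python
from html import escape

def attr_html_row(node):
    """Documentation form: name, comment, DDL and DTD types, required flag."""
    _, name, ddl_type, dtd_type, required, comment, *_extras = node
    # Assumption: wording for a non-required attribute is not shown in JMGRIB
    status = "The attribute is required" if required else "The attribute is optional"
    cells = [name, comment, f"{ddl_type} {dtd_type}", status]
    return "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>"

def attr_dtd_line(node):
    """Formal form: only the name, DTD type and default go into the DTD."""
    _, name, _ddl_type, dtd_type, required, _comment, *_extras = node
    # Assumption: #f maps to #IMPLIED
    return f"{name} {dtd_type} {'#REQUIRED' if required else '#IMPLIED'}"

units = ["attr", "units", "char(35)", "NMTOKEN", True, "The units of measure",
         ["example", ["code", "m"], ", ", ["code", "K"], ", ", ["code", "Pa"]]]
print(attr_html_row(units))
print(attr_dtd_line(units))  # units NMTOKEN #REQUIRED
```

Because neither function stores anything the other lacks, editing the one 'attr' node changes the manual and the DTD together.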
Another SXML "stylesheet" -- JMGRIB-dtd.scm [7] -- extracts the formal
parts of the sample SXML fragment and formats them as DTD declarations:
The complete DTD is given in [5].
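A DTD-extracting stylesheet only has to visit the DTD blocks and keep their formal parts. The sketch below is written in Python, not the Scheme of JMGRIB-dtd.scm, and it assumes SXML is modeled as nested Python lists. The output layout is a guess based on the ATTLIST shown in the manual excerpt. dtd_from_sxml is a hypothetical name, and mapping #f to #IMPLIED is an assumption.

```python
def dtd_from_sxml(nodes):
    """Emit DTD text from the formal parts of an SXML-like master document.

    Only 'DTD' blocks are visited. Inside them 'comment', 'element' and
    'attlist' are translated; notes, examples and DDL types are skipped.
    """
    out = []
    for node in nodes:
        if not (isinstance(node, list) and node and node[0] == "DTD"):
            continue  # section headings, prose, etc. are documentation only
        for decl in node[1:]:
            kind = decl[0]
            if kind == "comment":
                out.append(f"<!-- {decl[1]} -->")
            elif kind == "element":
                out.append(f"<!ELEMENT {decl[1]} {decl[2]}>")
            elif kind == "attlist":
                lines = [f"<!ATTLIST {decl[1]}"]
                for attr in decl[2:]:
                    _, name, _ddl, dtd_type, required, *_docs = attr
                    default = "#REQUIRED" if required else "#IMPLIED"  # assumption for #f
                    lines.append(f"    {name} {dtd_type} {default}")
                lines.append(">")
                out.append("\n".join(lines))
    return "\n".join(out)

master = [
    ["Section", 3, "Parameter element"],
    ["DTD", ["comment", "An environmental parameter"],
            ["element", "Parameter", "EMPTY"]],
    ["DTD", ["comment", "Attributes define a parameter, see Table 2 of GRIB"],
            ["attlist", "Parameter",
             ["attr", "code", "char(4)", "ID", True,
              "The code that denotes a type of the environmental parameter.",
              ["example", "P11 -- stands for temperature"]],
             ["attr", "description", "char(256)", "CDATA", True,
              "The text that describes the environmental parameter"],
             ["attr", "units", "char(35)", "NMTOKEN", True,
              "The units of measure"]]],
]
print(dtd_from_sxml(master))
```

The examples, notes, and DDL types in the master file never reach this output. They are exactly the parts that appear only in the user manual.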
The goal of literate XML/DTD programming is indeed achievable -- and
practically helpful. JMGRIB is currently under discussion.
References:
[1] SXML Specification
http://pobox.com/~oleg/ftp/Scheme/SXML.html
[2] XML Schema Part 1: Structures. W3C Recommendation 2 May 2001
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/structures.html
[3] XML Schema Part 1: Structures. W3C Recommendation 2 May 2001 (in XML)
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/structures.xml
[4] JMGRIB: Joint METOC XML format for grid data, User Manual
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.html
[5] JMGRIB: Joint METOC XML format for grid data DTD
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.dtd
[6] JMGRIB Master file
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.scm
[7] JMGRIB.scm conversion stylesheets
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB-dtd.scm
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB-html.scm