A micro-talk presentation at a Workshop on Scheme
and Functional Programming 2000
Montreal, 17 September
2000
This talk will propose consistent or conformant Scheme implementations
of W3C Recommendations: XML Infoset, XPath query language and a small subset
of XSL Transformations.
The following is the view of XML semantics as set out by the W3 Consortium:
At the top of the hierarchy is an XML information set, Infoset: an abstract data set that describes information available in a well-formed XML document. Infoset's goal is to present in some form all relevant pieces of data and their abstract, container-slot relationships to each other. The implementation of this nest of containers as well as means of accessing items and properties are beyond the scope of the Infoset specification. XML document, with tags in familiar angular brackets is one of the concrete instances of an XML Infoset. Conversion is through parsing/unparsing.
The XML Path Language, XPath, makes the tree structure of XML Infoset explicit, and more practical. For example, XPath groups character information items into strings. Still the tree model of XPath is conceptual only, and does not mandate any particular implementation. XPath is a query language over an XPath tree. A Document Object Model (DOM) is another specialization of Infoset. It not only describes a tree view of a document but also makes the tree real to an application, via a set of interfaces to navigate the tree and query or update its nodes. XML Stylesheet Language Transformations, XSLT, build upon the XPath tree to describe transformations from branches and leaves of one tree onto another.
This talk will show a conformant instance of this mostly
abstract hierarchy with Scheme as an implementation language. Scheme
represents the XML Infoset and the XPath tree -- which are data
structures. Scheme is also used to express XPath queries and tree
transformations.
This is a micro-talk, and therefore, will flash a few simple examples. The reference section points to more detailed documents.
The following is a sample XML document to be used throughout this talk.
<Forecasts TStamp='958082142'> <TAF TStamp='958066200' LatLon='36.583, -121.850' BId='724915' SName='KMRY, MONTEREY PENINSULA'> <VALID TRange='958068000, 958154400'>111730Z 111818</VALID> <PERIOD TRange='958068000, 958078800'> <PREVAILING>31010KT P6SM FEW030</PREVAILING></PERIOD> <PERIOD TRange='958078800, 958104000' Title='FM2100'> <PREVAILING>29016KT P6SM FEW040</PREVAILING></PERIOD> <PERIOD TRange='958104000, 958154400' Title='FM0400'> <PREVAILING>29010KT P6SM SCT200</PREVAILING> <VAR Title='BECMG 0708' TRange='958114800, 958118400'> VRB05KT</VAR> </PERIOD></TAF> </Forecasts>
Here is its representation in SXML:
(Forecasts (@ (TStamp "958082142")) (TAF (@ (SName "KMRY, MONTEREY PENINSULA")(BId "724915") (LatLon "36.583, -121.850")(TStamp "958066200")) (VALID (@ (TRange "958068000, 958154400")) "111730Z 111818") (PERIOD (@ (TRange "958068000, 958078800")) (PREVAILING "31010KT P6SM FEW030")) (PERIOD (@ (Title "FM2100") (TRange "958078800, 958104000")) (PREVAILING "29016KT P6SM FEW040")) (PERIOD (@ (Title "FM0400") (TRange "958104000, 958154400")) (PREVAILING "29010KT P6SM SCT200") (VAR (@ (TRange "958114800, 958118400") (Title "BECMG 0708")) "VRB05KT"))))This SXML data structure is produced by four lines of code -- an XML parser:
(call-with-input-string doc-string (lambda (port) (SSAX:element->SXML (SSAX:read-content-norm-ws port) port)))
The SXML form appears a bit more concise. For a document without
external parsed entities and within a single namespace, SXML conforms to the
XML Information Set specification.
Let's consider querying of the above tree. The SXML data structure
above (assumed to be bound to an identifier tree1
throughout this talk) contains three PERIOD
elements. In
the first query, we want to retrieve the body of the
PREVAILING
element from the second PERIOD
.
Here is the query expressed in XPath, abbreviated notation:
//PERIOD[2]/PREVAILING/text()
The following line shows
the query in SXPath (evaluated by a sxpath
interpreter)
((sxpath '(// (PERIOD 2) PREVAILING *text*)) tree1)
The block of code below is an SXML query in the full notation. It is an executable Scheme code.
((node-reduce (node-closure (node-typeof? 'PERIOD)) (node-pos 2) (select-kids (node-typeof? 'PREVAILING)) (select-kids (node-typeof? '*text*)) ) tree1)The result of each of the queries is the same: a nodeset
("29016KT P6SM FEW040")
In the second example, we ask for a TRange
attribute of
those PERIODS
that also have a Title
attribute. Of the three PERIODS
(see SXML above), only the last two have
the Title
attribute. Again, we compare XPath with SXPath:
//PERIOD[@Title]/@TRange ((sxpath '(// (PERIOD (@ Title)) @ TRange)) tree1)In both cases, the result is the same nodeset:
((TRange "958078800, 958104000") (TRange "958104000, 958154400"))
Finally, we are asking for a Title
attribute of such a
PERIOD
whose time range includes the time moment of
958104010
epoch seconds.
((sxpath `(// (PERIOD (@ TRange *text* ,(lambda (trange) (with-input-from-string trange (lambda () (let* ((tbegin (read)) (delim (read-char)) (tend (read))) (or (< tbegin 958104010 tend) '()) ))) ) )) @ Title)) tree1) ==> ((Title "FM0400"))An SXPath expression may contain an arbitrary Scheme code as a filter. The
let*
block parses the TRange
attribute
and makes sure the range includes the given time moment. The
corresponding expression in XPath would be more complex (requiring
escape into JavaScript).
Thus, abbreviated SXPath expressions look identical to XPath expressions
modulo parentheses and path separators.
This piece of code translates SXML back to its markup form. These few lines can handle every SXML tree: the code is the most generic unparser. See the References for the complete source code.
(SRV:send-reply (post-order tree1 `((@ ((*default* ; local override for attrs . ,(lambda (attr-key value) ((enattr attr-key) value)))) . ,(lambda (trigger value) (list '@ value))) (*default* . ,(lambda (tag elems) (apply (entag tag) elems))) (*text* . ,(lambda (trigger str) str)) )) )
Finally this slide shows several examples of SXML transformations: re-writing from one tree to
another. In this particular example, we re-write the sample SXML tree
into a flat tree that contains TRange
attributes of every
PERIOD
element. The first fragment is in XML Stylesheet
Language Transformations (XSLT):
<xsl:template match="/"> <xsl:apply-templates/> </xsl:template> <xsl:template match="PERIOD/@TRange"> <xsl:value-of/> </xsl:template>
The next piece of code traverses the tree and applies handlers in post-order:
(post-order tree `( (PERIOD ((TRange ((*text* . ,id)) . ,id)) . ,id) (*text* . ,(lambda (trigger value) '())) (*default* . ,id) ))Most of the handlers are just identity transformers of a node or a branch. The handler
(lambda (trigger value)
'())
suppresses all the text content by default. However, this
default is overridden in the context were a TRange
node
was seen during traversal of a PERIOD
branch.
The last fragment accomplishes the same task by traversing the SXML tree in pre-order. In this case we don't need to specify the whole bunch of identity transformers. The engine does not descend into branches unless necessary, that is, unless a handler is specified for a node or a branch. In contrast, in the post-order example above, the entire tree is always traversed once. Note how SXSLT looks similar to XSLT.
(apply-templates tree1 `((PERIOD @ TRange *text* . ,(lambda (node) node)))) ==> ("958068000, 958078800" "958078800, 958104000" "958104000, 958154400" "958114800, 958118400")
It all works. I have tested all the examples of this talk: I ran
them and pasted the results from an emacs *scheme*
window into the presentation. Details are available in the References
section below.
We have shown an s-expression-style for XML, XPath and XSLT. SXML,
SXPath and SXSLT however are not merely s-expressions: they can denote
executable Scheme code and can be evaluated by an eval
function as they are. Thus an XML document and operations on it
can be expressed in Scheme -- and regarded either as data structures or
as code.
[XML]
World Wide Web Consortium. Extensible Markup Language (XML)
Version 1.0. W3C Recommendation 10 February 1998
[XML Infoset]
World Wide Web Consortium. XML Information Set.
W3C Working Draft.
[XPath]
World Wide Web Consortium. XML Path Language (XPath).
Version 1.0. W3C Recommendation 16 November 1999.
[XSLT]
World Wide Web Consortium. XSL Transformations (XSLT).
Version 1.0. W3C Recommendation 16 November 1999.
[SXML]
SXML is an instance of XML Infoset as S-expressions. SXML is an
Abstract Syntax Tree of an XML document.
SXML.html
[SXmanip]
Evaluating SXML: XSL Transformations in Scheme
oleg-at-okmij.org