SSAX
SXML short presentation
A micro-talk presentation at the Workshop on Scheme and Functional Programming 2000. Montreal, September 17, 2000.
A better XML parser through functional programming
<http://link.springer.de/link/service/series/0558/tocs/t2257.htm>
XML-parsing.ps.gz [69K]
XML-parsing-talk.ps.gz [93K]
The paper is published in LNCS volume 2257 with the Copyright
held by Springer-Verlag. The file XML-parsing.ps.gz
is a
local copy of that copyrighted paper. The file XML-parsing-talk.ps.gz
is the transcript of a presentation at PADL
2002, the 4th International Symposium on Practical Aspects of
Declarative Languages. The topic of the presentation was the
derivation of the SSAX API and the comparison of SSAX with other
functional-style XML parsers and with Expat.
SXML Specification, in SXML/HTML/LaTeX/PDF
ACM SIGPLAN Notices, v.37(6), pp. 52-58, June 2002.
XML, XPath, XSLT implementations as SXML, SXPath, and SXSLT
SXs.pdf [175K]
SXs-talk.pdf [200K]
A paper and a transcript of a complementary talk presented at the International Lisp Conference 2002, October 27-31, 2002.
SXSLT: Manipulation Language for XML
<http://link.springer.de/link/service/series/0558/tocs/t2562.htm>
SXSLT.ps.gz [68K]
SXSLT-talk.pdf [331K]
The paper (with Shriram Krishnamurthi) is published in LNCS
volume 2562 with the Copyright held by Springer-Verlag. The file
SXSLT.ps.gz
is a local copy of that copyrighted
paper. The file SXSLT-talk.pdf
is the transcript of a
presentation at PADL 2003, the 5th International Symposium on
Practical Aspects of Declarative Languages. The paper introduces SXSLT
and compares it with XSLT. Our experience and user comments show that
SXSLT is expressive and easy to use. We argue that this outcome is a
consequence of SXSLT providing right abstractions for XML
transformations, of being higher-order, declarative and extensible.
Advanced SXLST tutorial
sxslt-advanced.scm [16K]
This file describes common patterns of SXSLT on an interesting
example. We demonstrate higher-order tags, pre-order and post-order
transformations, re-writing of SXML elements in regular and special
ways, context-sensitive applications of re-writing rules, and
reflection. Although the file is technically a Scheme source code, it is
actually a tutorial. Of 357 lines in the file, 241 are comments and 24
are just blank lines.
The parsing framework offers support for XML validation, to the full or any user-specified degree. Framework's procedures by themselves detect great many validation errors and almost all well-formedness errors. Furthermore, a user is given a chance to set his own handlers to capture the rest of the errors. Content is validated given user-specified constraints, which the user can derive from a DTD, from an XML schema, or from other competing doctype specification formats.
SSAX is a full-featured, algorithmically optimal, pure-functional parser, which can act as a stream processor. SSAX is an efficient SAX parser that is easy to use. SSAX minimizes the amount of application-specific state that has to be shared among user-supplied event handlers. SSAX makes the maintenance of an application-specific element stack unnecessary, which eliminates several classes of common bugs. SSAX is written in a pure-functional subset of Scheme. Therefore, the event handlers are referentially transparent, which makes them easier for a programmer to write and to reason about. The more expressive, reliable and easier to use application interface for the event-driven XML parsing is the outcome of implementing the parsing engine as an enhanced tree fold combinator, which fully captures the control pattern of the depth-first tree traversal.
ssax.sourceforge.net
SSAX and SXML at SourceForge
CVS tree at SourceForge
The CVS tree includes the complete SSAX/SXML code with
all its dependencies, some documentation, validation tests, as well as
several sample applications, benchmarks and usable examples. You can browse the files in the CVS tree from any web browser.
SSAX.scm [122K]
Well-commented source code of the package. The code comes with an extensive set of self-tests, which verify not only the correct
behavior but also the detection of constraint violations.
http://metapaper.net/xml/ssax/
Current versions of the SSAX ports to various Scheme systems (including Bigloo, Guile, Chicken, and PLT Scheme). These versions are kindly
maintained by Kirill Lisovsky, whose efforts are greatly appreciated.
Haskell Iteratee version of the XML parsing framework
Comparing SSAX and Expat XML parsers in features and performance
SAX/DOM/SXML parsers with support for XML Namespaces and validation [plain text file]
An article posted on comp.lang.scheme and comp.text.xml
newsgroups on Thu, 15 Mar 2001 19:31:16 GMT
A simple XML _lexer_ in Scheme [plain text file]
Announcement of version 1 of SSAX. An article
posted on comp.lang.scheme and comp.text.xml newsgroups on Sun, 28 Nov
1999 23:12:50 GMT
SXML is an abstract syntax tree of an XML document. SXML is also a concrete representation of the XML Infoset in the form of S-expressions. The generic tree structure of SXML lends itself to a compact library of combinators for querying and transforming SXML.
The master SXML specification file is written in SXML itself. With appropriate stylesheets, the master file has been converted to a web page and to a LaTeX file. The latter was processed to a PDF document.
SXML.scm [52K]
The master file, written in SXML. The second part of the file is a stylesheet for rendering the specification in HTML.
SXML.html [51K]
The specification web page, converted from SXML above.
SXML-paper.tex [40K]
The specification in LaTeX, converted from SXML above.
SXML-paper.scm [16K]
The conversion stylesheet, from SXML to LaTeX.
SXML-paper.pdf [71K]
The specification in the PDF format.
ACM SIGPLAN Notices, v.37(6), pp. 52-58, June 2002.
SXPath is a query language for SXML, an instance of an XML Information set (Infoset) in the form of s-expressions. On the other hand, SXPath is an implementation of XPath: there exists a formal re-writing from XPath to SXPath. Abbreviated-form SXPath expression look the same as abbreviated-form XPath modulo delimiters and parentheses. XPath underlies both XSLT and XPointer W3C Recommendations.
The full-form SXPath notation is a combinator library for querying SXML. It is comprised of several primitive converters that can be joined by a small number of powerful combinators. Primitive and composite converters closely correspond to XPath location paths. This makes SXPath an XPath's virtual machine.
The new version of SXPath.scm has more and better comments, more
converters and combinators. It supports following-sibling::
and preceding-sibling::
axes of XPath. The better half
of the SXPath.scm file (lines 463 through 1154) is a built-in test
suite. The tests make sure that examples given in the XPath
Recommendation indeed work in SXPath. The regression tests show the
correspondence between full- and abbreviated-form XPath expressions,
and the corresponding notations of SXPath. One of the examples is to
illustrate and test the difference between //para[1]
and /descendant::para[1]
. The difference is explained in a
note to Section 2.5 of the XPath Recommendation. After considerable
practice, I need to read the note only twice or thrice to remember
what this distinction is. SXPath somehow is capable of reproducing the
said difference.
SXPath.scm [41K]
Complete, very commented source code with included extensive validation tests.
SXSLT is a practical XML transformation language. It is a
higher-order pure functional transformation language based
on flexible tree traversals. SXSLT is implemented as a library in a
pure functional subset of Scheme. At the core of this library is a
function pre-post-order
, which takes an SXML tree and and
a stylesheet, and returns a transformed tree.
pre-post-order:: TREE x BINDINGS -> TREE
where
type TREE = SXML-NODE | SXML-NODELIST type BINDINGS = (BINDING ...) type BINDING = (TRIGGER-SYMB *preorder* . HANDLER) | (TRIGGER-SYMB *macro* . HANDLER) | (TRIGGER-SYMB NEW-BINDINGS . HANDLER) | (TRIGGER-SYMB . HANDLER) type TRIGGER-SYMB = SXML-NAME | *text* | *default* type NEW-BINDINGS = BINDINGS type HANDLER = TRIGGER-SYMB x TREE -> TREE
The function pre-post-order traverses a TREE
,
which is either a SXML-NODE
or SXML-NODELIST
. For each SXML-NODE
of the form (SXML-NAME SXML-NODE...)
the function looks up an
association with the given SXML-NAME
among its BINDINGS
. If failed, the function tries to locate a
*default*
binding. It is an error if the latter
attempt fails as well.
Having found a binding, the function pre-post-order first
checks whether the binding prescribes a pre-order traversal. If
this is the case, the handler is 'applied' to the current
node. Otherwise, the pre-post-order function first calls itself
recursively on each child of the current node, with NEW-BINDINGS
prepended to the current bindings. The handler
is then applied to the head of the current node and the transformed
children. The result of the handler, which should also be an TREE
, becomes a SXML-NODE
of the transformed
tree. If the current SXML-NODE
is a text string, a
special *text*
binding is looked up.
The NEW-BINDINGS
that appears in
(TRIGGER-SYMB NEW-BINDINGS . HANDLER)
has the same form as
BINDINGS
. The NEW-BINDINGS
is a local
stylesheet. It takes effect only when traversing the children of a
particular node (whose name is the TRIGGER-SYMBOL
). The
bindings of NEW-BINDINGS
override the bindings from the
"parent" stylesheet.
A *macro*
binding also specifies a pre-order
traversal. The handler of a *macro*
binding is applied to
the current node. The result however does not become a node in the
output tree. Rather it is traversed again. A *macro*
binding is thus equivalent to a syntactic binding of Scheme. Although
*macro*
could be written as a *preorder*
binding whose handler recursively invokes pre-post-order, this pattern
turns out to be quite frequent to justify its own notation. The word
'macro' highlights the computation of a fixpoint: we successively
re-write an SXML node into other nodes until we reach text fragments
representing the target HTML, XML or LaTeX document.
SRV:send-reply:: FRAGMENT x FRAGMENT ... -> BOOL
The FRAGMENT
s are lists of strings, characters,
numbers, thunks, #f
-- and other fragments. The function
traverses the tree of FRAGMENT
s depth-first pre-order, writes out
strings and characters, executes thunks, and ignores #f
and '()
. The function returns #t
if it
wrote anything.
replace-range:: BEG-PRED x END-PRED x FOREST -> FOREST
The function replace-range traverses the FOREST
depth-first and cut/replaces ranges of nodes. Here
type FOREST = (NODE ...) type NODE = Atom | (Name . FOREST) | FOREST beg-pred:: NODE -> #f | FOREST end-pred:: NODE -> #f | FOREST
The range of nodes to cut/replace is specified by two
predicates, beg-pred
and end-pred
. The nodes
that define a range do not have to have the same immediate parent, do
not have to be on the same level, and the end node of a range does not
even have to exist. The replace-range procedure removes nodes from the
beginning node of the range up to (but not including) the end node of
the range. In addition, the beginning node of the range can be
replaced by a node or a list of nodes. The range of nodes is cut while
depth-first traversing the forest. If all branches of the node are cut
a node is removed as well. The procedure can cut several
non-overlapping ranges from a forest.
SXML-tree-trans.scm [10K]
SXML expression tree transformers
vSXML-tree-trans.scm [3K]
The validation code
SXML specification
The HTML conversion stylesheet in the master SXML
Specification file is a good example of *preorder*
, local
stylesheet, and *macro*
bindings.
apply-templates.scm [11K]
Another approach to SXML transformations, in the
spirit of XSLT.
This message reports on several experiments in Haskell to ascertain plausibility, benefits and drawbacks of typed SXML in a mainstream functional language. It turns out that Haskell as it is can represent semi-structured data in SXML-conforming syntax, with the extensible set of `tags' and statically enforced content model restrictions. Querying, transforming and advanced rendering into HTML and XML are possible. The experience of writing many moderately large and complex web pages in HSXML shows that the typed SXML can be used in practice.
The benefit of representing XML-like documents as a typed
data structure/Haskell code is static rejection of bad documents --
not only those with undeclared tags but also those where elements
appear in wrong contexts. Using HTML-like HSXML as an example, a
document with H1
within the P
element is
rejected as invalid. No HSXML transformation may introduce nested
Hn
or P
elements: the code will not compile
otherwise due to a type error. Thus the generated XML or HTML document
will not only be well-formed but also will satisfy some validity
constraints. Types also guide queries, transformations, and
rendering, making them context sensitive. The type inference of such
expressive types is possible; in fact, no type
annotations are needed in an HSXML document.
It used to be that the type inference and type checking
took time in GHC(i). The latest version is quite fast. Static typing
does not inhibit extensibility. The HSXML library user may always
define new `tags', new transformation rules and new contexts.
Description.txt [8K]
HSXML Haskell library, sample code and descriptions
Takahiro Kido (shelarcy): Typed S-expression-based XML
<http://page.freett.com/shelarcy/log/2012/diary_12.html#typed_s_expression_xml>
Detailed and easy-to-understand introduction to HSXML (in Japanese)
Typed SXML [plain text file]
An old message posted on the SSAX-SXML mailing list on
Mon, 18 Dec 2006 18:02:01 -0800 (PST). The message described version 1.0.
In the current version HSXML not longer uses square brackets and
looks almost the same as SXML: see the examples below.
An example of authoring web pages in HSXML
undelimited.hs [35K]
The master HSXML file
The rendered page
The converse to XML parsing is SXML pretty-printing: a translation from an abstract to a concrete XML syntax, to an XML or HTML document. The pretty-printing makes it possible to author and compose XML documents in their SXML form. SXML is more concise and expressive than a raw markup language. SXML representing regular Scheme code can be entered in any Scheme-sensitive editor. SXML as a data structure -- a list -- can likewise be composed as a literal or quasi-literal expression. Furthermore, SXML can be produced by regular Scheme functions, which may make authoring more succinct, advanced, and less tedious, as the code below illustrates.
The pretty-printing consists of two passes. The first one
traverses an s-expression tree in post-order, and converts it
into a tree of HTML fragments. The tree of fragments, when flattened
and concatenated, yields the resulting HTML document. Of course there
is no need to flatten the tree and concatenate strings
explicitly. Rather, we traverse the tree of fragments depth-first and
write out fragments. In this approach, all character and other data,
once created, stay where they were originally allocated. No data
are ever copied. Functions such as string->number
are
implied as well.
The second pass, a pre-order traversal of a tree of fragments, is
performed by a function SRV:send-reply
. Besides writing
out character data it encounters, the function executes thunks and
ignores '()
and #f
. This feature lets us,
for example, embed conditional forms into an s-expression
representing HTML:
(SXML->HTML `(p "par1" "par2" ,(and test (list "par3" "par4"))))Depending on the value of the
test
, strings "par3" and
"par4" will be either both included, or both omitted from the
marked-up paragraph.
Here is a more meaningful example, which intelligently generates a "navigation bar", a pair of links to the preceding and the following slides. Obviously the first and the last page must have only one navigation link.
(define (print-slide n max-count) (SXML->HTML `((h2 "Slide number:" ,n) ; Note n is used in its native form ,(and (positive? n) `(a (@ (href "base-url&slide=" ,(- n 1))) "prev")) ,(and (< (+ n 1) max-count) `(a (@ (href "base-url&slide=" ,(+ n 1))) "next")) (p "the text of the slide"))))
SXML makes composing XML or HTML documents even more convenient by enabling higher-order "tags" -- just as LaTeX helps typeset documents by offering higher-order "macros". LaTeX macros are eventually expanded by TeX; SXML "tags" are evaluated by Scheme. The SXML specification is an example of such an advanced composition. The SXML.html web page that describes SXML is actually written in SXML itself.
A better example of a logical markup is the following excerpt from SXML.scm. It defines a*TOP*
element of SXML:(productions (production 1 (nonterm "TOP") ((sexp (term-lit "*TOP*") (ebnf-opt (nonterm "aux-list")) (ebnf-* (nonterm "PI")) (ebnf-* (nonterm "comment")) (nonterm "Element")))))When converted to HTML, the above fragment becomes
<table border=0 bgcolor="#f5dcb3"> <tr valign=top><td align=right><a name="prod-1">[1]</a> </td> <td align=right><code><TOP></code></td> <td align=center><code> ::= </code></td> <td align=left><code><strong>(</strong> <em>*TOP*</em> <aux-list>? <PI>* <comment>* <Element> <strong>)</strong></code> </td> </tr></table>In LaTeX, it does not look much better:
\begin{tabular}{rrcp{2.8in}} {[}1{]} & \texttt{<TOP>} & $::=$ & \texttt{\textbf{(} {\itshape *TOP*} <aux-list>? <PI>* <comment>* <Element> \textbf{)} } \\ \end{tabular} \\The SXML form appears most lucid. It was also easier to type, especially given suitable Emacs key bindings, similar to those designed for LAML.
Using SXML to express its grammar has another important
advantage: you can easily write a transformation or an SXPath query on
the whole SXML.scm to make sure that every nonterm
mentioned on the right-hand of some production appears on the
left-hand side of exactly one production. The SXML.scm code thus lends
itself not only to a flexible presentation to a human but to a formal
reasoning about as well.
As a matter of fact, the ability to treat SXML as a "code" or as "data" is used during SXML transformations. The expansion process of a tag can re-scan the SXML document, with a different stylesheet. That is how hierarchical tables of contents are generated. Unlike LaTeX, you do not need to write auxiliary files and do not need to re-run the document processor.
The definition of SXML->HTML
is given
below. The function also handles "higher-level" HTML tags,
for example, html:begin
. These tags make HTML code even
more succinct and expressive, as the SXML code for this page clearly
shows. This page is authored in SXML. Local and site links are
generated automatically.
The advantage of the present approach compared to LAML is
that we do not define any tag-related functions in the
interaction-environment. HTML tags appear to live in their own
"namespace". The function SXML->HTML
handles the
arbitrary set of tags: we do not need to pre-define any tag or
attribute for it to work. On the other hand, if validation is desired,
we can introduce SXML->HTML
's bindings for all allowed tags,
and bind the *default*
wildcard to a procedure that prints an
error. Therefore, our approach is just as flexible and expressive as
LAML. Furthermore, since SXML is also a data structure, you can store
it in a file. Or you you can store standard headers, footers,
etc. templates in files and "pull" them in when needed. This inclusion
of standard templates can be done lazily, for example, through a
higher-level include-and-process
tag.
SXML specification
An extensive example of HTML authoring
in SXML, with a great number of high-level tags and the automatic
generation of a table of contents. Evaluation of SXML.scm file
generates SXML.html page.
xml.scm [60K]
This present page in SXML. This is the master file for the present web page.
SXML-to-HTML.scm [5K]
Commented source code for the top-level HTML
transformer, SXML node handlers that generate the markup, and
auxiliary functions to quote "bad" characters.
vSXML-to-HTML.scm [6K]
Validation (sample) code
SXML-to-HTML-ext.scm [15K]
Definitions of several functions and higher-order
SXML "tags" that are used to compose HTML pages on this web site. In LaTeX terms, this file is similar to article.cls.
LaXmL: SXML as a higher-order markup language [plain text file]
An article posted on a comp.text.xml and comp.lang.scheme newsgroups on Thu, 5 Apr 2001 20:15:12 GMT.
sxslt-advanced.scm [16K]
An extensively commented example that transforms
an SHTML-like SXML document with nested sections into a web page with
recursively numbered sections and subsections. We use appropriate HTML
tags (H2, H3, ...) to set off the title of the sections and
subsections. The example also generates a hierarchical table of
contents and places it before the first section.
"Active Server Pages" in Scheme
An example of using the present HTML generation approach to write active server pages with complex stateful forms.
LAML
Kurt Normark, "Programming World Wide Web Pages in Scheme,"
ACM SIGPLAN Notices, vol. 34, No. 12 - December 1999, pp. 37-46.
The following message discusses a fairly non-trivial example of writing a theoretical computer science paper in SXML. The message shows many snippets of SXML and the corresponding LaTeX code. SXML helped not only in transcribing (hand-written) mathematical notation. The proofs themselves were conducted in Emacs/SXML, rather than on a piece of paper. The Emacs/SXML combination turns out a good assistant for equivalence-based proofs.
The SXML->LaTeX formatter is quite distinct from SLaTeX.
We are not concerned with the typesetting of Scheme code. Rather,
our aim is to use Scheme notation as an easy-to-type and
easy-to-manipulate realization of mathematical notation. The end
result must be conventional mathematical notation, with (lambda (x) (f g))
typeset as $\lambda x. f g$
,
(compose f g h)
as $f\,\cdot\,g\,\cdot\,h$
,
(let ((x 1)) body)
as $\mathrm{let}\,x=1
\mathrm{in} body$
, and the reduction of transformed A to
transformed B, (red* *A *B)
, as $\overline{A}
\vartriangleright^{*} \overline{B}$
.
We also explain that the placement of parentheses is quite a non-trivial problem. The message gives many an example. To determine whether a particular expression should be typeset enclosed in parentheses, we need to know the property of the expression and the property of the context. An expression may be enclosed in an arbitrary number of decorators (i.e., subscripts, overlines, etc), which complicates the matter. The typesetter is implemented as an evaluator of the SXML notation using mixed call-by-name and call-by-value rules and the explicit environment passing.
Writing LaTeX/PDF mathematical papers with SXML [plain text file]
The message with many examples of writing and conducting
proofs in SXML. The messages also discusses the problem of
parentheses. The message was posted on the SSAX-SXML mailing list on
Tue, 26 Apr 2005 16:01:30 -0700.
Technical Report TR611, Department of Computer Science, Indiana
University, 2005
<http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR611>
impromptu-shift-tr.scm [61K]
The master SXML file of the report
The first part of the file is the SXML source for the report. The
second part is the formatter, from SXML to LaTeX (from which PDF was
eventually built).
We give an example of traversing two SXML documents ``side-by-side'' and making pure functional updates to arbitrary encountered nodes. The example illustrates the benefits of zipper -- an update cursor into an immutable data structure.
Our not-so-trivial example is detecting the differences in
character content within the corresponding paragraphs of two SXML
documents. We assume that two SXML documents in question are made of
the same number of paragraphs. The paragraphs may be enclosed in
other elements such as div
, chapters, etc., and that enclosing may
vary between the two documents. We disregard such differences when
comparing the documents. Paragraphs contain text, perhaps interspersed
with markup such as em
, etc. We compare the text of the
respective paragraphs from two SXML documents and annotate the
paragraphs that differ. We return two other SXML documents with so
annotated paragraphs. We compare only the character content,
regardless of how that content is arranged into strings or
enclosed into additional markup elements. For example, the following
two paragraphs are considered the same:
(p "This" " " "is a " (strong "simple") " example!") (p "This is a simple " (br) (em "example") "!")Our example is a non-trivial generalization of the ``same fringe problem''.
Stream-wise processing (differencing) of two SXML documents [plain text file]
The message with the explanation and examples of finding
differences in two SXML documents. The message was posted on the
SSAX-SXML mailing list on Tue, 23 Aug 2005 01:47:49 -0700.
streams-diff.scm [10K]
The implementation of the zipper, of the paragraph
differencing function and of the examples.
Designing an XML format for a particular domain involves more than just creating a DTD for a collection of elements, attributes and entities. An important part of the format is the documentation. Ideally it is a hyperlinked document that explains the background, the motivation, and the design principles; describes and cross-references each tag and attribute; and shows representative examples of the proposed markup. This article will demonstrate a tool that helps design a specification and the documentation at the same time -- and to keep them consistent.
A literate design document should permit a transformation into a well laid-out, easy-to-read hyperlinked user manual. A literate design document should be easy to write. And yet the user manual should be precise enough to allow automatical extraction of a formal specification. We will demonstrate that SXML is rather suitable for literate XML programming. SXML is similar to TeX, but is far easier to write and read. SXML transformations do the job of "weaving" a document type specification and of "typesetting" the user manual.
Literate XML/DTD programming [plain text file]
An article posted on a comp.text.xml and comp.lang.scheme newsgroups on Wed, 23 May 2001 04:26:31 GMT.
JMGRIB.html
JMGRIB.dtd
JMGRIB.scm
JMGRIB: Joint METOC XML format for grid data. A User
Manual, DTD and the master file, i.e., literate DTD.
<tag>data</tag>
into (tag "data")
and vice versa. The SSAX parser and the SXML
manipulation tools can do that easily. However, exporting relational sources into XML often runs into an impedance
mismatch: XML by its nature is a hierarchical database. We will show an
example of generating XML from s-expressions that involves
denormalizations, "table joins" and third-order tags. The
s-expression format turns out not only more understandable and
insightful, but more concise as well, by a factor of four.Re: s-expressions as a file format? [plain text file]
An article posted on a comp.lang.scheme newsgroup on Wed, 29 Aug 2001 19:13:29 -0700.
SXs.pdf paperSection 5 of that paper describes this example in great detail.
Manifest.scm [from www.metnet.navy.mil]
Manifest.scm [from DISA]
Normalized SXML database, which represents a
collection of OMF resources. The file Manifest.scm also includes a stylesheet that converts the SXML database to XML, and "joins" s-expression-based tables in the process.
Manifest.xml [from www.metnet.navy.mil]
Manifest.xml [from DISA]
Denormalized XML database of the collection of OMF resources, in the DISA XML repository format.
apply-templates.scm [11K]
An example of a XSLT-like apply-templates
form.
sxml-db-conv.scm [8K]
Using higher-order transformers for context-sensitive
transformations. The example also shows the treatment of
XML Namespaces in SXML and the namespace-aware
pretty-printing of SXML into XML.
pull-punct-sxml.scm [9K]
A context-sensitive recursive transformation
and breadth-first-like traversals of a SXML document.
sxslt-advanced.scm [16K]
An extensively commented example of SXSLT.
The example transforms an SHTML-like SXML document with nested
sections into a web page with recursively numbered sections and
subsections. We use appropriate HTML tags (H2, H3, ...) to set off
the title of the sections and subsections. The example also generates
a hierarchical table of contents and places it before the first
section.
sxml-nesting-depth-label.scm [5K]
Labeling nested sections of an SXML document by their nesting
depth. This is another example of traversing an SXML document with a
stylesheet that differs from node to node.
sxml-to-sxml.scm [9K]
Transforming SXML to SXML:
Composing SXML transformations. The code introduces and tests a
version of pre-post-order that transforms an SXML document into a
strictly conformant SXML document. That is, the result of a
pre-post-order transformation can be queried with SXPath or
transformed again with SXSLT.
SSAX/SXML examples
A directory of the examples, including the Makefile to run them.
remove-markup.scm [2K]
Given an XML document, remove all markup. In other words,
compute the string-value of a DOM tree of the document, without
constructing the DOM tree.
outline.scm [2K]
Pretty-print the structure of an XML document,
disregarding the character data. This is yet another example of
transforming an XML document on the fly, as we parse it. This example corresponds to outline.c of the Expat distribution.
run-sxml.scm [<1K]
A complete example of parsing an XML into SXML: DOM
parsing with SSAX.
daml-parse-unparse.scm [17K]
Parsing and unparsing (and parsing again) of a
namespace-rich XML document: DAML/RDF. The example illustrates that
the parser and the pretty-printer maintain the invariant PARSE . UNPARSE . PARSE == PARSE
ssax-extraction.scm [7K]
SSAX parsing of a large XML document and
recording only `interesting' parts, namely, the
content of CDATA sections from from a (small) subset of
elements. Because the document is large, it is important to avoid
reading it all in memory and retaining irrelevant data.
SSAX/SXML examples
A directory of the examples, including the Makefile to run them.
SSAX is more than an XML parser -- it is a library, which
includes lexers and parsers of various kinds. The library comes in
handy when we need to permissively parse (or better say, lex) HTML
documents or chunks, which may contain mismatched tags, unclosed tags,
or other imperfections. Because browsers are so lax, many web
pages on the Internet are ill-coded. For example, the FrontPage
editor is notorious for creating such invalid sequences as <b><i>text</b></i>
.
Because we accept even HTML with unmatched tags, we make no attempt in this example to recover the structure. We faithfully record all occurring tags and let the user sort out what matches what. For example, the following ill-composed sample document, which includes comments, parsed entities and attributes of various kinds:
<html><head><title> </title> <title> whatever </title></head> <body> <a href="url">link</a> <p align=center> <ul compact style='aa'> <p> BLah <!-- comment <comment> --> <i> italic <b> bold <tt> end </i> still < bold </b> </body> <P> But not done yet...parses to a token stream:
(#(START html ()) #(START head ()) #(START title ()) " " #(END title) " " #(START title ()) " whatever " #(END title) #(END head) "\n " #(START body ()) " " #(START a ((href . "url"))) "link" #(END a) " " #(START p ((align . "center"))) " " #(START ul ((compact . "compact") (style . "aa"))) "\n " #(START p ()) " BLah " #(START i ()) " italic " #(START b ()) " bold " #(START tt ()) " end " #(END i) " still < bold " #(END b) "\n " #(END body) "\n " #(START P ()) " But not done yet...")The parser has handled
<
and other parsed
entity references and fully preserved all the whitespace, including
newlines. The parser obviously supports the three possible styles of
HTML attributes: <tag name="value">
, <tag
name=value>
, and <tag name>
. Single quotes are also
allowed. Comments are silently ignored.
This is an example of using the SSAX library for tasks other than XML parsing.
html-parser.scm [8K]
The source code of the parser and of the test to run the above example
SSAX examplesA directory of SSAX examples, including the HTML parser, and the Makefile to run them.
HtmlPrag: a permissive HTML parser that emits SXML, by Neil
W. Van Dyke
<http://www.neilvandyke.org/htmlprag/>
SSAX
SSAX parser can be considered a for-each
-like
enumerator. The collection in question is the XML document itself (to
be more precise, the corresponding input port). Indeed, a sequential
reading of an XML document is equivalent to a depth-first traversal of
the corresponding XML Infoset tree. An article General ways
to traverse collections demonstrated that every for-each
enumerator can be turned inside-out. The result is a lazy list, a
demand-driven producer of values, which is called generator in
languages such as Icon, Python and Ruby. The user of the inverted SSAX
is no longer being pushed by the parser -- on contrary, the user
pulls markup and character data from the inverted-SSAX stream.
Thus SSAX can be turned into an XML Pull parser mechanically.
We should note that many parts of the SSAX parser toolkit
have been designed with the pull model in mind. The function lex-html
of the permissive HTML parser illustrates how to pull
with SSAX.
General ways to traverse collections
In this article we discuss and compare five different methods of determining the parent of an SXML node. The existence of these methods is a crucial step in a constructive proof that SXML is a complete model of the XML Information set, and SXPath is a complete implementation of the XPath Recommendation. Four of the five techniques of determining the parenthood require no mutations, and can be implemented in a pure and strict functional language. Three of the five techniques rely on annotations attached to SXML nodes. The techniques differ in the nature of the annotations and in the time complexity of attaching and using these annotations.
The source code shows several advanced examples of XML SAX-style parsing: custom parsers that maintain node dictionaries and parent links as they read the document.
The article itself [plain text file]
parent-pointers.scm [23K]
The complete source code of four techniques.
Discussion forum
Comments posted by Dmitry Lizorkin on Feb 22 and 27, 2003 are
particularly interesting. He has suggested a way of accounting for the
parent pointers during the evaluation of an (S)XPath expression.
XML Schema introduces a number of atomic XML datatypes and the corresponding validation rules. SXML likewise supports boolean, numerical, and other Scheme values [1] in the "character" content of SXML elements and attributes. As we parse the document, we can validate and de-serialize character data from the source XML document into the corresponding numerical etc. Scheme values. The present sample code shows an example of such datatype conversion and validation. This code also demonstrates element content validation: making sure that the sequence of child elements matches a user-specified template.
The validation in the present example is intentionally limited, to keep the code simple and clear. The code can be extended to more complex validation rules.
[1] One may use any Scheme value in the "character" content of an SXML element or attribute as long as this value is not a pair or a symbol. If we need to insert a pair or a symbol in the "character" content, we should wrap them in a vector or a procedure.
CDATA
, #PCDATA
, and ANY
CDATA
, #PCDATA
and ANY
-- as well as the distinction between a mixed content and any content.SOAP 1.2 became a W3C Recommendation on June 24, 2003. SOAP 1.2, which used to be known as XML protocol, is quite different from SOAP 1.1 and XML-RPC. In particular, SOAP 1.2 permits SXML (s-expression) message systems to claim SOAP 1.2 compliance. Indeed, SOAP does not require XML! The Recommendation says,
A SOAP message is specified as an XML Information Set [XML InfoSet]. While all SOAP message examples in this document are shown using XML 1.0 syntax, other representations MAY be used to transmit SOAP messages between nodes (see 4. SOAP Protocol Binding Framework).
...
The binding framework does not require that every binding use the XML 1.0 serialization as the "on the wire" representation of the XML infoset; compressed, encrypted, fragmented representations and so on can be used if appropriate.
In other words, the W3 Consortium specifically leaves it to applications to choose the appropriate on-the-wire representation of the XML Infoset. S-expressions is one of the realizations of the XML Infoset. Thus serializing SOAP messages using SXML is specifically permitted. Indeed, S-expressions may reasonably be characterized as a "compressed representation" of the XML Infoset.
xml
such that an expression(xml '((url "mypage.html" (name "FirstName LastName's page")) (p "my paragraph")))evaluates to a string
<a href="mypage.html"><name>FirstName LastName's page</name></a><p>my paragraph</p>
and
(define myname '(name "FirstName LastName")) (xml `( ((p "par1") (p "par2")) (p "This is " ,myname " page") ))evaluates to
<p>par1</p><p>par2</p><p>This is <name>FirstName LastName</name> page</p>
Evaluating Scheme to XML/HTML [Was: eval to invoke a local function?] [plain text file]
An article posted on a comp.lang.scheme newsgroup
on Thu, 27 Jul 2000 01:41:52 GMT. It contains the source code for the
xml
function above, and discusses several generalizations.
This set of two articles was prompted by a W3C technical note that introduces XEXPR -- a programming (scripting) language with XML notation. The second article offers a detailed critique. What if we still want a programming language with an XML notation? The articles consider two ways of designing such languages. One is to assign an executable semantics, reduction rules, to an abstract syntax tree representation of an XML document. The other approach is to start with an extant programming language and to introduce rules to pretty-print its abstract syntax tree into the XML format.
Both approaches are especially easy if the programming language is Scheme. The concrete syntax of Scheme is basically its abstract syntax as well. On the other hand, SXML -- an abstract syntax for XML -- has a representation that parses not only as Scheme "data" but also as a Scheme "expression" (by R5RS syntax definitions). In SXML, abstract syntaxes of Scheme and XML unify.
The two approaches are illustrated in the articles below. In particular, the second article demonstrates that a SXML pretty-printer is general enough to permit a well-defined XML syntax for Scheme. This is not trivial as the following Scheme expression may show:
(begin (append a (x y) '(b) '(c "d" 1 (f g) '(h i) `g () '() (()) j)))It exhibits a great deal of context-sensitivity. What looks like an application may in reality be something else, if it is preceded by a quote. A quote within a quote loses its special meaning. An XML syntax should be able to capture these nuances -- and it does as the articles show.
To see how we can assign an XML notation to Scheme, or an evaluation semantics to XML, let us consider an expression:
(string-join '("foo" "bar" "baz") ":")The code in the first article pretty-prints this expression as
<string-join> <list> <str>foo</str><str>bar</str><str>baz</str> </list> <str>:</str> </string-join>On the other hand, we start with this XML string and feed it into a SSAX parser. We will immediately see that the output:
(string-join (list (str "foo") (str "bar") (str "baz")) (str ":"))can be assigned a well-defined evaluation semantics -- the semantics of Scheme. Indeed, this is a genuine Scheme expression, and can be evaluated as such. The result will be the same as that of the original expression, provided that we define
str
as the
identity function.
The second article discusses another, more general XML notation for our example:
<apply> <var>string-join</var> <list> <str>foo</str><str>bar</str><str>baz</str> </list> <str>:</str> </apply>This representation holds a stronger invariant with respect to SXML pretty-printing and parsing.
The first article notes that a function post-order
at the heart of SXML pretty-printing can also evaluate a Scheme
expression directly. Indeed, this function is an
implementation of eval
-- or eval
is a
post-order traversal for a Scheme expression tree.
XML as Scheme, or Scheme as XML [plain text file]
The first article, posted on comp.lang.scheme and comp.text.xml newsgroups on Sat Jan 06 02:45:03 2001 GMT.
XML as Scheme, Scheme as XML; XEXPR considered invalid [plain text file]
The second article, posted on comp.lang.scheme and comp.text.xml newsgroups on Wed Jan 17 06:58:14 2001 GMT.
XEXPR - A Scripting Language for XML. Gavin Thomas Nicol, Ed. W3C Note 21 November 2000.
The Next 700 Markup Languages by Philip Wadler.
Invited Talk, Second Conference on Domain Specific Languages (DSL'99), Austin, Texas, October 1999.
Among other things, this talk gives an interesting example of a
functional programming language with an XML syntax.
This site's top page is http://okmij.org/ftp/
Converted from SXML by SXML->HTML