From oleg@pobox.com Thu Apr 5 13:17:36 2001
Newsgroups: comp.text.xml,comp.lang.scheme
Date-Sent: Thu, 5 Apr 2001 13:17:21 -0700 (PDT)
Date: Thu, 5 Apr 2001 20:15:12 +0000 (UTC)
From: oleg@pobox.com
Message-Id: <200104052017.NAA13565@adric.cs.nps.navy.mil>
To: comp.lang.scheme@mailgate.org, comp.text.xml@mailgate.org
Subject: LaXmL
Keywords: XML, SXML, LaTeX, higher-order tags
Summary: SXML as a higher-order markup language
CC: lispweb@red-bean.com
Reply-to: oleg@pobox.com
Status: OR

An earlier message on these newsgroups announced an SXML parser, which
converts XML to SXML. The latter is an instance of XML Infoset as
S-expressions, an Abstract Syntax Tree of an XML document. SXML can be
queried (in an XPath way), transformed and evaluated. Evaluating SXML
helps compose XML or HTML documents, in particular by enabling
higher-order "tags" -- just as LaTeX helps typeset documents by
offering higher-order "macros". LaTeX macros are eventually expanded by
TeX; SXML "tags" are evaluated by Scheme.

The SXML specification is an example of such an advanced composition.
The SXML.html web page that describes SXML is actually written in SXML
itself.

    http://pobox.com/~oleg/ftp/Scheme/SXML.html
    http://pobox.com/~oleg/ftp/Scheme/SXML.scm

SXML.scm is the master file. It consists of SXML data and the code
("stylesheets") that interprets the data. This arrangement is quite
common: for example, the XML Recommendation describes the syntax of XML
in EBNF and provides the rules to interpret the EBNF. The rules of
interpretation in SXML.scm are, however, more precise, as they are
expressed in a formal language -- Scheme.

It is instructive to juxtapose SXML.scm with SXML.html.

The SXML.scm file starts as follows:

    (define Content
     '(
      (html:begin
        (Header
          (title "SXML")
          (description "Definition of SXML: ...")
          (keywords "XML, XML parsing, XML Infoset, XPath, SXML, Scheme")
          (long-title "SXML")
          (Links
            (start "index.html" (title "Scheme Hash"))
            (contents "../README.html")
            (prev "xml.html")
            (home "http://pobox.com/~oleg/ftp/")))
        (body
          (navbar)
          (page-title)
          (p "SXML is an instance of XML Infoset as S-expressions. SXML is an Abstract Syntax Tree of an XML document.")
          (p (b "Revision: 2.0"))
          (TOC)
          (Section 2 "Introduction")
          (p "An XML information set (Infoset) ... XML Infoset is described in "
             (cite "XML Infoset")
             ". Although technically Infoset is specified for XML, it largely applies to HTML as well.")
          (p "SXML is...")
          (Section 2 "Notation")
        ))))

It defines 'Content', a _constant_ data structure (note the straight
apostrophe). This data structure could also be stored in a separate
file, and then (read) when needed. Unlike LAML code, SXML code is a
data structure, although it can be evaluated as well -- but not in this
example.

The excerpt above exhibits regular HTML tags (such as 'p' and 'b') as
well as a number of higher-level tags. One such tag is 'Header', which
is a collection of meta-information about the document. The tag is
expanded into an HTML head element, with Title, Meta, and Link
sub-elements. Links, a child of the Header, is also used to generate a
navigation bar at the top of the HTML page. The bar is succinctly
represented by the tag "(navbar)" in the Content body.

The clearest example of a single tag serving several purposes is the
automatic generation of the table of contents. When the Content is
processed normally, a high-level tag (Section 2 "Introduction") is
expanded into an HTML section heading, rendered as

    Introduction

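The expansion of (Section 2 "Introduction") described above can be
illustrated with a small post-order tree rewriter. This is only a
sketch under stated assumptions, not the code found in SXML.scm: the
names rewrite-tree, normal-rules and flatten-to-string are made up for
the illustration, as are the *text* and *default* fallback entries and
the mapping of a section level N to an "hN" element. The handler
convention, a procedure applied to the tag and the already transformed
children, follows the 'nonterm' rule quoted later in this message.

```scheme
;; All names below are hypothetical; this is not the SXML.scm stylesheet.

;; Rewrite TREE bottom-up: transform the children first, then apply the
;; handler registered for the node's tag. RULES is an association list
;; of (tag . handler). The *text* entry handles atoms (strings, numbers)
;; and *default* handles any tag without a rule of its own.
(define (rewrite-tree tree rules)
  (if (pair? tree)
      (let* ((tag (car tree))
             (kids (map (lambda (kid) (rewrite-tree kid rules))
                        (cdr tree)))
             (rule (or (assq tag rules) (assq '*default* rules))))
        (apply (cdr rule) tag kids))
      ((cdr (assq '*text* rules)) tree)))

;; Rules for the normal pass. Assumption: (Section N title) becomes an
;; <hN> heading; every other tag becomes an element of the same name.
(define normal-rules
  `((Section . ,(lambda (tag level title)
                  (let ((h (string-append "h" (number->string level))))
                    (list "<" h ">" title "</" h ">"))))
    (*default* . ,(lambda (tag . kids)
                    (let ((name (symbol->string tag)))
                      (list "<" name ">" kids "</" name ">"))))
    (*text* . ,(lambda (x) x))))

;; Concatenate a nested list of strings (and numbers) into one string.
(define (flatten-to-string x)
  (cond ((string? x) x)
        ((number? x) (number->string x))
        ((pair? x) (string-append (flatten-to-string (car x))
                                  (flatten-to-string (cdr x))))
        (else "")))

;; Usage:
;; (flatten-to-string
;;   (rewrite-tree '(Section 2 "Introduction") normal-rules))
;; => "<h2>Introduction</h2>"
```

The point of the design is that the document stays fixed while the rule
list varies: handing the same tree to rewrite-tree with a different
association list yields a different rendering.
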
A higher-level tag "(TOC)" induces another scan of the Content, with
different SXML tree transformation rules. These rules rewrite the very
same (Section 2 "Introduction") tag into a list item, rendered as

  • Introduction

and transform everything else into nothing (empty strings, to be
precise). Enclosing the result in "<ul>" ... "</ul>" finishes the TOC
generation.

Another interesting example is the presentation of SXML production
rules. The HTML code for the top production, excerpted from SXML.html,
renders as

    [1]  <TOP> ::= *TOP* <namespaces>? <PI>* <comment>* <Element>

The corresponding SXML code (excerpted from SXML.scm) is

    (productions
      (production 1
        (nonterm "TOP")
        ((term-lit "*TOP*")
         (ebnf-opt (nonterm "namespaces"))
         (ebnf-* (nonterm "PI"))
         (ebnf-* (nonterm "comment"))
         (nonterm "Element"))))

The SXML code is somewhat easier to read -- and to write as well.
Besides, if someone does not like the way I typeset non-terminals (I
enclose them in angle brackets), they merely need to change a
transformation rule

    (nonterm            ; Non-terminal of a grammar
      . ,(lambda (tag term)
           (list "<" term ">")))

in the stylesheet -- er, post-order tree re-writing rules -- and
re-evaluate SXML.scm.

Using SXML to express its grammar has another important advantage: I
can easily write a transformation or an SXPath query on the whole
SXML.scm to make sure that every 'nonterm' mentioned on the right-hand
side of some production appears on the left-hand side of exactly one
production. The SXML.scm code thus lends itself not only to flexible
presentation to a human but also to formal reasoning about it.

References:

http://pobox.com/~oleg/ftp/Scheme/SXML.scm
http://pobox.com/~oleg/ftp/Scheme/SXML.html
    Definition of SXML: an instance of XML Infoset as S-expressions,
    an Abstract Syntax Tree of an XML document.

http://pobox.com/~oleg/ftp/Scheme/SSAX.scm
    The SSAX parser toolkit and XML->SXML parser

http://pobox.com/~oleg/ftp/Scheme/xml.html
    SXPath (SXML query) and SXML Transformations and Evaluations

http://pobox.com/~oleg/ftp/Scheme/xml.html#XML-authoring
    HTML/XML authoring in Scheme

http://www.cs.auc.dk/~normark/laml/
    Kurt Normark, "Programming World Wide Web Pages in Scheme,"
    ACM SIGPLAN Notices, vol. 34, No. 12 - December 1999, pp. 37-46.
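
As an illustration of the grammar check mentioned above (every
'nonterm' used on a right-hand side is defined by exactly one
production), here is a minimal sketch in plain Scheme. It is an
assumption-laden example rather than the actual check: it uses ordinary
list recursion instead of SXPath, the procedure names (nonterm-names,
count-name, inconsistent-nonterms) are invented for the illustration,
and it assumes every production has the shape
(production number (nonterm "name") right-hand-side) shown in the
excerpt.

```scheme
;; All procedure names here are hypothetical, chosen for the sketch.

;; Collect the names of all (nonterm "name") nodes anywhere inside NODE.
(define (nonterm-names node)
  (cond ((not (pair? node)) '())
        ((eq? (car node) 'nonterm) (list (cadr node)))
        (else (apply append (map nonterm-names node)))))

;; Count how many strings in NAMES are equal to NAME.
(define (count-name name names)
  (let loop ((names names) (n 0))
    (cond ((null? names) n)
          ((string=? name (car names)) (loop (cdr names) (+ n 1)))
          (else (loop (cdr names) n)))))

;; Return the non-terminals that are used on some right-hand side but
;; are not defined by exactly one production. An empty list means the
;; grammar passed the check.
(define (inconsistent-nonterms productions)
  (let* ((prods (cdr productions))            ; drop the 'productions tag
         (defined (map (lambda (p) (cadr (caddr p))) prods))
         (used (apply append
                      (map (lambda (p) (nonterm-names (cadddr p)))
                           prods))))
    (let loop ((used used) (bad '()))
      (cond ((null? used) (reverse bad))
            ((or (= 1 (count-name (car used) defined))
                 (member (car used) bad))
             (loop (cdr used) bad))
            (else (loop (cdr used) (cons (car used) bad)))))))

;; The excerpt quoted in this message defines only TOP, so the check
;; reports the four non-terminals whose productions are not included:
(define excerpt
  '(productions
     (production 1
       (nonterm "TOP")
       ((term-lit "*TOP*")
        (ebnf-opt (nonterm "namespaces"))
        (ebnf-* (nonterm "PI"))
        (ebnf-* (nonterm "comment"))
        (nonterm "Element")))))

;; (inconsistent-nonterms excerpt)
;; => ("namespaces" "PI" "comment" "Element")
```

Run against the full list of productions rather than the excerpt, an
empty result would indicate that the grammar is closed in the sense
described above.
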