From ssax-sxml-admin@lists.sourceforge.net Tue Apr 26 23:08:52 2005
To: ssax-sxml@lists.sourceforge.net
From: oleg-at-pobox.com
Message-ID: <20050426230130.01AD1ABCC@Adric.metnet.navy.mil>
Subject: [ssax-sxml] Writing LaTeX/PDF mathematical papers with SXML
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum=ssax-sxml>
X-Original-Date: Tue, 26 Apr 2005 16:01:30 -0700 (PDT)
Date: Tue, 26 Apr 2005 16:01:30 -0700 (PDT)
Status: OR


Hello!

	I'd like to point out a fairly non-trivial example of writing
a theoretical computer science paper in SXML. SXML helped not only in
transcribing (hand-written) mathematical notation. The proofs
themselves were conducted in Emacs/SXML, rather than on a piece of
paper.

The paper in question is a technical report

        How to remove a dynamic prompt: static and dynamic
        delimited continuation operators are equally expressible
        http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR611

It concerns with proving propositions about delimited
continuations. Rigorous proofs are indispensable as the intuition
often fails when it comes to delimited continuations.

The ``master file'' of that paper can be found at
	http://pobox.com/~oleg/ftp/Scheme/impromptu-shift-tr.scm

The first part of the file is the SXML source for the report. The
second part is the formatter, from SXML to LaTeX (from which PDF was
eventually built).

As common with theoretical papers, TR includes a fair amount of
greek, various squiggles, superscripts, subscripts, and arrows of
several kinds. It's instructive to compare pieces of SXML code with the
corresponding places of the printed/viewed technical report.

Here's one example of SXML vs. LaTeX, the formulation and the proof of
Lemma 2:

SXML source:

(p "Lemma 2.")
(peq #f
  ($ (red* (B (C (shift g (H (compose h' g f') (lambda (f) e)))))
           (shift g (H (lambda (x) (h' (g (B (C (f' x))))))
		       (lambda (f) e)))))
)

(pn "Informally, the translation image of " ($ (control f e)) " can be
lifted out of " ($ (B (C))) " with the latter context fragment added
to the captured continuation.  Indeed,")

(peq #f
 (align-left
  ($ (B (C (shift g (H (compose h' g f') (lambda (f) e))))))
  (explain "Definition of " ($ (B)))
  ($ (equiv ""
       (h' (reset (HV (C (shift g (H (compose h' g f') (lambda (f) e)))))))))
  (explain "(S2), along with the fact " ($ (HV e)) " may be subsumed ")
  (explain "as a part of a non-reset-crossing context " ($ (C)))
  ($ (red ""
       (h' (reset (let ((g (lambda (x) (reset (HV (C x))))))
		  (H (compose h' g f') (lambda (f) e)))))))
  (explain ($ (H a b)) " is a value, apply (S1) and then (H')")
  ($ (red* ""
       (let ((g (lambda (x) (reset (HV (C x))))))
	 (shift g' (H (compose h' g' h' g f') (lambda (f) e))))))
  ($ (equiv ""
       (shift g' (H (lambda (x) (h' (g' (h' (reset (HV (C (f' x))))))))
		    (lambda (f) e)))))
  ($ (equiv ""
       (shift g' (H (lambda (x) (h' (g' (B (C (f' x))))))
		    (lambda (f) e)))))
))


Here's the generated LaTeX code:
Lemma 2.

\addvspace{\smallskipamount}\par\noindent{}\begin{tabular}{cr}
\parbox{4.4in}{\centering{}$\mathcal{B}\!\left[\mathcal{C}\!\left[\mathbf{shift}\;g.\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)\right]\right]\,\vartriangleright^{*}\,\mathbf{shift}\;g.\mathsf{H}\,(\lambda x.h'(g(\mathcal{B}\!\left[\mathcal{C}\!\left[f'x\right]\right])))(\lambda f.e)$} & \\\end{tabular}

\addvspace{\smallskipamount}\noindent{}Informally, the translation
image of $\mathbf{control}\;f.e$ can be
lifted out of $\mathcal{B}\!\left[\mathcal{C}\!\left[\right]\right]$
with the latter context fragment added
to the captured continuation.  Indeed,

\addvspace{\smallskipamount}\par\noindent{}\begin{tabular}{cr}
\parbox{4.4in}{\centering{}\parbox[t]{8in}{\vspace{-\smallskipamount}\begin{tabbing}
$\mathcal{B}\!\left[\mathcal{C}\!\left[\mathbf{shift}\;g.\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)\right]\right]$\\\{ Definition of $\mathcal{B}\!\left[\right]$ \}\\$\,\equiv\,h'(\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[\mathbf{shift}\;g.\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)\right]))$\\\{ (S2), along with the fact $\mathsf{HV}\,e$ may be subsumed  \}\\\{ as a part of a non-reset-crossing context $\mathcal{C}\!\left[\right]$ \}\\$\,\vartriangleright\,h'(\mathbf{reset}\;\mathrm{let}\,g\!=\!\lambda x.\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[x\right])\,\mathrm{in}\,\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e))$\\\{ $\mathsf{H}\,ab$ is a value, apply (S1) and then (H') \}\\$\,\vartriangleright^{*}\,\mathrm{let}\,g\!=\!\lambda x.\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[x\right])\,\mathrm{in}\,\mathbf{shift}\;g'.\mathsf{H}\,(h'\,\cdot\,g'\,\cdot\,h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)$\\$\,\equiv\,\mathbf{shift}\;g'.\mathsf{H}\,(\la
 mbda x.h'(g'(h'(\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[f'x\right])))))(\lambda f.e)$\\$\,\equiv\,\mathbf{shift}\;g'.\mathsf{H}\,(\lambda x.h'(g'(\mathcal{B}\!\left[\mathcal{C}\!\left[f'x\right]\right])))(\lambda f.e)$\end{tabbing}
\vspace{-\medskipamount}}} & \\\end{tabular}


The SXML->LaTeX formatter is quite distinct from SLaTeX.  We are not
concerned with the typesetting of the Scheme code. Rather, our aim is
to use Scheme notation as an easy-to-type and easy-to-manipulate
realization of the mathematical notation. The end result must be
conventional mathematical notation, with (lambda (x) (f g)) typeset as
$\lambda x. f g$, (compose f g h) as $f\,\cdot\,g\,\cdot\,h$, 
(let ((x 1)) body) as $\mathrm{let}\,x=1 \mathrm{in} body$, and 
the reduction of transformed A to transformed B, (red* *A *B),
as $\overline{A} \vartriangleright^{*} \overline{B}$.


The proof of Lemma 2 wasn't merely transcribed in SXML. The proof was
_conducted_ in SXML. Indeed, many proofs in computer science are based
on equivalence transformations: we start with the source expression,
replace some part of it with another expression, etc. until we reach
the target expression. In other words, we take the source expression,
make a copy of it, change it, copy it again, change another part of
it, etc. It is of great benefit that the SXML notation clearly shows
the structure of expressions and that Emacs lets us highlight
parenthesized sub-expressions and to cut and paste it them.


The non-trivial part of the SXML->LaTeX formatter deals with
parentheses. For example, we would like to typeset an application of
`f' to `g', written (f g), as $f g$ at the top level. We don't want
extra parentheses. However, if that application appears within another
application, e.g., (h (f g)), the result must look like $h (f g)$. The
parentheses are required now. Still, (C (f g)) should be typeset as
$\mathcal{C}\left[f g\right]$ with no parenthesis around `f g' because
the context `C' has its own brackets.

The typesetting rule then is that a complex expression within a
complex context must be surrounded by parentheses. Otherwise, it
should not. To decide if we need parentheses or not, we, therefore,
need to know the properties of both the context and the
expression. Contexts are classified into simple and complex: the
top-level context is simple, the context `(h \hole)' is complex.
Expressions are too classified into simple and complex: `(f g)' is
complex, `f' is simple. The classification is not actually
straightforward: for example, `(overbar (sub f "1"))' is also a
_simple_ expression. Decorations such as indices, superscripts,
transform marks may make an expression look complex but they don't
affect the classification. There is another subtle point: we typeset
"reset0" as "\\mathbf{reset_0}", with the subscript within the
decoration. OTH, we typeset `(M1 x)' as "\\mathcal{M}_1[x]": the
subscript must be outside of the `mathcal' scope, or it gets mangled.

The typesetting of a mathematical expression is done by the function 
`typeset-exp tag exp', defined in the second half of the file
impromptu-shift-tr.scm. The function handles many formatting
chores, like lowering a numerical index at the end of a name into the
subscript. The leading character '*', the mark of the transform, is
automatically translated into the overline. Thus, `(*M1 x)' is typeset into
$\overline{\mathcal{M}_{1}}[x]$ with only a single rule for 'M'. 

I almost used shift/reset -- but then eventually decided on a simple
environment passing. As shown in Ariola, Herbelin, Sabry (ICFP04),
dynamic scoping is inherent in delimited continuations. The function
typeset-exp relies thus on dynamic scoping implemented conventionally
via environment passing. We use two dynamically-scoped variables:
&paren (a boolean flag telling if the context suggests we need
enclosing parentheses) and &decor (the list of decorations). The
re-writing rules are also a part of the environment, the constant part
of it.

Here's the procedure that formats an application `(a b ...)'.

  (define (handle-app envp app)
    (let ((env (ext-paren envp #t))) ; use paren
      (let ((hd (handle env (car app))))
	(if (procedure? hd)
	  (apply hd envp (cdr app))
	  (add-paren envp
	    (cons hd (map (lambda (x) (handle env x)) (cdr app))))))))

It receives the current environment (envp -- the parent environment)
and `app', which is a list `(a b ...)'. The procedure extends `envp'
with the binding of `&paren' to #t -- meaning that the context of an
application is complex and so `a', `b', etc. expressions need to be
enclosed into parentheses, if they are complex too. Then we format each
of `a', `b', etc. within thus extended environment, and enclose the
result in parentheses if the _parent_ context envp requires that. One
complication stems from the fact that `a' may be a ``macro''. For
example, `(let ((x 1)) body)', although might look like an application,
is actually a macro. In the current environment, `let' is resolved to
the procedure:

      (let
	. ,(lambda (_)
	     (lambda (envp bindings body)
	       (let ((env (ext-paren envp #f)))
		 (add-paren envp
		   (list "\\mathrm{let}\\,"
		     (handle env (caar bindings)) "\\!=\\!"
		     (handle env (cadar bindings))
		     "\\,\\mathrm{in}\\," (handle env body)))))))

We pass the rest of the arguments to that procedure and let it do the
formatting. The `let' macro specifically sets the &paren dynamic
binding to #f: the body of `let' is formatted without enclosing
parentheses.