From ssax-sxml-admin@lists.sourceforge.net Tue Apr 26 23:08:52 2005 To: ssax-sxml@lists.sourceforge.net From: oleg-at-pobox.com Message-ID: <20050426230130.01AD1ABCC@Adric.metnet.navy.mil> Subject: [ssax-sxml] Writing LaTeX/PDF mathematical papers with SXML List-Archive: X-Original-Date: Tue, 26 Apr 2005 16:01:30 -0700 (PDT) Date: Tue, 26 Apr 2005 16:01:30 -0700 (PDT) Status: OR Hello! I'd like to point out a fairly non-trivial example of writing a theoretical computer science paper in SXML. SXML helped not only in transcribing (hand-written) mathematical notation. The proofs themselves were conducted in Emacs/SXML, rather than on a piece of paper. The paper in question is a technical report How to remove a dynamic prompt: static and dynamic delimited continuation operators are equally expressible http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR611 It concerns with proving propositions about delimited continuations. Rigorous proofs are indispensable as the intuition often fails when it comes to delimited continuations. The ``master file'' of that paper can be found at http://pobox.com/~oleg/ftp/Scheme/impromptu-shift-tr.scm The first part of the file is the SXML source for the report. The second part is the formatter, from SXML to LaTeX (from which PDF was eventually built). As common with theoretical papers, TR includes a fair amount of greek, various squiggles, superscripts, subscripts, and arrows of several kinds. It's instructive to compare pieces of SXML code with the corresponding places of the printed/viewed technical report. Here's one example of SXML vs. LaTeX, the formulation and the proof of Lemma 2: SXML source: (p "Lemma 2.") (peq #f ($ (red* (B (C (shift g (H (compose h' g f') (lambda (f) e))))) (shift g (H (lambda (x) (h' (g (B (C (f' x)))))) (lambda (f) e))))) ) (pn "Informally, the translation image of " ($ (control f e)) " can be lifted out of " ($ (B (C))) " with the latter context fragment added to the captured continuation. Indeed,") (peq #f (align-left ($ (B (C (shift g (H (compose h' g f') (lambda (f) e)))))) (explain "Definition of " ($ (B))) ($ (equiv "" (h' (reset (HV (C (shift g (H (compose h' g f') (lambda (f) e))))))))) (explain "(S2), along with the fact " ($ (HV e)) " may be subsumed ") (explain "as a part of a non-reset-crossing context " ($ (C))) ($ (red "" (h' (reset (let ((g (lambda (x) (reset (HV (C x)))))) (H (compose h' g f') (lambda (f) e))))))) (explain ($ (H a b)) " is a value, apply (S1) and then (H')") ($ (red* "" (let ((g (lambda (x) (reset (HV (C x)))))) (shift g' (H (compose h' g' h' g f') (lambda (f) e)))))) ($ (equiv "" (shift g' (H (lambda (x) (h' (g' (h' (reset (HV (C (f' x)))))))) (lambda (f) e))))) ($ (equiv "" (shift g' (H (lambda (x) (h' (g' (B (C (f' x)))))) (lambda (f) e))))) )) Here's the generated LaTeX code: Lemma 2. \addvspace{\smallskipamount}\par\noindent{}\begin{tabular}{cr} \parbox{4.4in}{\centering{}$\mathcal{B}\!\left[\mathcal{C}\!\left[\mathbf{shift}\;g.\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)\right]\right]\,\vartriangleright^{*}\,\mathbf{shift}\;g.\mathsf{H}\,(\lambda x.h'(g(\mathcal{B}\!\left[\mathcal{C}\!\left[f'x\right]\right])))(\lambda f.e)$} & \\\end{tabular} \addvspace{\smallskipamount}\noindent{}Informally, the translation image of $\mathbf{control}\;f.e$ can be lifted out of $\mathcal{B}\!\left[\mathcal{C}\!\left[\right]\right]$ with the latter context fragment added to the captured continuation. Indeed, \addvspace{\smallskipamount}\par\noindent{}\begin{tabular}{cr} \parbox{4.4in}{\centering{}\parbox[t]{8in}{\vspace{-\smallskipamount}\begin{tabbing} $\mathcal{B}\!\left[\mathcal{C}\!\left[\mathbf{shift}\;g.\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)\right]\right]$\\\{ Definition of $\mathcal{B}\!\left[\right]$ \}\\$\,\equiv\,h'(\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[\mathbf{shift}\;g.\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)\right]))$\\\{ (S2), along with the fact $\mathsf{HV}\,e$ may be subsumed \}\\\{ as a part of a non-reset-crossing context $\mathcal{C}\!\left[\right]$ \}\\$\,\vartriangleright\,h'(\mathbf{reset}\;\mathrm{let}\,g\!=\!\lambda x.\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[x\right])\,\mathrm{in}\,\mathsf{H}\,(h'\,\cdot\,g\,\cdot\,f')(\lambda f.e))$\\\{ $\mathsf{H}\,ab$ is a value, apply (S1) and then (H') \}\\$\,\vartriangleright^{*}\,\mathrm{let}\,g\!=\!\lambda x.\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[x\right])\,\mathrm{in}\,\mathbf{shift}\;g'.\mathsf{H}\,(h'\,\cdot\,g'\,\cdot\,h'\,\cdot\,g\,\cdot\,f')(\lambda f.e)$\\$\,\equiv\,\mathbf{shift}\;g'.\mathsf{H}\,(\la mbda x.h'(g'(h'(\mathbf{reset}\;\mathsf{HV}\,(\mathcal{C}\!\left[f'x\right])))))(\lambda f.e)$\\$\,\equiv\,\mathbf{shift}\;g'.\mathsf{H}\,(\lambda x.h'(g'(\mathcal{B}\!\left[\mathcal{C}\!\left[f'x\right]\right])))(\lambda f.e)$\end{tabbing} \vspace{-\medskipamount}}} & \\\end{tabular} The SXML->LaTeX formatter is quite distinct from SLaTeX. We are not concerned with the typesetting of the Scheme code. Rather, our aim is to use Scheme notation as an easy-to-type and easy-to-manipulate realization of the mathematical notation. The end result must be conventional mathematical notation, with (lambda (x) (f g)) typeset as $\lambda x. f g$, (compose f g h) as $f\,\cdot\,g\,\cdot\,h$, (let ((x 1)) body) as $\mathrm{let}\,x=1 \mathrm{in} body$, and the reduction of transformed A to transformed B, (red* *A *B), as $\overline{A} \vartriangleright^{*} \overline{B}$. The proof of Lemma 2 wasn't merely transcribed in SXML. The proof was _conducted_ in SXML. Indeed, many proofs in computer science are based on equivalence transformations: we start with the source expression, replace some part of it with another expression, etc. until we reach the target expression. In other words, we take the source expression, make a copy of it, change it, copy it again, change another part of it, etc. It is of great benefit that the SXML notation clearly shows the structure of expressions and that Emacs lets us highlight parenthesized sub-expressions and to cut and paste it them. The non-trivial part of the SXML->LaTeX formatter deals with parentheses. For example, we would like to typeset an application of `f' to `g', written (f g), as $f g$ at the top level. We don't want extra parentheses. However, if that application appears within another application, e.g., (h (f g)), the result must look like $h (f g)$. The parentheses are required now. Still, (C (f g)) should be typeset as $\mathcal{C}\left[f g\right]$ with no parenthesis around `f g' because the context `C' has its own brackets. The typesetting rule then is that a complex expression within a complex context must be surrounded by parentheses. Otherwise, it should not. To decide if we need parentheses or not, we, therefore, need to know the properties of both the context and the expression. Contexts are classified into simple and complex: the top-level context is simple, the context `(h \hole)' is complex. Expressions are too classified into simple and complex: `(f g)' is complex, `f' is simple. The classification is not actually straightforward: for example, `(overbar (sub f "1"))' is also a _simple_ expression. Decorations such as indices, superscripts, transform marks may make an expression look complex but they don't affect the classification. There is another subtle point: we typeset "reset0" as "\\mathbf{reset_0}", with the subscript within the decoration. OTH, we typeset `(M1 x)' as "\\mathcal{M}_1[x]": the subscript must be outside of the `mathcal' scope, or it gets mangled. The typesetting of a mathematical expression is done by the function `typeset-exp tag exp', defined in the second half of the file impromptu-shift-tr.scm. The function handles many formatting chores, like lowering a numerical index at the end of a name into the subscript. The leading character '*', the mark of the transform, is automatically translated into the overline. Thus, `(*M1 x)' is typeset into $\overline{\mathcal{M}_{1}}[x]$ with only a single rule for 'M'. I almost used shift/reset -- but then eventually decided on a simple environment passing. As shown in Ariola, Herbelin, Sabry (ICFP04), dynamic scoping is inherent in delimited continuations. The function typeset-exp relies thus on dynamic scoping implemented conventionally via environment passing. We use two dynamically-scoped variables: &paren (a boolean flag telling if the context suggests we need enclosing parentheses) and &decor (the list of decorations). The re-writing rules are also a part of the environment, the constant part of it. Here's the procedure that formats an application `(a b ...)'. (define (handle-app envp app) (let ((env (ext-paren envp #t))) ; use paren (let ((hd (handle env (car app)))) (if (procedure? hd) (apply hd envp (cdr app)) (add-paren envp (cons hd (map (lambda (x) (handle env x)) (cdr app)))))))) It receives the current environment (envp -- the parent environment) and `app', which is a list `(a b ...)'. The procedure extends `envp' with the binding of `&paren' to #t -- meaning that the context of an application is complex and so `a', `b', etc. expressions need to be enclosed into parentheses, if they are complex too. Then we format each of `a', `b', etc. within thus extended environment, and enclose the result in parentheses if the _parent_ context envp requires that. One complication stems from the fact that `a' may be a ``macro''. For example, `(let ((x 1)) body)', although might look like an application, is actually a macro. In the current environment, `let' is resolved to the procedure: (let . ,(lambda (_) (lambda (envp bindings body) (let ((env (ext-paren envp #f))) (add-paren envp (list "\\mathrm{let}\\," (handle env (caar bindings)) "\\!=\\!" (handle env (cadar bindings)) "\\,\\mathrm{in}\\," (handle env body))))))) We pass the rest of the arguments to that procedure and let it do the formatting. The `let' macro specifically sets the &paren dynamic binding to #f: the body of `let' is formatted without enclosing parentheses.