Low- and high-level macro programming in Scheme


Applicative syntax-rules: macros that compose better

The syntax-rule macro system of R5RS does not seem to scale beyond simple macros. It is very difficult to write macros that compose, to assemble complex macros from already written and tested components. The previous approaches to composable syntax-rules are heavy, notationally and computationally. This article presents an alternative, lightweight style of writing composable syntax-rules, based on the CK abstract machine.

We demonstrate recursive, higher-order applicative macros defined in the style that looks like that of ML or (strict) Haskell. We write composable, call-by-value--like macros without resorting to the continuation-passing-style and thus requiring no macro-level lambda. The syntax remains direct-style, with nested applications.

Syntax-rules are difficult to compose because of their evaluation order: the arguments of a macro-invocation are not expanded. That per se does not preclude functional composition since the normal-order lambda-calculus or non-strict languages like Haskell do not evaluate arguments of a function application either. However, lambda-calculus has first-class anonymous abstractions; Haskell also has the case form that forces evaluation of an expression, to the extent needed to choose among the pattern-match clauses. Syntax-rules have none of these compensating features. Generally, a syntax-rule cannot obtain the result of the expansion of its argument expression. The article on Systematic Macro Programming on this page explains the composability problem in detail. So far, the only way out has been to effectively change the evaluation order by writing macros in the continuation-passing style (CPS). However, CPS code is hard to read. Furthermore, building continuations requires the notation for first-class, preferably anonymous syntax-rule abstractions. Although the latter are possible to emulate, the result is stylistically ugly and computationally expensive. Some macro expanders take the shocking amount of time and memory to expand CPS macros with anonymous abstractions.

This project was inspired by the question posed by Dan Friedman in March 2009:

Write the macro permute that takes any number of arguments and returns the list of their permutations:
(permute a b c) ==> ((a b c) (b a c) (b c a) (a c b) (c a b) (c b a))
The order of the entries in the list is immaterial. One should write permute without resorting to CPS.
Our answer is the transliteration of the standard Haskell code implementing the straightforward algorithm for all permutations:
     perm :: [a] -> [[a]]
     perm [] = [[]]
     perm (h:t) = concatMap (ins h) (perm t)
     ins :: a -> [a] -> [[a]]
     ins x [] = [[x]]
     ins x (h:t) = (x:h:t) : map (h:) (ins x t)
We shall see that our code looks pretty much like the above, only with more parentheses.

Our macros are written in a CK style. We distinguish values, which are always quoted like '(1 2 3), from general applicative expressions such as (c-append '(1) (c-cons '2 '(3 4))), which may contain nested applications. Values are regarded as results of the CK evaluation. Here is the first example, of the CK-style macro cons:

     (define-syntax c-cons
       (syntax-rules (quote)
         ((c-cons s 'h 't) (ck s '(h . t)))))
A CK-macro always receives the extra, first argument, typically called s. The macro should never look at s, passing it to the macro ck described below. All arguments of a CK-macro except for s are always values; the macro is free to pattern-match on them. A CK-macro always expands into the call to the macro ck, passing it the s argument followed by the produced value or by the expression that will produce the resulting value. The macro c-cons produces a value, which is therefore quoted.

We now demonstrate recursion and functional composition, or nested application. We define a macro c-append, using the just defined c-cons.

     (define-syntax c-append
       (syntax-rules (quote)
         ((c-append s '() 'l2)       (ck s 'l2))
         ((c-append s '(h . t) 'l2)  (ck s (c-cons 'h (c-append 't 'l2))))
The first clause yields a value, which is quoted. The second clause yields an expression, with the nested application. The code does look like the idiomatic Haskell code
     append [] l2    = l2
     append (h:t) l2 = h : (append t l2)

At the heart of the CK style is the CK abstract machine, which focuses and re-focuses an applicative expression, relying on user-defined CK-macros for (primitive) reductions. The machine is implemented as a syntax-rule ck. It operates on the stack (the first argument to all CK macros) built out of the frames (op va ... [] ea ...), represented as ((op va ...) ea ...) in the code below. Here op is the name of a CK-macro to do the reduction; zero or more va must all be values; zero or more ea are arbitrary expressions.

     (define-syntax ck
       (syntax-rules (quote)
         ((ck () 'v) v)                      ; yield the value on empty stack
         ((ck (((op ...) ea ...) . s) 'v)    ; re-focus on the other argument, ea
           (ck s "arg" (op ... 'v) ea ...))
         ((ck s "arg" (op va ...))           ; all arguments are evaluated,
           (op s va ...))                    ; do the redex
         ((ck s "arg" (op ...) 'v ea1 ...)   ; optimization when the first ea
           (ck s "arg" (op ... 'v) ea1 ...)) ; was already a value
         ((ck s "arg" (op ...) ea ea1 ...)   ; focus on ea, to evaluate it
           (ck (((op ...) ea1 ...) . s) ea))
         ((ck s (op ea ...))                 ; Focus: handle an application;
           (ck s "arg" (op) ea ...))         ; check if args are values

We get the ball rolling by expanding (ck () exp) to evaluate the CK-expression exp on the empty initial stack. The Scheme expression (ck () (c-append '(1 2 3) '(4 5))) will hopefully macro-expands to (1 2 3 4 5), which the Scheme expression evaluator will try to evaluate, reporting the error since 1 is not a procedure. To see the result of just the macro-expansion, without any further evaluations, we should quote it:

     (define-syntax c-quote
       (syntax-rules (quote)
         ((c-quote s 'x) (ck s ''x))))
We run our first applicative macro obtaining the result shown in the comment underneath.
     (ck () (c-quote (c-append '(1 2 3) '(4 5))))
     ; ==> (1 2 3 4 5)

A higher-order map macro takes as the first, non-s, argument a closure to apply to each element of the list given in the second argument.

     (define-syntax c-map
       (syntax-rules (quote)
         ((c-map s 'f '())            (ck s '()))
         ((c-map s '(f ...) '(h . t)) (ck s (c-cons (f ... 'h) (c-map '(f ...) 't))))
The test below prepends 10 to each element of the argument list.
     (ck () (c-quote (c-map '(c-cons '10) '((1) (2) (3) (4)))))
     ; ==> ((10 1) (10 2) (10 3) (10 4))
The similar concatMap concatenates the result of the mapping:
     (define-syntax c-concatMap
       (syntax-rules (quote)
         ((c-concatMap s 'f '())
           (ck s '()))
         ((c-concatMap s '(f ...) '(h . t)) 
           (ck s (c-append (f ... 'h) (c-concatMap '(f ...) 't))))
     (ck () (c-quote (c-concatMap '(c-cons '10) '((1) (2) (3) (4)))))
     ; ==> (10 1 10 2 10 3 10 4)

We have defined enough of the syntax-rule list processing library to solve Dan Friedman's problem:

     (define-syntax c-perm
       (syntax-rules (quote)
         ((c-perm s '())      (ck s '(())))
         ((c-perm s '(h . t)) (ck s (c-concatMap '(c-ins 'h) (c-perm 't))))))
     (define-syntax c-ins
       (syntax-rules (quote)
         ((c-ins s 'x '())
           (ck s '((x))))
         ((c-ins s 'x '(h . t)) 
           (ck s (c-cons '(x h . t) (c-map '(c-cons 'h) (c-ins 'x 't)))))))
     ; The following macro is a syntactic sugar to invoke c-perm
     (define-syntax perm
       (syntax-rules ()
         ((perm . args) (ck () (c-quote (c-perm 'args))))))
We indeed compute all permutations of a given list:
     (perm 1 2 3)
     ; ==> ((1 2 3) (2 1 3) (2 3 1) (1 3 2) (3 1 2) (3 2 1))
The current version is 1.1, April 2011; original version: March 2009.

CK.scm [9K]
The complete R5RS code for the example.
The code includes further examples, the indispensable factorial and a more practical example of deleting an element from an associative list. The latter two examples are from the CPS Macros-talk. It is instructive to compare the direct-style applicative macros here with the CPS versions in that 2002 talk.

Matthias Felleisen and Daniel P. Friedman: Control operators, the SECD machine, and the lambda-calculus.
In Martin Wirsing, editor, Formal Description of Programming Concepts III, pages 193-217. Elsevier, Amsterdam, 1986.
That paper introduced the CK machine.

Composable macros in the Continuation-passing style


A simple linear pattern matcher

The macro match-case-simple is a simple pattern-matcher for linear patterns. It is small and efficient. It was developed back in 2005 for the miniKanren translation of the leanTAP theorem prover, to transform a first-order logic formula to the Negation Normal Form. It should work on any R5RS and R6RS Scheme system. The macro is a much simpler version of Bigloo's very advanced match-case special form, which is a built-in rather than a macro.

To give the flavor of the pattern-matcher, we write a function to describe values:

     (define (test-match x)
       (match-case-simple x
         (,x (number? x) "number")
         (() ()          "nil")
         (#t ()          "bool")
         (#f ()          "bool")
         ((,x . ,y) ()     
           (string-append "pair of " (test-match x) " and " (test-match y)))
         (__ ()           "something else")))

Formally, match-case-simple has the following syntax and semantics.

     (match-case-simple exp <clause> ...[<else-clause>])
     <clause> ::= (<pattern> <guard> exp ...)
     <else-clause> ::= (else exp ...)
     <guard> ::= boolean exp | ()
     <pattern> :: =
            ,var  -- matches always and binds the var
                     pattern must be linear! No check is done
             __   -- matches always
            'exp  -- comparison with exp (using equal?)
            exp   -- comparison with exp (using equal?)
            (<pattern1> <pattern2> ...) -- matches the list of patterns
            (<pattern1> . <pattern2>)  -- ditto
            ()    -- matches the empty list

As a more interesting example, we show a meta-circular interpreter for a small subset of Scheme:

      (define (int code env)
       (match-case-simple code
         (('quote ,x) () x)
         ((let ((,x ,e)) ,body) (symbol? x)
           (let ((xv (int e env)))
             (int body (cons (cons x xv) env))))
         ((lambda () ,body) ()                 ; thunk
           (lambda () (int body env)))         ; closed over the env
         ((lambda ,argl ,body) (symbol? argl)  ; arglist
           (lambda arglv
             (int body (cons (cons argl arglv) env))))
         ((lambda (,x) ,body) (symbol? x)      ; 1-arg function
           (lambda (xv)
             (int body (cons (cons x xv) env))))
            ; the general case of lambda is skipped to keep the example small
         ((,op . ,args) ()
           (let* ((opv (int op env))
                  (argvs (map (lambda (c) (int c env)) args)))
             (apply opv argvs)))
         (,x (symbol? x) (lookup x env))
         (,x () x)                           ; probably number, string, etc.
The current version is 1.2, July 2013; original version: 2002.

match-case-simple.scm [5K]
The complete R5RS/R6RS code for the example.
The file contains the complete code for the meta-circular interpreter and several tests.
In its present form the macro appeared in the miniKanren translation of the leanTAP theorem prover, written in August 2005. It was based on a match-tree, used in SSAX/examples/pull-punct-sxml.scm example in the SSAX SourceForge repository, whose first version was committed in 2002.


Two pitfalls in programming nested R5RS macros

This article is a follow-up to Al Petrofsky's detailed guide on writing advanced syntax-rule macros. This present article discusses two subtle pitfalls that may arise in macros with nested syntax-rules, i.e., macros whose expansion contains definitions of other macros. The first pitfall is about lexical scoping and shadowing: pattern variables do not shadow. The second pitfall deals with passing parameters to a submacro.

If you are mindful of these pitfalls and use CPS/tail-calls, you will find that -- surprisingly -- writing R5RS macros looks a lot like writing regular tail-recursive Scheme procedures.

The current version is 1.0, Mar 5, 2002.

The USENET article with more examples and implementation [plain text file]
The article was posted as Re: An Advanced Syntax-Rules Primer for the Mildly Insane on a newsgroup comp.lang.scheme on Tue, 5 Mar 2002 13:46:54 -0800

Al Petrofsky. An Advanced Syntax-Rules Primer for the Mildly Insane
A remarkably detailed and insightful explanation of R5RS macro expansions. The article was posted on a newsgroup comp.lang.scheme on 2002-03-03 17:03:50 PST


Syntax-rule-level ??!lambda and ??!apply

A R5RS macro-lambda is a notation for a first-class parameterized future macro-expansion action

     (??!lambda (bound-var ...) body)
Here bound-var is an identifier, bodyis an expression that may contain forms (??! bound-var), and ??!lambda is just a symbol. It must be stressed that ??!lambda is not a syntactic or bound variable. Although the ??!lambda form may look like a macro-application, it is not. This is a critical distinction: Hilsdale and Friedman looked for a macro-level lambda as a macro, but did not succeed. Our macro-level abstractions are not macros themselves. The ??!lambda form is first class: it can be passed to and returned from R5RS macros, and can be nested.

The ??!lambda-form is interpreted by a macro ??!apply. To be more precise, we specify that the following macro-application

     (??!apply (??!lambda (bound-var ...) body) arg ...)
expands to body, with all non-shadowed instances of (??! bound-var) hygienically replaced by arg. In Scheme, question and exclamation marks are considered ordinary characters. Hence ??!lambda, ??!apply and ??! are ordinary symbols -- albeit oddly looking, on purpose. A ??!lambda form can have one or several bound variables; the corresponding ??!apply form should have just as many arguments.
The current version is 3.0, Apr 23, 2002.

macro-lambda.scm [6K]
An implementation of the R5RS macro ??!apply that satisfies the above specification.

Systematic Macro programming

Erik Hilsdale and Daniel P. Friedman. Writing macros in continuation-passing style
Scheme and Functional Programming 2000. September 2000.


Systematic Macro programming

Embedding a domain-specific notation or otherwise syntactically extending a core language often require sophisticated macros. A complex artifact is easier to develop if it can be composed of smaller, separately designed and tested parts. Macro systems such as R5RS Scheme macros and Camlp4 quotations are head-first, call-by-name or normal order re-writing systems. Such systems are in general non-compositional. The familiar idioms of a function call and a functional composition -- let alone higher-level combinators such as fold -- do not easily apply. Therefore, macros tend to be monolithic, highly recursive, ad hoc, and requiring a revelation to write, and the equal revelation to understand. As Section 7.3 of R5RS demonstrates, we have to resort to contortions and hacks even in simple cases.

However, macros written in a continuation-passing style (CPS) always compose. Making CPS macros practical still requires an encoding for macro-continuations. We have found this missing piece, with an insight for anonymous macro-level abstractions. The abstraction-passing technique is general and will apply to Camlp4, Tcl and other head-first code-generation systems.

In the specific case of R5RS macros, we present a stronger result: systematic developing of R5RS macros by a translation from the corresponding Scheme procedures. This is an unexpected result as typically R5RS macros look nothing like regular Scheme procedures.

A number of examples below demonstrate how to construct macros that are always compositional. In some circumstances, we can compose with "legacy" macros written without following our methodology. As a side effect, the technique makes R5RS macros functional, restores the modularity principle, and even makes possible higher-order combinators.

Several other examples elaborate the translation from Scheme procedures to R5RS macros, via the CPS transformation. A Scheme-to-syntax-rules compiler does such a translation mechanically.

The current version is 1.3, Apr 7, 2002.

Macros that Compose: Systematic Macro Programming
CPS-Macros.ps.gz [58K]
Macros-talk.pdf [102K]
The paper is published in LNCS volume 2487 with the Copyright held by Springer-Verlag. The file CPS-Macros.ps.gz is a local copy of that copyrighted paper. The file Macros-talk.pdf is a transcript of the talk presented on October 7, 2002 at GPCE 2002. The talk has a better running example. Full paper reference: In: Don Batory, Charles Consel, Walid Taha (Eds.), Generative Programming and Component Engineering: ACM SIGPLAN/SIGSOFT Conference, GPCE 2002. Pittsburgh, PA, USA, October 6-8, 2002. Lecture Notes in Computer Science, volume 2487, pp. 202-217. Springer-Verlag, October 2002.

The USENET article Transparent macro-CPS programming [plain text file]
posted on a newsgroup comp.lang.scheme on Fri, 30 Nov 2001 17:24:21 -0800
The article demonstrates several examples of developing complex macros by composition. One of the examples is a compile-time implementation of the factorial over Peano-Church numerals. The resulting ?plc-fact macro is truly modular: it is built from separately defined and tested arithmetic (?plc-pred, ?plc-mul) and comparison macros.

The USENET article Scheme procedure -> R5RS macro, by example of quotify [plain text file]
posted on a newsgroup comp.lang.scheme on Wed, 6 Feb 2002 16:21:50 -0800
Another example of the systematical construction of R5RS macros. The article uses the systematic approach in solving a practical problem, posed on comp.lang.scheme. We start by implementing the algorithm of the required S-expression transformation as a regular Scheme procedure. When the algorithm is tested, we mechanically convert the Scheme procedure into a R5RS macro. In the article, we do the conversion by hand for clarity. The mechanical translation is one of the test cases of the Scheme-to-syntax-rules compiler.

quotify.scm [10K]
The code for the above article. It is tested under Bigloo 2.4b and Scheme48.

A Scheme -to- syntax-rules compiler

Lambda-calculator as a R5RS macro
A more challenging R5RS macro: the normal-order evaluator for untyped lambda calculus. The challenge comes from capture-free substitutions and repeated identifications of left-most redexes. The evaluator is truly modular: it relies on a separately designed and tested one-step beta-substitutor, and on a higher-order macro-combinator map. Our technique of CPS and macro-level abstractions guarantees composability of all pieces.

Syntax-rule-level ??!lambda and ??!apply

Erik Hilsdale and Daniel P. Friedman. Writing macros in continuation-passing style
Scheme and Functional Programming 2000. September 2000. The present article suggests an improvement: a design for an anonymous macro abstraction.

Applicative syntax-rules: macros that compose better


Lambda-calculator as a R5RS macro

This article presents compile-time (meta-) lambda-calculators: the normal-order evaluators for untyped lambda calculus. The calculators normalize a term in a pure untyped lambda calculus. The result can be compiled and evaluated in a call-by-value lambda-calculus with constants (i.e., Scheme). The lambda-calculators act as partial evaluators.

Therefore, the code has three kinds of lambdas: of the source language, of the transformer meta-language, and of the target language. Each is handled by its own evaluator, yet they look similar. In fact, the former and the latter lambda forms are identical. In all the cases, the lambda forms play the same role: a denotation for a parameterized future evaluation. The evaluation occurs in different ways, yet the idea of delaying an action and passing its promise as a first-class object is universal.

CPS-style: December 2001 version

This calculator is implemented in CPS. It is an example of a challenging R5RS macro, which was designed modularly and systematically with the help of CPS, macro-lambda and higher-order combinators. The challenge comes from capture-free substitutions and repeated identifications of left-most redexes.

Direct-style: December 2004 version

The December 2004 version implements the normal-order beta-normalizer in direct style. This version is a single, short, fast, stand-alone macro. It does no alpha-renaming directly and no substitutions directly. Everything is integrated with normalization, and everything is delayed until the latest possible moment. The only recursion here is via NORM itself.

This version of the calculator follows the calculus of explicit substitutions, and is very similar to bottom-up parsing. The calculator normalizes a term given a term stack and an environment stack, both of which are initially empty. With two stacks, we have two shifts and two reductions. Applications shift onto the term stack. Redexes reduce the term stack and shift onto the environment stack. Variables reduce from the env stack.

That short, stand-alone, syntax-rule macro clearly shows that lambda-calculus is indeed very simple, on the surface.

The current version is Dec 2004.
The calculator should run on any R5RS Scheme system.

Normal-order direct-style beta-evaluator with syntax-rules, and the repeated applications of call/cc

callcc-fixpoint.txt [plain text file]
The appendix of that article gives the complete code of the normalizer and shows its many tests and examples.

The USENET article posted as Re-writing abstractions, or Lambda: the ultimate pattern macro [plain text file]
on newsgroups comp.lang.functional and comp.lang.scheme on Sun, 16 Dec 2001 13:14:02 -0800
The article, besides the extensive discussion, contains the source code for the meta-language lambda-evaluator along with regression tests.

Systematic Macro programming

Lambda Calculus and Lambda Calculators


A Scheme -to- syntax-rules compiler

A more systematic way of writing syntax-rule macros is a mechanical transformation of the corresponding Scheme procedures into macros. Scheme functions are far easier to develop and debug. After we are sure our Scheme code works, we submit our procedures to a Scheme--to--syntax-rules compiler -- and use but don't look at the result.

The Scheme -to- syntax-rules compiler mechanically converts (desugared) pure-functional Scheme procedures to the corresponding CPS syntax-rule macros. Currently the compilation to syntax-rules is hardly optimized. The result is very slow. In general, the implementation of syntax-rules on Bigloo and many other systems takes a lot of time and memory. For that reason, in Bigloo it is off by default: We need to specify a special flag -hygien to activate syntax-rule macros on Bigloo. It seems no one has bothered to optimize the macro expander, because no one asked for that.

We must emphasize however the speed of writing macros in the systematic way. We do all the development and testing with regular Scheme procedures. Once we have written the correct procedures, the translated syntax-rule macros just work.

The current version is 2.2, Jul 15, 2002.

cps-macro-conv.scm [25K]
The source code for the compiler and several sample programs. The code extensively relies on a match-case form provided by a Bigloo system.

Systematic Macro programming


A stress test of the syntax-rule macro-expander

Performance of the syntax-rule macro-expander across different Scheme systems can vary by as much as three orders of magnitude. This conclusion was the result of timing a syntax-rule stress test below. When evaluated, the test prints a symbol 'prime, which means that the number 5 is prime. The primality check itself is done at the macro-expand time. The algorithm is a pure-functional implementation of the sieve of Eratosthenes (see the macro ?sieve). The test is a "side effect" of an exercise of translating Scheme procedures into syntax-rules by the Scheme -to- syntax-rule compiler.

Try loading the file below in your favorite Scheme system. The results may vary by a few orders of magnitude: from 0.9 seconds (the latest Chez Scheme) to 15 minutes, on a comparable capable hardware.

R. Kent Dybvig has very kindly sent me the results for Chez Scheme running on a 1.2MHz Athlon CPU, 512MB RAM, Linux. The latest version of Chez Scheme, version 6.9a, with a built-in macro-expander finished the stress test in 0.920 seconds (user time). Both a commercial Chez Scheme and a free Petite Chez Scheme, version 6.9, finished the test in 5.6 seconds user time incurring 400-500 page faults. Both tests used a built-in macro-expander, compiled at optimize-level 3. With a portable macro-expander, Chez Scheme v6.9 finishes the test in 9.7 user seconds whereas the Petite Chez Scheme needs 62.6 user seconds. The slowdown in the last case is due to the fact the expander itself is interpreted, while in the other cases it is compiled. Local macros, i.e., those introduced by let-syntax and letrec-syntax, are always interpreted. Top-level macros are compiled in Chez Scheme and interpreted in Petite Chez Scheme. Even an old version of Petite Chez Scheme, 6.0, with a built-in expander, finished the test in 30 seconds (albeit taking around 70 MB of memory). For comparison, on a twice as capable platform (2GHz Pentium 4, 1GB RAM, FreeBSD), SCM 5d6 finishes the test in 5.2 seconds user time with the resident size of 2MB, Bigloo 2.4b takes 36.3 seconds with the resident size of 34 MB, and Scheme 48 v0.57 takes 652.9 seconds given the heap size of 100,000,000 words.

We must caution that the speedups in the latest version of Chez Scheme are of course specific to the stress test. The differences in performance for less stressful uses of macros are less notable. It is safe however to make one conclusion: the mechanical derivation of syntax-rule macros from the corresponding Scheme functions is viable. Although the resulting macros may be slow to expand on some platforms, they are not inherently slow. Carefully designed and optimized macro-expanders have little trouble with the syntax-rule-compiled macros.

The current version is 1.1, Sep 4, 2002.

syntax-rule-stress-test.scm [7K]
The source code for the stress test, to be loaded or fed into a Scheme system.

The USENET article Scheme->syntax-rules [Was: hygienic macro primality tester] [plain text file]
posted on a newsgroup comp.lang.scheme on Mon, 15 Jul 2002 20:25:53 -0700
The article explains a non-pure and pure functional implementations of the sieve of Eratosthenes in Scheme. The article converts the pure one to syntax-rules, using the Scheme -to- syntax-rules compiler.

A Scheme -to- syntax-rules compiler

www.scheme.comThe home page of Cadence Research Systems and Chez Scheme. The free version of Chez Scheme (Petite Chez Scheme) is available from that site.


How to Write Seemingly Unhygienic and Referentially Opaque Macros with Syntax-rules

This paper details how folklore notions of hygiene and referential transparency of R5RS macros are defeated by a systematic attack. We demonstrate syntax-rules that seem to capture user identifiers and allow their own identifiers to be captured by the closest lexical bindings. In other words, we have written R5RS macros that accomplish what commonly believed to be impossible. We build on the fundamental technique by Petrofsky of extracting variables from arguments of a macro. The present paper generalizes Petrofsky's idea to attack referential transparency.

This paper also shows how to shadow the lambda form. The shadowed lambda acts as if it was infected by a virus, which propagates through the lambda's body infecting other lambdas in turn. The virus re-defines the macro being camouflaged after each binding. This redefinition is the key insight in achieving the overall referential opaqueness. Although we eventually subvert all binding forms, we preserve the semantics of Scheme as given in R5RS.

The novel result of this paper is a demonstration that although R5RS macros are deliberately restricted in expressiveness, they still wield surprising power. We have exposed faults and the lack of precision in commonly held informal assertions about syntax-rule macros, and pointed out the need for proper formalization. For a practical programmer this paper offers an encouragement: more and more powerful R5RS macros turn out to be possible.

The current version is April 2003.

Dirty-Macros.pdf [223K]
An extended version of a paper presented at the Third Workshop on Scheme and Functional Programming, sponsored by ACM SIGPLAN. October 3, 2002, Pittsburgh, Pennsylvania, USA.

Dirty-Macros-talk.pdf [104K]
A presentation at the Workshop. October 3, 2002. Pittsburgh, PA.


setf -- a polymorphic, generic setter -- as a simple Scheme macro

An article discussing how a LISP-like setf special form can be used and implemented in Scheme. Examples of setf!:
     > (define al '((1 "one") (2 "two")))
     > (assv 1 al)
     (1 "one")
     > (setf! (cadr (assv 1 al)) "-one-")
     > (assv 1 al)
     (1 "-one-")
     > (tree-ref tree #f #f)
     > (setf!
          (tree-ref tree #f #f) 'x)
     > (tree-ref tree #f #f)
The article shows more examples as well as the complete implementation of setf! as a simple Scheme macro.
The current version is 1.0, Jul 7, 1998.

The USENET article with more examples and implementation [plain text file]
The article was posted as setf -- a polymorphic, generic setter -- as a simple Scheme macro on a newsgroup comp.lang.scheme on Tue, 07 Jul 1998 16:18:18 GMT

How to map set!


A dark, under-specified corner of R5RS macros

The behavior of syntax-rules -- in particular, the renaming of identifiers that will appear at the top level -- is not completely specified in R5RS. Different implementations of macro-expanders vary widely in this respect.

The following is an illustrative test. It is a syntax-rule for letrec (taken verbatim from Section 7.3 of R5RS) with only one change: the fragment

     (let ((var1 <undefined>) ...)
              (let ((temp1 init1) ...)
                (set! var1 temp1)
                body ...)
is replaced by the "equivalent" code:
       (define var1 #f) ...
       (define temp1 init1) ...
       (set! var1 temp1) ...
       body ...))
We also changed the name to letrec1 to avoid confusion with the standard letrec. If we use letrec1 at the top level and overlook the pollution of the global namespace, we may expect that letrec1 is semantically equivalent to letrec.

Is it? To test this claim, we evaluate the following expression:

     (letrec1 ((x 1) (y 2)) (display (+ x y)) (newline))
It turns out, this expression may print 3, print 4, or raise an unbound variable error. You might wish to load the file referenced below into your Scheme system. The existing Scheme systems almost evenly fall into three categories regarding this test.
Result 3 is printed by:
Scheme48, Chez Scheme, MIT Scheme 7.7.1, Chicken
Result 4 is printed by:
PLT Scheme 202, SCM (native macro-expander), Gauche Scheme, Gambit (with Hieb and Dybvig implementation of syntax-rules/syntax-case), Al Petrofsky's portable macro-expander
Unbound variable error is raised by:
SCM with Will Clinger's expander (aka Macros that work), Petite Chez Scheme 6.0a, Bigloo 2.4b

Furthermore, Richard Kelsey suggested on the discussion thread that such a 'non-deterministic' behavior is innate, and a consequence of an R5RS-blessed uncertainty whether top-level definitions are binding occurrences. We should point out however that SCM gives two different answers depending on the macro-expander. It appears what matters is not how a Scheme system treats top-level defines but how a macro-expander thinks the underlying system treats top-level defines.

The top level of Scheme is indeed under-specified and treacherous. On the bright side, we can now trace the lineage of macro-expanders in various implementations of Scheme.

The current version is 1.1, Dec 19, 2002.

syntax-rule-dark-corner.scm [3K]
The source code for the test.

Google discussion thread
The thread shows where the puzzle came from.


quote -- as a macro

How fundamental is quote? Can quote truly be implemented as a macro, with no cheating? The following article gives the affirmative answer. We define a low-level macro my-quote that acts exactly like quote but does not itself use quote. Nor any other special form sans the raw or sugared lambda (i.e., let, cond). The macro relies on a R5RS function string->symbol, which is an ordinary Scheme function.

The article thus demonstrates that quote is not a primitive construct. An implementor may therefore leave out quote from the core of his Scheme system.

The macro my-quote acts precisely as the native Scheme quote:
     (append (my-quote (1 2 3)) (my-quote ()) (my-quote (4))) ;=> (1 2 3 4)
     (symbol? (my-quote abcdef)) ; => #t
The current version is 1.0, May 9, 2001.
The USENET article with the implementation and more examples [plain text file]
The article was posted as Quote as a macro on a newsgroup comp.lang.scheme on Thu, 10 May 2001 01:00:27 +0000

How to map set!

The following article describes how to overload set! so we can do
     (set! (if (< x y) x y) 3.1415)
We can even evaluate
     (map (lambda (var val) (set! var val))
       (list x y z) (list 1 2 3))
to change the values associated with the variables x, y and z. We can still use set! to mutate ordinary bindings as in (set! u (+ x y)).
The current version is 1.0, Jun 23, 2002.
The USENET article with more examples and implementation [plain text file]
The article was posted as How to map a set! [Was: How to play with the first arg of set! ?] on a newsgroup comp.lang.scheme on Sun Jun 23 19:03:25 2002

Macros with keyword (labeled) arguments

Labeled (aka keyword) arguments are present in many languages, e.g., Tcl/Tk, Ocaml, Perl -- and arguably contribute to their expressiveness. Some Scheme systems implement DSSSL extensions, which let us pass arguments to procedures as keyword-value pairs in arbitrary order. It becomes clear which argument receives which value. The programmer does not have to memorize the precise order of arguments to a procedure. He can even omit some arguments if the defaults are appropriate.

Alas we still have to use positional arguments for macros. Even if the DSSSL extensions are available, they do not apply to special forms. This article introduces portable, R5RS-compliant keyword (labeled) arguments for macros and procedures. The order of label-value pairs is arbitrary; arguments for which defaults exist may be omitted; required labeled arguments must be given. We implement labeled arguments via a meta macro. It takes the name of an ordinary (positional-argument) macro and the description of its arguments and defines a new macro, which accepts labeled arguments. A label is anything that can be used for identification: in the simplest case, a label is an identifier. However, strings, numbers, characters, and DSSSL keywords (if available), can all be used as labels. The article shows many examples. If we label an argument by a string, that string becomes that argument's documentation. Our approach can also be used for regular procedures. So we can enjoy keyword arguments for (first-order) functions even on those systems that don't support DSSSL.

Here are a few more interesting examples. The first one makes an ordinary procedure accept keyword arguments, with keywords being strings. The keyword resolution is done at macro-expand time.

     (gen:define-labeled-arg-macro llookup
            ; the following are the descriptors for lookup's three arguments
         ("the predicate" eq?) ; default value is eq?
         ("key")               ; no defaults: the argument is required
         ("the associative list") ; ditto

Applications become self-documenting:

       "the associative list" '((a 1) (b 2))
       "key" 'b)
     ; ==> (b 2)

Any datum may be a label, even characters and booleans. It seems especially fitting to use booleans to mark branches on the standard if special form:

     (gen:define-labeled-arg-macro lif ; keyword-arg variant of `if'
       (if         ; the positional special form: the ordinary `if'
         (#\?)     ; Character '? marks the test expression
         (#t)      ; then-branch, required
         (#f 17)   ; else-branch, optional
     (let ((x 1)) (lif #t "OK #\? (positive? x) " #f (/ x 0)))
     ; ==> "OK"
The current version is 1.0, May 2, 2004.

The USENET article with more examples and the self-contained implementation [plain text file]
All the code is R5RS compliant and has been tested on Petite Chez Scheme 6.0a, scheme48, and SCM.
The article was posted as Macros and procedures with labeled arguments on a newsgroup comp.lang.scheme on Sun, 2 May 2004 16:14:04 -0700

Functional XML parsing framework
The keyword meta-macro ssax:define-labeled-arg-macro appears in the main module of the SSAX XML parser: SSAX.scm, version 5.0. That file shows an ``industrial'' example of a macro with labeled arguments, ssax:make-parser. The parser is R5RS compliant and runs on many Scheme platforms. An interested reader might find in SSAX.scm a few other curious R5RS macros.

Implementing keyword and default arguments, function's resource database


Turing-completeness of syntax-rules and hygiene

It has been claimed that Turing-completeness of syntax-rules makes the hygiene of macros unenforceable. One may think that the power of Turing-completeness would let us implement a non-hygienic macro-expander atop syntax-rules. Therefore, when we see a macro invocation (foo expr) we allegedly cannot be sure if the expansion satisfies the hygiene condition.

We formally prove that the above conclusion is unconsequential: even if we allow our code to be processed by an unknown macro expressed in a Turing-complete system, not all bets are off. It depends on the macro system.

Indeed, let us consider a strict subset of the R5RS macro system, such that define-syntax, let-syntax, and letrec-syntax may not appear in the template of any syntax-rule. The set of all syntactic transformers is therefore known before the macro-expansion starts. Such a macro system preserves the hygiene condition for macro expansions, defined in Kohlbecker, E.E., Friedman, D.P., Felleisen, M., Duba, B. 'Hygienic macro expansion' (1986) as: "Generated identifiers that become binding instances in the completely expanded program must only bind variables that are generated at the same transcription step." That paper formally proves the condition for the macro system that is equivalent to ours. And yet, such a macro system is Turing complete: we can write a Turing machine with the restricted R5RS macros. The code below demonstrates this constructive proof.

Thus even if we wrap our code in an unknown macro expressed in a Turing-complete system, we may still be certain that the hygiene condition for macro expansions is preserved.

The current version is 1.1, Sep 16, 2003.

turing-completeness-limited-macros.scm [5K]
The source code and the tests.
The finite control unit of the Turing machine is a set of simple syntax-rules. The tape is realized as a pair of lists: one list describes the portion of the tape from the beginning till the current point. The other list represents the tape from the current point onward. See the comments in the code for more details.
As an example, the code implements and tests two particular Turing machines: the addition of natural numbers (in unary) and a decision machine. The latter decides if the number of tokens on the tape between two delimiting blanks is even or odd.

The code has been described in a USENET article Turing-completeness of simple syntax-rules posted on a newsgroup comp.lang.scheme on Mon, 15 Sep 2003 22:14:40 -0700.


How to write symbol? with syntax-rules

We show a syntax-rule macro-predicate that can be applied to any expression and can soundly and reliably determine if that expression is a symbol/literal -- rather than a pair, a vector, a literal string, a literal number, etc. The mere existence of such a predicate is surprising and counter-intuitive. For completeness and reference, we also show two macro-identity predicates on identifiers: id-eq?? and id-eqv??.

Syntax-rules can determine the literal type of some arguments: a pair, a vector, an empty list, a boolean. We can also write a syntax-rule that tests if its argument is the specific number, the specific character, the specific string. A syntax-rule cannot determine if its argument is a string or an integer.

For a long time I used to think that we cannot write a syntax-rule that tests if its argument is a symbol. In particular, I wanted a macro-expand-time equivalent of the library function symbol?. We will call that macro symbol?? to avoid the confusion with the library function. That macro should take three arguments: any syntactic form plus two continuations, kt and kf. The macro symbol?? will expand to kt if and only if its first argument is a symbol. The macro would expand to kf if its argument is anything other than a symbol. I believed such a macro symbol?? is impossible with syntax-rules -- until I wrote it in August of 2003.

     (define-syntax symbol??
       (syntax-rules ()
         ((symbol?? (x . y) kt kf) kf)       ; It's a pair, not a symbol
         ((symbol?? #(x ...) kt kf) kf)      ; It's a vector, not a symbol
         ((symbol?? maybe-symbol kt kf)
                (syntax-rules ()
                  ((test maybe-symbol t f) t)
                  ((test x t f) f))))
             (test abracadabra kt kf)))))

The macro is based on the observation that if a form F is an identifier and a syntax-rule pattern P is anything but a literal identifier, then P can match F if and only if P is an identifier (symbol). The final form of this macro incorporates an improvement suggested by Al Petrofsky in a USENET discussion thread.

The current version is 1.2, Oct 1, 2003.

How to write symbol? with syntax-rules [plain text file]
The article with a detailed explanation, comparisons with defmacro and syntax-case, the source code, and test cases.
The article was posted on a newsgroup comp.lang.scheme on Tue, 30 Sep 2003 12:03:44 -0700.

An assert macro with informative reports on failure
Thanks to the syntax-rule symbol??, the good assert form can, after all, be written in R5RS Scheme.

A good assert macro, now as a syntax-rule [plain text file]
The article illustrates the assert macro on several examples and gives the complete implementation. The code includes several general-purpose syntax-rules, such as id-memv??, symbol??, and k!reverse -- as well as syntax-rules to compute a list of all identifiers (without duplicates) that occur in an arbitrary Scheme expression. We rely on light-weight CPS syntax-rule macros throughout.
The article was posted on a newsgroup comp.lang.scheme on Tue, 14 Oct 2003 11:53:09 -0700.


CPS transform and beta-normalization via syntax-rules: Macro-expander as a proof assistant

The macro-expander was instrumental in proving the theorem about repeated applications of call/cc. The proof relied on a rather tedious CPS transformation, which was almost at the limit of human patience. When complex CPS transformations are done by hand, mistakes are easy to make and difficult to find. In order to simplify the proof and increase our confidence in it, we wrote a syntax-rule macro to CPS transform terms. The macro is simple so its correctness is apparent. Alas, the simplicity of the CPS transform leads to many administrative redexes. The resulting terms are difficult to analyze. Therefore, we have implemented another syntax-rule macro to normalize terms and reduce away administrative redexes. The latter macro acts as a lambda-calculator.
Self-application as the fixpoint of call/cc

lambda as a library syntax

Is lambda a fundamental syntactic form? Can we re-write lambda in terms of something else? Can we re-implement the top-level lambda with syntax-rules? This article shows that we can, which gives us an ability to overload the top-level lambda, for tracing or for mischief. Once again, the R5RS macro system has exceeded our expectations.

In the article, we overload lambda to trace all explicit and implicit lambda-applications. The implicit ones arise from the use of let forms. We overload lambda at the top level as follows:

     (define-syntax lambda
       (syntax-rules ()
         ((lambda bindings body1 body2 ...)
          (make-lambda bindings
            (display "Entering lambda: ") (display 'bindings)
            (display " of values ") (display (list . bindings))
            (display "Happy Holidays!") (newline)
            (begin body1 body2 ...)))))

The article falls within the gray area of R5RS. R5RS does not promise that our lambda-overloading will work on every Scheme system. On the other hand, R5RS does not declare our method to be in error. Following Niels Bohr, the lambda-overloading technique may then be called `non-trivial'. The technique has been tested on Scheme48 and Bigloo 2.4b. With some changes the examples work on Petite Chez Scheme as well. In a follow-up message, Matthias Radestock explained the reasons for the behavior of Chez Scheme -- treating top-level let-syntax and letrec-syntax as splicing constructs, similar to the top-level begin. He pointed out that derived-lambda is probably the first example showing that the splicing behavior of let-syntax leads to a loss of expressiveness.

The current version is 1.0, Dec 28, 2003.
The USENET article with more examples and the implementation [plain text file]
The article was posted as lambda as a _library_ syntax on a newsgroup comp.lang.scheme on Sun, 28 Dec 2003 16:05:43 -0800

Run-time macros as a pattern-matching library

This article shows that macros do exist at run-time. Macros -- that is, syntax transformers and syntax objects -- can be stored in variables, passed to ordinary procedures and applied at run-time. We show three examples. The last one uses exclusively syntax-rules and is specifically written not to contradict R5RS in any way. The example runs on Scheme48, Petite Chez Scheme, SCM, and Gauche and demonstrates the pattern-matching machinery of syntax-rules at run time.

For example, evaluating the following expression compiles a transformer and associates the result with the name foo:

     (compile-pattern 'foo (list '(_ (x ...) (y ...)) ''((x y) ...)))
we can later apply foo to an S-expression to obtain the transformed S-expression:
     (display (apply-pattern 'foo (list (list 10 20 30) (list 1 2 3))))
prints the desired result ((10 1) (20 2) (30 3)).
The current version is 1.0, Jan 28, 2004.
The USENET article with more examples [plain text file]
The article was posted as First-class macros: pattern-match at run-time on a newsgroup comp.lang.scheme on Wed, 28 Jan 2004 20:30:54 -0800

portable, R5RS syntax-rule tracer

When developing complex syntax-rules, one may wish to see the results of intermediate expansions. Some Scheme systems may have macro tracing facilities, such as expand1. The following shows a trivial and not fully general, but nevertheless helpful trick that lets us see the intermediate macro-expansion results. The trick should work in any R5RS Scheme:
     (define-syntax mtrace
       (syntax-rules ()
         ((mtrace x)
           (display "Trace: ") (write 'x) (newline)
One may wish to enclose a macro expression or the template part of a syntax-rule into mtrace. The article below shows an example of using mtrace to trace the expansion of a complex recursive macro -- letrec.
The current version is 1.0, Nov 2, 2004.
The USENET article with more examples and implementation [plain text file]
The article was posted as Re: See results of macro expansion on a newsgroup comp.lang.scheme on Tue, 2 Nov 2004 17:15:22 -0800

Macro-expand-time environments and S-expressions as identifiers

We introduce the macro make-env that maintains a macro-expand-time environment. We can bind values in the environment (associate them with keys) -- and then look them up by their keys. The environment is lexically-scoped. We can shadow one binding with another. For example:

      (bind ((a 1))
        (bind ((b 2))
          (list (lookup a) (lookup b)
            (bind ((a 3))
              (list (lookup a) (lookup b)))))))
The example looks and feels as if bind were just let. However, bindings and look-ups occur at macro-expand time. The whole code above expands to this:
     ; ==> (list 1 2 (list 3 2))

We used macro-expand-time environments for an unusual purpose: writing code where identifiers are S-expressions rather than symbols. A program is a closed term. The meaning of a program does not change if all identifiers are alpha-renamed. Therefore, an identifier does not have to be a symbol. It can be anything that can be compared in equality. Al Petrofsky has pointed out that key insight to me. The macro make-env1 below extends this idea to identifiers-as-lists: the macro maintains a syntactic environment in which S-expressions are the keys, and identifiers are the corresponding values. For example, the following is the correct Scheme code

      (lambda ((id a b c))
        (lambda ((id a b)) 
          (list (id a b c) (id a b)
                '(id a b x) '(id a b) 'x))))
Its macro-expansion is equivalent to the following:
     (lambda (temp~1)
       (lambda (temp~2)
         (list temp~1 temp~2
            'abx 'ab 'x)))
A form (lambda ((id a b ...)) body) binds the key (a b c ...) to the value temp~xxx. The value is a temporary, colored identifier generated by syntax-rules macros. The hygiene property makes such identifiers unique. The macro make-env1 re-writes (lambda ((id a b ...)) body) into (lambda (temp~xxx) re-written-body). We may indeed use composite identifiers such as (id a b ...) in the body of lambda. Here id is a synonym for 'look-up'. The expansion of id in the current environment (maintained implicitly by make-env1) replaces (id a b ...) with the corresponding value -- which is the identifier temp~xxx -- the same identifier that was bound by lambda. That was the critical condition that makes everything work.

The goal of representing identifiers by complex S-expressions is that the latter may be manipulated by syntax-rules. We can truly write syntax-rule macros that concatenate identifiers. The code referenced below does exactly that.

The current version is 1.2, June 12, 2002.

Dirty-Macros-talk.pdf [104K]
How to Write Seemingly Unhygienic and Referentially Opaque Macros with Syntax-rules.
A presentation at the Third Workshop on Scheme and Functional Programming. October 3, 2002. Pittsburgh, PA.
The explanation of the macro-expand-time environments starts on page 45.

dirtier-macros.scm [8K]
The source code and the tests.

How to build `identifiers' with syntax-rules
Portable case-sensitive symbols, and the meaning of identifiers
These two links show practical applications of the idea of ``extended'' identifiers: portable case-sensitive symbols and the implementation of the define-structure macro with syntax-rules.


Parameterized syntax: interpreters without run-time interpretive overhead

Writing interpreters is the textbook application of Scheme, as a way to implement domain-specific languages. We may want to write an interpreter even if the source language is essentially (a subset of) Scheme itself: because the evaluation order of the source language differs from that of the host Scheme system, because we want to add tracing, or because we want to interpret a Scheme expression in non-standard ways. Partial evaluation and many program analyses such as data- and control flow can be expressed as non-standard, abstract interpretations.

Interpreters fold over the source language expression represented as a data structure. Therefore, they must, at run time, repeatedly traverse the expression, detect primitive expression forms and dispatch to their processors: `dispatch on syntax'. The dispatch is a source of a so-called `interpretive overhead'. If the only difference between the source language and the host system is the semantics of standard library functions, we can turn a source language expression into a thunk, compile it, and evaluate after re-defining the standard library functions with set!. The source language expression can now run at `full-speed' -- almost. When compiling the source expression, we must tell the compiler not to inline the standard functions; otherwise their re-definition has no effect. Thus the overhead, of indirect standard function calls, remains. This method of overhead reduction does not apply if we want to re-define special forms such as if, lambda or application -- for example, to fix a particular evaluation order.

We describe how to avoid the run-time dispatch on syntax, how to abstract over special forms and interpret an expression using various, non-standard meanings of lambda, application, etc. The code is in R5RS Scheme. The key is performing the dispatch on syntax at macro-expand time. We present the R5RS macro xlate, which takes an expression and the environment. The environment is an associative list with interpretations for built-ins, library functions and special forms, including application. The macro xlate traverses the given source language expression, detects primitive forms and replaces them using the environment as a substitution. The resulting Scheme expression (with re-defined built-ins and special forms) can be evaluated or compiled by the host Scheme system as usual. To avoid the overhead of searching variable environment, we use higher-order abstract syntax.

The first example is the standard interpretation: we interpret a Scheme expression as the host Scheme system would evaluate it. The translation below is the identity.

     (define-syntax standard
       (syntax-rules ()
         ((standard exp)
             ((id  (syntax-rules () ((id x) x)))
              (app (syntax-rules () ((app x . xs) (x . xs)))))
               (("inj" id) ("*" *) ("sub1" (lambda (x) (- x 1))) ("zero?" zero?)
                ("lambda" id) ("app" app) ("seq" begin))
     (standard ((lambda (x) (* (x 2) 3)) (lambda (n) (* n n))))
     ; 12
A trivial non-standard interpretation is counting the number of `constructors' in an expression. Every expression, including lambda-expression, now evaluate to an integer. We no longer treat abstractions as values.
     (define-syntax count-ctr
       (syntax-rules ()
         ((count-ctr exp)
           (let ((ctr1 (lambda (x) 1))
                 (op   (lambda args (apply + 1 args))))
               (("inj" ctr1) ("*" op) ("sub1" op) ("zero" op)
                ("lambda" (lambda (f) (+ 1 (f 0)))) ; a variable counts as 0
                ("app" op) ("seq" op))
     (count-ctr (lambda (x) (* x 2)))
     ; 3    a variable counts as 0, lambda counts as 1
     (count-ctr ((lambda (x) (* (x 2) 3)) (lambda (n) (* n n))))
     ; 8
In the non-standard interpretation, our running example evaluates to 8, because it contains two lambdas, two literal constants, two multiplication operations, and two applications.

Our main example is evaluating a source language expression using call-by-value, call-by-ref, and call-by-name evaluation strategies. We can also do partial evaluation and CPS conversion, as described in the tagless-typed paper.

The current version is 1.2, July 4, 2009. The original version was February 1, 2007.

call-by-any.scm [8K]
The R5RS source code and the tests.

Tagless-Final Style

Parameterizing expressions by the evaluation order

Last updated July 4, 2013

This site's top page is http://okmij.org/ftp/

Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML