The need for portable case-sensitive symbols arose in several real projects. The most natural answer is a conservative lexical extension: a portable notation for case-sensitive symbols. The notation fully preserves the lexical structure of R5RS Scheme and can be used on any R5RS system. Denoted case-sensitive symbols are transcribed into genuine case-sensitive symbols by a portable macro. We consider a low-level macro implementation and a slightly less general syntax-rules implementation. The discussion of the notation and its transcription on the comp.lang.scheme newsgroup has revealed surprisingly deep insights: into portable lexical extensions of Scheme, into treating code as data, into the capabilities of syntax-rules macros, and into the very meaning of identifiers.
This page has been translated into Spanish by Maria Ramos from Webhostinghub.com.
According to R5RS, "Upper and lower case forms of a letter are never distinguished except within character and string constants." There are, however, legitimate applications that greatly benefit from case-sensitive symbols. One such application is an S-expression-based form of XML [SXML]. PLT XML collections, SXML and all other similar projects map tag names to identifiers. Such a representation is highly appropriate, as tag names are not usually mutable but are heavily used in identity comparisons. The need for case-sensitive identifiers in describing semi-structured data as S-expressions was recognized in DSSSL.
A great number of Scheme systems already offer a case-sensitive reader, which often has to be activated through a compiler option or pragma. A web page [Lisovsky] discusses case sensitivity of various Scheme systems in detail.

'"CooL" notation and its transcription

This section describes the notation and its implementation that were first presented in an article [CooL-symbols].
According to R5RS, symbols created by string->symbol, e.g.,

    (string->symbol "ASymbol")

retain their case, while symbols read or entered literally

    (with-input-from-string "ASymbol" read)
    'ASymbol

may get their case changed on many Scheme systems. Therefore, the following expression

    (eq? (string->symbol "ASymbol") 'ASymbol)

evaluates to #f on many Scheme systems, e.g., on SCM (which downcases all literal symbols) and Bigloo (which uppercases them).
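The behaviour described above can be probed on a given system with a few definitions. This is only a minimal sketch using R5RS procedures; the names reader-folds-case?, round-trip-ok? and literal-matches? are illustrative and do not come from SSAX or from the original article.

```scheme
;; Hypothetical helper names; only R5RS procedures are used.

;; #t when the reader folds case, i.e. the literals 'asymbol and 'ASYMBOL
;; denote the same symbol after reading.
(define reader-folds-case? (eq? 'asymbol 'ASYMBOL))

;; string->symbol keeps the spelling it was given, so converting the
;; symbol back with symbol->string yields the original string.
(define round-trip-ok?
  (string=? (symbol->string (string->symbol "ASymbol")) "ASymbol"))

;; On a case-folding system the literal and the constructed symbol differ;
;; on a case-sensitive system they are the same object.
(define literal-matches?
  (eq? (string->symbol "ASymbol") 'ASymbol))
```

On any system round-trip-ok? should be true, while literal-matches? is expected to be the negation of reader-folds-case?.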
The SSAX XML parser [SSAX] relies on string->symbol to turn tag and attribute names into case-sensitive symbols. A test suite for the parser, however, needed a way to enter such case-sensitive symbols literally. Test cases are embedded into the SSAX code, and are always enclosed within a special form run-test:

    (run-test (test1) (test2) ...)

If a user wants to run self-tests, he defines this form as

    (define-macro run-test
      (lambda body
        `(begin (display "\n-->Test\n") ,@body)))

Otherwise, he defines run-test as

    (define-macro run-test (lambda body '(begin #f)))

which effectively switches all the tests off. This fortuitous circumstance suggested that run-test can do a bit more than just expand into a begin form. The run-test form can enable truly portable and truly concise case-sensitive symbols.

We introduce a notation '"ASymbol" -- a quoted string -- to stand for a case-sensitive ASymbol. This notation is valid only within the body of a run-test or similar form.
The notation is implemented by scanning the run-test's body and replacing every occurrence of (quote "str") with the result of (string->symbol "str").
To make the implementation more general, we separate the task of scanning and replacing into a macro sensitize-case.

    (define-macro sensitize-case
      (lambda (body)
        ; Walk the form, rebuilding lists and vectors along the way
        (define (re-write body)
          (cond
            ((vector? body)
             (list->vector (re-write (vector->list body))))
            ((not (pair? body)) body)
            ; (quote "str") is replaced by the symbol spelled by "str"
            ((and (eq? 'quote (car body))
                  (pair? (cdr body))
                  (string? (cadr body)))
             (string->symbol (cadr body)))
            (else
             (cons (re-write (car body)) (re-write (cdr body))))))
        (re-write body)))

    (define-macro run-test
      (lambda body
        `(sensitize-case (begin ,@body))))

It must be stressed that '"ASymbol" behaves truly like a Scheme symbol with its case preserved: the operation (string->symbol "ASymbol") is performed at macro-expansion time rather than at run time. An evaluator sees no quotes or function invocations at the place where '"ASymbol" used to appear: the evaluator sees a genuine literal symbol. Thus '"ASymbol" can be used in a case statement in positions where only literal values are allowed.
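To see the effect of the transcription on code treated as data, the rewriting step of sensitize-case can be run as an ordinary procedure on a quoted form. The sketch below is illustrative only: the name rewrite-quoted-strings and the sample form are assumptions made for this example, not part of SSAX.

```scheme
;; Illustrative only: the rewriting logic of sensitize-case as an ordinary
;; procedure, so that its result can be inspected as a datum.
;; The name rewrite-quoted-strings is hypothetical.
(define (rewrite-quoted-strings form)
  (cond
    ((vector? form)
     (list->vector (rewrite-quoted-strings (vector->list form))))
    ((not (pair? form)) form)
    ;; (quote "str") becomes the symbol spelled by "str"
    ((and (eq? 'quote (car form))
          (pair? (cdr form))
          (string? (cadr form)))
     (string->symbol (cadr form)))
    (else
     (cons (rewrite-quoted-strings (car form))
           (rewrite-quoted-strings (cdr form))))))

;; A case form written with the quoted-string notation, held as a datum.
(define sample-form
  '(case command (('"Go") 1) (else 2)))

(define rewritten (rewrite-quoted-strings sample-form))

;; First datum of the first case clause after rewriting.
(define first-datum (car (car (car (cddr rewritten)))))
```

After rewriting, first-datum is a symbol that is eq? to (string->symbol "Go"), with no quote form or procedure call left in the clause.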
SSAX since version 5.0 implements run-test as a portable, R5RS-compliant syntax-rule macro.
The following expression:

    (run-test
      (and (symbol? ''"ASymbol")
           (symbol? (car '('"ASymbol")))
           (eq? (string->symbol "ASymbol") ''"ASymbol")
           (case (string->symbol "ASymbol")
             (('"ASymbol") #t)
             (else #f))))

returns #t on Gambit, SCM, MIT Scheme, and Bigloo, that is, regardless of the case-sensitivity of a Scheme system. Notice a curious notation -- ''"ASymbol" -- a double-quote character following two single quotes.
The SSAX.scm source code [SSAX] gives many more examples, e.g.,

    (run-test
      ; Definition of
      ; test:: XML-string * doctype-defn * expected-SXML-term -> void
      ; elided
      (test "<BR/>" dummy-doctype-fn '(('"BR")))
      (test "<!DOCTYPE T SYSTEM 'system1' ><!-- comment -->\n<T/>"
        (lambda (elem-gi seed)
          (assert (equal? elem-gi ''"T"))
          (values #f '() '() seed))
        '(('"T"))))

'"CooL" notation

At first sight, the transcription of the ''"ASymbol" notation can only be effected by a low-level macro. High-level (a.k.a. R5RS or syntax-rules) macros cannot express this transformation. By design, syntax-rules prohibit manufacturing of symbols and identifiers: otherwise, it would be impossible to guarantee hygiene.
It is therefore astonishing to realize that a syntax-rule macro can nevertheless carry out a (less general) transcription task. Al Petrofsky had a remarkable insight: the examples in the previous section will still hold if we, rather than replacing a quoted string with a symbol, re-write expressions where the quoted string appears. Al Petrofsky wrote [Petrofsky]:
Although your implementation supports case-sensitive variable names, it appears that you don't really desire them, you just want case-sensitive literals. In r5rs, there are only three expression types in which literals occur: quote, quasiquote, and case. What you need is for the tests to be evaluated in a syntactic environment that has modified versions of these syntaxes that understand the '"ASymbol" notation. The only constraint hygiene imposes is that you must pass in to the macro the names of the keywords that will be rebound (in other words, because run-test is really a binding construction, the identifiers being bound must be lexically visible from the expressions that use them).

Below is an implementation of run-test that takes as extra arguments the identifiers to be bound to the '"ASymbol"-aware versions of quote, quasiquote, and case. It is called like so:

    (run-test '`case
      (and (symbol? ''"ASymbol")
           (symbol? (car '('"ASymbol")))
           (eq? (string->symbol "ASymbol") ''"ASymbol")
           (case (string->symbol "ASymbol")
             (('"ASymbol") #t)
             (else #f))))
    ;=> #t

The syntax-rule implementation of run-test can be found in [Petrofsky]. The difference between the sensitize-case and Petrofsky's approaches is best illustrated by peeking at the expansion of a sample run-test expression. Specifically we examine the transcription of a literal expression '('"a"), which is a literal one-element list containing a case-sensitive symbol.
In Petrofsky's implementation,

    (run-test '`case '('"a"))

expands into an expression

    (cons (if (string? '"a")
              (begin (string->symbol '"a"))
              (begin (cons 'quote (cons '"a" '()))))
          '())

whereas

    (sensitize-case '('"a"))

expands into a literal '(a).

Another use case for the case-sensitive symbols was pointed out by Jens Axel Soegaard. He wrote (ref. [case-command]):
I used this construct

    (case command
      ((F !) (draw distance))
      ((G) (move distance))
      ((+) (begin (right (* turns angle)) (set! turns 1)))

and since case uses eqv?, I experienced that none of the cases where fulfilled, where command was the symbol F (originating from a string). In a case clause, one has to use datums, so I can not repair my code writing ((string->symbol "F") !).
Both approaches discussed above can solve this problem. We can indeed do a case-sensitive case-match of symbols on any R5RS Scheme system. We only need to: (i) encode case-sensitive symbols as '"SymBol" (that is, a quote followed by the string that spells the symbol), and (ii) enclose such code in a sensitize-case or Al Petrofsky's run-test macro.
For example, the following expression

    (sensitize-case
      (let ((command (string->symbol "Go")))
        (case command
          (('"Go" !) (display "Went!"))
          (('"Move") (display "Moved"))
          (else (display "stuck!")))))

prints Went! when evaluated with Gambit (a case-sensitive Scheme system) and with case-insensitive SCM and MIT Scheme.

The article [S-exp-as-identifiers] shows how to truly concatenate 'identifiers' with syntax rules.
Ray Dillinger [Dillinger] wondered about using "non-classical" symbols (created by string->symbol and perhaps containing spaces and other bad characters) as identifiers.
The sensitize-case macro truly replaces quoted strings with the corresponding symbols -- even in binding positions of special forms. Therefore, the macro can be used to create the most bizarre bindings.

    (sensitize-case
      (define (foo)
        (let (('"1" 5) ('"" 7) ('"(" 25))
          (display (+ '"1" '"" '"1" '"(")))))

No matter the looks, foo is a correct procedure. The evaluation of (foo) indeed prints the number 42, on Gambit-C, Bigloo, SCM and MIT Scheme. This example looks especially spectacular in MIT Scheme, which can print out a closure. If you enter the above code, evaluate (foo) to check that the code runs, and then ask MIT Scheme to show the body of foo, you will see:

    1 ]=> (pp foo)
    (named-lambda (foo)
      (let ((1 5) ( 7) (( 25))
        (display (+ 1 1 ())))

Numbers, empty strings and even parentheses can be legitimate Scheme identifiers! I like (let ((1 5)) (+ 1 ...)) the most. What a nice illustration of a difference between notation and denotation!

[Lisovsky] Kirill Lisovsky: Case sensitivity of Scheme systems.
<http://pair.com/lisovsky/scheme/case-sensitivity.html>
[SXML] SXML Specification. Section 6. Case-sensitivity of SXML names.
<SXML.html>
[SSAX] Functional XML parsing framework: SAX/DOM and
SXML parsers with support for XML Namespaces and validation.
<SSAX.scm>
<http://ssax.sourceforge.net/>
[CooL-symbols] About ''"CooL": low-level macros considered useful
A message on a comp.lang.scheme newsgroup, posted on Thu, 29
Mar 2001 00:32:29 +0000 (UTC)
Message-ID: <200103290030.QAA99292@adric.cs.nps.navy.mil>
<http://groups.google.com/groups?selm=200103290030.QAA99292%40adric.cs.nps.navy.mil>
[Petrofsky] Al Petrofsky: About '`case [was About ''"CooL"]
A message on a comp.lang.scheme newsgroup, posted on 14 Apr
2001 02:44:34 -0700
Message-ID: <8766g7g6il.fsf@app.dial.idiom.com>
<http://groups.google.com/groups?selm=8766g7g6il.fsf%40app.dial.idiom.com>
[case-command] Portable case-sensitive and insensible identifiers [Was:
Symbols in DrScheme - bug?]
A message on a comp.lang.scheme newsgroup, posted on Mon, 5 Nov
2001 15:03:54 -0800
Message-ID: <7eb8ac3e.0111051503.d8cf750@posting.google.com>
<http://groups.google.com/groups?selm=7eb8ac3e.0111051503.d8cf750%40posting.google.com>
[Dillinger] Ray Dillinger: Re: Symbols
A message on a comp.lang.scheme newsgroup, posted on Fri, 04
Jan 2002 03:44:25 GMT
Message-ID: <3C352512.E29484BA@sonic.net>
<http://groups.google.com/groups?selm=3C352512.E29484BA%40sonic.net>
[S-exp-as-identifiers] Macro-expand-time environments and S-expressions as identifiers