Portable case-sensitive symbols, and the meaning of identifiers

The need for portable case-sensitive symbols arose in several real projects. The most natural answer is a conservative lexical extension: a portable notation for case-sensitive symbols. The notation fully preserves the lexical structure of R5RS Scheme and can be used on any R5RS system. Denoted case-sensitive symbols are transcribed into genuine case-sensitive symbols by a portable macro. We consider two implementations: one by a low-level macro and a slightly less general one by a syntax-rule macro. The discussion of the notation and its transcription on the comp.lang.scheme newsgroup has revealed surprisingly deep insights into portable lexical extensions of Scheme, into treating code as data, into the capabilities of syntax-rules macros, and into the very meaning of identifiers.

This page has been translated into Spanish by Maria Ramos from Webhostinghub.com.

  1. Introduction
  2. A '"CooL" notation and its transcription
    1. Specification
    2. Implementation by a low-level macro
    3. Implementation by a syntax-rule macro
    4. Examples
  3. A syntax-rule transcriber for the '"CooL" notation
  4. Another case for case-sensitive symbols
  5. Unusual cases and the meaning for identifiers
  6. References

  

Introduction

According to R5RS, "Upper and lower case forms of a letter are never distinguished except within character and string constants." There are, however, legitimate applications that greatly benefit from case-sensitive symbols. One such application is an S-expression-based form of XML [SXML]. PLT XML collections, SXML and all other similar projects map tag names to identifiers. Such a representation is highly appropriate because tag names are rarely mutated but are heavily used in identity comparisons. The need for case-sensitive identifiers in describing semi-structured data as S-expressions was recognized in DSSSL.

A great number of Scheme systems already offer a case-sensitive reader, which often has to be activated through a compiler option or pragma. A web page [Lisovsky] discusses case sensitivity of various Scheme systems in detail.


  

A '"CooL" notation and its transcription

This section describes the notation and its implementation that were first presented in an article [CooL-symbols].

According to R5RS, symbols created by string->symbol, e.g.,

     (string->symbol "ASymbol")
retain their case, while symbols read or entered literally
     (with-input-from-string "ASymbol" read)
     'ASymbol
may get their case changed on many Scheme systems. Therefore, the following expression
     (eq? (string->symbol "ASymbol") 'ASymbol)
evaluates to #f on many Scheme systems, e.g., on SCM (which downcases all literal symbols) and Bigloo (which uppercases them).
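One way to find out how a given system behaves is to inspect the name of a literal symbol at run time. The following is only a minimal sketch: the procedure name reader-case-folding and the symbols it returns are illustrative names, not part of any of the systems mentioned.

```scheme
;; Report how the reader treated the literal symbol ASymbol.
;; reader-case-folding is an illustrative name, not a standard procedure.
(define (reader-case-folding)
  (let ((name (symbol->string 'ASymbol)))
    (cond
     ((string=? name "ASymbol") 'preserves-case)
     ((string=? name "asymbol") 'folds-to-lower-case)
     ((string=? name "ASYMBOL") 'folds-to-upper-case)
     (else 'unknown))))

;; The comparison from the text succeeds only when the reader preserves case.
(define literal-matches-string->symbol?
  (eq? (string->symbol "ASymbol") 'ASymbol))
```

The result symbols are themselves subject to the reader's folding, which is harmless here because they are only compared with other literals read by the same reader.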

The SSAX XML parser [SSAX] relies on string->symbol to turn tag and attribute names into case-sensitive symbols. A test suite for the parser, however, needed a way to enter such case-sensitive symbols literally. Test cases are embedded into the SSAX code, and are always enclosed within a special form run-test:

     (run-test (test1) (test2) ...)
If a user wants to run self-tests, he defines this form as
     (define-macro run-test (lambda body `(begin (display "\n-->Test\n") ,@body)))
Otherwise, he defines run-test as
     (define-macro run-test (lambda body '(begin #f)))
which effectively switches all the tests off. This fortuitous circumstance suggested that run-test can do a bit more than just expand into a begin form. The run-test form can enable truly portable and truly concise case-sensitive symbols.
  

Specification

We introduce a notation '"ASymbol" -- a quoted string -- to stand for the case-sensitive symbol ASymbol. This notation is valid only within the body of a run-test or similar form.


  

Implementation by a low-level macro

The notation is implemented by scanning the run-test's body and replacing every occurrence of (quote "str") with the result of (string->symbol "str"). To make the implementation more general, we separate the task of scanning and replacing into a macro sensitize-case.

     (define-macro sensitize-case
       (lambda (body)
         (define (re-write body)
           (cond
            ((vector? body)
             (list->vector (re-write (vector->list body))))
            ((not (pair? body)) body)
            ((and (eq? 'quote (car body)) (pair? (cdr body))
                  (string? (cadr body)))
             (string->symbol (cadr body)))
            (else (cons (re-write (car body)) (re-write (cdr body))))))
         (re-write body)))
     
     (define-macro run-test
       (lambda body
         `(sensitize-case (begin ,@body))))
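Since define-macro is not available on every system, the traversal can also be tried out on quoted data. The sketch below lifts the re-write helper unchanged into a top-level procedure; the names rewrite-quoted-strings, single and nested are illustrative and do not appear in SSAX.

```scheme
;; Same traversal as the re-write helper inside sensitize-case, lifted to a
;; top-level procedure so that it can be applied to quoted data.
;; rewrite-quoted-strings is an illustrative name.
(define (rewrite-quoted-strings body)
  (cond
   ((vector? body)
    (list->vector (rewrite-quoted-strings (vector->list body))))
   ((not (pair? body)) body)
   ((and (eq? 'quote (car body)) (pair? (cdr body))
         (string? (cadr body)))
    (string->symbol (cadr body)))
   (else (cons (rewrite-quoted-strings (car body))
               (rewrite-quoted-strings (cdr body))))))

;; '"Go" reads as (quote "Go"), so the whole form becomes the symbol Go.
(define single (rewrite-quoted-strings '(quote "Go")))

;; ''"Go" reads as (quote (quote "Go")): only the inner form is replaced,
;; leaving (quote Go), an ordinary quoted symbol.
(define nested (rewrite-quoted-strings '(quote (quote "Go"))))
```

The second definition shows why a doubly quoted string is needed wherever an expression, rather than a datum, is expected.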

It must be stressed that '"ASymbol" behaves truly like a Scheme symbol with its case preserved: the operation (string->symbol "ASymbol") is performed at macro-expansion time rather than at run time. An evaluator sees no quotes or function invocations at the place where '"ASymbol" used to appear: the evaluator sees a genuine literal symbol. Thus '"ASymbol" can be used in a case expression in positions where only literal values are allowed.
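To see why the replacement has to happen before evaluation, recall that the data in a case clause are never evaluated. In the minimal sketch below (the procedure and variable names are illustrative), a call to string->symbol written inside a clause is merely a list datum that no symbol can match; without the notation one has to fall back on cond and eq?.

```scheme
;; The clause datum below is the list (string->symbol "Go"), not the symbol
;; Go: case compares data with eqv? and never evaluates them.
(define (classify-with-case sym)
  (case sym
    (((string->symbol "Go")) 'matched)
    (else 'no-match)))

;; A run-time alternative: cond evaluates its tests, so string->symbol works.
(define (classify-with-cond sym)
  (cond
   ((eq? sym (string->symbol "Go")) 'matched)
   (else 'no-match)))

(define go (string->symbol "Go"))
(define case-result (classify-with-case go))   ; no-match
(define cond-result (classify-with-cond go))   ; matched
```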


  

Implementation by a syntax-rule macro

SSAX since version 5.0 implements run-test as a portable, R5RS-compliant syntax-rule macro.


  

Examples

The following expression:

     (run-test
      (and
       (symbol? ''"ASymbol")
       (symbol? (car '('"ASymbol")))
       (eq? (string->symbol "ASymbol") ''"ASymbol")
       (case (string->symbol "ASymbol")
         (('"ASymbol") #t) (else #f)))
     )
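returns #t on Gambit, SCM, MIT Scheme, and Bigloo, that is, regardless of the case-sensitivity of a Scheme system. Notice the curious notation ''"ASymbol": two quote marks followed by a string literal. It reads as (quote (quote "ASymbol")); the inner form is replaced by the symbol, and the outer quote then makes that symbol a literal.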

The SSAX.scm source code [SSAX] gives many more examples, e.g.,

     (run-test
       ; Definition of
       ; test:: XML-string * doctype-defn * expected-SXML-term -> void
       ; elided
      
       (test "<BR/>" dummy-doctype-fn '(('"BR")))
      
       (test "<!DOCTYPE T SYSTEM 'system1' ><!-- comment -->\n<T/>"
             (lambda (elem-gi seed) (assert (equal? elem-gi ''"T"))
                     (values #f '() '() seed))
             '(('"T")))
     )

  

A syntax-rule transcriber for the '"CooL" notation

At first sight, the transcription of the '"ASymbol" notation can only be effected by a low-level macro. High-level (a.k.a. R5RS or syntax-rules) macros cannot express this transformation. By design, syntax-rules prohibits the manufacture of symbols and identifiers: otherwise, it would be impossible to guarantee hygiene.

It is therefore astonishing to realize that a syntax-rule macro can nevertheless carry out a (less general) transcription task. Al Petrofsky had a remarkable insight: the examples in the previous section will still hold if we, rather than replacing a quoted string with a symbol, re-write expressions where the quoted string appears. Al Petrofsky wrote [Petrofsky]:

Although your implementation supports case-sensitive variable names, it appears that you don't really desire them, you just want case-sensitive literals. In r5rs, there are only three expression types in which literals occur: quote, quasiquote, and case. What you need is for the tests to be evaluated in a syntactic environment that has modified versions of these syntaxes that understand the '"ASymbol" notation. The only constraint hygiene imposes is that you must pass in to the macro the names of the keywords that will be rebound (in other words, because run-test is really a binding construction, the identifiers being bound must be lexically visible from the expressions that use them).

Below is an implementation of run-test that takes as extra arguments the identifiers to be bound to the '"ASymbol"-aware versions of quote, quasiquote, and case. It is called like so:

     (run-test '`case
       (and
        (symbol? ''"ASymbol")
        (symbol? (car '('"ASymbol")))
        (eq? (string->symbol "ASymbol") ''"ASymbol")
        (case (string->symbol "ASymbol")
          (('"ASymbol") #t) (else #f))))
     ;=> #t 

The syntax-rule implementation of run-test can be found in [Petrofsky]. The difference between the sensitize-case and Petrofsky's approaches is best illustrated by peeking at the expansion of a sample run-test expression. Specifically we examine the transcription of a literal expression '('"a"), which is a literal one-element list containing a case-sensitive symbol.

In Petrofsky's implementation,

     (run-test '`case
       '('"a"))
expands into an expression
     (cons (if (string? '"a")
               (begin (string->symbol '"a"))
               (begin (cons 'quote (cons '"a" '()))))
           '())
whereas
     (sensitize-case
       '('"a"))
expands into a literal '(a).
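Both expansions denote the same value; they differ in when the symbol is produced. As a sketch, the expression from Petrofsky's expansion can be evaluated directly and compared with the expected contents (the variable names below are illustrative only):

```scheme
;; The expansion quoted above, evaluated as ordinary code: the symbol is
;; produced by string->symbol at run time, and the list is freshly consed.
(define run-time-list
  (cons (if (string? '"a")
            (begin (string->symbol '"a"))
            (begin (cons 'quote (cons '"a" '()))))
        '()))

;; What sensitize-case leaves behind is a constant with the same contents;
;; it is built here with string->symbol only so that the case of the name
;; is preserved on every system.
(define expected-list (list (string->symbol "a")))

(define same-contents? (equal? run-time-list expected-list))  ; #t
```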
  

Another case for case-sensitive symbols

Another use case for the case-sensitive symbols was pointed out by Jens Axel Soegaard. He wrote (ref. [case-command]):

I used this construct
     (case command
       ((F !)  (draw distance))
       ((G)    (move distance))
       ((+)    (begin (right (* turns angle)) (set! turns 1)))
and since case uses eqv?, I experienced that none of the cases where fulfilled, where command was the symbol F (originating from a string). In a case clause, one has to use datums, so I can not repair my code writing ((string->symbol "F") !).

Both approaches discussed above can solve this problem. We can indeed do a case-sensitive case-match of symbols on any R5RS Scheme system. We only need to: (i) encode case-sensitive symbols as '"SymBol" (that is, a quote followed by the string that spells the symbol), and (ii) enclose such code in the sensitize-case macro or in Al Petrofsky's run-test macro.

For example, the following expression

     (sensitize-case
      (let ((command (string->symbol "Go")))
        (case command
          (('"Go" !) (display "Went!"))
          (('"Move") (display "Moved"))
          (else (display "stuck!")))))
prints Went!, when evaluated with Gambit (a case-sensitive Scheme system) and with case-insensitive SCM and MIT Scheme.
  

Unusual cases and the meaning for identifiers

The article [S-exp-as-identifiers] shows how to truly concatenate 'identifiers' with syntax rules.

Ray Dillinger [Dillinger] wondered about using "non-classical" symbols (created by string->symbol and perhaps containing spaces and other bad characters) as identifiers.

The sensitize-case macro truly replaces quoted strings with the corresponding symbols -- even in binding positions of special forms. Therefore, the macro can be used to create the most bizarre bindings.

     (sensitize-case
      (define (foo)
        (let (('"1" 5) ('"" 7) ('"(" 25))
          (display (+ '"1" '"" '"1" '"(")))))

Whatever its looks, foo is a correct procedure. The evaluation of (foo) indeed prints the number 42 on Gambit-C, Bigloo, SCM and MIT Scheme. This example looks especially spectacular in MIT Scheme, which can print out a closure. If you enter the above code, evaluate (foo) to check that the code runs, and then ask MIT Scheme to show the body of foo, you will see:

     1 ]=> (pp foo)
     
     (named-lambda (foo)
       (let ((1 5) ( 7) (( 25))
         (display (+ 1  1 ())))
Numbers, empty strings and even parentheses can be legitimate Scheme identifiers! I like (let ((1 5)) (+ 1 ...)) the most. What a nice illustration of a difference between notation and denotation!
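The same distinction can be observed without macros. In the sketch below (all names are illustrative), string->symbol yields genuine symbols whose printed names happen to look like a number, like nothing at all, or like a parenthesis; none of them is the object its name resembles. It assumes, as the systems listed above do, that string->symbol accepts an empty string.

```scheme
(define one-symbol   (string->symbol "1"))
(define empty-symbol (string->symbol ""))
(define paren-symbol (string->symbol "("))

;; All three are symbols, and the symbol named "1" is not the number 1.
(define all-symbols?
  (and (symbol? one-symbol) (symbol? empty-symbol) (symbol? paren-symbol)))
(define symbol-is-number? (eqv? one-symbol 1))        ; #f

;; The names round-trip unchanged through symbol->string.
(define names
  (map symbol->string (list one-symbol empty-symbol paren-symbol)))
;; names is ("1" "" "(")
```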
  

References

[Lisovsky] Kirill Lisovsky: Case sensitivity of Scheme systems.
<http://pair.com/lisovsky/scheme/case-sensitivity.html>

[SXML] SXML Specification. Section 6. Case-sensitivity of SXML names.
<SXML.html>

[SSAX] Functional XML parsing framework: SAX/DOM and SXML parsers with support for XML Namespaces and validation.
<SSAX.scm>
<http://ssax.sourceforge.net/>

[CooL-symbols] About ''"CooL": low-level macros considered useful
A message on a comp.lang.scheme newsgroup, posted on Thu, 29 Mar 2001 00:32:29 +0000 (UTC)
Message-ID: <200103290030.QAA99292@adric.cs.nps.navy.mil>
<http://groups.google.com/groups?selm=200103290030.QAA99292%40adric.cs.nps.navy.mil>

[Petrofsky] Al Petrofsky: About '`case [was About ''"CooL"]
A message on a comp.lang.scheme newsgroup, posted on 14 Apr 2001 02:44:34 -0700
Message-ID: <8766g7g6il.fsf@app.dial.idiom.com>
<http://groups.google.com/groups?selm=8766g7g6il.fsf%40app.dial.idiom.com>

[case-command] Portable case-sensitive and insensible identifiers [Was: Symbols in DrScheme - bug?]
A message on a comp.lang.scheme newsgroup, posted on Mon, 5 Nov 2001 15:03:54 -0800
Message-ID: <7eb8ac3e.0111051503.d8cf750@posting.google.com>
<http://groups.google.com/groups?selm=7eb8ac3e.0111051503.d8cf750%40posting.google.com>

[Dillinger] Ray Dillinger: Re: Symbols
A message on a comp.lang.scheme newsgroup, posted on Fri, 04 Jan 2002 03:44:25 GMT
Message-ID: <3C352512.E29484BA@sonic.net>
<http://groups.google.com/groups?selm=3C352512.E29484BA%40sonic.net>

[S-exp-as-identifiers] Macro-expand-time environments and S-expressions as identifiers



Last updated April 2, 2013

oleg-at-okmij.org