Date: Sun, 23 Sep 2001 13:23:29 -0700
Reply-To: oleg@pobox.com
From: oleg@pobox.com (oleg@pobox.com)
Newsgroups: comp.lang.scheme
Subject: A good assert macro
Message-ID: <7eb8ac3e.0109231223.71be6320@posting.google.com>

This article describes the design and implementation of a _portable_
assert macro with improved reporting capabilities. The message printed
upon assertion failure shows, among other things, the bindings of
interesting variables used within the asserted conditions. A
programmer can also specify any strings or other expressions to print
at that point. Entering a REPL or a debugger after the failure might
seem like the most informative approach. However, it is too
heavyweight, precludes automated (regression) testing, and is
platform-specific. The assert macro discussed in this article turns
out to be informative, lightweight, and portable. I tested it on four
systems: Gambit, Bigloo, SCM and MIT Scheme. The fact that the macro
works on these different systems gives confidence that it will work on
many other platforms. The assert macro must be implemented via a
low-level macro system; the Implementation section explains why.

Introduction
Assertion checking in various systems
A better assert macro
Examples
Implementation
Availability


The goal of an assert macro is to check that the run-time state of a
program at a certain point agrees with our expectations. We can
expect, for example, that our factorial function will be called with a
non-negative integer argument. If our implementation is correct, we
can expect that the result will be a positive integer.

If the check fails, the user or the programmer is alerted
somehow. Often there is a way to switch assertion checking on or
off, via a compiler flag, a pragma or a similar facility. In Bigloo,
assertion checking is off by default. I could never understand such
policies. It's like wearing a seatbelt only when you're learning to
drive -- but not afterwards.

I'm tempted to compare the assert macro with a type system, especially
with a static type system used with non-exhaustive pattern matching --
but I won't. That's not the topic of this article.

The entry and exit points of a function in particular invite
assertions. An assertion at entry makes sure the function is called
with arguments it can really handle. An assertion at the exit point
increases our confidence that the function has been coded correctly.
Assertions at these points essentially check the pre- and post-conditions
of a function. This approach relates to design by contract, another
can of worms we'd like to keep shut for now.


Assertion checking in various systems
The basic assert form:
       assert condition
Our expectation of a program state is encoded as a 'condition' -- an
expression that is expected to yield a non-#f result. The assert form
will evaluate the condition and check its result. If the condition
holds, the assert form acts as an empty expression. If the condition
evaluates to #f, an error is signaled. The error message indicates
the failed condition and perhaps the location of the assert statement
in the source code. The message is printed to the standard error
stream. All four Scheme systems mentioned above have a notion of the
standard error stream, but each of them defines it differently. Our
assert macro copes with these differences and works anyway.

Slightly more elaborate form:
     assert condition condition ...
If several conditions are present, they are implicitly ANDed.
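A minimal sketch of these basic forms in portable syntax-rules, for comparison with the richer macro described below. The names 'basic-assert' and 'basic-assert-failed' are hypothetical (chosen so as not to clash with an 'assert' that some systems already provide), and the failure is signaled with the standard 'error' procedure rather than any system-specific facility. A single condition is just the one-argument case, so one rule covers both forms. The factorial function places assertions at its entry and exit points, checking the pre- and post-conditions mentioned in the introduction.

```scheme
;; Hypothetical basic assert: the conditions are implicitly ANDed.
;; On success, the value of (and condition ...) is returned.
(define-syntax basic-assert
  (syntax-rules ()
    ((_ condition ...)
     (or (and condition ...)
         (basic-assert-failed '(condition ...))))))

;; Print the failed conditions to the standard error port, then signal.
(define (basic-assert-failed conditions)
  (let ((port (current-error-port)))
    (display "failed assertion: " port)
    (write conditions port)
    (newline port)
    (error "assertion failure")))

;; Assertions at the entry and exit points of a function.
(define (fact n)
  (basic-assert (integer? n) (>= n 0))            ; pre-condition
  (let ((result (let loop ((i n) (acc 1))
                  (if (= i 0)
                      acc
                      (loop (- i 1) (* acc i))))))
    (basic-assert (integer? result) (> result 0)) ; post-condition
    result))
```

On failure this sketch prints only the conditions themselves; it knows nothing about the variables they mention, which is the gap the macro described below addresses.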

The assert form in Bigloo is more advanced and helpful. The error
message shows some bindings at the point of failure:
   assert (var ...) s-expression
"If the expression EXP does not evaluate to #t, an error is signaled
and the interpreter is launched in an environment where VAR... are
bound to their current values."
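A rough sketch of the Bigloo-style form in portable syntax-rules, assuming we only want the listed bindings printed rather than an interpreter launched. The names 'bindings-assert', 'show-bindings' and 'bindings-assert-failed' are hypothetical. Because the user names the variables, quoting them is enough and no analysis of the expression is needed. Unlike the Bigloo description, the sketch treats any non-#f value as success, in line with the rest of this article.

```scheme
;; Hypothetical Bigloo-style (assert (var ...) expr): on failure, print
;; EXPR and the listed bindings, then signal an error.
(define-syntax bindings-assert
  (syntax-rules ()
    ((_ (var ...) expr)
     (or expr
         (bindings-assert-failed 'expr (list (cons 'var var) ...))))))

;; Print each (name . value) pair as "name: value" on its own line.
(define (show-bindings bindings port)
  (for-each (lambda (b)
              (write (car b) port)
              (display ": " port)
              (write (cdr b) port)
              (newline port))
            bindings))

(define (bindings-assert-failed expr bindings)
  (let ((port (current-error-port)))
    (display "failed assertion: " port)
    (write expr port)
    (newline port)
    (show-bindings bindings port)
    (error "assertion failure")))
```

For instance, (let ((x -1) (y 2)) (bindings-assert (x y) (> x 0))) would print the lines "failed assertion: (> x 0)", "x: -1" and "y: 2" to the standard error port before signaling the error.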


A better assert macro

	syntax: assert ?expr ?expr ... [report: ?r-exp ...]

If (and ?expr ?expr ...) evaluates to anything but #f, the result is
the value of that expression.

If (and ?expr ?expr ...) evaluates to #f, an error is reported.  The
error message will show the failed expressions, as well as the values
of selected variables (or expressions, in general).  The user may
explicitly specify the expressions whose values are to be printed upon
assertion failure -- as ?r-exp that follow the identifier 'report:'.
The identifier report: is an ordinary symbol -- whose name happens to
end in a colon.

Typically, ?r-exp is either a variable or a string constant; in
general, it's an arbitrary expression. If the user specifies no
?r-exp, the values of interesting variables referenced in ?expr will
be printed upon assertion failure.
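The report: part of this syntax can be sketched with syntax-rules alone, because matching one fixed identifier is something high-level macros do support. What they cannot do is find the interesting variables by themselves, so this sketch simply prints no bindings when report: is absent. The names 'report-assert', 'print-report' and 'report-assert-failed' are hypothetical. The macro scans its arguments with an accumulator until it meets report:, then pairs every report expression with its quoted source form; string and character literals are displayed as-is, and any other item is printed on its own line as "form: value".

```scheme
;; Hypothetical (report-assert expr expr ... [report: r-exp ...])
(define-syntax report-assert
  (syntax-rules (report:)
    ;; Scanning finished and no report: was found.
    ((_ "scan" (e ...) ())
     (report-assert "emit" (e ...) ()))
    ;; Found report:; everything after it is a report expression.
    ((_ "scan" (e ...) (report: r ...))
     (report-assert "emit" (e ...) (r ...)))
    ;; Move one more asserted expression into the accumulator.
    ((_ "scan" (e ...) (x rest ...))
     (report-assert "scan" (e ... x) (rest ...)))
    ;; Generate the check; each report item keeps its source form.
    ((_ "emit" (e ...) (r ...))
     (or (and e ...)
         (report-assert-failed '(e ...) (list (cons 'r r) ...))))
    ;; Entry point: start scanning with an empty accumulator.
    ((_ x ...)
     (report-assert "scan" () (x ...)))))

;; Items are (form . value) pairs.  Literal strings and characters are
;; displayed inline; other items go on their own line as "form: value".
(define (print-report items port)
  (let loop ((items items) (fresh-line? #t))
    (cond
      ((null? items)
       (if (not fresh-line?) (newline port)))
      ((or (string? (caar items)) (char? (caar items)))
       (display (cdar items) port)
       (loop (cdr items) #f))
      (else
       (if (not fresh-line?) (newline port))
       (write (caar items) port)
       (display ": " port)
       (write (cdar items) port)
       (newline port)
       (loop (cdr items) #t)))))

(define (report-assert-failed exprs items)
  (let ((port (current-error-port)))
    (display "failed assertion: " port)
    (write exprs port)
    (newline port)
    (print-report items port)
    (error "assertion failure")))
```

With this layout rule, the report items of the first example below ("Domain error", n bound to -1, the reminder string and #\!) come out as the same three report lines shown in that example's output.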

Examples

  (let ((n (begin (display "Enter a positive integer:")
		  (newline) (read))))
    (assert (integer? n) (> n 0)
            report: "Domain error" n
		    "You should've entered a positive value" #\!)
    (fact n))

If you run this example and enter -1 at the prompt, you'll see:

failed assertion: ((integer? n) (> n 0))
Domain error
n: -1
You should've entered a positive value!
*** ERROR IN (stdin)@3.5 -- assertion failure

We don't have to write a poem to the user, however. A simpler assertion

  (let ((n (begin (display "Enter a positive integer:")
		  (newline) (read))))
    (assert (integer? n) (> n 0))
    (fact n))

will do just as well. It will print, in the same circumstances:

failed assertion: ((integer? n) (> n 0))
bindings
n: -1
*** ERROR IN (stdin)@10.5 -- assertion failure

Assert is especially useful in regression tests. Numerous built-in
regression tests in the SSAX XML parser code all have the form:
       (let ((expected expected-result)
	     (computed (computation)))
	  (assert (equal? expected computed)))
These tests proved to be highly useful in the development of SSAX.


Implementation

The assert macro must be implemented via low-level (aka Lisp-style)
macros. High-level (aka R5RS) macros cannot implement the algorithm
that determines the set of interesting variables within an
expression. The interesting variables of an expression are the
variables used as arguments in function applications. For example,
in the expression (if (zero? x) 0 (/ y x)), the variables 'x' and 'y'
are interesting while 'zero?' and '/' are not: it makes little sense
to print the values of the latter.
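To make the definition concrete, here is a sketch of the analysis as an ordinary procedure over a quoted expression; a low-level macro would perform the same kind of walk at expansion time. The name 'interesting-vars' is hypothetical, and only 'quote' and 'if' are recognized as special forms: every other compound form is assumed to be an application whose symbol operator is skipped.

```scheme
;; Hypothetical sketch: collect the symbols used as arguments in a
;; quoted expression, in first-seen order, without duplicates.
(define (interesting-vars expr)
  (define (add-var v acc)
    (if (memq v acc) acc (append acc (list v))))
  ;; E appears in argument position: a bare symbol is interesting.
  (define (walk-arg e acc)
    (cond ((symbol? e) (add-var e acc))
          ((pair? e) (walk-form e acc))
          (else acc)))                     ; numbers, strings, etc.
  (define (walk-args es acc)
    (if (pair? es)
        (walk-args (cdr es) (walk-arg (car es) acc))
        acc))
  ;; E is a compound form.
  (define (walk-form e acc)
    (cond ((eq? (car e) 'quote) acc)        ; quoted data has no variables
          ((eq? (car e) 'if)                ; all subforms are expressions
           (walk-args (cdr e) acc))
          (else                             ; application: skip a symbol operator
           (walk-args (cdr e)
                      (if (pair? (car e)) (walk-form (car e) acc) acc)))))
  (walk-arg expr '()))
```

Applied to '(if (zero? x) 0 (/ y x)), it returns (x y), leaving out 'zero?' and '/'.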

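To determine the set of interesting variables, we need to check whether
an object in a form is _an_ identifier, any identifier at all. R5RS
macros can't do that: a syntax-rules pattern can match a particular
identifier named in its literals list, but it offers no direct test for
an arbitrary identifier. In this respect, the difference between the
high-level and low-level macros is akin to the difference between 'case'
and 'cond'. 'case' can branch on _some_ symbols, numbers or characters,
namely those listed in its clauses: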

	(case x
	  ((sym) on-symbol-sym)
	  ((1 2) on-numbers-1-2)
	  (else ...))

In contrast, 'cond' can branch on _any_ symbol, _any_ number, etc.:
   (cond
     ((symbol? x) on-any-symbol)
     ((number? x) on-any-number)
     (else ...))

Determining the set of interesting variables runs into an obvious
stumbling block: how to process special forms, especially user-defined
special forms? The form "(x y (+ z 1))" looks like an application, and
the identifiers 'y' and 'z' seem to denote interesting variables.
However, if 'x' is a special form, the same identifiers 'y' and 'z'
don't have to denote any variable at all. Moreover, a special form may
introduce new variables whose names are not apparent from the special
form's invocation. The assert macro itself is an example of a
troublesome special form: in "(assert condition report: x)", the symbol
'report:' may look like a variable on some Scheme systems, but it is
not.

One solution is to be clever by two thirds. Some Scheme systems have a
form 'macroexpand', which does what its name says. Before searching
the assert conditions for free variables, we macroexpand them. Only
primitive special forms will be left; their set is small and known. We
use this approach where available. Alternatively, we can simply do
nothing about user-defined special forms, because the assert macro lets
the user explicitly specify the bindings to print. Our analysis of the
asserted expressions doesn't have to be precise -- if we miss variables
to print or introduce spurious ones, the user can easily correct the
error by specifying explicitly what to print.
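The macroexpand-first idea can be sketched by assuming the condition has already been expanded, so that only a few primitive forms remain. This hypothetical 'free-interesting-vars' handles only 'quote', 'if', 'begin' and 'lambda' (derived forms such as 'let' are assumed to have been expanded into 'lambda' applications), and it leaves out variables bound inside the condition, since they have no value at the point where the failure is reported. Any other form is walked as an application, which is exactly where the imprecision discussed above creeps in.

```scheme
;; Hypothetical sketch: assumes EXPR was already macroexpanded, so only
;; quote, if, begin and lambda need special treatment.

;; Names bound by a lambda parameter list: (a b), (a . rest) or args.
(define (lambda-params formals)
  (cond ((symbol? formals) (list formals))
        ((pair? formals) (cons (car formals) (lambda-params (cdr formals))))
        (else '())))

(define (free-interesting-vars expr)
  ;; Add V unless it is bound inside EXPR or already collected.
  (define (add v bound acc)
    (if (or (memq v bound) (memq v acc)) acc (append acc (list v))))
  ;; E in argument position: a bare symbol is interesting.
  (define (walk-arg e bound acc)
    (cond ((symbol? e) (add e bound acc))
          ((pair? e) (walk-form e bound acc))
          (else acc)))
  (define (walk-args es bound acc)
    (if (pair? es)
        (walk-args (cdr es) bound (walk-arg (car es) bound acc))
        acc))
  (define (walk-form e bound acc)
    (case (car e)
      ((quote) acc)                                ; literal data
      ((if begin) (walk-args (cdr e) bound acc))   ; subforms are expressions
      ((lambda)                                    ; body sees the parameters
       (walk-args (cddr e) (append (lambda-params (cadr e)) bound) acc))
      (else                                        ; treated as an application
       (walk-args (cdr e) bound
                  (if (pair? (car e)) (walk-form (car e) bound acc) acc)))))
  (walk-arg expr '() '()))
```

For example, '((lambda (k) (> k n)) m) yields (n m), since 'k' is bound inside the condition. An unexpanded "(x y (+ z 1))" yields (y z) whether or not 'x' is a special form, and an unexpanded "(assert c report: x)" could list 'report:' as a spurious variable; the explicit report: list is the escape hatch for such cases.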

Availability
	http://pobox.com/~oleg/ftp/Scheme/myenv.scm
and, similarly, myenv-scm.scm, myenv-mit.scm, and myenv-bigloo.scm
in the same directory.
The validation code is the same for all systems:
	http://pobox.com/~oleg/ftp/Scheme/vmyenv.scm