From posting-system@google.com Sun Sep 23 20:23:34 2001
Date: Sun, 23 Sep 2001 13:23:29 -0700
Reply-To: oleg@pobox.com
From: oleg@pobox.com (oleg@pobox.com)
Newsgroups: comp.lang.scheme
Subject: A good assert macro
Message-ID: <7eb8ac3e.0109231223.71be6320@posting.google.com>
Status: OR

This article describes the design and implementation of a _portable_
assert macro with improved reporting capabilities. The message printed
upon assertion failure shows, among other things, the bindings for
interesting variables used within the asserted conditions. A programmer
can also specify any strings or other expressions to print at that
point.

Entering a REPL or a debugger after the failure might seem like the
most informative approach. However, it is too heavyweight, precludes
automated (regression) testing, and is platform-specific. The assert
macro discussed in this article turns out to be informative,
lightweight, and portable. I tested it on four systems: Gambit, Bigloo,
SCM and MIT Scheme. The fact that the macro works on these different
systems gives confidence that it will work on many more platforms.

The assert macro must be implemented via a low-level macro system. The
implementation section shows the reason for it.

    Introduction
    Assertion checking in various systems
    A better assert macro
    Examples
    Implementation
    Availability


The goal of an assert macro is to check that the run-time state of a
program at a certain point agrees with our expectations. We can expect,
for example, that our factorial function will be called with a
non-negative integer argument. If our implementation is correct, we can
expect that the result will be a positive integer. If the expectation
check fails, the user or a programmer will be alerted, somehow.

Often there is a way to switch the assertion checking on or off, via a
compiler flag, pragma or a similar facility. In Bigloo, the assertion
checking is off by default. I could never understand such policies.
It's like wearing a seatbelt only when you're learning to drive -- but
not afterwards. I'm tempted to compare the assert macro with a type
system, especially with a static type system used with non-exhaustive
pattern matching -- but I won't. That's not the topic of this article.

The entry and exit points of a function in particular invite the
placing of assertions. An assertion at entry makes sure the function is
called with arguments it can really handle. An assertion at the exit
point increases our confidence that the function has been coded
correctly. Assertions at these points essentially check pre- and
post-conditions of a function. This approach relates to design by
contract, another can of worms we'd like to keep shut for now.


Assertion checking in various systems

The basic assert form:
    assert condition

Our expectation of a program state is encoded as a 'condition' -- an
expression that is expected to yield a non-#f result. The assert form
will evaluate the condition and check its result. If the condition
holds, the assert form acts as an empty expression. If the condition
evaluates to #f, an error is signaled. The error message indicates the
failed condition, and perhaps the location of the assert statement in
the source code.

The message is printed on the standard error. All four Scheme systems
mentioned above have a notion of the standard error stream. This notion
is different in all four systems. Our assertion macro doesn't mind
this, and works anyway.

Slightly more elaborate form:
    assert condition condition ...

If several conditions are present, they are implicitly ANDed.

The assert form in Bigloo is more advanced and helpful. The error
message shows some bindings at the point of failure:
    assert (var ...) s-expression
    "If the expression EXP does not evaluate to #t, an error is
    signaled and the interpreter is launched in an environment where
    VAR... are bound to their current values."


A better assert macro

    syntax: assert ?expr ?expr ...
                   [report: ?r-exp ...]

If (and ?expr ?expr ...) evaluates to anything but #f, the result is
the value of that expression. If (and ?expr ?expr ...) evaluates to #f,
an error is reported. The error message will show the failed
expressions, as well as the values of selected variables (or
expressions, in general).

The user may explicitly specify the expressions whose values are to be
printed upon assertion failure -- as the ?r-exp that follow the
identifier 'report:'. The identifier report: is an ordinary symbol --
whose name happens to end in a colon. Typically, ?r-exp is either a
variable or a string constant; in general, it's an arbitrary
expression. If the user specified no ?r-exp, the values of the
interesting variables that are referenced in ?expr will be printed
upon the assertion failure.


Examples

    (let ((n (begin (display "Enter a positive integer:")
                    (newline) (read))))
      (assert (integer? n) (> n 0)
              report: "Domain error" n
              "You should've entered a positive value" #\!)
      (fact n))

If you run this example and enter -1 at the prompt, you'll see

    failed assertion: ((integer? n) (> n 0))
    Domain error
    n: -1
    You should've entered a positive value!
    *** ERROR IN (stdin)@3.5 -- assertion failure

We don't have to write a poem to the user, however. A simpler assertion

    (let ((n (begin (display "Enter a positive integer:")
                    (newline) (read))))
      (assert (integer? n) (> n 0))
      (fact n))

will do just as well. It will print, in the same circumstances:

    failed assertion: ((integer? n) (> n 0))
    bindings
    n: -1
    *** ERROR IN (stdin)@10.5 -- assertion failure

Assert is especially useful in regression tests. Numerous built-in
regression tests in the SSAX XML parser code all have the form:

    (let ((expected expected-result)
          (computed (computation)))
      (assert (equal? expected computed)))

These tests proved to be highly useful in the development of SSAX.


Implementation

The assert macro must be implemented via low-level (aka Lisp-style)
macros.
High-level (aka R5RS) macros cannot implement the algorithm to
determine the set of interesting variables within an expression.

Interesting variables within an expression are the variables that are
used as arguments to some functions. For example, in the expression
(if (zero? x) 0 (/ y x)), variables 'x' and 'y' are interesting while
'zero?' and '/' are not. The values of the latter don't make much
sense to print.

To determine the set of interesting variables we need to check if an
object in a form is _an_ identifier. R5RS macros can't do that. In this
respect, the difference between the high- and low-level macros is akin
to the difference between 'case' and 'cond'. 'case' can branch on
_some_ symbols or numbers or characters:

    (case x
      ((sym) on-symbol-sym)
      ((1 2) on-numbers-1-2)
      (else ...))

In contrast, cond can branch on _any_/_all_ symbols, numbers, etc.

    (cond
      ((symbol? x) on-any-symbol)
      ((number? x) on-any-number)
      (else ...))

Determining the set of interesting variables runs into the obvious
stumbling block: how to process special forms, especially user-defined
special forms? The form "(x y (+ z 1))" looks like an application, and
identifiers 'y' and 'z' seem to denote interesting variables. However,
if 'x' is a special form, the same identifiers 'y' and 'z' don't have
to denote any variable at all. A special form in addition may introduce
new variables (whose names are not apparent from the special form's
invocation). The assert macro itself is an example of a troublesome
special form: in "(assert condition report: x)", the symbol 'report:'
may look like a variable on some Scheme systems, but it is not.

One solution is to be clever by two thirds. Some Scheme systems have a
form 'macroexpand' which does what it says. Before searching the assert
conditions for free variables, we macroexpand them. Only primitive
special forms will be left, whose set is small and known. We use this
approach where available.

We can also do nothing about user-defined special forms.
The assert macro lets the user explicitly specify the bindings to
print. Our analysis of the asserted expressions doesn't have to be
precise -- if we miss variables to print or introduce spurious ones,
the user can easily correct the error and specify what he wants to
print explicitly.


Availability

    http://pobox.com/~oleg/ftp/Scheme/myenv.scm

and similarly, myenv-scm.scm, myenv-mit.scm, and myenv-bigloo.scm in
the same directory.

The validation code is really the same for all systems:

    http://pobox.com/~oleg/ftp/Scheme/vmyenv.scm
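
As an illustration of the search for interesting variables described
in the Implementation section, here is a minimal sketch in plain R5RS
Scheme. It is not the code from myenv.scm: the names
interesting-variables, opaque-forms and adjoin are hypothetical, and
the list of forms the walker declines to enter is an assumption chosen
for this example. The sketch does no macro expansion, so it takes the
"do nothing" route for user-defined special forms.

```scheme
;; Sketch only: all names below are hypothetical, not from myenv.scm.

;; Heads of forms we do not descend into, because their operands
;; need not be variable references (an assumed, incomplete list).
(define opaque-forms
  '(quote quasiquote lambda let let* letrec do case cond define assert))

;; Add sym to the list vars unless it is already there.
(define (adjoin sym vars)
  (if (memq sym vars) vars (cons sym vars)))

;; Return the distinct symbols that occur in operand position
;; somewhere in expr, in order of first occurrence.
(define (interesting-variables expr)
  ;; vars is accumulated in reverse order of discovery
  (define (walk expr vars)
    (cond
      ((not (pair? expr)) vars)               ; an atom is not an application
      ((memq (car expr) opaque-forms) vars)   ; don't look inside these
      (else (walk-operands (cdr expr) vars)))) ; the head itself is skipped
  (define (walk-operands operands vars)
    (cond
      ((not (pair? operands)) vars)           ; end of the operand list
      ((symbol? (car operands))
       (walk-operands (cdr operands) (adjoin (car operands) vars)))
      (else
       (walk-operands (cdr operands) (walk (car operands) vars)))))
  (reverse (walk expr '())))

(interesting-variables '(if (zero? x) 0 (/ y x)))   ; => (x y)
(interesting-variables '(x y (+ z 1)))              ; => (y z)
```

The second call shows the stumbling block: the sketch reports 'y' and
'z' whether or not 'x' is a special form. The symbol? test in
walk-operands is the step that needs a low-level macro system, since
the walker has to run at macro-expansion time over the asserted
expressions.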