From www@deja.com Sun Sep 10 15:25:47 2000 Message-ID: <8ph1q8$ekq$1@nnrp1.deja.com> From: oleg@pobox.com Subject: Sendmail as a Turing machine Date: Sun, 10 Sep 2000 22:26:33 GMT Reply-To: oleg@pobox.com Keywords: sendmail, Turing machine, rewriting system, string rewriting Newsgroups: comp.lang.functional,comp.mail.sendmail Summary: Rewriting rules to make sendmail to act as a Turing machine X-Article-Creation-Date: Sun Sep 10 22:26:33 2000 GMT Status: OR Many people have tried to get sendmail to do many wonderful things, preferably on behalf of the root. This article is not about cracking and other unseemly activities. We will show however that sendmail as it stands is indeed capable of executing every possible computation. Turing machine is a finite control unit operating on a modifiable tape that has the beginning but no end. Sendmail's (address) re-writing engine works on a potentially unbounded string. The engine is controlled by a finite set of rules. The content of the string as well the previously fired rule (the state) determine the rule to be executed next. A rule may modify the string-tape. It appears there is the perfect match between sendmail and the Turing machine. This article will show how to get sendmail to operate as a Turing machine, and give examples of two Turing programs: addition of natural numbers and a decision function (evenp/oddp). We will model Turing machine's tape with a character string. Just as the tape is made of cells, the string will be made of 'tokens' separated by a character '.' (a single dot). A token is a sequence of one or more alphanumeric characters; characters @ % ! [ ] may not appear in a token. A character '@' has a special meaning: it denotes a blank tape cell. The infinite tape is thus modeled by a finite string of tokens followed by an _assumed_ infinite sequence of blank cells. We will never write out this tail of blanks explicitly; rather we will assume that there is always a blank cell after the right end of every finite string under re-writing. A character '%' also has a special meaning: it represents Turing machine's 'read/write head'. That is, % placed before a token marks the token as current -- the one to be analyzed by rules and perhaps overwritten. For example, a string @.a.bc.%def stands for a Turing machine configuration with a tape containing a blank, three symbols, and an infinite sequence of blanks. The read/write head is over the symbol 'def'. A string I.I.%@ represents a tape with two non-blank symbols ('I') at the very beginning; the head is pointing to the first blank after them. To run the Turing machine, we first need to locate the sendmail program and its configuration file. The application itself can usually be found in /usr/sbin/ or /usr/lib/ (a 'locate' command may help). The configuration file for sendmail v8.8+ is located in /etc/mail/sendmail.cf. We will use version 8.9.3 as it allows named rulesets, which are rather convenient. Besides, if your site is running sendmail v8.8 or below, your system administrator should be sued for negligence and reckless disregard of system's security. Sendmail configuration file is the place for, among other things, rewriting rules -- the finite control of the Turing machine. It is quite a hassle to write a correct configuration file from scratch. Therefore, we will use an existing file, and add our own rules to it. Of course all the changes will be made to a copy of the configuration file rather than to /etc/mail/sendmail.cf itself. Thus we will start by copying /etc/mail/sendmail.cf to a working directory, e.g., /tmp. We will open /tmp/sendmail.cf in a text editor, scroll to the very end and add the following lines: # Move the head to the left Smove_left R$* @ . % \t $@ $1 % @ R$* $- . % @ \t $@ $1 % $2 R$* $- . % $* \t $@ $1 % $2 . $3 R$* \t $#error underflow: $1 # Move the head to the right Smove_right R$* % $- . $+ \t $@ $1 $2 . % $3 R$* % $- \t $@ $1 $2 . %@ \t there is always blank at the right R$* % \t $@ $1 @ . %@ \t edge of the tape R$* \t $#error no-head: $1 It must be stressed that sendmail -- along with make, syslogd and a few other UNIX programs -- distinguishes between spaces and tabs. Although a tab looks like a sequence of blanks, its meaning is very different. Apparently UNIX founding fathers had terminals that somehow graphically displayed tabs. In the rules above, spaces are written as they are, but tabs are denoted as \t. You have to press a tab key wherever you see this two-character combination. All the gory details about the rules are explained in a manual http://www.sendmail.org/~ca/email/doc8.9/op-sh-5.html#sh-5 Just for reference, sendmail rewriting engine applies a ruleset to a string. A line Sxxx begins a ruleset named 'xxx', where xxx is a number or an identifier. A ruleset consists of rules, lines of the form R lhs \t rhs \t comments The lhs is a pattern made of alphanumeric characters (which must match the input string literally) and meta-symbols. The characters and meta-symbols may be separated by spaces. We will use the following meta-symbols in the lhs: $* matches zero or more tokens of the input string $+ matches one or more tokens $- matches exactly one token When a ruleset is applied to a string, the rewriting engine scans the rules starting with the first one, trying to match their lhs patterns against the input string. The first rule that matches becomes the active one. The engine then replaces the input string with the rhs of the active rule, after some processing. $n combinations in the rhs (where n is a digit 1..9) are replaced with whatever the n-th meta-symbol ($+, $-, $*) of the lhs has matched. The $>name syntax causes the remainder of the line to be substituted as usual and then passed as the argument to ruleset 'name'. The final value of ruleset 'name' then becomes the substitution for the active rule. After the input string is replaced with the rhs of the active rule, the active rule is retried against the new current string. If the rule does not match, the next rule is tried. If the ruleset has no more rules, the current string becomes the result of the ruleset application. Symbols $@ or $: placed at the beginning of the rhs change the default behavior of retrying the active rule until it fails. A $@ prefix causes the ruleset to return with the remainder of the rhs as the value. A $: prefix tells the engine not to retry the active rule: after the rhs replacement is performed, the engine should continue processing of the ruleset starting with the next rule after the active one. Prefix $: therefore can be used to avoid continued application of a rule. The rules that we have added to /tmp/sendmail.cf above move the "head of the tape" left or right. To test the rules, we enter at the command prompt /usr/sbin/sendmail -C /tmp/sendmail.cf -bt -v and then enter the following test at the sendmail's prompt: move_left a.b.%c That is, we are asking sendmail to apply the ruleset 'move_left' to a string "a.b.%c". Sendmail prints that the ruleset returns a string "a . % b . c" Indeed, the read-write head denoted by the symbol % moved left. Note that the space character is universally ignored. Although sendmail accepts named rules, it prints them out in a trace as numbers (e.g., ruleset 183). Other tests to try (the expected result follows the => sign) move_left % => $# error underflow : % move_left @.% => % @ move_left a.@.% => a . % @ move_right % => @ . % @ move_right %b.c => b . % c move_right @.%b => @ . b . % @ move_left,move_left,move_right,move_right,move_right,move_right,move_right a.b.%c.d => a . b . c . d . @ . % @ You can download the sendmail.cf file with the rules already entered from http://pobox.com/~oleg/ftp/packages/sendmail.cf The comments at the end of the file point out to more tests to run. Now we can run real Turing programs. Program 1: Addition of natural numbers. Two numbers to add are encoded in unary, in the following format @ n1 @ n2 %@ where n1 is a sequence of 0 or more tokens 'a' corresponding to the first number to add (in unary notation), n2 stands for the second number. The numbers are terminated with a blank cell; the head is at the blank cell that terminates the second number. The algorithm is simple: move the head left until it hits the blank separating the two numbers, replace this blank with token 'a', move the head right to the first blank, replace the previous token with the blank, and stop. Here's the corresponding program Sadd R$+. % \t $: $1.%@ R$+. %@ $* \t $: $>move_left $1.%@$2 R$+. %a $* \t $>move_left $1.%a$2 \t until the first blank R$+. %@ $* \t $: $>move_right $1.%a$2 \t replace blank with 'a' R$+. %a $* \t $>move_right $1.%a$2 \t until the first blank R$+. %@ $* \t $: $>move_left $1.%@$2 R$+. %$- $* \t $@ $1.%@$3 \t overwrite $2 with a blank If we add these rules to the /tmp/sendmail.cf file above, run /usr/sbin/sendmail -C /tmp/sendmail.cf -bt -v and enter add @.a.a.a.@.a.a.%@ at the sendmail's prompt, we will see (after a long trace of applied rulesets) @ . a . a . a . a . a . % @ That is, 3 + 2 is 5. One of the most expensive ways to add two numbers does seem to work after all. Other tests to try: add @.a.a.@.%@ => @ . a . a . % @ (that is, 2 + 0 = 2) add @.@.a.a.%@ => @ . a . a . % @ (that is, 0 + 2 = 2) add @.@.%@ => @ . % @ (that is, 0 + 0 = 0) Program 2: A decision machine. Decide if the number of tokens on the tape between two delimiting blanks is even or odd. The program is self-explanatory. Sevenp R$* %@ $* \t $: $>move_left $1%@$2 R$* %@ $* \t $: $>move_right $1%@$2 R$* %@ $* \t $@ $>move_right $1 %Yes $2 \t set the answer, halt R$* %$- $* \t $@ $>oddp $1%@$3 \t overwrite $2 with a blank Soddp R$* %@ $* \t $: $>move_left $1%@$2 R$* %@ $* \t $: $>move_right $1%@$2 R$* %@ $* \t $@ $>move_right $1 %No $2 \t set the answer, halt R$* %$- $* \t $@ $>evenp $1%@$3 \t overwrite $2 with a blank In the rules above, a rhs prefix "$: $>ruleset" effectively means a jump (switch) to the ruleset named 'ruleset'. $@ rhs prefix means 'halt'. Tests: evenp @.a.a.a.%@ => @ . No . % @ (three tokens: an odd number) oddp @.www.sendmail.org.%@ => @ . Yes . % @ evenp @.%@ => @ . Yes . % @ (zero is an even number) oddp @.%@ => @ . No . % @ evenp @.prep.ai.mit.edu.%@ => @ . Yes . % @ (four is an even number) Again, you can download the sendmail.cf file with the rules already entered from http://pobox.com/~oleg/ftp/packages/sendmail.cf A practical aspect of this article, if required, can be formulated as follows: one may wish for a tool to check sendmail.cf file. For example, to see if a given set of re-writing rules yields a result for _any_ input string. Alas, as the discussion in this article shows, such a general tool is impossible.