From oleg-at-pobox.com Mon Aug 28 12:30:32 2000
Date: Mon, 28 Aug 2000 12:30:30 -0700 (PDT)
Message-ID: <200008281930.MAA27465@adric.cs.nps.navy.mil>
To: editors-at-ddj.com
Subject: Konrad Zuse deserves even more credit
Status: OR

Dear DDJ,

	I'm writing regarding a remarkable article "Simulating Konrad
Zuse's Computers" (DDJ, September 2000). First of all, I'd like to
thank Dr. Rojas for his article and for his work to make Konrad Zuse's
extraordinary achievements known.

	I have a comment about the following statement at the end of
the article: "From a practical perspective, and in a way the Z3 was
really programmed, it was not equivalent to the modern computers". It
appears Z3 was farther ahead that we might've realized.

	The above statement seems to refer to the way Z3 deals with
computations that include branches. As the article well explains, Z3
did not have any branching instructions. An algorithm with conditional
jumps has to be transformed for Z3. We have to identify 'traces' --
sections of code without branches. We assemble the traces into Z3 code
and write them one after another on Z3's tape glued in a loop. Each
trace starts with computing a binary number 't' based on the current
state of the algorithm stored in Z3 memory. Operations a = b op c in
the algorithm are written in Z3 code as a = a*(1-t) + t*(b op c). All
traces were repeatedly executed, but only the one with the predicate 't'
equal to 1 altered the memory.

	Modern processors do have branch instructions. These
instructions however cause a lot of pain to designers of super-scalar
CPUs as taking a branch disrupts the pipeline of already fetched and
analyzed instructions. Therefore, super-scalar CPUs and the
corresponding compiler systems try to "eliminate" branches as much as
possible: either by prediction and speculative execution, or -- in
recent years -- by predication.

	The following is a quotation from an article "Technology
Outlook: Introduction to Predicated Execution" by Wen-mei Hwu, IEEE
Computer, Vol. 31, No. 1: January 1998, pp. 49-50. It is interesting
to compare this quote with the above description of Z3's handling of
conditional jumps.

"The story of Merced, Intel's first processor based on its
next-generation 64-bit architecture, will continue to unfold in
1998. Intel expects this product of its collaboration with
Hewlett-Packard to reach volume production in 1999. To date, however,
the two companies have released few details about Intel Architecture
64 (IA-64). One significant change they did admit to at the October
1997 Microprocessor Forum was the switch to full predicated execution,
a technique that no other commercial general-purpose processor
employs.

Predicated execution is a mechanism that supports the conditional execution
of individual operations [1]. Compared to a conventional instruction set, an
operation in a predicated-execution architecture has an additional input
operand -- a predicate --  that can assume a value of true or false.
During runtime, a predicated-execution processor fetches operations
regardless of their predicate value. The processor executes operations with
true predicates normally; it nullifies operations with false predicates and
prevents them from modifying the processor state.
 
Using predication inherently changes the representation of a program's
control flow. A conventional instruction set requires all control flow
to be explicitly represented in the form of branches, the only
mechanism available to conditionally execute operations. An
instruction set with predicated execution, however, can support
conditional execution via either conventional branches or predicated
operations.

Predication is most commonly utilized in a compiler by employing
if-conversion. This technique converts conditional branches in an acyclic
region of the control flow into predicate-defining operations. With
if-conversion, a straight-line sequence of predicated code can replace
complex nets of branching code. As Figure 1 shows, the execution of the
code after if-conversion does not involve any branch. This means the
compiler can eliminate problematic branches and avoid their associated
overhead. It also facilitates increased instruction-level parallelism by
allowing the compiler to overlap the execution of separate control flow
paths. This allows the processor to simultaneously execute multiple paths
from a single thread of control. An important benefit of predication not
illustrated in Figure 1 is that it allows overlapping and independent
control flow constructs without expanding the code. 

[1]. B. Rau et al., "The Cydra 5 Departmental Supercomputer," IEEE
Computer, Jan. 1989, pp. 12-35. 

....
CONCLUSION
 
In the future, integrating control and data speculation with
predicated execution will enable advanced compiler techniques to
increase the performance of future processors. With the adoption of
advanced full predication support in IA-64 and perhaps many other
architectures, predicated execution may become one of the most
significant advances in the history of computer architecture and
compiler design."


Note the last sentence: Predicate execution -- computing without
branches -- is considered one of the most revolutionary advances in
computer architecture. This technique was also implemented by Konrad
Zuse back in 1938.  I stand in awe before such an intellectual
achievement and foresight. The fact the name of Konrad Zuse is not
widely recognized in this country is very disturbing. I wonder what
can be done to make amends.  Maybe the Germany Technology Museum will
consider a tour of the U.S. dedicated to Konrad Zuse.  Maybe PBS will
make a documentary about him, with sponsorships from ACM, IEEE, and the
computing industry. After all, Konrad Zuse is morally entitled to a
share of every sale of Itanium, UltraSparc-III and Crusoe chips. I
want to thank DDJ for publishing such a remarkable article about
Konrad Zuse, and giving credit to a man who invented the future.

	Sincerely,
	Oleg Kiselyov