From oleg-at-pobox.com Mon Aug 28 12:30:32 2000 Date: Mon, 28 Aug 2000 12:30:30 -0700 (PDT) Message-ID: <200008281930.MAA27465@adric.cs.nps.navy.mil> To: editors-at-ddj.com Subject: Konrad Zuse deserves even more credit Status: OR Dear DDJ, I'm writing regarding a remarkable article "Simulating Konrad Zuse's Computers" (DDJ, September 2000). First of all, I'd like to thank Dr. Rojas for his article and for his work to make Konrad Zuse's extraordinary achievements known. I have a comment about the following statement at the end of the article: "From a practical perspective, and in a way the Z3 was really programmed, it was not equivalent to the modern computers". It appears Z3 was farther ahead that we might've realized. The above statement seems to refer to the way Z3 deals with computations that include branches. As the article well explains, Z3 did not have any branching instructions. An algorithm with conditional jumps has to be transformed for Z3. We have to identify 'traces' -- sections of code without branches. We assemble the traces into Z3 code and write them one after another on Z3's tape glued in a loop. Each trace starts with computing a binary number 't' based on the current state of the algorithm stored in Z3 memory. Operations a = b op c in the algorithm are written in Z3 code as a = a*(1-t) + t*(b op c). All traces were repeatedly executed, but only the one with the predicate 't' equal to 1 altered the memory. Modern processors do have branch instructions. These instructions however cause a lot of pain to designers of super-scalar CPUs as taking a branch disrupts the pipeline of already fetched and analyzed instructions. Therefore, super-scalar CPUs and the corresponding compiler systems try to "eliminate" branches as much as possible: either by prediction and speculative execution, or -- in recent years -- by predication. The following is a quotation from an article "Technology Outlook: Introduction to Predicated Execution" by Wen-mei Hwu, IEEE Computer, Vol. 31, No. 1: January 1998, pp. 49-50. It is interesting to compare this quote with the above description of Z3's handling of conditional jumps. "The story of Merced, Intel's first processor based on its next-generation 64-bit architecture, will continue to unfold in 1998. Intel expects this product of its collaboration with Hewlett-Packard to reach volume production in 1999. To date, however, the two companies have released few details about Intel Architecture 64 (IA-64). One significant change they did admit to at the October 1997 Microprocessor Forum was the switch to full predicated execution, a technique that no other commercial general-purpose processor employs. Predicated execution is a mechanism that supports the conditional execution of individual operations [1]. Compared to a conventional instruction set, an operation in a predicated-execution architecture has an additional input operand -- a predicate -- that can assume a value of true or false. During runtime, a predicated-execution processor fetches operations regardless of their predicate value. The processor executes operations with true predicates normally; it nullifies operations with false predicates and prevents them from modifying the processor state. Using predication inherently changes the representation of a program's control flow. A conventional instruction set requires all control flow to be explicitly represented in the form of branches, the only mechanism available to conditionally execute operations. An instruction set with predicated execution, however, can support conditional execution via either conventional branches or predicated operations. Predication is most commonly utilized in a compiler by employing if-conversion. This technique converts conditional branches in an acyclic region of the control flow into predicate-defining operations. With if-conversion, a straight-line sequence of predicated code can replace complex nets of branching code. As Figure 1 shows, the execution of the code after if-conversion does not involve any branch. This means the compiler can eliminate problematic branches and avoid their associated overhead. It also facilitates increased instruction-level parallelism by allowing the compiler to overlap the execution of separate control flow paths. This allows the processor to simultaneously execute multiple paths from a single thread of control. An important benefit of predication not illustrated in Figure 1 is that it allows overlapping and independent control flow constructs without expanding the code. [1]. B. Rau et al., "The Cydra 5 Departmental Supercomputer," IEEE Computer, Jan. 1989, pp. 12-35. .... CONCLUSION In the future, integrating control and data speculation with predicated execution will enable advanced compiler techniques to increase the performance of future processors. With the adoption of advanced full predication support in IA-64 and perhaps many other architectures, predicated execution may become one of the most significant advances in the history of computer architecture and compiler design." Note the last sentence: Predicate execution -- computing without branches -- is considered one of the most revolutionary advances in computer architecture. This technique was also implemented by Konrad Zuse back in 1938. I stand in awe before such an intellectual achievement and foresight. The fact the name of Konrad Zuse is not widely recognized in this country is very disturbing. I wonder what can be done to make amends. Maybe the Germany Technology Museum will consider a tour of the U.S. dedicated to Konrad Zuse. Maybe PBS will make a documentary about him, with sponsorships from ACM, IEEE, and the computing industry. After all, Konrad Zuse is morally entitled to a share of every sale of Itanium, UltraSparc-III and Crusoe chips. I want to thank DDJ for publishing such a remarkable article about Konrad Zuse, and giving credit to a man who invented the future. Sincerely, Oleg Kiselyov