for-each, map, filter higher-order procedures -- of which the most general is fold. The second approach relies on streams, a.k.a. cursors, lazy lists. Generators such as the ones in Icon, Ruby and Python are a hybrid approach.
It is well-known that given a cursor interface to a collection, we can implement an enumerator. It is less appreciated that given an enumerator interface, we can always derive a cursor -- in an automatic way. We demonstrate that generic procedure for languages with and without first-class continuations.
Now that cursors and enumerators are inter-convertible, an implementor of a collection has a choice: which of the two interfaces to implement natively? We argue that he should offer the enumerator interface as the native one. The paper elaborates that enumerators are superior: in efficiency; in ease of programming; in more predictable resource utilization and avoidance of resource leaks. We present a design of the overall optimal collection traversal interface, which is based on a left-fold-like combinator with premature termination. The design has been implemented and tested in practice.
Enumerating collections: searching for the best iterator
From enumerators to cursors: turning the left fold inside out, in Haskell
enumerators-callcc-code.scm [4K]
The general enumerator-inversion code in Scheme.
Our running example is HTTP request processing in Haskell, specifically, reading lines (terminated by CR, LF or CRLF) from a file descriptor and then from the chunk-encoded body. The main example illustrates multiplexing across two file descriptors and the full IO interleaving. The same line parser is used to process data from the file descriptor stream and from the embedded chunk-encoded stream, which is incrementally decoded.
We focus on:
We achieve both high performance and encapsulation of input processing layers that can be freely composed. The technique has been validated in a production web-application server written in Haskell and database access library Takusen.
README.dr [3K]
Description of the Iteratee IO library and the code accompanying the talk.
Lazy-vs-correct.txt [5K]
Lazy vs correct IO: A message on composability and performance of Iteratee IO, posted on the Haskell-Cafe mailing list on Thu, 18 Sep 2008 23:51:59 -0700 (PDT).
We show that although the Iteratee IO library is not optimized at all, it already outperforms Lazy IO. The final benchmark, counting words in all GHC documentation, demonstrates the differences between Lazy and Iteratee IO: Iteratee IO finishes the test whereas Lazy IO runs out of file descriptors.
io.txt [26K]
An early draft of an ambitious layered IO proposal, cast in the context of Scheme. More polished drafts exist, and even a prototype Scheme implementation. Unfortunately, once it became clear that the ideas are working out, the motivation fizzled.
A stream-wise access to a collection is an important access method, which may even be supported by hardware. For example, Pentium III floating-point extension (Internet Streaming SIMD Extension) lets programmers designate arrays as streams and provides instructions to handle such data efficiently (Internet Streaming SIMD Extensions, Shreekant (Ticky) Thakkar and Tom Huff, Computer, Vol. 32, No. 12, December 1999, pp. 26-34). Streaming is a typical memory access model of DSPs: that's why DSP almost never incorporate a data cache (See "DSP Processors Hit the Mainstream," Jennifer Eyre and Jeff Bier, Computer, Vol. 31, No. 8, August 1998, pp. 51-59). A memory architecture designed in an article "Smarter Memory: Improving Bandwidth for Streamed References" (IEEE Computer, July 1998, pp.54-63) achieves low overall latencies because the CPU is told by a compiler that a stream operation is to follow. LinAlg offers this streaming access model to an application programmer.
Matrix streams may stride a matrix by an arbitrary amount. This lets us traverse a matrix along the diagonal, by columns, by rows, etc. Streams can be constructed of a Matrix itself, or from other matrix views (MatrixColumn, MatrixRow, MatrixDiag). In the latter case, the streams are confined only to specific portions of the matrix.
Many functions of LinAlg are written in terms of streams, for example, the computation of vector norms, the addition of a vector to the diagonal or the anti-diagonal of a matrix, Aitken-Lagrange interpolation. Singular value decomposition SVD demonstrates many applications of streams: e.g., multiplying a matrix by a rotation matrix avoids random access to matrix elements and the corresponding range checks and offset calculations. The stream code is also more lucid. One may create a stream that spans over a part of another stream. We use substreams, for example, to efficiently reflect the upper triangle of a square matrix onto the lower one, yielding a symmetric matrix. The SVD computation uses subranging extensively, e.g., for left Householder transformations.
LinAlg's streams may span an arbitrary rectangular block of a matrix, including the whole matrix, a single matrix element, a matrix row or a column, or a part thereof. Assigning a block of one matrix to a block of another takes only one line -- which, due to inlining, is just as efficient as the direct loop with pointer manipulation.
LinAlg: Basic Linear Algebra and Optimization classliboleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML