The current version of the Haskell library ``Iteratee-based I/O'' is on Hackage
< http://hackage.haskell.org/package/iteratee >
Here we give motivation, justification and explanation, pointing out earlier and experimental versions as well as related projects.
for-each, map, filter higher-order procedures -- of which the most general is fold. The second approach relies on streams, a.k.a. cursors, lazy lists. Generators such as the ones in Icon, Ruby and Python are a hybrid approach.
It is well-known that given a cursor interface to a collection, we can implement an enumerator. It is less appreciated that given an enumerator interface, we can always derive a cursor -- in an automatic way. We demonstrate that generic procedure for languages with and without first-class continuations.
Now that cursors and enumerators are inter-convertible, an implementor of a collection has a choice: which of the two interfaces to implement natively? We argue that he should offer the enumerator interface as the native one. The paper elaborates that enumerators are superior: in efficiency; in ease of programming; in more predictable resource utilization and avoidance of resource leaks. We present a design of the overall optimal collection traversal interface, which is based on a left-fold-like combinator with premature termination. The design has been implemented and tested in practice.
Enumerating collections: searching for the best iterator
From enumerators to cursors: turning the left fold inside out, in Haskell
enumerators-callcc-code.scm [4K]
The general enumerator-inversion code in Scheme.
Our running example is HTTP request processing in Haskell, specifically, reading lines (terminated by CR, LF or CRLF) from a file descriptor and then from the chunk-encoded body. The main example illustrates multiplexing across two file descriptors and the full IO interleaving. The same line parser is used to process data from the file descriptor stream and from the embedded chunk-encoded stream, which is incrementally decoded.
We focus on:
We achieve both high performance and encapsulation of input processing layers that can be freely composed. The technique has been validated in a production web-application server written in Haskell; database access library Takusen; TIFF and WAVE file readers.
IterateeIO-talk.pdf [163K]
IterateeIO-talk-notes.pdf [198K]
Talks about Iteratee IO
The first version of the talk was presented at DEFUN08, ACM SIGPLAN 2008 Developer Tracks on Functional Programming. September 27, 2008. Victoria, Canada.
The present version of the talk (given at the research colloquium of the Center for Software Technology, Department of Information and Computing Sciences, Utrecht University on December 17, 2009) describes the December 2009 version of the Iteratee IO.
The talk is aimed at practitioners not afraid of Haskell, in particular, server programmers. That is, programmers who program long-running distributed applications and are painfully aware of the issues of reading from sockets, latency, buffering, many layers of decoding, proper resource disposal and sustaining high load.
README.dr [3K]
Description of the Iteratee IO library and the code accompanying the talk.
Lazy-vs-correct.txt [5K]
Lazy vs correct IO: A message on composability and performance of Iteratee IO, posted on the Haskell-Cafe mailing list on Thu, 18 Sep 2008 23:51:59 -0700 (PDT).
We show that although the Iteratee IO library is not optimized at all, it already outperforms Lazy IO. The final benchmark, counting words in all GHC documentation, demonstrates the differences between Lazy and Iteratee IO: Iteratee IO finishes the test whereas Lazy IO runs out of file descriptors.
hyena: Simple web application server, by Johan Tibell
< http://hackage.haskell.org/package/hyena >
In essence, an iteratee takes a chunk of the input stream and returns one of:
We assume that all iteratees are `good' -- given bounded input, they do the bounded amount of computation and take the bounded amount of resources. We also assume that given a terminated stream, an iteratee moves to the done state, returning the results computed so far.
It turns out, iteratee forms a monad. Indeed, Stream is a gradually consumed state; to read another chunk, the iteratee has to make an `operating system call' to the stream producer. So, Iteratee is a State+Cont monad, and hence an Error monad. We can use the familiar do notation for composing iteratees.
The basic design suggests two implementations. The comments in the code below debate them, justifying our choice for the implementation.
Each enumerator takes an iteratee and returns an iteratee: an Enumerator is an iteratee transformer. The enumerator normally stops when the stream is terminated or when the iteratee moves to the done state, whichever comes first. We may compose enumerators (as functions) just as we compose iteratees (with bind). Composing enumerators effectively `concatenates' the sources of their streams.
To clarify the delimited-control nature of iteratees, we provide a CPS implementation. The types of Enumerators and Enumeratees are the simplest in that version. The code for high-level iteratees and for enumeratees remains almost the same.
Version 2.0 of the library introduces Enumeratees, or stream adaptors, processors of nested streams.
IterateeM.hs [42K]
Monadic and general iteratees: the main library. Parsing of a stream chunk-encoded in another stream; IO multiplixing.
IterateeMCPS.hs [38K]
The CPS version of the above, which is in some sense simpler, making monadic iteratees look pure.
io.txt [26K]
An early draft of an ambitious layered IO proposal, cast in the context of Scheme. More polished drafts exist, and even a prototype Scheme implementation. Unfortunately, once it became clear that the ideas are working out, the motivation fizzled.
A stream-wise access to a collection is an important access method, which may even be supported by hardware. For example, Pentium III floating-point extension (Internet Streaming SIMD Extension) lets programmers designate arrays as streams and provides instructions to handle such data efficiently (Internet Streaming SIMD Extensions, Shreekant (Ticky) Thakkar and Tom Huff, Computer, Vol. 32, No. 12, December 1999, pp. 26-34). Streaming is a typical memory access model of DSPs: that's why DSP almost never incorporate a data cache (See "DSP Processors Hit the Mainstream," Jennifer Eyre and Jeff Bier, Computer, Vol. 31, No. 8, August 1998, pp. 51-59). A memory architecture designed in an article "Smarter Memory: Improving Bandwidth for Streamed References" (IEEE Computer, July 1998, pp.54-63) achieves low overall latencies because the CPU is told by a compiler that a stream operation is to follow. LinAlg offers this streaming access model to an application programmer.
Matrix streams may stride a matrix by an arbitrary amount. This lets us traverse a matrix along the diagonal, by columns, by rows, etc. Streams can be constructed of a Matrix itself, or from other matrix views (MatrixColumn, MatrixRow, MatrixDiag). In the latter case, the streams are confined only to specific portions of the matrix.
Many functions of LinAlg are written in terms of streams, for example, the computation of vector norms, the addition of a vector to the diagonal or the anti-diagonal of a matrix, Aitken-Lagrange interpolation. Singular value decomposition SVD demonstrates many applications of streams: e.g., multiplying a matrix by a rotation matrix avoids random access to matrix elements and the corresponding range checks and offset calculations. The stream code is also more lucid. One may create a stream that spans over a part of another stream. We use substreams, for example, to efficiently reflect the upper triangle of a square matrix onto the lower one, yielding a symmetric matrix. The SVD computation uses subranging extensively, e.g., for left Householder transformations.
LinAlg's streams may span an arbitrary rectangular block of a matrix, including the whole matrix, a single matrix element, a matrix row or a column, or a part thereof. Assigning a block of one matrix to a block of another takes only one line -- which, due to inlining, is just as efficient as the direct loop with pointer manipulation.
LinAlg: Basic Linear Algebra and Optimization classlibWe use random and binary IO to write a general-purpose TIFF library. The library emphasizes incremental processing, relying on iteratees and enumerators for on-demand reading of tag values. The library extensively uses nested streams, tacitly converting the stream of raw bytes from the file into streams of integers, rationals and other user-friendly items. The pixel matrix is presented as a contiguous stream, regardless of its segmentation into strips and physical arrangement.
We show a representative application of the library: reading a sample TIFF file, printing selected values from the TIFF dictionary, verifying the values of selected pixels and computing the histogram of pixel values. The pixel verification procedure stops reading the pixel matrix as soon as all specified pixel values are verified. The histogram accumulation does read the entire matrix, but incrementally. Neither pixel matrix processing procedure loads the whole matrix in memory. In fact, we never read and retain more than the IO-buffer-full of raw data.
Version 2.0 of the library demonstrates restartable exceptions to seek (that is, control the position) in a stream.
Tiff.hs [25K]
TIFF library and its sample application
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML