Binary I/O and applications

Binary i/o in Scheme is platform-specific: R5RS specifies no read-byte or a similar function. However, almost all Scheme systems either provide such a function or let us emulate one, as a composition of char->integer and read-char. All the code in this section assumes such a native or emulated read-byte function.


 

Binary Parsing

Binary parsing and unparsing are transformations between primitive or composite Scheme values and their external binary representations. Examples include reading and writing JPEG, TIFF, ELF file formats, communicating with DNS, Kerberos, LDAP, SLP internet services, participating in Sun RPC and CORBA/IIOP distributed systems, storing and retrieving (arrays of) floating-point numbers in a portable and efficient way.

This short position talk proposes a set of low- and intermediate- level procedures that make binary parsing possible.

Version
The current version is 1.0, Sep 15, 2000.
References
binary-parsing-slides.ps.gz [34K]
The slides and the transcript of a micro presentation at a Workshop on Scheme and Functional Programming 2000. Montreal, 17 September 2000.
 

Reading variable number of bits from a sequential input stream

The bit reader is the first part of a binary parsing framework. The bit reader code was intended to be as general and optimal as possible. The timing study given in the article referenced below shows the extent both goals have been met.

The bit reader lets us read one bit from an arbitrary stream. We can then read one more bit -- or two more bits, or 3, 7, 8, 23, 30, or 32 more bits. And then 33 bits, 65535, 81920,... bits. Following the tradition of Scheme, the bit reader does not impose any artificial upper limit whatsoever on the number of bits we can attempt to read from a stream. We are naturally bounded by the size of the stream and the amount of the virtual space on the system. The validation tests included with the source code really read all the bits from a file in one swoop -- as well as one bit at a time, and many cases in-between.

The bit reader intentionally does not read ahead: no byte is read until the very moment we really need (some of) its bits. Therefore, the reader can be used to handle a concatenation of different bit/byte streams strictly sequentially, without 'backing up a char', 'unreading-char' etc. tricks. For example, make-bit-reader has been used to read GRIB files of meteorological data, which are made of several bitstreams with headers and tags. Careful attention to byte-buffering and optimization are the features of this bit reader.

The code is tested on Gambit-C 3.0 and MIT Scheme 5d2. The code has R5RS versions of needed logical primitives, so it should work for any R5RS Scheme system. Despite the bit reader being so general, it is nevertheless optimized for common uses. The optimized cases stand out in the timing benchmarks discussed in the article.

Insightful discussions with Daniel Ortmann are gratefully acknowledged.

Version
The current version is 1.1, Oct 20, 2000.
References

binary-parse.scm [15K]
A commented source code, validation tests, and a timing benchmark.

The bit reader in the Scheme Untergrund Library
kindly ported by Martin Gasbichler.

A USENET article [plain text file]
that describes the code and presents the results of several performance benchmarks. The timings also give an insight into the performance of a Scheme system. The article was posted as _wide-range_ optimized bit reader and its performance on a newsgroup comp.lang.scheme on Sun, 22 Oct 2000 20:20:56 GMT.

 

Endian I/O

An Endian I/O port lets us read or write integers of various sizes taking a byte order into account.

The TIFF library, for example, assumes the existence of a data structure EPORT with the following operations:

     endian-port-set-bigendian!   EPORT -> UNSPECIFIED
     endian-port-set-littlendian! EPORT -> UNSPECIFIED
     endian-port-read-int1        EPORT -> UINTEGER (byte)
     endian-port-read-int2        EPORT -> UINTEGER
     endian-port-read-int4        EPORT -> UINTEGER
     endian-port-setpos           EPORT INTEGER -> UNSPECIFIED

The endian port can be implemented in a R5RS Scheme system if we assume that the composition of char->integer and read-char yields a byte and if we read the whole file into a string or a u8vector (SRFI-4). Obviously, there are times when such a solution is not satisfactory. Therefore, tiff-prober and the validation code vtiff.scm rely on a Gambit-specific code. All major Scheme systems can implement endian ports in a similar vein -- alas, each in its own particular way.

Version
The current version is 2.0, Oct 2003.
References

tiff-prober.scm [5K]
This TIFF prober code provides an implementation of the input Endian port specifically tuned for Gambit.

Advanced i/o and arithmetic compression C++ class library
Our goal is to reproduce all the functionality of that C++ library in Scheme.

 

Handling a TIFF file

TIFF library is a Scheme library to read and analyze TIFF image files. We can use the library to obtain the dimensions of a TIFF image; the image name and description; the resolution and other meta-data. We can then load a pixel matrix or a colormap table. An accompanying tiff-prober program prints out the TIFF dictionary in a raw and polished formats.

Features:

Interface:

TAGDICT
A data structure: a tag dictionary, which helps translate between tag-symbols and their numerical values.
tagdict-get-by-name TAGDICT TAG-NAME -> INT
tagdict-get-by-num TAGDICT INT -> TAG-NAME or #f
tagdict-tagval-get-by-name TAGDICT TAG-NAME VAL-NAME -> INT
tagdict-tagval-get-by-num TAGDICT TAG-NAME INT -> VAL-NAME or #f
make-tagdict ((TAG-NAME INT (VAL-NAME . INT) ...) ...) -> TAGDICT
tagdict? TAGDICT -> BOOL
tagdict-add-all DEST-DICT SRC-DICT -> DEST-DICT
Here TAG-NAME and VAL-NAME are symbols.
tiff-standard-tagdict
The dictionary of standard TIFF tags.
 
TIFF-DIR-ENTRY
A data structure that describes the tag, the type, the item count of the entry, the offset or an immediate value of the entry, and a promise for entry's value. The value may be an integer, a rational, a floating-point number, a string, or a uniform vector (u8vector, u16vector or u32vector).
 
TIFF-DIRECTORY
TIFF Image File Directory, a data structure. TIFF directory is a collection of TIFF directory entries. The entries are stored in an ascending order of their tags.
read-tiff-file EPORT [PRIVATE-TAGDICT] -> TIFF-DIR
print-tiff-directory TIFF-DIR OPORT -> UNSPECIFIED
tiff-directory? SCHEME-VALUE -> BOOL
tiff-directory-size TIFF-DIR -> INT
tiff-directory-empty? TIFF-DIR -> BOOL
tiff-directory-fold-left TIFF-DIR FN SEED ... -> [SEED ...]
tiff-directory-get TIFF-DIR KEY [ABSENCE-THUNK] -> VALUE
KEY may be either a symbol or an integer
tiff-directory-get-as-symbol TIFF-DIR KEY [ABSENCE-THUNK] -> VALUE
Here KEY must be a symbol. If it is possible, the VALUE is returned as a symbol, as translated by the tagdict.

The library accesses the input TIFF image file solely through the methods defined for the endian port.

Version
The current version is 2.0, Sep 2003.
References

The announcement of the library, with examples and details [plain text file]
The article was posted as [ANN] Reading TIFF files on a newsgroup comp.lang.scheme on Tue, 7 Oct 2003 18:24:12 -0700.

tiff.scm [30K]
The commented source code. It explains the interface above in far more detail. Dependencies: util.scm, char-encoding.scm, myenv.scm.

vtiff.scm [9K]
The validation code.The validation code includes a function test-reading-pixel-matrix that demonstrates loading a pixel matrix of an image in an u8vector. The code can handle a single or multiple strips.

gnu-head-sm.tif [16K]
A sample TIFF file for the validation code.It is the image of the GNU head (www.gnu.org) converted from JPEG to TIFF by xv. Copyleft by GNU.

tiff-prober.scm [5K]
A TIFF prober program: a sample application of the TIFF library. The prober prints out the contents of a TIFF dictionary of input TIFF files.

Iteratee-based TIFF library

 

Reading IEEE binary floats in R5RS Scheme

We show how to read IEEE binary floating-point numbers using only procedures defined in R5RS. No special language extensions, foreign function interfaces, or libraries are required. The only assumption is that char->integer returns an integer with the same bit pattern as the function's argument, a single 8-bit ASCII character. The assumption holds for many Scheme systems.

The code can read 4-byte single-precision IEEE floating-point numbers from minfloat to maxfloat inclusively. The code does not handle +Inf, -Inf and NaNs, although this is trivial to add, as explained in the comments to the code.

One can twiddle bits in Scheme after all: it is just arithmetics...

Version
The current version is 1.1, Aug 9, 2002.
References
A USENET article with the complete code and the test suite [plain text file]
The article was posted as Reading IEEE binary floats in R5RS Scheme on a newsgroup comp.lang.scheme on Wed, 08 Mar 2000 03:24:25 GMT.


Last updated December 5, 2008

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML