previous next up top

A Conceptual Sequence as a Tangible and Efficient Data Structure

A series of list appends or monadic binds for many monads performs algorithmically worse when left-associated. Continuation-passing style (CPS) is well-known to cure this severe dependence of performance on the association pattern. The advantage of CPS dwindles or disappears if we have to examine or modify the intermediate result of that series of appends or binds, before continuing the series. Such examination is frequently needed, for example, to control search in non-determinism monads.

We present an alternative approach that is just as general as CPS but more robust: it makes series of binds and other such operations efficient regardless of the association pattern -- and also provides efficient access to intermediate results. The key is to represent such a conceptual sequence as an efficient sequence data structure. Efficient sequence data structures from the literature are homogeneous and cannot be applied as they are in a type-safe way to a series of monadic binds. We generalize them to type aligned sequences and show how to construct their (assuredly order-preserving) implementations. We demonstrate that our solution solves previously undocumented, severe performance problems in iteratees, LogicT transformers, free monads and extensible effects.

Joint work with Atze van der Ploeg.

Introduction
Slow-downs and speed-ups of list concatenations
Slow-downs and speed-ups of monad concatenations
Conclusions
Surprisingly simple fast typed-aligned queues

Introduction

Lists are the most familiar example of an ordered sequence of elements, but not the most efficient one. Adding and removing an element is the constant-time operation at the list's left edge but not at the right edge. Appending two lists also takes linear time. There are many more efficient sequence data structures, described in the well-known book by Chris Okasaki, in which the above operations are (on average) constant-time.

The performance advantage of efficient sequences is tangible, e.g., when building a sequence by concatenating many smaller ones, as typically happens in serialization. With lists, whose append runs in time proportional to the length of the left argument, even simple examples set off the performance trap, as we shall see below. Turning to difference lists lets us out: the underlying continuation-passing style (CPS) magically re-associates appends in the most efficient pattern, to the right. Alas, CPS saves us so long as we do not examine the list built so far. Efficient sequences once and for all solve the performance problems of building sequences from smaller ones and examining intermediate results. This work applies what is well-known about the efficient concrete sequences to abstract ones.

The pattern of building a complex structure, transformation or monadic action by adjoining, composing or binding smaller parts is pervasive. Also frequent is the need to examine intermediate results to decide on further steps. A transformation or a monadic action are not sequences. But the series of steps taken to construct them are a conceptual sequence. Representing that abstract series of steps as a concrete efficient sequence helps here as well.

For illustration, we show a simple but slow implementation of iteratees as a Free Monad. The monadic bind here, like list append, is associative and performs much better when right-associated. Representing the abstract series of binds as an efficient type-aligned sequence data structure algorithmically improves the performance of iteratees. The Haskell Symposium paper shows more examples of the speed-up, taken from the implementations of delimited control and non-determinism.

Our observations about the performance problems of appends and monadic binds, and the offered solutions hold in call-by-value as they do in call-by-need and can be implemented in, say, OCaml or Scala. For concreteness, on this page we will be using Haskell.

References

zseq.pdf [271K]
Reflection without Remorse: Revealing a hidden sequence to speed up monadic reflection
Paper published in Proceedings of the Haskell Symposium 2014. September 4-5, 2014. Gothenburg, Sweden.

Video of Atze van der Ploeg's presentation at the Haskell Symposium 2014. September 5, 2014.
< https://www.youtube.com/watch?v=_XoI65Rxmss&list=UUP9g4dLR7xt6KzCYntNqYcw >

< https://github.com/atzeus/reflectionwithoutremorse >
Accompanying source code

Slow-downs and speed-ups of list concatenations

This background section recalls the real performance problem of building lists by appending many smaller ones, its partial solution by the difference lists and the complete solution by efficient sequences such as Okasaki's catenable queues. Everything in this section is well-known. Novelty comes next, in generalizing from concrete sequences to conceptual ones, such as sequences of monadic binds.

Although our examples are in Haskell, everything we describe here applies to any other functional language. Recall, Haskell lists [a] of elements of the type a is a built-in data structure, which could have been defined as

     data [a] = [] | a : [a]

Furthermore, the type String is an alias of [Char]. Henceforth we will be using strings as concrete lists. The list append operation is the infix (++), defined as

     (++) :: [a] -> [a] -> [a]
     []    ++ y = y
     (h:t) ++ y = h : (t ++ y)

It traverses the left operand but merely passes around the right one. The operation is clearly associative: (l1 ++ l2) ++ l3 and l1 ++ (l2 ++ l3) give the same result for any lists l1, l2 and l3. Yet the asymmetry of handling the operands shows up as the difference in performance: if the lists l1, l2 and l3 have the length n, the right-associative version l1 ++ (l2 ++ l3) needs 2n cons-operations (:) whereas the left-associative version needs 3n. For appending many small lists, the right-associative pattern does algorithmically better than the left-associative one: linearly rather than quadratically. The reader is encouraged to verify this as an exercise. As a slightly more advanced exercise, confirm that the conclusion holds regardless of the call-by-value or call-by-need evaluation strategy.

The problem is clear, and so is the solution: just associate append operations to the right. That is easier said than done, as we are about to see, already on a simple example.

Our running example is printing out (serializing) a JSON-like document represented in memory as the data structure

     data JSON = A Int | Node String JSON | Arr [JSON]
     
     sample = Node "Root" (
       Arr [Node "Ch1" (A 1),
            Node "Ch2" (A 2)])

Integers are atomic documents; documents may be grouped into an array and given a name, as we see in the sample document. Another, deeply nested, sample is produced by makej n, with n nested nodes and no arrays:

     makej :: Int -> JSON
     makej 0 = A 1
     makej d = Node "node-name" (makej (d-1))

The most straightforward function to show this document as a string

     showj :: JSON -> String
     showj (A x)     = show x
     showj (Node t j) = "{" ++ t ++ " " ++ showj j ++ "}"
     -- Don't handle Arr yet

has a surprisingly poor performance: showj (makej 4096) produces scant 49Kb of output and yet running length $ showj (makej 4096) takes 4.8 seconds in GHCi on a modern Core i3 laptop. As the size of the output increases, the running time grows super-linearly. The performance trap of list append is truly easy to fall into.

Fortunately, there is a way out: a more efficient representation for lists, namely, difference lists. They are used for intermediate results and converted to the real list, by fromDiff, at the very end:

     type DiffList a = [a] -> [a]
     
     fromDiff :: DiffList a -> [a]
     fromDiff x = x []
     
     toDiff :: [a] -> DiffList a
     toDiff l = \t -> l ++ t
     
     showjCPS :: JSON -> String
     showjCPS j = fromDiff (go j)
      where
        go :: JSON -> DiffList Char
        go (A x)      = toDiff $ show x
        go (Node t j) = (toDiff "{") . (toDiff t)  .
                        (toDiff " ") . go j . (toDiff "}")
        -- Don't handle arrays yet

A difference list is a function that represents the list prefix, to be added to the suffix received as an argument. Appending difference lists is composing the corresponding functions, which is constant-time. Re-writing showj to use difference lists internally is mere replacing the regular list append (++) with the difference-list append (.). It is easy to recognize the re-writing as the conversion of showj into the continuation-passing style: the continuation here is the tail of the list being constructed. Now, length $ showjCPS (makej 4096) runs in 0.03 seconds -- more than 100 times improvement! One can really feel such speed-ups, however puzzling. After all, eventually we do rely on the same list append (++) to concatenate strings, see toDiff. However, CPS always associates these appends to the right, which is most efficient. It is another good exercise for the reader to reflect on how CPS manages such a profitable re-association.

We have not handled the array grouping yet. We'd like to show them the way Haskell shows lists, e.g. "[1,2,3,4]" -- with the elements separated by commas and the whole output enclosed in brackets. The format is slightly irregular: there is a comma after each element, except for the last one. Everyone who has dealt with separators knows how irritating this exception is. Yet there is an elegant way of writing this formatting by taking its definition literally:

     showLst :: Show a => [a] -> String
     showLst []    = "[]"
     showLst (h:t) =
       let (p:s) = showLst t in
       p : show h ++ (if s == "]" then s else ","++s)

One can read the code as: to show a non-empty list (h:t), show its tail and then insert the result of show h right after the opening bracket, with the comma. Do not put the comma before the closing bracket. The program demonstrates the examining and modifying the intermediate result (the result of showing the tail of the list), to wedge-in show h and the comma. This inspection and modification of the intermediate result is called reflection.

Let's add this algorithm to our JSON serialization function:

     showjCPS1 :: JSON -> String
     showjCPS1 j = fromDiff (go j)
      where
        -- this part is as before
        go (A x)      = toDiff $ show x
        go (Node t j) = (toDiff "{") . (toDiff t)  .
                        (toDiff " ") . go j . (toDiff "}")
        -- the new part: showing arrays of nodes
        go (Arr [])    = (toDiff "[]")
        go (Arr (h:t)) = let (p:s) = fromDiff (go (Arr t)) in
          (p:) . go h . toDiff (if s == "]" then s else "," ++ s)

Recall, the elegant list serialization needs to examine the intermediate result. The internally used difference list is a function, and cannot be examined. It has to be converted to a regular list first, by fromDiff, and then toDiff back to the difference list to continue the formatting. The fromDiff conversion is expensive, forcing all the list appends that have been put off. That is not a good sign. It is not good indeed: showing an array of a single deeply nested document showjCPS1 (Arr $ replicate 1 (makej 80000)) produces 96Kb of output and takes 0.50 seconds. On the other hand, showing an array of many shallow documents showjCPS1 (Arr $ replicate 2000 (makej 10)) produces one-quarter of the output but takes 16 times longer!

Thus difference lists, or the CPS, only partly solve the performance problem of badly associated appends. In other words, reflection destroys the performance advantage of CPS.

We can reflect without remorse if we take a better working data structure. Instead of difference lists, with their efficient concatenation but inefficient deconstruction, we turn to the data structure where appending and the left-edge construction/deconstruction are all efficient. Okasaki's book lists several candidates. The Data.Sequence module in the Haskell standard library provides the data structure with exactly the desired properties. With these sequences (imported with the prefix Seq), the serialization function becomes:

     showjSeq :: JSON -> String
     showjSeq j = toList (go j)
      where
        char :: Char -> Seq.Seq Char
        char = Seq.singleton
        str  :: [Char] -> Seq.Seq Char
        str = Seq.fromList
        
        go :: JSON -> Seq.Seq Char
        go (A x)       = str (show x)
        go (Node t j)  = char '{' >< str t >< char ' ' >< go j >< char '}'
     
        go (Arr [])    = str "[]"
        go (Arr (h:t)) = let (p:<s) = Seq.viewl (go (Arr t)) in
          char p >< go h ><
          (case Seq.viewl s of {']' :< _ -> s; _ -> char ',' >< s})

The code is quite like the original version, but with the different ``append operator'': (><) in place of (++). Sequences can be deconstructed, with viewl, obtaining the left-most element p and the rest s -- which remains the sequence, rather than a list, and so does not have to be converted back to the sequence after prepending a comma. The results speak for themselves: showjSeq (Arr $ replicate 1 (makej 80000)) takes 1 second (slower than with showjCPS1) but showjSeq (Arr $ replicate 2000 (makej 10)) takes only 0.18 seconds, which is 60 times faster than before. Again, a small change in the algorithm gives a remarkable speed-up.

In the upshot, the efficient sequences fully solve the performance problems of building a structure from small components and examining the intermediate results. The solution applies to any associative operation that recursively examines only one of its arguments.

References

ReflList.hs [4K]
The complete Haskell code for the article

Slow-downs and speed-ups of monad concatenations

We now apply what we have learned about building lists to building monads. The performance of the monad bind, like the list append, may vary with the left argument but not the right one. Hence left-associated binds are a performance hit, similar to the one we saw with list appends. Although a monad action is not a sequence, the steps to build it do make a sequence -- a conceptual one. Representing this sequence concretely, as an efficient data structure, solves the performance problems, even when we examine the intermediate results.

The efficient data structure for monadic binds is not one of the Okasaki sequences. The latter are all homogeneous. Monads bound in a series, however, have in general different result types. Yet the Okasaki structures easily generalize to the heterogeneous, type-aligned sequences; please see the Haskell Symposium paper for details. Below we use FastTCQueue from type-aligned on Hackage.

The running example is the incremental parsing, or iteratee. For the sake of explanation, it is not written in full generality. Abstracting over the effect signature gives the Free Monad, and further, Extensible Effects. The speed-ups we obtain here apply to all these general constructions. An iteratee It is defined by its effects, which in our case are:

     data It i a = Pure a
                 | Get (i -> It i a)

That is, the computation It i a may be pure, producing the value of the type a without any effects. Or it may request to input a value of the type i, and, upon receiving it, eventually produce the value of the type a perhaps after asking for more i values. Here is the simple iteratee that inputs a value and immediately returns it:

     get :: It i i
     get = Get return

It is analogous to getchar in C, only the type of its input is not limited to characters. Like getchar, the iteratee get is the simplest parser. To build bigger parsers we have to specify how to compose iteratees:

     instance Monad (It i) where
       return = Pure
     
       Pure x >>= k = k x
       Get f  >>= k = Get (f >>> k)

where the operation (>>>), technically called Kleisli composition, is just the composition of two effectful functions:

     (>>>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
     f >>> g = (>>= g) . f

Here are a couple of bigger parsers, built from get by monadic bind. The second, addN, inputs and sums up n numbers from the input stream.

     addGet :: Int -> It Int Int
     addGet x = get >>= \i -> return (i+x)
     -- glibly: addGet x = (+ x) <*> get  
     
     addN :: Int -> It Int Int
     addN n = foldl (>>>) return (replicate n addGet) 0

Finally, feedAll below runs our parsers on the input stream [a]. If an iteratee asks for input, feedAll gives one off the stream, unless the stream is exhausted. In iteratee papers, this function is called an enumerator.

     feedAll :: It a b -> [a] -> Maybe b
     feedAll (Pure a) _  = Just a
     feedAll _        [] = Nothing
     feedAll (Get f)  (h : t) = feedAll (f h) t

The last clause in the monadic bind, Get f >>= k = Get (f >>> k), has the telltale pattern of recursively examining the left argument and merely passing around the right one. One suspects there could be a performance problem similar to the one with list append. It is there indeed: the running time of feedAll (addN n) [1..] increases quadratically with n. For n=4000, it takes 9.30 seconds. There has to be a faster way to sum up 4000 integers on a modern laptop, even with all the GHCi interpretation overhead.

Just like with list appends, one can try CPS. Alas, feedAll has another telltale pattern -- examining intermediate result, the iteratee, to find out if it is finished or wants more data. Thus findAll is built around reflection, which is a bad news for CPS.

We now write an efficient iteratee, to be called ItZ. Looking at the monad bind again shows that iteratees are built by (>>>)-composing ``effectful functions'' a -> It i b, for which we define the abbreviation:

     newtype Arr i a b = Arr{unArr :: a -> ItZ i b}

The conceptual series of composition steps can be represented concretely, as a type-aligned sequence:

     type Arrs i a b = FastTCQueue (Arr i) a b

We can treat Arrs i a b also as an effectful function a -> ItZ i b: the operation appZ below does exactly this conversion. Arrs i a b is build by composing a series of Arr i functions. Unlike the functional composition, we can ``uncompose'' Arrs i a b:

     appZ :: Arrs i a b -> a -> ItZ i b
     appZ q x = case tviewl q of
         TEmptyL -> return x
         Arr h :| t  -> case h x of
           ZPure x -> appZ t x
           ZGet f  -> ZGet (f >< t)

The efficient iteratee hence becomes:

     data ItZ i a = ZPure a
                  | ZGet (Arrs i i a)

The continuation -- what to do after acquiring the input value i -- is now represented as the concrete sequence of small compositions rather than one opaque i -> It i a function as before. The monad bind appends a new continuation to that sequence:

     instance Monad (ItZ i) where
       return = ZPure
     
       ZPure x >>= k = k x
       ZGet f  >>= k = ZGet (f |> Arr k)     -- NEW

The simple parser is now

     getZ :: ItZ i i
     getZ = ZGet tempty

where tempty is the empty sequence. The other parsers, like addGetZ and addNZ, are exactly as before; see the code below for details. The enumerator is almost the same as before, only using appZ to apply the ``data structure continuation'' Arrs i a b to the argument a:

     feedAllZ :: ItZ a b -> [a] -> Maybe b
     feedAllZ (ZPure a) _  = Just a
     feedAllZ _         [] = Nothing
     feedAllZ (ZGet f)  (h : t) = feedAllZ (appZ f h) t

Now, feedAllZ (addNZ n) [1..] runs in time linear with n; for n=4000, it takes 0.13 seconds, which is 70 times faster than the original version. The improved version also takes 147 times less memory allocated during the GHCi interpretation. The key to the improvement is appZ: when applying the series of continuations Arrs i a b to an argument it avoids (>>>)-composing out all the continuations in the sequence. Rather, it peels off the first one and applies to the argument. If the resulting iteratee waits for input, we append the tail of the original series of continuations, as it was, to that iteratee's continuation. It is hence important we do the compositions incrementally and the peeling-off and appending all take constant time.

References

ReflIter.hs [4K]
The complete Haskell code for the article

Conclusions

We have described a general way to improve the performance of Free Monads, delimited control, iteratees and coroutines. Unlike CPS, it works regardless of reflection. Our approach deals with a conceptual, abstract series of steps -- which are not lists, not homogeneous, and not even data structures -- and represents it as a concrete and efficient sequence data structure. The approach applies to any associative operator that recursively traverses only one of its arguments.

Last updated December 31, 2018

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML