{-# LANGUAGE ScopedTypeVariables #-} -- Incremental pretty-printing without laziness module PPYield (mainD, mainS) where {- The manuscript Laziness vs Continuations: Prettyprinting without Backtracking Amr Sabry and Simon Peyton-Jones (hereafter SPJ) describes the online incremental pretty-printing algorithm implemented with full co-routines (full continuations). This code shows that a low-powered generators (which were introduced in CLU and were present in early version of Python 2) are sufficient. We mean the generators in which yield reports the value `upstairs' without receiving anything back. The information flows strictly up the stack. These low-powered generators (unlike full continuations) can be easily implemented, on a single linear stack without copying. In our approach, the entire formatting is a composition of two stream transducers, operating on the stream of nodes of the source document. The computation is efficient since it is clear by inspection that width computations are repeated (such repetition was the cause of exponential or O(n*w) complexity in naive solutions to the problem). The traces below demonstrate that the look-ahead is limited, by PageWidth. Doaitse Swierstra has also developed an online linear pretty-printing algorithm: http://www.cs.uu.nl/research/techreps/repo/CS-2004/2004-025a.pdf which crucially relies on laziness. The main feature of our approach is that it is not lazy. It can be implemented in a strict language. Mainly, it is compatible with IO and effectful computations: we can read the source document from a file, and format is as we read. -} import Control.Monad.Trans import Data.Sequence (ViewL(..),ViewR(..),(<|),(|>)) -- Queue import qualified Data.Sequence as S -- Queue import Data.List (intersperse) import GenT -- simple generators -- For benchmarking import Data.Monoid import Control.Monad.Writer import System.Environment (getArgs) import System.Mem import GHC.Stats -- ------------------------------------------------------------------------ -- Problem Specification, from the SPJ paper (Sec 2) -- A document is represented as a tree data Doc = Text String | Line | Doc :+: Doc | Group Doc deriving Show -- A sample document doc1 = Group (Text "A" :+: (Line :+: Group (Text "B" :+: (Line :+: Text "C")))) -- It seems long lines are not split, and line -- breaks can only occur at `Line' -- The reference implementation -- See Sec 2 of the paper and also Chitil (ref 3) type Fit = Bool type WidthLeft = Int type Width = Int type PageWidth = Int pretty1 :: PageWidth -> Doc -> String pretty1 w d = fst $ format False w d where format :: Fit -> WidthLeft -> Doc -> (String, WidthLeft) format f r (Text z) = (z, r - length z) format True r Line = (" ", r-1) format False r Line = ("\n",w) format f r (d1 :+: d2) = (s1++s2, r2) where (s1,r1) = format f r d1 (s2,r2) = format f r1 d2 format f r (Group d) = format (f || width d <= r) r d width :: Doc -> Width width (Text z) = length z width Line = 1 width (d1 :+: d2) = width d1 + width d2 width (Group d) = width d -- Sample formatting t01 = pretty1 6 doc1 -- "A B C" t02 = pretty1 5 doc1 -- "A B C" t03 = pretty1 4 doc1 -- "A\nB C" t04 = pretty1 3 doc1 -- "A\nB C" t05 = pretty1 2 doc1 -- "A\nB\nC" t06 = pretty1 1 doc1 -- "A\nB\nC" {- The naive algorithm is exponential because the width of a deeply nested group is repeatedly re-computed. Solution: compute the width of all groups beforehand. Let's traverse the whole Doc tree and annotate each node with its width. That is the standard post-order traversal, which is linear in the size of the tree. Alas, the intermediate tree needs as much space as the original tree. Since we have to traverse all children of the root node (that is, all the tree) to compute the width of the root node, we really have to build the whole tree first before we start formatting. We thus recover the standard lazy algorithm (Sec 3.1), but without laziness. As the lazy algorithm, ours works in linear time (two traversals of the tree) but needs extra space equal to the size of the tree. But we didn't need laziness, we didn't need to build extra closures, and we didn't need to play the tricks with tying the knot. Because of that, we can do IO. -} -- Here we demonstrate the building of the intermediary tree, -- DocW, in which each group is annotated with its width. data DocW = TextW String | LineW | JoinW DocW DocW | GroupW Width DocW deriving Show buildDocW :: Doc -> DocW buildDocW d = fst $ go 0 d where -- We thread the running width throughout go :: Width -> Doc -> (DocW, Width) go p (Text z) = (TextW z, p + length z) go p Line = (LineW, p + 1) go p (d1 :+: d2) = let (d1',p1) = go p d1 (d2',p2) = go p1 d2 in (JoinW d1' d2', p2) go p (Group d) = let (d',p') = go p d in (GroupW (p'-p) d',p') docw1 = buildDocW doc1 {- GroupW 5 (JoinW (TextW "A") (JoinW LineW (GroupW 3 (JoinW (TextW "B") (JoinW LineW (TextW "C")))))) -} {- We come to the second optimization: If the width of a tree node exceeds the PageWidth, we don't need to know by how much. We merely need to compute the width up to PageWidth (which is the constant, independent of the Doc size). Thus, we can assign the width to a node before finishing traversal of its children. Thus we can do with less auxiliary space. To realize this savings and make the algorithm incremental, we use the standard trick. We traverse the tree and yield the encountered nodes, converting a tree to a stream. -} -- First, we re-write the above algorithm to be incremental, to produce -- the stream. -- One may imagine that the document is written on disk, as a sort -- of XML. The procedure genB below then reads the nodes -- of the document one by one. -- This is the data type for the elements of the document stream -- The data type is parametrized by annotations; GBeg may have -- its own annotation. data E ab a = TE a String | LE a | GBeg ab | GEnd a deriving Show type StreamB = E () () -- bare stream, no annotations -- The type shows that we emit StreamB elements as we go -- We use any base monad, including IO. genB :: Monad m => Doc -> GenT StreamB m () genB (Text "") = return () genB (Text z) = yield (TE () z) genB Line = yield (LE ()) genB (d1 :+: d2) = genB d1 >> genB d2 genB (Group d0) | Just d <- norm d0 = do yield (GBeg ()) genB d yield (GEnd ()) where norm (Group d) = norm d norm (Text "") = Nothing norm (Text "" :+: d) = norm d norm d = Just d genB _ = return () -- Let's check the stream for the sample document doc1 streamB :: IO () streamB = runGenT (genB doc1) (\i -> putStrLn $ "Generated: " ++ show i) -- The function genB also performs normalization of the -- document: -- every group is strictly wider than any of its children groups -- (thus eliminating Group $ Group ...) -- any group is at least one-character wide (eliminating Group (Text "")) -- Analysis: latency is unit, total time to generate the whole stream -- is linear in the size of the document, space is O(depth) since -- genB is non-tail-recursive. -- To easily compute the width of any group without repetition, we -- precompute the ``rolling width'', or the horizontal position -- assuming the whole document is formatted in one line. -- TE, LE, and GEnd elements are annotated with the horizontal position -- after the element has been formatted. -- GBeg is annotated with the horizontal position before the group -- is formatted. -- We stress the modularity of adding the annotation. type HP = Int -- Horizontal position, after formatting type HPB = Int -- position before the formatting type StreamHPB = E HPB HP trHPB :: Monad m => GenT StreamB (StateT HP (GenT StreamHPB m)) () -> GenT StreamHPB m () trHPB = foldG_ go 0 where go :: Monad m => HP -> StreamB -> GenT StreamHPB m HP go p (TE _ z) = let p' = p + length z in yield (TE p' z) >> return p' go p (LE _) = let p' = p + 1 in yield (LE p') >> return p' go p (GBeg _) = yield (GBeg p) >> return p go p (GEnd _) = yield (GEnd p) >> return p -- Let's check the stream for the sample document doc1 streamHPB :: IO () streamHPB = runGenT (trHPB . genB $ doc1) (\i -> putStrLn $ "Generated: " ++ show i) {- Generated: GBeg 0 Generated: TE 1 "A" Generated: LE 2 Generated: GBeg 2 Generated: TE 3 "B" Generated: LE 4 Generated: TE 5 "C" Generated: GEnd 5 Generated: GEnd 5 -} -- Complexity analysis: latency is unit, overall time complexity -- is linear in the length of the stream. Space: constant. -- Changing the stream so that GBeg has HP of the end of the group -- rather than of the beginning. type StreamHPA = E HP HP -- Look-ahead buffer -- The buffer is a list with an element BufHPA for each nested group type Buffer m = [Buf StreamHPA m] type Buf e m = GenT e m () -- Buf, the look-ahead buffer proper, should permit the following operations -- -- empty -- -- append one element in constant time -- -- prepend one element in constant time -- -- concatenate two buffers in constant time -- (could be relaxed to, say log time) -- -- fold over the buffer and emit all accumulated elements, in linear time buf_empty :: Monad m => Buf e m buf_empty = return () infixl 5 ||> infixr 5 <|| (||>) :: Monad m => Buf e m -> e -> Buf e m b ||> e = b >> yield e (<||) :: Monad m => e -> Buf e m -> Buf e m e <|| b = yield e >> b buf_ccat :: Monad m => Buf e m -> Buf e m -> Buf e m buf_ccat b1 b2 = b1 >> b2 buf_emit :: Monad m => Buf e m -> GenT e m () buf_emit b = b trHPA :: Monad m => GenT StreamHPB (StateT (Buffer m) (GenT StreamHPA m)) () -> GenT StreamHPA m () trHPA = foldG_ go [] where -- switch-on the look-ahead go q (GBeg _) = return (buf_empty:q) -- GEnd flushes the top layer go (b:q) e@(GEnd p) = pop q (GBeg p <|| (b ||> e)) -- if no look-ahead, emit immediately go [] e = yield e >> return [] -- otherwise, delay go (b:q) e = return ((b ||> e):q) pop [] b = buf_emit b >> return [] pop (b':q) b = return ((buf_ccat b' b):q) streamHPA :: IO () streamHPA = runGenT (trHPA . trHPB . genB $ doc1) (\i -> putStrLn $ "Generated: " ++ show i) {- Generated: GBeg 5 Generated: TE 1 "A" Generated: LE 2 Generated: GBeg 5 Generated: TE 3 "B" Generated: LE 4 Generated: TE 5 "C" Generated: GEnd 5 Generated: GEnd 5 -} -- Analysis: latency: n (have to wait till GEnd) -- Extra space: n (the look-ahead buffer) -- Total time: for each read element, give credit 2. -- GEnd uses credits for all the elements in the group it terminates. -- So, total time is 2n -- Thus to find out the width of a group, one has to wait till -- the corresponding GEnd element. -- Pruned HP data HPP = Small HP | TooFar deriving Show type StreamHPP = E HPP HP -- HPL is the limit (starting position for the group + PageWidth) type HPL = Int -- Look-ahead buffer -- It is like Buffer before; we need a queue rather than a simple list -- Also, we separately keep track of the limit position for each -- nested group, and for the outer group. type BufferP m = (HPL, S.Seq (HPL, Buf StreamHPP m)) bufferP_empty = (0,S.empty) -- trHPP :: (Monad m, MonadIO m) => trHPP :: (Monad m) => PageWidth -> GenT StreamHPB (StateT (BufferP m) (GenT StreamHPP m)) () -> GenT StreamHPP m () trHPP w = foldG_ go' bufferP_empty where go' s e = do -- liftIO . putStrLn $ "trHPP: read: " ++ show e go s e go b@(_,q) (TE p z) | S.null q = yield (TE p z) >> return b go b@(_,q) (LE p) | S.null q = yield (LE p) >> return b go b@(_,q) (GEnd p) | S.null q = yield (GEnd p) >> return b -- GBeg switches on the look-ahead go (_,q) (GBeg p) | S.null q = return (p+w,S.singleton (p+w,buf_empty)) go (p0,q) (GBeg p) = check (p0,q |> (p+w,buf_empty)) p go (p0,q) (GEnd p) | q' :> (_,b) <- S.viewr q = pop p0 q' (GBeg (Small p) <|| (b ||> (GEnd p))) go (p0,q) (TE p z) = check (p0,push (TE p z) q) p go (p0,q) (LE p) = check (p0,push (LE p) q) p push e q | q' :> (p,b) <- S.viewr q = q' |> (p,b ||> e) pop p0 q b | S.null q = buf_emit b >> return bufferP_empty pop p0 q b | q' :> (p,b') <- S.viewr q = return (p0, q' |> (p,buf_ccat b' b)) -- Check if the outer group is already too wide check (p0,q) p | p <= p0 && S.length q <= w = return (p0,q) check (_,q) p | (_,b) :< q' <- S.viewl q = buf_emit (GBeg TooFar <|| b) >> check' q' p check' q p | (p',_) :< _ <- S.viewl q = check (p',q) p | otherwise = return bufferP_empty streamHPP :: PageWidth -> IO () streamHPP w = runGenT (trHPP w . trHPB . genB $ doc1) (\i -> putStrLn $ "Generated: " ++ show i) {- Here is a different edge case Group (Group (Group ... :+: Group ...) :+: Group (Group ... :+: Group ...) ) An unwary algorithm has to traverse to the leafs before it decides if a group fits; the leaves may be arbitrarily deep. The solution is to notice that if the document is normalized (and each group is at least one character wide), we can stop the traversal after w groups (even if we found no Text element). -} -- The trace below clearly shows the look-ahead -- and the fact that it is limited by the page width. -- For page width < 5, we start emitting StreamHPP elements -- before we read all of StreamHPB. {- *PPYield> streamHPP 5 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 trHPP: read: TE 5 "C" trHPP: read: GEnd 5 trHPP: read: GEnd 5 Generated: GBeg (Small 5) Generated: TE 1 "A" Generated: LE 2 Generated: GBeg (Small 5) Generated: TE 3 "B" Generated: LE 4 Generated: TE 5 "C" Generated: GEnd 5 Generated: GEnd 5 *PPYield> streamHPP 4 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 trHPP: read: TE 5 "C" Generated: GBeg TooFar Generated: TE 1 "A" Generated: LE 2 trHPP: read: GEnd 5 Generated: GBeg (Small 5) Generated: TE 3 "B" Generated: LE 4 Generated: TE 5 "C" Generated: GEnd 5 trHPP: read: GEnd 5 Generated: GEnd 5 *PPYield> streamHPP 3 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 Generated: GBeg TooFar Generated: TE 1 "A" Generated: LE 2 trHPP: read: TE 5 "C" trHPP: read: GEnd 5 Generated: GBeg (Small 5) Generated: TE 3 "B" Generated: LE 4 Generated: TE 5 "C" Generated: GEnd 5 trHPP: read: GEnd 5 Generated: GEnd 5 *PPYield> streamHPP 2 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" Generated: GBeg TooFar Generated: TE 1 "A" Generated: LE 2 trHPP: read: LE 4 trHPP: read: TE 5 "C" Generated: GBeg TooFar Generated: TE 3 "B" Generated: LE 4 Generated: TE 5 "C" trHPP: read: GEnd 5 Generated: GEnd 5 trHPP: read: GEnd 5 Generated: GEnd 5 *PPYield> streamHPP 1 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 Generated: GBeg TooFar Generated: TE 1 "A" Generated: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 Generated: GBeg TooFar Generated: TE 3 "B" Generated: LE 4 trHPP: read: TE 5 "C" Generated: TE 5 "C" trHPP: read: GEnd 5 Generated: GEnd 5 trHPP: read: GEnd 5 Generated: GEnd 5 -} -- Analysis: quite like trHPA. However, we keep in mind that the buffer -- length is bound by w. Therefore, latency and extra space are -- all bounded by w rather than n. -- Formatting -- Producing the stream of Strings -- The final step is the formatting: transforming the pruned -- StreamHPP to a stream of Strings. -- Indication if Lines within a group should be formatted as spaces. -- This is the stack reflecting Fit for the nested groups type Fits = [Bool] -- Again we use the finite state transducer foldG, whose state -- is (Fits,HPL). The second component is the HP at the _end_ of -- the current line (any HP past that is over the page width). trFormat0 :: Monad m => PageWidth -> GenT StreamHPP (StateT (Fits, HPL) (GenT String m)) () -> GenT String m () trFormat0 w = foldG_ go ([False],w) where go (f,l) (TE _ z) = yield z >> return (f,l) go (f@(True:_),l) (LE _) = yield " " >> return (f,l) go (f@(False:_),l) (LE p) = yield "\n" >> return (f,p+w) go (f,l) (GBeg TooFar) = return (False:f,l) go (f,l) (GBeg (Small p)) = return ((p <= l):f,l) go (_:f,l) (GEnd _) = return (f,l) -- Analysis: unit latency (we immediately emit the received data) -- Overall processing time is linear in the size of the stream. -- Alas, we need extra space for the stack Fits. We will -- eliminate that below. streamF0 :: PageWidth -> IO () streamF0 w = runGenT (trFormat0 w . trHPP w . trHPB . genB $ doc1) (\i -> putStrLn $ "Generated: " ++ show i) -- Formatting of the sample document -- Again, for small PageWidth, we emit the formatted strings before -- we have read the whole input stream. Our algorithm is on-line. {- streamF0 5 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 trHPP: read: TE 5 "C" trHPP: read: GEnd 5 trHPP: read: GEnd 5 Generated: "A" Generated: " " Generated: "B" Generated: " " Generated: "C" streamF0 4 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 trHPP: read: TE 5 "C" Generated: "A" Generated: "\n" trHPP: read: GEnd 5 Generated: "B" Generated: " " Generated: "C" trHPP: read: GEnd 5 streamF0 3 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 Generated: "A" Generated: "\n" trHPP: read: TE 5 "C" trHPP: read: GEnd 5 Generated: "B" Generated: " " Generated: "C" trHPP: read: GEnd 5 streamF0 2 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 "B" Generated: "A" Generated: "\n" trHPP: read: LE 4 trHPP: read: TE 5 "C" Generated: "B" Generated: "\n" Generated: "C" trHPP: read: GEnd 5 trHPP: read: GEnd 5 streamF0 1 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 Generated: "A" Generated: "\n" trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 Generated: "B" Generated: "\n" trHPP: read: TE 5 "C" Generated: "C" trHPP: read: GEnd 5 trHPP: read: GEnd 5 -} -- Better formatting -- The stack Fits has a particular property, following -- from the observation that if a group fits, all of its children -- groups fit as well. Therefore, if an element of the stack is False, -- the elements below must all be False. Conversely, if an element -- is True, the elements above it, through the top of the stack, must -- all be True. Thus the whole stack is adequately represented -- by a single number, the number of True elements at the top. type FitI = Int trFormat :: Monad m => PageWidth -> GenT StreamHPP (StateT (FitI, HPL) (GenT String m)) () -> GenT String m () trFormat w = foldG_ go (0,w) where go (f,l) (TE _ z) = yield z >> return (f,l) go (0,l) (LE p) = yield "\n" >> return (0,p+w) go (f,l) (LE _) = yield " " >> return (f,l) go (0,l) (GBeg TooFar) = return (0,l) go (0,l) (GBeg (Small p)) = return (if p <= l then 1 else 0,l) go (f,l) (GBeg _) = return (f+1,l) go (0,l) (GEnd _) = return (0,l) go (f,l) (GEnd _) = return (f-1,l) streamF :: PageWidth -> IO () streamF w = runGenT (trFormat w . trHPP w . trHPB . genB $ doc1) (\i -> putStrLn $ "Generated: " ++ show i) {- streamF 5 streamF 4 streamF 3 streamF 2 streamF 1 trHPP: read: GBeg 0 trHPP: read: TE 1 "A" trHPP: read: LE 2 Generated: "A" Generated: "\n" trHPP: read: GBeg 2 trHPP: read: TE 3 "B" trHPP: read: LE 4 Generated: "B" Generated: "\n" trHPP: read: TE 5 "C" Generated: "C" trHPP: read: GEnd 5 trHPP: read: GEnd 5 -} -- Final pretty-printer pp :: Monad m => PageWidth -> Doc -> GenT String m () pp w = trFormat w . trHPP w . trHPB . genB {- -- Old version -- Elements of the stream -- Each element is annotated with the rolling width -- (the width after the element has been emitted) data DocE1 = TE1 Width String | LE1 Width | GBeg1 GID Width | GEnd1 GID Width deriving Show type GID = Int -- Group identifier -- We transform DocE1 to a different stream. -- It is the beginning of the group that contains the width -- of the group. -- The crucial part is that we compute the width up to -- PageWidth; anything wider is TooWide. data LWidth = Small Width | TooWide deriving Show data DocE2 = TE2 String | LE2 | GBeg2 GID LWidth | GEnd2 GID deriving Show -- We use the queue and the IntMap -- Operations in IntMap are essentially constant-time -- We use a queue for accumulating look-ahead. -- The queue is bounded because of PageWidth! -- Invariant: the left end of the non-empty queue is GBeg1 type Queue = S.Seq DocE1 -- Binding for `logic variables' -- Each group as an associated logic variable -- Status of the logic variable -- If a `logic variable' contains LV_abs, it will -- be `rebound' to LV_rel. If the logic variable -- is already bound to LV_rel, it will never be rebound. -- Thus, width computations are done only once and never -- repeated. -- in imperative language, we can implement LVar using simple references -- cells. -- We stress that complexity of pointer dereference and lookup in IntMap are the -- same: if we take pointer dereference to be constant, under the same -- assumptions IntMap lookup is constant. If the size of the memory is -- not bounded, then both pointer dereference and Map lookup take -- O(log N) time where N is the size of the heap. data LVar = LV_abs Width -- absolute position | LV_rel Width -- width of the group type Subst = M.IntMap LVar -- As the type shows, we transform the stream of DocE1 to the stream -- of DocE2, using the finite state transducer foldG whose state -- is (Queue,Subst) transp_stream2 :: forall m. (Monad m, MonadIO m) => PageWidth -> GenT DocE1 (StateT (Queue, Subst) (GenT DocE2 m)) () -> GenT DocE2 m () transp_stream2 w gen = foldG go (S.empty,M.empty) gen where go :: (Queue,Subst) -> DocE1 -> GenT DocE2 m (Queue,Subst) go (q,subst) e = do liftIO . putStrLn $ "transp: read: " ++ show e check q subst e -- no look-ahead: emit the elements as we got them check q subst (TE1 _ z) | S.null q = yield (TE2 z) >> return (q,subst) check q subst (LE1 _) | S.null q = yield LE2 >> return (q,subst) check q subst (GEnd1 g _) | S.null q = yield (GEnd2 g) >> return (q,subst) -- GBeg1 switches on the look-ahead -- We enter a fresh logic variable for the group check q subst e@(GBeg1 g p) = flush p (q |> e) (M.insert g (LV_abs p) subst) check q subst e@(GEnd1 g p) = flush' p (M.adjust adj g subst) (S.viewl (q |> e)) where adj (LV_abs p') = LV_rel (p - p') check q subst e@(TE1 p z) = flush p (q |> e) subst check q subst e@(LE1 p) = flush p (q |> e) subst -- check to see if the group whose GBeg is at the left end of -- the queue has become too wide. If so, we can emit GBeg2 -- and we flush all delayed elements until the first GBeg whose -- size is not determined yet. -- Since we normalize the stream (see norm above), if the number -- of open groups exceeds w, the topmost group is too wide. -- If subst[g] = LV_abs _, then the width of the group g is not yet -- determined. -- BTW, viewl is the O(1) operation. flush p q subst | GBeg1 g p' :< q' <- S.viewl q = if p - p' > w then yield (GBeg2 g TooWide) >> flush' p subst (S.viewl q') else return (q,subst) -- now really do the flushing -- As we emit GEnd2 g, we may remove -- the binding for g from subst; we won't need it again. flush' p subst EmptyL = return (S.empty, subst) flush' p subst (TE1 _ z :< q') = yield (TE2 z) >> flush' p subst (S.viewl q') flush' p subst (LE1 _ :< q') = yield LE2 >> flush' p subst (S.viewl q') flush' p subst (GEnd1 g _ :< q') = yield (GEnd2 g) >> flush' p (M.delete g subst) (S.viewl q') flush' p subst (GBeg1 g _ :< q') | (Just (LV_rel pd)) <- M.lookup g subst = yield (GBeg2 g (Small pd)) >> flush' p subst (S.viewl q') flush' p subst (a :< q) = flush p (a <| q) subst -- Check the sample stream emit_stream2_r :: PageWidth -> IO () emit_stream2_r w = runGenT (transp_stream2 w . emit_stream1 $ doc1) (\i -> putStrLn $ "Generated: " ++ show i) -} -- ------------------------------------------------------------------------ -- Pretty-printing of a tree example, from Swierstra-linear data Tree = Tree String [Tree ] tree1 = Tree "aaa" [Tree "bbbb" [Tree "ccc" [ ] , Tree "dd" [ ] ] , Tree "eee" [ ] , Tree "ffff" [Tree "gg" [ ] , Tree "hhh" [ ] , Tree "ii" [ ] ] ] show_tree (Tree s ts) = Group ( Text s :+: -- nest (length s) (show_kids ts) ) where show_kids [] = Text "" show_kids ts = Text "[" :+: show_forest ts :+: Text "]" show_forest ts = foldr (:+:) (Text "") (intersperse (Text "," :+: Line) (Prelude.map show_tree ts)) {- 15 and 20 aaa[bbbb[ccc, aaa[bbbb[ccc, dd], dd], eee, eee, ffff[gg, ffff[gg, hhh, hhh, ii]] ii]] -} doct = show_tree tree1 treeB :: IO () treeB = runGenT (genB doct) (\i -> putStrLn $ "Generated: " ++ show i) treeHPP :: PageWidth -> IO () treeHPP w = runGenT (pp w doct) (\i -> putStrLn $ "Generated: " ++ show i) {- treeHPP 15 treeHPP 20 -} -- ------------------------------------------------------------------------ -- Benchmarking -- build a `realistic' benchmark document tree of the depth d doc_bench :: Int -> Doc doc_bench d = go 1 d where go l 0 = Group (Text (show l)) go l d = Group (Text (show l ++ "[") :+: go (2*l) (d-1) :+: Line :+: go (2*l+1) (d-1) :+: Text "]") {- doc_bench 2 Group ((((Text "1[" :+: Group ((((Text "2[" :+: Group (Text "4")) :+: Line) :+: Group (Text "5")) :+: Text "]")) :+: Line) :+: Group ((((Text "3[" :+: Group (Text "6")) :+: Line) :+: Group (Text "7")) :+: Text "]")) :+: Text "]") -} -- Size the content of the tree -- (doing so also forces the tree) doc_count :: Doc -> Int doc_count (Text s) = length s doc_count Line = 1 doc_count (d1 :+: d2) = doc_count d1 + doc_count d2 doc_count (Group d) = doc_count d -- doc_count $ doc_bench 2 -- 16 -- Generate the stream that corresponds to doc_bench, without -- constructing the doc itself -- The document tree is the main consumer of space -- The function is not tail recursive, and so consumes space -- proportional to the depth of the tree d. stream_bench :: Monad m => Int -> GenT StreamB m () stream_bench d = go 1 d where go l 0 = do yield (GBeg ()) yield (TE () (show l)) yield (GEnd ()) go l d = do yield (GBeg ()) yield (TE () (show l ++ "[")) go (2*l) (d-1) yield (LE ()) go (2*l+1) (d-1) yield (TE () "]") yield (GEnd ()) -- Let's check the stream for the sample document doc1 {- runGenT (stream_bench 2) (\i -> putStrLn $ "Generated: " ++ show i) -} benchF :: Int -> PageWidth -> IO () benchF d w = runGenT (pp w $ doc_bench d) (\i -> putStrLn $ "Generated: " ++ show i) {- benchF 2 7 Generated: "1[" Generated: "2[" Generated: "4" Generated: "\n" Generated: "5" Generated: "]" Generated: "\n" Generated: "3[" Generated: "6" Generated: " " Generated: "7" Generated: "]" Generated: "]" benchF 2 2 -} -- Using the already generated document stream pp' w = trFormat w . trHPP w . trHPB benchF' :: Int -> PageWidth -> IO () benchF' d w = runGenT (pp' w $ stream_bench d) (\i -> putStrLn $ "Generated: " ++ show i) {- benchF' 2 7 Generated: "1[" Generated: "2[" Generated: "4" Generated: "\n" Generated: "5" Generated: "]" Generated: "\n" Generated: "3[" Generated: "6" Generated: " " Generated: "7" Generated: "]" Generated: "]" -} -- For benchmarking purposes, instead of writing the formatted output, -- we count the total size and the number of lines -- Initially I didn't have strict fields below, and the memory -- usage was linear! Laziness always gets you! data Counts = Counts !Int !Int deriving Show instance Monoid Counts where mempty = Counts 0 0 mappend (Counts c1 l1) (Counts c2 l2) = Counts (c1+c2) (l1+l2) fmt_count :: Doc -> PageWidth -> Counts fmt_count d w = execWriter $ runGenT (pp w d) (\i -> tell $ Counts (length i) (if i == "\n" then 1 else 0)) -- Building the tree first benchD :: Int -> PageWidth -> Counts benchD d w = fmt_count (doc_bench d) w {- benchD 2 7 -- Counts 16 2 benchD 2 2 -- Counts 16 3 benchD 15 10 -- Counts 414870 32767 benchmarking: benchC 10 50 -} -- Working directly from the stream, avoiding constructing the original -- document, which takes a lot of memory. -- Use benchS to show formatting proceeds in constant memory. benchS :: Int -> PageWidth -> Counts benchS d w = execWriter $ runGenT (pp' w (stream_bench d)) (\i -> tell $ Counts (length i) (if i == "\n" then 1 else 0)) {- benchS 2 7 -- Counts 16 2 benchS 2 2 -- Counts 16 3 benchS 15 10 -- Counts 414870 32767 -} -- Building the tree first mainD :: IO () mainD = getArgs >>= go where go [ds,ws] | [(d,"")] <- reads ds, [(w,"")] <- reads ws = do print (d,w) let doc = doc_bench d putStrLn $ "Content " ++ show (doc_count doc) -- also actualizes the tree stat0 <- get_stat print (fmt_count doc w) stat1 <- get_stat putStrLn $ "Content " ++ show (doc_count doc) -- making sure tree is in memory print_stat stat1 stat0 -- Print statistics. Must run the program with -T rts flag get_stat = performGC >> getGCStats print_stat s1 s0 = do -- putStrLn $ "before" ++ show s0 -- putStrLn $ "after" ++ show s1 putStrLn . (">>maxBytesUsed0 "++) . show $ maxBytesUsed s0 putStrLn . (">>maxBytesUsed1 "++) . show $ maxBytesUsed s1 putStrLn . (">>maxBytesUsed-currentUsed "++) . show $ maxBytesUsed s1 - currentBytesUsed s1 putStrLn . (">>Average-formatting "++) . show $ ((fromIntegral (cumulativeBytesUsed s1 - cumulativeBytesUsed s0)) / (fromIntegral (numByteUsageSamples s1 - numByteUsageSamples s0)) - fromIntegral (currentBytesUsed s1)) putStrLn . (">>peakMegabytesAllocated "++) . show $ peakMegabytesAllocated s1 putStrLn . (">>numCGs "++) . show $ (numGcs s1 - numGcs s0) putStrLn . (">>mutatorCpuSeconds "++) . show $ mutatorCpuSeconds s1 - mutatorCpuSeconds s0 putStrLn . (">>gcCpuSeconds "++) . show $ gcCpuSeconds s1 - gcCpuSeconds s0 putStrLn . (">>cpuSeconds "++) . show $ cpuSeconds s1 - cpuSeconds s0 -- Using the stream, without building the big tree mainS :: IO () mainS = getArgs >>= go where go [ds,ws] | [(d,"")] <- reads ds, [(w,"")] <- reads ws = do print (d,w) print (benchS d w) -- Compile the code -- ghc -O2 -rtsopts -main-is PPYield.mainD PPYield.hs -- ghc -O2 -rtsopts -main-is PPYield.mainS PPYield.hs -- To run this code -- GHCRTS="-tstderr" ./PPYield 15 50 -- for w in 10 15 20 25 30 35 40 45 50 55; do for i in 1 2 3 4 5; do GHCRTS="-tstderr" ./PPYield 15 $w; done; echo "===="; done -- for d in 10 11 12 13 14 15 16 17 18; do for i in 1 2 3 4 5; do GHCRTS="-tstderr" ./PPYield $d 50; done; echo "===="; done