The contest happened at a computer-focused community Lobsters. Somewhat like HN, Lobsters is a forum for its members to submit and discuss articles: from journal papers to blog posts, announcements and videos. The community is what one may call the `mainstream IT': developers, from embedded systems to the Web, system administrators as well as students, computer hobbyists, and a few academics.
One day an article about a common job interview question was posted for discussion. Spontaneously, members started submitting their own solutions, in their language of choice. They also commented on their own and others solutions.
It has struck me how the the solutions neatly fall into two classes -- irrespective of the language they are written in, be it a functional (such as Haskell, Erlang or Clojure) or imperative (Go, Ruby, Raku). Only now I realized that recursion is dual to iteration not just in theory. Like loops, it is low-level: selecting one apple at a time. Likewise is the `pure-functional' and imperative handling of state.
How do we grasp all the apples at once? As in learning and teaching of natural languages, should we focus less on grammar and more on conversation and storytelling?
The talk was given at an online meeting of IFIP WG 2.1 on July 1, 2021.
Alexey Grigorev (@Al_Grigor) February 3, 2021 (Twitter)Most candidates cannot solve this interview problem:Input:
"aaaabbbcca"
Output:[("a",4), ("b", 3), ("c", 2), ("a", 1)]
Write a function that converts the input to the output
I ask it in the screening interview and give it 25 minutes
How would you solve it?
Below is a sample of posted solutions. They are written in various languages by various people -- yet somehow fall into two groups, which I call for now Group I and Group II. I'll tell what they mean only later -- I hope you'd allow me a bit of suspense.
(->> "aaaabbbcca" (partition-by identity) (map (juxt (comp str first) count))))
The submitter also gave another solution, ``a bit chattier but also more legible:''
(for [p (partition-by identity "aaaabbbcca" )] [(-> p first str) (count p)])
It is worth noting that comp
is the functional composition, and
juxt
, with the rough type (a->b * a->c) -> a -> (b * c)
,
is the combinator that takes a tuple of functions and a value, applies
each function to it and returns the results as a tuple.
"aaaabbbcca".split("").chunk_while {|a,b| a==b}.map {|a| [a.first, a.size]} "aaaabbbcca".chars.slice_when {|c,n| c != n }.map {|s| [s[0], s.size] } 'aaaabbbcca'.chars.chunk(&:itself).map{|x,y| [x, y.size]}
(((1∘+ +\) ⊢ ≠ ¯1∘⌽) (((⊃,≢) ⊢) ⌸) ⊢) 'aaaabbbcca' f←(⊃,≢)¨⊢⊆⍨1+(+\0,2≠/⊢) ⊢x←f'aaaabbbcca'
type result struct { letter string count int } func main() { const input = "aaaabbbcca" var ret []result currentLetter := string(input[0]) countCurrentLetter := 1 for _, elem := range input[1:] { elemAsString := string(elem) if currentLetter == elemAsString { countCurrentLetter++ } else { ret = append(ret, result{currentLetter, countCurrentLetter}) currentLetter = elemAsString countCurrentLetter = 1 } } ret = append(ret, result{currentLetter, countCurrentLetter}) fmt.Printf("%+v", ret) }
The salient feature of this and similar solutions is explicit
iteration: the for
loop.
(for/fold ([p #f] [cnt 0] [counts null] #:result (cdr (reverse (cons `(,p ,cnt) counts)))) ([c (in-string "aaaabbbcca")]) (if (equal? p c) (values c (add1 cnt) counts) (values c 1 (cons `(,p ,cnt) counts))))
The solution is pure functional -- yet closely resembling the Go solution above.
"aaaabbbcca".subst( /(.) $0*/ , {"({$0},{$/.chars})" }, :g )
Raku is the artist formerly known as Perl 6. Perl has always been known for its (ir)regular expressions, whose power is on display.
L = "aaaabbbcca", lists:reverse(lists:foldl( fun (Elt, [ {[Elt], N} | R ]) -> [ {[Elt], N + 1} | R]; (Elt, Acc) -> [{[Elt], 1} | Acc] end, [], L )).
It is rather similar to the Racket solution. We see the
characteristic for Erlang non-linear pattern matching: the variable
Elt
appears twice in the first pattern.
[(k, len(list(g))) for k, g in itertools.groupby("aaaabbbcca")]
fn run_length_encode(ins: &str) -> Vec<(char, usize)> { let mut out = vec![]; let mut i = ins.chars(); if let Some(mut c) = i.next() { let mut count = 1; for new_c in i { if new_c == c { count += 1; } else { out.push((c, count)); count = 1; c = new_c; } } out.push((c, count)); } out}
It is quite like the solution in Go -- but cleaner and with a better handling of edge cases. The author did grasp the nature of the problem: run-length encoding.
function rle(itr) acc = Tuple{eltype(itr), Int}[] x = iterate(itr) isnothing(x) && return acc # iterable is empty last, state = x n = 1 for v in Iterators.rest(itr, state) if last == v n += 1 else push!(acc, (last, n)) last = v n = 1 end end push!(acc, (last, n)) return acc end
It mirrors the Rust solution above.
main = interact soln soln = show . reverse . go [] go counts [] = counts go counts (x:xs) = go (inc x counts) xs inc c ((x,ct):xs) | c == x = (x,ct+1):xs inc c xs = (c, 1):xs
Although this solution relies on the explicit recursion rather than
fold
, it is rather similar to the Erlang solution.
(A different person later mentioned another solution, using group
).
One commenter wondered:
``Reading the Clojure solution involves following the thought process of how the author ended up there. Reading the equivalent Go solution has very little cognitive overhead.
How does Clojure programmers write code that is not just brief and elegant in the eyes of the author, but can also be read quickly by others without time consuming mental gymnastics?''
and another replied:
``I guess it's just a matter of what you're used to. I find both Go solutions above arduous to understand because you have to follow the algorithm and construct what it functionally does in your head. Whereas in most of the clojure solutions the code directly tells you what's happening.Eg. the top post with comments:
; threading macro (take input, then do stuff to it in steps) (->> "aaaabbbcca" ; split into lists of consecutive same character (partition-by identity) ; map each item (list of chars) with a function (map ; return list of outputs of functions (juxt ; first character as string (comp str first) ; count of characters count)))The information is much more dense. If you are good at reading it, you understand it at a glance.
I believe the same is true for the APL solutions. I can't read them, but someone who does sees the algorithm clearly.''
Yet another commenter concurred:
``Same here; they [Go solutions] just seem super tedious and low-level; like you're spoon-feeding the compiler on basic things instead of just describing the algorithm naturally.
If you don't have the vocabulary to describe an algorithm, of course the low-level version is going to be easier to read, much like someone who is only just learning English will find the `Simplified English' wikipedia more accessible than the regular one that often uses specialized jargon. But if you work with algorithms every day, you owe it to yourself to invest in learning the jargon!''
``Suppose we have a box of apples from which we wish to select all of the good apples. Not wishing to do the task ourselves, we write a list of instructions to be carried out by a helper.The instructions corresponding to a conventional language might be expressed something like the following: Select an apple from the box. If it is good, set it aside in some place reserved for the good apples; if it is not good, discard it. Select a second apple; if it is good put it in the reserved place, and if it is not good discard it.... Continue in this manner examining each apple in turn until all of the good apples have been selected. In summary, then, examine the apples one at a time starting with the first one we pick up and finishing with the last one in the box until we have selected and set aside all of the good apples.
On the other hand the instructions corresponding to an array language could be stated simply as ``Select all of the good apples from the box.'' Of course, the apples would still have to be examined individually, but the apple-by-apple details could be left to the helper.
Conventional programming languages may be considered then ``one-apple-at-a-time'' languages with machine languages being a very primitive form, whereas array languages may be considered ``all-the-apples-at-once'' languages.''
Keith Smillie: ``My Life with Array Languages''. October 2005. This analogy is credited to Frederick P. Brooks: that Fred Brooks, of the Mythical Man-Month fame.
The second, surprising observation is that the submitted solutions fall into two classes:
Here are the languages used to write the solutions in each group:
Comparing the pure functional Racket solution with the imperative Go (or, better, Rust or Julia) solutions shows them surprisingly similar. Recursion is indeed dual to iteration, and fold is just a for-loop in disguise. Neither recursion nor fold help us grasp `all the apples at once'.
Looking at the Racket and Rust (or Go, Julia) solutions leads to
another conclusion, about state. Although the Racket solution is `pure
functional', it is just as stateful as Rust and Go solutions. There is
the same amount of state. The state is updated in the Racket solution
at once, with the values
clause; the update is matched to its target
by position rather by name, which is quite error prone. (As a side
observation, the handling of edge cases is clearer and obviously
correct in Rust and Julia than in the Racket or Go solutions.)
Thus being `pure functional' does not eradicate state; it merely changes the way state is handled -- which could be either more or less clearer and convenient. What unequivocally improves the handling of the state is partitioning and encapsulating it into `higher-level', composable operations like `groupBy': see Clojure, Ruby, Python solutions. The groupBy may itself be implemented with mutable state or without -- no matter, the state is abstracted away and the user of groupBy does not have to care about it.
The blog post by Savo Djuric that prompted the contest has also described a solution, which is worth quoting.
Since I recently started learning Clojure, and I regularly do 4clojure exercises, this problem seemed intriguing, so I decided to try to solve it immediately. I fired up my REPL, and within several minutes i came up with:(interleave (map str (map first (partition-by identity "aaaabbbcca"))) (map count (partition-by identity "aaaabbbcca"))) => ("a" 4 "b" 3 "c" 2 "a" 1)Yaay! It works! But it is not exactly what was asked for. Nor is it very ‘clean’, so to say. What bothers me the most in this solution is that i have repetitive code, so I decided to solve that with let:...
(defn chop-chop [coll] (let [x (partition-by identity coll)] (map list (map (comp str first) x) (map count x))))I must say that it felt great to solve a problem within several minutes (albeit it was not that hard), since I'm only at the beginning of my journey with Clojure.
The blog post shows the ideal, to me, way to solve the problem:
grasping its nature -- groupBy
, or
partition-by
in Clojure, -- prototyping the solution, understanding
its imperfections and wanting to do better, taking steps to improve.
I think the author has a bright future as a programmer.
From his blog I learned that the author never went to college, got into programming by accident (but prompted by his talented brother), and has been studying programming languages (now, Clojure) entirely on his own. May be that's the secret.
Another pleasant surprise was seeing the solution that in my eyes is ideal:
[(k, len(list(g))) for k, g in itertools.groupby("aaaabbbcca")]This Python solution is expressed in what is essentially the conventional mathematical notation, readable by everyone (unlike APL). More importantly, it is linguistically economical -- again, unlike APL. It uses just the two directly relevant facilities:
groupBy
and comprehension. Notation is clearly here the tool of thought.
A natural language is also a tool of thought. Building vocabulary and understanding when and how to use a particular word is the hardest part of learning a language. That natural and formal languages have a lot in common is the idea with a long pedigree -- famously and forcefully articulated by Montague in 1970s. Keith Smillie, in his paper about array languages cited earlier further argues that teaching programming ought to adapt the best ideas of teaching foreign languages.
Both in programming and natural languages, one can acquire basics relatively quickly. Fluency takes decades. There doesn't seem to be anything better than constant practice, speaking and listening, writing and rewriting. The ending of the famous interview between Ernest Hemingway and George Plimpton (the `Interviewer', the magazine's founder and editor) (Paris Review, issue 18, Spring 1958) comes to mind:
- INTERVIEWER
- Do you do any rewriting as you read up to the place you left off the day before? Or does that come later, when the whole is finished?
- HEMINGWAY
- I always rewrite each day up to the point where I stopped. When it is all finished, naturally you go over it. You get another chance to correct and rewrite when someone else types it, and you see it clean in type. The last chance is in the proofs. You’re grateful for these different chances.
- INTERVIEWER
- How much rewriting do you do?
- HEMINGWAY
- It depends. I rewrote the ending to Farewell to Arms, the last page of it, thirty-nine times before I was satisfied.
- INTERVIEWER
- Was there some technical problem there? What was it that had stumped you?
- HEMINGWAY
- Getting the words right.
groupBy
, which is the common name for
the operation to group consecutive elements of a sequence. We then
need to present the groups in the form of (group-element, counter)
pairs.
I will use OCaml for concreteness (and for other reasons that should be clear soon). Suppose we have the operation
group : 'a list -> 'a list listwhich groups equal (according to some criterion) consecutive elements of a list in their own list, and returns the list of such groups. (A group cannot be empty). Then the first approximation to the solution is as follows, quite close to the first Clojure solution above:
let rll (group : char list -> char list list) : char list -> (char * int) list = group >> List.(map (fun g -> (hd g, length g)))Type annotations are optional, but I prefer to write them. (One reason to use OCaml is types: not fancy, but sufficient to visualize data transformations, without too many details.) Since
group
is the assumption, it appears as the parameter (argument).
Here, (>>)
is the left-to-right functional composition (as in F#
).
It is convenient to think of input as a sequence, such as list. But a
string is not a list in OCaml -- but can be easily turned into one,
with the help of a couple standard library functions:
let rll (group : char list -> char list list) : string -> (char * int) list = String.to_seq >> List.of_seq >> group >> List.(map (fun g -> (hd g, length g)))With the
fork
combinator, so frequent in APL (but a left-to-write
version):
let fork : ('a -> 'b) -> ('a -> 'c) -> ('b -> 'c -> 'd) -> 'a -> 'd = fun f g h -> fun x -> h (f x) (g x)the solution can be written even cleaner:
let rll (group : char list -> char list list) : string -> (char * int) list = String.to_seq >> List.of_seq >> group >> List.(map (fork hd length pair))
We only need the implementation of group
itself to run this
code. Alas, the OCaml standard library at present does not provide a
grouping operation. I can of course write it. But let me rephrase this
approach in more detail.
Let us start from scratch and develop the solution in smaller steps, emphasizing tighter and tighter `grasping' of the data flow. The stress is on understanding data flow, rather than on writing loops or recursive functions and struggling with termination conditions and edge cases. We won't obsess about state either: it comes naturally when needed. Grouping also comes naturally, and simpler. When we reach the solution, one look at it will tell that it must be right. It is. It is also reasonably efficient. It can even be later mechanically recast into C for better efficiency -- not the sort of C one typically writes by hand, but correct and problem-free.
First, from the problem statement one quickly understands that we are
facing a sequential problem: transforming a sequence of
characters into some other sequence (of pairs),
locally. Indeed, appending more characters to the sample input
"aaaabbbcca"
will not affect the prefix ("a",4), ("b", 3), ("c", 2)
of the sample output. Once this sequential character is
understood, the rest follows almost automatically.
To think about and develop the solution, we need a vocabulary, which
we build up as we go along. First we need a word for a sequence. Let's
call it 'a seq
: a type of a sequence of elements of type 'a
. Let
us not worry for now how it is actually implemented and treat it as an
abstract type.
Since the input is given as string, we obviously need the operation
of_string : string -> char seqto transform (or, render) a string as a sequence of characters. Assume it is given. Again, what is important for now is the type, which represents the data flow.
Next we have to deal with grouping of neighboring characters, which is inherently stateful: which group the character belongs to depends on the context, that is, on the character that came before, or follows next. Let's go with the latter and assume the operation
look_ahead : 'a seq -> ('a * 'a option) seqthat pairs the current element of sequence with the element that comes next. The next element may be absent (if the sequence is about to end), hence the
'a option
type, allowing for the missing value.
The grouping itself is then inserting group breaks, determined by peeking at the next element:
type 'a annot = Plain of 'a | Break of 'a let group : ('a * 'a option) seq -> 'a annot seq = map @@ function | (x,Some next) when x = next -> Plain x | (x,_) -> Break xThat is, we transform a looked-ahead sequence into a sequence of elements annotated with group boundaries:
Plain x
means the current
group continues, Break x
means x
is the last element in its
group. The next element (if any) begins a different group.
(The 'a annot
type could have been realized differently, as a pair
'a * bool
.) We have also assumed the existence of the mapping
operation
map : ('a -> 'b) -> 'a seq -> 'b seq
Finally, we have to count, which again requires state (the count of elements in the current group) and emit the tuples at the group boundary:
let count : 'a annot seq -> ('a * int) seq = map_accum_filter 0 @@ fun cnt -> function | Plain x -> (None, cnt+1) | Break x -> (Some (x,cnt+1), 0)The group break forces the output (describing the group that has just ended) and resets the counter. We assumed the existence of accumulating map-filter, a rather popular operation in stream processing:
map_accum_filter : 'state -> ('state -> 'a -> 'b option * 'state) -> 'a seq -> 'b seqThe mapping
('state ->'a -> 'b option * 'state)
takes the current
state and the current element and returns a tuple of a possibly
transformed element and the new state. If the possibly transformed
element is None
, nothing is emitted in the output stream; the state
advances nevertheless. From the type of it, map_accum_filter
looks
`pure', with functional-state passing. Hold that thought.
That is it. The solution to the problem, the run-length-encoded sequence of characters, is then the composition of what we have developed so far:
let rll = of_string >> look_ahead >> group >> countwhich expresses the problem in the concise and obviously correct way. We only need to add to our chain an observation operation, to transform the resulting
(char * int) seq
to a list, or
to print it out.
One advantage of our solution is the ease of modification. For example, if the source string is UTF-8 encoded, we only need to add to the pipeline one more operation: UTF-8 decoding and grapheme clustering:
utf_decode : char seq -> uchar seqWe leave this as an exercise to the reader.
To actually run the rll
program, we have to fulfill our promises:
to pick a concrete representation for 'a seq
and implement the
operations that we assumed for it. In OCaml terms, we have to
implement the following signature:
module type simple_seq = sig type 'a seq val of_string : string -> char seq val map : ('a -> 'b) -> 'a seq -> 'b seq val look_ahead : 'a seq -> ('a * 'a option) seq val map_accum_filter : 'state -> ('state ->'a -> 'b option * 'state) -> 'a seq -> 'b seq val iter : ('a -> unit) -> 'a seq -> unit val to_list : 'a seq -> 'a list endIt turns out that the OCaml standard library has almost everything we need: the
Seq.t
data type and the mapping, string/list conversion,
etc. operations on it. What is missing is map_accum_filter
and
look_ahead
. However, the standard library provides filter_map
on
sequences. We merely need to add state, realized as mutable state:
let map_accum_filter : 'state -> ('state -> 'a -> 'b option * 'state) -> 'a seq -> 'b seq = fun init f seq -> let stat = ref init in seq |> Seq.filter_map (fun a -> let (bo,st) = f !stat a in stat := st; bo)The overall
map_accum_filter
does have a `functional' interface with
the explicit state-passing; the implementation is imperative however. There is
no inherent conflict between a clear specification and a fast
implementation. The remaining look_ahead
turns out easily expressed
via map_accum_filter
(we leave the implementation to the reader --
or see the accompanying code).
Looking back at our solution, we notice that all state ends up
encapsulated. The state for look-ahead is encapsulated in
look_ahead
. The grouping per se turned out stateless. Counting
requires one more piece of state. Each operation keeps the state it
needs, without knowing or interfering with the state of others.
We also notice that there are no loops or recursive functions. Neither did we have to deal with termination conditions. That is the feature of all Group I solutions, clearly articulated back in APL: specify how to change the content of a collection or its shape, but don't waste time thinking how to get the elements or where to put them.
Our 'a seq
can also be realized as iterator/generator, to be composed via
for
-loops -- with good performance. Even OCaml Seq.t
fuses by design (albeit incompletely: there are still intermediate data
structures, but of constant size). Perfect fusion is achievable, if one
uses, say, strymonas, which also permits the generation of C
code. Here it is.
void rll(const int64_t * aa_1,const int64_t al_2) { int64_t v_3 = 0; int64_t v_4 = 0; bool v_5 = 0; int64_t v_6 = 0; while (v_6 <= al_2) { int64_t t_7; t_7 = v_6; v_6++; if (t_7 < al_2) { int64_t t_9; t_9 = aa_1[t_7]; if (v_5) { int64_t t_10; t_10 = v_4; v_4 = t_9; if (!(t_10 == t_9)) { int64_t t_11; t_11 = v_3; v_3 = 0; printf("%ld\n",(long)t_10); printf("%ld\n",(long)(t_11 + 1)); } else { v_3++; } } else { v_4 = t_9; v_5 = 1; } } else { if (v_5) { int64_t t_8; t_8 = v_3; v_3 = 0; printf("%ld\n",(long)v_4); printf("%ld\n",(long)(t_8 + 1)); } } }}
One more thing: it all worked on the first try.
apples.c [2K]
The strymonas solution: generated C code. For the generator, see the
directory examples/run-length-encoding
in the
strymonas distribution.
Incremental, Linear Pretty-printing
The stepwise pipeline development can be taken quite
far. The above reference develops a
rather non-trivial algorithm with the optimal, difficult to achieve performance.