Patterns of data flow in words

APL, developed in early 1960s, is still in use, especially in finance. More felt is its influence on algebra of programming (via Backus' FP) and on Matlab. Started as a notation for a (business) data processing course, APL lets us look at computation from a vantage point, discern patterns of data flow and put them into words.

APL was developed when parsing technology and the art and science of programming language design were just emerging. It is only now that we understand, from experience and information theory, that trying to assign meaning to any possible combination of tokens is a bad idea. Spartan syntax, pervasive overloading, uniformity (everything is an array), lack of types all might seem like good ideas. The reality is quite more complex.

How to liberate APL from its style? For three years, I have been trying to explain APL ideas to bright sophomores, by implementing step by step a small array DSL embedded in OCaml. The students never heard of OCaml, let alone APL. This page is to give a taste of that seminar course and tell what came out of it: a new understanding of Game of Life and Plasma Fractals, leading to many tantalizing extensions.



This page collects the material for the seminar course on APL-like programming, and related tutorial and talks. The goal is not implementing APL. It is not even to learn to program in APL or its dialect. The goal is to understand APL's appeal, ideals and enduring ideas, and realize them in one's favorite language.

There is hardly a better way to understand an idea than discovering it for themselves, in the process of solving a practical problem (or, at least, an exercise). Therefore, the course is structured around the progression of specially selected exercises, implementing common tasks of array processing. The result is a simple array processing DSL. Solving an exercise is only half the job. The other half is looking back at the solutions, trying to discern common patterns -- and, when found, naming them and using consciously from now on.

Besides the useful array processing DSL and a set of named common patterns (combinators), the result is the experience of looking at computation from a higher point of view: higher than loops or recursion; seeing, and making explicit, data flow; using, taking performance advantage of, and designing operations on whole arrays.

The main ideas are very old. We are going back to the origins of procedural, flowchart, and modular programming: the ideas of Iverson, McIlroy, Parnas, Goguen -- but look at them with today's eyes and using today's languages.

The course deliberately uses a very simple subset of OCaml: essentially, typed lambda-calculus with some data types and constants. No libraries or packages beyond the standard OCaml library are needed. The code can be easily re-implemented in Scala, Haskell, F#, Rust -- probably Swift and even Java or Python.

The knowledge of neither APL nor OCaml is presupposed. I tell the students that I explain the needed bits of OCaml as we go along. They seemed to have picked OCaml quickly.

Tutorial at <Programming>'23
<> [37K]
The complete code used at the tutorial [32K]
A bit more extended code used over three years in an actual course. The code also includes operations such as tiling and searching.


A bit of history

`Iverson notation', as APL was first known, was developed as a teaching aid, to explain (business) data processing and algorithms such as simplex algorithm often used in economic planning. Iverson states his motivation as follows:
``Applied mathematics is concerned with the design and analysis of algorithms or programs.

The systematic treatment of complex algorithms requires a suitable programming language for their description, and such a programming language should be concise, precise, consistent over a wide area of application, mnemonic, and economical of symbols; it should exhibit clearly the constraints on the sequence in which operations are performed; and it should permit the description of a process to be independent of the particular representation chosen for the data.

Existing languages prove unsuitable for a variety of reasons.''

(``A Programming Language'', 1962)

The idea of notation as a language was further developed in ``Algebra as a Language'', Appendix A of the book Algebra: An Algorithmic Treatment, 1972.

APL is an old language, with a long and rich history. The following are just a few notable milestones:

D.B. McIntyre. Language as an intellectual tool: From hieroglyphics to APL. IBM Systems J., v30, N4, 1991, pp. 554-581
Much longer view, starting from the numerical notation used in ancient Egypt

K.E. Iverson. A personal view of APL. IBM Systems J., v30, N4, 1991, pp. 582-593


Vantage point

The standard example to make an impression, the example that appears in almost any APL introduction is computing the mean value of an array of numbers. It is a very old example. One of its early, notable appearances is in a technical report by then master student Gary Kildall.

Computing the average in most conventional languages involves some sort of loop. For example, in C

    float mean(const int n, const float * a) {
      float sum = 0;
      for(int i=0; i<n; i++)
        sum += a[i];
      return sum / n;
(Kildall used Algol 60, which was popular at that time for writing algorithms.)

In APL, however, computing the mean is simply

which may seem awfully cryptic, needing unpacking. It is best to draw this expression as a diagram:
         ┌──────► +/ ─────┐
         │                ▼
    ────►│                ÷ ───►
         │                ▲
         └──────► ⍴ ──────┘
The shape of the computation becomes easier to see for everyone. The data (array) enters from the left and then forks: the array is passed to the +/ operator (which sums the array elements) -- and to the operator, which returns the length of the array. The results of the two operations are then joined by the division ÷. The summing-up operation +/ can also be visualized as inserting + between every pair of neighboring array elements. The fork operation is not denoted by any symbol in APL: it is implicit in the juxtaposition of two unary operators with a binary in-between.

Nowadays, we recognize +/ (to be precise, /) as the reduce (or, more generally, fold) operation. Uncannily, fold appeared in APL a little bit before it was described by Strachey (in 1962), and quite a bit before it entered functional programming, in Miranda.

In the array DSL developed in the tutorial, the mean value function is written as

    let mean : (d1,float) arr -> float = 
     fork (/.) (reduce (+.)) (length >> float_of_int)
We now see types. Although the type signature is optional in OCaml, writing it out for top-level functions is helpful: after all, it, too, gives an overview of the computation. (It also helps making error messages more precise and informative.) The operation fork is clearly visible. Conversions from integers to floats are also explicit: experience has shown that implicit conversions, albeit seemingly convenient, are a nest of many subtle bugs.

The advantage of APL in showing the overall shape of the computation is described in the following parable, quoted from Keith Smillie.

``Suppose we have a box of apples from which we wish to select all of the good apples. Not wishing to do the task ourselves, we write a list of instructions to be carried out by a helper.

The instructions corresponding to a conventional language might be expressed something like the following: Select an apple from the box. If it is good, set it aside in some place reserved for the good apples; if it is not good, discard it. Select a second apple; if it is good put it in the reserved place, and if it is not good discard it.... Continue in this manner examining each apple in turn until all of the good apples have been selected. In summary, then, examine the apples one at a time starting with the first one we pick up and finishing with the last one in the box until we have selected and set aside all of the good apples.

On the other hand the instructions corresponding to an array language could be stated simply as ``Select all of the good apples from the box.'' Of course, the apples would still have to be examined individually, but the apple-by-apple details could be left to the helper.

Conventional programming languages may be considered then ``one-apple-at-a-time'' languages with machine languages being a very primitive form, whereas array languages may be considered ``all-the-apples-at-once'' languages.''

The apple analogy originated with Frederick P. Brooks who worked closely with Iverson in the early days of APL. That Frederick P. Brooks, of Mythical Man-month fame. Incidentally, he named his eldest son after Kenneth Iverson.

Gary A. Kildall. APL\5500: The language and its implementation. U. of Washington, TR 70-09-04, Computer Science Group. September 1970
Gary Kildall -- the creator of CP/M, once the OS for microcomputers -- was in his student days an implementor of APL. APL history is rife with uncanny coincidences.

Keith Smillie. My Life with Array Languages. October 2005.


What's in an Array

Before we start implementing an array DSL, we have to figure out what array is, actually. As a hint, an array is characterized by its shape and contents.

Let's look at the following three arrays:

    1 2 3 4 5 6          1 2 3              1 2
                         4 5 6              3 4
                                            5 6
It is intuitively clear they all have the same contents, and differ in shape. Let's make this intuition precise. Consider the first array
    1 2 3 4 5 6
call it a. In detailed mathematical notation, it can be specified, rather verbosely, as
    a_0 = 1     a_1 = 2   ...    a_5 = 6
This is a table specification of a function, with the domain [0..5] and range [1..6]. As for the second sample array (call it b)
    1 2 3
    4 5 6
it, too, can be mathematically specified as
    b_00 = 1     b_01 = 2   b_02 = 3  .... b_12 = 6
It also looks like a (tabulated) function: from the domain which is a pair of numbers. More precisely, it is a function [0..1] ⨯ [0..2] -> [1..6]. The last sample array is also a function, from a pair of numbers: [0..2] ⨯ [0..1] -> [1..6]. We see that the range of these three functions is the same: [1..6]. This is the contents. The domains differ: this is the shape.

Thus, array is a function, from a finite domain of indices to the set of elements: the function that given an index returns array's element at that index. That an array is a sort of a function must have occurred to the designers of Fortran: in that language, A(1) may mean either the first element of array A, or invoking the function A on the argument 1. (Forgetting to declare an array and getting an `undefined function' linking error was a mistake that probably every Fortran programmer made.)

A rectangular index domain can be specified by its two opposite points, viz. the smallest and the largest index. If we assume the indices start with 0, we only need the largest index to fully describe the rectangular domain. We hence arrive at the following array representation (in OCaml)

    type ('i,'a) arr = Arr of 'i * ('i -> 'a)
The type 'a is the type of array elements: that is, the (description) of contents. The type 'i is the index type. The arr is hence the pair: the largest index to specify the domain (i.e., shape) and the indexing function, returning an array element given its index.

Question: is there a way to specify an empty array?

For example, the mathematical description of the sample array a above can be transcribed (almost literally) into OCaml as

    let a : (int,int) arr = 
      Arr (5,function 
               | 0 -> 1 | 1 -> 2 | 2 -> 3
               | 3 -> 4 | 4 -> 5 | 5 -> 6 | _ -> assert false)
(There are more concise ways of defining arrays, introduced later). The index type could be specified more precisely (even in OCaml, let alone in languages like Idris or Agda), and assert could hence be avoided. (In this tutorial, we'll err on the side of simplicity.)

The two-dimensional case is similar: the sample array b is

    let b : (int*int,int) arr = 
        Arr ((1,2), function 
        | (0,0) -> 1  | (0,1) -> 2 | (0,2) -> 3 
        | (1,0) -> 4  | (1,1) -> 5 | (1,2) -> 6 | _ -> assert false)
This, too, matches the detailed mathematical notation earlier. Now it is machine-readable and checkable.

What about numbers and other scalars? Can they be represented as arr?

Indeed they can: after all, a scalar is isomorphic to an array with one element, whose shape is the singleton domain. OCaml has the special type for such domain: unit. Thus a scalar, say, number 5, may be represented as

    let five : (unit,int) arr = Arr ((),fun _ -> 5)
This how APL treats numbers: they are also arrays. In (original) APL, everything was array. We make a different choice, however. Just because scalars are isomorphic to 1-element arrays does not mean they are identical to such arrays. Linear algebra, for one, sharply distinguishes vectors (elements of a vector space) and scalars (elements of the underlying field). In our DSL, scalars are not vectors, and have different type.
Philip S. Abrams. An APL machine. Ph.D Thesis. Stanford Linear Accelerator Center, Stanford University. SLAC-114 UC-32 (MISC). February 1970.
The thesis shows (and proves) that many array operations may be decomposed into operations only on shape, and only on contents.


A taste of the course: Vectors

The full seminar course is fairly long. To give a taste of it, this section presents a small excerpt, about vectors.

A vector is a one-dimensional array. Recall, such arrays have the index type int, for which we define the nickname

    type d1 = int
The shape of a vector -- the index domain -- is, hence, a sequence of consecutive integers. Assuming the indexing starts with 0, the shape can be described by the largest index.

Creating a vector as we did earlier -- enumerating the correspondence between an index and a value -- is quite tedious. We'd better take advantage of native OCaml arrays and their convenient notation (example: [|1;2;3;4;5;6|]) and then convert OCaml arrays to arr:

    let of_array : 'a array -> (d1,'a) arr = fun arr ->
      let n = Array.length arr in
      assert (n>0);
      Arr(n-1, fun i -> arr.(i))
(as we saw earlier, in our DSL arrays are never empty). The code can be written without lambda-expressions (which APL did not have): in OCaml, arr.(i) is a syntax sugar for Array.get arr i, which can then be eta-reduced.
    let of_array (arr:'a array) : (d1,'a) arr = 
      let n = Array.length arr in
      assert (n>0);
      Arr(n-1, Array.get arr)
To see the vectors, we also define the operation pr_arr1 : (d1,int) arr -> unit, to be explained later. For now, just use it to print vectors. For example,
    let a1 = of_array [|3;1;4;1;5;9;2|]
    let _ = pr_arr1 a1
    |3 1 4 1 5 9 2|

Let's try reversing the array a1. How would you write it? Hint: if b1 is the reversed array, its relation to a1 can be written mathematically as

    a1_0 = 3 = b1_6
    a1_1 = 1 = b1_5
    a1_6 = 2 = b1_0
Or, generalizing, a1_i = b1_{6-i}. That's the answer. The OCaml code is exactly that expression, modulo syntax:
    let rev : (d1,'a) arr -> (d1,'a) arr = 
      function Arr (upb,a) ->
      Arr(upb,fun i -> a (upb-i))
To check,
    let _ = pr_arr1 (rev a1) 
    |2 9 5 1 4 1 3|
as expected. The printing expression can also be written as
    let _ = pr_arr1 @@ rev a1
where @@ is the low-precedence application operator. It lets us write applications without parenthesis. We will often use, however, a flipped version of the application operator, which is |> (pronounced `piped to'). For example:
    let _ = rev a1 |> pr_arr1
or even
    let _ = a1 |> rev |> pr_arr1
which can be pronounced as ``a1 piped to rev piped to printer''. The piped-to operator makes the data flow left-to-right -- as in circuit diagrams. Pipes are easy to extend by plugging-in another segment:
    let _ = a1 |> rev |> rev |> pr_arr1
Can you guess the output?

The rev functions changed the mapping between indices and elements, but did not affect the elements themselves. Let us now try to change the array contents: write a function that doubles each element (in general, multiplies each element by a given number, returning a new vector). Again, it is recommended to first to try to solve the problem in mathematical notation: b_i = 2*a_i. This is the answer. In OCaml syntax, it becomes:

    let mul1 : int -> (d1,int) arr -> (d1,int) arr
        = fun n -> function Arr (upb,a) ->
          Arr(upb, fun i -> n * (a i))
The index function here (and in rev above) has the pattern that is worth noting and making explicit: passing the argument to a function (a, in our case) and the result to another function; in other words, function composition. Keeping with our left-to-right-flow principle, let's define the left-to-right composition:
    let (>>) f g = fun x -> f x |> g
Unlike the piped-to operator, the composition is not (yet) pre-defined in OCaml (it is in F#). Composition and application relate in an obvious way:
    x |> f |> g  ≡  x |> (f >> g)
We will talk more about such algebraic identities later. With the function composition, mul1 can be written as
    let mul1 : int -> (d1,int) arr -> (d1,int) arr
        = fun n -> function Arr (upb,a) ->
          Arr(upb, a >> ( * ) n)
(In OCaml, enclosing a binary operation in parenthesis turns it into an ordinary function. That is, 2 * 3 can also be written as ( * ) 2 3 .) To test,
    let _ = a1 |> mul1 2 |> pr_arr1
    |6 2 8 2 10 18 4|
We have duplicated several numbers `at once' -- without even writing a loop.

Next, let's try adding a number to each element of an array. It should not take too much time to come up with

    let add1 : int -> (d1,int) arr -> (d1,int) arr
        = fun n -> function Arr (upb,xf) ->
          Arr(upb, xf >> ( + ) n)
Clearly, mul1 and add1 are asking for a generalization: apply an arbitrary user-supplied operation (not just addition to a number or multiplication by a number) to each element of an array, and return the array of results. This general operation is called map (we write map1 to make clear it applies to 1-dim arrays: vectors).
    let map1 : ('a -> 'b) -> (d1,'a) arr -> (d1,'b) arr
        = fun f -> function Arr (upb,a) ->
          Arr(upb, a >> f)
The map operation has such a significance in APL that it is denoted the most succinctly: by nothing at all. In APL, any unary operation on numbers may, as is, be applied to a vector of numbers; mapping is implicit.

With map, the earlier element doubling can be written as map1 (( * ) 2) and element increment as map1 ((+) 1). To double each element and then increment it, we chain the two operations

    let _ = a1 |> map1 (( * ) 2) |> map1 ((+) 1) |> pr_arr1
The result is
    |7 3 9 3 11 19 5|
Using the composition-application identity x |> f |> g ≡ x |> (f >> g), the expression can be re-written as
    let _ = a1 |> (map1 (( * ) 2) >> map1 ((+) 1)) |> pr_arr1
and even further to
    let _ = a1 |> map1 ( ( * ) 2 >> (+) 1) |> pr_arr1
which seems more efficient (or at least, concise) since it has only one map1. The last simplification relied on
    map1 f >> map1 g  ≡  map1 (f >> g)
so-called ```map fusion law''. This is an algebraic identity, like the ones seen in elementary school, such as
    neg a + neg b  ≡  neg (a + b) 
How can we check that the map fusion law is correct? Inlining the map1 definition leads to
    (a >> f) >> g  ≡  a >> (f >> g)
which is evident, since function composition is associative.

This has been an example of applying algebraic identities (viz., composition-application law and map-fusion law) to turn an expression to a `better' (simpler, more efficient) form -- just like elementary school algebra exercises, of simplifying expressions or solving equations by reshuffling symbols according to certain rules. We have thus come across algebra of programming. As in elementary school algebra, if the rules are followed, even mindlessly or mechanically, the result is guaranteed to be correct: meaning-preserving.

The course continues in the same spirit, to: zip_with, reduce, (generalized) dot product, statistics (mean, standard deviation, covariance), matrices, generalized matrix product, distances, norms, all-to-all shortest distance problem.

References [37K]
The complete code used in the course/tutorial


Applicatives: Patterns of function application

An unary operation such as abs : int -> int applies to an int argument directly: e.g., abs (5-7). It cannot be applied to a function. Still, a function t -> int does produce an int value, given a suitable argument of type t. One can apply abs to that value then. In a sense, abs does apply to functions, too, in the following way:
    fun x -> f x |> abs  = f >> abs
That is, `applying' abs to a function f is composing f and abs. The operation abs does not apply directly to an array either. Still, one may reasonably define the application of abs to an array as applying the operation to each element of the array, uniformly. In other words, applying abs to an array a in this extended sense is map abs a. Thus the ordinary application, composition and map are all patterns of applying an unary operation -- to an argument that may be an immediately given value, a value that is produced upon receiving some other value, or a value incorporated into a collection. Incidentally, the definition of map shows even the closer connection to function composition:
    let map1 : ('a -> 'b) -> (d1,'a) arr -> (d1,'b) arr
        = fun f -> function Arr (upb,a) ->
          Arr(upb, a >> f)

The three-prong application pattern extends to binary operations. An operation like (+) : int -> int -> int applies to two int arguments directly: e.g., 2 + 3, which can also be written in prefix notation as (+) 2 3. Two arrays (of the same size) may be added element-wise. One may then say that (+) applies to int arrays a and b, as zip_with (+) a b. Likewise, we may add two functions f: t -> int and g: t -> int `element-wise', as

    fun x -> f x + g x
This application pattern -- take a value, pass it to two functions f and g and join the result using a binary operation -- is none other than fork we saw earlier. It can be defined as the combinator
    let fork h f g x = fun x -> h (f x) (g x)
Hence fork is zip_with on functions -- or zip_with is fork on arrays. This isn't just the turn of the phrase: after all, zip_with is defined as
    let zip_with : ('a -> 'b -> 'c) -> (d1,'a) arr -> (d1,'b) arr -> (d1,'c) arr
       = fun f (Arr (upbx,xf)) (Arr (upby,yf)) ->
          assert (upbx = upby);
          Arr(upbx, fork f xf yf)
Thus, a binary application, fork and zip_with are patterns of applying a binary operation.

Since map, composition, fork and zip_with are all patterns of application, they are, as the ordinary application, denoted in APL by nothing at all, merely by juxtaposition.

McBride and Paterson (J. Functional Programming, 2008) explicated the construction of applicative, as a polymorphic `collection' with operations map and app (satisfying obvious equational laws):

    module type applicative = sig
       type 'a t
       val pure : 'a -> 'a t
       val map  : ('a -> 'b) -> 'a t -> 'b t
       val app  : ('a -> 'b) t -> 'a t -> 'b t
Less known is the alternative, equivalent formulation, with app replaced by
    val map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t
(often called zip_with or lift2). Indeed, app can be expressed in terms of map2:
    let app  : ('a -> 'b) t -> 'a t -> 'b t = map2 (@@)
and map2 in terms of app and map. The alternative formulation is easier to generalize, to first-order applicatives (e.g., when 'a and 'b are base types), which are quite more widespread: e.g., APL and Matlab. An applicative is hence a collection to which one may apply unary and binary operations.

Thus functions and arrays are applicatives. It is for that reason that Iverson notation works so well. The applicative pattern was intuited and put to work back in 1960.



Programming by discerning patterns of data flow, and by finding, or coining, words to aptly express them.