APL was developed when parsing technology and the art and science of programming language design were just emerging. It is only now that we understand, from experience and information theory, that trying to assign meaning to any possible combination of tokens is a bad idea. Spartan syntax, pervasive overloading, uniformity (everything is an array), lack of types all might seem like good ideas. The reality is quite more complex.
How to liberate APL from its style? For three years, I have been trying to explain APL ideas to bright sophomores, by implementing step by step a small array DSL embedded in OCaml. The students never heard of OCaml, let alone APL. This page is to give a taste of that seminar course and tell what came out of it: a new understanding of Game of Life and Plasma Fractals, leading to many tantalizing extensions.
There is hardly a better way to understand an idea than discovering it for themselves, in the process of solving a practical problem (or, at least, an exercise). Therefore, the course is structured around the progression of specially selected exercises, implementing common tasks of array processing. The result is a simple array processing DSL. Solving an exercise is only half the job. The other half is looking back at the solutions, trying to discern common patterns -- and, when found, naming them and using consciously from now on.
Besides the useful array processing DSL and a set of named common patterns (combinators), the result is the experience of looking at computation from a higher point of view: higher than loops or recursion; seeing, and making explicit, data flow; using, taking performance advantage of, and designing operations on whole arrays.
The main ideas are very old. We are going back to the origins of procedural, flowchart, and modular programming: the ideas of Iverson, McIlroy, Parnas, Goguen -- but look at them with today's eyes and using today's languages.
The course deliberately uses a very simple subset of OCaml: essentially, typed lambda-calculus with some data types and constants. No libraries or packages beyond the standard OCaml library are needed. The code can be easily re-implemented in Scala, Haskell, F#, Rust -- probably Swift and even Java or Python.
The knowledge of neither APL nor OCaml is presupposed. I tell the students that I explain the needed bits of OCaml as we go along. They seemed to have picked OCaml quickly.
tutorial.ml [37K]
The complete code used at the tutorial
intro2023.ml [32K]
A bit more extended code used over three years in an actual course.
The code also includes operations such as tiling and searching.
(``A Programming Language'', 1962)``Applied mathematics is concerned with the design and analysis of algorithms or programs.The systematic treatment of complex algorithms requires a suitable programming language for their description, and such a programming language should be concise, precise, consistent over a wide area of application, mnemonic, and economical of symbols; it should exhibit clearly the constraints on the sequence in which operations are performed; and it should permit the description of a process to be independent of the particular representation chosen for the data.
Existing languages prove unsuitable for a variety of reasons.''
The idea of notation as a language was further developed in
``Algebra as a Language'', Appendix A of the book
Algebra: An Algorithmic Treatment, 1972.
<https://www.jsoftware.com/papers/algebra.htm>
APL is an old language, with a long and rich history. The following are just a few notable milestones:
K.E. Iverson. A personal view of APL. IBM Systems J., v30, N4, 1991, pp. 582-593
Computing the average in most conventional languages involves some sort of loop. For example, in C
float mean(const int n, const float * a) { float sum = 0; for(int i=0; i<n; i++) sum += a[i]; return sum / n; }(Kildall used Algol 60, which was popular at that time for writing algorithms.)
In APL, however, computing the mean is simply
+/÷⍴which may seem awfully cryptic, needing unpacking. It is best to draw this expression as a diagram:
┌──────► +/ ─────┐ │ ▼ ────►│ ÷ ───► │ ▲ └──────► ⍴ ──────┘The shape of the computation becomes easier to see for everyone. The data (array) enters from the left and then forks: the array is passed to the
+/
operator (which sums the array elements) -- and to the
⍴
operator, which returns the length of the array. The results of
the two operations are then joined by the division ÷
. The summing-up
operation +/
can also be visualized as inserting +
between every pair of neighboring array elements.
The fork operation is not denoted by any symbol in APL: it
is implicit in the juxtaposition of two unary operators with a binary
in-between.
Nowadays, we recognize +/
(to be precise, /
) as the reduce (or, more
generally, fold) operation. Uncannily, fold appeared in APL a little
bit before it was described by Strachey (in 1962), and quite a bit
before it entered functional programming, in Miranda.
In the array DSL developed in the tutorial, the mean value function is written as
let mean : (d1,float) arr -> float = fork (/.) (reduce (+.)) (length >> float_of_int)We now see types. Although the type signature is optional in OCaml, writing it out for top-level functions is helpful: after all, it, too, gives an overview of the computation. (It also helps making error messages more precise and informative.) The operation
fork
is clearly
visible. Conversions from integers to
floats are also explicit: experience has shown that implicit
conversions, albeit seemingly convenient, are a nest of many subtle
bugs.
The advantage of APL in showing the overall shape of the computation is described in the following parable, quoted from Keith Smillie.
``Suppose we have a box of apples from which we wish to select all of the good apples. Not wishing to do the task ourselves, we write a list of instructions to be carried out by a helper.The instructions corresponding to a conventional language might be expressed something like the following: Select an apple from the box. If it is good, set it aside in some place reserved for the good apples; if it is not good, discard it. Select a second apple; if it is good put it in the reserved place, and if it is not good discard it.... Continue in this manner examining each apple in turn until all of the good apples have been selected. In summary, then, examine the apples one at a time starting with the first one we pick up and finishing with the last one in the box until we have selected and set aside all of the good apples.
On the other hand the instructions corresponding to an array language could be stated simply as ``Select all of the good apples from the box.'' Of course, the apples would still have to be examined individually, but the apple-by-apple details could be left to the helper.
Conventional programming languages may be considered then ``one-apple-at-a-time'' languages with machine languages being a very primitive form, whereas array languages may be considered ``all-the-apples-at-once'' languages.''
The apple analogy originated with Frederick P. Brooks who worked closely with Iverson in the early days of APL. That Frederick P. Brooks, of Mythical Man-month fame. Incidentally, he named his eldest son after Kenneth Iverson.
\
5500: The language and its implementation.
U. of Washington, TR 70-09-04, Computer Science Group. September 1970Keith Smillie. My Life with Array Languages. October 2005.
Let's look at the following three arrays:
1 2 3 4 5 6 1 2 3 1 2 4 5 6 3 4 5 6It is intuitively clear they all have the same contents, and differ in shape. Let's make this intuition precise. Consider the first array
1 2 3 4 5 6call it
a
. In detailed mathematical notation, it can be specified,
rather verbosely, as
a_0 = 1 a_1 = 2 ... a_5 = 6This is a table specification of a function, with the domain
[0..5]
and range
[1..6]
. As for the second sample array (call it b
)
1 2 3 4 5 6it, too, can be mathematically specified as
b_00 = 1 b_01 = 2 b_02 = 3 .... b_12 = 6It also looks like a (tabulated) function: from the domain which is a pair of numbers. More precisely, it is a function
[0..1] ⨯ [0..2] -> [1..6]
.
The last sample array is also a function, from a pair of numbers:
[0..2] ⨯ [0..1] -> [1..6]
.
We see that the range of these three functions is the same: [1..6].
This is the contents. The domains differ: this is the shape.
Thus, array is a function, from a finite domain of indices to the set
of elements: the function that given an index returns array's
element at that index. That an array is a sort of a function must have
occurred to the designers of Fortran: in that language,
A(1)
may mean either the first element of array A
, or invoking the
function A
on the argument 1
. (Forgetting to declare an array and
getting an `undefined function' linking error was a mistake that probably
every Fortran programmer made.)
A rectangular index domain can be specified by its two opposite points, viz. the smallest and the largest index. If we assume the indices start with 0, we only need the largest index to fully describe the rectangular domain. We hence arrive at the following array representation (in OCaml)
type ('i,'a) arr = Arr of 'i * ('i -> 'a)The type
'a
is the type of array elements: that is, the
(description) of contents. The type 'i
is the index type. The arr
is hence the pair: the largest index to specify the domain (i.e.,
shape) and the indexing function, returning an array element given its index.
Question: is there a way to specify an empty array?
For example, the mathematical description of
the sample array a
above can be transcribed (almost literally) into OCaml as
let a : (int,int) arr = Arr (5,function | 0 -> 1 | 1 -> 2 | 2 -> 3 | 3 -> 4 | 4 -> 5 | 5 -> 6 | _ -> assert false)(There are more concise ways of defining arrays, introduced later). The index type could be specified more precisely (even in OCaml, let alone in languages like Idris or Agda), and
assert
could hence be
avoided. (In this tutorial, we'll err on the side of simplicity.)
The two-dimensional case is similar: the sample array b
is
let b : (int*int,int) arr = Arr ((1,2), function | (0,0) -> 1 | (0,1) -> 2 | (0,2) -> 3 | (1,0) -> 4 | (1,1) -> 5 | (1,2) -> 6 | _ -> assert false)This, too, matches the detailed mathematical notation earlier. Now it is machine-readable and checkable.
What about numbers and other scalars? Can they be represented as
arr
?
Indeed they can: after all, a scalar is isomorphic to an array with
one element, whose shape is the singleton domain. OCaml has the
special type for such domain: unit
. Thus a scalar, say, number 5,
may be represented as
let five : (unit,int) arr = Arr ((),fun _ -> 5)This how APL treats numbers: they are also arrays. In (original) APL, everything was array. We make a different choice, however. Just because scalars are isomorphic to 1-element arrays does not mean they are identical to such arrays. Linear algebra, for one, sharply distinguishes vectors (elements of a vector space) and scalars (elements of the underlying field). In our DSL, scalars are not vectors, and have different type.
A vector is a one-dimensional array. Recall, such arrays have the
index type int
, for which we define the nickname
type d1 = intThe shape of a vector -- the index domain -- is, hence, a sequence of consecutive integers. Assuming the indexing starts with 0, the shape can be described by the largest index.
Creating a vector as we did earlier -- enumerating the
correspondence between an index and a value -- is quite tedious. We'd better
take advantage of native OCaml arrays and their convenient notation
(example: [|1;2;3;4;5;6|]
) and then convert OCaml arrays to arr
:
let of_array : 'a array -> (d1,'a) arr = fun arr -> let n = Array.length arr in assert (n>0); Arr(n-1, fun i -> arr.(i))(as we saw earlier, in our DSL arrays are never empty). The code can be written without lambda-expressions (which APL did not have): in OCaml,
arr.(i)
is a syntax sugar for Array.get arr i
, which can
then be eta-reduced.
let of_array (arr:'a array) : (d1,'a) arr = let n = Array.length arr in assert (n>0); Arr(n-1, Array.get arr)To see the vectors, we also define the operation
pr_arr1 : (d1,int) arr -> unit
, to be explained later. For now, just use it to
print vectors. For example,
let a1 = of_array [|3;1;4;1;5;9;2|] let _ = pr_arr1 a1prints:
|3 1 4 1 5 9 2|
Let's try reversing the array a1
. How would you write it? Hint: if
b1
is the reversed array, its relation to a1
can be written
mathematically as
a1_0 = 3 = b1_6 a1_1 = 1 = b1_5 ... a1_6 = 2 = b1_0Or, generalizing,
a1_i = b1_{6-i}
. That's the answer. The OCaml
code is exactly that expression, modulo syntax:
let rev : (d1,'a) arr -> (d1,'a) arr = function Arr (upb,a) -> Arr(upb,fun i -> a (upb-i))To check,
let _ = pr_arr1 (rev a1)prints
|2 9 5 1 4 1 3|as expected. The printing expression can also be written as
let _ = pr_arr1 @@ rev a1where
@@
is the low-precedence application operator. It lets us
write applications without parenthesis. We will often use, however, a flipped
version of the application operator, which is |>
(pronounced
`piped to'). For example:
let _ = rev a1 |> pr_arr1or even
let _ = a1 |> rev |> pr_arr1which can be pronounced as ``
a1
piped to rev
piped to
printer''. The piped-to operator makes the data flow left-to-right --
as in circuit diagrams. Pipes are easy to extend by plugging-in
another segment:
let _ = a1 |> rev |> rev |> pr_arr1Can you guess the output?
The rev
functions changed the mapping between indices and elements,
but did not affect the elements themselves. Let us now try to change
the array contents: write a function that doubles each element (in
general, multiplies each element by a given number, returning a new
vector). Again, it is recommended to first to try to solve the problem
in mathematical notation: b_i = 2*a_i
. This is the answer. In OCaml
syntax, it becomes:
let mul1 : int -> (d1,int) arr -> (d1,int) arr = fun n -> function Arr (upb,a) -> Arr(upb, fun i -> n * (a i))The index function here (and in
rev
above) has the pattern that is
worth noting and making explicit: passing the argument
to a function (a
, in our case) and the result to another function;
in other words, function composition.
Keeping with our left-to-right-flow principle,
let's define the left-to-right composition:
let (>>) f g = fun x -> f x |> gUnlike the piped-to operator, the composition is not (yet) pre-defined in OCaml (it is in F#). Composition and application relate in an obvious way:
x |> f |> g ≡ x |> (f >> g)We will talk more about such algebraic identities later. With the function composition,
mul1
can be written as
let mul1 : int -> (d1,int) arr -> (d1,int) arr = fun n -> function Arr (upb,a) -> Arr(upb, a >> ( * ) n)(In OCaml, enclosing a binary operation in parenthesis turns it into an ordinary function. That is,
2 * 3
can also be written as
( * ) 2 3
.)
To test,
let _ = a1 |> mul1 2 |> pr_arr1prints
|6 2 8 2 10 18 4|We have duplicated several numbers `at once' -- without even writing a loop.
Next, let's try adding a number to each element of an array. It should not take too much time to come up with
let add1 : int -> (d1,int) arr -> (d1,int) arr = fun n -> function Arr (upb,xf) -> Arr(upb, xf >> ( + ) n)Clearly,
mul1
and add1
are asking for a generalization: apply an
arbitrary user-supplied operation (not just addition to a number or
multiplication by a number) to each element of an array, and return
the array of results. This general
operation is called map
(we write map1
to make clear it applies to
1-dim arrays: vectors).
let map1 : ('a -> 'b) -> (d1,'a) arr -> (d1,'b) arr = fun f -> function Arr (upb,a) -> Arr(upb, a >> f)The
map
operation has such a significance in APL that it is denoted
the most succinctly: by nothing at all. In APL, any unary operation on
numbers may, as is, be applied to a vector of numbers; mapping is
implicit.
With map
, the earlier element doubling can be written as
map1 (( * ) 2)
and element increment as map1 ((+) 1)
. To
double each element and then increment it, we chain the two
operations
let _ = a1 |> map1 (( * ) 2) |> map1 ((+) 1) |> pr_arr1The result is
|7 3 9 3 11 19 5|Using the composition-application identity
x |> f |> g ≡ x |> (f >> g)
,
the expression can be re-written as
let _ = a1 |> (map1 (( * ) 2) >> map1 ((+) 1)) |> pr_arr1and even further to
let _ = a1 |> map1 ( ( * ) 2 >> (+) 1) |> pr_arr1which seems more efficient (or at least, concise) since it has only one
map1
. The last simplification relied on
map1 f >> map1 g ≡ map1 (f >> g)so-called ```map fusion law''. This is an algebraic identity, like the ones seen in elementary school, such as
neg a + neg b ≡ neg (a + b)How can we check that the map fusion law is correct? Inlining the
map1
definition leads to
(a >> f) >> g ≡ a >> (f >> g)which is evident, since function composition is associative.
This has been an example of applying algebraic identities (viz., composition-application law and map-fusion law) to turn an expression to a `better' (simpler, more efficient) form -- just like elementary school algebra exercises, of simplifying expressions or solving equations by reshuffling symbols according to certain rules. We have thus come across algebra of programming. As in elementary school algebra, if the rules are followed, even mindlessly or mechanically, the result is guaranteed to be correct: meaning-preserving.
The course continues in the same spirit, to:
zip_with
, reduce
, (generalized) dot product,
statistics (mean, standard deviation, covariance),
matrices, generalized matrix product, distances,
norms, all-to-all shortest distance problem.
abs : int -> int
applies to an int
argument directly: e.g., abs (5-7)
. It cannot be applied to a function.
Still, a function t -> int
does produce
an int
value, given a suitable argument of type t
. One can apply
abs
to that value then. In a sense, abs
does apply to
functions, too, in the following way:
fun x -> f x |> abs = f >> absThat is, `applying'
abs
to a function f
is composing f
and abs
.
The operation abs
does not apply directly to an array either. Still,
one may reasonably define the application of abs
to an array as
applying the operation to each element of the array, uniformly.
In other words, applying abs
to an array a
in this extended sense is
map abs a
. Thus the ordinary application, composition and map
are all patterns of applying an unary operation -- to an argument
that may be an immediately given value, a value that is produced upon
receiving some other value, or a value incorporated into a
collection. Incidentally, the definition of map
shows even the closer
connection to function composition:
let map1 : ('a -> 'b) -> (d1,'a) arr -> (d1,'b) arr = fun f -> function Arr (upb,a) -> Arr(upb, a >> f)
The three-prong application pattern extends to binary
operations. An operation like
(+) : int -> int -> int
applies to two int
arguments directly:
e.g., 2 + 3
, which can also be written in prefix notation as (+) 2 3
.
Two arrays (of the same size) may be added element-wise.
One may then say that
(+)
applies to int
arrays a
and b
, as zip_with (+) a b
. Likewise, we may add two functions
f: t -> int
and g: t -> int
`element-wise', as
fun x -> f x + g xThis application pattern -- take a value, pass it to two functions
f
and g
and join the result using a binary operation -- is none other
than fork
we saw earlier. It can be defined as the combinator
let fork h f g x = fun x -> h (f x) (g x)Hence
fork
is zip_with
on functions -- or zip_with
is fork
on
arrays. This isn't just the turn of the phrase: after all,
zip_with
is defined as
let zip_with : ('a -> 'b -> 'c) -> (d1,'a) arr -> (d1,'b) arr -> (d1,'c) arr = fun f (Arr (upbx,xf)) (Arr (upby,yf)) -> assert (upbx = upby); Arr(upbx, fork f xf yf)Thus, a binary application,
fork
and zip_with
are patterns of
applying a binary operation.
Since map
, composition, fork
and zip_with
are all patterns of
application, they are, as the ordinary application, denoted in APL by
nothing at all, merely by juxtaposition.
McBride and Paterson (J. Functional Programming, 2008) explicated the
construction of applicative, as a polymorphic `collection' with
operations map
and app
(satisfying obvious equational laws):
module type applicative = sig type 'a t val pure : 'a -> 'a t val map : ('a -> 'b) -> 'a t -> 'b t val app : ('a -> 'b) t -> 'a t -> 'b tLess known is the alternative, equivalent formulation, with
app
replaced by
val map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t(often called
zip_with
or lift2
). Indeed, app
can be
expressed in terms of map2
:
let app : ('a -> 'b) t -> 'a t -> 'b t = map2 (@@)and
map2
in terms of app
and map
. The alternative formulation is
easier to generalize, to first-order applicatives (e.g., when 'a
and
'b
are base types), which are quite more widespread: e.g., APL and
Matlab. An applicative is hence a collection to which one may apply
unary and binary operations.
Thus functions and arrays are applicatives. It is for that reason that Iverson notation works so well. The applicative pattern was intuited and put to work back in 1960.