This web page, written in the form of lecture notes, is (being) built in response to such questions. It presents the standard introductory material from the field of Universal Algebra -- however, selected and arranged specifically for programmers, especially those interested in the tagless-final approach. We only use programming examples, and, as far as possible, a concrete programming language notation rather than the mathematical notation.
``By an `abstract algebra' is meant, loosely speaking, any system of elements and operations such as a ring, a field, a group, or a Boolean algebra'' (Birkhoff, 1935). He then followed up with a `tentative formal definition', which is still in use (and which we recall later).
Universal Algebra is a field of Mathematics. A significant part of a Universal Algebra course and of its textbooks is devoted to lattices and combinatorics. There is not much connection, it seems, to common programming tasks. And yet there is. Automata theory (finite state machines, Kleene Algebra, regular expressions, etc.) was one of the first applications of algebras in Computer Science. According to Goguen et al. (1977), Burstall and Landin's ``Programs and their proofs: An algebraic approach'' (1969) was the first use of universal algebra and (implicitly) initiality in programming language semantics. F.L. Morris (``Correctness of translations of programming languages'', Ph.D. Thesis, Stanford, 1972) brought in many-sorted algebras, the kind most commonly of interest in programming languages. The ADJ collaboration (J.A. Goguen, J.W. Thatcher, E.G. Wagner, and J.B. Wright) was responsible for the all-out introduction of algebraic and categorical methods in Computer Science -- and the subsequent development of algebraic data types, module systems and algebraic specifications. In particular, Goguen et al. (1977) demonstrated how the initial algebra semantics unifies a great number of semantic formalisms, and elegantly answers the questions of what is syntax and what is semantics. Recently, algebraic effects have put algebras in the spotlight once again. The tagless-final approach is another application of algebras, very closely related to the initial algebra semantics.
The present web page is intended as a reference, collecting the definitions and standard results from universal algebra that are of most interest to computer scientists, and especially tagless-final programmers. The overarching goal of being relevant to programming dictates the selection of the material and the choice of notation. We use only programming examples and a concrete programming language notation (namely, OCaml). The standard mathematical practice relies on a multitude of superscripts and subscripts, and on extensive, sometimes confusing, overloading to avoid writing even more deeply nested subscripts. A programming-language notation is more regular, less ambiguous -- and can be `mechanically checked', by a compiler.
The fact that the OCaml module notation turns out adequate is not at all surprising: Joseph Goguen has been a major influence on the design of the ML module system.
Stanley Burris and H. P. Sankappanavar: A Course in Universal Algebra
Graduate Texts in Mathematics, 1981, v. 78
doi:10.1007/978-1-4613-8130-3
Available online
<http://www.math.uwaterloo.ca/~snburris/htdocs/UALG/univ-algebra2012.pdf>
The standard, detailed text for Math graduate students. Initial algebras are
mentioned only in a remark, as the special case of free algebras with the
empty set of generators.
J.A. Goguen, J.W. Thatcher, E.G. Wagner, and J.B. Wright:
Initial Algebra Semantics and Continuous Algebras
Journal of the ACM, Vol 24, No 1, January 1977, pp 68-95
This is the paper that properly introduced initial algebras in
Computer Science, see particularly Sections 1-4.
Bart Jacobs and Jan Rutten: A Tutorial on (Co)Algebras and
(Co)Induction
EATCS Bulletin, 1997, 62, pp. 222--259
The tutorial, for computer scientists, explains algebras and
initiality in detail and at a good slow pace.
It demonstrates proofs by initiality as an alternative to inductive proofs --
often a simpler and clearer alternative. The tutorial uses
a different presentation of Sigma-algebras: so-called
F-algebras, relying on polynomial functors to represent
signatures.
Garrett Birkhoff: On the structure of abstract algebras
Proceedings of the Cambridge Philosophical Society, 1935, v31,
433-454
<http://math.hawaii.edu/~ralph/Classes/619/birkhoff1935.pdf>
The foundational paper of the field of Universal Algebra
algebra.ml [6K]
The complete OCaml code accompanying the article: the executable
specification.
An algebra, according to Birkhoff's `tentative formal definition', is a set A, now called the carrier set, together with a class of operations on it: each operation is associated with a set of sequences of elements of A, and maps each sequence from its associated set to some element of A.
We shall see what it means in concrete, programming terms in a moment.
We will be working with so-called many-sorted algebras, with not one
but several carrier sets. To tell them apart and conveniently refer to them,
each carrier set is associated with its own `sort'.
One may think of the `sort' as a
symbol that names a carrier set. For many-sorted algebras, it is
customary to first define a signature -- the `type' of
an algebra. Here is the formal definition, to be illustrated below.
``Formally, a (many-sorted algebraic) signature Sigma is a pair <S,F> where S is a set (of sorts) and F is a set (of function symbols) such that F is equipped with the mapping F \mapsto S^n \times S for some n >= 0. The mapping, for a particular f of F, is often denoted as f: s_1 ... s_n -> s where {s,s_1,...,s_n} \subset S'' (Wirsing, 1990). The number n is called the arity of f. The operations of zero arity are called constants.
Wirsing's definition is the straightforward generalization of Birkhoff's original definition to the many-sorted case. (It is also a restriction: Birkhoff's definition admitted operations of infinite arity, and algebras whose carriers and operations are not just infinite sets but proper classes).
In programming terms, a many-sorted algebraic signature is a module signature. For example,
  module type NAT = sig
    type nat
    val zero : nat
    val succ : nat -> nat
    val plus : nat * nat -> nat
  end

Here, the set of sorts S is the singleton set {nat}. The set F has three elements, {zero, succ, plus}, and is equipped with the mapping

  zero \mapsto nat
  succ \mapsto nat \times nat
  plus \mapsto (nat \times nat) \times nat

That mapping is (more perspicuously) represented by the OCaml declarations:

  val zero : nat
  val succ : nat -> nat
  val plus : nat * nat -> nat

Thus `sort' is what in programming may be called `type'. (Joseph Goguen has remarked that `type' is too ambiguous a word and so `sort' has been chosen instead.) We shall continue to use the term `sort', however odd it may sound to an (OCaml) programmer.
Exercise: give several examples of OCaml module signatures that are not algebraic signatures.
With the signature Sigma thus defined, ``a Sigma-algebra consists of an S-sorted family of non-empty (carrier) sets {A_s}_{s \in S} and a total function f^A: A_{s_1},...,A_{s_n} -> A_s for each f: s_1,...,s_n -> s in F.'' (Wirsing, 1990).
Here is an example of a NAT-algebra:

  module NatUL = struct
    type nat = unit list
    let zero = []
    let succ x = () :: x
    let plus (x,y) = x @ y
  end

Since NAT has only one sort, the NatUL algebra has only one carrier set: the set {[], [()], [();()], ...} of unit lists. The constant zero is the empty list, succ maps a list to a list longer by one (), and plus maps a pair of lists to their concatenation. All these operations are total as required: succ and plus can handle any lists supplied as arguments, with no exceptions.
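To see the operations at work, here is a small computation (an illustration of ours, not part of the article; the names two and three are ad hoc):

  (* 2 and 2+1 in the unit-list representation *)
  let two   = NatUL.succ (NatUL.succ NatUL.zero)    (* [(); ()] *)
  let three = NatUL.plus (two, NatUL.succ NatUL.zero)
  (* three evaluates to [(); (); ()] *)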
Here is another NAT-algebra:

  module NatBool : (NAT with type nat = bool) = struct
    type nat = bool
    let zero = false
    let succ x = not x
    let plus (x,y) = not (x = y)  (* exclusive or *)
  end

Its carrier is the set of Booleans. We added the signature ascription to the NatBool module so that the OCaml compiler could check that all operations required by the NAT signature are defined. We have exposed the particular type of the carrier set (that is, the association between the abstract sort nat and the concrete carrier type).
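A similar spot-check shows the parity interpretation at work (again, an ad hoc illustration):

  let one = NatBool.succ NatBool.zero    (* true:  1 is odd  *)
  let two = NatBool.succ one             (* false: 2 is even *)
  let _   = NatBool.plus (one, two)      (* true:  1 + 2 is odd *)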
From now on, we shall use the OCaml notation: OCaml signatures for Sigma signatures, and OCaml modules implementing the signature for the corresponding algebra.
Let S be an algebraic signature (with sorts s1, s2, ..., sk and operations f1, f2, ..., fm). Let M1: S and M2: S be two S-algebras: two OCaml modules of that signature. A homomorphism H12 from M1 to M2 (notated as H12: M1 ~> M2) is a collection of functions h_s1, ..., h_sk, one for each sort, which can be represented by the following OCaml signature:
  module H12 : sig
    val h_s1: M1.s1 -> M2.s1
    val h_s2: M1.s2 -> M2.s2
    ...
    val h_sk: M1.sk -> M2.sk
  end

The functions h_si (i=1..k) are not necessarily injective or `onto' -- but they have to be total: h_si : M1.si -> M2.si should be able to handle any value from the set M1.si received as the argument. The functions h_si must also `preserve the operations': that is, for every operation f : s_1 ... s_n -> s of arity n,

  h_s (M1.f (a_1, ..., a_n)) = M2.f (h_s1 a_1, ..., h_sn a_n)

for every a_1 \in M1.s1, ..., a_n \in M1.sn.
There is always a homomorphism from any algebra to itself: the identity homomorphism, to be written as Id. It is the identity mapping on the carriers (which clearly preserves the operations).
A bit more interesting is the homomorphism from NatUL to NatBool:

  module HULtoBool : sig val h : NatUL.nat -> NatBool.nat end = struct
    let rec h = function
      | []   -> false
      | _::t -> not (h t)
  end

The operation preservation conditions

  h NatUL.zero         = NatBool.zero
  h (NatUL.succ x)     = NatBool.succ (h x)
  h (NatUL.plus (x,y)) = NatBool.plus (h x, h y)

are clearly satisfied, for any unit list values x and y, as one can easily see by unrolling the definitions of h and the operations. The reader may want to check this for themselves.
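Here is a spot-check of the plus condition on sample values (an ad hoc illustration; the general claim still needs the unrolling argument above):

  let x = [(); ()] and y = [()] in
  HULtoBool.h (NatUL.plus (x,y)) = NatBool.plus (HULtoBool.h x, HULtoBool.h y)
  (* evaluates to true: both sides are true, the parity of 2 + 1 *)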
Exercise: show that there is no homomorphism from NatBool to NatUL.
Exercise: Consider yet another NAT-algebra:

  module NatStr : (NAT with type nat = string) = struct
    type nat = string
    let zero = "Z"
    let succ x = "S" ^ x
    let plus (x,y) = "(" ^ x ^ "+" ^ y ^ ")"
  end

Are there any homomorphisms involving NatStr and the earlier introduced algebras? Note that NatStr's carrier is the set of all OCaml strings (which includes not only "SZ" but also "foo"). Is there any way to adjust NatStr so that there is a homomorphism to NatUL?
A homomorphism is just a collection of functions mapping the carriers of one algebra to the corresponding carriers of the other algebra. As functions, they may be composed. We use the symbol \cdot of functional composition also for the composition of homomorphisms.
Theorem: if H1: M1 ~> M2 and H2: M2 ~> M3 are two homomorphisms, then their composition H2 \cdot H1 is also a homomorphism, from M1 to M3.
Proof: clearly, H2 \cdot H1 maps the carriers of M1 to the carriers of M3. As for operation preservation, for any f : s_1,...,s_n -> s,

  (H2.hs \cdot H1.hs) (M1.f (x_1,...,x_n))
    = H2.hs (H1.hs (M1.f (x_1,...,x_n)))
    = H2.hs (M2.f (H1.hs1 x_1, ..., H1.hsn x_n))
    = M3.f (H2.hs1 (H1.hs1 x_1), ..., H2.hsn (H1.hsn x_n))
    = M3.f ((H2.hs1 \cdot H1.hs1) x_1, ..., (H2.hsn \cdot H1.hsn) x_n)

If a homomorphism H: M1 ~> M2 is `onto' -- that is, every element of every carrier set of M2 is an image by H of some element of an M1 carrier -- the algebra M2 is called a homomorphic image of M1. For example, the homomorphism HULtoBool is onto, and NatBool is hence a homomorphic image of NatUL.
If all carrier-mapping functions in a homomorphism H: M1 ~> M2 are one-to-one (that is, injective and onto), the homomorphism is called an isomorphism, and M1 and M2 are said to be isomorphic. The identity homomorphism Id is in fact an isomorphism.
Theorem: A homomorphism H: M1 ~> M2 is an isomorphism if and only if there is also a homomorphism H': M2 ~> M1 such that H \cdot H' = Id and H' \cdot H = Id. (H' is also an isomorphism.) The proof is left as an exercise.
Consider again the NAT signature. Let TFc be the set of OCaml functors of the type

  module type TF = functor(N:NAT) -> sig val e : N.nat end

constructed as follows. The functor

  module TFzero(N:NAT) = struct let e = N.zero end

is in TFc. If a functor T1 is in TFc, so is

  module TS(N:NAT) = struct
    let e = let module M1 = T1(N) in N.succ M1.e
  end

To lighten the notation, we shall write the above as

  module TS(N:NAT) = struct let e = N.succ T1(N).e end

(In the current version of OCaml, it is necessary to code T(N).e as let module M = T(N) in M.e; see the accompanying code for details.) If functors T1 and T2 are in TFc, so is

  module TP(N:NAT) = struct let e = N.plus (T1(N).e, T2(N).e) end

Finally, nothing else is in TFc.
A different way to write an element of TFc is illustrated by the following sample:

  module TFN(N:NAT) = struct
    open N
    (* A term constructed using only the operations succ, zero and plus *)
    let e = plus (succ zero, succ (succ zero))
  end

which should look very familiar to the readers who have encountered the tagless-final style.
We turn TFc into a NAT algebra by adding the operations in the obvious way:

  module TFC : (NAT with type nat = (module TF)) = struct
    type nat = (module TF)
    let zero = (module TFzero : TF)
    let succ (module T1:TF) : nat =
      (module (functor (N:NAT) -> struct let e = N.succ T1(N).e end))
    let plus ((module T1:TF), (module T2:TF)) : nat =
      (module (functor (N:NAT) -> struct let e = N.plus (T1(N).e, T2(N).e) end))
  end

The sample element TFN shown earlier can now be written as

  module TFCsample : TF =
    (val let open TFC in plus (succ zero, succ (succ zero)))

For any NAT algebra N there exists a homomorphism from TFC to that algebra:
  module HTFC(N:NAT) = struct
    let h : TFC.nat -> N.nat = fun (module T:TF) -> T(N).e
  end

In the following demonstration, let N be an arbitrary NAT algebra and let h be HTFC(N).h, that is, fun (module T:TF) -> T(N).e. Clearly h maps every element of the carrier TFc (that is, every functor of type TF) to some element of N.nat. For example, the result of mapping TFN is TFN(N).e, which is the N.nat value computed as follows, using the operations zero, succ and plus of N:

  let open N in
  let e = plus (succ zero, succ (succ zero))

Now that we have seen that h is a total mapping, we verify that it preserves the operations:
  h TFC.zero         = N.zero
  h (TFC.succ x)     = N.succ (h x)
  h (TFC.plus (x,y)) = N.plus (h x, h y)

for every x and y in TFc. Indeed, unrolling the definitions:

  h TFC.zero = h (module TFzero) = TFzero(N).e = N.zero

  h (TFC.succ x)
    = let (module T1:TF) = x in
      h (module (functor (N:NAT) -> struct let e = N.succ T1(N).e end))
    = let (module T1:TF) = x in N.succ T1(N).e
    = N.succ (h x)

(The verification of the plus preservation is left to the reader.) The banality of the proof is an indication that TFC preserves the operations by its very construction.
We now demonstrate that h is the unique homomorphism from TFC to N. Suppose there is another mapping, called g, from TFc to N.nat that preserves the operations. Since both h and g must preserve the zero operation,

  h TFC.zero = N.zero = g TFC.zero

That is, h and g coincide on TFC.zero. If h and g map some t of TFc to the same element of N.nat, then

  h (TFC.succ t) = N.succ (h t) = N.succ (g t) = g (TFC.succ t)

That is, they also have to agree on TFC.succ t. The case of TFC.plus is similar. By structural induction, h and g coincide on all elements of TFc. Thus, h is unique.
A Sigma-algebra with a unique homomorphism to any algebra of the same signature (including itself) is called an initial algebra.
The TFc construction is standard, used already by Birkhoff to show the existence of initial algebras (actually, he talked about free algebras, of which initial algebras are a particular case). The carrier constructed like TFc is called the `Herbrand universe' for the signature Sigma (cf. Herbrand universe and Herbrand basis in Logic Programming). The TFC algebra itself is called the word algebra or ground-term algebra. The construction easily generalizes to the many-sorted case, giving:

Theorem: For any signature Sigma whose Herbrand universe has a term of every sort, there exists an initial Sigma-algebra.
Exercise: NatBool is not an initial NAT-algebra (why?). Show that NatUL is not an initial NAT-algebra either. (Hint: show that there is no homomorphism from NatUL to TFC.)
Exercise (not entirely straightforward): Although NatUL fails to be an initial NAT-algebra, there is still an initial NAT-algebra with the same carrier set, viz. unit list. Construct it.
Exercise: Carry out the initial NAT-algebra construction using functions instead of functors (the functions may take modules as arguments).
Theorem: Two initial algebras of the same signature are isomorphic.
Proof: Let M1 and M2 be two initial algebras. Then there exist (the unique) homomorphisms H12: M1 ~> M2 and H21: M2 ~> M1. Their composition H21 \cdot H12: M1 ~> M1 must be a homomorphism too. There is also the identity homomorphism Id: M1 ~> M1. Since M1 is initial, there is only one homomorphism from it to any algebra, including itself. Therefore, the two homomorphisms from M1 to itself mentioned above must be the same: H21 \cdot H12 = Id. Likewise, we find that H12 \cdot H21 = Id. Hence H12 (and H21) are in fact isomorphisms.
The proof relies on the uniqueness of homomorphisms from
an initial algebra -- the so-called `universality property'. The tutorial by
Jacobs and Rutten shows more proofs by initiality, which, they argue,
are often more perspicuous than the corresponding proofs by induction.
Theorem: An algebra isomorphic to an initial algebra is initial as well.
Proof: a simple application of the homomorphism composition theorem.
The initial algebra semantics regards the initial algebra S as the abstract syntax, and the other Sigma-algebras A as the semantics of the language. ``The semantic function is the uniquely determined homomorphism hA : S ~> A, assigning a meaning hA(s) in A to each syntactic structure s in S'' -- that is, to each term in the language. That initial algebras of the same signature are isomorphic strengthens their claim to be the `abstract syntax': there may be multiple presentations of abstract syntax, but they are `essentially the same'.
In our running example, NAT is the grammar of the tiny language for adding natural numbers; TFc is the set of its terms (TFN or TFCsample being sample terms); the other NAT algebras provide interpretations, or semantics: NatUL is the unary number semantics, NatBool is the parity semantics.
``From this viewpoint it becomes clear that a major aspect of formal semantics (both practical and theoretical) is constructing intended semantic algebras for particular programming languages.'' (Goguen et al., 1977)
Tagless-final takes this to heart, and to the letter. Once the
signature is decided, one can write sample terms like TFN
right
away. Implementations of the signature provide interpreters:
semantics. Applying the TFN
functor to, say, the NatUL
module,
amounts to interpreting the TFN
term, or
computing its NatUL
-semantics.
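For instance (a small illustration of ours, in the spirit of the accompanying code):

  let _ = let module M = TFN(NatUL)   in M.e   (* [(); (); ()] -- the unary 3 *)
  let _ = let module M = TFN(NatBool) in M.e   (* true -- 3 is odd            *)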
In fact, the very module that defines a NAT
algebra (its carrier and
operations), such as NatUL
, is also the module that implements an
interpreter of tagless-final terms like TFN
-- that is, implements
the homomorphism from TFC
. An algebra is hence literally defined by
its homomorphism from the word-algebra. Writing a tagless-final
interpreter is defining an algebra, and vice versa.
Consider the NAT algebra whose carrier is the conventional data type of abstract syntax trees:

  module ADT = struct
    type nat = Z | S of nat | P of nat * nat
    let zero = Z
    let succ x = S x
    let plus (x,y) = P (x,y)
  end

As we saw when discussing the tagless-final approach, the module ADT also defines the homomorphism from TFC. Moreover, its mapping from TFc to the data type ADT.nat is injective and onto. Hence it is an isomorphism, and ADT is also an initial algebra.
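Interpreting the earlier sample term in ADT recovers the familiar abstract syntax tree (again, an ad hoc illustration):

  let _ = let module M = TFN(ADT) in M.e
  (* evaluates to P (S Z, S (S Z)) *)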
Exercise: show the homomorphism from ADT to TFC, which must exist.
Although TFC
and ADT
are isomorphic and hence `essentially the
same', there may still be useful presentation differences between
them. After all, two programs that produce the same outputs for all
the same inputs are also `essentially the same'. Yet there may be
many reasons to prefer one over the other: one program may be faster,
smaller, written in a familiar language, easier to maintain and
extend, etc.
Consider the following CFG for a small subset of English:
  S   -> NP VP         DET -> the
  NP  -> DET N         N   -> cat
  VP  -> TV NP         N   -> mouse
  TV  -> TV CTV        TV  -> chased
  CTV -> CRD TV        TV  -> caught
                       CRD -> and

The non-terminals are in upper case and the terminals are in lower case. The grammar happens to be in the Chomsky Normal Form (CNF): each production has the form NONTERM -> NONTERM NONTERM or NONTERM -> term. (The Chomsky Normal Form will later help us relate CFGs to AB grammars. It also makes the algebraic presentation perspicuous.)
In this form, the grammar productions fall into two classes: the
(proper) `rules' (the left column of the above figure) and the
`lexicon' (the right column). Incidentally, every CFG can be converted to an equivalent (in the sense of recognizing/generating the same language) CNF grammar.
A context-free grammar may be viewed as a set of rules for generating
a language, i.e., a set of `strings' where each string is a sequence
of terminal symbols of the grammar. In fact, each non-terminal
generates its own language. For example, the non-terminal NP
of our
grammar generates only two strings: ``the cat'' and ``the mouse''. (We
write the strings with spaces to delimit the terminals for
readability.) On the other hand, the language of S
is infinite, with
the strings like ``the cat chased the mouse'' or ``the cat chased and
chased and caught the mouse'', etc. Each grammar production can be
interpreted as generating a string of the corresponding language: N -> cat
produces the string ``cat'' of the N
language, NP -> DET N
forms a string of the NP
language from an arbitrary string of the
DET
language and an arbitrary string of the N
language. Looked at this way, we see the sets of objects (strings of the language) and
the operations defined on those sets -- quite like the algebra in Birkhoff's
original definition. However, instead of a single carrier set there
are several sets, one for each `sort', that is, non-terminal of the
grammar. We thus have a many-sorted algebra.
The algebraic signature in our example is as follows, written in the form of the OCaml module signature:
  module type CATMOUSE = sig
    type s     type vp    type np    type tv
    type n     type ctv   type det   type crd

    val prod1 : np * vp -> s          val the    : det
    val prod2 : det * n -> np         val cat    : n
    val prod3 : tv * np -> vp         val mouse  : n
    val prod4 : tv * ctv -> tv        val chased : tv
    val prod5 : crd * tv -> ctv       val caught : tv
                                      val and_   : crd
  end

(Since and is reserved in OCaml, we write and_.) Unlike the NAT signature in the earlier examples, we clearly see many sorts -- eight, as a matter of fact, corresponding to the non-terminals of our grammar. The signature looks strikingly similar to the grammar, modulo switching the sides of the productions. Such close correspondence is the particular advantage of the Chomsky Normal Form.
The only notable change is attaching the names such as prod1
to the
rules of the grammar. Naming of the rules is often useful. However,
the conventional CFG notation has no standard way of
doing that. In the algebraic presentation of a CFG, the naming is
inescapable. Also worth pointing out is that each terminal symbol of
the grammar is treated as an operator symbol of zero arity in the
algebra.
One algebra of the CATMOUSE signature is the string algebra:

  module CMString = struct
    type s = string    type vp  = string   type np  = string   type tv  = string
    type n = string    type ctv = string   type det = string   type crd = string

    let ccat : string * string -> string = fun (x,y) -> x ^ " " ^ y

    let prod1 = ccat                  let the    = "the"
    let prod2 = ccat                  let cat    = "cat"
    let prod3 = ccat                  let mouse  = "mouse"
    let prod4 = ccat                  let chased = "chased"
    let prod5 = ccat                  let caught = "caught"
                                      let and_   = "and"
  end

Each non-terminal denotes a set of strings, i.e., a language. In other words, the carrier of a sort, say, vp, is the language generated by the grammar with the start symbol VP.
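For example (an ad hoc illustration), the string ``the cat'' is built with the CMString operations as

  let _ = CMString.prod2 (CMString.the, CMString.cat)
  (* evaluates to "the cat" *)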
Exercise: Is CMString an initial algebra?
Exercise: One may say that the CMString algebra realizes the view of a context-free grammar as a set of rules for generating languages. However, CMString only shows how to produce one string of a language. For example, prod1 tells how to generate a string of the S language from a string of the NP and VP languages. Using CMString, write a program that produces all strings of the NP, VP, S, etc. languages.
One may wonder what would be the analogue of TFC (see Initial Algebra) for the CATMOUSE signature. It should be a many-sorted algebra -- call it TFCATMOUSE -- with sorts s, np, vp, etc., whose carrier set of the sort, say, np is a set of OCaml functors of the type TFCM_np:

  module type TFCM_np = functor(CM:CATMOUSE) -> sig val e : CM.np end
  module type TFCM_s  = functor(CM:CATMOUSE) -> sig val e : CM.s  end

Likewise, the carrier set of the sort s contains the functors of the type TFCM_s. Here are a couple of typical elements of these carrier sets:
  module TFcm_thecat(CM:CATMOUSE) = struct
    open CM
    let e = prod2 (the,cat)
  end

  module TFcm_chase(CM:CATMOUSE) = struct
    open CM
    let e = prod1 (prod2 (the,cat),
                   prod3 (prod4 (chased,
                                 prod5 (and_, prod4 (chased,
                                                     prod5 (and_, caught)))),
                          prod2 (the,mouse)))
  end

built in the manner of TFN of the earlier section.
Exercise: Taking TFC as an example, write TFCATMOUSE explicitly as an OCaml module.
Exercise: Again, with TFC as an example, show that TFCATMOUSE is an initial algebra. As a pre-condition, show that all carrier sets of TFCATMOUSE are populated.
Let's look closer at the elements of the TFCATMOUSE carriers, such as TFcm_chase. The term TFcm_chase.e is built using only the constants and operations of the algebra. In CFG terms, it shows a particular application of CFG productions. That is, TFcm_chase.e denotes a parse tree (a derivation tree) of the CATMOUSE grammar.

If we enter the TFcm_chase module into OCaml, we see it is accepted by the type checker. This means TFcm_chase.e is well-typed -- that is, it represents a valid application of CFG productions, or, in other words, a valid parse tree.
Thus, we have reached the conclusion stated in Goguen et al. (1977), that an initial algebra of the signature representing a CFG has carriers that are parsed trees for derivations in the grammar from each non-terminal.
Furthermore, if we write the initial algebra or its carrier elements
in the manner of TFcm_chase
, the validity of the derivations (of
parsed trees) can be mechanically checked by the OCaml type checker.
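Conversely, an invalid `derivation' fails to type-check. The following functor (a deliberately broken example of ours) applies prod1 to a det and an n rather than to an np and a vp; the OCaml compiler rejects it with a type error:

  module TFcm_bad(CM:CATMOUSE) = struct
    open CM
    let e = prod1 (the, cat)   (* ill-typed: det * n is not np * vp *)
  end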
Since TFCATMOUSE is an initial algebra, there is a unique homomorphism to any other algebra of the same signature, say, CMString. It maps a TFCATMOUSE carrier element to a CMString carrier element (i.e., a string) as

  let module M = TFcm_chase(CMString) in M.e
    - : string = "the cat chased and chased and caught the mouse"

(The indented line shows the evaluation result.) In other words, the homomorphism from TFCATMOUSE to CMString computes the yield of the corresponding parse tree -- that is, the string which the parse tree is the derivation of.
Exercise: Is the homomorphism from TFCATMOUSE to CMString injective? Is it onto? How does the injectivity of this homomorphism relate to the ambiguity of the grammar?
Goguen et al. (1977) write that, in general, if G is a context-free grammar and TG is the initial algebra for the signature of the grammar, then any other algebra S of the same signature provides a semantics for the context-free language generated by G. ``TG, being initial, gives the unique homomorphism hs : TG ~> S which assigns "meanings" in S to all syntactically well-formed phrases of the language.''
Exercise: As a further example, Goguen et al. (1977) outline how Knuth's attribute grammars may be thought of as algebras giving meaning to CFG derivations (``but Knuth's definitions and notation seem more complex than is necessary''). The carriers of those algebras are sets of tuples: synthesized attributes. (Inherited attributes require a bit more elaboration.) Think of an attribute grammar for CATMOUSE and implement it as a CATMOUSE algebra.
Finally, the classical denotational semantics is but another example of the initial algebra semantics for CFG. Goguen et al. note that Scott's ``Data types as lattices'' begins with the motto ``Extend BNF to semantics''. Scott and Strachey in ``Towards a mathematical semantics for computer languages'' write: ``The semantical definition is syntax directed in that it follows the same order of clauses and transforms each language construct into the intended operations on the meanings of the parts.'' Goguen et al.: ``This essentially says that syntax is context-free and semantics is a homomorphism.''