
Lambda: the ultimate syntax-semantics interface

Spreadsheets and Matlab are popular because they let domain experts write down a problem in familiar terms and quickly play with potential solutions. Natural-language semanticists have a better tool. It displays truth conditions, infers types, simplifies terms, and computes yields. Its modularity facilities make it easy to try fragments out, scale them up, and abstract encoding details out of semantic theories.

This tool is not just a niche experiment among semanticists, but a proven combination of techniques using a mature, general-purpose programming language. Functional programmers created this tool recently, but they are as unaware of its application to semantics as most semanticists are. Our hands-on course teaches this application simultaneously to linguists and programmers, so as to bridge the two communities. We work our way from propositional logic and context-free grammars to dynamic and continuation treatments of quantification and anaphora.

This course on computational Montagovian semantics has been presented together with Chung-chieh Shan at the following schools. Although each presentation was slightly different, the core content was the same.

Five-day course, August 5 through 9, at the 25th European Summer School in Logic, Language and Information, ESSLLI 2013. Duesseldorf, Germany.
< http://esslli2013.de/ >

Five-day course at the North American Summer School on Logic, Language and Information, NASSLLI 2012, at University of Texas, Austin, TX, USA, June 18-22, 2012.
< http://nasslli2012.com/courses/lambda-the-ultimate >

Two-day short course at NASSLLI 2010 in Bloomington, IN in June 2010.

Goals of the course
Overview
Main ideas
Context-Free Grammars (CFG) derivations as a domain-specific language embedded in Haskell
The syntax and interpretations of semantics
Many interpretations of CFG derivations
Context-free grammars with quantifiers
Expressives
Dynamic Logic and donkey anaphora
References and further reading

Goals of the course

The course aims to bring together two distinct communities: (i) programmers (computer scientists, computational linguists) who know little about natural language semantics; (ii) linguists (semanticists, philosophers of language) who know little about any programming language.

We hope that by working together on embedding English fragments in Haskell and implementing their semantics, each group will come to appreciate the other's point of view while learning something useful for their professional work:

Programmers will learn a systematic way of building, interpreting, and transforming domain-specific languages. Moreover, they will acquire the linguistic perspective: they will look at each program or library or API as an embedding of some language, and they will ponder what they are trying to communicate and what their language means.
Linguists will discover that their intuitions about languages and logic help them make sense of computer programs: programming languages do behave fairly intuitively. Linguists and logicians will see the value of Haskell in expressing their ideas and theories (besides the obvious value in analyzing corpora, counting words, etc.), as a complement to the usual pen-and-paper approach in building theories. Linguists and philosophers will learn several concepts of computer science such as language embeddings, typed program transformation, and continuations -- which they may be unknowingly trying to re-invent. The commonality between programming and natural language semantics goes far beyond the untyped lambda-calculus.

Most of all, we hope that linguists and programmers will see the point of the other side and be inclined to collaborate.

We have a grander goal in mind. We hope that linguists will take greater advantage of the ideas of side effects, continuations, regions, staging (a.k.a. quotation) and dependent types. These ideas happen to have been developed more in programming language theory and are only recently being consciously applied to natural language semantics. Linguistic applications of these ideas are certain to prompt further development, benefiting programming language theory as well. We look forward to computer scientists learning from linguists how to build theories of programming language competence. Emotional arguments about ``the best'' programming language should be replaced by a scientific, predictive theory of how programmers perceive and apply a programming language or its features.

Overview

Our plan for the course is to talk about syntax (context-free grammars, or CFG), semantics (Simple Theory of Types, or lambda-calculus) and a calculational way of relating the two. Since the course centers on Haskell as metalanguage, we give a short introduction to it. As in an introduction to a foreign language before a trip abroad, the goal is not to produce fluent speakers -- that takes years, for natural and programming languages alike. Rather, the goal is to acquire an intuitive feel for the language, to become comfortable reading `signs' and guessing the meaning of many unknown parts.

We develop many fragments of natural and formal languages in a series of examples. The progression of the examples reflects how we propose linguistic theories be expressed. We first use Haskell as a calculator to express linguistic derivations for a trivial fragment of English as programs. The very form of the programs is intuitive in that it resembles familiar notation and thus makes our intentions clear. Next, we teach our calculator to check our syntactic categories for us. This move begins our journey to expressing theories at a higher level of abstraction, to expressing our intuitions in terms of programs and types. We demonstrate how our calculator builds form and meaning in tandem.

We apply the same approach -- representing valid derivations as well-typed programs and abstracting over interpretations -- to formal languages: propositional logic and higher-order predicate logic (Ty2).

We then grow our languages. To the context-free English fragment, we add quantifiers and obtain their proper treatment. A slight enhancement to our logic, the language of meanings, lets us transcribe the common analyses of expressives and intensionality. We extend our fragment with pronouns. To explain their meaning, we grow our logic to add information states, which we then abstract over to create a lambda-calculus with a constant `it'. We add the interpretation of a sentence as an imperative program performing an information ``update''. We explain de Groote's dynamic logic analysis of donkey sentences. The step-by-step extensions are so modular that even our lexical entry for `every' -- written without anaphora in mind -- can then be reused to calculate simplified truth conditions for donkey sentences. Barker and Shan's account of donkey anaphora and Moortgat's symmetric categorial grammar can also be expressed.

We use the programming language Haskell not to implement a parser or framework for syntax and semantics, but as a metalanguage in which to directly express analyses or theories of syntax and semantics. Written in Haskell, the analyses look quite like TeX, but are automatically type-checked and can be simplified.

nat-sem.pdf [265K]
A short paper presenting the course

slides.pdf [111K]
Slides for the lectures, including the exercises

language-map.pdf [19K]
The map of languages and interpretations

Main ideas

All throughout the course we will repeatedly encounter the following points:

Calculemus: Let us calculate yields and denotations.
The multitude of fragments and languages: In the course we will be talking about many languages and fragments: fragments of the natural language to analyze, languages in which to express meanings, and the language in which to describe building the analyses and the descriptions. We call the latter language our metalanguage. It will be Haskell rather than English. After all, we want not only to describe analyses but also to execute them mechanically.
Multiple interpretations: A single CFG derivation can be interpreted to yield a text string in English or Japanese, an audio file -- or a semantic denotation, a formula in classical or dynamic logic.
Growing fragments and languages: We stress stepwise refinement and extensibility of fragments, grammars, and interpretations. When we add new features (such as quantification to our fragment or state to our logic), we preserve all existing interpretations, reusing the previous work rather than re-writing it from scratch.
Interactivity: We emphasize ``command line'' testing of sample phrases and derivations, developing our fragments and interpretations by small increments, immediately testing all new additions. We aim to give an impression of GHCi as a syntax-semantics calculator.
The interactivity applies to the course as well. There will be many exercises, both small and advanced, and chances to work in groups. Hopefully all the participants will be interactive.
Montagovian tradition: We hope that our exposition could be thought of as a ``rational reconstruction'' of Montague's approach.
Representing published analyses and theories: In the second part of the course we will be representing, or transcribing, published analyses or theories as they are: Potts' treatment of expressives, de Groote's dynamic semantics and donkey anaphora, Pollard's Agnostic possible worlds semantics, examples from Zimmermann's NASSLLI 2012 course on Intensionality and semantic bootcamp notes.

Context-Free Grammars (CFG) derivations as a domain-specific language embedded in Haskell

For linguists not familiar with Haskell, we introduce the language by appealing to the intuition of a calculator. GHCi can calculate with numbers, boolean formulas and strings. A particular strength of the metalanguage is the ability to name frequently occurring expressions. One can think of such definitions as shortcuts, or bookmarks. As we keep entering similar shortcuts, we may want to abstract over their differences -- to parameterize the definitions.
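
The calculator intuition can be sketched as follows. This is only an illustration: the names sentence1, svo and sentence2 are hypothetical and are not taken from the course file CFG1EN.hs. The sketch first names a few frequently used strings, then abstracts over the differences between similar sentences by parameterizing a definition.

```haskell
-- Hypothetical toy fragment; names are illustrative only.

-- "Bookmarks": names for frequently occurring expressions.
john, mary :: String
john = "John"
mary = "Mary"

-- A first sentence, spelled out by hand.
sentence1 :: String
sentence1 = john ++ " " ++ "likes" ++ " " ++ mary

-- Abstracting over the differences between similar definitions:
-- a sentence parameterized by subject, verb and object.
svo :: String -> String -> String -> String
svo subj verb obj = subj ++ " " ++ verb ++ " " ++ obj

-- The parameterized definition reproduces sentence1 and gives
-- new sentences for free.
sentence2 :: String
sentence2 = svo mary "likes" john
```

Entering sentence2 at the GHCi prompt displays the string "Mary likes John". Nothing yet prevents ill-formed combinations such as svo "likes" john mary, which is what motivates the typed versions.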

CFG1EN.hs [<1K]
Definitions (or, `bookmarks') and CFG-like derivations

CFG1Sem.hs [<1K]
Semantic interpretation of a CFG derivation

CFG2EN.hs [<1K]
CFG2Sem.hs [<1K]
Same as before, but now with type annotations. The file CFG2Sem.hs tries to repair a far too permissive grammar embedding by means of semantics: ``Using semantics to fix up syntax''

CFG2ENDyn.hs [2K]
Preventing bad derivations `at run time'

CFG3EN.hs [3K]
Introducing type constants; accomplishing the goal that our terms represent all and only valid CFG derivations

CFG3Sem.hs [2K]
Type functions: from syntactic categories to semantic types

CFG4.hs [3K]
Unifying syntax with semantics
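
The idea of type constants for syntactic categories can be illustrated with a minimal sketch. We assume a hypothetical wrapper EN with a phantom category parameter; the actual definitions in CFG3EN.hs may differ. The point is that each rule of the grammar becomes a typed function, so that only valid derivations type-check.

```haskell
-- Syntactic categories as type constants. They have no values;
-- they only label derivations.
data S
data NP
data VP
data TV

-- An English phrase tagged with its category (a phantom type).
-- EN is a hypothetical name chosen for this sketch.
newtype EN cat = EN { unEN :: String }

-- Lexical entries.
john, mary :: EN NP
john = EN "John"
mary = EN "Mary"

like :: EN TV
like = EN "likes"

-- Rule VP -> TV NP
r2 :: EN TV -> EN NP -> EN VP
r2 (EN tv) (EN np) = EN (tv ++ " " ++ np)

-- Rule S -> NP VP
r1 :: EN NP -> EN VP -> EN S
r1 (EN np) (EN vp) = EN (np ++ " " ++ vp)

-- A valid derivation of category S.
sentence :: EN S
sentence = r1 john (r2 like mary)

-- An ill-formed derivation such as
--   r1 like john
-- is rejected by the type checker, because like is not an EN NP.
```

The check happens before anything is run, in contrast to the `at run time' checking of the dynamic version.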

The syntax and interpretations of semantics

Our language for denotations is essentially Church's ``Simple Theory of Types,'' also known as simply-typed lambda-calculus. It is a form of higher-order predicate logic, which is often called Ty2.

We have demonstrated how to interpret syntactic (CFG) derivations in several ways. We apply the same approach to semantic forms, interpreting a semantic formula so as to evaluate it in a particular world, to print it out, or to simplify it.

Prop.hs [6K]
Warm-up: Embedding Propositional Logic, the language of very simple denotations

Lambda.hs [3K]
Another warm-up: embedding pure lambda-calculus, illustrating higher-order abstract syntax (HOAS)

Semantics.hs [7K]
The grammar of the language of denotations, and its many interpretations
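
As a warm-up illustration of one formula with several interpretations, here is a minimal sketch of propositional logic embedded in the tagless-final style. The class Prop and the printer Pr are names assumed for this sketch and need not match Prop.hs. Only two interpretations are shown (evaluation and printing); simplification is left out.

```haskell
-- The syntax of the object language, as a type class.
-- Each instance is one interpretation of formulas.
class Prop repr where
  tt, ff     :: repr
  neg        :: repr -> repr
  conj, disj :: repr -> repr -> repr

-- Interpretation 1: the truth value of a formula.
instance Prop Bool where
  tt   = True
  ff   = False
  neg  = not
  conj = (&&)
  disj = (||)

-- Interpretation 2: the formula printed as text.
newtype Pr = Pr { view :: String }

instance Prop Pr where
  tt  = Pr "T"
  ff  = Pr "F"
  neg (Pr x)         = Pr ("~" ++ x)
  conj (Pr x) (Pr y) = Pr ("(" ++ x ++ " & " ++ y ++ ")")
  disj (Pr x) (Pr y) = Pr ("(" ++ x ++ " | " ++ y ++ ")")

-- A sample formula, written once and polymorphic in its interpretation.
ex :: Prop repr => repr
ex = neg (conj tt (disj ff tt))
```

With these definitions, (ex :: Bool) evaluates to False and view ex is the string "~(T & (F | T))". Adding a new interpretation means adding an instance; the formula ex itself is left untouched.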

Many interpretations of CFG derivations

CFG.hs [4K]
Context-free grammars, in the tagless-final style; syntactic (as English phrase) and semantic interpretations

RickPerry.hs [3K]
Extending the fragment with adjectives and copula for a Rick Perry example from the bootcamp

CFGJ.hs [3K]
Interpreting a CFG derivation as a string in Japanese
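
The following sketch shows how a single derivation can receive both a syntactic and a semantic interpretation. It is written under our own assumptions: the class name Grammar, the type function Tr and the two-entity model are hypothetical and are not claimed to match CFG.hs. The type function maps each syntactic category to the type of its denotation.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Syntactic categories as type constants.
data S
data NP
data VP
data TV

-- The grammar: lexical entries and rules, abstract over
-- the interpretation repr.
class Grammar repr where
  john, mary :: repr NP
  like       :: repr TV
  r2         :: repr TV -> repr NP -> repr VP   -- VP -> TV NP
  r1         :: repr NP -> repr VP -> repr S    -- S  -> NP VP

-- Interpretation 1: the English yield of a derivation.
newtype EN cat = EN { unEN :: String }

instance Grammar EN where
  john = EN "John"
  mary = EN "Mary"
  like = EN "likes"
  r2 (EN tv) (EN np) = EN (tv ++ " " ++ np)
  r1 (EN np) (EN vp) = EN (np ++ " " ++ vp)

-- Interpretation 2: denotations in a tiny, made-up model.
data Entity = John | Mary deriving (Eq, Show)

-- Type function from syntactic categories to semantic types.
type family Tr cat
type instance Tr S  = Bool
type instance Tr NP = Entity
type instance Tr VP = Entity -> Bool
type instance Tr TV = Entity -> Entity -> Bool

newtype Sem cat = Sem { unSem :: Tr cat }

instance Grammar Sem where
  john = Sem John
  mary = Sem Mary
  -- Assumed model: John likes Mary, and nothing else holds.
  like = Sem (\obj subj -> (subj, obj) `elem` [(John, Mary)])
  r2 (Sem tv) (Sem np) = Sem (tv np)
  r1 (Sem np) (Sem vp) = Sem (vp np)

-- One derivation, many interpretations.
sentence :: Grammar repr => repr S
sentence = r1 john (r2 like mary)
```

Here unEN sentence is "John likes Mary" and unSem sentence is True in the assumed model. A further interpretation, for example the yield in another language, would be one more instance of the same class.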

Context-free grammars with quantifiers

We introduce the first major extension: quantifiers.

QCFG.hs [4K]
Adding QNP in the tradition of Montague

QCFGJ.hs [2K]
Likewise, extending the Japanese interpretation

QHCFG.hs [3K]
A different way to add quantification, relying on higher-order abstract syntax (HOAS). We thus attempt a `rational reconstruction' of Montague's general approach of `administrative pronouns', which gave rise to Quantifier Raising (QR).
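
To make the Montagovian treatment of quantified noun phrases concrete, here is a minimal sketch in which a QNP denotes a generalized quantifier, a function from properties to truth values. It is a direct encoding over a made-up three-entity model; it does not use the tagless-final style and is not taken from QCFG.hs. All names (GQ, tvq, likes and so on) are assumptions of the sketch.

```haskell
-- A small finite domain so that quantifiers can be computed.
data Entity = John | Mary | Bill deriving (Eq, Show, Enum, Bounded)

domain :: [Entity]
domain = [minBound .. maxBound]

-- A quantified NP denotes a generalized quantifier.
type GQ = (Entity -> Bool) -> Bool

-- Determiners take a restrictor and return a generalized quantifier.
every, some :: (Entity -> Bool) -> GQ
every restr scope = all (\x -> not (restr x) || scope x) domain
some  restr scope = any (\x -> restr x && scope x) domain

-- Assumed model.
person :: Entity -> Bool
person _ = True

-- likes obj subj: the object is the first argument.
likes :: Entity -> Entity -> Bool
likes obj subj =
  (subj, obj) `elem` [(John, Mary), (Mary, Mary), (Bill, Mary), (John, Bill)]

-- A transitive verb applied to a quantified object
-- (the object takes narrow scope).
tvq :: (Entity -> Entity -> Bool) -> GQ -> Entity -> Bool
tvq tv q subj = q (\obj -> tv obj subj)

everyoneLikesMary, everyoneLikesSomeone, someoneLikesEveryone :: Bool
everyoneLikesMary    = every person (likes Mary)
everyoneLikesSomeone = every person (tvq likes (some person))
someoneLikesEveryone = some person (tvq likes (every person))
```

In the assumed model the first two sentences come out True and the third False, since nobody likes John.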

Expressives

The sentence ``I have seen most bloody Monty Python sketches!'' does not commonly mean (at least to a UK listener) that the speaker has seen most of those Monty Python sketches that were drenched in blood. Rather, the speaker has seen most Monty Python sketches, and the speaker has a negative attitude towards Monty Python sketches. Two dimensions of meaning are apparent: the at-issue content (what is being asserted) and the expressive content about the speaker's attitude. The negated sentence ``I have not seen most bloody Monty Python sketches!'' makes the opposite assertion about seeing the sketches but has the same expressive content of the speaker's disapproval. Repeating `bloody' is not redundant but serves to strengthen the irritation of the speaker.

Christopher Potts reviews the features of expressives such as `bloody', epithets such as `the stupid thing', and honorifics. He then proposes a compositional analysis. His analysis tracks the at-issue content and the expressive content as two separate, non-interacting dimensions of meaning. We use our semantics calculator to illustrate Potts' analysis.

Previously, we calculated truth conditions by interpreting a grammar derivation of category NP as a formula lrepr Entity in the language of higher-order logic; the derivation of category S is interpreted as lrepr Bool and the derivation of category VP as lrepr (Entity->Bool). Now we interpret the NP derivation as i (lrepr Entity) and similarly for the others. Here i is an applicative functor (or, Applicative, for short). Like Monads, Applicatives represent computational effects such as mutation, dynamic binding, non-determinism or input-output. With monads, we can choose what computation to perform next based on the result of the previous computation. Applicatives do not give us such a choice: the structure of the computation is fixed before the applicative program is run. For the analysis of expressives, we choose the Writer applicative, whose side-effect is accumulating attitudes.

Our calculation illustrates several principles of Potts' analysis of expressives. First, lexical items like john are mapped to forall i. i (lrepr Entity). The polymorphism over i ensures that such lexical items contribute only to the at-issue meaning. Second, Applicatives guarantee by design that the value produced by an applicative computation cannot contribute to the computation's side-effect. In other words, the content at issue cannot affect the expressive dimension. The contribution to the expressive dimension can only come from special lexical items such as `bloody' or from special combination modes (not present in our analysis).
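
The two dimensions can be sketched with the Writer applicative that the base library provides for pairs whose first component is a monoid. This is a simplified illustration under our own assumptions, not the code of Expressives.hs: denotations are plain Haskell values rather than terms lrepr, the model has two entities, and the names Expressive, theSketch and seen are hypothetical.

```haskell
data Entity = John | Sketch deriving (Eq, Show)

-- Ordinary lexical items are polymorphic in the applicative i,
-- so they cannot contribute any expressive content.
john, theSketch :: Applicative i => i Entity
john      = pure John
theSketch = pure Sketch

-- seen obj subj, in an assumed model where John has seen the sketch.
seen :: Applicative i => i (Entity -> Entity -> Bool)
seen = pure (\obj subj -> (subj, obj) == (John, Sketch))

-- Rules of combination: function application under the applicative.
r2 :: Applicative i => i (Entity -> Entity -> Bool) -> i Entity -> i (Entity -> Bool)
r2 tv np = tv <*> np

r1 :: Applicative i => i Entity -> i (Entity -> Bool) -> i Bool
r1 np vp = (\x f -> f x) <$> np <*> vp

neg :: Applicative i => i Bool -> i Bool
neg = fmap not

-- The Writer applicative: a log of attitudes paired with the
-- at-issue value.
type Expressive = (,) [String]

-- An expressive leaves the at-issue value alone and logs an attitude.
bloody :: Expressive a -> Expressive a
bloody x = (["speaker is annoyed"], id) <*> x

sentence :: Expressive Bool
sentence = r1 john (r2 seen (bloody theSketch))
```

Here sentence is (["speaker is annoyed"], True), whereas neg sentence is (["speaker is annoyed"], False): negation flips the at-issue content and leaves the expressive content intact. Applying bloody twice logs the attitude twice.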

Version

The current version is August 2013.

References

Expressives.hs [7K]
Potts' analysis of expressives in the semantics calculator

Christopher Potts: The Logic of Conventional Implicatures
PhD thesis, UC Santa Cruz, 2003.
< http://www.stanford.edu/~cgpotts/dissertation/potts-dissertation-1up.pdf >

Christopher Potts: The expressive dimension
Theoretical Linguistics 33, (2):165-197, 2007.
< http://www.stanford.edu/~cgpotts/papers/potts-expressives06.pdf >

Conor McBride and Ross Paterson: Applicative Programming with Effects
Journal of Functional Programming 18:1 (2008), pages 1-13.
< http://www.soi.city.ac.uk/~ross/papers/Applicative.html >

Dynamic Logic and donkey anaphora

< http://www.inria.fr/rocquencourt/rendez-vous/modele-et-algo/dynamic-logic-a-type-theoretic-view >
Philippe de Groote. Dynamic logic: a type-theoretic view
Talk slides at `Le modele et l'algorithme', Rocquencourt, 2010.

Dynamics.hs [5K]
Implementing de Groote's approach: extending our fragment with pronouns, and the language of denotations with state
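
The flavor of a continuation-based dynamic logic can be conveyed by a small sketch in which a sentence takes its left context and a continuation standing for the rest of the discourse. This is only in the spirit of de Groote's analysis: it is not his formulation and not the code of Dynamics.hs. The pronoun resolution (pick the most recent referent) is deliberately naive, and the sketch covers only cross-sentential anaphora, not donkey sentences. All names are assumptions.

```haskell
data Entity = John | Mary | Bill deriving (Eq, Show, Enum, Bounded)

domain :: [Entity]
domain = [minBound .. maxBound]

-- The context lists the discourse referents introduced so far.
type Context = [Entity]

-- A dynamic proposition: left context, then the continuation
-- (the rest of the discourse), then a truth value.
type DProp = Context -> (Context -> Bool) -> Bool
type DPred = Entity -> DProp
type DNP   = DPred -> DProp

-- Lift a static predicate; it leaves the context unchanged.
dyn :: (Entity -> Bool) -> DPred
dyn p x ctx k = p x && k ctx

-- Indefinite: an existential whose witness is added to the context
-- and so remains visible to the continuation.
a :: DPred -> DNP
a noun scope ctx k =
  any (\x -> noun x ctx (\ctx' -> scope x (x : ctx') k)) domain

-- Pronoun: naively select the most recent referent.
-- Assumption: an unresolvable pronoun makes the discourse false.
he :: DNP
he scope ctx k = case ctx of
  (x : _) -> scope x ctx k
  []      -> False

-- Sequencing two sentences: the second is the continuation of the first.
andThen :: DProp -> DProp -> DProp
andThen s1 s2 ctx k = s1 ctx (\ctx' -> s2 ctx' k)

-- Truth of a discourse: empty context, trivial continuation.
truth :: DProp -> Bool
truth s = s [] (const True)

-- Assumed model.
man, walks, talks, whistles :: Entity -> Bool
man x      = x `elem` [John, Bill]
walks x    = x == John
talks x    = x == John
whistles x = x == Bill

-- "A man walks. He talks."  and  "A man walks. He whistles."
discourse1, discourse2 :: DProp
discourse1 = a (dyn man) (dyn walks) `andThen` he (dyn talks)
discourse2 = a (dyn man) (dyn walks) `andThen` he (dyn whistles)
```

In the assumed model truth discourse1 is True and truth discourse2 is False. The pronoun in the second sentence is resolved to the witness of the indefinite in the first, although it lies outside the indefinite's syntactic scope.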

CCG.hs [6K]
A sketch of Combinatory Categorial Grammar (CCG)

Tower.hs [8K]
Chung-chieh Shan's implementation of the continuation semantics of
  Chung-chieh Shan and Chris Barker. 2006. Explaining crossover and superiority as left-to-right evaluation.
  Linguistics and Philosophy 29(1):91-134.
and the tower notation of
  Chris Barker and Chung-chieh Shan. 2008. Donkey anaphora is in-scope binding. Semantics and Pragmatics 1(1):1-46.

References and further reading

Closely related to the present course in spirit but with a different subject matter or metalanguage:

Kees Doets and Jan van Eijck: The Haskell Road to Logic, Maths and Programming
College Publications, 2004. 444 pp.
< http://homepages.cwi.nl/~jve/HR/ >
Chris Barker and Jim Pryor: What Philosophers and Linguists Can Learn From Theoretical Computer Science But Didn't Know To
Seminar in Semantics / Philosophy of Language, taught in the Fall 2010 at NYU.
< http://lambda.jimpryor.net/ >
The course notes include many references for background reading.
Fernando C. N. Pereira and Stuart M. Shieber: Prolog and Natural-Language Analysis
Center for the Study of Language and Information, Stanford, CA, 1987
< http://www.mtome.com/Publications/PNLA/pnla.html >
Embedding dynamic epistemic logic in Haskell

Closely related to the present course in subject matter (semantics):

Lucas Champollion, Joshua Tauberer and Maribel Romero. The Penn Lambda Calculator: Pedagogical software for natural language semantics
Proceedings of the Grammar Engineering Across Frameworks Workshop, 2007
< http://www.ling.upenn.edu/lambda/ >
Jan van Eijck and Christina Unger: Computational Semantics with Functional Programming
Cambridge University Press, 2010. 418 pp.
< http://www.computational-semantics.eu >
Aarne Ranta and collaborators: Grammatical Framework
< http://www.grammaticalframework.org/ >
Grammatical Framework is implemented in Haskell but not embedded in Haskell

The technique of extensible language embeddings is described in the following publications:

Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages
Typed tagless-final interpretations: Lecture notes
Philippe de Groote, Sylvain Pogodalla and Makoto Kanazawa: Abstract Categorial Grammars
< http://calligramme.loria.fr/acg/ >

Version

The current version is July 2012.

Last updated September 1, 2013

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
