
Lambda: the ultimate syntax-semantics interface

 

Spreadsheets and Matlab are popular because they let domain experts write down a problem in familiar terms and quickly play with potential solutions. Natural-language semanticists have a better tool. It displays truth conditions, infers types, simplifies terms, and computes yields. Its modularity facilities make it easy to try fragments out, scale them up, and abstract encoding details out of semantic theories.

This tool is not just a niche experiment among semanticists, but a proven combination of techniques using a mature, general-purpose programming language. It was recently created by functional programmers, but they are as unaware of its application to semantics as most semanticists are. Our hands-on course teaches this application simultaneously to linguists and programmers, so as to bridge the two communities. We work our way from propositional logic and context-free grammars to dynamic and continuation treatments of quantification and anaphora.

This course on computational Montagovian semantics has been presented together with Chung-chieh Shan at the following schools. Although each presentation was slightly different, the core content was the same.


 

Goals of the course

The course aims to bring together two distinct communities: (i) programmers (computer scientists, computational linguists) who know little about natural language semantics; (ii) linguists (semanticists, philosophers of language) who know little about any programming language.

We hope that by working together on embedding English fragments in Haskell and implementing their semantics, each group will come to appreciate the other's point of view while learning something useful in their professional work:

Most of all, we hope that linguists and programmers will see the point of the other side and be inclined to collaborate.

We have a grander goal in mind. We hope that linguists will take more advantage of the ideas of side effects, continuations, regions, staging (a.k.a. quotation) and dependent types. These ideas happen to have been developed more in programming language theory and are only recently being consciously applied to natural language semantics. Linguistic applications of these ideas are certain to prompt further development, benefiting programming language theory as well. We look forward to computer scientists learning from linguists how to build theories of programming language competence. Emotional arguments about ``the best'' programming language should be replaced by a scientific, predictive theory of how programmers perceive and apply a programming language or its features.

 

Overview

Our plan for the course is to talk about syntax (context-free grammars, or CFG), semantics (Simple Theory of Types, or lambda-calculus) and a calculational way of relating the two. Since the course centers on Haskell as metalanguage, we give a short introduction to it. As in an introduction to a foreign language before a trip abroad, the goal is not to produce fluent speakers -- that takes years, for natural and programming languages alike. Rather, the goal is to acquire an intuitive feel for the language, to become comfortable reading `signs' and guessing the meaning of many unknown parts.

We develop many fragments of natural and formal languages in a series of examples. The progression of the examples reflects how we propose linguistic theories be expressed. We first use Haskell as a calculator to express linguistic derivations for a trivial fragment of English as programs. The very form of the programs is intuitive in that it resembles familiar notation and thus makes our intentions clear. Next, we teach our calculator to check our syntactic categories for us. This move begins our journey to expressing theories at a higher level of abstraction, to expressing our intuitions in terms of programs and types. We demonstrate how our calculator builds form and meaning in tandem.

We apply the same approach -- representing valid derivations as well-typed programs and abstracting over interpretations -- to formal languages: propositional logic and higher-order predicate logic (Ty2).

We then grow our languages. To the context-free English fragment, we add quantifiers and obtain their proper treatment. A slight enhancement to our logic, the language of meanings, lets us transcribe the common analyses of expressives and intensionality. We extend our fragment with pronouns. To explain their meaning, we grow our logic to add information states, which we then abstract over to create a lambda-calculus with a constant `it'. We add the interpretation of a sentence as an imperative program performing an information ``update''. We explain de Groote's dynamic logic analysis of donkey sentences. The step-by-step extensions are so modular that even our lexical entry for `every' -- written without anaphora in mind -- can then be reused to calculate simplified truth conditions for donkey sentences. Barker and Shan's account of donkey anaphora and Moortgat's symmetric categorial grammar can also be expressed.

We use the programming language Haskell not to implement a parser or framework for syntax and semantics, but as a metalanguage in which to directly express analyses or theories of syntax and semantics. Written in Haskell, the analyses look quite like TeX, but are automatically type-checked and can be simplified.

nat-sem.pdf [265K]
A short paper presenting the course

slides.pdf [111K]
Slides for the lectures, including the exercises

language-map.pdf [19K]
The map of languages and interpretations

 

Main ideas

All throughout the course we will repeatedly encounter the following points:
Calculemus
Let us calculate yields and denotations.
The multitude of fragments and languages
In the course we will be talking about many languages and fragments: fragments of the natural language to analyze, languages in which to express meanings, and the language in which to describe building the analyses and the descriptions. We call the latter language our metalanguage. It will be Haskell rather than English. After all, we want to not only describe analyses but mechanically execute them.
Multiple interpretations
A single CFG derivation can be interpreted to yield a text string in English or Japanese, an audio file -- or a semantic denotation, a formula in classical or dynamic logic.
Growing fragments and languages
We stress stepwise refinement and extensibility of fragments, grammars, and interpretations. When we add new features (such as quantification to our fragment or state to our logic), we preserve all existing interpretations, reusing the previous work rather than re-writing it from scratch.
Interactivity
We emphasize ``command line'' testing of sample phrases and derivations, developing our fragments and interpretations by small increments, immediately testing all new additions. We aim to give an impression of GHCi as a syntax-semantics calculator.
The interactivity applies to the course as well. There will be many exercises, both small and advanced, and chances to work in groups. Hopefully all the participants will be interactive.
Montagovian tradition
We hope that our exposition could be thought of as a ``rational reconstruction'' of Montague's approach.
Representing published analyses and theories
In the second part of the course we will be representing, or transcribing, published analyses or theories as they are: Potts' treatment of expressives, de Groote's dynamic semantics and donkey anaphora, Pollard's Agnostic possible worlds semantics, examples from Zimmermann's NASSLLI 2012 course on Intensionality and semantic bootcamp notes.


 

Context-Free Grammar (CFG) derivations as a domain-specific language embedded in Haskell

For linguists not familiar with Haskell, we introduce the language by appealing to the intuition of a calculator. GHCi can calculate with numbers, boolean formulas and strings. A particular strength of the metalanguage is the ability to name frequently occurring expressions. One can think of such definitions as shortcuts, or bookmarks. As we keep entering similar shortcuts, we may want to abstract over their differences -- to parameterize the definitions.
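As a rough illustration of the calculator view, the sketch below names a few strings and then parameterizes the way they are combined. All names (john, like, r1, r2) are chosen here for illustration and are not taken from CFG1EN.hs; the tiny grammar S -> NP VP, VP -> TV NP is likewise an assumption.

```haskell
-- Definitions as `bookmarks' for frequently used expressions.
-- (Hypothetical names; not the contents of CFG1EN.hs.)
john, mary :: String
john = "John"
mary = "Mary"

like :: String
like = "likes"

-- Abstracting over similar shortcuts: a rule becomes a function
-- that combines the yields of its constituents.
r2 :: String -> String -> String        -- VP -> TV NP
r2 tv np = tv ++ " " ++ np

r1 :: String -> String -> String        -- S -> NP VP
r1 np vp = np ++ " " ++ vp

-- A derivation written as an ordinary expression.
sentence1 :: String
sentence1 = r1 john (r2 like mary)

-- With plain strings nothing stops an ill-formed derivation.
bad :: String
bad = r2 (r2 like mary) john
```

Loading this in GHCi and typing sentence1 at the prompt displays the yield of the derivation. The definition bad is accepted just as readily, which is one way to see why one might want the calculator to check syntactic categories.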

CFG1EN.hs [<1K]
Definitions (or, `bookmarks') and CFG-like derivations

CFG1Sem.hs [<1K]
Semantic interpretation of a CFG derivation

CFG2EN.hs [<1K]
CFG2Sem.hs [<1K]
Same as before, but now with type annotations. The file CFG2Sem.hs tries to repair the overly permissive grammar embedding with semantics: ``Using semantics to fix up syntax''

CFG2ENDyn.hs [2K]
Preventing bad derivations `at run time'

CFG3EN.hs [3K]
Introducing type constants; accomplishing the goal that our terms represent all and only valid CFG derivations

CFG3Sem.hs [2K]
Type functions: from syntactic categories to semantic types

CFG4.hs [3K]
Unifying syntax with semantics
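The two ideas named in this list, type constants for syntactic categories and a type function from categories to semantic types, can be sketched as follows. This is a minimal sketch under our own assumptions: the names EN, Sem, Tr, the two-entity model and the primed identifiers are invented here and may differ from what CFG3EN.hs and CFG3Sem.hs actually contain.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Syntactic categories as type constants: no values, only labels.
data S
data NP
data VP
data TV

-- A derivation of category c, represented by its English yield.
newtype EN c = EN { yield :: String }

john, mary :: EN NP
john = EN "John"
mary = EN "Mary"

like :: EN TV
like = EN "likes"

r2 :: EN TV -> EN NP -> EN VP           -- VP -> TV NP
r2 (EN tv) (EN np) = EN (tv ++ " " ++ np)

r1 :: EN NP -> EN VP -> EN S            -- S -> NP VP
r1 (EN np) (EN vp) = EN (np ++ " " ++ vp)

sentence1 :: EN S
sentence1 = r1 john (r2 like mary)

-- Rejected by the type checker, since r2 expects a TV and an NP:
-- bad = r2 (r2 like mary) john

-- A type function from syntactic categories to semantic types.
data Entity = John | Mary deriving (Eq, Show)

type family Tr c
type instance Tr S  = Bool
type instance Tr NP = Entity
type instance Tr VP = Entity -> Bool
type instance Tr TV = Entity -> Entity -> Bool

-- A derivation of category c, represented by its denotation.
newtype Sem c = Sem { denot :: Tr c }

john', mary' :: Sem NP
john' = Sem John
mary' = Sem Mary

-- A made-up model in which only John likes Mary.
like' :: Sem TV
like' = Sem (\object subject -> (subject, object) == (John, Mary))

r2' :: Sem TV -> Sem NP -> Sem VP
r2' (Sem tv) (Sem np) = Sem (tv np)

r1' :: Sem NP -> Sem VP -> Sem S
r1' (Sem np) (Sem vp) = Sem (vp np)

sentence1' :: Sem S
sentence1' = r1' john' (r2' like' mary')
```

In this sketch the syntactic and semantic versions are still written twice, once unprimed and once primed. Removing that duplication is the kind of unification the last file in the list refers to.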

 

The syntax and interpretations of semantics

Our language for denotations is essentially Church's ``Simple Theory of Types,'' also known as simply-typed lambda-calculus. It is a form of higher-order predicate logic, which is often called Ty2.

We have demonstrated how to interpret syntactic (CFG) derivations in several ways. We apply the same approach to semantic forms, interpreting a semantic formula so as to evaluate it in a particular world, to print it out, or to simplify it.

Prop.hs [6K]
Warm-up: Embedding Propositional Logic, the language of very simple denotations

Lambda.hs [3K]
Another warm-up: embedding pure lambda-calculus, illustrating higher-order abstract syntax (HOAS)

Semantics.hs [7K]
The grammar of the language of denotations, and its many interpretations
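To make the idea of one formula with several interpretations concrete, here is a small sketch of a propositional fragment with lambda-abstraction in higher-order abstract syntax, together with an evaluator and a printer. The class name Lambda, the interpreter types R and P, and the printed notation are assumptions made for this illustration; they are not claimed to match Prop.hs, Lambda.hs or Semantics.hs.

```haskell
-- The language of formulas, abstract over its interpretation lrepr.
class Lambda lrepr where
  true, false :: lrepr Bool
  neg  :: lrepr Bool -> lrepr Bool
  conj :: lrepr Bool -> lrepr Bool -> lrepr Bool
  app  :: lrepr (a -> b) -> lrepr a -> lrepr b
  -- HOAS: object-level binding is represented by a Haskell function.
  lam  :: (lrepr a -> lrepr b) -> lrepr (a -> b)

-- Interpretation 1: evaluate a formula to a Haskell value.
newtype R a = R { unR :: a }

instance Lambda R where
  true  = R True
  false = R False
  neg (R x)        = R (not x)
  conj (R x) (R y) = R (x && y)
  app (R f) (R x)  = R (f x)
  lam f            = R (\x -> unR (f (R x)))

-- Interpretation 2: print a formula. The Int argument counts the
-- binders seen so far and is used to pick fresh variable names.
newtype P a = P { unP :: Int -> String }

instance Lambda P where
  true  = P (const "T")
  false = P (const "F")
  neg (P x)        = P (\n -> "~" ++ x n)
  conj (P x) (P y) = P (\n -> "(" ++ x n ++ " & " ++ y n ++ ")")
  app (P f) (P x)  = P (\n -> "(" ++ f n ++ " " ++ x n ++ ")")
  lam f = P (\n ->
    let v    = "x" ++ show n
        body = unP (f (P (const v))) (n + 1)
    in  "(\\" ++ v ++ ". " ++ body ++ ")")

view :: P a -> String
view t = unP t 0

-- One term, written once, usable with either interpretation.
contradiction :: Lambda lrepr => lrepr (Bool -> Bool)
contradiction = lam (\x -> conj x (neg x))
```

In GHCi, unR (app contradiction true) evaluates the term and view contradiction prints it. A simplifier would be a third instance of the same class; it is omitted here to keep the sketch short.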

 

Many interpretations of CFG derivations

CFG.hs [4K]
Context-free grammars, in the tagless-final style; syntactic (as English phrase) and semantic interpretations

RickPerry.hs [3K]
Extending the fragment with adjectives and copula for a Rick Perry example from the bootcamp

CFGJ.hs [3K]
Interpreting a CFG derivation as a string in Japanese
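The following sketch shows how a single derivation, written once against a type class, can be read off as an English string, as a string in a different word order, or as a truth value. It is a simplified stand-in rather than the code of CFG.hs or CFGJ.hs: the class name Symantics, the verb-final SOV interpretation (used here in place of Japanese) and the toy model are all assumptions.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Syntactic categories as type constants.
data S
data NP
data VP
data TV

-- The grammar: lexical items and rules, abstract over the
-- interpretation repr.
class Symantics repr where
  john, mary :: repr NP
  like       :: repr TV
  r2         :: repr TV -> repr NP -> repr VP   -- VP -> TV NP
  r1         :: repr NP -> repr VP -> repr S    -- S -> NP VP

-- A derivation, written once.
sentence1 :: Symantics repr => repr S
sentence1 = r1 john (r2 like mary)

-- Interpretation 1: the English yield.
newtype EN c = EN { unEN :: String }

instance Symantics EN where
  john = EN "John"
  mary = EN "Mary"
  like = EN "likes"
  r2 (EN tv) (EN np) = EN (tv ++ " " ++ np)
  r1 (EN np) (EN vp) = EN (np ++ " " ++ vp)

-- Interpretation 2: a schematic verb-final yield (same words,
-- different linearization of the VP rule).
newtype SOV c = SOV { unSOV :: String }

instance Symantics SOV where
  john = SOV "John"
  mary = SOV "Mary"
  like = SOV "likes"
  r2 (SOV tv) (SOV np) = SOV (np ++ " " ++ tv)
  r1 (SOV np) (SOV vp) = SOV (np ++ " " ++ vp)

-- Interpretation 3: denotations in a made-up model.
data Entity = John | Mary deriving (Eq, Show)

type family Tr c
type instance Tr S  = Bool
type instance Tr NP = Entity
type instance Tr VP = Entity -> Bool
type instance Tr TV = Entity -> Entity -> Bool

newtype Sem c = Sem { unSem :: Tr c }

instance Symantics Sem where
  john = Sem John
  mary = Sem Mary
  like = Sem (\object subject -> (subject, object) == (John, Mary))
  r2 (Sem tv) (Sem np) = Sem (tv np)
  r1 (Sem np) (Sem vp) = Sem (vp np)
```

The interpretation is selected by the observation function applied to the derivation: unEN sentence1, unSOV sentence1 or unSem sentence1. Adding an interpretation means adding an instance; the derivation itself is left untouched.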

 

Context-free grammars with quantifiers

We introduce the first major extension: quantifiers.

QCFG.hs [4K]
Adding QNP in the tradition of Montague

QCFGJ.hs [2K]
Likewise, extending the Japanese interpretation

QHCFG.hs [3K]
A different way to add quantification, relying on higher-order abstract syntax (HOAS). We thus attempt a `rational reconstruction' of Montague's general approach of `administrative pronouns', which gave rise to Quantifier Raising (QR).
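For readers who want to see the Montague-style treatment of quantified noun phrases in the smallest possible setting, here is a sketch that uses plain Haskell functions over a finite domain. It deliberately leaves out the typed embedding used elsewhere in the course; the three-entity model, the rule names r4 and r5, and the argument order of like are assumptions of this sketch.

```haskell
-- A finite, made-up domain of entities.
data Entity = John | Mary | Bill
  deriving (Eq, Show, Enum, Bounded)

domain :: [Entity]
domain = [minBound .. maxBound]

type ET  = Entity -> Bool      -- properties
type QNP = ET -> Bool          -- generalized quantifiers

-- Determiners take a restrictor and return a quantified NP.
every, some :: ET -> QNP
every restr scope = all (\x -> not (restr x) || scope x) domain
some  restr scope = any (\x -> restr x && scope x) domain

-- A toy model.
man, woman :: ET
man x   = x `elem` [John, Bill]
woman x = x == Mary

-- like object subject
like :: Entity -> Entity -> Bool
like object subject =
  (subject, object) `elem` [(John, Mary), (Bill, Mary), (Mary, John)]

-- S -> QNP VP: the subject quantifier takes the VP as its scope.
r4 :: QNP -> ET -> Bool
r4 qnp vp = qnp vp

-- VP -> TV QNP: the object quantifier scopes inside the VP.
r5 :: (Entity -> Entity -> Bool) -> QNP -> ET
r5 tv qnp = \subject -> qnp (\object -> tv object subject)

everyManLikesSomeWoman, someWomanLikesEveryMan :: Bool
everyManLikesSomeWoman = r4 (every man) (r5 like (some woman))
someWomanLikesEveryMan = r4 (some woman) (r5 like (every man))
```

With these rules the object quantifier always takes narrow scope. Obtaining the other scope reading needs additional machinery, which is where the approaches in the files above differ.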

 

Expressives

The sentence ``I have seen most bloody Monty Python sketches!'' does not commonly mean (at least to a UK listener) that the speaker has seen most of those Monty Python sketches that were drenched in blood. Rather, the speaker has seen most Monty Python sketches, and the speaker has a negative attitude towards Monty Python sketches. Two dimensions of meaning are apparent: the at-issue content (what is being asserted) and the expressive content about the speaker's attitude. The negated sentence ``I have not seen most bloody Monty Python sketches!'' makes the opposite assertion about seeing the sketches but has the same expressive content of the speaker's disapproval. Repeating `bloody' is not redundant but serves to strengthen the irritation of the speaker.

Christopher Potts reviews the features of expressives such as `bloody', epithets such as `the stupid thing', and honorifics. He then proposes a compositional analysis. His analysis tracks the at-issue content and the expressive content as two separate, non-interacting dimensions of meaning. We use our semantics calculator to illustrate Potts' analysis.

Previously, we calculated truth conditions by interpreting a grammar derivation of category NP as a formula lrepr Entity in the language of higher-order logic; the derivation of category S is interpreted as lrepr Bool and the derivation of category VP as lrepr (Entity->Bool). Now we interpret the NP derivation as i (lrepr Entity) and similarly for the others. Here i is an applicative functor (or, Applicative, for short). Like Monads, Applicatives represent computational effects such as mutation, dynamic binding, non-determinism or input-output. With monads, we can choose what computation to perform next based on the result of the previous computation. Applicatives do not give us such a choice: the structure of the computation is fixed before the applicative program is run. For the analysis of expressives, we choose the Writer applicative, whose side-effect is accumulating attitudes.

Our calculation illustrates several principles of Potts' analysis of expressives. First, lexical items like john are mapped to forall i. i (lrepr Entity). The polymorphism over i ensures that such lexical items contribute only to the at-issue meaning. Second, Applicatives guarantee by design that the value produced by an applicative computation cannot contribute to the Applicative's side-effect. In other words, the content at issue cannot affect the expressive dimension. The contribution to the expressive dimension can only come from special lexical items such as `bloody' or from special combination modes (not present in our analysis).
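The two-dimensional calculation can be approximated in a few lines using the pair applicative from the standard library as the Writer. This is a simplified sketch, not the code of Expressives.hs: denotations are plain Haskell values rather than lrepr terms, and the domain, the attitude label "irritation", the definition of most and the precomposed VP seenBySpeaker are all made up for the illustration.

```haskell
-- A made-up domain: three sketches and one other entity.
data Entity = Sketch1 | Sketch2 | Sketch3 | Parrot
  deriving (Eq, Show, Enum, Bounded)

domain :: [Entity]
domain = [minBound .. maxBound]

type ET = Entity -> Bool

-- The Writer applicative as a pair: (accumulated attitudes, value).
-- The Applicative instance for pairs with a Monoid first component
-- comes from base.
type Attitude = String
type W = (,) [Attitude]

-- Ordinary lexical items are polymorphic in the applicative i,
-- so they cannot touch the expressive dimension.
sketches :: Applicative i => i ET
sketches = pure (`elem` [Sketch1, Sketch2, Sketch3])

seenBySpeaker :: Applicative i => i ET
seenBySpeaker = pure (`elem` [Sketch1, Sketch2])

most :: Applicative i => i (ET -> ET -> Bool)
most = pure (\restr scope ->
  2 * length [x | x <- domain, restr x, scope x]
    > length [x | x <- domain, restr x])

neg :: Applicative i => i (Bool -> Bool)
neg = pure not

-- An expressive: identity on the at-issue content, plus an attitude.
bloody :: W (ET -> ET)
bloody = (["irritation"], id)

sentence, negated, doubled :: W Bool
sentence = most <*> (bloody <*> sketches) <*> seenBySpeaker
negated  = neg <*> sentence
doubled  = most <*> (bloody <*> (bloody <*> sketches)) <*> seenBySpeaker
```

Evaluating sentence and negated in GHCi shows opposite truth values paired with the same list of attitudes, while doubled records the attitude twice. Since bloody is the only item with a concrete W type, it is the only possible source of expressive content in this sketch.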

Version
The current version is August 2013.
References
Expressives.hs [7K]
Potts' analysis of expressives in the semantics calculator

Christopher Potts: The Logic of Conventional Implicatures
PhD thesis, UC Santa Cruz, 2003.
< http://www.stanford.edu/~cgpotts/dissertation/potts-dissertation-1up.pdf >

Christopher Potts: The expressive dimension
Theoretical Linguistics 33, (2):165-197, 2007.
< http://www.stanford.edu/~cgpotts/papers/potts-expressives06.pdf >

Conor McBride and Ross Paterson: Applicative Programming with Effects
Journal of Functional Programming 18:1 (2008), pages 1-13.
< http://www.soi.city.ac.uk/~ross/papers/Applicative.html >

 

Dynamic Logic and donkey anaphora

< http://www.inria.fr/rocquencourt/rendez-vous/modele-et-algo/dynamic-logic-a-type-theoretic-view >
Philippe de Groote. Dynamic logic: a type-theoretic view
Talk slides at `Le modele et l'algorithme', Rocquencourt, 2010.

Dynamics.hs [5K]
Implementing de Groote's approach: extending our fragment with pronouns, and the language of denotations with state

CCG.hs [6K]
A sketch of Combinatory Categorial Grammar (CCG)

Tower.hs [8K]
Chung-chieh Shan's implementation of the continuation semantics of
  Chung-chieh Shan and Chris Barker. 2006. Explaining crossover and superiority as left-to-right evaluation.
  Linguistics and Philosophy 29(1):91-134.
and the tower notation of
  Chris Barker and Chung-chieh Shan. 2008. Donkey anaphora is in-scope binding. Semantics and Pragmatics 1(1):1-46.
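As a small illustration of the dynamic approach listed above, the sketch below treats a sentence as a function of the current context and a continuation standing for the rest of the discourse, in the spirit of de Groote's type-theoretic formulation. It is our own simplification and not the code of Dynamics.hs: the context is a bare list of entities, the pronoun picks the most recently introduced referent regardless of gender, and a pronoun without an antecedent simply makes the discourse false.

```haskell
data Entity = John | Mary deriving (Eq, Show)

-- The context: discourse referents, most recent first.
type Context = [Entity]

-- A dynamic proposition receives the current context and a
-- continuation that interprets the rest of the discourse.
type Prop = Context -> (Context -> Bool) -> Bool

-- A noun phrase takes a dynamic property as its scope.
type NP = (Entity -> Prop) -> Prop

-- A proper name passes its referent to the verb phrase and adds
-- it to the context seen by the rest of the discourse.
name :: Entity -> NP
name x vp e k = vp x e (\e' -> k (x : e'))

-- A pronoun selects a referent from the current context.
-- (Simplification: the most recent one; no antecedent means False.)
pro :: NP
pro vp e k = case e of
  (x : _) -> vp x e k
  []      -> False

-- Verb phrases test a static condition and pass the context on.
-- The extensions of walks and talks are made up.
walks, talks :: Entity -> Prop
walks x e k = x `elem` [John, Mary] && k e
talks x e k = x == John && k e

-- Sentence sequencing: the second sentence is interpreted in the
-- context produced by the first.
andThen :: Prop -> Prop -> Prop
andThen s1 s2 e k = s1 e (\e' -> s2 e' k)

-- Truth of a whole discourse: empty initial context, trivial
-- final continuation.
truth :: Prop -> Bool
truth s = s [] (const True)

discourse1, discourse2, discourse3 :: Prop
discourse1 = name John walks `andThen` pro talks   -- John walks. He talks.
discourse2 = name Mary walks `andThen` pro talks   -- Mary walks. She talks.
discourse3 = pro talks                             -- no antecedent
```

Calling truth on each discourse in GHCi shows how the referent introduced by the first sentence becomes available to the pronoun in the second. Quantifiers and donkey sentences need more than this sketch provides.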

 

References and further reading

Closely related to the present course in spirit but with a different subject matter or metalanguage:

Closely related to the present course in subject matter (semantics):

The technique of extensible language embeddings is described in the following publications:

Version
The current version is July 2012.


Last updated September 1, 2013

This site's top page is http://okmij.org/ftp/

oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
