Semantics of Natural Languages

Montagovian semantics and Scala programming

Scala programming language has a feature that is prominent in natural languages. Perhaps the feature was introduced into Scala to make our linguistic intuitions guide our programming.

The feature in question is the notation for anonymous functions: the underscore. It seems that in Scala

• `_ ` (just underscore) is the abbreviation for the identity function
`f => f`
• `((_:Integer).intValue) ` is the abbreviation for
`((x:Integer) => x.intValue)`
• `(Integer.valueOf(_)) ` abbreviates
`(x => Integer.valueOf(x))`
• `anObject.method(_ ~ xxx ~ yyy) ` is the abbreviation of
`anObject.method(f => f ~ xxx ~ yyy)`

Let us briefly review the Montagovian semantics of natural languages. A declarative sentence such as

```     	 (1) John saw Mary.
```
denotes a proposition, which is true just in case a person denoted by the name John indeed saw a person denoted by the name Mary. We can write the proposition as a formula in the Simple Theory of Types (STT):
```     	 (2) see mary john
```
where `mary` and `john` are individuals (of the type `e`) who go by the names Mary and John, respectively; `see` , the meaning of the transitive verb `see', is a function symbol of the type `e -> e -> t` . Here `t` is the type of propositions. (In Church's STT, the type `e` is called `i` and the type `t` called `o` . For some reason, Montague renamed the types).

Let us consider a sentence

```     	 (3) It was Mary who John saw.
```
or, simpler
```     	 (4) Mary, John saw.
```
We'd like to write a logical formula representing the meaning of (4). We would like our semantics to be compositional: the meaning of a sentence should be composed from the meaning of its parts. Thus, we have to determine the meaning of the part ``John saw.'' A transitive verb requires two arguments: first an object, then a subject. The phrase ``John saw'' seems to be missing an object. To handle such phrases -- verb phrases (VP) missing a component -- linguists have introduced a so-called `trace', often written as an underscore. The trace is assumed silent (not pronounced). Thus the sentence (4) is to be written
```     	 (4') Mary, John saw _.
```

The sentences with traces are common. Examples include raised questions and relative clauses:

```     	(5) Who did John see _.
(6) I know whom John saw _.
(7) The person John saw _ ran away.
```
Some theories of quantifier scope use trace to explain how a quantifier word such as `someone' influences the meaning of -- `takes scope over' -- the entire sentence. This so-called `quantifier raising' posits that a sentence such as
```     	 Alice thinks someone left.
```
should be analyzed as if it were
```     	 someone (Alice thinks _ left).
```

It seems like a good idea to take the meaning of ``John saw _'' to be

```     	 (8) fun x -> see x john
```
The meaning of (4') then can be composed from the meaning of its components as
```     	 (9) (fun x -> see x john) mary
```
A beta-reduction then gives us (2). Chomsky might say that the transformation from the surface form of the utterance (4') to the logical, hidden form (2) involves a movement of Mary into the position occupied by the trace.

We now face the question of the meaning of trace and using that meaning to compositionally derive (8) from ``John saw _.'' We observe that ``John saw _'' looks like ``John saw Mary'' with ``Mary'' taken out. Therefore, ``John saw _'' is a context , a term with a hole; trace denotes the hole. Continuations are the meanings of contexts. Since ``John saw _'' is a sub-phrase of the larger sentence (4'), our context is partial -- or delimited, -- with a delimited continuation as its meaning. A delimited continuation can be captured by a delimited control operator `shift`:

```     	 reset (see (shift (fun k -> k)) john)
```
indeed reduces to
```     	 fun x -> see x john
```
Therefore, we may take the meaning of trace to be `(shift (fun (k -> k))` . To type that expression, we need shift with the answer-type modification.

The topic of delimited control in natural languages is discussed in great detail by Chung-chieh Shan in a book chapter ``Linguistic Side Effects.'' He analyzes more complex sentences, for example:

```     	(10) Who _ saw his mother?
```

A pronoun could be treated like a trace. The sentence (10) hence exhibits three components with side-effect: the question word, the trace, and the pronoun. Chung-chieh Shan explains their interactions.

Coming back to Scala, it seems that `_` is the trace and the expression boundary is a reset.

It is remarkable that natural and programming languages can inform each other. Consciously or not, Scala seems like the best example of such mutual development.

Version
The current version is November 2, 2009.
References
Chung-chieh Shan: Linguistic Side Effects.
In ``Direct compositionality,'' ed. Chris Barker and Pauline Jacobson, pp. 132-163. Oxford University Press, 2007.

Canonical Constituents and Non-canonical Coordination: Simple Categorial Grammar Account

[The Abstract of the paper]
A variation of the standard non-associative Lambek calculus with the slightly non-standard yet very traditional semantic interpretation turns out to straightforwardly and uniformly express the instances of non-canonical coordination while maintaining phrase structure constituents. Non-canonical coordination looks just as canonical on our analyses. Gapping, typically problematic in Categorial Grammar--based approaches, is analyzed just like the ordinary object coordination. Furthermore, the calculus uniformly treats quantification in any position, quantification ambiguity and islands. It lets us give what seems to be the simplest account for both narrow- and wide-scope quantification into coordinated phrases and of narrow- and wide-scope modal auxiliaries in gapping.

The calculus lets us express standard covert movements and anaphoric-like references (analogues of overt movements) in types -- as well as describe how the context can block these movements.

Version
The current version is November 2014.
References
NL.pdf [252K]
The extended article
The slightly shorter paper was published in the Proceedings of LENLS 11, the International Workshop on Logic and Engineering of Natural Language Semantics. November 22-24, 2014, Japan, pp. 138-151.

lenls-talk.pdf [216K]
The annotated slides of the talk at LENLS 11. November 23, 2014. Keio University, Japan.

HOCCG.hs [29K]
The accompanying source code to verify the analyses described in the paper
The code checks NL derivations and computes the logical formula representing the meaning of the corresponding sentences. The code is in Haskell, in the tagless-final style.

Continuation Hierarchy and Quantifier Scope

[The Abstract of the chapter]
We present a directly compositional and type-directed analysis of quantifier ambiguity, scope islands, wide-scope indefinites and inverse linking. It is based on Danvy and Filinski's continuation hierarchy, with deterministic semantic composition rules that are uniquely determined by the formation rules of the overt syntax. We thus obtain a compositional, uniform and parsimonious treatment of quantifiers in subject, object, embedded-NP and embedded-clause positions without resorting to Logical Forms, Cooper storage, type-shifting and other ad hoc mechanisms.

To safely combine the continuation hierarchy with quantification, we give a precise logical meaning to often used informal devices such as picking a variable and binding it off. Type inference determines variable names, banishing ``unbound traces''.

Quantifier ambiguity arises in our analysis solely because quantifier words are polysemous, or come in several strengths. The continuation hierarchy lets us assign strengths to quantifiers, which determines their scope. Indefinites and universals differ in their scoping behavior because their lexical entries are assigned different strengths. PPs and embedded clauses, like the main clause, delimit the scope of embedded quantifiers. Unlike the main clause, their limit extends only up to a certain hierarchy level, letting higher-level quantifiers escape and take wider scope. This interplay of strength and islands accounts for the complex quantifier scope phenomena.

We present an economical ``direct style'', or continuation hierarchy on-demand, in which quantifier-free lexical entries and phrases keep their simple, unlifted types.

Joint work with Chung-chieh Shan.

Version
The current version is 2014.
References
inverse-scope.pdf [380K]
The paper appeared as a chapter in the book Formal Approaches to Semantics and Pragmatics: Japanese and Beyond published in the series Studies in Linguistics and Philosophy (E. McCready, K. Yabushita and K. Yoshimoto, eds.)
Springer Netherlands, 2014, pp. 105-134.
doi:10.1007/978-94-017-8813-7_6

QuanCPS.hs [24K]
The accompanying source code to verify the analyses described in the paper and compute the denotations.
The code is in Haskell, in the tagless-final style.

Applicative Abstract Categorial Grammar: Syntax-semantics interface and the non-trivial computation of meaning

Applicative ACG was introduced in two talks for two distinct audiences. The following complementary abstracts give a well-rounded view of the applicative ACG.
Talk at the meeting of the Association for Symbolic Logic, December 2011

We describe the relationship between the surface form of a sentence and its meaning as two non-trivial but mechanical interpretations of the same (hidden) abstract form. Our approach, in the tradition of abstract categorial grammar (ACG), deals with several languages: The language `A` is the language of abstract form; the language `I` is the language in which we write a compositional interpreter of `A`, which produces a term in a language `T`. The language `A` is simply-typed lambda-calculus with constants.

A syntactic interpretation `Isyn` gives a surface form for a sentence corresponding to `A`; `Tsyn` is a set of strings or utterances. This interpretation is responsible for word order, case marking, subject-verb agreement, etc. Therefore, `A` is abstracted from these syntactic details. `A` can be determined from `Tsyn` by parsing. If the abstract language is simple -- that is, its constants have types of low order -- parsing is tractable. This is the case for our ACGs.

A semantic interpretation `Isem` computes the meaning of `A` as a logical formula; `Tsem` is thus the language of predicate logic. `Isem` deals with quantifiers, pronouns, question words and their interactions. Our `Isem` interpreter is non-trivial: it may fail. Although we may parse a sentence to an abstract form, we may fail to compute any meaning for it, if the sentence just `doesn't make sense'. Our `Isem` is not a regular lambda-calculus: it expresses computational effects such as mutation and continuations, and it is not normalizable. The interpreter `Isem :: A -> Tsem` may be represented as a composition of two interpreters `I1 :: A -> T1` and `I2 :: T1 -> Tsem` for some intermediate language `T1`. A complex transformation may be assembled from simpler steps.

The computational effects let us express in a modular fashion the interaction of a phrase with its context, and analyze different scopes of quantifiers and scoping islands.

Talk at the Toukyou Semantics Group Research seminar

Abstract Categorial Grammar (ACG) is a grammar formalism based on typed lambda-calculus. The syntactic side of ACGs -- parsing algorithms, generative power and other formal-language--theoretical properties -- is under active investigation. ACGs can also elegantly model syntax-semantics interface and compute truth conditions. This semantic side of ACGs has received much less attention.

We describe a generalization of ACGs that lets us give the standard dynamic-logic account of anaphora and analyze quantifier strength, quantifier ambiguity and scope islands. Most of these ACG analyses have not been possible before; prior ACG analyzes of quantifier ambiguity required type lifting, hence higher-order ACGs with very high parsing complexity.

Our generalization to ACG affects only the mapping from abstract language to semantics. We retain all ACG benefits of parsing from the surface form. By avoiding type lifting we keep the order of the abstract signature low, so that parsing remains tractable.

The generalization relies on `applicative functors', which extend function applications. The fact that applicative functors compose lets us take the full advantage of modularity and compositionality of ACGs. We assemble the semantic mapping from separate, independently developed components responsible for a single phenomenon such as anaphora, coordination, universal or indefinite operator. We have implemented the generalized ACGs in a `semantic calculator', which is the ordinary Haskell interpreter. The calculator lets us write grammar derivations in a linguist-readable form and see their yields, types and truth conditions. We easily extend fragments with more lexical items and operators, and experiment with different semantic-mapping assemblies.

Version
The current version is June, 2012.
References
semantics-seminar.pdf [188K]
Annotated slides of the talk presented at the Semantics Research seminar. National Institute of Informatics, Toukyou, Japan. June 1, 2012.
< http://research.nii.ac.jp/~kanazawa/semantics.html >

Abstract.hs [7K]
The definition of the Abstract language and its mapping to the surface `syntax'
We define Abstract for a small fragment, later extended with sentence conjunction, anaphora and quantifiers.

Logic.hs [3K]
The language of meaning: First-order predicate logic
We also lift the logic through an applicative.

Sem.hs [8K]
The syntax-semantic interface
We demonstrate the interface for the small fragment, and its modular extension for sentence conjunction, dynamic logic, and two levels of quantification. We illustrate the successive, multi-stage re-writing of Abstract into the meaning.

Applicatives.hs [2K]
Standard CPS and State Applicatives, which could have been defined in the standard applicative library

Computational Abstract Grammars: Quantifier scope ambiguities and verbal ellipsis

To give a principled account of Cooper storage within the framework of Abstract Categorial Grammars (ACG), we propose an enhancement to ACG along the lines of dynamic logic. We add a more expressive semantic lexicon that includes multi-prompt delimited continuations and the notion of evaluation. Multi-prompt delimited control naturally expresses the well-known fact that different quantifier types have different scope-taking abilities. We illustrate our extension by analyzing scope ambiguities and quantification in conjunctions, verbal ellipsis, negation and relative clauses. Borrowing the examples from the ACG literature, we show their type-lifting--free analyses. We give the first analyses of scoping islands within the ACG framework. Scoping islands and anaphoric indefinite descriptions are described in more detail in a separate article.

Abstract Categorial Grammars (ACG) represent syntax-semantic interface by regarding the surface and logical forms as two different interpretations of an abstract form. All three forms are represented in ACG uniformly, as closed (normal) terms in typed linear lambda-calculus over a signature defining base types, constants and the type assignments to the constants. An interpretation of one, abstract calculus, in another, object calculus is formalized in a lexicon , which is a homomorphic, typing preserving map of constants and types of the abstract calculus into terms and types of the object calculus.

Strictly speaking, the mapping of an abstract term `tA` to an object term includes not only the mapping of constants of `tA` as specified by the lexicon, but also the normalization of the result. The ACG literature shows the normalization (beta-reduction) step when presenting examples; the step, however, is left implicit in the formalism. Since simply-typed lambda-calculus is strongly normalizing, the formalities are not needed.

To model semantics, that is, design an ACG lexicon to map an abstract form to a logical form, extensions are often necessary. For example, the linearity requirement is dropped. We carry out further extensions. First, we distinguish a subset of terms of the logical form calculus and call them values. We define the process of reduction of terms to values as left-to-right call-by-value evaluation. We call the logical form calculus a (semantic) metalanguage.

The mapping of an abstract term `tA` to a logical form is an explicitly two-stage process: we map the constants of `tA` to metalanguage terms, and evaluate the result. The evaluation may fail to yield a value: the mapping of `tA` to a semantic form is partial. We take the failure as an indication of ungrammaticality of `tA` . The corresponding surface forms are thus judged syntactically correct but meaningless. In a departure from the Categorial Grammar tradition, the rules of grammatical composition are specified by the evaluation rules of the metalanguage -- and only incompletely by the underlying type calculus. Types are quickly-decidable approximation of the evaluation rules. Whereas categorial grammars and type-logical grammars stress finding proofs, we stress deterministic evaluation.

To avoid type-lifting and still model (covert) movements we add to our metalanguage multi-prompt delimited continuations. Such continuations implement mutable dynamic binding (or, `local store'), which is a computational model of Cooper storage.

We can use an existing programming language as our metalanguage, which gives us a ready-made implementation of computational abstract grammars. We straightforwardly encode abstract forms and lexicons in OCaml, and see the logical forms as the result of running the corresponding OCaml programs. We also take advantage of the abstraction, modularity, and interactive development and debugging facilities of OCaml.

Version
The current version is 1.0, September, 2009.
References
symantics1.ml [11K]
The OCaml code embedding computational abstract grammars and several examples

Symantics1.hs [6K]
A Haskell translation of an early version of `symantics1.ml` , for comparison

Sylvain Pogodalla: Syntax-Semantics Architecture: Comparing CG and ACG
Section 4 of the Advanced ESSLLI09 course on Abstract Categorial Grammars
< http://www.loria.fr/equipes/calligramme/acg/publications/esslli-09/2009-esslli-acg-week-2-part-2.pdf >
We use the examples from Sec 4.4.

Typed Tagless Interpretations
Abstract Categorial Grammars are the instance of Typed Tagless (Final) Interpretations.

Dynamic Logic in ACG: Discourse Anaphora and Scoping Islands

Dynamic Logic in ACG: Discourse Anaphora and Scoping Islands

To analyse scoping islands within the Abstract Categorial Grammar (ACG) formalism we propose an enhancement to ACG along the lines of dynamic logic. The enhanced ACG explains not only the distinct scopes of universals and indefinites, and clause-boundness of universals. We can also apply our ACG to anaphoric indefinite descriptions in discourse. We explain how an indefinite can scope inside negation, yet cannot scope outside negation and create definitedness presuppositions.

Our enhancement to ACG affects only the mapping from abstract language to semantics. We retain all ACG's benefits of parsing from the surface form. Crucially, by avoiding type lifting we keep the order of the abstract signature low, so that parsing remains tractable.

We regard the mapping from abstract language to semantics partial: some sentences, albeit well-formed, just don't make sense. We model this partial mapping as a potentially failing computation in a call-by-value language with multi-prompt delimited control. The evaluation and type inference rules of the language are simple and deterministic. Control prompts may be regarded as loci of binding or quantification, used by quantified phrases and pronouns and set by context. We arrive at the mechanism of interaction of a phrase with its context that determines the scope.

Version
The current version is 1.0, December, 2009.
References
CAG-talk.pdf [168K]
Annotated slides of the talk presented at the 3d CAuLD workshop: Logical Methods for Discourse. December 15. Nancy, France.

symantics2.ml [11K]
OCaml code with the analyses of the puzzles discussed in the talk
The code presents abstract forms for the puzzles, surface forms and the computed logical formulas, if exist.

< http://www.loria.fr/equipes/calligramme/acg/ >
Abstract Categorial Grammar Homepage

The delimcc library: delimited continuations for OCaml

Compilation by evaluation as syntax-semantics interface

We regard the relation of a natural language phrase to its meaning as a two-stage transformation. First, the source, natural language phrase in concrete syntax is elaborated to an intermediate language term. The elaboration is syntactic and involves parsing, dis-inflection, etc. The intermediate language is then deterministically compiled to Church's simple theory of types (STT). The resulting STT formula is taken to be the denotation, the meaning, of the original source language phrase.

We specify the compilation from the intermediate language to the STT denotation as reduction rules. Hence the intermediate language is (call-by-name) lambda calculus whose constants are STT formulas. To compile an intermediate language term to its STT denotation we evaluate the term. The compilation is not always compositional: so-called scopal expressions contribute their meaning in the places other than where they are seen or heard. Scopal phenomena include generalized quantification, wh-phrases, topicalization, focus, anaphora and binding. To account for these phenomena, our intermediate language includes control operators shift and reset. We are thus able to improve on the previous continuation-based analyses of quantification, binding, raised and in-situ wh-questions, binding in wh-questions, and superiority.

The evaluation of an intermediate language term does not always lead to the STT denotation: the evaluation may get stuck. In that case, we regard the corresponding source language phrase as ungrammatical. We introduce a type system to delineate a set of intermediate language terms whose evaluation never fails to produce denotation. Being well-typed becomes an alternative criterion of grammaticality. Typeability, unlike reducibility, has an advantage of being applicable to separate phrases rather than whole sentences. Our main result is that both typing and call-by-name are necessary to correctly predict superiority and binding in wh-questions with topicalization, without resorting to thunking or type raising and thus maintaining the uniformity of the analyses.

Version
The current version is August 2008.
References
gengo-side-effects-cbn.pdf [161K]
Call-by-name linguistic side effects
ESSLLI 2008 Workshop on Symmetric calculi and Ludics for the semantic interpretation. 4-7 August, 2008. Hamburg, Germany.

gengo-cbn-talk.pdf [224K]
Presentation at the workshop, August 6, 2008. Hamburg, Germany.

gengo-side-effects-cbn.elf [19K]
The complete code with all the analyses described in the paper

Call-by-name typed shift/reset calculus
Our intermediate language

Talk: Delimited Continuations in Computer Science and Linguistics

We give a detailed introduction to delimited continuations -- the meanings of partial contexts -- and point out some of their occurrences in multi-processing, transactions, and non-deterministic computations. After briefly touching on the formalism and the logic of delimited continuations, we concentrate on two their particular uses: placing and retrieving multiple contextual marks (dynamic binding, anaphora) and meta-programming (generating code, generating denotations of interrogative sentences and clauses).
• Delimited continuations
• Continuations are the meanings of evaluation contexts
• Control effects: Process scheduling in OS
• Control effects as debugging
• Control operators shift and reset
• Examining the stack
• Anaphora and context marks
• Several pronouns, several marks
• Far-reaching pronouns
• Generating (sentences, meanings) by jumping back-and-forth
• Natural language sentences as computer programs yielding denotations
• Generating denotations of questions
• CPS and types
• Introduction to CPS. CPS and double negation
• Answer types in the CPS transformation

Joint work with Chung-chieh Shan.

Version
The current version is December 4, 2007.
References
delimcc-tohoku-talk-notes.pdf [246K]
delimcc-tohoku-talk.pdf [166K]
Slides with and without annotations
Slides and notes of the talk given at the Research Center for Language, Brain and Cognition. Tohoku University, Sendai, Japan. December 4, 2007.

Last updated January 10, 2015

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org