Semantics of Natural Languages

Compositional semantics of `same', `different', `total'
Lambek Grammars are strongly equivalent to Context Free Grammars, for all practical purposes
Continuation Hierarchy and Quantifier Scope
Lambda: the ultimate syntax-semantics interface
Two styles of semantic analyses: proof search vs. computation
Transformational Semantics (TS)
Lambek Grammars as an embedded DSL: a tool for semanticists
Polynomial Event Semantics: A variable-free event semantics
Montagovian semantics and Scala programming
Lambek Grammars for Computer Scientists
Canonical Constituents and Non-canonical Coordination: Simple Categorial Grammar Account
Applicative Abstract Categorial Grammars
Compilation by evaluation as syntax-semantics interface
Talk: Delimited Continuations in Computer Science and Linguistics

Montagovian semantics and Scala programming

Scala programming language has a feature that is prominent in natural languages. Perhaps the feature was introduced into Scala to make our linguistic intuitions guide our programming.

The feature in question is the notation for anonymous functions: the underscore. It seems that in Scala

_ (just underscore) is the abbreviation for the identity function
f => f
((_:Integer).intValue) is the abbreviation for
((x:Integer) => x.intValue)
(Integer.valueOf(_)) abbreviates
(x => Integer.valueOf(x))
anObject.method(_ ~ xxx ~ yyy) is the abbreviation of
anObject.method(f => f ~ xxx ~ yyy)

Let us briefly review the Montagovian semantics of natural languages. A declarative sentence such as

    (1) John saw Mary.

denotes a proposition, which is true just in case a person denoted by the name John indeed saw a person denoted by the name Mary. We can write the proposition as a formula in the Simple Theory of Types (STT):

    (2) see mary john

where mary and john are individuals (of the type e) who go by the names Mary and John, respectively; see, the meaning of the transitive verb `see', is the function symbol of the type e -> e -> t. Here t is the type of propositions. (In Church's STT, the type e is called i and the type t is called o.)

Let us consider the sentence

    	 (3) It was Mary who John saw.

or, simpler

    	 (4) Mary, John saw.

We'd like to write a logical formula representing the meaning of (4). We would like our semantics to be compositional: the meaning of a sentence should be composed from the meaning of its parts. Thus, we have to determine the meaning of the part ``John saw.'' A transitive verb requires two arguments: first an object, then a subject. The phrase ``John saw'' seems to be missing the object. To handle such phrases -- verb phrases (VP) missing an argument -- linguists have introduced a so-called `trace', often written as an underscore. The trace is assumed silent (not pronounced). Thus the sentence (4) is to be written

    	 (4') Mary, John saw _.

The sentences with traces are common. Examples include raised questions and relative clauses:

    	(5) Who did John see _.
    	(6) I know whom John saw _.
    	(7) The person John saw _ ran away.

Some theories of quantifier scope use trace to explain how a quantifier word such as `someone' influences the meaning of -- `takes scope over' -- the entire sentence. This so-called `quantifier raising' posits that a sentence such as

    	 Alice thinks someone left.

should be analyzed as if it were

    	 someone (Alice thinks _ left).

It seems like a good idea to take the meaning of ``John saw _'' to be

    	 (8) fun x -> see x john

The meaning of (4') then can be composed from the meaning of its components as

    	 (9) (fun x -> see x john) mary

A beta-reduction then gives us (2). Chomsky might say that the transformation from the surface form of the utterance (4') to the logical, hidden form (2) involves the movement of Mary into the position occupied by the trace.

We now face the question of the meaning of trace and using that meaning to compositionally derive (8) from ``John saw _.'' We observe that ``John saw _'' looks like ``John saw Mary'' with ``Mary'' taken out. Therefore, ``John saw _'' is a context: a term with a hole; trace denotes the hole. Continuations are the meanings of contexts. Since ``John saw _'' is a sub-phrase of the larger sentence (4'), our context is partial -- or delimited, -- with a delimited continuation as its meaning. A delimited continuation can be captured by the delimited control operator" shift:

    	 reset (see (shift (fun k -> k)) john)

indeed reduces to

    	 fun x -> see x john

Therefore, we may take the meaning of trace to be (shift (fun (k -> k)). To type that expression, we need shift with the answer-type modification.

The topic of delimited control in natural languages is discussed in great detail by Chung-chieh Shan in a book chapter ``Linguistic Side Effects.'' He analyzes more complex sentences, for example:

    	(10) Who _ saw his mother?

A pronoun could be treated like a trace. The sentence (10) hence exhibits three components with side-effect: the question word, the trace, and the pronoun. Chung-chieh Shan explains their interactions.

Coming back to Scala, it seems that _ is the trace and the expression boundary is a reset.

It is remarkable that natural and programming languages can inform each other. Consciously or not, Scala seems like the best example of such mutual development.

Version

The current version is November 2, 2009

References

Chung-chieh Shan: Linguistic Side Effects.
In ``Direct compositionality,'' ed. Chris Barker and Pauline Jacobson, pp. 132-163. Oxford University Press, 2007.

Lambek Grammars are strongly equivalent to Context Free Grammars, for all practical purposes

We demonstrate that for all practical purposes, Lambek Grammars (LG) are strongly equivalent to Context-Free Grammars (CFG) and hence to second-order Abstract Categorial Grammars (ACG). To be precise, for any Lambek Grammar LG there exists a second-order ACG with a second-order lexicon such that: the set of LG derivations (with a bound on the `nesting' of introduction rules) is the abstract language of the ACG, and the set of yields of those derivations is its object language. Furthermore, the LG lexicon is represented in the abstract ACG signature with no duplications. The fixed, and small, bound on the nesting of introduction rules seems adequate for natural languages. One may therefore say that ACGs are not merely just as expressive as LG, but strongly equivalent.

The key is the algebraic description of Lambek Grammar derivations, and the avoidance of the Curry-Howard correspondence with lambda calculus.

To put the conclusions differently, for any LG and the natural number n, there exists a CFG whose derivations are all and only LG derivations of hyp-rank n. Furthermore, the LG lexicon enters CFG as is, with no duplications, let alone exponential explosions. Thus, LGs of a bounded hyp-rank (which are adequate for natural languages) are strongly equivalent to CFGs.

The paper also formally demonstrates that Lambek Grammars can be viewed as algebras.

Joint work with Hoshino Yuya.

Version

The current version is March 2020

References

LambekACG.pdf [450K]
Lambek Grammars as Second-order Abstract Categorial Grammars
New Frontiers in Artificial Intelligence. JSAI-isAI 2019. Lecture Notes in Computer Science, vol 12331, 2020, pp. 231-243. Springer. doi:10.1007/978-3-030-58790-1_15

LambekACG-talk.pdf [257K]
Talk at LENLS 2019, Yokohama, Japan, November 12, 2019

Lambek Grammars as an embedded DSL: a tool for semanticists

Lambek Grammars for Computer Scientists
The talk gives the intuition for why relating Lambek and Context-Free grammars was so difficult.

Compositional semantics of `same', `different', `total'

We propose a strictly compositional and uniform treatment for the internal readings of the symmetric predicates such as same and different and for the summative phrases such as a total of. The analyses handle multiple occurrences of symmetric and summative predicates in the same sentence, alongside of multiple quantificational elements. We treat symmetric predicates and summative phrases as wide-scope generalized existential quantifiers for choice functions. Like Barker's parasitic scope, we treat symmetric and summative expressions as generalized quantifiers and relate them to universal quantification. However, our quantifiers scope wide rather than parasitically and avoid Barker's non-standard interpretation of universal quantification.
We introduce the analyses in terms of the familiar Quantifier Raising framework, which we make more precise and treat as a somewhat informal version of a type-logical grammar.
Version: The current version is August 2015
References: NLsame.pdf [213K]
The paper published in the Proceedings of the ESSLLI 2015 workshop Empirical Advances in Categorial Grammar (CG2015) August 10-14, 2015, Barcelona, Spain
NLsame-talk.pdf [218K]
Annotated slides of the talk, presented at the CG2015 workshop on August 14, 2015

Two styles of semantic analyses: proof search vs. computation

One of the goals of the semantic analysis is comprehension: determining the meaning of an utterance. The subject of the analysis is usually not the raw utterance but rather more or less abstract form thereof, abstracting away peculiarities of pronunciation and writing and often declination, conjugation, etc. One may as well call this `spellout'; we will stay however with `abstract form'. Finding the right level of abstraction is one of the challenges.

One may discern two broad perspectives of the analysis. One is proof search: building a logical system derivation whose conclusion is directly related to the abstract form of an utterance. The successfully obtained derivation is taken as the evidence that the utterance is grammatical. The meaning is then read off the derivation. The proof search perspective is manifest in type-logical grammars; it also underlies general categorial grammars. Parsing as deduction is truly compelling and logically insightful. Yet it is often hard to characterize the space of possible derivations, to see that everything derivable will be judged by the native speakers as grammatical, and everything that judged as grammatical can be derived. In particular, it is difficult to get negative predictions: to prove that there really cannot be a derivation for a particular sentence, no matter how hard we try to build one.

Another perspective is most clearly described by Chung-chieh Shan in his papers on linguistic side effects. (Discourse representation theory (DRT) is also in the same vein.) It is based not on proof but on computation: the source sentence is taken to be a program that computes the meaning as a logical formula. As any program, it may fail: raise an `exception' or stop before computing any result. Thus in the computational perspective, the meaning is obtained as the result of a mostly deterministic, inherently partial, usually precisely specified and often mechanized process. The algorithmic nature of this process lets it lay a credible claim to `real life', physiological relevance. On the downside, the computational process is rigid. It is also too easy to get bogged down to details (control operators, implementation of monads, etc.) Taken the computational perspective as the starting point, the present work aims to raise its level of abstraction, eliding implementation details and adding the `right' amount of flexibility.

Version

The current version is March 2016

References

AACG1.pdf [263K]
Applicative Abstract Categorial Grammars in Full Swing
It is published in the Proceedings of LENLS 2015 as Springer's Lecture Notes in Artificial Intelligence 10091, pp. 66--78, 2017.
The above excerpt is taken from the introduction section of the paper. The more abstract computational approach hinted at at the end has eventually lead to the Transformational Semantics (TS).

Continuation Hierarchy and Quantifier Scope

We present a directly compositional and type-directed analysis of quantifier ambiguity, scope islands, wide-scope indefinites and inverse linking. It is based on Danvy and Filinski's continuation hierarchy, with deterministic semantic composition rules that are uniquely determined by the formation rules of the overt syntax. We thus obtain a compositional, uniform and parsimonious treatment of quantifiers in subject, object, embedded-NP and embedded-clause positions without resorting to Logical Forms, Cooper storage, type-shifting and other ad hoc mechanisms.

To safely combine the continuation hierarchy with quantification, we give a precise logical meaning to often used informal devices such as picking a variable and binding it off. Type inference determines variable names, banishing ``unbound traces''.

Quantifier ambiguity arises in our analysis solely because quantifier words are polysemous, or come in several strengths. The continuation hierarchy lets us assign strengths to quantifiers, which determines their scope. Indefinites and universals differ in their scoping behavior because their lexical entries are assigned different strengths. PPs and embedded clauses, like the main clause, delimit the scope of embedded quantifiers. Unlike the main clause, their limit extends only up to a certain hierarchy level, letting higher-level quantifiers escape and take wider scope. This interplay of strength and islands accounts for the complex quantifier scope phenomena.

We present an economical ``direct style'', or continuation hierarchy on-demand, in which quantifier-free lexical entries and phrases keep their simple, unlifted types.

Joint work with Chung-chieh Shan.

Version

The current version is 2014

References

inverse-scope.pdf [380K]
The paper appeared as a chapter in the book Formal Approaches to Semantics and Pragmatics: Japanese and Beyond published in the series Studies in Linguistics and Philosophy (E. McCready, K. Yabushita and K. Yoshimoto, eds.)
Springer Netherlands, 2014, pp. 105-134.
doi:10.1007/978-94-017-8813-7_6

QuanCPS.hs [24K]
The accompanying source code to verify the analyses described in the paper and compute the denotations.
The code is in Haskell, in the tagless-final style.

Lambek Grammars for Computer Scientists

Lambek Calculus and Grammars are rarely taught in CS courses, despite their strong logical foundations and despite being the first resource-sensitive calculus, proposed 30 years prior linear logic. Lambek Grammars are also `better' (lexicalized) as grammars, compared to the well-known Context-Free Grammars (CFG). Incidentally, AB grammars (a particular case of Lambek Grammars) were introduced before CFG.

This talk is a half-tutorial on AB and Lambek Grammars, citing sources and developing intuition: using arithmetic to describe syntax. We relate AB and Lambek Grammars to CFG, inviting to look at grammars from a rarely taken (in CS) point of view.

The second part of the talk is about Chomsky's conjecture that Lambek and Context-Free Grammars are equivalent. We give the intuition for why it took 30 years to settle, and what proved to be the key. The settlement, the famous Pentus result, proved the weak equivalence. We then demonstrate, from a very different standpoint, that under a reasonable restriction (which seems to hold for natural languages at least) Lambek Grammars are strongly equivalent to CFG.

References

LambekCFG-talk.pdf [3682K]
The slides of the talk Lambek Grammars and a New Look to Context-Free Grammars, presented as a half-tutorial at AiDL 2022 workshop. Kyoto, May 11 2022.

Applicative Abstract Categorial Grammars

We present the grammar/semantic formalism of Applicative Abstract Categorial Grammars (AACG), based on the recent techniques from functional programming: applicative functors, staged languages and typed final language embeddings. AACG is a generalization of Abstract Categorial Grammars (ACG), retaining the benefits of ACG as a grammar formalism and making it possible and convenient to express a variety of semantic theories.

We use the AACG formalism to uniformly formulate Potts' analyses of expressives, the dynamic-logic account of anaphora, and the continuation tower treatment of quantifier strength, quantifier ambiguity and scope islands. Carrying out these analyses in ACG required compromises with the accompanying ballooning of parsing complexity, or was not possible at all. The AACG formalism brings modularity and extensibility: The separately developed analyses of expressives and QNP are used as they are to compute the truth conditions of sentences with both features. AACG offers a different perspective on linguistic side-effects, demonstrating compositionality not just of meanings but of transformations.

In the follow-up paper, we put AACG through its paces, illustrating its expressive power and positive as well as negative predictions on the wide range of analyses: gender-marked pronouns, quantifier ambiguity, scoping islands and binding, crossover, topicalization. Most of these analyses have not been attempted for ACG.

Like in ACG, the surface form of a sentence, the abstract (tecto-) form, as well as the meaning are all uniformly represented as typed terms, or trees. The meaning of a sentence is obtained by applying to the abstract form a composition of elementary, deterministic but generally partial transformations. These tree transformations are specified precisely, in typed lambda-calculus, and can be carried out mechanically. The rigor of AACG facilitates its straightforward implementation, in Coq, Haskell, Grammatical Framework, etc. It is not a mistake to think of AACG as a reincarnation of the Transformational Grammar, in which surface structures also include logical formulas representing the meaning, and in which the transformations are described precisely, as actual programs rather than English text.

We have indeed implemented AACG as a `semantic calculator', which is the ordinary Haskell interpreter. The calculator lets us interactively write grammar derivations in a linguist-readable form and see their yields, inferred types and computed truth conditions. We easily extend fragments with more lexical items and operators, and experiment with different semantic-mapping assemblies. The mechanization lets a semanticist test more and more complex examples, making empirical tests of a semantic theory more extensive, organized and systematic.

Version

The current version is November 2015

References

AACG has been simplified, polished and further developed as Transformational Semantics

AACG.pdf [277K]
The paper published in the Proceedings of NLCS'15: Third Workshop on Natural Language and Computer Science
EPiC Volume 32, pp. 29-38, 2015

AACG1.pdf [263K]
The follow-up paper, presented at LENLS 2015: Logic and Engineering of Natural Language Semantics

Abstract.hs [11K]
The definition of the Abstract language and its mapping to the surface `syntax'
We define Abstract for a small fragment, later extended with sentence conjunction, anaphora, quantifiers, pronouns, expletives, etc.

Logic.hs [6K]
The language of meaning: First-order predicate logic

SemTr.hs [26K]
The syntax-semantic interface
We demonstrate the interface for the small fragment, and its modular extension for expressives and two levels of quantification. We illustrate the successive, compositional, multi-stage re-writing of Abstract into the meaning.

Applicatives.hs [3K]
Standard CPS and State Applicatives, which could have been defined in the standard applicative library

CAG-talk.pdf [168K]
Annotated slides of the talk presented at the 3d CAuLD workshop: Logical Methods for Discourse. December 15, 2009. Nancy, France.

<http://www.loria.fr/equipes/calligramme/acg/>
Abstract Categorial Grammar Homepage

Canonical Constituents and Non-canonical Coordination: Simple Categorial Grammar Account

A variation of the standard non-associative Lambek calculus with the slightly non-standard yet very traditional semantic interpretation turns out to straightforwardly and uniformly express the instances of non-canonical coordination while maintaining phrase structure constituents. Non-canonical coordination looks just as canonical on our analyses. Gapping, typically problematic in Categorial Grammar--based approaches, is analyzed just like the ordinary object coordination. Furthermore, the calculus uniformly treats quantification in any position, quantification ambiguity and islands. It lets us give what seems to be the simplest account for both narrow- and wide-scope quantification into coordinated phrases and of narrow- and wide-scope modal auxiliaries in gapping.

The calculus lets us express standard covert movements and anaphoric-like references (analogues of overt movements) in types -- as well as describe how the context can block these movements.

Version

The current version is November 2014

References

NL.pdf [269K]
The extended article
The slightly shorter paper was published in the Proceedings of LENLS 11, the International Workshop on Logic and Engineering of Natural Language Semantics. November 22-24, 2014, Japan, pp. 138-151.

lenls-talk.pdf [216K]
The annotated slides of the talk at LENLS 11. November 23, 2014. Keio University, Japan.

HOCCG.hs [28K]
The accompanying source code to verify the analyses described in the paper
The code checks NL derivations and computes the logical formula representing the meaning of the corresponding sentences. The code is in Haskell, in the tagless-final style.

Compilation by evaluation as syntax-semantics interface

We regard the relation of a natural language phrase to its meaning as a two-stage transformation. First, the source, natural language phrase in concrete syntax is elaborated to an intermediate language term. The elaboration is syntactic and involves parsing, dis-inflection, etc. The intermediate language is then deterministically compiled to Church's simple theory of types (STT). The resulting STT formula is taken to be the denotation, the meaning, of the original source language phrase.

We specify the compilation from the intermediate language to the STT denotation as reduction rules. Hence the intermediate language is (call-by-name) lambda calculus whose constants are STT formulas. To compile an intermediate language term to its STT denotation we evaluate the term. The compilation is not always compositional: so-called scopal expressions contribute their meaning in the places other than where they are seen or heard. Scopal phenomena include generalized quantification, wh-phrases, topicalization, focus, anaphora and binding. To account for these phenomena, our intermediate language includes control operators shift and reset. We are thus able to improve on the previous continuation-based analyses of quantification, binding, raised and in-situ wh-questions, binding in wh-questions, and superiority.

The evaluation of an intermediate language term does not always lead to the STT denotation: the evaluation may get stuck. In that case, we regard the corresponding source language phrase as ungrammatical. We introduce a type system to delineate a set of intermediate language terms whose evaluation never fails to produce denotation. Being well-typed becomes an alternative criterion of grammaticality. Typeability, unlike reducibility, has an advantage of being applicable to separate phrases rather than whole sentences. Our main result is that both typing and call-by-name are necessary to correctly predict superiority and binding in wh-questions with topicalization, without resorting to thunking or type raising and thus maintaining the uniformity of the analyses.

Version

The current version is August 2008

References

gengo-side-effects-cbn.pdf [161K]
Call-by-name linguistic side effects
ESSLLI 2008 Workshop on Symmetric calculi and Ludics for the semantic interpretation. 4-7 August, 2008. Hamburg, Germany.

gengo-cbn-talk.pdf [224K]
Presentation at the workshop, August 6, 2008. Hamburg, Germany.

gengo-side-effects-cbn.elf [19K]
The complete code with all the analyses described in the paper

Call-by-name typed shift/reset calculus
Our intermediate language

Talk: Delimited Continuations in Computer Science and Linguistics

We give a detailed introduction to delimited continuations -- the meanings of partial contexts -- and point out some of their occurrences in multi-processing, transactions, and non-deterministic computations. After briefly touching on the formalism and the logic of delimited continuations, we concentrate on two their particular uses: placing and retrieving multiple contextual marks (dynamic binding, anaphora) and meta-programming (generating code, generating denotations of interrogative sentences and clauses).

Delimited continuations
- Continuations are the meanings of evaluation contexts
- Control effects: Process scheduling in OS
- Control effects as debugging
- Control operators shift and reset
Examining the stack
- Anaphora and context marks
- Several pronouns, several marks
- Far-reaching pronouns
Generating (sentences, meanings) by jumping back-and-forth
- Natural language sentences as computer programs yielding denotations
- Generating denotations of questions
CPS and types
- Introduction to CPS. CPS and double negation
- Answer types in the CPS transformation

Joint work with Chung-chieh Shan.

Version

The current version is December 4, 2007

References

delimcc-tohoku-talk-notes.pdf [246K]
delimcc-tohoku-talk.pdf [166K]
Slides with and without annotations
Slides and notes of the talk given at the Research Center for Language, Brain and Cognition. Tohoku University, Sendai, Japan. December 4, 2007.

Semantics of Natural Languages

Last updated January 1, 2024