The feature in question is the notation for anonymous functions: the underscore. It seems that in Scala
_
(just underscore) is the abbreviation
for the identity functionf => f
((_:Integer).intValue)
is the abbreviation for((x:Integer) => x.intValue)
(Integer.valueOf(_))
abbreviates(x => Integer.valueOf(x))
anObject.method(_ ~ xxx ~ yyy)
is the abbreviation ofanObject.method(f => f ~ xxx ~ yyy)
(1) John saw Mary.denotes a proposition, which is true just in case a person denoted by the name John indeed saw a person denoted by the name Mary. We can write the proposition as a formula in the Simple Theory of Types (STT):
(2) see mary johnwhere
mary
and john
are individuals
(of the type e
) who go by the names Mary and John,
respectively; see
, the meaning of the
transitive verb `see', is the function symbol of the type
e -> e -> t
. Here t
is the type of propositions.
(In Church's STT, the type e
is called i
and
the type t
is called o
.)
Let us consider the sentence
(3) It was Mary who John saw.or, simpler
(4) Mary, John saw.We'd like to write a logical formula representing the meaning of (4). We would like our semantics to be compositional: the meaning of a sentence should be composed from the meaning of its parts. Thus, we have to determine the meaning of the part ``John saw.'' A transitive verb requires two arguments: first an object, then a subject. The phrase ``John saw'' seems to be missing the object. To handle such phrases -- verb phrases (VP) missing an argument -- linguists have introduced a so-called `trace', often written as an underscore. The trace is assumed silent (not pronounced). Thus the sentence (4) is to be written
(4') Mary, John saw _.
The sentences with traces are common. Examples include raised questions and relative clauses:
(5) Who did John see _. (6) I know whom John saw _. (7) The person John saw _ ran away.Some theories of quantifier scope use trace to explain how a quantifier word such as `someone' influences the meaning of -- `takes scope over' -- the entire sentence. This so-called `quantifier raising' posits that a sentence such as
Alice thinks someone left.should be analyzed as if it were
someone (Alice thinks _ left).
It seems like a good idea to take the meaning of ``John saw _'' to be
(8) fun x -> see x johnThe meaning of (4') then can be composed from the meaning of its components as
(9) (fun x -> see x john) maryA beta-reduction then gives us (2). Chomsky might say that the transformation from the surface form of the utterance (4') to the logical, hidden form (2) involves the movement of Mary into the position occupied by the trace.
We now face the question of the meaning of trace and using that
meaning to compositionally derive (8) from ``John saw _.'' We observe
that ``John saw _'' looks like ``John saw Mary'' with ``Mary'' taken
out. Therefore, ``John saw _'' is a context:
a term with a hole; trace denotes the hole. Continuations are the
meanings of contexts. Since ``John saw _'' is a sub-phrase of the larger
sentence (4'), our context is partial -- or delimited, -- with a delimited
continuation as its meaning. A delimited continuation can be captured
by the delimited control operator" shift
:
reset (see (shift (fun k -> k)) john)indeed reduces to
fun x -> see x johnTherefore, we may take the meaning of trace to be
(shift (fun (k -> k))
.
To type that expression, we need shift with the answer-type modification.
The topic of delimited control in natural languages is discussed in great detail by Chung-chieh Shan in a book chapter ``Linguistic Side Effects.'' He analyzes more complex sentences, for example:
(10) Who _ saw his mother?
A pronoun could be treated like a trace. The sentence (10) hence exhibits three components with side-effect: the question word, the trace, and the pronoun. Chung-chieh Shan explains their interactions.
Coming back to Scala, it seems that _
is the trace and the expression boundary is a reset.
It is remarkable that natural and programming languages can inform each other. Consciously or not, Scala seems like the best example of such mutual development.
The key is the algebraic description of Lambek Grammar derivations, and the avoidance of the Curry-Howard correspondence with lambda calculus.
To put the conclusions differently, for any LG and the natural number n, there exists a CFG whose derivations are all and only LG derivations of hyp-rank n. Furthermore, the LG lexicon enters CFG as is, with no duplications, let alone exponential explosions. Thus, LGs of a bounded hyp-rank (which are adequate for natural languages) are strongly equivalent to CFGs.
The paper also formally demonstrates that Lambek Grammars can be viewed as algebras.
Joint work with Hoshino Yuya.
LambekACG-talk.pdf [257K]
Talk at LENLS 2019, Yokohama, Japan, November 12, 2019
Lambek Grammars as an embedded DSL: a tool for semanticists
Lambek Grammars for Computer Scientists
The talk gives the intuition for why relating Lambek and
Context-Free grammars was so difficult.
We introduce the analyses in terms of the familiar Quantifier Raising framework, which we make more precise and treat as a somewhat informal version of a type-logical grammar.
NLsame-talk.pdf [218K]
Annotated slides of the talk, presented at the CG2015 workshop
on August 14, 2015
One may discern two broad perspectives of the analysis. One is proof search: building a logical system derivation whose conclusion is directly related to the abstract form of an utterance. The successfully obtained derivation is taken as the evidence that the utterance is grammatical. The meaning is then read off the derivation. The proof search perspective is manifest in type-logical grammars; it also underlies general categorial grammars. Parsing as deduction is truly compelling and logically insightful. Yet it is often hard to characterize the space of possible derivations, to see that everything derivable will be judged by the native speakers as grammatical, and everything that judged as grammatical can be derived. In particular, it is difficult to get negative predictions: to prove that there really cannot be a derivation for a particular sentence, no matter how hard we try to build one.
Another perspective is most clearly described by Chung-chieh Shan in his papers on linguistic side effects. (Discourse representation theory (DRT) is also in the same vein.) It is based not on proof but on computation: the source sentence is taken to be a program that computes the meaning as a logical formula. As any program, it may fail: raise an `exception' or stop before computing any result. Thus in the computational perspective, the meaning is obtained as the result of a mostly deterministic, inherently partial, usually precisely specified and often mechanized process. The algorithmic nature of this process lets it lay a credible claim to `real life', physiological relevance. On the downside, the computational process is rigid. It is also too easy to get bogged down to details (control operators, implementation of monads, etc.) Taken the computational perspective as the starting point, the present work aims to raise its level of abstraction, eliding implementation details and adding the `right' amount of flexibility.
To safely combine the continuation hierarchy with quantification, we give a precise logical meaning to often used informal devices such as picking a variable and binding it off. Type inference determines variable names, banishing ``unbound traces''.
Quantifier ambiguity arises in our analysis solely because quantifier words are polysemous, or come in several strengths. The continuation hierarchy lets us assign strengths to quantifiers, which determines their scope. Indefinites and universals differ in their scoping behavior because their lexical entries are assigned different strengths. PPs and embedded clauses, like the main clause, delimit the scope of embedded quantifiers. Unlike the main clause, their limit extends only up to a certain hierarchy level, letting higher-level quantifiers escape and take wider scope. This interplay of strength and islands accounts for the complex quantifier scope phenomena.
We present an economical ``direct style'', or continuation hierarchy on-demand, in which quantifier-free lexical entries and phrases keep their simple, unlifted types.
Joint work with Chung-chieh Shan.
QuanCPS.hs [24K]
The accompanying source code to verify the
analyses described in the paper and compute the denotations.
The code is in Haskell, in the tagless-final style.
This talk is a half-tutorial on AB and Lambek Grammars, citing sources and developing intuition: using arithmetic to describe syntax. We relate AB and Lambek Grammars to CFG, inviting to look at grammars from a rarely taken (in CS) point of view.
The second part of the talk is about Chomsky's conjecture that Lambek and Context-Free Grammars are equivalent. We give the intuition for why it took 30 years to settle, and what proved to be the key. The settlement, the famous Pentus result, proved the weak equivalence. We then demonstrate, from a very different standpoint, that under a reasonable restriction (which seems to hold for natural languages at least) Lambek Grammars are strongly equivalent to CFG.
We use the AACG formalism to uniformly formulate Potts' analyses of expressives, the dynamic-logic account of anaphora, and the continuation tower treatment of quantifier strength, quantifier ambiguity and scope islands. Carrying out these analyses in ACG required compromises with the accompanying ballooning of parsing complexity, or was not possible at all. The AACG formalism brings modularity and extensibility: The separately developed analyses of expressives and QNP are used as they are to compute the truth conditions of sentences with both features. AACG offers a different perspective on linguistic side-effects, demonstrating compositionality not just of meanings but of transformations.
In the follow-up paper, we put AACG through its paces, illustrating its expressive power and positive as well as negative predictions on the wide range of analyses: gender-marked pronouns, quantifier ambiguity, scoping islands and binding, crossover, topicalization. Most of these analyses have not been attempted for ACG.
Like in ACG, the surface form of a sentence, the abstract (tecto-) form, as well as the meaning are all uniformly represented as typed terms, or trees. The meaning of a sentence is obtained by applying to the abstract form a composition of elementary, deterministic but generally partial transformations. These tree transformations are specified precisely, in typed lambda-calculus, and can be carried out mechanically. The rigor of AACG facilitates its straightforward implementation, in Coq, Haskell, Grammatical Framework, etc. It is not a mistake to think of AACG as a reincarnation of the Transformational Grammar, in which surface structures also include logical formulas representing the meaning, and in which the transformations are described precisely, as actual programs rather than English text.
We have indeed implemented AACG as a `semantic calculator', which is the ordinary Haskell interpreter. The calculator lets us interactively write grammar derivations in a linguist-readable form and see their yields, inferred types and computed truth conditions. We easily extend fragments with more lexical items and operators, and experiment with different semantic-mapping assemblies. The mechanization lets a semanticist test more and more complex examples, making empirical tests of a semantic theory more extensive, organized and systematic.
AACG.pdf [277K]
The paper published in the Proceedings of NLCS'15:
Third Workshop on Natural Language and Computer Science
EPiC Volume 32, pp. 29-38, 2015
AACG1.pdf [263K]
The follow-up paper, presented at LENLS 2015:
Logic and Engineering of Natural Language Semantics
Abstract.hs [11K]
The definition of the Abstract language and its mapping to
the surface `syntax'
We define Abstract for a small fragment, later extended with
sentence conjunction, anaphora, quantifiers, pronouns, expletives, etc.
Logic.hs [6K]
The language of meaning: First-order predicate logic
SemTr.hs [26K]
The syntax-semantic interface
We demonstrate the interface for the small fragment, and its
modular extension for expressives and two levels of
quantification. We illustrate the successive, compositional,
multi-stage re-writing of Abstract into the meaning.
Applicatives.hs [3K]
Standard CPS and State Applicatives, which could have been defined
in the standard applicative library
CAG-talk.pdf [168K]
Annotated slides of the talk presented at
the 3d CAuLD workshop: Logical Methods for Discourse.
December 15, 2009. Nancy, France.
<http://www.loria.fr/equipes/calligramme/acg/>
Abstract Categorial Grammar Homepage
The calculus lets us express standard covert movements and anaphoric-like references (analogues of overt movements) in types -- as well as describe how the context can block these movements.
lenls-talk.pdf [216K]
The annotated slides of the talk at LENLS 11. November 23, 2014.
Keio University, Japan.
HOCCG.hs [28K]
The accompanying source code to verify the
analyses described in the paper
The code checks NL derivations and computes the logical formula
representing the meaning of the corresponding sentences.
The code is in Haskell, in the tagless-final style.
We specify the compilation from the intermediate language to the STT denotation as reduction rules. Hence the intermediate language is (call-by-name) lambda calculus whose constants are STT formulas. To compile an intermediate language term to its STT denotation we evaluate the term. The compilation is not always compositional: so-called scopal expressions contribute their meaning in the places other than where they are seen or heard. Scopal phenomena include generalized quantification, wh-phrases, topicalization, focus, anaphora and binding. To account for these phenomena, our intermediate language includes control operators shift and reset. We are thus able to improve on the previous continuation-based analyses of quantification, binding, raised and in-situ wh-questions, binding in wh-questions, and superiority.
The evaluation of an intermediate language term does not always lead to the STT denotation: the evaluation may get stuck. In that case, we regard the corresponding source language phrase as ungrammatical. We introduce a type system to delineate a set of intermediate language terms whose evaluation never fails to produce denotation. Being well-typed becomes an alternative criterion of grammaticality. Typeability, unlike reducibility, has an advantage of being applicable to separate phrases rather than whole sentences. Our main result is that both typing and call-by-name are necessary to correctly predict superiority and binding in wh-questions with topicalization, without resorting to thunking or type raising and thus maintaining the uniformity of the analyses.
gengo-cbn-talk.pdf [224K]
Presentation at the workshop, August 6, 2008. Hamburg, Germany.
gengo-side-effects-cbn.elf [19K]
The complete code with all the analyses described in the paper
Call-by-name typed shift/reset calculus
Our intermediate language
Joint work with Chung-chieh Shan.