How to marry a star: probabilistic constraints for meaning in context

Katrin Erk, University of Texas at Austin
Aurélie Herbelot, University of Trento

arXiv:2009.07936v1 [cs.CL] 16 Sep 2020

Abstract

In this paper, we derive a notion of word meaning in context from Fillmore's 'semantics of understanding', in which a listener draws on their knowledge of both language and the world to 'envision' the situation described in an utterance. We characterize utterance understanding as a combination of cognitive semantics and Discourse Representation Theory, formalized as a situation description system: a probabilistic model which takes utterance understanding to be the mental process of describing one or more situations that would account for an observed utterance. Our model captures the interplay of local and global contexts and their joint influence upon the lexical representation of sentence constituents. We implement the system using a directed graphical model, and apply it to examples containing various contextualisation phenomena.

1 Introduction

Word meaning is flexible. This flexibility is often characterised by distinguishing the 'timeless' meaning of a lexical item (its definition(s) in a dictionary) and its 'speech act' or 'token' meaning – the one it acquires by virtue of being used in the context of a particular sentence (Grice 1968). The generation of a token meaning goes well beyond word sense disambiguation and typically involves speakers' knowledge of the world as well as their linguistic knowledge; for instance, Searle (1980: pp. 222-223) reminds us that it would be inappropriate to cut grass with a knife, or to cut a cake with a lawnmower.

Fillmore's frame semantics (Fillmore 1982) is a prominent example of a linguistic representation which combines aspects of the lexicon with the shared world knowledge of a linguistic community. Frames are "schematic representations of the conceptual structures and patterns of beliefs, practices, institutions, images, etc. that provide a foundation for meaningful interaction in a given speech community" (Fillmore et al. 2003: p. 235). The approach is the product of Fillmore's 'semantics of understanding', or 'U-semantics' (Fillmore 1985), the aim of which is to give "an account of the ability of a native speaker to 'envision' the 'world' of the text under an interpretation of its elements" (p. 235).
The idea behind the notion of envisioning is that the speaker uses the frames that are 'evoked' by the words in the utterance to "[construct] an interpretation of the whole" (p. 233). It seems that this envisioning process should capture at least some aspects of meaning in context: understanding a sentence requires dealing with word senses; in frame semantics, it additionally involves semantic roles and their selectional constraints, contributing to utterance-specific token meanings. Finally, frames also include larger 'scripts' or 'scenarios' (Fillmore 1982: p. 130) that can account for wider, topical influences on meaning. However, the treatment of utterance understanding in frame semantics has mainly been contained within the subfield of 'frame semantics parsing', which deals with automatically identifying the frames that are evoked by a sentence, as well as their relationships (e.g. Das et al. 2014, Johannsen et al. 2015 and especially Ferraro & Van Durme 2016). While a frame semantics parser accounts for part of the envisioning process, it falls short of outputting a full description of a scene, including elements that may not be explicit in the original sentence.

Our main aim in this paper is to explore, and provide a formalization of, the process of envisioning, and the kinds of contextualized meanings that emerge from that process. In particular, we will say that meaning in context emerges from an interaction and integration of different constraints, some of them local, some of them more global. To give an overview of the type of phenomena we want to capture, we present next various types of meaning specializations occurring under the pressure of different constraints. We will then sketch how such constraints might be integrated in a probabilistic U-semantics which can in principle be used for both generation and understanding.

Let's first consider what contextual influences might play a role in shifting the meaning of a word. The first effect that comes to mind might be local context. Specific combinations of predicates and arguments activate given senses of the lexical items involved in the composition. This is known as 'selectional preference' and can be demonstrated with the following example:

(1) She drew a blade.

In this case, where words in both the predicate and the argument positions have multiple senses, the sentence can mean that the agent sketched either a weapon or a piece of grass, or that she randomly sampled either a weapon or a piece of grass, or that she pulled out a weapon. It is much less likely that she pulled out a piece of grass.
But word meaning is not only influenced by semantic-role neighbors. Global context is involved. (2) is a contrast pair adapted from an example by Ray Mooney (p.c.), with different senses of the word ball (sports equipment vs dancing event). Arguably, the sense of the predicate run is the same in (2a) and (2b), so the difference in the senses of ball must come from something other than the syntactic neighbors: some global topical context brought about by the presence of athlete in the first sentence, and violinist in the second.

(2) a. The athlete ran to the ball.
    b. The violinist ran to the ball.

There is even a whole genre of jokes resting on a competition between local and global topical constraints on meaning: the pun. Sentence (3) shows an example.

(3) The astronomer married the star.

This pun rests on two senses of the word star, which can be paraphrased as 'well-known person' and 'sun'. It is interesting that this sentence should even work as a pun: the predicate that applies to star, marry, clearly selects for a person as its theme. So if the influence of local context were to apply strictly before global context, marry should immediately disambiguate star towards the 'person' sense as soon as they combine. But the 'sun' sense is clearly present.[1] So local context and global topical context seem to be competing.

Finally, it is also notable that meaning can undergo modulation at a level more fine-grained than senses. Zeevat et al. (2017) distinguish 78 uses of the verb fall, many of them different but related, as in light fell on a table and her hair falls.

Having considered a range of constraints that act on different levels of granularity of a word's meaning, we now sketch how we think they might be encoded in a 'semantics of understanding', as we proposed earlier. To answer this question, we will posit a U-semantics structured as follows. First, we will have a standard frame semantics, and note that frames can range in 'size' from concepts to whole scenarios, potentially accounting for what we called 'local' and 'global' contexts (Fillmore 1982: p. 111).[2]

[1] In fact, our own intuitions about sentence (3) vary. One of us prominently perceives the reading where the astronomer weds a gigantic ball of fire; for the other one of us, the sentence oscillates between the two different senses of star.

[2] This notion of a frame differs from the one used by Löbner (2014) and Zeevat et al. (2017). There, a frame is a graph encoding attributes and their values.
Next, we will add a fine-grained, feature-based representation of lexical meaning to the standard framework to cater for property modulation, following e.g. Asher (2011), McNally & Boleda (2017), and Zeevat et al. (2017). Finally, we will link the cognitive semantics to a logical representation in order to model the interaction between actual utterances and the representations they evoke in the comprehender.

With this general structure in mind, we can hypothesize constraints of various types. At the conceptual level, certain senses (frames) of a word affect the senses of other lexical items in a sentence. So we require a notion of constraints acting upon combinations of frames. At the logical level, we must respect the formal structure of the utterance and ensure, for instance, that properties pertaining to a single discourse referent are compatible. So the entire constraint system needs to cross over the conceptual and logical levels (an issue also faced by other frameworks such as Asher 2011, McNally & Boleda 2017, Emerson 2018). Overall, the problem is to explain where and how such constraints are implemented and, most crucially, the dynamic process that will activate them given a certain utterance.

Our solution to the above problem is a probabilistic model which we will refer to as a situation description system. As mentioned previously, Fillmore's framework assumes that an interpreter actively 'envisions' the situation evoked by a sentence. Our claim is that envisioning naturally implements lexical constraints, and that modeling the imagination process required to arrive at an interpretation of the utterance automatically provides contextualised meaning representations. We will propose a formalization of the envisioning process – the situation description system – and provide a particular implementation of our proposal. We will illustrate how the system can imagine referents that are not mentioned in the observed utterance and how it builds particular lexical preferences in the process of interpretation – or indeed, remains agnostic about sense in the case of a pun.

Our framework relates to various strands of research in the linguistic and cognitive science literature. To close this introduction, we highlight how our proposal differs from – and complements – major avenues of research. We start by noting that Fillmorian frames can be about prototypical scenes, as well as cultural conventions. Because of that, any approach that builds on frame semantics will be pragmatic to some extent. We emphasize, however, that the kind of pragmatic reasoning involved in frame semantics is very different from what can be found in e.g. Rational Speech Act theory (Frank & Goodman 2012) and Relevance theory (Wilson & Sperber 2004), where the central mechanism is that the listener reasons over the intentions of the individual speaker, and this mechanism is used, among other things, to explain various types of word sense disambiguation.
It is certainly true that the listener reasons over the individual speaker, but we believe that much can already be accounted for by assuming that the listener draws on general conventions in the speaker's community to interpret an utterance, as McMahan & Stone (2015) also argue.

We also depart from Goodman and colleagues (Goodman et al. 2015, Goodman & Lassiter 2015), who do assume that concepts are generative devices that can simulate situations, but hypothesize a strict division between general cognition on the one hand and linguistic knowledge on the other. In their framework, general cognition can 'imagine' situations, but linguistic knowledge cannot do more than check the truth of utterances in the generated situations. We do not make such a distinction. On the contrary, we assume that lexical concepts are able to conjure up objects and events that realize those concepts. In this, our approach is more similar to Emerson (2018).[3]

[3] Like Goodman and colleagues, Emerson argues in favor of a strict division between linguistic knowledge and general cognition, with a linguistic knowledge that is discriminative rather than generative, but his model does not enforce this at a technical level.

Finally, our work can be related to a number of concerns raised by computational linguists about what it means for a system to perform 'Natural Language Understanding' (NLU). Bender & Koller (2020) argue that true NLU requires appropriately connecting linguistic form and meaning. They particularly emphasise the interplay of linguistic expressions with both conventional meanings (what we called 'timeless meaning') and communicative intents (what we called 'speech act meaning'). They propose a new version of the Turing test which specifically requires a model to 'at least hypothesize the underlying communicative intents' of speakers, especially the way that words refer to things and properties in their shared environment. In the same vein, Trott et al. (2020) call for systems that would adequately model construal, that is, the way that a speaker's choice of words reflects a specific way to frame a meaning and constrains the interpretation of the listener.

In what follows, we first give a high-level overview of a situation description system, highlighting its main components and their role in implementing constraints on various aspects of meaning (§2). We then proceed with a general formalization of the proposed framework in §3. Finally, §4 suggests an implementation based on a directed graphical model at three levels of complexity: with global constraints only, with semantic roles, and finally with concept combination at the individual feature level. We finish with illustrations of the system's behaviour, showing how the envisioning process 'fills in' the details of an utterance and how it deals with conflicting constraints, as in the pun example.
2 Model overview

In this section we describe our framework informally, using a simple example sentence, and show how it lends itself to the formalization of utterance understanding, as well as to the description of semantic constraints.

Our model is formulated as a system of constraints on utterance understanding, which we will refer to as a situation description system. In this model, constraints can be regarded as interacting forces which explain how the words of a speaker and the linguistic and world knowledge of a listener jointly contribute to utterance meaning.

We will formalize our model as a two-tier system (Pelletier 2017), combining cognitive aspects of utterance processing with logical description. That is, we have a system made of two components: a logical tier and a conceptual tier. We will assume that our logical tier takes the form of a restricted Discourse Representation Theory (DRT: Kamp 1981, Heim 1982) which we will refer to as eDRT (formal definitions will be provided later in §3). DRT is an ideal fit for our purposes because it specifically presents itself as a mental representation of an ongoing discourse, so it is a good interface between our conceptual tier and the logical aspects of the framework.[4]

[4] The present paper does not look at the discourse phenomena that DRT excels at, but we believe they can be integrated at a later stage.

We will define a situation description to consist of a conceptual representation G and an eDRS D. It represents a situation's logical form together with its associated conceptual content. A situation description can be regarded as a particular interpretation of an utterance. The situation description system defines a probability distribution over situation descriptions. That is, given an utterance such as The star shines, the system might attribute a fairly high probability to a situation description encoding a stargazing situation, and a lower probability to a situation description representing a famous person being particularly witty.

We will consider Fillmore's description of the understanding process as one in which

    "the lexical, grammatical and semantic material of the sentence [serves] as a 'blueprint' (to borrow an image from Fauconnier) off of which the interpreter constructs an interpretation of the whole. The interpreter accomplishes this by bringing to the 'blueprint' a great deal of knowledge, in particular knowledge of the interpretive frames which are evoked [...] but also including knowledge of the larger structure (the 'text') within which the sentence occurs" (Fillmore 1985: p. 233).
We will formalize this idea by having a situation description system produce descriptions that expand on a given utterance. The simple example above, the utterance u: The star shines, corresponds to the eDRS Du:

    [x, e | star(x), shine(e), Theme(e, x)]

One possible expansion could be the following expanded eDRS De:

    [x, e, y | star(x), shine(e), Theme(e, x), sun(x), sky(y), Location(e, y)]

where De adds to Du's content by specifying that the shining star happens to be a sun (thereby disambiguating star), and that the situation description contains an additional sky entity, with the Location of the shining event being the sky.

Figure 1 (a) illustrates a situation description system using the simple example from above, with the conceptual representation G on top and the eDRS at the bottom. The original Du is shown on the left-hand side of the eDRS in a black font. Its 'extended' version De corresponds to the whole eDRS shown in the figure, including the greyed-out referents and conditions. Unpacking the content of that eDRS, we have discourse referents x, e, respectively corresponding to a star entity and a shining event. For De, the discourse referents x, e, y include an additional sky entity. Similarly, we have conditions star(x), shine(e) and Theme(e, x) for Du, while De shows the additional sun(x) and sky(y), as well as Location(e, y).

We now go through the components of the conceptual tier, starting with individual concepts, then local constraints, and finally global constraints.
[Figure 1: An illustration of a situation description in our situation description system, given the utterance A star shines. Panel (a) shows the full description: a scenario mix over STARGAZE scenario tokens, the concepts Star, Shine and Sky, their feature vectors, the semantic roles Shine-THM and Shine-LOC, and the resulting Discourse Representation Structure. In this simple example, all concepts are associated with a single STARGAZING scenario, but more complex utterances may involve mixtures of scenarios. Panels (b) and (c) zoom in on an individual concept, and on a semantic role, respectively.]

We assume the existence of frames that can be described as concepts, which correspond to word senses. They are shown in light blue in Fig. 1, where part (b) zooms in on the concept STAR. Concepts can have instances; for example, a particular star might be an instance of STAR, or a particular shining event an instance of SHINE. The properties of such instances are described in feature vectors (purple in the figure), which describe meaning at a more fine-grained level than concepts.[5] Each concept instance on the conceptual side of the situation description corresponds to a discourse referent in the DRS. For example, the STAR instance corresponds to x. Some of the properties in the feature vector of the STAR instance are turned into conditions in the eDRS – including conditions not mentioned in the original utterance, in this case sun(x).

[5] We use the terms property and feature interchangeably here.

The local constraints that we consider here include selectional constraints imposed by semantic roles. Semantic roles are shown in darker blue in the figure, where part (c) zooms in on the influence that the THEME role of SHINE exerts on the STAR feature vector.
Semantic roles, too, are turned into conditions in the eDRS, such as Theme(e, x) in part (a) of the figure.

We finally come to the global constraints. We assume that the conceptual tier also includes frames that can be described as scenarios (light green in part (a) of the figure). Scenarios are larger settings that can encompass many entities and events, for instance a WEDDING scenario, or a STARGAZING scenario. We assume scenarios to be something like the generalized event knowledge of McRae & Matsuki (2009), who argue that event knowledge drives the listener's expectations for upcoming words.

Overall, our formalization uses exactly two kinds of frames: concept frames and scenario frames. This makes our formalization easier, but it is a simplification. FrameNet makes no such assumption; any frame can have a subframe (corresponding to our scenarios), or be a subframe (corresponding to our concepts), or both.[6]

[6] Another simplification that we adopt for now is that we assume that polysemy always manifests as a single word (or predicate in the DRS) being linked to different frames, like star the celestial object and star the person. Several recent proposals in lexical semantics describe a word as an overlay of multiple senses and sub-senses in a single fine-grained feature representation (McNally & Boleda 2017, Zeevat 2013).

In our example, we have three concepts STAR, SHINE and SKY, matching the repeated scenario STARGAZING. A more complex utterance might have a mix of multiple scenarios; for example, The robber paused in his flight to gaze at the shining star would probably involve an additional ROBBERY scenario. The extent to which different scenarios might be involved in a particular situation description is characterised by the 'scenario mix' shown at the top of Fig. 1. It is in fact this scenario mix that imposes a global constraint on understanding, in particular the constraint that scenario mixes preferably draw on only a few different scenarios rather than many. This means that different words in the utterance will tend to have senses that match the same scenario – like athlete and ball in sentence (2a). To stress this point: the preference for sparse scenario mixes is the only truly global constraint. All other components of the conceptual tier can be considered as lexicalized – concepts evoked by words, and scenarios linked to concepts.

Constraints are shown as lines in Fig. 1, linking components within and across tiers. As we will explain in more detail in §4, our situation description system is a directed graphical model, and our constraints are conditional dependencies. But for now, we can simply regard them as the reciprocal influence that two system components have over each other. For example, the presence of a STARGAZING scenario in the situation description promotes the presence of related concepts such as STAR, SHINE or SKY, making them more likely than, say, the concepts CAKE or FLOWER, thus reflecting world knowledge about stargazing scenarios (the listener is more likely to imagine a sky than a cake when thinking of stargazing). Conversely, knowing that the situation description involves a concept STAR might make us more confident about being in a STARGAZING scenario than, say, a SWIMMING scenario.
A situation description system is formulated as a collection of probability distributions, which express the constraints shown as dotted lines in the figure. These distributions jointly determine the probability of any situation description. So the entire situation description in Fig. 1 is associated with some probability according to the situation description system, and so are alternative situation descriptions for the same utterance The star shines. This concludes our brief sketch of the formalism, which we work out in detail in §4.

3 The situation description framework

In this section, we give a generic definition of our framework, with the idea that it could be the basis for different types of more specific implementations. Since our proposal defines understanding in terms of 'situation descriptions', we will first discuss why we think that cognitive representations of situations are the right building blocks for formalising understanding (§3.1). We will then proceed with the actual definition of situation description systems (§3.2).

3.1 Probability distributions over situation descriptions

We want to model language understanding in terms of probabilities because probabilistic systems provide a general and extensible framework in which we can describe interacting constraints of different strengths. There are different options for the sample space of our probability distribution: for example, it could be a sample space of worlds, or of situations. We next discuss why we choose neither of those two options but use situation descriptions instead. Probability distributions over worlds are used in van Eijck & Lappin (2012), van Benthem et al. (2009), Zeevat (2013), Erk (2016), and Lassiter & Goodman (2015), and probability distributions over situations in Emerson (2018) and Bernardy et al. (2018).

Given a probability distribution over worlds, Nilsson (1986) simply defines the probability of a sentence ϕ as the summed probability of all the worlds that make ϕ true:

    p(ϕ) = ∑_{w : ⟦ϕ⟧_w = 1} p(w)

This has the desirable property that the truth of sentences is evaluated completely classically, as all probabilistic effects come from the probabilities of worlds.
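To make this definition concrete, here is a toy rendering in WebPPL, the probabilistic programming language we use for the illustrations in §4; the four worlds, their probabilities, and the truth values of ϕ are invented purely for the example.

```
// A toy version of Nilsson's definition over an invented finite sample
// space: each world carries its probability and the truth value of a
// sentence phi in that world.
var worlds = [
  {p: 0.4, phiTrue: true},
  {p: 0.3, phiTrue: false},
  {p: 0.2, phiTrue: true},
  {p: 0.1, phiTrue: false}
];
// p(phi) = sum of p(w) over worlds w where phi is true: 0.4 + 0.2 = 0.6
var sentenceProb = sum(map(function(w) { return w.phiTrue ? w.p : 0; },
                           worlds));
```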
A problem with using a sample space of worlds is that a world, or at least the real world, is an unimaginably gigantic object. This is, in fact, the reason why Cooper et al. (2015) say it is unrealistic to assume that a cognizer could represent a whole world in their mind, let alone a distribution over worlds. Their argument is that a world is a maximal set of consistent propositions (Carnap 1947), and no matter the language in which the propositions are expressed, we cannot assume that a cognizer would be able to enumerate them. But cognitive plausibility is not the only problem. Another problem is that we do not know enough about a world as a mathematical object. Rescher (1999) argues that objects in the real world have an infinite number of properties, either actual or dispositional. This seems to imply that worlds can only be represented over an infinite-dimensional probability space. When defining a probability measure, it is highly desirable to use a finite-dimensional probability space – but it is not clear whether that is possible with worlds. We cannot know for certain that a world cannot be 'compressed' into a finite-dimensional vector, but that is because we simply do not know enough about what worlds are to say what types of probability measures we could define over them.

Situations, or partial worlds, may be smaller in size, but they still present a similar problem: how large exactly is, say, a situation where Zoe is playing a sonata? Both Emerson (2018) and Bernardy et al. (2018) assume, when defining a probability distribution over situations, that there is a given utterance (or set of utterances) and that the only entities and properties present in the situation are the ones that are explicitly mentioned in the utterance(s). But arguably, a sonata-playing situation should contain an entity filling some instrument role, even if it is not explicitly mentioned. Going one step further, Clark (1975) discusses inferences that are "an integral part of the message", including bridging references such as "I walked into the room. The windows looked out to the bay." This raises the question of whether any situation containing a room would need to contain all the entities that are available for bridging references, including windows and even possibly a chandelier. (Note that there is little agreement on which entities should count as available for bridging references: see Poesio & Vieira 1988.) The point is not that for Zoe is playing a sonata there is a particular fixed-size situation that comprises entities beyond Zoe and the sonata; the point is that there is no fixed size to the situation at all.

Our solution to the above issues is to use a probability distribution over situation descriptions, which are objects in the mind of the listener rather than in some actual state of affairs. As human minds are finite in size, we can assume that each situation description only comprises a finite number of individuals, with a finite number of possible properties – this addresses the problem that worlds are too huge to imagine. But we also assume that the size of situation descriptions is itself probabilistic rather than fixed, and may be learned by the listener through both situated experience and language exposure. Doing so, we remain agnostic about what might be pertinent for describing a particular situation.
3.2 Definition of a situation description system

We now define situation descriptions and situation description systems. Our situation descriptions are pairs of a conceptual representation and a discourse representation structure (DRS). From the perspective of the speaker, we will posit that there is a certain conceptual content that underlies the words that they utter. From the perspective of the listener, the understanding process involves hypothesizing the conceptual content that might have resulted in the utterance. We express this hypothesis as a probability distribution over situation descriptions.

We first define the logical fragment of DRT that we use. Then we formalize situation descriptions, including the 'glue' between utterance and conceptual components. Finally, we define situation description systems as probability distributions over situation descriptions, and we show how language understanding can be described as conditioning a situation description system on a given utterance.

A DRT fragment. The context influences on lexical meaning that we want to model exhibit complex behaviour even in simple linguistic structures – the examples in the introduction were simple predicate-argument combinations and modifier-noun phrases with intersective adjectives. With this in mind, we work with a very simple fragment of DRT:

• we only have conjunctions of DRS conditions;
• negation only scopes over individual conditions (to avoid disjunction and quantificational effects);
• we only have unary and binary (neo-Davidsonian) predicate symbols;
• the fragment does not contain constants.

Let REF be a set of reference markers, and PS a set of predicate symbols with arities of either one or two. In the following definition, xi ranges over the set REF, and F over the set of predicate symbols PS. The language of eDRSs (existential and conjunctive DRSs) is defined by:

    conditions  C ::= ⊺ | Fx1...xk | ¬Fx1...xk
    eDRSs       D ::= ({x1,...,xn}, {C1,...,Cm})
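As a concrete anchor for the sketches in the rest of this paper, this fragment can be encoded directly as data. The encoding below (the field names and the helper functions) is our own illustrative choice, not part of the formalism.

```
// One possible WebPPL/JS encoding of the fragment: a condition records a
// predicate symbol, its argument list, and a polarity; an eDRS pairs a
// list of reference markers with a list of conditions.
var cond = function(pred, args, positive) {
  return {pred: pred, args: args, positive: positive};
};
var eDRS = function(refs, conds) { return {refs: refs, conds: conds}; };

// The utterance eDRS Du for "The star shines" from Section 2:
var Du = eDRS(['x', 'e'],
              [cond('star', ['x'], true),
               cond('shine', ['e'], true),
               cond('Theme', ['e', 'x'], true)]);
```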
Situation descriptions (SDs). A situation description combines a conceptual representation with a logical form. In the most general case, given a set Fr of conceptual components, a language CON of conceptual representations is a subset of P(Fr), the powerset of Fr. To connect a conceptual representation and an eDRS, we define a mapping g from conditions of the eDRS to conceptual components. More specifically, g is a function that links each condition in D to a single node in the conceptual structure. In §4, this will be a feature vector node for unary conditions, and a semantic role node for binary conditions (as illustrated in Figs. 3 and 5).

Definition (Situation description). Given a language CON of conceptual representations, a language of situation descriptions is a subset of the set of all tuples ⟨G, D, g⟩ for a conceptual representation G ∈ CON, an eDRS D, and a partial mapping g from conditions of D to components of G such that if D = ({x1,...,xn}, {C1,...,Cm}), then for each i ∈ {1,...,m}: Ci is in the domain of g iff Ci ≠ ⊺.

Situation description systems. We characterize an individual's conceptual knowledge as a situation description system, a probability distribution over all situation descriptions that the person can possibly imagine.[7]

[7] We do not assume that a person would hold such a probability distribution in mind explicitly, but we do assume that they have the mental ability to generate situation descriptions according to this knowledge, and that they will be more likely to generate some situation descriptions than others.

One technical detail to take care of is that a situation description system must not give separate portions of probability mass to situation descriptions that are equivalent. For now, we will define equivalence of eDRSs; we will add equivalence of conceptual structures below. Two eDRSs are equivalent if they are the same modulo a variable renaming function v, as in Table 1:

    Table 1: Two equivalent eDRSs
    [x | sheep(x), fluffy(x)]    [y | sheep(y), fluffy(y)]

So we define two situation descriptions S1, S2 as being equivalent, written S1 ≡ S2, if their conceptual representations are the same, their eDRSs are the same modulo variable renaming, and the g-mappings are the same modulo variable renaming on the conditions (see definition in appendix).
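For this simple fragment, equivalence modulo renaming can be checked by canonicalizing variable names. The sketch below, reusing the cond/eDRS encoding from above, renames markers to v1, v2, ... in the order they appear in the refs list; this suffices for examples like Table 1, though in general one would have to search over renamings.

```
// Canonical renaming: map each reference marker to v1, v2, ... in order
// of appearance, rename all conditions accordingly, then compare.
var canonicalize = function(d) {
  var renaming = _.zipObject(
    d.refs,
    mapIndexed(function(i, r) { return 'v' + (i + 1); }, d.refs));
  return eDRS(map(function(r) { return renaming[r]; }, d.refs),
              map(function(c) {
                return cond(c.pred,
                            map(function(a) { return renaming[a]; }, c.args),
                            c.positive);
              }, d.conds));
};
var equivalentEDRS = function(d1, d2) {
  return _.isEqual(canonicalize(d1), canonicalize(d2));
};
```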
We will define a situation description system as only involving a single representative of each equivalence class of situation descriptions.

Definition (Situation description system). Given a language 𝒮 of situation descriptions over CON, a situation description system over 𝒮 is a tuple ⟨S, ∆⟩ where ∆ is a probability distribution over S and where S ⊆ 𝒮 such that

• for any S1, S2 ∈ S, S1 ≢ S2;
• there is a dimensionality N and a function f : S → ℝ^N such that f is injective.

A situation description system ⟨S, ∆⟩ is called conceptually envisioned if there are probability distributions p1, p2 such that for any ⟨G, D, g⟩ ∈ S with D = ({x1,...,xn}, {C1,...,Cm}),

    ∆(⟨G, D, g⟩) = p1(G) ∏_{i=1}^{m} p2(Ci | g(Ci))

The second bullet point in the definition of S ensures that S can be embedded into a finite-dimensional space, addressing our concerns about finiteness from §3.1. A conceptually envisioned situation description system is one that can be decomposed into two probability distributions: one that is a distribution over conceptual representations (p1), and one that describes each DRS condition as probabilistically dependent solely on its g-linked conceptual component (p2). This is the decomposition that we will use in §4, and that we have illustrated in §2.

Situation description system for a given utterance. We have previously defined situation description systems in their most general form. We centrally want to use situation description systems to describe the process of understanding, where the listener has to envision a probability distribution over situation descriptions given a specific utterance. As we have discussed above, we want a situation description system to be able to expand on a given utterance, to construct an "interpretation of the whole" from the "blueprint" that is the utterance (Fillmore 1985: p. 233). We can do this simply by saying that the situation description system for a given utterance ascribes non-zero probabilities only to situation descriptions whose eDRS "contains" the utterance. To be able to specify this relation, we first need to define a subset operation on eDRSs modulo variable renaming. For eDRSs D1 = (X1, C1) and D2 = (X2, C2), we write D1 ⫅ D2 if there is an eDRS D = (X, C) such that D1 ≡ D and X ⊆ X2 and C ⊆ C2.
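The sketch below implements a simplified version of this relation on the encoding from §3.2: it tests plain inclusion of markers and conditions, that is, it assumes the larger eDRS reuses the utterance's variable names (as De reuses Du's in §2); the full relation would additionally search over injective renamings.

```
// Simplified ⫅ check: every reference marker and every condition of d1
// must literally occur in d2 (no renaming search).
var subseteq = function(d1, d2) {
  return _.every(d1.refs, function(r) { return _.includes(d2.refs, r); }) &&
         _.every(d1.conds, function(c) {
           return _.some(d2.conds, function(c2) { return _.isEqual(c, c2); });
         });
};

// The expansion De from Section 2 "contains" the utterance Du:
var De = eDRS(['x', 'e', 'y'],
              Du.conds.concat([cond('sun', ['x'], true),
                               cond('sky', ['y'], true),
                               cond('Location', ['e', 'y'], true)]));
subseteq(Du, De)  // true
```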
To restrict probability mass to eDRSs that "contain" the utterance, we use Dirac's delta, a probability distribution that puts all its probability mass in one point. With this, we can assign zero probability to cases we are not interested in. We now have all the pieces in place to say what the situation description system given an observed utterance Du should be:

Let ⟨S, ∆⟩ be a situation description system over 𝒮, and let Du be an eDRS such that there is some ⟨G, D, g⟩ ∈ S with Du ⫅ D. Then the situation description system given utterance Du is a tuple ⟨S, ∆Du⟩ such that for any ⟨G, D, g⟩ ∈ S,

    ∆Du(⟨G, D, g⟩ | Du) ∝ ∆(⟨G, D, g⟩) δ(Du ⫅ D)

That is, ∆Du assigns non-zero probabilities only to situation descriptions that "contain" Du. In more detail, it assigns a zero probability to all situation descriptions that do not "contain" Du, and normalizes all other probabilities so that all probabilities sum to 1 again. This is the probabilistic meaning of the utterance for the listener.

This concludes the formal exposition of our general framework. Using the above definitions, we will describe in the following section a situation description system where DRS conditions are conditionally dependent on frames that express lexical-conceptual and world knowledge.
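In a probabilistic program, the delta term corresponds to a hard condition: sampled situation descriptions that fail the containment check are discarded, and the remaining probability mass is renormalized. A minimal sketch, assuming a hypothetical prior() that samples situation descriptions (of the kind constructed in §4) as objects with a drs field, and the subseteq check from above:

```
// ∆_Du as rejection: sample from ∆, keep only SDs whose eDRS contains Du;
// Infer renormalizes the surviving probability mass.
var understand = function(Du) {
  return Infer({method: 'rejection', samples: 1000}, function() {
    var sd = prior();                 // sd = ⟨G, D, g⟩, with sd.drs = D
    condition(subseteq(Du, sd.drs));  // the δ(Du ⫅ D) term
    return sd;
  });
};
```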
4 Using situation description systems to model word meaning in context

We will now use the general framework introduced in the previous section to implement a specific system of probabilistic constraints, with a view to modelling word meaning in context. We will introduce a situation description system with a conceptual tier made of frames and of property vectors, and structured as a graph. We will show that our notion of situation description system lets us restrict inferences based on both global and local constraints, guiding the envisioning process towards the appropriate meanings. At the global level, we will inspect the role of scenarios in restricting word senses. At the local level, we will inspect selectional constraints that semantic roles impose on role fillers, and constraints on modifier-head combinations.

We will describe a conceptual representation as a directed graphical model (Koller & Friedman 2009), or Bayesian network. A directed graphical model describes a factorization of a joint probability over random variables: each node is a random variable, directed edges are conditional probabilities, and the graph states that the joint probability over all random variables factorizes into the conditional probabilities in the graph. The illustration in Fig. 1 is a simplified version of the graphical models that we introduce in the current section, and the edges are conditional probabilities. For example, each concept in the figure is conditioned on the scenario that it links to; that is, each concept is sampled from a probability distribution associated with the linked scenario.[8]

[8] Note that inference in a directed graphical model does not always proceed in the direction of the edges. If a scenario is known, we can probabilistically infer which concept is likely to be sampled from it. But likewise, if we know the concept, we can probabilistically infer from which scenario it is likely to be sampled.

We will define our system of constraints in three stages of increasing complexity, starting with global constraints only, then adding semantic roles, and finally modifier-head combinations. At each stage, we will illustrate the behaviour of the system with concrete examples implemented in the probabilistic programming language WebPPL (Goodman & Stuhlmüller 2014).
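As a warm-up for these illustrations, the point of footnote 8 can be seen in a few lines of WebPPL; the scenario and concept inventories are the toy ones introduced below, and the uniform distributions are our own simplifying assumption.

```
// Forward model: a scenario samples one of its concepts. Conditioning on
// the concept makes Infer reason against the direction of the edge.
var conceptGivenScenario = function(scenario) {
  return scenario === 'BASEBALL'
    ? uniformDraw(['BAT-STICK', 'PLAYER'])
    : uniformDraw(['BAT-ANIMAL', 'VAMPIRE']);
};

// p(scenario | concept = BAT-ANIMAL)
var posterior = Infer({method: 'enumerate'}, function() {
  var scenario = uniformDraw(['BASEBALL', 'GOTHIC']);
  var concept = conceptGivenScenario(scenario);
  condition(concept === 'BAT-ANIMAL');
  return scenario;
});
// All posterior mass falls on GOTHIC: BASEBALL never samples BAT-ANIMAL.
```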
4.1 System of constraints, first stage: scenario constraints

We start out with a system that only has global constraints, no local ones. We use conceptual representations that consist of scenarios, concepts, and feature vectors, but no semantic roles, as shown in Fig. 2. As we mentioned in §2, global constraints arise here from the 'scenario mix' having a preference for fewer scenarios. They exert an influence on the co-occurrence of word senses (lexicalized concepts) in an utterance: the 'sun' sense of star is more likely to occur in a situation description in which the STARGAZE scenario is active than in a discussion of theater, where the 'person' sense would be more salient.[9]

[Figure 2: Situation description from Fig. 1, without semantic roles: the scenario mix, STARGAZE scenario tokens, the concepts Star, Shine and Sky, their feature vectors, and the Discourse Representation Structure with referents x, e, y and conditions star(x), shine(e), sun(x), sky(y).]

[9] This may be an unusual constellation from a linguistic point of view, without the familiar local constraints on word meaning. But from a probabilistic modeling point of view, this system is straightforward, a simple variant on topic models (Blei et al. 2003).

In what follows, we formally define a language CON1 of conceptual representations that comprises scenarios, concepts, and properties. We then define languages SD1 of situation descriptions over CON1 and eDRSs. We finally construct a particular conceptually envisioned situation description system over an SD1 language. The construction is illustrated step by step with toy examples involving the two senses of bat.

4.1.1 Definitions

The language CON1 of conceptual representations. We now formalize the shape of conceptual representations such as the one in Fig. 2: they form a graph with nodes for scenario tokens, concept tokens, and feature vectors, where each scenario token samples one concept token, and each concept token samples a feature vector.

Definition (Language CON1 of conceptual representations). We assume a finite set FS of scenarios, a finite set FC of concepts, and a finite set FF of features, where the three sets are disjoint. We write fF for the set of binary vectors of length |FF| (or, equivalently, the set of functions from FF to {0,1}). Then the language CON1 of conceptual representations is the largest set of directed graphs (V, E, ℓ) with node set V, edge set E, and node labeling function ℓ : V → FS ∪ FC ∪ fF such that

• if v is a node with a label in FS, then v has exactly one outgoing edge, and the target has a label from FC;
• if v is a node with an FC label, then it has exactly one incoming edge, and the source has a label from FS; it further has exactly one outgoing edge, and the target has a label from fF;
• if v is a node with a label in fF, then v has exactly one incoming edge and no outgoing edges; the source on the incoming edge has a label from FC.

Note that each representation in CON1 can be described as a set, as defined in §3, where the set members are nodes, edges, and node labels.
Languages SD1 of situation descriptions. Before we define our languages of situation descriptions, we need a few notations in place. First, we had required, in the definition of situation description systems in §3.2, that it must be possible to embed situation descriptions in a finite space. For that reason, we assume from now on that PS, the set of predicate symbols used in eDRSs, is at most as large as ℝ. In the definition below, we restrict the size of the conceptual representation to some maximal number M of scenario tokens, and the size of the eDRS to some maximal number N of referents and of conditions per referent. (We use a single upper limit N for both referents and conditions per referent in order to keep the number of parameters down; the number of conditions per discourse referent does not have to be the same as the number of discourse referents, only the upper limit is re-used.)

As mentioned previously, the conceptual representation characterizes an individual as a feature vector which matches a particular referent in the eDRS. We will want to say that all unary conditions in the eDRS that mention a particular referent x are probabilistically dependent on the same feature vector node, and that all conditions that depend on that feature vector are about x. In order to express that, we define Var(C) for an eDRS condition C as its sequence of variables:

• if C = ⊺, then Var(C) = ⟨⟩;
• if C = Fx1...xk, then Var(C) = ⟨x1,...,xk⟩;
• if C = ¬Fx1...xk, then Var(C) = ⟨x1,...,xk⟩.

We call the length of Var(C) the arity of the condition C; we call a condition unary if the length of Var(C) is 1. Later in this section, we will encounter binary conditions generated by role nodes, for which the length of Var(C) is 2.

Definition (Language SD1^{M,N} of situation descriptions). For M, N ∈ ℕ, SD1^{M,N} is the largest set of situation descriptions ⟨G, D, g⟩ such that

• G ∈ CON1 has at most M nodes with labels from FS;
• D is an eDRS that contains at most N discourse referents and at most N unary conditions per discourse referent, and no conditions with arity greater than 1;
• for any condition C of D, g(C) has a label in fF;
• g is a function from conditions of D to nodes of G such that for any two unary conditions C1, C2 of D, g(C1) = g(C2) iff Var(C1) = Var(C2).
Lemma (Finite embedding of SD1 situation descriptions). For any M, N ∈ ℕ, any situation description in SD1^{M,N} can be embedded in a finite-dimensional space.

Proof. Let ⟨G, D, g⟩ ∈ SD1^{M,N}. Then G has at most M nodes with labels from FS, and the same number of nodes with labels in FC, and the same number of nodes with labels in fF. So there is a finite number of nodes and edges in the graph. The set of possible node labels is also finite (as fF is finite), so G can be embedded in a finite-dimensional space. The number of discourse referents and conditions in D is finite. As the set of predicate symbols is at most as large as ℝ, each predicate symbol can be embedded in a single real-valued dimension.

We need to amend our definition of equivalence classes of situation descriptions. Equivalence among eDRSs is as before, based on variable renaming. But now we additionally have an equivalence relation on the conceptual representation graphs. Two graphs in CON1 are equivalent if there is a homomorphism between them that respects node labels. Two situation descriptions ⟨G1, D1, g1⟩ and ⟨G2, D2, g2⟩ in SD1^{M,N} are equivalent, ⟨G1, D1, g1⟩ ≡ ⟨G2, D2, g2⟩, iff

• D1 ≡ D2 with variable mapping a such that a(D1) = D2,
• G1 ≡ G2 with graph homomorphism h mapping the nodes of G1 to nodes of G2 in a way that respects node labels, and
• for any condition C of D1, h(g1(C)) = g2(a(C)).

4.1.2 A conceptually envisioned situation description system with global constraints

In §3.2, we defined a conceptually envisioned situation description system as being decomposable into two probability distributions: one that is a distribution over conceptual representations (p1), and one that relates each DRS condition to a conceptual node (p2). We will now show that it is possible to define a situation description system that satisfies this definition, that is, one that decomposes into the appropriate p1 and p2 distributions. Formally, we want to show that:

Lemma. There are M, N ∈ ℕ such that there is a situation description system over SD1^{M,N} that is conceptually envisioned.

What follows is a construction of the situation description system, which also serves as the proof for this lemma.
We have already shown that any situation description in SD1^{M,N} has a finite embedding. It remains to show that there is a situation description system, a joint probability distribution over random variables, that operates over some subset S of SD1^{M,N} and that can be factored in the right way:

• there are probability distributions p1, p2 such that the probability of any ⟨G, D, g⟩ ∈ S, with D having conditions C1,...,Cm, can be characterized as p1(G) ∏_{i=1}^{m} p2(Ci | g(Ci));
• S does not contain more than one representative from any SD equivalence class.

M is the number of scenario tokens, and by extension the number of concept tokens and feature vectors. (This is illustrated in Fig. 2, where each referent is associated with a unique token at each level of the conceptual representation.) We will talk of 'representations of individuals' to refer to the combination of a scenario token, a concept token and a feature vector. We do not specify any restrictions on M, the number of individual representations. As for N, the number of discourse referents and the number of conditions per discourse referent, we only consider values of N with N ≥ M, because we need a one-to-one correspondence between discourse referents and feature vector nodes, and N ≥ |FF|, as each feature q in FF will be associated with a predicate corresponding to q, so a feature vector can generate at most |FF| conditions.

We now describe a particular decomposition of the joint probabilities of random variables in the model, in the form of a 'generative story'. A generative story verbalizes the conditional probabilities in a pseudo-procedural form, as a series of sampling steps. However, this is just a procedural way of presenting a declarative formulation, not a description of a procedure. And in fact, in §4.1.3 below we draw inferences in the opposite direction from the 'steps' of the generative story. Our generative story samples some situation description in SD1^{M,N}, starting at the top of the conceptual graph with scenario nodes, going down through concept nodes, and ending with feature vectors which then probabilistically condition the content of the eDRS. The associated steps are defined as follows:

A: Draw a scenario mix, and from it, a collection of scenario frames (where the same scenario frame can appear multiple times to accommodate multiple individuals drawn from the same type).

B: For each token of a scenario frame, draw a token of a concept frame.

C: For each token of a concept frame, draw a feature vector.
At this point, we have a conceptual representation from which conditions in the eDRS can similarly be sampled:

D: For any discourse referent x, there is a single graph node with a label from fF from which we sample all the (unary) conditions involving x, overall at most N conditions.

We now formalize this generative story.

A. Drawing a collection of scenarios. We sample scenarios from some distribution θ (the 'scenario mix' in Fig. 1), where the overall shape of θ controls how many different scenarios will be involved in the situation description. To obtain θ, we assume an |FS|-dimensional Dirichlet (which can be viewed as a distribution over parameters for a multinomial) with a concentration parameter α < 1, which will prefer sparse distributions (distributions that assign non-zero probabilities only to few outcomes). So θ ∼ Dir(α). We thus obtain a distribution giving some non-zero probability mass to a limited number of scenarios. As mentioned before, the concentration parameter α is the only truly global constraint we consider. All other constraints are specific to particular concepts and particular scenarios.

We draw a number m ∈ {1,...,M}, the number of individuals in the conceptual representation, from a discrete uniform distribution. This will also be the number of discourse referents in the situation description. We then draw a multiset H (a bag, meaning that an element can appear multiple times) of m node labels from FS from a multinomial with parameters θ. This gives us m independent draws, with possible repeats of scenario labels, where the probability of a collection of scenarios is independent of the order in which scenarios were drawn from the multinomial: H ∼ Multinomial(m, θ).
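A minimal WebPPL sketch of step A, using the toy inventory of the illustration below (FS = {BASEBALL, GOTHIC}, M = 4, α = 0.5); the encoding of the scenario collection as a plain array is our own choice.

```
var FS = ['BASEBALL', 'GOTHIC'];
var M = 4;

var drawScenarios = function() {
  // theta ~ Dir(alpha): a sparse scenario mix for alpha < 1
  var theta = dirichlet(Vector([0.5, 0.5]));
  // m ~ Uniform({1, ..., M}): number of individuals
  var m = 1 + randomInteger(M);
  // H ~ Multinomial(m, theta): m independent draws of scenario labels
  return repeat(m, function() { return FS[sample(Discrete({ps: theta}))]; });
};
```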
Illustration: In the WebPPL programming language, we can directly implement the sampling process, and draw samples that are situation descriptions according to their probability under the situation description system. We use a tiny toy collection of two scenarios, BASEBALL and GOTHIC. The BASEBALL scenario can sample the concepts BAT-STICK or PLAYER with equal probability (giving 0 probability mass to the other concepts). Similarly, the GOTHIC scenario can sample the concepts BAT-ANIMAL or VAMPIRE. We set the maximum number of scenario tokens (individuals) to M = 4 and sample 2,000 situation descriptions. With a Dirichlet concentration parameter of α = 0.5, we prefer sparser distributions θ, so the majority of our sampled situation descriptions contain one scenario frame only: 38% of all 2,000 SDs include BASEBALL only, 36% GOTHIC only, and 27% include both. If we lower the concentration parameter still more, to α = 0.1, the preference for fewer scenarios becomes more pronounced: we obtain 48% GOTHIC, 44% BASEBALL, and 8% of sampled SDs containing both scenarios.

Table 2 shows the 10 most likely scenario collections in our 2,000 sampled SDs for α = 0.5, with their probabilities. We see for instance that with a probability of approximately 0.13, a single GOTHIC scenario token was sampled. A single BASEBALL scenario token was likewise sampled with a probability of about 0.13. The most likely mixed scenario collection, at a probability of 0.057, has two tokens, one BASEBALL and one GOTHIC.

    Table 2: Ten most likely sampled scenario collections among 2,000 SDs
    (FS = {BASEBALL, GOTHIC}; FC = {BAT-STICK, BAT-ANIMAL, VAMPIRE, PLAYER})

    scenarios                                      p
    GOTHIC                                      0.1295
    BASEBALL                                    0.1280
    BASEBALL, BASEBALL                          0.0930
    GOTHIC, GOTHIC                              0.0895
    BASEBALL, BASEBALL, BASEBALL                0.0880
    GOTHIC, GOTHIC, GOTHIC, GOTHIC              0.0695
    BASEBALL, BASEBALL, BASEBALL, BASEBALL      0.0675
    GOTHIC, GOTHIC, GOTHIC                      0.0675
    BASEBALL, GOTHIC                            0.0565
    BASEBALL, GOTHIC, GOTHIC                    0.0515

B. Drawing a concept frame for each scenario frame. The next step is for each scenario frame to sample a concept. We write ĥ for tokens of scenario frame h, and similarly ẑ for tokens of concept frame z. We assume that each scenario type h is associated with a categorical distribution, parametrized by a probability vector φh, over the concept frame labels in FC. For each scenario token ĥ of h in H, we draw a concept frame type z_ĥ from the categorical distribution associated with h, z_ĥ ∼ Categorical(φh), for a total of m concept tokens.

Illustration: Re-using the same 2,000 sampled scenario collections (for α = 0.5), we sample one concept per scenario using a uniform distribution. Table 3 shows the ten most likely concept collections. As expected given the distribution over scenarios, which prefers single types, we have coherent concept groups: bat-animals co-occur with vampires, while bat-sticks co-occur with players.

    Table 3: Ten most likely sampled concept collections among the same 2,000 SDs

    concepts                                       p
    BAT-ANIMAL                                  0.0670
    BAT-STICK                                   0.0650
    PLAYER                                      0.0630
    VAMPIRE                                     0.0625
    BAT-STICK, PLAYER                           0.0470
    BAT-ANIMAL, VAMPIRE                         0.0440
    BAT-STICK, PLAYER, PLAYER                   0.0350
    BAT-STICK, BAT-STICK, PLAYER                0.0305
    PLAYER, PLAYER                              0.0265
    BAT-ANIMAL, VAMPIRE, VAMPIRE                0.0255
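Continuing the sketch, step B maps each scenario token to a concept token; the uniform φ_h of the illustration is hard-coded here.

```
// phi_h for the toy inventory: each scenario draws uniformly from the
// two concepts it can evoke.
var phi = {
  'BASEBALL': ['BAT-STICK', 'PLAYER'],
  'GOTHIC':   ['BAT-ANIMAL', 'VAMPIRE']
};

// Step B: one concept token per scenario token.
var drawConcepts = function(scenarioTokens) {
  return map(function(h) { return uniformDraw(phi[h]); }, scenarioTokens);
};

// e.g. drawConcepts(drawScenarios()) yields collections like those in Table 3
```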
C. Drawing a feature vector for each concept frame. We assume that each concept frame type z is associated with a vector τz of |FF| Bernoulli probabilities, which lets us sample, for each feature, whether it should be true or false. Abusing notation somewhat by also viewing τz as a function, we write τz(q) for the value in τz that is the Bernoulli probability of feature q ∈ FF. The sampling of feature values is illustrated in Fig. 3, which shows the properties of a particular star entity which happens to be a sun and not to be bright.[10] In this paper, we restrict ourselves to binary properties for simplicity. In the figure, τ specifies that the properties star and object have to be true of any entity that is an instance of STAR, indicated by a τ probability of 1.0, while stars may or may not be bright, with a τ value of 0.4.[11]

[Figure 3: A feature vector. The STARGAZE scenario samples the concept Star, which samples a feature vector via the probabilities p_truth (star: 1.0, object: 1.0, bright: 0.4, ...); the sampled instance has the properties star, object, sun and consisting-of-plasma, and lacks bright, animate and furry (a minus sign in front of a property indicates that the instance does not have that property). Via the salience probabilities p_salience, only some components of the feature vector are turned into conditions in the eDRS, here star(x) and sun(x).]

[10] Note that we are showing here ambiguous properties such as bright, which could mean smart or luminous, but properties can also be disambiguated on the conceptual side without any change to the framework.

[11] Intermediate probabilities like the one for bright allow us to represent typicality or prevalence of properties, while probabilities of 1.0 or 0.0 put hard constraints on category membership. We need to represent typicality because we want the cognizer to be more likely to imagine typical situation descriptions than atypical ones. As a side note, Kamp & Partee (1995) stress the distinction between typicality and membership in a concept, while Hampton (2007) suggests that typicality and membership are both functions of a single cline of similarity to a prototype. Our formalization accommodates both views. Strict property constraints, as for the object property of STAR, correspond to strict membership: STAR will never have an instance that is not an object. Soft property constraints, as for bright above, allow for degrees of typicality of STAR instances.
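A sketch of step C for one concept; the τ values below are invented for BAT-ANIMAL, except have_wings, which the feature norms mentioned in the illustration below put at 1.0 (the full τ distributions are given in the paper's appendix, Table 7).

```
var FF = ['bat', 'vampire', 'player', 'have_wings',
          'fly', 'humanlike', 'athletic', 'wooden'];

// tau_z: Bernoulli probability of each feature, per concept (toy values).
var tau = {
  'BAT-ANIMAL': {bat: 1, vampire: 0, player: 0, have_wings: 1,
                 fly: 0.95, humanlike: 0, athletic: 0, wooden: 0}
};

// Step C: sample a binary feature vector f for a concept token of type z.
var drawFeatureVector = function(z) {
  return _.zipObject(FF, map(function(q) {
    return flip(tau[z][q]) ? 1 : 0;
  }, FF));
};
```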
For each concept frame token ẑ that was previously sampled, we sample a feature vector f (which we write as a function from FF to {0,1}). The probability of sampling feature vector f for concept z is

    (1)  p(f | z) = ∏_{q ∈ FF : f(q)=1} τz(q) · ∏_{q ∈ FF : f(q)=0} (1 − τz(q))

That is, p(f | z) is the product of the probability of truth of the features that are true in f and the probability of falsehood of the features that are false in f.

Illustration: Re-using the same 2,000 scenario and concept collections obtained in the previous steps, we now sample a feature vector for each individual. To do so, we need τ distributions for our four concepts of interest. We can partially build on 'real' distributions by considering the quantifiers that Herbelot & Vecchi (2016) added to the feature norms collected by McRae et al. (2005) (which for instance tell us that on average, annotators judged that an instance of BAT-ANIMAL has a probability of 1.0 of having wings, and that an instance of BAT-STICK has a probability of 0.75 of being wooden). For concepts that are not available from the annotated norms, we manually set the values of τ. All distributions are shown in Table 7 of the appendix.

The sampling process results in a collection of feature vectors for each situation description, one feature vector per individual. Table 4 shows the five most likely collections of individuals, corresponding to (a) a bat-animal; (b) a bat-stick; (c) a player; (d) a vampire; and (e) a bat-stick and a player, together with the associated feature vectors.

    Table 4: Feature vectors for the 5 most likely sampled collections of individuals
    (FS = {BASEBALL, GOTHIC}; FC = {BAT-STICK, BAT-ANIMAL, VAMPIRE, PLAYER};
     FF = {bat, vampire, player, have_wings, fly, humanlike, athletic, wooden})

    concepts     bat  vampire  player  have_wings  fly  humanlike  athletic  wooden      p
    BAT-ANIMAL    1      0       0         1        1       0         0        0      0.0650
    BAT-STICK     1      0       0         0        0       0         0        1      0.0485
    PLAYER        0      0       1         0        0       1         1        0      0.0480
    VAMPIRE       0      1       0         0        0       1         0        0      0.0320
    BAT-STICK     1      0       0         0        0       0         0        1      0.0485
    PLAYER        0      0       1         0        0       1         1        0      0.0480
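Equation (1) in code form, reusing the hypothetical tau table and FF list from the step C sketch above:

```
// p(f | z): multiply tau_z(q) for features true in f, and 1 - tau_z(q)
// for features false in f.
var featureVectorProb = function(f, z) {
  return product(map(function(q) {
    return f[q] === 1 ? tau[z][q] : 1 - tau[z][q];
  }, FF));
};

// e.g. featureVectorProb(drawFeatureVector('BAT-ANIMAL'), 'BAT-ANIMAL')
```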