Characterizing Idioms: Conventionality and Contingency


Michaela Socolof, Jackie Chi Kit Cheung, Michael Wagner, Timothy J. O’Donnell
McGill University
michaela.socolof@mail.mcgill.ca
jcheung@cs.mcgill.ca
{chael, timothy.odonnell}@mcgill.ca

arXiv:2104.08664v1 [cs.CL] 17 Apr 2021

Abstract

Idioms are unlike other phrases in two important ways. First, the words in an idiom have unconventional meanings. Second, the unconventional meanings of words in an idiom are contingent on the presence of the other words in the idiom. Linguistic theories disagree about whether these two properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to these two properties, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that idioms are no more anomalous than other types of phrases, and that introducing special machinery to handle idioms may not be warranted.

1   Introduction

Idioms (expressions like spill the beans) bring together two phenomena which are of fundamental interest in understanding language. First, they exemplify non-conventional word meaning (Weinreich, 1969; Nunberg et al., 1994). The words spill and beans in this idiom seem to carry particular meanings (something like divulge and information, respectively) which are different from the conventional meanings of these words in other contexts. Second, unlike other kinds of non-conventional word use such as novel metaphor, idioms exhibit a contingency relationship between their words (Wood, 1986; Pulman, 1993). It is the specific combination of the words spill and beans that has come to carry the idiomatic meaning of divulging information. One cannot dump out the legumes with the same non-conventional meaning.

Spill the beans is a prototypical example of an idiom. However, a major difficulty in studying idioms is that there is no agreed-upon definition of idiom in the literature. There is little consensus about which phrases should be called idioms beyond a small number of prototypical examples. Some phrases that have proven challenging for definitions of idioms include light verb constructions (e.g., take a walk) and semantically transparent collocations (e.g., on and off). What seems clear is that phrases typically called idioms involve some level of non-conventional meaning and some level of contingency between particular words.

This combination of non-conventionality and contingency between words distinguishes so-called idioms from other linguistic constructions and has led to a number of theories that treat idioms as exceptions to the mechanism that builds phrases compositionally. These theories posit special machinery for handling idioms (e.g., Weinreich, 1969; Bobrow and Bell, 1973; Swinney and Cutler, 1979). We propose an alternative approach, which views idioms not as exceptional, but merely as the result of the interaction of two independently motivated cognitive mechanisms. The first mechanism stores and reuses linguistic structures that are useful, including form-meaning mappings (e.g., Sciullo and Williams, 1987; Jackendoff, 2002; O’Donnell, 2015). The second mechanism allows words to be interpreted in non-canonical ways depending on the context. This approach predicts that neither mechanism should depend on the presence of the other, and further, that we should expect to find linguistic constructions throughout the entire space where the two mechanisms interact.

This paper presents evidence that idioms occupy a particular region of the space of these two mechanisms, but are not otherwise exceptional. We define two measures: conventionality, meant to measure the degree to which words are interpreted in a canonical way, and contingency, meant to measure the degree to which a combination of words is useful to store as a unit. We construct a novel corpus of idioms and non-idiom collocations, and show that phrases typically called idioms fall at the intersection of low conventionality and high contingency, but that the two measures are not correlated and there are no clear discontinuities that separate idioms from other types of phrases.

Our experiments also reveal hitherto unnoticed asymmetries in the behavior of head versus non-head words of idioms. In phrases we would intuitively call idioms, the dependent word (e.g., beans in spill the beans) shows greater deviation from its conventional meaning than the head word. We find that coordinate structures (e.g. X and Y) pattern consistently with other types of phrases in this respect, as long as the first conjunct is considered the head. This may shed light on the long-standing debate over the headedness of coordinate structures.

2   Background

There are various positions in the linguistics literature about the relationship between compositionality and storage. On one hand, there are theories predicting that the only elements that get stored are those that are non-compositional (e.g., Bloomfield, 1933; Pinker and Prince, 1988). On the other hand, some theories predict that storage can happen regardless of whether something is compositional (e.g., O’Donnell, 2015; Tremblay and Baayen, 2010).

To explore the relationship between compositionality and storage, we present measures that are intended to capture these two phenomena, and we probe for evidence of a dependence between storage and non-compositionality.

Our first measure, which we call conventionality, captures the extent to which the subparts of a phrase contribute their normal meaning to the phrase. Most of language is highly conventional; we can combine a small set of units in novel ways, precisely because we can trust that those units will have similar meanings across contexts. At the same time, the linguistic system allows structures such as idioms and metaphors, which use words in non-conventional ways. Our measure is intended to distinguish between phrases based on how conventional the meanings of their words are.

Our second measure, contingency, captures how unexpectedly often a group of words occurs together in a phrase and, thus, measures the degree to which there is a statistical contingency: the presence of one or more words strongly signals the presence of the others. This notion of contingency has also been argued to be a critical piece of evidence used by language learners in deciding which linguistic structures to store (e.g., Hay, 2003; O’Donnell, 2015).

To aid in visualizing the space of phrase types we expect to find in language, we place our two dimensions on different axes of a 2x2 matrix, where each cell contains phrases that are either high or low on the conventionality scale, and high or low on the contingency scale. The matrix is given in Figure 1, with the types of phrases we expect in each cell.

                 Low conv.              High conv.
    High cont.   Idioms                 Common collocations
                 (e.g. raise hell)      (e.g. in and out)
    Low cont.    Novel metaphors        Regular language use
                                        (e.g. eat peas)

Figure 1: Matrix of phrase types, organized by whether they have high/low conventionality and high/low contingency

We expect our measures to place idioms primarily in the top left corner of the space. At the same time, however, we do not expect to find evidence that idioms are anomalous; thus we predict a lack of correlation between the measures and a lack of major discontinuities in the space.

We find that our predictions are borne out. We take this as support for theories that factorize out the dimensions associated with compositionality and storage, and we contend that this kind of factorization provides a natural way of characterizing not just idioms, but also collocations and novel metaphors, alongside regular language use.

3   Methods

In this section, we describe the creation of our corpus of idioms and define our measures of conventionality and contingency.

3.1   Dataset

We built a corpus of sentences containing idioms as well as non-idioms, all of which were gathered from the British National Corpus (BNC; Burnard,
2000). Our corpus is made up of sentences containing target phrases and matched phrases, which we detail below.

Given that definitions of idioms differ in which phrases in our dataset count as idioms (some would include semantically transparent collocations, others would not), we do not want to commit to any particular definition a priori. Most importantly, given that we are investigating whether idioms can be characterized on dimensions that do not make reference to the notion of an idiom, we have tried to refrain from presupposing an a priori category of “idiom” in our analysis, while still acknowledging that people share some weak but broad intuitions about idiomaticity.

The target phrases in our corpus consist of 207 English phrasal expressions, some of which are prototypical idioms (e.g. spill the beans) and some of which are collocations (e.g. bits and pieces) that are more marginally considered idioms. These expressions are divided into four categories based on their syntax: verb object (VO), adjective noun (AN), noun noun (NN), and binomial (B) expressions. Binomial expressions are fixed pairs of words joined by and or or (e.g., wear and tear). The breakdown of phrases into these four categories is as follows: 31 verb object pairs, 36 adjective noun pairs, 33 noun noun pairs, and 58 binomials. The phrases were selected primarily from lists of idioms published in linguistics papers (Riehemann, 2001; Stone, 2016; Bruening et al., 2018; Bruening, 2019; Titone et al., 2019). The full list of target phrases is given in Appendix A. The numerical distribution of phrases is given in Table 1.

    Phrase type    Number of phrases    Example
    Verb Object    31                   jump the gun
    Noun Noun      36                   word salad
    Adj. Noun      33                   red tape
    Binom.         58                   fast and loose

Table 1: Types, counts, and examples of target phrases in our idiom corpus, with head words bolded

The BNC was parsed using the Stanford Parser (Manning et al., 2014), then Tregex (Levy and Andrew, 2006) expressions were used to find instances of each target phrase.

Matched, non-idiomatic sentences were also extracted. To obtain these matches, we used Tregex to find sentences that included a phrase with the same syntactic structure as the target phrase. Each target phrase was used to obtain two sets of matched phrases: one set where the head word remained constant and one set where the non-head word remained constant. For example, for the head word matches of the adjective noun combination sour grape, we found sentences that had the noun grape modified with an adjective other than sour. Below is an example of a matched sentence found by this method:

    Not a special grape for winemaking, nor a hidden architectural treasure, but hot steam gushing out of the earth.

The number of instances of the matched phrases ranged from 29 (the number of verb object phrases with the object logs and a verb other than saw) to the tens of thousands (e.g. for verb object phrases beginning with have), with the majority falling in the range of a few hundred to a few thousand. Issues of sparsity were more pronounced among the target phrases, which ranged from one instance (word salad) to 2287 (up and down). Because of this sparsity, some of the analyses described below focus on a subset of the phrases.

The syntactic consistency between the target and matched phrases is an important feature of our corpus, as it allows us to control for syntactic structure in our experiments.

3.2   Conventionality measure

Our measure of conventionality is built on the idea that a word being used in a compositional way should keep roughly the same meaning across contexts, whereas a non-conventional word meaning should be idiosyncratic to particular contexts. In the case of idioms, we expect that the difference between a word’s meaning in an idiom and the word’s conventional meaning should be greater than the difference between the word’s meaning in a non-idiom and the word’s conventional meaning.

Our measure makes use of the language model BERT (Devlin et al., 2018) to obtain contextualized embeddings for the words in our dataset. For each of our phrases, we compute the conventionality measure separately for the head and non-head words. For each case (head and non-head),
we first take the average embedding for the word across sentences not containing the phrase. That is, for spill in spill the beans, we get the embeddings for the word spill in sentences where it does not occur with the direct object beans. Let $O$ be a set of instances $w_1, w_2, \ldots, w_n$ of a particular word used in contexts other than the context of the target phrase. Each instance has an embedding $u_{w_1}, u_{w_2}, \ldots, u_{w_n}$. The average embedding for the word among these sentences is:

    \mu_O = \frac{1}{n} \sum_{i=1}^{n} u_{w_i}.    (1)

We take this quantity to be a proxy for the prototypical meaning of the word (i.e., its conventional meaning). The conventionality score is the negative of the average distance between $\mu_O$ and the embeddings for uses of the word within the phrase in question. We compute this as follows:

    \mathrm{conv}(\mathrm{phrase}) = -\frac{1}{m} \sum_{i=1}^{m} \left\| \frac{T_i - \mu_O}{\sigma_O} \right\|_2,    (2)

where $T_i$ is the embedding corresponding to a particular use of the word in the target phrase, $m$ is the number of such uses, and $\sigma_O$ is the component-wise standard deviation of the set of embeddings $u_{w_i}$.

To estimate the contingency of particular phrases, we make use of word probabilities given by XLNet (Yang et al., 2019), an auto-regressive language model that gives estimates for the conditional probabilities of words given their context. To estimate the joint probability of the words in spill the beans in some particular context (the numerator of the contingency expression in Equation 4), we use XLNet to obtain the product of the conditional probabilities in the chain rule decomposition of the joint. We get the relevant marginal probabilities by using attention masks over particular words, as shown below, where c refers to the context, that is, the rest of the words in the sentence containing spill the beans.

    Pr(beans | spill the, c) = ...spill the beans...
    Pr(the | spill, c)       = ...spill the [___]...
    Pr(spill | c)            = ...spill [___] [___]...

The denominator is the product of the probabilities of each individual word in the phrase, with both of the other words masked out:

    Pr(beans | c) = ...[___] [___] beans...
    Pr(the | c)   = ...[___] the [___]...
    Pr(spill | c) = ...spill [___] [___]...
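As a rough illustration of how the masked estimates combine, the sketch below computes the generalized PMI from two lists of probabilities: the chain-rule conditionals (whose product is the joint) and the single-word probabilities with the other phrase words masked. The function name `contingency`, the use of the natural logarithm, and the numeric values are assumptions made for this example; the probabilities are invented and do not come from XLNet or from the paper.

```python
import math

def contingency(chain_rule_probs, masked_probs):
    """Generalized PMI: log of the joint over the product of single-word probabilities.

    chain_rule_probs: conditional probabilities whose product is the joint,
        e.g. [Pr(spill | c), Pr(the | spill, c), Pr(beans | spill the, c)].
    masked_probs: probability of each word with the other phrase words masked,
        e.g. [Pr(spill | c), Pr(the | c), Pr(beans | c)].
    """
    if len(chain_rule_probs) != len(masked_probs):
        raise ValueError("expected one probability per word in each list")
    # Summing logs avoids underflow from multiplying small probabilities.
    log_joint = sum(math.log(p) for p in chain_rule_probs)
    log_independent = sum(math.log(p) for p in masked_probs)
    return log_joint - log_independent

# Hypothetical probabilities for "spill the beans" (illustrative only).
chain = [0.02, 0.5, 0.6]
masked = [0.02, 0.3, 0.001]
score = contingency(chain, masked)
print(round(score, 3))
```

With these made-up numbers the joint is 1000 times larger than the product of the single-word probabilities, so the score is the natural log of 1000 (about 6.91). A score of zero would mean the words are no more likely together than expected from their individual probabilities.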

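The conventionality score in Equations 1 and 2 can be sketched in a few lines. This is a minimal illustration using plain Python lists in place of BERT embeddings; the function name `conventionality`, the choice of the population standard deviation, and the zero-variance check are assumptions of this sketch, since the text does not specify them.

```python
import math

def conventionality(outside, inside):
    """Negative mean standardized Euclidean distance from the average outside embedding.

    outside: embeddings of the word in sentences without the target phrase (the set O).
    inside: embeddings of the word when used inside the target phrase (the T_i).
    """
    n, dims = len(outside), len(outside[0])
    # Equation 1: component-wise mean of the outside embeddings.
    mu = [sum(u[d] for u in outside) / n for d in range(dims)]
    # Component-wise standard deviation (population form is an assumption here).
    sigma = [math.sqrt(sum((u[d] - mu[d]) ** 2 for u in outside) / n) for d in range(dims)]
    if any(s == 0 for s in sigma):
        raise ValueError("a component has zero variance; cannot standardize")
    # Equation 2: average L2 norm of the standardized differences, negated.
    distances = [
        math.sqrt(sum(((t[d] - mu[d]) / sigma[d]) ** 2 for d in range(dims)))
        for t in inside
    ]
    return -sum(distances) / len(distances)

# Toy 2-dimensional vectors standing in for contextual embeddings.
outside_uses = [[0.0, 0.0], [2.0, 2.0]]
typical_use = conventionality(outside_uses, [[1.0, 1.0]])
unusual_use = conventionality(outside_uses, [[4.0, 5.0]])
print(typical_use, unusual_use)
```

A use that sits exactly at the average outside embedding gets the maximum score of zero, and uses farther from that average get more negative scores, which matches the expectation that idiomatic uses score lower.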
3.3   Contingency measure

Our second measure, which we have termed contingency, refers to whether a particular set of words appears within the same phrase at an unexpectedly high rate. The measure is based on the notion of pointwise mutual information (PMI), which is a measure of the strength of association between two events. We estimate a generalization of PMI that extends it to sets of more than two events, allowing us to capture the association among the words in phrases that contain more than two words.

The specific generalization of PMI that we use has at various times been called total correlation (Watanabe, 1960), multi-information (Studený and Vejnarová, 1998), and specific correlation (Van de Cruys, 2011).

    \mathrm{cont}(x_1, x_2, \ldots, x_n) = \log \frac{p(x_1, x_2, \ldots, x_n)}{\prod_{i=1}^{n} p(x_i)}.    (3)

For the case of three variables, we get:

    \mathrm{cont}(x, y, z) = \log \frac{p(x, y, z)}{p(x)\,p(y)\,p(z)}.    (4)

The conditional probabilities were computed right to left, with a paragraph of padding text placed before the sentence containing the relevant phrase. Note that this measure calculates the XLNet-based generalized PMI for the entire string bounded by the two words of the idiom; this means, for example, that the phrase spill the exciting beans will return the PMI score for the entire phrase, adjective included.

4   Related work

The measures described above have been instantiated in various ways in previous work on idiom detection.

Many idiom detection models build on insights about unconventional meaning in metaphor. A number of these approaches use distributional models, such as Kintsch (2000), Utsumi (2011), Sa-Pereira (2016), and Shutova et al. (2012), the last of which was one of the first to implement a fully unsupervised approach that aims to encode relationships between words, their contexts, and their dependencies. A related line of work
has taken on the task of automatically determining whether potentially idiomatic expressions are being used idiomatically or literally in individual sentences, based on contextual information (Katz and Giesbrecht, 2006; Fazly et al., 2009; Sporleder and Li, 2009, 2014).

Our measure of conventionality is inspired by the insights of these previous models. As described in Section 3.2, our measure uses differences in embeddings across contexts.

Meanwhile, work on collocation detection has taken a probabilistic or information-theoretic approach that seeks to identify collocations using word combination probabilities. A frequently used quantity for measuring co-occurrence probabilities is pointwise mutual information (PMI; Fano, 1961; Church and Hanks, 1990). Other implementations include selectional association (Resnik, 1996), symmetric conditional probability (Ferreira and Pereira Lopes, 1999), and log-likelihood (Dunning, 1993; Daille, 1996). To implement our notion of contingency, we use a generalized PMI measure that is in line with the measures in previous work.

While much of the literature in NLP recognizes that idioms are characterized by several properties which must be measured separately (e.g., Fazly and Stevenson, 2006; Fazly et al., 2009), our approach is novel in attempting to characterize idioms along two orthogonal dimensions that correlate with human intuitions about word meaning and usage.

5   Analyses

In this section we first present analyses of our two measures individually, showing that they capture the properties they were intended to capture, followed by an exploration of the interaction between the measures. Section 5.3 evaluates our central predictions, showing results that are inconsistent with a special mechanism theory of idioms (under which a high level of word co-occurrence is packaged together with non-canonical interpretation).

We predict that the target phrases will score lower on conventionality than the matched phrases, since we expect these phrases to contain words with (often highly) unconventional meanings. We further predict that the target phrases will have higher contingency scores than the matched phrases, due to all of the target phrases being expressions that are frequently reused. Putting the two measures together, we expect idioms to fall at the intersection of low conventionality and high contingency, but not to show major discontinuities that qualitatively distinguish them from phrases that fall at other areas of intersection.

5.1   Analysis 1: conventionality measure

In order to validate that our conventionality measure corresponds to an intuitive notion of unusual word meaning, we carried out an online experiment to see whether human judgments of conventionality correlated with our automatically-computed conventionality scores. The experimental design and results are described below.

5.1.1   Human rating experiment

The experiment asked participants to rate the literalness of a word or phrase in context. We focused the experiment on twenty-two verb object target phrases and their corresponding matched phrases. For each target phrase (e.g., rock the boat), there were ten items, each of which consisted of the target phrase used in the context of a (different) sentence. Each sentence was presented with the preceding sentence and the following sentence as context, which is the same amount of context that the automatic measure was given. In each item, a word or phrase was highlighted, and the participant was asked to rate the literalness of the highlighted element. We obtained judgments of the literalness of the head word, non-head word, and entire phrase for ten different sentences containing each target phrase.

We also obtained literalness judgments of the head word and the entire phrase for phrases matched on the head of the idiom (e.g., verb object phrases with rock as the verb and some noun other than boat as the object). Similarly, we obtained literalness judgments of the non-head word and the entire phrase for phrases matched on the non-head word of the idiom (e.g., verb object phrases with boat as the object and some verb other than rock). Participants were asked to rate literalness on a scale from 1 (‘Not literal at all’) to 6 (‘Completely literal’). Items were presented using a Latin square design.

Participants were adult native English speakers who provided written informed consent to participate in the experiment. The experiment took about 10 minutes to complete. The data were recorded using anonymized participant codes, and none of the results included any identifying information.
There were 150 participants total. The data from 10 of those participants were excluded due to failure to follow the instructions (assessed with catch trials).

5.1.2   Results

To explore whether our conventionality measure correlates with human judgments of literalness, we compare the scores to the results from the human rating experiment. Ratings were between 1 and 6, with 6 representing the highest level of conventionality.

We predicted that the conventionality scores should increase as literalness ratings increased. To assess whether our prediction was borne out, a linear mixed-effects model was fit using the lmerTest (Kuznetsova et al., 2017) package in R (R Core Team, 2017), with human rating average and highlighted word (head versus non-head) as predictors, plus random effects of participant and item. Results of fitting the model are shown in Table 2. The results confirm our prediction that words that receive higher scores on the conventionality metric are rated as more literal by humans (β̂ = 0.181, SE(β̂) = 0.024, p < 0.001).

We carried out a nested model comparison to see whether including the BERT conventionality score as a predictor significantly improved the model. A likelihood ratio test with the above model and one that differed in excluding the BERT conventionality score as a predictor yielded a higher log likelihood for the model with the BERT score as a predictor (χ² = 42.784, p < 0.001).

Table 2: Model results table with human literalness rating as the dependent variable

    Coefficient    β̂        SE(β̂)    t        p
    Intercept      0.033    0.026    1.258    0.104
    ConvScore      0.181    0.024    7.644    < 0.001

phrases, with a difference in contingency value of 2.25 (t(159) = 8.807, p < 0.001).

[Figure 2 image: boxplots in four panels (AN, B, NN, VO); x-axis: phrase type (target vs. matched); y-axis: contingency score]

Figure 2: Contingency of target and matched phrases, for phrases with at least 30 instances

Figure 2 shows boxplots of each phrase’s average contingency score. Given that many of the target phrases only occurred in a handful of sentences, we have excluded phrases for which the target or matched sets contain fewer than 30 sentences. This threshold was chosen to strike a balance between having enough instances contributing to the average score for each datapoint, and having a large enough sample of phrases. We considered thresholds at every multiple of 10 until we reached one that left at least 100 datapoints remaining. Within each structural type of phrase (adjective noun, binomial, noun noun, verb object), the target phrases are on the right and the non-idiomatic, matched phrases are on the left. It should be noted that for the most part, there were fewer sentences containing the target phrase than there were sentences containing only the head word in the relevant structural position. This likely explains the greater spread among the target phrases: the averages are based on fewer data points.
  Word:Nonhead 0.024 0.020 1.220 0.111                        For all four types of phrases, the median con-
                                                           tingency score was higher for the target phrases
  n = 2117                                                 than for their matched phrases. The greatest differ-
                                                           ences were observed for verb object and binomial
                                                           phrases.
5.2     Analysis 2: contingency measure
To evaluate the role of contingency in distinguish-        5.3                        Analysis 3: interaction and correlation of
ing phrases that are often assumed to be stored (id-                                  measures
ioms and collocations), we compare the scores of           Here we show that idioms fall in the expected area
the target and matched phrases.                            of our two-dimensional space with no sharp dis-
  We find that, overall, the target phrases had            continuities or correlation between the measures,
higher contingency scores than the matched                 thus confirming our predictions.
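As a rough illustration of the two checks this section reports (which quadrant a phrase falls into, and whether the two measures are correlated), the following Python sketch may help. It is not the authors' code: the function names, the toy scores, and the use of a normal approximation in place of the t distribution for the p-value are all assumptions made for illustration. The quadrant boundaries are placed at the mean of each measure, as the plots in this section do, and the last line plugs in the correlation and degrees of freedom reported in this section.

```python
import math
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_p_value(r, df):
    """Approximate two-sided p-value for the null of no correlation.

    Uses t = r * sqrt(df / (1 - r^2)) and, as a simplifying assumption,
    a standard normal in place of the t distribution (reasonable only
    when df is large).
    """
    t = r * math.sqrt(df / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def quadrant(conventionality, contingency, conv_split, cont_split):
    """Name the quadrant of a phrase relative to the two split lines."""
    vertical = "top" if contingency > cont_split else "bottom"
    horizontal = "right" if conventionality > conv_split else "left"
    return f"{vertical} {horizontal}"


# Toy (conventionality, contingency) pairs; hypothetical values only.
phrases = {
    "phrase_a": (-42.0, 9.0),
    "phrase_b": (-28.0, 8.0),
    "phrase_c": (-27.0, 1.0),
    "phrase_d": (-41.0, 2.0),
}
conv = [c for c, _ in phrases.values()]
cont = [k for _, k in phrases.values()]
conv_split, cont_split = mean(conv), mean(cont)  # split lines at the means

labels = {name: quadrant(c, k, conv_split, cont_split)
          for name, (c, k) in phrases.items()}
toy_r = pearson_r(conv, cont)

# Approximate p-value for the reported r(312) = -0.037.
reported_p = correlation_p_value(-0.037, 312)
```

Under these assumptions, `phrase_a` (low conventionality, high contingency) lands in the top left quadrant, where idioms are expected. The approximate p-value for the reported correlation comes out at roughly 0.51, in line with the reported absence of a significant correlation; an exact t-distribution calculation would differ slightly.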
Recall the 2x2 matrix of contingency versus conventionality (Figure 1), where idioms were expected to be in the top left quadrant. Plotting our measures against each other gives us the results in Figure 3. Since the conventionality scores were for individual words, we averaged the scores of the head word and the primary non-head word (i.e., the verb and the object for verb object phrases, the adjective and the noun for adjective noun phrases, the two nouns in noun noun phrases, and the two words of the same category in binomial phrases). The plot shows the average values of the target and matched phrases only if the phrase and matched structure both occurred in at least 30 sentences in the BNC.

[Figure 3 plot: x-axis, Conventionality; y-axis, Contingency; legend: Phrase type (Target phrases, Matched phrases).]
Figure 3: Contingency versus conventionality of target and matched phrases. Large points represent average values.

   As discussed above, the target phrases contained a mix of phrases typically considered idioms, as well as (seemingly) compositional collocations. Therefore, we predicted that they would be distributed between the top two quadrants, with idioms on the top left and collocations on the top right. For the matched phrases, we assumed that the majority were instances of regular language use, so we predicted that they should be clustered in the bottom right quadrant. The horizontal and vertical black lines on the plot were placed at the mean values for each measure.
   Figure 4 shows only the target phrases that received a human annotation of 1 or 2 for head word literality, that is, the phrases judged to be most non-compositional. As expected, the average score for the target phrases moved more solidly into the idiom quadrant. Note that, given the continuous nature of the measures, the division of language space into four quadrants is a simplification. There are, of course, intermediate areas that phrases can fall into.

[Figure 4 plot: x-axis, Conventionality; y-axis, Contingency; legend: Phrase type (Target phrases, Matched phrases).]
Figure 4: Contingency versus conventionality of target and matched phrases (for target phrases selected as examples of idioms). Large points represent average values.

   We also found no evidence of correlation between contingency and conventionality values among the entire set of phrases, target and matched (r(312) = -0.037, p = 0.518).

6     Asymmetries between heads and dependents
Our experiments revealed an unexpected but interesting asymmetry between heads and their dependents. We found that the head word of the target phrases was more conventional on average than the non-head content word. A two-sample t-test revealed that this difference was significant (t = 3.029, df = 252.45, p = 0.0027). This effect was much more robust for the target phrases (made up of idioms and collocations) than it was for the matched phrases. The matched phrases did not show a significant difference between heads and non-heads (t = 1.506, df = 277.42, p = 0.1332).
   Figure 5 presents the data in a slightly different way. The plots here include all of the target and matched phrases, and show that at lower values of overall phrase conventionality, the unusual meaning comes disproportionately from the non-head word. We have shown that idioms fall at the low end of the conventionality spectrum, so the data in Figure 5 indicate that the unconventional meaning of idioms comes largely from the dependent word. The effect is largest for verb object phrases.
   One possibility we raise for the difference in effect sizes between the phrase types is that there could be an additive effect of linear order,
with conventionality decreasing from left to right through the phrase. For verb object phrases, the two effects go in the same direction, whereas for adjective noun and noun noun phrases, the linear order effect would go in the opposite direction from the headedness effect.
   There is disagreement in the literature about whether binomial phrases (which are coordinate structures) contain a head at all. Some proposals treat the first conjunct as the head (e.g., Rothstein, 1991; Kayne, 1994; Gazdar, 1981), while others treat the conjunction as the head or claim that there is no head (e.g., Bloomfield, 1933). We note that in the binomial case, the first conjunct seems to pattern like the heads of the other phrase types, lending support to the first-conjunct-as-head theory.

[Figure 5 plot: panels AN, B, NN, VO; x-axis, Phrase conventionality (rescaled); y-axis, Word conventionality (rescaled); legend: Word type (Head, Non-head).]
Figure 5: Change in head versus non-head conventionality scores as phrase conventionality increases, for all phrases (target and matched) separated by phrase type (adjective noun, binomial, noun noun, and verb object).

7     Discussion & Conclusion
We investigated whether idioms could be characterized as occupying the intersection between storage and compositionality of a form, without needing to appeal to special machinery to handle these phrases, as has been proposed at various times in the literature.
   We validated that our conventionality and contingency measures corresponded to intuitions about language, and we showed that these measures concentrate idioms in the predicted region of the space. To compare the behavior of our conventionality measure with human behavior, we carried out an online rating experiment in which people rated the literalness of words and phrases in context. As predicted, higher literalness ratings correlated with higher conventionality scores given by our measure. As for the contingency measure, we separated the phrases in our corpus into idioms and collocations, on the one hand, and non-idiom, non-collocations on the other. Again, as predicted, the idioms and collocations scored higher on our contingency measure.
   When we plotted the conventionality and contingency scores against one another, we found that idioms fell, on average, in the area of low conventionality and high contingency, as expected, while regular, non-idiomatic phrases fell in the high conventionality, low contingency area, also as expected. The lack of a correlation between the two measures provides support for theories that divorce the notions of compositionality and storage. Specifically, this lack of correlation indicates that storage of a form is not dependent on the extent to which the form is compositional.
   We have thus provided evidence that idioms trend toward one end of the intersection between compositionality and storage, but are not otherwise exceptional, contrary to theories that posit special machinery to handle idioms. Idioms, then, represent just one of the ways that compositionality and storage can interact, and are no more special than collocations or novel metaphors. We have also presented the novel finding that the locus of non-compositionality in idioms resides primarily in the dependent, rather than the head, of the phrase, a result that merits further study.

References
Leonard Bloomfield. 1933. Language. Henry Holt, New York.

Samuel Bobrow and Susan Bell. 1973. On catching on to idiomatic expressions. Memory & Cognition, 1:343–346.

Benjamin Bruening. 2019. Idioms, collocations, and structure: Syntactic constraints on conventionalized expressions. Natural Language and Linguistic Theory, 70:491–538.

Benjamin Bruening, Xuyen Dinh, and Lan Kim. 2018. Selection, idioms, and the structure of nominal phrases without classifiers. Glossa, 42:1–46.

Lou Burnard. 2000. The British National Corpus Users Reference Guide. Oxford University Computing Services.

Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29.

Beatrice Daille. 1996. Study and implementation of combined techniques for automatic extraction of terminology. In The Balancing Act: Combining Symbolic and Statistical Approaches to Language, pages 49–66. MIT Press, Cambridge, MA.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74.

Robert Mario Fano. 1961. Transmission of Information: A Statistical Theory of Communications. MIT Press, Cambridge, MA.

Afsaneh Fazly, Paul Cook, and Suzanne Stevenson. 2009. Unsupervised type and token identification of idiomatic expressions. Computational Linguistics, 35:61–103.

Afsaneh Fazly and Suzanne Stevenson. 2006. Automatically constructing a lexicon of verb phrase idiomatic combinations. In Proceedings of the 11th Conference of the European Chapter of the ACL (EACL’06), pages 337–344. Trento, Italy.

Joaquim Ferreira and Gabriel Pereira Lopes. 1999. A local maxima method and a fair dispersion normalization for extracting multiword units from corpora. In Sixth Meeting on Mathematics of Language, pages 369–381.

Gerald Gazdar. 1981. Unbounded dependencies and coordinate structure. Linguistic Inquiry, 12:155–184.

Jennifer Hay. 2003. Causes and Consequences of Word Structure. Routledge, New York, NY.

Ray Jackendoff. 2002. What’s in the lexicon? In S. Nooteboom, F. Weerman, and F. Wijnen, editors, Storage and Computation in the Language Faculty. Kluwer Academic Press, Dordrecht.

Graham Katz and Eugenie Giesbrecht. 2006. Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, pages 12–19, Sydney, Australia. Association for Computational Linguistics.

Richard Kayne. 1994. The Antisymmetry of Syntax. MIT Press, Cambridge, MA.

Walter Kintsch. 2000. A computational theory of metaphor comprehension. Psychonomic Bulletin & Review, 7:257–266.

A. Kuznetsova, P. B. Brockhoff, and R. H. B. Christensen. 2017. lmerTest package: tests in linear mixed effects models. Journal of Statistical Software, 82:1–26.

Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In 5th International Conference on Language Resources and Evaluation (LREC 2006).

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.

Geoffrey Nunberg, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538.

Timothy O’Donnell. 2015. Productivity and reuse in language: A theory of linguistic computation and storage. MIT Press.

Steven Pinker and Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28:73–193.

Stephen G. Pulman. 1993. The recognition and interpretation of idioms. In C. Cacciari et al., editor, Idioms: Processing, Structure, and Interpretation, pages 249–270. Lawrence Erlbaum Associates, Hillsdale, NJ.

Philip Resnik. 1996. Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61:127–159.

Suzanne Z. Riehemann. 2001. A constructional approach to idioms and word formation. PhD thesis, Stanford University.

Susan Rothstein. 1991. Heads, projections, and categorial determination. In K. Leffel and D. Bouchard, editors, Views on phrase structure, pages 97–112. Kluwer, Dordrecht.

Fernando Sa-Pereira. 2016. Distributional representations of idioms. Master’s thesis, McGill University.

Anna Maria Di Sciullo and Edwin Williams. 1987. On the definition of word. MIT Press, Cambridge, MA.

Ekaterina Shutova, Tim Van de Cruys, and Anna Korhonen. 2012. Unsupervised metaphor paraphrasing using a vector space model. In Proceedings of COLING 2012: Posters, pages 1121–1130.

Caroline Sporleder and Linin Li. 2009. Unsupervised recognition of literal and non-literal use of idiomatic expressions. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 754–762.

Caroline Sporleder and Linin Li. 2014. Classifying idiomatic and literal expressions using topic models and intensity of emotions. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2019–2027.

Megan Schildmier Stone. 2016. The difference between bucket-kicking and kicking the bucket: Understanding idiom flexibility. PhD thesis, University of Arizona.

Milan Studený and Jirina Vejnarová. 1998. The multi-information function as a tool for measuring stochastic dependence. In Proceedings of the NATO Advanced Study Institute on Learning in Graphical Models. Kluwer Academic Publishers, Norwell, MA.

David Swinney and Anne Cutler. 1979. The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behavior, 18:523–534.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

D. Titone, K. Lovseth, K. Kasparian, and M. Tiv. 2019. Are figurative interpretations of idioms directly retrieved, compositionally built, or both? Canadian Journal of Experimental Psychology/Revue canadienne de psychologie, 73:216–230.

Antoine Tremblay and Harald Baayen. 2010. Holistic processing of regular four-word sequences: A behavioral and ERP study of the effects of structure, frequency, and probability on immediate free recall. In D. Wood, editor, Perspectives on Formulaic Language: Acquisition and communication, pages 151–173. The Continuum International Publishing Group, London.

Akira Utsumi. 2011. Computational exploration of metaphor comprehension processes using a semantic space model. Cognitive Science, 35:251–296.

Tim Van de Cruys. 2011. Two multivariate generalizations of pointwise mutual information. In Proceedings of the Workshop on Distributional Semantics and Compositionality. Association for Computational Linguistics, Stroudsburg, PA.

Satosi Watanabe. 1960. Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4:66–82.

Uriel Weinreich. 1969. Problems in the analysis of idioms. In J. Puhvel, editor, Substance and structure of language, pages 23–81. University of California Press.

Mary McGee Wood. 1986. A Definition of Idiom. University of Birmingham.

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding.

A     Appendix
The following table lists the target phrases in our corpus.

Target phrase       Type   Target phrase      Type Target phrase       Type   Target phrase         Type
deliver the goods   VO     swimming pool      NN   cold feet           AN     by and large           B
run the show        VO     cash cow           NN   green light         AN     more or less           B
rock the boat       VO     foot soldier       NN   red tape            AN     bits and pieces        B
call the shots      VO     attorney general   NN   black box           AN     up and down            B
talk turkey         VO     hit list           NN   blue sky            AN     rise and fall          B
cut corners         VO     soup kitchen       NN   bright future       AN     sooner or later        B
jump the gun        VO     bull market        NN   sour grape          AN     rough and ready        B
have a ball         VO     boot camp          NN   green room          AN     far and wide           B
foot the bill       VO     message board      NN   easy money          AN     give and take          B
break the mold      VO     gold mine          NN   last minute         AN     time and effort        B
pull strings        VO     report card        NN   hard heart          AN     pro and con            B
mean business       VO     comfort food       NN   hot dog             AN     sick and tired         B
raise hell          VO     pork barrel        NN   raw talent          AN     back and forth         B
close ranks         VO     flower girl        NN   hard labor          AN     day and night          B
strike a chord      VO     hit man            NN   broken home         AN     wear and tear          B
cry wolf            VO     blood money        NN   fat chance          AN     nut and bolt           B
lose ground         VO     cottage industry   NN   dirty joke          AN     tooth and nail         B
make waves          VO     board game         NN   happy hour          AN     on and off             B
clear the air       VO     death wish         NN   high time           AN     win or lose            B
pay the piper       VO     word salad         NN   rich history        AN     food and shelter       B
spill the beans     VO     altar boy          NN   clean slate         AN     odds and ends          B
bite the dust       VO     bench warrant      NN   stiff competition   AN     in and out             B
saw logs            VO     time travel        NN   maiden voyage       AN     sticks and stones      B
lead the field      VO     love language      NN   cold shoulder       AN     make or break          B
take the powder     VO     night owl          NN   clean energy        AN     part and parcel        B
buy the farm        VO     life blood         NN   hard sell           AN     loud and clear         B
turn tail           VO     road rage          NN   back pay            AN     cops and robbers       B
get the sack        VO     light house        NN   deep pockets        AN     short and sweet        B
hit the sack        VO     bid price          NN   broken promise      AN     safe and sound         B
kick the bucket     VO     carrot cake        NN   dead silence        AN     black and blue         B
shoot the bull      VO     command line       NN   blind faith         AN     toss and turn          B
                           stag night         NN   tight schedule      AN     fair and square        B
                           husband material   NN   brutal honesty      AN     heads or tails         B
                                                   bright idea         AN     hearts and flowers     B
                                                   kind soul           AN     rest and relaxation    B
                                                   bruised ego         AN     flesh and bone         B
                                                                              life and limb          B
                                                                              checks and balances    B
                                                                              fast and loose         B
                                                                              high and dry           B
                                                                              pots and pans          B
                                                                              now or never           B
                                                                              hugs and kisses        B
                                                                              bread and butter       B
                                                                              risk and reward        B
                                                                              cloak and dagger       B
                                                   pins and needles     B     nickel and dime        B
                                                   sugar and spice      B     rhyme or reason        B
                                                   neat and tidy        B     leaps and bounds       B
                                                   step by step         B     live and learn         B
                                                   lost and found       B     peace and quiet        B
                                                   old and grey         B     song and dance         B