A Corpus-Linguistic Approach to the colour naming debate


                                              Luigi SQUILLANTE
                                       Sapienza – Università di Roma (Italy)
                                    Stiftung Universität Hildesheim (Germany)

Abstract
Since the publication of the famous work by Berlin & Kay (1969) on universality and evolution of
basic colour terms, several studies on colours have tried to support or reject the hypothesis that
language is shaped by universals of perception. In fact, colours are a privileged subject in this issue
because of their twofold nature of both results of biological perceptions and lexical items of language.
This work presents a new contribution to the so-called “colour naming debate” from a point of view
that, as far as we know, has not yet been taken into account in studies on this subject: the
phraseological perspective. Our analysis, carried out on a large set of Italian nominal multiword
expressions (MWEs) including colour terms, shows that colours are not equally distributed in this kind
of expression and that, if only idiomatic MWEs are considered, the order arising from the quantitative
distribution of colour terms closely reproduces Berlin & Kay's hierarchy. In this sense our results
support the idea that perception can influence language and show that phraseological and
corpus-based studies can also shed new light on this subject.

1. Introduction
It is well known that different languages develop different lexical categories to refer to certain
areas of the perceivable spectrum that are commonly called colours. The focus on the differences
between the terms used to refer to colours across languages is not new to linguistic studies and can
be traced back to the nineteenth century, when works like those by Gladstone (1858), Allen (1879)
and Geiger (1880) first appeared. The initial debate focused on whether it was possible to infer an
evolution in the human perception of hues from a philological analysis of the usage of colour terms in
literature, from the ancient epic poems to the present day. The answer to this question was
soon provided by the study of Magnus (1880), which showed that lexical distinctions, or the lack of terms
for expressing certain colours, did not seem to imply different or deficient perception in the speakers:
the fact that one population had just one term to identify a hue that another population referred to
by means of two or more words was rather due to different ways of categorizing the same physical
reality1.
The investigation of the correlation between colour terms and perception was pursued by several
studies during the twentieth century (Ray 1952, 1953; Conklin 1955; Nida 1959, among others), all
of which insisted on the arbitrary way in which languages can segment the spectral continuum.
However, it was not until the publication of the famous work by Berlin & Kay (1969) on universality
and evolution of basic colour terms that this topic started to gain much attention from the scientific
community, leading to what today can be defined as the "colour naming debate", spreading throughout
the fields of anthropology, cognitive sciences, linguistics and philosophy.

1   This concept would be fully developed in general terms some years later by Saussure, with the notion of the
    arbitrariness of signs. In the structuralist framework, Hjelmslev (1968[1943]:57-58) would explicitly take the lack
    of one-to-one correspondence between colour terms in English and Welsh as an example to show how languages
    arbitrarily choose how to categorise the same physical entities.

The great importance of Berlin & Kay's work is that it represents one of the most influential criticisms
of the Sapir-Whorf hypothesis (e.g. Sapir, 1921:219; Whorf 1956 [1940]:212) and of linguistic
relativism, which, during the first half of the twentieth century, had broadly dominated the research
approaches to social sciences. As Kay and Maffi (1999:744) underline, "with the ascendance in the
1920s, '30s and '40s of linguistic and cultural relativity [...] color came to be singled out as the parade
example of a lexical domain in which the control of language over perception is patent [...]". Against
this interpretation, according to which language shapes perception, Berlin & Kay proposed a
universalistic point of view, holding that language is shaped by universals of perception. Their
experiment on 98 different languages showed that there exists a simple rule defining a universal
hierarchy for the appearance of the basic colour terms in every language:

           [white, black] < red < [green, yellow] < blue < brown < [purple, pink, orange, grey]2.

The hierarchy is built on the basis of the presence of the colour terms within the analyzed
languages. According to the order above, if a language has a given term, then all the
terms to its left are also attested in that language. The eleven colours included in the hierarchy were
chosen according to several principles (Berlin & Kay, 1969:6) intended to define what an ideal basic
colour term is3.
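The implicational rule can be made concrete with a minimal sketch in Python. It only illustrates the
rule as stated above and is not part of Berlin & Kay's procedure; the names HIERARCHY and
respects_hierarchy are hypothetical, and terms inside one bracket are assumed to share a stage.

```python
# Stages of the hierarchy as quoted above; terms inside one bracket share a stage.
HIERARCHY = [
    {"white", "black"},
    {"red"},
    {"green", "yellow"},
    {"blue"},
    {"brown"},
    {"purple", "pink", "orange", "grey"},
]


def respects_hierarchy(terms):
    """Return True if every attested term has all terms to its left attested."""
    terms = set(terms)
    required = set()  # union of all stages to the left of the current one
    for stage in HIERARCHY:
        # A term from this stage is only allowed if all earlier stages are complete.
        if terms & stage and not required <= terms:
            return False
        required |= stage
    return True


# A system with white, black and red is compatible with the rule; a system
# with blue but without yellow is not.
print(respects_hierarchy({"white", "black", "red"}))
print(respects_hierarchy({"white", "black", "red", "green", "blue"}))
```

Under this reading, a language may have green without yellow (or the reverse), but needs both before
blue can appear.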
The results of Berlin & Kay's experiment were discussed and tested in the ensuing years, leading to a
strong polarization between universalists and relativists. As Kay & Maffi (ibid.) recall, psychologists
tended to welcome Berlin & Kay's findings and to support them with new empirical tests (among others
Bornstein 1973a,b; Brown 1976; Collier 1976; Shepard 1992), while anthropologists raised doubts about
methodological issues (e.g. Hickerson, 1971; Durbin, 1972; Collier 1973; Conklin 1973), especially
those regarding the concept of "basic colour term" (Saunders 1995, 1997; Lucy 1996).
However, empirical and theoretical considerations raised by subsequent studies produced several
modifications of the universal model originally proposed in 1969 (Kay and McDaniel, 1978; Kay and
Maffi, 1999, among others) which, although not invalidating the hierarchy shown above, led to
different interpretations of the mechanisms governing the evolution of the appearance of colour terms.
One of the main issues concerned the fact that "the evolutionary sequence that views the development of
basic color-term lexicon [is] not [seen] as the successive encoding of foci, but as the successive
differentiation of previously existing basic color categories4" (McDaniel, 1974, recalled in Kay and
McDaniel, 1978:640). Moreover, "the Kay and McDaniel model emphasizes [...] the six primary colors
of opponent theory (black, white, red, yellow, green, blue)" (Kay and Maffi, 1999:745).
Although the colour naming debate is still active today, in recent years the availability of new data
and the development of complex numerical algorithms have opened new ways of testing the hypothesis,
confirming the existence of universality in colour naming systems (e.g. Baronchelli et al., 2010).

2. Colour terms and Multiword Expressions
In 1879, when the colour naming debate had only just begun, Allen replied to Gladstone's
opinions on the ambiguity of the use of colour terms in Homer's poems as follows:

        "Mr. Gladstone tells us that they [the Homeric Greeks] could not have understood real
        colour by their apparent colour terms, because the words are used so loosely. Here,
        green means green: there, it means fresh or young. [...] Do Englishmen never talk of
        green old age or Americans of green corn, which is really pale yellow? Is not red blood
        confronted with sangre azul and red wine with petit vin bleu? [...] In short, are not
        colour terms always vague, and are they not vaguer in the idealized language of poetry
        than anywhere else?" (Allen, 1879:267, cited in Berlin & Kay, 1969:137, italics added
        by the author).

Although Allen's point was just to argue that one cannot abstract the concept of colours from their
figurative and sometimes very vague meanings (especially in poetry), all his examples mention colour
terms appearing in what nowadays are generally referred to as multiword expressions (hereafter
MWEs).
MWEs are phenomena of preeminent interest in phraseology and contemporary corpus linguistics.
They include a great variety of entities lying on a continuum between lexicon and syntax, whose
typical features include morpho-syntactic fixedness, semantic restrictions, semantic unpredictability,
non-grammatical constructions, conventionality and institutionalization. Their interpretation generally
crosses the boundaries between words (Sag et al., 2002), and one of the most useful definitions, able to
cover a great number of phenomena, is that proposed by Calzolari, Fillmore et al. (2002:1934),
according to whom an MWE is "a sequence of words that acts as a single unit at some level of linguistic
analysis".
The phenomenon of MWEs has long been studied in the linguistic tradition because of its relevance to
every language. Despite their apparently anomalous behavior, MWEs are "an important and frequent
phenomenon in human language" (Ramisch et al., 2010:1), as Sinclair (1991) made clear with the
formulation of his famous idiom principle, which states that idiomatic and morpho-syntactically
restricted constructions are as normal and natural in discourse as free combinations.
Apart from the large body of theoretical work on MWEs developed within the major linguistic
frameworks throughout the twentieth century, in recent years the computational approach to this
phenomenon has become one of the dominant lines of research in this field. In fact, although none of
the features cited above is a necessary and sufficient condition for the presence of an
MWE, the components of this kind of entity exhibit a strong tendency to co-occur in texts more
frequently than they separately appear with other words. This has led to the development of several
statistical approaches and association measures to identify, study and automatically extract MWEs from
texts (to mention just a few: Evert and Krenn, 2001; Evert, 2004; Kilgarriff, 2006; Ramisch et al.,
2010).
The great amount of structured textual data available nowadays in large corpora allows researchers to
study MWEs in new, testable and empirical ways. For example, starting from Allen's
considerations cited above, one can study the role of colour terms appearing as components of MWEs.
The present work is intended to contribute to the colour naming debate from the phraseological
point of view, which, as far as we know, has not yet been examined in studies on this subject. Since
MWEs represent a very important linguistic phenomenon and colour terms can occur in such entities as
part of the lexicon, it is reasonable to ask whether there are preferences in the choice of colours in the
creation of this kind of expression.
The study presented below focuses on the Italian language, although it has potentially relevant
cross-linguistic implications.

3. Data and Methodology
In our work we consider the Italian equivalents of the eleven basic colour terms of Berlin & Kay's
hierarchy (bianco, eng. white; nero, eng. black; rosso, eng. red; verde, eng. green; giallo, eng. yellow;
blu, eng. blue; marrone, eng. brown; viola, eng. purple; rosa, eng. pink; arancione, eng. orange; grigio,
eng. grey), plus the colour azzurro (eng. light blue), which in Italian, as in other non-Germanic
languages such as Russian or Turkish, is considered to be distinct from blue (Philip, 2003:12). In
addition, violetto and arancio (two variants of purple and orange) are also taken into account because
they potentially compete with their synonymous forms.
Our first reference is GRADIT (1999-2007), the most comprehensive lexicographic resource for the
Italian language. This dictionary includes about 130,000 MWEs that have been selected according to
one or more of the following criteria (De Mauro, 2005:88-89):

        - the existence of an unpredictable semantic addition to the meanings of the component words;
        - syntactic and/or lexical fixedness with respect to lexical or structural variations that would
          result in the loss of the idiomaticity of the expression;
        - significant presence in a specialized language, where MWEs typically form terminology.

These criteria cover both typically figurative expressions, generally called idioms, and
expressions in which the components are interpreted according to one of their basic senses, with no
further unpredictable semantic addition (especially in the case of terminology). At the same time, this
definition does not cover more flexible expressions like collocations, since GRADIT has not been
developed as a combinatory dictionary.
We start by extracting from GRADIT all the nominal MWEs in which a colour term appears as an
adjective modifying the nominal head, obtaining a list of 943 entities (such as "scatola nera", eng.
black box).
In order to have a more complete resource to work on, we also consider MWEs from the Italian corpus
PAISÀ (2012). The PAISÀ corpus is a large resource for the Italian language, composed of ca. 380,000
documents and 250 million tokens. It collects different types of texts extracted automatically from the
web on the basis of word pairs elicited from GRADIT and used as a seed list. Since it is
morpho-syntactically annotated, it allows for queries on combinations of part-of-speech categories.
To obtain MWE candidates we use several scripts of the computational tool mwetoolkit
(Ramisch et al., 2010) to extract noun + adjective5 combinations, using five statistical association
measures provided by the tool (maximum likelihood estimator, pointwise mutual information,
log-likelihood, Dice's coefficient, Student's t-score).
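As an illustration of what such measures compute, the sketch below derives the five scores for a single
noun + adjective pair from raw counts. It uses common textbook definitions and is an assumption of this
example, not a reproduction of mwetoolkit's code; the function name association_scores and the toy counts
are hypothetical.

```python
import math


def association_scores(c12, c1, c2, n):
    """Textbook association scores for one bigram (requires c12 > 0).

    c12: count of the noun + adjective pair
    c1:  count of the noun, c2: count of the adjective
    n:   total number of bigrams in the corpus
    """
    expected = c1 * c2 / n  # expected pair count if the two words were independent
    # Observed and expected cells of the 2x2 contingency table.
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    rows, cols = (c1, n - c1), (c2, n - c2)
    expected_cells = [r * c / n for r in rows for c in cols]
    log_likelihood = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected_cells) if o > 0
    )
    return {
        "mle": c12 / n,                                # relative frequency
        "pmi": math.log2(c12 / expected),              # pointwise mutual information
        "ll": log_likelihood,                          # log-likelihood ratio
        "dice": 2 * c12 / (c1 + c2),                   # Dice's coefficient
        "t": (c12 - expected) / math.sqrt(c12),        # Student's t-score
    }


# Toy counts (invented for illustration only).
scores = association_scores(c12=30, c1=100, c2=60, n=10_000)
print(scores)
```

A pair that co-occurs exactly as often as chance predicts receives a PMI, t-score and log-likelihood of
zero, which is why high values on these measures are read as evidence of a fixed combination.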
Once we have sorted the candidates according to their scores, we consider only those that contain
one of the colour terms we analyze as an adjective (thus appearing as the second component of the
candidate) and that appear among the top 500,000 candidates for each of the five association measures.
Then we manually filter out false positives that do not satisfy the requirements of our definition, as
well as MWEs already attested in GRADIT. At the end of this process we retrieve 99 new MWEs from PAISÀ,
so that our material reaches a total of 1042 entities. This set of over 1000 MWE types
provides wide coverage and can be considered a reliable and relatively complete set to represent our
phenomenon.
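The automatic part of this selection can be sketched as follows. The example reads "for each of the five
association measures" as "for all five measures", which is an interpretation rather than something the
text states; the function names, the data layout and the toy scores are all hypothetical, and the manual
filtering step is not modelled.

```python
def top_k_for_every_measure(scores, k):
    """Candidates that rank among the top k under every measure.

    scores maps a (noun, adjective) candidate to a dict {measure: score}.
    """
    measures = {m for per_measure in scores.values() for m in per_measure}
    kept = set(scores)
    for measure in measures:
        ranked = sorted(scores, key=lambda c: scores[c][measure], reverse=True)
        kept &= set(ranked[:k])
    return kept


def colour_candidates(scores, k, colour_lemmas):
    """Keep top-k candidates whose second component is a colour adjective."""
    return {c for c in top_k_for_every_measure(scores, k) if c[1] in colour_lemmas}


# Toy scores with two measures only (invented for illustration).
toy = {
    ("scatola", "nero"): {"pmi": 5.0, "dice": 0.4},
    ("casa", "grande"): {"pmi": 4.0, "dice": 0.5},
    ("filo", "rosso"): {"pmi": 3.0, "dice": 0.1},
    ("anno", "scorso"): {"pmi": 1.0, "dice": 0.3},
}
print(colour_candidates(toy, k=3, colour_lemmas={"nero", "rosso"}))
```

With k=3, ("filo", "rosso") is dropped because it falls outside the top three for the second measure,
even though it is within the top three for the first.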

4. First results
Our analysis shows the following frequency order for the colour terms used in Italian nominal MWEs:
bianco (white, ~22.2%), nero (black, ~20.6%), rosso (red, ~18.5%), verde (green, ~10.3%), giallo
(yellow, 9.5%), grigio (grey, ~5.8%), azzurro (light blue, ~4.5%), blu (blue, ~3.7%), rosa (pink,
~3.4%), violetto (purple, ~1.1%), marrone (brown, ~0.3%), arancione (orange, ~0.2%), viola (purple,
0.1%), arancio (orange, 0%), as shown in detail in Table 1.
It is evident that the universal hierarchy is reproduced exactly only for the five most frequent colours,
while for the remaining terms there is no evidence of a correspondence with the hierarchy6.
5   [noun + adjective] is the general structure for the unmarked noun phrase in Italian.
6   It is interesting to note that, by combining the occurrences of the MWE types including blu and azzurro into a
    single set (labeled as blue), we retrieve the universal hierarchy order up to the sixth colour. However, since our
    study focuses on the salience of the colour terms used to refer to hues and not on the hues themselves, this
    result is not of preeminent interest.

Colour term    # of MWEs    % of MWEs    Examples
Bianco               231        22.17    abete bianco, bandiera bianca, camice bianco
Nero                 215        20.63    caffè nero, cintura nera, lavoro nero
Rosso                193        18.52    falco rosso, filo rosso, globulo rosso
Verde                107        10.27    anni verdi, pollice verde, tavolo verde
Giallo                99         9.50    bocca gialla, fiamme gialle, melone giallo
Grigio                60         5.76    corpo grigio, lupo grigio, sostanza grigia
Azzurro               47         4.51    alga azzurra, pesce azzurro, telefono azzurro
Blu                   38         3.65    auto blu, fifa blu, sangue blu
Rosa                  35         3.36    cronaca rosa, fiocco rosa, quarzo rosa
Violetto              11         1.06    camaleonte violetto, tartufo violetto
Marrone                3         0.29    cintura marrone, lemure marrone
Arancione              2         0.19    bandiera arancione, contrassegno arancione
Viola                  1         0.10    gallinella viola americana
Arancio                0         0.00    /
Total               1042       100.00
  Table 1: Distribution of colour terms in Italian nominal MWEs.

Colour term       PAISÀ    ItTenTen    Repubblica
Bianco           27,914     324,273        51,512
Nero             26,836     339,472        71,154
Rosso            20,156     230,685        39,051
Blu               8,046      90,023        16,938
Verde             6,749     171,839        24,890
Giallo            6,425      70,434        14,622
Azzurro           6,392      86,214        18,555
Grigio            3,780       2,399        10,258
Viola             2,635      17,257         2,373
Rosa              1,797      74,392        10,876
Arancione         1,681      12,103         1,003
Marrone           1,174*     11,547*        1,088*
Arancio             268*      3,689*           65*
Violetto            181       3,121           414
  Table 2: Number of occurrences of basic colour terms (tagged as adjectives) in three Italian corpora.
  The occurrences marked with an asterisk are estimates projected from a set of 100 manually processed
  random samples: “marrone” and “arancio” were tagged only as nouns in the three corpora, although they
  obviously appear as adjectives as well.
This result could already be of some interest, showing that the most cognitively salient colours of
Berlin & Kay's experiment also seem to be preferred in the construction of MWEs. In this way
perception seems to influence the speakers' choices in creating new expressions that become
particularly entrenched in language. However, it must be considered that the
result shown above could also be due to the frequency distribution of colour terms in general Italian
usage. Chiari (2012), on the basis of empirical evidence, suggests that Zipf's law on the relation
between the frequency and the meanings of a word7 (Zipf, 1949) can be extended to MWEs as well, in the
sense that there exists a proportional relation between the frequency of a word and the number of
MWEs it forms. Nevertheless, Chiari's empirical observation considers this relation only for the
nominal heads of MWEs, while in our work colour terms appear only as modifiers.
In order to shed light on this question, the number of occurrences of every basic colour term in
several Italian corpora is considered. Apart from the PAISÀ corpus itself, the ItTenTen (2010) and the
Repubblica (2004) corpora were chosen to perform the check. The first of the two new resources is
built by web crawling, comprises about 3.1 billion tokens and is available inside the
Sketch Engine interface (Kilgarriff et al., 2004). The second corpus is a collection of articles from one
of the major Italian newspapers and includes about 380 million tokens. Both corpora are
morpho-syntactically annotated, so that queries with POS categories are allowed.
We choose to search only for the occurrences of the basic colour terms classified as adjectives. The
results are shown in Table 2.
Table 3 shows the different orders for the basic colour terms based on the number of occurrences found
in each of the analyzed corpora. The adaptation of Zipf's law to MWEs seems more or less supported
for the first three colours8; verde appears as the fourth term in two of the three corpora, in agreement
with the MWE order, while the remaining terms do not show any significant correspondence.

MWE frequency order    PAISÀ        ItTenTen     Repubblica
Bianco                 Bianco       Nero         Nero
Nero                   Nero         Bianco       Bianco
Rosso                  Rosso        Rosso        Rosso
Verde                  Blu          Verde        Verde
Giallo                 Verde        Blu          Azzurro
Grigio                 Giallo       Azzurro      Blu
Azzurro                Azzurro      Rosa         Giallo
Blu                    Grigio       Giallo       Rosa
Rosa                   Viola        Viola        Grigio
Violetto               Rosa         Arancione    Viola
Marrone                Arancione    Marrone      Marrone
Arancione              Marrone      Arancio      Arancione
Viola                  Arancio      Violetto     Violetto
Arancio                Violetto     Grigio       Arancio
  Table 3: Comparison between the frequency order for basic colour terms found from MWEs and in three
  Italian corpora.

7   Zipf's law about the frequency/meanings relation states that words occurring with higher frequencies are more
    generic and thus have a higher number of meanings (senses) than less frequent ones.

On the one hand, then, the order obtained from MWE type frequencies seems to be relevant independently
of the adapted Zipf's law only from giallo on; on the other hand, the lower part of the chart does not
show a correspondence with the universal hierarchy. It is also possible that, for the more frequent
colour terms, both universals of perception and Zipf's law act together to make them appear in a
great number of MWEs, but at this point we are not able to discriminate between the two causes.
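The position-by-position comparison behind Table 3 can be reproduced with a short sketch. The counts are
copied from Tables 1 and 2; the names MWE_TYPES, CORPORA, frequency_order and shared_positions are
hypothetical, and the key "PAISA" is written without the accent purely for convenience.

```python
# MWE type counts from Table 1.
MWE_TYPES = {
    "bianco": 231, "nero": 215, "rosso": 193, "verde": 107, "giallo": 99,
    "grigio": 60, "azzurro": 47, "blu": 38, "rosa": 35, "violetto": 11,
    "marrone": 3, "arancione": 2, "viola": 1, "arancio": 0,
}

# Adjective occurrences from Table 2 (asterisked values are the paper's estimates).
CORPORA = {
    "PAISA": {
        "bianco": 27914, "nero": 26836, "rosso": 20156, "blu": 8046,
        "verde": 6749, "giallo": 6425, "azzurro": 6392, "grigio": 3780,
        "viola": 2635, "rosa": 1797, "arancione": 1681, "marrone": 1174,
        "arancio": 268, "violetto": 181,
    },
    "ItTenTen": {
        "bianco": 324273, "nero": 339472, "rosso": 230685, "blu": 90023,
        "verde": 171839, "giallo": 70434, "azzurro": 86214, "grigio": 2399,
        "viola": 17257, "rosa": 74392, "arancione": 12103, "marrone": 11547,
        "arancio": 3689, "violetto": 3121,
    },
    "Repubblica": {
        "bianco": 51512, "nero": 71154, "rosso": 39051, "blu": 16938,
        "verde": 24890, "giallo": 14622, "azzurro": 18555, "grigio": 10258,
        "viola": 2373, "rosa": 10876, "arancione": 1003, "marrone": 1088,
        "arancio": 65, "violetto": 414,
    },
}


def frequency_order(counts):
    """Terms sorted from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


def shared_positions(order_a, order_b):
    """Terms that occupy the same rank in both orders."""
    return [a for a, b in zip(order_a, order_b) if a == b]


mwe_order = frequency_order(MWE_TYPES)
for name, counts in CORPORA.items():
    print(name, shared_positions(mwe_order, frequency_order(counts)))
```

Matching ranks alone is a strict criterion: bianco and nero swap places in two corpora although their
counts are close, which is the situation discussed in footnote 8.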

5. Colour terms and idioms
An interesting result, however, emerges if we take into account only those MWEs that can be
classified as idioms, that is, those in which one of the components is used in a metaphorical or
metonymical way or in which the global meaning includes an unpredictable semantic addition. In this way
most of the terminology of specialized languages is ruled out (e.g. "alga rossa", eng. red algae; "abete
bianco", eng. silver fir) because in these cases colours are mostly used just to denote the actual hue
they refer to. This filtering process retains only 295 of the original 1042 MWEs (see annex below),
producing the following distribution of colour terms: nero (~25.1%), bianco (20%), rosso (~16.6%),
verde (~10.5%), giallo (~7.1%), blu (~6.4%), azzurro (~5.4%), rosa (~5.1%), grigio (~2.7%), arancione
(~0.7%), marrone (~0.3%), arancio (0%), viola (0%), violetto (0%). The results are shown in detail in
Table 4.
Two things can be noted: (i) this time the universal hierarchy is reproduced exactly,
except for marrone, which seems to be shifted into the last group of the hierarchy; (ii) azzurro can also
be included in the last group.

    Colour term    # of MWEs    % of MWEs    Examples
    Nero                  74        25.08    aristocrazia nera, pecora nera, uomo nero
    Bianco                59        20.00    acque bianche, calore bianco, morte bianca
    Rosso                 49        16.61    basco rosso, bollino rosso, croce rossa
    Verde                 31        10.51    carta verde, onda verde, numero verde
    Giallo                21         7.12    febbre gialla, pagine gialle, romanzo giallo
    Blu                   19         6.44    banana blu, caschi blu, colletto blu
    Azzurro               16         5.42    arma azzurra, parco azzurro, pesce azzurro
    Rosa                  15         5.08    balletto rosa, foglio rosa, punto rosa
    Grigio                 8         2.71    corpo grigio, eminenza grigia, materia grigia
    Arancione              2         0.68    bandiera arancione, contrassegno arancione
    Marrone                1         0.34    cintura marrone
    Arancio                0         0.00    /
    Viola                  0         0.00    /
    Violetto               0         0.00    /
    Total                295       100.00
      Table 4: Distribution of colour terms in Italian nominal idioms.
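Whether a frequency ranking is compatible with the hierarchy can be checked mechanically. The sketch below
is only an illustration: the stage numbers encode the hierarchy quoted in the introduction, violetto and
arancio are assumed to share the stage of viola and arancione, azzurro is skipped because it has no stage
in the English-based hierarchy, and all names (STAGE, hierarchy_violations, IDIOMS, ALL_MWES) are
hypothetical.

```python
# Hierarchy stage of each Italian term (1 = [white, black] ... 6 = last group).
STAGE = {
    "bianco": 1, "nero": 1, "rosso": 2, "verde": 3, "giallo": 3, "blu": 4,
    "marrone": 5, "viola": 6, "violetto": 6, "rosa": 6, "arancione": 6,
    "arancio": 6, "grigio": 6,
}

# Type counts from Table 4 (idioms) and Table 1 (all nominal MWEs).
IDIOMS = {
    "nero": 74, "bianco": 59, "rosso": 49, "verde": 31, "giallo": 21,
    "blu": 19, "azzurro": 16, "rosa": 15, "grigio": 8, "arancione": 2,
    "marrone": 1, "arancio": 0, "viola": 0, "violetto": 0,
}
ALL_MWES = {
    "bianco": 231, "nero": 215, "rosso": 193, "verde": 107, "giallo": 99,
    "grigio": 60, "azzurro": 47, "blu": 38, "rosa": 35, "violetto": 11,
    "marrone": 3, "arancione": 2, "viola": 1, "arancio": 0,
}


def hierarchy_violations(counts, stage=STAGE):
    """Attested terms that rank below a term from a later hierarchy stage."""
    ranked = sorted(
        (t for t in counts if counts[t] > 0 and t in stage),
        key=counts.get,
        reverse=True,
    )
    violations, highest_stage_seen = [], 0
    for term in ranked:
        if stage[term] < highest_stage_seen:
            violations.append(term)
        highest_stage_seen = max(highest_stage_seen, stage[term])
    return violations


print(hierarchy_violations(IDIOMS))
print(hierarchy_violations(ALL_MWES))
```

Terms with zero types are left out because their relative order is undetermined; the order of terms
within one stage is not constrained by the check.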

8     Although the order of bianco and nero seems unstable, Table 2 shows that their number of occurrences is very close in
      terms of magnitude. The occurrences for red, instead, appear quite well separated from the first two terms.
With regard to observation (i), the possibility of downgrading the brown colour term is not so
problematic, since Kay & McDaniel's modification of the original universal model already underlined
the preeminence of only white, black, red, green, yellow and blue as fundamental categories in the
partition of the perceivable spectrum. On the other hand, observation (ii) is fully reasonable and in some
way expected: first, azzurro is a derivation from blue in the same way as pink derives from red, and one
expects the recourse to such terms, which are just subtle hues of more definite and fundamental
colours, only after the latter have been developed and are available; secondly, and most importantly,
azzurro can be seen as one more basic term that a language like English has not developed yet: in this
way, if the English-based hierarchy is valid, it cannot appear anywhere else but in the last group.
The choice of considering idioms only is grounded in studies related to the cognitive theory of
metaphor (e.g. Lakoff & Johnson, 1980), such as the work by Casadei (1996) on the Italian language, which
traces the interpretation of both idioms and metaphors back to cognitive schemes and
physical/perceptual experiences. One of the underlying hypotheses in such works is that idioms (and
metaphors in general) can be seen as the result of the speaker's need to express abstract concepts in
terms of concrete and perceivable elements that can be related to our senses. In our case, we can add
that idioms may represent one of the phenomena in language that reflect unsupervised and instinctive
links between cognitive schemes and linguistic production in order to express, in the case of colours,
more complex concepts via the sense of sight9.
Starting from these considerations we can suppose that some colours are more likely to be chosen when
expressing figurative meanings because of their cognitive salience in representing certain abstract states
or features10. At the same time, the different degrees to which basic colours succeed in being
institutionalized in idioms can provide evidence of their different cognitive relevance regardless of the
existence of any explicit metaphor based on them (which Casadei, 1996:262, is not able to recognize apart
from those for white and black) or of conventional figurative meanings.
In order to shed light on this point we consider the figurative meanings for each basic colour term that
are attested in GRADIT. Table 5 shows the number of meanings of the colour adjectives that do
not correspond to the primary denotative meaning indicating their prototypical hue11. It can be seen that
there is no correlation between the number of institutionalized conventional meanings for the colours
and the number of idioms produced. The case of blu is the most interesting: although there seems to be
no figurative meaning conventionally associated with this colour, it nevertheless appears in the sixth
rank of the chart in Table 4 with 19 idiomatic MWEs. In this sense, the hypothesis that the number of
idioms including a specific colour depends on how many conventional figurative meanings are
associated with that colour is also falsified. Thus, the fact that the frequency order of the idiom types
follows Berlin & Kay's hierarchy can provide further support for the universalistic hypothesis.

Colour term      # of figurative meanings (adj.)
Nero             12
Bianco            9
Verde             6
Grigio            5
Rosso             4
Giallo            3
Arancione         2
Azzurro           2
Rosa              1
Arancio           0
Blu               0
Marrone           0
Viola             0
Violetto          0
Table 5: Number of figurative meanings attested in GRADIT for the Italian basic colour terms.

9  In this sense it is useful to consider the example of foglio rosa (lit. pink sheet), an Italian certificate printed on
   pink paper that allows its holder to practise driving a car before obtaining a driving licence. The institutionalization
   of this expression shows how speakers tend to express the abstract meaning of the document with a reference to its
   colour: a feature directly connected to the sense of sight.
10 With regard to the cognitive status of the opposition of black and white, Casadei (1996:264) reports Lakoff's view,
   according to which white is related to positive meanings and black to negative ones because of the intuitive experience
   that darkness (black) implies danger, while light (white) is connected to visibility and safety.
11 To be more explicit, in the case of bianco, for example, the meanings considered are: (i) pale, (ii) clear, (iii) covered by
   snow, (iv) typical of europoid races (in the case of skin), (v) clean, (vi) pure, (vii) blank, (viii) typical of a Christian
   association, (ix) typical of anti-revolutionary movements; while the excluded meaning is "of the colour of snow or milk".

6. Conclusion and future work
This work has presented a new contribution to the colour naming debate from the phraseological
perspective. The analysis of the presence of basic colour terms in Italian nominal MWEs has shown
that colour terms are not equally distributed in this kind of expression, and that the frequency ordering
of idiomatic MWEs closely reproduces the order of the universal hierarchy first proposed by Berlin &
Kay. In general we can attest that the distribution of colour terms in Italian idioms cannot be
exhaustively explained either on the basis of Zipf's law on frequency and meanings adapted to
MWEs, or by the consideration that a higher number of conventional figurative meanings for a
colour term implies a higher number of idioms with the same colour. The fact that the order arising
from the distribution of colour terms across idioms follows Berlin & Kay's hierarchy can thus be taken
as additional evidence for their conclusions.
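The comparison between the two orderings can be illustrated with a minimal Python sketch, which is not part of the original analysis. It computes a Spearman rank correlation between the number of idiom types per colour term and the number of figurative meanings in Table 5. The idiom counts are assumptions of the sketch: they were tallied by hand from the lists in the Annex and should be checked against the counts reported in the paper. Only the eleven colour terms that occur in the Annex are included, and all function and variable names are illustrative.

```python
from math import sqrt

# Idiom types per colour term, tallied by hand from the Annex lists
# (an assumption of this sketch; verify against the paper's own counts).
IDIOM_TYPES = {
    "nero": 74, "bianco": 56, "rosso": 49, "verde": 31, "giallo": 21, "blu": 19,
    "azzurro": 16, "rosa": 15, "grigio": 8, "arancione": 2, "marrone": 1,
}

# Figurative adjectival meanings per colour term, copied from Table 5.
FIGURATIVE_MEANINGS = {
    "nero": 12, "bianco": 9, "rosso": 4, "verde": 6, "giallo": 3, "blu": 0,
    "azzurro": 2, "rosa": 1, "grigio": 5, "arancione": 2, "marrone": 0,
}


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Colour terms ordered by decreasing number of idiom types.
colours = sorted(IDIOM_TYPES, key=IDIOM_TYPES.get, reverse=True)
rho = spearman(
    [IDIOM_TYPES[c] for c in colours],
    [FIGURATIVE_MEANINGS[c] for c in colours],
)
print("Order by idiom types:", ", ".join(colours))
print(f"Spearman rho (idiom types vs. figurative meanings): {rho:.2f}")
```

A coefficient below 1 indicates that the two orderings do not coincide, which is in line with the observation that the number of figurative meanings alone does not account for the distribution of colour terms in idioms. With only eleven items the coefficient is descriptive and should not be read as a significance test.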
The idea that colour terms appear in nominal idioms according to the universal hierarchy also
suggests that perceptual preferences related to colours can be seen not only in an evolutionary or
cross-linguistic perspective, but also in linguistic usage within a single language.
Finally, this analysis has shown that phraseology and corpus-based studies can also shed new light on
the subject.
Future work along this line of research can include the extension of the set of MWEs to those including
basic colour terms as the nominal head (e.g. "azzurro cielo", Eng. sky blue) or to other grammatical
categories such as verbal or adverbial MWEs (e.g. "vedere rosso", Eng. to see red; "essere al verde", lit.
"to be at green", meaning to be without any money; "di punto in bianco", lit. "of point in white",
meaning all at once). Moreover, it would be desirable to compare results of this kind in a cross-linguistic
perspective, especially between unrelated languages.

Annex – List of idioms by colour term
Nero: abito nero, acque nere, Africa nera, angelo nero, anima nera, aristocrazia nera, bandiera nera,
basco nero, bestia nera, borsa nera, borsaro nero, brigate nere, buco nero, caffè nero, camicia nera,
carne nera, cintura nera, continente nero, corpo nero, cravatta nera, cronaca nera, effluente nero,
eversione nera, febbre nera, fiamme nere, fumarola nera, fumata nera, gabinetto nero, giacca nera,
giornata nera, giovedì nero, goccia nera, guelfo nero, humor nero, irraggiamento nero, lavoro nero,
libro nero, lista nera, luce nera, magia nera, maglia nera, male nero, maniera nera, mano nera,
marciume nero, marea nera, mercato nero, messa nera, morbo nero, morte nera, musica nera, nobiltà
nera, numero nero, onda nera, oro nero, pane nero, papa nero, pecora nera, peste nera, pietra nera,
polvere nera, pozzo nero, punto nero, scatola nera, settembre nero, specchio nero, tavola nera, tavoletta
nera, testa nera, testina nera, umore nero, umorismo nero, uomo nero, vedova nera.

Bianco: acque bianche, albero bianco, arma bianca, arte bianca, bandiera bianca, bianca signora, caffè
bianco, camice bianco, cappello bianco, carne bianca, Casa Bianca, circo bianco, clown bianco, colletto
bianco, cravatta bianca, cronaca bianca, effluente bianco, elettrodomestici bianchi, fratello bianco,
frittura bianca, fumata bianca, globulo bianco, golpe bianco, infarto bianco, libro bianco, luce bianca,
lupara bianca, magia bianca, mal bianco, materia bianca, matrimonio bianco, monte bianco, morte
bianca, nana bianca, nota bianca, notte bianca, omicidio bianco, pan bianco, perdite bianche, pizza
bianca, risultato bianco, rumore bianco, scheda bianca, sciopero bianco, semestre bianco, serie bianca,
settimana bianca, sostanza bianca, strada bianca, striscia bianca, telefoni bianchi, terrore bianco, treno
bianco, tuta bianca, vedova bianca, voce bianca.

Rosso: ala rossa, armata rossa, bandiera rossa, basco rosso, biennio rosso, bollino rosso, brigate rosse,
brigatista rosso, calore rosso, camicia rossa, carne rossa, cartellino rosso, clausola rossa, code rosse,
croce rossa, debito rosso, disco rosso, febbre rossa, fiamme rosse, filo rosso, gamba rossa, gambe rosse,
gambi rossi, gigante rossa, giubba rossa, globulo rosso, guardia rossa, infarto rosso, khmer rosso,
libretto rosso, linea rossa, macchia rossa, mal rosso dei suini, mezzaluna rossa, nana rossa, nonna rossa,
numero rosso, papa rosso, partito rosso, passaporto rosso, perdite rosse, polpa rossa, punto rosso, serie
rossa, soccorso rosso, stella rossa, tappeto rosso, telefono rosso, toghe rosse.

Verde: anni verdi, archeoastronomia verde, balletto verde, basco verde, benzina verde, berretto verde,
biglietto verde, bollino verde, camicia verde, carta verde, croce verde, disco verde, fiamme verdi, libro
verde, lira verde, maggese verde, maglia verde, moneta verde, numero verde, onda verde, pasdaran
verde, pollice verde, polmone verde, potatura verde, raggio verde, regime verde, tappeto verde, tavolo
verde, treno verde, valuta verde, zona verde.

Giallo: bandiera gialla, bocca gialla, cartellino giallo, febbre gialla, fiamme gialle, fumata gialla,
maglia gialla, marciume giallo, morbo giallo, nana gialla, oro giallo, pagine gialle, pan giallo, pericolo
giallo, pioggia gialla, romanzo giallo, signorina gialla, sindacato giallo, stampa gialla, stella gialla,
striscia gialla.

Blu: auto blu, bambino blu, banana blu, basco blu, bollino blu, casco blu, colletto blu, fifa blu, gigante
blu, luna blu, macchia blu, morbo blu, parco blu, sangue blu, scettico blu, striscia blu, tuta blu, uomo
blu, zona blu.

Azzurro: arma azzurra, camicia azzurra, croce azzurra, fiamme azzurre, maglia azzurra, malattia
azzurra, morbo azzurro, nastro azzurro, parco azzurro, partito azzurro, pesce azzurro, pietra azzurra,
principe azzurro, sangue azzurro, signorina azzurra, telefono azzurro.

Rosa: balletto rosa, bollino rosa, cartolina rosa, colletto rosa, cronaca rosa, dente rosa di Mummery,
fiocco rosa, foglio rosa, maglia rosa, marciume rosa, punto rosa, quote rosa, romanzo rosa, salsa rosa,
telefono rosa.

Grigio: corpo grigio, eminenza grigia, lettera grigia, letteratura grigia, marciume grigio, materia grigia,
mercato grigio, sostanza grigia.

Arancione: bandiera arancione, contrassegno arancione.

Marrone: cintura marrone.

REFERENCES
ALLEN, G. (1879) - The Colour-Sense. London, Trübner and Company.
BARONCHELLI, A., GONG, T., PUGLISI, A. & LORETO, V. (2010) - Modeling the emergence of
      universality in color naming patterns. Proceedings of the National Academy of Sciences of the
      United States of America. 107(6):2403-2407.
BERLIN, B. & KAY, P. (1969) - Basic Color Terms: Their Universality and Evolution, University of
      California Press.
BORNSTEIN, M. H. (1973a) - The psychophysiological component of cultural difference in color
      naming and illusion susceptibility. Behavioral Science Notes. 1:41-101.
BORNSTEIN, M. H. (1973b) - Color vision and color naming: A Psychological hypothesis of cultural
      difference. Psychological Bulletin. 80:257-285.
BROWN, R. W. (1976) - Reference. Cognition. 4:125-153.
CALZOLARI, N., FILLMORE, C., GRISHMAN, R., IDE, N., LENCI, A., MACLEOD, C. & ZAMPOLLI, A.
      (2002) - Towards best practice for multiword expressions in computational lexicons.
      Proceedings of the 3rd International Conference on Language Resources and Evaluation
      (LREC 2002). Las Palmas, Canary Islands. 1934-1940.
CASADEI, F. (1996) - Metafore ed espressioni idiomatiche. Uno studio semantico sull'italiano. Roma,
      Bulzoni Editore.
CHIARI, I. (2012) - Collocazioni e polirematiche nel lessico musicale italiano. In R. Nikodinovska (ed.).
      "Lingua, letteratura e cultura italiana". Atti del convegno internazionale 50 anni di studi
      italiani, Philology Faculty "Blaže Koneski", Skopje. 165-190.
COLLIER, G. A. (1973) - Review of Basic Color Terms. Language. 49:245-248.
COLLIER, G. A. (1976) - Further evidence for universal color categories. Language. 52:884-890.
CONKLIN, H. C. (1955) - Hanunóo Color Categories. Southwestern Journal of Anthropology. 11:339-344.
CONKLIN, H. C. (1973) - Color categorization: Review of Basic Color Terms, by Brent Berlin and Paul
      Kay. American Anthropologist. 75:931-942.
DE MAURO, T. (2005) - La fabbrica delle parole, UTET.
DURBIN, M. (1972) - Review of Basic Color Terms. Semiotica. 6:257-278.
EVERT, S. (2004) - The Statistics of Word Cooccurrences: Word Pairs and Collocations. Stuttgart,
      University of Stuttgart.
EVERT, S. & KRENN, B. (2001) - Methods for the qualitative evaluation of lexical association
      measures. Proceedings of the 39th Annual Meeting of the Association for Computational
      Linguistics. Toulouse, France. 188-195.
GEIGER, L. (1880) - Contributions to the History of the Development of the Human Race. London,
      Trübner and Company.
GLADSTONE, W. E. (1858) - Studies on Homer and the Homeric Age. London, Oxford University Press.
GRADIT (1999-2007) - Grande Dizionario Italiano dell'Uso, a cura di T. De Mauro, UTET.
HICKERSON, N. (1971) - Review of Berlin and Kay (1969). International Journal of American
      Linguistics. 37:257-270.
HJELMSLEV, L. (1968) [1943] - I fondamenti della teoria del linguaggio. Introduction and translation by
      Giulio C. Lepschy. Torino, Einaudi. Originally published as "Omkring sprogteoriens
      grundlæggelse", Copenhagen.
ITTENTEN (2010) - Italian Web Corpus available at Sketch Engine. www.sketchengine.co.uk.
KAY, P. & MAFFI, L. (1999) - Color Appearance and the Emergence and Evolution of Basic Color
        Lexicons. American Anthropologist. 101:743-760.
KAY, P. & MCDANIEL, C. K. (1978) - The linguistic significance of the meanings of basic color terms.
        Language. 54:610-646.
KILGARRIFF, A., RYCHLY, P., SMRZ, P. & TUGWELL, D. (2004) - The Sketch Engine. Proceedings of
        EURALEX 2004. Lorient, France. 105-116.
KILGARRIFF, A. (2006) - Collocationality (and how to measure it). Proceedings of the 12th EURALEX
        International Congress. E. Corino, C. Marello and C. Onesti (eds.). Torino, Edizioni dell'Orso,
        Alessandria. 997-1004.
LAKOFF, G. & JOHNSON, M. (1980) - Metaphors we live by. Chicago, The University of Chicago Press.
LUCY, J. A. (1996) - The scope of linguistic relativity. In J. J. Gumperz and S. C. Levinson (eds.).
        "Rethinking Linguistic Relativity". Cambridge University Press.
MAGNUS, H. (1880) - Untersuchungen über den Farbensinn der Naturvölker. Jena, Fischer.
NIDA, E. A. (1959) - Principles of translation as exemplified by Bible translating. In Reuben A. Brower
        (ed.). "On Translation". Cambridge, Harvard University Press. pp. 11-31.
PAISÀ (2012) - Corpus dell'Italiano, realizzazione comune dell'Università di Bologna (S. Scalise, C.
        Borghetti), CNR Pisa (V. Pirrelli, A. Lenci, F. Dell'Orletta), Accademia Europea di Bolzano (A.
        Abel, C. Culy, H. Dittmann, V. Lyding), Università di Trento (M. Baroni, M. Brunello, S.
        Castagnoli, E. Stemle), www.corpusitaliano.it.
PHILIP, G. S. (2003) - Collocation and Connotation: A corpus-based investigation of Colour Words in
        English and Italian. Birmingham, University of Birmingham.
RAMISCH, C., VILLAVICENCIO, A. & BOITET, C. (2010) - mwetoolkit: a Framework for Multiword
        Expression Identification. Proceedings of the Seventh International Conference on Language
        Resources and Evaluation (LREC 2010), Valletta, Malta.
RAY, V. F. (1952) - Techniques and Problems in the Study of Human Color Perception. Southwestern
        Journal of Anthropology. 8:251-259.
RAY, V. F. (1953) - Human Color Perception and Behavioral Response. Transactions. New York
        Academy of Sciences (ser. 2). 16:98-104.
REPUBBLICA (2004) - Corpus dell'italiano. Described in M. Baroni, S. Bernardini, F. Comastri, L.
        Piccioni, A. Volpi, G. Aston, M. Mazzoleni (2004) - "Introducing the la Repubblica Corpus: A
        large, annotated, TEI(XML)-compliant corpus of newspaper Italian". Proceedings of LREC
        2004.
SAG, I., BALDWIN, T., BOND, F., COPESTAKE, A. & FLICKINGER, D. (2002) - Multiword expressions: A
        pain in the neck for NLP. Proceedings of the 3rd CICLing (CICLing-2002), vol. 2276/2010 of
        LNCS, Mexico City, Mexico, 1-15.
SAPIR, E. (1921) - Language. New York, Harcourt, Brace.
SAUNDERS, B. (1995) - Disinterring Basic Color Terms: a study in the mystique of cognitivism. History
        of the Human Sciences. 8 (7):19–38.
SAUNDERS, B. (1997) - Are there non-trivial constraints on colour categorization? Behavioral and
        Brain Sciences. 20:167-228.
SHEPARD, R. (1992) - The perceptual organization of colors. In J. Barkow, L. Cosmides and J. Tooby
        (eds.). "The Adapted Mind". Oxford, Oxford University Press.
SINCLAIR, J. (1991) - Corpus, Concordance, Collocation. Oxford, Oxford University Press.
WHORF, B. L. (1956) [1940] - Science and Linguistics. In John B. Carroll (ed.). "Language, Thought
        and Reality: The Collected Papers of Benjamin Lee Whorf". Cambridge, Massachusetts: MIT
        Press. Originally published in Technology Review. 42:229-231, 247-248.
ZIPF, G. K. (1949) - Human Behavior and the Principle of Least Effort. Addison-Wesley Press.