COMPARING SIMILARITY OF WORDS BASED ON PSYCHOSEMANTIC EXPERIMENT AND RUWORDNET

Page created by Janet Sparks
 
CONTINUE READING
Comparing Similarity of Words Based
                on Psychosemantic Experiment and RuWordNet

                 Valery Solovyev                          Natalia Loukachevitch
              Kazan Federal University,              Lomonosov Moscow State University,
                  Kazan, Russia,                             Moscow, Russia,
            maki.solovyev@mail.ru                         louk_nat@mail.ru

                                                     mantic associations correspond to co-occurrence
                    Abstract                         (syntagmatic relations) in texts.
                                                        To study automatic methods of word similari-
    In the paper we compare the structure of         ty calculation, specialized datasets are created.
    the Russian language thesaurus RuWord-           Some researchers try to create datasets distin-
    Net with the data of a psychosemantic ex-        guishing different subtypes of semantic similari-
    periment to identify semantically close          ty of words, which requires additional efforts and
    words. The aim of the study is to find out       guidelines. Agirre et al. (2009) subdivided the
    to what extent the structure of RuWordNet        existing semantic word dataset WordSim353
    corresponds to the intuitive ideas of native     (Finkelstein et al., 2002) into two subsets:
    speakers about the semantic proximity of         WordSim353-similarity        and    WordSim353-
    words. The respondents were asked to list        relatedness datasets. SimLex-999 guidelines
    synonyms to a given word. As a result of         (Hill et al., 2015) aim to distinguish word pairs
    the experiment, we found that the re-            in taxonomical semantic similarity relation (syn-
    spondents mainly mentioned not only              onyms, hypernyms, hy ponyms) from remaining
    synonyms but words that are in paradig-          types of relations (antonymy, co-hyponyms).
    matic relations with the stimuli. The            The authors of WIN353 dataset (Kliegr and
    words of the mental sphere were chosen           Zamazal, 2018) ask respondents about word sim-
    for the experiment. In 95% of cases, the         ilarity based on word interchangeability in sen-
    words characterized in the experiment as         tences.
    semantically close were also close accord-          We can also try to use human scores of word
    ing to the thesaurus. In other cases, addi-      semantic similarity to assess the quality of de-
    tions to the thesaurus were proposed.            scriptions in electronic lexical-semantic re-
                                                     sources (thesauri). Such resources are built on
1    Introduction                                    the basis of synsets – sets of synonyms – linked
                                                     by semantic relations such as hyponymy, hyper-
Semantic proximity of words is an important pa-      nymy, antonymy, and some others. The automat-
rameter required in various tasks of natural lan-    ic use of thesauri requires high quality descrip-
guage processing. It can be estimated in different   tions of semantic senses and semantic relations
ways: by corpus, using distributional methods        between them.
(Mikolov 2013, Bojanowski et al., 2017), expert         In this paper, we compare the results of a sur-
assessments; from psychosemantic experiments;        vey of respondents and the similarity of words
using thesauri such as WordNet (Fellbaum,            according to the RuWordNet thesaurus (Louka-
1998).                                               chevitch et al., 2018) for the Russian language.
   The general concept of semantic similarity can    Currently, the published RuWordNet version
be subdivided to paradigmatic (taxonomical)          comprises about 110 thousand Russian words
similarity and semantic associations (Agirre et      and expressions. A new version of RuWordNet is
al., 2009; Hill et al., 2015; Kliegr and Zamazal,    being prepared and RuWordNet data are tested
2018; Majewska et al., 2020). Paradigmatic simi-     from different points of view.
larity can be defined in terms of shared superor-       In the psychosemantic experiment the re-
dinate category or shared semantic features. Se-     spondents were asked to list synonyms for stimu-
li words without any guidelines. We found that       text corpus, words are searched for which 20
their answers mainly contain paradigmatically        words closest in the corpus (based on the stand-
similar words, practically without words related     ard method for evaluating the semantic similarity
via any other similarity relationships, which        of words) are located far from each other in the
makes it possible to check the taxonomic struc-      thesaurus. The distance between words in the
ture of the thesaurus.                               thesaurus is the length of the shortest path be-
   The paper is structured as follows. Section 2     tween them in the graph of semantic relations.
provides information on related work. Section 3      For found words with such properties, the rea-
describes a psychosemantic experiment to de-         sons for such discrepancy are analyzed. The
termine the semantic proximity of words. Section     analysis of the data presented in (Loukachevitch,
4 analyzes the data obtained. Section 5 discusses    2019) was continued in (Bayrasheva, 2019).
the results of the experiment.                          In work (Soloviev et al., 2020), RuWordNet
                                                     synsets were compared with synonymous sets
2     Related work                                   according to published 10 dictionaries of Russian
                                                     synonyms. The work (Erofeeva et al., 2020) pre-
The paper concerns two directions of studies:
                                                     sents the results of an experiment in which the
revision and updating existing lexical-semantic
                                                     respondents were asked to list synonyms for a
resources for natural language processing (the-
                                                     given word. The results are compared with the
sauri) and studies on relation types exploited by
                                                     RuWordNet synsets. Usmanova et al. (2020)
native speakers in word association experiments.
                                                     analyzed pairs of quasi-synonyms and the dis-
                                                     tance between them in RuWordNet. It was ex-
2.1    Revision of Existing Lexical Semantic         pected that quasi-synonyms, as semantically
       Resources                                     close words, should be located at a short distance
                                                     in the thesaurus.
Procedures for revising and verifying resources         The general result of above-mentioned studies
are important for the developers of WordNet-like     of RuWordNet is as follows. RuWordNet data,
resources. Some ontological tools have been          including the composition of synsets and the
proposed to check the consistency of relation-       structure of semantic relations, correlate well
ships in WordNet (Guarino and Welty, 2004;           with all the other considered sources of infor-
Alvez et al., 2018). Rambousek et al. (2018) con-    mation about semantically close words. At the
sidered a crowdsourcing tool allowing a user of      same time, a number of gaps in RuWordNet
the Czech wordnet to report errors. Users may        were identified, taking into account of which al-
propose an update of any data value. These sug-      lows improving descriptions in the thesaurus.
gestions can be approved or rejected by editors.     This article continues research in this direction.
Visualization tools can also help to find prob-
lems in wordnets (Piasecki et al. 2013; Jo-          2.2   Lexical relations in associations experi-
hannsen et al. 2011). Cristea et al. (2004) and            ments
Rudnicka et al. (2012) reported on the revision of
                                                     One of the most known associative experiments
mistakes and inconsistencies in their wordnets in
                                                     for Russian was organized by Karaulov in 1986-
the process of linking the wordnet and the Eng-
                                                     1997 (Karaulov et al., 2002). In experiments
lish WordNet.
                                                     such demographic information such as age, gen-
    McCrae et al. (2019) discussed a new project:
                                                     der, specialization, and location was also consid-
Open-Source WordNet for English, which is
                                                     ered and recorded. Currently, these data are con-
based on the Princeton WordNet. This project
                                                     sidered as outdated.
has already fixed errors found in the current ver-
                                                        Many researchers classify word associations
sion of WordNet, including spelling mistakes in
                                                     into syntagmatic and paradigmatic relations
definitions and examples. Some problematic is-
                                                     (Fitzpatrick, 2006). The researchers study the
sues were reported (for example, synset dupli-
                                                     structure of associations for language learners
cates, missed or incorrect relationships) for fur-
                                                     (Fitzpatrick, 2006), patients (Arias-Trejo et al.,
ther revision.
                                                     2018), children (Wojcik and Kandhadai, 2019),
   Recently, verification and enrichment methods
                                                     and other social groups.
have been systematically developed for the Ru-
                                                        Vylomova et al. (2018) study types of rela-
WordNet thesaurus. In (Loukachevitch, 2019),
                                                     tions in associative responses of Russian native
the following method for enriching the Ru-
                                                     speakers in dependence on socio-demographic
WordNet thesaurus was proposed. For a large
                                                     characteristics. They organized associative ex-
periments in various Russian regions, including         tion) is listed in the 5th place, and for non-
Siberia and the Urals. The age of participants          philologists, fantasia (fantasy) is in the 4th posi-
ranged from 16 to 26, most of them were univer-         tion. It seems that the figurative thinking of phi-
sity students of approximately 50 specialties. In       lologists, the reading and study of fiction, which
their analysis, Vylomova et al. (2018) classified       form their linguistic personalities, are reflected in
the lexical relations in associations to syntagmat-     the results of the experiment: for them, the word
ic (they calculated word co-occurrences in a text       mechta (dream) is associated with fantasy and
corpus) and paradigmatic (according to Ru-              dreams, that is, with something unreal, ephemer-
WordNet thesaurus). The authors found that              al. Not-philologists are more pragmatic: the third
men more frequently list paradigmatic associa-          position in their lists is occupied by the word
tions whereas women are more likely to produce          stremlenie (aspiration), in the semantics of which
syntagmatic associations. It was also revealed that     the presentation of concrete results is conveyed
most students of technical specializations and          (Erofeeva et al., 2020).
natural sciences demonstrate high scores for par-          Further we will write Cyrillic Russian words
adigmatic association types.                            in Latin transcription.
   Sinopalnikova (2004) studies approaches to               Since the respondents, naturally, did not use
extract useful lexical relations from existing          the criteria of synonymy, such as interchangea-
word association thesauri to assist in developing       bility in different contexts and did not have much
new wordnets.                                           time to complete the task, they suggested words
                                                        that have something semantically in common
3    Experiment Setting                                 with the given word, but not necessarily syno-
                                                        nyms in the strict sense of the term. For example,
In the current study we present the results of a
                                                        for the word mechta (dream as imaginative
psychosemantic experiment, carried out in ac-
                                                        thoughts), the following words were listed as
cordance with the methodology described in
                                                        synonyms in RuWordNet: gresa, mechtaniye,
(Petrenko, 2010). The experiment reveals seman-
                                                        fantasia (fantasy). The respondents most often
tically close words (synonyms) as seen by native
                                                        indicated the following words: zhelaniye (desire),
speakers. In (Erofeeva et al., 2020) only syno-
                                                        tsel’ (goal), fantasia (fantasy), gresa, stremleniye
nyms from the RuWordNet synsets were consid-
                                                        (aspiration), nadezhda (hope). Only two of them
ered, in this work all semantic relations are in-
                                                        are synonymous. The rest of the words –
volved.
                                                        zhelaniye (desire), tsel’ (goal), stremleniye (aspi-
   The experiment is as follows. The respondents
                                                        ration), nadezhda (hope) – at first glance may
(Russian native speakers) receive a number of
                                                        seem like associations with the given word
words, and they have to list synonyms for these
                                                        dream. However, this assumption is not true.
words in a limited time. The respondents are stu-
                                                           In the Karaulov’s dictionary of Russian asso-
dents (18-23 years old, 200 people) of Kazan
                                                        ciations (Karaulov et al., 2002), the word mechta
Federal University (Kazan, Russia). About half
                                                        (dream) has the following most frequent associa-
of the students are philologists, the second half
                                                        tions: goluboy (blue), zhizn’ (life), moya (mine),
are non-philological students. The definition of a
                                                        sbylas’ (come true), idiota (idiot), nesbytochnaya
synonym is not explained to the respondents; we
                                                        (unrealizable), rozovaya (pink). The words
rely on intuitive understanding of synonyms by
                                                        zhelaniye (desire), stremleniye (aspiration),
native speakers. For the experiment, words relat-
                                                        nadezda (hope) are not mentioned as associations
ed to the mental sphere are selected. This seman-
                                                        at all, and the word tsel’ (goal) is mentioned only
tic area is the most difficult for clear differentia-
                                                        once in 101 responses. We can see that in fact
tion of synonym sets and their semantic relations.
                                                        words having syntagmatic relations with the
   The results of philologists and non-
                                                        original one are also mentioned as associations
philologists differ insignificantly. However, it is
                                                        by respondents. In the current experiment, the
worth noting that synonyms for word мечта
                                                        respondents indicated words that were not in
(mechta – dream as imaginative thoughts) listed
                                                        syntagmatic but in paradigmatic relations with
by philologists and non-philologists have inter-
                                                        the stimulus. Rather, they can be characterized as
esting distinctions. So, for philologists, the word
                                                        belonging to the semantic field of the original
фантазия (fantasia – fantasy) is in 3rd place,
                                                        word or as its analogues.
and for non-philologists, the word стремление
                                                           It is worth noting that in the dictionary
(stremleniye – aspiration) is in the 3rd position.
                                                        (Apresian, 2004) the words namereniye (inten-
Conversely, for philologists, stremlenie (aspira-
                                                        tion) and mysl’ (thought) are considered as ana-
logues (near-synonyms) of the word mechta                 The words veseliye (fun), likovaniye (exulta-
(dream) (its synonyms are not given in the dic-        tion), udovol'stviye (pleasure) are hyponyms in
tionary). For the verb mechtat’ (to dream), the        relation to radost’ (joy). The words vostorg (de-
synonyms, according to (Apresyan, 2004), are           light) and schast'ye (happiness) are co-hyponyms
khotet’ (to want), zhelat’ (to desire), and the ana-   with radost’ (joy) with a common hypernym –
logue is the word nadeyat’sya (to hope). Thus,         dushevnoye perezhivaniye (emotional experi-
the words indicated by the respondents are close       ence). But between the words radost’ (joy) and
in meaning to the word mechta (dream). Our ex-         ulybka (smile) there is only a very long way: ra-
periment can be characterized as aimed at identi-      dost’ (joy) – dushevnoye perezhivaniye (emo-
fying paradigmatic associations, while the             tional experience) – mental'nyy ob"yekt (mental
Karaulov's dictionary (Karaulov et al., 2002) in       object) – abstraktnaya sushchnost' (abstract enti-
fact mixes paradygmatic and syntagmatic associ-        ty) – kachestvo (quality) – vneshnost’ (appear-
ations.                                                ance) – vyrazheniye litsa (facial expression) –
                                                       ulybka (smile). Such a long path reflects the fact
4    Analysis of Results                               that in RuWordNet the word ulybka (smile) is
                                                       interpreted only as a facial expression and, ac-
In this work, the associations for the words obida
                                                       cordingly, radost’ (joy) and ulybka (smile) in
(offense, as a feeling caused being offended),
                                                       RuWordNet refer to different spheres –- the
radost’ (joy), talant (talent), strast’ (passion),
                                                       mental world and the physical.
lyubov’ (love), mysl’ (thought), vostorg (delight)
                                                          Princeton WordNet presents the point of view
are considered. For each stimulus word, six most
                                                       that a person smiles to communicate something
frequently mentioned responses are studied.
                                                       to others about his condition (to change one's
                                                       facial expression by spreading the lips, often to
   Обида (offense feeling). The informants most
                                                       signal pleasure1) and thus it is classified as com-
often indicated the words: ogorcheniye (grief),
                                                       munication. Still, it should be noted that a person
dosada (annoyance), bol’ (pain), grust’ (sad-
                                                       can smile at own thoughts, pleasant memories
ness), razocharovaniye (disappointment), zlost’
                                                       while alone with yourself, i.e. a smile is also pos-
(anger). The first of them is interpreted in Ru-
                                                       sible outside the communication situation. As we
WordNet as a hypernym for obida. The word
                                                       can see, the situation here is very difficult. Ac-
grust’ (sadness) in RuWordNet also has a direct
                                                       cording to the Russian explanatory dictionary
connection with obida – it is a hypernym-
                                                       (Ozhegov and Shvedova, 1997), ulybka (smile)
hypernym for obida. Dosada (annoyance) is a
                                                       has the following definition: “mimic movement
co-hyponym for obida, having the common hy-
                                                       of the face, lips, eyes, showing disposition to
pernym nedovol'stvo (discontent).
                                                       laughter, expressing pleasure or ridicule and oth-
   There is also a short path between the words
                                                       er feelings (translation from Russian)”. This def-
obida and razocharovaniye (disappointment):
                                                       inition takes into account both facial expressions
obida (offense) – nedovol'stvo (discontent) –
                                                       and communicative intentions.
dushevnoye perezhivaniye (emotional experi-
                                                          It is possible to take into account the intuition
ence) – razocharovaniye (disappointment). There
                                                       of native speakers and the dual nature of a smile
is a similar path between the words obida (of-
                                                       by making certain changes to the thesaurus. It
fense) and razocharovaniye (disappointment):
                                                       can be described with the entailment relationship
offense – discontent – emotional experience –
                                                       between concepts ulybat’sya (to smile) and ra-
disappoitment. Finally, the path between the
                                                       dovat’sya (to joy). If a person smiles, then usual-
words bol’ (pain) and obida is only slightly long-
                                                       ly this person is really happy, or at least seeks to
er: pain – suffering – emotional experience – dis-
                                                       show happiness to others. Conversely, if a person
content – offense. Semantic distances of 4 steps
                                                       is really happy about something, then this mani-
or less are treated in (Loukachevitch, 2019) as
                                                       fests itself in a smile.
short. All semantic relations are hypo-
hypernymic.
                                                         Talant (talent). For this word, the respond-
                                                       ents indicate the following synonymous words:
   Radost’ (joy). For this word, respondents in-
                                                       sposobnost’ (ability), dar (gift), umeniye (skill),
dicate the following word associations: schast'ye
                                                       darovanie, odarenost’ (giftedness), talent, geniy
(happiness), vostorg (delight), vesel’ye (fun),
                                                       (genius). In RuWordNet darovaniye (giftedness),
ulybka     (smile),    likovaniye   (exultation),
udovol'stviye (pleasure).                              1
                                                           http://wordnetweb.princeton.edu/perl/webwn
дар (gift), are listed as synonyms to the word          (sympathy), nezhnost’(tenderness), vlecheniye
talant (talent). Sposobnost’ (ability) is a hyper-      (attraction).
nym for prirodnaya sposobnost' (natural ability),          We saw above that lyubov’ (love) is at a dis-
which is a hypernym for talant (talent). Umeniye        tance of 3 from strast’ (passion) and 2 from
(skill) is a co-hyponym with talant (talent)            vlecheniye (attraction). Privyazannost’ (attach-
through the general hypernym prirodnaya                 ment)) is a hypernym for lyubov’ (love). The
sposobnost' (natural ability). Odarennost’ (gift-       words lyubov’ (love) and vlyublennost’ (falling
edness) is a hyponym in relation to sposobnost’         in love) are co-hyponyms with a common hyper-
(ability), i.e. is at distance 3 from the word talant   nym dushevnoye perezhivaniye (emotional expe-
(talent). The word geniy (genius) is a hyponym          rience). The word sympatia (sympathy) is a co-
to odarennost’ (giftedness), at a distance 4 from       hyponym with word vlyublennost’ (falling in
the word talant (talent).                               love) through a hypernym lichnostnyye
                                                        otnosheniya (personal relationships). Thus, be-
   Strast’ (passion). For the word strast’ (pas-        tween the words sympatia (sympathy) and lyu-
sion), the respondents indicate the synonymous          bov’ (love) there is a distance of length 4. But
words: zhelaniye (desire), vlecheniye (attraction),     nezhnost’ (tenderness) is interpreted in Ru-
любовь (love), uvlecheniye (infatuation), pokhot’       WordNet only as a character trait (two other
(lust), интерес (interest). In RuWordNet,               senses: nezhnost’ 1 (soft, gentle to the touch),
uvlecheniye (infatuation) is a hypernym for             nezhnost’ 3 (fragile, too weak) are not here dis-
strast’ (passion). The word interes (interest) is a     cussed as irrelevant), and not as mental experi-
hypernym of the hypernym for the word                   ence and there is no close way between them.
strast’(passion). The words vlecheniye (attrac-            In fact, in RuWordNet one of the senses of the
tion), zhelanie (desire), lyubov’ (love) are co-        word nezhnost’ (tenderness) is missing. In the
hyponyms with the word uvlecheniye (infatua-            dictionary (Ozhegov and Shvedova, 1997), nezh-
tion) with a common hypernym, dushevnoye pe-            nost’ (tenderness) refers to the word nezhnyi
rezhivaniye (emotional experience), i.e. are at a       (tender), which is interpreted (in this sense) as
distance of 3 from strast’ (passion). The word          “affectionate, full of love: tender feelings”. Ac-
pokhot’ (lust) is a hyponym in relation to vlech-       cording to the dictionary (Apresian, 2004)
eniye (attraction), i.e. is at a distance of 4 from     "nezhnyi (tender) – showing a feeling of love or
the initial word strast’ (passion).                     affection in communication with a person."
   The scheme of semantic relations in this group          Thus, in the interpretation of this word, the
of words can be represented in Fig. 1. This is a        word lyubov’ (love) invariably appears, indicat-
typical scheme for the student answers in the ex-       ing the correctness of the students' assessment.
periment. The arrows show links from hyper-             Therefore, it is recommended to add a new sense
nyms to hyponyms.                                       of the word nezhnost’ (tenderness) in RuWord-
                                                        Net, in accordance with the above-mentioned
   Lyubov’ (love). For this word, the respondents       dictionary definitions.
indicated such words as privyazannost’ (attach-
ment), vlyublennost’(falling in love), sympatia

        Fig1. Scheme of semantic relations of words-reactions to the stimulus strast’ (passion)
Vostorg (delight). Respondents indicate the        is necessary to make certain changes in the the-
following words: radost’ (joy), voskhishcheniye       saurus to obtain smaller distance for semantically
(admiration), udivleniye (surprise), schast'ye        close words. These pairs of words are as follows:
(happiness), likovaniye (jubilation), voo-            lyubov’ (love) – nezhnost’ (tenderness) and ra-
dushevleniye (inspiration).                           dost’ (joy) – ulybka (smile). In the first case, it is
   In RuWorNet, vostorg (delight) and                 proposed to add a new sense of the word nez-
voskhishcheniye (admiration) are synonyms,            nost’ (tenderness) to the thesaurus and to estab-
schast'ye (happiness), udivleniye (surprise) and      lish the necessary additional relation of hypon-
radost’ (joy) are co-hyponyms with vostorg (de-       ymy, in the second case we suggest to add the
light) with the common hypernym dushevnoye            relation of entailment.
perezhivaniye (emotional experience). Word li-           Thus, most words frequently mentioned by re-
kovaniye (jubilation), as noted above, is a hypo-     spondents are located close to the stimulus word
nym in relation to radost’ (joy), i.e. is at a dis-   in RuWordNet, which indicates good consistency
tance of 3 from vostorg (delight). Voo-               of the thesaurus with the intuition of native
dushevleniye (inspiration) is a co-hypernym with      speakers. At the same time, taking into account
the word radost’ (joy) with the common hypo-          the data of a psychosemantic experiment makes
nym euphoria, i.e. is at a distance of 4 from vos-    it possible to identify some problem areas in the
torg (delight).                                       thesaurus. Let us consider whether there is a cor-
                                                      relation between the frequency with which the
   Mysl’ (thought). Respondents indicate the fol-     response word is chosen by the respondents and
lowing words: ideya (idea), duma (thought),           the distance in the thesaurus from the stimulus
mneniye (opinion), dogadka (guess), soobra-           word to the response word. We sort words-
zheniye (consideration), suzhdeniye (judgment).       reactions according to the frequency of their
   Words soobrazheniye (consideration) and du-        mention.
ma (thought) are synonymous with mysl’                   Table 1 summarizes the data, sorted by the
(thought). Ideya (idea) is a hyponym for mysl’        frequency of the words in the respondents' an-
(thought), suzhdeniye (judgment) is a hypernym        swers. The asterisk indicates the distances that
for mysl’ (thought). Mneniye (opinion) is a co-       will take place after the implementation of the
hyponym with mysl’ (thought) via common hy-           above-mentioned suggestions for improving the
pernym suzhdeniye (judgment).                         thesaurus structure. We can see that the words
   Between the words mysl’ (thought) and              mentioned more often are at a shorter distance
dogadka (guess) there is a path of length 4: mysl’    from the stimulus word in the thesaurus, which is
(thought) – suzhdeniye (judgment) – mneniye           also a good confirmation of the correct structure
(opinion) – dopushcheniye (assumption) –              of the thesaurus and the adequacy of the experi-
dogadka (guess).                                      ment.

5    Discussion                                       6    Conclusion
We analyzed 40 word pairs (out of a total of 7x6      Thesauri are created by professionals who rely
= 42 pairs, two pairs were repeated). In 38 cases     on both the theory of language and their ideas
(95%), word pairs listed by the respondents as        about the semantics of linguistic units. However,
synonyms are also close according to the thesau-      semantics are not described in the literature in as
rus descriptions: 13 pairs are at a distance of 1;    much detail as required by the thesaurus devel-
12 pairs are at a distance of 2; 7 pairs are at a     opers. Taking this into account, it is of natural
distance of 3; 6 pairs are at a distance of 4. The    interest to compare thesaurus data with the lin-
number of mentioned words located at a certain        guistic intuition of native speakers, manifested in
path distance in the thesaurus decreases mono-        psychosemantic experiments.
tonically with increasing distance. In all these
cases, it turned out to be sufficient to consider
only hypo-hypernymic relations. In two cases, it
Stimulus word         Obida       Radost’   Talant         Strast’       Lyubov’     Mysl’       Vostorg      Average
                      (offense)   (joy)     (talent)       (passion)     (love)      (thought)   (delight)
                 st
Words in the 1 places of the respondents’associations
Frequency (%)         24          49        55.5           43.5          26          61.5        48.5         40.3
Relation dist.        1           2         2              3             1           1           2            1.7
Words in the 2nd places of the respondents’associations
Frequency (%)         19          34.5      44.5           28.5          24.5        28.5        39.5         31.3
Relation dist.        3           2         1              3             4           1           1            2.1
                 d
Words in the 3 places of the respondents’associations
Frequency (%)         16          32        30             17.5          22          12.5        29.5         22.7
Relation dist.        2           1         2              3             2           2           2            2.0
Words in the 4th places of the respondents’associations
Frequency (%)         14          12        14             12.5          20.5        11          13.5         13.9
Relation dist.        4           1         3              1             3           1           2            2.1
                 th
Words in the 5 places of the respondents’associations
Frequency (%)         13.5        10.5      13             11.5          18          10.5        12.5         12.7
Relation dist.        3           3*        4              2             2           1           3            2.6
Words in the 6th places of the respondents’associations
Frequency (%)         13          7         11.5           9.5           14.5        9           8.5          10.4
Relation dist.        2           1         1              4             2*          4           4            2.6

   Table 1. Positions, frequencies (percentage of answers) and RuWordNet distances of word associa-
tions. The sign *) means distances after the suggested corrections.

   Most words frequently mentioned by respond-                       and relatedness using distributional and WordNet-
ents or synonyms with the stimulus word or are                       based approaches. In Proceedings of NAACL-HLT:
located close to it in RuWordNet, which indi-                        19–27.
cates good consistency of the thesaurus with the                 Alvez, J., Gonzalez-Dios, I., and Rigau, G. 2018.
intuition of native speakers. This confirms the                    Cross-checking WordNet and SUMO using mer-
high quality of the RuWordNet thesaurus. The                       onymy. 2018. Proceedings of the Eleventh Interna-
experimental results also support the choice of                    tional Conference on Language Resources and
distance 4 as a measure of the semantic proximi-                   Evaluation (LREC-2018).
ty of words in the thesaurus. At the same time,                  Apresian, I.D. New explanatory dictionary of Russian
taking into account the experimental data made it                  synonyms. Moscow, Vena: Iazyki slavianskoi
possible to identify some problem areas in the                     kul'tury, 2004. (in Russian)
thesaurus.                                                       Arias-Trejo, N., Minto-García, A., Luna-Umanzor, D.
                                                                   I., Ríos-Ponce, A. E., Mariana, B. P., and Bel-
Acknowledgments                                                    Enguix, G. 2018. Word-word relations in Dementia
This research was financially supported by                          and typical aging. In Proceedings of the first inter-
RFBR, grants № 18-00-01238 and № 18-00-                            national workshop on language cognition and
01226 as parts of complex project № 18-00-                         computational models:85-93.
01240 (K).                                                       Bayrasheva, V. 2019. Corpus-based vs thesaurus-
                                                                   based word similarities: expert verification. The
References                                                         XXth International Scientific Conference “Cogni-
                                                                   tive Modeling in Linguistics”. Proceedings:56-63.
Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pas-
  ca, M.,and Soroa, A. 2009. A study on similarity
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T.        Spatial Multi-Arrangement for Clustering and Mul-
  2017. Enriching word vectors with subword infor-            ti-way Similarity Dataset Construction. In Pro-
  ma-tion.Transactions of the ACL, 5:135–146.                 ceedings of The 12th Language Resources and
                                                              Evaluation Conference: 5749-5758.
Cristea, D., Mihaila, C., Forascu, Trandabat, D., Hu-
   sarciuc, M., Haja, G., and Postolache, O. 2004.          McCrae, J., Rademaker, A., Bond, F., Rudnicka, E.,
   Mapping Princeton WordNet synsets onto Roma-               and Fellbaum, Ch. 2019. English WordNet. An
   nian WordNet synsets. Romanian Journal of In-              Open Source WordNet for English. In Proceedings
   formation Science and Technology, 7(1-2):125-              of Global WordNet Conference GWC-2019.
   145.
                                                            Mikolov, T., Chen, K., Corrado, G. S., and Dean, J.
Erofeeva, I., Solovyev, V., and Bayrasheva, V. 2020.          (2013). Efficient estimation of word representa-
  Psychosemantic experiment as a tool for objectify-          tions in vectorspace.CoRR, abs/1301.3781.
  ing data on the ways of representing synonymy in
                                                            Ozhegov, S.I. and Shvedova, N.Yu. 1997. Explanato-
  modern Russian language. Bulletin of the Volgo-
                                                              ry Dictionary of the Russian Language. Russian
  grad State University. Series 2: Linguistics. 2020. -
                                                              Academy of Sciences. Institute of the Russian
  T. 19, No. 1:178–194 (in Russian).
                                                              Language named after V.V. Vinogradov. - 4th ed.,
Fellbaum, Ch. 1998. WordNet: An electronic lexical            Add. - M .: Azbukovnik,. (in Russian).
   database. MIT Press. 1998.
                                                            Petrenko, V.F. 2010. Fundamentals of psychoseman-
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E.,     tics. M .: Eksmo. (in Russian)
   Solan,Z., Wolfman, G., and Ruppin, E. 2002. Plac-
                                                            Piasecki, M., Marcińczuk, M., and Maziarz, M. 2013.
   ing searchin context: The concept revisited. ACM
                                                               WordNetLoom: a WordNet development system
   Transactions on Information Systems, 20(1):116–
                                                               integrating form-based and graph-based perspec-
   131.
                                                               tives. International Journal of Data Mining, Mod-
Fitzpatrick, Tess. 2006. "Habits and rabbits: Word             elling and Management, 5(3) :210-232.
   associations and the L2 lexicon." EuroSLA Year-
                                                            Rudnicka, E., Maziarz, M., Piasecki, M., and Szpa-
   book 6.1 (2006):121-145.
                                                              kowicz, S. 2012. A strategy of Mapping Polish
Guarino, N., and Welty, Ch. 2004. An overview of              WordNet onto Princeton WordNet. In Proceedings
  OntoClean. Handbook on ontologies. Springer,                of COLING 2012 :1039-1048.
  Berlin, Heidelberg:151-171.
                                                            Sinopalnikova, A. 2004. Word association thesaurus
Hill, F., Reichart, R., and Korhonen, A. 2015. Sim-            as a resource for building wordnet. Proceedings of
   Lex-999: Evaluating semantic models with (genu-             the 2nd International WordNet Conference: 199-
   ine) similarity estimation. Computational Linguis-          205.
   tics, 41(4):665–695.
                                                            Soloviev, V., Gimaletdinova, G., Khalitova, L., and
Johannsen, A. and Pedersen, B. 2011. Andre ord”–a             Usmanova, L. 2020. Expert Assessment of Syno-
  wordnet browser for the Danish wordnet, DanNet."            nymic Rows in RuWordNet. Proc. of Analysis of
  Proceedings of the 18th Nordic Conference of                Images, Social Networks and Texts AIST-2020
  Computational Linguistics (NODALIDA 2011).                  cnference. CCIS, vol. 1086:174-183.
Karaulov, Yu.N., G.A. Cherkasova, N.V. Ufimtseva,           Usmanova, L, Erofeeva, I., Solovyev, V., and V.
 Yu.A. Sorokin, E.F., and Tarasov. 2002 Russian               Bochkarev. 2020. Analysis of the Semantic Dis-
 associative dictionary. In 2 volumes. M.: AST-               tance of Words in the RuWordNet Thesaurus.
 Astrel, 2002. 784 p. (in Russian)                            2020. Proceedings of the DAMDID/RCDL’2020
                                                              Conference.
Kliegr T., and Zamazal O. 2018. Antonyms are simi-
   lar: Towards paradigmatic association approach to        Vylomova, E., Shcherbakov, A., Philippovich, Y., and
   rating similarity in SimLex-999 and WordSim-353.           Cherkasova, G. 2018. Men Are from Mars, Women
   Data & Knowledge Engineering. V. 115:174-193.              Are from Venus: Evaluation and Modelling of
                                                              Verbal Associations. In: Analysis of Images, Social
Loukachevitch N., Lashevich G., and Dobrov B.
                                                              Networks and Texts. AIST 2017. Lecture Notes in
  2018. Comparing Two Thesaurus Representations
                                                              Computer Science, vol 10716. Springer,
  for Russian. Proceedings of Global WordNet Con-
                                                              Cham:106-115.
  ference GWC-2018:35-44.
                                                            Wojcik, E. H., and P. Kandhadai. 2019. Paradigmatic
Loukachevitch, N. 2019. Corpus-based Check-up for
                                                             associations and individual variability in early lexi-
  Thesaurus. Proc. of the 57th Annual Meeting of the
                                                             cal–semantic networks: Evidence from a free asso-
  Association for Computational Linguistics ACL-
                                                             ciation task. Developmental Psychology. 56(1):53-
  2019: 5773-5779.
                                                             69.
Majewska, O., McCarthy, D., van den Bosch, J.,
  Kriegeskorte, N., Vulić, I., and Korhonen, A. 2020.
You can also read