COMPARING SIMILARITY OF WORDS BASED ON PSYCHOSEMANTIC EXPERIMENT AND RUWORDNET
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Comparing Similarity of Words Based
on Psychosemantic Experiment and RuWordNet
Valery Solovyev Natalia Loukachevitch
Kazan Federal University, Lomonosov Moscow State University,
Kazan, Russia, Moscow, Russia,
maki.solovyev@mail.ru louk_nat@mail.ru
mantic associations correspond to co-occurrence
Abstract (syntagmatic relations) in texts.
To study automatic methods of word similari-
In the paper we compare the structure of ty calculation, specialized datasets are created.
the Russian language thesaurus RuWord- Some researchers try to create datasets distin-
Net with the data of a psychosemantic ex- guishing different subtypes of semantic similari-
periment to identify semantically close ty of words, which requires additional efforts and
words. The aim of the study is to find out guidelines. Agirre et al. (2009) subdivided the
to what extent the structure of RuWordNet existing semantic word dataset WordSim353
corresponds to the intuitive ideas of native (Finkelstein et al., 2002) into two subsets:
speakers about the semantic proximity of WordSim353-similarity and WordSim353-
words. The respondents were asked to list relatedness datasets. SimLex-999 guidelines
synonyms to a given word. As a result of (Hill et al., 2015) aim to distinguish word pairs
the experiment, we found that the re- in taxonomical semantic similarity relation (syn-
spondents mainly mentioned not only onyms, hypernyms, hy ponyms) from remaining
synonyms but words that are in paradig- types of relations (antonymy, co-hyponyms).
matic relations with the stimuli. The The authors of WIN353 dataset (Kliegr and
words of the mental sphere were chosen Zamazal, 2018) ask respondents about word sim-
for the experiment. In 95% of cases, the ilarity based on word interchangeability in sen-
words characterized in the experiment as tences.
semantically close were also close accord- We can also try to use human scores of word
ing to the thesaurus. In other cases, addi- semantic similarity to assess the quality of de-
tions to the thesaurus were proposed. scriptions in electronic lexical-semantic re-
sources (thesauri). Such resources are built on
1 Introduction the basis of synsets – sets of synonyms – linked
by semantic relations such as hyponymy, hyper-
Semantic proximity of words is an important pa- nymy, antonymy, and some others. The automat-
rameter required in various tasks of natural lan- ic use of thesauri requires high quality descrip-
guage processing. It can be estimated in different tions of semantic senses and semantic relations
ways: by corpus, using distributional methods between them.
(Mikolov 2013, Bojanowski et al., 2017), expert In this paper, we compare the results of a sur-
assessments; from psychosemantic experiments; vey of respondents and the similarity of words
using thesauri such as WordNet (Fellbaum, according to the RuWordNet thesaurus (Louka-
1998). chevitch et al., 2018) for the Russian language.
The general concept of semantic similarity can Currently, the published RuWordNet version
be subdivided to paradigmatic (taxonomical) comprises about 110 thousand Russian words
similarity and semantic associations (Agirre et and expressions. A new version of RuWordNet is
al., 2009; Hill et al., 2015; Kliegr and Zamazal, being prepared and RuWordNet data are tested
2018; Majewska et al., 2020). Paradigmatic simi- from different points of view.
larity can be defined in terms of shared superor- In the psychosemantic experiment the re-
dinate category or shared semantic features. Se- spondents were asked to list synonyms for stimu-li words without any guidelines. We found that text corpus, words are searched for which 20
their answers mainly contain paradigmatically words closest in the corpus (based on the stand-
similar words, practically without words related ard method for evaluating the semantic similarity
via any other similarity relationships, which of words) are located far from each other in the
makes it possible to check the taxonomic struc- thesaurus. The distance between words in the
ture of the thesaurus. thesaurus is the length of the shortest path be-
The paper is structured as follows. Section 2 tween them in the graph of semantic relations.
provides information on related work. Section 3 For found words with such properties, the rea-
describes a psychosemantic experiment to de- sons for such discrepancy are analyzed. The
termine the semantic proximity of words. Section analysis of the data presented in (Loukachevitch,
4 analyzes the data obtained. Section 5 discusses 2019) was continued in (Bayrasheva, 2019).
the results of the experiment. In work (Soloviev et al., 2020), RuWordNet
synsets were compared with synonymous sets
2 Related work according to published 10 dictionaries of Russian
synonyms. The work (Erofeeva et al., 2020) pre-
The paper concerns two directions of studies:
sents the results of an experiment in which the
revision and updating existing lexical-semantic
respondents were asked to list synonyms for a
resources for natural language processing (the-
given word. The results are compared with the
sauri) and studies on relation types exploited by
RuWordNet synsets. Usmanova et al. (2020)
native speakers in word association experiments.
analyzed pairs of quasi-synonyms and the dis-
tance between them in RuWordNet. It was ex-
2.1 Revision of Existing Lexical Semantic pected that quasi-synonyms, as semantically
Resources close words, should be located at a short distance
in the thesaurus.
Procedures for revising and verifying resources The general result of above-mentioned studies
are important for the developers of WordNet-like of RuWordNet is as follows. RuWordNet data,
resources. Some ontological tools have been including the composition of synsets and the
proposed to check the consistency of relation- structure of semantic relations, correlate well
ships in WordNet (Guarino and Welty, 2004; with all the other considered sources of infor-
Alvez et al., 2018). Rambousek et al. (2018) con- mation about semantically close words. At the
sidered a crowdsourcing tool allowing a user of same time, a number of gaps in RuWordNet
the Czech wordnet to report errors. Users may were identified, taking into account of which al-
propose an update of any data value. These sug- lows improving descriptions in the thesaurus.
gestions can be approved or rejected by editors. This article continues research in this direction.
Visualization tools can also help to find prob-
lems in wordnets (Piasecki et al. 2013; Jo- 2.2 Lexical relations in associations experi-
hannsen et al. 2011). Cristea et al. (2004) and ments
Rudnicka et al. (2012) reported on the revision of
One of the most known associative experiments
mistakes and inconsistencies in their wordnets in
for Russian was organized by Karaulov in 1986-
the process of linking the wordnet and the Eng-
1997 (Karaulov et al., 2002). In experiments
lish WordNet.
such demographic information such as age, gen-
McCrae et al. (2019) discussed a new project:
der, specialization, and location was also consid-
Open-Source WordNet for English, which is
ered and recorded. Currently, these data are con-
based on the Princeton WordNet. This project
sidered as outdated.
has already fixed errors found in the current ver-
Many researchers classify word associations
sion of WordNet, including spelling mistakes in
into syntagmatic and paradigmatic relations
definitions and examples. Some problematic is-
(Fitzpatrick, 2006). The researchers study the
sues were reported (for example, synset dupli-
structure of associations for language learners
cates, missed or incorrect relationships) for fur-
(Fitzpatrick, 2006), patients (Arias-Trejo et al.,
ther revision.
2018), children (Wojcik and Kandhadai, 2019),
Recently, verification and enrichment methods
and other social groups.
have been systematically developed for the Ru-
Vylomova et al. (2018) study types of rela-
WordNet thesaurus. In (Loukachevitch, 2019),
tions in associative responses of Russian native
the following method for enriching the Ru-
speakers in dependence on socio-demographic
WordNet thesaurus was proposed. For a large
characteristics. They organized associative ex-periments in various Russian regions, including tion) is listed in the 5th place, and for non-
Siberia and the Urals. The age of participants philologists, fantasia (fantasy) is in the 4th posi-
ranged from 16 to 26, most of them were univer- tion. It seems that the figurative thinking of phi-
sity students of approximately 50 specialties. In lologists, the reading and study of fiction, which
their analysis, Vylomova et al. (2018) classified form their linguistic personalities, are reflected in
the lexical relations in associations to syntagmat- the results of the experiment: for them, the word
ic (they calculated word co-occurrences in a text mechta (dream) is associated with fantasy and
corpus) and paradigmatic (according to Ru- dreams, that is, with something unreal, ephemer-
WordNet thesaurus). The authors found that al. Not-philologists are more pragmatic: the third
men more frequently list paradigmatic associa- position in their lists is occupied by the word
tions whereas women are more likely to produce stremlenie (aspiration), in the semantics of which
syntagmatic associations. It was also revealed that the presentation of concrete results is conveyed
most students of technical specializations and (Erofeeva et al., 2020).
natural sciences demonstrate high scores for par- Further we will write Cyrillic Russian words
adigmatic association types. in Latin transcription.
Sinopalnikova (2004) studies approaches to Since the respondents, naturally, did not use
extract useful lexical relations from existing the criteria of synonymy, such as interchangea-
word association thesauri to assist in developing bility in different contexts and did not have much
new wordnets. time to complete the task, they suggested words
that have something semantically in common
3 Experiment Setting with the given word, but not necessarily syno-
nyms in the strict sense of the term. For example,
In the current study we present the results of a
for the word mechta (dream as imaginative
psychosemantic experiment, carried out in ac-
thoughts), the following words were listed as
cordance with the methodology described in
synonyms in RuWordNet: gresa, mechtaniye,
(Petrenko, 2010). The experiment reveals seman-
fantasia (fantasy). The respondents most often
tically close words (synonyms) as seen by native
indicated the following words: zhelaniye (desire),
speakers. In (Erofeeva et al., 2020) only syno-
tsel’ (goal), fantasia (fantasy), gresa, stremleniye
nyms from the RuWordNet synsets were consid-
(aspiration), nadezhda (hope). Only two of them
ered, in this work all semantic relations are in-
are synonymous. The rest of the words –
volved.
zhelaniye (desire), tsel’ (goal), stremleniye (aspi-
The experiment is as follows. The respondents
ration), nadezhda (hope) – at first glance may
(Russian native speakers) receive a number of
seem like associations with the given word
words, and they have to list synonyms for these
dream. However, this assumption is not true.
words in a limited time. The respondents are stu-
In the Karaulov’s dictionary of Russian asso-
dents (18-23 years old, 200 people) of Kazan
ciations (Karaulov et al., 2002), the word mechta
Federal University (Kazan, Russia). About half
(dream) has the following most frequent associa-
of the students are philologists, the second half
tions: goluboy (blue), zhizn’ (life), moya (mine),
are non-philological students. The definition of a
sbylas’ (come true), idiota (idiot), nesbytochnaya
synonym is not explained to the respondents; we
(unrealizable), rozovaya (pink). The words
rely on intuitive understanding of synonyms by
zhelaniye (desire), stremleniye (aspiration),
native speakers. For the experiment, words relat-
nadezda (hope) are not mentioned as associations
ed to the mental sphere are selected. This seman-
at all, and the word tsel’ (goal) is mentioned only
tic area is the most difficult for clear differentia-
once in 101 responses. We can see that in fact
tion of synonym sets and their semantic relations.
words having syntagmatic relations with the
The results of philologists and non-
original one are also mentioned as associations
philologists differ insignificantly. However, it is
by respondents. In the current experiment, the
worth noting that synonyms for word мечта
respondents indicated words that were not in
(mechta – dream as imaginative thoughts) listed
syntagmatic but in paradigmatic relations with
by philologists and non-philologists have inter-
the stimulus. Rather, they can be characterized as
esting distinctions. So, for philologists, the word
belonging to the semantic field of the original
фантазия (fantasia – fantasy) is in 3rd place,
word or as its analogues.
and for non-philologists, the word стремление
It is worth noting that in the dictionary
(stremleniye – aspiration) is in the 3rd position.
(Apresian, 2004) the words namereniye (inten-
Conversely, for philologists, stremlenie (aspira-
tion) and mysl’ (thought) are considered as ana-logues (near-synonyms) of the word mechta The words veseliye (fun), likovaniye (exulta-
(dream) (its synonyms are not given in the dic- tion), udovol'stviye (pleasure) are hyponyms in
tionary). For the verb mechtat’ (to dream), the relation to radost’ (joy). The words vostorg (de-
synonyms, according to (Apresyan, 2004), are light) and schast'ye (happiness) are co-hyponyms
khotet’ (to want), zhelat’ (to desire), and the ana- with radost’ (joy) with a common hypernym –
logue is the word nadeyat’sya (to hope). Thus, dushevnoye perezhivaniye (emotional experi-
the words indicated by the respondents are close ence). But between the words radost’ (joy) and
in meaning to the word mechta (dream). Our ex- ulybka (smile) there is only a very long way: ra-
periment can be characterized as aimed at identi- dost’ (joy) – dushevnoye perezhivaniye (emo-
fying paradigmatic associations, while the tional experience) – mental'nyy ob"yekt (mental
Karaulov's dictionary (Karaulov et al., 2002) in object) – abstraktnaya sushchnost' (abstract enti-
fact mixes paradygmatic and syntagmatic associ- ty) – kachestvo (quality) – vneshnost’ (appear-
ations. ance) – vyrazheniye litsa (facial expression) –
ulybka (smile). Such a long path reflects the fact
4 Analysis of Results that in RuWordNet the word ulybka (smile) is
interpreted only as a facial expression and, ac-
In this work, the associations for the words obida
cordingly, radost’ (joy) and ulybka (smile) in
(offense, as a feeling caused being offended),
RuWordNet refer to different spheres –- the
radost’ (joy), talant (talent), strast’ (passion),
mental world and the physical.
lyubov’ (love), mysl’ (thought), vostorg (delight)
Princeton WordNet presents the point of view
are considered. For each stimulus word, six most
that a person smiles to communicate something
frequently mentioned responses are studied.
to others about his condition (to change one's
facial expression by spreading the lips, often to
Обида (offense feeling). The informants most
signal pleasure1) and thus it is classified as com-
often indicated the words: ogorcheniye (grief),
munication. Still, it should be noted that a person
dosada (annoyance), bol’ (pain), grust’ (sad-
can smile at own thoughts, pleasant memories
ness), razocharovaniye (disappointment), zlost’
while alone with yourself, i.e. a smile is also pos-
(anger). The first of them is interpreted in Ru-
sible outside the communication situation. As we
WordNet as a hypernym for obida. The word
can see, the situation here is very difficult. Ac-
grust’ (sadness) in RuWordNet also has a direct
cording to the Russian explanatory dictionary
connection with obida – it is a hypernym-
(Ozhegov and Shvedova, 1997), ulybka (smile)
hypernym for obida. Dosada (annoyance) is a
has the following definition: “mimic movement
co-hyponym for obida, having the common hy-
of the face, lips, eyes, showing disposition to
pernym nedovol'stvo (discontent).
laughter, expressing pleasure or ridicule and oth-
There is also a short path between the words
er feelings (translation from Russian)”. This def-
obida and razocharovaniye (disappointment):
inition takes into account both facial expressions
obida (offense) – nedovol'stvo (discontent) –
and communicative intentions.
dushevnoye perezhivaniye (emotional experi-
It is possible to take into account the intuition
ence) – razocharovaniye (disappointment). There
of native speakers and the dual nature of a smile
is a similar path between the words obida (of-
by making certain changes to the thesaurus. It
fense) and razocharovaniye (disappointment):
can be described with the entailment relationship
offense – discontent – emotional experience –
between concepts ulybat’sya (to smile) and ra-
disappoitment. Finally, the path between the
dovat’sya (to joy). If a person smiles, then usual-
words bol’ (pain) and obida is only slightly long-
ly this person is really happy, or at least seeks to
er: pain – suffering – emotional experience – dis-
show happiness to others. Conversely, if a person
content – offense. Semantic distances of 4 steps
is really happy about something, then this mani-
or less are treated in (Loukachevitch, 2019) as
fests itself in a smile.
short. All semantic relations are hypo-
hypernymic.
Talant (talent). For this word, the respond-
ents indicate the following synonymous words:
Radost’ (joy). For this word, respondents in-
sposobnost’ (ability), dar (gift), umeniye (skill),
dicate the following word associations: schast'ye
darovanie, odarenost’ (giftedness), talent, geniy
(happiness), vostorg (delight), vesel’ye (fun),
(genius). In RuWordNet darovaniye (giftedness),
ulybka (smile), likovaniye (exultation),
udovol'stviye (pleasure). 1
http://wordnetweb.princeton.edu/perl/webwnдар (gift), are listed as synonyms to the word (sympathy), nezhnost’(tenderness), vlecheniye
talant (talent). Sposobnost’ (ability) is a hyper- (attraction).
nym for prirodnaya sposobnost' (natural ability), We saw above that lyubov’ (love) is at a dis-
which is a hypernym for talant (talent). Umeniye tance of 3 from strast’ (passion) and 2 from
(skill) is a co-hyponym with talant (talent) vlecheniye (attraction). Privyazannost’ (attach-
through the general hypernym prirodnaya ment)) is a hypernym for lyubov’ (love). The
sposobnost' (natural ability). Odarennost’ (gift- words lyubov’ (love) and vlyublennost’ (falling
edness) is a hyponym in relation to sposobnost’ in love) are co-hyponyms with a common hyper-
(ability), i.e. is at distance 3 from the word talant nym dushevnoye perezhivaniye (emotional expe-
(talent). The word geniy (genius) is a hyponym rience). The word sympatia (sympathy) is a co-
to odarennost’ (giftedness), at a distance 4 from hyponym with word vlyublennost’ (falling in
the word talant (talent). love) through a hypernym lichnostnyye
otnosheniya (personal relationships). Thus, be-
Strast’ (passion). For the word strast’ (pas- tween the words sympatia (sympathy) and lyu-
sion), the respondents indicate the synonymous bov’ (love) there is a distance of length 4. But
words: zhelaniye (desire), vlecheniye (attraction), nezhnost’ (tenderness) is interpreted in Ru-
любовь (love), uvlecheniye (infatuation), pokhot’ WordNet only as a character trait (two other
(lust), интерес (interest). In RuWordNet, senses: nezhnost’ 1 (soft, gentle to the touch),
uvlecheniye (infatuation) is a hypernym for nezhnost’ 3 (fragile, too weak) are not here dis-
strast’ (passion). The word interes (interest) is a cussed as irrelevant), and not as mental experi-
hypernym of the hypernym for the word ence and there is no close way between them.
strast’(passion). The words vlecheniye (attrac- In fact, in RuWordNet one of the senses of the
tion), zhelanie (desire), lyubov’ (love) are co- word nezhnost’ (tenderness) is missing. In the
hyponyms with the word uvlecheniye (infatua- dictionary (Ozhegov and Shvedova, 1997), nezh-
tion) with a common hypernym, dushevnoye pe- nost’ (tenderness) refers to the word nezhnyi
rezhivaniye (emotional experience), i.e. are at a (tender), which is interpreted (in this sense) as
distance of 3 from strast’ (passion). The word “affectionate, full of love: tender feelings”. Ac-
pokhot’ (lust) is a hyponym in relation to vlech- cording to the dictionary (Apresian, 2004)
eniye (attraction), i.e. is at a distance of 4 from "nezhnyi (tender) – showing a feeling of love or
the initial word strast’ (passion). affection in communication with a person."
The scheme of semantic relations in this group Thus, in the interpretation of this word, the
of words can be represented in Fig. 1. This is a word lyubov’ (love) invariably appears, indicat-
typical scheme for the student answers in the ex- ing the correctness of the students' assessment.
periment. The arrows show links from hyper- Therefore, it is recommended to add a new sense
nyms to hyponyms. of the word nezhnost’ (tenderness) in RuWord-
Net, in accordance with the above-mentioned
Lyubov’ (love). For this word, the respondents dictionary definitions.
indicated such words as privyazannost’ (attach-
ment), vlyublennost’(falling in love), sympatia
Fig1. Scheme of semantic relations of words-reactions to the stimulus strast’ (passion)Vostorg (delight). Respondents indicate the is necessary to make certain changes in the the-
following words: radost’ (joy), voskhishcheniye saurus to obtain smaller distance for semantically
(admiration), udivleniye (surprise), schast'ye close words. These pairs of words are as follows:
(happiness), likovaniye (jubilation), voo- lyubov’ (love) – nezhnost’ (tenderness) and ra-
dushevleniye (inspiration). dost’ (joy) – ulybka (smile). In the first case, it is
In RuWorNet, vostorg (delight) and proposed to add a new sense of the word nez-
voskhishcheniye (admiration) are synonyms, nost’ (tenderness) to the thesaurus and to estab-
schast'ye (happiness), udivleniye (surprise) and lish the necessary additional relation of hypon-
radost’ (joy) are co-hyponyms with vostorg (de- ymy, in the second case we suggest to add the
light) with the common hypernym dushevnoye relation of entailment.
perezhivaniye (emotional experience). Word li- Thus, most words frequently mentioned by re-
kovaniye (jubilation), as noted above, is a hypo- spondents are located close to the stimulus word
nym in relation to radost’ (joy), i.e. is at a dis- in RuWordNet, which indicates good consistency
tance of 3 from vostorg (delight). Voo- of the thesaurus with the intuition of native
dushevleniye (inspiration) is a co-hypernym with speakers. At the same time, taking into account
the word radost’ (joy) with the common hypo- the data of a psychosemantic experiment makes
nym euphoria, i.e. is at a distance of 4 from vos- it possible to identify some problem areas in the
torg (delight). thesaurus. Let us consider whether there is a cor-
relation between the frequency with which the
Mysl’ (thought). Respondents indicate the fol- response word is chosen by the respondents and
lowing words: ideya (idea), duma (thought), the distance in the thesaurus from the stimulus
mneniye (opinion), dogadka (guess), soobra- word to the response word. We sort words-
zheniye (consideration), suzhdeniye (judgment). reactions according to the frequency of their
Words soobrazheniye (consideration) and du- mention.
ma (thought) are synonymous with mysl’ Table 1 summarizes the data, sorted by the
(thought). Ideya (idea) is a hyponym for mysl’ frequency of the words in the respondents' an-
(thought), suzhdeniye (judgment) is a hypernym swers. The asterisk indicates the distances that
for mysl’ (thought). Mneniye (opinion) is a co- will take place after the implementation of the
hyponym with mysl’ (thought) via common hy- above-mentioned suggestions for improving the
pernym suzhdeniye (judgment). thesaurus structure. We can see that the words
Between the words mysl’ (thought) and mentioned more often are at a shorter distance
dogadka (guess) there is a path of length 4: mysl’ from the stimulus word in the thesaurus, which is
(thought) – suzhdeniye (judgment) – mneniye also a good confirmation of the correct structure
(opinion) – dopushcheniye (assumption) – of the thesaurus and the adequacy of the experi-
dogadka (guess). ment.
5 Discussion 6 Conclusion
We analyzed 40 word pairs (out of a total of 7x6 Thesauri are created by professionals who rely
= 42 pairs, two pairs were repeated). In 38 cases on both the theory of language and their ideas
(95%), word pairs listed by the respondents as about the semantics of linguistic units. However,
synonyms are also close according to the thesau- semantics are not described in the literature in as
rus descriptions: 13 pairs are at a distance of 1; much detail as required by the thesaurus devel-
12 pairs are at a distance of 2; 7 pairs are at a opers. Taking this into account, it is of natural
distance of 3; 6 pairs are at a distance of 4. The interest to compare thesaurus data with the lin-
number of mentioned words located at a certain guistic intuition of native speakers, manifested in
path distance in the thesaurus decreases mono- psychosemantic experiments.
tonically with increasing distance. In all these
cases, it turned out to be sufficient to consider
only hypo-hypernymic relations. In two cases, itStimulus word Obida Radost’ Talant Strast’ Lyubov’ Mysl’ Vostorg Average
(offense) (joy) (talent) (passion) (love) (thought) (delight)
st
Words in the 1 places of the respondents’associations
Frequency (%) 24 49 55.5 43.5 26 61.5 48.5 40.3
Relation dist. 1 2 2 3 1 1 2 1.7
Words in the 2nd places of the respondents’associations
Frequency (%) 19 34.5 44.5 28.5 24.5 28.5 39.5 31.3
Relation dist. 3 2 1 3 4 1 1 2.1
d
Words in the 3 places of the respondents’associations
Frequency (%) 16 32 30 17.5 22 12.5 29.5 22.7
Relation dist. 2 1 2 3 2 2 2 2.0
Words in the 4th places of the respondents’associations
Frequency (%) 14 12 14 12.5 20.5 11 13.5 13.9
Relation dist. 4 1 3 1 3 1 2 2.1
th
Words in the 5 places of the respondents’associations
Frequency (%) 13.5 10.5 13 11.5 18 10.5 12.5 12.7
Relation dist. 3 3* 4 2 2 1 3 2.6
Words in the 6th places of the respondents’associations
Frequency (%) 13 7 11.5 9.5 14.5 9 8.5 10.4
Relation dist. 2 1 1 4 2* 4 4 2.6
Table 1. Positions, frequencies (percentage of answers) and RuWordNet distances of word associa-
tions. The sign *) means distances after the suggested corrections.
Most words frequently mentioned by respond- and relatedness using distributional and WordNet-
ents or synonyms with the stimulus word or are based approaches. In Proceedings of NAACL-HLT:
located close to it in RuWordNet, which indi- 19–27.
cates good consistency of the thesaurus with the Alvez, J., Gonzalez-Dios, I., and Rigau, G. 2018.
intuition of native speakers. This confirms the Cross-checking WordNet and SUMO using mer-
high quality of the RuWordNet thesaurus. The onymy. 2018. Proceedings of the Eleventh Interna-
experimental results also support the choice of tional Conference on Language Resources and
distance 4 as a measure of the semantic proximi- Evaluation (LREC-2018).
ty of words in the thesaurus. At the same time, Apresian, I.D. New explanatory dictionary of Russian
taking into account the experimental data made it synonyms. Moscow, Vena: Iazyki slavianskoi
possible to identify some problem areas in the kul'tury, 2004. (in Russian)
thesaurus. Arias-Trejo, N., Minto-García, A., Luna-Umanzor, D.
I., Ríos-Ponce, A. E., Mariana, B. P., and Bel-
Acknowledgments Enguix, G. 2018. Word-word relations in Dementia
This research was financially supported by and typical aging. In Proceedings of the first inter-
RFBR, grants № 18-00-01238 and № 18-00- national workshop on language cognition and
01226 as parts of complex project № 18-00- computational models:85-93.
01240 (K). Bayrasheva, V. 2019. Corpus-based vs thesaurus-
based word similarities: expert verification. The
References XXth International Scientific Conference “Cogni-
tive Modeling in Linguistics”. Proceedings:56-63.
Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pas-
ca, M.,and Soroa, A. 2009. A study on similarityBojanowski, P., Grave, E., Joulin, A., and Mikolov, T. Spatial Multi-Arrangement for Clustering and Mul-
2017. Enriching word vectors with subword infor- ti-way Similarity Dataset Construction. In Pro-
ma-tion.Transactions of the ACL, 5:135–146. ceedings of The 12th Language Resources and
Evaluation Conference: 5749-5758.
Cristea, D., Mihaila, C., Forascu, Trandabat, D., Hu-
sarciuc, M., Haja, G., and Postolache, O. 2004. McCrae, J., Rademaker, A., Bond, F., Rudnicka, E.,
Mapping Princeton WordNet synsets onto Roma- and Fellbaum, Ch. 2019. English WordNet. An
nian WordNet synsets. Romanian Journal of In- Open Source WordNet for English. In Proceedings
formation Science and Technology, 7(1-2):125- of Global WordNet Conference GWC-2019.
145.
Mikolov, T., Chen, K., Corrado, G. S., and Dean, J.
Erofeeva, I., Solovyev, V., and Bayrasheva, V. 2020. (2013). Efficient estimation of word representa-
Psychosemantic experiment as a tool for objectify- tions in vectorspace.CoRR, abs/1301.3781.
ing data on the ways of representing synonymy in
Ozhegov, S.I. and Shvedova, N.Yu. 1997. Explanato-
modern Russian language. Bulletin of the Volgo-
ry Dictionary of the Russian Language. Russian
grad State University. Series 2: Linguistics. 2020. -
Academy of Sciences. Institute of the Russian
T. 19, No. 1:178–194 (in Russian).
Language named after V.V. Vinogradov. - 4th ed.,
Fellbaum, Ch. 1998. WordNet: An electronic lexical Add. - M .: Azbukovnik,. (in Russian).
database. MIT Press. 1998.
Petrenko, V.F. 2010. Fundamentals of psychoseman-
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., tics. M .: Eksmo. (in Russian)
Solan,Z., Wolfman, G., and Ruppin, E. 2002. Plac-
Piasecki, M., Marcińczuk, M., and Maziarz, M. 2013.
ing searchin context: The concept revisited. ACM
WordNetLoom: a WordNet development system
Transactions on Information Systems, 20(1):116–
integrating form-based and graph-based perspec-
131.
tives. International Journal of Data Mining, Mod-
Fitzpatrick, Tess. 2006. "Habits and rabbits: Word elling and Management, 5(3) :210-232.
associations and the L2 lexicon." EuroSLA Year-
Rudnicka, E., Maziarz, M., Piasecki, M., and Szpa-
book 6.1 (2006):121-145.
kowicz, S. 2012. A strategy of Mapping Polish
Guarino, N., and Welty, Ch. 2004. An overview of WordNet onto Princeton WordNet. In Proceedings
OntoClean. Handbook on ontologies. Springer, of COLING 2012 :1039-1048.
Berlin, Heidelberg:151-171.
Sinopalnikova, A. 2004. Word association thesaurus
Hill, F., Reichart, R., and Korhonen, A. 2015. Sim- as a resource for building wordnet. Proceedings of
Lex-999: Evaluating semantic models with (genu- the 2nd International WordNet Conference: 199-
ine) similarity estimation. Computational Linguis- 205.
tics, 41(4):665–695.
Soloviev, V., Gimaletdinova, G., Khalitova, L., and
Johannsen, A. and Pedersen, B. 2011. Andre ord”–a Usmanova, L. 2020. Expert Assessment of Syno-
wordnet browser for the Danish wordnet, DanNet." nymic Rows in RuWordNet. Proc. of Analysis of
Proceedings of the 18th Nordic Conference of Images, Social Networks and Texts AIST-2020
Computational Linguistics (NODALIDA 2011). cnference. CCIS, vol. 1086:174-183.
Karaulov, Yu.N., G.A. Cherkasova, N.V. Ufimtseva, Usmanova, L, Erofeeva, I., Solovyev, V., and V.
Yu.A. Sorokin, E.F., and Tarasov. 2002 Russian Bochkarev. 2020. Analysis of the Semantic Dis-
associative dictionary. In 2 volumes. M.: AST- tance of Words in the RuWordNet Thesaurus.
Astrel, 2002. 784 p. (in Russian) 2020. Proceedings of the DAMDID/RCDL’2020
Conference.
Kliegr T., and Zamazal O. 2018. Antonyms are simi-
lar: Towards paradigmatic association approach to Vylomova, E., Shcherbakov, A., Philippovich, Y., and
rating similarity in SimLex-999 and WordSim-353. Cherkasova, G. 2018. Men Are from Mars, Women
Data & Knowledge Engineering. V. 115:174-193. Are from Venus: Evaluation and Modelling of
Verbal Associations. In: Analysis of Images, Social
Loukachevitch N., Lashevich G., and Dobrov B.
Networks and Texts. AIST 2017. Lecture Notes in
2018. Comparing Two Thesaurus Representations
Computer Science, vol 10716. Springer,
for Russian. Proceedings of Global WordNet Con-
Cham:106-115.
ference GWC-2018:35-44.
Wojcik, E. H., and P. Kandhadai. 2019. Paradigmatic
Loukachevitch, N. 2019. Corpus-based Check-up for
associations and individual variability in early lexi-
Thesaurus. Proc. of the 57th Annual Meeting of the
cal–semantic networks: Evidence from a free asso-
Association for Computational Linguistics ACL-
ciation task. Developmental Psychology. 56(1):53-
2019: 5773-5779.
69.
Majewska, O., McCarthy, D., van den Bosch, J.,
Kriegeskorte, N., Vulić, I., and Korhonen, A. 2020.You can also read