Conceptual flooding: A discourse-cognitive approach using Market Basket Analysis + Companions - Masako Fidler & Václav Cvrček - Brown University

Conceptual flooding:
A discourse-cognitive approach using
Market Basket Analysis + Companions

             Masako Fidler & Václav Cvrček
  masako_fidler@brown.edu, vaclav.cvrcek@ff.cuni.cz
   Slavic Cogni
Starting point

“The Democrats don't matter. The real opposition is the media. And the
way to deal with them is to flood the zone with shit.”
                                                         Steve Bannon

How does conceptual flooding work in media portals in “anti-system
media class” in Czech?
Road map
1.   Cognitive linguistic basis for corpus text analysis
2.   Existing approaches to discourse
3.   After KWA: Market Basket Analysis
4.   Data – ONLINE
5.   Case studies
     a) migrant – small-scale study, flooding – based on MBA Associative Links
     b) triangulation of ANTS – full-blown study with MBA using Associative Arrays
     c) Introducing Companions, a new tool to complement MBA
6. Conclusions: Interpretation of data, the current state of corpus methods
   that facilitate cognitive linguistics
Cognitive linguistic basis for corpus text analysis

[Diagram: Keyword Analysis, Market Basket Analysis]
Cognitive linguistic basis for corpus text analysis
1. Audience worldview reflected in audience preference of texts
2. Contrast: probing the characteristics of text against the background
    of what is expected
• Keyword analysis: what is prominent vis-à-vis language patterns
  shared by the speech community
• Contextualization: Patterns of association among KWs likely to be
  noticeable by the reader in anti-system media vis-à-vis patterns of
  association among KWs in mainstream media
Types of discourse analysis
Empirical discourse analysis
• (Critical) Discourse Analysis (CDA)
   • Discourse historical analysis (Reisigl & Wodak 2016)
      • Strategies: nomination, predication, argumentation, perspectivization,
        intensification/mitigation
   • Sociocognitive approach (van Dijk 2016)
      • Concepts: Polarization (in-group × out-group), identification with group, activity, norms &
        values, interests

• Corpus-assisted discourse studies (CADS)
   • Lancaster U school (Baker & McEnery 2005)
   • Bologna U school (Partington 2004)
CDA and CADS
(C)DA                                        CADS
• Hypothesis-based (socio-                   • Exploratory, less hypothesis-
  political stance, cf. Partington             driven
  2004: 10-11)                               • Quantitative (× qualitative
• Qualitative (× implicit                      interpretation, annotation etc.)
  quantification is used/expected,           • “data mining” => hypothesis
  ten Have 2007: 158)                          (pattern) => experiment =>
• Selected examples => conclusion              conclusion

Quantitative analysis is not just built on the back of qualitative single-case
analysis (Schegloff 1993: 102); QA has its own exploratory potential.
Many CADS start with Keyword Analysis (KWA)
• KWA compares target text/corpus and reference corpus
• Identifies prominent units (keywords, KWs) (Scott & Tribble 2006)
• Based on differences in frequencies: statistical significance (log-
  likelihood or chi2 tests) + effect size (DIN)
• KWs: what the text is about, genre/register (× cultural keywords,
  search terms)
• KWs: “just pointers” (Scott 2010) – interpretation required
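
To make the two keyness measures above concrete, here is a minimal sketch in Python. It assumes the standard two-corpus log-likelihood statistic and assumes that DIN is the difference index computed from relative frequencies, 100 × (target − reference) / (target + reference). The function names and the counts in the usage line are hypothetical and are not taken from the studies.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-corpus log-likelihood for one word (assumed standard formulation)."""
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def din(freq_target, size_target, freq_ref, size_ref):
    """Effect size on a -100..100 scale (assumed definition of DIN)."""
    rel_target = freq_target / size_target
    rel_ref = freq_ref / size_ref
    if rel_target + rel_ref == 0:
        return 0.0
    return 100 * (rel_target - rel_ref) / (rel_target + rel_ref)


# Hypothetical counts: 30 hits in a 1,000-word text, 10 hits in a 1,000-word reference.
print(log_likelihood(30, 1000, 10, 1000), din(30, 1000, 10, 1000))
```

A word would count as a KW only if it passes both a significance threshold and an effect-size threshold; the thresholds themselves are not given on the slide and are left open here.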
KWA – “a bag-of-words” approach
• Identification of KWs solely on
  frequency in the corpus
• Dispersion of units and internal
  structure is neglected (× Egbert &
  Biber 2019)
• No information about the context or
  associations between KWs

                               Image, Cvrček and Fidler 2022
So we have KWs, now what?
Interpretation:
• Concordance reading – limited scope, qualitative
• Collocation analysis, KW clusters – semantic disambiguation,
  terminological collocations, proper names – limited scope
• KW links examination – co-occurrence of KWs in a larger context
  window (e.g. 15 words), related topics (NYA: crisis + prices, Hobbit:
  Gollum + hiss) – no evaluative apparatus
• Key keywords – words appearing as keywords in more texts – no
  evaluative apparatus
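
The KW links idea (co-occurrence of KWs within a context window) can be sketched as follows. This is an illustrative sketch only: the function name, the tokenization and the toy sentence are assumptions, and it produces raw counts only, which mirrors the "no evaluative apparatus" limitation noted above.

```python
from collections import Counter


def keyword_links(tokens, keywords, window=15):
    """Count pairs of distinct KWs that occur within `window` tokens of each other."""
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in keywords]
    links = Counter()
    for idx, (pos_a, kw_a) in enumerate(positions):
        for pos_b, kw_b in positions[idx + 1:]:
            if pos_b - pos_a > window:
                break  # positions are sorted, so later KWs are even farther away
            if kw_a != kw_b:
                links[tuple(sorted((kw_a, kw_b)))] += 1
    return links


# Hypothetical toy text and KW set.
tokens = "the crisis raised prices while the crisis continued".split()
print(keyword_links(tokens, {"crisis", "prices"}, window=15))
```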
Extended versions of CADS
Reference corpus models the reader (KWA)
  • New Year’s Addresses – current vs. contemporary perspectives (Fidler & Cvrček
    2015)
  • Inexperienced vs. experienced readers of academic texts (Cvrček & Fidler 2019)

Morphology in KW-assisted CADS
• “Keymorph” analysis – prominent gram. functions (cases, mood…)
  • NYA: dynamic democratic presidents vs. noun-heavy communists (Fidler &
    Cvrček 2019)
  • Sputnik News: use of prepositions (vůči, pro*) in creating the narrative
    portraying Russia (Dat) as a victim (Fidler & Cvrček 2018)
Proposing the next step with
   Market Basket Analysis
MBA applied to texts
• Data-mining technique
   • Identifies interrelated choices among sets of items (shopping items, KWs in
     texts…), extracts associative links
• Systematic co-occurrence of items applied to KWs in texts
   => MBA helps “re-contextualizing” KWs
• Cvrček & Fidler (2022): No Keyword is an Island: In search of covert
  associations. Corpora 17(2). https://arxiv.org/abs/2103.17114
Associative links extracted from MBA
• Higher level of abstraction; which topics are interrelated in
  discourse
• Each Associative Link: Antecedent (LHS) ➝ Consequent (RHS)
• Associative Links are based on:
  • KWs of individual texts
  • Co-occurrence of KWs in texts within media classes
• Evaluation of ALs:
  • Lift = strength of association
  • Support = range of uses
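
A minimal sketch of how support and lift could be computed for links of the form Antecedent ➝ Consequent, treating the KW set of each text as one "basket". It assumes the usual Market Basket Analysis definitions (support = share of texts containing both sides; lift = confidence divided by the support of the consequent) and, for brevity, only single-KW antecedents, whereas the links shown in the slides may have several KWs on the LHS. The function name and the toy KW sets are hypothetical.

```python
from itertools import combinations


def association_links(texts_keywords, min_support=0.0):
    """Return (lhs, rhs, support, lift) for single-KW antecedents."""
    texts = [set(kws) for kws in texts_keywords]
    n = len(texts)

    def support(items):
        # Share of texts whose KW set contains all of `items`.
        return sum(1 for kws in texts if items <= kws) / n

    vocab = sorted(set().union(*texts))
    links = []
    for a, b in combinations(vocab, 2):
        joint = support({a, b})
        if joint == 0 or joint < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = joint / support({lhs})
            lift = confidence / support({rhs})
            links.append((lhs, rhs, joint, lift))
    return links


# Hypothetical KW sets of four texts.
texts = [{"migrant", "EU", "Itálie"}, {"migrant", "EU"},
         {"migrant", "OSN"}, {"EU", "komise"}]
for lhs, rhs, supp, lift in association_links(texts):
    print(f"{lhs} ➝ {rhs}: support={supp:.2f}, lift={lift:.2f}")
```

A lift above 1 means the two KWs co-occur in texts more often than their individual frequencies would predict. A production analysis would more likely use an Apriori-style miner to handle multi-KW antecedents efficiently.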
Keywords (reproduced) and associative links
Associative links with migrant
Mainstream (top 10)    Anti-system (top 10, excl. duplicates)
Associative Arrays (AA)
     • Represents the network of associations a word is involved in
     • Contrast in framing between anti-system and mainstream media
       classes.

[Figure: AA for mainstream (associations unique to the media class); shared associations; AA for anti-system (associations unique to the media class)]
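
One way to picture an Associative Array and the contrast between media classes is with plain set operations. The sketch below assumes that the AA of a word is the set of all KWs that appear together with it in any associative link (on either side); this reading, the function names and the toy links are assumptions made for illustration, not the authors' implementation.

```python
def associative_array(links, word):
    """Collect all KWs that share an associative link with `word`."""
    array = set()
    for lhs, rhs in links:
        items = set(lhs) | {rhs}
        if word in items:
            array |= items - {word}
    return array


def contrast(aa_ms, aa_ants):
    """Split two arrays into class-unique and shared associations."""
    return {
        "unique_ms": aa_ms - aa_ants,
        "shared": aa_ms & aa_ants,
        "unique_ants": aa_ants - aa_ms,
    }


# Hypothetical toy links: (antecedent KWs, consequent KW).
ms_links = [(("Brusel",), "EU"), (("Brusel", "komise"), "unijní")]
ants_links = [(("Brusel",), "EU"), (("Brusel", "dotace"), "Babiš")]
print(contrast(associative_array(ms_links, "Brusel"),
               associative_array(ants_links, "Brusel")))
```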
Data – corpus ONLINE
ONLINE corpus
• Monitoring corpus of Czech online media, discussions and social
  networks (Facebook, Twitter)
• Available 2/2017–3/2021 with daily updates (current hiatus in
  updating)
• 4.5 mil. words a day – lemmatized and morphologically tagged
• Media classification based on similarity of audiences (cf. J. Šlerka’s
  typology):
   • Similarity web – links between web pages
   • Alexa Rank – clusters of readers
   • CrowdTangle – activity on social media
https://www.nfnz.cz/studie-a-analyzy/typologie-domacich-zpravodajskych-webu/
Source        Media type                 Share (%)   Example
facebook      facebook                   34.5        (currently unavailable)
twitter       twitter                    22.7
discussions   discussions                13.7        blesk.cz
forums        forums                      4.7        emimino.cz
news          Mainstream                  9.4        novinky.cz
news          Other                       5.6        blogs, sport…
news          Tabloids                    2.7        blesk.cz
news          Anti-system                 1.5        sputniknews.cz
news          Opinion portals             1.7        blisty.cz
news          Market-driven               0.8        globe24.cz
news          Political tabloids          0.7        parlamentnilisty.cz
news          Party sites                 0.4        halonoviny.cz
news          Institution sites           0.3        policie.cz
news          Analytical-investigative    0.01       hlidacipes.cz
Data from two studies + one in progress
  Contrast between anti-system (ANTS) and mainstream (MS) media
  1. MBA study of migration (Cvrček & Fidler 2022)
     • Older version of ONLINE corpus (10/2017–10/2018)
     • 7,401 ANTS texts and 12,110 MS (“center-right”) texts, antecedent migrant
  2. Study of anti-system web portals via triangulation (Fidler & Cvrček
     forthcoming)
     • ONLINE corpus (6/2020–9/2020)
     • 10,841 texts in MS and 4,352 texts in ANTS
     • Companions – all data in ONLINE corpus
Case studies
On abbreviations…

• Mainstream media: MS and CR (older label: Center-Right)
• ANTS (Anti-system)
• AL (Associative Link)
• AA (Associative Array)
Migrant: a small-scale study on conceptual flooding with Associative
Links (ALs)

• Texts containing migrant as a KW: more in ANTS than in MS (472 vs. 256)
• 6x more ALs in ANTS than in MS (1448 vs. 235)
• Overall stronger associative links in MS than in ANTS
→ Stronger preoccupation with migrant in ANTS than in MS
→ Wider and more diffuse network of associations with migrant in ANTS
→ In MS, many ALs are more equally and cohesively linked to migrant
Migrant: Contents of “flooding” in ANTS (in contrast to
MS), informed by ALs
Mainstream
Migration as a challenge to the EU, covering dissenting voices (the “rebels”), but
EU-exit not proposed as a solution. A detached picture of the migration crisis as a
problem to be resolved among the EU member states

Anti-system
• Negative image of migrant
• Czech government colludes with the EU
• EU is an authoritarian system, Czech EU-exit
• The mainstream media hides the truth (in contrast to ANTS)
• Global conspiracy by transnational organizations behind the massive migration
Migrant: Illustrative samples (more in Cvrček and Fidler 2022)
MS: italský, migrant ➝ Itálie ‘Italian, migrant ➝ Italy’
Italské volby ve znamení migrace. V průzkumech vedou populisté, slibující její zastavení, EU ani
eurozónu ale nezpochybňují
‘Italian elections marked by migration. Populists are leading in the polls, promising to stop it,
but they do not challenge the EU or the eurozone’

ANTS: globální, migrant, OSN ➝ migrace ‘global, migrant, UN ➝ migration’
Toto ustanovení [UNHCR] […] dává volnou ruku dosud utajovaným sponzorům migrace a
migračních neziskovek - oligarchům typu Sorose!
‘This [UNHCR] provision […] gives a free hand to the thus far hidden sponsors of migration
and migration NGOs – to oligarchs of the Soros type!’

Bold = unique to the media class
Analysis of ANTS using triangulation

• Based on AAs
• Topic-blind (target: the entire media class irrespective of what is covered, cf.
  other studies dealing with specific themes)

Associative Arrays of:
1. ANTS-dominant topics (KWs): Characteristics of dominant KWs in ANTS (at
   least twice as often in ANTS as in MS)
2. Shared topics (KWs): Topics that consistently appear in both ANTS and MS
   and differ in associations (framing)
3. Seasonal topics (KWs): Short-term topics
→ The goal is to find pervasive narrative lines and a possible argumentation that
emanates from them.
Triangulation: 1. ANTS-dominant topics (samples)
KWs in ALs: KWs in black are unique to the media class (i.e. AA)

Media: MS
ALs contain: Brusel ‘Brussels’ (125 texts)
Notable contexts found: EU negotiations
KWs in ALs: EU, evropský, komise, premiér, unijní, země

Media: ANTS
ALs contain: Brusel ‘Brussels’ (55 texts)
Notable contexts found: Babiš and EU subsidies, pandemics, economic crisis (debt) associated with Brusel, Brusel applying double standards for democracy to Belarus and Turkey
KWs in ALs: Babiš, Bělorusko, členský, dluh, dotace, ekonomický, ekonomika, EU, euro, evropský, fond, komise, krize, miliarda, občan, obnova, pandemie, peníze, plán, politický, právo, premiér, rada, rozpočet, stát, summit, Turecko, unie, vláda, země

Media: MS
ALs contain: bílý ‘white’ (178 texts)
Notable contexts found: White House
KWs in ALs: americký, Biden, demokratický, Donald, dům, prezident, protest, spojený, stát, Trump, Trumpův, USA, Washington

Media: ANTS
ALs contain: bílý ‘white’ (226 texts)
Notable contexts found: Race, racism, polarization between black and white populations in the US, unrest, violence
KWs in ALs: americký, Amerika, Antifa, barevný, běloch, Black, BLM, černoch, černošský, černý, Čína, Donald, Floyd, hnutí, Lives, Matter, násilí, policejní, policie, policista, politický, právo, prezident, protest, proti, rasa, rasismus, rasistický, rasový, revoluce, socha, spojený, Trump, USA, video, všecek, zde, zkoušet
Triangulation: 2 Shared non-seasonal (longer-lasting) topics
 (sample)
KWs in ALs: KWs in black are unique to the media class (i.e. AA)

Media: MS
ALs contain: Řecko ‘Greece’ (95 texts)
Notable contexts found: COVID infection and possibility for tourism in Greece
KWs in ALs: cestovní, kancelář, koronavirus, nákaza, řecký, srpen, test, turista, země

Media: ANTS
ALs contain: Řecko ‘Greece’ (104 texts)
Notable contexts found: Greece asks Russia for help against Turkey. Macron says NATO is dying. Impending war in the Mediterranean.
KWs in ALs: armáda, členský, dohoda, džihádista, Egypt, ekonomický, Erdogan, EU, Evropa, evropský, Francie, hranice, jednání, komise, krize, kyperský, Kypr, Libye, libyjský, loď, ministr, moře, NATO, Německo, německý, ostrov, plyn, politika, právo, premiér, prezident, proti, rada, řecký, ropa, Rusko, ruský, sankce, situace, smlouva, stát, Středomoří, Sýrie, Turecko, turecký, unie, vláda, vojenský, zahraničí, země
Triangulation: 3 Seasonal topics (sample)
KWs in ALs: KWs in black are unique to the media class (i.e. AA)

Media: MS
ALs contain: Lukašenko (165 texts)
Notable contexts found: Reporting Lukašenko's claim that the demonstrations are orchestrated by the West
KWs in ALs: agentura, Alexandr, Bělorus, Bělorusko, běloruský, Cichanouská, demonstrace, demonstrant, Minsk, opozice, opoziční, policejní, prezident, prezidentský, protest, proti, Putin, režim, Rusko, ruský, volba, volby, výsledek, země

Media: ANTS
ALs contain: Lukašenko (182 texts)
Notable contexts found: Statement that the demonstrations in Belarus are orchestrated by the West. (Attempt at Maidan in Belarus represented as a foreign film [=irony, i.e. fake] directed by foreigners)
KWs in ALs: Alexandr, americký, armáda, Babiš, Bělorus, Bělorusko, běloruský, bezpečnost, Čína, demokracie, demokratický, demonstrace, demonstrant, Donbas, EU, evropský, informovat, koronavirus, Kreml, Lukašenkův, ministerstvo, ministr, Minsk, Moskva, NATO, občan, opozice, opoziční, podpora, politický, Polsko, premiér, prezident, prezidentský, prohlášení, prohlásit, protest, proti, Putin, rada, republika, režim, Rus, Rusko, ruský, sankce, situace, státní, svoboda, tajný, Ukrajina, ukrajinský, unie, USA, vláda, voják, vojenský, volba, volby, zahraničí, zahraniční, západ, západní, země, zveřejnit
Companions to complement the triangulation
  analysis
• Words/collocations sharing the same
  frequency development through a period of
  time
• Applicable to homogeneous data which
  reflects ongoing events in politics, society…
  (e.g. monitor newspaper corpus)
• Peaks and valleys mirror the societal context
   1. Confirmatory use: extent to which two concepts
      relate in a time frame
   2. Exploratory use: find the closest match (not
      implemented yet)
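
The confirmatory use can be illustrated with a rank correlation between two frequency series. The sketch below assumes that the reported rho is Spearman's rank correlation computed over per-period frequencies of the two expressions; the function names and the weekly counts are hypothetical, and this is not the Companions tool itself.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly frequencies of two expressions in one media class.
flu = [12, 30, 55, 41, 20]
coronavirus = [3, 9, 14, 11, 5]
print(spearman_rho(flu, coronavirus))
```

Values near 1 mean the two expressions rise and fall together over the period; values near 0 mean their frequency developments are unrelated.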
What Companions show
1: flu and coronavirus

Further support for the validity of narrative lines from the two studies (2-6/2020)

MS: chřipka ‘flu’ and koronavir.* ‘coronavirus’
rho = -0.04
What Companions show
1: flu and coronavirus

Further support for the validity of narrative lines from the two studies (2-6/2020, COVID starts in Southern Europe)

ANTS: chřipka ‘flu’ and koronavir.* ‘coronavirus’
rho = 0.92
“mainstream media and government unreliable”
What Companions show 2: Russia/Russian vs. West/Western (MS)
Russia (Rusko|ruský) vs. the West (západ|západní) from 1/2020 to 1/2021.
rho = 0.365
What Companions show 2: Russia/Russian and West/Western (ANTS)
Permanent (non-seasonal) preoccupation with “Russia (Rusko|ruský) vs. the West (západ|západní)” from 1/2020 to 1/2021.
rho = 0.658
Conclusions
Interpretation and Conclusions 1
     With the help of corpus linguistic methods based on simulated cognitive contrast and reader
     perceptions, similar narrative lines were found in both case studies:

     • “Globalists” conspiring, negative representation of EU and NATO, US conspiring/in disarray,
       unreliability of the mainstream media, islamization of Europe, crime by migrants, hinting at
       serious crisis (war)

      overarching argumentation:
         Czechs should exit the EU and NATO (otherwise
         there might be serious consequences)

                                        Potential agenda: implicit further alternative to EU
                                        and NATO
Conclusions 2: Corpus methods for cognitive linguistics
Shift in corpus methods, enabling complex interactions among concepts + more sophisticated
interpretations
1) Identifying what the reader is likely to notice ⟹ keywords (a long list without context)
2) Observing how a KW is used
   ⟹ short-distance context (Concordances → Collocations → KW clusters)
3) KW in relation to nearby KWs (an attempt to see possible topics nearby; no information on the
   significance of associations)
   ⟹ KW links
4) Measuring strong and systematically recurring associative relationships among KWs (that
   may be distant from one another within a single text)
   ⟹ Associative Links, Associative Arrays capturing
   • narrative lines that may not be obvious (texts may seem to be “flooded” by random ideas)
   • non-obvious incremental effects of narrative lines (overarching implicit discourse agenda)

MBA, inspired by the concept of frame (Fillmore 1982), looks for evidence in text.
This study explicitly shows how framing of a word can be manipulated via discourse
Framing manipulation may lead to wider cognitive polarization
Widening gap between speakers who speak the same language but differ in framing key concepts
Acknowledgement

This presentation was supported by the European Regional Development Fund project “Creativity
and Adaptability as Conditions of the Success of Europe in an Interrelated World”
(reg. no.: CZ.02.1.01/0.0/0.0/16_019/0000734) and resulted from the implementation of the Czech
National Corpus project (LM2018137) funded by the Ministry of Education, Youth and Sports of the
Czech Republic within the framework of Large Research, Development and Innovation Infrastructures.

The presentation was also supported in part by Brown University Humanities Research Fund.
References
Baker, P. and T. McEnery. 2005. A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts. Journal of Language and Politics 4(2), pp. 197–226.
Cvrček, V. and M. Fidler. 2019. “Up close and personal vs. birds-eye view” of discourse: a corpus study of perspective using Czech data. ICLC15 - International Cognitive Linguistics Conference, Nishinomiya, Japan. https://www.brown.edu/research/projects/needle-in-haystack/sites/brown.edu.research.projects.needle-in-haystack/files/uploads/ICLC2019-FINAL.pdf
Cvrček, V. and M. Fidler. 2022 (scheduled to appear). No keyword is an island: In search of covert associations. Corpora 17(2). (https://arxiv.org/abs/2103.17114)
Egbert, J. and D. Biber. 2019. ‘Incorporating text dispersion into keyword analyses’, Corpora 14(1), pp. 77–104.
Fidler, M. and V. Cvrček. 2015. ‘A Data-Driven Analysis of Reader Viewpoints: Reconstructing the Historical Reader Using Keyword Analysis’, Journal of Slavic Linguistics 23(2), pp. 197–239.
Fidler, M. and V. Cvrček. 2018. ‘Going Beyond “Aboutness”: A Quantitative Analysis of Sputnik Czech Republic’ in M. Fidler and V. Cvrček (eds.) Taming the corpus: From inflection and lexis to interpretation, pp. 195–225. Cham: Springer.
Fidler, M. and V. Cvrček. 2019. ‘Keymorph analysis, or how morphosyntax informs discourse’, Corpus Linguistics and Linguistic Theory, 15(1), pp. 39–70.
Fillmore, C. J. 1982. Frame Semantics. Linguistics in the Morning Calm. Ed. The Linguistic Society of Korea, 111-137. Seoul: Korea.
Partington, A. 2004. Corpora and discourse, a most congruous beast. Corpora and discourse, 11-20.
Reisigl, M. and R. Wodak. 2016. The discourse-historical approach (DHA). In Methods of Critical Discourse Studies, ed. by R. Wodak and M. Meyer, 23-61.
Schegloff, E.A. 1993. Reflections on quantification in the study of conversation. Research on Language and Social Interaction 26: 99–128.
Scott, M. 2010. ‘Problems in investigating keyness, or clearing the undergrowth and marking out trails…’ in M. Bondi and M. Scott (eds.) Keyness in Texts, pp. 43–58. Amsterdam/Philadelphia: John Benjamins.
Scott, M. and C. Tribble. 2006. Textual Patterns: Key words and corpus analysis in language education. Philadelphia: John Benjamins.
Šlerka, J. https://www.nfnz.cz/studie-a-analyzy/typologie-domacich-zpravodajskych-webu/
ten Have, Paul. 2007. Doing Conversation Analysis (2nd edition). SAGE Publ., Los Angeles/London.
van Dijk, T. A. 2016. Critical discourse studies: a sociocognitive approach. In Methods of Critical Discourse Studies, ed. by R. Wodak and M. Meyer, 62-85.
Thank you for the attention.
  Děkujeme za pozornost.
Companions II
Race-related issues (BLM, G. Floyd’s death) coverage in 5/2021–11/2021

MS: bílý|běloch ‘white’ and černý|černoch ‘black’
rho = 0.34
Companions II
Race-related issues (BLM, G. Floyd’s death) coverage in 5/2021–11/2021

ANTS: bílý|běloch ‘white’ and černý|černoch ‘black’
rho = 0.86