Conceptual flooding: A discourse-cognitive approach using Market Basket Analysis + Companions - Masako Fidler & Václav Cvrček - Brown University
Conceptual flooding: A discourse-cognitive approach using Market Basket Analysis + Companions Masako Fidler & Václav Cvrček masako_fidler@brown.edu, vaclav.cvrcek@ff.cuni.cz Slavic Cogni
Starting point “The Democrats don't matter. The real opposition is the media. And the way to deal with them is to flood the zone with shit.” Steve Bannon How does conceptual flooding work in Czech media portals of the “anti-system” media class?
Road map
1. Cognitive linguistic basis for corpus text analysis
2. Existing approaches to discourse
3. After KWA: Market Basket Analysis
4. Data – ONLINE
5. Case studies
   a) migrant – small-scale study, flooding – based on MBA Associative Links
   b) triangulation of ANTS – full-blown study with MBA using Associative Arrays
   c) Introducing Companions, a new tool to complement MBA
6. Conclusions: Interpretation of data, the current state of corpus methods that facilitate cognitive linguistics
Cognitive linguistic basis for corpus text analysis 1. Audience worldview reflected in audience preference of texts 2. Contrast: probing the characteristics of text against the background of what is expected • Keyword analysis: what is prominent vis-à-vis language patterns shared by the speech community • Contextualization: Patterns of association among KWs likely to be noticeable by the reader in anti-system media vis-à-vis patterns of association among KWs in mainstream media
Empirical discourse analysis • (Critical) Discourse Analysis (CDA) • Discourse historical analysis (Reisigl & Wodak 2016) • Strategies: nomination, predication, argumentation, perspectivization, intensification/mitigation • Sociocognitive approach (van Dijk 2016) • Concepts: Polarization (in-group × out-group), identification with group, activity, norms & values, interests • Corpus-assisted discourse studies (CADS) • Lancaster U school (Baker & McEnery 2005) • Bologna U school (Partington 2004)
CDA and CADS
(C)DA
• Hypothesis-based (socio-political stance, cf. Partington 2004: 10-11)
• Qualitative (× implicit quantification is used/expected, ten Have 2007: 158)
• Selected examples => conclusion
CADS
• Exploratory, less hypothesis-driven
• Quantitative (× qualitative interpretation, annotation etc.)
• “data mining” => hypothesis (pattern) => experiment => conclusion
Quantitative analysis is not just built on the back of qualitative single-case analysis (Schegloff 1993: 102); QA has its own exploratory potential.
Many CADS start with Keyword Analysis (KWA) • KWA compares target text/corpus and reference corpus • Identifies prominent units (keywords, KWs) (Scott & Tribble 2006) • Based on differences in frequencies: statistical significance (log-likelihood or chi-squared tests) + effect size (DIN) • KWs: what the text is about, genre/register (× cultural keywords, search terms) • KWs: “just pointers” (Scott 2010) – interpretation required
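A minimal sketch of the two quantities named on this slide, for one word compared across a target and a reference corpus. The log-likelihood here is the common two-term form based on observed vs. expected frequencies. The DIN formula is an assumption: it is taken to be the difference of relative frequencies normalized by their sum and scaled to -100…100, since the slide names DIN without defining it. Function names and the counts in the usage lines are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood (G2) of one word: target vs. reference corpus."""
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def din(freq_target, size_target, freq_ref, size_ref):
    """Effect size in [-100, 100]; ASSUMED definition of DIN (see lead-in)."""
    rel_target = freq_target / size_target
    rel_ref = freq_ref / size_ref
    if rel_target + rel_ref == 0:
        return 0.0
    return 100 * (rel_target - rel_ref) / (rel_target + rel_ref)


# Invented counts: a word seen 30 times in a 1,000-word target text
# and 100 times in a 10,000-word reference corpus.
print(log_likelihood(30, 1000, 100, 10000))
print(din(30, 1000, 100, 10000))
```

A word would count as a keyword only if it passes both a significance threshold and an effect-size threshold; the thresholds themselves are not given in the slides.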
KWA – a “bag-of-words” approach • KWs are identified solely on the basis of frequency in the corpus • Dispersion of units and internal structure are neglected (× Egbert & Biber 2019) • No information about the context or associations between KWs (Image: Cvrček and Fidler 2022)
So we have KWs, now what? Interpretation: • Concordance reading – limited scope, qualitative • Collocation analysis, KW clusters – semantic disambiguation, terminological collocations, proper names – limited scope • KW links examination – co-occurrence of KWs in a larger context window (e.g. 15 words), related topics (NYA: crisis + prices, Hobbit: Gollum + hiss) – no evaluative apparatus • Key keywords – words appearing as keywords in more texts – no evaluative apparatus
Extended versions of CADS Reference corpus models the reader (KWA) • New Year’s Addresses – current vs. contemporary perspectives (Fidler & Cvrček 2015) • Inexperienced vs. experienced readers of academic texts (Cvrček & Fidler 2019) Morphology in KW-assisted CADS • “Keymorph” analysis – prominent gram. functions (cases, mood…) • NYA: dynamic democratic presidents vs. noun-heavy communists (Fidler & Cvrček 2019) • Sputnik News: use of prepositions (vůči, pro*) in creating the narrative portraying Russia (dative) as a victim (Fidler & Cvrček 2018)
Proposing the next step with Market Basket Analysis
MBA applied to texts • Data-mining technique • Identifies interrelated choices among sets of items (shopping items, KWs in texts…), extracts associative links • Systematic co-occurrence of items applied to KWs in texts => MBA helps “re-contextualizing” KWs • Cvrček & Fidler (2022): No Keyword is an Island: In search of covert associations. Corpora 17(2). https://arxiv.org/abs/2103.17114
Associative links extracted from MBA • Higher level of abstraction; which topics are interrelated in discourse • Each Associative Link: Antecedent (LHS) ➝ Consequent (RHS) • Associative Links are based on: • KWs of individual texts • Co-occurrence of KWs in texts within media classes • Evaluation of ALs: • Lift = strength of association • Support = range of uses
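A minimal sketch of how one associative link LHS ➝ RHS could be scored, assuming the standard Market Basket Analysis definitions of support, confidence and lift, with each text reduced to its set of keywords. The function name and the four toy "texts" are invented for illustration and are not data from the studies.

```python
def rule_metrics(texts, lhs, rhs):
    """Score the rule lhs -> rhs over a list of keyword sets (one per text)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(texts)
    n_lhs = sum(1 for kws in texts if lhs <= kws)           # texts containing LHS
    n_rhs = sum(1 for kws in texts if rhs <= kws)           # texts containing RHS
    n_both = sum(1 for kws in texts if (lhs | rhs) <= kws)  # texts containing both
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    # Lift > 1: RHS is more likely in texts that already contain LHS.
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Invented toy data: keyword sets of four texts from one media class.
toy_texts = [
    {"migrant", "EU", "Itálie"},
    {"migrant", "Itálie"},
    {"migrant", "OSN"},
    {"EU", "komise"},
]
print(rule_metrics(toy_texts, {"migrant"}, {"Itálie"}))
```

In practice the rules would be mined with an algorithm such as Apriori rather than scored one by one; the sketch only shows what the two evaluation measures on the slide quantify.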
Keywords (reproduced) and associative links
Associative links with migrant [figure: Mainstream (top 10) vs. Anti-system (top 10, excl. duplicates)]
Associative Arrays (AA) • Represents the network of associations a word is involved in • Contrast in framing between anti-system and mainstream media classes. AA for antisystem: Associations unique to the media class AA for mainstream: Associations unique to the media class Shared associa
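The contrast described here can be sketched as plain set operations: given the KWs that appear in the associative links of one word in each media class, the class-specific associations are the set differences and the shared ones are the intersection. This is an illustrative reading, not the authors' implementation. The two keyword lists are copied from the Brusel sample in the triangulation slides; the function name is hypothetical.

```python
def associative_arrays(kws_a, kws_b):
    """Split two keyword sets into (unique to a, unique to b, shared)."""
    kws_a, kws_b = set(kws_a), set(kws_b)
    return kws_a - kws_b, kws_b - kws_a, kws_a & kws_b


# KWs found in ALs containing "Brusel", per media class (from the slides).
ms_brusel = {"EU", "evropský", "komise", "premiér", "unijní", "země"}
ants_brusel = {
    "Babiš", "Bělorusko", "členský", "dluh", "dotace", "ekonomický",
    "ekonomika", "EU", "euro", "evropský", "fond", "komise", "krize",
    "miliarda", "občan", "obnova", "pandemie", "peníze", "plán",
    "politický", "právo", "premiér", "rada", "rozpočet", "stát", "summit",
    "Turecko", "unie", "vláda", "země",
}

ms_only, ants_only, shared = associative_arrays(ms_brusel, ants_brusel)
print(sorted(ms_only))
print(sorted(shared))
print(len(ants_only))
```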
Data – corpus ONLINE
ONLINE corpus • Monitoring corpus of Czech online media, discussions and social networks (facebook, twitter) • Available 2/2017–3/2021 with daily updates (current hiatus in updating) • 4.5 mil. words a day – lemmatized and morphologically tagged • Media classification based on similarity of audiences (cf. J. Šlerka’s typology): • Similarity web – links between web pages • Alexa Rank – clusters of readers • CrowdTangle – activity on social media
https://www.nfnz.cz/studie-a-analyzy/typologie-domacich-zpravodajskych-webu/
Source | Media type | Share (%) | Example
facebook | facebook | 34.5 | (currently unavailable)
twitter | twitter | 22.7 |
discussions | discussions | 13.7 | blesk.cz
forums | forums | 4.7 | emimino.cz
news | Mainstream | 9.4 | novinky.cz
news | Other | 5.6 | blogs, sport…
news | Tabloids | 2.7 | blesk.cz
news | Anti-system | 1.5 | sputniknews.cz
news | Opinion portals | 1.7 | blisty.cz
news | Market-driven | 0.8 | globe24.cz
news | Political tabloids | 0.7 | parlamentnilisty.cz
news | Party sites | 0.4 | halonoviny.cz
news | Institution sites | 0.3 | policie.cz
news | Analytical-investigative | 0.01 | hlidacipes.cz
Data from two studies + one in progress Contrast between anti-system (ANTS) and mainstream (MS) media 1. MBA study of migration (Cvrček & Fidler 2022) • Older version of ONLINE corpus (10/2017–10/2018) • 7,401 ANTS texts and 12,110 MS (“center-right”) texts, antecedent migrant 2. Study of anti-system web portals via triangulation (Fidler & Cvrček forthcoming) • ONLINE corpus (6/2020–9/2020) • 10,841 texts in MS and 4,352 texts in ANTS 3. Companions (in progress) • All data in ONLINE corpus
Case studies
On abbreviations… • Mainstream media: MS and CR (older label: Center-Right) • ANTS (Anti-system) • AL (Associative Link) • AA (Associative Array)
Migrant: a small-scale study on conceptual flooding with Associative Links (ALs)
• Texts containing migrant as a KW: 472 in ANTS vs. 256 in MS → stronger preoccupation with migrant in ANTS than in MS
• 6x more ALs in ANTS than in MS (1448 vs. 235) → wider and more diffuse network of associations with migrant in ANTS
• Overall stronger associative links in MS than in ANTS → many ALs are more equally cohesively linked to migrant in MS
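A short worked calculation that puts these counts on a common scale. It reads 472 and 256 as the ANTS and MS counts of texts with migrant as a KW, and takes the sub-corpus sizes from the data slide (7,401 ANTS texts, 12,110 MS texts); the normalization itself is an added illustration, not a figure reported in the slides.

```python
# Counts taken from the slides; variable names are illustrative.
ants_texts_total, ms_texts_total = 7401, 12110
ants_texts_migrant, ms_texts_migrant = 472, 256
ants_links, ms_links = 1448, 235

# Share of texts in each media class that have "migrant" as a keyword.
ants_share = ants_texts_migrant / ants_texts_total
ms_share = ms_texts_migrant / ms_texts_total

# How many times more associative links ANTS has than MS.
link_ratio = ants_links / ms_links

print(f"ANTS share: {ants_share:.1%}, MS share: {ms_share:.1%}")
print(f"AL ratio ANTS/MS: {link_ratio:.2f}")
```

Under this reading, the ANTS sub-corpus is smaller yet contains more texts keyed on migrant, which is what the first bullet interprets as stronger preoccupation.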
Migrant: Contents of “flooding” in ANTS (in contrast to MS), informed by ALs
Mainstream:
• Migration as a challenge to the EU, covering dissenting voices (the “rebels”), but EU-exit not proposed as a solution.
• A detached picture of the migration crisis as a problem to be resolved among the EU member states
Anti-system:
• Negative image of migrant
• Czech government colludes with the EU
• EU is an authoritarian system, Czech EU-exit
• The mainstream media hides the truth (in contrast to ANTS)
• Global conspiracy by transnational organizations behind the massive migration
Migrant: Illustrative samples (more in Cvrček and Fidler 2022)
MS: italský, migrant ➝ Itálie ‘Italian, migrant ➝ Italy’
Italské volby ve znamení migrace. V průzkumech vedou populisté, slibující její zastavení, EU ani eurozónu ale nezpochybňují ‘Italian elections marked by migration. Populists are leading in the polls, promising to stop it, but they do not challenge the EU or the eurozone’
ANTS: globální, migrant, OSN ➝ migrace ‘global, migrant, UN ➝ migration’
Toto ustanovení [UNHCR] […] dává volnou ruku dosud utajovaným sponzorům migrace a migračních neziskovek - oligarchům typu Sorose! ‘The [UNHCR] provision is said to “give the green light to thus far hidden sponsors of migration and migration NGOs – to oligarchs of the type Soros!”’
Bold = unique to the media class
Analysis of ANTS using triangulation
• Based on AAs
• Topic-blind (target: the entire media class irrespective of what is covered, cf. other studies dealing with specific themes)
Associative Arrays of:
1. ANTS-dominant topics (KWs): Characteristics of dominant KWs in ANTS (at least 2x in ANTS as in MS)
2. Shared topics (KWs): Topics that consistently appear in both ANTS and MS and differ in associations (framing)
3. Seasonal topics (KWs): Short-term topics
→ Goal is to find pervasive narrative lines and a possible argumentation that emanates from them.
Triangulation: 1. ANTS-dominant topics (samples)
media | ALs contain | notable contexts found | KWs in ALs (KWs in black are unique to media class, i.e. AA)
MS | Brusel (125 texts) | EU negotiations | EU, evropský, komise, premiér, unijní, země
ANTS | Brusel (55 texts) | Babiš and EU subsidies, pandemics, economic crisis (debt) associated with Brusel, Brusel applying double standards for democracy to Belarus and Turkey | Babiš, Bělorusko, členský, dluh, dotace, ekonomický, ekonomika, EU, euro, evropský, fond, komise, krize, miliarda, občan, obnova, pandemie, peníze, plán, politický, právo, premiér, rada, rozpočet, stát, summit, Turecko, unie, vláda, země
MS | bílý ‘white’ (178 texts) | White House | americký, Biden, demokratický, Donald, dům, prezident, protest, spojený, stát, Trump, Trumpův, USA, Washington
ANTS | bílý (226 texts) | Race. Racism, polarization between black vs. white populations in US, unrests, violence | americký, Amerika, Antifa, barevný, běloch, Black, BLM, černoch, černošský, černý, Čína, Donald, Floyd, hnutí, Lives, Matter, násilí, policejní, policie, policista, politický, právo, prezident, protest, proti, rasa, rasismus, rasistický, rasový, revoluce, socha, spojený, Trump, USA, video, všecek, zde, zkoušet
Triangulation: 2. Shared non-seasonal (longer-lasting) topics (sample)
media | ALs contain | notable contexts found | KWs in ALs (KWs in black are unique to media class, i.e. AA)
MS | Řecko ‘Greece’ (95 texts) | COVID infection and possibility for tourism in Greece | cestovní, kancelář, koronavirus, nákaza, řecký, srpen, test, turista, země
ANTS | Řecko (104 texts) | Greece asks Russia for help against Turkey. Macron says NATO is dying. Impending war in the Mediterranean. | armáda, členský, dohoda, džihádista, Egypt, ekonomický, Erdogan, EU, Evropa, evropský, Francie, hranice, jednání, komise, krize, kyperský, Kypr, Libye, libyjský, loď, ministr, moře, NATO, Německo, německý, ostrov, plyn, politika, právo, premiér, prezident, proti, rada, řecký, ropa, Rusko, ruský, sankce, situace, smlouva, stát, Středomoří, Sýrie, Turecko, turecký, unie, vláda, vojenský, zahraničí, země
Triangulation: 3. Seasonal topics (sample)
media | ALs contain | notable contexts found | KWs in ALs (KWs in black are unique to media class, i.e. AA)
MS | Lukašenko (165 texts) | Reporting Lukašenko's claim that the demonstrations are orchestrated by the West | agentura, Alexandr, Bělorus, Bělorusko, běloruský, Cichanouská, demonstrace, demonstrant, Minsk, opozice, opoziční, policejní, prezident, prezidentský, protest, proti, Putin, režim, Rusko, ruský, volba, volby, výsledek, země
ANTS | Lukašenko (182 texts) | Statement that the demonstrations in Belarus are orchestrated by the West. (Attempt at Maidan in Belarus represented as a foreign film [=irony, i.e. fake] directed by foreigners) | Alexandr, americký, armáda, Babiš, Bělorus, Bělorusko, běloruský, bezpečnost, Čína, demokracie, demokratický, demonstrace, demonstrant, Donbas, EU, evropský, informovat, koronavirus, Kreml, Lukašenkův, ministerstvo, ministr, Minsk, Moskva, NATO, občan, opozice, opoziční, podpora, politický, Polsko, premiér, prezident, prezidentský, prohlášení, prohlásit, protest, proti, Putin, rada, republika, režim, Rus, Rusko, ruský, sankce, situace, státní, svoboda, tajný, Ukrajina, ukrajinský, unie, USA, vláda, voják, vojenský, volba, volby, zahraničí, zahraniční, západ, západní, země, zveřejnit
Companions to complement the triangulation analysis • Words/collocations sharing the same frequency development through a period of time • Applicable to homogeneous data which reflects ongoing events in politics, society… (e.g. monitor newspaper corpus) • Peaks and valleys mirror the societal context 1. Confirmatory use: extent to which two concepts relate in a time frame 2. Exploratory use: find the closest match (not implemented yet)
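A minimal sketch of the confirmatory use: correlating the frequency development of two expressions over the same time bins. The slides report a coefficient called rho without naming it; Spearman's rank correlation is assumed here, computed as Pearson's correlation on average ranks. Function names and the monthly series are invented for illustration; in real use the inputs would be relative frequencies (e.g. per million words) per day, week or month.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(series_a, series_b):
    """Spearman correlation of two equally long frequency series."""
    if len(series_a) != len(series_b) or len(series_a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    ra, rb = average_ranks(series_a), average_ranks(series_b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ra, rb))
    var_a = sum((a - mean_a) ** 2 for a in ra)
    var_b = sum((b - mean_b) ** 2 for b in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Invented monthly relative frequencies of two expressions.
word_a = [5.0, 9.0, 30.0, 22.0, 12.0]
word_b = [2.0, 40.0, 400.0, 250.0, 90.0]
print(spearman_rho(word_a, word_b))
```

A rank-based coefficient is a natural choice here because it compares the shape of peaks and valleys rather than absolute frequencies, which can differ by orders of magnitude between two expressions.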
What Companions show 1: flu and coronavirus Further support for the validity of narrative lines from the two studies (2–6/2020) MS: chřipka ‘flu’ and koronavir.* ‘coronavirus’ rho = -0.04
What Companions show 1: flu and coronavirus Further support for the validity of narrative lines from the two studies (2–6/2020, COVID starts in Southern Europe) ANTS: chřipka ‘flu’ and koronavir.* ‘coronavirus’ rho = 0.92 “mainstream media and government unreliable”
What Companions show 2: Russia/Russian vs. West/Western (MS) Russia (Rusko|ruský) vs. the West (západ|západní) from 1/2020 to 1/2021. rho = 0.365
What Companions show 2: Russia/Russian and West/Western (ANTS) Permanent (non-seasonal) preoccupation with “Russia (Rusko|ruský) vs. the West (západ|západní)” from 1/2020 to 1/2021. rho = 0.658
Conclusions
Interpretation and Conclusions 1
With the help of corpus linguistic methods based on simulated cognitive contrast and reader perceptions, similar narrative lines were found in both case studies:
• “Globalists” conspiring, negative representation of EU and NATO, US conspiring/in disarray, unreliability of the mainstream media, islamization of Europe, crime by migrants, hinting at serious crisis (war)
Overarching argumentation: Czechs should exit the EU and NATO (otherwise there might be serious consequences)
Potential agenda: implicit further alternative to EU and NATO
Conclusions 2: Corpus methods for cognitive linguistics
Shift in corpus methods, enabling complex interactions among concepts + more sophisticated interpretations
1) Identifying what the reader is likely to notice ⟹ keywords (a long list without context)
2) Observing how a KW is used ⟹ short-distance context (Concordances → Collocations → KW clusters)
3) KW in relation to nearby KWs (attempt to see possible topics nearby, no info re: significance of associations) ⟹ KW links
4) Measuring strong and systematically recurring associative relationships among KWs (that may be distant from one another within a single text) ⟹ Associative Links, Associative Arrays
   • capturing narrative lines that may not be obvious (texts may seem to be “flooded” by random ideas)
   • capturing non-obvious incremental effects of narrative lines (overarching implicit discourse agenda)
MBA, inspired by the concept of frame (Fillmore 1982), looks for evidence in text.
This study explicitly shows how the framing of a word can be manipulated via discourse.
Framing manipulation may lead to wider cognitive polarization.
Widening gap between speakers who speak the same language but differ in framing key concepts.
Acknowledgement This presentation was supported by the European Regional Development Fund project “Creativity and Adaptability as Conditions of the Success of Europe in an Interrelated World” (reg. no.: CZ.02.1.01/0.0/0.0/16_019/0000734) and resulted from the implementation of the Czech National Corpus project (LM2018137) funded by the Ministry of Education, Youth and Sports of the Czech Republic within the framework of Large Research, Development and Innovation Infrastructures. The presentation was also supported in part by Brown University Humanities Research Fund.
References
Baker, P. and T. McEnery. 2005. A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts. Journal of Language and Politics 4(2), pp. 197–226.
Cvrček, V. and M. Fidler. 2019. “Up close and personal vs. birds-eye view” of discourse: a corpus study of perspective using Czech data. ICLC15 - International Cognitive Linguistics Conference, Nishinomiya, Japan. https://www.brown.edu/research/projects/needle-in-haystack/sites/brown.edu.research.projects.needle-in-haystack/files/uploads/ICLC2019-FINAL.pdf
Cvrček, V. and M. Fidler. 2022 (scheduled to appear). No keyword is an island: In search of covert associations. Corpora 17(2). (https://arxiv.org/abs/2103.17114)
Egbert, J. and D. Biber. 2019. ‘Incorporating text dispersion into keyword analyses’, Corpora 14(1), pp. 77–104.
Fidler, M. and V. Cvrček. 2015. ‘A Data-Driven Analysis of Reader Viewpoints: Reconstructing the Historical Reader Using Keyword Analysis’, Journal of Slavic Linguistics 23(2), pp. 197–239.
Fidler, M. and V. Cvrček. 2018. ‘Going Beyond “Aboutness”: A Quantitative Analysis of Sputnik Czech Republic’ in M. Fidler and V. Cvrček (eds.) Taming the corpus: From inflection and lexis to interpretation, pp. 195–225. Cham: Springer.
Fidler, M. and V. Cvrček. 2019. ‘Keymorph analysis, or how morphosyntax informs discourse’, Corpus Linguistics and Linguistic Theory 15(1), pp. 39–70.
Fillmore, C. J. 1982. Frame Semantics. Linguistics in the Morning Calm. Ed. The Linguistic Society of Korea, 111-137. Seoul: Korea.
Partington, A. 2004. Corpora and discourse, a most congruous beast. Corpora and discourse, 11-20.
Reisigl, M. and R. Wodak. 2016. The discourse-historical approach (DHA). In Methods of Critical Discourse Studies, ed. by R. Wodak and M. Meyer. 23-61.
Schegloff, E.A. 1993. Reflections on quantification in the study of conversation. Research on Language and Social Interaction 26: 99–128.
Scott, M. 2010. ‘Problems in investigating keyness, or clearing the undergrowth and marking out trails…’ in M. Bondi and M. Scott (eds.) Keyness in Texts, pp. 43–58. Amsterdam/Philadelphia: John Benjamins.
Scott, M. and C. Tribble. 2006. Textual Patterns: Key words and corpus analysis in language education. Philadelphia: John Benjamins.
Šlerka, J. https://www.nfnz.cz/studie-a-analyzy/typologie-domacich-zpravodajskych-webu/
ten Have, Paul. 2007. Doing Conversation Analysis (2nd edition). SAGE Publ., Los Angeles/London.
van Dijk, T. A. 2016. Critical discourse studies: a sociocognitive approach. In Methods of Critical Discourse Studies, ed. by R. Wodak and M. Meyer. 62-85.
Thank you for the attention. Děkujeme za pozornost.
Companions II Race-related issues (BLM, G. Floyd’s death) coverage in 5/2021–11/2021 MS: bílý|běloch ‘white’ and černý|černoch ‘black’ rho = 0.34
Companions II Race-related issues (BLM, G. Floyd’s death) coverage in 5/2021–11/2021 ANTS: bílý|běloch ‘white’ and černý|černoch ‘black’ rho = 0.86