Register matters in probabilistic grammatical knowledge - Lirias
Register matters in probabilistic grammatical knowledge
A programmatic sketch and two case studies on syntactic alternations in English

Alexandra Engel
08/02/2021
Supervisors: Benedikt Szmrecsanyi, Jason Grafmiller, Laura Rosseel, Freek Van de Velde
Research group: Quantitative Lexicology and Variational Linguistics (QLVL), KU Leuven

Structure
1. Project outline
   ▪ Background: Two approaches to register variation
   ▪ Probabilistic grammar framework
   ▪ Research questions
   ▪ Methodology
2. Case study on the English dative alternation
3. Case study on the English future marker alternation
4. General discussion

Alexandra Engel | CECL seminar 08/02/2021

1. Project outline: Background
Two approaches to register variation

Text-linguistic approach
• texts (or sub-corpora)
• frequencies of (co-)occurrence
• functional relationship with situational context
• often used method: Multi-dimensional analysis (Biber 1988)
  identification of underlying dimensions of variation & interpretation of situational characteristics
• robust in cross-linguistic comparison: ‘oral’ vs. ‘literate’ & narrative vs. non-narrative
(Biber 2012, 2014, 2019; Biber et al. 2016)

Variationist approach
• ‘variables’ (alternations): “alternate ways of saying ‘the same’ thing” (Labov 1972: 188)
• probabilistic preferences for one variant over the other
• logistic regression analysis (or Varbrul analysis)
• language-internal and language-external constraints that condition the choice of a variant, e.g. animacy, length, definiteness etc., and age, gender, region or register
(Szmrecsanyi 2019)

1. Project outline: Background
However:

Traditional Labovian variationist sociolinguistics
• assumption that core grammar is stable and “internal constraints are normally independent of social and stylistic factors” (Labov 2010: 265)
• focus on vernacular
• often sociolinguistic interview corpora (only one register)
• little research into register differences (cf. Rickford 2014: 590)

Corpus-based variationist linguistics
• reliance on text categories included in (large) corpora, often only written texts included
• register as “nuisance factor” (Szmrecsanyi 2019: 77)
• treated as main effect or random effect (e.g. Ehmer & Rosemeyer 2018; Geleyn 2017)
• not systematically studied in interaction with language-internal factors (but see Theijssen et al. 2013; Grafmiller 2014)

Probabilistic grammar framework
• Probabilistic grammars describe usage patterns of syntactic alternations as a function of quantifiable probabilistic constraints.
• Assumption that speakers are sensitive to these probabilistic constraints
• Usage-based approach: knowledge is gradient and based on actual language use and the generalizations made over usage events (Bybee 2006)
• Quantitative, corpus-linguistic approach with regression analysis as the statistical method

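To make "usage patterns as a function of probabilistic constraints" concrete, here is a minimal sketch of how a logistic model turns constraint values into the probability of one variant. The constraint names, the intercept, and all coefficient values are hypothetical illustrations, not estimates from this project.

```python
import math


def variant_probability(intercept, coefs, features):
    """Return P(variant) under a logistic model.

    The log-odds are a weighted sum of the constraint values; the
    logistic function maps them onto the 0..1 probability scale.
    """
    log_odds = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients on the log-odds scale (NOT results of the study).
coefs = {"recipient_indefinite": 1.2, "theme_definite": 0.8}

# One observation: indefinite recipient, indefinite theme.
features = {"recipient_indefinite": 1, "theme_definite": 0}

p = variant_probability(-1.0, coefs, features)
baseline = variant_probability(-1.0, coefs, {"recipient_indefinite": 0, "theme_definite": 0})
print(round(p, 3), round(baseline, 3))
```

Each constraint shifts the log-odds additively, which is why the same constraint can have a different effect size on the probability scale depending on where the other constraints place an observation.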
1. Project outline: Research questions
RQ1: Where do we find most register-related variability with regard to probabilistic grammar: along the continuum of formality (formal vs. informal) or between modes (written vs. spoken)?
RQ2: Which probabilistic constraints are particularly variable across registers?
RQ3: Are language users sensitive to register-specific probabilistic effects?
RQ4: Do closely related languages such as English and Dutch differ in terms of the importance of probabilistic register differences?

1. Project outline: Overview
• Probabilistic grammar framework
  - two grammatical alternations: dative alternation, future marker alternation
  - two languages: English and Dutch
  - combination of two methodologies: corpus study and rating task experiment
• Registers as variation patterns associated with characteristics of the situational context of production in both speech and writing
  - reliance on customary text categories

1. Project outline: Methodology
Operationalisation of register at the intersection of formality and mode (Koch & Oesterreicher 1985, 2012)

1. Project outline: Methodology
Balanced datasets of 2,600 observations: 650 observations per register

Mode     Formality  Text type                                          Variant A  Variant B
spoken   informal   conversations between family members and friends   325        325
spoken   formal     parliamentary debates                              325        325
written  informal   English: blogs; Dutch: chats                       325        325
written  formal     newspaper articles                                 325        325

Annotation of language-internal constraints

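The balanced design (325 tokens per variant in each of the four registers) could be drawn along the following lines. This is a sketch only: the record layout with `register` and `variant` keys, the function name, and the toy pool of candidate tokens are all assumptions made for illustration, and the slides do not say how the sampling was implemented.

```python
import random
from collections import defaultdict


def balanced_sample(observations, per_cell=325, seed=0):
    """Draw `per_cell` observations for every (register, variant) cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for obs in observations:
        cells[(obs["register"], obs["variant"])].append(obs)

    sample = []
    for key in sorted(cells):
        pool = cells[key]
        if len(pool) < per_cell:
            raise ValueError(f"cell {key} has only {len(pool)} observations")
        # Sampling without replacement inside each cell keeps the design balanced.
        sample.extend(rng.sample(pool, per_cell))
    return sample


# Toy pool of candidate tokens (hypothetical): 400 per cell.
registers = ["spoken informal", "spoken formal", "written informal", "written formal"]
pool = [
    {"register": r, "variant": v, "id": i}
    for r in registers
    for v in ("A", "B")
    for i in range(400)
]

dataset = balanced_sample(pool)
print(len(dataset))
```

Balancing by variant within each register means that raw variant frequencies carry no information in the resulting dataset; only the conditioning effect of the constraints does.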
1. Project outline: Methodology
Rating task experiment (cf. Bresnan & Ford 2010; Ford & Bresnan 2013)
“Which continuation sounds most natural to you given the context?”
- Gradient ratings via a slider bar
- Target items from the whole probability range
- Filler items to distract from the target construction
➢ more substantial conclusions from converging results for corpus research and ratings (Klavan & Divjak 2016)
➢ better understanding of how processes/factors in language production (corpus data) and language processing (experimental data) are related and how we can optimize linguistic methodologies to study these processes/factors (Arppe et al. 2010: 5; Schönefeld 2011: 3f.)

2. Case study: English dative alternation - Design
(1) a. ditransitive dative: Sue gives [the plants]recipient [water]theme
    b. prepositional dative: Sue gives [water]theme to [the plants]recipient
• Random sample of 2,600 observations of give
• Variable context: exclusion of instances with particle verbs, clausal constituents, fixed expressions, passive constructions, relative clauses
• Language-internal constraints: pronominality, animacy, complexity, length, frequency of the constituents, and verb sense
• Model: Response variable as a function of Register in interaction with RecipientDefiniteness, ThemeDefiniteness, and WeightRatio and their main effects, as well as main effects of other language-internal constraints; random effects for speaker, recipient lemma and theme lemma

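The model description above can be written out as a mixed-effects logistic regression. The formula below is a sketch under stated assumptions: it takes the prepositional dative (PD) as the predicted level, treats the random effects as random intercepts, and uses symbols that are introduced here and do not appear on the slides.

```latex
\[
\operatorname{logit} P(y_i = \mathrm{PD}) =
  \beta_0
  + \boldsymbol{\beta}_{R}^{\top}\mathbf{r}_i
  + \sum_{k \in K}\left(\beta_k\,x_{ik} + \boldsymbol{\gamma}_k^{\top}\mathbf{r}_i\,x_{ik}\right)
  + \boldsymbol{\delta}^{\top}\mathbf{z}_i
  + u_{\mathrm{speaker}(i)} + u_{\mathrm{recipient}(i)} + u_{\mathrm{theme}(i)}
\]
```

Here $\mathbf{r}_i$ codes the register of observation $i$, $K$ is the set {RecipientDefiniteness, ThemeDefiniteness, WeightRatio}, $\boldsymbol{\gamma}_k$ holds the register interaction terms for constraint $k$, $\mathbf{z}_i$ collects the remaining language-internal constraints entered as main effects only, and each $u$ is a normally distributed random intercept.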
2. Case study: English dative alternation - Results
C = 0.97
- All registers: PD more likely when the recipient is indefinite
- Effect size modulated by register
- Largest effect in spoken informal register, smallest effect in spoken formal register (p = 0.006)

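The C value reported with the results is the index of concordance, which for a binary outcome equals the area under the ROC curve: the share of (PD, DO) pairs in which the model assigns the higher PD probability to the actual PD token. A minimal sketch with made-up toy data (not data from the study):

```python
def concordance_index(probs, outcomes):
    """Index of concordance C for predicted probabilities and 0/1 outcomes.

    Every positive case is compared with every negative case; a pair
    counts 1 if the positive case has the higher predicted probability
    and 0.5 if the two are tied.
    """
    positives = [p for p, y in zip(probs, outcomes) if y == 1]
    negatives = [p for p, y in zip(probs, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5
    return score / (len(positives) * len(negatives))


# Toy data: predicted P(PD) and observed variant (1 = PD, 0 = DO).
probs = [0.9, 0.8, 0.3, 0.6, 0.2]
outcomes = [1, 1, 0, 0, 1]
c = concordance_index(probs, outcomes)
print(round(c, 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination between the two variants.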
2. Case study: English dative alternation - Results
C = 0.97
- In all registers except spoken formal: PD more likely when the theme is definite; direction of effect is reversed in spoken formal register (p = 0.005)
- Effect size is modulated by register
- Largest effect in spoken informal register, smallest effects in both formal registers

2. Case study: English dative alternation – Experimental design
Material: 32 items (corpus excerpts)

spoken formal
  10 filler items
  - 6 relativizer (which vs. that)
  - 4 lexical choice
  6 target items

spoken informal
  10 filler items
  - 6 relativizer (which vs. that)
  - 4 lexical choice
  6 target items

Criteria (target items):
• simple, non-pronominal constituents
• definite recipient
• no dative constructions or give in the context
3 seen and 3 unseen items per register

2. Case study: English dative alternation – Experimental design
Material (SF = spoken formal, SI = spoken informal):

Dative variant  Probability bin  Theme definiteness  Seen/unseen  Predicted probability of the PD
DO              1                indefinite          unseen       SF: 0.05   SI: 0.1
DO              2                definite            unseen       SF: 0.25   SI: 0.24
PD              3                indefinite          seen         SF: 0.47   SI: 0.45
DO              4                definite            seen         SF: 0.5    SI: 0.64
PD              5                indefinite          unseen       SF: 0.75   SI: 0.82
PD              6                definite            seen         SF: 0.93   SI: 0.99

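The probability bins in the table are consistent with six equal-width bins over the 0 to 1 range. The sketch below assumes that scheme; the slides do not state how the bins were defined, and the function name is made up for illustration.

```python
def probability_bin(p, n_bins=6):
    """Map a predicted probability to an equal-width bin numbered 1..n_bins."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # int() truncates, so 0.5 with six bins lands in bin 4; p == 1.0 is
    # clamped into the last bin.
    return min(int(p * n_bins), n_bins - 1) + 1


# Predicted PD probabilities of the target items as listed in the table.
spoken_formal = [0.05, 0.25, 0.47, 0.5, 0.75, 0.93]
spoken_informal = [0.1, 0.24, 0.45, 0.64, 0.82, 0.99]

sf_bins = [probability_bin(p) for p in spoken_formal]
si_bins = [probability_bin(p) for p in spoken_informal]
print(sf_bins, si_bins)
```

Picking one target item per bin and register is what makes the items cover the whole probability range.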
2. Case study: English dative alternation – Experimental design
Material:
• Two lists, differing in the presentation side of the original variant
• Two versions per list:
  - blocked presentation of all items per register
  - version A: spoken formal block, then spoken informal block
  - version B: spoken informal block, then spoken formal block
• not more than 2 consecutive items of the same type
• 8 yes/no comprehension questions

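The ordering constraint (no more than 2 consecutive items of the same type) can be met by reshuffling a register block until the constraint holds. This is a sketch of one possible procedure, not a description of how the lists were actually built; treating target, relativizer and lexical-choice items as three separate types is an assumption, as are the item labels.

```python
import random


def max_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = run = 0
    previous = None
    for label in labels:
        run = run + 1 if label == previous else 1
        longest = max(longest, run)
        previous = label
    return longest


def constrained_order(items, max_consecutive=2, seed=0, max_tries=100_000):
    """Shuffle (type, id) items until no type occurs more than
    `max_consecutive` times in a row (simple rejection sampling)."""
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if max_run([item_type for item_type, _ in order]) <= max_consecutive:
            return order
    raise RuntimeError("no valid order found; relax the constraint")


# One register block: 6 targets, 6 relativizer fillers, 4 lexical fillers.
items = (
    [("target", i) for i in range(6)]
    + [("relativizer", i) for i in range(6)]
    + [("lexical", i) for i in range(4)]
)
block = constrained_order(items)
print([item_type for item_type, _ in block])
```

Rejection sampling is adequate for blocks this small; longer lists or stricter constraints would call for a constructive procedure.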
2. Case study: English dative alternation – Experiment
Participants:
• 100 British English native speakers (sampling: Qualtrics Research Services)
• 50 male, 50 female; mean age: 55 years (range: 19-78; IQR: 47-65)
Mean overall duration: 26 minutes (outliers: 7 participants who took >40 minutes to complete the survey)
Mean accuracy (comprehension questions): 84% (after exclusion of 4 participants with <75% accuracy and 7 outlier participants)

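The two exclusion criteria (comprehension accuracy below 75% and a completion time above 40 minutes) amount to a simple filter. In the sketch below the participant records and field names are hypothetical; only the two thresholds and the number of comprehension questions (8) come from the slides.

```python
def is_included(n_correct, n_questions, duration_min,
                min_accuracy=0.75, max_duration_min=40.0):
    """Keep a participant with sufficient accuracy who is not a duration outlier."""
    accuracy = n_correct / n_questions
    return accuracy >= min_accuracy and duration_min <= max_duration_min


# Hypothetical participants; each answered 8 yes/no comprehension questions.
participants = [
    {"id": "p01", "n_correct": 7, "duration_min": 24.0},
    {"id": "p02", "n_correct": 5, "duration_min": 22.0},  # 62.5% accuracy: excluded
    {"id": "p03", "n_correct": 8, "duration_min": 55.0},  # over 40 minutes: excluded
    {"id": "p04", "n_correct": 6, "duration_min": 40.0},  # exactly 75%, 40 min: kept
]

kept = [p["id"] for p in participants
        if is_included(p["n_correct"], 8, p["duration_min"])]
print(kept)
```

With 8 questions, the 75% threshold corresponds to at least 6 correct answers.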
2. Case study: English dative alternation – Experiment results
- [symbol lost] = 0.21 (p < 0.001)
- Main effect for predicted probability based on the corpus model ([symbol lost] = 0.29, p < 0.001)

2. Case study: English dative alternation – Experiment results
- Interaction between register and theme definiteness (p = 0.002)
- in line with corpus model predictions

2. Case study: English dative alternation – Experiment results
- Interaction between register and filler type (p = 0.003)
- stronger preferences for variants in the lexical choice items than for variants of relativizer choice (which vs. that)

2. Case study: English dative alternation - Discussion
• Main effects in line with ‘harmonic alignment’ effects found by previous research (Bresnan et al. 2007; Bresnan & Hay 2008; Theijssen et al. 2013; Röthlisberger et al. 2017)
• ‘Easy First’ bias (MacDonald 2013): first constituent tends to be simple, short, animate, and definite
• Register interacts with definiteness of both theme and recipient
  ➢ definiteness linked to accessibility of constituents (cf. Gundel et al. 1993, 2012)
  ➢ different processing demands in spoken and written production

2. Case study: English dative alternation - Discussion
• Rating data correlate with corpus predictions
• Interaction between register and theme definiteness
  ➢ register-specific effects are subtle
  ➢ language users still seem to be sensitive to such subtle effects
• Overall: small portion of variance (R²) explained by the model for the target items, higher R² for model of filler items
  ➢ great deal of individual variation (cf. Verhagen & Mos 2016; Verhagen et al. 2020)
  ➢ inclusion of register-sensitive fillers may have triggered participants to adjust their scale use for the target items

3. Case study: English future marker alternation - Design
(2) a. will: I think that 2021 will be a good year.
    b. be going to: I think that 2021 is going to be a good year.
• Random sample of 2,600 observations of will and be going to
• Variable context: exclusion of instances with nominal will, lexical go, be going to in past tense, tag-questions
• Language-internal constraints: verb type, sentence type, clause type, polarity, animacy of the subject, grammatical person, presence of temporal adverb(ial), proximity of future time reference
• Model: Response variable as a function of Register in interaction with all language-internal constraints and their main effects; random effects for speaker and lexical verb

3. Case study: English future marker alternation - Results
C = 0.74
- all registers: effect for first person subjects significantly different from spoken informal
- preference for be going to strongest in written formal register when the subject is a first person pronoun (p < 0.001)

3. Case study: English future marker alternation - Results
C = 0.74
- effect reversed in written informal (ns)
- preference for be going to strongest in written formal register when polarity is negative (p = 0.03)

3. Case study: English future marker alternation - Results
C = 0.74
- effect reversed in written registers (written informal: p = 0.02; written formal: p = 0.002)
- preference for be going to strongest in written formal register when verb type is stative

3. Case study: English future marker alternation - Results
C = 0.74
- effect for proximate time reference reversed in spoken formal register compared to spoken informal register (p = 0.02)
- effect for non-proximate contexts reversed (will preferred) in all registers compared to spoken informal (spoken formal: p = 0.02; written informal: p = 0.03; written formal: p = 0.001)

3. Case study: English future marker alternation - Results
C = 0.74
- effect reversed in written registers compared to spoken registers (written informal: p = 0.02; written formal: p = 0.03)

3. Case study: English future marker alternation - Discussion
• Main effects in line with previous research: be going to favored in interrogative sentences, subclauses, if-subclauses (Szmrecsanyi 2003; Torres Cacoullos & Walker 2009; Tagliamonte et al. 2014; Denis & Tagliamonte 2018)
• be going to favored with stative verbs (in contrast to Torres Cacoullos & Walker 2009)
• 5 interaction effects
  ➢ alternation with a great deal of variability across registers
• Difference between spoken and written registers more pronounced than between formal and informal registers
• be going to as a grammaticalization phenomenon
  ➢ written registers seem to be more conservative than spoken informal register

4. General discussion
• Register-specific effects seem to be robust in syntactic alternations
• Degree and nature of register-specificity depend on the alternation under scrutiny:
  - variability along formality continuum in the dative alternation
  - variability between modes in the future marker alternation
• Language users seem to be sensitive to probabilistic effects
• ‘Grammatical Difference Hypothesis’ ≈ speakers of a language with multiple registers are in fact multilingual due to register-specific probabilistic grammars (Guy 2015)

Thank you!
Contact: alexandra.engel@kuleuven.be

References
Arppe, Antti, Gilquin, Gaëtanelle, Glynn, Dylan, Hilpert, Martin & Zeschel, Arne. 2010. Cognitive corpus linguistics: Five points of debate on current theory and methodology. Corpora 5(1): 1–27.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory 8(1): 9–37. https://doi.org/10.1515/cllt-2012-0002
Biber, Douglas. 2014. Using multi-dimensional analysis to explore cross-linguistic universals of register variation. Languages in Contrast 14(1): 7–34.
Biber, Douglas. 2019. Text-linguistic approaches to register variation. Register Studies 1(1): 42–75.
Biber, Douglas, Egbert, Jesse, Gray, Bethany, Oppliger, Rahel & Szmrecsanyi, Benedikt. 2016. Variationist versus text-linguistic approaches to grammatical change in English: Nominal modifiers of head nouns. In The Cambridge Handbook of English Historical Linguistics, Merja Kytö & Päivi Pahta (eds), 351–375. Cambridge: Cambridge University Press.
Bresnan, Joan, Cueni, Anna, Nikitina, Tatiana & Baayen, R. Harald. 2007. Predicting the dative alternation. In Cognitive Foundations of Interpretation, Gerlof Bouma, Irene Kraemer & Joost Zwarts (eds), 69–94. Amsterdam: Royal Netherlands Academy of Arts and Sciences.
Bresnan, Joan & Ford, Marilyn. 2010. Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language 86(1): 168–213.
Bresnan, Joan & Hay, Jennifer. 2008. Gradient grammar: An effect of animacy on the syntax of give in New Zealand and American English. Lingua 118(2): 245–259.
Bybee, Joan L. 2006. From usage to grammar: The mind’s response to repetition. Language 82(4): 711–733.
Denis, Derek & Tagliamonte, Sali A. 2018. The changing future: Competition, specialization and reorganization in the contemporary English future temporal reference system. English Language and Linguistics 22(3): 403–30. https://doi.org/10.1017/S1360674316000551
Ford, Marilyn & Bresnan, Joan. 2013. Using convergent evidence from psycholinguistics and usage. In Research Methods in Language Variation and Change, Manfred Krug & Julia Schlüter (eds), 295–312. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511792519.020
Geleyn, Tim. 2017. Syntactic variation and diachrony: The case of the Dutch dative alternation. Corpus Linguistics and Linguistic Theory 13(1): 65–96. https://doi.org/10.1515/cllt-2015-0062
Grafmiller, Jason. 2014. Variation in English Genitives Across Modality and Genres. English Language and Linguistics 18(3): 471–96. https://doi.org/10.1017/S1360674314000136
Gundel, Jeanette K., Hedberg, Nancy & Zacharski, Ron. 1993. Cognitive status and the form of referring expressions in discourse. Language 69(2): 274–307. https://www.jstor.org/stable/416535
Gundel, Jeanette K., Hedberg, Nancy & Zacharski, Ron. 2012. Underspecification of cognitive status in reference production: Some empirical predictions. Topics in Cognitive Science 4(2): 249–268.
Guy, Gregory R. 2015. Coherence, constraints and quantities. Paper presented at New Ways of Analyzing Variation (NWAV) 44, University of Toronto.
Klavan, Jane & Divjak, Dagmar. 2016. The cognitive plausibility of statistical classification models: Comparing textual and behavioral evidence. Folia Linguistica 50(2): 355–384. https://doi.org/10.1515/flin-2016-0014
Koch, Peter & Oesterreicher, Wulf. 1985. Sprache der Nähe - Sprache der Distanz. Mündlichkeit und Schriftlichkeit im Spannungsfeld von Sprachtheorie und Sprachgeschichte. In Romanistisches Jahrbuch, Vol. 36, 15–43. Berlin/New York: Walter de Gruyter.
Koch, Peter & Oesterreicher, Wulf. 2012. Language of immediacy – Language of distance: Orality and literacy from the perspective of language theory and linguistic history. In Communicative spaces: Variation, contact, and change, Claudia Lange, Beatrix Weber & Göran Wolf (eds), 441–473. Frankfurt: Lang.
Labov, William. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
MacDonald, Maryellen C. 2013. How language production shapes language form and comprehension. Frontiers in Psychology 4: 226.
Rickford, John R. 2014. Situation: Stylistic variation in sociolinguistic corpora and theory. Language and Linguistics Compass 8(11): 590–603. https://doi.org/10.1111/lnc3.12110
Röthlisberger, Melanie, Grafmiller, Jason & Szmrecsanyi, Benedikt. 2017. Cognitive indigenization effects in the English dative alternation. Cognitive Linguistics 28(4): 673–710.
Schönefeld, Doris. 2011. Introduction: On evidence and the convergence of evidence in linguistic research. In Converging Evidence: Methodological and Theoretical Issues for Linguistic Research, Doris Schönefeld (ed.), 1–31. Amsterdam: Benjamins. https://doi.org/10.1075/hcp.33.03sch
Szmrecsanyi, Benedikt. 2003. Be going to versus will/shall: Does syntax matter? Journal of English Linguistics 31(4): 295–323.
Szmrecsanyi, Benedikt. 2019. Register in variationist linguistics. Register Studies 1(1): 76–99. https://doi.org/10.1075/rs.18006.szm
Tagliamonte, Sali A., Durham, Mercedes & Smith, Jennifer. 2014. Grammaticalization at an early stage: Future be going to in conservative British dialects. English Language and Linguistics 18(1): 75–108.
Theijssen, Daphne, Bosch, Louis ten, Boves, Lou, Cranen, Bert & van Halteren, Hans. 2013. Choosing alternatives: Using Bayesian networks and memory-based learning to study the dative alternation. Corpus Linguistics and Linguistic Theory 9(2): 227–262. https://doi.org/10.1515/cllt-2013-0007
Torres Cacoullos, Rena & Walker, James A. 2009. The present of the English future: Grammatical variation and collocations in discourse. Language 85(2): 321–54.
Verhagen, Véronique & Mos, Maria. 2016. Stability of familiarity judgments: Individual variation and the invariant bigger picture. Cognitive Linguistics 27(3): 307–344. https://doi.org/10.1515/cog-2015-0063
Verhagen, Véronique, Mos, Maria, Schilperoord, Joost & Backus, Ad. 2020. Variation is information: Analyses of variation across items, participants, time, and methods in metalinguistic judgment data. Linguistics 58(1): 37–81. https://doi.org/10.1515/ling-2018-0036

[Figure] Distribution of recipient definiteness across registers
[Figure] Distribution of theme definiteness across registers
[Figure] Distribution of pronominality for levels of recipient definiteness across registers
[Figure] Distribution of pronominality for levels of theme definiteness across registers
[Figure] Frequency of theme lemma per register
[Figure] Experimental items (per participant)
[Figure] Frequency of lexical verb per register