Register matters in probabilistic grammatical knowledge - Lirias

Register matters in probabilistic grammatical knowledge:
A programmatic sketch and two case studies on syntactic alternations in English

 Alexandra Engel
 08/02/2021

Supervisors: Benedikt Szmrecsanyi, Jason Grafmiller, Laura Rosseel, Freek Van de Velde
Research group: Quantitative Lexicology and Variational Linguistics (QLVL), KU Leuven

Structure

1. Project outline
 ▪ Background: Two approaches to register variation
 ▪ Probabilistic grammar framework
 ▪ Research questions
 ▪ Methodology
2. Case study on the English dative alternation
3. Case study on the English future marker alternation
4. General discussion

 Alexandra Engel | CECL seminar 08/02/2021
1. Project outline: Background
 Two approaches to register variation

Text-linguistic approach
• texts (or sub-corpora)
• frequencies of (co-)occurrence
• functional relationship with situational context
• often used method: Multi-dimensional analysis (Biber 1988)
  → identification of underlying dimensions of variation & interpretation of situational characteristics
• robust in cross-linguistic comparison: ‘oral’ vs. ‘literate’ & narrative vs. non-narrative
  (Biber 2012, 2014, 2019; Biber et al. 2016)

Variationist approach
• ‘variables’ (alternations): “alternate ways of saying ‘the same’ thing” (Labov 1972: 188)
• probabilistic preferences for one variant over the other
• logistic regression analysis (or Varbrul analysis)
• language-internal and language-external constraints that condition the choice of a variant,
  e.g. animacy, length, definiteness etc., and age, gender, region or register (Szmrecsanyi 2019)
1. Project outline: Background

However:
Traditional Labovian variationist sociolinguistics
• assumption that core grammar is stable and “internal constraints are normally
  independent of social and stylistic factors” (Labov 2010: 265)
• focus on vernacular
• often sociolinguistic interview corpora (only one register)
• little research into register differences (cf. Rickford 2014: 590)

Corpus-based variationist linguistics
• reliance on text categories included in (large) corpora, often only written texts included
• register as “nuisance factor” (Szmrecsanyi 2019: 77)
• treated as main effect or random effect (e.g. Ehmer & Rosemeyer 2018; Geleyn 2017)
• not systematically studied in interaction with language-internal factors
  (but see Theijssen et al. 2013; Grafmiller 2014)

Probabilistic grammar framework

• Probabilistic grammars describe usage patterns of syntactic alternations as a
 function of quantifiable probabilistic constraints.

• Assumption that speakers are sensitive to these probabilistic constraints

• usage-based approach
  → knowledge is gradient and based on knowledge of actual language use
  and the generalizations made upon usage events (Bybee 2006)

• Quantitative, corpus-linguistic approach with regression analysis as a
 statistical method
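
To make the regression idea concrete, here is a minimal sketch of how a logistic model turns weighted constraints into a probability for one variant. The constraint names and all numbers are invented for illustration; they are not estimates from this project.

```python
import math

def variant_probability(intercept, weights, features):
    """Probability of one variant under a logistic model.

    weights and features are dicts keyed by constraint name; every
    name and number used here is a hypothetical placeholder.
    """
    log_odds = intercept + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical constraint weights on the log-odds scale.
weights = {"recipient_indefinite": 1.2, "theme_definite": 0.8}

# One observation, coded 1 if the property holds and 0 otherwise.
observation = {"recipient_indefinite": 1, "theme_definite": 0}

p = variant_probability(-1.2, weights, observation)
print(round(p, 2))
```

Each constraint shifts the log-odds up or down, so no single constraint decides the outcome; the choice stays gradient.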

1. Project outline: Research questions

RQ1: Where do we find most register-related variability with regard to
 probabilistic grammar - along the continuum of formality (formal vs.
 informal) or between modes (written vs. spoken)?

RQ2: Which probabilistic constraints are particularly variable across
 registers?

RQ3: Are language users sensitive to register-specific probabilistic effects?

RQ4: Do closely related languages such as English and Dutch differ in terms
 of the importance of probabilistic register differences?

1. Project outline: Overview

• Probabilistic grammar framework

- two grammatical alternations: dative alternation, future marker alternation

- two languages: English and Dutch

- combination of two methodologies: corpus study and rating task experiment

• registers as variation patterns associated with characteristics of the
 situational context of production in both speech and writing
  → reliance on customary text categories

1. Project outline: Methodology

Operationalisation of register at the intersection of formality and mode
(Koch & Oesterreicher 1985, 2012)

[Figure: registers positioned by formality and mode; surviving label: chats]

1. Project outline: Methodology

Balanced datasets of 2,600 observations: 650 observations per register

Mode     Formality  Text type                                          Variant A  Variant B
spoken   informal   conversations between family members and friends   325        325
spoken   formal     parliamentary debates                              325        325
written  informal   English: blogs; Dutch: chats                       325        325
written  formal     newspaper articles                                 325        325

 Annotation of language-internal constraints
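
The balanced design above can be sketched as a stratified draw of 325 observations per register-by-variant cell. The record layout and the toy pool are hypothetical stand-ins for the real corpus data, and the slides do not say which tool was used for sampling.

```python
import random
from collections import Counter

REGISTERS = ["spoken informal", "spoken formal", "written informal", "written formal"]
VARIANTS = ["A", "B"]
PER_CELL = 325  # 4 registers x 2 variants x 325 = 2,600

def balanced_sample(pool, per_cell=PER_CELL, seed=0):
    """Draw per_cell observations for every register-by-variant cell.

    pool is a list of dicts with 'register' and 'variant' keys; this
    layout is an assumption made for the example.
    """
    rng = random.Random(seed)
    sample = []
    for register in REGISTERS:
        for variant in VARIANTS:
            cell = [o for o in pool
                    if o["register"] == register and o["variant"] == variant]
            if len(cell) < per_cell:
                raise ValueError(f"not enough data for {register} / {variant}")
            sample.extend(rng.sample(cell, per_cell))
    return sample

# Toy pool with 400 candidate observations per cell.
pool = [
    {"id": f"{r}-{v}-{i}", "register": r, "variant": v}
    for r in REGISTERS for v in VARIANTS for i in range(400)
]
sample = balanced_sample(pool)
counts = Counter((o["register"], o["variant"]) for o in sample)
print(len(sample), set(counts.values()))
```

Balancing by variant as well as by register means the raw share of each variant is 50% in every register, so register differences have to show up in the constraints rather than in overall frequency.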

1. Project outline: Methodology

Rating task experiment (cf. Bresnan & Ford 2010; Ford & Bresnan 2013)

“Which continuation sounds most natural to you given the context?”
- Gradient ratings via a slider bar
- Target items from the whole probability range
- Filler items to distract from the target construction

➢ more substantial conclusions from converging results for corpus research and
 ratings (Klavan & Divjak 2016)
➢ better understanding of how processes/factors in language production
 (corpus data) and language processing (experimental data) are related and
 how we can optimize linguistic methodologies to study these
 processes/factors (Arppe et al. 2010: 5; Schönefeld 2011: 3f.)

2. Case study: English dative alternation - Design

 (1) a. ditransitive dative: Sue gives [the plants]recipient [water]theme
 b. prepositional dative: Sue gives [water]theme to [the plants]recipient

• Random sample of 2,600 observations of give
• Variable context: exclusion of instances with particle verbs, clausal
 constituents, fixed expressions, passive constructions, relative clauses

• Language-internal constraints:
 pronominality, animacy, complexity, length, frequency of the
 constituents, and verb sense

• Model: Response variable as a function of Register in interaction with
 RecipientDefiniteness, ThemeDefiniteness, and WeightRatio and their main
 effects as well as main effects of other language-internal constraints, random
 effects for speaker, recipient lemma and theme lemma
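
Written out, the model description above corresponds roughly to the following mixed-effects logistic regression. This is a sketch under assumptions not stated on the slide: the response is coded as the prepositional dative (PD), the random effects are random intercepts, and all symbols are introduced here for illustration.

```latex
\operatorname{logit} P(\mathrm{PD}_i)
  = \beta_0
  + \boldsymbol{\beta}_R^{\top}\mathbf{r}_i
  + \sum_{k \in K} \left( \beta_k\, x_{ik}
      + \boldsymbol{\gamma}_k^{\top}\mathbf{r}_i\, x_{ik} \right)
  + \sum_{m \in M} \delta_m\, z_{im}
  + u_{\mathrm{speaker}(i)} + v_{\mathrm{recipient}(i)} + w_{\mathrm{theme}(i)}
```

Here $\mathbf{r}_i$ is the dummy-coded register of observation $i$, $K$ contains RecipientDefiniteness, ThemeDefiniteness and WeightRatio, $M$ contains the remaining language-internal constraints, and $u$, $v$, $w$ are the random effects for speaker, recipient lemma and theme lemma. The $\boldsymbol{\gamma}_k$ terms are the register interactions: they let the effect of constraint $k$ differ from one register to the next.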

2. Case study: English dative alternation - Results
 C = 0.97

All registers:
PD more likely when the recipient is indefinite

Effect size modulated by register
 Largest effect in spoken informal register, smallest effect in spoken formal register
 (p = 0.006)
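
The C value reported on the results slides is the concordance index, which equals the area under the ROC curve: 0.5 means chance-level discrimination and 1.0 means perfect discrimination. The slides do not say how it was computed; the pair-counting version below is one standard way, shown with invented data.

```python
def concordance_index(probabilities, outcomes):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one observation of each variant")
    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5
    return score / (len(positives) * len(negatives))

# Invented predicted probabilities of the PD and observed choices (1 = PD).
predicted = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
observed = [1, 1, 1, 0, 0, 0]
c = concordance_index(predicted, observed)
print(round(c, 3))
```

In this toy example eight of the nine PD/non-PD pairs are ordered correctly, so C is 8/9.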
2. Case study: English dative alternation - Results
 C = 0.97

In all registers except spoken formal:
PD more likely when the theme is definite; direction of effect is reversed in spoken formal
register (p = 0.005)

Effect size is modulated by register
 Largest effect in spoken informal register, smallest effects in both formal registers
2. Case study: English dative alternation
 Experimental design

Material: 32 items (corpus excerpts)

Spoken formal:
• 10 filler items
  - 6 relativizer (which vs. that)
  - 4 lexical choice
• 6 target items

Spoken informal:
• 10 filler items
  - 6 relativizer (which vs. that)
  - 4 lexical choice
• 6 target items

Criteria (target items):
• simple, non-pronominal constituents
• definite recipient
• no dative constructions or give in the context

 3 seen and 3 unseen items per register

2. Case study: English dative alternation –
 Experimental design

Material (SF = spoken formal, SI = spoken informal):

Dative   Probability  Theme         Seen/    Predicted probability of the PD
variant  bin          definiteness  unseen   SF      SI
DO       1            indefinite    unseen   0.05    0.1
DO       2            definite      unseen   0.25    0.24
PD       3            indefinite    seen     0.47    0.45
DO       4            definite      seen     0.5     0.64
PD       5            indefinite    unseen   0.75    0.82
PD       6            definite      seen     0.93    0.99
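
The slides do not state how the probability bins were defined. One simple rule that is consistent with the table is to cut the probability range into six equal-width bins; the sketch below treats that rule as an assumption and checks it against the listed values.

```python
def probability_bin(p, n_bins=6):
    """Map a probability in [0, 1] to an equal-width bin numbered from 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # min() keeps p == 1.0 in the top bin instead of opening a seventh one.
    return min(int(p * n_bins), n_bins - 1) + 1

# Predicted PD probabilities from the table: bin -> (spoken formal, spoken informal).
items = {1: (0.05, 0.1), 2: (0.25, 0.24), 3: (0.47, 0.45),
         4: (0.5, 0.64), 5: (0.75, 0.82), 6: (0.93, 0.99)}

for listed_bin, (sf, si) in items.items():
    print(listed_bin, probability_bin(sf), probability_bin(si))
```

Picking one item per bin is what spreads the target items across the whole probability range instead of clustering them near 0 or 1.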

2. Case study: English dative alternation –
 Experimental design

Material:
• Two lists: Presentation side of the original variant
• Two versions per list:
 – blocked presentation of all items per register
 – version A: spoken formal – spoken informal
 – version B: spoken informal – spoken formal
• not more than 2 consecutive items of the same type

• 8 yes/no comprehension questions
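
The ordering constraint above (no more than 2 consecutive items of the same type) can be met by reshuffling a block until it holds. The sketch assumes that "type" distinguishes target, relativizer and lexical-choice items and that each register block holds 16 items; the item identifiers are invented, and the slides do not describe the actual randomization procedure.

```python
import random

def max_run(labels):
    """Length of the longest run of identical neighbouring labels."""
    if not labels:
        return 0
    longest = run = 1
    for prev, cur in zip(labels, labels[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def order_block(items, rng, limit=2, max_tries=10000):
    """Shuffle (item_id, item_type) pairs until no type occurs more than limit times in a row."""
    for _ in range(max_tries):
        candidate = items[:]
        rng.shuffle(candidate)
        if max_run([item_type for _, item_type in candidate]) <= limit:
            return candidate
    raise RuntimeError("no valid order found")

# One register block: 6 targets, 6 relativizer fillers, 4 lexical-choice fillers.
block = ([(f"target{i}", "target") for i in range(6)]
         + [(f"rel{i}", "relativizer") for i in range(6)]
         + [(f"lex{i}", "lexical") for i in range(4)])

rng = random.Random(1)
order = order_block(block, rng)
print([item_type for _, item_type in order])
```

Running this once per register block and concatenating the blocks in both orders would give versions A and B; mirroring the presentation side of the original variant would give the two lists.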

2. Case study: English dative alternation –
 Experiment

Participants:
• 100 British English native speakers (sampling: Qualtrics Research Services)
• 50 male, 50 female; mean age: 55 years old (range: 19-78; IQR: 47-65)

Mean overall duration: 26 minutes
(outliers: 7 participants who took >40 minutes to complete the survey)

Mean accuracy (comprehension questions): 84%
(after exclusion of 4 participants with < 75% accuracy and 7 outlier
participants)
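
The two exclusion criteria above can be expressed as a simple filter. The participant records are invented; only the cut-offs (below 75% accuracy, more than 40 minutes) and the number of comprehension questions (8) come from the slides.

```python
def keep_participant(record, min_accuracy=0.75, max_minutes=40):
    """True if the participant passes both exclusion criteria."""
    accuracy = record["correct"] / record["questions"]
    return accuracy >= min_accuracy and record["minutes"] <= max_minutes

# Invented participants; 8 yes/no comprehension questions each.
participants = [
    {"id": "p01", "correct": 8, "questions": 8, "minutes": 22},
    {"id": "p02", "correct": 5, "questions": 8, "minutes": 25},  # 62.5% accuracy
    {"id": "p03", "correct": 7, "questions": 8, "minutes": 47},  # over 40 minutes
    {"id": "p04", "correct": 6, "questions": 8, "minutes": 40},  # exactly at both cut-offs
]

kept = [p["id"] for p in participants if keep_participant(p)]
print(kept)
```

With 8 questions, the 75% threshold means that participants with 6 or more correct answers are retained.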

2. Case study: English dative alternation –
 Experiment results
 = 0.21 (p < 0.001)

Main effect for predicted probability based on the corpus model ( = 0.29, p < 0.001)

2. Case study: English dative alternation –
 Experiment results
Interaction between register and theme definiteness (p = 0.002)
→ in line with corpus model predictions

2. Case study: English dative alternation –
 Experiment results

Interaction between register and filler type (p = 0.003)
→ stronger preferences for variants in the lexical choice items
  than for variants of relativizer choice (which vs. that)

2. Case study: English dative alternation -
 Discussion

• Main effects in line with ‘harmonic alignment’ effects found by previous
 research (Bresnan et al. 2007; Bresnan & Hay 2008; Theijssen et al. 2013; Röthlisberger et al.
 2017)

• ‘Easy First’ bias (MacDonald 2013): first constituent tends to be simple, short,
 animate, and definite

• Register interacts with definiteness of both theme and recipient

  → definiteness linked to accessibility of constituents (cf. Gundel et al. 1993, 2012)

  → different processing demands in spoken and written production

2. Case study: English dative alternation -
 Discussion

• Rating data correlate with corpus predictions

• Interaction between register and theme definiteness

  → register-specific effects are subtle

  → language users still seem to be sensitive to such subtle effects

• Overall: small portion of variance (R²) explained by the model for the target
 items, higher R² for model of filler items

  → great deal of individual variation (cf. Verhagen & Mos 2016; Verhagen et al. 2020)

  → inclusion of register-sensitive fillers may have triggered participants to adjust
  their scale use for the target items
3. Case study: English future marker alternation -
 Design

(2) a. will : I think that 2021 will be a good year.
 b. be going to : I think that 2021 is going to be a good year.

• Random sample of 2,600 observations of will and be going to
• Variable context: exclusion of instances with nominal will, lexical go, be
 going to in past tense, tag-questions

• Language-internal constraints:
 verb type, sentence type, clause type, polarity, animacy of the
 subject, grammatical person, presence of temporal adverb(ial),
 proximity of future time reference

• Model: Response variable as a function of Register in interaction with all
 language-internal constraints and their main effects, random effects for
 speaker and lexical verb
3. Case study: English future marker alternation -
 Results
 C = 0.74

- all registers: effect for first person subjects significantly different from spoken informal
- preference for be going to strongest in written formal register when first person
 pronoun as a subject (p < 0.001)

3. Case study: English future marker alternation -
 Results
 C = 0.74

- effect of polarity reversed in written informal (ns)
- preference for be going to strongest in written formal register
 when polarity is negative (p = 0.03)
3. Case study: English future marker alternation -
 Results
 C = 0.74

- effect of verb type reversed in written registers
 (written informal: p = 0.02; written formal: p = 0.002)
- preference for be going to strongest in written formal register when verb
 type is stative
3. Case study: English future marker alternation -
 Results
 C = 0.74

- effect for proximate time reference reversed in spoken formal register compared to
 spoken informal register (p = 0.02)
- effect for non-proximate contexts reversed (will preferred) in all registers compared
 to spoken informal
 (spoken formal: p = 0.02; written informal: p = 0.03; written formal: p = 0.001)
3. Case study: English future marker alternation -
 Results
 C = 0.74

- effect reversed in written registers compared to spoken registers
 (written informal: p = 0.02; written formal p = 0.03)

3. Case study: English future marker alternation -
 Discussion

• Main effects in line with previous research: be going to favored in
 interrogative sentences, subclauses, if-subclauses (Szmrecsanyi 2003; Torres
 Cacoullos & Walker 2009; Tagliamonte et al. 2014; Denis & Tagliamonte 2018)
• be going to favored with stative verbs (in contrast to Torres Cacoullos & Walker 2009)

• 5 interaction effects → alternation with a great deal of variability across
 registers
• Difference between spoken and written registers more pronounced than
 between formal and informal registers

• be going to as a grammaticalization phenomenon → written registers seem
 to be more conservative than spoken informal register

4. General discussion

• Register-specific effects seem to be robust in syntactic alternations

• Degree and nature of register-specificity depends on alternation under
 scrutiny:
 – variability along formality continuum in the dative alternation

 – variability between modes in the future marker alternation

• Language users seem to be sensitive to probabilistic effects

• ‘Grammatical Difference Hypothesis’ ≈ speakers of a language with multiple
 registers are in fact multilingual due to register-specific probabilistic
 grammars (Guy 2015)

Thank you!

 Contact: alexandra.engel@kuleuven.be

References

Arppe, Antti, Gilquin, Gaëtanelle, Glynn, Dylan, Hilpert, Martin & Zeschel, Arne. 2010. Cognitive corpus
 linguistics: Five points of debate on current theory and methodology. Corpora 5(1): 1–27.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic
 Theory 8(1): 9-37. https://doi.org/10.1515/cllt-2012-0002
Biber, Douglas. 2014. Using multi-dimensional analysis to explore cross-linguistic universals of register
 variation. Languages in Contrast 14(1): 7–34.
Biber, Douglas. 2019. Text-linguistic approaches to register variation. Register Studies 1(1): 42-75.
Biber, Douglas, Egbert, Jesse, Gray, Bethany, Oppliger, Rahel & Szmrecsanyi, Benedikt. 2016.
 Variationist versus text-linguistic approaches to grammatical change in English: Nominal modifiers of
 head nouns. In The Cambridge Handbook of English Historical Linguistics, Merja Kytö & Päivi Pahta
 (eds), 351-375. Cambridge: Cambridge University Press.
Bresnan, Joan, Cueni, Anna, Nikitina, Tatiana & Baayen, R. Harald. 2007. Predicting the dative
 alternation. In Cognitive Foundations of Interpretation, Gerlof Bouma, Irene Krämer & Joost Zwarts
 (eds), 69-94. Amsterdam: Royal Netherlands Academy of Arts and Sciences.

Bresnan, Joan & Ford, Marilyn. 2010. Predicting syntax: Processing dative constructions in American
 and Australian varieties of English. Language 86(1): 168-213.
Bresnan, Joan & Hay, Jennifer. 2008. Gradient grammar: An effect of animacy on the syntax of give in
 New Zealand and American English. Lingua 118(2): 245-259.
Bybee, Joan L. 2006. From usage to grammar: The mind’s response to repetition. Language 82(4):
 711-733.
Denis, Derek & Tagliamonte, Sali A. 2018. The changing future: Competition, specialization and
 reorganization in the contemporary English future temporal reference system. English Language and
 Linguistics 22(3): 403–30. https://doi.org/10.1017/S1360674316000551.
Ford, Marilyn & Bresnan, Joan. 2013. Using convergent evidence from psycholinguistics and usage. In
 Research Methods in Language Variation and Change, Manfred Krug & Julia Schlüter (eds), 295-
 312. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511792519.020
Geleyn, Tim. 2017. Syntactic variation and diachrony: The case of the Dutch dative alternation. Corpus
 Linguistics and Linguistic Theory 13(1): 65-96. https://doi.org/10.1515/cllt-2015-0062
Grafmiller, Jason. 2014. Variation in English Genitives Across Modality and Genres. English Language
 and Linguistics 18(3): 471–96. https://doi.org/10.1017/S1360674314000136.
Gundel, Jeanette K., Hedberg, Nancy & Zacharski, Ron. 1993. Cognitive status and the form of referring
 expressions in discourse. Language 69(2): 274-307. https://www.jstor.org/stable/416535

Gundel, Jeanette K., Hedberg, Nancy & Zacharski, Ron. 2012. Underspecification of cognitive status in
 reference production: Some empirical predictions. Topics in Cognitive Science 4(2): 249-268.
Guy, Gregory R. 2015. Coherence, constraints and quantities. Paper presented at New Ways of Analyzing
 Variation (NWAV) 44, University of Toronto.
Klavan, Jane & Divjak, Dagmar. 2016. The cognitive plausibility of statistical classification models:
 Comparing textual and behavioral evidence. Folia Linguistica 50(2): 355-384.
 https://doi.org/10.1515/flin-2016-0014
Koch, Peter & Oesterreicher, Wulf. 1985. Sprache der Nähe - Sprache der Distanz. Mündlichkeit und
 Schriftlichkeit im Spannungsfeld von Sprachtheorie und Sprachgeschichte. In Romanistisches
 Jahrbuch, Vol. 36, 15–43. Berlin/New York: Walter de Gruyter.
Koch, Peter & Oesterreicher, Wulf. 2012. Language of immediacy – Language of distance: Orality and
 literacy from the perspective of language theory and linguistic history. In Communicative spaces:
 Variation, contact, and change, Claudia Lange, Beatrix Weber & Göran Wolf (eds.), 441–473.
 Frankfurt: Lang.
Labov, William. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
MacDonald, Maryellen C. 2013. How language production shapes language form and comprehension.
 Frontiers in Psychology 4: 226.
Rickford, John R. 2014. Situation: Stylistic variation in sociolinguistic corpora and theory. Language and
 Linguistics Compass 8(11): 590-603. https://doi.org/10.1111/lnc3.12110
Röthlisberger, Melanie, Grafmiller, Jason & Szmrecsanyi, Benedikt. 2017. Cognitive indigenization effects
 in the English dative alternation. Cognitive Linguistics 28(4): 673-710.

Schönefeld, Doris. 2011. Introduction: On evidence and the convergence of evidence in linguistic
 research. In Converging Evidence: Methodological and Theoretical Issues for Linguistic Research,
 Doris Schönefeld (ed.), 1-31. Amsterdam: Benjamins. https://doi.org/10.1075/hcp.33.03sch
Szmrecsanyi, Benedikt. 2003. Be going to versus will/shall: Does syntax matter? Journal of English
 Linguistics 31(4): 295–323.
Szmrecsanyi, Benedikt. 2019. Register in variationist linguistics. Register Studies 1(1): 76-99.
 https://doi.org/10.1075/rs.18006.szm
Tagliamonte, Sali A., Durham, Mercedes & Smith, Jennifer. 2014. Grammaticalization at an early stage:
 Future be going to in conservative British dialects. English Language and Linguistics 18(1): 75–108.
Theijssen, Daphne, Bosch, Louis ten, Boves, Lou, Cranen, Bert & van Halteren, Hans. 2013. Choosing
 alternatives: Using Bayesian networks and memory-based learning to study the dative alternation.
 Corpus Linguistics and Linguistic Theory 9(2): 227-262. https://doi.org/10.1515/cllt-2013-0007
Torres Cacoullos, Rena & Walker, James A. 2009. The present of the English future: Grammatical
 variation and collocations in discourse. Language 85(2): 321–54.
Verhagen, Véronique & Mos, Maria. 2016. Stability of familiarity judgments: Individual variation and the
 invariant bigger picture. Cognitive Linguistics 27(3): 307-344.
 https://doi.org/10.1515/cog-2015-0063
Verhagen, Véronique, Mos, Maria, Schilperoord, Joost & Backus, Ad. 2020. Variation is information:
 Analyses of variation across items, participants, time, and methods in metalinguistic judgment data.
 Linguistics 58(1): 37–81. https://doi.org/10.1515/ling-2018-0036

Distribution of recipient definiteness across registers
Distribution of theme definiteness across registers
Distribution of pronominality for levels of recipient definiteness across registers
Distribution of pronominality for levels of theme definiteness across registers
Frequency of theme lemma per register
Experimental items (per participant)
Frequency of lexical verb per register