Coreference Resolution - CMPU 366 Computational Linguistics - 15 April 2021
Coreference resolution is the problem of identifying all mentions that refer to the same real-world entity.
Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady.
Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady. Each of these is a mention.
Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady. These mentions are coreferent because they refer to the same real-world entity.
Coreference resolution Identify all mentions that refer to the same real-world entity. Input: Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady. Output: {Barack Obama, his, He} {Hillary Clinton, secretary of state, her, she, First Lady}
Coreference resolution is essential as part of a full natural language understanding system. It’s also required to get reasonable performance at specific NLP tasks like summarization, question answering, or information extraction.
For instance, an information extraction system reading this text: First Union Corp. is continuing to wrestle with severe problems unleashed by a botched merger and a troubled business strategy. According to industry insiders at Paine Webber, their president, John R. Georgius, is planning to retire by the end of the year. should extract the relation (John R. Georgius, president of, First Union Corp), not (John R. Georgius, president of, Paine Webber).
For machine translation, languages have different features for gender, number, dropped pronouns, etc.
And as we saw talking about virtual assistants on the first day of class, understanding what the user’s asking you to do requires understanding coreference: “Book tickets to see James Bond” “Spectre is playing near you at 2:00 and 3:00 today. How many tickets would you like?” “Two tickets for the showing at three”
Coreference resolution can be really difficult! Some cases of coreference require world knowledge or commonsense reasoning to solve. E.g., the Winograd schema problems – a kind of alternative to the Turing test – include The city council denied the demonstrators a permit because they feared violence. The city council denied the demonstrators a permit because they advocated violence. And The trophy didn’t fit into the suitcase because it was too large. The trophy didn’t fit into the suitcase because it was too small.
Coreference and anaphora Coreference is when two mentions refer to the same entity in the world, e.g., Barack Obama travelled to… Obama… A related linguistic concept is anaphora – when a term (an anaphor) refers to another term (an antecedent): Barack Obama [antecedent] said he [anaphor] would sign the bill. The interpretation of the anaphor is in some way determined by the interpretation of the antecedent.
Coreference: some linguistics. Coreference and anaphora: coreference with named entities links expressions in the text to entities in the world; anaphora links one expression in the text (the anaphor) to another (its antecedent).
Coreference and anaphora Not all anaphoric relations are coreferential, e.g., bridging anaphora: We went to see a concert last night. The tickets were really expensive.
Ways to refer to entities Say that your friend has a 1961 Ford Falcon automobile (not to be confused with the Ford Thundercougarfalconbird), and you want to refer to it, as friends do. You might say it, this, that, this car, that car, the car, the Ford, the Falcon, or my friend’s car, among others. Not all of these can be used in all discourse contexts. E.g., you can’t simply say it or the Falcon if the hearer has no prior knowledge of your friend’s car, it hasn’t been mentioned before, and it’s not in the immediate surroundings of the discourse participants (i.e., the situational context of the discourse).
Discourse model Each type of referring expression encodes different signals about the place that the speaker believes the referent occupies within the hearer’s set of beliefs. A subset of these beliefs has a special status: the hearer’s mental model of the ongoing discourse, the discourse model. It contains representations of the entities that have been referred to in the discourse and the relationships in which they participate. A system for interpreting referring expressions therefore needs a method for constructing a discourse model that evolves with the dynamically changing discourse it represents, and a method for mapping between the signals that various referring expressions encode and the hearer’s set of beliefs (including the discourse model).
Operations Two fundamental operations on the discourse model: a representation is evoked into the model when a referent is first mentioned in a discourse, and a representation is accessed from the model on subsequent mention. [Figure: John is evoked into the discourse model; he accesses the same entity; John and he corefer.]
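To make the evoke/access picture concrete, here is a minimal sketch of a discourse model, assuming hypothetical class and method names (not from any particular system): an entity is evoked on first mention and accessed on subsequent, coreferent mentions.

```python
# Minimal sketch of a discourse model supporting the two operations above:
# evoke (first mention) and access (subsequent mention). Names are illustrative.

class DiscourseModel:
    def __init__(self):
        self.entities = []          # one representation per evoked referent

    def evoke(self, mention):
        """Create a new entity representation for a first mention."""
        entity = {"id": len(self.entities), "mentions": [mention]}
        self.entities.append(entity)
        return entity

    def access(self, entity_id, mention):
        """Add a subsequent, coreferent mention to an existing entity."""
        entity = self.entities[entity_id]
        entity["mentions"].append(mention)
        return entity


model = DiscourseModel()
john = model.evoke("John")         # "John" evokes a new entity
model.access(john["id"], "he")     # "he" accesses the same entity
print(model.entities)              # John and he now corefer
```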
Many types of reference According to Doug, Sue just bought a 1962 Ford Falcon. But that turned out to be a lie. (speech act) But that was false. (proposition) That struck me as a funny way to describe the situation. (manner of description) That caused Sue to become rather poor. (event)
5 types of referring expressions 1. Indefinite noun phrases 2. Definite noun phrases 3. Pronouns 4. Demonstrative pronouns 5. Names
Indefinite noun phrases Introduce entities that are new to the hearer. Mrs. Martin was so very kind as to send Mrs. Goddard a beautiful goose. He had gone round one day to bring her some walnuts. I am going to the butcher to buy a goose. I hope they still have it. (specific) I hope they still have one. (non-specific)
Definite noun phrases Refer to entities identifiable to the hearer because they have already been mentioned (It concerns a white stallion which I have sold to an officer. But the pedigree of the white stallion was not fully established.), because they are identifiable from the hearer’s beliefs or are unique (I read about it in The New York Times.), or because they are inherently unique (The fastest car in …).
Pronouns Emma smiled and chatted as cheerfully as she could.
Pronouns Cataphora is when a pronoun appears before its referent. Even before she saw it, Dorothy had been thinking about the Emerald City every day.
Pronouns Compared to definite noun phrases, pronouns require that their referents be more salient. John went to Bob’s party, and parked next to a classic Ford Falcon. He went inside and talked to Bob for more than an hour. Bob told him that he recently got engaged. ?He also said that he bought it yesterday. vs. He also said that he bought the Falcon yesterday.
Demonstrative pronouns E.g., this, that, these, those. Behave differently than definite pronouns like it. Can also appear as determiners: I saw this great movie last night. (Note: colloquial English, e.g., this ingredient, that spice.) They differ in lexical meaning: the proximal demonstrative this indicates literal or metaphorical closeness; the distal demonstrative that indicates literal or metaphorical distance (e.g., further away in time). I just bought a copy of Thoreau’s Walden. I had bought one five years ago. That one had been very tattered; this one was in much better condition.
Names Can refer to both new and old entities in the discourse. Miss Woodhouse certainly had not done him justice. International Business Machines sought patent compensation from Amazon. In fact, IBM had previously sued a number of other companies.
Information status The same referring expressions can be used to introduce new referents or to refer anaphorically to old referents Information status or information structure: Study of the way different referential forms are used to provide new or old information A variety of theories that express the relation between different types of referential form and the “informativity” or saliency of the referent in the discourse
Theories Givenness Hierarchy (Gundel et al., 1993): a scale representing six kinds of information status that different referring expressions are used to signal: in focus {it} > activated {that, this, this N} > familiar {that N} > uniquely identifiable {the N} > referential {indefinite this N} > type identifiable {a N}
Theories Accessibility Scale (Ariel, 2001) Referents that are more salient are easier for the hearer to call to mind, so can be referred to with less linguistic material. Less salient entities need longer and more explicit referring expressions to help the hearer recover the referent. Sample scale, low to high accessibility: Full name > long definite description > short definite description > last name > first name > distal demonstrative > proximate demonstrative > NP > stressed pronoun > unstressed pronoun. Accessibility correlates with length: less accessible NPs tend to be longer. We often find longer NPs (e.g., long definite descriptions with relative clauses) early in the discourse, and shorter ones (e.g., pronouns) later in the discourse.
I was disappointed, though not surprised, to see that today a conjunctive labeling law dictating that “Sonoma County” be placed on every label on wines produced from grapes grown in Sonoma County was unanimously passed by the California Legislature. Pushed as an effort to promote “Sonoma County” wines and a consumer education effort, the new law instead forces vintners to needlessly sully their package and undermines their own marketing efforts. Yet, the law does nothing to educate consumers. Passed unanimously out of the California Assembly and Senate, AB 1798 now awaits the Governor’s signature, which it will surely obtain. According to Noreen Evans, an Assembly sponsor of the bill, this new conjunctive labeling law “requires that any wine labeled with an American Viticultural Area (AVA) located entirely within Sonoma County – like Russian River Valley or Dry Creek Valley – must also include the word “Sonoma County” on the label, starting in 2014. There are 13 AVAs in Sonoma County. The problem, of course, is that by placing the words “Sonoma County” on a bottle of wine that is made with grapes grown in “Russian River Valley”, “Dry Creek Valley”, “Sonoma Valley” or any other AVA in Sonoma County, consumers learn absolutely nothing about the wine in the bottle. There is no evidence that grapes grown in “Sonoma County” have any single distinguishing feature derived from the fact that they were grown inside the borders of Sonoma County.
Theories Prince (1992) analyzes information status in terms of hearer status and discourse status. Hearer status: Is the referent previously known to the hearer or new? Discourse status: Has the referent been previously mentioned in the discourse?
Complications Inferrables (“bridging inferences”): I almost bought a 1962 Ford Falcon today, but a door had a dent and the engine seemed noisy. Generics: I’m interested in buying a Mac laptop. They are very stylish. In March in Poughkeepsie you have to wear a jacket. Non-referential uses: pleonastic references (it is raining), idioms (hit it off), and particular syntactic situations: clefts (It was Frodo who carried the ring.), extraposition (It was good that Frodo carried the ring.)
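As a rough illustration of filtering non-referential uses, here is a toy pattern-based check for pleonastic it. The patterns are ad hoc examples invented for this sketch, not a complete or standard rule set; real systems use richer syntactic cues or a trained classifier.

```python
import re

# Toy heuristic for spotting (some) pleonastic uses of "it".
# Patterns are illustrative only and deliberately incomplete.
PLEONASTIC_PATTERNS = [
    r"\bit\s+(is|was)\s+(raining|snowing|sunny|cold|hot)\b",      # weather
    r"\bit\s+(is|was)\s+\w+\s+(that|who)\b",                      # clefts
    r"\bit\s+(is|was)\s+(good|clear|likely|important)\s+that\b",  # extraposition
]

def looks_pleonastic(sentence: str) -> bool:
    s = sentence.lower()
    return any(re.search(p, s) for p in PLEONASTIC_PATTERNS)

print(looks_pleonastic("It was good that Frodo carried the ring."))  # True
print(looks_pleonastic("It described an island full of redwoods."))  # False
```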
Features for pronominal anaphora resolution Problem Statement Given a single pronoun (he, him, she, her, it, and sometimes they/them), together with the previous context, find the antecedent of the pronoun.
Useful constraints Number agreement: the antecedent of a pronoun must agree with the pronoun in number. John has a Ford Falcon. It is red. *John has three Ford Falcons. It is red. But note: IBM is announcing a new machine translation product. They have been working on it for 20 years. Person agreement: English distinguishes first, second, and third person, and the antecedent must agree with the pronoun in person: a 1st-person pronoun (I, me, my) must have a 1st-person antecedent (I, me, or my); a 2nd-person pronoun (you or your) must have a 2nd-person antecedent (you or your); a 3rd-person pronoun (he, she, they, him, her, them, his, her, their) must have a 3rd-person antecedent (one of the above or any other noun phrase). Gender agreement: John has an Acura. He/it/she is attractive.
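A minimal sketch of how these agreement constraints can be used to filter candidate antecedents; the feature dictionaries are hand-specified toy annotations, purely for illustration.

```python
# Sketch: filter candidate antecedents by number, person, and gender
# agreement with the pronoun. Feature values are toy annotations.

def agrees(pronoun, candidate):
    for feature in ("number", "person", "gender"):
        p, c = pronoun.get(feature), candidate.get(feature)
        # Treat an unknown value as compatible rather than a violation.
        if p is not None and c is not None and p != c:
            return False
    return True

it = {"text": "It", "number": "sg", "person": 3, "gender": "neuter"}
candidates = [
    {"text": "John",               "number": "sg", "person": 3, "gender": "masc"},
    {"text": "a Ford Falcon",      "number": "sg", "person": 3, "gender": "neuter"},
    {"text": "three Ford Falcons", "number": "pl", "person": 3, "gender": "neuter"},
]

print([c["text"] for c in candidates if agrees(it, c)])
# ['a Ford Falcon'] -- "John" fails gender, "three Ford Falcons" fails number
```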
Pronoun interpretation features Binding theory constraints John bought himself a new Ford. [himself = John] John bought him a new Ford. [him ≠ John] John said that Bill bought him a new Ford. [him ≠ Bill] John said that Bill bought himself a new Ford. [himself = Bill]
Pronoun interpretation features Selectional restrictions John parked his Ford in the garage. He had driven it around for hours. drive: agent: +human, theme: +vehicle. Selectional restrictions can be implemented by storing a dictionary of probabilistic dependencies between the verb associated with the pronoun and the potential referent, and/or an ontology (e.g., vehicle > car > Ford).
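One way to operationalize the dictionary-of-dependencies idea is sketched below: store rough probabilities that a verb takes a given semantic class in a given role, and prefer the candidate whose class scores highest. The class labels and probabilities here are invented for illustration, not estimated from data.

```python
# Sketch: selectional restrictions as a dictionary mapping (verb, role)
# to probabilities over semantic classes. Values are made up.

SELECTIONAL = {
    ("drive", "theme"): {"vehicle": 0.90, "human": 0.02, "location": 0.01},
    ("drive", "agent"): {"human": 0.95, "vehicle": 0.01},
}

# Toy ontology: each candidate referent mapped to a semantic class.
SEMANTIC_CLASS = {"his Ford": "vehicle", "the garage": "location", "John": "human"}

def selectional_score(verb, role, candidate):
    return SELECTIONAL.get((verb, role), {}).get(SEMANTIC_CLASS.get(candidate), 0.0)

# "He had driven it around for hours": "it" is the theme of "drive".
candidates = ["his Ford", "the garage", "John"]
best = max(candidates, key=lambda c: selectional_score("drive", "theme", c))
print(best)   # his Ford
```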
Less hard-and-fast rules
Recency More recently mentioned entities are preferred. The doctor found an old map in the captain’s chest. Jim found an even older map hidden on the shelf. It described an island full of redwood trees and sandy beaches. [It = the map Jim found]
Grammatical role Subject preference Billy Bones went to the bar with Jim Hawkins. He called for a glass of rum. [He = Billy] Jim Hawkins went to the bar with Billy Bones. He called for a glass of rum. [He = Jim]
Repeated mention Billy Bones had been thinking about a glass of rum ever since the pirate ship docked. He hobbled over to the Old Parrot bar. Jim Hawkins went with him. He called for a glass of rum. [He = Billy]
Parallelism Long John Silver went with Jim to the Old Parrot. Billy Bones went with him to the Old Anchor Inn. [him = Jim] Note: The grammatical role hierarchy described before ranks Long John Silver as more salient than Jim, and thus should be the preferred referent of him. Furthermore, there is no semantic reason that Long John Silver cannot be the referent. Nonetheless, him is instead understood to refer to Jim.
Verb semantics John telephoned Bill. He lost the laptop. [He = John] John criticized Bill. He lost the laptop. [He = Bill] Implicit causality: the implicit cause of telephoning is its subject; the implicit cause of criticizing is its object.
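These softer preferences (recency, grammatical role, repeated mention, parallelism) are often combined as weighted evidence rather than hard filters, in the spirit of classic salience-weighting approaches. The sketch below is only illustrative: the weights and feature values are arbitrary numbers chosen for the example, not taken from any published system.

```python
# Sketch: combine soft preferences as a weighted salience score per candidate.
# Weights and feature values are purely illustrative.

WEIGHTS = {"recency": 1.0, "subject": 2.0, "repeated_mention": 1.5, "parallel_role": 1.0}

def salience(candidate):
    score = 0.0
    score += WEIGHTS["recency"] / (1 + candidate["sentence_distance"])
    if candidate["is_subject"]:
        score += WEIGHTS["subject"]
    score += WEIGHTS["repeated_mention"] * candidate["mention_count"]
    if candidate["parallel_role"]:
        score += WEIGHTS["parallel_role"]
    return score

# "He called for a glass of rum." -- who is "He"?
candidates = [
    {"text": "Billy Bones", "sentence_distance": 1, "is_subject": True,
     "mention_count": 2, "parallel_role": True},
    {"text": "Jim Hawkins", "sentence_distance": 1, "is_subject": False,
     "mention_count": 1, "parallel_role": True},
]
print(max(candidates, key=salience)["text"])   # Billy Bones
```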
Coreference resolution For the general coreference task, need to decide whether any pair of noun phrases corefer. Have to deal with names, non-referential pronouns, and definite NPs.
Example Victoria Chen, Chief Financial Officer of Megabucks Banking Corp since 2004, saw her pay jump 20%, to $1.3 million, as the 37-year-old also became the Denver-based financial-services company’s president. It has been ten years since she came to Megabucks from rival Lotsabucks. As before, we need to determine pronominal anaphora (her refers to Victoria Chen) and filter out non-referential pronouns (pleonastic It in It has been ten years). But we also need to figure out that the 37-year-old is coreferent with Victoria Chen, that the Denver-based financial-services company is the same as Megabucks, and that Megabucks is the same as Megabucks Banking Corp.
Algorithm for coreference resolution Basis: a binary classifier that is given an anaphor and a potential antecedent and returns true or false. It uses the same features as for pronominal resolution, plus others, e.g., Megabucks and Megabucks Banking Corp share the word Megabucks; Megabucks Banking Corp and the Denver-based financial-services company both end in words (Corp and company) indicating a corporate organization. Process: scan the document from left to right; for each NPj encountered, search backwards through the document’s NPs; for each such potential antecedent NPi, run the classifier; if it returns true, coindex NPi and NPj and stop searching for this NPj; terminate the backwards search when we reach the beginning of the document.
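Here is a sketch of that scan-and-search process. It assumes the NPs have already been extracted in document order, and `corefers` stands in for the trained binary classifier; the placeholder used here (string/head-word match) is just so the control flow runs, not the real feature-based classifier.

```python
# Sketch of the left-to-right clustering process described above.
# `corefers` is a placeholder for the trained mention-pair classifier.

def corefers(antecedent, anaphor):
    # Placeholder: exact string match or shared first word.
    return antecedent.lower() == anaphor.lower() or \
           antecedent.split()[0].lower() == anaphor.split()[0].lower()

def resolve(nps):
    """nps: list of NP strings in document order. Returns a cluster id per NP."""
    cluster = list(range(len(nps)))          # each NP starts in its own cluster
    for j in range(len(nps)):                # scan document left to right
        for i in range(j - 1, -1, -1):       # search backwards for an antecedent
            if corefers(nps[i], nps[j]):
                cluster[j] = cluster[i]      # coindex NP_i and NP_j
                break                        # stop at the first antecedent found
    return cluster

nps = ["Victoria Chen", "Megabucks Banking Corp", "her pay",
       "the 37-year-old", "Megabucks"]
print(resolve(nps))   # Megabucks Banking Corp and Megabucks get the same id
```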
Commonly used features Anaphor edit distance [0, 1, 2, …]: the minimum edit distance from the potential antecedent to the anaphor. Antecedent edit distance [0, 1, 2, …]: the minimum edit distance from the anaphor to the antecedent. Alias [true or false]: requires a named-entity tagger; returns true if NPi and NPj are both named entities of the same type and NPi is an alias of NPj. The meaning of alias depends on the type, e.g., DATE: dates are aliases if they refer to the same date; PERSON: strip prefixes (e.g., Dr., Chairman) and check whether the NPs are now identical; ORGANIZATION: check for acronyms (e.g., IBM for International Business Machines Corp.)
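A sketch of the alias feature for the ORGANIZATION and PERSON cases. The prefix list, suffix list, and acronym rule below are simplifications invented for the example, not a standard resource.

```python
# Sketch of the alias feature: acronym check for organizations and
# prefix-stripping for persons. Both rules are simplified illustrations.

PERSON_PREFIXES = {"dr", "dr.", "mr", "mr.", "mrs", "mrs.", "ms", "ms.", "chairman"}
ORG_SUFFIXES = {"corp.", "corp", "inc.", "inc", "co.", "co"}

def acronym(name):
    words = [w for w in name.split() if w.lower() not in ORG_SUFFIXES]
    return "".join(w[0].upper() for w in words)

def is_alias(np_i, np_j, entity_type):
    if entity_type == "ORGANIZATION":
        return np_j.upper() == acronym(np_i) or np_i.upper() == acronym(np_j)
    if entity_type == "PERSON":
        strip = lambda n: " ".join(w for w in n.split()
                                   if w.lower() not in PERSON_PREFIXES)
        return strip(np_i).lower() == strip(np_j).lower()
    return False

print(is_alias("International Business Machines Corp.", "IBM", "ORGANIZATION"))  # True
print(is_alias("Dr. Victoria Chen", "Victoria Chen", "PERSON"))                   # True
```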
More features Appositive [true or false]: true if the anaphor is in a syntactic apposition relation to the antecedent, e.g., the NP Chief Financial Officer of Megabucks Banking Corp is in apposition to the NP Victoria Chen in Victoria Chen, Chief Financial Officer of Megabucks Banking Corp since 2004, … Apposition can be detected using a parser, or more shallowly by looking for commas and requiring that neither NP contain a verb and that one of them is a name. Linguistic form [proper, definite, indefinite, pronoun]: whether the potential anaphor NPj is a proper name, definite description, indefinite NP, or pronoun.
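A sketch of the shallow appositive check (neither NP contains a verb, one of them looks like a name); it assumes the caller has already found the two NPs adjacent and separated by a comma, and the verb test is a crude word-list stand-in for real POS tagging.

```python
# Sketch of a shallow appositive check on two comma-separated NPs.
# The "verb" test is a toy word list, not real POS tagging.

COMMON_VERBS = {"is", "was", "said", "saw", "bought", "became", "nominated"}

def has_verb(np):
    return any(w.lower().strip(".,") in COMMON_VERBS for w in np.split())

def looks_like_name(np):
    words = np.split()
    return len(words) <= 4 and all(w[0].isupper() for w in words)

def shallow_appositive(np_i, np_j):
    # Assumes np_i and np_j were adjacent and comma-separated in the text.
    return (not has_verb(np_i) and not has_verb(np_j)
            and (looks_like_name(np_i) or looks_like_name(np_j)))

print(shallow_appositive("Victoria Chen",
                         "Chief Financial Officer of Megabucks Banking Corp"))  # True
```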
Coreference: further difficulties Lots of other algorithms and other constraints. Hobbs: reference resolution as a by-product of general reasoning. The city council denied the demonstrators a permit because they feared violence / they advocated violence. Let x = the city council, y = the demonstrators, z = violence, w = the permit. Axiom: ∀x, y, z, w  fear(x, z) ∧ advocate(y, z) ∧ enable_to_cause(w, y, z) → deny(x, y, w). Hence deny(city_council, demonstrators, permit): unifying the sentences with the axiom resolves they to the city council in the feared reading and to the demonstrators in the advocated reading.
Algorithms for anaphora resolution The Hobbs algorithm Centering algorithm A log-linear model (machine learning)
Hobbs algorithm (1978) A relatively simple and reasonably effective syntactic method for resolving pronouns: trace a path from the pronoun to the top S (sentence) node in the parse tree; perform a left-to-right, breadth-first search on NPs to the left of the path; if a referent isn’t found in the same sentence, perform a left-to-right, breadth-first search of preceding sentences. The first candidate NP that matches in gender, number, and person is returned as the antecedent. The Hobbs algorithm is commonly used as a baseline when evaluating pronoun resolution methods.
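The full Hobbs algorithm walks a specific path through the parse tree; the sketch below is deliberately simplified and keeps only two of the ingredients above: a breadth-first, left-to-right search over NP nodes (in the sentence containing the pronoun, then in preceding sentences) and an agreement filter. The toy parses and the toy agreement check are assumptions for the example, not a faithful implementation of Hobbs (1978).

```python
# Simplified Hobbs-style search, not the full path-walking algorithm:
# BFS over NP nodes in the pronoun's sentence (the left-of-path restriction
# is omitted here), then in preceding sentences, returning the first NP
# that passes an agreement check.

from collections import deque
from nltk import Tree

def np_nodes_bfs(tree):
    """Yield NP subtrees in breadth-first, left-to-right order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, Tree):
            if node.label() == "NP":
                yield node
            queue.extend(node)

def resolve_pronoun(sentences, pronoun, agrees):
    """sentences: parse trees, most recent last; pronoun occurs in the last one."""
    for sent in reversed(sentences):
        for np in np_nodes_bfs(sent):
            words = np.leaves()
            if words != [pronoun] and agrees(pronoun, words):
                return words
    return None

# Toy parses and a toy agreement check ("He" wants a masculine singular NP).
s1 = Tree.fromstring("(S (NP (NNP John)) (VP (VBD saw) (NP (DT a) (NN Falcon))))")
s2 = Tree.fromstring("(S (NP (PRP He)) (VP (VBD bought) (NP (PRP it))))")
MASC_SG = {"John", "He", "him"}
agrees = lambda pronoun, words: pronoun == "He" and words[-1] in MASC_SG
print(resolve_pronoun([s1, s2], "He", agrees))   # ['John']
```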
Hobbs algorithm (1978) The castle in Camelot remained the residence of the king until he moved it to London.
“…the naïve approach is quite good. Computationally speaking, it will be a long time before a semantically based algorithm is sophisticated enough to perform as well, and these results set a very high standard for any other approach to aim for. “Yet there is every reason to pursue a semantically based approach. The naive algorithm does not work. Anyone can think of examples where it fails. In these cases it not only fails; it gives no indication that it has failed and offers no help in finding the real antecedent.” Hobbs (1978), Lingua, p. 345
Centering theory Hobbs algorithm does not use an explicit representation of a discourse model Centering theory (Grosz et al. 1995) Explicit representation of a discourse model Additional claim: There is a single entity “centered” at any given point in the discourse
Centering for anaphora resolution Two entities are tracked across two adjacent utterances Un and Un+1. The backward-looking center of Un, Cb(Un): the entity focused on in the discourse after Un is interpreted; the Cb of the first utterance in a discourse is undefined. The forward-looking centers of Un, Cf(Un): an ordered list of the entities mentioned in Un. Here we use a simple grammatical-role hierarchy for the ordering: subject (An Acura Integra is parked in the lot) > existential predicate nominal (There is an Acura Integra parked in the lot) > object (John parked an Acura Integra in the lot) > indirect object (John gave his Acura Integra a bath) > demarcated adverbial PP (Inside his Acura Integra, John showed Susan his new CD player). Cb(Un+1) is the most highly ranked element of Cf(Un) mentioned in Un+1. Cp(Un) is the highest-ranked forward-looking center (the preferred center).
Algorithm Preferred referents of pronouns are computed from relations between the forward- and backward-looking centers in adjacent sentences. Four transitions are defined by two tests:
If Cb(Un+1) = Cb(Un) (or Cb(Un) is undefined) and Cb(Un+1) = Cp(Un+1): Continue.
If Cb(Un+1) = Cb(Un) (or Cb(Un) is undefined) and Cb(Un+1) ≠ Cp(Un+1): Retain.
If Cb(Un+1) ≠ Cb(Un) and Cb(Un+1) = Cp(Un+1): Smooth-shift.
If Cb(Un+1) ≠ Cb(Un) and Cb(Un+1) ≠ Cp(Un+1): Rough-shift.
Rules: (1) If any element of Cf(Un) is realized by a pronoun in utterance Un+1, then Cb(Un+1) must be realized as a pronoun also. (2) Transition states are ordered: Continue is preferred to Retain, which is preferred to Smooth-shift, which is preferred to Rough-shift.
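The four transitions can be read directly off Cb and Cp, as in this small sketch (entities are plain strings here, and an undefined Cb is represented as None); the function and variable names are just illustrative.

```python
# Sketch: classify the centering transition from utterance U_n to U_{n+1}
# given Cb(U_n), Cb(U_{n+1}), and Cp(U_{n+1}).

def transition(cb_prev, cb_next, cp_next):
    same_cb = cb_prev is None or cb_next == cb_prev   # undefined Cb counts as "same"
    if cb_next == cp_next:
        return "Continue" if same_cb else "Smooth-shift"
    return "Retain" if same_cb else "Rough-shift"

# Ordering used to compare candidate referent assignments.
PREFERENCE = ["Continue", "Retain", "Smooth-shift", "Rough-shift"]

# U1: John saw a ... Falcon ...   U2: He showed it to Bob.
# With he = John: Cb(U1) undefined, Cb(U2) = Cp(U2) = John.
print(transition(None, "John", "John"))   # Continue
# U3: He bought it.  If he = Bob: Cb(U3) = Cp(U3) = Bob, but Cb(U2) = John.
print(transition("John", "Bob", "Bob"))   # Smooth-shift
```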
Algorithm 1. Generate possible Cb–Cf combinations for each possible set of reference assignments 2. Filter by constraints, e.g., syntactic coreference constraints, selectional restrictions, centering rules and constraints 3. Rank by transition orderings The pronominal referents that get assigned are those that yield the most preferred relation in Rule 2, assuming that Rule 1 and other coreference constraints (gender, number, syntactic, selectional restrictions) are not violated.
Example U1: John saw a beautiful 1961 Ford Falcon at the used car dealership. U2: He showed it to Bob. U3: He bought it. Use the grammatical role hierarchy to order the Cf for U1: Cf(U1): {John, Ford Falcon, dealership}; Cp(U1): John; Cb(U1): undefined. John is Cb(U2) because he is the highest-ranked member of Cf(U1) mentioned in U2 (and the only possible referent for he).
Example Compare the resulting transitions for each potential referent of it in U2 (He showed it to Bob). Ford Falcon: Cf(U2): {John, Ford Falcon, Bob}; Cp(U2): John; Cb(U2): John; result: Continue (Cp(U2) = Cb(U2); Cb(U1) undefined). Dealership: Cf(U2): {John, dealership, Bob}; Cp(U2): John; Cb(U2): John; result: Continue (Cp(U2) = Cb(U2); Cb(U1) undefined). Both readings yield Continue, so the tie is broken using the ordering of the previous Cf list: the Ford Falcon outranks the dealership in Cf(U1), so it = the Ford Falcon.
Example Compare the transitions for each potential referent of he in U3 (He bought it). John: Cf(U3): {John, Ford Falcon}; Cp(U3): John; Cb(U3): John; result: Continue (Cp(U3) = Cb(U3) = Cb(U2)). Bob: Cf(U3): {Bob, Ford Falcon}; Cp(U3): Bob; Cb(U3): Bob; result: Smooth-shift (Cp(U3) = Cb(U3); Cb(U3) ≠ Cb(U2)). Continue is preferred to Smooth-shift, so he = John.
However… Bob opened up a new dealership last week. John took a look at the Fords in his lot. He ended up buying one. What does centering assign as referent of He in the third sentence? Bob. Oops.
Log-linear model Supervised machine learning: train on a corpus in which each pronoun is labeled with the correct antecedent, filtering out pleonastic pronouns. We need positive examples of referent–pronoun pairs (from the training set), negative examples (pair each pronoun with some other NP), and features for each pair. Train the model to predict 1 for a true antecedent and 0 for a wrong antecedent.
Commonly Used Features For resolution between pronoun Proi and potential referent NPj: strict gender [true or false]: true if there is a strict match in gender (e.g., male pronoun Proi with male antecedent NPj). compatible gender [true or false]: true if Proi and NPj are merely compatible (e.g., male pronoun Proi with antecedent NPj of unknown gender). strict number [true or false]: true if there is a strict match in number (e.g., singular pronoun with singular antecedent). compatible number [true or false]: true if Proi and NPj are merely compatible (e.g., singular pronoun Proi with antecedent NPj of unknown number). sentence distance [0, 1, 2, 3, …]: the number of sentences between pronoun and potential antecedent. Hobbs distance [0, 1, 2, 3, …]: the number of noun groups that the Hobbs algorithm has to skip, starting backwards from the pronoun Proi, before the potential antecedent NPj is found. grammatical role [subject, object, PP]: whether the potential antecedent is a syntactic subject, direct object, or embedded in a PP. linguistic form [proper, definite, indefinite, pronoun]: whether the potential antecedent NPj is a proper name, definite description, indefinite NP, or a pronoun.
Example U1: John saw a beautiful 1961 Ford Falcon at the used car dealership. U2: He showed it to Bob. U3: He bought it.
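A minimal sketch of the training setup for the example above: each (pronoun, candidate NP) pair becomes a feature vector labeled 1 for the true antecedent and 0 otherwise, and a logistic regression (log-linear) model is fit with scikit-learn. The feature values below are hand-filled, approximate illustrations for these utterances; a real system would compute them automatically over a labeled corpus.

```python
# Sketch: a log-linear (logistic regression) mention-pair model.
# Feature vectors are hand-constructed toy values for the example above.

from sklearn.linear_model import LogisticRegression

FEATURES = ["strict_gender", "strict_number", "hobbs_distance",
            "is_subject", "is_object"]

X = [
    [1, 1, 0, 1, 0],   # (He_U3, John): gender/number match, nearby subject
    [1, 1, 1, 0, 0],   # (He_U3, Bob): matches, but not the true antecedent
    [1, 1, 0, 0, 1],   # (it_U2, Ford Falcon): neuter match, object of U1
    [1, 1, 1, 0, 0],   # (it_U2, dealership): match, but buried in a PP
]
y = [1, 0, 1, 0]       # 1 = true antecedent, 0 = wrong antecedent

model = LogisticRegression().fit(X, y)

# Score a new (pronoun, candidate) pair with the same feature layout.
print(model.predict_proba([[1, 1, 0, 1, 0]])[0][1])  # P(pair corefers)
```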
Comparing algorithms Hobbs and centering: require a full syntactic parse and morphological detectors for gender, and rely on hand-built heuristics for antecedent selection. Machine-learning classifiers: learn the importance of these different features based on their co-occurrence in the training set.
Acknowledgments The lecture incorporates material from: Nancy Ide, Vassar College Daniel Jurafsky and James Martin, Speech and Language Processing Christopher Manning, Stanford University