Large-Scale Knowledge Graph Identification using PSL
Jay Pujara (jay@cs.umd.edu), Hui Miao (hui@cs.umd.edu), Lise Getoor (getoor@cs.umd.edu)
Department of Computer Science, University of Maryland, College Park, MD 20742

William Cohen (wcohen@cs.cmu.edu)
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213

ICML Workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs (SLG 2013). Copyright 2013 by the author(s).

Abstract

Building a web-scale knowledge graph, which captures information about entities and the relationships between them, represents a formidable challenge. While many large-scale information extraction systems operate on web corpora, the candidate facts they produce are noisy and incomplete. To remove noise and infer missing information in the knowledge graph, we propose knowledge graph identification: a process of jointly reasoning about the structure of the knowledge graph, utilizing extraction confidences and leveraging ontological information. Scalability is often a challenge when building models in domains with rich structure, but we use probabilistic soft logic (PSL), a recently-introduced probabilistic modeling framework which easily scales to millions of facts. In practice, our method performs joint inference on a real-world dataset containing over 1M facts and 80K ontological constraints in 12 hours and produces a high-precision set of facts for inclusion into a knowledge graph.

1. Introduction

The web is a vast repository of knowledge, but automatically extracting that knowledge, at scale, has proven to be a formidable challenge. A number of recent evaluation efforts have focused on automatic knowledge base population (Ji et al., 2011; Artiles & Mayfield, 2012), and many well-known broad-domain and open information extraction systems exist, including the Never-Ending Language Learning (NELL) project (Carlson et al., 2010), OpenIE (Etzioni et al., 2008), and efforts at Google (Pasca et al., 2006), which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence this extracted knowledge has recently been referred to as a knowledge graph (Singhal, 2012). Unfortunately, most web-scale extraction systems do not take advantage of the knowledge graph. Millions of facts and the many dependencies between them pose a scalability challenge. Accordingly, web-scale extraction systems generally consider extractions independently, ignoring the dependencies between facts or relying on simple heuristics to enforce consistency.

However, reasoning jointly about facts shows promise for improving the quality of the knowledge graph. Previous work (Jiang et al., 2012) chooses candidate facts for inclusion in a knowledge base with a joint approach using Markov Logic Networks (MLNs) (Richardson & Domingos, 2006). Jiang et al. provide a straightforward codification of ontological relations and candidate facts found in a knowledge base as rules in first-order logic and use MLNs to formulate a probabilistic model. However, due to the combinatorial explosion of Boolean assignments to random variables, inference and learning in MLNs pose intractable optimization problems. Jiang et al. limit the candidate facts they consider, restricting their dataset to a 2-hop neighborhood around each fact, and use a sampling approach to inference, estimating marginals using MC-SAT. Despite these approximations, Jiang et al. demonstrate the utility of joint reasoning in comparison to a baseline that considers each fact independently.

Our work builds on the foundation of Jiang et al. by providing a richer model for knowledge bases and vastly improving scalability. Using the noisy input from an information extraction system, we define the problem of jointly inferring the entities, relations, and attributes comprising a knowledge graph as knowledge graph identification. We leverage dependencies in the knowledge graph expressed through ontological constraints, and we perform entity resolution, allowing us to reason about co-referent entities. We also take advantage of uncertainty found in the extracted data, using continuous variables with values derived from extractor confidence scores. Rather than limit inference to a predefined test set, we employ lazy inference to produce a broader set of candidates.
To support this representation, we use a continuous-valued Markov random field and the probabilistic soft logic (PSL) modeling framework (Broecheler et al., 2010). Inference in our model can be formulated as a convex optimization that scales linearly in the number of variables (Bach et al., 2012), allowing us to handle millions of candidate facts. Our work:

• Defines the knowledge graph identification problem;
• Uses soft-logic values to leverage extractor confidences;
• Formulates knowledge graph inference as convex optimization;
• Evaluates our proposed approach on extractions from NELL, a large-scale operational knowledge extraction system;
• Produces high-precision results on millions of candidate extractions on the scale of hours.

2. Background

Probabilistic soft logic (PSL) (Broecheler et al., 2010; Kimmig et al., 2012) is a recently-introduced framework which allows users to specify rich probabilistic models over continuous-valued random variables. Like other statistical relational learning languages such as MLNs, it uses first-order logic to describe features that define a Markov network. In contrast to other approaches, PSL: 1) employs continuous-valued random variables rather than binary variables; and 2) casts MPE inference as a convex optimization problem that is significantly more efficient to solve than its combinatorial counterpart (polynomial vs. exponential).

A PSL model is composed of a set of weighted, first-order logic rules, where each rule defines a set of features of a Markov network sharing the same weight. Consider the formula

    w : P(A, B) ∧ Q(B, C) ⇒ R(A, B, C)

which is an example of a PSL rule. Here w is the weight of the rule; A, B, and C are universally-quantified variables; and P, Q, and R are predicates. A grounding of a rule comes from substituting constants for the universally-quantified variables in the rule's atoms. In this example, assigning constant values a, b, and c to the respective variables in the rule above would produce the ground atoms P(a,b), Q(b,c), and R(a,b,c). Each ground atom takes a soft-truth value in the range [0, 1].

PSL associates a numeric distance to satisfaction with each ground rule that determines the value of the corresponding feature in the Markov network. The distance to satisfaction is defined by treating the ground rule as a formula over the ground atoms in the rule. In particular, PSL uses the Lukasiewicz t-norm and co-norm to provide a relaxation of the logical connectives AND (∧), OR (∨), and NOT (¬), as follows (where relaxations are denoted with a tilde over the connective):

    p ∧̃ q = max(0, p + q − 1)
    p ∨̃ q = min(1, p + q)
    ¬̃ p = 1 − p

This relaxation coincides with Boolean logic when p and q are in {0, 1}, and provides a consistent interpretation of soft-truth values when p and q are in the numeric range [0, 1].
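As an illustration, the relaxed connectives can be computed directly; the following Python sketch is ours (the function names are illustrative and not part of any PSL release):

    def t_and(p, q):
        """Lukasiewicz t-norm: relaxed conjunction of soft-truth values."""
        return max(0.0, p + q - 1.0)

    def t_or(p, q):
        """Lukasiewicz t-co-norm: relaxed disjunction."""
        return min(1.0, p + q)

    def t_not(p):
        """Relaxed negation."""
        return 1.0 - p

    # The relaxation agrees with Boolean logic at the endpoints...
    assert t_and(1, 1) == 1 and t_and(1, 0) == 0
    assert t_or(0, 0) == 0 and t_or(1, 0) == 1

    # ...and interpolates smoothly for partial truth values.
    print(t_and(0.9, 0.8))  # ~0.7
    print(t_or(0.4, 0.3))   # ~0.7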
A PSL program, P, consisting of a model as defined above along with a set of constants (or facts), produces a set of ground rules, R. If I is an interpretation (an assignment of soft-truth values to ground atoms) and r is a ground instance of a rule, then the distance to satisfaction φ_r(I) of r is derived from the soft-truth value of the rule formula under the Lukasiewicz relaxation. We can define a probability distribution over interpretations by combining the weighted distances to satisfaction over all ground rules, R, and normalizing, as follows:

    f(I) = (1/Z) exp( −∑_{r∈R} w_r φ_r(I) )

Here Z is a normalization constant and w_r is the weight of rule r. Thus, a PSL program (a set of weighted rules and facts) defines a probability distribution through a logical formulation that expresses the relationships between random variables.

MPE inference in PSL determines the most likely soft-truth values of unknown ground atoms using the values of known ground atoms and the dependencies between atoms encoded by the rules, corresponding to inference over the random variables in the underlying Markov network. PSL atoms take soft-truth values in the interval [0, 1], in contrast to MLNs, where atoms take Boolean values. MPE inference in MLNs requires optimizing over the combinatorial assignment of Boolean truth values to random variables. In contrast, the relaxation to the continuous domain greatly changes the tractability of computations in PSL: finding the most probable interpretation given a set of weighted rules is equivalent to solving a convex optimization problem. Recent work by Bach et al. (2012) introduces a consensus optimization method applicable to PSL models; their results suggest consensus optimization scales linearly in the number of random variables in the model.
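To make the distribution concrete, the sketch below scores an interpretation under a set of ground rules, assuming (as in the rules used in this paper) that each ground rule is an implication with a conjunctive body; the (weight, body, head) representation is our own simplification, not PSL's API:

    import math

    def distance_to_satisfaction(body_vals, head_val):
        """phi_r(I) for a ground implication: the relaxed truth of the
        body in excess of the truth of the head, clipped at zero."""
        body = 1.0
        for v in body_vals:  # conjoin body atoms with the Lukasiewicz t-norm
            body = max(0.0, body + v - 1.0)
        return max(0.0, body - head_val)

    def unnormalized_density(ground_rules, interpretation):
        """exp(-sum_r w_r * phi_r(I)) over (weight, body_atoms, head_atom) triples."""
        energy = 0.0
        for weight, body_atoms, head_atom in ground_rules:
            phi = distance_to_satisfaction(
                [interpretation[a] for a in body_atoms],
                interpretation[head_atom])
            energy += weight * phi
        return math.exp(-energy)

    # Toy example: one grounding of  w : P(A,B) ∧ Q(B,C) ⇒ R(A,B,C)
    I = {"P(a,b)": 0.9, "Q(b,c)": 0.8, "R(a,b,c)": 0.4}
    rules = [(2.0, ["P(a,b)", "Q(b,c)"], "R(a,b,c)")]
    print(unnormalized_density(rules, I))  # exp(-2.0 * max(0, 0.7 - 0.4))

MPE inference maximizes exactly this quantity over the unknown soft-truth values, which is why the problem is a continuous, convex optimization rather than a combinatorial search.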
3. Knowledge Graph Identification

Our approach to constructing a consistent knowledge base uses PSL to represent the candidate facts from an information extraction system as a knowledge graph, where entities are nodes, categories are labels associated with each node, and relations are directed edges between the nodes. Information extraction systems can extract such candidate facts, and these extractions can be used to construct a graph. Unfortunately, the output from an information extraction system is often incorrect; the graph constructed from it has spurious and missing nodes and edges, and missing or inaccurate node labels. Our approach, knowledge graph identification, combines the tasks of entity resolution, collective classification, and link prediction, mediated by ontological constraints. We motivate the necessity of our approach with examples of challenges taken from a real-world information extraction system, the Never-Ending Language Learner (NELL) (Carlson et al., 2010).

One common problem is entity extraction. Many textual references that initially look different may correspond to the same real-world entity. For example, NELL's knowledge base contains candidate facts which involve the entities "kyrghyzstan", "kyrgzstan", "kyrgystan", "kyrgyz republic", "kyrgyzstan", and "kyrgistan", which are all variants or misspellings of the same country, Kyrgyzstan. In the extracted knowledge graph, these correspond to different nodes, which is incorrect. Our approach uses entity resolution to determine co-referent entities in the knowledge graph, producing a consistent set of labels and relations for each resolved node.

Another challenge in knowledge graph construction is inferring labels consistently. For example, NELL's extractions assign Kyrgyzstan the attributes "country" as well as "bird". Ontological information can be used to infer that an entity is very unlikely to be both a country and a bird at the same time. Using the labels of other, related entities in the knowledge graph can allow us to determine the correct label of an entity. Our approach uses collective classification to label nodes in a manner that takes into account ontological constraints and neighboring labels.

A third problem commonly encountered in knowledge graphs is finding the set of relations an entity participates in. NELL also has many facts relating the location of Kyrgyzstan to other entities. These candidate relations include statements that Kyrgyzstan is located in Kazakhstan, Kyrgyzstan is located in Russia, Kyrgyzstan is located in the former Soviet Union, Kyrgyzstan is located in Asia, and that Kyrgyzstan is located in the US. Some of these possible relations are true, while others are clearly false and contradictory. Our approach uses link prediction to predict edges in a manner that takes into account ontological constraints and the rest of the inferred structure.

Refining a knowledge graph becomes even more challenging as we consider the interaction between the predictions and take into account the confidences we have in the extractions. For example, as mentioned earlier, NELL's ontology includes the constraint that the attributes "bird" and "country" are mutually exclusive. Reasoning collectively allows us to resolve which of these two labels is more likely to apply to Kyrgyzstan. For example, NELL is highly confident that the Kyrgyz Republic has a capital city, Bishkek. The NELL ontology specifies that the domain of the relation "hasCapital" has label "country". Entity resolution allows us to infer that "Kyrgyz Republic" refers to the same entity as "Kyrgyzstan". Deciding whether Kyrgyzstan is a bird or a country now involves a prediction where we include the confidence values of the corresponding "bird" and "country" facts from co-referent entities, as well as collective features from the ontological constraints of these co-referent entities, such as the confidence values of the "hasCapital" relations.

We refer to this process of inferring a knowledge graph from a noisy extraction graph as knowledge graph identification. Knowledge graph identification builds on ideas from graph identification (Namata et al., 2011); like graph identification, three key components of the problem are entity resolution, node labeling, and link prediction. Unlike earlier work on graph identification, we use a very different probabilistic framework, PSL, allowing us to incorporate extractor confidence values and also support a rich collection of ontological constraints.
4. Knowledge Graph Identification Using PSL

Knowledge graphs contain three types of facts: facts about entities, facts about entity labels, and facts about relations. We represent entities with the logical predicate Ent(E). We represent labels with the logical predicate Lbl(E, L), where entity E has label L. Relations are represented with the logical predicate Rel(E1, E2, R), where the relation R holds between the entities E1 and E2, e.g., R(E1, E2).

In knowledge graph identification, our goal is to identify a true set of predicates from a set of noisy extractions. Our method for knowledge graph identification incorporates three components: capturing uncertain extractions, performing entity resolution, and enforcing ontological constraints. We show how we create a PSL program that encompasses these three components, and then relate this PSL program to a distribution over possible knowledge graphs.

4.1. Representing Uncertain Extractions

We relate the noisy extractions from an information extraction system to the above logical predicates by introducing candidate predicates, using a formulation similar to that of (Jiang et al., 2012). For each candidate entity, we introduce a corresponding predicate, CandEnt(E). Labels or relations generated by the information extraction system correspond to the predicates CandLbl(E, L) and CandRel(E1, E2, R) in our system. Uncertainty in these extractions is captured by assigning these predicates a soft-truth value equal to the confidence value from the extractor. For example, the extraction system might generate the relation teamPlaysSport(Yankees, baseball) with a confidence of .9, which we would represent as CandRel(Yankees, baseball, teamPlaysSport) with a soft-truth value of .9.

Information extraction systems commonly use many different extraction techniques to generate candidates. For example, NELL produces separate extractions from lexical, structural, and morphological patterns, among others. We represent metadata about the technique used to extract a candidate by using separate predicates for each technique T, of the form CandRel_T and CandLbl_T. These predicates are related to the true values of the attributes and relations we seek to infer using weighted rules:

    w_CR-T : CandRel_T(E1, E2, R) ⇒ Rel(E1, E2, R)
    w_CL-T : CandLbl_T(E, L) ⇒ Lbl(E, L)

Together, we denote the set of candidates, generated by grounding the rules above using the output from the extraction system, as the set C.

In addition to the candidate facts from an information extraction system, we sometimes have access to background knowledge or previously learned facts. Background knowledge that is certain can be represented using the Lbl and Rel predicates. Often, however, the background knowledge included in information extraction settings is generated from the same pool of noisy extractions as the candidates and is considered uncertain. For example, NELL uses a heuristic formula to "promote" candidates in each iteration of the system; these promotions are often noisy, so the system assigns each promotion a confidence value. Since these promotions are drawn from the best candidates in previous iterations, they can be a useful addition to our model. We incorporate this uncertain background knowledge as hints, denoted H, providing a source of weak supervision through the use of additional predicates and rules as follows:

    w_HR : HintRel(E1, E2, R) ⇒ Rel(E1, E2, R)
    w_HL : HintLbl(E, L) ⇒ Lbl(E, L)

The weights w_HR and w_HL allow the system to specify how reliable this background knowledge is as a source of weak supervision, whereas treating these rules as constraints would be equivalent to treating the background knowledge as certain.
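As a concrete illustration of this construction, the sketch below loads extractor output as soft-truth evidence; the data layout, helper names, and weight values are ours (hypothetical), not part of PSL:

    # Each observed candidate atom receives a soft-truth value equal to the
    # extractor's confidence; a separate predicate per technique T lets the
    # model assign a different reliability weight to each source.
    observations = {}  # ground atom -> soft-truth value in [0, 1]

    def add_candidate(predicate, technique, args, confidence):
        atom = "Cand%s_%s(%s)" % (predicate, technique, ",".join(args))
        observations[atom] = confidence

    # A lexical-pattern extraction with confidence .9:
    add_candidate("Rel", "lexical", ["Yankees", "baseball", "teamPlaysSport"], 0.9)

    # An uncertain promotion from a previous NELL iteration enters as a hint
    # (the confidence value here is made up for the example):
    observations["HintRel(Yankees,baseball,teamPlaysSport)"] = 0.85

    # One weighted rule template per source, e.g.
    #   w_CR-lexical : CandRel_lexical(E1,E2,R) => Rel(E1,E2,R)
    #   w_HR         : HintRel(E1,E2,R)         => Rel(E1,E2,R)
    rule_weights = {"CandRel_lexical": 1.0, "HintRel": 0.5}  # illustrative values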
4.2. Entity Resolution

While the previous PSL rules provide the building blocks for predicting links and labels using uncertain information, knowledge graph identification employs entity resolution to pool information across co-referent entities. A key component of this process is identifying possibly co-referent entities and determining their similarity. We use the SameEnt predicate to capture the similarity of two entities. While any similarity metric can be used, we compute the similarity of entities by mapping each entity to a set of Wikipedia articles and then computing the Jaccard index of possibly co-referent entities, which we discuss in more detail in Section 5.

To perform entity resolution using the SameEnt predicate, we introduce three rules, whose groundings we refer to as R, to our PSL program:

    SameEnt(E1, E2) ∧̃ Lbl(E1, L) ⇒ Lbl(E2, L)
    SameEnt(E1, E2) ∧̃ Rel(E1, E, R) ⇒ Rel(E2, E, R)
    SameEnt(E1, E2) ∧̃ Rel(E, E1, R) ⇒ Rel(E, E2, R)

These rules define an equivalence class of entities, such that all entities related by the SameEnt predicate must have the same labels and relations. The soft-truth value of the SameEnt predicate, derived from our similarity function, mediates the strength of these rules. When two entities are very similar, they will have a high truth value for SameEnt, so any label assigned to the first entity will also be assigned to the second entity. On the other hand, if the similarity score for two entities is low, the truth values of their respective labels and relations will not be strongly constrained. While we introduce these rules as constraints to the PSL model, they could instead be used as weighted rules, allowing us to specify the reliability of the similarity function.
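The grounding step that PSL performs for these rules can be pictured as follows; this sketch is ours and is simplified to one direction of each rule:

    def ground_same_ent_rules(same_ent, labels, relations):
        """Yield (body_atoms, head_atom) groundings of the three
        entity-resolution rules for each possibly co-referent pair."""
        for (e1, e2), sim in same_ent.items():
            if sim == 0.0:
                continue  # dissimilar pairs constrain nothing
            for (e, l) in labels:
                if e == e1:  # SameEnt(E1,E2) ∧ Lbl(E1,L) ⇒ Lbl(E2,L)
                    yield (["SameEnt(%s,%s)" % (e1, e2), "Lbl(%s,%s)" % (e1, l)],
                           "Lbl(%s,%s)" % (e2, l))
            for (s, o, r) in relations:
                if s == e1:  # SameEnt(E1,E2) ∧ Rel(E1,E,R) ⇒ Rel(E2,E,R)
                    yield (["SameEnt(%s,%s)" % (e1, e2), "Rel(%s,%s,%s)" % (e1, o, r)],
                           "Rel(%s,%s,%s)" % (e2, o, r))
                if o == e1:  # SameEnt(E1,E2) ∧ Rel(E,E1,R) ⇒ Rel(E,E2,R)
                    yield (["SameEnt(%s,%s)" % (e1, e2), "Rel(%s,%s,%s)" % (s, e1, r)],
                           "Rel(%s,%s,%s)" % (s, e2, r))

    pairs = {("kyrgyz republic", "kyrgyzstan"): 0.9}
    labels = [("kyrgyz republic", "country")]
    relations = [("kyrgyz republic", "bishkek", "hasCapital")]
    for grounding in ground_same_ent_rules(pairs, labels, relations):
        print(grounding)

Because SameEnt enters each rule body through the t-norm, a similarity of .9 requires the head atom's truth value to be at most .1 below that of the corresponding atom of the other entity, while a low similarity leaves the head essentially unconstrained.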
4.3. Enforcing Ontological Constraints

In our PSL program we also leverage rules corresponding to an ontology, the groundings of which are denoted as O. Our ontological constraints are based on the logical formulation proposed in (Jiang et al., 2012). Each type of ontological relation is represented as a predicate, and these predicates represent ontological knowledge of the relationships between labels and relations. For example, the constraints Dom(teamPlaysSport, sportsteam) and Rng(teamPlaysSport, sport) specify that the relation teamPlaysSport is a mapping from entities with label sportsteam to entities with label sport. The constraint Mut(sport, sportsteam) specifies that the labels sportsteam and sport are mutually exclusive, so that an entity cannot have both the labels sport and sportsteam. We similarly use constraints for the subsumption of labels (Sub) and relations (RSub) and for inversely-related relations (Inv). To use this ontological knowledge, we introduce rules relating each ontological relation to the predicates representing our knowledge graph. We specify seven types of ontological constraints in our experiments:

    Dom(R, L) ∧̃ Rel(E1, E2, R) ⇒ Lbl(E1, L)
    Rng(R, L) ∧̃ Rel(E1, E2, R) ⇒ Lbl(E2, L)
    Inv(R, S) ∧̃ Rel(E1, E2, R) ⇒ Rel(E2, E1, S)
    Sub(L, P) ∧̃ Lbl(E, L) ⇒ Lbl(E, P)
    RSub(R, S) ∧̃ Rel(E1, E2, R) ⇒ Rel(E1, E2, S)
    Mut(L1, L2) ∧̃ Lbl(E, L1) ⇒ ¬Lbl(E, L2)
    RMut(R, S) ∧̃ Rel(E1, E2, R) ⇒ ¬Rel(E1, E2, S)

These ontological rules are specified as constraints in our PSL model. When optimizing the model, PSL will only consider truth-value assignments, or interpretations, that satisfy all of these ontological constraints.
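The following sketch (ours; simplified in that the certain ontological predicates are folded into lookups, and Mut is applied in one direction only) shows how these templates expand against candidate facts, with negated heads flagged so the optimizer can treat them via the relaxed negation 1 − p:

    ontology = {
        "Dom": {"teamPlaysSport": "sportsteam"},
        "Rng": {"teamPlaysSport": "sport"},
        "Mut": [("sport", "sportsteam"), ("bird", "country")],
    }

    def ground_ontology(candidate_rels, candidate_lbls):
        """Yield (body_atoms, head_atom, head_negated) constraint groundings."""
        for (e1, e2, r) in candidate_rels:
            if r in ontology["Dom"]:  # Dom(R,L) ∧ Rel(E1,E2,R) ⇒ Lbl(E1,L)
                yield (["Rel(%s,%s,%s)" % (e1, e2, r)],
                       "Lbl(%s,%s)" % (e1, ontology["Dom"][r]), False)
            if r in ontology["Rng"]:  # Rng(R,L) ∧ Rel(E1,E2,R) ⇒ Lbl(E2,L)
                yield (["Rel(%s,%s,%s)" % (e1, e2, r)],
                       "Lbl(%s,%s)" % (e2, ontology["Rng"][r]), False)
        for (e, l) in candidate_lbls:  # Mut(L1,L2) ∧ Lbl(E,L1) ⇒ ¬Lbl(E,L2)
            for (l1, l2) in ontology["Mut"]:
                if l == l1:
                    yield (["Lbl(%s,%s)" % (e, l1)],
                           "Lbl(%s,%s)" % (e, l2), True)

    for g in ground_ontology([("Yankees", "baseball", "teamPlaysSport")],
                             [("kyrgyzstan", "bird")]):
        print(g)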
4.4. Probability Distribution Over Uncertain Knowledge Graphs

The logical formulation introduced in this section, together with ontological information and the outputs of an information extraction system, defines a PSL program P. The corresponding set of ground rules, R, consists of the union of the groundings from uncertain candidates, C, hints, H, co-referent entities, R, and ontological constraints, O. The distribution over interpretations, I, generated by PSL corresponds to a probability distribution over knowledge graphs, G:

    P_P(G) = f(I) = (1/Z) exp( −∑_{r∈R} w_r φ_r(I) )

The results of inference provide us with the most likely interpretation, i.e., the soft-truth assignments to the entities, labels, and relations that comprise the knowledge graph. By choosing a threshold on the soft-truth values in the interpretation, we can select a high-precision set of facts to construct our knowledge graph.
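The final selection step is then a simple filter over the MPE solution; the threshold below is arbitrary (our experiments instead report precision at soft-truth percentiles; see Table 2):

    def select_facts(interpretation, threshold=0.9):
        """Keep inferred Lbl/Rel atoms whose soft-truth value clears the
        threshold; raising the threshold trades recall for precision."""
        kept = [(atom, value) for atom, value in interpretation.items()
                if value >= threshold
                and (atom.startswith("Lbl") or atom.startswith("Rel"))]
        return sorted(kept, key=lambda pair: -pair[1])

    mpe = {"Lbl(kyrgyzstan,country)": 0.97,
           "Lbl(kyrgyzstan,bird)": 0.02,
           "Rel(kyrgyz republic,bishkek,hasCapital)": 0.95}
    print(select_facts(mpe))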
5. Experimental Evaluation

We evaluate our method on data from the Never-Ending Language Learning (NELL) project (Carlson et al., 2010). Our goal is to demonstrate that knowledge graph identification can remove noise from uncertain extractions, producing a high-precision set of facts for inclusion in a knowledge graph. A principal concern in this domain is scalability; we show that using PSL for MPE inference is a practical solution for knowledge graph identification at a massive scale.

NELL iteratively generates a knowledge base: in each iteration, NELL uses facts learned from the previous iteration and a corpus of web pages to generate a new set of candidate facts. NELL selectively promotes those candidates that have high confidence from the extractors and obey ontological constraints with respect to the existing knowledge base, thereby building a high-precision knowledge base. We present experimental results on the 194th iteration of NELL, using the candidate facts, promoted facts, and ontological constraints that NELL used during that iteration. We summarize the important statistics of this dataset in Table 1.

Table 1. Summary of dataset statistics (Iter194)

    Date Generated    1/2011
    Cand. Label       1M
    Cand. Rel         530K
    Promotions        300K
    Dom, Rng, Inv     660
    Sub               520
    Mut               29K
    RMut              58K

In addition to data from NELL, we use data from the YAGO database (Suchanek et al., 2007) as part of our entity resolution approach. The YAGO database contains entities which correspond to Wikipedia articles, variant spellings and abbreviations of these entities, and associated WordNet categories. To correct for the multitude of variant spellings found in NELL's data, we use a technique that maps NELL's entities to Wikipedia articles. We then define a similarity function on the article URLs, using the similarity as the soft-truth value of the SameEnt predicate.

When mapping NELL entities to YAGO records, we perform selective stemming on the NELL entities, employ blocking on candidate labels, and use a case-insensitive string match. Each NELL entity can be mapped to a set of YAGO entities, and we can generate a set of Wikipedia URLs that map to the YAGO entities. To generate the similarity of two NELL entities, we compute a set-similarity measure on the Wikipedia URLs associated with the entities. For our similarity score, we use the Jaccard index, the ratio of the size of the set intersection to the size of the set union.
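A sketch of this similarity computation, assuming each NELL entity has already been mapped to its set of Wikipedia URLs (the stemming, blocking, and YAGO-matching steps are elided, and the example URLs are illustrative):

    def jaccard(urls_a, urls_b):
        """Jaccard index: |intersection| / |union| of the two URL sets."""
        a, b = set(urls_a), set(urls_b)
        if not a and not b:
            return 0.0
        return len(a & b) / len(a | b)

    # Soft-truth value for SameEnt(kyrgyz republic, kyrgyzstan):
    wiki = {
        "kyrgyz republic": {"en.wikipedia.org/wiki/Kyrgyzstan"},
        "kyrgyzstan": {"en.wikipedia.org/wiki/Kyrgyzstan",
                       "en.wikipedia.org/wiki/Kyrgyz_people"},
    }
    print(jaccard(wiki["kyrgyz republic"], wiki["kyrgyzstan"]))  # 0.5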
The ultimate goal of knowledge graph identification is to generate a high-precision set of facts for inclusion in a knowledge base. We evaluated the precision of our method by sampling facts from the top 1%, 2%, 5%, 10%, and 25% of inferred facts ordered by soft-truth value. We sampled 120 facts at each cutoff, split evenly between labels and relations, and had a human judge provide appropriate labels. Table 2 shows the precision of our method at each cutoff. The precision decays gracefully from .96 at the 1% cutoff to .90 at the 10% cutoff, but has a substantial drop-off at the 25% cutoff.

Table 2. Precision of knowledge graph identification at top soft-truth-value percentiles

    Percentile    1%    2%    5%    10%   25%
    Precision     .96   .95   .89   .90   .74

Performing the convex optimization for MPE inference in knowledge graph identification is efficient. The optimization problem is defined over 22.6M terms, consisting of potential functions corresponding to candidate facts and to dependencies among facts generated by ontological constraints. Performing this optimization on a six-core Intel Xeon X5650 CPU at 2.66 GHz with 32GB of RAM took 44,300 seconds.

6. Conclusion

We have described how to formulate the problem of knowledge graph identification, jointly inferring a knowledge graph from the noisy output of an information extraction system through a combined process of determining co-referent entities, predicting relational links, collectively classifying entity labels, and enforcing ontological constraints. Using PSL, we illustrate the scalability benefits of our approach on a large-scale dataset from NELL, while producing high-precision results.

Acknowledgments

This work was partially supported by NSF CAREER grant 0746930 and NSF grant IIS1218488.

References

Artiles, J. and Mayfield, J. (eds.). Workshop on Knowledge Base Population, 2012.

Bach, S. H., Broecheler, M., Getoor, L., and O'Leary, D. P. Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization. In NIPS, 2012.

Broecheler, M., Mihalkova, L., and Getoor, L. Probabilistic Similarity Logic. In UAI, 2010.

Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E. R., and Mitchell, T. M. Toward an Architecture for Never-Ending Language Learning. In AAAI, 2010.

Etzioni, O., Banko, M., Soderland, S., and Weld, D. S. Open Information Extraction from the Web. Communications of the ACM, 51(12), 2008.

Ji, H., Grishman, R., and Dang, H. Overview of the Knowledge Base Population Track. In Text Analysis Conference, 2011.

Jiang, S., Lowd, D., and Dou, D. Learning to Refine an Automatically Extracted Knowledge Base Using Markov Logic. In ICDM, 2012.

Kimmig, A., Bach, S. H., Broecheler, M., Huang, B., and Getoor, L. A Short Introduction to Probabilistic Soft Logic. In NIPS Workshop on Probabilistic Programming, 2012.

Namata, G. M., Kok, S., and Getoor, L. Collective Graph Identification. In KDD, 2011.

Pasca, M., Lin, D., Bigham, J., Lifchits, A., and Jain, A. Organizing and Searching the World Wide Web of Facts - Step One: the One-Million Fact Extraction Challenge. In AAAI, 2006.

Richardson, M. and Domingos, P. Markov Logic Networks. Machine Learning, 62(1-2), 2006.

Singhal, A. Introducing the Knowledge Graph: Things, Not Strings. Official Google Blog, 2012.

Suchanek, F. M., Kasneci, G., and Weikum, G. Yago: A Core of Semantic Knowledge. In WWW, 2007.