A theoretical framework for knowledge-based entity resolution

K. Schewe, Q. Wang. A theoretical framework for knowledge-based entity resolution. Theoretical Computer Science, volume 549, pages 101-126, 9, 2014.

  • Klaus-Dieter Schewe
  • Qing Wang
JournalTheoretical Computer Science

Entity resolution is concerned with deciding whether two representations of entities refer to the same real-world object. In this paper we introduce a theoretical framework that can support knowledge-based entity resolution. From a logical point of view, the expressive power of the framework is equivalent to a fragment of first-order logic including conjunction, disjunction and a certain form of negation. Although the framework has sufficient expressive power for representing knowledge about different entities in an interactive process, the question of how efficiently redundancy can be eliminated from knowledge patterns of the framework naturally arises. In answering this question, we study the containment problem of knowledge patterns. Our result shows that this problem is NP-complete w.r.t. expression complexity but in PTIME w.r.t. data complexity. This nice property leads us to develop a theoretical notion of optimality for knowledge representation of entity resolution and a mechanism of optimizing a knowledge model (i.e. a finite set of knowledge patterns). In doing so, the proposed framework can effectively manage knowledge about entity resolution and thus efficiently improve accuracy of entity resolution over time.