https://github.com/dedupeio/dedupe
Tip revision: 65e2eaae64d032381652b87c86e4144545186457 authored by Forest Gregg on 09 March 2017, 19:20:27 UTC
bug fix in distinct counter in consoleLabel
Bibliography.rst
============
Bibliography
============

-  http://research.microsoft.com/apps/pubs/default.aspx?id=153478
-  http://cs.anu.edu.au/~Peter.Christen/data-matching-book-2012.html
-  http://www.umiacs.umd.edu/~getoor/Tutorials/ER_VLDB2012.pdf

New School
----------
- Steorts, Rebecca C., Rob Hall, and Stephen Fienberg. "A Bayesian Approach to Record Linkage and De-duplication." December 2013. http://arxiv.org/abs/1312.4645

Very beautiful work. Records are matched to latent individuals rather
than to each other, and the running time is O(N). The method is
unsupervised, but everything hinges on tuning the hyperparameters.
This work only contemplates categorical variables.


To Read
-------
- Parag and Pedro Domingos. "Multi-Relational Record Linkage." http://homes.cs.washington.edu/~pedrod/papers/mrdm04.pdf
- Wick, Michael, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. "An Entity Based Model for Coreference Resolution." http://people.cs.umass.edu/~mwick/MikeWeb/Publications_files/wick09entity.pdf

