https://github.com/dedupeio/dedupe
Tip revision: 65e2eaae64d032381652b87c86e4144545186457 authored by Forest Gregg on 09 March 2017, 19:20:27 UTC
bug fix in distinct counter in consoleLabel
============
Bibliography
============
- http://research.microsoft.com/apps/pubs/default.aspx?id=153478
- http://cs.anu.edu.au/~Peter.Christen/data-matching-book-2012.html
- http://www.umiacs.umd.edu/~getoor/Tutorials/ER\_VLDB2012.pdf
New School
----------
- Steorts, Rebecca C., Rob Hall and Stephen Fienberg. "A Bayesian Approach to Record Linkage and De-duplication." December 2013. http://arxiv.org/abs/1312.4645
Very beautiful work. Records are matched to latent individuals. O(N)
running time. Unsupervised, but everything hinges on tuning
hyperparameters. This work only contemplates categorical variables.
To Read
-------
- Parag and Pedro Domingos. "Multi-Relational Record Linkage." http://homes.cs.washington.edu/~pedrod/papers/mrdm04.pdf
- Wick, Michael, et al. "An Entity Based Model for Coreference Resolution." http://people.cs.umass.edu/~mwick/MikeWeb/Publications_files/wick09entity.pdf