The challenges of, and why you should reconsider using, truth sets for optimizing entity resolution: A case study

Pei Wang, Daniel L. Pullen, Maryam Y. Garza, Meredith N. Zozus

Research output: Paper, peer-reviewed

Abstract

The creation of high-quality Entity Resolution (ER) processes depends on the ability to quickly and effectively identify erroneous outcomes (false positives and false negatives) in ER results. In past and current research, truth sets have been used to provide this ability. Unfortunately, managing the quantity of data provided to reviewers for manual annotation during the generation process often forces researchers to generate sampled data that is not entirely representative of the total amount of variation contained within the original dataset. This often causes an over-fitting of the match logic to the truth set. This case study shows the challenges and issues that can arise when using truth sets for creating and analyzing ER matching logic.

Original language: English (US)
Status: Published - 2017
Externally published
Event: 22nd MIT International Conference on Information Quality, ICIQ 2017 - Little Rock, United States
Duration: Oct 6 2017 to Oct 7 2017

Conference

Conference: 22nd MIT International Conference on Information Quality, ICIQ 2017
Country/Territory: United States
City: Little Rock
Period: 10/6/17 to 10/7/17

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Information Systems
