Abstract
The creation of high-quality Entity Resolution (ER) processes depends on the ability to quickly and effectively identify erroneous outcomes (false positives and false negatives) in ER results. In past and current research, truth sets have been used to provide this ability. Unfortunately, the need to limit the quantity of data given to reviewers for manual annotation during truth set generation often forces researchers to work with sampled data that does not fully represent the variation contained in the original dataset. This often causes the match logic to be over-fitted to the truth set. This case study shows the challenges and issues that can arise when using truth sets for creating and analyzing ER matching logic.
Original language | English (US) |
---|---|
State | Published - 2017 |
Externally published | Yes |
Event | 22nd MIT International Conference on Information Quality, ICIQ 2017 - Little Rock, United States; Duration: Oct 6 2017 → Oct 7 2017 |
Conference
Conference | 22nd MIT International Conference on Information Quality, ICIQ 2017 |
---|---|
Country/Territory | United States |
City | Little Rock |
Period | 10/6/17 → 10/7/17 |
Keywords
- Boolean Match Rule
- EHR Data
- Entity Resolution
- Truth Set
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Information Systems