DeIDNER corpus: Annotation of clinical discharge summary notes for named entity recognition using BRAT tool

Mahanazuddin Syed, Shaymaa Al-Shukri, Shorabuddin Syed, Kevin Sexton, Melody L. Greer, Meredith Zozus, Sudeepa Bhattacharyya, Fred Prior

Research output: Chapter

3 Citations (Scopus)

Abstract

Named Entity Recognition (NER), which aims to identify and classify entities into predefined categories, is a critical pre-processing task in Natural Language Processing (NLP) pipelines. Readily available off-the-shelf NER algorithms or programs are trained on a general corpus and often need to be retrained when applied to a different domain. The end model's performance depends on the quality of the named entities generated by the NER models used in the NLP task. To improve NER model accuracy, researchers build domain-specific corpora for both model training and evaluation. However, in the clinical domain, there is a dearth of training data because of privacy reasons, forcing many studies to use NER models trained in the non-clinical domain to generate the NER feature set. This, in turn, affects the performance of downstream NLP tasks such as information extraction and de-identification. In this paper, our objective is to create a high-quality annotated clinical corpus for training NER models that generalize easily and can be used in a downstream de-identification task to generate a named-entity feature set.

Original language: English (US)
Title of host publication: Public Health and Informatics
Subtitle of host publication: Proceedings of MIE 2021
Publisher: IOS Press
Pages: 432-436
Number of pages: 5
ISBN (electronic): 9781643681856
ISBN (print): 9781643681849
DOI
Status: Published - Jul 1 2021

ASJC Scopus subject areas

  • General Medicine
