CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Our paper “CrossWeigh: Training Named Entity Tagger from Imperfect Annotations” has been accepted to EMNLP 2019 as an oral presentation.
Highlights
- We correct the test set of CoNLL03 NER. This higher-quality evaluation set can be used in further research. The dataset is available here.
- We design a mistake-aware framework, CrossWeigh, that fits any NER model that supports weighted training.
Motivation
Label mistakes made by human annotators pose two challenges to NER:
- mistakes in the test set can interfere with the evaluation results and even lead to an inaccurate assessment of model performance.
- mistakes in the training set can hurt NER model training.
We address these two problems by:
- manually correcting the mistakes in the test set to form a cleaner benchmark.
- developing a framework, CrossWeigh, for mistake-aware training.
CrossWeigh works with any NER algorithm that accepts weighted training instances. It is composed of two modules: 1) mistake estimation, where potential mistakes in the training data are identified through a cross-checking process, and 2) mistake re-weighing, where the weights of those potential mistakes are lowered when training the final NER model.
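The two modules can be sketched roughly as follows. This is a minimal illustration and not the implementation from the paper: `train_tagger` is a toy stand-in for a real NER model, and the plain random k-fold split, the `rounds` parameter, and the `epsilon ** count` weighting rule are all assumptions made here for the sake of the example.

```python
import random
from collections import Counter, defaultdict


def train_tagger(sentences):
    """Toy stand-in for an NER model: memorize each token's most frequent tag."""
    tag_counts = defaultdict(Counter)
    for tokens, tags in sentences:
        for token, tag in zip(tokens, tags):
            tag_counts[token][tag] += 1
    lookup = {tok: c.most_common(1)[0][0] for tok, c in tag_counts.items()}
    # Unknown tokens default to the non-entity tag "O".
    return lambda tokens: [lookup.get(tok, "O") for tok in tokens]


def estimate_mistakes(sentences, k=3, rounds=2, seed=0):
    """Mistake estimation (assumed form): count how often a model trained
    on the other folds disagrees with each held-out sentence's labels."""
    rng = random.Random(seed)
    disagreements = [0] * len(sentences)
    for _ in range(rounds):
        order = list(range(len(sentences)))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [sentences[i] for i in order if i not in held]
            predict = train_tagger(train)
            for i in held_out:
                tokens, tags = sentences[i]
                if predict(tokens) != list(tags):
                    disagreements[i] += 1
    return disagreements


def reweigh(disagreements, epsilon=0.7):
    """Mistake re-weighing (assumed rule): shrink the weight of a sentence
    each time it was flagged as a potential mistake."""
    return [epsilon ** count for count in disagreements]


# Five consistently labeled sentences and one where "Paris" is left untagged.
data = [
    (["Paris", "is", "nice"], ["B-LOC", "O", "O"]),
    (["I", "love", "Paris"], ["O", "O", "B-LOC"]),
    (["Paris", "was", "busy"], ["B-LOC", "O", "O"]),
    (["We", "visited", "Paris"], ["O", "O", "B-LOC"]),
    (["Paris", "is", "old"], ["B-LOC", "O", "O"]),
    (["Fly", "to", "Paris"], ["O", "O", "O"]),  # likely annotation mistake
]
counts = estimate_mistakes(data)
weights = reweigh(counts)
print(counts)
print(weights)
```

In this toy run only the last sentence is flagged, so it alone receives a weight below 1. The resulting per-sentence weights would then be passed to whatever weighted loss the final NER model supports.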