CrossWeigh: Training Named Entity Tagger from Imperfect Annotations


Published: August 16, 2019

Our paper “CrossWeigh: Training Named Entity Tagger from Imperfect Annotations” has been accepted to EMNLP 2019 as an oral presentation.

Highlights

We correct label mistakes in the CoNLL03 NER test set. This higher-quality evaluation set can be used in future research. The dataset is available here.
We design CrossWeigh, a mistake-aware training framework that works with any NER model that supports weighted training.

Motivation

Label mistakes made by human annotators pose two challenges for NER:

Mistakes in the test set can distort evaluation results and even lead to an inaccurate assessment of model performance.
Mistakes in the training set can hurt NER model training.

We address these two problems by:

manually correcting the mistakes in the test set to form a cleaner benchmark.
developing the CrossWeigh framework for mistake-aware training.

CrossWeigh works with any NER algorithm that accepts weighted training instances. It is composed of two modules: (1) mistake estimation, where potential mistakes in the training data are identified through a cross-checking process, and (2) mistake re-weighing, where the weights of those potential mistakes are lowered when training the final NER model.
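
To make the two modules concrete, here is a minimal Python sketch of the general idea, not the released implementation. Everything specific in it is an assumption made for illustration: the function names, the toy majority-vote tagger standing in for a real NER model, the fold count `k`, the number of `rounds`, the exponential weight `epsilon ** count`, and the normalization in `weighted_loss`. The paper describes further details of the cross-checking step that are not reproduced here.

```python
import random
from collections import Counter


def estimate_mistakes(sentences, labels, train_fn, k=5, rounds=3, seed=0):
    """Mistake estimation by cross-checking (illustrative sketch).

    For each round, split the data into k folds, train on k-1 folds, and
    count how often the held-out prediction disagrees with the given labels.
    """
    rng = random.Random(seed)
    n = len(sentences)
    disagreements = [0] * n
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(n) if i not in held]
            predict = train_fn(
                [sentences[i] for i in train_idx],
                [labels[i] for i in train_idx],
            )
            for i in held_out:
                if predict(sentences[i]) != labels[i]:
                    disagreements[i] += 1
    return disagreements


def reweigh(disagreements, epsilon=0.7):
    """Mistake re-weighing: lower the weight of sentences flagged more often.

    The exponential form is an assumed, simple choice for this sketch.
    """
    return [epsilon ** count for count in disagreements]


def weighted_loss(per_sentence_losses, weights):
    """What "weighted training" means here: a weighted average of losses."""
    total = sum(w * loss for w, loss in zip(weights, per_sentence_losses))
    return total / sum(weights)


def train_majority_tagger(train_sentences, train_labels):
    """Toy stand-in for an NER model: tag each token with its most common tag."""
    counts = {}
    for tokens, tags in zip(train_sentences, train_labels):
        for token, tag in zip(tokens, tags):
            counts.setdefault(token, Counter())[tag] += 1

    def predict(tokens):
        return [
            counts[t].most_common(1)[0][0] if t in counts else "O"
            for t in tokens
        ]

    return predict
```

In practice, `train_majority_tagger` would be replaced by whatever NER model is being trained, and the weights returned by `reweigh` would be passed to that model's weighted training objective when fitting the final tagger.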

Please refer to our paper and GitHub repository for more details.

Zihan Wang
