X-Class: Text Classification with Extremely Weak Supervision


Published: June 08, 2021

Our paper “X-Class: Text Classification with Extremely Weak Supervision” has been accepted to NAACL 2021.

Highlights

Our proposed model, X-Class, assigns documents to classes (e.g., sports, politics, and science) with no supervision other than the class names themselves.

Motivation

We propose a new task, Text Classification with Extremely Weak Supervision: classify documents into classes with the plain class names as the only guidance.
Our method, X-Class, breaks this task into three modules:

Class-oriented Document Representation
- We estimate both the class representations (based on the given class names) and the document representations (guided by the class representations).
Document-Class Alignment
- We apply a Gaussian Mixture Model (GMM) to group the document representations into clusters. The GMM is initialized with a prior in which every document is assigned to its nearest class, so we know which cluster represents which class.
Text Classifier Training
- We then select the most confident document-class pairs from the previous step and train a supervised text classifier (e.g., BERT) on them. This pipeline also illustrates our method.
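
To make the second and third modules more concrete, here is a minimal sketch in Python with NumPy. It is not the code from our repository. It assumes document and class representations are already available as plain vectors, uses cosine similarity to find each document's nearest class, and fits a simplified GMM with one shared spherical variance. The function names (`cosine_sim`, `align_documents`, `select_confident`), the `keep_ratio` parameter, and the toy 2-D data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def align_documents(doc_reps, class_reps, n_iter=50):
    """Fit a simplified GMM (shared spherical variance) with EM.

    The responsibilities start as a one-hot assignment of every document
    to its nearest class, so cluster k keeps corresponding to class k.
    (scikit-learn's GaussianMixture with means_init would be a natural
    replacement for this hand-written loop.)
    """
    n_docs, dim = doc_reps.shape
    n_classes = class_reps.shape[0]
    nearest = cosine_sim(doc_reps, class_reps).argmax(axis=1)
    resp = np.eye(n_classes)[nearest]  # one-hot prior, shape (n_docs, n_classes)
    for _ in range(n_iter):
        # M-step: update mixture weights, means, and the shared variance.
        counts = resp.sum(axis=0) + 1e-9
        weights = counts / n_docs
        means = (resp.T @ doc_reps) / counts[:, None]
        sq_dist = ((doc_reps[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq_dist).sum() / (n_docs * dim) + 1e-9
        # E-step: posterior of each cluster given each document.
        log_prob = np.log(weights)[None, :] - 0.5 * sq_dist / var
        log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_prob)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp


def select_confident(resp, keep_ratio=0.5):
    """Keep the most confident documents and their pseudo-labels."""
    labels = resp.argmax(axis=1)
    confidence = resp.max(axis=1)
    n_keep = max(1, int(len(labels) * keep_ratio))
    keep = np.argsort(-confidence)[:n_keep]
    return keep, labels[keep]


# Toy data: two classes in 2-D, 20 documents each (purely illustrative).
rng = np.random.default_rng(0)
class_reps = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_reps = np.vstack([
    rng.normal([2.0, 0.2], 0.2, size=(20, 2)),
    rng.normal([0.2, 2.0], 0.2, size=(20, 2)),
])

resp = align_documents(doc_reps, class_reps)
keep, pseudo_labels = select_confident(resp, keep_ratio=0.5)
print(len(keep), "documents selected for classifier training")
```

The selected indices and pseudo-labels would then serve as the training set for a supervised classifier such as BERT. The one-hot initialization is what ties each cluster to a class name, so no matching step between clusters and classes is needed afterwards.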

Please refer to our paper and GitHub repository for more details. You can also find our presentation and poster for NAACL.


Zihan Wang
