X-Class: Text Classification with Extremely Weak Supervision

Our paper “X-Class: Text Classification with Extremely Weak Supervision” has been accepted to NAACL 2021.

Highlights

Our proposed model, X-Class, assigns documents to classes (e.g., sports, politics, and science) with no supervision other than the class names themselves.

Motivation

We propose a new task, Text Classification with Extremely Weak Supervision: classifying documents into classes with the plain class names as the only guidance.
Our method, X-Class, breaks this task into three modules:

  • Class-oriented Document Representation
    • We estimate both the class representations (based on the given class names) and the document representations (guided by the class representations).
  • Document-Class Alignment
    • We apply a Gaussian Mixture Model (GMM) to group the document representations into clusters. The GMM is initialized with a prior that assigns every document to its nearest class, so we know which cluster represents which class.
  • Text Classifier Training
    • We then select the confident document-class pairs from the previous step and train a supervised text classifier (e.g., BERT) on them. This pipeline also illustrates our method.
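
As a rough illustration of the second and third modules, here is a minimal sketch in Python with NumPy. It is not the implementation from our paper. It assumes the class and document representations are already computed as plain vectors, uses cosine similarity for the "nearest class" prior, fits a simplified GMM with a single shared spherical variance, and keeps a fixed fraction of the most confident documents per class. The function names (`nearest_class`, `gmm_align`, `select_confident`), the `keep_ratio` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def nearest_class(doc_vecs, class_vecs):
    """Assign each document to the class with the highest cosine similarity."""
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    c = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
    return np.argmax(d @ c.T, axis=1)


def gmm_align(doc_vecs, init_labels, n_classes, n_iter=50):
    """EM for a GMM with one shared spherical variance (a simplifying assumption).

    Component k starts from the documents whose nearest class is k,
    so component k keeps the meaning "class k" throughout.
    Returns the (n_docs, n_classes) posterior matrix.
    """
    n, dim = doc_vecs.shape
    resp = np.eye(n_classes)[init_labels]  # hard one-hot prior
    for _ in range(n_iter):
        # M-step: update mixture weights, means, and the shared variance.
        counts = resp.sum(axis=0) + 1e-9
        weights = counts / n
        means = (resp.T @ doc_vecs) / counts[:, None]
        sq_dist = ((doc_vecs[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq_dist).sum() / (n * dim) + 1e-6
        # E-step: posteriors; the Gaussian normalizer is shared and cancels.
        log_p = np.log(weights)[None, :] - 0.5 * sq_dist / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp


def select_confident(resp, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of documents per predicted class."""
    labels = resp.argmax(axis=1)
    conf = resp.max(axis=1)
    keep = []
    for k in range(resp.shape[1]):
        idx = np.where(labels == k)[0]
        n_keep = int(np.ceil(keep_ratio * len(idx)))
        keep.extend(idx[np.argsort(-conf[idx])[:n_keep]])
    keep = np.array(sorted(keep), dtype=int)
    return keep, labels[keep]


# Toy data: two well-separated groups of 2-D "document representations".
rng = np.random.default_rng(0)
class_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_vecs = np.vstack([
    rng.normal([3.0, 0.5], 0.3, size=(20, 2)),
    rng.normal([0.5, 3.0], 0.3, size=(20, 2)),
])

init_labels = nearest_class(doc_vecs, class_vecs)
resp = gmm_align(doc_vecs, init_labels, n_classes=2)
keep, pseudo_labels = select_confident(resp, keep_ratio=0.5)
```

The selected documents `doc_vecs[keep]` and their `pseudo_labels` would then serve as the training set for a supervised classifier such as BERT; that training step is omitted here.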

Please refer to our paper and GitHub repository for more details. You can also find our presentation and poster for NAACL.