MA-sLDAc: Multi-Annotator Supervised LDA for classification

MA-sLDAc is a C++ implementation of the supervised topic model for classification with labels provided by multiple annotators with different levels of expertise, as proposed in:

The code is based on the supervised LDA (sLDA) implementation by Chong Wang and David Blei (http://www.cs.cmu.edu/~chongw/slda/). Three different variants of the proposed model are provided:

  • MA-sLDAc (mle): This implementation uses maximum likelihood estimates for the topic distributions (beta) and the annotators' confusion matrices (pi);
  • MA-sLDAc (smooth): This implementation places priors on beta and pi and performs approximate Bayesian inference;
  • MA-sLDAc (svi): This implementation is similar to “MA-sLDAc (smooth)” but uses stochastic variational inference (svi).
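To make the difference between the three variants more concrete, here is a minimal sketch of how an annotator's confusion matrix pi could be estimated in each style. It is not taken from the MA-sLDAc source: it assumes a generic Dawid-Skene-style setup where lambda[d][c] is the current posterior probability that document d belongs to true class c and labels[d][r] is the class annotator r assigned (-1 if unlabeled). The names lambda, omega (prior pseudocount), xi (Dirichlet parameters), and the step-size parameters tau and kappa are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Expected counts n[c][l] = sum_d lambda[d][c] * [labels[d][r] == l]
// for a single annotator r (hypothetical data layout).
Matrix expected_counts(const Matrix& lambda,
                       const std::vector<std::vector<int>>& labels,
                       std::size_t r, std::size_t num_classes) {
  Matrix n(num_classes, std::vector<double>(num_classes, 0.0));
  for (std::size_t d = 0; d < lambda.size(); ++d) {
    int l = labels[d][r];
    if (l < 0) continue;  // annotator r did not label document d
    for (std::size_t c = 0; c < num_classes; ++c) n[c][l] += lambda[d][c];
  }
  return n;
}

// "mle"-style estimate: normalize each row of the expected counts.
// Rows with no evidence are left as all zeros.
Matrix mle_confusion(const Matrix& n) {
  Matrix pi = n;
  for (auto& row : pi) {
    double sum = 0.0;
    for (double v : row) sum += v;
    if (sum > 0.0)
      for (double& v : row) v /= sum;
  }
  return pi;
}

// "smooth"-style update: variational Dirichlet parameters for each row of pi
// are the (assumed symmetric) prior pseudocount omega plus the expected counts.
Matrix dirichlet_params(const Matrix& n, double omega) {
  Matrix xi = n;
  for (auto& row : xi)
    for (double& v : row) v += omega;
  return xi;
}

// "svi"-style update: blend the old parameters with xi_hat (an estimate from a
// minibatch, rescaled to the corpus size) using rho_t = (t + tau)^(-kappa).
Matrix svi_step(const Matrix& xi_old, const Matrix& xi_hat,
                double t, double tau, double kappa) {
  assert(xi_old.size() == xi_hat.size());
  const double rho = std::pow(t + tau, -kappa);
  Matrix xi = xi_old;
  for (std::size_t c = 0; c < xi.size(); ++c) {
    assert(xi_old[c].size() == xi_hat[c].size());
    for (std::size_t l = 0; l < xi[c].size(); ++l)
      xi[c][l] = (1.0 - rho) * xi_old[c][l] + rho * xi_hat[c][l];
  }
  return xi;
}
```

The MLE variant needs no hyperparameters, which matches the recommendation to start with “MA-sLDAc (mle)”; the smooth and svi variants add omega and, for svi, the step-size schedule.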

For simplicity, I recommend that first-time users start with “MA-sLDAc (mle)”, since this version has fewer parameters that need to be specified.

Sample data based on the 20newsgroups dataset is provided here. See the readme file for a quick example of how to run MA-sLDAc on this data.

Other datasets collected from Amazon Mechanical Turk are also provided below.

DOWNLOAD:

Source code

DATASETS:

  • 20newsgroups (simulated annotators)
  • Reuters (annotations from Amazon Mechanical Turk)
  • LabelMe (annotations from Amazon Mechanical Turk)
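The 20newsgroups data uses simulated rather than real annotators. The exact simulation procedure is not described here, so the following is only a hypothetical sketch of one common way to simulate annotators with different levels of expertise: annotator r reports the true class with probability accuracy[r] and otherwise picks one of the remaining classes uniformly at random. The function name and parameters are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical scheme (not necessarily the one used for the released data):
// returns labels[d][r], the class annotator r assigns to document d.
std::vector<std::vector<int>> simulate_annotators(
    const std::vector<int>& true_labels, const std::vector<double>& accuracy,
    int num_classes, unsigned seed) {
  assert(num_classes >= 2);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> coin(0.0, 1.0);       // values in [0, 1)
  std::uniform_int_distribution<int> other(0, num_classes - 2);  // num_classes - 1 values
  std::vector<std::vector<int>> labels(
      true_labels.size(), std::vector<int>(accuracy.size()));
  for (std::size_t d = 0; d < true_labels.size(); ++d) {
    const int c = true_labels[d];
    for (std::size_t r = 0; r < accuracy.size(); ++r) {
      if (coin(rng) < accuracy[r]) {
        labels[d][r] = c;  // correct answer
      } else {
        const int l = other(rng);
        labels[d][r] = (l >= c) ? l + 1 : l;  // wrong answer, skipping class c
      }
    }
  }
  return labels;
}
```

Assigning each annotator a different accuracy gives a pool ranging from experts to near-random labelers, which is the kind of heterogeneity the multi-annotator model is designed to handle.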

CONTACT:

Please send questions and comments to rodr [at] dtu.dk