Learning from Multiple Annotators: Distinguishing Good from Random Labelers

AUTHORS:

Filipe Rodrigues (fmpr [at] dei.uc.pt)
Francisco Câmara Pereira
Bernardete Ribeiro

ABSTRACT:

With the increasing popularity of online crowdsourcing platforms such as Amazon Mechanical Turk (AMT), building supervised learning models for datasets with multiple annotators is receiving increasing attention from researchers. These platforms provide an inexpensive and accessible resource for obtaining labeled data, and in many situations the quality of the labels competes directly with that of experts. For these reasons, annotator-aware models have recently attracted considerable interest. In this paper, we propose a new probabilistic model for supervised learning with multiple annotators in which the reliability of each annotator is treated as a latent variable. We empirically show that this model achieves state-of-the-art performance while reducing the number of model parameters, thereby lowering the risk of overfitting. Furthermore, the proposed model is easier to implement and to extend to other classes of learning problems, such as sequence labeling tasks.
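To make the idea concrete, below is a minimal EM sketch in the spirit of the abstract: each annotator j carries a latent binary reliability z_j (good vs. random), a logistic regression models the true label, and EM alternates between inferring annotator reliabilities and refitting the classifier on reliability-weighted labels. This is an illustrative simplification under stated assumptions (binary labels, fully observed annotations, a simple gradient M-step), not the paper's exact formulation; the function and variable names here are hypothetical.

```python
# Illustrative sketch only: a simplified latent-reliability EM, assuming
# binary classification and that every annotator labels every instance.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_em(X, Y, n_iters=50, lr=0.1, inner_steps=200):
    """X: (n, d) features; Y: (n, m) 0/1 labels from m annotators."""
    n, d = X.shape
    m = Y.shape[1]
    w = np.zeros(d)               # logistic regression weights
    pi = 0.8                      # prior prob. that an annotator is good
    r = np.full(m, 0.8)           # posterior p(z_j = good)
    for _ in range(n_iters):
        p = sigmoid(X @ w)        # model's p(y_i = 1 | x_i)
        # E-step: log-likelihood of annotator j's labels if good
        # (labels follow the model) vs. random (coin flips).
        ll_good = (Y * np.log(p[:, None] + 1e-12)
                   + (1 - Y) * np.log(1 - p[:, None] + 1e-12)).sum(axis=0)
        ll_rand = n * np.log(0.5)
        r = sigmoid(np.log(pi / (1 - pi)) + ll_good - ll_rand)
        # M-step: reliability-weighted logistic regression (gradient
        # ascent on the expected complete-data log-likelihood), then
        # update the shared reliability prior.
        for _ in range(inner_steps):
            p = sigmoid(X @ w)
            grad = X.T @ ((Y - p[:, None]) @ r) / (n * r.sum() + 1e-12)
            w += lr * grad
        pi = np.clip(r.mean(), 1e-3, 1 - 1e-3)
    return w, r
```

Note how the M-step is just a weighted logistic regression, which illustrates the parameter economy the abstract refers to: one weight vector plus a single reliability per annotator, rather than a full confusion matrix per annotator.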

KEYWORDS:

Multiple Annotators, Crowdsourcing, Latent Variable Models, Expectation-Maximization, Logistic Regression

JOURNAL:

Pattern Recognition Letters, Elsevier, 2013

DOWNLOAD:

PDF

DOI:

10.1016/j.patrec.2013.05.012

DATASETS AND SOURCE CODE:

Source code // Datasets