Deep Learning from Crowds

AUTHORS:

Filipe Rodrigues (rodr [at] dtu.dk)
Francisco Câmara Pereira

ABSTRACT:

Over the last few years, deep learning has revolutionized the field of machine learning by dramatically improving the state-of-the-art in various domains. However, as the size of supervised artificial neural networks grows, typically so does the need for larger labeled datasets. Recently, crowdsourcing has established itself as an efficient and cost-effective solution for labeling large sets of data in a scalable manner, but it often requires aggregating labels from multiple noisy contributors with different levels of expertise. In this paper, we address the problem of learning deep neural networks from crowds. We begin by describing an EM algorithm for jointly learning the parameters of the network and the reliabilities of the annotators. Then, a novel general-purpose crowd layer is proposed, which allows us to train deep neural networks end-to-end, directly from the noisy labels of multiple annotators, using backpropagation. We empirically show that the proposed approach is able to internally capture the reliability and biases of different annotators and achieve new state-of-the-art results for various crowdsourced datasets across different settings, namely classification, regression and sequence labeling.
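
To make the crowd-layer idea more concrete, here is a minimal, dependency-free Python sketch of a forward pass and loss for the classification setting. It is an illustration only, not the released CrowdLayer code: the function names, the `MISSING` marker, and the choice of one K x K weight matrix per annotator (initialized to the identity) followed by a softmax are assumptions made for this example. The shared network is stood in for by a fixed vector of class scores.

```python
import math

MISSING = -1  # hypothetical marker: this annotator did not label this item


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identity(k):
    """K x K identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def crowd_layer_forward(class_probs, annotator_weights):
    """Map the shared class probabilities to one distribution per annotator.

    Assumption for this sketch: each annotator has its own K x K matrix
    that re-weights the shared output before a softmax.
    """
    outputs = []
    for w in annotator_weights:
        scores = [sum(w_ij * p_j for w_ij, p_j in zip(row, class_probs))
                  for row in w]
        outputs.append(softmax(scores))
    return outputs


def masked_cross_entropy(annotator_probs, labels):
    """Sum cross-entropy over annotators, skipping missing labels."""
    loss = 0.0
    for probs, label in zip(annotator_probs, labels):
        if label == MISSING:
            continue  # no label from this annotator, so no contribution
        loss -= math.log(probs[label])
    return loss


# Stand-in for the output of the shared network on one item (3 classes).
base_probs = softmax([2.0, 0.5, -1.0])

# Three annotators, each starting from the identity (no bias modelled yet).
weights = [identity(3) for _ in range(3)]
# Annotator 1 is modelled as systematically swapping classes 0 and 1.
weights[1] = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

per_annotator = crowd_layer_forward(base_probs, weights)
labels = [0, 1, MISSING]  # annotator 2 did not label this item
loss = masked_cross_entropy(per_annotator, labels)
```

In an actual training setup the annotator matrices and the shared network would be learned jointly by backpropagating this loss; at prediction time the crowd layer would be dropped and the shared output used directly.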

CONFERENCE:

The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) (oral presentation)

DOWNLOAD:

SOURCE CODE:

CrowdLayer

DATASETS:

LabelMe (multi-class classification; annotations from Amazon Mechanical Turk)
MovieReviews (regression; annotations from Amazon Mechanical Turk)
CoNLL-2003 NER task (sequence labeling; annotations from Amazon Mechanical Turk)

Send me an email if you need any additional information regarding the datasets.