Training an insults classifier on Reddit comments

In this video, we'll show you how to use Prodigy to train a classifier to detect disparaging or insulting comments. This type of task is especially relevant for online marketplaces and social media sites that need to flag abusive or disruptive behaviour. An example use case is a warning popup shown before a potentially abusive message is sent, alerting the user that the tone seems inappropriate.

Prodigy makes text classification particularly powerful, because you can try out new ideas very quickly. The same approach can be used to solve problems such as sentiment analysis or chatbot intent detection.

Overview

Bootstrap a terminology list from seed terms. Using the web interface and the word vectors, we quickly collect over 50 similar insults. The terms are stored in a dataset and can be reused in the next step.
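To give a rough idea of what vector-based bootstrapping involves, here is a minimal sketch that ranks candidate words by cosine similarity to the average of the seed vectors. It is not Prodigy's implementation: the `suggest_terms` helper and the tiny three-dimensional `VECTORS` table are made up for illustration, and real word vectors would come from a pretrained model.

```python
import math

# Toy 3-dimensional "word vectors". These values are invented for
# illustration; real vectors would come from a pretrained model.
VECTORS = {
    "idiot": (0.9, 0.1, 0.0),
    "moron": (0.85, 0.15, 0.05),
    "jerk": (0.8, 0.2, 0.1),
    "table": (0.0, 0.9, 0.3),
    "banana": (0.1, 0.2, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_terms(seeds, vectors, top_n=2):
    """Rank non-seed words by similarity to the averaged seed vector."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        raise ValueError("none of the seed terms has a vector")
    dims = len(seed_vecs[0])
    target = tuple(sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dims))
    candidates = [w for w in vectors if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], target), reverse=True)
    return ranked[:top_n]


suggestions = suggest_terms(["idiot"], VECTORS)
print(suggestions)
```

In the actual workflow, each suggestion would be accepted or rejected in the web interface, and the accepted terms would be saved for the next step.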

Annotate comments from Reddit based on the terminology list. We can now stream in examples from the Reddit corpus and annotate whether they contain an insult. The terminology list is used to start off with more relevant examples. Using the web app, we can quickly collect 500 annotations.
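The idea of starting off with more relevant examples can be sketched as a simple reordering: comments that mention a term from the list are shown first. This is only an approximation of the behaviour described above, not how Prodigy selects examples internally, and the `prioritize` helper and sample comments are hypothetical.

```python
import re


def prioritize(comments, terms):
    """Put comments that mention any term first, keeping the original
    order within the matched and unmatched groups."""
    if not terms:
        return list(comments)
    # Match whole words only, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    matched, unmatched = [], []
    for text in comments:
        (matched if pattern.search(text) else unmatched).append(text)
    return matched + unmatched


comments = [
    "Nice write-up, thanks!",
    "You are such an IDIOT.",
    "What a moron move",
    "Any update on this?",
]
ordered = prioritize(comments, ["idiot", "moron"])
for text in ordered:
    print(text)
```

A term match does not mean the comment is an insult, which is why each example still needs a manual accept or reject decision.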

Train the text classifier and export the model. Using Prodigy's built-in training command, we train a model using 400 annotations for training and 100 for evaluation. We manage to achieve an accuracy of 85% – enough to beat the baseline.
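To make the evaluation numbers concrete, the sketch below holds out 100 of 500 examples and computes accuracy, along with one possible baseline. The source does not say which baseline is meant, so the majority-class baseline here is an assumption, and the labels are synthetic. Prodigy's training command handles the split itself; these helpers are hypothetical.

```python
import random
from collections import Counter


def split_examples(examples, n_eval, seed=0):
    """Shuffle deterministically and hold out n_eval examples for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_eval:], shuffled[:n_eval]


def accuracy(predictions, gold):
    """Fraction of predictions that equal the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def majority_baseline(train_labels, eval_labels):
    """Accuracy of always predicting the most frequent training label.
    Assumed baseline; the source does not specify one."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return accuracy([most_common] * len(eval_labels), eval_labels)


# Synthetic data for illustration: 500 examples, 30% labelled as insults.
examples = [("comment %d" % i, i % 10 < 3) for i in range(500)]
train, dev = split_examples(examples, n_eval=100)
baseline = majority_baseline([y for _, y in train], [y for _, y in dev])
print(len(train), len(dev), round(baseline, 2))
```

A model is only useful if its accuracy on the held-out examples clearly exceeds a trivial baseline like this one.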

Try the model in spaCy. After training, Prodigy exports a ready-to-use spaCy model that we can load in and test with examples. This also gives us a good idea of how the model is performing and of what additional training data is needed to improve its accuracy.
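Testing the exported model could look roughly like the sketch below, which relies on spaCy's `spacy.load`, `nlp.pipe` and the `doc.cats` dictionary of category scores. The label name `INSULT`, the `score_texts` and `load_and_score` helpers, and any model path passed in are assumptions for illustration, not taken from the video.

```python
def score_texts(nlp, texts, label="INSULT"):
    """Run texts through a loaded pipeline and return (text, score) pairs,
    highest score first. `label` is an assumed category name."""
    scored = [(doc.text, doc.cats.get(label, 0.0)) for doc in nlp.pipe(texts)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def load_and_score(model_path, texts, label="INSULT"):
    """Load an exported model directory and score the given texts."""
    # Imported here so score_texts can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_path)
    return score_texts(nlp, texts, label)
```

Calling `load_and_score` with the directory Prodigy exported to and a handful of hand-written comments gives a quick view of which texts the model scores highest, and reading the mistakes suggests what kind of examples to annotate next.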
