Tagging names, concepts or key phrases is a crucial task for Natural Language Understanding pipelines. The Prodigy annotation tool lets you label NER training data or improve an existing model's accuracy with ease.
Example task: "ordered the new beats product today, hope theyre as good as everyone says 🤔" (source: Twitter, score: 0.51)
Focus on what the model is most uncertain about
Prodigy puts the model in the loop, so that it can actively
participate in the training process, using what it already
knows to figure out what to ask you next. The model learns
as you go, based on the answers you provide. Most annotation
tools make no suggestions to the user, so as not to bias the
annotations. Prodigy takes the opposite approach: use what
the model already knows, and ask the user as little as possible.
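This uncertainty-driven selection can be sketched in a few lines of plain Python. The scoring function and the sorting rule below are a simplified stand-in for illustration, not Prodigy's actual implementation:

```python
def prefer_uncertain(examples, score_fn):
    """Sort candidate examples so the most uncertain come first.

    `score_fn` returns the model's confidence in (0, 1); scores near
    0.5 mean the model can't decide, so those examples are asked first.
    """
    return sorted(examples, key=lambda eg: abs(score_fn(eg) - 0.5))

# Toy stand-in scores keyed by text (a real model would predict these).
scores = {"clear accept": 0.95, "borderline": 0.51, "clear reject": 0.02}
queue = prefer_uncertain(list(scores), scores.get)
# "borderline" (score 0.51) is closest to 0.5, so it is asked first.
```

As the model updates on your answers, the scores shift, so the queue of questions changes as you annotate.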
Fast and flexible annotation
Prodigy's web-based annotation app has been carefully
designed to be as efficient as possible. By breaking complex
tasks down into smaller units of work, your annotators stay
focused on one decision at a time, giving you better data.

Prodigy is a fully scriptable annotation tool, letting you
automate as much as possible with custom rule-based logic.
You don't want to waste time labeling every instance of
"New York" by hand. Instead, give Prodigy rules or a list
of examples, review the entities in context and annotate
the exceptions. As you annotate, a statistical model can
learn to suggest similar entities, generalising beyond your
rules and examples.
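A rough sketch of the rule-based step: given a list of example phrases, find every occurrence in a text and turn it into a candidate span to review. This uses plain substring search in place of spaCy's token-level patterns, and the phrases and label are made up for illustration:

```python
import re

def match_examples(text, examples, label):
    """Find each occurrence of the example phrases and return
    candidate spans with character offsets, ready for review."""
    spans = []
    for phrase in examples:
        for m in re.finditer(re.escape(phrase), text):
            spans.append({"start": m.start(), "end": m.end(),
                          "label": label, "text": m.group()})
    return spans

text = "She moved from New York to Berlin last year."
spans = match_examples(text, ["New York", "Berlin"], "GPE")
# Two GPE candidates: "New York" and "Berlin"
```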
Try out new ideas quickly
Annotation is usually the part where projects stall. Instead
of having an idea and trying it out, you start scheduling
meetings, writing specifications and dealing with quality
control. With Prodigy, you can have an idea over breakfast
and get your first results by lunch. Once the model is
trained, you can export it as a versioned Python package,
giving you a smooth path from prototype to production.
Custom recipes let you integrate machine learning models
using any framework of your choice, load in data from
different sources, implement your own storage solution or
add other hooks and features. No matter how complex your
pipeline is – if you can call it from a Python function,
you can use it in Prodigy.
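As a minimal sketch of that idea: at its core, a stream of annotation tasks is just a Python generator yielding dicts with a "text" key, so any source you can read from Python can feed the app. The JSONL layout below is an assumption for the example:

```python
import json
import os
import tempfile

def jsonl_stream(path):
    """Yield annotation tasks from a JSONL file, one dict per line.

    Any data source readable from Python can be adapted into a
    generator like this one.
    """
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield {"text": json.loads(line)["text"]}

# Demo with a throwaway two-line JSONL file.
tmp = tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False)
tmp.write('{"text": "first example"}\n{"text": "second example"}\n')
tmp.close()
tasks = list(jsonl_stream(tmp.name))
os.unlink(tmp.name)
# tasks == [{"text": "first example"}, {"text": "second example"}]
```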
Start the server
To start Prodigy, run the ner.teach recipe with the model you want to improve, one or more
labels and a text source. All annotations you collect
will be saved to the dataset specified as the first argument.
prodigy ner.teach your_dataset en_core_web_sm your_data.jsonl --label PERSON
As your texts stream in, Prodigy will look up all
possible analyses for each sentence and suggest
the entities the model is most uncertain about.
Those are also the entities that need your feedback the most. As you click accept or reject, the model in the loop is updated.
Match patterns help you find potential entity
candidates and get over the "cold start problem".
Each dictionary describes one token and supports the same attributes as spaCy’s Matcher. You can create patterns manually, using word vectors or from an existing model.
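A patterns file is newline-delimited JSON: each line pairs a label with either a string or a list of token dictionaries. The PRODUCT entries below are hand-written for illustration:

```json
{"label": "PRODUCT", "pattern": [{"lower": "beats"}]}
{"label": "PRODUCT", "pattern": [{"lower": "air"}, {"lower": "pods"}]}
```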
To start Prodigy, run the ner.teach recipe with a base model, one or more labels you want to
add, your patterns file to bootstrap suggestions and
a text source. All annotations you collect will be
saved to the dataset specified as the first argument.
As you click accept or reject, the model in the loop will be updated and will start
learning about your new entity type. Once you’ve
annotated enough examples, the model will also start
suggesting entities it's most uncertain about, based
on what it has learned so far.
Optional: Add manual annotations
To cover especially tricky or very specific entities,
you can always add more annotations manually using the ner.manual recipe.
Start the server
To start Prodigy, run the ner.manual recipe with a data source and a comma-separated list of
labels. The model is only used for tokenization. This
lets you annotate faster, because the selection can
snap to the token boundaries. All annotations you
collect will be saved to the dataset specified as the first argument.
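For example (the dataset, model and file names are placeholders):
prodigy ner.manual your_dataset en_core_web_sm your_data.jsonl --label PERSON,ORG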
To highlight a span, click and drag within the
entity, or double-click on single words. Labels can
be selected from the menu above, or via the number
keys on your keyboard.
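The snapping behaviour can be illustrated with a small stand-in that treats whitespace as the only token boundary (real tokenization is more involved): a sloppy character selection is expanded outward until both edges line up with token edges.

```python
def snap_to_tokens(text, start, end):
    """Expand a character span outward until both edges fall on
    whitespace-delimited token boundaries (a simplified stand-in
    for snapping against a real tokenizer)."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end

text = "ordered the new beats product today"
# A sloppy drag that only partially covers two words...
start, end = snap_to_tokens(text, 17, 26)
# ...snaps to the full tokens: text[start:end] == "beats product"
```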
Export your data
Prodigy stores annotations in a simple JSON format to
make it easy to reuse your data in other applications.
prodigy db-out your_dataset > annotations.jsonl
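Each line of the exported file is a single JSON record. The record below is a hand-written illustration of the general shape, with the text, character-offset spans and the annotator's answer:

```python
import json

# One illustrative line from an exported JSONL file.
line = ('{"text": "ordered the new beats product today", '
        '"spans": [{"start": 16, "end": 21, "label": "PRODUCT"}], '
        '"answer": "accept"}')

record = json.loads(line)
span = record["spans"][0]
labeled = record["text"][span["start"]:span["end"]]
# labeled == "beats"
```

Because the offsets are plain character positions into the text, any downstream tool that can read JSON can reuse the data.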
You can export your annotations at any time, or use the ner.batch-train command to train a model directly from the database – perfect for quick
experiments. Part of the data will be held aside,
letting you output the weights that generalised best.
After training, Prodigy exports a ready-to-use spaCy model that you can load in and test with examples. This also gives
you a good idea of how the model is performing, and
the data needed to improve the accuracy.
import spacy

nlp = spacy.load('/tmp/model')  # model directory exported by Prodigy
doc = nlp("What do you think of the Reddit redesign?")
entities = [(ent.text, ent.label_) for ent in doc.ents]