Span Categorization

Extracting longer phrases and nested expressions from documents is a common task in applied Natural Language Processing. Prodigy lets you label training data for span categorization or improve an existing model’s accuracy with ease.

Fast and flexible annotation

Prodigy’s web-based annotation app has been carefully designed to be as efficient as possible. The manual interface lets you label spans by highlighting text by hand. Your annotations snap to token boundaries, and you can mark single-word spans by double-clicking.
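
To illustrate what snapping to token boundaries means, here is a minimal Python sketch. It is not Prodigy's implementation: the function name snap_to_tokens is hypothetical, and tokens are assumed to be simple runs of non-whitespace characters, whereas a real pipeline would use the tokenizer of the model being trained.

```python
import re


def snap_to_tokens(text, start, end):
    """Expand a character selection so that it covers whole tokens.

    Assumption: a token is any run of non-whitespace characters.
    Returns (start, end) character offsets, or None if the selection
    does not touch any token.
    """
    touched = [
        match
        for match in re.finditer(r"\S+", text)
        if match.start() < end and match.end() > start
    ]
    if not touched:
        return None
    return touched[0].start(), touched[-1].end()


text = "Patients with septic shock had higher mortality."

# A sloppy selection that covers only "ptic sho".
snapped = snap_to_tokens(text, 16, 24)
print(text[snapped[0]:snapped[1]])  # prints: septic shock
```

A selection that starts or ends in the middle of a word is widened to the nearest token edges, so the stored span always lines up with the tokens a model will see.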

Bootstrap with powerful patterns

Prodigy is a fully scriptable annotation tool, letting you automate as much as possible with custom rule-based logic. You don’t want to waste time labeling every instance of common phrases by hand. Instead, give Prodigy rules or a list of examples, review the spans in context and annotate the exceptions.

patterns.jsonl

{"pattern": "septic shock", "label": "CONDITION"}
{"pattern": [{"like_num": true}, {"orth": "-"}, {"lower": "day"}, {"lower": "mortality"}], "label": "EFFECT"}
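
As a rough illustration of how patterns like the two in patterns.jsonl can pre-select spans, the following Python sketch applies them to a sentence. It is a toy matcher, not Prodigy's or spaCy's: the helpers tokenize, token_matches and find_spans are hypothetical, the regex tokenizer is a simplification, like_num is approximated with str.isdigit, and matching string patterns case-insensitively is an assumption of this sketch.

```python
import json
import re

PATTERNS_JSONL = '''
{"pattern": "septic shock", "label": "CONDITION"}
{"pattern": [{"like_num": true}, {"orth": "-"}, {"lower": "day"}, {"lower": "mortality"}], "label": "EFFECT"}
'''


def tokenize(text):
    # Simplified tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def token_matches(token, spec):
    # Check one token against one pattern entry such as {"lower": "day"}.
    for attr, value in spec.items():
        if attr == "lower":
            ok = token.lower() == value
        elif attr == "orth":
            ok = token == value
        elif attr == "like_num":
            ok = token.isdigit() == value  # rough stand-in for "looks like a number"
        else:
            raise ValueError(f"attribute not supported in this sketch: {attr}")
        if not ok:
            return False
    return True


def find_spans(text, patterns):
    """Return (token_start, token_end, label) triples, end exclusive."""
    tokens = tokenize(text)
    spans = []
    for entry in patterns:
        pattern = entry["pattern"]
        if isinstance(pattern, str):
            # Assumption: string patterns are matched token by token, ignoring case.
            specs = [{"lower": tok.lower()} for tok in tokenize(pattern)]
        else:
            specs = pattern
        width = len(specs)
        for i in range(len(tokens) - width + 1):
            window = tokens[i:i + width]
            if all(token_matches(tok, spec) for tok, spec in zip(window, specs)):
                spans.append((i, i + width, entry["label"]))
    return spans


patterns = [json.loads(line) for line in PATTERNS_JSONL.splitlines() if line.strip()]
text = "Patients with septic shock had higher 28-day mortality."
print(find_spans(text, patterns))
# prints: [(2, 4, 'CONDITION'), (6, 10, 'EFFECT')]
```

The matches are only suggestions. The point of the workflow is that an annotator reviews each one in context and corrects the cases the rules get wrong.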

Immediately train spancat models

Once you have your first annotations, Prodigy can train a spaCy model for span categorization right away. Point the train command at the datasets of interest and you get a machine learning pipeline with a span categorizer. You can even train a pipeline that handles multiple tasks and override its settings from the command line.
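
To make the idea of command-line overrides concrete: a dotted option such as --training.max-steps 1000 names a section and a setting in the training config. The Python sketch below shows that mapping on a plain nested dictionary. It is not spaCy's or Prodigy's parser; apply_override is a hypothetical helper, the config values are illustrative, and treating the hyphen in max-steps as equivalent to the underscore in the max_steps setting is an assumption of this sketch.

```python
import copy


def apply_override(config, option, value):
    """Return a copy of a nested config with one dotted option replaced."""
    # "--training.max-steps" -> ["training", "max_steps"]
    path = option.lstrip("-").replace("-", "_").split(".")
    if len(path) < 2:
        raise ValueError(f"expected 'section.key', got {option!r}")
    updated = copy.deepcopy(config)
    section = updated
    for name in path[:-1]:
        section = section[name]  # raises KeyError for an unknown section
    if path[-1] not in section:
        raise KeyError(f"unknown setting: {option!r}")
    section[path[-1]] = value
    return updated


# Illustrative values only, not taken from a real training config.
config = {"training": {"max_steps": 20000, "dropout": 0.1}}

overridden = apply_override(config, "--training.max-steps", 1000)
print(overridden["training"]["max_steps"])  # prints: 1000
```

The original dictionary is left untouched, which mirrors the idea that an override applies to a single training run rather than changing the stored defaults.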

From here, you can reuse the trained model in spans.correct, which pre-highlights the model's predictions so that you only need to correct them.

Example

prodigy train ./spancat-model --spancat dataset_a,dataset_b --training.max-steps 1000

Example

prodigy spans.correct spans-dataset ./spancat-model examples.jsonl --label condition,effect