Prodigy is a machine teaching tool so efficient that a single data scientist can create end-to-end prototypes for new functionality without commissioning external annotations, and with a smooth path to production. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.

Annotation is usually the part where projects stall. Instead of having an idea and trying it out, you start scheduling meetings, writing specifications and dealing with quality control. With Prodigy, you can have an idea over breakfast and get your first results by lunch. Once the model is trained, you can export it as a versioned Python package, making it easy to get the system into production.

Get up and running quickly. You can use Prodigy straight out of the box – all you need is Python and a web browser. If you run it this way, annotations are stored in a local file, using SQLite. For remote use, you can connect to the built-in SQLite, MySQL or PostgreSQL back-ends, or easily plug in your own solution.

prodigy dataset my_dataset "New dataset"
✨ Created dataset 'my_dataset'.
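
If you want to work with the stored annotations programmatically, you can read them back via the database component. A minimal sketch, assuming the default SQLite storage and the my_dataset dataset created above:

from prodigy.components.db import connect

# Connect to the database Prodigy is configured to use (SQLite by default)
db = connect()

# Fetch all annotated examples saved to 'my_dataset'
examples = db.get_dataset('my_dataset')
print(len(examples), "annotations collected")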

Use a built-in annotation recipe or write your own. Recipes control the stream of annotation examples and processing logic, and define how to update your model. Prodigy comes with lots of useful components, including loaders for common formats, live API streams, storage back-ends, and neural network models for a range of tasks.

Because recipes are implemented as Python functions, it's easy to integrate your own solutions. No matter how complex your ETL logic is, if you can call it from a Python function, you can use it in Prodigy.


import prodigy
from prodigy.components.loaders import NewYorkTimes

@prodigy.recipe('news_headlines', dataset=("ID"), query=("Query"))
def news_headlines(dataset, query):
    return {
        'dataset': dataset,
        'stream': NewYorkTimes(query=query, key='xxx')
    }
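
Since a stream is just an iterable of dictionaries, the same pattern works for your own ETL code. A minimal sketch, where fetch_headlines() is a hypothetical stand-in for whatever pipeline produces your raw text:

import prodigy

def fetch_headlines():
    # Hypothetical stand-in for your own ETL logic: anything that
    # yields raw strings (a database query, a message queue, etc.)
    yield 'First look at the new MacBook Pro'
    yield 'Is it time to swap your Mac for a Windows laptop?'

@prodigy.recipe('custom_headlines', dataset=("ID"))
def custom_headlines(dataset):
    # Wrap each raw string in the simple task format Prodigy expects
    stream = ({'text': text} for text in fetch_headlines())
    return {'dataset': dataset, 'stream': stream}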

Run recipes from the command line and start annotating. The recipe decorator uses your function's signature to generate a command-line interface, making it easy to run the same recipe with different settings and reuse recipes across your annotation projects. When you run the recipe command, Prodigy will start a web server so you can start annotating.

prodigy news_headlines my_dataset "Silicon Valley" -F recipe.py
✨ Starting the web server on port 8080...

Stay productive with a modern web application. Prodigy's web app lets you annotate text, entities, classification and images straight from your browser – even on mobile devices. Its modern UI keeps you focused and only asks you for one binary decision at a time.

As you click or swipe through the examples, annotations are sent back to Prodigy via a REST API. Prodigy can update your model in real-time and choose the most important questions to ask next.
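
Under the hood, each question and answer is a plain JSON-style record. A minimal sketch of what an answered text classification task might look like; the label value is illustrative:

# One answered task, as sent back to Prodigy via the REST API
task = {
    'text': 'First look at the new MacBook Pro',
    'label': 'TECHNOLOGY',   # the category the model suggested
    'answer': 'accept'       # your binary decision: accept, reject or ignore
}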

Screenshot of Prodigy web app

Prodigy's recipe for efficient annotation

Prodigy puts the model in the loop, so that it can actively participate in the training process and learn as you go. The model uses what it already knows to figure out what to ask you next, and is updated with the answers you provide. There's no complicated configuration system to work with: you just write a Python function that returns the components as a dictionary. Prodigy comes with a variety of built-in recipes that can be chained together to build complex systems.
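
For illustration, here is a hedged sketch of such a components dictionary with the model in the loop. The DummyModel class is a placeholder for your own statistical model; JSONL and prefer_uncertain are Prodigy's built-in loader and sorter components:

import random
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_uncertain

class DummyModel:
    """Hypothetical stand-in for a real statistical model."""
    def predict(self, stream):
        # Yield (score, example) tuples; prefer_uncertain re-orders the
        # stream so the most uncertain predictions are asked about first.
        for eg in stream:
            yield random.random(), eg

    def update(self, answers):
        # Update the model weights with the answers you provide.
        pass

@prodigy.recipe('teach_topics', dataset=("ID"), source=("Path to JSONL file"))
def teach_topics(dataset, source):
    model = DummyModel()
    stream = prefer_uncertain(model.predict(JSONL(source)))
    return {
        'dataset': dataset,      # where annotations are stored
        'stream': stream,        # the questions to ask
        'update': model.update,  # called with answered tasks as you annotate
    }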

Annotation interfaces

Most annotation tools avoid making any suggestions to the user, so as not to bias the annotations. Prodigy takes the opposite approach: it puts the model's suggestions in front of you, so it can ask the user as little as possible. The more complicated the structure your model has to produce, the more benefit you can get from Prodigy's binary interface.

Annotate plain text
Named Entity Recognition: Annotate and correct named entities (ner, ner_manual)
Part-of-speech Tagging: Annotate and correct part-of-speech tags (pos, pos_manual)
Dependency Parsing: Annotate syntactic dependencies and semantic relations
Annotate labelled text or images
Annotate images, bounding boxes and image segments
Select one or more answers or pick the odd one out
Compare two annotations
Compare texts with a visual diff
Annotate any HTML content

Built-in neural network models

Prodigy includes high-quality statistical models for a number of common applications. You can also use Prodigy to train or evaluate your own solutions — it works with any statistical model.

Named Entity Recognition: Start with an existing model and fine-tune its accuracy, add a new entity type or train a new model from scratch. Prodigy supports a novel mode for creating terminology lists and using them to bootstrap NER models (see the command sketch after this list).
Text Classification: Categorise text by intent, sentiment, topic, or any other scheme. On long documents, an attention mechanism can be used so you only need to read the sentence it finds most relevant.
Text Similarity (coming soon): Assign a numeric similarity score to two pieces of text. With Prodigy, you can judge which of two sentences is more similar to a query.
Image Classification (coming soon): Classify images by object, style, context, or any other measure you're interested in.
Generation Oracle (coming soon): Evaluate your system's generative output, with a model that cleverly memorises the held-out data. Prodigy provides an efficient way to do manual evaluations by shuffling the output of two systems and asking you to pick which one is better.
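
As an illustration of the terminology-list workflow mentioned above, a hedged command-line sketch; the dataset names, seed terms and label are placeholders, and the exact recipe names and arguments may differ between Prodigy versions:

prodigy terms.teach product_terms en_core_web_lg --seeds "MacBook, iPhone, ThinkPad"
prodigy ner.teach product_ner en_core_web_sm news_headlines.jsonl --label PRODUCT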

Export and use your models instantly

Prodigy can export ready-to-use models, making it easy to test the results and put them into production. The built-in NLP recipes output spaCy models, which you can package into pip-installable modules. You can also use Prodigy with any machine learning library via custom recipes. Built-in support for TensorFlow, Keras, PyTorch and scikit-learn models is coming soon.
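
Once exported, the model loads like any other spaCy model. A minimal usage sketch, assuming the model was saved to ./headline-model (the path and entity label are illustrative):

import spacy

# Load the model directory exported by Prodigy (or the installed package name)
nlp = spacy.load('./headline-model')

doc = nlp('First look at the new MacBook Pro')
print([(ent.text, ent.label_) for ent in doc.ents])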


Choice of storage back-ends

You can use your favourite database to keep a copy of all annotations you've collected. Either connect to one of the built-in options, or integrate your own.
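
A hedged sketch of connecting to a non-default back-end from Python; the connection settings shown are placeholders, and the same values can also be set in your prodigy.json configuration:

from prodigy.components.db import connect

# Connect to a PostgreSQL back-end instead of the default SQLite file.
# The settings dictionary is passed through to the database driver.
db = connect('postgresql', {
    'dbname': 'prodigy',
    'user': 'annotator',
    'password': 'xxx',
    'host': 'localhost',
})
print(db.datasets)  # names of the datasets stored in this back-end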




Support for various file formats

Prodigy supports the most common file formats out-of-the-box and will detect the loader to use from the file extension.

Text (jsonl, json, csv, txt): Text from files in various data formats.
Images (jpg, jpeg, png, gif, svg): Images from a URL or local directory.
Corpora (Reddit): Content from published data sets.
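
For example, the JSONL format is just one JSON object per line; a minimal text input file (the text values are illustrative) could look like this:

{"text": "First look at the new MacBook Pro"}
{"text": "Is it time to swap your Mac for a Windows laptop?"}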

Real-world data from live APIs

Streaming in content like news headlines or images from live APIs is a great way to jumpstart your project, test how your model is performing on real-world data or quickly bootstrap a set of evaluation examples.

Stream in headlines, excerpts, lead paragraphs or article images plus metadata for a specific topic from news articles via the New York Times API.
Stream in headlines for a specific topic plus metadata from news articles via The Guardian API.
Stream in headlines for a specific topic plus metadata from news articles via the German Die Zeit API.
Stream in headlines, excerpts or images for various sites via News API. Options include news sites like CNN or TechCrunch, as well as several German publications like Spiegel Online or Handelsblatt.
Stream in tweets for a search query from the Twitter Search API.
Stream in photos for a specific tag, or similar or related tags from the Tumblr Tags API.
Stream in data from issues for a specific search query via the public GitHub API. Supports the GitHub search syntax.
Stream in images from a library of 240k+ high-quality stock photos for a specific search query via the Unsplash API.