Build AI systems that
do exactly what you want

A modern data development experience
from the makers of spaCy

prodigy ner.llm.correct news_articles ./config.cfg ./news.jsonl

Efficiently define, train and evaluate

Prodigy is an extensible annotation tool that gives you a new way to build custom AI systems. Define your classification scheme with real-world examples rather than just prompts, and let powerful models assist – no machine learning experience required.

prodigy train ./information_extraction --ner news_ner --textcat news_textcat
=========== Training pipeline ===========
48% | ████████████████

Take back control

Prodigy runs entirely under your control, making it suitable for even the strictest privacy requirements. You can download it and run it locally right out of the box, or adapt it to serve your infrastructure needs. The models you produce are yours as well, with absolutely no lock-in.

recipe.py

from typing import List

import prodigy
from prodigy.core import Arg  # assumed import path for Arg

@prodigy.recipe(
    "my_custom_recipe",
    dataset=Arg(help="dataset to save answers to"),
    source=Arg("--source", help="data to load"),
    label=Arg("--label", "-l", help="comma-separated label(s)"),
)
def recipe(dataset: str, source: str, label: List[str]):
    ...

Terminal

prodigy my_custom_recipe annotations ./samples.jsonl --label PERSON,PRODUCT

Built for customization and extension

Prodigy lets you define fully custom data feeds and interfaces, so the computer does the work instead of the human. By breaking tasks down into smaller pieces and automating whatever you can, you can make annotation over 10× as efficient.

Documentation

Overview
  • Downloadable developer tool and library
  • Create, review and train from your annotations
  • Runs entirely on your own machines
  • Powerful built-in workflows

Pricing

Overview
  • Lifetime license, pay once, use forever
  • Flexible options for individuals and teams
  • Full privacy, no data leaves your servers
  • Download and install like any other library

Real-world case studies

What others say

Christopher Ewen

Senior Product Manager

Andy Halterman

Researcher
A lack of labeled data held geoparsing back for years. It took a week to fix that with Prodigy.

India Kerle

Data Scientist

Anna Vissens

Lead Data Scientist

Cheyanne Baird

NLP Research Scientist

Raphael Cohen

Head of Research
Prodigy is by far the best ROI we had on any tool!

Daniel Bourke

Founder
We love Prodigy! I've tried many data labelling tools and chose Prodigy specifically for the simplicity. Image folder plus text file to database is perfect for our needs. If a model is one of our main products, good data is basically the same as good code.

Antonio Polo de Alvarado

ML Engineer
I have been working with Prodigy these last few weeks and I can say that it is probably (if not the best) one of the best NLP tools.

Rebecca Bilbro

Founder & CTO
Prodigy’s interface is incredibly intuitive! It elevates data labeling to a first-order concern in the ML workflow, enables us to collaborate on measures of inter-rater reliability and makes the labeling options super unambiguous for data annotators.

Frequently Asked Questions

Any other questions that are not covered here? Email us!

What makes Prodigy different from other annotation solutions?

Prodigy is a downloadable developer tool for creating training and evaluation data for machine learning models. You can use Prodigy to build custom AI systems specific to your use case that you can own and control. Prodigy is a Python package and library that includes a web application. You can customize Prodigy with your own Python functions, and mix and match frontend components to make your own annotation experience.

Prodigy integrates tightly with spaCy, but can also be used with any other libraries and tools. The library includes a range of pre-built workflows and command-line commands for various common tasks, and components for implementing your own workflow scripts. Your scripts can specify how the data is loaded and saved and even define custom HTML and JavaScript. The web application is optimized for fast, intuitive and efficient annotation.

Is our data really private? How does it work?

Prodigy runs entirely on your own machines and never “phones home” or connects to our or any third-party servers. Once installed, you can even operate it on an entirely air-gapped machine without an internet connection. All data and models you use and create stay entirely private and under your control.

Which models can I use and train with Prodigy?

Prodigy lets you train any model you can train in Python. It comes with first-class support for our NLP library spaCy via the built-in train recipe, as well as plugins for using and training Hugging Face models. It also integrates with the major Large Language Model (LLM) API providers out of the box.

All data you create is accessible via a convenient Python API and command-line interface, making it easy to implement training for custom models with standard libraries like PyTorch or TensorFlow, whether in the cloud, in local setups or in environments like Jupyter.
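
As a rough sketch of what that can look like, the snippet below converts exported annotations into (text, entities) pairs that a custom training loop could consume. It assumes the annotations were exported as JSONL and that each record uses "text", "spans" and "answer" keys; the helper name to_training_pairs and the sample records are hypothetical and only serve as an illustration.

```python
import json
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[int, int, str]


def to_training_pairs(lines: Iterable[str]) -> List[Tuple[str, List[Entity]]]:
    """Turn exported JSONL annotation records into (text, entities) pairs.

    Assumes each record has "text", "spans" and "answer" keys; only
    records whose answer is "accept" are kept.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record: Dict = json.loads(line)
        if record.get("answer") != "accept":
            continue  # ignore rejected or skipped examples
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        pairs.append((record["text"], entities))
    return pairs


# Hypothetical sample data standing in for an exported file
sample = [
    '{"text": "Apple hired Jane Doe.", "spans": ['
    '{"start": 0, "end": 5, "label": "ORG"}, '
    '{"start": 12, "end": 20, "label": "PERSON"}], "answer": "accept"}',
    '{"text": "Nothing to see here.", "spans": [], "answer": "reject"}',
]
pairs = to_training_pairs(sample)
print(pairs)
```

The resulting pairs are plain Python tuples, so they can be fed into whichever training library you prefer after any further conversion that library needs.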

How customizable are Prodigy’s workflows and interfaces?

Prodigy allows for extensive customization. A range of built-in settings makes it easy for non-experts to customize the experience, and the developer API and SDK let you integrate the tool into your existing workflows and build powerful extensions for custom use cases.

At the core of Prodigy’s developer experience are "recipes", Python functions that describe a workflow. Recipes can implement custom data processing and model training logic, integrate with third-party or internal libraries and tools and provide reusable workflows for your team that can be run without requiring programming or machine learning expertise. Prodigy also allows combining interfaces to build fully custom solutions, as well as implementing your own interactive interfaces with HTML, CSS and JavaScript.
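
To make the idea concrete, here is a minimal sketch of the shape such a function can take: it prepares a stream of examples, attaches label options, and returns the parts of the workflow as a dictionary. It is plain Python so that it runs on its own; in a real project the function would be registered with the recipe decorator. The names choice_recipe and add_options, the sample texts, and the returned keys ("dataset", "stream", "view_id") are assumptions for illustration and should be checked against the documentation.

```python
from typing import Dict, Iterable, Iterator, List


def add_options(stream: Iterable[Dict], labels: List[str]) -> Iterator[Dict]:
    """Attach the same multiple-choice options to every incoming example."""
    options = [{"id": label, "text": label} for label in labels]
    for example in stream:
        # Build a new dict so the original example is left untouched
        yield {**example, "options": options}


def choice_recipe(dataset: str, texts: Iterable[str], labels: List[str]) -> Dict:
    """Describe a workflow: where answers go, what to show, and how."""
    stream = add_options(({"text": text} for text in texts), labels)
    return {
        "dataset": dataset,   # name of the dataset to save answers to
        "stream": stream,     # lazy generator of annotation tasks
        "view_id": "choice",  # assumed name of a multiple-choice interface
    }


components = choice_recipe(
    "annotations",
    ["New phone released today.", "Jane Doe joins the board."],
    ["PERSON", "PRODUCT"],
)
tasks = list(components["stream"])
print(tasks[0])
```

Because the stream is a generator, examples are only prepared as they are requested, which keeps the sketch usable for inputs that do not fit in memory.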

What expertise does my team need to use Prodigy?

Prodigy is designed as a developer tool and assumes basic familiarity with the Python programming language and the command line. We also provide extensive documentation and examples to help you get started. Once you’ve set up an annotation task, the web application makes it easy for anyone to create annotations, no programming experience required.

Which cloud providers does Prodigy support?

Prodigy provides a standard Python web server and application that can be deployed on any cloud provider of your choice, including entirely on-premise. You can read more about deployment options and instructions here.

Do you have special offers for researchers and universities?

We’re always happy to support research projects, and researchers at degree-granting academic institutions can apply for an interim license to use Prodigy for free. To claim your research license, email us and include your university details.