Large Language Models

Nothing is stopping you from integrating Prodigy with services that can help you annotate. This includes services that provide large language models with zero- and few-shot learning capabilities, like OpenAI. Prodigy even provides a few custom recipes to help you get started.

From OpenAI to Prodigy diagram

Named Entity Recognition

You can use ner.openai.correct to annotate examples with live suggestions from OpenAI. This recipe highlights entity predictions obtained from a large language model and lets you accept them as correct or curate them manually. Alternatively, you can fetch examples ahead of time: the ner.openai.fetch recipe gives you the same suggestions from OpenAI but downloads a large batch of examples upfront. These examples can then be annotated and corrected via the ner.manual recipe.
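To give a feel for what the fetched tasks look like, here is a minimal sketch of Prodigy's span-annotation format, which the NER recipes produce (the example text and labels are invented for illustration; the OpenAI recipes may attach extra metadata):

```python
# One annotation task in Prodigy's format: a "text" plus "spans" with
# character offsets and a label. The cooking-themed content below is
# made up for illustration.
example = {
    "text": "Sautee the onions in a cast-iron pan.",
    "spans": [
        {"start": 11, "end": 17, "label": "ingredient"},
        {"start": 23, "end": 36, "label": "equipment"},
    ],
}

# Sanity-check that each span's offsets line up with the text.
for span in example["spans"]:
    print(example["text"][span["start"]:span["end"]], "->", span["label"])
```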

Both recipes can be used to detect entities that spaCy models aren't trained on, and you're free to adapt them: you can provide examples to have OpenAI do few-shot learning, change the hyperparameters from the command line, or send your own custom prompts.
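As an illustration of what few-shot learning means here, the sketch below builds a toy NER prompt from labelled examples. It is not the actual template the recipes send to OpenAI; the function name and example data are made up:

```python
# Toy few-shot prompt builder: each example pairs a text with its
# labelled entities, and the prompt ends with the new input for the
# model to complete in the same pattern.
def build_prompt(examples, labels, text):
    lines = [f"Extract entities of type {', '.join(labels)} from the text."]
    for ex in examples:
        lines.append(f"Text: {ex['text']}")
        for label, terms in ex["entities"].items():
            lines.append(f"{label}: {', '.join(terms)}")
    lines.append(f"Text: {text}")
    return "\n".join(lines)

prompt = build_prompt(
    examples=[{"text": "Whisk the eggs in a copper bowl.",
               "entities": {"ingredient": ["eggs"], "equipment": ["copper bowl"]}}],
    labels=["ingredient", "equipment"],
    text="Grill the halloumi on a skillet.",
)
print(prompt)
```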

Read more

Example

prodigy ner.openai.fetch examples.jsonl openai-out.jsonl dish,ingredient,equipment -F ./recipes/ner.py

Example of entities OpenAI can pre-highlight


Text Classification

The textcat.openai.correct recipe lets you classify texts faster with help from OpenAI. It also provides a reason why a particular label was chosen. Just like the named entity recipes, you can choose to fetch examples upfront instead via the textcat.openai.fetch recipe.

Example

prodigy textcat.openai.fetch examples.jsonl openai-out.jsonl recipe,feedback,question -F ./recipes/textcat.py

Example response from OpenAI with reasoning

By fetching the examples upfront, you'll also be able to filter based on the OpenAI predictions. This can be incredibly useful when you're dealing with an imbalanced classification task. You can sample more from the examples that are rare by re-using the OpenAI predictions to filter candidates.
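A minimal sketch of that filtering step, assuming the fetched tasks store the chosen label in an "accept" list (Prodigy's convention for choice-style tasks; the exact field the fetch recipe writes may differ):

```python
import json

# Keep only the tasks that OpenAI assigned the rare label, so the
# annotation queue oversamples the class you care about.
def filter_rare(lines, label):
    for line in lines:
        task = json.loads(line)
        if label in task.get("accept", []):
            yield task

# Invented example rows standing in for lines of a fetched JSONL file.
rows = [
    '{"text": "Great recipe card!", "accept": ["feedback"]}',
    '{"text": "How do I load a model?", "accept": ["question"]}',
]
rare = list(filter_rare(rows, "question"))
print(len(rare))  # -> 1
```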

Read more

Generate terminology lists from scratch

There are many ways to use a large language model with zero-shot capabilities. You can have it make predictions to pre-annotate examples, but you can also have it bootstrap terminology lists via the terms.openai.fetch recipe. These terms can be reviewed so they can later be used for named entity recognition, span categorization or weak supervision.

Read more

Example

prodigy terms.openai.fetch "skateboard tricks" skateboard-tricks.jsonl -F ./recipes/terms.py

skateboard-tricks.jsonl
{"text": "kickflip", "meta": {"openai_query": "skateboard tricks"}}
{"text": "nose manual", "meta": {"openai_query": "skateboard tricks"}}
{"text": "heelside flip", "meta": {"openai_query": "skateboard tricks"}}
{"text": "ollie", "meta": {"openai_query": "skateboard tricks"}}
{"text": "frontside boardslide", "meta": {"openai_query": "skateboard tricks"}}
{"text": "5050 Grind", "meta": {"openai_query": "skateboard tricks"}}
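Once reviewed, a terms list like the one above can be turned into match patterns for ner.manual. The converter below is a sketch: it emits the {"label": ..., "pattern": ...} shape Prodigy and spaCy use for token patterns, with a made-up label name:

```python
import json

# Turn each term into a token-based match pattern, lowercasing tokens
# so matching is case-insensitive.
def terms_to_patterns(terms, label):
    patterns = []
    for term in terms:
        tokens = [{"lower": tok.lower()} for tok in term.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns

terms = ["kickflip", "nose manual", "5050 Grind"]
for pat in terms_to_patterns(terms, "skateboard-trick"):
    print(json.dumps(pat))
```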

Prompt A/B evaluation

The ab.openai.prompts recipe allows you to quickly compare the quality of outputs from two OpenAI prompts in a quantifiable and blind way. Given two prompts like the ones below and the accompanying input data, you get an annotation interface with candidates automatically generated by OpenAI for each prompt.

Example

prodigy ab.openai.prompts haiku input.jsonl templates/ab/input.jinja2 templates/ab/prompt1.jinja2 templates/ab/prompt2.jinja2 -F ./recipes/ab.py

Example of A/B prompt workflow

prompt1.jinja2
Write a haiku about {{topic}}.

prompt2.jinja2
Write a hilarious haiku about {{topic}}.

input.jsonl
{"id": 0, "prompt_args": {"topic": "Python"}}
{"id": 0, "prompt_args": {"topic": "star wars"}}
{"id": 0, "prompt_args": {"topic": "maths"}}
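The quantifiable part boils down to tallying which prompt annotators preferred across the blind comparisons. A minimal sketch, assuming each decision records the winning prompt under an "answer" key (an invented field name, not necessarily the recipe's exact output format):

```python
from collections import Counter

# Invented example decisions from a blind A/B annotation session.
decisions = [
    {"id": 0, "answer": "prompt1"},
    {"id": 1, "answer": "prompt2"},
    {"id": 2, "answer": "prompt2"},
]

# Count how often each prompt won.
tally = Counter(d["answer"] for d in decisions)
print(tally.most_common())  # prompt2 leads 2 to 1
```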
Read more
View the documentation