Large Language Models (LLMs)

Nothing is stopping you from integrating Prodigy with services that can help you annotate. This includes large language models, which allow you to provide a prompt in order to attempt an NLP task. Prodigy integrates with these models via the spacy-llm package and comes preconfigured with some recipes that you can use directly.

Quickstart

I want large language models to help me with Named Entity Recognition.

You can use the ner.openai.correct, ner.openai.fetch, ner.llm.correct and ner.llm.fetch recipes to pre-highlight text examples with NER annotations. These annotations typically deserve a review, but the large language model in these recipes is able to generate annotations for many entities that are not supported out of the box by pretrained spaCy pipelines.

You can learn more by checking the named entity section on this page.

I want large language models to help me with Text Classification.

You can use the textcat.openai.correct, textcat.openai.fetch, textcat.llm.correct and textcat.llm.fetch recipes to attach class predictions to text examples. These annotations typically deserve a review, but the large language model is able to generate labels that don’t require you to train your own model beforehand.

You can learn more by checking the text classification section on this page.

I want to use large language models to generate terminology lists.

The terms.llm.fetch and terms.openai.fetch recipes can accept a topic in order to generate a terminology list for you.

You can learn more by checking the terminology section on this page.

I want to leverage large language models to find interesting data subsets in my data.

The ner.llm.fetch, textcat.llm.fetch, ner.openai.fetch and textcat.openai.fetch recipes allow you to download predictions from large language models upfront. These predictions won’t be perfect, but they might allow you to select an interesting subset for manual review in Prodigy.

The most compelling use case for this is when you’re dealing with a rare label. Instead of going through all the examples manually you could instead only check the examples in which the LLM predicts the label of interest.

Be aware that it can be expensive to send many queries to an LLM vendor, but it can be worth the investment if this is a method to help you get started.
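As a minimal sketch of this kind of subset selection, the snippet below filters pre-fetched predictions for one label. It assumes that each line of the fetched JSONL file holds a "text" and a list of "spans" with a "label" key; the function name and the sample data are hypothetical and only meant for illustration.

```python
import json


def select_examples_with_label(lines, label):
    """Yield examples whose predicted spans include the given label."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        example = json.loads(line)
        predicted = {span.get("label") for span in example.get("spans", [])}
        if label in predicted:
            yield example


# Hypothetical lines, standing in for the contents of a fetched JSONL file.
fetched_lines = [
    '{"text": "Use a grill.", "spans": [{"start": 6, "end": 11, "label": "EQUIPMENT"}]}',
    '{"text": "Add some salt.", "spans": [{"start": 9, "end": 13, "label": "INGREDIENT"}]}',
    '{"text": "Looks great!", "spans": []}',
]
subset = list(select_examples_with_label(fetched_lines, "EQUIPMENT"))
print(len(subset))  # 1
```

In practice you would pass an open file handle instead of a list of strings, and write the selected examples to a new file for review.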

I want to compare the effectiveness of custom prompts.

There can be good reasons to write custom prompts for large language models. To help discover which prompts perform best you may consider using the ab.llm.tournament, ab.openai.prompts and the ab.openai.tournament recipes to compare prompts.

Experimental Technology

The large language model recipes mentioned below offer an integration with a new technology and it’s important to be aware of the limitations and consequences.

Some of the recipes integrate with a third-party vendor, like OpenAI. Using these will require you to send data to their servers. If you’re dealing with sensitive data, you may want to opt out of these recipes. Furthermore, LLM providers will generate responses based on a statistical model which cannot be trusted blindly. The dataset behind the model contains factual inaccuracies and potentially harmful stereotypes, and it lacks information on recent events. Vendors may also apply rate limits when their servers receive more traffic, which is something that the Prodigy recipes cannot control. OpenAI in particular has been known to do this.

The recipes provided by Prodigy are useful when there is a human in the loop, but the experimental and potentially non-local nature of the integration deserves to be highlighted upfront.

How do these recipes work?

Large language models, like those offered by OpenAI, can be used for text completion tasks. They allow you to input some text as a prompt, and the model will generate a text completion that tries to match whatever context was given.

This text-in, text-out interface means you can try to engineer a prompt that the large language model can use to perform a specific task that you’re interested in. While this approach is not perfect and still the subject of ongoing research, it does offer a general method to construct prompts for many typical NLP tasks such as NER or text classification.

As of v1.13, Prodigy integrates with large language models via the spacy-llm library. This project supports multiple large language model backends that all output structured data like a normal spaCy model.

These recipes include:

ner.llm.correct / ner.llm.fetch: review/download NER annotations performed by spacy-llm
spans.llm.correct / spans.llm.fetch: review/download spancat annotations by spacy-llm
textcat.llm.correct / textcat.llm.fetch: review/download text classification annotations by spacy-llm
terms.llm.fetch: download terms/phrases generated by an LLM
ab.llm.tournament: prompt engineering via tournament selection

These recipes combine large language models with Prodigy to aid the human in the loop by providing pre-annotated examples. The goal of the available recipes is to help you get started quicker but the recipes themselves, including their prompts, can be fully customized too.

Originally, as of v1.12, Prodigy offered seven recipes that interact with OpenAI directly. These include:

ner.openai.correct / ner.openai.fetch: review/download NER annotations
textcat.openai.correct / textcat.openai.fetch: review/download text classification annotations
terms.openai.fetch: retrieve terminology lists based on a query
ab.openai.prompts: compare two prompts for OpenAI in a blind taste test
ab.openai.tournament: compare many prompts for OpenAI in a tournament

All of these recipes handle the prompt generation for OpenAI as well as the response parsing in order to help you annotate.

Benefits of spacy-llm

There are also some other benefits of the spacy-llm approach that are worth highlighting.

When you use spacy-llm you’re free to switch LLM providers if you’d like to experiment with different vendors.
Some of the LLM backends that spacy-llm supports can be run on your own hardware, removing the need to send data to a third party.
The spacy-llm recipes allow you to use a cache that prevents the same costly prompt from being run twice.
The spacy-llm recipes will allow you to predict more than just text categories and named entities. Spans are currently supported, but other linguistic tasks will likely be added in the future as well.
The prompts from spacy-llm will likely be more up to date. The spaCy team is working directly on that project, which also means that it can iterate on better models quickly and independently.

Getting Started with Prodigy and spacy-llm New: 1.13

You can use spacy-llm to add LLM-powered components to a spaCy pipeline. This allows for an easy integration with Prodigy, because it’s already set up to work nicely with spaCy.

To use an LLM with spaCy you’ll need to start by creating a configuration file that tells spacy-llm how to construct a prompt for your task. Here’s one such example that could be used to find named entities in recipes.

Basic spacy-llm config
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["DISH", "INGREDIENT", "EQUIPMENT"]

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}

Let’s go over what this configuration file defines.

At the start of the file we’re defining an English spaCy pipeline with a single component called "llm". At this point in the configuration file it is not known what kind of pipeline component it is. All we know is the name.
Later in the file, we see that there is a definition for the llm component and that it uses a factory called "llm". This refers to a registered function that spacy-llm provides that can construct components that can interface with large language models. The file also configures save_io = true, which ensures that the LLM prompt and response are saved. Not every Prodigy recipe will need this, but some do, so it’s a good practice to always include it when you’re configuring spacy-llm pipelines for Prodigy.
Next, we see a definition for a “task”. In spacy-llm a task is a combination of a prompt generator and a response parser. The prompt generator will generate a prompt based on the inputs that you provide and the response parser makes sure that any response from the large language model is properly turned into structured data for spaCy. In this example we’re configuring a spacy.NER task, which allows us to provide labels. In this case the file is configured to detect DISH, INGREDIENT and EQUIPMENT entities.
Finally we also configure a backend, which is where we configure which LLM provider to use. You can choose to go with a paid vendor, like OpenAI, but you can also configure a local model, like Dolly, instead. If you’re going with a vendor, you’ll need to set up your environment variables so that you can identify yourself.
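To make the section structure described above a bit more tangible, here is a rough sketch that reads the same configuration text with Python’s standard library. This is only an approximation for inspection purposes: spaCy and spacy-llm use their own config system rather than configparser, and the read_config helper below is a hypothetical name.

```python
import configparser
import json

CONFIG_TEXT = """
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["DISH", "INGREDIENT", "EQUIPMENT"]

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}
"""


def read_config(text):
    """Parse the config text into {section: {key: value}} with JSON-decoded values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as "@llm_tasks" exactly as written
    parser.read_string(text)
    return {
        section: {key: json.loads(value) for key, value in parser[section].items()}
        for section in parser.sections()
    }


config = read_config(CONFIG_TEXT)
print(config["nlp"]["pipeline"])                # ['llm']
print(config["components.llm"]["save_io"])      # True
print(config["components.llm.task"]["labels"])  # ['DISH', 'INGREDIENT', 'EQUIPMENT']
```

The dotted section names show how the task and the model are nested under the llm component, which is the part of the file that you would typically change when switching tasks or providers.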

The importance of .env files

You might be using a vendor, like OpenAI, as a backend for your LLM. In such cases you’ll need to set up secrets so that you can identify yourself.

These secrets really need to be kept safe, which is why we recommend storing them as environment variables in a .env file. Here’s an example of such a file. You can consult the expected environment variable names for different providers in the spaCy documentation.

.env
OPENAI_API_ORG = "org-..."
OPENAI_API_KEY = "sk-..."

While this is a recommended way, Prodigy shouldn’t make assumptions on how the environment variables are managed. The user needs to make sure these variables are loaded before executing Prodigy. One way to do that is to load them via python-dotenv:

Example

dotenv run -- python -m prodigy recipe arguments

Please note that the OpenAI recipes do make this assumption and load the variables internally so this step should not be needed, but the variables need to be stored in .env. If there are environment variables missing you should see a helpful warning message that tells you which variables you need to add.

If you use an .env file, you should make sure that it is added to a .gitignore such that it never gets uploaded to a central repository. If somebody were to gain access to this key they might incur costs on your behalf with it.
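If you want to check the environment yourself before starting a recipe, a small helper along these lines could do it. This is a sketch, not something Prodigy ships: the function name is hypothetical, and the variable names are simply the ones from the .env example above.

```python
import os

REQUIRED_VARS = ("OPENAI_API_ORG", "OPENAI_API_KEY")  # names from the .env example


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing the environment in as an argument keeps the check easy to test without touching the real secrets.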

How it works

Under the hood, spacy-llm will take your configuration file and use it to write a prompt for the large language model when it is presented with an example to annotate. If we assume the following input:

{"text": "I know of a great pizza recipe with anchovies."}

Then in this particular case, the prompt may look something like this:

NER prompt sent to LLM
From the text below, extract the following entities in the following format:
dish: <comma delimited list of strings>
ingredient: <comma delimited list of strings>
equipment: <comma delimited list of strings>

Text:
"""
I know of a great pizza recipe with anchovies.
"""

After the large language model receives the prompt, it will process it and produce output, which might look like this.

NER response from LLM
dish: pizza
ingredient: anchovies
equipment:

The goal of spacy-llm is to handle the prompt generation and parsing on your behalf while supporting multiple LLM backends. The interface is just like a normal spaCy pipeline that you’re used to, but by supporting these large language models directly we may have an opportunity to make data annotation easier. It can remove the need to find a pre-trained model for the task that we’re interested in by leveraging an appropriate prompt instead.
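The two halves of this process, prompt generation and response parsing, can be sketched in a few lines of Python. This is a toy illustration in the style of the example above and not spacy-llm’s actual template or parser; both function names are hypothetical.

```python
def build_prompt(labels, text):
    """Build a prompt in the style of the example above (a toy, not spacy-llm's template)."""
    lines = ["From the text below, extract the following entities in the following format:"]
    lines += [f"{label.lower()}: <comma delimited list of strings>" for label in labels]
    lines += ["", "Text:", '"""', text, '"""']
    return "\n".join(lines)


def parse_response(response, text):
    """Turn 'label: a, b' lines into (start, end, label) character spans."""
    spans = []
    for line in response.splitlines():
        if ":" not in line:
            continue
        label, _, values = line.partition(":")
        for value in values.split(","):
            value = value.strip()
            start = text.find(value) if value else -1
            if start >= 0:  # skip empty values and strings that are not in the text
                spans.append((start, start + len(value), label.strip().upper()))
    return spans


text = "I know of a great pizza recipe with anchovies."
prompt = build_prompt(["DISH", "INGREDIENT", "EQUIPMENT"], text)
response = "dish: pizza\ningredient: anchovies\nequipment:"
print(parse_response(response, text))
# [(18, 23, 'DISH'), (36, 45, 'INGREDIENT')]
```

Note that the parser only keeps strings that literally occur in the input text, which is one simple way to guard against an LLM that returns entities it made up.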

Using spacy-llm pipelines directly

Before diving deeper into the spacy-llm recipes, it’s good to observe that you can also use these pipelines programmatically. If you have custom Python recipes, you’ll be able to directly assemble a spaCy pipeline from a config file.

from spacy_llm.util import assemble
from dotenv import load_dotenv

# Make sure the environment variables are loaded
load_dotenv()

# Assemble a spaCy pipeline from the config
nlp = assemble("config.cfg")

# Use this pipeline as you would normally
doc = nlp("I know of a great pizza recipe with anchovies.")
print(doc.ents) # (pizza, anchovies)

You can also use the spaCy assemble command from the terminal to generate a local nlp pipeline that you can load as well.

dotenv run -- spacy assemble config.cfg en_ner_cooking

This will save a folder on disk called en_ner_cooking that contains a spaCy pipeline that you can load like any other spaCy pipeline.

import spacy
from dotenv import load_dotenv

# Again, we have to make sure the environment variables are loaded
load_dotenv()

# Load saved LLM pipeline from disk
nlp = spacy.load("en_ner_cooking")

# Again, use this pipeline as you would normally
doc = nlp("I know of a great pizza recipe with anchovies.")
print(doc.ents) # (pizza, anchovies)

This means that, theoretically, you could immediately start re-using the saved pipeline in your recipes as you would normally. If, for example, you’d like to use ner.correct with this model, you can do so by running:

Example

dotenv run -- prodigy ner.correct ner_cooking en_ner_cooking examples.jsonl --component llm

This setup works, but as we’ll see later, the Prodigy integration with spacy-llm will make this much easier.

Using spacy-llm recipes for NER

Instead of building a spaCy model and storing it to disk manually, you can also directly use the *.llm.* recipes, which use spacy-llm under the hood. For NER that means that you can use the ner.llm.correct recipe to annotate data with an LLM model in the loop.

Example

dotenv run -- prodigy ner.llm.correct annotated-recipes spacy-llm-config.cfg examples.jsonl

This will start an interface that shows you the LLM predictions together with the prompt and response.

Example of LLM interface


Because this example is annotated correctly, you can simply accept the annotation without having to use your mouse to annotate the entities. This can save a tremendous amount of time, but it should be stressed that the LLM annotations can be wrong. You still want to be in the loop to curate the annotations.

Alternatively, you can also use ner.llm.fetch to download these annotations to disk so that you can later review them with the ner.manual recipe. This interface won’t give you the prompt information, but it does allow you to call the LLM exactly once, even if you restart the server and want to show the data to multiple annotators.

Example

dotenv run -- prodigy ner.llm.fetch spacy-llm-config.cfg examples.jsonl ner-annotated.jsonl
100%|████████████████████████████| 50/50 [00:12<00:00, 3.88it/s]

More spacy-llm configurations for NER

Let’s expand the spacy-llm configuration for named entity recognition.

Basic spacy-llm config
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["DISH", "INGREDIENT", "EQUIPMENT"]

[components.llm.task.label_definitions]
DISH = "Extract the name of a known dish."
INGREDIENT = "Extract the name of a cooking ingredient, including herbs and spices."
EQUIPMENT = "Extract any mention of cooking equipment. e.g. oven, cooking pot, grill"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_examples.yml"

[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-cached"
batch_size = 3
max_batches_in_mem = 10

There are three main additions to this configuration file.

It now has label definitions that help describe the annotation task via components.llm.task.label_definitions. This can help you give the large language model extra context and may yield more reliable results.
It now has few-shot examples via components.llm.task.examples. It is now configured to use a few-shot reader to load in examples from a file on disk. This allows you to add examples to the prompt, possibly examples that the LLM got wrong in the past, in an attempt to steer the LLM to what you want it to do.
It now has a cache via components.llm.cache. By configuring this, spacy-llm will store batches of documents in the local-cached folder so that you don’t have to incur costs when you rerun the same example.

For the best performance, we recommend passing both label definitions and few-shot examples to the prompt when you write your own configuration files.
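The idea behind the cache can be pictured with a small in-memory sketch. This is not how spacy.BatchCache.v1 is implemented (that cache stores batches of documents on disk); the class and the fake_llm function below are hypothetical and only show why a repeated prompt doesn’t have to cost anything.

```python
import hashlib


class PromptCache:
    """A toy in-memory cache keyed by a hash of the prompt text."""

    def __init__(self, call_llm):
        self.call_llm = call_llm  # hypothetical function: prompt -> response
        self.responses = {}
        self.calls = 0

    def __call__(self, prompt):
        key = hashlib.sha256(prompt.encode("utf8")).hexdigest()
        if key not in self.responses:
            self.calls += 1  # only count requests that actually reach the LLM
            self.responses[key] = self.call_llm(prompt)
        return self.responses[key]


def fake_llm(prompt):
    """Stand-in for a paid API call."""
    return "dish: pizza"


cached = PromptCache(fake_llm)
cached("Text: pizza")
cached("Text: pizza")  # served from the cache, no second call
print(cached.calls)  # 1
```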

The aforementioned few-shot examples need to be structured, and the expected structure will depend on the task that you’re running. More details can be found on the spacy-llm docs, but for NER it might look like this:

ner_examples.yml
- text: "You can't get a great chocolate flavor with carob."
  entities:
    INGREDIENT: ['carob']

- text: "You can probably sand-blast it if it's an anodized aluminum pan."
  entities:
    INGREDIENT: []
    EQUIPMENT: ['anodized aluminum pan']

Given such a file on disk, you will now use a different prompt when running the ner.llm.correct and ner.llm.fetch recipes.
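Because a misspelled label in the examples file is easy to miss, it can help to check the examples against the labels in your config before spending money on LLM calls. The sketch below assumes the examples have already been loaded into Python dictionaries with the same structure as the YAML file; the helper name is hypothetical and the typo in the sample data is deliberate.

```python
def find_unknown_labels(examples, labels):
    """Return (example index, label) pairs for labels that are not in the task's label set."""
    known = set(labels)
    problems = []
    for i, example in enumerate(examples):
        for label in example.get("entities", {}):
            if label not in known:
                problems.append((i, label))
    return problems


# The structure mirrors the YAML file, written out as Python data.
examples = [
    {"text": "You can't get a great chocolate flavor with carob.",
     "entities": {"INGREDIENT": ["carob"]}},
    {"text": "You can probably sand-blast it if it's an anodized aluminum pan.",
     "entities": {"INGREDIANT": [], "EQUIPMENT": ["anodized aluminum pan"]}},  # deliberate typo
]
print(find_unknown_labels(examples, ["DISH", "INGREDIENT", "EQUIPMENT"]))
# [(1, 'INGREDIANT')]
```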

The new prompt used under the hood
You are an expert Named Entity Recognition (NER) system. Your task is to accept Text as input and extract named entities for the set of predefined entity labels.

From the Text input provided, extract named entities for each label in the following format:

DISH: <comma delimited list of strings>
INGREDIENT: <comma delimited list of strings>
EQUIPMENT: <comma delimited list of strings>

Below are definitions of each label to help aid you in what kinds of named entities to extract for each label.
Assume these definitions are written by an expert and follow them closely.

DISH: Extract the name of a known dish.
INGREDIENT: Extract the name of a cooking ingredient, including herbs and spices.
EQUIPMENT: Extract any mention of cooking equipment. e.g. oven, cooking pot, grill

Below are some examples (only use these as a guide):

Text:
'''
You can't get a great chocolate flavor with carob.
'''

INGREDIENT: carob

Text:
'''
You can probably sand-blast it if it's an anodized aluminum pan.
'''

INGREDIENT:
EQUIPMENT: anodized aluminum pan


Here is the text that needs labeling:

Text:
'''
In Silicon Valley, a Voice of Caution Guides a High-Flying Uber
'''

You’ll notice that the prompt is now much longer than before. In theory this will give the LLM more context and may also allow it to give a better predictive performance.

More performance with

In general, it’s best to create a spacy-llm pipeline with detailed label descriptions and examples. It’s especially useful to add examples to the prompt that the LLM might otherwise get wrong. This tends to add a lot of context to the prompt which the LLM can use to respond appropriately. The only downside, in practice, is that the longer prompt may also slow down the pipeline and incur more compute costs on your behalf.

If you’re interested in making the LLM even more reliable then you might want to consider the spacy.NER.v3 task that uses chain-of-thought reasoning in the prompt to generate better annotations. It’s based on the PromptNER paper and it requires you to pass an example to the prompt.

Example

Here’s an example configuration that uses spacy.NER.v3.

config.cfg
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["DISH", "INGREDIENT", "EQUIPMENT"]
description = "Entities are the names food dishes, ingredients, and any kind of cooking equipment. Adjectives, verbs, adverbs are not entities. Pronouns are not entities."

[components.llm.task.label_definitions]
DISH = "Known food dishes, e.g. Lobster Ravioli, garlic bread"
INGREDIENT = "Individual parts of a food dish, including herbs and spices."
EQUIPMENT = "Any kind of cooking equipment. e.g. oven, cooking pot, grill"

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "few-shot-examples.json"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"

This configuration refers to a few-shot-examples.json file, which might have examples like below.

few-shot-examples.json
[
  {
    "text": "You can't get a great chocolate flavor with carob.",
    "spans": [
      {
        "text": "chocolate",
        "is_entity": false,
        "label": "==NONE==",
        "reason": "is a flavor in this context, not an ingredient"
      },
      {
        "text": "carob",
        "is_entity": true,
        "label": "INGREDIENT",
        "reason": "is an ingredient to add chocolate flavor"
      }
    ]
  },
  {
    "text": "You can probably sand-blast it if it's an anodized aluminum pan",
    "spans": [
      {
        "text": "sand-blast",
        "is_entity": false,
        "label": "==NONE==",
        "reason": "is a cleaning technique, not some kind of equipment"
      },
      {
        "text": "anodized aluminum pan",
        "is_entity": true,
        "label": "EQUIPMENT",
        "reason": "is a piece of cooking equipment, anodized is included since it describes the type of pan"
      }
    ]
  }
]

Let’s go over some of the differences of this setup compared to the previous configuration.

The spacy.NER.v3 configuration file also comes with a task description. This allows you to mention what you are and what you aren’t interested in detecting, which contributes more context to the prompt for the LLM.
The few-shot-examples.json file is more expressive than before. Each example allows you to pass a reason with every label and you’re also able to give negative examples to indicate when something isn’t an entity.

That last change especially contributes a lot of context for the LLM and it’s also reflected in the generated prompt.

Show the generated prompt

You are an expert Named Entity Recognition (NER) system.
Your task is to accept Text as input and extract named entities.
Entities must have one of the following labels: DISH, EQUIPMENT, INGREDIENT.
If a span is not an entity label it: `==NONE==`.


Entities are the names food dishes,
ingredients, and any kind of cooking equipment.
Adjectives, verbs, adverbs are not entities.
Pronouns are not entities.
Below are definitions of each label to help aid you in what kinds of named entities to extract for each label.
Assume these definitions are written by an expert and follow them closely.
DISH: Known food dishes, e.g. Lobster Ravioli, garlic bread
INGREDIENT: Individual parts of a food dish, including herbs and spices.
EQUIPMENT: Any kind of cooking equipment. e.g. oven, cooking pot, grill

Q: Given the paragraph below, identify a list of entities, and for each entry explain why it is or is not an entity:

Paragraph: You can't get a great chocolate flavor with carob.
Answer:
1. chocolate | False | ==NONE== | is a flavor in this context, not an ingredient
2. carob | True | INGREDIENT | is an ingredient to add chocolate flavor

Paragraph: You can probably sand-blast it if it's an anodized aluminum pan
Answer:
1. sand-blast | False | ==NONE== | is a cleaning technique, not some kind of equipment
2. anodized aluminum pan | True | EQUIPMENT | is a piece of cooking equipment, anodized is included since it describes the type of pan

Paragraph: I know of a great pizza recipe with anchovies.
Answer:

Show the generated response

1. pizza | True | DISH | is a known food dish
2. anchovies | True | INGREDIENT | is an ingredient used in the pizza recipe

Notice how the response now contains a table-like format in the list? The task can deal with this because it comes with a parser that can handle this input, but it serves as a nice example of how these LLMs can really generate different outputs that might suit a task better.
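To give a feel for what parsing such a table-like response involves, here is a simplified sketch. It is not the parser that spacy-llm ships; the function name is hypothetical and the format is assumed to be exactly "number. text | True/False | LABEL | reason" on each line.

```python
def parse_cot_response(response):
    """Parse lines like '1. pizza | True | DISH | reason' into dictionaries."""
    rows = []
    for line in response.splitlines():
        # Split at most three times so that a "|" inside the reason is preserved.
        parts = [part.strip() for part in line.split("|", 3)]
        if len(parts) != 4:
            continue  # skip anything that doesn't follow the expected format
        number, _, span_text = parts[0].partition(".")
        if not number.strip().isdigit():
            continue
        rows.append({
            "text": span_text.strip(),
            "is_entity": parts[1].lower() == "true",
            "label": parts[2],
            "reason": parts[3],
        })
    return rows


response = (
    "1. chocolate | False | ==NONE== | is a flavor in this context, not an ingredient\n"
    "2. carob | True | INGREDIENT | is an ingredient to add chocolate flavor"
)
rows = parse_cot_response(response)
print([row["text"] for row in rows if row["is_entity"]])  # ['carob']
```

Only the rows marked as True would end up as entities, while the negative rows mainly serve as reasoning context for the model.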

In general, the most recent version of a task is the one to try first. Keep in mind, though, that longer prompts can be more expensive. If costs become a burden, it can be worth downgrading to a task that generates shorter prompts.

Using spacy-llm recipes for spans

If you’re interested in annotating spans that can overlap, you can use the spans.llm.correct and spans.llm.fetch recipes. These recipes are very similar to their ner.llm.correct and ner.llm.fetch counterparts, but their configuration allows span overlap and can be used to train models for span categorisation. In our example, that might mean that you’d also be interested in detecting an ingredient if it was part of the name of a dish.

The main difference, compared to the previous configuration file, is that you’d use the spacy.SpanCat task instead of spacy.NER.

Here’s what a revised configuration file might look like.

Basic spacy-llm config for spans
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.SpanCat.v2"
labels = ["DISH", "INGREDIENT", "EQUIPMENT"]

[components.llm.task.label_definitions]
DISH = "Extract the name of a known dish."
INGREDIENT = "Extract the name of a cooking ingredient, including herbs and spices."
EQUIPMENT = "Extract any mention of cooking equipment. e.g. oven, cooking pot, grill"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "span_examples.yaml"

[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-cached"
batch_size = 3
max_batches_in_mem = 10

This file refers to a span_examples.yaml file, which might look like this:

span_examples.yaml
- text: 'Mac and Cheese is a popular American pasta variant.'
  entities:
    INGREDIENT: ['Cheese']
    DISH: ['Mac and Cheese']

This configuration will generate a slightly different prompt mainly to make it clear that the spans can overlap.

What does the prompt look like now?

Example prompt for spans
You are an expert Named Entity Recognition (NER) system. Your task is to accept
Text as input and extract named entities for the set of predefined entity
labels.

The entities you extract for each label can overlap with each other.

From the Text input provided, extract named entities for each label in
the following format:

DISH: <comma delimited list of strings>
EQUIPMENT: <comma delimited list of strings>
INGREDIENT: <comma delimited list of strings>

Below are definitions of each label to help aid you in what kinds of named entities
to extract for each label. Assume these definitions are written by an expert and
follow them closely.

DISH: Extract the name of a known dish.
INGREDIENT: Extract the name of a cooking ingredient, including herbs and spices.
EQUIPMENT: Extract any mention of cooking equipment. e.g. oven, cooking pot, grill.


Below are some examples (only use these as a guide):

Text:
'''
Mac and Cheese is a popular American pasta
variant.
'''

INGREDIENT: Cheese
DISH: Mac and Cheese


Here is the text that needs labeling:

Text:
'''
Spaghetti Bolognaise is a dish.
'''

What does the response look like now?

Example response
INGREDIENT: Spaghetti
DISH: Spaghetti Bolognaise
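To see what overlap means in terms of character offsets, the following sketch maps such a response back onto the input text. It is a simplified illustration, not spacy-llm’s implementation: the helper names are hypothetical and only the first occurrence of each string is matched.

```python
def find_spans(text, entities):
    """Map {label: [strings]} onto character offsets; spans are allowed to overlap."""
    spans = []
    for label, values in entities.items():
        for value in values:
            start = text.find(value)  # first occurrence only, to keep the sketch short
            if start >= 0:
                spans.append({"start": start, "end": start + len(value), "label": label})
    return sorted(spans, key=lambda span: (span["start"], span["end"]))


def overlaps(a, b):
    """Two spans overlap when each one starts before the other ends."""
    return a["start"] < b["end"] and b["start"] < a["end"]


text = "Spaghetti Bolognaise is a dish."
spans = find_spans(text, {"INGREDIENT": ["Spaghetti"], "DISH": ["Spaghetti Bolognaise"]})
print(spans)
# [{'start': 0, 'end': 9, 'label': 'INGREDIENT'}, {'start': 0, 'end': 20, 'label': 'DISH'}]
print(overlaps(spans[0], spans[1]))  # True
```

An NER task would have to pick one of these two spans, whereas a span categorisation task can keep both.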

The overlapping nature of these spans is also reflected in the annotation interface when you use the spans.llm.correct recipe.

Example

dotenv run -- prodigy spans.llm.correct annotated-recipes config.cfg examples.jsonl

Note how the provided annotations are now nested.


Just like before, you may also choose to fetch these examples upfront using the spans.llm.fetch recipe. This is the *.fetch variant of the original *.correct recipe.

Example

dotenv run -- prodigy spans.llm.fetch config.cfg examples.jsonl spancat-annotated.jsonl
100%|████████████████████████████| 50/50 [00:12<00:00, 3.88it/s]

More performance with spacy.SpanCat.v3

Just like with named entities, you can add chain of thought reasoning for spans too via the spacy.SpanCat.v3 task. Like before, the setup is largely the same but you’ll be required to add examples to the config.cfg file if you plan on using the v3 task.

What might the new config file look like?

[paths]
examples = null

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.SpanCat.v3"
labels = ["DISH", "INGREDIENT", "EQUIPMENT"]
description = Entities are the names food dishes,
    ingredients, and any kind of cooking equipment.
    Adjectives, verbs, adverbs are not entities.
    Pronouns are not entities.

[components.llm.task.label_definitions]
DISH = "Known food dishes, e.g. Lobster Ravioli, garlic bread"
INGREDIENT = "Individual parts of a food dish, including herbs and spices."
EQUIPMENT = "Any kind of cooking equipment. e.g. oven, cooking pot, grill"

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "llm-examples.json"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"

What might the new examples file look like?

[
  {
    "text": "Spaghetti Bolognaise is a great dish.",
    "spans": [
      {
        "text": "Spaghetti",
        "is_entity": true,
        "label": "INGREDIENT",
        "reason": "It is part of the dish name, but it indicates a key ingredient."
      },
      {
        "text": "Spaghetti Bolognaise",
        "is_entity": true,
        "label": "DISH",
        "reason": "It is the name of a popular pasta dish."
      }
    ]
  }
]

What might the prompt look like now?

You are an expert Entity Recognition system.
Your task is to accept Text as input and extract named entities.
The entities you extract can overlap with each other.

Entities must have one of the following labels: DISH, EQUIPMENT, INGREDIENT.
If a span is not an entity label it: `==NONE==`.


Entities are the names food dishes,
ingredients, and any kind of cooking equipment.
Adjectives, verbs, adverbs are not entities.
Pronouns are not entities.
Below are definitions of each label to help aid you in what kinds of named entities to extract for each label.
Assume these definitions are written by an expert and follow them closely.
DISH: Known food dishes, e.g. Lobster Ravioli, garlic bread
INGREDIENT: Individual parts of a food dish, including herbs and spices.
EQUIPMENT: Any kind of cooking equipment. e.g. oven, cooking pot, grill


Q: Given the paragraph below, identify a list of entities, and for each entry explain why it is or is not an entity:

Paragraph: Spaghetti Bolognaise is a great dish.
Answer:
1. Spaghetti | True | INGREDIENT | It is part of the dish name, but it indicates a key ingredient.
2. Spaghetti Bolognaise | True | DISH | It is the name of a popular pasta dish.

Paragraph: Spaghetti Bolognaise is a great dish.
Answer:

What might the response look like now?

1. Spaghetti | True | INGREDIENT | It is part of the dish name, but it indicates a key ingredient.
2. Spaghetti Bolognaise | True | DISH | It is the name of a popular pasta dish.
3. Bolognaise | False | NONE | While it is part of the dish name, it refers to the specific type of sauce used in the dish and not a standalone entity.
4. great dish | False | NONE | These are adjectives describing the dish and not entities.
5. Spaghetti Bolognaise | True | DISH | It is the name of a popular pasta dish.

You might notice that some candidates appear in the response but are listed as False. These are suggestions that the Prodigy interface will not render.

Using spacy-llm recipes for Textcat

The textcat.llm.correct and textcat.llm.fetch recipes are similar to their NER counterparts but can perform annotation for text categorisation tasks. That means that you could take a spacy-llm configuration file like below:

spacy-llm-config.cfg for text categorisation
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.TextCat.v3"
labels = ["RECIPE", "FEEDBACK", "QUESTION"]
exclusive_classes = false

[components.llm.task.label_definitions]
RECIPE = "Cooking instructions for a dish."
FEEDBACK = "Comments that might inform the author."
QUESTION = "A question is being asked to the author."

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}

[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-cached"
batch_size = 3
max_batches_in_mem = 10

This configuration file uses the spacy.TextCat.v3 task, which comes with different parameters than its NER counterpart. Specifically, you’ll notice that we’ve set exclusive_classes to false. For text classification we need to specify if the labels are exclusive (meaning they cannot overlap) or if they can be modelled as a set of binary classes. This is also reflected in the prompt that is generated.

Generated textcat prompt
You are an expert Text Classification system. Your task is to accept Text as input
and provide a category for the text based on the predefined labels.

Classify the text below to any of the following labels: RECIPE, FEEDBACK, QUESTION

The task is non-exclusive, so you can provide more than one label as long as
they're comma-delimited. For example: Label1, Label2, Label3.
Do not put any other text in your answer, only one or more of the provided labels with nothing before or after.
If the text cannot be classified into any of the provided labels, answer `==NONE==`.

Below are definitions of each label to help aid you in correctly classifying the text.
Assume these definitions are written by an expert and follow them closely.

RECIPE: Cooking instructions for a dish.
FEEDBACK: Comments that might inform the author.
QUESTION: A question is being asked to the author.


Here is the text that needs classification


Text:
'''
Cream cheese is really good in mashed potatoes.
'''
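The prompt above asks the model to reply with comma-delimited labels, or with `==NONE==` when nothing applies. As a rough sketch of what handling such a reply involves (this is not spacy-llm's own parsing code, and `parse_textcat_reply` is a hypothetical helper), the reply can be mapped back onto the configured labels like this:

```python
def parse_textcat_reply(reply: str, labels: list[str]) -> list[str]:
    """Return the configured labels that appear in a comma-delimited reply."""
    cleaned = reply.strip()
    if cleaned == "==NONE==":
        return []
    predicted = {part.strip().upper() for part in cleaned.split(",")}
    # Assumption: anything outside the configured label set is ignored.
    return [label for label in labels if label in predicted]


labels = ["RECIPE", "FEEDBACK", "QUESTION"]
print(parse_textcat_reply("FEEDBACK", labels))
print(parse_textcat_reply("Question, Recipe", labels))
print(parse_textcat_reply("==NONE==", labels))
```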

To use this configuration file directly you may use the textcat.llm.correct recipe to curate the annotations given by the large language model.

Example

dotenv run -- prodigy textcat.llm.correct annotated-recipes spacy-llm-config.cfg examples.jsonl


Alternatively you may also choose to fetch them upfront, so that the annotations can be used later in textcat.manual.

Example

dotenv run -- prodigy textcat.llm.fetch spacy-llm-config.cfg examples.jsonl textcat-annotated.jsonl
100%|████████████████████████████| 50/50 [00:12<00:00, 3.88it/s]

LLMs for terminology lists New: 1.13.2

The terms.llm.fetch recipe can generate terms and phrases obtained from a large language model. These terms and phrases can then be curated and turned into patterns files, which can help with downstream annotation tasks.

To get started, you’ll need to set up a configuration file for spaCy LLM.

Example spacy-llm config for terms
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "prodigy.Terms.v1"
batch_size = 50

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}

This configuration file describes the task as well as the backend to use. Because we’re interested in generating a lot of diverse terms, it may be helpful to set the “temperature” setting higher than you might be used to. This will cause the LLM to give varied responses on each request.

From here you can use the terms recipe by describing the topic that you’d like to generate terms for. The example below demonstrates how to generate “skateboard tricks”.

Example

prodigy terms.llm.fetch skateboard-trick-terms config.cfg "skateboard tricks"
100%|████████████████████████████| 50/50 [00:12<00:00, 3.88it/s]

This will generate a list of skateboard tricks that are stored in the skateboard-trick-terms dataset.

The examples that have been generated may look something like this:

{"text":"pop shove it","meta":{"topic":"skateboard tricks"}}
{"text":"switch flip","meta":{"topic":"skateboard tricks"}}
{"text":"nose slides","meta":{"topic":"skateboard tricks"}}
{"text":"lazerflip","meta":{"topic":"skateboard tricks"}}
{"text":"lipslide","meta":{"topic":"skateboard tricks"}}

Given such a file you’re able to review the generated terms via the textcat.manual recipe.

Example

prodigy textcat.manual skateboard-tricks skateboard-tricks.jsonl --label tricks


From this interface you can manually accept or reject each example. Then, when you’re done annotating, you can export the annotated text into a patterns file via the terms.to-patterns recipe.

Example

prodigy terms.to-patterns skateboard-tricks ./skateboard-patterns.jsonl --label skateboard-trick --spacy-model blank:en
✨ Exported 129 patterns
./skateboard-patterns.jsonl

This will generate a file with patterns, like those shown below.

{"label":"skateboard-trick","pattern":[{"lower":"pop"},{"lower":"shove"},{"lower":"it"}]}
{"label":"skateboard-trick","pattern":[{"lower":"switch"},{"lower":"flip"}]}
{"label":"skateboard-trick","pattern":[{"lower":"nose"},{"lower":"slides"}]}
{"label":"skateboard-trick","pattern":[{"lower":"lazerflip"}]}
{"label":"skateboard-trick","pattern":[{"lower":"lipslide"}]}

From here, the skateboard-patterns.jsonl file can be used in recipes, like ner.manual, to make the annotation task easier.
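To illustrate the shape of the patterns shown above, here is a small sketch that turns a term into a token pattern. It is not how terms.to-patterns is implemented: the recipe tokenizes with the spaCy model passed via --spacy-model, whereas this sketch assumes that plain whitespace splitting is a good enough stand-in. The `term_to_pattern` function is a hypothetical name.

```python
import json


def term_to_pattern(term: str, label: str) -> dict:
    """Build a lowercase token pattern for a single term."""
    # Assumption: whitespace splitting stands in for the spaCy tokenizer.
    tokens = [{"lower": token.lower()} for token in term.split()]
    return {"label": label, "pattern": tokens}


terms = ["pop shove it", "switch flip", "lazerflip"]
for term in terms:
    pattern = term_to_pattern(term, "skateboard-trick")
    print(json.dumps(pattern, separators=(",", ":")))
```

Matching on the `lower` attribute means the pattern will also match differently cased variants, such as "Pop Shove It".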

Prompt engineering via tournaments New: 1.13.2

Sometimes you’d like to compare and benchmark prompts for a specific task. You could facilitate this with an A/B test, but if you have a large pool of prompts you may prefer to use a tournament to figure out the best performing candidate. This is especially helpful when you’re not just comparing prompts, but also different LLM backends.

This is where the ab.llm.tournament recipe might help. It uses the Glicko rating system internally to determine the duels as well as the best performing prompt/LLM combination.

As an example, let’s assume that we want to write humorous haikus about a given topic. Then you could create two Jinja2 templates that can each accept a topic, yet construct a different prompt.

prompts/prompt1.jinja2
Write a haiku about {{topic}} that rhymes.

prompts/prompt2.jinja2
Write a hilarious haiku about {{topic}} that rhymes.

These prompts both require a topic to be injected, which you can provide via a .jsonl file.

inputs.jsonl
{"topic": "Python"}
{"topic": "star wars"}
{"topic": "maths"}

Next, the recipe requires a spacy-llm configuration file. If you want to compare more than one LLM backend, you can prepare a folder of these files instead. As an example, let’s configure one file to use OpenAI and another to use Cohere.

configs/gpt3-5.cfg
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "prodigy.TextPrompter.v1"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.3}

configs/cohere.cfg
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "prodigy.TextPrompter.v1"

[components.llm.model]
@llm_models = "spacy.Command.v1"
name = "command"
config = {"temperature": 0.1}
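With two prompt templates and two configuration files, the tournament has four prompt/backend candidates to rank. The sketch below only enumerates those combinations and the possible pairings between them; how the recipe actually schedules duels is decided by its rating system and is not reproduced here.

```python
from itertools import combinations, product

prompts = ["prompt1.jinja2", "prompt2.jinja2"]
configs = ["gpt3-5.cfg", "cohere.cfg"]

# Every prompt is paired with every backend configuration.
candidates = [f"[{prompt} + {config}]" for prompt, config in product(prompts, configs)]

# Every unordered pair of candidates is a possible duel.
duels = list(combinations(candidates, 2))

print(len(candidates), "candidates,", len(duels), "possible duels")
for candidate in candidates:
    print(candidate)
```

The number of possible duels grows quadratically with the number of candidates, which is why a rating-based tournament is more practical than exhaustively comparing every pair many times.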

Finally, as a nice touch, the recipe can also render some extra information to help give the user some context. This is also handled by a jinja2 template.

display-template.jinja2
Select the best haiku about {{topic}}.

You can now build a tournament to try all the different combinations of prompts and backends by calling the ab.llm.tournament recipe as follows:

Example

prodigy ab.llm.tournament haiku-tournament inputs.jsonl ./prompts ./configs display-template.jinja2 --resume

From here, the recipe will keep generating candidates and will present them to you in the annotation interface.

As you annotate and choose between the candidates you’ll also get a summary printed on the terminal.

Output after annotating a few examples

============== Current winner: [prompt1.jinja2 + gpt3-5.cfg] ==============

comparison                                                      prob   trials
[prompt1.jinja2 + gpt3-5.cfg] > [prompt1.jinja2 + cohere.cfg]   0.50        0
[prompt1.jinja2 + gpt3-5.cfg] > [prompt2.jinja2 + cohere.cfg]   0.50        0
[prompt1.jinja2 + gpt3-5.cfg] > [prompt2.jinja2 + gpt3-5.cfg]   0.71        1

Initially this table will show low trial counts as well as small probability values. As you annotate more, however, these numbers will converge and the tournament will pick the winning candidates more often.

Output after annotating more examples, ratings will converge

============== Current winner: [prompt1.jinja2 + gpt3-5.cfg] ==============

comparison                                                      prob   trials
[prompt1.jinja2 + gpt3-5.cfg] > [prompt1.jinja2 + cohere.cfg]   0.55       23
[prompt1.jinja2 + gpt3-5.cfg] > [prompt2.jinja2 + cohere.cfg]   0.82       18
[prompt1.jinja2 + gpt3-5.cfg] > [prompt2.jinja2 + gpt3-5.cfg]   0.91       12
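The recipe's exact computation is not shown on this page, so the following is only a sketch of how a Glicko-1 style rating can be turned into a win probability. It uses the expected-outcome formula from the Glicko-1 system. The starting values (rating 1500, deviation 350) are the conventional Glicko-1 defaults, and it is an assumption that the recipe uses the same ones.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Dampening factor: larger rating deviations pull results towards 0.5."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q ** 2) * (rd ** 2) / math.pi ** 2)


def win_probability(r_a: float, rd_a: float, r_b: float, rd_b: float) -> float:
    """Expected outcome of candidate A against candidate B."""
    combined_rd = math.sqrt(rd_a ** 2 + rd_b ** 2)
    return 1.0 / (1.0 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400))


# Two candidates without any annotations yet (assumed defaults).
print(win_probability(1500, 350, 1500, 350))

# The same 100 point rating gap, first with high and then with low uncertainty.
print(round(win_probability(1600, 350, 1500, 350), 3))
print(round(win_probability(1600, 50, 1500, 50), 3))
```

Two unrated candidates come out at exactly 0.5, and the same rating gap yields a more decisive probability once the rating deviations shrink, which is the kind of convergence the tables above describe.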

Getting Started with OpenAI and Prodigy New: 1.12

If you want to get started with OpenAI and Prodigy you’ll want to set everything up so that you can work swiftly but also securely.

Account Setup: You will need to set up an account for OpenAI, which you can do here. You can choose to use a free account as you’re testing the service but you can consider paying as you go as well. Their pricing page gives lots of details. Be aware that new accounts have some extra rate-limit restrictions which are described in detail here.
Keys and a .env file: Once your account is set up it’s time to set up API keys, which you can do here.

The Prodigy recipes will assume that your keys are stored in a local .env file in your current working directory. It needs to contain a PRODIGY_OPENAI_KEY, which you’ve just created and a PRODIGY_OPENAI_ORG which you can find here. This is what the .env file would look like:
.env
```
PRODIGY_OPENAI_ORG = "org-..."
PRODIGY_OPENAI_KEY = "sk-..."
```
You should make sure that this dotenv file is added to a .gitignore such that it never gets uploaded. If somebody were to gain access to this key they might incur costs on your behalf with it.
Keep an eye on costs: the OpenAI service will incur costs on every request that you make. In order to prevent a large unexpected bill, we recommend setting a spending cap on the API. This can make sure you never spend more than a predefined amount per month. You can configure this by going to the usage limits section of the account page on OpenAI.
(Optional) Customise a recipe: The recipes provided by Prodigy are designed to be generally useful, but there can be good reasons to go beyond the zero-shot defaults. The NER and textcat recipes allow you to contribute examples to the prompt which might improve the output of OpenAI. These recipes also allow you to write custom prompts for OpenAI, which can be relevant if you’re interested in generating responses for non-English languages.
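Before starting a recipe it can help to confirm that both variables from the setup steps above are present. The sketch below is a simplified check written for illustration: `parse_dotenv` and `missing_keys` are hypothetical helpers, and the parsing is a rough approximation that does not cover everything a real dotenv loader supports.

```python
REQUIRED_KEYS = ("PRODIGY_OPENAI_ORG", "PRODIGY_OPENAI_KEY")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY = "value" lines, ignoring blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# An incomplete file: the API key line is missing.
example = 'PRODIGY_OPENAI_ORG = "org-..."\n'
print(missing_keys(parse_dotenv(example)))
```

In practice you would read the text from the .env file in your working directory, for example with `pathlib.Path(".env").read_text()`.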

Usage

This section explains how each OpenAI recipe works in detail by describing the prompt that is sent to OpenAI as well as the response that is returned.

OpenAI for NER New: 1.12

The ner.openai.correct and ner.openai.fetch recipes leverage OpenAI to pre-annotate named entities in text that you provide. As a motivating example, let’s assume that we have a set of example texts that contain comments on a food recipe blog. It might have an example that looks like this:

examples.jsonl
{"text": "Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok."}

Given an examples file, you can use ner.openai.correct to help with annotating.

Example

prodigy ner.openai.correct recipe-ner examples.jsonl --label dish,ingredient,equipment

Internally, this recipe will take the provided labels ("dish", "ingredient" and "equipment") together with the provided text in each example to generate a prompt for OpenAI. Here’s what such a prompt would look like:

NER prompt sent to OpenAI
From the text below, extract the following entities in the following format:
dish: <comma delimited list of strings>
ingredient: <comma delimited list of strings>
equipment: <comma delimited list of strings>

Text:
"""
Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok.
"""

This prompt is sent to OpenAI, which will return with a response. This response is not deterministic, but it might look something like this:

NER response from OpenAI
dish: hoisin stir fry
ingredient: Sriracha sauce
equipment: wok
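To show what has to happen between this response and the highlighted spans in the interface, here is a minimal sketch that parses the response and looks up character offsets in the original text. It is an illustration under simplifying assumptions, not Prodigy's implementation: `parse_ner_response` and `find_spans` are hypothetical names, and only the first occurrence of each phrase is matched.

```python
def parse_ner_response(response: str) -> dict[str, list[str]]:
    """Turn 'label: a, b' lines into a mapping of label to phrases."""
    entities = {}
    for line in response.strip().splitlines():
        label, sep, values = line.partition(":")
        if not sep:
            continue  # not a 'label: values' line
        entities[label.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return entities


def find_spans(text: str, entities: dict[str, list[str]]) -> list[dict]:
    """Locate each phrase in the text and return character-offset spans."""
    spans = []
    for label, phrases in entities.items():
        for phrase in phrases:
            start = text.find(phrase)  # assumption: first occurrence only
            if start == -1:
                continue  # the model returned text that is not in the input
            spans.append({"start": start, "end": start + len(phrase), "label": label})
    return sorted(spans, key=lambda span: span["start"])


text = "Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok."
response = """
dish: hoisin stir fry
ingredient: Sriracha sauce
equipment: wok
"""

for span in find_spans(text, parse_ner_response(response)):
    print(span["label"], "->", text[span["start"]:span["end"]])
```

Phrases that cannot be found in the input are skipped, which is one reason why the model's suggestions still need a human review.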

If you use the ner.openai.correct recipe then you’ll be able to see this prompt and response from inside Prodigy.


Because this example is annotated correctly, you can simply accept the annotation without having to use your mouse to annotate the entities. This can save a tremendous amount of time, but it should be stressed that the OpenAI annotations can be wrong. You still want to be in the loop to curate the annotations.

Alternatively, you can also use ner.openai.fetch to download these annotations to disk so that you can later review them with the ner.manual recipe. This interface won’t show you the prompt information, but it does allow you to call OpenAI only once, even if you want to show the data to multiple annotators.

Example

prodigy ner.openai.fetch examples.jsonl ner-annotated.jsonl "dish,ingredient,equipment"
100%|████████████████████████████| 50/50 [00:12<00:00, 3.88it/s]

OpenAI for Text Classification New: 1.12

The textcat.openai.correct and textcat.openai.fetch recipes leverage OpenAI to attach classification labels in text that you provide. As a motivating example, let’s assume that we have a set of example texts that contain comments on a food recipe blog. It might have an example that looks like this:

examples.jsonl
{"text": "Cream cheese is really good in mashed potatoes."}

Given an examples file, you can use textcat.openai.correct to help with annotating labels.

Example

prodigy textcat.openai.correct recipe-comments-textcat examples.jsonl --label recipe,feedback,question

Internally, this recipe will take the provided labels ("recipe", "feedback" and "question") together with the provided text in each example to generate a prompt for OpenAI. Here’s what such a prompt would look like:

Textcat prompt sent to OpenAI
Classify the text below to any of the following labels: recipe, feedback, question

The task is non-exclusive, so you can provide more than one label as long as they are comma-delimited.
For example: Label1, Label2, Label3.

Your answer should only be in the following format:

answer:
reason:

Here is the text that needs classification

Text:
"""
Cream cheese is really good in mashed potatoes.
"""

This prompt is sent to OpenAI, which will return with a response. This response is not deterministic, but it might look something like this:

Textcat response from OpenAI
Answer: Feedback
Reason: The text does not provide instructions on how to make something, nor does it ask a question. Instead, it
provides an opinion on the use of cream cheese in mashed potatoes.
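The response follows the answer/reason layout requested in the prompt, and the reason may wrap over several lines. A minimal sketch of parsing such a response is shown below; `parse_answer` is a hypothetical helper and not the parser that Prodigy uses.

```python
def parse_answer(response: str) -> dict:
    """Split an 'answer: ... reason: ...' response into labels and a reason."""
    fields = {"answer": "", "reason": ""}
    current = None
    for line in response.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in fields:
            current = key.strip().lower()
            fields[current] = value.strip()
        elif current:
            # A line without a known key continues the previous field.
            fields[current] += " " + line.strip()
    labels = [part.strip().lower() for part in fields["answer"].split(",") if part.strip()]
    return {"labels": labels, "reason": fields["reason"]}


response = """Answer: Feedback
Reason: The text does not provide instructions on how to make something, nor does it ask a question. Instead, it
provides an opinion on the use of cream cheese in mashed potatoes."""

parsed = parse_answer(response)
print(parsed["labels"])
print(parsed["reason"])
```

Lowercasing the answer makes it comparable with the labels passed on the command line, since the model may capitalise them differently.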

If you use the textcat.openai.correct recipe then you’ll be able to see this prompt and response from inside Prodigy. Note that you can see the parsed reason from OpenAI as meta information in the lower right hand corner of the card.


Alternatively, you can also use textcat.openai.fetch to download these annotations to disk so that you can later review them with the textcat.manual recipe. This interface won’t show you the prompt information, but it does allow you to call OpenAI only once, even if you want to show the data to multiple annotators.

Example

prodigy textcat.openai.fetch examples.jsonl textcat-annotated.jsonl --label recipe,feedback,question
100%|████████████████████████████| 50/50 [00:12<00:00, 3.88it/s]

OpenAI for Terminology Lists New: 1.12

The terms.openai.fetch recipe can generate terms and phrases obtained from OpenAI. These terms can then be curated and turned into patterns files, which can help with downstream annotation tasks. To get started, you need to make a query to send to OpenAI. The example below demonstrates how to generate at least 100 examples of “skateboard tricks”.

Example

prodigy terms.openai.fetch "skateboard tricks" skateboard-tricks.jsonl --n 100

Alternatively, you may also want to steer the response from OpenAI by providing some examples. You can add such seed terms via the --seeds option.

Example with seeds

prodigy terms.openai.fetch "skateboard tricks" skateboard-tricks.jsonl --n 100 --seeds ollie,kickflip

If you would like to generate more examples to add to the generated file, you can re-run the same command with the --resume flag. This will re-use the existing examples as seeds for the prompt to OpenAI.

Example that resumes an existing file

prodigy terms.openai.fetch "skateboard tricks" skateboard-tricks.jsonl --n 50 --resume

After generating the examples, you’ll have a skateboard-tricks.jsonl file that has contents that might look like this:

{"text":"pop shove it","meta":{"openai_query":"skateboard tricks"}}
{"text":"switch flip","meta":{"openai_query":"skateboard tricks"}}
{"text":"nose slides","meta":{"openai_query":"skateboard tricks"}}
{"text":"lazerflip","meta":{"openai_query":"skateboard tricks"}}
{"text":"lipslide","meta":{"openai_query":"skateboard tricks"}}

Given such a file you’re able to review the generated terms via the textcat.manual recipe.

Example

prodigy textcat.manual skateboard-tricks skateboard-tricks.jsonl --label tricks


From this interface you can manually accept or reject each example. Then, when you’re done annotating, you can export the annotated text into a patterns file via the terms.to-patterns recipe.

Example

prodigy terms.to-patterns skateboard-tricks ./skateboard-patterns.jsonl --label skateboard-trick --spacy-model blank:en
✨ Exported 129 patterns
./skateboard-patterns.jsonl

This will generate a file with patterns, like those shown below.

{"label":"skateboard-trick","pattern":[{"lower":"pop"},{"lower":"shove"},{"lower":"it"}]}
{"label":"skateboard-trick","pattern":[{"lower":"switch"},{"lower":"flip"}]}
{"label":"skateboard-trick","pattern":[{"lower":"nose"},{"lower":"slides"}]}
{"label":"skateboard-trick","pattern":[{"lower":"lazerflip"}]}
{"label":"skateboard-trick","pattern":[{"lower":"lipslide"}]}

From here, the skateboard-patterns.jsonl file can be used in recipes, like ner.manual, to make the annotation task easier.

A/B testing Custom Prompts New: 1.12

The ab.openai.prompts recipe allows you to quickly compare the quality of outputs from two OpenAI prompts in a quantifiable and blind way.

As an example, let’s assume that we want to write humorous haikus about a given topic. You could then create two Jinja templates that each accept a topic, yet construct a different prompt.

prompt1.jinja2
Write a haiku about {{topic}}.

prompt2.jinja2
Write a hilarious haiku about {{topic}}.

Next, you’ll want to have an input file in the appropriate format to feed these templates. The ab.openai.prompts recipe assumes data to be in the following format.

topics.jsonl
{"id": 0, "prompt_args": {"topic": "Python"}}
{"id": 0, "prompt_args": {"topic": "star wars"}}
{"id": 0, "prompt_args": {"topic": "maths"}}

Finally, it helps to also have a template that can add context to the annotation interface for the user. You can define another jinja template for that.

display-template.jinja2
Select the best haiku about {{topic}}.

When you put all of these templates together you can start annotating. The command below starts the annotation interface and also uses the --repeat 4 option. This will ensure that each topic will be used to generate a prompt at least 4 times.

Example

prodigy ab.openai.prompts haiku topics.jsonl display-template.jinja2 prompt1.jinja2 prompt2.jinja2 --repeat 4

This will start the annotation interface.

When you’re done annotating, the terminal will also summarise the results for you.

Output after annotating

========================== ✨  Evaluation results ==========================
✔ You preferred prompt1.jinja2
prompt1.jinja2   11
prompt2.jinja2    5

Tournaments New: 1.12

Instead of comparing two prompts with each other, you can also create a tournament to compare any number of prompts via the ab.openai.tournament recipe. This recipe works just like the ab.openai.prompts recipe, but it allows you to pass a folder of prompts to create a tournament. The recipe will then internally use the Glicko rating system to keep track of the best performing candidate and will select duels between prompts accordingly.

Just like in the ab.openai.prompts recipe you’d need prompts in .jinja2 files, but now they can reside in a folder.

prompt_folder/prompt1.jinja2
Write a haiku about {{topic}}.

prompt_folder/prompt2.jinja2
Write a hilarious haiku about {{topic}}.

prompt_folder/prompt3.jinja2
Write a super funny haiku about {{topic}}.

You will also need to have a .jsonl file that contains the required prompt arguments.

topics.jsonl
{"id": 0, "prompt_args": {"topic": "Python"}}
{"id": 0, "prompt_args": {"topic": "star wars"}}
{"id": 0, "prompt_args": {"topic": "maths"}}

Finally, it also helps to have a template that can add context to the annotation interface for the user. You can define another jinja template for that.

display-template.jinja2
Select the best haiku about {{topic}}.

When you put all of these templates together you can start a tournament that will match prompts for comparison. The tournament recipe will use each annotation to update its internal belief of prompt performance to decide which prompts to match in the next round.

Example

prodigy ab.openai.tournament haiku-tournament topics.jsonl display-template.jinja2 prompt_folder

This will generate the same annotation interface as the ab.openai.prompts recipe.

As you annotate, you’ll also receive feedback in the terminal on the prompt performance. Early on in the annotation process this feedback might look like this:

=================== Current winner: prompt3.jinja2 ===================

comparison                          value  count
P(prompt3.jinja2 > prompt1.jinja2)   0.60      7
P(prompt3.jinja2 > prompt2.jinja2)   0.83      3

But as you add more annotations the recipe will update its belief of ratings over all of the available prompts. So after a while, the feedback might look more like this:

=================== Current winner: prompt3.jinja2 ===================

comparison                          value  count
P(prompt3.jinja2 > prompt1.jinja2)   0.95     19
P(prompt3.jinja2 > prompt2.jinja2)   0.97      5

Few Shot Prompts for OpenAI recipes

OpenAI will make mistakes at times. However, you can attempt to steer the large language model in the right direction by providing some examples that it got wrong before. The named entity and text classification recipes provided by Prodigy can take such examples and make sure that they appear in the prompt in the right format.

The sections below explain how to do this and also how the mechanic works in more detail.

NER

As a motivating example, let’s assume that we have a set of example texts that contain comments on a food recipe blog. It might have an example that looks like this:

examples.jsonl
{"text": "Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok."}

Given such an examples file, you can use ner.openai.correct to help annotate named entities.

Example

prodigy ner.openai.correct recipe-ner examples.jsonl --label dish,ingredient,equipment

This will internally generate a prompt for OpenAI.

Prompt sent to OpenAI
From the text below, extract the following entities in the following format:

dish: <comma delimited list of strings>
ingredient: <comma delimited list of strings>
equipment: <comma delimited list of strings>

Text:
"""
Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok.
"""

This prompt might be sufficient, but if you find examples where OpenAI makes a lot of mistakes then it might be good to save these so that you may add them to the prompt. To do this, you can create a ner-examples.yaml file, which might have the following format:

ner-examples.yaml
- text: "You can't get a great chocolate flavor with carob."
  entities:
    dish: []
    ingredient: ['carob']
    equipment: []

- text: "You can probably sand-blast it if it's an anodized aluminum pan."
  entities:
    dish: []
    ingredient: []
    equipment: ['anodized aluminum pan']

Next, you can add these examples to the recipe via:

Example

prodigy ner.openai.correct recipe-ner examples.jsonl --label dish,ingredient,equipment --examples-path ./ner-examples.yaml

With these examples added, the prompt will update and contain the examples in the right format.

Prompt sent to OpenAI
From the text below, extract the following entities in the following format:

dish: <comma delimited list of strings>
ingredient: <comma delimited list of strings>
equipment: <comma delimited list of strings>

For example:

Text:
"""
You can't get a great chocolate flavor with carob.
"""
dish:
ingredient: carob
equipment:

Text:
"""
You can probably sand-blast it if it's an anodized aluminum pan.
"""
dish:
ingredient:
equipment: anodized aluminum pan

Here is the text that needs predictions.

Text:
"""
Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok.
"""

Maximum number of examples

Because a longer prompt costs more to send to OpenAI, the recipes also provide a --max-examples argument that limits the number of examples added to each prompt. When this limit applies, the recipe randomly selects examples on each request.
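The two mechanics described in this section, formatting examples for the prompt and capping how many are included, can be sketched as follows. This is an illustration only: `format_example` and `select_examples` are hypothetical helpers, the example dictionaries mirror the structure of ner-examples.yaml, and the use of `random.sample` is an assumption about how a random selection could be made.

```python
import random


def format_example(example: dict) -> str:
    """Render one few-shot example in the layout used by the NER prompt."""
    lines = ["Text:", '"""', example["text"], '"""']
    for label, phrases in example["entities"].items():
        # Labels without entities are rendered as a bare 'label:' line.
        lines.append(f"{label}: {', '.join(phrases)}".rstrip())
    return "\n".join(lines)


def select_examples(examples: list, max_examples=None, rng=random) -> list:
    """Return all examples, or a random subset when a cap applies."""
    if max_examples is None or len(examples) <= max_examples:
        return list(examples)
    return rng.sample(examples, max_examples)


examples = [
    {
        "text": "You can't get a great chocolate flavor with carob.",
        "entities": {"dish": [], "ingredient": ["carob"], "equipment": []},
    },
    {
        "text": "You can probably sand-blast it if it's an anodized aluminum pan.",
        "entities": {"dish": [], "ingredient": [], "equipment": ["anodized aluminum pan"]},
    },
]

# Include at most one example in this particular prompt.
for example in select_examples(examples, max_examples=1):
    print(format_example(example))
```

Sampling per request means that, over many requests, the model still gets to see all of your examples even when each prompt only carries a few.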

Text Classification

As a motivating example, let’s assume that we have a set of example texts that contain comments on a food recipe blog. It might have an example that looks like this:

examples.jsonl
{"text": "Cream cheese is really good in mashed potatoes."}

Given such an examples file, you can use textcat.openai.correct to help annotate classification labels.

Example

prodigy textcat.openai.correct recipe-comments-textcat comments.jsonl --label recipe,feedback,question

This command will internally generate the following prompt for OpenAI.

Generated OpenAI prompt
Classify the text below to any of the following labels: recipe, feedback, question

The task is non-exclusive, so you can provide more than one label as long as they are comma-delimited.
For example: Label1, Label2, Label3.

Your answer should only be in the following format:

answer:
reason:

Here is the text that needs classification

Text:
"""
Cream cheese is really good in mashed potatoes.
"""

You can add examples to this prompt but you need to be mindful of the type of classification task. Because the current task is a multilabel classification task we need to provide examples in the following format.

textcat-multilabel.yaml
- text: 'Can someone try this recipe?'
  answer: 'question'
  reason: 'It is a question about trying a recipe.'
- text:
    '1 cup of rice then egg and then mix them well. Should I add garlic last?'
  answer: 'question,recipe'
  reason: 'It is a question about the steps in making a fried rice.'

You can pass this file along via the --examples-path flag.

Example

prodigy textcat.openai.correct recipe-comments-textcat comments.jsonl --label recipe,feedback,question --examples-path ./textcat-multilabel.yaml

With these arguments, we will send a larger prompt to OpenAI.

Generated OpenAI prompt with multi-label examples
Classify the text below to any of the following labels: recipe, feedback, question

The task is non-exclusive, so you can provide more than one label as long as they are comma-delimited.
For example: Label1, Label2, Label3.

Your answer should only be in the following format:

answer:
reason:

Below are some examples (only use these as a guide):

Text:
"""
Can someone try this recipe?
"""

answer: question
reason: It is a question about trying a recipe.

Text:
"""
1 cup of rice then egg and then mix them well. Should I add garlic last?
"""

answer: question,recipe
reason: It is a question about the steps in making a fried rice.

Here is the text that needs classification

Text:
"""
Cream cheese is really good in mashed potatoes.
"""

Binary Labels

The previous example mentioned a multi-label example. However, it is also possible that you’re working on a strictly binary classification task. In that situation the examples need to be slightly different because OpenAI needs to strictly “accept” or “reject” an example.

To help explain this, let’s have a look at a default call for a binary task.

Example

prodigy textcat.openai.correct recipe-comments-textcat comments.jsonl --label recipe

This will generate the following prompt.

Generated OpenAI prompt
From the text below, determine whether or not it contains a recipe. If it is
a recipe, answer "accept." If it is not a recipe, answer "reject."


Your answer should only be in the following format:

answer: <string>
reason: <string>

Text:
"""
Cream cheese is really good in mashed potatoes.
"""

Like before, you’ll want to create a file that contains the examples that you want to add to the prompt.

textcat-binary.yaml
- text: 'This is a recipe for scrambled egg: 2 eggs, 1 butter, batter them, and then fry in a hot pan for 2 minutes'
  answer: 'accept'
  reason: 'This is a recipe for making a scrambled egg'
- text: 'This is a recipe for fried rice: 1 cup of day old rice, 1 butter, 2 cloves of garlic: put them all in a wok and stir them together.'
  answer: 'accept'
  reason: 'This is a recipe for making a fried rice'
- text: "I tried it and it's not good"
  answer: 'reject'
  reason: "It doesn't talk about a recipe."
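Because binary and multi-label tasks expect different answers, it can be worth checking an examples file before passing it to the recipe. The sketch below assumes, as the prompts on this page suggest, that a single label means a binary task whose answers are "accept" or "reject", and that otherwise the answers must be drawn from the provided labels. `validate_examples` is a hypothetical helper that is not part of Prodigy.

```python
def validate_examples(examples: list, labels: list) -> list:
    """Return a list of problems; an empty list means the examples look fine."""
    binary = len(labels) == 1
    allowed = {"accept", "reject"} if binary else set(labels)
    problems = []
    for index, example in enumerate(examples):
        answers = [a.strip() for a in example["answer"].split(",") if a.strip()]
        if binary and len(answers) != 1:
            problems.append(f"example {index}: expected exactly one answer")
        for answer in answers:
            if answer not in allowed:
                problems.append(f"example {index}: unexpected answer '{answer}'")
    return problems


binary_examples = [
    {"text": "I tried it and it's not good", "answer": "reject", "reason": "It doesn't talk about a recipe."},
]
multilabel_examples = [
    {"text": "Can someone try this recipe?", "answer": "question", "reason": "It is a question about trying a recipe."},
]

# A binary file used with a single label passes.
print(validate_examples(binary_examples, ["recipe"]))
# A multi-label file used with a single label is flagged.
print(validate_examples(multilabel_examples, ["recipe"]))
```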

When we now pass textcat-binary.yaml to the recipe via this command:

Example

prodigy textcat.openai.correct recipe-comments-textcat comments.jsonl --label recipe --examples-path ./textcat-binary.yaml

The prompt will update and look like this:

Generated OpenAI prompt with binary examples
From the text below, determine whether or not it contains a recipe. If it is
a recipe, answer "accept." If it is not a recipe, answer "reject."


Your answer should only be in the following format:

answer: <string>
reason: <string>


Below are some examples (only use these as a guide):


Text:
"""
This is a recipe for scrambled egg: 2 eggs, 1 butter, batter them, and then fry in a hot pan for 2 minutes
"""

answer: accept
reason: This is a recipe for making a scrambled egg

Text:
"""
This is a recipe for fried rice: 1 cup of day old rice, 1 butter, 2 cloves of garlic: put them all in a wok and stir them together.
"""

answer: accept
reason: This is a recipe for making a fried rice

Text:
"""
I tried it and it's not good
"""

answer: reject
reason: It doesn't talk about a recipe.


Here is the text that needs classification

Text:
"""
Cream cheese is really good in mashed potatoes.
"""

Maximum number of examples

Custom prompts for OpenAI

The recipes that Prodigy provides have been designed to work for an English use case. However, you might be interested in using these recipes for another language. If that’s the case you could try to design your own prompts to suit your specific use-case.

Prompts for NER

In order to customise the NER prompt, it helps to understand the Jinja template that Prodigy uses internally.

ner.jinja2
From the text below, extract the following entities in the following format:
{# whitespace #}
{%- for label in labels -%}
{{label}}: <comma delimited list of strings>
{# whitespace #}
{%- endfor -%}
{# whitespace #}
{# whitespace #}
{%- if examples -%}
{# whitespace #}
For example:
{# whitespace #}
{# whitespace #}
{%- for example in examples -%}
Text:
"""
{{ example.text }}
"""
{# whitespace #}
{%- for label, substrings in example.entities.items() -%}
{{ label }}: {{ ", ".join(substrings) }}
{# whitespace #}
{%- endfor -%}
{# whitespace #}
{% endfor -%}
{%- endif -%}
{# whitespace #}
This is the example that needs prediction.
{# whitespace #}
{# whitespace #}
Text:
"""
{{text}}
"""

This template takes an input text, labels and examples and turns them into a prompt for OpenAI. You can run the following Python code yourself to get a feeling for it.

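import pathlib

import jinja2

# Read the template source from disk, then compile it.
template_text = pathlib.Path("ner.jinja2").read_text()
template = jinja2.Template(template_text)

# No examples are passed here, so the "For example:" block is skipped.
prompt = template.render(
    text="Steve Jobs founded Apple in 1976.",
    labels=["name", "organisation"],
)
print(prompt)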

This example would generate the following prompt.

OpenAI Prompt
From the text below, extract the following entities in the following format:
name: <comma delimited list of strings>
organisation: <comma delimited list of strings>


This is the example that needs prediction.


Text:
"""
Steve Jobs founded Apple in 1976.
"""

You’re free to take the original prompt and make changes to it. You can add details relevant to your domain or change the language of the prompt. Just make sure that you keep the names of the rendered variables and the structure of the prompt intact. The response from OpenAI still needs to be parsed, and if OpenAI responds in an unexpected format, the recipe won’t be able to render it.
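To make the parsing constraint concrete, here is a minimal sketch of how a response in the "label: comma delimited list of strings" format could be turned back into entity strings. It is not Prodigy's actual parser: the function name parse_ner_response and the choice to silently skip unrecognised lines are assumptions made for illustration.

```python
from typing import Dict, List


def parse_ner_response(response: str, labels: List[str]) -> Dict[str, List[str]]:
    """Parse lines such as 'name: Steve Jobs' into {label: [substrings]}.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    # Match labels case-insensitively, but report them as originally spelled.
    known = {label.lower(): label for label in labels}
    entities: Dict[str, List[str]] = {label: [] for label in labels}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        label = known.get(key.strip().lower())
        if not sep or label is None:
            continue  # not a "label: ..." line that the prompt asked for
        entities[label].extend(s.strip() for s in value.split(",") if s.strip())
    return entities


labels = ["name", "organisation"]
print(parse_ner_response("name: Steve Jobs\norganisation: Apple", labels))
# A free-form answer yields no entities, which is why the prompt structure matters.
print(parse_ner_response("Steve Jobs is a person and Apple is a company.", labels))
```

A parser along these lines only finds entities when the model sticks to the requested line format, so a custom prompt that changes the format would also need a matching change on the parsing side.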

In the case of NER, the template assumes that labels is a list of strings referring to label names and that text is the text of the current annotation example. The examples variable refers to the examples that are passed to steer the prompt. The NER examples section gives more details on the expected format for these.
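The shape of each example can be read off the template itself: it needs a text and an entities mapping from label name to a list of substrings. The sketch below renders only the examples loop, using a shortened inline template rather than the full ner.jinja2, and the example data is made up for illustration.

```python
import jinja2

# Shortened stand-in for the examples loop in ner.jinja2 (not the full template).
EXAMPLES_SNIPPET = (
    "{% for example in examples %}"
    "Text: {{ example.text }}\n"
    "{% for label, substrings in example.entities.items() %}"
    "{{ label }}: {{ ', '.join(substrings) }}\n"
    "{% endfor %}"
    "{% endfor %}"
)

# Assumed example format: "text" plus an "entities" dict of label -> substrings.
examples = [
    {
        "text": "Steve Jobs founded Apple in 1976.",
        "entities": {"name": ["Steve Jobs"], "organisation": ["Apple"]},
    }
]

rendered = jinja2.Template(EXAMPLES_SNIPPET).render(examples=examples)
print(rendered)
```

Jinja resolves example.text and example.entities against dictionary keys, so plain dictionaries are enough for experimenting with a custom template.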

Once you have a custom template ready, you can refer to it from the command line.

Example with custom prompt

prodigy ner.openai.correct recipe-ner examples.jsonl --label dish,ingredient,equipment --prompt-path ./custom-ner.jinja2

Prompts for Textcat

Customising the prompt for text classification works in a similar fashion to the named entity variant, but with a different template.

textcat.jinja2
{% if labels|length == 1 %}
{% set label = labels[0] %}
From the text below, determine whether or not it contains a {{ label }}. If it is
a {{ label }}, answer "accept." If it is not a {{ label }}, answer "reject."
{% else %}
Classify the text below to any of the following labels: {{ labels|join(", ") }}
{% if exclusive_classes %}
The task is exclusive, so only choose one label from what I provided.
{% else %}
The task is non-exclusive, so you can provide more than one label as long as
they're comma-delimited. For example: Label1, Label2, Label3.
{% endif %}
{% endif %}
{# whitespace #}
Your answer should only be in the following format:
{# whitespace #}
answer: <string>
reason: <string>
{# whitespace #}
{% if examples %}
Below are some examples (only use these as a guide):
{# whitespace #}
{# whitespace #}
{% for example in examples %}
Text:
"""
{{ example.text }}
"""
{# whitespace #}
answer: {{ example.answer }}
reason: {{ example.reason }}
{% endfor %}
{% endif %}
{# whitespace #}
Here is the text that needs classification
{# whitespace #}
Text:
"""
{{text}}
"""

This template is more elaborate than the named entity one because it distinguishes between binary, multilabel and exclusive classification. Like before, you’re free to make any changes you like, as long as you don’t change the names of the rendered variables or the structure of the prompt. The response from OpenAI still needs to be parsed, and if OpenAI responds in an unexpected format, the recipe won’t be able to render it.
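As a rough illustration of why the answer and reason lines matter, the sketch below parses a response in that format and splits a comma-delimited answer for the non-exclusive case. The helper names parse_textcat_response and split_labels are hypothetical, and this is not the parser that Prodigy uses internally.

```python
from typing import Dict, List


def parse_textcat_response(response: str) -> Dict[str, str]:
    """Extract the 'answer' and 'reason' fields from a response.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    fields: Dict[str, str] = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        name = key.strip().lower()
        if sep and name in ("answer", "reason"):
            fields[name] = value.strip()
    if "answer" not in fields:
        raise ValueError("response does not follow the 'answer: ...' format")
    return fields


def split_labels(answer: str) -> List[str]:
    """Split a comma-delimited answer, as requested in the non-exclusive case."""
    return [label.strip() for label in answer.split(",") if label.strip()]


binary = parse_textcat_response("answer: accept\nreason: It lists ingredients and steps.")
print(binary["answer"])

multi = parse_textcat_response("answer: recipe, question\nreason: It asks about a dish.")
print(split_labels(multi["answer"]))
```

If a translated prompt renames the answer and reason keys, a parser like this one would raise an error, which is the kind of failure the paragraph above warns about.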

Once you’ve created a custom template you can refer to it from the command line.

Example with custom prompt

prodigy textcat.openai.correct recipe-comments-textcat examples.jsonl --label recipe,feedback,question --prompt-path ./custom-textcat.jinja2

Prompts for Terms

If you’re interested in generating terms in another language, you may not need to write a separate template. Instead, you can use a query that describes the language you are interested in, like below.

Example

prodigy terms.openai.fetch "Dutch insults" dutch-insults.jsonl --n 100

Alternatively, if you prefer more control, you can write a custom template by making changes to the following Jinja template:

terms.jinja2
Generate me {{n}} examples of {{query}}.

Here are the examples:
{%- if examples -%}
{%- for example in examples -%}
{# whitespace #}
- {{example}}
{%- endfor -%}
{%- endif -%}
{# whitespace #}
-

The query variable is defined by the “query” argument from the command line and the n variable is defined via --n. The examples in this template refer to the --seeds that are passed along.
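To see how these three values end up in the prompt, the sketch below renders the terms template with jinja2, with the whitespace markers written as Jinja comments. The query, the count and the seed terms are made-up values, and the Python variable names exist only for this example.

```python
import jinja2

# Inline copy of the terms template, for experimentation.
TERMS_TEMPLATE = """Generate me {{n}} examples of {{query}}.

Here are the examples:
{%- if examples -%}
{%- for example in examples -%}
{# whitespace #}
- {{example}}
{%- endfor -%}
{%- endif -%}
{# whitespace #}
-
"""

prompt = jinja2.Template(TERMS_TEMPLATE).render(
    n=3,                        # would come from --n
    query="Dutch cheeses",      # would come from the "query" argument
    examples=["Gouda", "Edam"], # would come from --seeds
)
print(prompt)
```

The rendered prompt ends with an open list item, which invites the model to continue the list that the seed terms started.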

Once you’ve created a custom template you can refer to it from the command line.

Example

prodigy terms.openai.fetch "Dutch insults" dutch-insults.jsonl --n 100 --prompt-path ./custom-terms.jinja2