Custom Recipes

A Prodigy recipe is a Python function that can be run via the command line. Prodigy comes with lots of useful recipes, and it's very easy to write your own. All you have to do is wrap the @prodigy.recipe decorator around your function, which should return a dictionary of components, specifying the stream of examples, and optionally the web view, database, progress calculator, and callbacks on update, load and exit.

prodigy my-custom-recipe my_dataset -F recipe.py
✨ Starting the web server on port 8080...

Custom recipes let you integrate machine learning models using any framework of your choice, load in data from different sources, implement your own storage solution or add other hooks and features. No matter how complex your pipeline is – if you can call it from a Python function, you can use it in Prodigy.

Writing a custom recipe

A recipe is a simple Python function that returns a dictionary of its components. The arguments of the recipe function will become available from the command line and let you pass in parameters like the dataset ID, the text source and other settings. Recipes can receive a name and a variable number of argument annotations, following the Plac syntax.

recipe.py

import prodigy

@prodigy.recipe('my-custom-recipe',
                dataset=('Dataset ID', 'positional', None, str),
                view_id=('Annotation interface', 'option', 'v', str))
def my_custom_recipe(dataset, view_id='ner'):
    # load your own streams from anywhere you want
    stream = load_my_custom_stream()

    def update(examples):
        # this function is triggered when Prodigy receives annotations
        print("Received {} annotations!".format(len(examples)))

    return {
        'dataset': dataset,
        'view_id': view_id,
        'stream': stream,
        'update': update
    }

Custom recipes can be used from the command line just like the built-in recipes. All you need to do is point the -F option to the Python file containing your recipe.

Command line usage

prodigy my-custom-recipe my_dataset --view-id text -F recipe.py

Files can contain multiple recipes, so you can group them however you like. The argument annotations, as well as the recipe function's docstring, will also be displayed when you use the --help flag on the command line.

Available recipe components

The components returned by the recipe should at least include an iterable stream of annotation examples and, if you want the annotations saved to the database, a dataset ID. The following components can be defined by a recipe:

dataset (unicode): ID of the current project. Used to associate the annotations with a project in the database.
view_id (unicode): Annotation interface to use.
stream (iterable): Stream of annotation tasks in Prodigy's JSON format.
update (callable): Function invoked when Prodigy receives annotations. Can be used to update a model.
progress (callable): Function that takes the count of annotated tasks in the session and in total as its arguments and returns a progress value.
db (unicode, bool or callable): ID of the database to use, True for the default database, False for no database, or a custom database class following Prodigy's database API.
on_load (callable): Function executed when Prodigy is started. Can be used to update a model with existing annotations.
on_exit (callable): Function executed when the user exits Prodigy. Can be used to save a model's state to disk or export other data.
get_session_id (callable): Function that returns a custom session ID. If not set, a timestamp is used.
exclude (list): List of dataset IDs whose annotations to exclude.
config (dict): Recipe-specific configuration. Can be overwritten by the global and project config.
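To make the callable components above concrete, here is a minimal sketch of a progress function and a get_session_id function a recipe could return. The 1,000-annotation target and the 'feedback-' session prefix are made-up assumptions for illustration, not Prodigy defaults:

```python
import datetime

def progress(session_count, total_count):
    # report progress as a fraction of a hypothetical 1,000-annotation goal
    TARGET = 1000
    return min(total_count / TARGET, 1.0)

def get_session_id():
    # name sessions by date instead of the default timestamp
    return 'feedback-{}'.format(datetime.date.today().isoformat())
```

Both would be passed along in the returned dictionary, e.g. 'progress': progress, 'get_session_id': get_session_id.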

Example: Customer feedback sentiment

Let's say you've extracted examples of customer feedback and support emails, and you want to classify those by sentiment in the categories "happy", "sad", "angry" or "neutral". The result could be used to get some insights into your customers' overall satisfaction, or to provide statistics for your yearly report. Maybe you also want to experiment with training a model to predict the tone of incoming customer emails, so you can make sure that unhappy customers or critical situations can receive priority support. Your data could look something like this:

feedback.jsonl

{"text": "Thanks for your great work – really made my day!"}
{"text": "Worst experience ever, never ordering from here again."}
{"text": "My order arrived last Tuesday."}

Prodigy comes with a built-in choice interface that lets you render a task with a number of multiple or single choice options. The result could look like this:

Thanks for your great work – really made my day!

Recipes should be as reusable as possible, so in your custom sentiment recipe, you likely want a command-line option that allows passing in the file path to the texts. You can write your own function to load in data from a file, or use one of Prodigy's built-in loaders like JSONL. To add the four options, you can write a simple helper function that takes a stream of tasks and yields individual tasks with an added "options" key.

recipe.py

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe('sentiment',
                dataset=prodigy.recipe_args['dataset'],
                file_path=("Path to texts", "positional", None, str))
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)     # load in the JSONL file
    stream = add_options(stream)  # add options to each task
    return {
        'dataset': dataset,   # save annotations in this dataset
        'view_id': 'choice',  # use the choice interface
        'stream': stream,
    }

def add_options(stream):
    """Helper function to add options to every task in a stream."""
    options = [{'id': 'happy', 'text': '😀 happy'},
               {'id': 'sad', 'text': '😢 sad'},
               {'id': 'angry', 'text': '😠 angry'},
               {'id': 'neutral', 'text': '😶 neutral'}]
    for task in stream:
        task['options'] = options
        yield task

Each option requires a unique id, which will be used internally. Options are simple dictionaries and can contain the same properties as regular annotation tasks – so you can also use an image, or add spans to highlight entities in the text.
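As a sketch of that flexibility, an option could carry spans to highlight part of its text. The ID, text and span offsets below are fabricated for illustration:

```python
# a hypothetical option with highlighted entity spans; "start" and "end"
# are character offsets into the option's text
option = {
    'id': 'order_status',
    'text': 'My order arrived last Tuesday.',
    'spans': [{'start': 3, 'end': 8, 'label': 'ORDER'}]
}
```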

Using the recipe above, you can now start the Prodigy server with a new dataset:

prodigy dataset customer_feedback "Annotating feedback sentiment"
prodigy sentiment customer_feedback feedback.jsonl -F recipe.py
✨ Starting the web server on port 8080...

All tasks you annotate will be stored in the database in Prodigy's JSON format. When you annotate an example, Prodigy will add an "answer" key, as well as an "accept" key, mapped to a list of the selected option IDs. Since this example is using a single-choice interface, this list will only ever contain one or zero IDs.

Annotated task

{
    "text": "Thanks for your great work – really made my day!",
    "options": [
        {"id": "happy", "text": "😀 happy"},
        {"id": "sad", "text": "😢 sad"},
        {"id": "angry", "text": "😠 angry"},
        {"id": "neutral", "text": "😶 neutral"}
    ],
    "answer": "accept",
    "accept": ["happy"]
}

Adding other hooks

Once you're done with annotating a bunch of examples, you might want to see a quick overview of the total sentiment counts in the dataset. The easiest way to do this is to add an on_exit component to your recipe – a function that is invoked when you stop the Prodigy server. It takes the Controller as an argument, giving you access to the database.

def on_exit(controller):
    """Get all annotations in the dataset, filter out the accepted tasks,
    count them by the selected options and print the counts."""
    examples = controller.db.get_dataset(controller.dataset)
    examples = [eg for eg in examples if eg['answer'] == 'accept']
    for option in ('happy', 'sad', 'angry', 'neutral'):
        count = get_count_by_option(examples, option)
        print('Annotated {} {} examples'.format(count, option))

def get_count_by_option(examples, option):
    filtered = [eg for eg in examples if option in eg['accept']]
    return len(filtered)
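Since the annotations are plain dictionaries, the per-option counting above could also be done in a single pass with collections.Counter. This is just an alternative sketch (the example data below is fabricated), not part of Prodigy's API:

```python
from collections import Counter

def count_by_option(examples):
    """Count how often each option ID was selected across all examples."""
    return Counter(opt for eg in examples for opt in eg.get('accept', []))

# a few fabricated annotated examples
annotated = [{'accept': ['happy']}, {'accept': ['happy']}, {'accept': ['sad']}]
counts = count_by_option(annotated)
```

Unlike the loop version, this also handles options that never occur: Counter returns 0 for missing keys.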

Instead of getting all annotations in the dataset, you can also choose to only get the annotations of the current session. Prodigy creates an additional dataset for each session, usually named after the current timestamp. The ID is available as the session_id attribute of the controller:

examples = controller.db.get_dataset(controller.session_id)

To integrate the new hook, simply add it to the dictionary of components returned by your recipe, e.g. 'on_exit': on_exit. When you exit the Prodigy server, you'll now see a count of the selected options:

prodigy sentiment customer_feedback feedback.jsonl -F recipe.py
✨ Starting the web server on port 8080...
Annotated 56 happy examples
Annotated 10 sad examples
Annotated 12 angry examples
Annotated 36 neutral examples

Example: Wrapping built-in recipes

In some cases, you might only want to change or programmatically create one specific component of a recipe – for example, use the ner.teach recipe, but load in a custom stream of examples from your database. Because the @recipe decorator leaves the original recipe function intact, you can simply import an existing recipe and wrap it in a custom recipe. When you call recipes.ner.teach with its recipe arguments, it will return a dictionary of components, which your custom recipe can then return.

Let's assume you're using a database my_database, which lets you load in examples with a column 'plain_text'. Your custom recipe should have the same arguments as ner.teach, plus an additional option to specify the name of the database to connect to. After connecting to the database and creating a stream of dictionaries, you can pass the arguments into the teach() function, receive the recipe components and return them from your custom recipe.

recipe.py

import prodigy
from prodigy.recipes.ner import teach
import my_database

@prodigy.recipe('custom.ner.teach',
                dataset=prodigy.recipe_args['dataset'],
                spacy_model=prodigy.recipe_args['spacy_model'],
                database=("Database to connect to", "positional", None, str),
                label=prodigy.recipe_args['entity_label'])
def custom_ner_teach(dataset, spacy_model, database, label=None):
    """Custom wrapper for ner.teach recipe that replaces the stream."""
    database = my_database.load(database)
    stream = ({'text': row['plain_text']} for row in database)
    components = teach(dataset=dataset, spacy_model=spacy_model,
                       source=stream, label=label)
    return components

You can then call your recipe from the command line:

prodigy custom.ner.teach product_entities en_core_web_sm products_database --label PRODUCT -F recipe.py
✨ Starting the web server on port 8080...

Testing recipes

The @recipe decorator leaves the original function intact, so calling it from within Python will simply return a dictionary of its components. This lets you write comprehensive unit tests to ensure that your recipes are working correctly.

Prodigy recipe

@prodigy.recipe('my-recipe')
def my_recipe(dataset, database):
    stream = my_database.load(database)
    view_id = 'classification' if database == 'products' else 'text'
    return {'dataset': dataset, 'stream': stream, 'view_id': view_id}

Unit test

def test_recipe():
    components = my_recipe('my_dataset', 'products')
    assert 'dataset' in components
    assert 'stream' in components
    assert 'view_id' in components
    assert components['dataset'] == 'my_dataset'
    assert components['view_id'] == 'classification'
    assert hasattr(components['stream'], '__iter__')