Recipes

A Prodigy recipe is a Python function that can be run via the command line. Prodigy comes with lots of useful recipes, and it's very easy to write your own. All you have to do is add the @prodigy.recipe decorator around your function, which should return a dictionary of components, specifying the stream of examples, and optionally the web view, database, progress calculator, and callbacks on update, load and exit.
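For example, a bare-bones recipe with just the required components might look like this (a minimal sketch: the JSONL loader and the 'text' interface are built into Prodigy, but the recipe name and arguments here are illustrative):

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe('minimal-recipe',
    dataset=("Dataset ID"),
    source=("Path to a JSONL file"))
def minimal_recipe(dataset, source):
    """Annotate raw texts from a JSONL file."""
    return {
        'dataset': dataset,        # dataset the annotations are saved to
        'stream': JSONL(source),   # iterable of annotation tasks
        'view_id': 'text'          # annotation interface to use
    }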

Recipes don't have to start the web server – you can also use the recipe decorator as a quick way to make your Python function into a command-line utility. This is useful for batch training, database management, and data exploration. Prodigy comes with built-in recipes of the following categories:

Interactively annotate and train models.
Test models and interactively create evaluation sets.
Preview data and annotations from the command line.
Organise your data sets and annotation projects.

ner.teach

Collect the best possible training data for a named entity recognition model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. All annotations will be stored in the database. You can stream in examples from a file or a live API. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe or your own scripts.

prodigy ner.teach news_headlines en_core_web_sm "Australia" --api nyt
✨ Starting the web server on port 8080...

Australia Advances to [Rugby World Cup Final EVENT]
source: The New York Times

ner.batch-train

Batch train a Named Entity Recognition model from annotations. Prodigy will export the best result to the output directory, and include a JSONL file of the training and evaluation examples. You can either supply a dataset ID containing the evaluation data, or choose to split off a percentage of examples for evaluation. By default, 50% is split off for datasets under 1000 examples, and 20% for sets over 1000 examples. The factor specifies the portion of examples to train on – e.g. 0.2 for 20% or 1 for all examples. If no output directory is specified, no model will be exported.

prodigy ner.batch-train news_headlines --output /tmp/model --n-iter 10 --eval-split 0.2 --dropout 0.2

Starting with model en_core_web_sm
Using 20% of examples (310) for evaluation
Correct           1281
Incorrect         269
Baseline          0.65
Accuracy          0.82
Model:            /tmp/model
Training data:    /tmp/model/training.jsonl
Evaluation data:  /tmp/model/evaluation.jsonl

ner.train-curve

Batch-train models with different portions of the training examples and print the accuracy figures and accuracy improvements. n_samples is the number of sample models to train at different stages. For instance, 10 will train models for 10% of the examples, 20%, 30% and so on. This recipe is useful to determine the quality of the collected annotations, and whether more training examples will improve the accuracy. As a rule of thumb, if accuracy improves within the last 25%, training with more examples will likely result in better accuracy.

prodigy ner.train-curve news_headlines en_core_web_sm --n-iter 10 --eval-split 0.2 --dropout 0.2 --n-samples 4

Starting with model en_core_web_sm
Using 20% of examples (310) for evaluation

%       ACCURACY
25%     0.63    +0.63
50%     0.72    +0.09
75%     0.81    +0.09
100%    0.82    +0.01

ner.match

Suggest phrases that match a given patterns file, and mark whether they are examples of the entity you're interested in. The patterns file can include exact strings, regular expressions, or token patterns for use with spaCy's Matcher class. You can bootstrap match patterns by creating a terminology list using the terms.teach recipe and converting it using terms.to-patterns.
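For example, a patterns file mixing an exact string and a token pattern could look like this (the entries are illustrative):

{"label": "FRUIT", "pattern": "Granny Smith"}
{"label": "FRUIT", "pattern": [{"lower": "apple"}]}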

prodigy terms.to-patterns fruits_terms /tmp/fruits_patterns.jsonl --label FRUIT
✨ Exported 49 patterns.

prodigy ner.match fruits_ner en_core_web_sm /tmp/food_texts.jsonl --patterns /tmp/fruits_patterns.jsonl
✨ Starting the web server on port 8080...

Peel the [apples FRUIT], then cut each in half.

ner.manual

Mark entity spans in a text by highlighting them and selecting the respective labels. The model is used to tokenize the text to allow less sensitive highlighting, since the token boundaries are used to set the entity spans. The label set can be defined as a comma-separated list on the command line or as a path to a text file with one label per line. If no labels are specified, Prodigy will check if labels are present in the model. This recipe does not require an entity recognizer, and doesn't do any active learning.
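For example, a labels file could simply contain one label per line, matching the command below:

PERSON
ORG
GPE
LOC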

prodigy ner.manual news_headlines en_core_web_sm news.jsonl --label "PERSON, ORG, GPE, LOC"
✨ Starting the web server on port 8080...

ner.make-gold

Create gold-standard data for NER by correcting the model's suggestions. The spaCy model will be used to predict entities contained in the text, which the annotator can remove and correct if necessary.

prodigy ner.make-gold news_headlines_gold en_core_web_sm news.jsonl --label "PERSON, ORG, GPE, LOC" --exclude news_headlines
✨ Starting the web server on port 8080...

[Facebook ORG] has added [American Express ORG] CEO [Ken Chenault PERSON] to its board
source: Recode

ner.eval

Evaluate an NER model and build an evaluation set from a stream. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe or your own scripts. After you're done annotating, Prodigy will print detailed evaluation stats, based on all evaluation examples in the evaluation dataset. This means you can always restart the process to add more examples to the set.

prodigy ner.eval news_headlines_eval /tmp/model "Australia" --api guardian
✨ Starting the web server on port 8080...

Correct    1281
Incorrect  269
Total      1550
Accuracy   0.82

ner.eval-ab

Evaluate an NER model by comparing it to another model, for example, the model before training to the model after training. Both models will be used to make predictions on a stream of text, and the results will be shown side by side. Similar to the compare recipe, Prodigy will randomly select which example to suggest as the correct answer, to prevent bias during annotation. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe or your own scripts.

prodigy ner.eval-ab news_headlines_ab en_core_web_sm /tmp/model "Australia" --api nyt
✨ Starting the web server on port 8080...

A        82     Before
B        217    After
Ignore   25
Total    324

Turnbull ramps up national security rhetoric, saying Australia is 'destroying' Isis
[Turnbull PERSON] ramps up national security rhetoric
[Turnbull ORG] ramps up national security rhetoric

ner.print-stream

Pretty-print annotations from a stream on the command line. When piping the stream to less to view it, remember to use the -r flag so that the color displays correctly. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe or your own scripts.

prodigy ner.print-stream en_core_web_sm "Australia" --api nyt | less -r

0.61  Australia’s [Impressionists NORP] review
0.54  Linda Burney attacks [Bolt PERSON]'s use of 'apartheid'
0.70  [Morning TIME] mail: Labor wants inquiry on cladding

ner.print-best

Predict the highest-scoring parse for examples in a dataset. Scores are calculated using the annotations in the dataset, and the statistical model. The annotation examples will be printed on the command line, so they can be piped forward to a recipe like mark, or saved to a file.

prodigy ner.print-best raw_set en_core_web_sm | less
prodigy ner.print-best raw_set en_core_web_sm | prodigy mark annotated_set --label PERSON

ner.print-dataset

Pretty-print annotations from a given dataset on the command line. When piping the stream to less to view it, remember to use the -r flag so that the color displays correctly.

prodigy ner.print-dataset news_headlines | less -r

0.61  Australia’s [Impressionists NORP] review
0.54  Linda Burney attacks [Bolt PERSON]'s use of 'apartheid'
0.70  [Morning TIME] mail: Labor wants inquiry on cladding

ner.gold-to-spacy

Convert a dataset of gold-standard NER annotations (created with ner.manual or ner.make-gold) into training data for spaCy. See the spaCy training documentation for more details. The recipe will export a JSONL file with one entry per line.

Default format

["I like London", {"entities": [[7, 13, "LOC"]]}]

BILUO format

["I like London", ["O", "O", "U-LOC", "O"]]

Exporting annotations in BILUO format requires a spaCy model for tokenization, which should be the same model used during annotation. The recipe will export a JSONL file or print to stdout if no output file is specified.

prodigy ner.gold-to-spacy news_headlines_gold /tmp/ner_gold.jsonl
✨ Exported 429 examples.
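The exported data can then be read in and used to update a spaCy model. Here's a minimal sketch, assuming spaCy v2's simple training style and the default format shown above (the iteration count and dropout are arbitrary):

import json
import random
import spacy

# Each line of the export is [text, {"entities": [[start, end, label]]}]
with open('/tmp/ner_gold.jsonl') as f:
    train_data = [json.loads(line) for line in f]

nlp = spacy.blank('en')                    # start from a blank English model
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
for _, annots in train_data:               # register all labels with the pipe
    for start, end, label in annots['entities']:
        ner.add_label(label)

optimizer = nlp.begin_training()
for i in range(10):
    random.shuffle(train_data)
    for text, annots in train_data:
        nlp.update([text], [annots], sgd=optimizer, drop=0.2)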

ner.iob-to-gold

Convert a file with IOB tags into JSONL format for use in Prodigy. The input format should have one line per text, with whitespace-delimited tokens. Each token should have two or more fields delimited by the | character. The first field should be the text, and the last an IOB or IOB2 formatted NER tag. If no output is specified, the output is printed to stdout.

Example (IOB)

Then|RB|O ,|,|O the|DT|I-MISC International|NNP|I-MISC became|VBD|O polarised|VBN|O

Example (IOB2)

Then|RB|O ,|,|O the|DT|B-MISC International|NNP|I-MISC became|VBD|O polarised|VBN|O

prodigy ner.iob-to-gold /annotations.iob /annotations.iob.jsonl

textcat.teach

Collect the best possible training data for a text classification model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. All annotations will be stored in the database. You can stream in examples from a file or a live API. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe.

prodigy textcat.teach news_headlines en_core_web_sm "Silicon Valley" --api nyt --label POSITIVE
✨ Starting the web server on port 8080...
POSITIVE
Yahoo Decides to Release a Rosy Forecast
source: The New York Times

textcat.batch-train

Batch train a new text classification model from annotations. Prodigy will export the best result to the output directory, and include a JSONL file of the training and evaluation examples. You can either supply a dataset ID containing the evaluation data, or choose to split off a percentage of examples for evaluation. By default, 50% is split off for datasets under 1000 examples, and 20% for sets over 1000 examples. The factor specifies the portion of examples to train on – e.g. 0.2 for 20% or 1 for all examples. If no input model is specified, a blank spaCy model is used. If no output directory is specified, no model will be exported.

prodigy textcat.batch-train gh_issues /tmp/model --n-iter 10 --eval-split 0.2 --dropout 0.2 --label DOCUMENTATION

Starting with blank model
Using 20% of examples (156) for evaluation
Correct           142
Incorrect         14
Baseline          0.65
Precision         0.87
Recall            0.87
F-score           0.87
Model:            /tmp/model
Training data:    /tmp/model/training.jsonl
Evaluation data:  /tmp/model/evaluation.jsonl

textcat.train-curve

Batch-train models with different portions of the training examples and print the accuracy figures and accuracy improvements. n_samples is the number of sample models to train at different stages. For instance, 10 will train models for 10% of the examples, 20%, 30% and so on. This recipe is useful to determine the quality of the collected annotations, and whether more training examples will improve the accuracy. As a rule of thumb, if accuracy improves within the last 25%, training with more examples will likely result in better accuracy.

prodigy textcat.train-curve gh_issues --n-iter 10 --eval-split 0.2 --dropout 0.2 --n-samples 4 --label DOCUMENTATION

Starting with blank model
Using 20% of examples (156) for evaluation

%       ACCURACY
25%     0.73    +0.73
50%     0.82    +0.09
75%     0.84    +0.02
100%    0.89    +0.05

textcat.eval

Evaluate a text classification model and build an evaluation set from a stream. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe. After you're done annotating, Prodigy will print detailed evaluation stats, based on all evaluation examples in the evaluation dataset. This means you can always restart the process to add more examples to the set.

prodigy textcat.eval gh_issues_eval /tmp/model "docs" --api github --label DOCUMENTATION
✨ Starting the web server on port 8080...

MODEL    USER     COUNT
accept   accept   47
accept   reject   7
reject   reject   95
reject   accept   7

Correct    142
Incorrect  14
Baseline   0.65
Precision  0.87
Recall     0.87
F-score    0.87

textcat.print-stream

Pretty-print annotations from a stream on the command line. When piping the stream to less to view it, remember to use the -r flag so that the color displays correctly. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe.

prodigy textcat.print-stream en_core_web_sm "Silicon Valley" --api nyt | less -r

0.34  POSITIVE  Yahoo Decides to Release a Rosy Forecast
0.67  POSITIVE  Facebook’s Developer Conference Kicks Off
0.12  POSITIVE  Economy Has Become a Drag on Silicon Valley

textcat.print-dataset

Pretty-print annotations from a given dataset on the command line. When piping the stream to less to view it, remember to use the -r flag so that the color displays correctly.

prodigy textcat.print-dataset news_headlines | less -r

0.34  POSITIVE  Yahoo Decides to Release a Rosy Forecast
0.67  POSITIVE  Facebook’s Developer Conference Kicks Off
0.12  POSITIVE  Economy Has Become a Drag on Silicon Valley

pos.teach

Collect the best possible training data for a part-of-speech tagging model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. All annotations will be stored in the database. You can stream in examples from a file or a live API. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe or your own scripts.

prodigy pos.teach news_headlines en_core_web_sm news.jsonl --label "VERB, NOUN, PROPN"
✨ Starting the web server on port 8080...

[Facebook PROPN] is testing a section specifically for local news and events.
source: Recode

pos.make-gold

Create gold-standard data for part-of-speech tagging by correcting the model's suggestions. The spaCy model will be used to predict part-of-speech tags, which the annotator can remove and correct if necessary. It's often more efficient to focus on a few labels at a time, instead of annotating all labels jointly.

prodigy pos.make-gold news_headlines_gold en_core_web_sm news.jsonl --label "VERB, NOUN, PROPN" --exclude news_headlines
✨ Starting the web server on port 8080...

[Facebook PROPN] [is VERB] [testing VERB] [a DET] [section NOUN] [specifically ADV] [for ADP] [local ADJ] [news NOUN] [and CCONJ] [events NOUN] [. PUNCT]
source: Recode

pos.gold-to-spacy

Convert a dataset with annotated part-of-speech tags to the format required to train spaCy's part-of-speech tagger. The data will be formatted in the "simple training style" and can be read in and used to update the tagger. See the spaCy training documentation for more details.

The recipe will export a JSONL file or print to stdout if no output file is specified. Each entry is a line that looks like this:

["I like eggs", {"tags": ["NOUN", "VERB", "NOUN"]}]
["I like ripe bananas", {"tags": ["NOUN", "VERB", "-", "NOUN"]}]

spaCy's tagger can also be updated from partial annotations, which will be exported with the tag -. Note that this converter currently expects all annotated spans to be single tokens. If multiple tokens were highlighted as one tag, the example will be skipped.

prodigy pos.gold-to-spacy news_headlines_gold /tmp/pos-tags.jsonl
✨ Exported 429 examples.

pos.batch-train

Batch train a part-of-speech tagging model from annotations. Prodigy will export the best result to the output directory, and include a JSONL file of the training and evaluation examples. You can either supply a dataset ID containing the evaluation data, or choose to split off a percentage of examples for evaluation. By default, 50% is split off for datasets under 1000 examples, and 20% for sets over 1000 examples. The factor specifies the portion of examples to train on – e.g. 0.2 for 20% or 1 for all examples. If no output directory is specified, no model will be exported.

prodigy pos.batch-train news_headlines --output /tmp/model --label PROPN --n-iter 10 --eval-split 0.2 --dropout 0.2

Starting with model en_core_web_sm
Using 20% of examples (310) for evaluation
Correct           1281
Incorrect         269
Baseline          0.65
Accuracy          0.82
Model:            /tmp/model
Training data:    /tmp/model/training.jsonl
Evaluation data:  /tmp/model/evaluation.jsonl

pos.train-curve

Batch-train models with different portions of the training examples and print the accuracy figures and accuracy improvements. n_samples is the number of sample models to train at different stages. For instance, 10 will train models for 10% of the examples, 20%, 30% and so on. This recipe is useful to determine the quality of the collected annotations, and whether more training examples will improve the accuracy. As a rule of thumb, if accuracy improves within the last 25%, training with more examples will likely result in better accuracy.

prodigy pos.train-curve news_headlines en_core_web_sm --label PROPN --n-iter 10 --eval-split 0.2 --dropout 0.2 --n-samples 4

Starting with model en_core_web_sm
Using 20% of examples (310) for evaluation

%       ACCURACY
25%     0.63    +0.63
50%     0.72    +0.09
75%     0.81    +0.09
100%    0.82    +0.01

dep.teach

Collect the best possible training data for a dependency parsing model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. All annotations will be stored in the database. You can stream in examples from a file or a live API. If no stream source is specified, it defaults to sys.stdin, letting you pipe through data via the pipe recipe or your own scripts.

prodigy dep.teach news_headlines en_core_web_sm data.jsonl
✨ Starting the web server on port 8080...

First look [pobj] at the new MacBook

dep.batch-train

Batch train a dependency parsing model from annotations. Prodigy will export the best result to the output directory, and include a JSONL file of the training and evaluation examples. You can either supply a dataset ID containing the evaluation data, or choose to split off a percentage of examples for evaluation. By default, 50% is split off for datasets under 1000 examples, and 20% for sets over 1000 examples. The factor specifies the portion of examples to train on – e.g. 0.2 for 20% or 1 for all examples. If no output directory is specified, no model will be exported.

prodigy dep.batch-train news_headlines --output /tmp/model --n-iter 10 --eval-split 0.2 --dropout 0.2

Starting with model en_core_web_sm
Using 20% of examples (310) for evaluation
Correct           1281
Incorrect         269
Baseline          0.65
Accuracy          0.82
Model:            /tmp/model
Training data:    /tmp/model/training.jsonl
Evaluation data:  /tmp/model/evaluation.jsonl

dep.train-curve

Batch-train models with different portions of the training examples and print the accuracy figures and accuracy improvements. n_samples is the number of sample models to train at different stages. For instance, 10 will train models for 10% of the examples, 20%, 30% and so on. This recipe is useful to determine the quality of the collected annotations, and whether more training examples will improve the accuracy. As a rule of thumb, if accuracy improves within the last 25%, training with more examples will likely result in better accuracy.

prodigy dep.train-curve news_headlines en_core_web_sm --n-iter 10 --eval-split 0.2 --dropout 0.2 --n-samples 4

Starting with model en_core_web_sm
Using 20% of examples (310) for evaluation

%       ACCURACY
25%     0.63    +0.63
50%     0.72    +0.09
75%     0.81    +0.09
100%    0.82    +0.01

terms.train-vectors

Train a Word2vec or Sense2vec semantic similarity model, using spaCy and Gensim. For best results, you should provide a lot of input text (at least 10 million words). Because training is a batch process, the input stream must be fixed-length.

prodigy terms.train-vectors /tmp/model en_core_web_sm /tmp/reddit.bz2 --loader reddit
✨ Trained Word2Vec model
/tmp/model/word2vec.bin

Word2Vec trains a lookup table that provides a "meaning" vector for each term in your vocabulary. Terms which occur in similar contexts will be mapped to similar vectors. Given enough examples of a word, the model is able to build an accurate picture of its usage in context, which reveals a lot (but not all) about its meaning. The vectors can then be used like a thesaurus, or as input to a neural network model.

Because word2vec models are look-up tables, each term is assigned a single meaning. The --merge-ents and --merge-nps flags are particularly useful, because they let you assign vectors to long multi-word expressions, with more predictable results than the standard mutual information-based strategy.
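To sanity-check the trained vectors, you can load them back with Gensim. A small sketch: it assumes the exported .bin file is in the word2vec binary format (if it was saved with Gensim's native save(), use Word2Vec.load() instead):

from gensim.models import KeyedVectors

# Load the vectors and query for the nearest neighbours of a term
wv = KeyedVectors.load_word2vec_format('/tmp/model/word2vec.bin', binary=True)
print(wv.most_similar('python', topn=5))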

terms.teach

Build a terminology list interactively using a model's word vectors and seed terms, either a comma-separated list or a text file containing one term per line.

prodigy terms.teach programming_langs /tmp/model "Python, C++, Ruby"
✨ Starting the web server on port 8080...
JavaScript

terms.to-patterns

Convert a list of seed terms to a list of match patterns that can be used with ner.match, for example to collect annotations for training a new entity type. If no output file is specified, each pattern is printed so the recipe's output can be piped forward.

prodigy terms.to-patterns programming_langs /tmp/patterns.jsonl --label PROG_LANG
✨ Exported 59 patterns.

Match patterns files are JSONL files with one pattern per line. Patterns can be exact string matches, as well as token patterns supported by spaCy's matcher.

patterns.jsonl

{"label": "PROG_LANG", "pattern": [{"lower": "javascript"}]}
{"label": "PROG_LANG", "pattern": [{"lower": "c++"}]}
{"label": "PROG_LANG", "pattern": [{"lower": "ruby"}]}
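To check what such a file will match, you can plug the patterns straight into spaCy. A minimal sketch, assuming spaCy v2's Matcher API and a file of token patterns like the one above:

import json
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

# Register each token pattern from the file under its label
with open('/tmp/patterns.jsonl') as f:
    for line in f:
        entry = json.loads(line)
        matcher.add(entry['label'], None, entry['pattern'])

doc = nlp("I mostly write JavaScript and Ruby.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)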

image.manual

Annotate images by drawing rectangular bounding boxes and polygon shapes. Each shape will be added to the task's "spans" with its label, colour and a "points" property containing the [x, y] pixel coordinate tuples. Rectangular shapes can be added by clicking or dragging. Polygon shapes can be closed by clicking on the start point, or by double-clicking anywhere on the image. To select an existing shape, click on its label or bounding box. Selecting a shape allows you to change its label or delete it from the image.
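An annotated task could therefore end up looking roughly like this (the file name, coordinates and colour are illustrative):

{"image": "/tmp/my_images/example.jpg", "spans": [{"label": "PERSON", "color": "yellow", "points": [[56, 13], [278, 13], [278, 220], [56, 220]]}]}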

prodigy image.manual image_objects /tmp/my_images/ --label PERSON,ANIMAL,CAR
✨ Starting the web server on port 8080...

image.test

Test Prodigy's built-in object detection interface using one of the YOLOv2 object detection models via the LightNet library. The recipe will load in images from a directory or API, and create one task per detected object.

pip install lightnet
python -m lightnet download tiny-yolo
prodigy image.test image_objects tiny-yolo /tmp/my_images/
✨ Starting the web server on port 8080...

[skateboard]
source: Unsplash by: Kirk Morales url: unsplash.com/@knation

mark

Start the annotation server without any of the clever training logic. It just displays what it's given, and saves your decisions to the database. When using mark for evaluation, you'll usually want to add the --memorize flag, to enable the answer cache. This prevents you from being asked exactly the same question twice within the same dataset. Most powerful in combination with pipe.

prodigy mark ner_products ~/data/RC_2010-02.bz2 --loader reddit --view-id text --memorize
✨ Starting the web server on port 8080...
Be a normal human being and grab a $100 copy of Windows Home Premium and call it a day.

compare

Compare the output of your model and the output of a baseline on the same inputs. Expects two JSONL files in Prodigy's simple compare format. To prevent bias during annotation, Prodigy will randomly decide which output to suggest as the correct answer. When you exit the application, you'll see detailed stats, including the preferred output.
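For illustration, a pair of matching lines could look like this (a sketch of the format, assuming ID-matched entries with "input" and "output" objects; check your Prodigy version for the exact keys):

model_a.jsonl
{"id": 1, "input": {"text": "To Close Digital Divide, Microsoft to Harness Unused Television Channels"}, "output": {"text": "Microsoft is buying television bandwidth"}}

model_b.jsonl
{"id": 1, "input": {"text": "To Close Digital Divide, Microsoft to Harness Unused Television Channels"}, "output": {"text": "Microsoft is selling its shares in local news stations"}}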

prodigy compare text_gen_eval model_a.jsonl model_b.jsonl
✨ Starting the web server on port 8080...

A        56     model_a.jsonl
B        193    model_b.jsonl
Ignore   10
Total    259
To Close Digital Divide, Microsoft to Harness Unused Television Channels
Microsoft is buying television bandwidth
Microsoft is selling its shares in local news stations

pipe

Load examples from an input source, and print them as newline-delimited JSON. This makes it easy to filter the stream with command-line utilities such as grep. It's also often useful to inspect the stream, by piping to less. If you're using one of Prodigy's recipes to pretty-print annotations, don't forget to set the -r flag with less so that the color displays correctly.

prodigy pipe "Australia" --api guardian | grep -v "Football\|Sport" | head -n 500 > ner_au_eval.jsonl

prodigy dataset ner_au_eval "Evaluate NER model on Guardian headlines about Australia (excluding sport)"
✨ Created dataset 'ner_au_eval'.

prodigy db-in ner_au_eval ner_au_eval.jsonl
✨ Imported 500 annotations to 'ner_au_eval'.

prodigy

Run a built-in or custom Prodigy recipe. The -F option lets you load a recipe from a simple Python file, containing one or more recipe functions. All recipe arguments will be available from the command line. To print usage info and a list of available arguments, use the --help flag.

prodigy my-recipe my_set "food" --label TASTY -F my_recipe.py
✨ Starting the web server on port 8080...

prodigy my-recipe -F my_recipe.py --help

usage: prodigy my-recipe [-h] [-l None] dataset query

Stream images via the Unsplash API

positional arguments:
  dataset               Dataset ID
  query                 Search query

optional arguments:
  -h, --help            show this help message and exit
  -l None, --label None
                        Category label

To make it easy to write custom recipe functions, Prodigy comes with a @recipe decorator. Its first argument is the recipe name that lets you call the recipe from the command line, followed by optional annotations for the available arguments. Recipes can return a number of components, like a stream of examples or the annotation interface to use. If your recipe returns a dictionary of components, Prodigy will start the web application and load in examples from the stream.

my_recipe.py

import prodigy
from prodigy.components.loaders import Unsplash

@prodigy.recipe('my-recipe',
    dataset=("Dataset ID"),
    query=("Search query"),
    label=("Category label", "option", "l", str))
def my_recipe(dataset, query, label=None):
    """Stream images via the Unsplash API"""
    return {
        'dataset': dataset,
        'stream': Unsplash(query=query, key='xxx'),
        'view_id': 'classification' if label else 'image'
    }
tasty
A photo of a plate of nachos with cheese
source: Unsplash by: Herson Rodriguez url: unsplash.com/@hero

dataset

Create a new dataset in the database. A dataset ID is required for most recipes and lets you group annotations together. The additional meta information, like author and description, is also displayed in the web application.

prodigy dataset news_headlines "Annotate news headlines"
✨ Created dataset 'news_headlines'.

When you start annotating, Prodigy will also create a session dataset, using the timestamp as the dataset ID. This lets you view, export or discard annotations of a specific session. To see all dataset and session IDs, use the stats command with the flag -ls.

stats

Print Prodigy and database statistics. Specifying a dataset ID will show detailed stats for the dataset, like annotation counts and meta data. You can also choose to list all available dataset or session IDs.

prodigy stats news_headlines -l

✨ Prodigy stats
Version          1.0.0
Database Name    SQLite
Database Id      sqlite
Total Datasets   5
Total Sessions   23

✨ Dataset 'news_headlines'
Dataset        news_headlines
Created        2017-07-29 15:29:28
Description    Annotate news headlines
Author         Ines
Annotations    1550
Accept         671
Reject         435
Ignore         444

db-in

Import existing annotations to the database. Can load all file types supported by Prodigy. To import NER annotations, the files should be converted into Prodigy's JSONL format first. Setting the answer key lets you train from the annotations immediately – even if they were not created with Prodigy. By default, annotations are imported as "accept" if no answer is present in the data.

prodigy db-in new_set /tmp/news_headlines.jsonl
✨ Imported 1550 annotations to 'new_set'.
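For example, a line with a pre-set answer could look like this (the text and label are illustrative):

{"text": "Yahoo Decides to Release a Rosy Forecast", "label": "POSITIVE", "answer": "accept"}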

db-out

Export annotations in Prodigy's JSONL format. If the output directory doesn't exist, it will be created. If no output directory is specified, the annotations will be printed instead. Setting --flagged-only will only export examples with "flagged": true. To enable the flagging button in the UI, e.g. to bookmark examples for later, set "show_flag": true in your Prodigy configuration.

prodigy db-out news_headlines /tmp
✨ Exported 1550 annotations from 'news_headlines'.
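For example, a prodigy.json that enables the flagging button could contain:

{"show_flag": true}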

drop

Remove a dataset or annotation session from a project. Can't be undone. To see all dataset and session IDs in the database, use prodigy stats -ls.

prodigy drop news_headlines
✨ Removed 'news_headlines' from database SQLite.