
Changelog

This page lists the history of changes to Prodigy. Whenever a new update is available, you’ll receive an email notification sent to the address specified at checkout. You can then download the new version via your personal download link. If your free upgrades have expired, you can now add 12 months of updates to your license via our online shop. Please allow up to 24 hours for your download link to be reactivated.

v1.17.0 2024-11-18

This update features a brand new container interface, pages, to split a single annotation task over multiple sections, and even combine different interfaces, without losing the simplicity and efficiency of Prodigy’s card-based design. Paginated data can be loaded directly from JSON or from a simple file format and is already supported across all relevant built-in annotation recipes and in the Prodigy-PDF plugin.

Screenshot of the pages interface and data formats
new pages interface for multi-page tasks like longer documents, PDFs or collections of images.
newPages loader for loading paginated files and support for --loader pages and paginated input data across relevant built-in recipes.
newsplit_pages and merge_pages preprocessors and support for pages in train and data-to-spacy.
fixEnhance front-end validation of task data to provide more helpful error messages in the UI.
docUpdate documentation on custom interfaces, computer vision for PDFs, and NER with long texts.
fyi"pages" and "page_titles" are now protected keys in the JSON data (like "text"), so you should avoid using them for anything other than paginated examples.
fyiProperties in a JSON task’s "meta" that start with an underscore _ are now considered internal and not displayed in the annotation interface.
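As a rough illustration of the new data conventions, the sketch below builds a paginated task and mimics how "meta" properties starting with an underscore are hidden from the UI. The per-page shape (one {"text": ...} entry per page under "pages", with "page_titles" as a parallel list) and the helper names visible_meta, split_pages_sketch and merge_pages_sketch are assumptions for illustration only; they are not Prodigy’s split_pages and merge_pages preprocessors.

```python
from typing import Any, Dict, List

# Hypothetical paginated task: one sub-task per page, plus assumed page titles.
task: Dict[str, Any] = {
    "pages": [{"text": "Page one text."}, {"text": "Page two text."}],
    "page_titles": ["Intro", "Details"],
    "meta": {"source": "report.pdf", "_internal_id": 42},
}


def visible_meta(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only meta properties that don't start with an underscore."""
    return {key: value for key, value in meta.items() if not key.startswith("_")}


def split_pages_sketch(task: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one paginated task into one flat task per page (illustration only)."""
    titles = task.get("page_titles", [])
    flat = []
    for i, page in enumerate(task["pages"]):
        new_task = dict(page)
        # "_page" is underscore-prefixed, so it would stay hidden in the UI.
        new_task["meta"] = {**task.get("meta", {}), "_page": i}
        if i < len(titles):
            new_task["meta"]["page_title"] = titles[i]
        flat.append(new_task)
    return flat


def merge_pages_sketch(flat: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reassemble per-page tasks into one paginated task (illustration only)."""
    ordered = sorted(flat, key=lambda t: t["meta"]["_page"])
    return {"pages": [{k: v for k, v in t.items() if k != "meta"} for t in ordered]}


print(visible_meta(task["meta"]))  # {'source': 'report.pdf'}
print(len(split_pages_sketch(task)))  # 2
```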

v1.16.0 2024-10-22

For this release, we’ve removed all Cython-compiled source code and are now shipping Prodigy as cross-platform Python wheels, which makes it easier to develop custom recipes and improves type checking and IDE support. The update also includes several front-end fixes, like the restoration of the timeline for the audio interface and enabling wrapping of versions in review. On the backend, we’ve refactored drop to mitigate issues related to SQLITE_MAX_VARIABLE_NUMBER.

fyiProdigy is now distributed as pure Python wheels without compiled Cython.
fixRestore timeline functionality for the audio interface.
fixEnable wrapping of versions in the review interface.
fixImprove llm.fetch family of recipes by adding the default accept answer.
fixImprove the drop logic to mitigate potential issues with SQLITE_MAX_VARIABLE_NUMBER.
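SQLITE_MAX_VARIABLE_NUMBER caps how many ? placeholders a single statement may use (999 by default before SQLite 3.32.0, 32766 afterwards), so a large "DELETE ... WHERE id IN (...)" can fail. The sketch below shows the general batching technique with Python’s built-in sqlite3 module; the table name example and the function delete_in_batches are hypothetical, and this is not Prodigy’s actual drop implementation.

```python
import sqlite3
from typing import Iterable, List

# Conservative default: the historical SQLite limit on bound variables.
MAX_VARS = 999


def delete_in_batches(conn: sqlite3.Connection, ids: Iterable[int], batch_size: int = MAX_VARS) -> int:
    """Delete rows by id in chunks so no statement exceeds the variable limit."""
    id_list: List[int] = list(ids)
    deleted = 0
    for start in range(0, len(id_list), batch_size):
        batch = id_list[start : start + batch_size]
        placeholders = ",".join("?" for _ in batch)
        cur = conn.execute(f"DELETE FROM example WHERE id IN ({placeholders})", batch)
        deleted += cur.rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany("INSERT INTO example (id, content) VALUES (?, ?)", [(i, "x") for i in range(5000)])
print(delete_in_batches(conn, range(2500)))  # 2500
```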

v1.15.8 2024-10-03

This patch release restores the ability to return an instance of the Controller object from the recipe.

fixRe-enable returning Controller object from the recipe.

v1.15.7 2024-07-30

This patch release fixes a bug in training config generation for the default textcat and textcat-multilabel spaCy components.

fixTraining config generation for textcat and textcat-multilabel.

v1.15.6 2024-06-19

This patch release pins the version of numpy to <2.0.0 to avoid installation issues due to backwards-incompatible changes introduced in numpy 2.0.0.

fixPin numpy to <2.0.0.

v1.15.5 2024-06-13

This patch release updates the version of wavesurfer.js dependency, fixing a regression in audio.manual that prevented marking regions to the left of the cursor.

fixRestore marking regions to the left of the cursor in audio.manual.
fyiUpdated wavesurfer.js to ^7.7.15

v1.15.4 2024-05-23

This patch release fixes a few bugs in the review, rel.manual and metric.iaa.doc recipes.

fixSupport non-string labels in metric.iaa.doc.
fixFix token disabling via _ property patterns in relations.
fixFix the stream generation in review so that relations annotations that differ only in order are not shown as different.

v1.15.3 2024-04-24

This release adds support for fastapi up to 0.111.0.

newAllow for fastapi<0.111.0.
fixFix the issue with accept_single flag not filtering the tasks annotated by a single annotator correctly in review.
fyiMake the No tasks available screen more informative.
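To make the accept_single behavior concrete, here is a hypothetical sketch of how review’s --auto-accept and --accept-single flags could decide which inputs to accept automatically. It assumes that versions "agree" when they share the same _task_hash and answer; the function auto_accepted is illustrative and not Prodigy’s internal logic.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def auto_accepted(annotations: List[dict], accept_single: bool = False) -> List[int]:
    """Return the input hashes whose annotations could be accepted automatically.

    Hypothetical logic: versions agree if they share _task_hash and answer.
    Inputs seen by only one annotator qualify only if accept_single is set.
    """
    by_input: Dict[int, List[dict]] = defaultdict(list)
    for eg in annotations:
        by_input[eg["_input_hash"]].append(eg)
    accepted = []
    for input_hash, versions in by_input.items():
        annotators = {eg["_annotator_id"] for eg in versions}
        if len(annotators) == 1 and not accept_single:
            continue  # single annotator: skip unless accept_single is set
        distinct: Set[Tuple[int, str]] = {(eg["_task_hash"], eg["answer"]) for eg in versions}
        if len(distinct) == 1:
            accepted.append(input_hash)
    return sorted(accepted)


annotations = [
    {"_input_hash": 1, "_task_hash": 10, "_annotator_id": "a", "answer": "accept"},
    {"_input_hash": 1, "_task_hash": 10, "_annotator_id": "b", "answer": "accept"},
    {"_input_hash": 2, "_task_hash": 20, "_annotator_id": "a", "answer": "accept"},
    {"_input_hash": 2, "_task_hash": 21, "_annotator_id": "b", "answer": "accept"},
    {"_input_hash": 3, "_task_hash": 30, "_annotator_id": "a", "answer": "reject"},
]
print(auto_accepted(annotations))  # [1]
print(auto_accepted(annotations, accept_single=True))  # [1, 3]
```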

v1.15.2 2024-03-26

This patch release updates the ab.llm.tournament recipe to work with spacy-llm >= 0.7.0.

fixUpdate the processing of the model’s response in ab.llm.tournament for spacy-llm >= 0.7.0.
fyiInternal refactoring of dataset processing to support the prodigy-evaluate plugin.

v1.15.1 2024-02-23

This release improves the training config generation used by train, train-curve and data-to-spacy by fixing a bug that previously prevented the use of transformer-based spaCy pipelines as base models. Additionally, the tokenizer is now sourced from the base model automatically. We have also bumped the uvicorn dependency to allow <0.27.

newAutomate the sourcing of the tokenizer from the base model.
newAllow for uvicorn <0.27.
fixAllow transformer as the embedding layer for spaCy base model.

v1.15.0 2024-02-15

This release adds support for the new Prodigy Company Plugins package that can be downloaded with a Company License. This first premium plugin for SSO (Single Sign-On) features OIDC authentication across a variety of providers, including Auth0, Okta, Google, Microsoft Entra, and more. For more details on the new company features see the OIDC docs.

newAdd support for prodigy-company-license 0.1.0

v1.14.14 2024-01-30

This update allows Prodigy to use spacy-llm <1.0.0 with its recent addition of new tasks including entity linking and translation, as well as support for arbitrarily long docs.

newAllow spacy-llm <1.0.0.

v1.14.13 2024-01-25

This update fixes the recipe config override from the task level so that the view_id attribute can be updated. It is now possible to design data streams with different view_ids without requiring a server restart.

fixIt is now possible to overwrite view_id (and other config attributes) from the task level.
newAdded a “Support” column to the metric.iaa.span result table to represent the number of examples on which the metric was calculated.
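A minimal sketch of the task-level override idea, assuming overrides live under a per-task "config" key and that task values take precedence over the recipe config. The helper effective_config is hypothetical; it only illustrates why a stream can now mix view_ids without restarting the server.

```python
from typing import Any, Dict, List

# Hypothetical recipe-level config.
recipe_config: Dict[str, Any] = {"view_id": "text", "labels": ["POS", "NEG"]}


def effective_config(recipe_config: Dict[str, Any], task: Dict[str, Any]) -> Dict[str, Any]:
    """Merge recipe settings with per-task overrides (assumed "config" key, task wins)."""
    return {**recipe_config, **task.get("config", {})}


stream: List[Dict[str, Any]] = [
    {"text": "Plain text task"},
    {"text": "Choice task", "options": [{"id": "A", "text": "A"}], "config": {"view_id": "choice"}},
]
for task in stream:
    print(effective_config(recipe_config, task)["view_id"])
# text
# choice
```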

v1.14.12 2023-12-13

This patch release restores the functionality of the audio_rate setting. We have also upgraded our wavesurfer.js dependency to 7.4.4 which is related to the deprecation of the show_audio_timeline setting and a slight change in how the sound waves are rendered in the UI. We’re also excited to announce a new plugin for image segmentation that leverages Meta’s Segment Anything model.

newUpdated wavesurfer.js dependency to 7.4.4.
newAdded a new window.prodigy.resetQueue method available in the frontend, meant to be used together with custom events.
fixFixed audio_rate setting for audio recipes.
fyiDeprecated the show_audio_timeline setting due to wavesurfer.js upgrade.
docUpdated the custom recipes section to contain examples that leverage radicli instead of plac.

v1.14.11 2023-11-30

This patch release adds support for Python 3.12 and fixes a regression related to the prodigy.serve function introduced during the transition to radicli. Additionally, it restores the functionality of using stdin as a source.

newAdded support for Python 3.12.
newAdded the Controller.reset_stream method to allow custom recipes to reset the stream
fixFixed CLI argument processing in prodigy.serve to make it compatible with radicli.
fixRestored the use of stdin as source.

v1.14.10 2023-11-16

This patch release fixes inferring exclusive labels from spaCy textcat models.

fixFixed checking for exclusive labels in spaCy textcat models.

v1.14.9 2023-11-14

This patch release updates Prodigy to use spacy-llm <0.7.0.

newAllow spacy-llm <0.7.0.

v1.14.8 2023-11-09

This patch release fixes a bug when sourcing custom JavaScript from a string.

fixFixed a bug related to injecting inline JavaScript.

v1.14.7 2023-11-07

This release adds an extra validation step to some textcat recipes to make sure no empty annotations are written into the database. This behavior can be turned off with a flag if users prefer the original behavior.

newAdded validation for empty annotations for exclusive textcat models in textcat.manual and textcat.correct.
newAdded --accept_empty flag to textcat.manual and textcat.correct to turn off the new validation.
fixFixed a bug in textcat.correct that mistook exclusive textcat models for non-exclusive ones.
fixFixed a runtime error when combining the PatternMatcher with an nlp model in textcat recipes.
fixCorrected the spelling of spaCy in the output of the stats command.
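For illustration, the empty-annotation check could look like the sketch below. It assumes selected option IDs are stored in the task’s "accept" list (as in the choice interface); validate_exclusive_choice is a hypothetical helper, not the validation code used by textcat.manual or textcat.correct.

```python
from typing import Dict, Optional


def validate_exclusive_choice(eg: Dict, accept_empty: bool = False) -> Optional[str]:
    """Return an error message if an accepted exclusive textcat example has no label.

    Hypothetical illustration: selected option IDs are assumed to be in eg["accept"].
    Setting accept_empty=True mirrors turning the validation off.
    """
    if eg.get("answer") != "accept" or accept_empty:
        return None
    if not eg.get("accept"):
        return "No label selected: accepted examples need one option for exclusive categories."
    return None


print(validate_exclusive_choice({"answer": "accept", "accept": ["POS"]}))  # None
```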

v1.14.6 2023-11-02

This release updates Prodigy to be compatible with spaCy >=3.1.1,<3.8.0 and Pydantic >=1.10.8,<3.0.

newUpdated spacy and pydantic dependencies.

v1.14.5 2023-10-24

This release adds an improved character highlighting feature for ner_manual and spans_manual that allows switching between character and token highlighting from the UI while annotating.

We have also made it easier to develop with custom CSS and JavaScript by adding support for mounting CSS and JavaScript files from local directories and remote URLs.

newAdded a toggle for switching between character and token highlighting in ner_manual and spans_manual UIs.
newSupport mounting CSS and JS code from local directories and remote URLs.
fixFixed annotator filtering in IAA recipes.

v1.14.4 2023-10-12

This patch release improves error messages and fixes a bug in spacy-config that prevented the config from being saved to disk properly.

fixImprove error messages for iaa and stream modules.
fixFixed an issue with saving the config file to disk in spacy-config.

v1.14.3 2023-10-06

This release adds two new commands for computing inter-annotator agreement for document-level and token-level annotations. We also introduce Prodigy Plugins: Prodigy-PDF, Prodigy-ANN and Prodigy-LUNR. Prodigy Plugins are add-ons that extend Prodigy’s functionality with third-party libraries. They are open source and can be installed separately to be used with 1.14.3 and above.

newInter-annotator agreement for document-level and token-level annotations.
fyiProdigy Plugins for PDF processing and selecting relevant subsets of data.
fixFixed the display of the history in the sidebar.
fixFixed the truncated display of the available recipes outputted by the prodigy command.
docAdded a new section on inter-annotator agreement metrics.
docAdded a new section on Prodigy Plugins.
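To give a feel for what document-level agreement measures, here is a generic sketch of two common metrics for two annotators: percent agreement and Cohen’s kappa, where kappa = (observed − expected) / (1 − expected). The metrics reported by Prodigy’s own IAA commands may differ; the function names and data are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable


def percent_agreement(a: Dict[Hashable, str], b: Dict[Hashable, str]) -> float:
    """Share of documents labeled by both annotators that received the same label."""
    shared = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in shared) / len(shared)


def cohens_kappa(a: Dict[Hashable, str], b: Dict[Hashable, str]) -> float:
    """Chance-corrected agreement for two annotators on the same documents."""
    shared = a.keys() & b.keys()
    n = len(shared)
    observed = sum(a[k] == b[k] for k in shared) / n
    freq_a = Counter(a[k] for k in shared)
    freq_b = Counter(b[k] for k in shared)
    # Expected agreement if both annotators labeled independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when both annotators use a single label.")
    return (observed - expected) / (1 - expected)


annotator_a = {1: "POS", 2: "POS", 3: "NEG", 4: "NEG"}
annotator_b = {1: "POS", 2: "NEG", 3: "NEG", 4: "NEG"}
print(percent_agreement(annotator_a, annotator_b))  # 0.75
print(cohens_kappa(annotator_a, annotator_b))  # 0.5
```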

v1.14.2 2023-09-29

This patch update addresses a backward compatibility issue that was introduced in version 1.14.0, where the get_labels helper function was removed, potentially affecting custom recipes.

fixRestored the get_labels function for backward compatibility.

v1.14.1 2023-09-29

This release adds support for custom Recipe Event Hooks to allow for basic interactivity in custom Prodigy Recipes. It adds a new window.prodigy.event function to the window.prodigy object available for use in Custom Recipe JavaScript. This completes the initial work on an ongoing, undocumented feature we’ve been using for a while.

newAdded support for basic interface interactivity via custom event hooks.

v1.14.0 2023-09-21

This release focuses on improving Prodigy internals. We have replaced plac with radicli for CLI development, which brings DX improvements such as using type hints for argument parsing, including support for custom types, as well as custom CLI errors. Please check the radicli documentation for a complete overview of benefits.

Higher versions of pydantic (<3.0), fastapi (<0.103.0) and spacy-llm (<0.6.0) dependencies are now supported. Since spacy-llm 0.5.0 adds support for chain-of-thought prompting, there’s now a corresponding section in the docs with examples.

We have also improved typing and error handling across Prodigy.

Finally, some of the older, deprecated helper functions are no longer available:

  • Reddit dataset loader
  • read_jsonl, write_jsonl, read_json, b64_uri_to_bytes, pretty_print_ner, pretty_print_tc
newImproved CLI by substituting plac with radicli.
newAllowed latest versions of pydantic, fastapi and spacy-llm.
docAdded LLM section with explainers for chain of thought prompts for NER and spancat.
fyiDeprecated Reddit loader and older helpers: read_jsonl, write_jsonl, read_json, b64_uri_to_bytes, pretty_print_ner, pretty_print_tc.
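If a custom recipe still relies on the removed read_jsonl and write_jsonl helpers, a small stand-in built only on the standard library may be enough, as sketched below. The srsly library (a spaCy dependency) also provides read_jsonl and write_jsonl functions you can use instead.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSONL file."""
    with path.open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_jsonl(path: Path, records: Iterable[dict]) -> None:
    """Write each record as one JSON object per line."""
    with path.open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "examples.jsonl"
    write_jsonl(path, [{"text": "Hello"}, {"text": "Grüße"}])
    examples = list(read_jsonl(path))
print(examples)  # [{'text': 'Hello'}, {'text': 'Grüße'}]
```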

v1.13.3 2023-09-20

This patch release fixes a bug in the review recipe that prevented overwriting the view_id attribute on the CLI. This is particularly relevant when using datasets annotated with the blocks interface as input to review, including the output of the *.llm.correct recipes.

fixFixed a bug in review that didn’t permit overwriting the view_id attribute for the blocks interface.

v1.13.2 2023-09-07

This release introduces spacy-llm variants of the terms.openai.fetch and ab.openai.tournament recipes. The terms.llm.fetch recipe can generate terms and phrases using an LLM, and the ab.llm.tournament recipe can be used for prompt engineering and/or comparing different LLM backends. This means that we now have replacements for all the *.openai.* recipes, which is why they now all carry deprecation notices.

We have also added a new annotation interface llm-io to facilitate writing custom LLM recipes and fixed a task router bug related to server restarts.

newAdded terms.llm.fetch which can use spacy-llm to fetch relevant phrases and terms.
newAdded ab.llm.tournament which can be used for prompt engineering and comparing LLM backends.
newAdded llm-io interface to show the prompt/response from an LLM.
fixFixed a bug that caused inconsistencies in task routers when dealing with many server restarts.
fyiAll the *.openai.* recipes now carry deprecation warnings, because there are spacy-llm variants to replace them.

v1.13.1 2023-08-23

This release introduces recipes that allow spaCy pipelines to annotate examples. When you combine these recipes with the review recipe, you’re able to focus on examples where models disagree.

This pattern is powerful because these examples typically carry a lot of information for your model. But it is also very useful given the spaCy-LLM integration introduced in v1.13.0, which makes it relatively easy to compare your own model against an LLM pipeline.

newAdded ner.model-annotate, textcat.model-annotate, spans.model-annotate recipes to automatically annotate datasets with models.
newAdded make_ner_suggestions, make_spancat_suggestions and make_textcat_suggestions helper functions to make it easier to turn spaCy output into annotation examples.
newAdded the filter_seen_before helper function to make it easier to remove specific duplicates from your stream in custom recipes.
fixFixed a bug that caused duplicate log lines to appear.
fixFixed a bug related to config validation for the image.manual recipe.
fyiThe review recipe is now more explicit and strict when it comes to exiting immediately if the annotation interface isn’t supported.
docAdded a new section on reviewing annotations.
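The idea behind pairing the *.model-annotate recipes with review is to focus on inputs where models disagree. As a hypothetical sketch, given two exported datasets as lists of dicts, you could find those inputs like this. It assumes the standard _input_hash field and "spans" entries with "start", "end" and "label"; span_set and disagreements are illustrative names, not part of Prodigy’s API.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]


def span_set(eg: dict) -> Set[Span]:
    """Represent an example's spans as a set of (start, end, label) tuples."""
    return {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}


def disagreements(model_a: List[dict], model_b: List[dict]) -> List[int]:
    """Return _input_hash values where the two models predicted different spans."""
    b_by_hash: Dict[int, dict] = {eg["_input_hash"]: eg for eg in model_b}
    return [
        eg["_input_hash"]
        for eg in model_a
        if eg["_input_hash"] in b_by_hash and span_set(eg) != span_set(b_by_hash[eg["_input_hash"]])
    ]


model_a = [
    {"_input_hash": 1, "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"_input_hash": 2, "spans": [{"start": 3, "end": 8, "label": "PERSON"}]},
]
model_b = [
    {"_input_hash": 1, "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"_input_hash": 2, "spans": [{"start": 3, "end": 8, "label": "ORG"}]},
]
print(disagreements(model_a, model_b))  # [2]
```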

v1.13.0 2023-08-15

This release introduces support for spacy-llm, which provides even broader support for large language models for NER, textcat and spancat annotations. Future recipes that leverage large language models will also use the spaCy-LLM backend and the OpenAI recipes will be deprecated.

newAdded spacy-llm based replacements for the OpenAI workflows for NER and Textcat via the ner.llm.correct, ner.llm.fetch, textcat.llm.correct and textcat.llm.fetch recipes.
newIntroduced LLM support for Spancat tasks via the spans.llm.correct and spans.llm.fetch recipes.
fyiProdigy will deprecate the *.openai.* recipes in the future due to a deprecation over at OpenAI. These recipes will all be replaced with *.llm.* variants that use spaCy LLM as a backend.
docUpdated the large language models section

v1.12.7 2023-08-10

This release fixes an issue where DatasetSource, GeneratorSource and ListSource might end up in an erroneous state at the end of the iteration due to incorrect position resetting. This could also lead to unexpected progress bar updates.

fixRemove position reset when closing DatasetSource, GeneratorSource and ListSource.

v1.12.6 2023-08-08

This release fixes intermittent MySQL integrity errors during bulk DB operations.

fixRemove bulk inserts to DB to make the operations more stable.

v1.12.5 2023-07-28

This release adds temporary support for legacy (pre 1.12.X) loaders in the new get_stream utility. It also fixes a few minor CLI and config processing bugs. We have also improved the error message for missing DB drivers.

fixAdd support for legacy (pre 1.12.X) loaders in the get_stream utility.
fixFix the processing of the :ignore, :accept, :reject suffixes in dataset source CLI.
fixImprove error message in the event of missing DB drivers.
fixFix the support for the delimiter argument of the legacy CSV loader.
fixFix the processing of hide_arrow_heads and hide_true_newline_token config settings in the rel.manual recipe.
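To show how a dataset source string with an answer suffix breaks down, here is a hypothetical parser for strings like dataset:my_set or dataset:my_set:accept. It assumes the suffix selects examples by their "answer"; ParsedSource and parse_dataset_source are illustrative names and not Prodigy’s internal implementation.

```python
from typing import NamedTuple, Optional

ANSWER_SUFFIXES = {"accept", "reject", "ignore"}


class ParsedSource(NamedTuple):
    dataset: str
    answer: Optional[str]  # None means "keep all answers"


def parse_dataset_source(source: str) -> ParsedSource:
    """Parse 'dataset:NAME' or 'dataset:NAME:SUFFIX' (illustration only)."""
    prefix, sep, rest = source.partition(":")
    if prefix != "dataset" or not sep or not rest:
        raise ValueError(f"Not a dataset source: {source!r}")
    name, _, suffix = rest.rpartition(":")
    if name and suffix in ANSWER_SUFFIXES:
        return ParsedSource(name, suffix)
    return ParsedSource(rest, None)


print(parse_dataset_source("dataset:my_set:accept"))  # ParsedSource(dataset='my_set', answer='accept')
```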

v1.12.4 2023-07-19

This release includes an additional bug fix for the frontend.

fixFix issue where individual image spans could not be selected in the image_manual view for the Polygon and Freehand tools. This builds on the fix in v1.12.3 which only fixed selection of image spans annotated with the rectangle tool.

v1.12.3 2023-07-17

This release includes major bug fixes for the frontend and an additional video for the task routing docs:

fixFix issue where individual image spans could not be selected in the image_manual view
fixFix an issue where the “Save” button could be clicked twice and save a duplicate answer to the database.
fixFix an issue that would render br elements in the frontend
docAdded docs on the Database.get_hashes and Database.count_dataset methods
docAdded a new video on task routers that does a deep dive on how to construct your own

v1.12.2 2023-07-13

Fixes a bug when using a Prodigy Dataset as a source for an Audio or Image recipe using the dataset:my_dataset_name syntax.

fixFix FileNotFoundError when using a Dataset as a source for an Audio or Image recipe
docFixed inconsistencies related to session ids.

v1.12.1 2023-07-12

This update adds support for the latest spaCy version.

newExtend spaCy support to the latest v3.6.

v1.12.0 2023-07-05

For this version we have completely refactored Prodigy internals to make the annotation flow more tractable and more customizable. We have reimplemented the Controller and added new abstractions to better represent the stream of tasks and the input source. This let us deliver a number of exciting new features such as partial, configurable feed overlap, custom task routers, custom session factories, source-based progress estimation, support for Parquet input files, experimental support for training the coref component in train, a new filter-by-patterns recipe and DX improvements.

v1.12 also provides support for LLM-assisted workflows for data annotation and prompt engineering. We have provided 4 new recipes for bootstrapping NER and Textcat annotation, 1 for terminology generation and 2 for prompt engineering, including a really creative ab.openai.tournament recipe. As of this version we support Python 3.11 and have dropped support for Python 3.7.

Thanks to everyone who’s helped us by testing the alpha versions. See the changelog below for a full list of new features.

newAdded a new Controller to facilitate annotation workflow customization.
newAdded support for task routing, allowing you to customise who annotates each example.
newAdded annotations_per_task setting to easily configure a task router for partial annotator overlap.
newAdded a selection of task routers to the public API that can be used in custom recipes.
newAdded a session_factory callback for custom recipes, giving you control over how sessions are created.
newAdded support for spacy-experimental coref component in the train and train-curve recipes.
newAll of Prodigy’s internal recipes now support the .parquet file format as a data source.
newAdded allow_work_stealing setting in prodigy.json that allows you to turn off work stealing.
newAdded the PRODIGY_LOG_LOCALS environment variable to include local variables when debugging Prodigy error messages.
newAdded get_hash_count and get_hashes_min_cardinality methods to database class, which are useful in custom task routers.
newThe review recipe now provides a --accept-single flag to also automatically accept annotations from a single annotator when --auto-accept is also turned on.
newAdded a new filter-by-patterns recipe that can use match patterns to produce a relevant subset for a downstream task.
newAdded support for annotation workflows that use large language models from OpenAI as a model-in-the-loop via the ner.openai.correct, ner.openai.fetch, textcat.openai.correct and textcat.openai.fetch recipes.
newAdded support for pattern file generation via OpenAI’s large language model with terms.openai.fetch recipe.
newAdded support for prompt engineering recipes via the ab.openai.prompts and ab.openai.tournament recipes.
newAdded new progress calculation based on relative position in a source object.
newDistinguish between target progress and source progress in the UI.
fixFixed a bug related to allow_newline_highlight setting in NER recipes.
fixFixed a bug related to multiple labels in the mark recipe.
fixFixed a bug related to multiple labels in the choice interface
fixFixed a bug related to trailing slashes in session names. Prodigy will now ignore trailing slashes.
fixAdded a more helpful error message when a user needs to provide a /?session= via the URL
fyiRemoved auto_count_stream setting and automated counting of generators passed to stream component. See Controller Progress docs for new information on progress calculation.
fyiAdded a warning when custom recipes don’t ensure the hashes are set appropriately on their examples.
fyiTracebacks that Prodigy error messages try to hide will now appear in the logs when PRODIGY_LOGGING is configured. Details are explained here.
fyiDropped support for Python 3.7 due to end of life on 2023-06-27.
docAdded new usage guides on task routing, large language models and cloud deployments.
docFixed typos and inconsistencies
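Partial annotator overlap via annotations_per_task boils down to deciding, for each task, which annotators should see it. The standalone sketch below picks a fixed number of distinct annotators per task deterministically from a hash of the input. Its name and signature are hypothetical and do not match Prodigy’s task router API, which is covered in the task routing guide.

```python
import hashlib
from typing import List


def route_task(input_hash: int, annotators: List[str], annotations_per_task: int) -> List[str]:
    """Deterministically pick annotators for one task (hypothetical sketch).

    Rotates through the annotator list from an offset derived from the task
    hash, so each task gets min(annotations_per_task, len(annotators))
    distinct annotators and the same task is always routed the same way.
    """
    if annotations_per_task < 1:
        raise ValueError("annotations_per_task must be at least 1")
    digest = hashlib.sha256(str(input_hash).encode("utf8")).hexdigest()
    start = int(digest, 16) % len(annotators)
    count = min(annotations_per_task, len(annotators))
    return [annotators[(start + i) % len(annotators)] for i in range(count)]


annotators = ["alex", "bo", "cy", "dee"]
print(route_task(123, annotators, 2))
```

Hashing the input rather than using a counter keeps the assignment stable across server restarts, at the cost of only approximately even workloads.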

v1.11.14 2023-05-23

This patch release adds a constraint for typing_extensions (<4.6.0) to avoid a version conflict in Pydantic.

fixAdd a version constraint for typing_extensions.

v1.11.13 2023-05-12

This patch release relaxes the following requirements: tqdm (>=4.38.0,<5.0.0), Jinja2 (no pin), python-dotenv (>=0.21.1,<2.0.0).

fixUpdate dependency versions for tqdm, Jinja2, and python-dotenv.

v1.11.12 2023-05-02

This patch release fixes an issue with the --base-model argument in train which resulted in erroneous sourcing of the tok2vec component from the spaCy base model. It also updates the peewee (<3.17), pydantic (<2.0), FastAPI (<0.95.1) and typeguard (<4.0) dependencies.

fixFix --base-model parsing in train.
fixUpdate dependency versions for peewee, pydantic, typeguard and FastAPI.

v1.11.11 2023-02-23

This release reverts a backward-incompatible change from v1.11.10 that removed the database methods and helpers get_db, set_db and disconnect. These methods were not part of the documented API, but they could be accessed via the Prodigy source available to users, so removing them may have caused problems.

fixRestore database methods get_db, set_db and disconnect.

v1.11.10 2023-01-31

This release updates Prodigy to support newer versions of the spaCy (up to 3.5), pydantic (up to 1.10.4) and FastAPI (up to 0.89.1) dependencies.

fixUpdate dependency versions for spaCy, pydantic and FastAPI.

v1.11.9 2023-01-23

This update fixes an issue with duplicate examples in multi-user annotation flows and includes several smaller bug fixes in recipes.

newAdd logging from the frontend to the backend if the frontend ever receives a batch with duplicate tasks.
fixPrevent duplicate examples from being shown to annotators, specifically in higher latency, production scenarios.
fixFix the audio.transcribe recipe by putting the custom UI attributes on each task, instead of in the config.
fixFix an issue where some unsaved examples could be lost during a browser refresh.
fixFix the --wrap functionality in relations interface.
fixFix multiple copies of session IDs being rendered in review recipe.

v1.11.8 2022-07-20

This update includes various bug fixes and usability improvements and extends support to the latest spaCy versions.

newExtend spaCy support to the latest v3.4.
newAdd --use_annotations argument to Span Categorization recipes for suggesters that require annotations from other components.
new/health and /healthz API endpoints for health checks.
newSupport tokens with "disabled": true in input data for rel.manual.
newController.from_components classmethod.
newAdd validation to ensure choice options are unique.
fixAllow highlighting newlines in sent.correct.
fixPrevent binary conflicting spans from being added to both positive and negative examples in train.
fixFix --auto-accept for binary rejection agreements in review.
fixAutomatically prevent duplicates from appearing in training and evaluation set in train and data-to-spacy.
fixImprove default config generation for base models with vectors and ensure vectors are used.
fixImprove warnings for invalid setting combinations in PatternMatcher.
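The duplicate prevention in train and data-to-spacy guards against evaluation examples leaking into training data. A minimal sketch of the idea, assuming duplicates are identified by _input_hash (Prodigy’s actual criterion may differ): drop any training example whose input also appears in the evaluation set. It deliberately keeps repeated inputs within the training set, since binary workflows can produce several annotations per input.

```python
from typing import List


def remove_eval_overlap(train: List[dict], evaluation: List[dict]) -> List[dict]:
    """Drop training examples whose _input_hash also appears in the evaluation set."""
    eval_hashes = {eg["_input_hash"] for eg in evaluation}
    return [eg for eg in train if eg["_input_hash"] not in eval_hashes]


train = [{"_input_hash": 1}, {"_input_hash": 2}, {"_input_hash": 2}, {"_input_hash": 3}]
evaluation = [{"_input_hash": 3}]
print(remove_eval_overlap(train, evaluation))
# [{'_input_hash': 1}, {'_input_hash': 2}, {'_input_hash': 2}]
```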

v1.11.7 2022-01-05

This update includes fixes to better support spaCy v3.2 and improves sentence segmentation correction, data export for binary NER annotations, and stream handling for multi-user sessions.

newUpdate spaCy version range to allow installing spaCy v3.2 with Prodigy by default.
fixFix vector handling for spaCy v3.2 compatibility.
fixImprove caching to prevent duplication in named multi-user sessions.
fixEnable newline and whitespace highlighting by default in sent.correct.
fixEnsure binary rejected entity spans are ignored in train and data-to-spacy if they’re also in the accepted annotations.
docFix typos and inconsistencies.

v1.11.6 2021-11-17

This update includes wheels for Python 3.10, as well as small fixes to recipes and config generation.

newPre-built wheels for Python 3.10.
fixAlways use best score in train-curve and train with --label-stats, not last.
fixEnsure hashes are set correctly in audio.transcribe.
fixFix issue that could cause progress 0 to not be reported correctly in custom functions.
fixCorrectly preserve tok2vec and transformer in train with base config.
fixDon’t auto-add examples already in dataset in review with --auto-accept.
fixUse relative paths for fonts to support custom path prefixes.

v1.11.5 2021-10-14

This update includes small fixes to the feed logic for multi-user sessions and custom progress handling.

fixSimplify feed history tracking for multi-user sessions to prevent duplication.
fixCorrectly handle progress 0.0 returned by custom progress functions.
fixFix display of input content in compare recipe.
fixInclude .prodigy-spans CSS class in ner_manual UI.
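The progress fixes in v1.11.5 and v1.11.6 involve a progress value of 0, which points to a classic Python pitfall: 0.0 is falsy. The sketch below illustrates that general class of bug; it is an assumption about the kind of issue, not Prodigy’s code, and both function names are hypothetical.

```python
from typing import Optional


def displayed_progress_buggy(custom: Optional[float], fallback: float) -> float:
    # Bug: 0.0 is falsy, so a legitimate "0% done" is replaced by the fallback.
    return custom or fallback


def displayed_progress(custom: Optional[float], fallback: float) -> float:
    # Fix: only treat an explicit None as "no custom progress reported".
    return custom if custom is not None else fallback


print(displayed_progress_buggy(0.0, 0.5))  # 0.5
print(displayed_progress(0.0, 0.5))  # 0.0
```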

v1.11.4 2021-09-13

fixFix issue that could cause stream to repeat batches of questions in some scenarios.

v1.11.3 2021-09-08

This release includes various small fixes to stream handling, data loading and config generation.

newExclude previously accepted labels during binary annotation with --exclusive in textcat.manual.
fixFix a feed overlap bug that could cause session history to not be tracked correctly after reset.
fixEnsure recipe’s update callback is executed when annotating without a dataset.
fixFix handling of meta column in CSV loader and add its value to a dict.
fixHandle non-dictionary "meta" values correctly in PatternMatcher.
fixCorrectly set both _session_id and _annotator_id in controller.
fixDon’t remove frozen scores used by non-frozen components during config generation in train.
fixEnsure that pre-defined spans receive a "score" in ner.teach.

v1.11.2 2021-08-20

This update includes small fixes to the train curve plots and recipes that don’t save data.

fyiAdd warning when using deprecated --ner-missing in train and data-to-spacy.
fixAdd workaround for using v2.x and v3.x of plotext in train-curve with --show-plot.
fixFix handling of False as dataset value returned by recipe.
fixFix handling of "meta" dict in choice options.
fixFix links in PyPI server /index endpoint.

v1.11.1 2021-08-17

This update includes a new workflow for correcting a trained span categorizer, as well as various small fixes to the config generation, stream setup and UI.

new spans.correct workflow for correcting a trained span categorizer.
newSupport /index endpoint in PyPI download server for package index to use in requirements.txt.
fixFix config generation in train if no logger is present in the config.
fixPrevent error in stream counting for slower streams and don’t pre-count if --update is set.
fixFix issue that’d cause CSV reader to fail for None column headers.
fixEnsure hashes are always added before task validation.
fixPrevent labels from wrapping in spans_manual.

v1.11.0 2021-08-12

This release updates Prodigy to use the new spaCy v3, which brings you lots of new exciting features like end-to-end support for transformer-based pipelines, a training config system for reproducible results, as well as new trainable components for sentence segmentation and span categorization that you can create annotations for with Prodigy. Thanks to the 300+ (!) nightly users who helped us test this new release!

Prodigy v1.11 includes a bunch of new features, including a new installation process via pip and new wheels for Python 3.9 and ARM architectures, a new recipe and UI for annotating overlapping and nested spans, new recipes for improving a sentence recognizer model, new training and data export recipes that seamlessly integrate with spaCy’s config system and let you train multiple components with different evaluation sets, support for updating the model in the loop in ner.correct and a new textcat.correct to go along with it, improved handling of binary annotations in ner.teach for better results, as well as new customization options and settings.

newSupport for spaCy v3, including transformer-based pipelines, training configs, new trainable components and more.
newImproved wheel installation: download the best-matching wheel via pip using your license key!
newPre-built wheels for Python 3.9 and ARM architectures.
newNew train command that supports training multiple components from different datasets and evaluation datasets (using the eval: prefix) and mixing manual and binary datasets.
newNew data-to-spacy command that generates all data you need: training and evaluation corpora in spaCy’s binary format, initialized label sets for faster training and optional config file.
newNew train-curve command with support for multiple components and visual plots.
new spans.manual workflow and spans_manual UI for annotating any number of potentially overlapping and nested spans. Also see the span categorization docs for details.
newSupport for training spaCy’s new SpanCategorizer in train and data-to-spacy, based on annotations collected with spans.manual.
newUpdate the model in the loop in ner.correct using the --update flag.
new textcat.correct recipe for correcting and updating an existing text classifier.
new sent.teach and sent.correct recipes for improving a sentence recognizer model.
newImproved ner.teach workflow for more accurate results with spaCy v3. In addition to binary questions about entity suggestions, you’re now also asked questions about texts with no entities at all. If an example includes no highlighted suggestions, you can hit accept to confirm that it contains no entities, or reject if it contains entities.
new progress command to calculate annotation progress over time.
newThe -F argument to provide a file path for custom recipe scripts now supports multiple, comma-separated paths and can also be used to load custom registered functions for spaCy configs. It works across all recipes, including the built-in workflows.
newAdd --auto-accept flag to review recipe to automatically accept annotations with no conflicts and add them to the database.
newSupport "history_text" property in task for customizing preview shown in sidebar history.
newAllow optional "meta" and "score" in choice options, which will be displayed with the option.
newInclude the UNIX timestamp of when an annotation was answered in the UI as "_timestamp".
newSupport counting finite and potentially filtered generator streams for better progress estimation via "auto_count_stream": true. Note that this setting should only be used for streams that are static and don't depend on outside state (e.g. an updated model in the loop).
newAdd total_examples_target for the total number of examples that should be annotated to reach a progress of 100%. Useful for infinite streams or if completion doesn’t map to stream size.
newPRODIGY_CONFIG and PRODIGY_CONFIG_OVERRIDES environment variables to provide custom path to global config JSON and override individual config settings on the CLI.
newDark mode theme! Enable it by setting "theme": "dark" in your prodigy.json.
newSupport overriding color palettes used for labels via "palettes" in "custom_theme".
newSupport "style" property on individual "tokens" in ner_manual and spans_manual.
newAdd more human-readable CSS class names and data attributes.
fyiThe train command now handles binary annotations out-of-the-box, so you won’t have to explicitly set --binary or --ner-missing anymore. Future annotations created with binary workflows like ner.teach will now also set "_is_binary": true explicitly in the data.
fyiPer-label stats in the output of train can now be toggled via the --label-stats flag.
fyiThe --textcat-exclusive argument is not needed anymore in train and related workflows and has been removed. Instead, you can explicitly provide datasets via --textcat (exclusive categories) and --textcat-multilabel (non-exclusive categories).
fyiThe --init-tok2vec argument has been removed from textcat.teach. You can now use pretrained embeddings directly via the spaCy pipeline you load in.
fyiExamples are now accepted automatically in textcat.manual when you select an option with mutually exclusive categories. You can override this by setting "choice_auto_accept": false.
fyiThe force_stream_order config setting is now deprecated, since its behavior is now the default for all feeds: batches are always sent and re-sent in the same order wherever possible.
fyiLong-deprecated recipes and functions have been removed.
fixFix various issues and inconsistencies around stream handling and feed overlap when using named multi-user sessions with a single instance of Prodigy.
fixFix span selection in relations UI via tap on mobile devices.
fixCorrectly handle vectors for languages without uppercase/lowercase distinction in terms.teach.
fixFix issue that could cause next batch to be blocked when using "instant_submit": true.
fixFix issue in CSV loader that would handle title-cased Label columns incorrectly.
fixFix base64 conversion to be forwards-compatible.
fixEnsure that html_template overrides are correctly interpreted in blocks UI.
fixDeep-merge all config settings provided via global and local prodigy.json, recipe config and overrides to support changing only individual nested properties.
fixFix issue that could cause config keyword arguments to not be set correctly in prodigy.serve.
docAdd span categorization usage docs and feature page.
docUpdate installation docs and add instructions for PyPI.
docUpdate various documentation references for spaCy v3 and recipe API changes.
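The deep-merge fix above means a local prodigy.json or a CLI override can change a single nested setting without wiping out its siblings. The following is a minimal sketch of those merge semantics in plain Python, not Prodigy's internal code: `deep_merge` and `resolve_config` are hypothetical helpers, the keys inside "custom_theme" are illustrative, and the precedence order (recipe config, then global, then local prodigy.json, then overrides) and the JSON string format of PRODIGY_CONFIG_OVERRIDES are assumptions.

```python
import copy
import json


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested dicts in `override` are merged into
    `base` key by key instead of replacing them wholesale (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(*layers: dict) -> dict:
    """Merge config layers in order; later layers take precedence (assumed order)."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical layers: recipe config, global and local prodigy.json
recipe_cfg = {"labels": ["PERSON", "ORG"], "batch_size": 10}
global_cfg = {"custom_theme": {"cardMaxWidth": 800, "smallText": 13}}
local_cfg = {"theme": "dark", "custom_theme": {"cardMaxWidth": 1000}}
# Assumed: PRODIGY_CONFIG_OVERRIDES holds a JSON string like this one
overrides = json.loads('{"batch_size": 5}')

config = resolve_config(recipe_cfg, global_cfg, local_cfg, overrides)
print(config["custom_theme"])  # {'cardMaxWidth': 1000, 'smallText': 13}
```

Only "cardMaxWidth" changes in the merged theme; a shallow merge would have dropped "smallText" entirely.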

v1.10.8 2021-04-08

This release includes various small fixes to the annotation interfaces, UI customizability and progress reporting.

newFire prodigyend event if no more tasks are available.
newAdd batch_size argument to PatternMatcher.__call__.
newAdd human-readable .prodigy-spans class for containers in ner, ner_manual etc.
fixCorrectly recognize relations as manual UI in review.
fixFix issue with disabled pipes during --binary training.
fixHandle trailing newlines and wrapping correctly in relations UI.
fixFix handling of label position for long labels in relations UI.
fixEnsure total and session annotation counts are reflected correctly in the controller passed to a custom progress callback when it's used together with an update callback.
fixImprove error handling when using prodigy.serve with non-servable functions.

v1.10.7 2021-02-27

fixEnsure Prodigy works with the latest pydantic.

v1.10.6 2021-02-14

This release includes a few small fixes, including an update to prevent Prodigy’s dependencies from pulling in a package incompatible with Python 3.6.

newAdd newlines for newline tokens in relations UI (if wrapping is enabled).
newSupport path-based routing and path-strip proxies by using relative paths in the web app.
fixPrevent dependencies from pulling in newer uvloop version that’s incompatible with Python 3.6.
fixFix issue that could cause config overrides to not be reflected correctly when calling prodigy.serve.
fixFix bug in dep.correct that could cause the heads to not be sent to spaCy correctly.
fixPrevent closed socket from killing the server by removing custom signal handlers.
fixUse better pattern ID delimiter in pattern matcher to prevent conflicts with user-defined labels.

v1.10.5 2020-11-11

This release includes updates to the relations and review workflows, fixes for feed overlap and multi-user session handling, new UI customization options and a Portuguese UI translation, as well as various other small fixes and improvements.

newSupport custom span label colors in relations UI.
newSupport the label_style config setting instead of just ner_manual_label_style, to reflect that it applies to ner_manual, image_manual and the relations UI.
newAdd --show-skipped option to review interface to include answers that would otherwise be skipped, like ignored answers or rejected examples in manual interfaces.
newAllow clicking on a version in review interface to update final annotation.
newAdd swipe_gestures config setting to customize left/right mapping.
newFire prodigyspanselected event in image_manual and relations.
newAdd UI translation for Portuguese. Thanks to Cristiana S Parada for the contribution!
fyiThe feed_overlap config setting now defaults to false and Prodigy will show a warning if an overlapping feed is used without named sessions.
fixUse symbols for whitespace characters in relations UI.
fixFix issue that could cause head or child span of first token to not be represented correctly in relations interface.
fixFix issue that could cause span label changes to not be reflected correctly in "relations" meta generated by relations UI and make sure "spans" are always sorted by default.
fixSet default batch size in calls to nlp.pipe in recipes.
fixEnsure that custom database is correctly passed to RepeatingFeed.
fixFix issue that could cause excluded datasets to not be represented correctly in named sessions.
fixAlways sort files alphabetically in loaders that read from directories.
fixMake train fail more gracefully if no data is available.
fixFix issue that could cause honor_token_whitespace to not be reflected correctly.
fixDon’t explicitly set delimiter in CSV loader and let Python guess.
fixCorrectly interpret image spans without "points" in image UI.

v1.10.4 2020-09-08

This update includes small fixes to the stream state management and hashing and improves handling of pre-defined tokens and spans in recipes and interfaces. It also introduces new customization options for the manual image and audio UIs.

newSupport customizing the JSON key used to store image spans in image_manual. Can be used to combine the interface with other interfaces that use "spans", e.g. ner_manual.
newExpose WaveSurfer instance in audio_manual to allow implementing custom controls.
fixSupport pre-defined "tokens" in rel.manual.
fixEnsure that custom values in pre-defined, non-edited "spans" are preserved in ner_manual.
fixFix issue that could cause --exclude datasets to not be considered with non-overlapping feeds.
fixAdd filter to prevent state conflicts of incoming answers in high-latency or low-CPU situations.
fixFix incorrect span offset issue in print-dataset and print-stream.
fixMake label detection in NER annotation model consistent with spaCy if label contains hyphen.
fixFix hashing consistency in review and ensure an existing binary answer doesn't impact the hash.

v1.10.3 2020-08-03

This update fixes a problem with the exclude logic by input hashes and adds support for custom tokenization in the manual relation annotation workflow.

newAllow pre-defined "tokens" in rel.manual.
fixFix issue that could cause "exclude_by": "input" to re-send tasks with overlapping feeds.

v1.10.2 2020-07-21

This update includes small fixes to the exclude logic in repeating streams, improvements to dependency parsing annotation and training, and updates to the ASGI server. Check out the release notes for v1.10 for all the new features introduced in the latest update!

fixFix issue that’d cause repeating feed to not honor exclude_by.
fixFix problem in dep.correct sentence segmentation that could lead Prodigy to incorrectly report misaligned tokenization.
fixCorrectly print dependency parser training results in train.
fixUpdate to support latest version of uvicorn.
fixAllow changing expected max length for MySQL via PRODIGY_MYSQL_MAX_LEN environment variable to prevent Prodigy from raising an error if the field type was changed to mediumblob.

v1.10.1 2020-07-02

This update includes small fixes for problems introduced in v1.10.0, as well as improvements to the spaCy v2.3 integration, more customization for audio and video transcription, and a new UI translation for French. Check out the release notes for v1.10 for all the new features introduced in the latest update!

newSupport customizing field ID used to store transcript in audio.transcribe.
newAdd UI translation for French. Thanks to Thierno Ibrahima DIOP for the contribution!
fixFix sentence segmentation issue that could cause ner_manual UI to crash.
fixMake terms.teach compatible with spaCy v2.3 by pre-populating the lexeme cache.
fixOnly show pattern matches for provided labels in ner.teach.
fixFix issue in ner_manual on touch devices that would prevent selecting first token.
fixEnsure overriding "text": None in blocks doesn’t cause errors in certain interfaces.

v1.10.0 2020-06-17

Our biggest release yet includes a bunch of new features, interfaces and recipes for dependency and relation annotation, audio and video annotation, as well as a new and improved manual image annotation interface with support for editing shapes and bounding boxes. We’ve also added new recipe callbacks for modifying examples placed in the database and validating answers at runtime, added more settings for whitespace-handling in manual NER annotation, including a mode for character-based highlighting, and introduced various new config settings to customize the web app and annotation interfaces. Thanks to everyone who’s helped us beta test the new features – your feedback has helped a lot! See the changelog below for a full list of new features.

newFlexible relations interface for fully manual dependency and relationship annotation and joint span and dependency relation annotation.
newNew recipes rel.manual, coref.manual and dep.correct for efficient manual and model-assisted dependency annotation.
new audio and audio_manual interfaces for binary and fully manual audio and video annotation. Add and modify segments for different labels and collect feedback about pre-highlighted regions.
new audio.manual and audio.transcribe recipes for audio and video annotation and transcription, as well as community recipes for using Prodigy with pretrained pyannote.audio models for speaker diarization in the loop.
newNew and improved image_manual interface with support for moving and resizing shapes, adjusting polygons, freehand annotation, more detailed data format and more settings.
newSupport dataset:{name} and dataset:{name}:{answer} syntax as source argument in recipes to allow loading from existing datasets. For example, dataset:my_set will use the examples in dataset my_set as the input data and dataset:my_set:accept will only load in accepted answers.
newAdd validate_answer recipe component to perform custom validation of annotations created in the UI and prevent invalid answers from being submitted.
newAllow recipes to return a before_db callback for modifying examples before they’re placed in the database, e.g. to strip out base64 data.
newUpdate Prodigy for the latest spaCy v2.3 and new models.
newSupport multi-arc dependency annotations (e.g. created with dep.correct) in train.
newSet information about trailing whitespace in add_tokens and reflect whitespace (or lack of whitespace) between tokens in ner_manual (can be changed using the "honor_token_whitespace" setting).
newAdd --highlight-chars flag to ner.manual and use_chars argument to add_tokens to allow highlighting individual characters instead of full tokens.
newAdd "field_suggestions" property to text_input UI to allow specifying a list of auto-suggestions to show when the user types or presses .
newAllow disabling and reordering of the accept, reject, ignore and undo buttons at the bottom of the screen via the "buttons" config setting.
newAdd options --width (card width and maximum image width) and --remove-base64 (remove base64-encoded image data) to image.manual.
newAdd file_ext argument to Images and ImageServer loaders, always preserve original local file path as "path" and add Audio, AudioServer, Video and VideoServer loaders.
newExpose generic Base64 and Server helpers to load any data as a base64-string or via a web server and add generic fetch_media preprocessor.
newAdd --rehash flag to db-merge to force-overwrite hashes.
newAdd --base-model argument to data-to-spacy to customize tokenizer and sentencizer.
newAllow individual tasks to override global or UI config via a key "config".
newAdd "ui_lang" config and translations of descriptions, messages and tooltips in the annotation UI to German, Spanish, Dutch and Chinese.
newMake sidebar history length default to batch_size and allow customizing it via the history_size config setting. Note that the history size can’t be larger than the batch size.
newShow recipe name in project info in sidebar and allow customizing info via "project_info" config.
newAdd Controller methods and attributes for retrieving total counts and progress by session ID.
newWarn if global or local prodigy.json settings override potentially critical recipe components.
newSupport custom label colors in manual interfaces and automatically pick a contrasting text color.
newShow keyboard shortcuts for toolbar buttons on hover.
newShow friendlier error if prodigy.json contains invalid JSON.
fixMake progress function returned by recipes consistent and always pass it the controller and the return value of the update callback, if available.
fixCorrectly report per-session progress for streams with a length and multi-user sessions and take feed_overlap into account.
fixImprove support for using "force_stream_order": True (repeating feed) with "feed_overlap": False (no overlap between sessions).
fixMake all manual recipes default to "force_stream_order": True for more intuitive stream behavior: batches of tasks are now always re-sent until they’re answered and refreshing the page will show the same batch again.
fixFix issue that could cause the review interface to not display changes when the user hits undo.
fixPreserve "choice_style" config setting on tasks so it can be re-applied when running review.
fixSupport simpler data format in diff interface to make it work combined with choice in a blocks interface and prevent clash of "accept" property used by both UIs.
fixAdjust display of spans with RTL text when "writing_mode": "rtl" is enabled.
fyiThe deprecated --api recipe argument has been removed and merged with --loader.
fyi textcat.manual now doesn’t perform additional checks for the pre-v1.9 syntax with an (unused) spaCy model argument anymore.
fyiLoading from standard input now requires the source argument to be set to - explicitly.
fyiThe "show_stats" setting to display detailed stats in the sidebar is now set to true by default.
fyiThe "spans" data created with image_manual now also include a "type" (either "rect", "polygon" or "freehand"), as well as "width", "height", "x", "y" and "center" values for rects.
fyiForced stream order and repeating batches by default means that you should use named sessions or set "force_stream_order": False if you want multiple users connecting to the same instance. Otherwise, you may get duplicate questions.
docAdd documentation and feature page for dependency relation annotation.
docAdd documentation and feature page for audio and speech annotation.
docUpdate feature pages for named entity recognition and computer vision.
docAdd docs section on efficient NER annotation for fine-tuning transformers like BERT.
docAdd docs section on recipe callback functions in detail.
docDocument b64_to_bytes, file_to_b64 and bytes_to_b64 utilities for converting base64.
docTidy up global config docs and move settings specific to a single interface to interface docs.
docFix various typos and inconsistencies.
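The validate_answer and before_db recipe components added in this release are plain Python callables. The sketch below shows one plausible pair for an image classification task using the choice UI: the 1 to 3 option limit, the error message and the "path" fallback are illustrative choices, and the functions are written without importing Prodigy so they can be read (and tested) on their own. It assumes, as described above, that an exception raised in validate_answer prevents the answer from being submitted and that before_db receives and returns a list of example dicts.

```python
from typing import Dict, List


def validate_answer(eg: Dict) -> None:
    """Reject a choice-UI answer unless 1 to 3 options are selected.

    Raising here blocks the submission; the message is intended for
    the annotator (the limit is an illustrative choice).
    """
    selected = eg.get("accept", [])
    if not 1 <= len(selected) <= 3:
        raise ValueError("Select at least 1 but not more than 3 options.")


def before_db(examples: List[Dict]) -> List[Dict]:
    """Swap base64 image data for the original file path before saving,
    so the database doesn't store large data URIs."""
    for eg in examples:
        image = eg.get("image", "")
        if image.startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples


# In a custom recipe, both would be returned among the recipe components, e.g.
# {"dataset": ..., "stream": ..., "view_id": "choice",
#  "validate_answer": validate_answer, "before_db": before_db}

saved = before_db([{"image": "data:image/png;base64,AAAA", "path": "images/cat.png"}])
print(saved[0]["image"])  # images/cat.png
```

Keeping "path" on every task (which the loaders in this release now always preserve) is what makes the before_db swap possible.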

v1.9.10 2020-06-05

This patch release includes small fixes to the force_stream_order setting to prevent a race condition and duplicate examples. Stay tuned for v1.10, which is coming soon and will include lots of cool new features!

newAdd Controller.all_session_ids, which lists all named sessions that have connected to the current instance.
fixFix race condition that could cause force_stream_order to produce duplicate tasks.
fixCorrectly exclude currently shown task when requesting new questions with force_stream_order.

v1.9.9 2020-03-17

This release includes an important fix for a training regression introduced in the previous version, as well as small improvements.

fixFix issue that’d cause rejected binary text classification annotations to be filtered out in train.
fixImprove handling of rejected and ignored examples across different annotation types in train.
fixRelax unnecessarily strict validation for diff tasks.

v1.9.8 2020-03-14

This release includes a new built-in recipe match for selecting examples based on pattern matches, as well as various bug fixes and improvements.

newGeneral-purpose match recipe to only match patterns in text with various configurations.
fixUse custom --view-id set in review to determine how to merge examples.
fixImprove default configuration in train for NER models with --init-tok2vec.
fixFix filtering that could cause incorrect totals to be reported before training.
fixFix async handling of built-in and user-provided databases.
fixFix hashing of patterns that’d cause incorrect line numbers to be displayed.
fixFix compiler setting that’d cause print-stream and print-dataset to not output colored results.
fixCheck for correct view ID when printing text classification datasets with print-dataset.
fixCorrectly pass --eval-split to data-to-spacy.
fixShow correct path to Prodigy installation root (not recipe root) in stats command.
fixFix issue that could cause span rendering problems in ner if text contains emoji.
fixFix UI issue that’d cause card headings to overlay expanded sidebar on small screens.
docFix various typos and links.

v1.9.7 2020-02-21

This release includes small fixes and improvements to the built-in recipes and interfaces.

newAdd overwrite flag to add_tokens preprocessor to overwrite existing "tokens".
newAllow review recipe to overwrite view ID (e.g. to render blocks annotations differently).
fixAccept pre-set tokens correctly in add_tokens to make it easier to provide custom tokenization.
fixImprove backwards-compatibility checks of arguments in textcat.manual.
fixCorrectly report numbers of textcat examples in train and filter out ignored answers instead of just ignoring the examples during training and evaluation.
fixFix handling of integer option "id" values in print-dataset.
fixFix issue that’d cause text_input value to not reset and auto-focus correctly between tasks.
fixFix incorrect validation errors for dep UI and "card_css" setting.
fixSet more explicit MIME types for JS bundle for server configs that prevent MIME type sniffing.
fixAdjust eighties theme to prevent dark text on dark background in choice options.

v1.9.6 2020-01-27

This release includes small fixes related to async database usage in Python 3.7+ and text classification training with the new train recipe.

fixFix issue with async database usage in Python 3.7+ that could cause MySQL connection errors.
fixEnsure --textcat-exclusive setting is passed down correctly in train.

v1.9.5 2020-01-10

This release includes small fixes related to multiprocessing and new features introduced in v1.9.0.

fixMake ner.manual with --patterns correctly return all examples instead of only the matches.
fixFix error when loading evaluation examples in new train recipe with --binary enabled.
fixFix issue in train with --binary when restoring pipeline component before saving the model.
fixFix Foreign Key constraint error that could occur in Database.drop_examples.
fixAdd thread locking to database reconnect methods in controller.
fixFix accuracy output for tagger and parser in train.

v1.9.4 2019-12-28

This release includes small fixes, a new option for changing keyboard shortcuts for labels and multiple choice options, and a new loader for serving images.

newAdd "keymap_by_label" config to change keyboard shortcuts for labels and choice options.
newAdd image-server loader for serving images from a directory (and bypassing base64 encoding).
fixFix too strict validation for review content.

v1.9.3 2019-12-23

This release includes small fixes to the new interfaces introduced in v1.9.0.

fixFix too strict validation for blocks content.
fixPrevent input field in text_input from losing focus on update.

v1.9.2 2019-12-20

This release includes small fixes to bugs introduced in v1.9.0.

fixFix error in PatternMatcher when assigning combined matches to tasks with no "meta".
fixFix too strict validation for html tasks with no "html" key but "html_template".

v1.9.1 2019-12-19

This release includes small fixes to bugs introduced in v1.9.0.

fixFix issue with loading recipes from entry points.
fixFix too strict validation for "db" recipe component.

v1.9.0 2019-12-18

This release introduces tons of new features and improvements, including new recipes, interfaces and workflows. We also redesigned the website, rewrote the documentation from scratch and added lots of new pages, usage guides, demos and examples. We hope you like it! Some highlights in Prodigy v1.9 include new unified training recipes, two new annotation interfaces for free-form text input and for combining different UIs, config settings for making streams repeat questions until they're answered and for changing keyboard shortcuts, official support for spaCy v2.2 and a new recipe for converting Prodigy annotations of different types to a single training corpus in spaCy's JSON format. See the changelog below for a full list of new features.

newAdd new general-purpose train and train-curve recipes to replace the task-specific training recipes and make overall training process more consistent.
newShow accuracy per entity type, tag or text category in training results.
newAdd data-to-spacy recipe that takes Prodigy datasets for NER, text classification, tagging and parsing and outputs a merged corpus (optionally split into training and evaluation data) in spaCy’s JSON format that you can use with spacy train.
newAdd --patterns argument to ner.manual to pre-highlight suggestions from patterns. This workflow is going to replace the binary ner.match.
newAdd general-purpose print-stream and print-dataset recipes that can output different data types. Those recipes are going to replace the more specific print utilities like ner.print-stream.
newAdd blocks interface to freely combine annotation interfaces.
newAdd text_input interface to collect free-form text input from annotators.
newNew "force_stream_order" config setting. If True, tasks will always be sent out in the same order and re-sent until they’re answered – even if you refresh the app in your browser.
newSupport customizing keyboard shortcuts.
newSupport tokenizing terms in terms.to-patterns to create patterns for multi-token terms.
newAdd "exclude_by" config setting to allow recipes to specify whether to filter by input hash or task hash so that manual recipes don’t repeat the same content with different suggestions.
newSupport blank:{lang}, e.g. blank:en as an alternative spaCy model in ner.manual, textcat.teach and train to start off with a blank model.
newPass --label values added to mark to the "labels" config so the recipe can be used with manual interfaces like ner_manual and image_manual.
newAdd --no-fetch flag to image.manual to disable base64 conversion of images.
newAdd --fetch-media flag to review recipe to temporarily replace paths with base64 data.
newAlso support - as the value of source arguments to read from standard input and make this the recommended best practice (instead of omitting the argument).
newAlways auto-create datasets and deprecate dataset command.
newMake compare and ner.eval-ab recipes use the more flexible choice interface and deprecate the compare UI.
newAdd more human-readable class names to use in custom CSS and JS.
newSupport new syntax in prodigy.serve that lets you pass in the full command-line command to start Prodigy from within Python.
newAdd data validation for prodigy.json / recipe config, recipe components and training examples.
newMake the printed output and messages prettier and more consistent.
newMake FastAPI the default REST API library and include interactive API docs.
newDrop support for Python 3.5 and make wheel installers support Python 3.8.
newUpdate Prodigy for spaCy v2.2.
fixShow an example suggested by patterns in textcat.teach only once with all matches instead of once per match.
fixFix error that’d occur when passing in long label sets on the command line (due to Prodigy checking if it’s a valid file path).
fixRemove unused spacy_model argument from textcat.manual.
fixExclude by input hash instead of task hash in ner.manual, ner.correct, pos.correct, textcat.manual and image.manual, using the new "exclude_by" setting. Examples will only be shown again if their content is identical, not if they include different highlighted suggestions.
fixFix handling of newline tokens in ner.manual for multiple newline characters and adjust style of symbols. Newline-only tokens are now unselectable by default to prevent creating newline token entities. You can set "allow_newline_highlight": true to change this.
fixShow error if MySQL database is used and JSON blob saved to the database is longer than 65535 characters, to prevent MySQL DB from truncating example.
fyiRename ner.make-gold and pos.make-gold to ner.correct and pos.correct. The old names are still supported so your code won’t break.
fyiDeprecate various outdated recipes, the built-in live APIs and the recipe_args dict. You can still use all of these features and your code shouldn’t break but they’ll be removed in v2.
fyiRefactor the whole code base, module organization and various other internals, and add simple type annotations to recipe functions.
docNew documentation and website, redesigned and rewritten completely from scratch, with tons of new content, demos and usage examples. The new site also replaces the PRODIGY_README.html that used to be available for download with Prodigy.
docUpdate prodigy-recipes repo.
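The "exclude_by" change above is easiest to see with a toy model: an input hash covers only the raw content, while a task hash also covers the suggestions shown with it. In this sketch, `input_hash`, `task_hash` and `exclude` are hypothetical helpers built on hashlib, not Prodigy's own hashing functions, and the key choices ("text" for the input, "spans", "label" and "options" for the task) are assumptions modelled on its defaults.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, Set

INPUT_KEYS = ("text",)                     # assumed: raw content only
TASK_KEYS = ("spans", "label", "options")  # assumed: suggestions shown with it


def _digest(data: Dict) -> str:
    # Stable serialization so equal dicts always produce the same hash
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


def input_hash(eg: Dict) -> str:
    return _digest({k: eg[k] for k in INPUT_KEYS if k in eg})


def task_hash(eg: Dict) -> str:
    task_part = {k: eg[k] for k in TASK_KEYS if k in eg}
    return _digest({"input": input_hash(eg), "task": task_part})


def exclude(stream: Iterable[Dict], seen: Set[str], exclude_by: str = "task") -> Iterator[Dict]:
    """Skip examples whose input or task hash was already answered (hypothetical)."""
    get_hash = input_hash if exclude_by == "input" else task_hash
    for eg in stream:
        if get_hash(eg) not in seen:
            yield eg


text = "Apple is based in Cupertino"
answered = [{"text": text, "spans": [{"start": 0, "end": 5, "label": "ORG"}]}]
incoming = [{"text": text, "spans": [{"start": 18, "end": 27, "label": "GPE"}]}]

by_task = list(exclude(incoming, {task_hash(eg) for eg in answered}, "task"))
by_input = list(exclude(incoming, {input_hash(eg) for eg in answered}, "input"))
print(len(by_task), len(by_input))  # 1 0
```

Excluding by task hash re-sends the same sentence because its suggestions differ, which suits binary workflows; excluding by input hash filters it out, which is the behavior the manual recipes listed above now use.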

v1.8.5 2019-10-19

This update includes a fix for a regression introduced in v1.8.4, as well as small improvements to the dataset creation and stream handling.

newWarn after exhausting streams with many duplicates.
fixFix issue introduced in v1.8.4 that could cause the client to send back empty answers if users annotated very quickly.
fixRemove default session from client and correctly populate session datasets.

v1.8.4 2019-10-07

This update includes various small fixes to the interfaces and recipes.

newExperimental: Allow moving selected bounding boxes in image_manual interface via keyboard shortcuts .
newAdd prodigyundo event for custom JavaScript.
fixFix issue that’d cause label change in image_manual to not be reflected correctly.
fixDisable unselecting of radio button if choice_auto_accept is enabled.
fixAlways prefer rendering "html" in classification interface, if available.
fixImprove handling of choice tasks in review recipe.
fixRe-add default spacing for most common HTML elements in html interface.
fixEnsure bin/prodigy and bin/pgy are interpreted as shell scripts.
fixMake textcat.manual correctly support single-label use cases.
fixFix handling of pre-defined spans in EntityRecognizer.
fixFix detection of user databases via entry points.
fixFix race condition that’d fire prodigyanswer event incorrectly.
fixPrevent card labels from being displayed on top of modals.
fixImprove fallback if labels are provided to the app in incorrect format.
fixFix handling of related sessions in feeds if "feed_overlap" is enabled.

v1.8.3 2019-06-07

This update includes fixes to textcat.batch-train, the NER preprocessing logic and Prodigy’s dependencies.

fixFix issue in textcat.batch-train that wouldn’t pass exclusive setting to the model and converter functions correctly.
fixFix handling of multiple choice data in textcat.batch-train.
fixFix segmentation bug that caused spans ending on text boundaries to be dropped.
fixMake sure span is fully excluded if skip=True is set in add_tokens preprocessor.
fixAdd srsly to direct dependencies and pin to latest version.
docFix typos and inconsistencies.

v1.8.2 2019-05-28

This update includes small fixes to the terms.teach and review recipes, as well as improvements to the pretraining support.

fixFix issue in review recipe that’d raise error if no versions were generated.
fixMake terms.teach skip vocab entries with no vectors to prevent unnecessary warnings.
fixFix serialization issue of sentencizer in textcat.batch-train.
fixEnsure hyperparameters from pretraining are passed to textcat.batch-train.

v1.8.1 2019-05-21

This update includes small fixes to the text classification workflows.

fixFix handling of rejected example scores in textcat.manual.
fixEnsure handling of --eval-id in textcat.batch-train remains backwards-compatible.

v1.8.0 2019-05-20

This release updates Prodigy for the brand new spaCy v2.1, which features BERT-style language model pretraining, an extended match pattern API and faster tokenization. We’ve also added support for basic authentication and several completely new built-in recipes and workflows for reviewing annotations from multiple sessions and resolving conflicts, manual multiple-choice text classification, and merging two or more existing datasets.

newUpdate Prodigy for spaCy v2.1.
newAdd language model pretraining support via --init-tok2vec in training recipes.
newNew interface and review recipe for reviewing and reconciling annotations from multiple sessions on the same data. View conflicting annotations, resolve them in the UI and create a final training set.
newAdd textcat.manual recipe to annotate text categories using the choice UI.
newMake textcat.batch-train accept annotations in choice format.
newAdd --exclusive flag to textcat.batch-train to train mutually exclusive categories.
newAdd ner.silver-to-gold recipe to convert binary accept/reject annotations to gold-standard data with no missing values.
newAdd db-merge recipe to merge two or more datasets into a new set.
newAdd basic authentication to the app with PRODIGY_BASIC_AUTH_USER and PRODIGY_BASIC_AUTH_PASS env variables.
newAdd PRODIGY_ALLOWED_SESSIONS env variable to specify allowed named sessions.
newStore "_session_id" and "_view_id" with annotations.
newNew REST API powered by FastAPI. Set the PRODIGY_FASTAPI environment variable and install fastapi (Python 3.6+) to try it out.
fixFix issue in image_manual UI that’d cause boxes to not be deleted correctly.
fixMake sure flag button isn’t covered by title in annotation UI.
fixUse named logger "prodigy" to allow customizing logging behavior.
fixAllow textcat.eval recipe to read from stdin as expected.
fixPrevent incorrectly raised KeyError in split_sentences preprocessor.
fixRaise error if Database.add_examples doesn’t receive list/tuple of dataset names.
fixMake sure choice interface adds "accept": [] if no selection is made.
fixIf instant_submit is enabled, send answer before requesting new questions.
fixPrevent keyboard shortcuts from firing in custom <input> and <textarea> elements.
fixPreserve docstrings of compiled Cython classes, methods and functions.
docFix various typos and inconsistencies and add new sections for new features.
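The v1.8.0 entries on choice-format annotations, the --exclusive flag and empty "accept" lists fit together roughly as follows. This is a minimal sketch, assuming the choice task format (an "options" list whose entries each have an "id", plus an "accept" list of selected IDs); choice_to_cats is a hypothetical helper for illustration, not the code textcat.batch-train actually runs.

```python
from typing import Dict, List


def choice_to_cats(task: dict, labels: List[str], exclusive: bool = False) -> Dict[str, float]:
    """Map one choice-style annotation to a {label: score} dict (hypothetical helper).

    "accept" holds the IDs of the selected options and may be an empty
    list when the annotator made no selection.
    """
    selected = set(task.get("accept", []))
    option_ids = {opt["id"] for opt in task.get("options", [])}
    unknown = selected - option_ids
    if unknown:
        raise ValueError(f"Selected IDs not among the options: {sorted(unknown)}")
    if exclusive and len(selected) > 1:
        # Mutually exclusive categories: at most one option may be chosen.
        raise ValueError("Exclusive categories allow at most one selection")
    # Selected labels score 1.0, all other labels 0.0.
    return {label: 1.0 if label in selected else 0.0 for label in labels}


task = {
    "text": "The battery died after two days.",
    "options": [{"id": "POSITIVE", "text": "Positive"}, {"id": "NEGATIVE", "text": "Negative"}],
    "accept": ["NEGATIVE"],
    "answer": "accept",
}
print(choice_to_cats(task, ["POSITIVE", "NEGATIVE"], exclusive=True))
# {'POSITIVE': 0.0, 'NEGATIVE': 1.0}
```

An empty "accept" list simply yields 0.0 for every label, which is why it matters that the choice interface always adds the key.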

v1.7.1 2019-02-23

This update includes a small fix to the "instant_submit" feature introduced in the previous release.

fixFix issue that could cause tasks to not receive an "answer" when "instant_submit" was enabled.

v1.7.0 2019-02-18

This update makes it easy to set up named multi-user sessions in a single instance. It also introduces a new setting for instant submissions and support for custom CSS and JavaScript across all interfaces. Of course, we also fixed various bugs and inconsistencies to make sure Prodigy runs as smoothly as possible.

By the way, if you want to add 12 months of updates to your license, you can now do so via our online shop!

newAdd "instant_submit" option to send back a task instantly after it’s answered in the app, skipping the history and immediately triggering the update callback if available.
newSupport custom named sessions via query parameters in the app to enable multi-user workflows in a single instance. For example, accessing the app with /?session=alex will add all annotations to the session dataset dataset-alex. The boolean "feed_overlap" setting controls whether each example is sent out only once, so it’s annotated by one person, or whether overlaps are allowed and each example is sent out to everyone (default).
newAdd "global_css" option across all interfaces, more human-readable class names and expose data-prodigy-view-id and data-prodigy-recipe for custom interface or recipe-specific styling.
newAdd "javascript" option across all interfaces and fire custom events on mount, update and answer.
newAdd --batch-size option to drop command to prevent database errors when deleting large datasets.
fixMake labels in pos.teach and pos.make-gold correctly default to the built-in label scheme and raise an error if no fine-grained labels are provided.
fixMake sure PatternMatcher only shows matches for recipe labels.
fixFix bug that would cause add_tokens preprocessor to raise an error.
fixCorrectly handle min_length in split_sentences preprocessor.
fixFix bug that’d cause text classification tasks to not be deep copied correctly.
fixRaise error if terms.to-patterns is used without a label, to prevent null values.
fixFix problem that’d cause dependency arcs to be rendered incorrectly.
fixImprove relative sizing of bounding boxes and labels for large images.
fixEnsure task can only be flagged via keyboard shortcut if "show_flag" is enabled.
fixDrop third-party dependency mmh3 that was causing problems for some users.
fixMake manual NER interface more touch-friendly.
docNew video: FAQ #1: Tips & tricks for NLP, annotation and training.
docFix various typos and inconsistencies and add new sections for new features.
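The named-session and "feed_overlap" behavior from v1.7.0 can be sketched as below. This is an illustration only: session_dataset is a hypothetical helper that mirrors the dataset-alex naming described in the notes, and the config dict just shows the two settings with example values as they might appear in a prodigy.json file.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlparse


def session_dataset(dataset: str, url: str) -> Optional[str]:
    """Return the session dataset name for a URL like /?session=alex (hypothetical helper).

    Mirrors the "<dataset>-<session>" naming from the release notes. Returns
    None if the URL has no "session" query parameter, i.e. no named session.
    """
    query = parse_qs(urlparse(url).query)
    session = query.get("session", [None])[0]
    return f"{dataset}-{session}" if session else None


# Example prodigy.json settings from this release (values chosen for illustration).
config = {
    "feed_overlap": False,   # send each example out only once
    "instant_submit": True,  # send answers back immediately, skipping the history
}

print(session_dataset("news", "http://localhost:8080/?session=alex"))  # news-alex
print(json.dumps(config, indent=2))
```

With "feed_overlap" set to false, each example goes to only one annotator; leaving it at the default sends every example to every named session.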

v1.6.1 2018-10-17

fixFix split_sentences preprocessor for untokenized examples.

v1.6.0 2018-10-16

This update takes advantage of pre-built binary wheels for our dependencies and speeds up the installation by up to 10 times! We’ve also added official support for Python 3.7, made excluding the current dataset the default behavior, fixed issues related to patterns, text classification and NER training and improved some internals to get Prodigy ready for multi-user workflows.

newAdd official support and wheels for Python 3.7.
newUse spaCy v2.0.16 to take advantage of pre-built wheels and allow up to 10 times faster installation.
newAutomatically exclude examples already present in the current dataset (i.e. making --exclude dataset the default behavior). To disable this feature, you can set "auto_exclude_current": false in your prodigy.json or recipe config.
newAdd --loader argument to image.manual.
newMake annotation card header sticky for long content.
newImprove internal handling of sessions and streams to get Prodigy ready for better multi-user workflows.
fixFix prior in PatternMatcher to prevent matches from being excluded by sorter.
fixEnsure spans and tokens are correctly updated in split_sentences preprocessor.
fixImprove textcat.eval recipe and make sure labels are added automatically.
fixFix issue that would require refreshing the app when using the manual interface with a low batch size.
fixMake sure dataset links are removed when dropping a dataset via drop.
fixFix memory leak in NER training that could cause segmentation fault for large datasets.
fixFix issue in TextClassifier where active learning didn’t resume from existing weights.
fixExclude "model" key from hashes so that identical predictions by different models receive the same hash.
fixAdd server middleware to prevent response caching in IE 11.
fixImprove NER model loading with large vectors.
docFix various typos and inconsistencies.
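Two v1.6.0 changes interact: examples already in the current dataset are now excluded automatically, and the "model" key no longer affects hashes. The sketch below illustrates both ideas with a hypothetical task_hash helper that applies SHA-1 to sorted JSON. Which keys to ignore ("model" and "answer") is an assumption for illustration; Prodigy's actual hashing, which distinguishes input and task hashes, is not reproduced here.

```python
import hashlib
import json
from typing import Iterable, Iterator, Set

# Keys assumed (for illustration) not to affect a task's identity: which
# model produced a suggestion, and how the annotator answered it.
IGNORED_KEYS = {"model", "answer"}


def task_hash(task: dict) -> str:
    """Hash a task's content, ignoring keys listed in IGNORED_KEYS."""
    content = {k: v for k, v in task.items() if k not in IGNORED_KEYS}
    payload = json.dumps(content, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(payload.encode("utf8")).hexdigest()


def exclude_seen(stream: Iterable[dict], seen: Set[str]) -> Iterator[dict]:
    """Yield only tasks whose hash isn't already among the dataset's hashes."""
    for task in stream:
        if task_hash(task) not in seen:
            yield task


a = {"text": "Apple is a company", "label": "ORG", "model": "model_a"}
b = {"text": "Apple is a company", "label": "ORG", "model": "model_b"}
print(task_hash(a) == task_hash(b))  # True
```

Because the "model" key is ignored, the same suggestion made by two different models is treated as one question and won't be asked twice.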

v1.5.1 2018-06-13

This update includes several bug fixes and stability improvements related to the new part-of-speech tagging recipes and the built-in pattern matcher model, as well as a better identification system for match patterns.

newAdd --resume argument to ner.match to update matcher from dataset.
newUse hashes as pattern IDs to allow updating existing matchers even if pattern files change across sessions.
fixMake pos.teach and pos.batch-train work as expected with both fine-grained and coarse-grained part-of-speech tags.
fixFix bug in ner.iob-to-gold that’d cause export to fail.
fixSmall improvements to UI and web app stability.

v1.5.0 2018-06-07

This update includes new recipes for part-of-speech tagging, an experimental release of the new manual image labeling interface and a new mechanism for adding custom loaders, database connectors and recipes via Python entry points. We’ve also added validation for incoming streams and detailed error messages for incorrect task formats, enhanced the training options for sparse and gold-standard named entity data, and improved handling of newlines and formatting tokens in the manual NER interface.

newNew recipes for part-of-speech tagging: pos.teach, pos.batch-train and pos.train-curve.
newExperimental: Manual image annotation interface and image.manual recipe.
newAdd annotation task validation. Before the Prodigy server starts, your stream is checked against a schema to make sure it has the correct format. If not, Prodigy tells you what the problem is.
newAllow adding custom recipes, databases and loaders via entry points.
newAdd --no-missing flag to ner.batch-train to assume all correct spans are in the gold annotation, and any spans not in the gold annotation are incorrect. This is especially useful when training from annotations collected with ner.manual or ner.make-gold.
newAdd --resume argument to terms.teach to update target vector from dataset.
newAdd “true” newlines to newline tokens in manual interfaces. The behavior can be turned off by setting "hide_true_newline_tokens": true.
newAllow marking tokens as "disabled": true in manual interfaces. Disabled tokens can’t be highlighted and can be used to assist annotators with formatting.
newAdd converter recipe ner.iob-to-gold to convert IOB tags to Prodigy’s JSONL format.
fixDisable and restore other pipeline components in batch-train recipes.
fixEnsure seed terms are added to the dataset correctly.
fixFix bug that would cause web app to fail with annotation instructions.
fixMake keyboard shortcuts in choice interface work as expected again.
fixAdd missing import and make image.test work out-of-the-box again.
docAdd sections on Python entry points and document new recipes and interfaces.
docFix various typos and inconsistencies.
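The ner.iob-to-gold converter added in v1.5.0 turns IOB tags into character-offset spans. The following is a minimal sketch of that kind of conversion, not the recipe's own code. It assumes whitespace-joined tokens, and iob_to_task is a hypothetical helper. The "tokens" and "spans" keys are modeled on Prodigy's task format, with token indices treated as inclusive in this sketch.

```python
from typing import Dict, List


def iob_to_task(tokens: List[str], tags: List[str]) -> Dict:
    """Convert tokens with IOB tags to a task with character-offset spans (hypothetical helper)."""
    token_dicts, offset = [], 0
    for i, tok in enumerate(tokens):
        token_dicts.append({"text": tok, "start": offset, "end": offset + len(tok), "id": i})
        offset += len(tok) + 1  # assume a single space between tokens
    spans, current = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # Close the open span on O, on B-, or when an I- tag switches label.
        if current is not None and (prefix != "I" or label != current["label"]):
            spans.append(current)
            current = None
        # B- starts a span; a stray I- with no open span is treated like B-.
        if prefix == "B" or (prefix == "I" and current is None):
            current = {"start": token_dicts[i]["start"], "end": token_dicts[i]["end"],
                       "token_start": i, "token_end": i, "label": label}
        elif prefix == "I":
            current["end"] = token_dicts[i]["end"]
            current["token_end"] = i  # inclusive token index
    if current is not None:
        spans.append(current)
    return {"text": " ".join(tokens), "tokens": token_dicts, "spans": spans}


task = iob_to_task(["Alex", "lives", "in", "New", "York"], ["B-PERSON", "O", "O", "B-GPE", "I-GPE"])
for span in task["spans"]:
    print(task["text"][span["start"]:span["end"]], span["label"])
# Alex PERSON
# New York GPE
```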

v1.4.2 2018-04-10

This update includes various bug fixes and efficiency improvements.

newAllow custom HTML in classification interface.
newAllow pre-defined selections in choice interface, e.g. "accept": [1, 3].
fixImprove memory usage of terms.teach.
fixFix data integrity error when dropping datasets using MySQL.
fixFix bug in error message of custom recipe validation introduced in v1.4.1.
fixResolve problem with image preloading in image interfaces.
fixMake keyboard shortcuts work as expected in choice interface.
docFix various typos and inconsistencies.

v1.4.1 2018-03-26

This update improves efficiency of the ner.batch-train recipe and fixes the handling of task and input hashes in the database methods and --exclude option. It also comes with various improvements to error messages and web app stability.

newImprove efficiency of ner.batch-train – up to 10× faster for some workloads!
fixFix problem that would cause text classification tasks created from pattern matches to not have a label assigned to the task.
fixEnsure that --exclude logic is always applied after the stream is (re)hashed.
fixFix bug that would cause hashes to not be returned correctly by the database.
fixAllow the "instructions" setting to be false or null.
fixImprove error messages if recipe file is not valid and if dataset doesn’t exist in terms.to-patterns.
fixVarious improvements to UI and web app stability.

v1.4.0 2018-03-11

This update includes a new annotation interface for relations and dependencies, as well as an experimental dep.teach recipe.

textcat.teach now takes a file of match patterns instead of seed terms, and manual interfaces now support lists of up to 30 labels with keyboard shortcuts. We’ve also improved the customization of various components.

newNew dependency and relation annotation interface and dep.teach, dep.batch-train and dep.train-curve recipes for training a dependency parsing model. Still experimental!
newAllow using textcat.teach with a patterns file instead of seed terms.
newSupport list view and keyboard shortcuts for larger label sets in manual interfaces.
newAdd option to display modal with annotation instructions.
newAllow skipping examples with mismatched tokenization in add_tokens.
newMake swipe gestures optional via "swipe": true.
newAllow overwriting the host and port via PRODIGY_HOST and PRODIGY_PORT environment variables.
newAdd split_sents_threshold config setting and --unsegmented command-line option to disable sentence segmentation.
newUpdate NewsAPI loader to use v2.
fixPrevent MySQL server from timing out between requests.
fixCorrectly port over spans in split_sentences preprocessor.
fixAlways add labels from examples and --labels in ner.batch-train and consistently allow loading label sets from a string or a text file.
fixFix issue that caused print recipes to not display colors when piped to less.
fixEnsure that pre-set task meta isn’t overwritten in the PatternMatcher.
fixShow error message in the web app if view_id is invalid.
docAdd live demo for new dep interface.
docAdd Prodigy Cookbook with quick solutions to various tasks.
docAdd glossary to “First Steps” workflow.
docOrder recipes in PRODIGY_README.html table of contents by type.
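The PRODIGY_HOST and PRODIGY_PORT variables from v1.4.0 override where the web server listens. The sketch below shows that override pattern with a hypothetical resolve_server_address helper. The fallback values "localhost" and 8080 are assumptions used for illustration, and the sketch ignores prodigy.json entirely.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed defaults, for illustration only.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8080


def resolve_server_address(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Return (host, port), letting PRODIGY_HOST / PRODIGY_PORT override the defaults."""
    env = os.environ if env is None else env
    host = env.get("PRODIGY_HOST", DEFAULT_HOST)
    port = int(env.get("PRODIGY_PORT", DEFAULT_PORT))  # env values are strings
    return host, port


print(resolve_server_address({"PRODIGY_HOST": "0.0.0.0", "PRODIGY_PORT": "9000"}))
# ('0.0.0.0', 9000)
```

In a shell, the same override is typically written inline before the command, for example PRODIGY_PORT=9000 followed by the usual prodigy invocation.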

v1.3.0 2018-02-01

This update introduces a new ner.make-gold recipe that lets you create gold-standard data faster by manually correcting a model’s predictions. We’ve also added a new pos.make-gold recipe for annotating part-of-speech tags, as well as converters to create spaCy training data from Prodigy datasets.

newImproved ner.make-gold workflow: run a model over your text and manually correct the entities to create gold-standard data.
newAdd "ner_manual_label_style" option to display the label set as a list or dropdown (always uses dropdown for more than 10 labels) and add number keyboard shortcuts to the list of labels.
newExperimental pos.make-gold recipe for manual POS annotation.
newExperimental ner.gold-to-spacy and pos.gold-to-spacy converters.
newAdd option for custom label color schemes for NER and POS tagging.
newAdd UI option to “flag” tasks to bookmark them for later via the "show_flag" setting, with a flag icon and the f keyboard shortcut. Add --flagged-only setting to db-out command.
newRename split_tokens preprocessor to add_tokens.
fixFix rendering and use icons for whitespace tokens in ner_manual.
fixFix rendering of RTL languages in manual interfaces via "writing_dir" setting.
fixOverwrite database settings correctly when using connect().
fixFix bug in logging timestamps so minutes are logged correctly.
fixOnly use colored CLI output if supported by user’s terminal.
fixDon’t disable entity recognizer in textcat.batch-train.
docDocument preprocessor components and models’ batch_train methods.
docFix various typos and add more examples.
docAdd docstrings to internals so they can be inspected using help().
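The ner.gold-to-spacy converter from v1.3.0 turns Prodigy datasets into spaCy training data. The sketch below shows the general idea, targeting spaCy v2's simple (text, {"entities": [(start, end, label)]}) tuple format. to_spacy_ner is a hypothetical helper; the converter's actual output options may differ.

```python
from typing import Dict, Iterable, List, Tuple

TrainExample = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def to_spacy_ner(examples: Iterable[dict]) -> List[TrainExample]:
    """Convert accepted NER annotations to spaCy v2-style training tuples (hypothetical helper)."""
    data = []
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected and ignored examples
        entities = [(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
        data.append((eg["text"], {"entities": entities}))
    return data


examples = [
    {"text": "Alex lives in Berlin", "answer": "accept",
     "spans": [{"start": 0, "end": 4, "label": "PERSON"}, {"start": 14, "end": 20, "label": "GPE"}]},
    {"text": "Bad example", "spans": [], "answer": "reject"},
    {"text": "Unsure", "answer": "ignore"},
]
print(to_spacy_ner(examples))
```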

v1.2.0 2018-01-09

This update introduces ner.manual, a new recipe and interface for manual NER annotation. You can now highlight one or more text spans per task and select the entity label from a dropdown menu. To allow faster annotation and less fiddly clicking, token boundaries are used to determine the entity spans when highlighting them. Note that this workflow replaces ner.mark and boundaries.

newNew ner.manual recipe and interface for manual NER annotation.
new"card_css" option to inject custom CSS into annotation card.
newExperimental "show_whitespace" option for the basic ner interface.
fixMake --exclude argument and recipe option work as expected.
fixDon’t merge and modify NER spans before adding example to the database.
docDocument API of PatternMatcher model.
docImprove formatting of available recipes in prodigy --help.
docFix various typos and inconsistencies.
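The v1.2.0 notes say ner.manual uses token boundaries to determine entity spans when highlighting. One simple way to implement that idea is to expand a character selection to every token it overlaps. This is a sketch under that assumption, not the interface's actual code, and snap_to_tokens is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def snap_to_tokens(tokens: List[dict], sel_start: int, sel_end: int) -> Optional[Tuple[int, int]]:
    """Expand a character selection [sel_start, sel_end) to full token boundaries.

    Any token overlapping the selection is included, so a partial drag over
    "ew Yo" highlights "New York". Tokens are assumed to be sorted by offset.
    """
    hit = [t for t in tokens if t["start"] < sel_end and t["end"] > sel_start]
    if not hit:
        return None  # selection covered only whitespace
    return hit[0]["start"], hit[-1]["end"]


text = "Alex lives in New York"
tokens, pos = [], 0
for word in text.split(" "):
    tokens.append({"text": word, "start": pos, "end": pos + len(word)})
    pos += len(word) + 1

print(snap_to_tokens(tokens, 15, 20))  # (14, 22)
```

Snapping to whole tokens is what removes the "fiddly clicking": annotators never have to hit an exact character boundary.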

v1.1.0 2017-12-18

newAutomatically add new entity labels in ner.batch-train.
newImprove speed during NER training and allow setting the beam width via CLI.
newFilter out ignored examples before creating training and evaluation sets.
newRe-add improved version of ner.eval recipe.
newHandle broken JSONL in Reddit loader.
newUse spaCy model to assign labels in ner.print-stream.
docSmall improvements to documentation.
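The v1.1.0 change to filter out ignored examples before creating training and evaluation sets can be sketched as a filter followed by a shuffled split. train_eval_split and its eval_split and seed parameters are hypothetical names for illustration, not the batch-train recipes' actual options.

```python
import random
from typing import List, Tuple


def train_eval_split(examples: List[dict], eval_split: float = 0.2,
                     seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Drop ignored examples, then shuffle and split into (train, eval) (hypothetical helper)."""
    usable = [eg for eg in examples if eg.get("answer") != "ignore"]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(usable)
    n_eval = int(len(usable) * eval_split)
    return usable[n_eval:], usable[:n_eval]


examples = [{"text": str(i), "answer": "ignore" if i < 2 else "accept"} for i in range(10)]
train, dev = train_eval_split(examples)
print(len(train), len(dev))  # 7 1
```

Filtering first means ignored examples can't take up evaluation slots or skew the accuracy computed on the held-out set.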