Changelog

This page lists the history of changes to Prodigy. Whenever a new update is available, you'll receive an email notification sent to the address specified at checkout. You can then download the new version via your personal download link.

v1.6.1

fix: Fix split_sentences pre-processor for untokenized examples.

v1.6.0

This update takes advantage of pre-built binary wheels for our dependencies and speeds up the installation by up to 10 times! We've also added official support for Python 3.7, made excluding the current dataset the default behaviour, fixed issues related to patterns, text classification and NER training and improved some internals to get Prodigy ready for multi-user workflows and the upcoming Prodigy Scale.

new: Add official support and wheels for Python 3.7.
new: Use spaCy v2.0.16 to take advantage of pre-built wheels and allow up to 10 times faster installation.
new: Automatically exclude examples already present in the current dataset (i.e. make --exclude dataset the default behaviour). To disable this feature, you can set "auto_exclude_current": false in your prodigy.json or recipe config.
new: Add --loader argument to image.manual.
new: Make annotation card header sticky for long content.
new: Improve internal handling of sessions and streams to get Prodigy ready for better multi-user workflows.
fix: Fix prior in PatternMatcher to prevent matches from being excluded by sorter.
fix: Ensure spans and tokens are correctly updated in split_sentences preprocessor.
fix: Improve textcat.eval recipe and make sure labels are added automatically.
fix: Fix issue that would require refreshing the app when using the manual interface with a low batch size.
fix: Make sure dataset links are removed when dropping a dataset via drop.
fix: Fix memory leak in NER training that could cause segmentation fault for large datasets.
fix: Fix issue in TextClassifier, where active learning didn't resume from weights.
fix: Exclude "model" key from hashes so that identical predictions by different models receive the same hash.
fix: Add server middleware to prevent response caching in IE 11.
fix: Improve NER model loading with large vectors.
doc: Fix various typos and inconsistencies.
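
To illustrate the hashing fix in this release, here is a minimal sketch of hashing a task while ignoring its "model" key, so that two identical predictions from different models end up with the same hash. The helper name sketch_task_hash, the example task fields and the choice of JSON plus SHA-1 are assumptions made for illustration; this is not Prodigy's actual hashing code.

```python
import hashlib
import json


def sketch_task_hash(task, ignore=("model",)):
    """Hash a task dict, skipping the keys listed in `ignore` (hypothetical helper)."""
    filtered = {key: value for key, value in task.items() if key not in ignore}
    # sort_keys makes the serialisation independent of key order
    payload = json.dumps(filtered, sort_keys=True).encode("utf8")
    return hashlib.sha1(payload).hexdigest()


# Two identical predictions that differ only in the (assumed) "model" value
task_a = {"text": "Apple", "label": "ORG", "model": "model_a"}
task_b = {"text": "Apple", "label": "ORG", "model": "model_b"}
same_hash = sketch_task_hash(task_a) == sketch_task_hash(task_b)
print(same_hash)
```

Passing ignore=() to the sketch hashes every key, which would give the two tasks different hashes again.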

v1.5.1

This update includes several bug fixes and stability improvements related to the new part-of-speech tagging recipes and the built-in pattern matcher model, as well as a better identification system for match patterns.

new: Add --resume argument to ner.match to update matcher from dataset.
new: Use hashes as pattern IDs to allow updating existing matchers even if pattern files change across sessions.
fix: Make pos.teach and pos.batch-train work as expected with both fine-grained and coarse-grained part-of-speech tags.
fix: Fix bug in ner.iob-to-gold that'd cause export to fail.
fix: Small improvements to UI and web app stability.

v1.5.0

This update includes new recipes for part-of-speech tagging, an experimental release of the new manual image labeling interface and a new mechanism for adding custom loaders, database connectors and recipes via Python entry points. We've also added validation for incoming streams and detailed error messages for incorrect task formats, enhanced the training options for sparse and gold-standard named entity data, and improved handling of newlines and formatting tokens in the manual NER interface.

new: New recipes for part-of-speech tagging: pos.teach, pos.batch-train and pos.train-curve.
new: Experimental: Manual image annotation interface and image.manual recipe.
new: Add annotation task validation. Before the Prodigy server starts, your stream is checked against a schema to make sure it has the correct format. If not, Prodigy tells you what the problem is.
new: Allow adding custom recipes, databases and loaders via entry points.
new: Add --no-missing flag to ner.batch-train to assume all correct spans are in the gold annotation, and any spans not in the gold annotation are incorrect. This is especially useful when training from annotations collected with ner.manual or ner.make-gold.
new: Add --resume argument to terms.teach to update target vector from dataset.
new: Add "true" newlines to newline tokens in manual interfaces. The behaviour can be turned off by setting "hide_true_newline_tokens": true.
new: Allow marking tokens as "disabled": true in manual interfaces. Disabled tokens can't be highlighted and can be used to assist annotators with formatting.
new: Converter recipe ner.iob-to-gold to convert IOB tags to Prodigy's JSONL.
fix: Disable and restore other pipeline components in batch-train recipes.
fix: Ensure seed terms are added to the dataset correctly.
fix: Fix bug that would cause web app to fail with annotation instructions.
fix: Make keyboard shortcuts in choice interface work as expected again.
fix: Add missing import and make image.test work out-of-the-box again.
doc: Add sections on Python entry points and document new recipes and interfaces.
doc: Fix various typos and inconsistencies.
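
The annotation task validation added in this release can be pictured with a small sketch: each incoming task is checked against the expected format, and any problems are reported as readable messages. The function name sketch_validate_task and the fields it checks ("text" as a string, "spans" as a list of dicts with integer "start" and "end") are assumptions chosen for illustration; Prodigy's real schema and error messages may look different.

```python
def sketch_validate_task(task):
    """Return a list of human-readable problems found in one task.

    Hypothetical helper: the expected fields are assumptions for illustration.
    """
    if not isinstance(task, dict):
        return ["task must be a dict, got {}".format(type(task).__name__)]
    errors = []
    if not isinstance(task.get("text"), str):
        errors.append("'text' must be a string")
    spans = task.get("spans", [])
    if not isinstance(spans, list):
        errors.append("'spans' must be a list")
        spans = []
    for i, span in enumerate(spans):
        if not isinstance(span, dict):
            errors.append("spans[{}] must be a dict".format(i))
            continue
        for field in ("start", "end"):
            if not isinstance(span.get(field), int):
                errors.append("spans[{}].{} must be an integer".format(i, field))
    return errors


good = {"text": "Hello Berlin", "spans": [{"start": 6, "end": 12, "label": "GPE"}]}
bad = {"text": "Hello Berlin", "spans": [{"start": "6", "end": 12}]}
print(sketch_validate_task(good))
print(sketch_validate_task(bad))
```

Collecting all problems in a list, rather than stopping at the first one, lets a single run report everything that is wrong with a task.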

v1.4.2

This update includes various bug fixes and efficiency improvements.

new: Allow custom HTML in classification interface.
new: Allow pre-defined selections in choice interface, e.g. "accept": [1, 3].
fix: Improve memory usage of terms.teach.
fix: Fix data integrity error when dropping datasets using MySQL.
fix: Fix bug in error message of custom recipe validation introduced in v1.4.1.
fix: Resolve problem with image preloading in image interfaces.
fix: Make keyboard shortcuts work as expected in choice interface.
doc: Fix various typos and inconsistencies.
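
As an illustration of pre-defined selections, the sketch below builds one choice task with two options already selected and serialises it as a JSONL line. Only the "accept": [1, 3] part comes from the entry above; the example text, the option labels and the "options" layout with "id" and "text" keys are assumptions about a typical choice task.

```python
import json

task = {
    "text": "Pick all topics that apply",
    "options": [
        {"id": 1, "text": "Sports"},
        {"id": 2, "text": "Politics"},
        {"id": 3, "text": "Technology"},
    ],
    # Pre-selected option IDs, as in the changelog example
    "accept": [1, 3],
}

# Sanity check: every pre-selected ID should refer to an existing option
option_ids = {option["id"] for option in task["options"]}
unknown_ids = set(task["accept"]) - option_ids

# One task per line is the JSONL convention
line = json.dumps(task)
print(line)
```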

v1.4.1

This update improves efficiency of the ner.batch-train recipe and fixes the handling of task and input hashes in the database methods and --exclude option. It also comes with various improvements to error messages and web app stability.

new: Improve efficiency of ner.batch-train – up to 10× faster for some workloads!
fix: Fix problem that would cause text classification tasks created from pattern matches to not have a label assigned to the task.
fix: Ensure that --exclude logic is always applied after the stream is (re)hashed.
fix: Fix bug that would cause hashes to not be returned correctly by the database.
fix: Allow the "instructions" setting to be false or null.
fix: Improve error messages if recipe file is not valid and if dataset doesn't exist in terms.to-patterns.
fix: Various improvements to UI and web app stability.

v1.4.0

This update includes a new annotation interface for relations and dependencies, as well as an experimental dep.teach recipe. textcat.teach now takes a file of match patterns instead of seed terms, and manual interfaces now support lists of up to 30 labels with keyboard shortcuts. We've also improved the customisation of various components, and added the Prodigy Cookbook.

new: Add dependency and relation annotation interface and the dep.teach, dep.batch-train and dep.train-curve recipes for training a dependency parsing model. Still experimental!
new: Allow using textcat.teach with a patterns file instead of seed terms.
new: Support list view and keyboard shortcuts for larger label sets in manual interfaces.
new: Add option to display modal with annotation instructions.
new: Allow skipping examples with mismatched tokenization in add_tokens.
new: Make swipe gestures optional via "swipe": true.
new: Allow overwriting the host and port via PRODIGY_HOST and PRODIGY_PORT environment variables.
new: Add split_sents_threshold config setting and --unsegmented command-line option to disable sentence segmentation.
new: Update NewsAPI loader to use v2.
fix: Prevent MySQL server from timing out between requests.
fix: Correctly port over spans in split_sentences preprocessor.
fix: Always add labels from examples and --labels in ner.batch-train and consistently allow loading label sets from a string or a text file.
fix: Fix issue that caused print recipes to not display colours when piped to less.
fix: Ensure that pre-set task meta isn't overwritten in the PatternMatcher.
fix: Show error message in the web app if view_id is invalid.
doc: Add live demo for new dep interface.
doc: Add Prodigy Cookbook with quick solutions to various tasks.
doc: Add glossary to "First Steps" workflow.
doc: Order recipes in PRODIGY_README.html table of contents by type.
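
The PRODIGY_HOST and PRODIGY_PORT overrides can be pictured as a lookup in which the environment wins over the configured values. The sketch below is a hypothetical helper, not Prodigy's own code: the function name sketch_resolve_address, the "host" and "port" config keys and the example values are assumptions for illustration.

```python
import os


def sketch_resolve_address(config, environ=None):
    """Pick host and port, letting environment variables win over config values."""
    environ = os.environ if environ is None else environ
    host = environ.get("PRODIGY_HOST", config["host"])
    # Environment variables are always strings, so convert the port to an int
    port = int(environ.get("PRODIGY_PORT", config["port"]))
    return host, port


config = {"host": "localhost", "port": 8080}  # example values only
print(sketch_resolve_address(config, environ={}))
print(sketch_resolve_address(config, environ={"PRODIGY_PORT": "9000"}))
```

Passing the environment in as an argument keeps the helper easy to try out without changing the real process environment.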

v1.3.0

This update introduces a new ner.make-gold recipe that lets you create gold-standard data faster by manually correcting a model's predictions. We've also added a new pos.make-gold recipe for annotating part-of-speech tags, as well as converters to create spaCy training data from Prodigy datasets.

new: Improved ner.make-gold workflow: run a model over your text and manually correct the entities to create gold-standard data.
new: Add "ner_manual_label_style" option to display the label set as a list or dropdown (always uses dropdown for more than 10 labels) and add number keyboard shortcuts to the list of labels.
new: Experimental pos.make-gold recipe for manual POS annotation.
new: Experimental ner.gold-to-spacy and pos.gold-to-spacy converters.
new: Add option for custom label color schemes for NER and POS tagging.
new: Add UI option to "flag" tasks to bookmark them for later via "show_flag" setting and a flag icon and F keyboard shortcut. Add --flagged-only setting to db-out command.
new: Rename split_tokens pre-processor to add_tokens.
fix: Fix rendering and use icons for whitespace tokens in ner_manual.
fix: Fix rendering of RTL languages in manual interfaces via "writing_dir" setting.
fix: Overwrite database settings correctly when using connect().
fix: Fix bug in logging timestamp and log minutes correctly.
fix: Only use colored CLI output if supported by user's terminal.
fix: Don't disable entity recognizer in textcat.batch-train.
doc: Document preprocessor components and models' batch_train methods.
doc: Fix various typos and add more examples.
doc: Add docstrings to internals so they can be inspected using help().
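
To give a rough idea of what converting gold-standard NER annotations to spaCy training data involves, the sketch below turns annotated examples into (text, {"entities": [...]}) tuples with character offsets, the layout used for spaCy v2 training examples. It is an illustration only and not the implementation of ner.gold-to-spacy: the helper name sketch_to_spacy_ner, the example data and the decision to keep only examples with "answer": "accept" are assumptions.

```python
def sketch_to_spacy_ner(examples):
    """Convert accepted examples to (text, {"entities": [...]}) tuples (hypothetical helper)."""
    converted = []
    for eg in examples:
        # Assumption: only accepted examples count as gold-standard data
        if eg.get("answer") != "accept":
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in eg.get("spans", [])
        ]
        converted.append((eg["text"], {"entities": entities}))
    return converted


examples = [
    {
        "text": "Hello Berlin",
        "spans": [{"start": 6, "end": 12, "label": "GPE"}],
        "answer": "accept",
    },
    {"text": "Skipped example", "spans": [], "answer": "reject"},
]
train_data = sketch_to_spacy_ner(examples)
print(train_data)
```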

v1.2.0

This update introduces ner.manual, a new recipe and interface for manual NER annotation. You can now highlight one or more text spans per task and select the entity label from a dropdown menu. To allow faster annotation and less fiddly clicking, token boundaries are used to determine the entity spans when highlighting them. Note that this workflow replaces ner.mark and boundaries.

new: ner.manual recipe and interface for manual NER annotation.
new: "card_css" option to inject custom CSS into annotation card.
new: Experimental "show_whitespace" for basic ner interface.
fix: Make --exclude argument and recipe option work as expected.
fix: Don't merge and modify NER spans before adding example to the database.
doc: Document API of PatternMatcher model.
doc: Improve formatting of available recipes in prodigy --help.
doc: Fix various typos and inconsistencies.
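
The idea of using token boundaries to determine entity spans can be sketched as follows: a sloppy character selection is expanded to cover every token it overlaps. The helper name snap_to_tokens and the token layout with "start" and "end" character offsets are assumptions for illustration; the actual interface logic lives in the web app and may work differently.

```python
def snap_to_tokens(tokens, sel_start, sel_end):
    """Expand a character selection to cover every token it overlaps.

    `tokens` is a list of dicts with "start" and "end" character offsets,
    ordered by position. Returns (start, end), or None if the selection
    touches no token. Hypothetical helper for illustration.
    """
    touched = [t for t in tokens if t["start"] < sel_end and t["end"] > sel_start]
    if not touched:
        return None
    return touched[0]["start"], touched[-1]["end"]


text = "Angela Merkel visited Paris"
tokens = [
    {"text": "Angela", "start": 0, "end": 6},
    {"text": "Merkel", "start": 7, "end": 13},
    {"text": "visited", "start": 14, "end": 21},
    {"text": "Paris", "start": 22, "end": 27},
]
# A sloppy selection that starts inside "Angela" and ends inside "Merkel"
span = snap_to_tokens(tokens, 2, 11)
print(text[span[0]:span[1]])
```

Because the selection is always widened to whole tokens, the annotator does not need to hit the exact first and last character of an entity.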

v1.1.0

new: Automatically add new entity labels in ner.batch-train.
new: Improve speed during NER training and allow setting the beam width via CLI.
new: Filter out ignored examples before creating training and evaluation sets.
new: Re-add improved version of ner.eval recipe.
new: Handle broken JSONL in Reddit loader.
new: Use spaCy model to assign labels in ner.print-stream.
doc: Small improvements to documentation.