Computer Vision

You can use Prodigy to train and evaluate models for almost any task in computer vision. Built-in interfaces are provided for key tasks such as object detection, image segmentation and image classification. Prodigy also supports more advanced tasks such as image captioning and generation, using the A/B evaluation recipes. Prodigy's object detection and image segmentation interfaces are designed to work with pre-trained models, for example models trained on large datasets such as ImageNet or COCO.

Object Detection

Object detection is the task of locating and labelling objects in images or videos. Prodigy's object detection interface requires an existing model to suggest labelled objects. Annotation consists of accepting or rejecting these suggestions. Because each decision is binary, annotation can proceed very quickly. You can also update the model in real time during annotation, so that it learns which questions to ask next.

To help you see this in action on your data, Prodigy ships with a small image.test recipe. It's powered by LightNet, a Python interface to the DarkNet neural network library written by Joseph Redmon. This lets you test Prodigy's image interface with the popular YOLOv2 object detection models. Please note that LightNet currently only supports macOS / OS X and Linux.

pip install lightnet
python -m lightnet download tiny-yolo
prodigy image.test image_objects tiny-yolo /tmp/my_images/
✨ Starting the web server on port 8080...

The YOLOv2 models can detect objects in 80 categories. Objects will be highlighted, and you can accept, reject or ignore the annotation.

Source: Unsplash, photo by Kirk Morales

All other objects detected in the image are still present in the annotation tasks and can be displayed by setting "preview_bounding_boxes": true in your Prodigy config. This can often help with making the annotation decision, because you get to preview potentially better analyses and see the model's overall performance.
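For example, to enable the preview in your Prodigy config file (prodigy.json):

{
  "preview_bounding_boxes": true
}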

Preview with all detected boxes shown: skateboard, skateboard, person, person.
Source: Unsplash, photo by Kirk Morales

Here's the annotation task Prodigy creates in the above example. Spans that are not in focus are marked as "hidden": true. When you annotate the task, Prodigy will also add an "answer" key containing the annotation choice.

Annotation task

{ "image": "skating.jpg", "width": 400, "height": 267, "spans": [ {"hidden": false, "color": "yellow", "label": "skateboard", "points": [[47.5, 171.4], [47.5, 238.8], [156.6, 238.8], [156.6, 171.4]] }, {"hidden": true, "color": "magenta", "label": "skateboard", "points": [[27.5, 131.4], [27.5, 258.8], [176.6, 258.8], [176.6, 131.4]] }, {"hidden": true, "color": "cyan", "label": "person", "points": [[334, 14.5], [334, 88.6], [369, 88.6], [369, 14.5]] }, {"hidden": true, "color": "greenyellow", "label": "person", "points": [[245.1, 50.5], [245.1, 111], [275, 111], [275, 50.5]] } ] }

Image Segmentation

Image segmentation is the task of dividing an image into regions according to their contents. The segment boundaries can be complex shapes. Prodigy's image segmentation interface requires an existing model to suggest boundaries. Annotation consists of accepting or rejecting these suggestions. Because each decision is binary, annotation can proceed very quickly. You can also update the model in real time during annotation, so that it learns which questions to ask next.

Segmented image with four regions labelled car.
Source: Unsplash, photo by Nabeel Syed

Prodigy's image interface can take a list of [x, y] coordinates describing the shape to draw onto the image. This means you can highlight rectangular bounding boxes, as well as any polygon shapes, no matter how complex. The annotation task for the above example looks like this:

Annotation task

{ "image": "cars.jpg", "width": 1080, "height": 720, "spans": [ { "label": "CAR", "color": "yellow", "points": [[6,6], [296,3], [343,22], [396,133], [433,143], [431,194], [410,199], [432,269], [426,349], [433,471], [406,490], [386,511], [381,618], [360,670], [289,687], [264,665], [107,678], [3,677], [6,6]] }, { "label": "CAR", "color": "cyan", "points": [[753,33], [818,30], [1000,30], [1037,35], [1044,68], [933,80], [911,90], [879,138], [858,195], [852,213], [836,226], [825,307], [817,310], [814,375], [796,381], [750,375], [744,331], [716,328], [712,253], [705,213], [706,170], [730,158], [735,86], [742,48], [753,33]] }, { "label": "CAR", "color": "magenta", "points": [[662,73], [694,66], [738,65], [731,158], [703,173], [708,220], [712,267], [680,267], [676,281], [640,278], [634,217], [634,145], [616,134], [616,127], [640,122], [662,73]] }, { "label": "CAR", "color": "deepskyblue", "points": [[904,98], [876,140], [854,209], [838,222], [825,313], [825,354], [829,371], [825,458], [832,523], [857,533], [899,532], [912,512], [972,591], [975,666], [995,712], [1074,714], [1076,65], [1007,71], [935,79], [912,87], [904,98]] } ] }

Example Application

Let's say you're a food delivery startup and you track a few customer preference variables. One of the important ones is "healthy". In order to drive sales, you want to select healthy image thumbnails for customers that like healthy food, while "satisfying" thumbnails are displayed for customers with different preferences. This is the type of task you're pretty sure a computer will be able to do – but you don't know until you try. Using Prodigy, you can have the idea, start trying it out and easily see whether it's worth pursuing.

import prodigy
from prodigy.components.sorters import prefer_uncertain

@prodigy.recipe('custom-classify',
    dataset=prodigy.recipe_args['dataset'],
    model=("The image model to load", "positional", None, str),
    source=prodigy.recipe_args['source'],
    api=prodigy.recipe_args['api'],
    label=prodigy.recipe_args['label'])
def custom_classify(dataset, model, source=None, api=None, label=None):
    # load your image model here
    image_model = load_my_image_model(model, label=label)
    # stream in images from a file path or the selected API loader
    stream = prodigy.get_stream(source, api=api, loader='images')
    return {
        'dataset': dataset,                               # dataset to save annotations to
        'view_id': 'classification',                      # annotation interface to use
        'stream': prefer_uncertain(image_model(stream)),  # sort the stream by uncertainty
        'update': image_model.update                      # update the model with answers
    }
prodigy dataset food_images "Classify photos of food"
✨ Created dataset 'food_images'.
prodigy custom-classify food_images food_model "food" --api unsplash --label HEALTHY
✨ Starting the web server on port 8080...

In this example, we're using the built-in Unsplash API loader, which streams in images for a keyword from a selection of 240k+ high-quality stock photos. We first create a new dataset to group the annotations together, and start the web service.

Photo of a sandwich with salmon, red onions and herbs
Source: Unsplash, photo by Monstruo Estudio

Annotation is performed by clicking accept or reject on each image in turn. As we click through, the model is updated, so that we aren't asked redundant questions. There's usually no shortage of images to annotate, so if the model is 99% sure an image is healthy, it will usually be more efficient to annotate a different image.
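The uncertainty sampling comes from the prefer_uncertain sorter in the recipe above: Prodigy's sorters expect a stream of (score, example) tuples, so whatever load_my_image_model returns needs to score each image and expose an update method for the incoming answers. The class below is a hypothetical stand-in to illustrate that interface – the scoring logic is a placeholder, not a real model:

class DummyImageModel:
    """Hypothetical stand-in for load_my_image_model(). It scores each
    image and yields (score, example) tuples for prefer_uncertain, and
    learns from the annotated examples via update()."""
    def __init__(self, label):
        self.label = label

    def __call__(self, stream):
        for eg in stream:
            score = self.predict(eg)  # probability that self.label applies
            eg['label'] = self.label
            yield (score, eg)

    def predict(self, eg):
        # placeholder – plug in a real image classifier here
        return 0.5

    def update(self, examples):
        # each annotated example has an 'answer' key:
        # 'accept', 'reject' or 'ignore'
        pass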

Photo of several portions of fried chicken at a market stand
Source: Unsplash, photo by Brian Chan

This is an example of a different food preference. We might want to go back and label this "satisfying" later, potentially using the knowledge built up from the "healthy" label to make the predictions easier. However, the annotation goes much faster if we restrict the task to binary decisions. It's better to give the user a binary choice and show them more images than to show them fewer images with a more complicated UI. The reduced cognitive load makes annotation faster, and the reduced friction helps prevent misclassifications.

Photo of a red smoothie or juice and a green layered slice of cake, decorated with pomegranate seeds
Source: Unsplash, photo by Toa Heftiba

If you've run annotation projects before, you may have noticed that a minority of inputs take an exceptionally long time to annotate. Is the food in this picture healthy? What even is it – and what does "healthy" actually mean? The only good answer here is "It doesn't matter – next image!". The best examples to annotate are the ones where the model is uncertain, but you are not. If you're uncertain too, you should simply skip the example and move on to the next one. The ignore button is very often your friend.

Loading your own images

Using Prodigy, you can also stream in your own images from a directory. All common file types, like .jpg, .png, .gif and .svg, are supported, and will be converted to base64 data URIs.

from prodigy.components.loaders import Images
stream = Images('/path/to/images')
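Each item the loader yields is a regular task dict, so you can inspect or extend it before annotation – for instance, to check the encoded data or attach a label. A quick, hypothetical sanity check might look like this:

from prodigy.components.loaders import Images

stream = Images('/path/to/images')
for eg in stream:
    # 'image' holds the base64 data URI, e.g. "data:image/jpeg;base64,..."
    print(eg['image'][:30])
    break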

If your incoming stream includes paths or URLs, you can also use the fetch_images preprocessing helper, which will load all images and convert them.

from prodigy.components.preprocess import fetch_images
stream = [{'image': '/tmp/x.jpg'}, {'image': ''}]
stream = fetch_images(stream)

When using the image loader, keep in mind that all task data will also be stored in the database – including the base64-encoded images. In some cases, this can be convenient, as it lets you store the image data with the annotation. In other cases, it can lead to unexpected results and database bloat.
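If the bloat is a concern, one option is to strip the encoded image out again before the example is saved and store the original file path instead. The sketch below assumes each task keeps its source path under a hypothetical "path" key, and that your Prodigy version supports returning a "before_db" callback from a recipe – if yours doesn't, you can apply the same cleanup to the examples before adding them to the database yourself:

def strip_base64(examples):
    """Replace base64 data URIs with the original file path before the
    examples are written to the database. Assumes each task still carries
    its source path under eg['path'] (a hypothetical field)."""
    for eg in examples:
        if eg.get('image', '').startswith('data:') and eg.get('path'):
            eg['image'] = eg['path']
    return examples

# In a custom recipe, returned alongside the other components:
# return {'dataset': dataset, 'view_id': 'classification',
#         'stream': stream, 'before_db': strip_base64}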