[How To] Prepare and Upload Coco Labels

May 4, 2020 | How To

Prepare and Upload Coco Labels to DATAGYM


In one of our latest blog posts we showed how to use our Python API to import annotated image data directly into your DATAGYM projects. That feature lets users inspect and correct the results of their prediction models from within DATAGYM. This article introduces the new Coco importer in our Python API, which helps you upload Coco-formatted annotated data into DATAGYM for additional labeling. The code samples in this guide are also available as a Jupyter Notebook on GitHub.

Check it out and register your DATAGYM account. It’s free!

What is Coco?

Coco is a large-scale object detection, segmentation and captioning dataset. Coco has several features like object segmentation, recognition in context, 80 object categories, over 200k labeled images and 1.5 million labeled object instances. Learn more here: http://cocodataset.org/#home

The Coco dataset comes with its very own label format for each of the label categories: Detection, Captioning, Keypoints, Stuff, Panoptic. The Coco import function within the DATAGYM Python Package currently supports detection and captioning.

Connecting and setting up

Let’s get started by importing our DATAGYM client and the Coco importer.

First we must define which project and dataset we will be working with. The project needs to be manually set up in your browser.

Connect with your personal API key and get your project by name.

Create a new dataset. Here we wrap the dataset creation in a try/except clause in case the dataset already exists.

Now we create the list of image URLs that we want to upload to our dataset. Your list will likely differ, depending on where you have saved your images. In this case we chose to build the list from the coco_url values provided within the Coco JSON files.

Here we upload the first 200 Coco images from the URL list.
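As a rough sketch of how such a URL list can be built, the snippet below reads the coco_url field of each entry under "images" in an already parsed Coco annotation file and keeps only the first few. The helper name collect_image_urls and the inline sample data are our own illustration, not part of the DATAGYM package, and the actual upload call is left out on purpose.

```python
def collect_image_urls(coco, limit=None):
    """Return the coco_url of each image entry, optionally capped at `limit`."""
    urls = [image["coco_url"] for image in coco["images"] if "coco_url" in image]
    return urls if limit is None else urls[:limit]


# Illustrative stand-in for a parsed instances_*.json file (assumed sample data).
sample_coco = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg",
         "coco_url": "http://images.cocodataset.org/val2017/000000000001.jpg"},
        {"id": 2, "file_name": "000000000002.jpg",
         "coco_url": "http://images.cocodataset.org/val2017/000000000002.jpg"},
        {"id": 3, "file_name": "000000000003.jpg",
         "coco_url": "http://images.cocodataset.org/val2017/000000000003.jpg"},
    ]
}

# Keep only the first two URLs here; the article uses limit=200.
image_urls = collect_image_urls(sample_coco, limit=2)
print(image_urls)
```

With the real annotation file you would pass the parsed JSON dictionary and limit=200 to get the same selection as in this guide.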

If the dataset is not yet attached to the project, we attach it here, again wrapped in try/except in case the two are already connected.

Create an image ID dictionary that returns the internal DATAGYM image ID when given an image name.
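A minimal sketch of such a lookup is shown below. It assumes that each image object in the dataset exposes an image_name and an id attribute. Both attribute names are assumptions for illustration, and the SimpleNamespace objects merely stand in for the images returned by the DATAGYM client.

```python
from types import SimpleNamespace


def build_image_id_lookup(images):
    """Map each image name to its internal image ID."""
    return {image.image_name: image.id for image in images}


# Stand-ins for the dataset's image objects (attribute names are assumed).
dataset_images = [
    SimpleNamespace(id="a1b2", image_name="000000000001.jpg"),
    SimpleNamespace(id="c3d4", image_name="000000000002.jpg"),
]

image_ids = build_image_id_lookup(dataset_images)
print(image_ids["000000000002.jpg"])
```

A plain dictionary keeps the later label conversion simple, since every Coco annotation can be matched to its uploaded image by file name.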

Load the Coco labels from their respective JSON files.
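Loading an annotation file needs nothing beyond the standard library. The sketch below parses one file and checks that the top-level keys we rely on later are present. The helper load_coco_file is hypothetical, and the throwaway demo file only exists so that the example runs without the real dataset.

```python
import json
import tempfile
from pathlib import Path


def load_coco_file(path, required_keys=("images", "annotations")):
    """Parse one Coco JSON file and check that the expected top-level keys exist."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"{path} is missing top-level keys: {missing}")
    return data


# Demo with a tiny temporary file instead of the real annotation download.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "captions_demo.json"
    demo_path.write_text(json.dumps({"images": [], "annotations": []}), encoding="utf-8")
    captions_coco = load_coco_file(demo_path)

print(sorted(captions_coco))
```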

coco.add_object_detection_data is used for all instances_*.json files.

When using the add_object_detection_data method, you can choose to upload either the bounding box or the polygon containing the object.
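To make the difference between the two geometries concrete, here is a small sketch of how they are stored in a Coco detection annotation: the bounding box is a list [x, y, width, height] measured from the top-left corner, and a polygon segmentation is a list of flat coordinate lists [x1, y1, x2, y2, ...]. The two helper functions and the sample annotation are our own illustration and are independent of the DATAGYM importer.

```python
def bbox_to_corners(bbox):
    """Convert a Coco [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def polygon_points(segmentation):
    """Turn each flat [x1, y1, x2, y2, ...] list into a list of (x, y) tuples.

    Crowd annotations store run-length encoding as a dict instead of polygons;
    for those an empty list is returned.
    """
    if not isinstance(segmentation, list):
        return []
    return [list(zip(flat[0::2], flat[1::2])) for flat in segmentation]


# Illustrative annotation entry (assumed sample values).
annotation = {
    "image_id": 1,
    "category_id": 18,
    "iscrowd": 0,
    "bbox": [10.0, 20.0, 30.0, 40.0],
    "segmentation": [[10.0, 20.0, 40.0, 20.0, 40.0, 60.0, 10.0, 60.0]],
}

corners = bbox_to_corners(annotation["bbox"])
points = polygon_points(annotation["segmentation"])
print(corners)
print(points)
```

The bounding box is the cheaper geometry to review, while the polygon follows the object outline more closely, so the choice depends on what your reviewers need to refine.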

coco.add_captions_data is used for all captions_*.json files.
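For orientation, a Coco captions file lists its captions under "annotations", where each entry carries an image_id and a caption string, and one image usually has several captions. The sketch below groups them per image. The helper captions_by_image and the sample data are hypothetical and only illustrate the file layout, not what the importer does internally.

```python
from collections import defaultdict


def captions_by_image(captions_coco):
    """Group caption strings by the image they describe."""
    grouped = defaultdict(list)
    for annotation in captions_coco["annotations"]:
        grouped[annotation["image_id"]].append(annotation["caption"].strip())
    return dict(grouped)


# Illustrative stand-in for a parsed captions_*.json file (assumed sample data).
sample_captions = {
    "annotations": [
        {"id": 101, "image_id": 1, "caption": "A dog runs across a field. "},
        {"id": 102, "image_id": 1, "caption": "A brown dog playing outside."},
        {"id": 103, "image_id": 2, "caption": "Two bicycles parked by a wall."},
    ]
}

grouped_captions = captions_by_image(sample_captions)
print(grouped_captions[1])
```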

Now that we have added all the relevant labels from their respective JSON files, we are ready to convert them into the uploadable DATAGYM JSON file. We also print the first entry to show the format of this upload JSON.

Finally, we can upload our JSON file to our DATAGYM project. Before we do so, it is important to make sure the label configuration contains all the Coco super-categories and a field for the caption upload. Make sure that all geometries contain a nested freetext classification with an export key with the following naming scheme. See our extensive [documentation](https://docs.datagym.ai/documentation/label-configuration/what-is-a-label-configuration) for more information on how to set up your label configuration.
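To see which super-categories your label configuration has to cover, you can read them straight from the "categories" list of the detection file, where every category entry has a name and a supercategory. The snippet below is a small sketch of that check. The helper name and the shortened category list are our own, and how you enter the result into the label configuration is up to you.

```python
def list_supercategories(coco):
    """Return the distinct super-categories in a Coco detection file, sorted."""
    return sorted({category["supercategory"] for category in coco["categories"]})


# Shortened excerpt of a categories list (assumed sample; the full file has 80 entries).
sample_categories = {
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
        {"id": 3, "name": "car", "supercategory": "vehicle"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ]
}

needed = list_supercategories(sample_categories)
print(needed)
```

Comparing this list against your label configuration before uploading helps you spot missing entries early.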

With your label configuration successfully set up, you are now ready to upload your Coco labels.

Success! With just a few lines of code you can upload a Coco dataset into a DATAGYM project using our Python API. Equipped with this pre-labeled data, you can let your reviewers refine the labels within the DATAGYM application.



We hope you enjoyed our article. Please contact us if you have any suggestions for future articles or if there are any open questions.