Pascal VOC (Visual Object Classes) is a format for storing annotations of object localization and object detection datasets. Many annotation editors and tools use it to create and modify labels and to train machine learning models. In the Pascal VOC format, each image has its own XML annotation file containing image details, bounding boxes, class names, and per-object fields such as pose, truncation and difficulty. Here is a representation of a simple XML file with two labeled objects.
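A minimal sketch of such a file follows. It is stored in a Python string and parsed with the standard library's xml.etree.ElementTree so its structure can be inspected. The element names follow the usual VOC layout; the file name, image size and box values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative VOC-style annotation: one image, two labeled objects.
# File name, size and box values are made up for this sketch.
VOC_XML = """<annotation>
    <folder>images</folder>
    <filename>img.jpg</filename>
    <size>
        <width>800</width>
        <height>598</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>truck</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>7</xmin>
            <ymin>119</ymin>
            <xmax>630</xmax>
            <ymax>468</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>90</ymin>
            <xmax>100</xmax>
            <ymax>150</ymax>
        </bndbox>
    </object>
</annotation>"""

root = ET.fromstring(VOC_XML)

# Image-level details live directly under <annotation>
print(root.find("filename").text, root.find("size/width").text, root.find("size/height").text)

# One <object> element per labeled box; <bndbox> children are xmin, ymin, xmax, ymax
for obj in root.findall("object"):
    print(obj.find("name").text, [int(v.text) for v in obj.find("bndbox")])
```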


Writing and reading these annotation files is easy with Python packages and takes only a few lines of code.

Create Pascal VOC

Major annotation tools such as Label Studio, Labelbox and LabelImg offer Pascal VOC exports, but you can also write Pascal VOC data yourself with the pascal-voc-writer package from pip. Just install it and you are good to go.

pip install pascal-voc-writer

Now we can import the package, create a Writer object, add as many objects as we want, and save the result as an XML file.

from pascal_voc_writer import Writer

# create pascal voc writer (image_path, width, height)
writer = Writer('path/to/img.jpg', 800, 598)

# add objects (class, xmin, ymin, xmax, ymax)
writer.addObject('truck', 7, 119, 630, 468)
writer.addObject('person', 40, 90, 100, 150)

# write to file
writer.save('path/to/img.xml')

With just a few lines of code, the annotations are written to an XML file.
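If you would rather avoid the extra dependency, a comparable file can be built with the standard library alone. This is a minimal sketch, not how pascal-voc-writer works internally: voc_tree is a hypothetical helper, and the element layout (size with width, height and depth; one object element per box) follows the common VOC convention.

```python
import os
import xml.etree.ElementTree as ET

def voc_tree(image_path, width, height, objects, depth=3):
    """Hypothetical helper: build a minimal VOC-style annotation tree.

    objects is a list of (class_name, xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = os.path.basename(image_path)
    ET.SubElement(root, "path").text = image_path

    # Image size is stored once per file
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)

    # One <object> element per bounding box
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.ElementTree(root)

tree = voc_tree("path/to/img.jpg", 800, 598,
                [("truck", 7, 119, 630, 468), ("person", 40, 90, 100, 150)])
ET.indent(tree)  # pretty-print with indentation (Python 3.9+)
print(ET.tostring(tree.getroot(), encoding="unicode"))
# tree.write("path/to/img.xml", encoding="utf-8") would save it to disk
```

Returning the tree instead of writing directly keeps the helper easy to inspect before anything touches the file system.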

Read Pascal VOC

Pascal VOC XML files can also be read in several ways. We will try two: Python's built-in XML parser and the xmltodict package, which converts XML into a dictionary that is easy to turn into JSON.

Using XML

Python's built-in xml.etree.ElementTree module can parse the XML tree, and we then iterate over its elements to extract image details and annotation data.

import xml.etree.ElementTree as ET

# parse xml file
tree = ET.parse("PATH_TO_XML") 
root = tree.getroot() # get root object

First, we read the image metadata: width, height and number of channels (depth). These are stored once per file in the size element, since they apply to every annotation of that image.

size = root.find("size")
width = int(size.find("width").text)
height = int(size.find("height").text)
channels = int(size.find("depth").text)

Each annotation is stored in its own object element, and a file can contain several of them, so we iterate over all object elements.

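bbox_coordinates = []
for member in root.findall("object"):
    class_name = member.find("name").text  # class name
    # bbox coordinates, looked up by tag name rather than child position;
    # float() first, since some tools write values such as 468.0
    bndbox = member.find("bndbox")
    xmin = int(float(bndbox.find("xmin").text))
    ymin = int(float(bndbox.find("ymin").text))
    xmax = int(float(bndbox.find("xmax").text))
    ymax = int(float(bndbox.find("ymax").text))
    # store data in list
    bbox_coordinates.append([class_name, xmin, ymin, xmax, ymax])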

For the truck and person example, bbox_coordinates then contains:

	['truck', 7, 119, 630, 468]
	['person', 40, 90, 100, 150]
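The reading steps can also be wrapped into a single reusable function. This is a minimal sketch assuming the standard VOC tag names; read_voc is a hypothetical helper, not part of any package, and it accepts anything ET.parse accepts (a file path or a file-like object).

```python
import io
import xml.etree.ElementTree as ET

def read_voc(source):
    """Hypothetical helper: return image size and boxes from a VOC XML file."""
    root = ET.parse(source).getroot()
    size = root.find("size")
    info = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "depth": int(size.findtext("depth", default="3")),  # assume 3 if missing
        "objects": [],
    }
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Some tools store coordinates as decimals, e.g. "468.0"
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        info["objects"].append([obj.findtext("name"), *coords])
    return info

# Usage with an in-memory file; a path such as "path/to/img.xml" works the same way.
sample = """<annotation><filename>img.jpg</filename>
<size><width>800</width><height>598</height><depth>3</depth></size>
<object><name>truck</name><bndbox><xmin>7</xmin><ymin>119</ymin>
<xmax>630</xmax><ymax>468.0</ymax></bndbox></object>
</annotation>"""
info = read_voc(io.StringIO(sample))
print(info["width"], info["height"], info["objects"])
```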

Load as Dictionary

We can also load the XML data as a dictionary using the open-source xmltodict package. Install it with pip:

pip install xmltodict

Now we can open the XML file, read its contents and parse them with this package:

import xmltodict

with open("XML_PATH") as file:
    file_data = file.read()  # read file contents
    # parse data using package
    dict_data = xmltodict.parse(file_data)

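The resulting dictionary mirrors the XML tree, so values can be looked up by tag name, for example dict_data["annotation"]["size"]["width"] for the image width (returned as a string). Note that when a file contains a single object, dict_data["annotation"]["object"] is a dictionary rather than a list. Other packages that convert XML to JSON or other formats can be used to read such data in a similar way.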
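For dictionary-style access or JSON output without installing xmltodict, a small recursive helper over ElementTree gives a simplified equivalent. This is a swapped-in, standard-library technique, not xmltodict's implementation: element_to_dict is a hypothetical name, attributes are ignored, every value stays a string, and a tag turns into a list only when it repeats. xmltodict behaves the same way for repeated tags, so the loop below normalizes the single-object case.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical helper: convert an Element into nested dicts; leaves become their text."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag (e.g. several <object> elements): collect into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

xml_text = """<annotation>
<filename>img.jpg</filename>
<size><width>800</width><height>598</height><depth>3</depth></size>
<object><name>truck</name><bndbox><xmin>7</xmin><ymin>119</ymin><xmax>630</xmax><ymax>468</ymax></bndbox></object>
<object><name>person</name><bndbox><xmin>40</xmin><ymin>90</ymin><xmax>100</xmax><ymax>150</ymax></bndbox></object>
</annotation>"""

root = ET.fromstring(xml_text)
data = {root.tag: element_to_dict(root)}

objects = data["annotation"]["object"]
if isinstance(objects, dict):  # a single <object> is not wrapped in a list
    objects = [objects]
for obj in objects:
    print(obj["name"], obj["bndbox"])

print(json.dumps(data, indent=2))  # JSON view of the same annotation
```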