Pascal VOC (Visual Object Classes) is a format for storing annotations of localization or object detection datasets. Many annotation editors and tools use it to create and modify annotations and to train machine learning models. In Pascal VOC format, each image has an XML annotation file containing image details, bounding box details, classes, rotation, and other data. Here is a simple XML file with two labeled objects.
<annotation>
    <folder>vehicles</folder>
    <filename>ff9435ee-ba7e-4d32-93bb-d931b3d2aca7.jpg</filename>
    <path>E:\vehicles\ff9435ee-ba7e-4d32-93bb-d931b3d2aca7.jpg</path>
    <size>
        <width>800</width>
        <height>598</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>truck</name>
        <bndbox>
            <xmin>7</xmin>
            <ymin>119</ymin>
            <xmax>630</xmax>
            <ymax>468</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <bndbox>
            <xmin>40</xmin>
            <ymin>90</ymin>
            <xmax>100</xmax>
            <ymax>350</ymax>
        </bndbox>
    </object>
</annotation>
Writing and reading this annotation file is easy with Python packages and takes only a few lines of code.
Create Pascal VOC
All major annotation tools, such as labelImg, offer Pascal VOC export, but you can also write Pascal VOC data easily with a pip package. Just install this Python package and you are good to go.
pip install pascal-voc-writer
Now we can import the package, create a writer object, and add as many annotations as we want, as in the example above.
from pascal_voc_writer import Writer

# create pascal voc writer (image_path, width, height)
writer = Writer('path/to/img.jpg', 800, 598)

# add objects (class, xmin, ymin, xmax, ymax)
writer.addObject('truck', 7, 119, 630, 468)
writer.addObject('person', 40, 90, 100, 350)

# write to file
writer.save('path/to/img.xml')
As we can see, a few lines of code are enough to write annotations to an XML file.
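If installing an extra package is not an option, the same structure can be built with Python's standard library. The following is only a minimal sketch, not how pascal-voc-writer works internally: `build_voc_xml` is a hypothetical helper name, it writes only the fields shown in the sample file (without folder and path), and the bounds check is an added assumption meant to catch swapped or out-of-range coordinates early.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, objects, depth=3):
    """Return a Pascal VOC style XML string.

    `objects` is a list of (class_name, xmin, ymin, xmax, ymax) tuples.
    build_voc_xml is a hypothetical helper, not part of any package.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    # image metadata
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    ET.SubElement(root, "segmented").text = "0"

    for class_name, xmin, ymin, xmax, ymax in objects:
        # reject boxes that are inverted or fall outside the image
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(f"invalid box for {class_name!r}")
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_name
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)

    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")


xml_text = build_voc_xml(
    "img.jpg", 800, 598,
    [("truck", 7, 119, 630, 468), ("person", 40, 90, 100, 350)],
)
```

The returned string can be written to disk with an ordinary `open(..., "w")` call. For real projects the pip package is likely the more convenient choice, since it also fills in the remaining optional fields.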
Read Pascal VOC
We can also read Pascal VOC XML in several ways. Here we try two of them: Python's built-in XML reader and an XML-to-dictionary (JSON-like) Python package.
We can read the XML tree with Python's xml package and iterate over its elements to extract image information and annotation data.
import xml.etree.ElementTree as ET

# parse xml file
tree = ET.parse("PATH_TO_XML")
root = tree.getroot()  # get root object
First, we get the image metadata (height, width, and number of channels), which is stored once per file and applies to every annotation in it.
size = root.find("size")
height = int(size.find("height").text)
width = int(size.find("width").text)
channels = int(size.find("depth").text)
All annotations are inside object elements, and there can be more than one, so we iterate over all available objects.
bbox_coordinates = []
for member in root.findall('object'):
    class_name = member.find('name').text  # class name

    # bbox coordinates
    bndbox = member.find('bndbox')
    xmin = int(bndbox.find('xmin').text)
    ymin = int(bndbox.find('ymin').text)
    xmax = int(bndbox.find('xmax').text)
    ymax = int(bndbox.find('ymax').text)

    # store data in list
    bbox_coordinates.append([class_name, xmin, ymin, xmax, ymax])

print(bbox_coordinates)
[['truck', 7, 119, 630, 468], ['person', 40, 90, 100, 350]]
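The reading steps above can be collected into a single function. This is a sketch under a few assumptions: `read_voc` and `SAMPLE_XML` are hypothetical names, the XML is parsed from a string rather than a file so the example runs on its own, and tags are looked up by name so the result does not depend on the order of the child elements.

```python
import xml.etree.ElementTree as ET

# shortened copy of the sample annotation, used only for this example
SAMPLE_XML = """
<annotation>
    <size><width>800</width><height>598</height><depth>3</depth></size>
    <object>
        <name>truck</name>
        <bndbox><xmin>7</xmin><ymin>119</ymin><xmax>630</xmax><ymax>468</ymax></bndbox>
    </object>
    <object>
        <name>person</name>
        <bndbox><xmin>40</xmin><ymin>90</ymin><xmax>100</xmax><ymax>350</ymax></bndbox>
    </object>
</annotation>
"""


def read_voc(xml_text):
    """Return (image_info, boxes) parsed from a Pascal VOC XML string."""
    root = ET.fromstring(xml_text)

    # image metadata, looked up by tag name
    size = root.find("size")
    image_info = {tag: int(size.findtext(tag))
                  for tag in ("width", "height", "depth")}

    # one [class_name, xmin, ymin, xmax, ymax] entry per object element
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.findtext(tag))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append([obj.findtext("name")] + coords)
    return image_info, boxes


image_info, boxes = read_voc(SAMPLE_XML)
print(image_info)
print(boxes)
```

To read from disk instead, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.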
Load as Dictionary
We can also load the XML data as an ordered dictionary using the open-source xmltodict package. Install it with pip as shown below.
pip install xmltodict
Now we can open the XML file, read its contents, and parse them with this package, as in the example below.
import xmltodict

with open("XML_PATH") as file:
    file_data = file.read()  # read file contents

# parse data using package
dict_data = xmltodict.parse(file_data)
This dictionary can now be used to extract all the annotation information easily. Other packages that convert XML to JSON or other formats can be used to read this data as well.
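One detail to watch when walking the parsed dictionary: with xmltodict's default settings, a tag that appears once becomes a dictionary, a repeated tag becomes a list, and all values are strings. The sketch below normalizes both cases. `voc_objects` is a hypothetical helper name, and `sample` is a hand-written stand-in for the result of `xmltodict.parse` on a file with a single object, so the example runs without the package installed.

```python
def voc_objects(dict_data):
    """Return [class_name, xmin, ymin, xmax, ymax] lists from a parsed VOC dict."""
    objects = dict_data["annotation"].get("object", [])

    # a single <object> is parsed as a dict, several as a list
    if isinstance(objects, dict):
        objects = [objects]

    boxes = []
    for obj in objects:
        box = obj["bndbox"]
        # values arrive as strings, so convert coordinates to int
        boxes.append([obj["name"],
                      int(box["xmin"]), int(box["ymin"]),
                      int(box["xmax"]), int(box["ymax"])])
    return boxes


# hand-written stand-in for xmltodict.parse output (one object only)
sample = {
    "annotation": {
        "size": {"width": "800", "height": "598", "depth": "3"},
        "object": {
            "name": "truck",
            "bndbox": {"xmin": "7", "ymin": "119", "xmax": "630", "ymax": "468"},
        },
    }
}

print(voc_objects(sample))
```

In practice the function would be called as `voc_objects(dict_data)` with the dictionary produced by the parsing step.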