How to parse a VOC dataset using predefined splits
Install and import IceVision and IceData
!pip install git+https://github.com/airctic/icevision.git
from icevision.all import *
Load Pascal VOC 2012 dataset
path = icedata.voc.load_data()
Set images, annotations and imagesets directories
annotations_dir = path / "Annotations"
images_dir = path / "JPEGImages"
imagesets_dir = path / "ImageSets/Main"
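Before parsing, it can help to confirm that the downloaded data has the layout these three paths assume. The following is a minimal sketch using only the standard library; `missing_voc_dirs` is a hypothetical helper written for illustration, not part of IceVision or IceData.

```python
from pathlib import Path


# Hypothetical helper, not part of IceVision or IceData.
def missing_voc_dirs(root):
    """Return the expected VOC sub-directories that are absent under root."""
    expected = ["Annotations", "JPEGImages", "ImageSets/Main"]
    return [name for name in expected if not (Path(root) / name).is_dir()]
```

Calling `missing_voc_dirs(path)` should return an empty list when all three directories are present.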
Define class_map
class_map = icedata.voc.class_map()
Split data using imagesets
The ImageSets directory contains text files that each list a subset of the images in JPEGImages. We will split our dataset using the train and validation files for the aeroplane class. Each line of these files starts with an image ID, and that ID is what FixedSplitter needs: the values passed to it must match the values returned by imageid in our parser, not the image filenames.
# Keep only the image ID, the first space-separated field on each line
train = [line.split(" ", 1)[0] for line in open(imagesets_dir / "aeroplane_train.txt")]
val = [line.split(" ", 1)[0] for line in open(imagesets_dir / "aeroplane_val.txt")]
presplits = [train, val]
data_splitter = FixedSplitter(presplits)
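In the VOC devkit, each line of a class-specific image set file such as aeroplane_train.txt pairs an image ID with a flag: 1 when the class is present, -1 when it is absent, and 0 when only difficult instances are present. Taking the first column alone therefore keeps every image listed in the file. If only images that contain aeroplanes are wanted, the flag can be filtered as well. The following is a minimal sketch under that file-layout assumption; `read_imageset` and its `keep_flags` parameter are hypothetical and not part of IceVision.

```python
# Hypothetical helper, not an IceVision function.
def read_imageset(filepath, keep_flags=None):
    """Return image IDs from a VOC image set file.

    keep_flags: optional set of flag strings, e.g. {"1"}.
    None keeps every ID regardless of its flag.
    """
    ids = []
    with open(filepath) as f:
        for line in f:
            parts = line.split()  # tolerates variable whitespace between columns
            if not parts:
                continue  # skip blank lines
            flag = parts[1] if len(parts) > 1 else None
            if keep_flags is None or flag in keep_flags:
                ids.append(parts[0])
    return ids
```

For example, `read_imageset(imagesets_dir / "aeroplane_train.txt", keep_flags={"1"})` would return only the IDs flagged as containing an aeroplane. The resulting lists can be passed to FixedSplitter in the same way as train and val.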
Parser: use IceVision's predefined VOC parser
parser = parsers.voc(
    annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map
)
Train and validation records
train_records, valid_records = parser.parse(data_splitter)
show_records(train_records[:2], ncols=2, class_map=class_map)