Skip to content

Open In Colab

Custom Parser - Simple

This tutorial uses the Global Wheat Detection dataset, you can download it from Kaggle here.

Instaling icevision

!pip install icevision[all]

Imports

As always, let's import everything from icevision. Additionally, we will also need pandas (you might need to install it with pip install pandas).

from icevision.all import *
import pandas as pd

Understand the data format

In this task we were given a .csv file with annotations, let's take a look at that.

Important

Replace source with your own path for the dataset directory.

source = Path("/home/lgvaz/data/wheat")
df = pd.read_csv(source / "train.csv")
df.head()
image_id width height bbox source
0 b6ab77fd7 1024 1024 [834.0, 222.0, 56.0, 36.0] usask_1
1 b6ab77fd7 1024 1024 [226.0, 548.0, 130.0, 58.0] usask_1
2 b6ab77fd7 1024 1024 [377.0, 504.0, 74.0, 160.0] usask_1
3 b6ab77fd7 1024 1024 [834.0, 95.0, 109.0, 107.0] usask_1
4 b6ab77fd7 1024 1024 [26.0, 144.0, 124.0, 117.0] usask_1

At first glance, we can make the following assumptions:

  • Multiple rows with the same object_id, width, height
  • A different bbox for each row
  • source doesn't seem relevant right now

Once we know what our data provides we can create our custom Parser.

Create the Parser

When creating a Parser we inherit from smaller building blocks that provides the functionallity we want:

  • parsers.FasterRCNN: Since we only need to predict bboxes we will use a FasterRCNN model, this will parse all the requirements for using such a model.
  • parsers.FilepathMixin: Provides the requirements for parsing images filepaths.
  • parsers.SizeMixin: Provides the requirements for parsing the image dimensions.

The first step is to create a class that inherits from these smaller building blocks:

class WheatParser(parsers.Parser, parsers.FilepathMixin, parsers.LabelsMixin, parsers.BBoxesMixin):
    pass

We now use a method generate_template that will print out all the necessary methods we have to implement.

WheatParser.generate_template()
def __iter__(self) -> Any:
def imageid(self, o) -> Hashable:
def image_width_height(self, o) -> Tuple[int, int]:
    return get_image_size(self.filepath(o))
def filepath(self, o) -> Union[str, Path]:
def bboxes(self, o) -> List[BBox]:
def labels(self, o) -> List[int]:

With this, we know what methods we have to implement and what each one should return (thanks to the type annotations)!

Defining the __init__ is completely up to you, normally we have to pass our data (the df in our case) and the folder where our images are contained (source in our case).

We then override __iter__, telling our parser how to iterate over our data. In our case we call df.itertuples to iterate over all df rows.

__len__ is not obligatory but will help visualizing the progress when parsing.

And finally we override all the other methods, they all receive a single argument o, which is the object returned by __iter__ (a single DataFrame row here).

Important

Be sure to return the correct type on all overriden methods!

class WheatParser(parsers.FasterRCNN, parsers.FilepathMixin, parsers.SizeMixin):
    def __init__(self, df, source):
        self.df = df
        self.source = source

    def __iter__(self):
        yield from self.df.itertuples()

    def __len__(self):
        return len(self.df)

    def imageid(self, o) -> Hashable:
        return o.image_id

    def filepath(self, o) -> Union[str, Path]:
        return self.source / f"{o.image_id}.jpg"

    def image_width_height(self, o) -> Tuple[int, int]:
        return get_image_size(self.filepath(o))

    def labels(self, o) -> List[int]:
        return [1]

    def bboxes(self, o) -> List[BBox]:
        return [BBox.from_xywh(*np.fromstring(o.bbox[1:-1], sep=","))]

Let's randomly split the data and parser with Parser.parse:

parser = WheatParser(df, source / "train")
train_rs, valid_rs = parser.parse()

Let's take a look at one record:

show_record(train_rs[0], display_label=False)

png

Conclusion

And that's it! Now that you have your data in the standard library record format, you can use it to create a Dataset, visualize the image with the annotations and basically use all helper functions that IceVision provides!

Happy Learning!

If you need any assistance, feel free to join our forum.