Quickstart
EfficientDet: Scalable and Efficient Object Detection
Introduction
This tutorial walks you through the steps of training a model on the Fridge Objects dataset. IceVision is a framework-agnostic library. As an illustration, we will train our model using both the fastai and pytorch-lightning libraries.
For more information about the fridge dataset and its corresponding parser, check out the fridge folder in icedata.
Installing IceVision and IceData
!pip install icevision[all] icedata
Imports
from icevision.all import *
Datasets: Fridge Objects dataset
The Fridge Objects dataset is a tiny dataset that contains 134 images of 4 classes:
- can,
- carton,
- milk bottle,
- water bottle.
IceVision provides handy methods for loading a dataset, parsing annotations, and more.
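To make the label set concrete, here is a minimal sketch of how class names can be mapped to integer ids. `SimpleClassMap` is a hypothetical stand-in written for this illustration, not IceVision's `ClassMap`. It assumes, as is common for detection models, that id 0 is reserved for a background class.

```python
class SimpleClassMap:
    """Hypothetical, simplified class map: id 0 is reserved for background."""

    def __init__(self, classes, background="background"):
        # Position in this list is the integer id of the class.
        self._id2class = [background] + list(classes)
        self._class2id = {name: i for i, name in enumerate(self._id2class)}

    def get_id(self, name):
        return self._class2id[name]

    def get_name(self, idx):
        return self._id2class[idx]

    def __len__(self):
        # Counts the background entry as well as the real classes.
        return len(self._id2class)


demo_map = SimpleClassMap(["milk_bottle", "carton", "can", "water_bottle"])
print(len(demo_map), demo_map.get_id("can"), demo_map.get_name(1))
```

Under this assumption the four fridge classes occupy ids 1 to 4, so the map has five entries in total.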
# Loading Data
url = "https://cvbp-secondary.z19.web.core.windows.net/datasets/object_detection/odFridgeObjects.zip"
dest_dir = "fridge"
data_dir = icedata.load_data(url, dest_dir)
# Parser
class_map = ClassMap(["milk_bottle", "carton", "can", "water_bottle"])
parser = parsers.voc(annotations_dir=data_dir / "odFridgeObjects/annotations",
                     images_dir=data_dir / "odFridgeObjects/images",
                     class_map=class_map)
# Records
train_records, valid_records = parser.parse()
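To give a feel for what a VOC parser has to read, the sketch below extracts labels and boxes from a single Pascal VOC style XML annotation using only the standard library. The XML content is made up for illustration, and `parse_voc` is a hypothetical helper, not the code IceVision runs.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC layout (values are illustrative only).
SAMPLE_XML = """
<annotation>
  <filename>1.jpg</filename>
  <size><width>499</width><height>666</height><depth>3</depth></size>
  <object>
    <name>carton</name>
    <bndbox><xmin>100</xmin><ymin>173</ymin><xmax>233</xmax><ymax>521</ymax></bndbox>
  </object>
  <object>
    <name>milk_bottle</name>
    <bndbox><xmin>250</xmin><ymin>80</ymin><xmax>400</xmax><ymax>500</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return a plain dict with the image info, labels and boxes of one annotation."""
    root = ET.fromstring(xml_text)
    record = {
        "filename": root.findtext("filename"),
        "width": int(root.findtext("size/width")),
        "height": int(root.findtext("size/height")),
        "labels": [],
        "bboxes": [],
    }
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        record["labels"].append(obj.findtext("name"))
        # Boxes are stored as (xmin, ymin, xmax, ymax) in pixel coordinates.
        record["bboxes"].append(
            tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        )
    return record


voc_record = parse_voc(SAMPLE_XML)
print(voc_record["labels"], voc_record["bboxes"])
```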
Visualization
Showing a batch of images with their corresponding boxes and labels
show_records(train_records[:3], ncols=3, class_map=class_map)
Train and Validation Dataset Transforms
# Transforms
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=384, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(384), tfms.A.Normalize()])
# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
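The size constraint mentioned in the comment above is easy to check before building the transforms. The two helpers below are hypothetical conveniences written for this tutorial, not part of IceVision; they only encode the "divisible by 128" rule stated in that comment.

```python
import math


def is_valid_size(size, multiple=128):
    """True if `size` is a positive multiple of `multiple`."""
    return size > 0 and size % multiple == 0


def round_up_to_multiple(size, multiple=128):
    """Smallest multiple of `multiple` that is greater than or equal to `size`."""
    return math.ceil(size / multiple) * multiple


print(is_valid_size(384), is_valid_size(400), round_up_to_multiple(400))
```

With these helpers, 384 passes the check (3 x 128), while a size such as 400 would have to be bumped up to 512.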
Displaying the same image with different transforms
Note:
Transforms are applied lazily, meaning they are only applied when we grab (get) an item. This means that, if you have augmentation (random) transforms, each time you get the same item from the dataset you will get a slightly different version of it.
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3, class_map=class_map)
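The lazy behaviour described in the note can be illustrated with a minimal sketch. `LazyDataset` and `random_shift` are hypothetical names used only here; a number stands in for an image and a random offset stands in for an augmentation.

```python
import random


class LazyDataset:
    """Toy dataset that applies its transform only when an item is requested."""

    def __init__(self, records, transform):
        self.records = records
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, i):
        # The transform runs here, at access time, not when the dataset is built.
        return self.transform(self.records[i])


rng = random.Random(0)  # seeded so the example is reproducible


def random_shift(value):
    """Stand-in for a random augmentation: add a small random offset."""
    return value + rng.uniform(-1.0, 1.0)


lazy_ds = LazyDataset([10.0, 20.0], random_shift)
lazy_samples = [lazy_ds[0] for _ in range(3)]
print(lazy_samples)
```

Each access to `lazy_ds[0]` draws a fresh random offset, so the three samples differ even though they come from the same record.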
DataLoader
# DataLoaders
train_dl = efficientdet.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
valid_dl = efficientdet.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)
batch, samples = first(train_dl)
show_samples(samples[:6], class_map=class_map, ncols=3)
Model
model = efficientdet.model(model_name="tf_efficientdet_lite0", num_classes=len(class_map), img_size=384)
Metrics
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
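COCO-style bounding box metrics are built on the intersection over union (IoU) between predicted and ground-truth boxes. The sketch below computes IoU for two boxes in (xmin, ymin, xmax, ymax) format. It is a simplified illustration of that building block, not the code `COCOMetric` runs, and `box_iou` is a name chosen for this example.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A prediction is usually counted as correct only when its IoU with a ground-truth box of the same class exceeds a chosen threshold.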
Training
IceVision is framework-agnostic, meaning it can be plugged into other deep learning frameworks such as fastai and pytorch-lightning.
You can also plug it into another deep learning framework using your own custom code.
Training using fastai
learn = efficientdet.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.freeze()
learn.lr_find()
SuggestedLRs(lr_min=0.06309573650360108, lr_steep=0.5248074531555176)
learn.fine_tune(50, 1e-2, freeze_epochs=5)
epoch | train_loss | valid_loss | COCOMetric | time |
---|---|---|---|---|
0 | 2.108684 | 1.270576 | 0.008988 | 00:06 |
1 | 1.756881 | 1.273853 | 0.006717 | 00:05 |
2 | 1.595295 | 1.305949 | 0.009512 | 00:05 |
3 | 1.441863 | 1.150711 | 0.145460 | 00:05 |
4 | 1.313717 | 1.087273 | 0.142000 | 00:05 |
epoch | train_loss | valid_loss | COCOMetric | time |
---|---|---|---|---|
0 | 0.796611 | 0.987059 | 0.199351 | 00:07 |
1 | 0.764935 | 0.936920 | 0.196103 | 00:06 |
2 | 0.737171 | 0.865462 | 0.250470 | 00:06 |
3 | 0.707076 | 0.830861 | 0.291164 | 00:06 |
4 | 0.679944 | 0.785556 | 0.289883 | 00:06 |
5 | 0.657874 | 0.705734 | 0.371499 | 00:06 |
6 | 0.630947 | 0.657513 | 0.440564 | 00:06 |
7 | 0.608779 | 0.629073 | 0.462677 | 00:05 |
8 | 0.594612 | 0.531862 | 0.504480 | 00:06 |
9 | 0.573904 | 0.478792 | 0.548560 | 00:06 |
10 | 0.553602 | 0.436366 | 0.682166 | 00:05 |
11 | 0.526821 | 0.431016 | 0.669347 | 00:05 |
12 | 0.503331 | 0.443858 | 0.601617 | 00:05 |
13 | 0.490109 | 0.452481 | 0.594377 | 00:06 |
14 | 0.470531 | 0.462492 | 0.670671 | 00:05 |
15 | 0.455318 | 0.377674 | 0.681230 | 00:05 |
16 | 0.441404 | 0.409097 | 0.678400 | 00:05 |
17 | 0.426164 | 0.358776 | 0.697379 | 00:06 |
18 | 0.413478 | 0.395500 | 0.636250 | 00:06 |
19 | 0.404278 | 0.367352 | 0.668839 | 00:06 |
20 | 0.387968 | 0.390063 | 0.663779 | 00:06 |
21 | 0.374634 | 0.284140 | 0.730051 | 00:06 |
22 | 0.366777 | 0.280615 | 0.741545 | 00:06 |
23 | 0.359820 | 0.301929 | 0.686845 | 00:06 |
24 | 0.347181 | 0.300537 | 0.730203 | 00:06 |
25 | 0.338814 | 0.269294 | 0.767164 | 00:06 |
26 | 0.326413 | 0.245472 | 0.788555 | 00:06 |
27 | 0.314856 | 0.253438 | 0.784647 | 00:06 |
28 | 0.306478 | 0.227623 | 0.806798 | 00:06 |
29 | 0.297792 | 0.273537 | 0.726833 | 00:06 |
30 | 0.291198 | 0.215873 | 0.821891 | 00:06 |
31 | 0.283369 | 0.217524 | 0.827257 | 00:06 |
32 | 0.277785 | 0.213709 | 0.831023 | 00:05 |
33 | 0.273086 | 0.208569 | 0.816660 | 00:05 |
34 | 0.264948 | 0.216965 | 0.819404 | 00:06 |
35 | 0.257523 | 0.187829 | 0.844655 | 00:06 |
36 | 0.255689 | 0.197039 | 0.843100 | 00:06 |
37 | 0.250191 | 0.209165 | 0.777218 | 00:06 |
38 | 0.245002 | 0.181346 | 0.831535 | 00:05 |
39 | 0.241197 | 0.187171 | 0.822547 | 00:05 |
40 | 0.239846 | 0.177838 | 0.833223 | 00:05 |
41 | 0.234420 | 0.175321 | 0.827860 | 00:06 |
42 | 0.230797 | 0.168390 | 0.857568 | 00:05 |
43 | 0.227416 | 0.167590 | 0.858564 | 00:06 |
44 | 0.225182 | 0.167128 | 0.874714 | 00:05 |
45 | 0.223807 | 0.165946 | 0.870924 | 00:05 |
46 | 0.220755 | 0.164731 | 0.872577 | 00:05 |
47 | 0.216147 | 0.164051 | 0.869572 | 00:06 |
48 | 0.214723 | 0.162329 | 0.870547 | 00:05 |
49 | 0.212741 | 0.162104 | 0.869900 | 00:05 |
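The two result tables above line up with the arguments passed to fine_tune: 5 epochs with the model frozen, then 50 epochs with it unfrozen. The sketch below only summarises that plan in plain Python. `fine_tune_plan` is a hypothetical helper, not a fastai function, and the halving of the learning rate in the second phase is an assumption about fastai's default behaviour that should be checked against the fastai documentation.

```python
def fine_tune_plan(epochs, base_lr, freeze_epochs):
    """Hypothetical summary of the two training phases (not a fastai API)."""
    return [
        {"phase": "frozen", "epochs": freeze_epochs, "max_lr": base_lr},
        # Assumption: the learning rate is halved once the model is unfrozen.
        {"phase": "unfrozen", "epochs": epochs, "max_lr": base_lr / 2},
    ]


plan = fine_tune_plan(epochs=50, base_lr=1e-2, freeze_epochs=5)
total_epochs = sum(p["epochs"] for p in plan)
for p in plan:
    print(p)
print("total epochs:", total_epochs)
```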
Training using Lightning
class LightModel(efficientdet.lightning.ModelAdapter):
    def configure_optimizers(self):
        return SGD(self.parameters(), lr=1e-2)
light_model = LightModel(model, metrics=metrics)
trainer = pl.Trainer(max_epochs=50, gpus=1)
trainer.fit(light_model, train_dl, valid_dl)
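The Lightning adapter above uses plain SGD with a learning rate of 1e-2. As a reminder of what that optimizer does at each step, here is a minimal sketch of the update rule on a toy one-parameter problem. `sgd_step` is a hypothetical helper for illustration; it leaves out momentum, weight decay, and everything else a real optimizer handles.

```python
def sgd_step(params, grads, lr=1e-2):
    """One plain SGD update: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weights = [0.0]
for _ in range(500):
    grad = [2.0 * (weights[0] - 3.0)]
    weights = sgd_step(weights, grad, lr=1e-2)

print(weights[0])  # approaches the minimum at w = 3
```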
Inference
Predicting a batch of images
Instead of predicting a whole list of images at once, we can process a small batch at a time. This option is more memory efficient.
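The idea can be sketched in plain Python before using the IceVision helpers. `iter_batches` and `predict_in_batches` are hypothetical names, and a lambda that doubles its inputs stands in for the model; only one batch of inputs is handed to the "model" at any moment.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(predict_fn, items, batch_size=8):
    """Run `predict_fn` on one batch at a time and collect all predictions."""
    preds = []
    for batch in iter_batches(items, batch_size):
        preds.extend(predict_fn(batch))
    return preds


# Stand-in data and "model" for illustration only.
fake_images = list(range(20))
fake_preds = predict_in_batches(
    lambda batch: [x * 2 for x in batch], fake_images, batch_size=8
)
print([len(b) for b in iter_batches(fake_images, 8)], fake_preds[:3])
```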
infer_dl = efficientdet.infer_dl(valid_ds, batch_size=8)
samples, preds = efficientdet.predict_dl(model, infer_dl)
show_preds(
    samples=samples[:6],
    preds=preds[:6],
    class_map=class_map,
    denormalize_fn=denormalize_imagenet,
    ncols=3,
)
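The `denormalize_fn` argument is needed because the images were normalized before being fed to the model. The sketch below shows the round trip for a single RGB pixel, assuming the commonly used ImageNet mean and standard deviation. The helper names are hypothetical, and the real functions operate on whole image arrays rather than on one pixel.

```python
# Commonly used ImageNet channel statistics (for inputs scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale 0-255 channel values to [0, 1], then standardise per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


def denormalize_pixel(norm, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel and round back to integer channel values."""
    return tuple(round((v * s + m) * 255.0) for v, m, s in zip(norm, mean, std))


pixel = (128, 64, 255)
restored = denormalize_pixel(normalize_pixel(pixel))
print(restored)
```

Without this step, the displayed images would use standardised values and look washed out or distorted.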
Saving Model on Google Drive
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = Path('/content/gdrive/My Drive/')
torch.save(model.state_dict(), root_dir/'icevision/models/fridge/fridge_tf_efficientdet_lite0.pth')
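Saving into a nested folder generally requires that the folder already exists. The sketch below shows one way to build the target path and create its parent directories first. It uses a temporary directory in place of the mounted drive, and `prepare_save_path` is a hypothetical helper, not part of IceVision or PyTorch.

```python
import tempfile
from pathlib import Path


def prepare_save_path(root_dir, relative_path):
    """Build the full path and make sure its parent directories exist."""
    save_path = Path(root_dir) / relative_path
    save_path.parent.mkdir(parents=True, exist_ok=True)
    return save_path


# A temporary directory stands in for the mounted drive in this example.
with tempfile.TemporaryDirectory() as tmp:
    save_path = prepare_save_path(
        tmp, "icevision/models/fridge/fridge_tf_efficientdet_lite0.pth"
    )
    parent_exists = save_path.parent.is_dir()

print(save_path.name, parent_exists)
```

The returned path could then be passed to the save call in place of the hand-built one.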
Happy Learning!
If you need any assistance, feel free to join our forum.