While I’m losing my mind trying to get the multiple object detection using SSD to work, here’s a link to a working Notebook that classifies and identifies the images from the Pascal 2007 dataset.

The regression is still not perfect but you get the idea.

This uses a Linear layer as the final layer instead of a Convolution so YOLO.

SSD_ObjectDetection