Object Detection for Autonomous Vehicles

Experiments

I fine-tune each model on the Berkeley DeepDrive object detection dataset and experiment with different hyperparameter settings, using the Adam optimizer for every model. For Faster R-CNN, I experiment with two different backbones, ResNet-50 and MobileNetV3, using the corresponding PyTorch models pre-trained on COCO [11]. I follow the PyTorch "Torchvision Object Detection Finetuning Tutorial" [3] to fine-tune Faster R-CNN, but modify the code to save the best model according to mAP on the validation set, to record the training loss, to compute the loss on the validation set, and to use a different learning rate scheduler. Since there are nearly 56,000 images in the training set, I update the learning rate during an epoch whenever the loss increases for 4 batches in a row (a threshold I determined experimentally). Also, for Faster R-CNN with a ResNet-50 backbone, the largest batch size I could use on Google Colaboratory without a CUDA out-of-memory error was 9.
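To make the scheduler modification concrete, below is a minimal sketch of a training loop with this in-epoch decay, assuming a torchvision-style detection model. The initial learning rate, the decay factor of 0.5, and the variable names are illustrative assumptions rather than values reported above, and data_loader is assumed to be a PyTorch DataLoader yielding image/target batches in the torchvision detection format.

import torch
import torchvision

# Faster R-CNN pre-trained on COCO with a ResNet-50 backbone; a MobileNetV3
# backbone can be loaded with fasterrcnn_mobilenet_v3_large_fpn instead.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.train()

# The initial learning rate here is an assumption for illustration.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

prev_loss = float("inf")
consecutive_increases = 0

for images, targets in data_loader:
    # In training mode, torchvision detection models return a dict of losses.
    loss_dict = model(images, targets)
    loss = sum(loss_dict.values())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Track how many batches in a row the training loss has increased.
    if loss.item() > prev_loss:
        consecutive_increases += 1
    else:
        consecutive_increases = 0
    prev_loss = loss.item()

    # After 4 consecutive increases, decay the learning rate mid-epoch.
    if consecutive_increases == 4:
        for group in optimizer.param_groups:
            group["lr"] *= 0.5  # decay factor is an assumption
        consecutive_increases = 0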

For YOLO, I use the pre-trained YOLOv4 model from [12] and follow the instructions in its README, modifying certain files to fine-tune on the Berkeley DeepDrive object detection dataset. Specifically, I modify the YOLOv4 cfg file by changing the number of classes to 13 and the number of filters in the convolutional layer before each YOLO layer to 54, which is (# of classes + 5) * 3. I also modify the schedule for decreasing the learning rate: originally, the learning rate was multiplied by 0.5 after completing 80% and 90% of the iterations, but I changed the schedule to multiply the learning rate by 0.95 roughly every 1 to 2 epochs, reasoning that gradually decreasing the learning rate throughout training, rather than only near the end, would help the model learn better. In addition, I create 5 new files, following the instructions in [12]: "obj.names" contains the name of each object class; "obj.data" records the number of classes, the locations of the dataset and the "obj.names" file, and where to store the model weights; and "train.txt", "val.txt", and "test.txt" list the locations of the images in the training, validation, and test sets, respectively (a sketch of "obj.data" appears after the command below). Last, I create one txt label file for each image, as previously described, and then run the following command to fine-tune YOLOv4 from the pre-trained weights:

./darknet detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map
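For reference, here is a small sketch of how the "obj.data" file described above could be generated; the exact paths are illustrative and depend on the local directory layout assumed in [12].

from pathlib import Path

def write_obj_data(path="data/obj.data"):
    # classes: number of object categories (13 here);
    # train/valid: files listing image paths for each split;
    # names: the obj.names file with one class name per line;
    # backup: directory where darknet saves the trained weights.
    contents = (
        "classes = 13\n"
        "train = data/train.txt\n"
        "valid = data/val.txt\n"
        "names = data/obj.names\n"
        "backup = backup/\n"
    )
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(contents)

write_obj_data()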