🔥Updating🔥
- Still to come:
- Separate sub losses to print. Add visualization of detection output.
- Combining the proposed method with model pruning/quantization method.
python2 tensorpack=0.8.6 tensorflow=1.8.0
First of all, clone the code
git clone https://github.com/twangnh/Distilling-Object-Detectors-Shuffledet
Note that we split the KITTI training set into train/val sets and evaluate our methods and models on the val set, since labels for the test set are not available. KITTI 2D object detection images are sampled from video, so randomly splitting the training data into train and val sets could lead to exceptionally high performance due to correlation between video frames. We follow MSCNN [1] (Zhaowei Cai et al.), which splits the KITTI training set into train/val sets while ensuring that images in the two sets do not come from close video frames.
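To illustrate why the split is made at the level of video sequences rather than single frames, here is a minimal sketch. It is not the MSCNN split procedure and is not part of this repository: `split_by_sequence`, the `image_to_seq` mapping, and the `val_fraction` parameter are hypothetical names introduced only for illustration. To reproduce the reported results, use the index files shipped with the repository.

```python
import random
from collections import defaultdict


def split_by_sequence(image_to_seq, val_fraction=0.5, seed=0):
    """Split image ids into train/val so that no video sequence is shared.

    image_to_seq: dict mapping image id -> id of the video sequence it came
    from (hypothetical input; how to obtain it is outside this sketch).
    """
    # Group frames by their source sequence.
    groups = defaultdict(list)
    for image_id, seq_id in image_to_seq.items():
        groups[seq_id].append(image_id)

    # Shuffle whole sequences, not individual frames.
    seq_ids = sorted(groups)
    random.Random(seed).shuffle(seq_ids)

    total = len(image_to_seq)
    train, val = [], []
    for seq_id in seq_ids:
        # Fill val until it holds roughly val_fraction of all images.
        target = val if len(val) < val_fraction * total else train
        target.extend(groups[seq_id])
    return sorted(train), sorted(val)
```

Because every frame of a sequence lands on the same side, near-duplicate neighbouring frames cannot appear in both train and val, which is the source of the inflated scores a purely random split would give.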
- download images at http://www.cvlibs.net/download.php?file=data_object_image_2.zip and extract into
./data/KITTI/training/image_2/
- download labels at http://www.cvlibs.net/download.php?file=data_object_label_2.zip and extract into
./data/KITTI/training/label_2/
- The train val split image index files are ready at
./data/KITTI/ImageSets
- download the ImageNet-pretrained 0.5x modified-ShuffleNet backbone model at 0.5x GoogleDrive and put it into
./pretrained_model/shuffle_backbone0.5x/
- download the ImageNet-pretrained 0.25x modified-ShuffleNet backbone model at 0.25x GoogleDrive and put it into
./pretrained_model/shuffle_backbone0.25x/
- download the trained 1x model at GoogleDrive and put it into
./kitti-1x-supervisor/
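Before training, it can help to confirm that the dataset was extracted into the expected directories. The sketch below is not part of the repository. It assumes that each index file under ./data/KITTI/ImageSets is named after the image set (for example train.txt or val.txt) and lists one image id per line, that images are .png files, and that labels are .txt files. `missing_files` is a hypothetical helper name.

```python
import os


def missing_files(root, image_set, image_ext=".png"):
    """Return the image/label paths listed in an index file that do not exist.

    root: dataset root, e.g. ./data/KITTI
    image_set: name of the index file without extension (assumed layout)
    """
    index_file = os.path.join(root, "ImageSets", image_set + ".txt")
    with open(index_file) as f:
        ids = [line.strip() for line in f if line.strip()]

    missing = []
    for idx in ids:
        image_path = os.path.join(root, "training", "image_2", idx + image_ext)
        label_path = os.path.join(root, "training", "label_2", idx + ".txt")
        for path in (image_path, label_path):
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, `missing_files("./data/KITTI", "val")` returns an empty list when every image and label referenced by the val index is in place.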
We have migrated to multi-GPU training with cross-GPU batch normalization. Results are currently reported for a batch size of 32 on 4 GPUs; other settings could be tried.
- train with 0.5x student
python train_multi_gpu.py --dataset KITTI --net ShuffleDet_conv1_stride1 --student 0.5 --train_dir xxx --image_set train --pretrained_model_path ./pretrained_model/shuffle_backbone0.5x/model-960000
- train with 0.25x student
python train_multi_gpu.py --dataset KITTI --net ShuffleDet_conv1_stride1 --student 0.25 --train_dir xxx --image_set train --pretrained_model_path ./pretrained_model/shuffle_backbone0.25x/model-665000
- you can turn off imitation by passing --without_imitation True, in which case training uses only ground-truth supervision, e.g.
python train_multi_gpu.py --dataset KITTI --net ShuffleDet_conv1_stride1 --student 0.5 --train_dir xxx --image_set train --pretrained_model_path ./pretrained_model/shuffle_backbone0.5x/model-960000 --without_imitation True
Models will be saved in the directory given by --train_dir.
By default, the evaluation code runs alongside training and tests every saved checkpoint. After training has started (e.g., the 0.5x student training), you can run
python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1 --eval_dir /path_to/eval_dir --image_set val --gpu 0 --checkpoint_path /path_to/train_dir --student 0.5
The TensorBoard records can then be loaded with (change the port if needed)
tensorboard --logdir=/path_to/eval_dir --port 4118
and viewed by opening the site
http://localhost:4118
| Models | FLOPs (G) | Params (M) | Car Easy | Car Mod | Car Hard | Pedestrian Easy | Pedestrian Mod | Pedestrian Hard | Cyclist Easy | Cyclist Mod | Cyclist Hard | mAP | ckpt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1x | 5.1 | 1.6 | 85.7 | 74.3 | 65.8 | 63.2 | 55.6 | 50.6 | 69.7 | 51.0 | 49.1 | 62.8 | GoogleDrive |
| 0.5x | 1.5 | 0.53 | 81.6 | 71.7 | 61.2 | 59.4 | 52.3 | 45.5 | 59.7 | 43.5 | 42.0 | 57.4 | GoogleDrive |
| 0.5x-I | 1.5 | 0.53 | 84.9 | 72.9 | 64.1 | 60.7 | 53.3 | 47.2 | 69.0 | 46.2 | 44.9 | 60.4 | GoogleDrive |
| | | | +3.3 | +1.2 | +2.9 | +1.3 | +1.0 | +1.7 | +9.3 | +2.7 | +2.9 | +3.0 | |
| 0.25x | 0.67 | 0.21 | 67.2 | 56.6 | 47.5 | 54.7 | 48.4 | 42.1 | 49.1 | 33.3 | 32.9 | 48.0 | GoogleDrive |
| 0.25x-I | 0.67 | 0.21 | 76.6 | 62.3 | 54.6 | 56.8 | 48.2 | 42.6 | 56.6 | 37.3 | 36.5 | 52.4 | GoogleDrive |
| | | | +9.4 | +5.7 | +7.1 | +2.1 | -0.2 | +0.5 | +7.5 | +4.0 | +3.6 | +4.4 | |
The models with the highest mAP are reported for both the baseline and the distilled models.
Note that the numbers differ from those in the paper because they come from independent runs of the algorithm and because we have migrated from single-GPU training to multi-GPU training with a larger batch size.
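As a quick consistency check on the table, the reported mAP values match the plain mean of the nine per-class, per-difficulty APs, and the gain rows match the difference of the rounded values. This is an observation from the numbers, not a statement of how the evaluation code computes mAP. The sketch below uses only values copied from the table; `mean_ap` and `gain` are hypothetical helper names.

```python
# AP values copied from the results table, in the order
# car (Easy, Mod, Hard), pedestrian (Easy, Mod, Hard), cyclist (Easy, Mod, Hard).
APS = {
    "1x":      [85.7, 74.3, 65.8, 63.2, 55.6, 50.6, 69.7, 51.0, 49.1],
    "0.5x":    [81.6, 71.7, 61.2, 59.4, 52.3, 45.5, 59.7, 43.5, 42.0],
    "0.5x-I":  [84.9, 72.9, 64.1, 60.7, 53.3, 47.2, 69.0, 46.2, 44.9],
    "0.25x":   [67.2, 56.6, 47.5, 54.7, 48.4, 42.1, 49.1, 33.3, 32.9],
    "0.25x-I": [76.6, 62.3, 54.6, 56.8, 48.2, 42.6, 56.6, 37.3, 36.5],
}


def mean_ap(values):
    """Unweighted mean of the nine APs, rounded to one decimal as in the table."""
    return round(sum(values) / len(values), 1)


def gain(distilled, baseline):
    """Difference of the rounded mAP values, rounded again to one decimal."""
    return round(mean_ap(APS[distilled]) - mean_ap(APS[baseline]), 1)


for name in ("1x", "0.5x", "0.5x-I", "0.25x", "0.25x-I"):
    print(name, mean_ap(APS[name]))
print("0.5x gain", gain("0.5x-I", "0.5x"))
print("0.25x gain", gain("0.25x-I", "0.25x"))
```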
- For example, to test the 0.5x distilled model download the trained model at the corresponding GoogleDrive link, then run
python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1 --eval_dir xxx --image_set val --gpu 0 --checkpoint_path /path_to/model0.5x60.4/model.ckpt-33000 --run_once True --student 0.5
- To test the 1x supervisor model, run
python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1_supervisor --eval_dir xxx --image_set val --gpu 0 --checkpoint_path ./kitti-1x-supervisor/model.ckpt-725000 --run_once True
Note on model size: the checkpoint saved by TensorFlow contains gradients and other information, so it is larger than the model itself, and we have not yet frozen the model. To check the model size, for example for the baseline 0.25x model without imitation, run
python param_count.py --model_path /home/wangtao/prj/shuffledet-multi-gpu-ckpt/model0.25x_nosup_48.0/model.ckpt-40000
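The idea of counting only model weights in a checkpoint can be sketched as follows. This is not the content of param_count.py, which is not shown here. The name fragments used to skip optimizer and bookkeeping variables are assumptions and should be adjusted to the optimizer actually used. `count_params` and `count_checkpoint_params` are hypothetical names. The TensorFlow call `tf.train.list_variables` returns (name, shape) pairs for a checkpoint.

```python
from __future__ import print_function

# Assumed name fragments of variables that are not model weights
# (optimizer slots, moving averages, step counter). Adjust as needed.
NON_MODEL_FRAGMENTS = ("Momentum", "Adam", "ExponentialMovingAverage", "global_step")


def count_params(variables, skip_fragments=NON_MODEL_FRAGMENTS):
    """Sum the element counts of (name, shape) pairs, skipping non-model variables."""
    total = 0
    for name, shape in variables:
        if any(fragment in name for fragment in skip_fragments):
            continue
        size = 1
        for dim in shape:
            size *= dim
        total += size
    return total


def count_checkpoint_params(checkpoint_path):
    """Count model parameters stored in a TensorFlow checkpoint."""
    # Imported lazily so count_params can be used without TensorFlow installed.
    import tensorflow as tf
    return count_params(tf.train.list_variables(checkpoint_path))
```

Dividing the result by 1e6 gives a figure comparable to the Params (M) column, provided the skipped fragments really cover every non-model variable in the checkpoint.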
Still to come...
- if you get a permission denied error when evaluating the model, please try
chmod +x ./dataset_tool/kitti-eval/cpp/evaluate_object
If that does not work, compile the evaluate_object executable from source, i.e., run make under ./dataset_tool/kitti-eval
@inproceedings{wang2019distilling,
title={Distilling Object Detectors With Fine-Grained Feature Imitation},
author={Wang, Tao and Yuan, Li and Zhang, Xiaopeng and Feng, Jiashi},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={4933--4942},
year={2019}
}
[1] Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, and Nuno Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. ECCV 2016.
The code and the models are MIT licensed, as found in the LICENSE file.