A study of generalization in pedestrian classification
Keywords:
Daimler; Inria; pedestrian detection; ResNet; SVM; transfer learning; TUD-Brussels
Abstract
Since Histogram of Oriented Gradients (HOG) features became the de facto feature vector for pedestrian detection in 2005, many improvements to the detection pipeline have made state-of-the-art performance applicable to many real-world problems. Nonetheless, the datasets available for training models carry many biases, which makes it hard to use the resulting models to detect pedestrians in videos and images obtained from sources other than those datasets.
This article presents a protocol to evaluate how well pedestrian models generalize across datasets. In outline, the protocol consists of training a model on each dataset or combination of datasets, and evaluating it on the remaining dataset(s) in each case.
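The train/test pairings implied by this protocol can be enumerated with a short sketch. This is only an illustration under stated assumptions: the function name `cross_dataset_splits` is hypothetical, and the reading that every non-empty combination of the non-test datasets is used for training is an interpretation of the description above, not the authors' published code.

```python
from itertools import combinations

# Dataset names as given in the abstract.
DATASETS = ("INRIA", "Daimler", "TUD-Brussels")


def cross_dataset_splits(datasets):
    """Yield (train_sets, test_set) pairs for a cross-dataset evaluation.

    Assumption: for each held-out test dataset, every non-empty
    combination of the other datasets is used as a training set.
    """
    for test_set in datasets:
        remaining = [d for d in datasets if d != test_set]
        for size in range(1, len(remaining) + 1):
            for train_sets in combinations(remaining, size):
                yield train_sets, test_set


if __name__ == "__main__":
    for train_sets, test_set in cross_dataset_splits(DATASETS):
        print(f"train on {' + '.join(train_sets)} -> evaluate on {test_set}")
```

With three datasets this yields six single-dataset pairings and three fused-dataset pairings; the training and scoring of each model would be plugged in where the pairing is printed.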
We use the protocol to evaluate a typical pedestrian classification model based on HOG and/or Local Binary Patterns (LBP) features and a Support Vector Machine (SVM) classifier. We also evaluate a modern convolutional network (ConvNet) model, to verify that the results of the protocol are due to the datasets and not the model.
We evaluate the models on the three most widely used datasets for pedestrian classification: INRIA, Daimler and TUD-Brussels. Our results show that, although each dataset presents real-world scenes, each carries significant biases that prevent models trained on one dataset from generalizing to the others. Models trained on two fused datasets perform only marginally better on the third dataset than models trained on individual datasets, for both SVM and ConvNet classifiers.
License
Copyright (c) 2021 Franco Ronchetti, Facundo Quiroga, Genaro Camele, Waldo Hasperué, Laura Lanzarini
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.