Deep Learning Based Fine Grained Image Classification

Priti P. Vaidya
S. M. Kamalapur

Abstract

Image classification, and object classification in particular, has been a major research focus in computer vision and machine learning over the past decade. In image classification, a label or category is assigned to an input image based on its content. With breakthroughs in deep learning-based approaches, the performance of image classification models has improved significantly, particularly in fine-grained image classification, which involves discriminating between items of the same category that differ only slightly. Object classification can be coarse-grained, distinguishing highly diverse object categories such as an elephant and a bus. Fine-grained image classification, on the other hand, seeks to recognise images as belonging to distinct species of animals, birds, or plants, or to distinct models of automobiles, variants of aircraft, and so on. The purpose of this study is to review previously published research on deep learning techniques for fine-grained image classification and to compare the effectiveness of these techniques on publicly available datasets.

Article Details

How to Cite
Vaidya, P. P., & Kamalapur, S. M. (2023). Deep Learning Based Fine Grained Image Classification. International Journal on Recent and Innovation Trends in Computing and Communication, 11(7s), 716–730. https://doi.org/10.17762/ijritcc.v11i7s.7532
