A CNN and LSTM-based Model for Creating Captions for Photos
Abstract
Can a machine interpret the meaning of an image as quickly as the human brain does on seeing it? Computer vision researchers studied this problem extensively and, until recently, considered it unsolvable. Advances in deep learning techniques, the availability of large datasets, and increased processing power now make it possible to build models that generate captions for pictures. This article presents a Python-based implementation of such a model, which combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network, a particular kind of recurrent neural network, to perform the captioning task.
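To make the CNN and LSTM combination concrete, the following is a minimal sketch of how such a pipeline is commonly arranged, not the article's own implementation. It assumes the usual encoder-decoder split (the CNN turns the image into a feature vector, the LSTM emits one word at a time) and uses NumPy with random, untrained weights. The toy vocabulary, the layer sizes, and all function and parameter names (`cnn_encode`, `lstm_step`, `generate_caption`, `params`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and layer sizes: illustrative assumptions, not from the article.
VOCAB = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
FEAT, EMB, HID = 8, 8, 16


def cnn_encode(image, kernels):
    """Toy CNN encoder: 3x3 valid cross-correlation, ReLU, global average pooling."""
    h, w = image.shape
    feats = []
    for k in kernels:
        out = np.empty((h - 2, w - 2))
        for i in range(h - 2):
            for j in range(w - 2):
                out[i, j] = np.sum(image[i:i + 3, j:j + 3] * k)
        feats.append(np.maximum(out, 0.0).mean())
    return np.array(feats)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps the concatenation [x, h] to four stacked gates."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def generate_caption(image, params, max_len=10):
    """Greedy decoding: pick the highest-scoring word at every step."""
    feats = cnn_encode(image, params["kernels"])
    h = np.tanh(params["W_init"] @ feats)  # image features initialise the LSTM state
    c = np.zeros(HID)
    token = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h, c = lstm_step(params["embed"][token], h, c, params["W"], params["b"])
        token = int(np.argmax(params["W_out"] @ h))
        if VOCAB[token] == "<end>":
            break
        words.append(VOCAB[token])
    return words


# Random, untrained parameters: a real model would learn these from captioned images.
params = {
    "kernels": rng.normal(size=(FEAT, 3, 3)),
    "W_init": rng.normal(size=(HID, FEAT)) * 0.1,
    "embed": rng.normal(size=(len(VOCAB), EMB)) * 0.1,
    "W": rng.normal(size=(4 * HID, EMB + HID)) * 0.1,
    "b": np.zeros(4 * HID),
    "W_out": rng.normal(size=(len(VOCAB), HID)) * 0.1,
}

image = rng.random((16, 16))  # stand-in for a grayscale photo
caption = generate_caption(image, params)
print(caption)  # weights are untrained, so the words are arbitrary
```

In practice the hand-written encoder would typically be replaced by a pretrained CNN, and the decoder weights would be trained on image-caption pairs; the sketch only shows how data flows from image to word sequence.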