Break Down Resumes into Sections to Extract Data and Perform Text Analysis using Python

Arvind Kumar Sinha; Md. Amir Khusru Akhtar; Mohit Kumar

doi:10.17762/ijritcc.v11i6s.6945

Published: Jun 12, 2023

Keywords:

Resume Parser, Text Analysis, Text Classification, Python, Regular Expressions, Tagging, Parsing

Arvind Kumar Sinha

Faculty of Computing and IT, Usha Martin University, Ranchi, India

Md. Amir Khusru Akhtar

Faculty of Computing and IT, Usha Martin University, Ranchi, India

Mohit Kumar

Department of Information Technology, MIT Art, Design and Technology University, Pune, India

Abstract

The objective of AI-based resume screening is to automate the screening process, for which text extraction, keyword extraction, and named entity recognition are critical. This paper discusses segmenting resumes into sections in order to extract data and perform text analysis. The raw CV file is imported, and the resume text is cleaned to remove extra spaces, punctuation, and stop words. Regular expressions are used to extract names from resumes. We also use the spaCy library, which is regarded as one of the most accurate natural language processing libraries and includes pretrained models for entity recognition, parsing, and tagging. The experiments use resume data sourced from Kaggle and an external source (MTIS).
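
The steps named in the abstract (segmenting a resume, cleaning the text, and extracting a name with a regular expression) can be illustrated with a minimal sketch. This is not the authors' implementation: the heading list, the stop-word list, the name pattern, and the sample resume are all assumptions chosen for illustration, and the spaCy models mentioned in the abstract are deliberately left out so the sketch runs with the Python standard library alone.

```python
import re
import string

# Assumption: a tiny illustrative stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on", "at", "is"}

# Assumption: hypothetical section headings commonly seen in resumes.
SECTION_HEADINGS = ("summary", "education", "experience", "skills", "projects")


def split_sections(raw_text):
    """Group non-empty resume lines under the most recent recognized heading."""
    sections = {"header": []}
    current = "header"
    for line in raw_text.splitlines():
        key = line.strip().rstrip(":").lower()
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return sections


def clean_text(text):
    """Lowercase, strip punctuation, collapse whitespace, and drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


# Assumption: the name is a header line made up solely of two or three
# capitalized words. Real resumes will need a more forgiving pattern.
NAME_PATTERN = re.compile(r"^([A-Z][a-z]+(?: [A-Z][a-z]+){1,2})$")


def extract_name(sections):
    """Return the first header line that looks like a personal name, else None."""
    for line in sections["header"]:
        match = NAME_PATTERN.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical sample resume used only to exercise the functions above.
sample = """Jane Doe
jane.doe@example.com

Education:
B.Sc. in Computer Science, 2020

Skills
Python, SQL, and text analysis
"""

sections = split_sections(sample)
name = extract_name(sections)
cleaned_skills = clean_text(" ".join(sections["skills"]))
print(name)
print(cleaned_skills)
```

A pattern this strict will miss names with initials, hyphens, or non-Latin characters, which is one reason a trained named entity recognizer such as the one shipped with spaCy is a natural complement to regular expressions.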

How to Cite

Sinha, A. K., Akhtar, M. A. K., & Kumar, M. (2023). Break Down Resumes into Sections to Extract Data and Perform Text Analysis using Python. International Journal on Recent and Innovation Trends in Computing and Communication, 11(6s), 391–400. https://doi.org/10.17762/ijritcc.v11i6s.6945

Issue

Vol. 11 No. 6s (2023): Advances in Computational Modeling and Simulation of Computing Systems

Section

Articles

Contact Us:

Auricle Global Society of Education and Research

Y-18-A, Near Sanskar Play School, Sudarshana Nagar,

Bikaner, Rajasthan (India). Pin 334003

Email: editor@ijritcc.org
