Break Down Resumes into Sections to Extract Data and Perform Text Analysis using Python
Abstract
AI-based resume screening aims to automate the screening process, for which text extraction, keyword extraction, and named entity recognition are critical. This paper discusses segmenting resumes into sections in order to extract data and perform text analysis. The raw CV file is imported, and the resume text is cleaned by removing extra spaces, punctuation, and stop words. Regular expressions are used to extract names from resumes. We also use the spaCy library, which is widely regarded as one of the more accurate natural language processing libraries; it includes pretrained models for entity recognition, parsing, and tagging. The experiments use resume data sourced from Kaggle and from an external source (MTIS).
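The steps described in the abstract (cleaning, segmenting into sections, and regex-based name extraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the section headings, the name pattern, the helper names (`clean_text`, `split_sections`, `extract_name`), and the sample resume are all assumptions chosen for illustration, and only the Python standard library is used. In practice the name heuristic could be replaced by a pretrained named entity recognizer such as the one spaCy provides.

```python
import re
import string

# Illustrative stop-word list (assumption); a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on", "at"}

# Hypothetical section headings; real resumes vary widely in wording and layout.
SECTION_HEADINGS = ("summary", "education", "experience", "skills", "projects")

# Assumed name heuristic: two or three capitalized words alone on a line.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}$")


def clean_text(text):
    """Lowercase, strip punctuation, collapse whitespace, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def split_sections(raw):
    """Group non-empty lines under the most recent heading line.

    Lines that appear before any known heading are stored under 'header'.
    """
    sections = {"header": []}
    current = "header"
    for line in raw.splitlines():
        stripped = line.strip()
        key = stripped.rstrip(":").lower()
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        elif stripped:
            sections[current].append(stripped)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def extract_name(header_text):
    """Return the first line that looks like a personal name, or None."""
    for line in header_text.splitlines():
        if NAME_PATTERN.match(line.strip()):
            return line.strip()
    return None


# Made-up sample resume used only to exercise the functions above.
SAMPLE = """Jane Doe
jane.doe@example.com

Education
B.Sc. in Computer Science, 2018

Skills
Python, SQL, and the basics of NLP
"""

sections = split_sections(SAMPLE)
name = extract_name(sections["header"])
skills = clean_text(sections["skills"])
print(name)
print(skills)
```

A capitalization-based pattern like this misses names with initials, hyphens, or non-Latin scripts, which is one reason a trained entity recognizer is usually preferred for real resume data.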