Review Paper on Named Entity Recognition and Attribute Extraction using Machine Learning

Hiba Momin, Shubham Jain, Hemil Doshi, Prof. Ankush Hutke

doi:10.17762/ijritcc.v4i11.2600

Published: Nov 30, 2016

Abstract

Named entity recognition (NER) is a subtask of information extraction that aims to locate named entities in a given text and classify them into pre-defined categories such as the names of people, locations, and organizations. Once the entities are recognized, we further aim to find the most important named entities in a document, a task we refer to as focused named entity recognition. We implement this with a classifier approach, namely Naïve Bayes classification, and we show that these focused named entities are useful for many natural language processing applications, such as document summarization, search result ranking, and entity detection and tracking. Attribute extraction, on the other hand, involves the automatic selection of the attributes in the data (such as columns in tabular data) that are most relevant to the predictive problem at hand. We try to implement an approach that extracts the attributes of entities from an unstructured text corpus. The proposed method is an unsupervised machine learning method that extracts entity attributes using a deep belief network (DBN). We work with training data sets that we extract via web scraping tools, together with corresponding test files. Our goal is twofold: first, to organize information so that it is useful to people; second, to put it in a semantically precise form that allows further inferences to be made.
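
The classifier step for focused named entities can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features are used, so the three binary features below (`in_title`, `frequent`, `in_first_sentence`), the toy training set, and the function names are all assumptions made for illustration. The sketch trains a Bernoulli Naïve Bayes model with Laplace smoothing in plain Python and labels each candidate entity as `focused` or `other`.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on (features, label) pairs.

    `features` maps a feature name to True/False. `alpha` is the
    Laplace smoothing constant.
    """
    label_counts = defaultdict(int)
    true_counts = defaultdict(lambda: defaultdict(int))
    feature_names = set()
    for features, label in samples:
        label_counts[label] += 1
        for name, value in features.items():
            feature_names.add(name)
            if value:
                true_counts[label][name] += 1
    return {
        "label_counts": dict(label_counts),
        "true_counts": {lbl: dict(c) for lbl, c in true_counts.items()},
        "features": sorted(feature_names),
        "total": len(samples),
        "alpha": alpha,
    }


def predict(model, features):
    """Return the label with the highest log posterior."""
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n in model["label_counts"].items():
        score = math.log(n / model["total"])  # log prior
        counts = model["true_counts"].get(label, {})
        for name in model["features"]:
            # Smoothed estimate of P(feature is True | label).
            p_true = (counts.get(name, 0) + alpha) / (n + 2 * alpha)
            p = p_true if features.get(name, False) else 1.0 - p_true
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: one row per candidate entity in a document.
TRAIN = [
    ({"in_title": True, "frequent": True, "in_first_sentence": True}, "focused"),
    ({"in_title": True, "frequent": True, "in_first_sentence": False}, "focused"),
    ({"in_title": False, "frequent": True, "in_first_sentence": True}, "focused"),
    ({"in_title": False, "frequent": False, "in_first_sentence": False}, "other"),
    ({"in_title": False, "frequent": False, "in_first_sentence": True}, "other"),
    ({"in_title": False, "frequent": True, "in_first_sentence": False}, "other"),
]

model = train_naive_bayes(TRAIN)

headline_entity = {"in_title": True, "frequent": True, "in_first_sentence": True}
passing_mention = {"in_title": False, "frequent": False, "in_first_sentence": False}

print(predict(model, headline_entity))  # focused
print(predict(model, passing_mention))  # other
```

In practice the feature values would be computed from the output of an NER step over each document, and the training labels would come from annotated examples; both are outside the scope of this sketch.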

How to Cite

Momin, H., Jain, S., Doshi, H., & Hutke, A. (2016). Review Paper on Named Entity Recognition and Attribute Extraction using Machine Learning. International Journal on Recent and Innovation Trends in Computing and Communication, 4(11), 41 –. https://doi.org/10.17762/ijritcc.v4i11.2600