FVI-BD: Multiple File Extraction using Fusion Vector Investigation (FVI) in Big Data Hadoop Environment


V. Vadivu
N. Kavitha

Abstract

The Information Extraction (IE) approach extracts useful data from unstructured and semi-structured data. Big Data, with its rising volume of multidimensional unstructured data, provides new tools for IE. Traditional IE systems cannot adequately handle this massive flood of unstructured data, so the processing capability of current IE systems must be enhanced to cope with the volume and variety of Big Data. Existing IE techniques for data preparation, extraction, and transformation, as well as representations of massive amounts of multidimensional, unstructured data, must be evaluated in terms of their capabilities and limits. This paper proposes the FVI-BD framework for extracting information from IoT device data in Big Data. The unstructured data are cleaned and integrated using POS tagging, and similarity is computed using the LTA method. Features are extracted using term frequency (TF) and inverse document frequency (IDF). Information is extracted using NLP with WordNet, and classification is performed with the FVI algorithm. The study finds that large-scale data analytics may be enhanced by extracting document feature terms with synonym similarity, thereby increasing IE accuracy.
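The TF and IDF feature-extraction step mentioned above can be illustrated with a minimal sketch. The abstract does not state which weighting variant the authors use, so the textbook form below (relative term frequency multiplied by the logarithm of the inverse document frequency) is an assumption, as are the function name `tf_idf` and the whitespace tokenizer, which stands in for the POS-tagging and cleaning stage.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase whitespace tokenization replaces the paper's
    # POS-tagging and cleaning stage, which the abstract does not detail.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Textbook TF-IDF: (count / document length) * log(N / df).
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["sensor reports temperature", "sensor reports humidity"]
scores = tf_idf(docs)
```

In this toy corpus, terms shared by every document (such as "sensor") receive a weight of zero, while terms unique to one document receive a positive weight, which is what makes them usable as distinguishing features.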

Article Details

How to Cite
Vadivu, V., & Kavitha, N. (2023). FVI-BD: Multiple File Extraction using Fusion Vector Investigation (FVI) in Big Data Hadoop Environment. International Journal on Recent and Innovation Trends in Computing and Communication, 11(7s), 529–540. https://doi.org/10.17762/ijritcc.v11i7s.7032
