An Efficient Information Extraction Mechanism with Page Ranking and a Classification Strategy based on Similarity Learning of Web Text Documents

Sunil Kumar Thota
G V S Raj Kumar
B. Raja Koti
K. Naveen Kumar

Abstract

The growth of the World Wide Web has given users access to far more information than before, and search engines have become an essential tool for finding information in this large space. Handling this wealth of information becomes harder every day. Although search engines are crucial for information gathering, many of the results they return are not what the user needs, because results are ranked according to string matches with the user's query. This leaves a semantic gap between the terms in the user's query and the significance of the key phrases in the results. The problem of grouping relevant information into categories of related topics also remains unsolved. This work proposes a ranking-based similarity learning approach and an SVM-based classification framework for web text that estimates the semantic similarity between words in order to improve information extraction. The experimental results indicate an improvement, with more relevant results retrieved.
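
The gap between string matching and similarity-based ranking can be illustrated with a small sketch. It is not the method of the paper, whose details the abstract does not give. It assumes a simple stand-in: word similarity is the Jaccard overlap of the sets of documents in which two words occur, and a document's score is the average, over query terms, of the best similarity to any of its words. The names `STOPWORDS`, `build_index`, `word_similarity`, and `rank`, as well as the toy corpus, are hypothetical. The SVM classification stage is not covered.

```python
# Illustrative only: co-occurrence (Jaccard) word similarity used for ranking.
# All names and the corpus are assumptions, not taken from the paper.

STOPWORDS = {"a", "the", "is", "in", "has", "are"}


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopwords."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in set(tokenize(text)):
            index.setdefault(word, set()).add(doc_id)
    return index


def word_similarity(index, a, b):
    """Jaccard overlap of the document sets of two words (0.0 if both unseen)."""
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def rank(query, docs):
    """Return (score, doc_id) pairs, best first; ties broken by doc_id."""
    index = build_index(docs)
    q_terms = tokenize(query)
    scored = []
    for doc_id, text in enumerate(docs):
        terms = set(tokenize(text))
        # For each query term, take its best match among the document's words.
        total = sum(
            max((word_similarity(index, q, t) for t in terms), default=0.0)
            for q in q_terms
        )
        scored.append((total / max(len(q_terms), 1), doc_id))
    return sorted(scored, key=lambda s: (-s[0], s[1]))


docs = [
    "jaguar is a fast car",
    "the car has a powerful engine",
    "jaguar lives in the rainforest",
    "bananas are yellow",
]
ranking = rank("engine", docs)
for score, doc_id in ranking:
    print(f"{score:.2f}  {docs[doc_id]}")
```

In this toy corpus the first document never contains the word "engine", so an exact string match would score it zero. Here it still receives a non-zero score because "car" co-occurs with "engine" elsewhere in the corpus.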

Article Details

How to Cite
Thota, S. K., Kumar, G. V. S. R., Koti, B. R., & Kumar, K. N. (2023). An Efficient Information Extraction Mechanism with Page Ranking and a Classification Strategy based on Similarity Learning of Web Text Documents. International Journal on Recent and Innovation Trends in Computing and Communication, 11(8), 229–240. https://doi.org/10.17762/ijritcc.v11i8.7948
Section
Articles
