Some Clustering Methods, Algorithms and their Applications
Abstract
Clustering is a type of unsupervised learning [15]. In an unsupervised learning task, no target values (no "supervisors") are known, so the goal is to derive the training signal from the inputs themselves. Clustering is a core technique in data mining and machine learning: grouping the records of a dataset by similarity can help predict user behavior more accurately. The purpose of this research is to compare and contrast three widely used data-clustering methods. Clustering techniques include partitioning, hierarchical, density-based, grid-based, and fuzzy clustering. Clustering is used as an analytical technique in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics. In addition to defining the various algorithms, specialized forms of cluster analysis, and linkage methods, we offer a review of the clustering techniques used in the big data setting.
References
J. A. Hartigan, Clustering Algorithms. New York: Wiley, 1975.
Anju, Preeti Gulia, Clustering in Big Data: A Review, International Journal of Computer Applications (0975 – 8887), Volume 153, No. 3, November 2016.
K. Krishna and M. Narasimha Murty, Genetic K-Means Algorithm, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 29, No. 3, June 1999.
Osama Mahmoud Abu Abbas, Comparisons Between Data Clustering Algorithms, International Arab Journal of Information Technology, Vol. 5, No. 3, 2008, pp. 320-325.
Jinghua Zhao, Wenbo Zhang, Yanwei Liu, Improved K-Means Cluster Algorithm in Telecommunications Enterprises Customer Segmentation, IEEE, 2010. ISBN 978-1-4244-6943-7.
Oyelade, O. J., Oladipupo, O. O., Obagbuwa, I. C., Application of k-Means Clustering algorithm for prediction of Students’ Academic Performance, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 1, 2010. ISSN 1947-5500.
Youguo Li, Haiyan Wu, A Clustering Method Based on K-Means Algorithm, 2012 International Conference on Solid State Devices and Materials Science, Physics Procedia 25 (2012) 1104 – 1109, doi:10.1016/j.phpro.2012.03.206.
Hailong Chen, Chunli Liu, Research and Application of Cluster Analysis Algorithm, 2013 2nd International Conference on Measurement, Information and Control, IEEE, 2013. ISBN 978-1-4799-1392-3.
Ali Farki, Reza Baradaran Kazemzadeh, and Elham Akhondzadeh Noughabi, A Novel Clustering-Based Algorithm for Continuous and Noninvasive Cuff-Less Blood Pressure Estimation, Hindawi Journal of Healthcare Engineering, Volume 2022, Article ID 3549238, 13 pages, https://doi.org/10.1155/2022/3549238.
Rupanka Bhuyan, Samarjeet Borah, A Survey of Some Density Based Clustering Techniques, https://www.researchgate.net/publication/265381945, DOI: 10.13140/2.1.4554.6887.
Pradeep Singh, Prateek A. Meshram, Survey of Density-Based Clustering Algorithms and its Variants, Proceedings of the International Conference on Inventive Computing and Informatics (ICICI 2017) IEEE Xplore Compliant - Part Number: CFP17L34-ART, ISBN: 978-1-5386-4031-9.
T. Soni Madhulatha, An Overview on Clustering Methods, IOSR Journal of Engineering, Apr. 2012, Vol. 2(4), pp. 719-725. ISSN: 2250-3021.
Peerzada Hamid Ahmad, Shilpa Dang, Performance Evaluation of Clustering Algorithms Using Different Datasets, International Journal of Advance Research in Computer Science and Management Studies, ISSN: 2321-7782 (Online).
I. Gath and A. B. Geva, Unsupervised Optimal Fuzzy Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, July 1989.
T. Velmurugan and T.Santhanam, A Survey of Partition based Clustering algorithms in Data mining: An Experimental Approach,
Hany Alashwal, Mohamed El Halaby, Jacob J. Crouse, Areeg Abdalla, and Ahmed A. Moustafa, The Application of Unsupervised Clustering Methods to Alzheimer’s Disease, Front. Comput. Neurosci. 13:31. doi: 10.3389/fncom.2019.00031.
Rong Zhou, Yong Zhang, Shengzhong Feng, and Nurbol Luktarhan, A Novel Hierarchical Clustering Algorithm Based on Density Peaks for Complex Datasets, Hindawi, Complexity, Volume 2018, Article ID 2032461, 8 pages, https://doi.org/10.1155/2018/2032461.
Rui Xu and Donald Wunsch II, Survey of Clustering Algorithms, IEEE Transactions on Neural Networks, Vol. 16, No. 3, May 2005.
Shadi Banitaan, Ali Bou Nassif, Mohammad Azzeh, Class Decomposition using K-means and Hierarchical Clustering, 2015 IEEE 14th International Conference on Machine Learning and Applications, IEEE, 2015. ISBN 978-1-5090-0287-0. DOI 10.1109/ICMLA.2015.169.
A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Zomaya, S. Foufou, and A. Bouras, A Survey of Clustering Algorithms for Big Data: Taxonomy & Empirical Analysis, IEEE Transactions on Emerging Topics in Computing, DOI 10.1109/TETC.2014.2330519.
Hong Chang, Dit-Yan Yeung, Robust path-based spectral clustering, Pattern Recognition 41 (2008) 191 – 203, Elsevier Ltd. doi:10.1016/j.patcog.2007.04.010.
Juha Vesanto and Esa Alhoniemi, Clustering of the Self-Organizing Map, IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000.
Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim, CURE: An Efficient Clustering Algorithm for Large Databases, Information Systems, Vol. 26, No. 1, pp. 35-58, 2001. Published by Elsevier Science Ltd.
Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm, Third International Symposium on Intelligent Information Technology and Security Informatics, IEEE, 2010. ISBN 978-0-7695-4020-7. DOI 10.1109/IITSI.2010.74.
Fong-Jhu Yih, Yuan-Horng Lin, and Jeng-Ming Yih, Clustering with fuzzy supervised algorithm, MATEC Web of Conferences 119, 01007 (2017), IMETI 2016. DOI: 10.1051/matecconf/201711901007.
Christopher M. Bishop (2006) Pattern Recognition and Machine Learning, Springer ISBN 0-387-31073-8.
Yunus Doğan, Derya Birant, Alp Kut, A New Approach for Weighted Clustering Using Decision Tree, IEEE, 2011. ISBN 978-1-61284-922-5.
Depa Pratima, Nivedita Nimmakanti, Pattern Recognition Algorithms for Cluster Identification Problem, International Journal of Computer Science & Informatics (IJCSI), ISSN (Print): 2231–5292, Vol. 2, Issue 3.
Mustafa Tareq, Elankovan A. Sundararajan, An Evolving Approach to Data Streams Clustering Based on Chebychev with False Merging, Journal of Theoretical and Applied Information Technology, 15th May 2021, Vol. 99, No. 9.
Joelson Antonio dos Santos, Talat Iqbal Syed, Murilo C. Naldi, Ricardo J. G. B. Campello, and Joerg Sander, Hierarchical Density-Based Clustering Using MapReduce, IEEE Transactions on Big Data, Vol. 7, No. 1, January-March 2021, DOI 10.1109/TBDATA.2019.2907624.
Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang, WaveCluster: a wavelet-based clustering approach for spatial data in very large databases, The VLDB Journal, Springer-Verlag, 2000.
Fionn Murtagh and Pedro Contreras, Methods of Hierarchical Clustering, arXiv:1105.02121v1 [cs.IR], 30 Apr 2011.
Tang, Z., and MacLennan, J., Data Mining with SQL Server 2005. Wiley Publishing, Inc., Indianapolis, Indiana, USA (2005), Chapter 8.
Alauddin Yousif Al-Omary, Mohammad Shahid Jamil, A new approach of clustering based machine-learning algorithm, Knowledge-Based Systems 19 (2006) 248–258.
Encyclopedia of Machine Learning pp 220–221
Ira Assent, Clustering high dimensional data, WIREs Data Mining and Knowledge Discovery, 2012, John Wiley & Sons, Inc. DOI: 10.1002/widm.1062.
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proc Fifth Berkeley Symp Math Stat Probab 1:281–297.
Park H, Jun C (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341.
Kaufman L, Rousseeuw P (1990) Partitioning around medoids (program PAM). Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken.
Kaufman L, Rousseeuw P (2008) Finding groups in data: an introduction to cluster analysis, vol 344. Wiley, Hoboken. DOI: 10.1002/9780470316801.
Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Rec 25:103–104.
Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. ACM SIGMOD Rec 27:73–84.
Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512-521.
Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32:68–75.
Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second ACM SIGKDD international conference on knowledge discovery and data mining, pp 226–231.
Ankerst M, Breunig M, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data, vol 28, pp 49–60.
Hinneburg A, Keim D (1998) An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th ACM SIGKDD international conference on knowledge discovery and data mining 98: 58–65.
Das, S. K., Pani, S. K., Padhy, S., Dash, S., & Acharya, A. K. (2023). Application of Machine Learning Models for Slope Instabilities Prediction in Open Cast Mines. International Journal of Intelligent Systems and Applications in Engineering, 11(1), 111–121. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2449.
Hartuv E, Shamir R (2000) A clustering algorithm based on graph connectivity. Inf Process Lett 76:175–181.
Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In: VLDB, pp 186–195.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, vol 27, pp 94–105.
Hinneburg, A., & Keim, D. A. (1999). Optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering.
Wu, C. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics, 95-103.
Fisher D (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2:139–172.
Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1-3), 1-6.
Abla Chouni Benabella, Asmaa Benghabrit, Imane Bouhaddou, A survey of clustering algorithms for an industrial context, Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018), Procedia Computer Science 148 (2019) 291–302, doi:10.1016/j.procs.2019.01.022.
Tian Zhang, Raghu Ramakrishnan, Miron Livny, BIRCH: An Efficient Data Clustering Method for Very Large Databases, SIGMOD ’96, June 1996, Montreal, Canada. ACM, 1996. ISBN 0-89791-794-4.
Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim, ROCK: A Robust Clustering Algorithm for Categorical Attributes, Information Systems, Vol. 25, No. 5, pp. 345-366, 2000. Published by Elsevier Science Ltd.
G. Naga Rama Devi, Comparative Study on Machine Learning Algorithms using Weka, International Journal of Engineering Research & Technology (IJERT), NCDMA - 2014 Conference Proceedings, ISSN: 2278-0181.