Data Quality Optimization for Decision Making Using Ataccama Toolkit: A Sustainable Perspective

Aatif Jamal
Mohatesham Pasha Quadri
Mohd Rafeeq

Abstract

The internet has given us access to datasets from many different domains. A large volume of heterogeneous data is available on digital platforms, but it is meaningless unless we put it to valuable use. Can these datasets be used to meet business requirements? Critical data can be delivered uniformly and shared through master data management and integration techniques. However, given the cross-domain and heterogeneous nature of the data, it is still difficult to assess effectiveness and rationality. In this paper, we develop a pipeline using Ataccama and show how data can be obtained, synthesized, and optimized using integration and master data management (MDM) tools. These tools were assessed based on their performance characteristics and the types of data quality problems they address. We have tried to reduce complexity by using various dictionaries and lookups, along with a ruleset, to fetch the required data from the dataset via the MDM application. The dataset was also profiled and validated against different parameters. The results show that the efficiency and quality of the data improved after applying the integration techniques.
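The profiling and lookup-based validation steps mentioned in the abstract can be illustrated with a minimal, generic sketch. This is plain Python and not Ataccama's API or the authors' pipeline; the field names, the `COUNTRY_LOOKUP` dictionary, and the sample records are all hypothetical and chosen only to show the idea of profiling a field and then validating it against a lookup.

```python
# Hypothetical lookup dictionary: maps raw country spellings to a standard code.
# (Illustrative only; not taken from the paper or from Ataccama.)
COUNTRY_LOOKUP = {
    "india": "IN",
    "in": "IN",
    "united states": "US",
    "usa": "US",
    "us": "US",
}


def profile(records, field):
    """Return basic profiling statistics for one field."""
    values = [r.get(field) for r in records]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def standardize_country(record):
    """Apply the lookup rule; return (cleaned_record, is_valid)."""
    raw = (record.get("country") or "").strip().lower()
    code = COUNTRY_LOOKUP.get(raw)
    cleaned = dict(record, country=code)
    return cleaned, code is not None


# Hypothetical sample data with a padded value, a missing value, and an unknown value.
records = [
    {"name": "A", "country": "India"},
    {"name": "B", "country": "USA "},
    {"name": "C", "country": ""},
    {"name": "D", "country": "Atlantis"},
]

stats = profile(records, "country")
results = [standardize_country(r) for r in records]
valid_share = sum(ok for _, ok in results) / len(results)

print(stats)
print(f"valid share: {valid_share:.2f}")
```

Here the profile reports how many values are missing or distinct before any cleaning, and the lookup rule both standardizes a value and flags records that fail validation, so the share of valid records can serve as a simple quality indicator.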

Article Details

How to Cite
Jamal, A., Quadri, M. P., & Rafeeq, M. (2023). Data Quality Optimization for Decision Making Using Ataccama Toolkit: A Sustainable Perspective. International Journal on Recent and Innovation Trends in Computing and Communication, 11(8), 217–228. https://doi.org/10.17762/ijritcc.v11i8.7947
