High-Performance Distributed Database Partitioning Using Machine Learning-Driven Workload Forecasting and Query Optimization

Authors

  • Parameswara Reddy Nangi, Independent Researcher, USA
  • Chaithanya Kumar Reddy Nala Obannagari, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V6I2P102

Keywords:

Homomorphic Distributed Database Partitioning, Machine Learning–Driven Workload Forecasting, AI-Assisted Query Optimization, Learned Cost Models, Cardinality Estimation

Abstract

Distributed databases are at the core of modern large-scale applications, yet keeping latency low and load balanced under highly variable workloads remains challenging. Systems that combine static range or hash partitioning with a traditional cost-based optimizer are typically tuned once and then left to run as access patterns change, which leads to data skew, cross-partition communication, and inconsistent performance. This paper proposes a unified architecture for high-performance distributed database partitioning that integrates machine learning-driven workload forecasting with AI-assisted query optimization. The workload forecasting module ingests telemetry streams (query arrival rates, key access frequencies, and resource utilization) and applies time-series and deep sequence models to forecast short-term traffic patterns and the emergence of hot keys. These forecasts drive an adaptive partitioning engine that selects split, merge, and migration actions to rebalance shards while explicitly constraining data-movement overhead. In parallel, a learning-based query optimizer applies learned cost models and cardinality estimators that are aware of both the current and the predicted partition layout, and uses them to choose topology-aware join orders, replicas, and operators. Together, the two components form a closed feedback loop with the distributed storage layer, continually refining the models based on observed performance. Experimental evaluations on benchmark and cloud-inspired workloads show substantial improvements in latency, throughput, and load balance over state-of-the-art learned optimizers; compared with static partitioning, the approach converges much faster and requires fewer repartitioning actions. The findings suggest that tight integration of forecasting and query optimization is a viable step toward self-optimizing, workload-aware distributed databases.
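
To make the forecast-then-repartition loop described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes an exponentially weighted moving average as a simple stand-in for the time-series and deep sequence models, and a greedy planner that picks split and merge actions under a fixed data-movement budget. All names (Shard, ewma_forecast, plan_actions), the thresholds, and the cost rules are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    name: str
    size_gb: float             # data volume held by the shard
    load_history: List[float]  # recent requests/sec samples, oldest first


def ewma_forecast(history, alpha=0.5):
    """One-step-ahead forecast using an exponentially weighted moving average."""
    forecast = history[0]
    for value in history[1:]:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


def plan_actions(shards, hot_threshold, cold_threshold, move_budget_gb):
    """Greedily choose split/merge actions without exceeding the movement budget.

    Assumed cost rules (illustrative only): a split moves half of the shard,
    a merge moves the smaller of the two shards.
    """
    forecasts = {s.name: ewma_forecast(s.load_history) for s in shards}
    actions, moved = [], 0.0

    # Consider the hottest shards first.
    for s in sorted(shards, key=lambda s: forecasts[s.name], reverse=True):
        if forecasts[s.name] > hot_threshold:
            cost = s.size_gb / 2
            if moved + cost <= move_budget_gb:
                actions.append(("split", s.name))
                moved += cost

    # Pair up cold shards, smallest first, and merge each pair if affordable.
    cold = sorted(
        (s for s in shards if forecasts[s.name] < cold_threshold),
        key=lambda s: s.size_gb,
    )
    for a, b in zip(cold[0::2], cold[1::2]):
        cost = min(a.size_gb, b.size_gb)
        if moved + cost <= move_budget_gb:
            actions.append(("merge", a.name, b.name))
            moved += cost

    return actions, moved


if __name__ == "__main__":
    shards = [
        Shard("A", 10.0, [100, 200, 400]),
        Shard("B", 4.0, [10, 10, 10]),
        Shard("C", 6.0, [20, 10, 0]),
        Shard("D", 8.0, [300, 300, 300]),
    ]
    actions, moved = plan_actions(
        shards, hot_threshold=250, cold_threshold=20, move_budget_gb=8
    )
    print(actions, moved)
```

In this toy run the budget admits the split of the hottest shard but not of the second-hottest one, which illustrates how a movement constraint can defer rebalancing actions. A production planner would also need migration actions and a cost model for cross-partition traffic, both of which are omitted here.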

Published

2024-03-06

Section

Articles

How to Cite

[1]
P. R. Nangi and C. K. Reddy Nala Obannagari, “High-Performance Distributed Database Partitioning Using Machine Learning-Driven Workload Forecasting and Query Optimization”, AIJCST, vol. 6, no. 2, pp. 11–21, Mar. 2024, doi: 10.63282/3117-5481/AIJCST-V6I2P102.
