Scalable Computational Frameworks for Big Data Processing in Multi-Cloud Environments

Authors

  • Dr. Sarantuya Bolormaa, School of Data Science and Digital Technology, National University of Mongolia, Ulaanbaatar, Mongolia.

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V6I6P102

Keywords:

Big Data, Multi-Cloud Computing, Distributed Systems, Scalability, Cloud Orchestration, Data Analytics, Resource Optimization, Security

Abstract

Big data continues to grow rapidly with the expanding use of digital systems, the Internet of Things, industrial automation, and sophisticated artificial intelligence applications. To manage, process, and extract insights from large-scale heterogeneous data, organizations and researchers increasingly rely on cloud-based infrastructures. However, relying on a single cloud provider imposes several constraints, including vendor lock-in, limited global distribution, operational risk, and poor performance at peak demand. These constraints have motivated a shift toward multi-cloud environments, in which distributed computational workloads can be balanced across several cloud platforms (e.g., AWS, Azure, GCP) to provide greater elasticity, resilience, cost optimization, and geo-location-aware data governance. This paper presents a comprehensive Scalable Computational Framework (SCF) designed to support big data analytics in multi-cloud systems. The framework combines container orchestration, a federated storage system, real-time monitoring, and an intelligent scheduling scheme based on latency-aware and predictive resource allocation models. Together, these components aim to maximize computational throughput and optimize data locality without violating privacy and security requirements. The proposed architecture supports horizontal scaling, hybrid processing pipelines (batch and streaming), and fault-tolerant inter-cloud data synchronization. The literature review analyzes state-of-the-art big data platforms, distributed processing engines (Spark, Flink, Presto), and resource federation methods across multiple providers.
The methodology focuses on an improved Directed Acyclic Graph (DAG)-based job planner, dynamic scaling of microservices, and cross-cloud Secure Data Exchange (SDE) protocols. Synthetic and real-world datasets are used to evaluate performance under different throughputs and cluster distributions. The results indicate an improvement in processing speed of up to 43 percent, a 31 percent reduction in operational cost, and 97.2 percent system availability in cross-cloud failover recovery tests. The contribution advances large-scale data engineering practice by closing gaps in the interoperability, automation, and resilience of multi-cloud big data workloads. The study suggests applications in smart cities, healthcare analytics, finance, and scientific research. Future improvements include auto-scaling through reinforcement learning, the introduction of confidential computing, and carbon-aware workload distribution strategies.
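
To make the idea of a DAG-based job planner with latency-aware placement concrete, the following is a minimal sketch and not the SCF implementation described in the paper. The task names, per-cloud costs, inter-cloud latencies, the `latency_weight` parameter, and the scoring rule (cost plus weighted transfer latency) are all hypothetical values chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical job DAG: each task maps to the set of tasks it depends on.
DAG = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}

# Hypothetical cloud where each task's input data currently resides.
DATA_LOCATION = {"ingest": "aws", "clean": "aws", "aggregate": "gcp", "report": "gcp"}

# Illustrative relative cost of running one task on each cloud.
COST = {"aws": 1.0, "azure": 0.8, "gcp": 0.9}

# Illustrative symmetric inter-cloud transfer latency in milliseconds.
LATENCY_MS = {
    frozenset(("aws", "azure")): 40.0,
    frozenset(("aws", "gcp")): 60.0,
    frozenset(("azure", "gcp")): 50.0,
}


def transfer_latency(src, dst):
    """Latency of moving data from src to dst; zero when data stays local."""
    if src == dst:
        return 0.0
    return LATENCY_MS[frozenset((src, dst))]


def plan(dag, data_location, cost, latency_weight=0.01):
    """Order tasks topologically, then place each on the lowest-scoring cloud.

    The score is an assumed linear trade-off: run cost plus
    latency_weight times the latency of pulling the task's input data.
    """
    order = list(TopologicalSorter(dag).static_order())
    placement = {}
    for task in order:
        placement[task] = min(
            sorted(cost),  # sorted for a deterministic tie-break
            key=lambda cloud: cost[cloud]
            + latency_weight * transfer_latency(data_location[task], cloud),
        )
    return order, placement


order, placement = plan(DAG, DATA_LOCATION, COST)
print(order)
print(placement)
```

With these illustrative numbers, each task stays on the cloud that holds its data because the latency penalty outweighs the cost difference. Setting `latency_weight=0` makes the planner ignore data locality and choose the cheapest cloud for every task.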

Published

2024-11-06

Section

Articles

How to Cite

[1]
S. Bolormaa, “Scalable Computational Frameworks for Big Data Processing in Multi-Cloud Environments”, AIJCST, vol. 6, no. 6, pp. 13–24, Nov. 2024, doi: 10.63282/3117-5481/AIJCST-V6I6P102.
