Building Scalable Data Replication Pipelines for Real-Time Analytics

Authors

  • Shiva Santosh Allenki, Software Engineer at Bank of America, USA

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V6I1P108

Keywords:

Data Replication, Real-Time Analytics, Distributed Systems, Scalability, Streaming Architecture, ETL, Fault Tolerance, Big Data Pipelines, Apache Kafka, Cloud Computing

Abstract

Real-time analytics has become integral to decision-making for companies in the current data-driven era. However, conventional data replication mechanisms frequently run into scalability, latency, and data consistency problems as the volume and velocity of data increase. This article presents a scalable data replication pipeline that aims to achieve high-throughput, low-latency data synchronization across distributed systems for real-time analytical processing. The proposed model is modular and resilient: it integrates stream-based data ingestion, distributed messaging, and adaptive load balancing to replicate data from any source dependably and on time, without interrupting the data flow. Its design features parallel data processing and fault-tolerant components capable of handling large-scale analytical workloads, so that data scale does not become a limiting factor. A comparison of different replication frameworks on real-time datasets showed that this method significantly speeds up data replication, improves system fault tolerance, and increases throughput. The results demonstrate that the proposed architecture scales readily across cloud and hybrid environments while maintaining consistency even under network fluctuations and excessive data inflow rates. Beyond performance, the system's flexibility supports future integration with AI-driven analytics platforms and edge computing infrastructures. This article not only provides a viable framework for robust real-time data pipelines but also opens the way for distributed data ecosystems capable of supporting instant analytics at scale.
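
The abstract does not give implementation details, so the following is only an illustrative sketch of two of the ideas it names: distributed messaging with key-based partitioning, and fault-tolerant replication that stays consistent when records are redelivered. It is not the author's implementation. The names `PartitionedLog`, `Replica`, `partition_for`, and `replicate` are hypothetical, the log is a single-process in-memory stand-in for a real message broker, and adaptive load balancing is not modeled.

```python
import zlib

NUM_PARTITIONS = 4  # assumption: a small fixed partition count for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash so the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedLog:
    """Hypothetical in-memory stand-in for a partitioned, append-only message log."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value):
        """Append a record and return its (partition, offset)."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, from_offset: int):
        """Return (offset, key, value) tuples starting at from_offset."""
        records = self.partitions[partition]
        return [(off, k, v) for off, (k, v) in enumerate(records) if off >= from_offset]


class Replica:
    """Target store that tracks the next expected offset per partition."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.state = {}
        self.next_offset = [0] * num_partitions

    def apply(self, partition: int, offset: int, key: str, value) -> bool:
        """Apply a record once; a redelivered (already seen) offset is ignored.

        Assumes records within a partition arrive in offset order.
        """
        if offset < self.next_offset[partition]:
            return False  # duplicate delivery, state is left unchanged
        self.state[key] = value
        self.next_offset[partition] = offset + 1
        return True


def replicate(log: PartitionedLog, replica: Replica) -> int:
    """Copy every not-yet-applied record to the replica; return how many were applied."""
    applied = 0
    for p in range(len(log.partitions)):
        for off, key, value in log.read(p, replica.next_offset[p]):
            if replica.apply(p, off, key, value):
                applied += 1
    return applied


if __name__ == "__main__":
    log = PartitionedLog()
    for key, value in [("user-1", "created"), ("user-2", "created"), ("user-1", "updated")]:
        log.append(key, value)
    replica = Replica()
    print("applied:", replicate(log, replica))
    print("state:", replica.state)
```

Because each partition is read independently, the loop in `replicate` could be split across parallel workers, one per partition, while ordering per key is preserved. Tracking offsets on the replica side is one common way to make at-least-once delivery safe to retry after a failure.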

Published

2024-01-20

Section

Articles

How to Cite

[1]
S. S. Allenki, “Building Scalable Data Replication Pipelines for Real-Time Analytics”, AIJCST, vol. 6, no. 1, pp. 71–81, Jan. 2024, doi: 10.63282/3117-5481/AIJCST-V6I1P108.
