Micro-Batch Financial Data Aggregation: Leveraging Throttling for Scalable and Reliable Pipelines
DOI:
https://doi.org/10.63282/3117-5481/AIJCST-V7I6P109
Keywords:
Micro-Batch, Throttling, Backpressure, Financial Data Aggregation, Streaming, Spark Structured Streaming, Kafka, Rate Limiting, Reliability, Scalability
Abstract
Micro-batch processing has emerged as a pragmatic middle ground between monolithic batch ETL and true record-at-a-time streaming, offering predictable throughput, simplified semantics, and easy integration with batch-oriented sinks. In financial systems, where high-volume market feeds, transaction logs, and customer event streams coexist with strict consistency, latency, and compliance requirements, micro-batching combined with intelligent throttling (rate limiting and backpressure strategies) provides an effective way to build scalable, resilient, and cost-efficient aggregation pipelines. This paper reviews the core concepts of micro-batching and throttling, examines architectural patterns and trade-offs relevant to financial data aggregation, and presents design recommendations, operational controls, and evaluation metrics. We also discuss integration with modern streaming platforms and highlight practical techniques (adaptive throttling, prioritized queues, idempotent sinks, and checkpointing) that together deliver reliable, exactly-once or strongly consistent aggregation with predictable resource usage.
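To make the idea of adaptive throttling concrete, the following is a minimal sketch and not the paper's implementation. It assumes an additive-increase, multiplicative-decrease (AIMD) policy driven by the processing time of the previous micro-batch. The names `AdaptiveThrottle` and `take_batch`, and all default values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThrottle:
    """AIMD-style cap on records per micro-batch (hypothetical helper)."""
    target_seconds: float          # processing-time budget per micro-batch
    limit: int = 1000              # current max records per micro-batch
    min_limit: int = 100           # never throttle below this
    max_limit: int = 50_000        # never grow beyond this
    increase_step: int = 500       # additive increase when there is headroom
    decrease_factor: float = 0.5   # multiplicative decrease under pressure

    def update(self, processing_seconds: float) -> int:
        """Adjust the cap based on how long the last micro-batch took."""
        if processing_seconds > self.target_seconds:
            # The pipeline is falling behind: back off quickly.
            self.limit = max(self.min_limit,
                             int(self.limit * self.decrease_factor))
        else:
            # The batch finished within budget: probe for more throughput.
            self.limit = min(self.max_limit, self.limit + self.increase_step)
        return self.limit


def take_batch(queue: list, throttle: AdaptiveThrottle) -> list:
    """Remove and return at most `throttle.limit` records from the queue front."""
    batch = queue[:throttle.limit]
    del queue[:throttle.limit]
    return batch
```

In a pipeline loop, one would call `take_batch`, time the aggregation and write, and feed the elapsed time back into `update` before the next trigger. Spark Structured Streaming exposes a comparable but static cap for Kafka sources through the `maxOffsetsPerTrigger` option; the sketch above only illustrates how such a cap could be adjusted between batches.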
