Federated Reinforcement Learning for Decentralized Optimization in Distributed Computing Systems
DOI: https://doi.org/10.63282/3117-5481/AIJCST-V5I2P101

Keywords: Federated Reinforcement Learning, Decentralized Optimization, Edge–Cloud Orchestration, Actor–Critic, Constrained MDP, Asynchronous Aggregation, Communication Efficiency, Secure Aggregation, SLO Compliance, Multi-Objective Control

Abstract
Federated Reinforcement Learning (FRL) offers a privacy-preserving paradigm for optimizing distributed computing systems in which data and control signals are fragmented across heterogeneous edge, fog, and cloud tiers. This paper proposes a decentralized FRL framework that coordinates multiple local agents, each training actor–critic policies on non-IID telemetry, while a lightweight server (or peer mesh) aggregates model updates using secure, communication-efficient protocols. The framework integrates (i) hierarchical task decomposition for multi-objective control (latency, cost, energy, SLO compliance), (ii) adaptive client selection guided by reward variance and system drift, and (iii) compression-aware secure aggregation that reduces bandwidth without degrading convergence. We formalize the objective as a constrained Markov decision process with safety shields that enforce resource and SLO constraints during exploration. To mitigate staleness in volatile clusters, we adopt asynchronous aggregation with importance weighting and provide a convergence sketch under bounded delay and partial participation. Across representative workloads (autoscaling and bin-packing for microservices, DAG scheduling for data pipelines, and cooperative caching), FRL yields faster convergence than centralized RL and more stable policies than heuristic baselines under workload bursts, adversarial noise, and device churn. The decentralized design preserves data locality, reduces control-plane bottlenecks, and enables cross-domain collaboration without sharing raw traces. We discuss deployment patterns on Kubernetes/Service Mesh and outline extensions to robust FRL with Byzantine resilience and differential privacy for production-grade multi-tenant environments.
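The asynchronous aggregation with importance weighting mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`staleness_weight`, `aggregate_async`), the exponential decay factor, and the flat-list model representation are illustrative assumptions. The idea shown is that each client's model delta is down-weighted by how many rounds stale it is, which is consistent with the bounded-delay assumption in the convergence sketch.

```python
# Hypothetical sketch: asynchronous federated aggregation where stale
# client updates are down-weighted before being blended into the global
# model. All names and the decay rule are illustrative assumptions.
from typing import List


def staleness_weight(delay: int, alpha: float = 0.6) -> float:
    """Exponentially down-weight an update that is `delay` rounds stale.

    Under a bounded-delay assumption (delay <= D_max), weights remain
    bounded away from zero, which is what a convergence argument with
    partial participation typically relies on.
    """
    return alpha ** delay


def aggregate_async(global_model: List[float],
                    client_deltas: List[List[float]],
                    delays: List[int],
                    lr: float = 1.0) -> List[float]:
    """Apply a staleness-weighted convex combination of client deltas.

    Each client sends a delta (its local model minus the global snapshot
    it started from); the server normalizes the staleness weights and
    adds the weighted deltas to the current global model.
    """
    weights = [staleness_weight(d) for d in delays]
    total = sum(weights)
    new_model = list(global_model)
    for delta, w in zip(client_deltas, weights):
        for i, du in enumerate(delta):
            new_model[i] += lr * (w / total) * du
    return new_model
```

With one fresh update (+1 per coordinate, delay 0) and one stale, opposing update (-1 per coordinate, delay 3), the fresh client dominates and the aggregated model moves in its direction, which is the intended effect of importance weighting under churn.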
