A Deep Reinforcement Learning Approach for Efficient Cloud Service Orchestration and Resource Allocation

Authors

  • Fatima Al-Khatib, King Saud University, Riyadh, Saudi Arabia.

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V3I1P101

Keywords:

Deep Reinforcement Learning, Cloud Orchestration, Resource Allocation, Service-Level Objectives (SLOs), Multi-Objective Optimization, Kubernetes, Autoscaling, Energy Efficiency, Hierarchical Control, Edge–Cloud Systems

Abstract

We present a deep reinforcement learning (DRL) framework for cloud service orchestration and resource allocation that jointly optimizes performance, cost, and energy under service-level objectives (SLOs). The proposed system models orchestration as a sequential decision process over a heterogeneous cluster (VMs/containers, CPU–GPU accelerators, and autoscaling groups). A hierarchical agent first selects placement and scaling actions at the service level, while a fine-grained scheduler refines CPU/GPU quotas and preemption policies at the pod level. To handle non-stationary demand and burstiness, we combine model-free DRL (e.g., PPO or DDPG) with short-horizon model-based lookahead using a learned latency–throughput surrogate. Multi-objective rewards balance tail latency, cost, and energy per request with dynamic weights driven by SLO slack. The framework incorporates safe exploration via constraint shielding and offline warm-start from historical traces to limit SLO violations during learning. We integrate with Kubernetes via lightweight sidecars for online telemetry (queueing, contention, and thermal signals) and action application. Trace-driven experiments using realistic microservice workloads demonstrate improved SLO attainment, higher cluster utilization, and reduced energy per request compared to heuristic baselines (rule-based autoscaling and bin packing). Ablations show the value of hierarchical control, surrogate lookahead, and safety layers for stability under traffic regime shifts. The results indicate that DRL can serve as a practical control plane for elastic, cost-aware, and sustainable cloud operations across hybrid and edge–cloud deployments.
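
To make the reward and safety design concrete, the following is a minimal, hypothetical Python sketch of two ideas from the abstract: a multi-objective reward whose weights shift with SLO slack, and a simple constraint shield that projects scaling actions into a feasible range. All names, constants, and the specific weighting scheme are illustrative assumptions, not the paper's implementation.

    # Illustrative sketch only; names, constants, and weighting are assumptions.
    from dataclasses import dataclass

    @dataclass
    class Telemetry:
        p99_latency_ms: float   # observed tail latency
        cost_per_req: float     # normalized cost per request
        energy_per_req: float   # normalized energy per request

    def reward(t: Telemetry, slo_ms: float = 200.0) -> float:
        """Higher is better; the latency weight grows as SLO slack shrinks."""
        # Slack in [0, 1]: 1 = far below the SLO, 0 = at or over it.
        slack = max(0.0, min(1.0, (slo_ms - t.p99_latency_ms) / slo_ms))
        w_lat = 1.0 - slack      # prioritize latency when slack is small
        w_cost = 0.5 * slack     # pursue cost/energy only when the SLO is safe
        w_energy = 0.5 * slack
        return -(w_lat * (t.p99_latency_ms / slo_ms)
                 + w_cost * t.cost_per_req
                 + w_energy * t.energy_per_req)

    def shield(raw_replicas: int, min_replicas: int = 1, max_replicas: int = 64) -> int:
        # Constraint shielding in its simplest form: project the agent's raw
        # scaling action back into the feasible range before applying it.
        return max(min_replicas, min(max_replicas, raw_replicas))

Because the weights vary continuously with slack rather than switching abruptly between objectives, a formulation like this avoids the sudden reward regime changes that tend to destabilize policy-gradient training.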

References

[1] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint. https://arxiv.org/abs/1707.06347

[2] Mnih, V., et al. (2016). Asynchronous Methods for Deep Reinforcement Learning. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1602.01783

[3] Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust Region Policy Optimization. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1502.05477

[4] Haarnoja, T., et al. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1801.01290

[5] Mnih, V., et al. (2015). Human-Level Control through Deep Reinforcement Learning. Nature, 518(7540), 529–533. https://www.nature.com/articles/nature14236

[6] Lillicrap, T. P., et al. (2016). Continuous Control with Deep Reinforcement Learning. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1509.02971

[7] Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., & Stoica, I. (2011). Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. USENIX NSDI. https://www.usenix.org/conference/nsdi11/dominant-resource-fairness

[8] Mao, H., Alizadeh, M., Menache, I., & Kandula, S. (2016). Resource Management with Deep Reinforcement Learning. HotNets / arXiv. https://arxiv.org/abs/1603.00659

[9] Mirhoseini, A., et al. (2017). Device Placement Optimization with Reinforcement Learning. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1706.04972

[10] Snoek, J., Larochelle, H., & Adams, R. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. NeurIPS. https://arxiv.org/abs/1206.2944

[11] Janner, M., Fu, J., Zhang, M., & Levine, S. (2019). When to Trust Your Model: Model-Based Policy Optimization. NeurIPS. https://arxiv.org/abs/1906.08253

[12] Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-Learning for Offline Reinforcement Learning. NeurIPS. https://arxiv.org/abs/2006.04779

[13] Fujimoto, S., Meger, D., & Precup, D. (2019). Off-Policy Deep Reinforcement Learning without Exploration. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1812.02900

[14] García, J., & Fernández, F. (2015). A Comprehensive Survey on Safe Reinforcement Learning. Journal of Machine Learning Research, 16(1), 1437–1480. https://jmlr.org/papers/v16/garcia15a.html

[15] Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., & Wilkes, J. (2015). Large-Scale Cluster Management at Google with Borg. EuroSys. https://research.google/pubs/pub43438/

[16] Burns, B., Grant, B., Oppenheimer, D., Brewer, E., & Wilkes, J. (2016). Borg, Omega, and Kubernetes. Communications of the ACM. https://dl.acm.org/doi/10.1145/2890784

[17] Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1609.02907

[18] Bai, S., Kolter, J. Z., & Koltun, V. (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv preprint (TCN). https://arxiv.org/abs/1803.01271

[19] Satyanarayanan, M. (2017). The Emergence of Edge Computing. IEEE Computer. https://ieeexplore.ieee.org/document/8014313

[20] Barroso, L. A., & Hölzle, U. (2007). The Case for Energy-Proportional Computing. IEEE Computer. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33375.pdf

[21] Qureshi, A., Weber, R., Balakrishnan, H., Guttag, J., & Maggs, B. (2009). Cutting the Electric Bill for Internet-Scale Systems. SIGCOMM. https://dl.acm.org/doi/10.1145/1592568.1592576

[22] Thallam, N. S. T. (2020). Comparative Analysis of Data Warehousing Solutions: AWS Redshift vs. Snowflake vs. Google BigQuery. European Journal of Advances in Engineering and Technology, 7(12), 133–141.

Published

2021-01-02

How to Cite

[1] F. Al-Khatib, “A Deep Reinforcement Learning Approach for Efficient Cloud Service Orchestration and Resource Allocation”, AIJCST, vol. 3, no. 1, pp. 1–11, Jan. 2021, doi: 10.63282/3117-5481/AIJCST-V3I1P101.
