Teaching the Cloud to Remember Tomorrow: Using Graph-Transformer AI to Pre‑Warm Caches before the Traffic Surge Hits

Satyanarayana Gopisetty

doi:10.63282/3117-5481/AIJCST-V7I3P110

Authors

Satyanarayana Gopisetty, Frisco, Texas, USA.

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V7I3P110

Keywords:

Cloud Caching, Predictive Pre‑Warming, Graph Neural Networks, Temporal Transformers, Auto‑Scaling, Cold‑Start Mitigation, Content Delivery, Elastic Systems, AI for Systems

Abstract

Cloud caching systems today are surprisingly short‑sighted. When traffic spikes, auto‑scaling adds new instances, but those instances start with empty caches, like opening a new store with no inventory. The result is a flood of expensive origin fetches, even if the overall system has enough compute power. This paper asks a simple question: what if a cache could learn what content will be needed seconds before it is actually requested? We propose a hybrid AI framework that combines a graph neural network (GNN), which captures relationships between content objects (e.g., “users who viewed A next requested B”), with a temporal transformer that models access patterns over multiple time scales. Using this architecture, the system predicts a future cache working set (a short list of likely‑to‑be‑requested items) during the unavoidable monitoring lag between traffic detection and new instance boot. Instead of waiting for cache misses, we pre‑warm the new instance with only those predicted objects. Evaluated on real‑world CDN traces, our method reduces cold‑start miss rates by up to 42% and cuts data transfer costs during scale‑out events by nearly a third, compared to reactive caching policies. More importantly, the approach turns monitoring lag from a liability into a window of opportunity. The cloud no longer just reacts; it remembers tomorrow.
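
The pre‑warming idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's model: a co‑access transition count stands in for the GNN, and an exponentially decayed request count stands in for the temporal transformer. All function names, the `half_life` and `alpha` parameters, and the dictionary‑based cache are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_coaccess_graph(sessions):
    """Count directed transitions a -> b between consecutive requests.

    Stand-in for the GNN: it only records "viewed A, next requested B".
    """
    graph = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:
                graph[a][b] += 1
    return graph


def decayed_popularity(requests, now, half_life):
    """Sum request weights per object, halving a weight every half_life seconds.

    Stand-in for the temporal transformer: a single-time-scale recency score.
    """
    scores = Counter()
    for timestamp, obj in requests:
        age = now - timestamp
        scores[obj] += 0.5 ** (age / half_life)
    return scores


def predict_working_set(sessions, requests, now, k, half_life=60.0, alpha=0.5):
    """Return the k object ids with the highest blended score (hypothetical)."""
    graph = build_coaccess_graph(sessions)
    popularity = decayed_popularity(requests, now, half_life)
    scores = Counter(popularity)
    # Push each object's recent popularity one hop along its outgoing edges,
    # weighted by the observed transition frequency.
    for src, neighbours in graph.items():
        total = sum(neighbours.values())
        for dst, count in neighbours.items():
            scores[dst] += alpha * popularity[src] * count / total
    # Sort by score descending; break ties by object id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [obj for obj, _ in ranked[:k]]


def prewarm(cache, fetch_from_origin, working_set):
    """Load predicted objects into a new instance's cache before traffic arrives."""
    for obj in working_set:
        if obj not in cache:
            cache[obj] = fetch_from_origin(obj)
    return cache
```

In this sketch, `predict_working_set` would be called during the monitoring lag, and `prewarm` would run on the new instance with whatever origin fetch function the deployment provides. Limiting the result to `k` objects mirrors the abstract's point that only the predicted items are loaded, not the whole catalogue.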
