Predictive Performance Modeling for Multi-Core and Many-Core Computing Architectures Using AI-Driven Analytics

Authors

  • Prof. Meera Bharadwaj, Department of Artificial Intelligence, Hyderabad Technological University, Hyderabad, India.

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V4I4P101

Keywords:

Predictive Performance Modeling, Multi-Core Computing, Many-Core Architectures, AI-Driven Analytics, Machine Learning, Performance Prediction, Parallel Processing, High-Performance Computing (HPC)

Abstract

Modern multi-core and many-core processors expose massive parallelism, deep cache hierarchies, and non-uniform memory effects that make manual performance prediction increasingly fragile. This paper proposes an AI-driven analytics framework for predictive performance modeling that unifies workload characterization, hardware telemetry ingestion, and hybrid learning. We construct multi-scale feature views from hardware performance counters, memory/IO traces, and compiler IR to capture compute intensity, synchronization pressure, NUMA locality, and cache/branch behavior. A two-stage learner couples fast analytical baselines (e.g., roofline and queueing approximations) with machine-learned residuals, in which gradient boosting and graph neural networks model interactions among code, data, and topology across CPU/GPU tiles. To generalize across architectures, we employ transfer learning and meta-features describing microarchitectural knobs (core count, cache geometry, interconnect, DVFS states) and use domain adaptation to reduce drift when porting workloads. Online updates with lightweight Bayesian last-layer adaptation provide calibrated uncertainty for "what-if" queries such as core pinning, memory placement, and DVFS policies, while SHAP-style attributions expose bottlenecks for developer feedback and autotuners. The framework outputs latency/throughput predictions, energy-performance trade-offs, and scaling curves under contention, enabling proactive scheduling and configuration search in CI and runtime orchestration. Experiments span stencil codes, graph analytics, and ML inference pipelines, demonstrating robust accuracy under workload and platform shifts. By blending interpretable theory with data-centric learning, the approach delivers portable, explainable performance predictions suitable for cloud, edge, and HPC settings.
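
To make the two-stage learner concrete, the sketch below pairs a roofline baseline [1] (attainable throughput = min(peak compute, arithmetic intensity × peak bandwidth)) with a gradient-boosted residual model. It is a minimal illustration under stated assumptions, not the paper's implementation: the peak rates, feature layout, and synthetic training data are hypothetical placeholders.

```python
# Sketch: stage 1 analytical roofline baseline + stage 2 learned residual.
# PEAK_FLOPS / PEAK_BW and all data below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

PEAK_FLOPS = 2.0e12   # assumed peak compute throughput (FLOP/s)
PEAK_BW = 1.0e11      # assumed peak memory bandwidth (bytes/s)

def roofline(ai):
    """Attainable FLOP/s: min(peak compute, arithmetic intensity * peak BW)."""
    return np.minimum(PEAK_FLOPS, ai * PEAK_BW)

# X: counter-derived features (cache miss rates, branch mispredicts, ...);
# ai: arithmetic intensity (FLOPs/byte); y: measured throughput (FLOP/s).
rng = np.random.default_rng(0)
X = rng.random((256, 8))
ai = rng.uniform(0.1, 10.0, size=256)
y = roofline(ai) * rng.uniform(0.4, 0.9, size=256)   # synthetic targets

residual_model = GradientBoostingRegressor().fit(X, y - roofline(ai))

def predict(X_new, ai_new):
    """Hybrid prediction: analytical baseline plus learned residual."""
    return roofline(ai_new) + residual_model.predict(X_new)
```

Keeping the baseline analytical preserves interpretability; the residual learner only has to capture what the roofline misses, such as synchronization pressure, NUMA locality, and contention.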
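The lightweight Bayesian last-layer adaptation can be sketched as conjugate Bayesian linear regression over a frozen feature map, which gives closed-form online updates and a predictive variance for calibrated "what-if" answers. The prior precision, noise variance, and shapes below are assumed values for illustration.

```python
# Sketch: Bayesian last layer over frozen features Phi, for calibrated
# uncertainty. alpha (prior precision) and noise_var are assumed values.
import numpy as np

def fit_last_layer(Phi, y, alpha=1.0, noise_var=0.1):
    """Posterior N(mu, Sigma) over last-layer weights (conjugate update)."""
    d = Phi.shape[1]
    Sigma = np.linalg.inv(alpha * np.eye(d) + Phi.T @ Phi / noise_var)
    mu = Sigma @ Phi.T @ y / noise_var
    return mu, Sigma

def predict_mean_var(phi, mu, Sigma, noise_var=0.1):
    """Predictive mean and variance for one feature vector phi."""
    return phi @ mu, noise_var + phi @ Sigma @ phi

# Online update on a handful of fresh measurements, then an uncertain query.
rng = np.random.default_rng(1)
Phi = rng.random((32, 4))
y = Phi @ np.array([1.0, -0.5, 2.0, 0.3]) + 0.1 * rng.standard_normal(32)
mu, Sigma = fit_last_layer(Phi, y)
mean, var = predict_mean_var(Phi[0], mu, Sigma)
```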
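For the SHAP-style attributions, one plausible route (an assumption here, not a detail confirmed by the abstract) is the shap package's tree explainer [18] applied to the boosted residual model from the first sketch, so per-feature contributions point at bottleneck counters.

```python
# Sketch: per-feature attributions on the residual model via the `shap`
# package (assumed available); reuses residual_model and X from above.
import shap

explainer = shap.TreeExplainer(residual_model)
shap_values = explainer.shap_values(X[:10])
# shap_values[i, j] is feature j's contribution to sample i's predicted
# residual, e.g. flagging cache-miss features as the dominant bottleneck.
```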

References

[1] Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: An Insightful Visual Performance Model for Multicore Architectures. Communications of the ACM. https://crd.lbl.gov/assets/pubs_presos/roofline/roofline-CACM.pdf

[2] Stengel, H., Treibig, J., Hager, G., & Wellein, G. (2015). Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model. Proceedings of the 29th Int’l Conference on Supercomputing (ICS). https://arxiv.org/abs/1511.02136

[3] Culler, D., Karp, R., Patterson, D., Sahay, A., Schauser, K., Santos, E., Subramonian, R., & von Eicken, T. (1993). LogP: Towards a Realistic Model of Parallel Computation. ACM SIGPLAN Notices. https://dl.acm.org/doi/10.1145/165854.165874

[4] Amdahl, G. M. (1967). Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities. AFIPS Conference Proceedings. https://dl.acm.org/doi/10.1145/1465482.1465560

[5] Gustafson, J. L. (1988). Reevaluating Amdahl’s Law. Communications of the ACM. https://dl.acm.org/doi/10.1145/42411.42415

[6] Bienia, C., Kumar, S., Singh, J. P., & Li, K. (2008). The PARSEC Benchmark Suite: Characterization and Architectural Implications. PACT. https://parsec.cs.princeton.edu/parsec3-doc.htm

[7] SPEC. (2017). SPEC CPU2017 Benchmark Suite. Standard Performance Evaluation Corporation. https://www.spec.org/cpu2017/

[8] Bailey, D. H., Barszcz, E., Barton, J. T., et al. (1991). The NAS Parallel Benchmarks. International Journal of Supercomputer Applications. https://www.nas.nasa.gov/publications/npb.html

[9] Beamer, S., Asanović, K., & Patterson, D. (2015). The GAP Benchmark Suite. arXiv preprint. https://arxiv.org/abs/1508.03619

[10] McCalpin, J. D. (1995). STREAM: Sustainable Memory Bandwidth in High Performance Computers. Technical Report. https://www.cs.virginia.edu/stream/

[11] Mucci, P., Browne, S., Deane, C., & Ho, G. (1999). PAPI: A Portable Interface to Hardware Performance Counters. ICPP Workshops. https://icl.utk.edu/papi/

[12] Weaver, V. (2013). Linux perf_event Features and Overhead. FastPath ’13. https://perf.wiki.kernel.org/index.php/Main_Page

[13] David, H., Gorbatov, E., Hanebutte, U. R., Khanna, R., & Le, C. (2010). RAPL: Memory Power Estimation and Capping. ISLPED. https://www.intel.com/content/www/us/en/developer/articles/technical/rapl-power-model.html

[14] Chen, T., Moreau, T., Jiang, Z., et al. (2018). TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. OSDI. https://arxiv.org/abs/1802.04799

[15] Chen, T., Zheng, L., Yan, E., et al. (2018). Learning to Optimize Tensor Programs (AutoTVM). NeurIPS. https://arxiv.org/abs/1805.08166

[16] Zheng, L., Ye, S., Zhang, C., et al. (2020). Ansor: Generating High-Performance Tensor Programs for Deep Learning. OSDI. https://arxiv.org/abs/2006.06762

[17] Mendis, C., Renda, A., Amarasinghe, S., & Carbin, M. (2019). Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation Using Deep Neural Networks. ICML. https://arxiv.org/abs/1808.07412

[18] Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS (SHAP). https://arxiv.org/abs/1705.07874

[19] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. NeurIPS. https://papers.nips.cc/paper_files/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html

[20] Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press. https://gaussianprocess.org/gpml/chapters/

[21] Mao, H., Alizadeh, M., Menache, I., & Kandula, S. (2016). Resource Management with Deep Reinforcement Learning (DeepRM). HotNets. https://dl.acm.org/doi/10.1145/3005745.3005750

[22] Curtsinger, C., & Berger, E. D. (2013). STABILIZER: Statistically Sound Performance Evaluation. ASPLOS. https://dl.acm.org/doi/10.1145/2451116.2451141

[23] Viebke, A., Pllana, S., Memeti, S., & Kolodziej, J. (2019). Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures. arXiv.

[24] Zhang, P., Fang, J., Yang, C., Huang, C., Tang, T., & Wang, Z. (2020). Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach. arXiv.

[25] Hofmann, J., Alappat, C. L., Hager, G., Fey, D., & Wellein, G. (2019). Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors. arXiv.

[26] Butko, A., Bruguier, F., Novo, D., Gamatié, A., & Sassatelli, G. (2019). Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures. arXiv.

[27] Arndt, O. J., Lüders, M., Riggers, C., et al. (2020). Multicore Performance Prediction with MPET: Using Scalability Characteristics for Statistical Cross-Architecture Prediction. Journal of Signal Processing Systems, 92, 981-998.

[28] Sharma, V. K. (2019). Designing LTE-Based Network Infrastructure for Healthcare IoT Application. IJAIDR, 10(2), July-December 2019. DOI: 10.71097/IJAIDR.v10.i2.1540

[29] Chittoor, K. C. (2021). Architecting Scalable AI Systems for Predictive Patient Risk. International Journal of Current Science, 11(2), 86-94. https://rjpn.org/ijcspub/papers/IJCSP21B1012.pdf

[30] Thallam, N. S. T. (2020). Comparative Analysis of Data Warehousing Solutions: AWS Redshift vs. Snowflake vs. Google BigQuery. European Journal of Advances in Engineering and Technology, 7(12), 133-141.

[31] Sharma, V. K. (2020). The Role of Zero-Emission Telecom Infrastructure in Sustainable Network Modernization. IJFMR, 2(5), September-October 2020. https://doi.org/10.36948/ijfmr.2020.v02i05.54991

[32] Thallam, N. S. T. (2021). Performance Optimization in Big Data Pipelines: Tuning EMR, Redshift, and Glue for Maximum Efficiency.

Published

2022-07-04

Issue

Vol. 4 No. 4 (2022)

Section

Articles

How to Cite

[1] M. Bharadwaj, "Predictive Performance Modeling for Multi-Core and Many-Core Computing Architectures Using AI-Driven Analytics", AIJCST, vol. 4, no. 4, pp. 1–13, Jul. 2022, doi: 10.63282/3117-5481/AIJCST-V4I4P101.
