CLC number: TP391.9
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-01-04
Cited: 0
Clicked: 2603
Juan FANG, Sheng LIN, Huijing YANG, Yixiang XU, Xing SU. A perceptual and predictive batch-processing memory scheduling strategy for a CPU-GPU heterogeneous system[J]. Frontiers of Information Technology & Electronic Engineering,in press.https://doi.org/10.1631/FITEE.2200449 @article{title="A perceptual and predictive batch-processing memory scheduling strategy for a CPU-GPU heterogeneous system", %0 Journal Article TY - JOUR
CPU-GPU异构系统感知和预测的批处理内存调度策略北京工业大学信息学部,中国北京市,100124 摘要:当多个处理器(CPU)核心和集成图形处理器(GPU)共享片外主存时,CPU和GPU应用程序会竞争关键内存资源,导致严重的资源竞争,并对系统整体性能产生负面影响。本文描述了CPU-GPU异构多核架构下共享内存资源的竞争情况,提出一种基于感知和预测的批处理共享内存请求调度策略。该策略通过感知请求缓冲区中CPU和GPU内存请求情况,估计GPU延迟容忍度,并通过批量处理CPU或GPU内存请求减少CPU和GPU之间的相互干扰。实验结果表明,CPU性能提升8.53%,相互干扰降低10.38%,该调度策略具有较低硬件复杂度。 关键词组: Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article
Reference[1]Ausavarungnirun R, Chang KKW, Subramanian L, et al., 2012. Staged memory scheduling: achieving high performance and scalability in heterogeneous systems. Proc 39th Annual Int Symp on Computer Architecture, p.416-427. ![]() [2]Binkert N, Beckmann B, Black G, et al., 2011. The gem5 simulator. ACM SIGARCH Comput Archit News, 39(2):1-7. ![]() [3]Bitalebi H, Safaei F, 2023. Criticality-aware priority to accelerate GPU memory access. J Supercomput, 79(1):188-213. ![]() [4]Bouvier D, Cohen B, Fry W, et al., 2014. Kabini: an AMD accelerated processing unit system on a chip. IEEE Micro, 34(2):22-33. ![]() [5]Chen W, Ray S, Bhadra J, et al., 2017. Challenges and trends in modern SoC design verification. IEEE Des Test, 34(5):7-22. ![]() [6]di Sanzo P, Pellegrini A, Sannicandro M, et al., 2020. Adaptive model-based scheduling in software transactional memory. IEEE Trans Comput, 69(5):621-632. ![]() [7]Fang J, Yu L, Liu ST, et al., 2015. KL_GA: an application mapping algorithm for mesh-of-tree (MoT) architecture in network-on-chip design. J Supercomput, 71(11):4056-4071. ![]() [8]Fang J, Wang MX, Wei ZL, 2020. A memory scheduling strategy for eliminating memory access interference in heterogeneous system. J Supercomput, 76(4):3129-3154. ![]() [9]Hazarika A, Poddar S, Rahaman H, 2020. Survey on memory management techniques in heterogeneous computing systems. IET Comput Dig Tech, 14(2):47-60. ![]() [10]Jamieson C, Chandrashekar A, 2022. gem5 GPU accuracy profiler (GAP). Proc 4th gem5 Users Workshop, p.44. ![]() [11]Jeong MK, Erez M, Sudanthi C, et al., 2012. A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC. Proc Design Automation Conf, p.850-855. ![]() [12]Jog A, Kayiran O, Nachiappan NC, et al., 2013. OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance. ACM SIGPLAN Not, 48(4):395-406. ![]() [13]Jog A, Kayiran O, Pattnaik A, et al., 2016. Exploiting core criticality for enhanced GPU performance. Proc ACM SIGMETRICS Int Conf on Measurement and Modeling of Computer Science, p.351-363. ![]() [14]Kim Y, Han D, Mutlu O, et al., 2010. ATLAS: a scalable and high-performance scheduling algorithm for multiple memory controllers. Proc 16th Int Symp on High-Performance Computer Architecture, p.1-12. ![]() [15]Lin CH, Liu JC, Yang PK, 2020. Performance enhancement of GPU parallel computing using memory allocation optimization. Proc 14th Int Conf on Ubiquitous Information Management and Communication, p.1-5. ![]() [16]Mittal S, Vetter JS, 2015. A survey of CPU-GPU heterogeneous computing techniques. ACM Comput Surv, 47(4):69. ![]() [17]Mutlu O, Moscibroda T, 2008. Parallelism-aware batch scheduling: enhancing both performance and fairness of shared DRAM systems. Proc Int Symp on Computer Architecture, p.63-74. ![]() [18]Power J, Basu A, Gu JL, et al., 2013. Heterogeneous system coherence for integrated CPU-GPU systems. Proc 46th Annual IEEE/ACM Int Symp on Microarchitecture, p.457-467. ![]() [19]Rai S, Chaudhuri M, 2017. Using criticality of GPU accesses in memory management for CPU-GPU heterogeneous multi-core processors. ACM Trans Embed Comput Syst, 16(5s):133. ![]() [20]Subramanian L, Lee D, Seshadri V, et al., 2015. The blacklisting memory scheduler: balancing performance, fairness and complexity. https://arxiv.org/abs/1504.00390v1 ![]() [21]Usui H, Subramanian L, Chang KKW, et al., 2016. DASH: deadline-aware high-performance memory scheduler for heterogeneous systems with hardware accelerators. ACM Trans Archit Code Optim, 12(4):65. ![]() [22]Wang HN, Jog A, 2019. Exploiting latency and error tolerance of GPGPU applications for an energy-efficient DRAM. Proc 49th Annual IEEE/IFIP Int Conf on Dependable Systems and Networks, p.362-374. ![]() [23]Wang QH, Peng Z, Ren B, et al., 2022. MemHC: an optimized GPU memory management framework for accelerating many-body correlation. ACM Trans Archit Code Optim, 19(2):24. ![]() [24]Zhan XS, Bao YG, Bienia C, et al., 2016. PARSEC3.0: a multicore benchmark suite with network stacks and SPLASH-2X. ACM SIGARCH Comput Archit News, 44(5):1-16. ![]() [25]Zhang F, Zhai JD, He BS, et al., 2017. Understanding co-running behaviors on integrated CPU/GPU architectures. IEEE Trans Parall Distrib Syst, 28(3):905-918. ![]() Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE |
Open peer comments: Debate/Discuss/Question/Opinion
<1>