CLC number: TP303
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2018-10-10
Xiang-hui Xie, Xun Jia. Exploring high-performance processor architecture beyond the exascale[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(10): 1224-1229.
@article{title="Exploring high-performance processor architecture beyond the exascale",
author="Xiang-hui Xie and Xun Jia",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="10",
pages="1224-1229",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1800424"
}
%0 Journal Article
%T Exploring high-performance processor architecture beyond the exascale
%A Xiang-hui Xie
%A Xun Jia
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 10
%P 1224-1229
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1800424
TY - JOUR
T1 - Exploring high-performance processor architecture beyond the exascale
A1 - Xiang-hui Xie
A1 - Xun Jia
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 10
SP - 1224
EP - 1229
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1800424
ER -
Abstract: The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As an integral part of a supercomputing system, high-performance processors and their architecture designs are crucial in improving system performance. In this paper, three architecture design goals for high-performance processors beyond the exascale are introduced: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. Then a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa) is proposed, which aims to achieve these three goals by employing distributed computational resources and application-customized hardware. Finally, some future research directions regarding the Massa architecture are discussed.