Journal of Zhejiang University

ENGINEERING Information Technology & Electronic Engineering 2026 Vol.27 No.4 P.1-16

http://doi.org/10.1631/ENG.ITEE.2025.0008

WSC optimizer: an optimization tool for wafer-scale chip architecture exploration

Author(s): Wenbo ZHANG, Bo DING, Shuai WEI, Qinrang LIU, Hong YU, Ke SONG, Wei GUO, Bo MEI, Rui ZHENG
Affiliation(s): 1. Information Engineering University, Zhengzhou 450001, China
Corresponding email(s): weis0906@163.com
Key Words: Wafer-scale chip, Hardware–software co-design, Chip layout, Design space exploration


Wenbo ZHANG, Bo DING, Shuai WEI, Qinrang LIU, Hong YU, Ke SONG, Wei GUO, Bo MEI, Rui ZHENG. WSC optimizer: an optimization tool for wafer-scale chip architecture exploration[J]. Journal of Zhejiang University Science C, 2026, 27(4): 1-16.

@article{title="WSC optimizer: an optimization tool for wafer-scale chip architecture exploration",
author="Wenbo ZHANG, Bo DING, Shuai WEI, Qinrang LIU, Hong YU, Ke SONG, Wei GUO, Bo MEI, Rui ZHENG",
journal="Journal of Zhejiang University Science C",
volume="27",
number="4",
pages="1-16",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/ENG.ITEE.2025.0008"
}

%0 Journal Article
%T WSC optimizer: an optimization tool for wafer-scale chip architecture exploration
%A Wenbo ZHANG
%A Bo DING
%A Shuai WEI
%A Qinrang LIU
%A Hong YU
%A Ke SONG
%A Wei GUO
%A Bo MEI
%A Rui ZHENG
%J Frontiers of Information Technology & Electronic Engineering
%V 27
%N 4
%P 1-16
%@ 1869-1951
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/ENG.ITEE.2025.0008

TY - JOUR
T1 - WSC optimizer: an optimization tool for wafer-scale chip architecture exploration
A1 - Wenbo ZHANG
A1 - Bo DING
A1 - Shuai WEI
A1 - Qinrang LIU
A1 - Hong YU
A1 - Ke SONG
A1 - Wei GUO
A1 - Bo MEI
A1 - Rui ZHENG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 27
IS - 4
SP - 1
EP - 16
SN - 1869-1951
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/ENG.ITEE.2025.0008
ER -


Abstract: In recent years, mature advanced packaging technologies have increasingly enabled the integration of multiple small dies into larger chips while retaining chip-scale density and high-bandwidth interconnects. To address the inefficiency of manual design and the difficulty of heterogeneous optimization in wafer-scale chip (WSC) development, we systematically explore the key factors in WSC architecture design. We integrate chip layout, operator mapping, and hardware–software co-design, and formulate the WSC architecture exploration problem as a multi-objective optimization task. First, we establish a hierarchical architecture model for WSCs that unifies the quantification of core constraints and interconnect topology constraints. Second, we propose a hierarchical multi-objective collaborative optimization framework that jointly optimizes physical constraints and task-mapping communication patterns. Finally, we develop a WSC optimizer toolchain that supports mixed-granularity simulation and generates optimal configurations for representative workloads. Experimental results demonstrate that, compared with traditional computer architectures, the optimized architectures generated by our WSC optimizer achieve up to a 22× throughput improvement and a 5× latency reduction in application domains such as cryptographic decryption and signal processing.
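To make the multi-objective formulation concrete, the following is a minimal sketch of Pareto filtering over candidate configurations, one common way to compare designs when throughput and latency are optimized together. It is not the authors' framework: the `Candidate` class, the configuration names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical WSC configuration with two illustrative objectives."""
    name: str
    throughput: float  # higher is better (arbitrary units)
    latency: float     # lower is better (arbitrary units)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.throughput >= b.throughput and a.latency <= b.latency
    strictly_better = a.throughput > b.throughput or a.latency < b.latency
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values for illustration only; these are not results from the paper.
candidates = [
    Candidate("mesh_4x4", throughput=10.0, latency=5.0),
    Candidate("mesh_8x8", throughput=18.0, latency=7.0),
    Candidate("torus_4x4", throughput=9.0, latency=6.0),
    Candidate("torus_8x8", throughput=18.0, latency=6.5),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy data set, `mesh_8x8` is dominated by `torus_8x8` (same throughput, lower latency) and `torus_4x4` is dominated by `mesh_4x4`, so only the two remaining configurations survive as trade-off points. A real exploration flow would add further objectives and physical constraints to the comparison.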

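WSC optimizer: an optimization tool for wafer-scale chip architecture exploration

Wenbo ZHANG¹, Bo DING², Shuai WEI¹, Qinrang LIU³, Hong YU¹, Ke SONG¹, Wei GUO¹, Bo MEI¹, Rui ZHENG¹
¹Information Engineering University, Zhengzhou 450001, China
²Songshan Laboratory, Zhengzhou 450002, China
³Institute of Big Data, Fudan University, Shanghai 200433, China
Abstract: In recent years, advanced packaging technologies have integrated multiple small dies into larger chips while retaining chip-level integration density and high-bandwidth interconnects. To address the low efficiency of manual design and the difficulty of heterogeneous optimization in wafer-scale chip (WSC) development, this paper systematically investigates the key factors that influence WSC architecture design. By combining chip layout, operator mapping, and hardware–software co-design, the WSC architecture exploration problem is modeled as a multi-objective optimization task. First, a hierarchical WSC architecture model is built that unifies the quantification of core resource constraints and interconnect topology constraints. Second, a hierarchical multi-objective collaborative optimization framework is proposed to jointly optimize physical constraints and task-mapping communication patterns. Finally, a WSC optimizer toolchain supporting mixed-granularity simulation is developed, which can generate optimal configurations for typical workloads. Experimental results show that, compared with traditional computer architectures, the optimized architectures generated by the tool achieve up to a 22× throughput improvement and a 5× latency reduction in scenarios such as cryptographic decryption and signal processing.

Key words: Wafer-scale chip; Hardware–software co-design; Chip layout; Design space exploration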


