
CLC number: TP336
On-line Access: 2026-04-24
Received: 2025-08-30
Revision Accepted: 2026-04-24
Crosschecked: 2025-12-05
Wenbo ZHANG, Bo DING, Shuai WEI, Qinrang LIU, Hong YU, Ke SONG, Wei GUO, Bo MEI, Rui ZHENG. WSC optimizer: an optimization tool for wafer-scale chip architecture exploration[J]. Frontiers of Information Technology & Electronic Engineering, 2026, 27(4): 1-16.
@article{title="WSC optimizer: an optimization tool for wafer-scale chip architecture exploration",
author="Wenbo ZHANG, Bo DING, Shuai WEI, Qinrang LIU, Hong YU, Ke SONG, Wei GUO, Bo MEI, Rui ZHENG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="27",
number="4",
pages="1-16",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/ENG.ITEE.2025.0008"
}
%0 Journal Article
%T WSC optimizer: an optimization tool for wafer-scale chip architecture exploration
%A Wenbo ZHANG
%A Bo DING
%A Shuai WEI
%A Qinrang LIU
%A Hong YU
%A Ke SONG
%A Wei GUO
%A Bo MEI
%A Rui ZHENG
%J Frontiers of Information Technology & Electronic Engineering
%V 27
%N 4
%P 1-16
%@ 1869-1951
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/ENG.ITEE.2025.0008
TY - JOUR
T1 - WSC optimizer: an optimization tool for wafer-scale chip architecture exploration
A1 - Wenbo ZHANG
A1 - Bo DING
A1 - Shuai WEI
A1 - Qinrang LIU
A1 - Hong YU
A1 - Ke SONG
A1 - Wei GUO
A1 - Bo MEI
A1 - Rui ZHENG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 27
IS - 4
SP - 1
EP - 16
SN - 1869-1951
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/ENG.ITEE.2025.0008
ER -
Abstract: In recent years, mature advanced packaging technologies have increasingly enabled the integration of multiple small dies into larger chips while retaining chip-scale density and high-bandwidth interconnects. To address the inefficiencies of manual design and the challenges of heterogeneous optimization in wafer-scale chip (WSC) development, we systematically explore key factors in WSC architecture design. We integrate chip layout, operator mapping, and hardware–software co-design, and formulate the WSC architecture exploration problem as a multi-objective optimization task. First, we establish a hierarchical architecture model for WSCs that unifies the quantification of core constraints and interconnect topology constraints. Second, we propose a hierarchical multi-objective collaborative optimization framework that jointly optimizes physical constraints and task-mapping communication patterns. Finally, we develop a WSC optimizer toolchain that supports mixed-granularity simulation and generates optimal configurations for representative workloads. Experimental results demonstrate that, compared with traditional computer architectures, the optimized architectures generated by our WSC optimizer achieve up to a 22× throughput improvement and a 5× latency reduction in application domains such as cryptographic decryption and signal processing.