
CLC number: TP393.03
On-line Access: 2026-03-23
Received: 2025-08-29
Revision Accepted: 2026-02-02
Crosschecked: 2026-03-23
https://orcid.org/0009-0000-4973-0563
https://orcid.org/0000-0002-9957-7365
https://orcid.org/0009-0000-6542-9797
https://orcid.org/0009-0008-1608-6597
Shuaikang HOU, Qinrang LIU, Wenbo ZHANG, Ping LV, Peijie LI, Wei GUO. FTHOE: a Hamiltonian-driven fault-tolerant routing algorithm for wafer-scale interconnection networks[J]. Frontiers of Information Technology & Electronic Engineering, 2026, 27(3): 1-20.
@article{title="FTHOE: a Hamiltonian-driven fault-tolerant routing algorithm for wafer-scale interconnection networks",
author="Shuaikang HOU, Qinrang LIU, Wenbo ZHANG, Ping LV, Peijie LI, Wei GUO",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="27",
number="3",
pages="1-20",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/ENG.ITEE.2025.0005"
}
%0 Journal Article
%T FTHOE: a Hamiltonian-driven fault-tolerant routing algorithm for wafer-scale interconnection networks
%A Shuaikang HOU
%A Qinrang LIU
%A Wenbo ZHANG
%A Ping LV
%A Peijie LI
%A Wei GUO
%J Frontiers of Information Technology & Electronic Engineering
%V 27
%N 3
%P 1-20
%@ 1869-1951
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/ENG.ITEE.2025.0005
TY - JOUR
T1 - FTHOE: a Hamiltonian-driven fault-tolerant routing algorithm for wafer-scale interconnection networks
A1 - Shuaikang HOU
A1 - Qinrang LIU
A1 - Wenbo ZHANG
A1 - Ping LV
A1 - Peijie LI
A1 - Wei GUO
JO - Frontiers of Information Technology & Electronic Engineering
VL - 27
IS - 3
SP - 1
EP - 20
SN - 1869-1951
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/ENG.ITEE.2025.0005
ER -
Abstract: As application scenarios continue to grow in complexity, wafer-scale systems impose increasingly stringent requirements on the reliability of interconnection networks. Under inevitable process-induced manufacturing defects and environmental disturbances, node and link faults occur frequently in wafer-scale interconnection networks, making fault tolerance a key factor in improving overall system reliability. To address chiplet node faults and link faults in wafer-scale interconnection networks, this paper proposes a load-balancing, virtual-channel-less fault-tolerant routing algorithm, termed FTHOE. The proposed algorithm is based on a Hamiltonian routing strategy and the odd–even turn model. By exploiting local fault vector information at the current node, FTHOE dynamically adjusts the output port selection priority, thereby shortening detour paths around faulty regions while effectively reducing the probability of packets being trapped in fault neighborhoods. At the same time, FTHOE preserves a relatively high degree of minimal path diversity by retaining the adaptiveness of Hamiltonian-based routing under fault conditions, thereby enhancing network load-balancing and overall communication performance. Simulation results demonstrate that, compared with existing fault-tolerant routing algorithms, FTHOE significantly reduces average network latency and improves throughput, exhibiting robust fault tolerance and load-balancing performance under complex fault scenarios.
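The two ingredients named in the abstract, a Hamiltonian labeling of the mesh and a port priority driven by a local fault vector, can be illustrated with a minimal Python sketch. This is not the FTHOE algorithm itself: the snake-order labeling is the common textbook construction for 2D meshes, and the function names (`hamiltonian_label`, `minimal_ports`, `rank_ports`), the port letters, and the dictionary form of the fault vector are all assumptions made for illustration. The sketch also omits the odd–even turn restrictions, so on its own it gives no deadlock-freedom guarantee.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) position in a 2D mesh; hypothetical convention


def hamiltonian_label(x: int, y: int, width: int) -> int:
    """Snake-order label: even rows run left to right, odd rows right to left.

    Consecutive labels always belong to neighbouring nodes, so the labels
    trace one Hamiltonian path through the mesh.
    """
    return y * width + (x if y % 2 == 0 else width - 1 - x)


def minimal_ports(cur: Coord, dst: Coord) -> List[str]:
    """Output ports that move a packet closer to its destination."""
    ports: List[str] = []
    if dst[0] > cur[0]:
        ports.append("E")
    elif dst[0] < cur[0]:
        ports.append("W")
    if dst[1] > cur[1]:
        ports.append("N")
    elif dst[1] < cur[1]:
        ports.append("S")
    return ports


def rank_ports(cur: Coord, dst: Coord, fault_vector: Dict[str, bool]) -> List[str]:
    """Order candidate ports using the local fault vector (assumed format:
    port letter -> True if the link or neighbour behind that port is faulty).

    Healthy minimal ports come first; healthy non-minimal ports follow as
    detour candidates; faulty ports are dropped. Turn-model checks and mesh
    boundary checks are intentionally left out of this sketch.
    """
    minimal = minimal_ports(cur, dst)
    healthy = [p for p in ("E", "W", "N", "S") if not fault_vector.get(p, False)]
    preferred = [p for p in minimal if p in healthy]
    detours = [p for p in healthy if p not in minimal]
    return preferred + detours
```

For example, a packet at (1, 1) heading to (3, 3) with a faulty east port would try the north port first and keep west and south only as detour options, which is the kind of local priority adjustment the abstract describes at a high level.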