CLC number: TP302

On-line Access: 2015-12-07

Received: 2015-02-01

Revision Accepted: 2015-08-26

Crosschecked: 2015-11-04

Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.12 P.1018-1033


Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity

Author(s):  Zhi-xiang Chen, Zhao-lin Li, Shan Cao, Fang Wang, Jie Zhou

Affiliation(s):  Department of Automation, Tsinghua University, Beijing 100084, China

Corresponding email(s):   chen-zx10@mails.tsinghua.edu.cn

Key Words:  Schedule refining, Multi-core processor, Heterogeneity, Representative chip operating point

Zhi-xiang Chen, Zhao-lin Li, Shan Cao, Fang Wang, Jie Zhou. Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(12): 1018-1033.

%0 Journal Article
%T Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity
%A Zhi-xiang Chen
%A Zhao-lin Li
%A Shan Cao
%A Fang Wang
%A Jie Zhou
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 12
%P 1018-1033
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500035

Homogeneous multi-core processors are widely used for computation-intensive embedded applications. However, with the continuous downscaling of CMOS technology, within-die variations in the manufacturing process lead to a significant spread in the operating speeds of cores within homogeneous multi-core processors. Task scheduling approaches that do not consider the heterogeneity caused by within-die variations can produce overly pessimistic results in terms of performance. To achieve optimal performance according to the actual maximum clock frequencies at which cores can run, we present a heterogeneity-aware schedule refining (HASR) scheme that fully exploits the heterogeneity of homogeneous multi-core processors in embedded domains. We analyze and show how the actual maximum frequencies of cores are used to guide the scheduling. In the scheme, representative chip operating points are selected and the corresponding optimal schedules are generated as candidate schedules. During the booting of each chip, one of the candidate schedules is bound to the chip according to the actual maximum clock frequencies of its cores, to maximize the performance. A set of applications is designed to evaluate the proposed scheme. Experimental results show that the proposed scheme improves the performance by 22.2% on average, compared with the baseline schedule based on worst-case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme also improves the performance by up to 12%.
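
The boot-time binding step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Candidate` type, the feasibility rule (a candidate is usable only if no core is asked to run above its measured maximum frequency), and all frequency and makespan values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A precomputed schedule tied to one representative operating point (hypothetical type)."""
    name: str
    point_mhz: Tuple[float, ...]  # per-core frequencies the schedule was generated for
    makespan_us: float            # schedule length when cores run at point_mhz


def is_feasible(point_mhz: Sequence[float], measured_mhz: Sequence[float]) -> bool:
    """Assumed rule: no core may be asked to run faster than its measured maximum."""
    return len(point_mhz) == len(measured_mhz) and all(
        p <= m for p, m in zip(point_mhz, measured_mhz)
    )


def bind_schedule(
    candidates: Sequence[Candidate], measured_mhz: Sequence[float]
) -> Optional[Candidate]:
    """Return the feasible candidate with the shortest makespan, or None if none fits."""
    feasible = [c for c in candidates if is_feasible(c.point_mhz, measured_mhz)]
    return min(feasible, key=lambda c: c.makespan_us, default=None)


# Illustrative values only; they are not taken from the paper.
candidates = [
    Candidate("worst_case", (800, 800, 800, 800), 1000.0),
    Candidate("mid", (900, 850, 800, 800), 920.0),
    Candidate("fast", (1000, 950, 900, 850), 830.0),
]
chosen = bind_schedule(candidates, measured_mhz=(950, 900, 820, 810))
print(chosen.name if chosen else "no feasible candidate")
```

In this sketch the "fast" candidate is rejected because core 0 cannot reach 1000 MHz, so the chip falls back to the next-best feasible schedule. The sketch compares cores position by position and does not consider permuting cores.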

This paper is concerned with task scheduling techniques for optimal throughput on homogeneous multi-core processors, taking into account intra-/inter-die frequency differences caused by silicon process variation. The paper proposes the HASR scheme, which adapts existing DAG-based scheduling techniques to the actual maximum frequencies of cores. Some representative chip operating points are first chosen from all possible conditions to reduce memory usage, and these points are then stored in on-chip memory. At run time, one appropriate point is chosen and bound to the cores according to their actual maximum clock frequencies. The paper shows that the HASR scheme can improve the throughput of application benchmarks compared with other scheduling techniques. The study is well motivated, and the authors clearly describe the scheduling challenge posed by different core clock frequencies due to intra-/inter-die silicon process variation. Both the candidate selection and its binding to the chip are well presented (in particular, Algorithms 1, 2, and 3 are very helpful for the reader). The paper also gives a formal problem formulation, which is useful and which I enjoyed reading.
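
To make concrete why a schedule evaluated at worst-case frequencies is pessimistic, the following sketch computes the makespan of a fixed DAG schedule under two sets of per-core frequencies. It is an illustration under stated assumptions, not the paper's formulation: the task graph, cycle counts, mapping, and frequencies are invented, and communication costs are ignored.

```python
def makespan(tasks, deps, mapping, order, freq_mhz):
    """Finish time (microseconds) of a fixed schedule on cores with given frequencies.

    tasks:    {task: cycles}
    deps:     {task: [predecessor tasks]}
    mapping:  {task: core index}
    order:    task names in a topological order that also fixes per-core order
    freq_mhz: per-core clock frequency in MHz
    """
    core_free = [0.0] * len(freq_mhz)  # time at which each core becomes idle
    finish = {}
    for t in order:
        core = mapping[t]
        ready = max((finish[p] for p in deps.get(t, [])), default=0.0)
        start = max(ready, core_free[core])
        finish[t] = start + tasks[t] / freq_mhz[core]  # cycles / MHz = microseconds
        core_free[core] = finish[t]
    return max(finish.values())


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
tasks = {"a": 800, "b": 1600, "c": 1600, "d": 800}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
mapping = {"a": 0, "b": 0, "c": 1, "d": 0}
order = ["a", "b", "c", "d"]

worst_case = makespan(tasks, deps, mapping, order, [800, 800])
actual = makespan(tasks, deps, mapping, order, [1000, 800])
print(f"worst-case estimate: {worst_case:.2f} us, actual: {actual:.2f} us")
```

With these made-up numbers, the same schedule finishes earlier once core 0 is allowed to run at its actual maximum frequency, which is the slack a frequency-aware refinement can exploit.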




Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE