
CLC number: TP332


Received: 2008-07-27

Revision Accepted: 2008-10-28

Crosschecked: 2009-04-27


Journal of Zhejiang University SCIENCE A 2009 Vol.10 No.7 P.1067~1074


New method for high performance multiply-accumulator design

Author(s):  Bing-jie XIA, Peng LIU, Qing-dong YAO

Affiliation(s):  Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   icysummer@zju.edu.cn, liupeng@isee.zju.edu.cn

Key Words:  Multiply-accumulator (MAC), Pipeline, Compressor, Partial product reduction tree (PPRT), Split structure


Bing-jie XIA, Peng LIU, Qing-dong YAO. New method for high performance multiply-accumulator design[J]. Journal of Zhejiang University Science A, 2009, 10(7): 1067~1074.

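@article{Xia2009,
title="New method for high performance multiply-accumulator design",
author="Bing-jie XIA and Peng LIU and Qing-dong YAO",
journal="Journal of Zhejiang University Science A",
volume="10",
number="7",
pages="1067--1074",
year="2009",
issn="1673-565X",
doi="10.1631/jzus.A0820566",
publisher="Zhejiang University Press \& Springer"
}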

%0 Journal Article
%T New method for high performance multiply-accumulator design
%A Bing-jie XIA
%A Peng LIU
%A Qing-dong YAO
%J Journal of Zhejiang University SCIENCE A
%V 10
%N 7
%P 1067-1074
%@ 1673-565X
%D 2009
%I Zhejiang University Press & Springer
%R 10.1631/jzus.A0820566

TY  - JOUR
T1  - New method for high performance multiply-accumulator design
A1  - Bing-jie XIA
A1  - Peng LIU
A1  - Qing-dong YAO
JO  - Journal of Zhejiang University Science A
VL  - 10
IS  - 7
SP  - 1067
EP  - 1074
SN  - 1673-565X
Y1  - 2009
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.A0820566
ER  -

This study presents a new four-stage pipelined, high-performance split multiply-accumulator (MAC) architecture that supports multiple precisions and is intended for media processors. To further increase speed, a novel partial product compression circuit based on interleaved adders and a modified hybrid partial product reduction tree (PPRT) scheme are proposed. The MAC can perform one-way 32-bit or four-way 16-bit signed/unsigned multiply or multiply-accumulate operations, as well as two-way parallel multiply-add (PMADD) operations, at 1.25 GHz under worst-case conditions and 1.67 GHz under typical-case conditions. Compared with the MAC in the 32-bit MIPS (microprocessor without interlocked pipeline stages) processor, the proposed design is considerably faster; moreover, throughput improves by up to 32%. The MAC has been fabricated in Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm CMOS standard cell technology and has passed a functional test.
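The core idea behind a PPRT is to compress many partial-product rows into two rows with carry-free compressors, then perform a single carry-propagate addition. The following behavioral sketch illustrates that principle. It is not the authors' circuit: the interleaved-adder compressor, the hybrid tree, signed operands, and the split (SIMD) modes are not modeled. It assumes unsigned operands, a plain AND-array partial-product generator (no Booth recoding), and a generic Wallace-style tree of 3:2 compressors (full adders). It also assumes that the accumulator enters the tree as one extra row. All function names (`csa`, `compress_4_2`, `reduce_rows`, `mac_unsigned`) are hypothetical.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: a row of full adders applied to three whole rows at once.

    Returns (sum_row, carry_row) such that a + b + c == sum_row + carry_row.
    """
    sum_row = a ^ b ^ c                              # per-column sum bit
    carry_row = ((a & b) | (a & c) | (b & c)) << 1   # per-column majority, moved up one column
    return sum_row, carry_row


def compress_4_2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """4:2 compression of four rows into two, built from two chained 3:2 stages."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def reduce_rows(rows: list[int]) -> tuple[list[int], int]:
    """Wallace-style reduction: apply 3:2 compressors level by level until
    at most two rows remain. Returns (remaining_rows, number_of_levels)."""
    levels = 0
    while len(rows) > 2:
        nxt: list[int] = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            nxt.extend(csa(*rows[3 * g : 3 * g + 3]))
        nxt.extend(rows[3 * full_groups :])  # 0-2 leftover rows pass straight through
        rows = nxt
        levels += 1
    return rows, levels


def mac_unsigned(x: int, y: int, acc: int, width: int) -> int:
    """Unsigned width-by-width multiply-accumulate, returning acc + x * y.

    Partial products come from a plain AND array (one row per multiplier bit),
    and the accumulator is merged into the tree as one extra row, so a single
    final carry-propagate addition produces the result.
    """
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    rows.append(acc)
    remaining, _ = reduce_rows(rows)
    return sum(remaining)  # stands in for the final carry-propagate adder


if __name__ == "__main__":
    print(hex(mac_unsigned(0xFFFF, 0xFFFF, 1, 16)))  # 0xfffe0002
    print(reduce_rows([0] * 17)[1])  # 6 levels for 16 partial products + accumulator
```

Feeding the accumulator into the tree avoids a separate accumulate adder, which is one common MAC technique. The abstract does not state whether the proposed design works this way. The level count gives a rough sense of tree depth, which is what faster compressors and a better-balanced PPRT aim to shorten.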





Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE