
CLC number: TP391
On-line Access: 2025-11-17
Received: 2024-11-11
Revision Accepted: 2025-11-18
Crosschecked: 2025-07-08
Zheyang LI, Chaoxiang LAN, Kai ZHANG, Wenming TAN, Ye REN, Jun XIAO. An adaptive outlier correction quantization method for vision Transformers[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400994
An adaptive outlier correction quantization method for vision Transformers
1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2 Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou 310051, China
Abstract: Although Transformer models have achieved remarkable results in many fields, their enormous computational and memory requirements limit their application, especially when they are deployed on resource-constrained edge devices. Quantization, an effective model compression method, can significantly reduce the runtime of Transformers on edge devices. Notably, compared with convolutional neural networks (CNNs), the activations of Transformers exhibit far more pronounced outliers, which leads to uneven feature distributions across channels and tokens. To address this problem, we propose an adaptive outlier correction quantization (AOCQ) method that markedly reduces the adverse impact of these outliers. AOCQ adjusts the notable discrepancies across channels and tokens at three levels: the operator level, the framework level, and the loss level. We introduce a novel operator that equivalently balances the activations across channels, and add an extra stage at the framework level to optimize the quantization step of the activations. Furthermore, at the loss level, the imbalanced activations across tokens and channels are transferred into the optimization of the model weights. Theoretical analysis shows that the proposed method effectively reduces the quantization error. Its effectiveness is verified on various benchmark models and tasks. With 8-bit post-training quantization, DeiT-B achieves 81.57% accuracy with only a 0.28 percentage point drop while enjoying 4x faster inference. Moreover, on several tasks including image classification and object detection, the weights of Swin Transformer and DeiT can be post-training quantized to 4 bits with only 2% accuracy loss while requiring 8x less memory.
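To make the operator-level idea more concrete, the sketch below shows one common way such channel balancing can be realized: activation outliers are damped by a per-channel scale that is folded into the following linear layer, so the full-precision output is unchanged while the activations become easier to quantize. This is a minimal illustration under our own assumptions (scale-migration style balancing, with hypothetical names such as balance_channels and alpha); it is not the authors' exact AOCQ operator, nor their framework- or loss-level procedures.

# Minimal sketch (not the authors' exact AOCQ operator): per-channel scale
# migration that moves activation outliers into the following weight matrix
# before uniform quantization. All names here are illustrative.
import torch

def balance_channels(x, weight, alpha=0.5, eps=1e-8):
    """Rescale activations channel-wise and fold the inverse scale into the
    next layer's weight so that x @ weight.T is unchanged.

    x:      (tokens, in_channels) calibration activations
    weight: (out_channels, in_channels) weight of the following linear layer
    alpha:  migration strength; 0 keeps x as-is, 1 fully flattens channels
    """
    ch_max = x.abs().amax(dim=0).clamp(min=eps)   # per-channel outlier magnitude
    scale = ch_max.pow(alpha)                     # partial migration factor
    x_bal = x / scale                             # activations with damped outliers
    w_bal = weight * scale                        # compensate in the weights
    return x_bal, w_bal

def quantize_sym(t, n_bits=8):
    """Plain symmetric uniform quantization, used only to compare errors."""
    qmax = 2 ** (n_bits - 1) - 1
    step = t.abs().max() / qmax
    return torch.clamp(torch.round(t / step), -qmax, qmax) * step

# Usage: the balanced pair should quantize with a lower activation error,
# while the full-precision product stays numerically identical.
x = torch.randn(128, 64)
x[:, 3] *= 30.0                                   # inject a channel outlier
w = torch.randn(256, 64)
xb, wb = balance_channels(x, w)
assert torch.allclose(x @ w.T, xb @ wb.T, atol=1e-3)
err_plain = (quantize_sym(x) @ w.T - x @ w.T).abs().mean()
err_bal = (quantize_sym(xb) @ wb.T - xb @ wb.T).abs().mean()
print(f"mean error, plain: {err_plain.item():.4f}  balanced: {err_bal.item():.4f}")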

