On-line Access: 2025-08-15

Received: 2024-11-11

Revision Accepted: 2025-07-08


Citations:  BibTeX | RefMan (RIS) | EndNote | GB/T7714


Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


An adaptive outlier correction quantization method for vision transformers


Author(s):  Zheyang LI1,2, Chaoxiang LAN2, Kai ZHANG2, Wenming TAN2, Ye REN2, Jun XIAO1

Affiliation(s):  1College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):  lizheyang@zju.edu.cn, lizheyang@hikvision.com, junx@cs.zju.edu.cn

Key Words:  Transformer; Model compression and acceleration; Post-training quantization; Outlier



Zheyang LI, Chaoxiang LAN, Kai ZHANG, Wenming TAN, Ye REN, Jun XIAO. An adaptive outlier correction quantization method for vision transformers[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400994

@article{li2025aocq,
title="An adaptive outlier correction quantization method for vision transformers",
author="Zheyang LI and Chaoxiang LAN and Kai ZHANG and Wenming TAN and Ye REN and Jun XIAO",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.2400994"
}

%0 Journal Article
%T An adaptive outlier correction quantization method for vision transformers
%A Zheyang LI
%A Chaoxiang LAN
%A Kai ZHANG
%A Wenming TAN
%A Ye REN
%A Jun XIAO
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R https://doi.org/10.1631/FITEE.2400994

TY - JOUR
T1 - An adaptive outlier correction quantization method for vision transformers
A1 - Zheyang LI
A1 - Chaoxiang LAN
A1 - Kai ZHANG
A1 - Wenming TAN
A1 - Ye REN
A1 - Jun XIAO
JO - Frontiers of Information Technology & Electronic Engineering
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - https://doi.org/10.1631/FITEE.2400994
ER -


Abstract: 
Transformers have demonstrated considerable success across various domains but are constrained by their substantial computational and memory requirements, which poses challenges for deployment on resource-constrained devices. Quantization, an effective model compression method, can significantly reduce the running time of transformers on edge devices. Notably, transformers exhibit more pronounced outliers than convolutional neural networks (CNNs), leading to uneven feature distributions across channels and tokens. To address this issue, we propose an Adaptive Outlier Correction Quantization (AOCQ) method for transformers, which significantly alleviates the adverse effects of these outliers. AOCQ adjusts the notable discrepancies across channels and tokens at three levels: the operator level, the framework level, and the loss level. We introduce a new operator that equivalently balances the activations across different channels, and we insert an extra stage to optimize the activation quantization step at the framework level. Additionally, we transfer the imbalanced activations across tokens and channels to the optimization of the model weights at the loss level. Our theoretical analysis shows that the method reduces the quantization error. The effectiveness of the proposed method is verified on various benchmark models and tasks. Notably, DeiT-B with 8-bit post-training quantization (PTQ) achieves 81.57% accuracy, a drop of only 0.3%, while running 4× faster. Furthermore, the weights of Swin and DeiT on several tasks, including classification and object detection, can be post-quantized to ultra-low 4-bit precision with a minimal accuracy loss of 2%, while requiring nearly 8× less memory.
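The operator-level idea in the abstract, balancing activations across channels without changing the layer's output, can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch example: it folds a per-channel rescaling of the activations into the following linear layer's weights (in the spirit of SmoothQuant-style balancing) before uniform quantization. The function names, the square-root scale rule, and the bit width are illustrative assumptions, not the authors' AOCQ operator.

import torch

def uniform_quantize(t: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # Symmetric uniform quantization with a single per-tensor step.
    qmax = 2 ** (n_bits - 1) - 1
    step = t.abs().max() / qmax
    return torch.clamp(torch.round(t / step), -qmax - 1, qmax) * step

def balanced_quantized_matmul(x: torch.Tensor, weight: torch.Tensor,
                              n_bits: int = 8) -> torch.Tensor:
    # x:      (tokens, in_channels) activations, possibly with outlier channels
    # weight: (out_channels, in_channels) weights of the next linear layer
    # Per-channel scale: outlier channels are shrunk, quiet ones grown; in full
    # precision (x / s) @ (weight * s).T == x @ weight.T exactly, so the
    # rescaling is an equivalent transformation of the network.
    s = x.abs().amax(dim=0).clamp(min=1e-5).sqrt()
    xq = uniform_quantize(x / s, n_bits)       # balanced activations quantize with less error
    wq = uniform_quantize(weight * s, n_bits)  # inverse scale folded into the weights
    return xq @ wq.T

# Toy check: one activation channel carries a large outlier.
x = torch.randn(16, 4)
x[:, 0] *= 50.0
w = torch.randn(8, 4)
ref = x @ w.T
naive = uniform_quantize(x) @ uniform_quantize(w).T
print("naive error:   ", (naive - ref).abs().mean().item())
print("balanced error:", (balanced_quantized_matmul(x, w) - ref).abs().mean().item())

On this toy input the balanced variant typically shows a markedly smaller error, mirroring the abstract's claim that equalizing channels before quantization reduces the quantization error.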

