On-line Access: 2025-08-15

Received: 2025-03-14

Revision Accepted: 2025-06-04


Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


Temporal fidelity enhancement for video action recognition


Author(s):  Shaowu XU1, Xibin JIA1, Qianmei SUN2, Jing CHANG2

Affiliation(s):  1Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Corresponding email(s):  swxu@emails.bjut.edu.cn, jiaxibin@bjut.edu.cn, sunqianmei5825@126.com, cj006006@126.com

Key Words:  Action recognition; Disentangled information bottleneck (DisenIB); Temporal modeling; Temporal fidelity



Shaowu XU, Xibin JIA, Qianmei SUN, Jing CHANG. Temporal fidelity enhancement for video action recognition[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500164

@article{title="Temporal fidelity enhancement for video action recognition",
author="Shaowu XU and Xibin JIA and Qianmei SUN and Jing CHANG",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.2500164"
}

%0 Journal Article
%T Temporal fidelity enhancement for video action recognition
%A Shaowu XU
%A Xibin JIA
%A Qianmei SUN
%A Jing CHANG
%J Frontiers of Information Technology & Electronic Engineering
%P
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
doi="https://doi.org/10.1631/FITEE.2500164"


Abstract: 
Temporal attention mechanisms are essential for video action recognition, enabling models to focus on semantically informative moments. However, these models frequently exhibit temporal infidelity, that is, misaligned attention weights caused by limited training diversity and the absence of fine-grained temporal supervision. While video-level labels provide coarse-grained action guidance, the lack of detailed constraints allows attention noise to persist, especially in complex scenarios with distracting spatial elements. To address this issue, we propose temporal fidelity enhancement (TFE), a competitive learning paradigm based on disentangled information bottleneck (DisenIB) theory. TFE mitigates temporal infidelity by decoupling action-relevant semantics from spurious correlations through adversarial feature disentanglement. Using pre-trained representations for initialization, TFE establishes an adversarial process in which segments with elevated temporal attention compete against contexts with diminished action relevance. This mechanism ensures temporal consistency and enhances the fidelity of attention patterns without requiring explicit fine-grained supervision. Extensive studies on the UCF-101, HMDB-51, and Charades benchmarks validate the effectiveness of our method, with significant improvements in action recognition accuracy.
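
The abstract does not give the TFE formulation, so the following is only an illustrative sketch of the general idea it describes: temporal attention weights pool per-segment features into an action-relevant representation, while complementary weights pool a context representation dominated by low-attention segments. The function name `split_by_attention`, the complementary weighting `1 - attention`, and the random inputs are all assumptions made for illustration; this is not the authors' implementation and it omits the adversarial DisenIB objective entirely.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def split_by_attention(features, scores):
    """Pool segment features into two views (hypothetical helper).

    features: (T, D) array, one feature vector per temporal segment.
    scores:   (T,) array of unnormalised temporal attention logits.
    Returns the attention weights, an attention-weighted feature, and a
    complementary feature that emphasises low-attention segments.
    """
    attn = softmax(scores)            # temporal attention, sums to 1
    comp = 1.0 - attn                 # assumed complementary weighting
    comp = comp / comp.sum()          # renormalise so it also sums to 1
    action_feat = attn @ features     # (D,) high-attention view
    context_feat = comp @ features    # (D,) low-attention view
    return attn, action_feat, context_feat

# Toy input: 8 segments with 16-dimensional features (random placeholders).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
scores = rng.normal(size=8)
attn, action_feat, context_feat = split_by_attention(features, scores)
```

In a setup of this kind, a classifier on `action_feat` and a competing branch on `context_feat` could be trained against each other; how TFE actually defines that competition is specified in the full paper, not here.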


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE