CLC number: TP391.4
On-line Access: 2025-05-06
Received: 2024-01-27
Revision Accepted: 2024-06-27
Crosschecked: 2025-05-06
Lijian GAO, Qing ZHU, Yaxin SHEN, Qirong MAO, Yongzhao ZHAN. Dynamic prompting class distribution optimization for semi-supervised sound event detection[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400061
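Dynamic prompting class distribution optimization for semi-supervised sound event detection
1 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212016, China
2 Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, Zhenjiang 212016, China
Abstract: Semi-supervised sound event detection typically uses large-scale unlabeled data and synthetic data to improve model generalization, which effectively reduces overfitting on the small amount of labeled data. However, this generalization training is usually disturbed by pseudo-label noise and by differences in domain knowledge. To reduce the effect of this noise on semi-supervised class distribution learning, we propose a semi-supervised class distribution learning method based on dynamic prompt optimization (PADO). Specifically, when given data with ground-truth labels, PADO dynamically embeds a set of learnable independent parameters (class tokens) to mine prior knowledge of the true distribution. This prior serves as additional prompt information that interacts dynamically with the noisy posterior distribution knowledge, so that class distribution knowledge is optimized while the model's generalization performance is preserved. On this basis, PADO can significantly improve the efficiency of class distribution learning. Experimental results on the DCASE2019, 2020, and 2021 datasets show that PADO clearly outperforms current state-of-the-art methods and transfers easily to other mainstream models.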
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE