
CLC number: TN912.3

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2020-09-08


 ORCID:

Qi-rong Mao

https://orcid.org/0000-0002-0616-4431

Jing-jing Chen

https://orcid.org/0000-0003-2968-0313


Frontiers of Information Technology & Electronic Engineering  2020 Vol.21 No.11 P.1639-1650

http://doi.org/10.1631/FITEE.2000019


Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder


Author(s):  Jing-jing Chen, Qi-rong Mao, You-cai Qin, Shuang-qing Qian, Zhi-shen Zheng

Affiliation(s):  School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China

Corresponding email(s):   2221808071@stmail.ujs.edu.cn, mao_qr@ujs.edu.cn, 2211908026@stmail.ujs.edu.cn, 2211908025@stmail.ujs.edu.cn, 3160602062@stmail.ujs.edu.cn

Key Words:  Speech separation, Generative factors, Autoencoder, Deep learning



Abstract: 
Much recent progress in monaural speech separation (MSS) has been achieved through a series of deep learning architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to construct a specific audio source of interest. However, these approaches can neither learn generative factors of the original input for MSS nor construct each audio source in mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss into the objective function to isolate each source without interference from the others. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, leading to improved MSS performance. Experiments on benchmark datasets show that our approach outperforms existing methods. In terms of three important metrics, WFAE performs strongly on a relatively challenging MSS case, i.e., speaker-independent MSS.
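The abstract's core idea can be illustrated with a toy sketch: a shared encoder maps a mixture frame to latent factors, per-source attention weights select source-specific factors, and per-source decoders produce masks that partition the mixture. All dimensions, parameter names, and the random "trained" weights below are hypothetical stand-ins, not the authors' actual WFAE architecture or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions: F frequency bins, D latent factors, S sources.
F, D, S = 129, 40, 2

# Toy "trained" parameters (random stand-ins for learned weights).
W_enc = rng.normal(0, 0.1, (D, F))      # shared encoder
W_dec = rng.normal(0, 0.1, (S, F, D))   # one decoder per source
attn = rng.normal(0, 0.1, (S, D))       # per-source attention logits over factors

def separate(mix_frame):
    z = np.tanh(W_enc @ mix_frame)       # shared latent generative factors
    weights = softmax(attn, axis=-1)     # source-specific factor weights (S, D)
    z_s = weights * z                    # weighted, source-specific factors
    # Softmax across sources makes the masks sum to 1 per frequency bin.
    masks = softmax(np.einsum('sfd,sd->sf', W_dec, z_s), axis=0)
    return masks * mix_frame             # masked magnitude estimates (S, F)

mix = np.abs(rng.normal(size=F))
est = separate(mix)
print(est.shape)                          # (2, 129): one estimate per source
```

In the paper's setting the weights would be trained with a reconstruction loss per source plus the proposed regularization loss; here the weights are random, so the example only shows the data flow, not separation quality.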


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE