CLC number: TP391.4
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2013-06-06
Qi-rong Mao, Xiao-lei Zhao, Zheng-wei Huang, Yong-zhao Zhan. Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features[J]. Journal of Zhejiang University Science C, 2013, 14(7): 573-582.
@article{title="Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features",
author="Qi-rong Mao, Xiao-lei Zhao, Zheng-wei Huang, Yong-zhao Zhan",
journal="Journal of Zhejiang University Science C",
volume="14",
number="7",
pages="573-582",
year="2013",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.CIDE1310"
}
%0 Journal Article
%T Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features
%A Qi-rong Mao
%A Xiao-lei Zhao
%A Zheng-wei Huang
%A Yong-zhao Zhan
%J Journal of Zhejiang University SCIENCE C
%V 14
%N 7
%P 573-582
%@ 1869-1951
%D 2013
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.CIDE1310
TY - JOUR
T1 - Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features
A1 - Qi-rong Mao
A1 - Xiao-lei Zhao
A1 - Zheng-wei Huang
A1 - Yong-zhao Zhan
JO - Journal of Zhejiang University Science C
VL - 14
IS - 7
SP - 573
EP - 582
SN - 1869-1951
Y1 - 2013
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.CIDE1310
ER -
Abstract: Functional paralanguage carries considerable emotional information and is insensitive to speaker changes. To improve emotion recognition accuracy under speaker-independent conditions, a fusion method that combines functional paralanguage features with accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. With this method, functional paralanguage, such as laughter, crying, and sighing, is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database covering six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is introduced as a cue for recognizing speech emotions, in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions is over 67% in the speaker-independent condition using the functional paralanguage features.
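The abstract reports results in terms of precision, recall, and F1-measure, alongside overall accuracy. As a rough illustration of how these per-class metrics are usually computed, the following is a minimal sketch and not the authors' evaluation code. The function names `per_class_prf` and `accuracy`, and the three emotion labels in the toy data, are assumptions made for this example only; the abstract does not name the six emotions.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs.

    Hypothetical helper for illustration; standard one-vs-rest definitions.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class was wrong
            fn[truth] += 1          # true class was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]   # times the label was predicted
        actual = tp[label] + fn[label]      # times the label was the truth
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if y_true else 0.0


if __name__ == "__main__":
    # Toy labels, invented for this example (not data from the paper).
    y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
    y_pred = ["happy", "sad", "sad", "sad", "angry", "happy"]
    for label, (p, r, f) in per_class_prf(y_true, y_pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    print(f"accuracy={accuracy(y_true, y_pred):.2f}")
```

In a speaker-independent setting, `y_true` and `y_pred` would come from test speakers who do not appear in the training data. How the paper partitions its speakers is not stated in the abstract.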