A novel algorithm for detection of speaker emotion in human-robot interaction

Authors

Department of Mechanical Engineering, K. N. Toosi University of Technology, Tehran, Iran

Abstract

The easiest way for humans and machines to communicate is through speech, and one of the essential aspects of this relationship is the machine's perception of human emotion. Consequently, extracting patterns from speech and building a system on such a model has been a challenge for researchers in recent years. Although the emotion expressed in speech varies over a wide spectrum, influenced by accent, culture, and environment, fixed patterns can still be found in people's emotional speech. In this paper, a new algorithm for detecting emotion in the human voice is presented. In the proposed algorithm, features inspired by human hearing are extracted from the audio signal; the optimal features are then selected from this set by an intelligent method, with the aim of increasing the speed and accuracy of recognition. Classification is performed by combining a set of classifiers, and the designed intelligent system then distinguishes the patterns of anger, joy, fear, boredom, disgust, and sadness. Simulation results for the implemented algorithm are presented for two databases, Farsi and German, and compared with the results of other algorithms on the same databases. The results indicate that the proposed algorithm can predict anger, joy, fear, boredom, disgust, and sadness with good accuracy. The algorithm could be used in the design of control systems and in robot guidance. In addition, an emotion recognition system could be employed in psychology, medicine, behavioral science, and security applications such as polygraph testing.
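The abstract outlines a three-stage pipeline: auditory-inspired feature extraction, selection of an optimal feature subset, and classification by a combination of classifiers. The paper's specific features, selection method, and classifier set are not given in this abstract, so the following is only a minimal sketch of such a pipeline, assuming MFCC statistics (via librosa) as the auditory-inspired features, univariate selection as the feature-reduction step, and a soft-voting ensemble from scikit-learn; none of these choices should be read as the authors' implementation.

# Hedged sketch of the pipeline described in the abstract:
# 1) auditory-inspired features, 2) feature selection, 3) combined classifiers.
# MFCCs, SelectKBest, and the particular classifiers are placeholders, not the paper's method.
import numpy as np
import librosa
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["anger", "joy", "fear", "boredom", "disgust", "sadness"]

def extract_features(path, sr=16000):
    """Auditory-inspired features for one utterance (here: MFCC mean/std statistics)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
    # Summarise frame-level coefficients with per-coefficient mean and standard deviation.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def build_model(n_selected=20):
    """Feature selection followed by a simple soft-voting ensemble of classifiers."""
    ensemble = VotingClassifier(
        estimators=[
            ("svm", SVC(probability=True)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)),
        ],
        voting="soft",
    )
    return make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=n_selected),
                         ensemble)

# Usage (wav_paths and labels would come from an emotional-speech corpus such as EMO-DB):
# X = np.vstack([extract_features(p) for p in wav_paths])
# model = build_model().fit(X, labels)
# predictions = model.predict(X)

In a real evaluation the data would of course be split into training and test sets (or cross-validated), and the intelligent feature-selection and ensemble-combination strategies of the paper would replace the placeholder components above.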

Volume 20, Issue 1 - Serial Number 50
System Dynamics and Solid Mechanics
June 2018
Pages 92-109
  • Receive Date: 26 October 2016
  • Revise Date: 23 January 2017
  • Accept Date: 01 July 2018