In computer applications such as interactive games, social robots, and speech recognition systems, the ability to identify emotions is invaluable. But how should algorithms and systems be designed to discern emotions effectively from diverse speech? Lotfian and Busso present a machine learning approach for improving the training of deep neural networks (DNNs) in speech emotion recognition systems.
The authors present a broad yet concise review of research efforts and solutions related to recognizing emotion in speech. New systems need (1) models that use artificial intelligence (AI) techniques to make the most of inadequate training datasets for speech emotion recognition, and (2) algorithms that identify the known uncertainties in how humans and imperfectly trained computers perceive emotional speech.
Evaluators who rate spoken sentences with ambiguous emotional content often disagree. How, then, should metrics be designed to reconcile those differences and produce reliable classifications of perceived emotion? The authors present statistical models for identifying, clustering, and categorizing emotional attributes. Specifically, a regression model is proposed for scoring the dimensions of emotion in speech evaluation, a binary (dichotomous) model is used to identify the boundaries of emotional dimensions, and an algorithm is used to estimate the accuracy of the emotional ratings given by different evaluators.
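As an illustration of what reconciling evaluators can look like in practice, the sketch below scores each utterance by how much its raters disagree and orders the samples from most to least agreed upon. It is a minimal example under stated assumptions, not the authors' method: the function name `rank_by_agreement`, the use of the population standard deviation as the disagreement measure, and the 1-to-5 rating values are all hypothetical.

```python
from statistics import mean, pstdev


def rank_by_agreement(ratings):
    """Return (sample_id, consensus, disagreement) tuples, most-agreed first.

    `ratings` maps a sample id to the scores given by its evaluators.
    Disagreement is the population standard deviation of those scores,
    an assumed stand-in for whatever reliability metric a real system uses.
    """
    scored = []
    for sample_id, scores in ratings.items():
        if len(scores) < 2:
            raise ValueError(f"{sample_id} needs at least two evaluators")
        scored.append((sample_id, mean(scores), pstdev(scores)))
    # Low disagreement first: these samples carry the least ambiguous labels.
    return sorted(scored, key=lambda row: row[2])


# Hypothetical ratings of one emotional attribute on a 1-5 scale.
ratings = {
    "utt_a": [4, 4, 4],
    "utt_b": [1, 3, 5],
    "utt_c": [2, 3, 2],
}

ranked = rank_by_agreement(ratings)
for sample_id, consensus, disagreement in ranked:
    print(f"{sample_id}: consensus={consensus:.2f} disagreement={disagreement:.2f}")
```

A ranking like this could be used to flag samples whose labels deserve less trust, or to decide which samples a model sees first during training; either use is an assumption here rather than something the review specifies.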
The paper presents numerous experiments performed with datasets from various sources. Compared with similar research, the results tend to support the reliability of the multifaceted approach outlined. The authors clearly recognize the importance of accurately identifying reliable training samples, and they offer new techniques for determining reliability among speech evaluators.