UT doctoral student teaches robot to express emotion
On October 6, Kairi Tamuri defended her doctoral dissertation ‘Basic emotions in speech read out in Estonian: acoustic analysis and modelling’ at the University of Tartu. In the course of the research, she identified the acoustic expression of anger and sadness in Estonian read speech and taught a speech synthesiser to express these emotions.
Synthetic speech is used in many areas, such as human–machine communication, multimedia and assistive tools for people with impairments, which is why it needs to sound natural. One way to achieve this is to add emotion to it with acoustic models. To create such models, it is necessary to know how emotions are expressed vocally in human speech – that is, which acoustic parameters carry each emotion – so that a machine following those parameters can express a recognisable emotion.
“Human speech always carries emotion, which is why emotion should also be perceptible in synthetic speech that mimics human speech,” said Tamuri. “You sense the importance of how speech sounds every day, for instance in phone conversations, where all communication happens through sound alone.”
The author had two aims in her dissertation: to determine the acoustic expression of three emotions (joy, sadness and anger) in speech read out in Estonian; and to create emotional speech acoustic models for an Estonian speech synthesiser based on the results. Since the expression of emotions differs between languages, both aims required separate study.
To create the models, it was necessary to identify whether and to what extent emotions affect the values of acoustic parameters – such as pitch, intensity and speech tempo – and which parameters distinguish the emotions both from one another and from neutral speech. The aim set in the thesis was partially achieved: the synthesiser expressed anger and sadness satisfactorily, but the acoustic models of joy did not allow it to express that emotion recognisably.
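The modelling idea described above – scaling a neutral utterance's acoustic parameters by emotion-specific factors – can be sketched as follows. This is a minimal illustration only: the emotion names match the dissertation, but the scaling factors and parameter values are invented for the example and are not the dissertation's actual models.

```python
# Rule-based sketch of acoustic emotion modelling. Each emotion scales
# three prosodic parameters of neutral speech: pitch (F0 in Hz),
# intensity (dB) and speech tempo (syllables per second).
# NOTE: all numeric factors below are illustrative assumptions.
EMOTION_MODELS = {
    "neutral": {"pitch": 1.00, "intensity": 1.00, "tempo": 1.00},
    "anger":   {"pitch": 1.15, "intensity": 1.10, "tempo": 1.05},
    "sadness": {"pitch": 0.90, "intensity": 0.95, "tempo": 0.85},
    "joy":     {"pitch": 1.20, "intensity": 1.05, "tempo": 1.10},
}

def apply_emotion(baseline: dict, emotion: str) -> dict:
    """Scale neutral prosody parameters by the emotion's factors."""
    factors = EMOTION_MODELS[emotion]
    return {param: round(value * factors[param], 2)
            for param, value in baseline.items()}

# Neutral baseline for one utterance (illustrative values).
neutral = {"pitch": 120.0, "intensity": 65.0, "tempo": 5.0}
print(apply_emotion(neutral, "sadness"))
# → {'pitch': 108.0, 'intensity': 61.75, 'tempo': 4.25}
```

In this toy version, sadness lowers pitch and intensity and slows the tempo, while anger raises all three – the research question is precisely which such parameter changes make the emotion recognisable to listeners.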
“The problem was not recognising joy in Estonian human speech, since there is nothing difficult about expressing or perceiving joy,” Tamuri explained. “The difficulty lay in the speech synthesis itself. The situation could be improved by expanding the search space of the parameters, that is, varying them over a wider range than in this research. Emotional synthetic speech should also definitely be tried with machine learning methods. The study of the acoustics of emotion and the modelling of emotions are still in their early stages, and there is room for improvement.”
The synthetic clips can be heard on the website of the Institute of the Estonian Language (EKI).
Additional information: Kairi Tamuri, dissertation author, (+372) 50 65 572, kairi.tamuri [ät] eki.ee