Words are important to express ourselves. What we don’t say, however, may be even more instrumental in conveying emotions. Humans can often tell how people around them feel through non-verbal cues embedded in our voice.
Now, researchers in Germany have sought to find out if technical tools, too, can accurately predict emotional undertones in fragments of voice recordings. To do so, they compared three ML models’ accuracy to recognize diverse emotions in audio excepts. Their results were published in Frontiers in Psychology.
“Here we show that machine learning can be used to recognize emotions from audio clips as short as 1.5 seconds,” said the article’s first author Hannes Diemerling, a researcher at the Center for Lifespan Psychology at the Max Planck Institute for Human Development. “Our models achieved an accuracy similar to humans when categorizing meaningless sentences with emotional coloring spoken by actors.”