End-to-End Speech Emotion Recognition Using Multimodal Data Fusion

Authors

  • Rafael Barbosa, Department of Computer Science, Federal University of São João del-Rei, Brazil
  • Renato Costa, Department of Computer Engineering, Pontifical Catholic University of Rio de Janeiro, Brazil

Abstract

This paper presents an end-to-end framework for speech emotion recognition (SER) based on multimodal data fusion. The proposed approach combines acoustic, linguistic, and visual features to improve the accuracy and robustness of emotion recognition. Deep learning models extract and fuse the features from each modality, and the fused representation feeds a unified classification stage. Experiments on benchmark datasets show that the method is effective relative to traditional SER systems.
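
The abstract does not state which fusion strategy is used, so the following is only a minimal sketch of one common option, feature-level fusion by concatenation, followed by a single linear softmax classifier. The function names, the four-class label set, the feature dimensions, and the random weights are all illustrative assumptions and stand in for trained feature extractors and a trained classifier; none of them come from the paper.

```python
import math
import random


def fuse_features(acoustic, linguistic, visual):
    """Feature-level fusion: concatenate the per-modality vectors."""
    return list(acoustic) + list(linguistic) + list(visual)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """One linear layer plus softmax over the emotion classes."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical label set; the paper's actual classes are not given.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

rng = random.Random(0)

# Placeholder embeddings standing in for the outputs of real
# acoustic, linguistic, and visual feature extractors.
acoustic = [rng.gauss(0.0, 1.0) for _ in range(4)]
linguistic = [rng.gauss(0.0, 1.0) for _ in range(3)]
visual = [rng.gauss(0.0, 1.0) for _ in range(2)]

fused = fuse_features(acoustic, linguistic, visual)

# Random weights standing in for a trained classifier.
weights = [[rng.gauss(0.0, 0.1) for _ in fused] for _ in EMOTIONS]
biases = [0.0 for _ in EMOTIONS]

probs = classify(fused, weights, biases)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Concatenation keeps the sketch short; an end-to-end system would typically learn the extractors and the fusion layer jointly rather than use fixed vectors as done here.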

Published

2024-07-18

Section

Articles