Global-Local-Feature-Fused Driver Speech Emotion Detection for Intelligent Cockpit in Automated Driving

Affective interaction between the intelligent cockpit and humans is an emerging topic full of opportunities. Robust recognition of the driver's emotions is the first step toward affective interaction, and recognizing emotions from the driver's speech gives the intelligent cockpit a wide range of potential technical applications. In this paper, the authors first proposed a multi-feature-fusion, parallel-structure speech emotion recognition network that complementarily fuses the utterance-level global acoustic features and the local spectral features of the speech. Second, the authors designed and conducted speech data collection under induced driver emotions and established a driver speech emotion (SpeechEmo) dataset recorded in a dynamic driving environment with 40 participants. Finally, the proposed model was validated on SpeechEmo and on public datasets, and a quantitative analysis was carried out. The proposed model achieved advanced recognition performance, and ablation experiments verified the importance of the model's different components. The proposed model and dataset can benefit the realization of human-vehicle affective interaction in future intelligent cockpits, toward a better human experience.
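
As a rough illustration of the parallel global-local fusion idea described in the abstract, the following minimal PyTorch sketch combines an utterance-level global acoustic feature vector (assumed here to be an 88-dimensional eGeMAPS-style descriptor) with a CNN embedding of local log-mel spectral features via simple concatenation. The branch depths, layer widths, fusion method, and four-class output are illustrative assumptions, not the authors' published architecture.

# Minimal sketch of a parallel global/local fusion network for speech
# emotion recognition. All feature dimensions, layer sizes, and the
# concatenation-based fusion are assumptions for illustration only.
import torch
import torch.nn as nn

class GlobalLocalFusionSER(nn.Module):
    def __init__(self, n_global=88, n_classes=4):
        super().__init__()
        # Global branch: utterance-level acoustic statistics
        # (e.g., an eGeMAPS-style 88-dim vector) -> MLP embedding.
        self.global_branch = nn.Sequential(
            nn.Linear(n_global, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Local branch: log-mel spectrogram -> CNN embedding of
        # local time-frequency patterns.
        self.local_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool time/frequency to 1x1
            nn.Flatten(),             # -> 32-dim local embedding
        )
        # Fusion head: concatenate both embeddings, then classify.
        self.classifier = nn.Linear(128 + 32, n_classes)

    def forward(self, x_global, x_spec):
        # x_global: (batch, n_global); x_spec: (batch, 1, mels, frames)
        g = self.global_branch(x_global)
        l = self.local_branch(x_spec)
        return self.classifier(torch.cat([g, l], dim=1))

# Example forward pass with dummy inputs (batch of 2 utterances,
# 64 mel bands, 300 frames).
model = GlobalLocalFusionSER()
logits = model(torch.randn(2, 88), torch.randn(2, 1, 64, 300))
print(logits.shape)  # torch.Size([2, 4])

Concatenation is used here only as the simplest complementary fusion; the paper's actual fusion mechanism may differ.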

Language

  • English

Filing Info

  • Accession Number: 01897214
  • Record Type: Publication
  • Files: TRIS
  • Created Date: Oct 23 2023 4:52PM