Date of Award
1-1-2019
Language
English
Document Type
Master's Thesis
Degree Name
Master of Science (MS)
College/School/Department
Department of Electrical and Computer Engineering
Content Description
1 online resource (vii, 49 pages) : illustrations (chiefly color)
Dissertation/Thesis Chair
Gary J Saulnier
Committee Members
Daphney-Stavroula Zois, Mohammed Agamy
Keywords
Deep Learning, Emotion, LSTM, Machine Learning, Multimodal emotion, Emotion recognition, Interactive multimedia, Computer software, Speech processing systems
Subject Categories
Artificial Intelligence and Robotics | Computer Engineering | Psychology
Abstract
Emotion forecasting is the task of predicting the future emotion of a speaker, i.e., the emotion label of the future speaking turn, based on the speaker's past and current audio-visual cues. Emotion forecasting systems require new problem formulations that differ from traditional emotion recognition systems. In this thesis, we first explore two types of forecasting windows (i.e., analysis windows for which the speaker's emotion is being forecasted): utterance forecasting and time forecasting. Utterance forecasting is based on speaking turns and forecasts what the speaker's emotion will be after one, two, or three speaking turns. Time forecasting forecasts what the speaker's emotion will be after a certain range of time, such as 3–8, 8–13, and 13–18 seconds. We then investigate the benefit of using the past audio-visual cues in addition to the current utterance. We design emotion forecasting models using deep learning. We compare the performance of FC-DNN, D-LSTM, and D-BLSTM models, which allows us to examine the benefit of modeling dynamic patterns in emotion forecasting tasks. Our experimental results on the IEMOCAP benchmark dataset demonstrate that D-BLSTM and D-LSTM outperform FC-DNN by up to 2.42% in unweighted recall. When using both the current and past utterances, the deep dynamic models show an improvement of up to 2.39% compared to their performance when using only the current utterance. We further analyze the benefit of using current and past utterance information compared to using the current utterance together with a randomly chosen utterance, and we find the performance improvement rises to 7.53%. The novelty of this study lies in its formulation of emotion forecasting problems and in its analysis of how current and past audio-visual cues reveal future emotional information.
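To make the model comparison and evaluation metric described in the abstract concrete, the following is a minimal sketch (not the author's code) of a bidirectional LSTM classifier for utterance-level emotion forecasting and of the unweighted recall metric. It assumes pre-extracted audio-visual feature sequences; the feature dimension, hidden sizes, and four-class label set are illustrative assumptions only.

```python
# Hypothetical sketch of a D-BLSTM-style forecaster and unweighted recall;
# sizes and class count are assumptions, not the thesis configuration.
import torch
import torch.nn as nn

class DBLSTMForecaster(nn.Module):
    def __init__(self, feat_dim=100, hidden=128, num_classes=4):
        super().__init__()
        # Stacked bidirectional LSTM layers model the temporal dynamics
        # of the current (and optionally past) utterance features.
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        # A fully connected layer maps the sequence summary to
        # emotion-class logits for the forecasted speaking turn.
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim) frame-level feature sequence
        seq_out, _ = self.blstm(x)
        # Use the last time step's output as the utterance summary.
        return self.out(seq_out[:, -1, :])

def unweighted_recall(preds, labels, num_classes=4):
    # Unweighted recall: the mean of per-class recalls, so every emotion
    # class contributes equally regardless of how often it occurs.
    recalls = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            recalls.append((preds[mask] == c).float().mean())
    return torch.stack(recalls).mean()

# Example usage with random data:
model = DBLSTMForecaster()
x = torch.randn(8, 50, 100)            # 8 utterances, 50 frames, 100-dim features
preds = model(x).argmax(dim=1)
print(unweighted_recall(preds, torch.randint(0, 4, (8,))))
```

An FC-DNN baseline in the same setting would replace the recurrent layers with feed-forward layers over pooled (time-averaged) features, which is what makes the comparison a test of whether modeling temporal dynamics helps forecasting.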
Recommended Citation
Shahriar, Sadat, "Emotion forecasting in dyadic conversation : characterizing and predicting future emotion with audio-visual information using deep learning" (2019). Legacy Theses & Dissertations (2009 - 2024). 2379.
https://scholarsarchive.library.albany.edu/legacy-etd/2379
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Psychology Commons