Date of Award
1-1-2019
Language
English
Document Type
Master's Thesis
Degree Name
Master of Science (MS)
College/School/Department
Department of Electrical and Computer Engineering
Content Description
1 online resource (vii, 49 pages) : illustrations (chiefly color)
Dissertation/Thesis Chair
Gary J Saulnier
Committee Members
Daphney-Stavroula Zois, Mohammed Agamy
Keywords
Deep Learning, Emotion, LSTM, Machine Learning, Multimodal emotion, Emotion recognition, Interactive multimedia, Computer software, Speech processing systems
Subject Categories
Artificial Intelligence and Robotics | Computer Engineering | Psychology
Abstract
Emotion forecasting is the task of predicting the future emotion of a speaker, i.e., the emotion label of the future speaking turn, based on the speaker's past and current audio-visual cues. Emotion forecasting systems require new problem formulations that differ from traditional emotion recognition systems. In this thesis, we first explore two types of forecasting windows (i.e., analysis windows for which the speaker's emotion is being forecasted): utterance forecasting and time forecasting. Utterance forecasting is based on speaking turns and forecasts what the speaker's emotion will be after one, two, or three speaking turns. Time forecasting forecasts what the speaker's emotion will be after a certain range of time, such as 3–8, 8–13, and 13–18 seconds. We then investigate the benefit of using the past audio-visual cues in addition to the current utterance. We design emotion forecasting models using deep learning. We compare the performance of FC-DNN, D-LSTM, and D-BLSTM models, which allows us to examine the benefit of modeling dynamic patterns in emotion forecasting tasks. Our experimental results on the IEMOCAP benchmark dataset demonstrate that D-BLSTM and D-LSTM outperform FC-DNN by up to 2.42% in unweighted recall. When using both the current and past utterances, the deep dynamic models show an improvement of up to 2.39% compared to their performance when using only the current utterance. We further analyze the benefit of using current and past utterance information compared to using the current utterance together with a randomly chosen utterance, and we find the performance improvement rises to 7.53%. The novelty of this study lies in its formulation of emotion forecasting problems and in its analysis of how current and past audio-visual cues reveal future emotional information.
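To make the model comparison and evaluation metric described in the abstract concrete, the following is a minimal sketch (not the author's code) of a bidirectional LSTM classifier for utterance-level emotion forecasting and of the unweighted recall metric. It assumes pre-extracted audio-visual feature sequences; the feature dimension, hidden sizes, and four-class label set are illustrative assumptions only.

```python
# Hypothetical sketch of a D-BLSTM-style forecaster and unweighted recall;
# sizes and class count are assumptions, not the thesis configuration.
import torch
import torch.nn as nn

class DBLSTMForecaster(nn.Module):
    def __init__(self, feat_dim=100, hidden=128, num_classes=4):
        super().__init__()
        # Stacked bidirectional LSTM layers model the temporal dynamics
        # of the current (and optionally past) utterance features.
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        # A fully connected layer maps the sequence summary to
        # emotion-class logits for the forecasted speaking turn.
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim) frame-level feature sequence
        seq_out, _ = self.blstm(x)
        # Use the last time step's output as the utterance summary.
        return self.out(seq_out[:, -1, :])

def unweighted_recall(preds, labels, num_classes=4):
    # Unweighted recall: the mean of per-class recalls, so every emotion
    # class contributes equally regardless of how often it occurs.
    recalls = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            recalls.append((preds[mask] == c).float().mean())
    return torch.stack(recalls).mean()

# Example usage with random data:
model = DBLSTMForecaster()
x = torch.randn(8, 50, 100)            # 8 utterances, 50 frames, 100-dim features
preds = model(x).argmax(dim=1)
print(unweighted_recall(preds, torch.randint(0, 4, (8,))))
```

An FC-DNN baseline in the same setting would replace the recurrent layers with feed-forward layers over pooled (time-averaged) features, which is what makes the comparison a test of whether modeling temporal dynamics helps forecasting.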
Recommended Citation
Shahriar, Sadat, "Emotion forecasting in dyadic conversation : characterizing and predicting future emotion with audio-visual information using deep learning" (2019). Legacy Theses & Dissertations (2009 - 2024). 2379.
https://scholarsarchive.library.albany.edu/legacy-etd/2379
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Psychology Commons