Modeling and Analysis of Information Systems

Multimodal Data Analysis in Emotion Recognition: A Review

https://doi.org/10.18255/1818-1015-2025-3-252-281

Abstract

The use of multimodal data in emotion recognition systems holds enormous potential for applications across many domains: healthcare, human-machine interfaces, operator state monitoring, and marketing. Until recently, the development of emotion recognition systems based on multimodal data was held back by insufficient computing power. With the advent of high-performance GPU-based systems and the development of efficient deep neural network architectures, however, research aimed at using multiple modalities, such as audio, video, and physiological signals, to accurately identify human emotions has surged. Physiological data collected by wearable devices have also come to play a significant role, owing to the relative ease of their collection and the accuracy they make attainable. This article reviews architectures and methods for applying deep neural networks to the analysis of multimodal data with the goal of improving the accuracy and reliability of emotion recognition systems, and presents current approaches to implementing such algorithms as well as existing open multimodal datasets.
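The full text of the article is not reproduced on this page, so the sketch below is only a generic illustration of the encode-then-fuse pattern the abstract alludes to: each modality (audio, video, physiological signals) gets its own encoder, and the resulting embeddings are combined by a shared classification head. It is written in PyTorch; all class names, layer widths, per-modality feature sizes, and the seven-class emotion set are hypothetical placeholders, not details taken from the article.

```python
# Illustrative late-fusion baseline for multimodal emotion recognition.
# Every dimension, name, and the number of emotion classes here is a
# hypothetical placeholder, not the architecture of the reviewed paper.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Maps one modality's feature vector to a shared embedding size."""

    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class LateFusionEmotionClassifier(nn.Module):
    """Concatenates per-modality embeddings and predicts an emotion class."""

    def __init__(self, feature_dims: dict, num_emotions: int = 7, emb_dim: int = 128):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(dim, emb_dim) for name, dim in feature_dims.items()}
        )
        self.head = nn.Sequential(
            nn.Linear(emb_dim * len(feature_dims), 64),
            nn.ReLU(),
            nn.Linear(64, num_emotions),
        )

    def forward(self, inputs: dict) -> torch.Tensor:
        # Iterate over the ModuleDict so the concatenation order is fixed,
        # regardless of the key order of the incoming batch dict.
        embeddings = [encoder(inputs[name]) for name, encoder in self.encoders.items()]
        return self.head(torch.cat(embeddings, dim=-1))


if __name__ == "__main__":
    # Hypothetical per-modality feature sizes: MFCC statistics for audio,
    # a face-embedding vector for video, EEG band powers for "physio".
    model = LateFusionEmotionClassifier({"audio": 40, "video": 512, "physio": 32})
    batch = {
        "audio": torch.randn(4, 40),
        "video": torch.randn(4, 512),
        "physio": torch.randn(4, 32),
    }
    logits = model(batch)
    print(logits.shape)  # torch.Size([4, 7]): one score per emotion class
```

In practice the per-modality MLPs would be replaced by modality-specific backbones (e.g., CNNs or transformers), and simple concatenation is often replaced by attention-based fusion, but the overall encode-then-fuse structure stays the same.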

About the Authors

Daniil A. Berdyshev
Lomonosov Moscow State University
Russia


Alexey G. Shishkin
Lomonosov Moscow State University
Russia



For citation:


Berdyshev D.A., Shishkin A.G. Multimodal data analysis in emotion recognition: a review. Modeling and Analysis of Information Systems. 2025;32(3):252-281. (In Russ.) https://doi.org/10.18255/1818-1015-2025-3-252-281


Content is available under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)