Modeling and Analysis of Information Systems

Hierarchical multi-task learning methodology for ERNIE-3-Type neural networks in Russian-language text analysis and generation

https://doi.org/10.18255/1818-1015-2025-3-282-297

Abstract

The article presents a methodology for hierarchical multi-task learning of neural networks inspired by the ERNIE 3.0 architecture, together with its experimental validation on the FRED-T5 model for Russian-language text analysis and generation. Hierarchical multi-task learning is a promising approach to building universal language models that can efficiently solve a wide range of natural language processing (NLP) tasks. The proposed methodology combines specialized encoder blocks for natural language understanding (NLU) tasks with a shared decoder for natural language generation (NLG) tasks, which improves model quality while reducing computational costs. The paper reports a comparative evaluation of the methodology on the open Russian SuperGLUE benchmark using the pre-trained Russian-language model FRED-T5-1.7B. Experimental results confirm a significant improvement in model quality in both zero-shot and few-shot scenarios compared to the baseline configuration. The paper also discusses practical applications of the approach in real NLP tasks and gives recommendations for further development of the methodology and its integration into applied systems for processing Russian-language texts.
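The methodology summarized above attaches specialized encoder blocks for NLU tasks to a single decoder shared by all NLG outputs. Below is a minimal PyTorch sketch of that configuration, intended only as an illustration of the idea: the module names, layer counts, task labels, and toy dimensions are assumptions of this sketch, and in the paper the shared trunk and decoder come from the pre-trained FRED-T5-1.7B model rather than the small randomly initialized Transformer used here.

import torch
import torch.nn as nn


class TaskBlock(nn.Module):
    """Hypothetical task-specific encoder block (one per NLU task)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, x):
        return self.layer(x)


class HierarchicalMultiTaskModel(nn.Module):
    """Shared encoder trunk + per-task encoder blocks + one shared decoder."""

    def __init__(self, vocab_size: int, d_model: int, task_names):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shared encoder trunk (stands in for the pre-trained FRED-T5 encoder).
        self.shared_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, 8, batch_first=True), num_layers=4
        )
        # Specialized blocks, one per NLU task (e.g. Russian SuperGLUE tasks).
        self.task_blocks = nn.ModuleDict({t: TaskBlock(d_model) for t in task_names})
        # Single decoder shared by every task, so all outputs are generated text.
        self.shared_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, 8, batch_first=True), num_layers=4
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, task: str):
        memory = self.shared_encoder(self.embed(src_ids))
        memory = self.task_blocks[task](memory)   # task-specific refinement
        hidden = self.shared_decoder(self.embed(tgt_ids), memory)
        return self.lm_head(hidden)               # token logits for generation


# Usage: batches from different tasks are routed through their own encoder
# block but decoded by the same shared decoder.
model = HierarchicalMultiTaskModel(
    vocab_size=32000, d_model=256, task_names=["rcb", "danetqa", "generation"]
)
src = torch.randint(0, 32000, (2, 16))
tgt = torch.randint(0, 32000, (2, 12))
logits = model(src, tgt, task="danetqa")          # shape: (2, 12, 32000)

During multi-task training, mini-batches from the different tasks would be alternated: the shared trunk and decoder are updated on every step, while a task-specific block is updated only when its task is sampled.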

About the Authors

Ekaterina V. Totmina
Novosibirsk National Research State University
Russian Federation


Ivan Bondarenko
Novosibirsk National Research State University
Russian Federation


Aleksandr V. Seredkin
Novosibirsk National Research State University
Russian Federation


References

1. T. Brown et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.

2. H. Touvron et al., “LLaMA: Open and Efficient Foundation Language Models,” arXiv preprint, 2023.

3. A. Chowdhery et al., “PaLM: Scaling language modeling with pathways,” Journal of Machine Learning Research, vol. 24, no. 240, pp. 1–113, 2023.

4. C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of Machine Learning Research, vol. 21, no. 140, pp. 1–67, 2020.

5. Y. Zhu et al., “Can Large Language Models Understand Context?,” in Findings of the Association for Computational Linguistics: EACL 2024, 2024, pp. 2004–2018.

6. D. Khurana, A. Koli, K. Khatter, and S. Singh, “Natural language processing: state of the art, current trends and challenges,” Multimedia Tools and Applications, vol. 82, pp. 3713–3744, 2023, doi: 10.1007/s11042-022-13428-4.

7. D. Hupkes et al., “A taxonomy and review of generalization research in NLP,” Nature Machine Intelligence, vol. 5, pp. 1161–1174, 2023, doi: 10.1038/s42256-023-00729-y.

8. P. P. Ray, “ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope,” Internet of Things and Cyber-Physical Systems, vol. 3, pp. 121–154, 2023.

9. Y. Yang and Z. Xue, “Training Heterogeneous Features in Sequence to Sequence Tasks: Latent Enhanced Multi-filter Seq2Seq Model,” in Intelligent Systems and Applications, 2023, pp. 103–117.

10. Y. Sun et al., “ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation,” arXiv preprint, 2021.

11. D. Zmitrovich et al., “A Family of Pretrained Transformer Language Models for Russian,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 507–524.

12. M. Song and Y. Zhao, “Enhance RNNLMs with Hierarchical Multi-Task Learning for ASR,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6102–6106.

13. A. Vaswani et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems, 2017, vol. 30, pp. 5998–6008.

14. V. Sanh et al., “Multitask Prompted Training Enables Zero-Shot Task Generalization,” arXiv preprint, 2022.

15. Y. Tay et al., “UL2: Unifying Language Learning Paradigms,” arXiv preprint, 2023.

16. Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum Learning,” in Proceedings of the 26th International Conference on Machine Learning, 2009, pp. 41–48.

17. I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, “Cross-stitch networks for multi-task learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3994–4003.

18. Y. Sun et al., “ERNIE 2.0: A continual pre-training framework for language understanding,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 05, pp. 8968–8975.

19. L. Xue et al., “mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 483–498, doi: 10.18653/v1/2021.naacl-main.41.

20. S. Wang et al., “ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation,” arXiv preprint, 2021.

21. J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych, “AdapterFusion: Non-Destructive Task Composition for Transfer Learning,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 487–503, doi: 10.18653/v1/2021.eacl-main.39.

22. W. Fedus, B. Zoph, and N. Shazeer, “Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity,” Journal of Machine Learning Research, vol. 23, no. 120, pp. 1–39, 2022.

23. S. Longpre et al., “The Flan Collection: Designing data and methods for effective instruction tuning,” in Proceedings of the International Conference on Machine Learning, 2023, pp. 22631–22648.

24. N. Houlsby et al., “Parameter-efficient transfer learning for NLP,” in Proceedings of the International Conference on Machine Learning, 2019, pp. 2790–2799.

25. H. A. A. Al-Khamees, M. E. Manaa, Z. H. Obaid, and N. A. Mohammedali, “Implementing Cyclical Learning Rates in Deep Learning Models for Data Classification,” in Proceedings of the International Conference on Forthcoming Networks and Sustainability in the AIoT Era, 2024, pp. 205–215.

26. A. Koloskova, H. Hendrikx, and S. U. Stich, “Revisiting gradient clipping: Stochastic bias and tight convergence guarantees,” in Proceedings of the International Conference on Machine Learning, 2023, pp. 17343–17363.



For citations:


Totmina E.V., Bondarenko I., Seredkin A.V. Hierarchical multi-task learning methodology for ERNIE-3-Type neural networks in Russian-language text analysis and generation. Modeling and Analysis of Information Systems. 2025;32(3):282-297. (In Russ.) https://doi.org/10.18255/1818-1015-2025-3-282-297

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)