Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example)
https://doi.org/10.18255/1818-1015-2025-1-80-94
Abstract
About the Authors
Valentin Y. Mamedov
Russian Federation
Danil A. Kovalevsky
Russian Federation
Dmitry A. Morozov
Russian Federation
Stepan S. Stolyarov
Russian Federation
Sergey S. Ospichev
Russian Federation
For citations:
Mamedov V.Y., Kovalevsky D.A., Morozov D.A., Stolyarov S.S., Ospichev S.S. Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example). Modeling and Analysis of Information Systems. 2025;32(1):80-94. (In Russ.) https://doi.org/10.18255/1818-1015-2025-1-80-94