A survey of models for automatic assessment of similarity of student's answer to the reference answer
https://doi.org/10.18255/1818-1015-2025-1-42-65
Abstract
The development of automatic grading systems is a relevant task aimed at reducing teachers' routine workload and speeding up feedback to students. This survey covers research on the automatic assessment of student answers against a teacher's reference answer. The authors analyze text models used for automatic short answer grading (ASAG) and automated essay scoring (AES). Several approaches to the related task of text similarity detection are also considered, since its methods can be useful for analyzing student answers as well. Text models fall into several broad categories. The first comprises linguistic models based on a variety of stylometric features, from simple ones such as bag of words and n-grams to complex syntactic and semantic ones. The second category includes neural network models based on various embeddings; within it, large language models stand out as universal, popular, and high-quality modeling methods. The third category covers combined models that unite linguistic features with neural embeddings. A comparison of recent studies by models, methods, and quality metrics shows that trends in this domain match those in computational linguistics as a whole. Many authors choose large language models for their tasks, yet standard features remain in demand. No universal approach can be singled out: each subtask requires its own choice of method and tuning of its parameters. Combined and ensemble approaches achieve higher quality than the other methods. The vast majority of studies deal with English-language texts, although successful results for national languages are also reported. We conclude that developing and adapting methods for assessing student answers in national languages is a relevant and promising task.
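As a minimal illustration of the first category, the sketch below scores student answers against a reference answer by cosine similarity over TF-IDF-weighted word uni- and bigrams. This is our own hedged example, assuming scikit-learn; the sample texts are invented and the code does not reproduce any specific system from the surveyed works.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "Photosynthesis converts light energy into chemical energy stored in glucose."
answers = [
    "Plants turn light into chemical energy and store it as glucose.",  # close to the reference
    "Photosynthesis is when plants absorb water through their roots.",  # off-topic
]

# Bag of words with uni- and bigrams, TF-IDF weighted: a simple linguistic text model.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
matrix = vectorizer.fit_transform([reference] + answers)

# Cosine similarity of each student answer to the reference answer.
scores = cosine_similarity(matrix[0], matrix[1:])[0]
for answer, score in zip(answers, scores):
    print(f"{score:.2f}  {answer}")
```

Such lexical-overlap scores are cheap and transparent, but they miss paraphrases that share no surface vocabulary with the reference, which motivates the embedding-based category.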
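For the second category, an equally minimal sketch compares sentence embeddings of the reference and student answers. The sentence-transformers library and the multilingual model name here are assumptions chosen for illustration (a multilingual model allows scoring answers in national languages such as Russian); any embedding model from the surveyed works could be substituted.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed model for illustration; any sentence-embedding model can be used instead.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

reference = "Фотосинтез преобразует энергию света в химическую энергию глюкозы."
answer = "Растения превращают световую энергию в химическую и запасают её в глюкозе."

# Embed both texts and compare them with cosine similarity.
embeddings = model.encode([reference, answer], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cosine similarity: {similarity:.2f}")
```

A grade can then be derived from the similarity score, for example by thresholding or by training a regressor on teacher-graded answers; the combined models of the third category concatenate such embedding similarities with linguistic features before regression.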
Keywords
MSC2020: 68T50
About the authors
Nadezhda Stanislavovna Lagutina
Russia
Ksenia Vladimirovna Lagutina
Russia
For citation:
Lagutina N.S., Lagutina K.V. A survey of models for automatic assessment of similarity of student's answer to the reference answer. Modeling and Analysis of Information Systems. 2025;32(1):42-65. (In Russ.) https://doi.org/10.18255/1818-1015-2025-1-42-65