Modeling and Analysis of Information Systems

The impact of different prompt types on the quality of automatic assessment of student answers by artificial intelligence models

https://doi.org/10.18255/1818-1015-2025-4-396-416

Abstract

Artificial intelligence (AI) models can fully or partially automate the assessment of student assignments, making assessment methods more accurate and objective. The performance of such models depends not only on the underlying algorithms and training data but also on how effectively the prompts submitted to them are formulated. The aim of this work is to investigate whether openly available AI models can evaluate students' answers for compliance with a teacher's reference answer, and whether prompt engineering can improve the quality of this task. Quality was measured by the statistical characteristics of classifying answer texts into four categories (correct, partially correct, incorrect, off-topic) with generative AI (GAI) models using four prompt variants: a simple prompt, a role-playing prompt, a chain-of-thought prompt, and a prompt generated by an AI model. Openly available models were selected for the study: ChatGPT o3-mini, DeepSeek V3, Mistral-Small-3.1-24B-Instruct-2503-IQ4_XS, and Grok 3. The models were tested on a corpus of student texts collected by teachers of P.G. Demidov Yaroslavl State University, comprising 507 answers to 8 questions. The best assessment quality was shown by the ChatGPT o3-mini model with the prompt it generated itself: accuracy reached 0.82, the mean squared error (MSE) was 0.2, and the F-score reached 0.8, demonstrating the potential of GAI not only as an assessment tool but also as a means of automatically generating instructions. Fleiss' kappa was used to assess the consistency of the model's responses across 10 identical queries; for this model-prompt pair it ranged from 0.48 for complex questions to 0.69 for simple questions.
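For orientation, the following minimal Python sketch shows how metrics of this kind (accuracy, weighted F-score, MSE over a numeric score per category, and Fleiss' kappa across repeated queries) could be computed for the four-category grading task. The category encoding, the numeric score mapping, the toy data, and the use of scikit-learn and statsmodels are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code) of the reported evaluation metrics.
# Assumed category encoding: 0 = off-topic, 1 = incorrect, 2 = partially correct, 3 = correct.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

teacher_labels = np.array([3, 2, 1, 3, 0, 2, 3, 1])   # reference grades (toy data)
model_labels   = np.array([3, 2, 1, 2, 0, 2, 3, 3])   # one model-prompt run (toy data)

print("accuracy:", accuracy_score(teacher_labels, model_labels))
print("weighted F1:", f1_score(teacher_labels, model_labels, average="weighted"))

# MSE is computed here on an assumed numeric score per category (grade points).
score = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0}
print("MSE:", mean_squared_error([score[c] for c in teacher_labels],
                                 [score[c] for c in model_labels]))

# Consistency over 10 identical queries: rows = answers, columns = repeated runs.
repeated_runs = np.array([
    [3, 3, 3, 3, 2, 3, 3, 3, 3, 3],
    [2, 2, 1, 2, 2, 2, 2, 1, 2, 2],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
])
table, _ = aggregate_raters(repeated_runs)   # per-answer counts for each category
print("Fleiss' kappa:", fleiss_kappa(table, method="fleiss"))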

About the Authors

Ivan A. Meshcheryakov
P.G. Demidov Yaroslavl State University
Russian Federation


Nadezhda S. Lagutina
P.G. Demidov Yaroslavl State University
Russian Federation




For citations:


Meshcheryakov I.A., Lagutina N.S. The impact of different prompt types on the quality of automatic assessment of student answers by artificial intelligence models. Modeling and Analysis of Information Systems. 2025;32(4):396-416. (In Russ.) https://doi.org/10.18255/1818-1015-2025-4-396-416



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)