Vol 29, No 4 (2022)
Algorithms
pp. 366-371
Abstract
We consider the question of which graphs are isomorphic to the reachability graphs of Petri nets. Reachability graphs, or sets of reachable states, represent the sets of all distinct net states that can be obtained from a given initial state s0 by a finite sequence of enabled transitions. They naturally form a directed graph with a distinguished initial state from which all other states are reachable along directed paths. If the transitions of the net are labeled, the arcs of the reachability graph also receive the corresponding labels. This gives rise to the notion of isomorphism of labeled graphs, but this paper deals only with nets without labels. Even in this simpler case, some questions remain open. The paper proves that any finite directed graph can be modeled by a suitable Petri net, that is, it is isomorphic to the reachability graph of that net. For infinite graphs, examples of graphs that cannot be modeled are given.
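The forward construction behind the objects studied here can be illustrated with a minimal Python sketch. It assumes a net given as hypothetical `pre`/`post` dictionaries (the number of tokens each transition consumes from and produces in each place) and builds the reachability graph from s0 by breadth-first search. It is a standard textbook construction, not the paper's proof, which goes in the opposite direction (from a given graph to a net), and it terminates only when the set of reachable markings is finite. Arcs are tagged with transition names; for the unlabeled nets considered in the paper, those tags can simply be ignored.

```python
from collections import deque

def reachability_graph(pre, post, s0):
    """Build the reachability graph of a Petri net by breadth-first search.

    pre[t] / post[t]: tuples giving the tokens transition t consumes from /
    produces in each place (hypothetical encoding). s0: initial marking.
    Terminates only if the set of reachable markings is finite.
    """
    n_places = len(s0)
    start = tuple(s0)
    states = {start}
    arcs = []                      # (marking, transition, successor marking)
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for t in pre:
            # t is enabled if every place holds at least the tokens t consumes
            if all(m[p] >= pre[t][p] for p in range(n_places)):
                m2 = tuple(m[p] - pre[t][p] + post[t][p] for p in range(n_places))
                arcs.append((m, t, m2))
                if m2 not in states:
                    states.add(m2)
                    queue.append(m2)
    return states, arcs
```

For example, a two-place net in which transition `a` moves a token from the first place to the second and `b` moves it back, started from s0 = (1, 0), yields two states and two arcs, i.e. a directed 2-cycle.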
pp. 372-387
Abstract
In this paper, we study undirected multiple graphs of any natural multiplicity k > 1. There are edges of three types: ordinary edges, multiple edges, and multi-edges. Each edge of the last two types is a union of k linked edges, which connect 2 or (k + 1) vertices, respectively. The linked edges must be used simultaneously. If a vertex is incident to a multiple edge, it can also be incident to other multiple edges, and it can be the common end of the k linked edges of some multi-edge. If a vertex is the common end of some multi-edge, it cannot be the common end of another multi-edge. Divisible multiple graphs are characterized by the possibility of dividing the graph into k parts that are adjusted on the linked edges and have no common edges. Each part is an ordinary graph. As for an ordinary graph, one can define an integer edge-length function for a multiple graph and pose the problem of the shortest path joining two vertices. Any multiple path is a union of k ordinary paths that are adjusted on the linked edges of all multiple edges and multi-edges. We show that the shortest path problem is polynomially solvable for divisible multiple graphs and formulate the corresponding polynomial algorithm. We also propose a modification of the algorithm for arbitrary multiple graphs; this modification has exponential complexity in the parameter k.
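To make the three edge types concrete, here is a minimal Python sketch of one possible representation; the class and field names are hypothetical, not taken from the paper. Each multiple edge or multi-edge is stored as a single record standing for its k linked edges, and only the structural rules stated in the abstract are checked. Edge lengths and the paper's shortest-path algorithm are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class MultipleGraph:
    """Hypothetical container for an undirected multiple graph of multiplicity k."""
    k: int                                        # multiplicity, must be > 1
    ordinary: list = field(default_factory=list)  # (u, v): one ordinary edge
    multiple: list = field(default_factory=list)  # (u, v): k linked edges joining u and v
    multi: list = field(default_factory=list)     # (c, (v1, ..., vk)): k linked edges with common end c

    def is_well_formed(self):
        """Check the structural rules from the abstract (lengths are not modeled)."""
        if self.k <= 1:
            return False
        for u, v in self.multiple:
            if u == v:  # a multiple edge connects 2 vertices
                return False
        centers = set()
        for c, ends in self.multi:
            # a multi-edge consists of k linked edges and connects k + 1 vertices
            if len(ends) != self.k or len({c, *ends}) != self.k + 1:
                return False
            # a vertex can be the common end of at most one multi-edge
            if c in centers:
                return False
            centers.add(c)
        return True
```

Note that a vertex may still be incident to several multiple edges and be the common end of one multi-edge; the sketch rejects only a second multi-edge sharing the same common end.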
Theory of Data
pp. 286-314
Abstract
The modern educational process involves the use of electronic learning environments. These are special information systems that serve both as a repository of educational materials and as a tool for conducting tests, collecting homework, keeping a grade book, and collaborating. Such environments produce large amounts of data recording the behavior of students and teachers within the educational process. This paper proposes an approach for analyzing such data and discovering typical student trajectories that lead to successful or unsuccessful learning outcomes. We show how process mining can be used to build models of the educational process from the available data, and how to evaluate the extent to which a synthesized model reflects the actual behavior of the system recorded in event logs. The paper contains not only a description of the proposed approach but also a case study applying it to a real data set from an undergraduate educational program. The case study shows how the approach can reveal the factors that lead to successful and unsuccessful student trajectories. Bottlenecks in the educational process were identified, as well as errors in the data indicating incorrect operation of the system. The analysis also identified points requiring special attention from administrators of the educational program, as well as signal events whose appearance in a student's individual trajectory can serve as an alarm. The approach relies on free open-source software, which further facilitates its deployment in a variety of educational organizations.
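The abstract does not name the discovery algorithm or the open-source tools used. As a hedged illustration of the kind of data process mining starts from, the minimal Python sketch below counts the directly-follows relation (how often one activity immediately follows another), a basic structure that many discovery algorithms build on, and splits cases by outcome so that successful and unsuccessful trajectories can be compared. The event-log encoding, activity names, and success predicate are all hypothetical.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b.

    `log` maps a case id (e.g. a student) to that case's ordered activity list.
    """
    dfg = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def split_by_outcome(log, is_success):
    """Partition cases into successful and unsuccessful ones via a caller-supplied predicate."""
    good = {case: trace for case, trace in log.items() if is_success(trace)}
    bad = {case: trace for case, trace in log.items() if not is_success(trace)}
    return good, bad
```

Comparing `directly_follows(good)` with `directly_follows(bad)` shows which transitions occur mainly in unsuccessful trajectories; evaluating how well a discovered model fits the log (conformance checking) is a separate step not shown here.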
Anna Valerevna Glazkova,
Olga Vladimirovna Zakharova,
Anton Viktorovich Zakharov,
Natalya Nikolayevna Moskvina,
Timur Ruslanovich Enikeev,
Arseniy Nikolaevich Hodyrev,
Vsevolod Konstantinovich Borovinskiy,
Irina Nikolayevna Pupysheva
pp. 316-332
Abstract
The paper addresses the task of detecting mentions of green practices in social media texts. The task is motivated by the need to expand existing knowledge about the use and spread of green practices in society. The paper uses a corpus of texts published in environmental communities of the VKontakte social network. The corpus carries expert annotation of mentions of nine types of green practices. We propose a semi-automatic approach to collecting additional texts in order to reduce class imbalance in the corpus. The approach includes the following steps: detecting the most frequent words for each practice type; automatically collecting social media texts that contain the detected frequent words; and expert verification and filtering of the collected texts. Four machine learning models are compared for detecting mentions of green practices on two versions of the corpus: the original one and one augmented using the proposed approach. Among these models, Conversational RuBERT fine-tuned on the augmented corpus achieved the highest averaged F1-score (81.32%). The Conversational RuBERT model was chosen for implementing the application prototype. The main function of the prototype is to detect mentions of the nine types of green practices in a text. The prototype is implemented as a Telegram chatbot.
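The first two steps of the collection approach can be sketched in Python as follows. This is a minimal sketch under stated assumptions: tokenization is a simple Unicode-aware regex with lowercasing, the `stopwords` parameter is hypothetical, and no lemmatization is done (real Russian texts would likely need it). The third step, expert verification and filtering, is manual and is not modeled.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; the Unicode-aware regex also matches Cyrillic letters."""
    return re.findall(r"\w+", text.lower())

def top_words(texts, n, stopwords=frozenset()):
    """Step 1: the n most frequent words in the texts of one practice type."""
    counts = Counter(w for text in texts for w in tokens(text) if w not in stopwords)
    return [w for w, _ in counts.most_common(n)]

def collect_candidates(pool, keywords):
    """Step 2: keep texts from a pool that contain at least one detected frequent word."""
    keys = set(keywords)
    return [text for text in pool if keys.intersection(tokens(text))]
```

In this sketch, `top_words` would be run separately on the annotated texts of each practice type, and the resulting candidates would then go to experts for verification.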
pp. 334-347
Abstract
The article investigates modern vector text models for genre classification of Russian-language texts. The models include ELMo embeddings, the pre-trained BERT language model, and a set of numerical rhythm features based on lexico-grammatical features. The experiments were carried out on a corpus of 10,000 texts in five genres: novels, scientific articles, reviews, posts from the VKontakte social network, and news from OpenCorpora. Visualization and statistical analysis of the rhythm features identified both the most rhythmically diverse genres (novels and reviews) and the least diverse one (scientific articles). These genres were subsequently classified best using rhythm features with an LSTM neural network classifier. Clustering and classifying texts by genre with ELMo and BERT embeddings separated the genres with few errors, and the multiclass F-score reached 99%. The study confirms the effectiveness of modern embeddings in computational linguistics tasks and highlights the advantages and limitations of rhythm features for genre classification.
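The abstract reports a multiclass F-score without stating how per-class scores are averaged. Assuming macro-averaging, i.e. the unweighted mean over classes of F1 = 2TP / (2TP + FP + FN), the metric can be computed as in this minimal Python sketch; the genre labels in the test are illustrative only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro-averaging weights every genre equally regardless of its size. A support-weighted average would be the alternative if the paper used it.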
Daniil Dmitrievich Zafievsky,
Nadezhda Stanislavovna Lagutina,
Oksana Andreyevna Melnikova,
Anatoliy Yurievich Poletaev
pp. 348-365
Abstract
This study aims to build an automated model for business writing assessment based on 14 rubrics that integrate EFL teacher assessment frameworks and identify expected performance against various criteria (including language, task fulfillment, content knowledge, register, format, and cohesion). We developed algorithms that compute the corresponding numerical features using automatic text analysis methods and tools. The algorithms are based on syntactic analysis and dictionaries. The model's performance was then evaluated on a corpus of 20 teacher-assessed business letters. Heat maps and UMAP projections were used to compare teachers' and automated score reports. The results showed no significant discrepancies between teachers' and automated score reports, but revealed bias in the teachers' reports. The findings suggest that the developed model is an efficient natural language processing tool with highly interpretable results, a roadmap for further improvement, and a valid and unbiased alternative to teachers' assessment. The results may lay the groundwork for developing automatic student language profiles. Although the model was designed specifically for business letter assessment, it can easily be adapted to other writing tasks, e.g. by replacing the dictionaries.
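As an illustration of what a dictionary-based rubric feature might look like, here is a minimal Python sketch. The dictionaries, feature names, and the simple substring and token matching are all hypothetical, and the syntactic analysis the paper's algorithms rely on is omitted. Passing the dictionaries as parameters mirrors the point that the model can be adapted to other writing tasks by replacing them.

```python
import re

# Hypothetical dictionaries for illustration only; not the paper's actual resources.
SALUTATIONS = ("dear sir or madam", "dear mr", "dear ms")
CLOSINGS = ("yours faithfully", "yours sincerely", "best regards")
INFORMAL = frozenset({"hi", "gonna", "wanna", "thanks", "cheers"})

def letter_features(letter, salutations=SALUTATIONS, closings=CLOSINGS, informal=INFORMAL):
    """Toy format/register features: salutation present, closing present, share of informal words."""
    text = letter.lower()
    words = re.findall(r"[a-z']+", text)
    informal_ratio = sum(w in informal for w in words) / len(words) if words else 0.0
    return {
        "has_salutation": any(s in text for s in salutations),
        "has_closing": any(c in text for c in closings),
        "informal_ratio": informal_ratio,
    }
```

Because each feature is a named, countable quantity, a score built from such features stays interpretable: a teacher can see which dictionary entry triggered it.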
ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)