
Modeling and Analysis of Information Systems

Vol 31, No 2 (2024)

Computer System Organization

152-163
Abstract
Developing more accurate and adaptive methods for detecting malicious code is a critical challenge in the context of constantly evolving cybersecurity threats. This requires constant attention to new vulnerabilities and attack methods, as well as the search for innovative approaches to detecting and preventing cyber threats. The paper examines an algorithm for detecting the execution of malicious code in the process of a protected program. The algorithm is based on a previously proposed approach in which the legitimate execution of a protected program is described by a profile of differences in the return addresses of called functions, also called a distance profile. The paper introduces the concept of positional distance, defined as the difference between call numbers in the program trace. The main change is that the profile can now include distances between the return addresses not only of neighboring functions but also of several preceding calls within a given positional distance. In addition to modifying the detection algorithm, a tool for automating the construction of a distance profile was developed, and the dependence of the probability of falsely detecting an atypical distance on the training duration was studied experimentally for four well-known browsers. The experiments confirm that, at the cost of a slight increase in verification time, the number of atypical distances reported by the proposed algorithm can be significantly smaller than the number reported by the basic algorithm. However, the results show that the benefit of switching from the basic algorithm to the proposed one depends on the characteristics of the specific program being protected. The study highlights the importance of continually improving malware detection techniques so that they adapt to changing threats and software operating conditions, ultimately providing more reliable protection of information and systems from cyber attacks and other cyber threats.
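As a rough illustration only (hypothetical function names, not the authors' tool), a distance profile that also records distances to several preceding calls within a given positional distance might be built and checked like this:

```python
# Illustrative sketch: build a "distance profile" from a trace of function
# return addresses, including distances to several previous calls
# (positional distances 1..max_pos_dist), and flag distances absent from it.

def build_profile(trace, max_pos_dist=3):
    """trace: list of return addresses in call order (training run)."""
    profile = set()
    for i in range(1, len(trace)):
        for d in range(1, max_pos_dist + 1):      # positional distance d
            if i - d >= 0:
                profile.add((d, trace[i] - trace[i - d]))
    return profile

def atypical_distances(trace, profile, max_pos_dist=3):
    """Distances observed at run time that were never seen during training."""
    atypical = []
    for i in range(1, len(trace)):
        for d in range(1, max_pos_dist + 1):
            if i - d >= 0:
                key = (d, trace[i] - trace[i - d])
                if key not in profile:
                    atypical.append((i, key))
    return atypical

# Example: train on one trace, verify another
train = [0x401000, 0x401050, 0x4010a0, 0x401050]
profile = build_profile(train)
print(atypical_distances([0x401000, 0x401050, 0x402000], profile))
```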

Theory of Software

120-141
Abstract
The article continues a series of works on the development and verification of control programs based on LTL-specifications of a special type. Previously, a declarative LTL-specification was proposed that allows describing the behavior of control programs and building, on its basis, program code in the imperative ST language for programmable logic controllers. The LTL-specification can be verified directly for compliance with specified temporal properties by the model checking method using the nuXmv symbolic verification tool. In the general case, it is not required to translate the LTL-formulas of the specification into another formalism, an SMV-specification (code in the input language of the nuXmv tool).

The purpose of this work is to explore alternative ways of representing a program behavior model corresponding to the declarative LTL-specification during its verification within the nuXmv tool.
In the article, we transform the declarative LTL-specification into various SMV-specifications, with accompanying changes in the formulation of the verification problem, which leads to a significant reduction in the time required to check temporal properties with the nuXmv tool. The speed-up of verification is due to the reduction of the state space of the model being verified. The SMV-specifications obtained as a result of the proposed transformations define identical or bisimulation-equivalent transition systems, which ensures the same verification results when one SMV-specification is replaced with another.
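For illustration only (this property is not taken from the article), a temporal property checked by the model checking method could be an LTL response formula requiring that every request is eventually followed by a response:

$$\mathbf{G}\,(\mathit{request} \rightarrow \mathbf{F}\,\mathit{response})$$

where $\mathbf{G}$ is the "globally" operator and $\mathbf{F}$ is the "eventually" operator; nuXmv checks such formulas against the transition system defined by an SMV-specification.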

Artificial Intelligence

182-193
Abstract
The availability of unmanned aerial vehicles (UAVs) has led to a significant increase in the number of offenses involving their use, which makes the development of UAV detection systems relevant. Solutions based on deep neural networks show the best results in detecting UAVs on video. This article presents a study of various neural network detectors and focuses on identifying objects as small as possible, down to sizes of $4\times4$ and even $3\times3$ pixels. The work investigates the SSD (VGG16) and YOLOv3 architectures and their modifications. Precision and recall metrics are calculated separately for different intervals of object areas. The best result was shown by the YOLOv3 model with anchor box parameters chosen by clustering the object sizes. Small ($3\times3$ px) drones were successfully identified with a precision of 76% and a very small recall of 26%. For objects between 10 and 20 pixels in area, the recall is 64% with a precision of 75%. For objects with an area of more than 20 pixels, the recall is about 90%, the precision is 89%, and the F1 score is 90%. These results show that even $4\times4$ pixel drones can be recognized, which makes the approach applicable in video surveillance systems.
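A minimal sketch, under the assumption that ground-truth and predicted boxes have already been matched to each other, of how precision and recall could be computed separately per interval of object area (all names and example numbers are illustrative, not the authors' code):

```python
# Illustrative sketch: precision/recall per object-area interval.
# gt_boxes: list of (area_px, was_matched_by_a_detection)
# pred_boxes: list of (area_px, is_true_positive)

def metrics_by_area(gt_boxes, pred_boxes,
                    bins=((0, 10), (10, 20), (20, float("inf")))):
    results = {}
    for lo, hi in bins:
        gt = [matched for area, matched in gt_boxes if lo <= area < hi]
        pred = [tp for area, tp in pred_boxes if lo <= area < hi]
        recall = sum(gt) / len(gt) if gt else 0.0
        precision = sum(pred) / len(pred) if pred else 0.0
        results[(lo, hi)] = {"precision": precision, "recall": recall}
    return results

# Example: two ground-truth drones (areas 9 and 25 px), three detections
gt = [(9, True), (25, True)]
pred = [(9, True), (25, True), (30, False)]
print(metrics_by_area(gt, pred))
```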
194-205
Abstract
The paper presents the results of a study of modern text models aimed at identifying the semantic similarity of English-language texts. Determining the semantic similarity of texts is an important component of many areas of natural language processing: machine translation, information retrieval, question-answering systems, and artificial intelligence in education. The authors solved the problem of classifying the proximity of student answers to the teacher's reference answer. The study considered the neural network language models BERT and GPT, previously used to determine the semantic similarity of texts, the new neural network model Mamba, as well as stylometric features of the text. Experiments were carried out with two text corpora: the Text Similarity corpus from open sources and a custom corpus collected with the help of philologists. The quality of the solution was assessed by precision, recall, and F-measure. All neural network language models showed a similar F-measure of about 86% for the larger Text Similarity corpus and 50–56% for the custom corpus. A completely new result was the successful application of the Mamba model. However, the most interesting outcome was the use of vectors of stylometric features of the text, which yielded an 80% F-measure on the custom corpus and the same quality as the neural network models on the other corpus.
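A minimal, hypothetical sketch of the similarity-classification idea: embed the reference answer and the student answer as vectors and compare them by cosine similarity. The placeholder embedding and the 0.8 threshold below are illustrative only; in the article the vectors would come from BERT/GPT/Mamba or from stylometric features (sentence length, punctuation rates, part-of-speech frequencies, and so on).

```python
# Illustrative sketch: cosine similarity between two text vectors.
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed(text):
    # Placeholder embedding: bag-of-characters histogram.  In practice this
    # would be a neural sentence embedding or a stylometric feature vector.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1
    return vec

reference = "Photosynthesis converts light energy into chemical energy."
answer = "Plants turn light energy into chemical energy during photosynthesis."
sim = cosine_similarity(embed(reference), embed(answer))
print("similar" if sim > 0.8 else "dissimilar", round(sim, 3))
```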
206-220
Abstract
Text complexity assessment is an applied problem of current interest, with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. The methods for generating a feature vector when automatically assessing text complexity are quite diverse. Early approaches relied on easily computed quantities, such as the average sentence length or the average number of syllables per word. With the development of natural language processing algorithms, the space of features in use is expanding. In this work, we examined three groups of features: 1) automatically generated keywords, 2) information about the morphemic parsing of words, and 3) information about the diversity, branching, and depth of syntactic trees. The RuTermExtract algorithm was used to generate keywords, a convolutional neural network model was used to generate morphemic parses, and the Stanza model, trained on the SynTagRus corpus, was used to generate syntax trees. We conducted a comparison using four different machine learning algorithms and four annotated Russian-language text corpora. The corpora differ both in domain and in annotation paradigm, so the results obtained reflect more objectively the real relationship between the features and text complexity. On average, keywords performed worse than topic markers obtained using latent Dirichlet allocation. In most situations, the morphemic features turned out to be more effective than previously described methods for assessing the lexical complexity of a text: word frequency and the occurrence of word-formation patterns. In most cases, the extended set of syntactic features improved the quality of the neural network models compared with the previously described feature set.
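A minimal sketch of extracting one syntactic feature, dependency tree depth, with the Stanza parser (the example sentence and the choice of feature are illustrative; the Russian model must be downloaded first with stanza.download('ru')):

```python
# Illustrative sketch: dependency tree depth as a syntactic complexity feature.
import stanza

nlp = stanza.Pipeline(lang='ru', processors='tokenize,pos,lemma,depparse')

def tree_depth(sentence):
    """Depth of the dependency tree: longest head-chain from a word to the root."""
    heads = {int(w.id): int(w.head) for w in sentence.words}  # head == 0 is root
    def depth(wid):
        d = 1
        while heads[wid] != 0:
            wid = heads[wid]
            d += 1
        return d
    return max(depth(int(w.id)) for w in sentence.words)

doc = nlp("Мама мыла раму, пока дети играли во дворе.")
print([tree_depth(s) for s in doc.sentences])
```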

Computing Methodologies and Applications

164-181
Abstract
Traffic safety in railway transport requires regular inspection of the rail condition to identify and promptly eliminate defects. Eddy current flaw detection is one of the popular methods of non-destructive rail testing. The data (defectograms) obtained from eddy current flaw detectors during rail testing are produced in large volumes and therefore require efficient automatic analysis. The analysis consists in detecting defective areas on the defectograms and identifying structural elements of the rail track, taking into account noise and possible interference of various kinds. A threshold noise level is determined in order to isolate the signals from defects and structural elements. Its value can be distorted by electromagnetic influences superimposed on the signals, which have a pronounced low frequency and periodicity. Such interference raises the threshold noise level and complicates the detection of useful signals, so it needs to be suppressed. In this paper, spectral subtraction is used as a method for interference reduction on eddy current defectograms. The interference function is defined as the sum of the low-frequency harmonics of the discrete Fourier transform of the original signal, so the interference-free signal can be found by subtracting the harmonics of the low-frequency range. The right boundary of this range is called the threshold harmonic frequency. It is found by minimizing the distance between the signal's autocorrelation function and an expected autocorrelation, for which two variants are proposed: the autocorrelation of Gaussian noise and a reference autocorrelation. Both approaches make it possible to determine the threshold harmonic frequency so that periodic interference is reduced most effectively. The method based on the Gaussian noise autocorrelation is fairly universal for eddy current defectograms, whereas the method based on the reference autocorrelation depends on the specific data and recording equipment. For the eddy current defectogram data under consideration, the most suitable threshold harmonic frequency was found. Besides eddy current testing, the described approaches to reducing periodic low-frequency interference can be successfully applied in other areas.
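A minimal sketch of the spectral-subtraction idea with NumPy: harmonics below a chosen threshold harmonic frequency are zeroed in the DFT and the signal is reconstructed. Here the threshold k_thr is fixed by hand and the data are synthetic, whereas the article selects the threshold by the autocorrelation criterion.

```python
# Illustrative sketch: suppress low-frequency periodic interference by
# removing DFT harmonics below a threshold harmonic frequency k_thr.
import numpy as np

def remove_low_frequency(signal, k_thr):
    """Subtract the sum of harmonics 1..k_thr-1 (the DC component is kept)."""
    spectrum = np.fft.rfft(signal)
    spectrum[1:k_thr] = 0.0          # drop low-frequency harmonics
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic example: useful signal plus slow periodic interference
n = 1024
t = np.arange(n)
useful = np.random.normal(0, 0.1, n)
useful[500:510] += 2.0                                # "defect" pulse
interference = 1.5 * np.sin(2 * np.pi * 3 * t / n)    # low-frequency harmonic
cleaned = remove_low_frequency(useful + interference, k_thr=8)
```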

Algorithms in Computer Science

142-151
Abstract
The individual-based model describes the dynamics of genetic diversity of a population scattered over a spatial continuum in the case of a finite number of individuals. During an extinction event in a certain area, a portion of the population dies, after which, during a recolonization event, new individuals are born with the genotype of a parent. In this paper we examine the model, as well as its modification, and derive properties related to population parameters. The study demonstrates that the lifespan of individuals follows an exponential distribution, that allele probabilities remain constant over time, and that the average heterozygosity, constrained by the number of individuals involved in extinction and recolonization, equals the analogous quantity in the Moran model. The joint distribution of alleles is generalized to populations continuously scattered in space. The joint allele distribution and heterozygosity are computed through simulations.
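A simplified, hypothetical simulation of the extinction/recolonization mechanism and the average heterozygosity it produces (the parameters and the one-parent recolonization rule are illustrative, not the article's exact model):

```python
# Illustrative sketch: individuals live on a line segment; an extinction event
# kills everyone within a radius of a random centre, and the vacancies are
# refilled with copies of a surviving parent's allele.
import random

def simulate(n=200, radius=0.05, events=5000, n_alleles=4, seed=0):
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(n)]
    allele = [rng.randrange(n_alleles) for _ in range(n)]
    for _ in range(events):
        centre = rng.random()
        hit = [i for i in range(n) if abs(pos[i] - centre) < radius]
        survivors = [i for i in range(n) if i not in hit]
        if not hit or not survivors:
            continue
        parent = rng.choice(survivors)          # recolonization from one parent
        for i in hit:
            pos[i] = centre + rng.uniform(-radius, radius)
            allele[i] = allele[parent]
    # average heterozygosity: probability that two random individuals differ
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(allele[i] != allele[j] for i, j in pairs) / len(pairs)

print(simulate())
```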


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)