Preview

Modeling and Analysis of Information Systems

Advanced search
Vol 28, No 3 (2021)
View or download the full issue PDF (Russian)

Algorithms

220-233 666
Abstract

For a continuous nonlinear control system on a finite time interval with control constraints, where the right-hand side of the dynamics equations is linear in control and linearizable in the vicinity of the zero equilibrium position, we consider the construction of a feedback according to the Kalman algorithm. For this, the solution of an auxiliary optimal control problem with a quadratic functional is used by analogy with the SDRE approach.

Since this approach is used in the literature to find suboptimal synthesis in optimal control problems with a quadratic functional with formally linear systems, where all coefficient matrices in differential equations and criteria can contain state variables, then on a finite time interval it becomes necessary to solve a complicated matrix differential Riccati equations, with state-dependent coefficient matrices. This circumstance, due to the nonlinearity of the system, in comparison with the Kalman algorithm for linear-quadratic problems, significantly increases the number of calculations for obtaining the coefficients of the gain matrix in the feedback and for obtaining synthesis with a given accuracy. The proposed synthesis construction algorithm is constructed using the extension principle proposed by V. F. Krotov and developed by V. I. Gurman and allows not only to expand the scope of the SDRE approach to nonlinear control problems with control constraints in the form of closed inequalities, but also to propose a more efficient computational algorithm for finding the matrix of feedback gains in control problems on a finite interval. The article establishes the correctness of the application of the extension principle by introducing analogs of the Lagrange multipliers, depending on the state and time, and also derives a formula for the suboptimal value of the quality criterion. The presented theoretical results are illustrated by calculating suboptimal feedbacks in the problems of managing three-sector economic systems.

234-237 627
Abstract

This article describes an algorithm for obtaining a non-negative basic solution of a system of linear algebraic equations. This problem, which undoubtedly has an independent interest, in particular, is the most time-consuming part of the famous simplex method for solving linear programming problems.

Unlike the artificial basis Orden’s method used in the classical simplex method, the proposed algorithm does not attract artificial variables and economically consumes computational resources.

The algorithm consists of two stages, each of which is based on Gaussian exceptions. The first stage coincides with the main part of the Gaussian complete exclusion method, in which the matrix of the system is reduced to the form with an identity submatrix. The second stage is an iterative cycle, at each of the iterations of which, according to some rules, a resolving element is selected, and then a Gaussian elimination step is performed, preserving the matrix structure obtained at the first stage. The cycle ends either when the absence of non-negative solutions is established, or when one of them is found.

Two rules for choosing a resolving element are given. The more primitive of them allows for ambiguity of choice and does not exclude looping (but in very rare cases). Use of the second rule ensures that there is no looping.

Theory of Computing

238-249 610
Abstract

The paper considers methods for estimating stability using Lyapunov functions, which are used for nonlinear polynomial control systems. The apparatus of the Gro¨bner basis method is used to assess the stability of a dynamical system. A description of the Gro¨bner basis method is given. To apply the method, the canonical relations of the nonlinear system are approximated by polynomials of the components of the state and control vectors. To calculate the Gro¨bner basis, the Buchberger algorithm is used, which is implemented in symbolic computation programs for solving systems of nonlinear polynomial equations. The use of the Gro¨bner basis for finding solutions of a nonlinear system of polynomial equations is considered, similar to the application of the Gauss method for solving a system of linear equations. The equilibrium states of a nonlinear polynomial system are determined as solutions of a nonlinear system of polynomial equations. An example of determining the equilibrium states of a nonlinear polynomial system using the Gro¨bner basis method is given. An example of finding the critical points of a nonlinear polynomial system using the Gro¨bner basis method and the Wolfram Mathematica application software is given. The Wolfram Mathematica program uses the function of determining the reduced Gro¨bner basis. The application of the Gro¨bner basis method for estimating the attraction domain of a nonlinear dynamic system with respect to the equilibrium point is considered. To determine the scalar potential, the vector field of the dynamic system is decomposed into gradient and vortex components. For the gradient component, the scalar potential and the Lyapunov function in polynomial form are determined by applying the homotopy operator. The use of Gro¨bner bases in the gradient method for finding the Lyapunov function of a nonlinear dynamical system is considered. The coordination of input-output signals of the system based on the construction of Gro¨bner bases is considered.

Theory of Data

250-259 603
Abstract

The article compares character-level, word-level, and rhythm features for the authorship verification of literary texts of the 19th-21st centuries. Text corpora contains fragments of novels, each fragment has a size of about 50 000 characters. There are 40 fragments for each author. 20 authors who wrote in English, Russian, French, and 8 Spanish-language authors are considered.

The authors of this paper use existing algorithms for calculation of low-level features, popular in the computer linguistics, and rhythm features, common for the literary texts. Low-level features include n-grams of words, frequencies of letters and punctuation marks, average word and sentence lengths, etc. Rhythm features are based on lexico-grammatical figures: anaphora, epiphora, symploce, aposiopesis, epanalepsis, anadiplosis, diacope, epizeuxis, chiasmus, polysyndeton, repetitive exclamatory and interrogative sentences. These features include the frequency of occurrence of particular rhythm figures per 100 sentences, the number of unique words in the aspects of rhythm, the percentage of nouns, adjectives, adverbs and verbs in the aspects of rhythm. Authorship verification is considered as a binary classification problem: whether the text belongs to a particular author or not. AdaBoost and a neural network with an LSTM layer are considered as classification algorithms. The experiments demonstrate the effectiveness of rhythm features in verification of particular authors, and superiority of feature types combinations over single feature types on average. The best value for precision, recall, and F-measure for the AdaBoost classifier exceeds 90% when all three types of features are combined.

260-279 534
Abstract

This article is dedicated to the analysis of various stylometric characteristics combinations of different levels for the quality of verification of authorship of Russian, English and French prose texts. The research was carried out for both low-level stylometric characteristics based on words and symbols and higher-level structural characteristics.

All stylometric characteristics were calculated automatically with the help of the ProseRhythmDetector program. This approach gave a possibility to analyze the works of a large volume and of many writers at the same time. During the work, vectors of stylometric characteristics of the level of symbols, words and structure were compared to each text. During the experiments, the sets of parameters of these three levels were combined with each other in all possible ways. The resulting vectors of stylometric characteristics were applied to the input of various classifiers to perform verification and identify the most appropriate classifier for solving the problem. The best results were obtained with the help of the AdaBoost classifier. The average F-score for all languages turned out to be more than 92 %. Detailed assessments of the quality of verification are given and analyzed for each author. Use of high-level stylometric characteristics, in particular, frequency of using N-grams of POS tags, offers the prospect of a more detailed analysis of the style of one or another author. The results of the experiments show that when the characteristics of the structure level are combined with the characteristics of the level of words and / or symbols, the most accurate results of verification of authorship for literary texts in Russian, English and French are obtained. Additionally, the authors were able to conclude about a different degree of impact of stylometric characteristics for the quality of verification of authorship for different languages.

280-291 728
Abstract

The article is devoted to the analysis of the rhythm of texts of different genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The authors identified lexico-grammatical figures in the texts: anaphora, epiphora, diacope, aposiopesis, etc., that are markers of the text rhythm. On their basis, statistical features were calculated that describe quantitatively and structurally these rhythm features.

The resulting text model was visualized for statistical analysis using boxplots and heat maps that showed differences in the rhythm of texts of different genres. The boxplots showed that almost all genres differ from each other in terms of the overall density of rhythm features. Heatmaps showed different rhythm patterns across genres. Further, the rhythm features were successfully used to classify texts into six genres. The classification was carried out in two ways: a binary classification for each genre in order to separate a particular genre from the rest genres, and a multi-class classification of the text corpus into six genres at once. Two text corpora in English and Russian were used for the experiments. Each corpus contains 100 fiction novels, scientific articles, advertisements and tweets, 50 reviews and political articles, i.e. a total of 500 texts. The high quality of the classification with neural networks showed that rhythm features are a good marker for most genres, especially fiction. The experiments were carried out using the ProseRhythmDetector software tool for Russian and English languages. Text corpora contains 300 texts for each language.

292-311 1122
Abstract

It is known that in the tasks of natural language processing, the representation of texts by vectors of fixed length using word-embedding models makes sense in cases where the vectorized texts are short.

The longer the texts being compared, the worse the approach works. This situation is due to the fact that when using word-embedding models, information is lost when converting the vector representations of the words that make up the text into a vector representation of the entire text, which usually has the same dimension as the vector of a single word.

This paper proposes an alternative way for using pre-trained word-embedding models for text vectorization. The essence of the proposed method consists in combining semantically similar elements of the dictionary of the existing text corpus by clustering their (dictionary elements) embeddings, as a result of which a new dictionary is formed with a size smaller than the original one, each element of which corresponds to one cluster. The original corpus of texts is reformulated in terms of this new dictionary, after which vectorization is performed on the reformulated texts using one of the dictionary approaches (TF-IDF was used in the work). The resulting vector representation of the text can be additionally enriched using the vectors of words of the original dictionary obtained by decreasing the dimension of their embeddings for each cluster.

A series of experiments to determine the optimal parameters of the method is described in the paper, the proposed approach is compared with other methods of text vectorization for the text ranking problem – averaging word embeddings with TF-IDF weighting and without weighting, as well as vectorization based on TF-IDF coefficients.

Erratum

312-313 437
Abstract

In the article by V. V. Vasilchikov “Parallel Algorithm for Solving the Graph Isomorphism Problem” ( Modeling and analysis of information systems, vol. 27, no. 1, pp. 86–94, 2020; DOI: https://doi.org/10.18255/1818-1015-2020-1-86-94) there was a misprint in the layout. In the Table 1, in the last column of the row “Degree of graph” the value should be 3000 (instead of 300). The corrected “Table 1” is shown below. The editors apologise for the inconvenience.

314-316 456
Abstract

In the article by Y. V. Kosolapov “On the Detection of Exploitation of Vulnerabilities Leading to the Execution of a Malicious Code” (Modeling and analysis of information systems, vol. 27, no. 2, pp. 138–151, 2020; https://doi.org/10.18255/1818-1015-2020-2-138-151) an inaccurate description of the algorithm CheckTrace is committed. The correct description of the algorithm CheckTrace is given below. The author apologises for the inconvenience.



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)