
Modeling and Analysis of Information Systems

Vol 29, No 3 (2022)

Algorithms

154-165
Abstract
In map production it is necessary to preserve the spatial relationships between map objects. Generalization is the simplification performed on geographical data when decreasing its representation scale. It is common practice to simplify each type of spatial object independently (administrative boundaries first, then the road network, the hydrographic network, etc.). During this process spatial conflicts inevitably arise that require manual correction. The automation of generalization still remains an open issue for data producers and users, and many researchers are working to achieve a higher level of automation. In order to detect spatial conflicts, a refined description of spatial relationships is needed. The paper analyzes models for describing the topological relationships of spatial objects: the nine-intersection model, the topological chain model, and the E-WID model. Each of the considered models takes some relations between objects into account but does not allow them to be transferred exactly. As a result, the task of developing a model of relations that preserves topology is relevant. We propose an improved nine-intersection model that takes into account the topological conflict arising when a point object is located next to a simplified line. Line simplification is one of the most requested operations in map production and generalization. When a mesh covers the map, a cell can contain point, line-segment, and polygonal topological objects, which, if the cell is small enough, are polyline objects. Thus, the simplification of topological objects within a cell reduces to the simplification of linear objects (polylines). The developed algorithm is planned to be used to solve the problem of consistent generalization of spatial data. The ideas outlined in this article will form the basis of a new spatial data index that preserves topological relationships.
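
To make the conflict described above concrete, here is a minimal sketch (not the authors' improved model): it simplifies a polyline and flags the topological conflict that arises when a nearby point object ends up on the other side of the simplified line. The geometry, the tolerance, and the helper side_of are illustrative assumptions; the DE-9IM strings returned by relate correspond to the nine-intersection model mentioned in the abstract. Requires the shapely package.

```python
# Illustrative sketch only: detect a point-vs-simplified-line topological conflict.
from shapely.geometry import LineString, Point

def side_of(line: LineString, pt: Point, eps: float = 1e-3) -> int:
    """Return -1, 0 or +1: which side of the locally nearest part of the line the point is on."""
    d = line.project(pt)                                   # arc length of the nearest point
    a = line.interpolate(max(d - eps, 0.0))
    b = line.interpolate(min(d + eps, line.length))
    cross = (b.x - a.x) * (pt.y - a.y) - (b.y - a.y) * (pt.x - a.x)
    return (cross > 0) - (cross < 0)

original = LineString([(0, 0), (1, 0.4), (2, 0), (3, 0.4), (4, 0)])
simplified = original.simplify(0.5)                        # simplification with tolerance 0.5
poi = Point(1.0, 0.2)                                      # a point object close to the line

# Nine-intersection (DE-9IM) relation strings before and after simplification
print(original.relate(poi), simplified.relate(poi))

# Topological conflict: the point changes sides relative to the simplified line
if side_of(original, poi) != side_of(simplified, poi):
    print("conflict: point object moved to the other side of the simplified line")
```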
166-180
Abstract
In this article, we consider the NP-hard problem of two-step colouring of a graph. It is required to colour the graph in a given number of colours so that no two vertices at a distance of 1 or 2 from each other receive the same colour. An optimum two-step colouring is one that uses the minimum possible number of colours. The two-step colouring problem is studied in application to grid graphs. We consider four types of grids: triangular, square, hexagonal, and octagonal. We show that the optimum two-step colouring of hexagonal and octagonal grid graphs requires 4 colours in the general case and formulate polynomial algorithms for such a colouring. A square grid graph with maximum vertex degree equal to 3 requires 4 or 5 colours for a two-step colouring; in the paper, we propose a backtracking algorithm for this case. We also present an algorithm, which works in time linear in the number of vertices, for the two-step colouring of a triangular grid graph in 7 colours and show that this colouring is always correct. If the maximum vertex degree equals 6, the solution is optimum.
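
As a rough illustration of the problem statement (not the paper's grid-specific algorithms), the sketch below performs a generic greedy two-step colouring of a graph given as an adjacency list; vertices at distance 1 or 2 receive different colours, but the number of colours used is not optimal in general. The 3x3 square-grid example is an assumption for demonstration.

```python
# Illustrative sketch only: greedy two-step (distance-2) colouring of a graph.
def two_step_greedy_colouring(adj: dict) -> dict:
    colour = {}
    for v in adj:                                  # a fixed vertex order; heuristics can improve it
        banned = set()
        for u in adj[v]:                           # distance-1 neighbours
            banned.add(colour.get(u))
            for w in adj[u]:                       # distance-2 neighbours
                if w != v:
                    banned.add(colour.get(w))
        c = 0
        while c in banned:                         # smallest colour unused at distance <= 2
            c += 1
        colour[v] = c
    return colour

# A 3x3 fragment of a square grid graph
grid = {(i, j): [(i + di, j + dj)
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= i + di < 3 and 0 <= j + dj < 3]
        for i in range(3) for j in range(3)}

colours = two_step_greedy_colouring(grid)
print(max(colours.values()) + 1, "colours used")   # greedy needs at most D*D + 1 for max degree D
```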

Computer System Organization

182-198
Abstract
Linear codes are widely used to protect against errors in data transmission and storage systems, to ensure the robustness of various cryptographic algorithms and protocols, and to protect hidden information from errors in a stegocontainer. One of the classes of codes that finds application in a number of the listed areas is the class of linear self-complementary codes over the binary field. Such codes contain the all-ones vector, and their weight enumerator is a symmetric polynomial. In applied problems, self-complementary [n, k]-codes of a given length n and dimension k are often required to have the maximum possible code distance d(k, n). For n < 13, the values of d(k, n) are already known. In this paper, for self-complementary codes of length n = 13, 14, 15, the problem is to find lower bounds on d(k, n), as well as the values of d(k, n) themselves. Developing an efficient method for obtaining a lower bound close to d(k, n) is an important task, since finding the values of d(k, n) in the general case is difficult. The paper proposes four methods for finding lower bounds: based on cyclic codes, on residual codes, on the (u | u+v)-construction, and on the tensor product of codes. By jointly using these methods for the considered lengths, it was possible to efficiently obtain lower bounds that either coincide with the found values of d(k, n) or differ from them by one. The paper also proposes a sequence of checks which in some cases helps to prove the non-existence of a self-complementary [n, k]-code with code distance d. In the final part of the work, a design for hiding information that is resistant to interference in the stegocontainer is proposed on the basis of self-complementary codes. The presented calculations show that the new design is more efficient than known designs.
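
For readers unfamiliar with the notions named above, here is a minimal brute-force sketch (suitable only for very small codes, and not the methods used to obtain the bounds in the paper): it checks self-complementarity, computes the code distance by enumerating all codewords, and builds the generator matrix of the (u | u+v)-construction. The example codes are assumptions for illustration.

```python
# Illustrative sketch only: brute-force tools for small binary linear codes.
from itertools import product

def codewords(G):
    """All codewords generated by the rows of the binary generator matrix G."""
    n = len(G[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(G)):
        w = [0] * n
        for c, row in zip(coeffs, G):
            if c:
                w = [a ^ b for a, b in zip(w, row)]
        words.add(tuple(w))
    return words

def min_distance(G):
    return min(sum(w) for w in codewords(G) if any(w))

def is_self_complementary(G):
    return tuple([1] * len(G[0])) in codewords(G)

def u_u_plus_v(G1, G2):
    """Generator matrix of the (u | u+v)-construction for codes with generators G1, G2."""
    n = len(G1[0])
    return [row + row for row in G1] + [[0] * n + row for row in G2]

G1 = [[1, 1, 1]]                   # repetition [3, 1, 3] code (contains the all-ones vector)
G2 = [[1, 1, 0], [0, 1, 1]]        # even-weight [3, 2, 2] code
G = u_u_plus_v(G1, G2)             # a self-complementary [6, 3] code, d = min(2*3, 2) = 2
print(is_self_complementary(G), min_distance(G))   # True 2
```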

Theory of Data

266-279
Abstract
The research is devoted to the classification of news articles about P. G. Demidov Yaroslavl State University (YarSU) into 4 categories: “society”, “education”, “science and technologies”, and “not relevant”. The proposed approaches are based on the BERT neural network and machine learning methods: SVM, Logistic Regression, K-Neighbors, and Random Forest, in combination with different embedding types: Word2Vec, FastText, TF-IDF, and GPT-3. Text preprocessing approaches are also considered to achieve higher classification quality. The experiments showed that the SVM classifier with TF-IDF embeddings, trained on full article texts with titles, achieved the best result: its micro-F-measure and macro-F-measure are 0.8214 and 0.8308, respectively. The BERT neural network trained on fragments of paragraphs mentioning YarSU, from which the first 128 words and the last 384 words were taken, showed comparable results: micro-F-measure 0.8304 and macro-F-measure 0.8181. Thus, using paragraphs that mention the target organisation is enough to classify texts by categories efficiently.
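
As a rough illustration of the best-performing configuration described in the abstract (TF-IDF features with an SVM classifier), the sketch below builds such a pipeline with scikit-learn; the toy corpus, labels, and vectorizer settings are placeholders, not the authors' data or exact setup.

```python
# Illustrative sketch only: TF-IDF + linear SVM with micro/macro F-measure evaluation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

texts = [
    "University hosts a public lecture for city residents",    # society
    "A new master's programme opens at the faculty",            # education
    "Researchers publish results on graph algorithms",          # science and technologies
    "Weather forecast for the weekend",                         # not relevant
] * 10
labels = ["society", "education", "science and technologies", "not relevant"] * 10

model = make_pipeline(TfidfVectorizer(lowercase=True, ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

pred = model.predict(texts)   # evaluated on the training texts only to show the metric calls
print("micro-F:", f1_score(labels, pred, average="micro"))
print("macro-F:", f1_score(labels, pred, average="macro"))
```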

Theory of Computing

228-245
Abstract
When data-driven algorithms, especially those based on deep neural networks (DNNs), replace classical ones, their superior performance often comes with difficulty in their analysis. To compensate for this drawback, formal verification techniques, which can provide reliable guarantees on program behavior, have been developed for DNNs. These techniques, however, usually consider DNNs alone, excluding the real-world environments in which they operate, and the applicability of techniques that do account for such environments is often limited. In this work, we consider the problem of formally verifying a neural controller for the routing problem in a conveyor network. Unlike in known problem statements, our DNNs are executed in a distributed context, and the performance of the routing algorithm, which we measure as the mean delivery time, depends on multiple executions of these DNNs. Under several assumptions, we reduce the problem to a number of DNN output reachability problems, which can be solved with existing tools. Our experiments indicate that sound-and-complete formal verification in such cases is feasible, although it is notably slower than the gradient-based search for adversarial examples. The paper is structured as follows. Section 1 introduces basic concepts. Section 2 introduces the routing problem and DQN-Routing, the DNN-based algorithm that solves it. Section 3 presents the contribution of this paper: a novel sound and complete approach to formally checking an upper bound on the mean delivery time of DNN-based routing. This approach is experimentally evaluated in Section 4. The paper concludes with a discussion of the results and an outline of possible future work.
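
To show what a single DNN output reachability query looks like (the building block the abstract reduces the problem to), here is a minimal sketch of sound interval bound propagation for a tiny ReLU network; the complete verifiers referred to in the paper tighten such bounds, and the network weights and input box below are random placeholders.

```python
# Illustrative sketch only: sound output bounds for a ReLU network over an input box.
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate an interval box through x -> W @ x + b."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def output_bounds(layers, lo, hi):
    for W, b in layers[:-1]:
        lo, hi = interval_affine(lo, hi, W, b)
        lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)    # ReLU is monotone
    return interval_affine(lo, hi, *layers[-1])

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), rng.standard_normal(8)),   # 4 -> 8
          (rng.standard_normal((1, 8)), rng.standard_normal(1))]   # 8 -> 1
lo, hi = output_bounds(layers, np.full(4, -0.1), np.full(4, 0.1))
print("network output certainly lies in", lo, hi)   # e.g. compare hi with a required upper bound
```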
246-264
Abstract
The paper analyzes the possibilities of transforming C programming language constructs into objects of the EO programming language. The key challenge of the method is transpilation from a system programming language into a language with a higher level of abstraction, which does not allow direct manipulation of computer memory. Almost all application and domain-oriented programming languages prohibit such direct access to memory. Operations that need to be supported in this case include dereferencing pointers, overlaying data of different types in the same memory area, and interpreting the same data located in the same memory address space in different ways. A decision was made to create additional EO objects that directly simulate interaction with computer memory as in the C language. These objects encapsulate unreliable data operations that use pointers. An abstract memory object was proposed to simulate the ability of the C language to interact with computer memory. The memory object is essentially an array of bytes; it is possible to write into memory and read from memory at a given index, and the number of bytes read or written depends on which object is being used. The transformation of various C language constructs into EO code is considered at the level of the compilation unit. To study the variants and analyze the results, a transpiler was developed that performs the necessary transformations. It is implemented on the basis of Clang, which builds an abstract syntax tree; this tree is processed using the LibTooling and LibASTMatchers libraries. As a result of compiling a C program, code in the EO language is generated. The considered approach turns out to be appropriate for solving different problems, one of which is static code analysis. Such solutions make it possible to isolate low-level code fragments into separate program objects, focusing on their study and possible transformation into more reliable code.
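
The abstract memory object can be pictured with a short sketch; the paper's implementation consists of EO objects, so the Python class below is only an assumed analogy showing a flat byte array with typed reads and writes at a given index, including reinterpretation of the same bytes as a different type.

```python
# Illustrative sketch only (Python analogy of the abstract memory object described above).
import struct

class Memory:
    """A flat array of bytes with typed reads/writes at a given index."""
    def __init__(self, size: int):
        self.data = bytearray(size)

    def write(self, addr: int, fmt: str, value) -> None:
        struct.pack_into(fmt, self.data, addr, value)     # e.g. fmt "<i" is a 32-bit int

    def read(self, addr: int, fmt: str):
        return struct.unpack_from(fmt, self.data, addr)[0]

mem = Memory(16)
mem.write(0, "<i", -1)        # roughly: *(int *)p = -1;
print(mem.read(0, "<I"))      # the same 4 bytes reinterpreted as unsigned: 4294967295
print(mem.read(0, "<f"))      # ... or as a 32-bit float (type punning), here NaN
```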

Discrete Mathematics in Relation to Computer Science

200-209
Abstract
One of the main methods of computational topology and topological data analysis is persistent homology, which combines geometric and topological information about an object using persistence diagrams and barcodes. The persistent homology method provides a balance between reducing the data dimension and characterizing the internal structure of an object. Combining machine learning and persistent homology is hampered by the topological representation of the data, the distance metrics, and the representation of data objects. The paper considers mathematical models and functions for representing persistence landscape objects based on the persistent homology method. Persistence landscape functions allow persistence diagrams to be mapped into a Hilbert space. The representation of topological functions in various machine learning models is considered, and an example of finding the distance between images based on persistence landscape functions is given. Based on the algebra of polynomials in the barcode space, which are used as coordinates, distances in the barcode space are determined by comparing intervals from one barcode to another and calculating penalties; for these purposes, tropical functions that take into account the basic structure of the barcode space are used. Methods for constructing rational tropical functions are considered, and an example of finding the distance between images based on tropical functions is given. To increase the variety of parameters (machine learning features), filtrations are built by scanning the object by rows from left to right and by columns from bottom to top; this adds spatial information to the topological information. The method of constructing persistence landscapes is compatible with the approach of constructing tropical rational functions when computing persistent homology.
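
As a small worked illustration of the mapping into a Hilbert space mentioned above (not the authors' implementation), the sketch below evaluates persistence landscape functions lambda_k(t) on a grid from a persistence diagram given as (birth, death) pairs and computes a discretized L2 distance between two landscape layers; the toy diagram and grid are assumptions.

```python
# Illustrative sketch only: persistence landscape layers from a persistence diagram.
import numpy as np

def landscape(diagram, ts, k_max=3):
    """Return an array of shape (k_max, len(ts)) with the first k_max landscape layers."""
    diagram = np.asarray(diagram, dtype=float)                    # rows are (birth, death)
    tents = np.maximum(0.0,
                       np.minimum(ts[None, :] - diagram[:, :1],
                                  diagram[:, 1:] - ts[None, :]))  # one "tent" per pair
    tents = -np.sort(-tents, axis=0)                              # k-th largest value at each t
    layers = np.zeros((k_max, len(ts)))
    k = min(k_max, tents.shape[0])
    layers[:k] = tents[:k]
    return layers

diagram = [(0.0, 1.0), (0.2, 0.9), (0.5, 0.6)]                    # toy persistence diagram
ts = np.linspace(0.0, 1.0, 101)
lam = landscape(diagram, ts)

# Discretized L2 distance between the first two landscape layers (a Hilbert-space metric)
step = ts[1] - ts[0]
print(np.sqrt(np.sum((lam[0] - lam[1]) ** 2) * step))
```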
210-227
Abstract
The process of testing dependencies and inference rules can be used in two ways. First, testing allows verifying hypotheses about unknown inference rules. The main goal in this case is to search for a relation, a counterexample, that satisfies the initial dependencies and contradicts the consequence. A found counterexample refutes the hypothesis; the absence of a counterexample allows searching for a generalization of the rule and the conditions under which it holds (is logically implied). Testing cannot be used as a proof of the validity of inference rules, since generalization requires the search for universal inference conditions for each rule, which cannot be programmed, because even the form of these conditions is unknown. Second, when designing a particular database, it may be necessary to test the validity of a rule for which there is no theoretical justification. Such a situation can arise in the presence of anomalies in the superkey. The solution to this problem is based on using join dependency inference rules; for these dependencies, a complete system of rules (axioms) has not yet been found. This paper 1) discusses a technique for testing inference rules using join dependencies as an example, 2) proposes a scheme of the testing algorithm, 3) considers some hypotheses for which there are neither counterexamples nor inference rules, and 4) gives an example of using testing when searching for a correct decomposition of a superkey.
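
For context, a join dependency can be tested on a concrete relation by projecting the relation onto the dependency's components and joining the projections back: the relation satisfies the dependency exactly when the join adds no new tuples. The sketch below (a naive check, not the paper's testing algorithm, with an assumed toy relation) illustrates how such a counterexample is recognised.

```python
# Illustrative sketch only: does a relation satisfy the join dependency join[R1, ..., Rk]?
from itertools import product

def project(rel, attrs):
    return {tuple(t[a] for a in attrs) for t in rel}

def satisfies_jd(rel, components):
    all_attrs = sorted({a for comp in components for a in comp})
    projections = [project(rel, comp) for comp in components]
    joined = set()
    for combo in product(*projections):                    # candidate tuples of the natural join
        t, ok = {}, True
        for comp, values in zip(components, combo):
            for a, v in zip(comp, values):
                if t.setdefault(a, v) != v:                 # join condition violated
                    ok = False
                    break
            if not ok:
                break
        if ok:
            joined.add(tuple(t[a] for a in all_attrs))
    original = {tuple(t[a] for a in all_attrs) for t in rel}
    return joined == original                               # the dependency holds iff the join is lossless

r = [{"A": 1, "B": 1, "C": 1}, {"A": 1, "B": 2, "C": 2}, {"A": 1, "B": 1, "C": 2}]
print(satisfies_jd(r, [("A", "B"), ("A", "C")]))            # False: r is a counterexample to this JD
```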


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)