On the Characteristics of Symbolic Execution in the Problem of Assessing the Quality of Obfuscating Transformations

Obfuscation is used to protect programs from analysis and reverse engineering. There are theoretically effective and resistant obfuscation methods; however, most of them have not yet been implemented in practice. The main issues are the large overhead of executing obfuscated code and applicability only to a narrow class of programs. On the other hand, a large number of obfuscation methods have been developed that are applied in practice. The existing approaches to assessing such obfuscation methods are based mainly on static characteristics of programs. Therefore, a comprehensive justification of their effectiveness and resistance, one that takes the dynamic characteristics of programs into account, is a relevant task. Such a justification can be made using machine learning methods based on feature vectors that describe both static and dynamic characteristics of programs. In this paper, it is proposed to build such a vector from the characteristics of pairs of compared programs: original and obfuscated, original and deobfuscated, obfuscated and deobfuscated. To obtain the dynamic characteristics of a program, a scheme based on symbolic execution is constructed and presented in this paper. The choice of symbolic execution is justified by the fact that its characteristics can describe the difficulty of comprehending the program under dynamic analysis. This paper proposes two implementations of the scheme: extended and simplified. The extended scheme is closer to the process of program analysis by an analyst, since it includes the steps of disassembly and translation into intermediate code, while the simplified scheme excludes these steps. To identify the characteristics of symbolic execution that are suitable for assessing the effectiveness and resistance of obfuscation with machine learning methods, experiments with the developed schemes were carried out.
Based on the obtained results, a set of suitable characteristics is determined.


INTRODUCTION
Obfuscation is the modification of program code that preserves its functionality while making it difficult to analyze, understand, and modify the program's algorithms. Obfuscation is widely used to protect programs from analysis and reverse engineering [1]. Although there are now theoretically robust obfuscation methods [2], these methods cannot yet be applied in practice. This is mainly due either to the resource cost of executing the obfuscated code or to the limitation of applying them only to a certain class of programs [3]. There are many obfuscation methods that, at an intuitive level, make it difficult to understand the algorithms protected by obfuscation; however, there is no theoretical justification of their efficiency. Nevertheless, a number of practical methods and metrics have been proposed to evaluate the effectiveness of obfuscating transformations, i.e., their resilience to analysis and program understanding [4][5][6][7][8]. Let us note that program source code comprehension is a widely researched area in software engineering [9]. It is noted in [10] that program understanding and code obfuscation are two sides of the same coin, and metrics for evaluating understanding are therefore constructed in [10] using knowledge from obfuscation. One may suggest that evaluation methods for program comprehension, in turn, may also be used to evaluate the effectiveness of obfuscating transformations.
In [11] we proposed a scheme for evaluating the robustness of obfuscating transformations based on a comparison of similarity signatures computed from the characteristics of the programs. The main block of this scheme is the robustness estimation block, which draws a conclusion about the similarity of programs. This block can be implemented using machine learning methods. For this purpose, it is necessary to have characteristics that describe the program from different sides of the analysis: in static analysis (the structure of the control flow graph, completeness of disassembly, comprehensibility of the program code, and others) and in dynamic analysis (the behavior of the program at runtime). The characteristics proposed in [4][5][6][7][8], as well as characteristics from the field of program comprehension [9], can be obtained by static analysis of the program for the scheme from [11]. At the same time, gathering program characteristics by means of dynamic analysis is more difficult, since it requires running the program and analyzing its behavior, which may depend on the execution environment and/or input parameters. In this paper, symbolic execution [12] is chosen as the dynamic analysis model, since it can characterize the complexity of understanding a program under dynamic analysis and has already found application in the analysis of obfuscated code [13].
The goal of this paper is, on the one hand, to obtain and evaluate the characteristics of the symbolic execution of the original/obfuscated/deobfuscated program and, on the other hand, to evaluate the scheme constructed in [11] for obtaining these characteristics. To evaluate the scheme, we also consider its reduced version, which lacks the steps of compilation into a binary representation and translation back to LLVM bitcode [14]. Hereinafter, for convenience, the reduced version is referred to as the simplified scheme and the full version as the extended scheme.
Apart from the introduction and the conclusion, the paper contains three sections. The first section reviews known methods of evaluating obfuscating transformations. In the second section, the extended and simplified schemes for obtaining the characteristics of symbolic execution are proposed. The third section analyzes the results of the experiments conducted with the proposed schemes.

KNOWN APPROACHES TO EVALUATING OBFUSCATING TRANSFORMATIONS
One of the first methods of comprehensive evaluation of obfuscating transformations was proposed by C. Collberg et al. in [4]. Four indicators are suggested: potency, resilience, cost (the degree of increase in the resources consumed by the obfuscated program), and quality. The potency of obfuscation is defined with the help of program quality metrics from software engineering, such as program length, cyclomatic complexity, complexity of data flows and structures, and other metrics. Resilience is defined as a function of the time the analyst needs to develop a deobfuscator and the running time of the deobfuscator itself. The obfuscation quality is defined as a combination of the previous three indicators: potency, resilience, and cost. However, we note that [4] does not propose a method to estimate the time needed for the analyst to develop the deobfuscator.
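As an illustration of the software-engineering metrics used for potency, cyclomatic complexity can be computed directly from a control flow graph. The sketch below is our own minimal example (not code from [4]), using the standard formula M = E − N + 2P.

```python
# Cyclomatic complexity M = E - N + 2P for a control flow graph with
# N nodes, E edges, and P connected components (one per function here).
def cyclomatic_complexity(nodes, edges, components=1):
    return len(edges) - len(nodes) + 2 * components

# CFG of a function with a single if/else:
# entry branches to blocks a and b, both of which reach exit.
nodes = ["entry", "a", "b", "exit"]
edges = [("entry", "a"), ("entry", "b"), ("a", "exit"), ("b", "exit")]
print(cyclomatic_complexity(nodes, edges))  # 4 - 4 + 2 = 2
```

Obfuscating transformations such as control flow flattening typically add edges faster than nodes, which is exactly the kind of change such potency metrics are meant to capture.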
In [5] a method for finding encrypted data in a program (such data can encode the program's algorithms) is proposed. The method is based on an n-gram model [15]. Using this method, an artificiality index is calculated, which identifies the parts of the program that contain high-entropy data. It appears that this indicator can be used to evaluate the efficiency of obfuscating transformations, since such transformations can shift the entropy of the program code (both downward and upward).
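The exact artificiality index of [5] is built on an n-gram model; as a simplified stand-in (our own illustration, not the formula from [5]), plain Shannon byte entropy already separates repetitive code-like regions from encryption-like ones:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (ranges from 0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

low = b"A" * 64           # repetitive, code- or padding-like region
high = bytes(range(256))  # uniform byte distribution, encryption-like region
print(byte_entropy(low))   # 0.0
print(byte_entropy(high))  # 8.0
```

Regions whose entropy approaches 8 bits per byte are candidates for encrypted or compressed data, which is the intuition behind flagging "artificial" program parts.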
In [6] a different approach is proposed: the quality of obfuscation of the source code is estimated via Kolmogorov complexity. It was established experimentally that the lower the similarity between the source code and the decompiled code, the higher the Kolmogorov complexity of the obfuscated program. Consequently, the higher the Kolmogorov complexity (estimated using compression algorithms), the better the obfuscation. In [7] this approach is used to evaluate the obfuscation of programs written in Java; the Kolmogorov complexity is computed from the program source code files.
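A practical way to approximate this idea, in the spirit of [6,7], is the normalized compression distance, which replaces the uncomputable Kolmogorov complexity with the output size of a real compressor. This is a sketch using zlib, not the exact procedure of [6] or [7]:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: approximates the Kolmogorov-complexity
    distance between x and y using a real compressor (zlib)."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

original = b"int add(int a, int b) { return a + b; }" * 20
shuffled = bytes(reversed(original))  # stands in for a transformed program
# A program is "closer" to itself than to a transformed variant:
assert ncd(original, original) < ncd(original, shuffled)
```

Higher distance between the original and the decompiled/obfuscated code indicates, under this model, more effective obfuscation.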
An experimental approach to evaluating the robustness of obfuscating transformations was described in [8]. In this approach, obfuscation is considered in terms of program code understanding by the analyst. A series of controlled experiments involving groups of analysts was performed to evaluate resilience. It was shown that static and dynamic analysis of obfuscated programs takes the analyst considerably more time than the analysis of the original program. However, this method is not suitable for automatic analysis of the resilience of obfuscation.
The problem of automatically evaluating the quality of obfuscating transformations is relevant in terms of both static and dynamic analysis. In the above-mentioned works, with the exception of the experimental approach involving analysts, efficiency is calculated on the basis of static analysis. For a comprehensive estimation of efficiency, it is necessary to take the dynamic characteristics of programs into account. To obtain such characteristics, in this paper we propose to use symbolic execution. To determine which characteristics of symbolic execution can be used to evaluate obfuscation efficiency, a scheme for obtaining them is constructed, experiments are performed, and the results are analyzed. Let us point out that symbolic execution is already used in research and analysis of obfuscated code. In particular, it was noted in [13] that obfuscating transformations have a significant influence on the efficiency of symbolic analysis, and a generalized approach was proposed for increasing the efficiency of such analysis. Nevertheless, in [13] symbolic execution is used as a method of analysis without considering its applicability to evaluating the efficiency and robustness of obfuscation.

SCHEMES FOR OBTAINING CHARACTERISTICS
In [11] a scheme for obtaining the characteristics of symbolic execution was proposed, which is intended for evaluating the efficiency and resilience of obfuscation. This section briefly describes this scheme, noting some features related to the steps of translating machine code to LLVM bitcode, and then describes a simplified scheme.

An Extended Scheme
According to the model in [11] the program P undergoes the following steps: (1) compilation with the Hikari obfuscating compiler [16] with different options of obfuscating transformations (ten different transformations are chosen), as well as compilation without transformations; (2) building the control flow graph of the compiled program by means of the mcsema-disass tool [17]; (3) translation of the representation obtained in the previous step into LLVM bitcode with the mcsema-lift tool [17]; (4) optimization of the obtained bitcode with the opt optimizer from LLVM; (5) symbolic execution of the obtained bitcode versions with the KLEE symbolic interpreter [18]; and (6) processing of the characteristics obtained by symbolic execution. The sequence of steps is shown in Fig. 1.
After the first step, 11 different executable modules are created for each program P: 10 obfuscated and 1 original (without obfuscation). As a result of the second and third steps, we obtain 11 different bitcode files corresponding to the executable modules. In the fourth step, for each of the ten bitcode files of the obfuscated programs, optimization is performed and the result is saved in a separate bitcode file. We note that the optimizer used in this step acts as the deobfuscator from the model in [11], since optimizers usually perform transformations inverse to obfuscation [19]. Therefore, the optimized bitcode files will be referred to as deobfuscated files. At the beginning of step 5 there are 21 different bitcode files: the original, 10 obfuscated, and 10 deobfuscated. The set of resulting bitcode files corresponding to the program P is denoted by B(P). In the fifth step the symbolic execution of each bitcode file from B(P) is performed.
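The steps above can be sketched as a sequence of tool invocations. Note that the binary names and flags below are illustrative placeholders (the real Hikari, McSema, and KLEE command lines differ in detail and should be taken from the tools' documentation):

```python
# Hedged sketch of the extended scheme (Fig. 1). Tool names and flags are
# illustrative placeholders, not verified invocations.
def extended_pipeline(src="prog.c", transform="bcf"):
    binary, cfg, bc, deobf = "prog.bin", "prog.cfg", "prog.bc", "prog_deobf.bc"
    return [
        # (1) compile with the Hikari obfuscating compiler, one transformation enabled
        ["hikari-clang", "-mllvm", "-enable-" + transform, src, "-o", binary],
        # (2) recover the control flow graph from the binary
        ["mcsema-disass", "--binary", binary, "--output", cfg],
        # (3) lift the recovered CFG to LLVM bitcode
        ["mcsema-lift", "--cfg", cfg, "--output", bc],
        # (4) optimize the bitcode; the optimizer acts as the deobfuscator
        ["opt", "-O2", bc, "-o", deobf],
        # (5) symbolically execute the (de)obfuscated bitcode with KLEE;
        # step (6), processing the collected characteristics, follows the run
        ["klee", deobf],
    ]

steps = extended_pipeline()
print([s[0] for s in steps])
```

Running this for the original program and each of the ten transformations yields the 21 bitcode files of B(P).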

Problems in the Analysis of Reconstructed Bitcode in the Extended Scheme
The McSema translation utilities used in the implementation of the extended scheme divide the binary code translation process into two stages. In the first stage, a high-level representation of the program is built, that is, the control flow graph, which contains functions, instructions of the basic blocks, and other necessary information. This work is done by third-party utilities such as IDA Pro [20] and DynInst [21]. In the second stage, the obtained representation is translated to LLVM bitcode by McSema's internal utility. Thus, processing the program in its machine code representation and preparing a representation convenient for translation is largely accomplished by the binary analysis utilities used in the first stage. In the second stage the translation is done almost directly: each machine code instruction is mapped to an intermediate representation instruction. Let us consider the features of the two stages of translation.
When translating programs from a high-level programming language to lower-level ones (the intermediate representation of the compiler or machine instructions, see step 1 of Fig. 1), some information about the program is lost (e.g., information about variable types, function names, and class interfaces). Therefore, when disassembling and decompiling, the problem of distinguishing executable code from data arises in particular [22]. On the one hand, executable files of a program usually have a certain format [23,24] in which, at a minimum, it is specified which parts are executable code, which parts are data, and where the program entry point is located. Therefore, the format of the executable files partially solves the problem of identifying executable code. On the other hand, there are still problems with identifying the target addresses of indirect jumps and with specifying the boundaries of functions. We note that the information lost during compilation is important for reverse engineering tasks, in particular when converting code from a low-level representation (machine code) to a high-level one (compiler intermediate representation/pseudocode/high-level language, see steps 2 and 3 of Fig. 1). This information allows one to analyze the program code more accurately and quickly [25]. However, binary files are often distributed without it; thus, decompilation remains a difficult task because of the lack of complete information.
The translation used in the second stage is, in practice, implemented by interpretation. Interpretation means that machine instructions are not translated directly into instructions of the target processor, but into the bytecode of a so-called virtual machine. The translation process creates a global structure describing the target processor (all its registers, flags, and other information specific to the target architecture). The machine instructions are replaced with the corresponding instructions at the intermediate representation level; however, they no longer interact with the structures of the real processor but operate on the virtual machine structure that describes it [26]. For this reason, optimization (deobfuscation) of the bitcode of such a program may not bring it substantially closer to the source code, since the optimizer will most likely not find familiar code patterns in such a program structure (in other words, the optimizer analyzes the code of the virtual machine rather than the code of the analyzed program).
In order to determine the effect of the intermediate translation steps on the evaluation of obfuscation resilience, a simplified scheme is constructed.

A Simplified Scheme
The simplified scheme excludes steps 2 and 3 of the extended scheme shown in Fig. 1. Thus, the program is translated from the source code directly into LLVM bitcode. The obfuscating transformations are still performed correctly, because they operate at the intermediate representation level. The program is then analyzed by the KLEE interpreter. The simplified scheme of analysis is illustrated in Fig. 2.
Due to this organization the simplified scheme retains most of the original information about the program, which is lost in the intermediate representations and may be useful during its analysis. Avoiding the steps of translation of the binary representation of the program into the intermediate LLVM representation (building the control flow graph and translating the resulting representation to LLVM bitcode) also eliminates the influence of complicated structure of the translated program on the work of the deobfuscator.
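Under the simplified scheme the tool chain reduces to two calls; the sketch below is again illustrative (in the experiments the obfuscating transformations are applied by the obfuscating compiler while emitting the bitcode):

```python
# Sketch of the simplified scheme (Fig. 2): compile straight to LLVM bitcode,
# then hand the bitcode to the KLEE symbolic interpreter.
def simplified_pipeline(src="prog.c"):
    return [
        ["clang", "-emit-llvm", "-c", src, "-o", "prog.bc"],  # source -> bitcode
        ["klee", "prog.bc"],                                  # symbolic execution
    ]

print([s[0] for s in simplified_pipeline()])
```

Because no binary is disassembled and lifted, the bitcode handed to KLEE retains the source-level structure that the extended scheme loses.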

EXPERIMENTAL CHARACTERIZATION OF SYMBOLIC EXECUTION
Experiments were conducted for the extended and simplified schemes to find the characteristics of symbolic execution and to calculate similarity indices based on such characteristics. The experiments were performed on a computer with the following configuration: an AMD Ryzen 2700U processor (4 cores/8 threads), 16 GB RAM, and a 256 GB M.2 PCI-E NVMe SSD. Since the amount of memory consumed increases considerably during symbolic execution runs, the swap file size was set to 16 GB in addition to the installed memory. Thus, the total memory available to the symbolic interpreter reached 32 GB.
The following subsections describe the data we used, the obfuscation parameters, the characteristics of symbolic execution, the similarity indices, and the restrictions on the experiments.

Data Description
For the experimental study, a set S of programs was written in the C language. The sample was based on the set of programs used in [27] to study the effect of obfuscating transformations on symbolic execution.
All the programs take one command line parameter as input and process it. The functionality of the programs includes calculating simple checksums, sorting the characters of the input parameter, searching for a character, checking the properties of a given number, converting the input parameter to numbers in different numeral systems, and performing a simple operation depending on the input parameter. If the program runs successfully, it prints the result of processing the parameter and returns 0; otherwise it returns an error code.
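The actual samples are C programs; the following Python analogue (our own toy example, not one of the sample programs) mirrors their shape: one command line parameter, a simple checksum, the result printed on success, and a nonzero error code otherwise.

```python
import sys

def checksum(s: str) -> int:
    """Toy additive checksum over the characters of the input parameter."""
    return sum(ord(c) for c in s) % 256

def main(argv):
    if len(argv) != 2:
        return 1                      # error code: parameter missing
    print(checksum(argv[1]))          # result of processing the parameter
    return 0                          # success

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Programs of this shape are convenient for symbolic execution: the single input parameter becomes the symbolic variable, and the branching on its value drives path exploration.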
Most of the programs were modified to process only the first character of the input parameter; we also added some programs that implement a simple algorithm. Initially, the sample S consisted of 50 programs.
To limit the symbolic processing time of the bitcode files, the maximum symbolic execution time was set to 30 min. As a result of applying the schemes shown in Figs. 1 and 2 to the programs from S, it was found that some programs could not be fully examined by the symbolic interpreter within this limit. In such cases the symbolic execution time of the obfuscated and original programs is the same (equal to the maximum), and the other characteristics of the symbolic execution are not objective because the symbolic execution did not finish.
We note that the amount of RAM installed on the computing device that is available to the symbolic interpreter is also limited. Although this memory was increased by enlarging the swap file, there were still programs for which symbolic interpretation was terminated prematurely by the operating system because the symbolic interpreter process consumed all the available RAM. Such programs were also excluded from the sample S. Thus, 35 programs were selected for evaluation (|S| = 35).


Obfuscation Parameters
The Hikari obfuscating compiler is a further development of the Obfuscator-LLVM compiler described in detail in [28]. Some transformations provide additional parameterization, such as specifying the probability of application to each basic block. Default parameters were used for all such transformations.

The Characteristics of Symbolic Execution
A necessary condition for choosing a characteristic is its sensitivity to changes in a program. Obviously, if a characteristic does not change when the program is changed (e.g., when obfuscating transformations are applied), then it is hard to evaluate the impact of such changes on program understanding by means of this characteristic. The following set ℱ of characteristics of symbolic execution is analyzed: F_iexe, the number of instructions executed during the analysis; F_time, the symbolic execution time; F_icov, the percentage coverage of instructions in the bitcode; F_bcov, the percentage coverage of branch instructions; F_ilen, the total number of instructions in the bitcode file; F_tsmt, the total time spent by the SMT (satisfiability modulo theories) solver module; F_qsmt, the average number of solver module calls per execution; F_blen, the number of branch operators in the program code; F_qall, the total number of queries to the solver during symbolic analysis of the program; and F_mem, the average amount of memory consumed during symbolic execution.
To eliminate the influence of operating system processes on the results of the experiments (the characteristics of symbolic execution can be affected by processes such as system component updates, background services, and scheduled tasks), the experiment is executed t times for each program from S. In the present paper the parameter t equals 5. For each characteristic F ∈ ℱ we denote by F_i the value of this characteristic after the i-th iteration, and by F̄ the value of the characteristic averaged over all t iterations: F̄ = (F_1 + … + F_t)/t.
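The averaging over t runs can be sketched as follows (the numbers are made up for illustration):

```python
# Average a symbolic execution characteristic over t independent runs to
# damp interference from background OS activity.
def averaged(values):
    return sum(values) / len(values)

t = 5
f_time_runs = [12.1, 11.8, 12.4, 12.0, 11.7]   # F_time over t = 5 runs (made-up)
assert len(f_time_runs) == t
print(round(averaged(f_time_runs), 6))          # 12.0
```

The averaged value F̄, rather than any single run, is what enters the similarity indices below.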

Program Similarity Indicators
For each characteristic F ∈ ℱ, program P ∈ S, and obfuscating transformation O ∈ 𝒪, we consider the normalized similarity indices between the pairs of compared programs (here D denotes the deobfuscator, i.e., the optimizer):

μ_F(P, O(P)) = |F̄(P) − F̄(O(P))| / max(F̄(P), F̄(O(P))),
μ_F(P, D(O(P))) = |F̄(P) − F̄(D(O(P)))| / max(F̄(P), F̄(D(O(P)))),
μ_F(O(P), D(O(P))) = |F̄(O(P)) − F̄(D(O(P)))| / max(F̄(O(P)), F̄(D(O(P)))).

For each characteristic F ∈ ℱ, this set of values over O ∈ 𝒪 allows us to identify the obfuscating transformation that changes the original program the most within the considered similarity indices. For a fixed O it is difficult to make an assumption about the resilience of the obfuscating transformation, since the obtained normalized similarity indices for different F can differ strongly from each other. What is required is the introduction of weights for the characteristics from ℱ in order to find an integral similarity index. Alternatively, this problem can be approached using machine learning methods; for this purpose, it is necessary to identify the most variable characteristics. To this end, the values of the indices averaged over the set 𝒪 are denoted as

μ̄_F(P, O(P)), μ̄_F(P, D(O(P))), μ̄_F(O(P), D(O(P))). (1)

The first index characterizes the average influence of an obfuscating transformation on the value of the characteristic F (efficiency). The second index characterizes how close the deobfuscator brings the obfuscated program to its initial version (resilience). The third index characterizes the deobfuscator's ability to undo the obfuscating transformations (control value). On the basis of these values it is possible to choose characteristics that can be used in a machine-learning-based resilience estimation model.
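The computation of a normalized similarity index for one characteristic can be sketched as follows (the averaged values are made up for illustration):

```python
def similarity_index(a: float, b: float) -> float:
    """Normalized similarity index |a - b| / max(a, b): 0 means the averaged
    characteristic values coincide; values near 1 mean a large change."""
    m = max(a, b)
    return abs(a - b) / m if m else 0.0

# Made-up averaged values of one characteristic for the original program P,
# the obfuscated version O(P), and the deobfuscated version D(O(P)).
f_orig, f_obf, f_deobf = 100.0, 400.0, 150.0
print(similarity_index(f_orig, f_obf))    # 0.75: effect of obfuscation (efficiency)
print(similarity_index(f_orig, f_deobf))  # ~0.33: residual effect after deobfuscation
print(similarity_index(f_obf, f_deobf))   # 0.625: effect of the deobfuscator
```

In this made-up example the deobfuscator removes most, but not all, of the change introduced by obfuscation: the residual index is well above zero but far below the obfuscation index.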
Similarly, the values of the similarity indices averaged over the set ℱ of characteristics (2) allow us to identify the obfuscating transformations that have the greatest impact on the characteristics of symbolic program execution.

EXPERIMENTAL RESULTS
For the considered sets ℱ, 𝒪, and S, the results of calculating the averaged values (1) and (2) for the extended and simplified schemes are presented in Tables 1, 2 and 3, 4, respectively. The first column in Tables 1 and 3 shows how much obfuscation on average changes the program from its original state within a particular similarity index, i.e., it shows the effectiveness of obfuscation. The second column shows how much the deobfuscated program differs from the original program within the same index, i.e., it characterizes the resilience of the obfuscating transformations within this similarity index. It is experimentally confirmed that the third column can be determined from the values of the first two: if the values of the first and second columns are close, the value of the third column is expected to be close to zero.
In [30] it is suggested that similarity values less than 0.05 should be considered insignificant. In this paper this estimate is refined on the basis of the standard deviation of the values from the mean. Let us denote by M_ℱ and σ_ℱ the mean value and the standard deviation of the similarity indices averaged over 𝒪, and by M_𝒪 and σ_𝒪 the mean value and the standard deviation of the similarity indices averaged over ℱ. The calculated values are proposed as boundaries by which the significance of the indices can be determined: if the value of a chosen index is less than the corresponding boundary, its magnitude is considered insignificant. For Tables 1 and 3 this means that the values of the symbolic characteristic practically do not change (on average over 𝒪). Similarly, for Tables 2 and 4 this means that the transformation O has practically no effect on the symbolic execution characteristics (on average over ℱ).
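One possible reading of this refinement (our assumption; the excerpt does not give the exact formula) is to take the mean minus one standard deviation as the cut-off:

```python
import statistics

def significance_boundary(indices):
    """Cut-off below which a similarity index is treated as insignificant.
    ASSUMPTION: boundary = mean - population standard deviation; the paper
    only states that the boundary is derived from the mean and deviation."""
    return statistics.mean(indices) - statistics.pstdev(indices)

indices = [0.02, 0.03, 0.40, 0.55, 0.50]         # made-up averaged indices
bound = significance_boundary(indices)
significant = [v for v in indices if v >= bound]
print(significant)                                # the small values drop out
```

With these made-up numbers the two near-zero indices fall below the boundary and would be discarded, mirroring how barely changing characteristics are filtered out before feature selection.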
Analysis of Tables 1 and 3 shows that the characteristics most variable under both obfuscation and deobfuscation are F_time, F_iexe, F_tsmt, F_qsmt, F_qall, and F_mem. Let us note that these characteristics are the most variable within both the extended scheme and the simplified scheme. Since the total symbolic execution time includes the time spent by the SMT module, and the total number of queries includes the queries to the SMT module, F_tsmt and F_qsmt are dropped, leaving four characteristics: F_time, F_iexe, F_qall, and F_mem.
Comparison of the tables for the extended scheme with those for the simplified scheme shows that the values of the indices for the extended scheme are at least twice as large as those for the simplified one. This appears to be due to the problems described earlier (see the subsection on the analysis of reconstructed bitcode), in particular the additional stages of disassembly and translation, which are excluded from the simplified scheme. Nevertheless, both schemes can be used for feature selection, since the corresponding sets of the most variable indices are almost identical. The exception is the F_bcov index, which is variable within the simplified scheme, in contrast to the extended scheme; however, its value is close to the boundary. Since the extended scheme corresponds more closely to the process of dynamic analysis of the program by an analyst, and the discrepancy with the results of the simplified scheme occurs in only one index, the indices selected are those identified as variable within the extended scheme.
Analysis of Tables 2 and 4 shows how the obfuscating transformations affect the program similarity indices. As expected, applying all obfuscations simultaneously has the maximum effect. The tables show that almost all of the obfuscating transformations affect the characteristics of symbolic execution to varying degrees within the considered similarity indices. Although the values in Tables 2 and 4 allow us to distinguish the obfuscating transformations that have the greatest effect, these values characterize the quality of each obfuscating transformation exclusively within symbolic execution, without taking into account the characteristics of these transformations obtained by static analysis.

CONCLUSIONS
The similarity indices obtained from the static analysis of a program can be used to evaluate the efficiency of obfuscating transformations and to determine their robustness to static deobfuscation techniques. Such indices can be built on the basis of various complexity metrics calculated, for example, from a program's executable file. In the present paper we propose to use symbolic execution for modeling dynamic analysis. Two schemes for obtaining the characteristics of symbolic execution (extended and simplified) have been developed, and similarity indices for the obtained characteristics have been suggested. Experiments with the schemes were carried out in order to determine the variable characteristics of symbolic execution on the one hand, and to compare the results of the constructed schemes on the other. Comparison of the experimental results of both schemes shows a slight difference in the chosen sets of variable characteristics (in the simplified scheme the F_bcov characteristic is identified as variable, in contrast to the extended scheme). The comparison also shows that the values of the indices of the simplified scheme are on average lower than the values of the same indices of the extended scheme, which is apparently caused by the additional disassembly and translation steps of the extended scheme. The simplified scheme requires fewer resources (time and memory) to analyze a program. Nevertheless, the characteristics F_time, F_iexe, F_qall, and F_mem are the most sensitive to changes in the program within both schemes. These four characteristics are proposed for use in machine learning.
We note that the extended scheme for obtaining the characteristics of symbolic execution is more universal with respect to obfuscating transformations, since it allows obfuscation to be applied at the machine instruction level; such a scheme also better matches the process of dynamic program analysis by an analyst. Further work consists in combining similarity indices from static analysis with similarity indices based on the selected characteristics of symbolic execution into a feature vector for experiments on evaluating the efficiency and resilience of obfuscating transformations with machine learning methods.