2015 - Multi-Platform Binary Program Testing Using Concolic Execution…
2015 - Multi-Platform Binary Program Testing Using Concolic Execution
Produces a new path of execution
Number of inputs executed targeting this branch
Targets the uncovered edge of a known branch
Length of the path
Initialize algorithm with seed input
Length of the content should vary since the system only varies the content of the file, but not the length of it.
New input obtained from mutating previous input based on constraint information from the model
Begin recording of program execution with input
Concolic execution of PANDA replay
Augment LLVM code with callback function
Implemented by LLVM pass
If the memory location is not tainted, the byte from the concrete memory is picked and turned into a bit-vector.
If an instruction reads a value from memory as an operator, and that memory is marked as tainted, we construct a new bit-vector expression with the size of the read.
Need to capture potential symbolic values read from (or written to) shadow memory, while still mixing in concrete values from the underlying storage if no symbolic values are present.
Translates LLVM instruction in the form of a symbolic SMT-LIB expression
When the address read from or written to is tainted i.e. is a symbolic value. This involves that the exact result of the operation depends on the input
SMT solver solves selected constraint
Z3 SMT solver
Generate new input
Test Case Generation
Once the constraints of the current test case have been captured, new inputs are generated using the solutions to the constraints produced by Microsoft's Z3 SMT solver
Individual constraints are iteratively negated
Dynamic Taint Analysis
Taint analysis requires bookkeeping of the taint state of the systems memory, including I/O memory and CPU registers, in order to properly propagate taint.
Extending taint analysis in order to learn about how exactly the inputs to the program affect its execution
Restricted variant of symbolic execution
Instead of covering all possible execution paths in one execution, concolic execution outputs the PC that is recorded during a single execution.
Generate new input files by selectively negating individual branch constraints and solving the resulting, new, path constraint.
Program is executed while all its inputs are treated as symbolic variables
Transformations on inputs during the execution are tracked in the form of logical formulas depending on these symbolic variables
System performing execution kept symbolic values in a symbolic state that mirrors the memory locations of the system
List of logical frst-order expressions
depending on the input variables.
Not scaling well to larger applications
Record & Replay
Satisfiability Modulo Theories
Standardized format of representing SMT formulas is SMT-LIB
Determining if an interpretation of boolean variables for a given boolean formula exists
Executing a program with (semi-)random inputs in an automated fashion with the goal of causing errors in the program.
Validate the correctness of the symbolic execution implementation and its performance against complex branch conditions in isolation
Complex Branch Conditions
Nested if statements
Arithmetic operations to verify common binary operations
Larger equation based on two integers read from the input file is compared to a constant value
Larger equation based on two integers read from
the input file is compared to a constant value
Complex Control Flow Logic
Samples from real-world applications
Source code with buffer overflow vulnerabilities
Not all of the samples could be run
American Fuzzy Lop fuzzing tool
Platform to dynamically generate test les for programs that cause the program to behave dierently.
Analysis of binary programs with zero knowledge about the program internals, without any source code, and on multiple architectures