2007 - BitScope: Automatically Dissecting Malicious Binaries (Information,…
Automatic dissection of malicious binaries
Execute binaries with symbolic inputs
Symbolic System Environment
Manipulates the inputs to the malicious binary to control its execution, and records its outputs, including its impact on the system such as sending packets and writing to files. The Extractor uses the logged information to produce analysis results
Adding code to QEMU that executes right before the emulated environment would jump to a hooked function.
When the emulated CPU reaches the entry point of a hooked function, QEMU executes the hook associated with that function
Hooking provides a platform for logging information about each API call: where it is called from and the arguments it is called with
Intercepts Windows API calls made by the malicious software. Instead of allowing the actual concrete value to be returned, the Symbolic System Environment creates a new symbolic variable that represents the return value
This symbolic variable represents all the values that could have been returned to the malicious binary
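A minimal Python sketch of the hook-and-replace idea above, not BitScope's actual QEMU code. The class names, the on_jump callback, and the addresses are all hypothetical; only the behaviour (log the caller and arguments, then hand back a fresh symbolic variable instead of a concrete return value) comes from the notes.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SymVar:
    """Stands for every value the hooked API call could have returned."""
    name: str


class SymbolicEnvironment:
    """Hypothetical stand-in for the hook table kept inside the emulator."""

    def __init__(self) -> None:
        self.hooks: Dict[int, str] = {}  # entry address -> API name
        self.call_log: List[Tuple[str, int, Tuple[int, ...]]] = []
        self._ids = itertools.count()

    def hook(self, entry_addr: int, api_name: str) -> None:
        self.hooks[entry_addr] = api_name

    def on_jump(self, target: int, caller: int,
                args: Tuple[int, ...]) -> Optional[SymVar]:
        """Runs right before the emulated CPU jumps to `target`."""
        api_name = self.hooks.get(target)
        if api_name is None:
            return None  # not hooked: the call proceeds concretely
        # Log which API was called, from where, and with which arguments.
        self.call_log.append((api_name, caller, args))
        # Return a fresh symbolic variable in place of the concrete result.
        return SymVar(f"{api_name}_ret{next(self._ids)}")


env = SymbolicEnvironment()
env.hook(0x7C80A000, "GetTickCount")  # made-up entry address
ret = env.on_jump(0x7C80A000, caller=0x401020, args=())
unhooked = env.on_jump(0x401100, caller=0x401025, args=())
```

Here ret is a symbolic variable named after the API, while the jump to the unhooked address returns None and would be left to run concretely.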
Performing mixed concrete and symbolic execution on malicious software
Symbolic execution is necessary to handle the symbolic variables that are introduced by the Symbolic System Environment.
Mixed Concrete and Symbolic Execution
Concrete execution is used as an optimization for all operations that do not depend on those symbolic variables
When we reach a branch in the malicious code that depends on a symbolic value, we can express the condition of that branch in terms of a path predicate
Construct a path predicate for each branch direction. Each path predicate describes the constraints the symbolic inputs need to satisfy for the program execution to go down that path.
The new path predicate is the conjunction of the constraints of the current path before the current branch and the constraint imposed by the current branch
Once these path predicates are constructed, Rudder uses the Solver to determine whether each direction of the branch is satisfiable. All satisfiable directions are given to the Path Selector, which enqueues them as future paths to be explored
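The branch handling above can be sketched as follows. This is an illustration under stated assumptions: constraints are plain Python predicates, and solve is a brute-force stand-in over one byte-sized variable, not the Solver the system actually uses. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Constraint = Callable[[Dict[str, int]], bool]  # predicate over symbolic inputs
PathPredicate = Tuple[Constraint, ...]         # read as a conjunction


def solve(pred: PathPredicate, var: str,
          domain: range = range(256)) -> Optional[Dict[str, int]]:
    """Stand-in solver: try every value of one byte-sized variable."""
    for value in domain:
        model = {var: value}
        if all(constraint(model) for constraint in pred):
            return model
    return None


def fork_on_branch(current: PathPredicate, cond: Constraint,
                   var: str) -> List[Tuple[str, PathPredicate, Dict[str, int]]]:
    """Build one path predicate per direction and keep the satisfiable ones."""
    # New predicate = constraints so far AND the constraint of this branch.
    taken = current + (cond,)
    not_taken = current + (lambda model: not cond(model),)
    feasible = []
    for direction, pred in (("taken", taken), ("not_taken", not_taken)):
        model = solve(pred, var)
        if model is not None:
            feasible.append((direction, pred, model))
    return feasible


# The path so far requires x > 10.
path: PathPredicate = (lambda m: m["x"] > 10,)
first = fork_on_branch(path, lambda m: m["x"] == 42, "x")   # both directions feasible
second = fork_on_branch(path, lambda m: m["x"] < 5, "x")    # only "not_taken" feasible
```

In the second call the taken direction would need x > 10 and x < 5 at once, so only the other direction would be handed on for exploration.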
Determine Whether to Execute an Instruction Symbolically or Concretely
Maintain a table denoting whether each register contains symbolic or concrete data.
A page-table-like data structure shadows each valid memory location, marking it as symbolic or concrete
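A rough sketch of the two shadow structures described above. The page size, the one-flag-per-byte layout, and every name here are assumptions made for illustration; the notes only say that registers have a table and memory has a page-table-like shadow.

```python
from typing import Dict, Iterable, Set, Tuple

PAGE_SIZE = 4096  # assumed granularity of the shadow pages


class ShadowState:
    def __init__(self) -> None:
        self.symbolic_regs: Set[str] = set()     # registers holding symbolic data
        self.pages: Dict[int, bytearray] = {}    # page number -> per-byte flags

    def mark_reg(self, reg: str, symbolic: bool) -> None:
        if symbolic:
            self.symbolic_regs.add(reg)
        else:
            self.symbolic_regs.discard(reg)

    def mark_mem(self, addr: int, size: int, symbolic: bool) -> None:
        for a in range(addr, addr + size):
            page_no = a // PAGE_SIZE
            if page_no not in self.pages:
                # Shadow pages are allocated lazily, all-concrete by default.
                self.pages[page_no] = bytearray(PAGE_SIZE)
            self.pages[page_no][a % PAGE_SIZE] = 1 if symbolic else 0

    def mem_is_symbolic(self, addr: int, size: int) -> bool:
        for a in range(addr, addr + size):
            page = self.pages.get(a // PAGE_SIZE)
            if page is not None and page[a % PAGE_SIZE]:
                return True
        return False

    def needs_symbolic_execution(self, regs: Iterable[str],
                                 mem: Iterable[Tuple[int, int]] = ()) -> bool:
        """True if any operand (register or memory range) is symbolic."""
        return (any(r in self.symbolic_regs for r in regs)
                or any(self.mem_is_symbolic(a, n) for a, n in mem))


shadow = ShadowState()
shadow.mark_reg("eax", True)
shadow.mark_mem(0x0012FFFE, 4, True)  # made-up address; spans two shadow pages
```

An instruction whose operands are all concrete would be run concretely; one touching eax or the marked bytes would go to the symbolic executor.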
The Path Selector
Keeps a pool of feasible paths to be further explored
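One possible shape for such a pool, as a sketch only. The first-in-first-out order and the duplicate check are assumptions; the notes do not say which selection strategy the Path Selector uses. PendingPath and its fields are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Set, Tuple


@dataclass(frozen=True)
class PendingPath:
    branch_addr: int             # address of the branch that forked this path
    direction: str               # "taken" or "not_taken"
    predicate: Tuple[str, ...]   # path predicate, kept as text in this sketch


class PathSelector:
    def __init__(self) -> None:
        self._pool: Deque[PendingPath] = deque()
        self._seen: Set[PendingPath] = set()

    def enqueue(self, path: PendingPath) -> bool:
        """Add a feasible path unless an identical one was already queued."""
        if path in self._seen:
            return False
        self._seen.add(path)
        self._pool.append(path)
        return True

    def next_path(self) -> Optional[PendingPath]:
        """Pick the next path to explore (FIFO here, purely as an assumption)."""
        return self._pool.popleft() if self._pool else None


selector = PathSelector()
a = PendingPath(0x401030, "taken", ("x > 10", "x == 42"))
b = PendingPath(0x401030, "not_taken", ("x > 10", "x != 42"))
added = [selector.enqueue(a), selector.enqueue(b), selector.enqueue(a)]
order = [selector.next_path(), selector.next_path(), selector.next_path()]
```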
Analyzing the information that other system components obtain and providing that analysis to the user.
Control Flow Graph Module
Control-flow graph of discovered code
Input Analysis Module
Inputs required by the binary to drive the different execution paths discovered
Impact Analysis Module
Impact that the binary has on the system
Single-path Dependency Analysis Module
Dependency between the inputs and outputs of the malicious binary
Multi-path Dependency Analysis Module
Analyzes several runs of a program, using several different inputs, to infer additional dependencies
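To make the difference between the two dependency modules concrete, here is a toy sketch. It is not the paper's algorithm: the single-path part propagates input labels along one recorded trace, and the multi-path part compares pairs of runs that differ in exactly one input. The trace format, run format, and function names are all invented for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Trace = List[Tuple[str, List[str]]]           # (destination, source names)
Run = Tuple[Dict[str, int], Dict[str, object]]  # (inputs, observed outputs)


def single_path_deps(trace: Trace, inputs: List[str]) -> Dict[str, FrozenSet[str]]:
    """Propagate input labels through the assignments of one execution."""
    deps: Dict[str, FrozenSet[str]] = {name: frozenset([name]) for name in inputs}
    for dest, sources in trace:
        merged: FrozenSet[str] = frozenset()
        for src in sources:
            merged |= deps.get(src, frozenset())
        deps[dest] = merged
    return deps


def multi_path_deps(runs: List[Run], output: str) -> Set[str]:
    """Flag an input if two runs differ only in it and the output changes."""
    found: Set[str] = set()
    for i, (in_a, out_a) in enumerate(runs):
        for in_b, out_b in runs[i + 1:]:
            differing = [k for k in in_a if in_a[k] != in_b.get(k)]
            if len(differing) == 1 and out_a[output] != out_b[output]:
                found.add(differing[0])
    return found


# One run: the packet is computed from both inputs, the file content from none.
trace: Trace = [("tmp", ["time"]), ("packet", ["tmp", "cmd"]), ("file", [])]
deps = single_path_deps(trace, ["time", "cmd"])

# Several runs: the file is only written when cmd == 1.
runs: List[Run] = [
    ({"cmd": 0, "time": 5}, {"file": None}),
    ({"cmd": 1, "time": 5}, {"file": "x"}),
    ({"cmd": 1, "time": 9}, {"file": "x"}),
]
extra = multi_path_deps(runs, "file")
```

In this toy case the single run shows no data flow from any input into the file, while comparing runs reveals that whether the file is written depends on cmd.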
Takes as input a malicious binary, and outputs information about execution paths
Employs whole system emulation in order to intercept any input to the program
Inputs are replaced with symbolic variables
Symbolically executes all instructions which are derived from the input
Allows us to reason about code paths without constraining the analysis to a particular input value
What behavior the malware exhibits
What input/output dependencies exist
What inputs cause interesting behavior
Flow of the program
What actions may the malware perform, and what is the control flow between potential actions?
How do we run the malware to uncover its behavior?
How do inputs and outputs relate?