Please enable JavaScript.
Coggle requires JavaScript to display documents.
2011 - S2E: A Platform for In-Vivo Multi-Path Analysis of Software Systems
2011 - S2E: A Platform for In-Vivo Multi-Path Analysis of Software Systems
Information
S2E
Platform for analyzing the properties and behavior of software systems
Automated path explorer with modular path analyzers
Explorer drives target system down all execution path of interest while analyzers check properties of each such path or simply collect information
Desired path can be specified
Analysis tool build on top of S2E glues together path selectors and path analyzers.
Selectors guide S2E path explorer by specifying the paths of interest: all paths that touch a specific memory object, paths influenced by a specific parameter, paths inside a target code module,etc.
Analyzers can be pieced together from S 2E-provided analyzers, or can be written from scratch using the S 2E API
Feature
Properties
Scalability
Going from single-path analysis to multi-path analysis turns a linear problem into an exponential one, because the number of paths through a program increases exponentially in the number of branches (path explosion problem)
Able to directly analyze binaries, being able to analyze even proprietary software
Dynamic Binary Instrumentation
Directly interpret x86 machine code
Perform analyses in-vivo within real software stack instead of using abstract models of these layers
Virtualization
Removes the need for the stubs or abstract models required by most state-of-the-art symbolic execution engines and model checkers
e.g.
drivers
kernel
libraries
user programs
Simultaneously analyze entire families of execution paths
Predictive analyses often must reason about entire families of paths through the target system, not just one path
Execution consistency models
Models
Defining a model, we think in terms of which paths it includes vs. excludes
Globally feasible
Execution is additionally consistent with control flow restrictions imposed by data-related constraints in the environment
Subset of the locally feasible paths
Locally feasible
Execution is consistent with both the system’s CFG and with the restrictions on control flow imposed by the data-related constraints within the unit
Subset of the statically feasible paths
Statically feasible
There exists a path in the system’s inter-procedural control flow graph (CFG) corresponding to the execution in question
Strict Consistency
Strictly Consistent System-level Execution
Implementation of SC-SE is limited by the path explosion problem: the number of globally feasible paths is roughly exponential in the size of the whole system
Provided that the engine can solve all constraints it encounters, every path through the unit that is possible under a concrete execution of the system will be eventually found by SC-SE exploration
Exploration engine gathers and uses information about all parts of the system, to explore new paths through the unit.
Strictly Consistent Unit-level Execution
To implement this model, the SSE converts all symbolic data to concrete values when the unit calls the environment
The conversion is consistent with the current set of path constraints in the unit. No other conversion is performed. The environment is treated as a black box, and no symbolic data can flow into it
Strictly Consistent Concrete Execution
New paths can only be explored by blindly guessing new inputs
The only explored paths are the paths that the system follows when executed with the sample input provided by the analysis
The entire system is treated as a black box: no internal information is used to explore new paths
Relaxed Consistency
CFG Consistency
Exploration is fast, because branch feasibility need not be checked with a constraint solver
Implement RC-CC by pursuing all outcomes of every branch, regardless of path constraints, thus following all edges in the unit’s inter-procedural CFG
Overapproximate Consistency
SSE converts concrete values at unit/environment interface boundaries into unconstrained symbolic values that disregard interface contracts
Local Consistency
Brings completeness at the expense of substantial
overapproximation
SSE converts, at the unit/environment boundary, the concrete values generated by the environment into symbolic values that satisfy the constraints of the environment’s API
Consistent model
Every path through the unit admissible by the model, there exists a corresponding globally feasible path through the system (i.e., the system can run concretely in that way)
Complete model
Every path through the unit that corresponds to some globally feasible path through the system will eventually be discovered by exploration done under that model.
Distinction
Which execution paths each model admits
The traditional assumption about system execution is that the state at any point in time is consistent, i.e., there exists a feasible path from the start state to the current state
Making principled peformance/accuracy trade-offs in complex analyses
Selective Symbolic Execution
Transisition
Symbolic → Concrete Transition
S2E employs lazy concretization
This is an important optimization when doing in-vivo symbolic execution,because a lot of data can be carried through the layers of thesoftware stack without conversion
Converts the value of x from symbolic to concrete on-demand, only when concretely running code actually reads the value of x
Concrete → Symbolic Transition
The concrete domain is unaware of libFn being executed in multi-path mode. All paths execute independently, and it is up to the S2Eanalyzer plugins to decide whether, besides observing the concrete path, they also wish to look at the symbolic paths
Simplest conversion is to use an S2E selector to change the concrete arguments into symbolic ones, e.g., instead of libFn(10) call libFn(λ)
Once this transition occurs, S2E executes libFn symbolically and simultaneously executes libFn with the original concrete argument(s) as well
Once exploration of libFn completes,S2E returns to appFn the concrete return value resulting from the concrete execution, but libFn will have been explored symbolically as well.
The execution of app is consistent, while at the same time S2E exposes to the analyzer plugins those paths inlib that are rooted at libFn’s entry point
S2E is based on the key observation that often only some families of paths are of interest
This superposition of paths into a symbolic execution tree, in which each possible execution corresponds to a path from the root of the tree to a leaf corresponding to a terminal state
Program is treated as a superposition of possible execution paths. if (x>0) then else can be viewed as a superposition of two possible paths: one for x>0 and another one for x≤0
Automatically minimize amount of code that has to be executed symbolically given a target analysis
Ability to scale large real system, e.g. full Windows stack
System Analysis with S2E
Selection interface
Used to guide the exploration of execution paths
First step in using S2E is deciding on a policy for which part of a program to execute in multi-path (symbolic) mode vs. single-path (concrete) mode
Data-based selection
Provides a way to expand an execution path into a multi-path execution by introducing symbolic values into the system
Any time S2E encounters a branch predicate involving a symbolic value, it will fork the execution accordingly.
Code-based selection
Enables/disables multi-path execution depending on whether the program counter is or not within a target code area
Priority-based selection
Used to define the order in which paths are explored within the family of paths defined with data-based and code-based selectors
Analysis interface
Used to collect events or check properties of execution paths
Once the selectors define a family of paths,S2E executes these paths and exposes each one of them to the analyzer plugins
Analyzer class
Bug finder
DataRaceDetector
MemoryChecker
Look for the corresponding bug conditions and output an execution path leading to7the encountered bug
ExecutionTracer
Selectively records the instructions executed along a path ,along with the memory accesses, register values, and hardware I/O
PerformanceProfile
Counts cache misses,TLB misses, and page faults incurred along each path
Used to obtain the performance envelope of an application,and we describe it in more detail in the evaluation section
Analysis would like to do what-if analysis of system characteristic
e.g. determining whether aligning a certain data structure on a page boundary will avoid allcache misses and thus increase performance
Work on multi-path analysis
Model checking
Symbolic execution
What is?
Proprietary software
Computer programs that are exclusive property of their developers or publishers, and cannot be copied or distributed without complying with their licensing agreements
in-vivo analysis
Experimenting using a whole live system.
Captures all interactions of the analyzed code with its surrounding system, not just with a simplified abstraction of that system
Authors
Chipounov V
Kuznetsov V
Candea G
Goals
Introduce S2E
Contribution
Introduces selective symbolic execution
Introduces execution consistency models
General platform for performing diverse in-vivo multi-path
analyses in a way that scales to large real systems