Please enable JavaScript.

Coggle requires JavaScript to display documents.

2011 - S2E: A Platform for In-Vivo Multi-Path Analysis of Software Systems

- - - - Properties
        
        Scalability
        
        Going from single-path analysis to multi-path analysis turns a linear problem into an exponential one, because the number of paths through a program increases exponentially in the number of branches (path explosion problem)
        
        Able to directly analyze binaries, being able to analyze even proprietary software
        
        Dynamic Binary Instrumentation
        
        Directly interpret x86 machine code
        
        Perform analyses in-vivo within real software stack instead of using abstract models of these layers
        
        Virtualization
        
        Removes the need for the stubs or abstract models required by most state-of-the-art symbolic execution engines and model checkers
        
        e.g.
        
        drivers
        
        kernel
        
        libraries
        
        user programs
        
        Simultaneously analyze entire families of execution paths
        
        Predictive analyses often must reason about entire families of paths through the target system, not just one path
      - Execution consistency models
        
        Models
        
        Defining a model, we think in terms of which paths it includes vs. excludes
        
        Globally feasible
        
        Execution is additionally consistent with control flow restrictions imposed by data-related constraints in the environment
        
        Subset of the locally feasible paths
        
        Locally feasible
        
        Execution is consistent with both the system’s CFG and with the restrictions on control flow imposed by the data-related constraints within the unit
        
        Subset of the statically feasible paths
        
        Statically feasible
        
        There exists a path in the system’s inter-procedural control flow graph (CFG) corresponding to the execution in question
        
        Strict Consistency
        
        Strictly Consistent System-level Execution
        
        Implementation of SC-SE is limited by the path explosion problem: the number of globally feasible paths is roughly exponential in the size of the whole system
        
        Provided that the engine can solve all constraints it encounters, every path through the unit that is possible under a concrete execution of the system will be eventually found by SC-SE exploration
        
        Exploration engine gathers and uses information about all parts of the system, to explore new paths through the unit.
        
        Strictly Consistent Unit-level Execution
        
        To implement this model, the SSE converts all symbolic data to concrete values when the unit calls the environment
        
        The conversion is consistent with the current set of path constraints in the unit. No other conversion is performed. The environment is treated as a black box, and no symbolic data can flow into it
        
        Strictly Consistent Concrete Execution
        
        New paths can only be explored by blindly guessing new inputs
        
        The only explored paths are the paths that the system follows when executed with the sample input provided by the analysis
        
        The entire system is treated as a black box: no internal information is used to explore new paths
        
        Relaxed Consistency
        
        CFG Consistency
        
        Exploration is fast, because branch feasibility need not be checked with a constraint solver
        
        Implement RC-CC by pursuing all outcomes of every branch, regardless of path constraints, thus following all edges in the unit’s inter-procedural CFG
        
        Overapproximate Consistency
        
        SSE converts concrete values at unit/environment interface boundaries into unconstrained symbolic values that disregard interface contracts
        
        Local Consistency
        
        Brings completeness at the expense of substantial
        overapproximation
        
        SSE converts, at the unit/environment boundary, the concrete values generated by the environment into symbolic values that satisfy the constraints of the environment’s API
        
        Consistent model
        
        Every path through the unit admissible by the model, there exists a corresponding globally feasible path through the system (i.e., the system can run concretely in that way)
        
        Complete model
        
        Every path through the unit that corresponds to some globally feasible path through the system will eventually be discovered by exploration done under that model.
        
        Distinction
        
        Which execution paths each model admits
        
        The traditional assumption about system execution is that the state at any point in time is consistent, i.e., there exists a feasible path from the start state to the current state
        
        Making principled peformance/accuracy trade-offs in complex analyses
      - Selective Symbolic Execution
        
        Transisition
        
        Symbolic → Concrete Transition
        
        S2E employs lazy concretization
        
        This is an important optimization when doing in-vivo symbolic execution,because a lot of data can be carried through the layers of thesoftware stack without conversion
        
        Converts the value of x from symbolic to concrete on-demand, only when concretely running code actually reads the value of x
        
        Concrete → Symbolic Transition
        
        The concrete domain is unaware of libFn being executed in multi-path mode. All paths execute independently, and it is up to the S2Eanalyzer plugins to decide whether, besides observing the concrete path, they also wish to look at the symbolic paths
        
        Simplest conversion is to use an S2E selector to change the concrete arguments into symbolic ones, e.g., instead of libFn(10) call libFn(λ)
        
        Once this transition occurs, S2E executes libFn symbolically and simultaneously executes libFn with the original concrete argument(s) as well
        
        Once exploration of libFn completes,S2E returns to appFn the concrete return value resulting from the concrete execution, but libFn will have been explored symbolically as well.
        
        The execution of app is consistent, while at the same time S2E exposes to the analyzer plugins those paths inlib that are rooted at libFn’s entry point
        
        S2E is based on the key observation that often only some families of paths are of interest
        
        This superposition of paths into a symbolic execution tree, in which each possible execution corresponds to a path from the root of the tree to a leaf corresponding to a terminal state
        
        Program is treated as a superposition of possible execution paths. if (x>0) then else can be viewed as a superposition of two possible paths: one for x>0 and another one for x≤0
        
        Automatically minimize amount of code that has to be executed symbolically given a target analysis
      - Ability to scale large real system, e.g. full Windows stack
      - System Analysis with S2E
        
        Selection interface
        
        Used to guide the exploration of execution paths
        
        First step in using S2E is deciding on a policy for which part of a program to execute in multi-path (symbolic) mode vs. single-path (concrete) mode
        
        Data-based selection
        
        Provides a way to expand an execution path into a multi-path execution by introducing symbolic values into the system
        
        Any time S2E encounters a branch predicate involving a symbolic value, it will fork the execution accordingly.
        
        Code-based selection
        
        Enables/disables multi-path execution depending on whether the program counter is or not within a target code area
        
        Priority-based selection
        
        Used to define the order in which paths are explored within the family of paths defined with data-based and code-based selectors
        
        Analysis interface
        
        Used to collect events or check properties of execution paths
        
        Once the selectors define a family of paths,S2E executes these paths and exposes each one of them to the analyzer plugins
        
        Analyzer class
        
        Bug finder
        
        DataRaceDetector
        
        MemoryChecker
        
        Look for the corresponding bug conditions and output an execution path leading to7the encountered bug
        
        ExecutionTracer
        
        Selectively records the instructions executed along a path ,along with the memory accesses, register values, and hardware I/O
        
        PerformanceProfile
        
        Counts cache misses,TLB misses, and page faults incurred along each path
        
        Used to obtain the performance envelope of an application,and we describe it in more detail in the evaluation section
  - - - Computer programs that are exclusive property of their developers or publishers, and cannot be copied or distributed without complying with their licensing agreements
    - - Experimenting using a whole live system.
      - Captures all interactions of the analyzed code with its surrounding system, not just with a simplified abstraction of that system