Please enable JavaScript.

Coggle requires JavaScript to display documents.

2015 - Behavioral Analysis of Obfuscated Code (Goal (Scope (Malware would…

- - - - Examples
        
        Encoding
        
        Constant unfolding
        
        Identities
      - Obfuscation of data structures used by the program
    - - Reordering statements ,methods, loops
      - Hiding actual control flow behind irrelevant conditional statements
      - Example
        
        Ordering transformations
        
        Compiled code follows the principle of spatial locality of logically related basic blocks. Also, blocks that are usually executed near in time are placed adjacent in the code.
        
        Transformations that involve reordering and unconditional branches break these properties.
        
        Opaque predicates
        
        Opaque predicate is a special conditional expression whose value is known to the obfuscator, but is difficult for an adversary to deduce statically
        
        Used in combination with a conditional jump: the correct branch will lead to semantically relevant code, the other one to junk code
        
        Functions In/Out-lining
        
        Functions inlining is the process of including a subroutine into the code of its caller
        
        Function outlining means separating a function into smaller independent parts.
        
        Control indirection
        
        Using control flow constructs in an uncommon way is an effective way for making a control graph not very meaningful to an analyst.
      - Altering the flow of control within the code
    - - Control-flow flattening
        
        Applied at function-level
        
        Function is modifed such that, basically, every branching
        construct is replaced with a big switch statement (Different implementations use if-else constructs, calling of sub-functions, etc. but the underlying principle remains unaltered)
      - White-Box Cryptography
      - Virtual machines
        
        Implementation of a custom virtual machine
        
        An ad-hoc instruction set is defined and selected parts of the program are converted to opcodes for the VM. At runtime the newly created bytecode will be interpreted by the virtual machine, achieving a semantically equivalent program.
        
        An adversary needs to first reverse engineer the virtual machine implementation and understand the behavior of each opcode. Only after these operations it will be possible to decompile
        the bytecode to actual machine code.
  - - - Data-flow analysis
        
        Focus on
        
        Visualization of memory accesses
        
        Memory trace is displayed in an interactive chart, where the x-axis represents the instruction count while the y-axis the address space
        
        Allows the analyst to visually identify memory segments (data, heap,libraries and stack) and explore the trace for finding interesting patterns or accesses that leak confidential information
        
        Discovery of repeating or distinctive patterns in the
        data-flow
        
        Analysis of statistical properties of the data
        
        The use of statistical properties of the data-flow also to identify parts of the binary that deal with data with distinctive characteristics
        
        Entropy
        
        Expresses the average amount of information that is contained in a specific data stream
        
        Encrypted or compressed data has very high entropy
        
        Computing
        
        Groups the memory accesses in chunks of selectable length
        
        For each chunk the probability distribution of each possible byte value is computed
        
        Entropy level = -(Sum of P(xi) log (xi))
        
        P(xi) = Frequency of each byte in the observed data
        
        Distinguish encrypted from random data
        
        1 more item...
        
        Visualization of entropy and randomness of the data flow can reveal patterns that enable the identification of distinctive operations performed by the code
        
        Sequences of memory accesses are tightly coupled with the semantics of the program
        
        Most obfuscation methods are concerned of concealing the program logic by substituting instructions with equivalent (but more complex) one or altering the control flow
        
        Techniques
        
        Dynamic Taint Analysis
        
        Some tests the implementation offered by PANDA is requiring too much memory and thus the analysis can be unfeasible
        
        diff
        
        Computation of the difference between memory traces, recorded with different inputs
        
        Alternate approach if DTA by PANDA proves unfeasible
        
        Identifying which parts of the execution depend on our input can be helpful in order to isolate smaller parts of the code that will later be analyzed in detail.
        
        Auto-correlation of memory accesses
        
        It is used to identify repeating patterns in time series of power consumption
        
        Memory trace needs to be transformed in a time series, we consider locations that were accessed over time
        
        Distinctive patterns in the memory accesses remain unvaried and part of the data that flows to and from the memory is also unchanged
      - Control-flow analysis
        
        Visualizing an execution graph, loops or repetitions of basic blocks and by using graph analysis to counter obfuscations of the control-flow.
        
        After obfuscation transformations, the control flow is often heavily modified in order to make static analysis more difficult
        
        By recording concrete traces of the execution we intrinsically filter out all the dead/junk code and only focus on the parts of the program that were actually executed
        
        Analyzing cryptographic implementations one or very few traces are often enough as the control-ow of a cryptographic function should not depend on the input data
        
        Even though the original control-ow of the program is transformed, there are still some patterns in the execution trace that remain.
        
        Example
        
        Multiple executions of parts of the code
        
        Distinctive sequences of blocks that are run one after each other
        
        Technique
        
        Visualization
        
        To visualize the collected data for the analyst, a graph is built from the execution trace
        
        A subset of CFG, considering only executed parts of it
        
        Nodes are the basic blocks in the execution trace while edges represent a transition from one basic block to another
        
        Though this comes at the expense of not reaching complete coverage of the possible execution paths
        
        CFG
        
        Analysis of the execution graph for countering control-flow flattening
      - Recorded every memory access and every execution of
        basic blocks produced by target binary during one concrete execution.
    - - Recording
        
        Ad-hoc plugins
        
        To isolate the target process from others running on the same system we use the ASID (Address-Space Identifier) register, it determines the location of the page table associated to a process and it is univocally corresponding to it.
        
        One for execution traces
        
        One for recording memory traces
        
        Since we are interested in all memory accesses and instructions that are executed by one single process
        
        PANDA records at records at a system level, without having the notion of processes
        
        PANDA only record non deterministic output/input to optimize traces for size
      - Malware often have anti-debugging techniques that detect if it is running in a sandbox or debugging environment
      - PANDA offers instrumentation at emulator level able to evade such technique
      - Dynamic Taint Analysis
        
        Application
        
        Data-flow tainting