2010 - Identifying Dormant Functionality in Malware Programs (Information,…

- - - - All automated model / signature generation approaches search for bytes, instructions, or subgraphs that appear frequently in malware
    - - Models are required to be functionality-aware, i.e. equipped with semantic information that indicates a malicious functionality
      - e.g., the fact that a malware sends spam, monitors keystrokes, or starts a web server to provide backdoor access to a compromised host
    - - Generating models for malware behaviors
        
        Dynamic Behavior Identification
        
        The malware binary is executed in a dynamic analysis environment
        
        Anubis records invocations of security-relevant system calls and Windows API functions
        
        Taint analysis
        
        Used to track data flow dependencies between system and function call arguments
        
        Based on the recording, a set of specifications is used to identify different types of phenotypes, i.e. interesting security-relevant behaviors that a malware sample exhibits during dynamic analysis
        
        Use rules that describe a malware phenotype in terms of the required system or API calls, their arguments, and the data flows between these arguments
        
        Behavioral specifications for the different phenotypes are written manually
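
As an illustration of such a rule, here is a minimal sketch that checks a recorded trace for a chain of calls linked by data flow. The `Call` record, the taint labels, and the rule format are hypothetical and are not Anubis's actual output format; the example rule (network input reaching `CreateProcessA`) is only one plausible way to describe a backdoor-like phenotype.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Call:
    """One recorded system/API call (hypothetical trace record)."""
    name: str
    in_taint: set = field(default_factory=set)  # taint labels reaching its arguments
    out_label: Optional[str] = None             # taint label given to its output


def matches(rule, trace):
    """Return True if the trace contains the calls named in `rule`, in order,
    where each call consumes data produced by a call matching the previous step."""
    # live[i] holds the taint labels produced by calls that satisfied step i.
    live = [set() for _ in rule]
    for call in trace:
        for i, name in enumerate(rule):
            if call.name != name:
                continue
            if i == 0 or (call.in_taint & live[i - 1]):
                if i == len(rule) - 1:
                    return True
                if call.out_label is not None:
                    live[i].add(call.out_label)
    return False


# Hypothetical rule: data received from the network flows into process creation.
BACKDOOR_RULE = ["recv", "CreateProcessA"]

trace = [
    Call("socket"),
    Call("recv", out_label="t1"),
    Call("CreateFileA"),
    Call("CreateProcessA", in_taint={"t1"}),
]
unrelated = [Call("recv", out_label="t1"), Call("CreateProcessA")]

print(matches(BACKDOOR_RULE, trace))      # data flow present
print(matches(BACKDOOR_RULE, unrelated))  # same calls, no data flow
```

The second trace contains the same calls but no data flow between them, so the rule does not fire. This is why the taint information matters in addition to the call names.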
        
        Extracting Genotype Models
        
        Filtering
        
        Techniques
        
        Finding exclusive instructions
        
        White-listing
        
        The goal of this filtering step is to identify instructions that are not directly responsible for a malicious behavior
        
        It is likely that a program slice contains code that is not directly related to the malicious behavior that was observed
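
The two filtering techniques can be pictured as set operations on the slice. This is a rough sketch under assumptions: the notes do not define "exclusive", so the code assumes it means instructions that do not also appear in code serving other, unrelated activity, and it assumes the white-list is a set of addresses of known-benign code. The function name and all addresses are made up.

```python
def filter_slice(slice_addrs, whitelist, shared_addrs):
    """Keep only slice instructions that are neither white-listed nor
    shared with unrelated activity (assumed meaning of 'exclusive')."""
    return {a for a in slice_addrs if a not in whitelist and a not in shared_addrs}


slice_addrs = {0x401000, 0x401004, 0x402000, 0x403000}
whitelist = {0x402000}     # assumed: known-benign code, e.g. a library routine
shared_addrs = {0x403000}  # assumed: also used by unrelated activity

kept = filter_slice(slice_addrs, whitelist, shared_addrs)
print(sorted(hex(a) for a in kept))
```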
        
        Slicing
        
        Identify all instructions that contribute to the input parameters of these system calls, as well as instructions that operate on their output parameters
        
        Once this code is located, we can extract its CFG and generate the corresponding fingerprints. These fingerprints then serve as the genotype model for detecting dormant behaviors in other binaries
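
A minimal sketch of the slicing idea on an execution trace, assuming each instruction is reduced to a hypothetical `(address, defs, uses)` triple of the locations it writes and reads. The backward pass collects instructions that contribute to a system call's inputs; the forward pass collects instructions that operate on its outputs. A real slicer works on actual machine state and taint labels, so this only shows the define-use logic.

```python
def backward_slice(instructions, wanted):
    """Addresses of instructions that contributed to the `wanted` locations.
    `instructions` is a list of (addr, defs, uses) in execution order."""
    wanted = set(wanted)
    in_slice = []
    for addr, defs, uses in reversed(instructions):
        if wanted & set(defs):
            in_slice.append(addr)
            wanted -= set(defs)  # these values are now explained
            wanted |= set(uses)  # ...by whatever this instruction read
    return sorted(in_slice)


def forward_slice(instructions, tainted):
    """Addresses of instructions that read data derived from `tainted`."""
    tainted = set(tainted)
    in_slice = []
    for addr, defs, uses in instructions:
        if tainted & set(uses):
            in_slice.append(addr)
            tainted |= set(defs)
        else:
            tainted -= set(defs)  # overwritten with unrelated data
    return in_slice


before_call = [
    (0x10, {"eax"}, set()),    # mov eax, 5
    (0x14, {"ebx"}, set()),    # mov ebx, 7   (unrelated)
    (0x18, {"ecx"}, {"eax"}),  # mov ecx, eax
    (0x1C, {"ecx"}, {"ecx"}),  # add ecx, 1   (ecx is the call's input)
]
after_call = [
    (0x20, {"edx"}, {"eax"}),  # mov edx, eax (eax holds the call's output)
    (0x24, {"esi"}, {"ebx"}),  # mov esi, ebx (unrelated)
    (0x28, {"edi"}, {"edx"}),  # mov edi, edx
]

print([hex(a) for a in backward_slice(before_call, {"ecx"})])
print([hex(a) for a in forward_slice(after_call, {"eax"})])
```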
        
        Genotype Models
        
        In other words, a genotype model is not the colored CFG itself, but a set of fingerprints that represent it. To search a binary for the presence of a particular genotype, only the fingerprints are used.
        
        An algorithm generates a subset of all possible k-node subgraphs of G and normalizes them. Each normalized k-node subgraph then serves as a succinct fingerprint of the code region that is modeled
        
        Given a genotype, modeled as a colored CFG G, the problem of finding this genotype in a malware binary is reduced to finding an isomorphic subgraph of size k that is present both in G and in the binary under analysis
        
        Genotypes are considered similar when their respective CFGs share at least one isomorphic subgraph that is sufficiently large
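
The fingerprint idea can be sketched as follows. This is a brute-force illustration, not the algorithm from the paper: it enumerates every connected k-node subgraph (the notes say only a subset is generated) and normalizes each one by taking the smallest encoding over all node orderings, which is only practical for small k. The graph representation (a set of directed edges plus a node-to-color map) is an assumption.

```python
from itertools import combinations, permutations


def canonical(nodes, edges, colors):
    """Normalized form of the subgraph induced by `nodes`: the smallest
    (color sequence, adjacency bits) pair over all node orderings."""
    best = None
    for order in permutations(nodes):
        key = (
            tuple(colors[n] for n in order),
            tuple(int((a, b) in edges) for a in order for b in order),
        )
        if best is None or key < best:
            best = key
    return best


def is_connected(nodes, edges):
    """True if the induced subgraph is connected, ignoring edge direction."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for a, b in edges:
            if a == n:
                other = b
            elif b == n:
                other = a
            else:
                continue
            if other in nodes and other not in seen:
                seen.add(other)
                stack.append(other)
    return seen == nodes


def fingerprints(edges, colors, k):
    """Set of normalized connected k-node subgraphs of a colored CFG."""
    return {
        canonical(nodes, edges, colors)
        for nodes in combinations(sorted(colors), k)
        if is_connected(nodes, edges)
    }


# Two toy colored CFGs with different node names but one common 3-node region.
g_edges = {(1, 2), (2, 3), (3, 1), (3, 4)}
g_colors = {1: 5, 2: 2, 3: 4, 4: 1}
h_edges = {("a", "b"), ("b", "c"), ("c", "a")}
h_colors = {"a": 2, "b": 4, "c": 5}

shared = fingerprints(g_edges, g_colors, 3) & fingerprints(h_edges, h_colors, 3)
print(len(shared))
```

Because the normalized form does not depend on node names, two CFGs contain an isomorphic colored k-node subgraph exactly when their fingerprint sets intersect.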
        
        Colored control flow graph
        
        Nodes of the CFG we use are colored based on the classes of instructions that are present in the corresponding basic blocks, e.g. arithmetic, logic, data transfer
        
        An edge is a possible control flow transfer (e.g. jump or branch)
        
        A node is a basic block
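
A small sketch of node coloring, assuming a color is a bitmask with one bit per instruction class present in the basic block. The three classes come from the notes; the mnemonic-to-class table and the bit assignment are illustrative assumptions and cover only a handful of x86 mnemonics.

```python
# Assumed bit assignment: one bit per instruction class.
CLASS_BITS = {"arithmetic": 1, "logic": 2, "data_transfer": 4}

# Illustrative, incomplete mapping from x86 mnemonics to classes.
CLASS_OF = {
    "add": "arithmetic", "sub": "arithmetic", "inc": "arithmetic",
    "and": "logic", "or": "logic", "xor": "logic",
    "mov": "data_transfer", "push": "data_transfer", "pop": "data_transfer",
}


def color(block):
    """Color of a basic block: OR of the class bits of its instructions."""
    value = 0
    for mnemonic in block:
        cls = CLASS_OF.get(mnemonic)
        if cls is not None:
            value |= CLASS_BITS[cls]
    return value


# A toy CFG: node -> (mnemonics in the basic block, successor nodes).
cfg = {
    "entry": (["push", "mov", "add"], ["loop"]),
    "loop": (["xor", "inc"], ["loop", "exit"]),
    "exit": (["pop"], []),
}
colors = {node: color(block) for node, (block, _) in cfg.items()}
print(colors)
```

Coloring by instruction class rather than by exact instruction keeps the model tolerant of small changes such as register renaming or instruction substitution within a class.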
        
        Need to be able to characterize binary code
        
        Since the result of the slicing step is neither precise nor complete, it is refined in two ways: a filtering step removes parts not related to the behavior, and a germination step extends the slice to include code that slicing missed
        
        Starts by using program slicing to identify all instructions that contribute to the input parameters of the previously discovered system calls
        
        Once the genotype is located, a model for it can be built
        
        The goal is to locate the genotype, i.e. the part of the binary directly responsible for a previously discovered phenotype
        
        Germination
        
        A slice might be incomplete. In particular, a slice might fail to include instructions that are part of a behavior, simply because these instructions do not directly operate on tainted data or because they are not part of define-use chains
        
        Consider an instruction as part of the code that implements a behavior when this instruction cannot be executed without executing at least one instruction that is part of the program slice
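
One way to read the germination criterion is as a reachability test on the CFG: a node can be executed without the slice only if some entry-to-exit path passes through it while avoiding every slice node. The sketch below applies that reading at basic-block granularity. The CFG representation, the function name, and the choice to consider paths both before and after the node are assumptions, not details taken from the notes.

```python
def germinate(cfg, entry, exits, slice_nodes):
    """Extend `slice_nodes` with every node that lies on no entry-to-exit
    path avoiding the slice. `cfg` maps each node to its successors."""
    slice_nodes = set(slice_nodes)

    def reach(starts, edges):
        # Nodes reachable from `starts` without ever entering the slice.
        seen = set()
        stack = [s for s in starts if s not in slice_nodes]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(m for m in edges.get(n, ()) if m not in slice_nodes)
        return seen

    # Reverse edges, used to walk backwards from the exits.
    pred = {}
    for n, succs in cfg.items():
        for m in succs:
            pred.setdefault(m, []).append(n)

    from_entry = reach([entry], cfg)  # reachable without touching the slice
    to_exit = reach(exits, pred)      # can finish without touching the slice
    free = from_entry & to_exit
    all_nodes = set(cfg) | set(pred)
    return slice_nodes | (all_nodes - free)


# D is only reachable through B, so it joins the slice {B}.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["F"], "F": []}
print(sorted(germinate(cfg, "A", ["F"], {"B"})))
```

Note that this sketch also pulls in nodes that are unreachable from the entry altogether; a more careful version would restrict itself to reachable code first.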
      - Finding Dormant Functionalities
        
        Statically disassemble an unpacked sample and check the binary for dormant functionality using the previously created models
        
        When a code region is found that matches one of the models, the sample is reported as containing dormant functionality that implements the behavior associated with the matching genotype
        
        Packed or obfuscated code needs to be unpacked before the system can analyze it
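
The scanning step can be sketched as a lookup of a sample's fingerprints against a library of genotype models. The fingerprints are treated here as opaque hashable values (plain strings stand in for normalized subgraphs), and the model names, fingerprint values, and function name are invented for illustration. Producing the sample's fingerprints from a statically disassembled, unpacked binary is assumed to have happened already.

```python
def find_dormant(sample_fingerprints, models):
    """Names of behaviors whose genotype model shares at least one
    fingerprint with the sample. `models` maps behavior -> fingerprint set."""
    return sorted(
        behavior
        for behavior, model in models.items()
        if model & sample_fingerprints
    )


# Hypothetical models built earlier from samples observed in the sandbox.
models = {
    "spam": {"fp1", "fp2"},
    "keylogger": {"fp3"},
    "backdoor": {"fp4", "fp5"},
}
sample_fingerprints = {"fp2", "fp5", "fp9"}

print(find_dormant(sample_fingerprints, models))
```

A match is reported even if the sample never exhibited the behavior at run time, which is what makes the functionality "dormant".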
    - - Anubis
        
        Sandbox dynamic analysis tool built on top of QEMU