Features engineering (What is (Feature ? (Features extraction ->…

- - - - Raw features are obtained directly from the dataset, with no extra data manipulation or engineering.
      - Derived features are usually obtained from feature engineering, where we extract features from existing data attributes.
    - - Images → colours, textures, contours, ...
      - Signals → frequency, phase, samples, spectrum, ...
      - Time series → ticks, trends, self-similarities, ...
      - Biomed → DNA sequences, genes, ...
      - Text → words, POS tags, grammatical dependencies
    - - Entropy, joint entropy, conditional entropy
    - - 1 Understand the properties of the task: how they might interact with the strengths and limitations of the model
      - 2 Experimental work: test expectations and find out what actually works
- - - - Remove unnecessary features
      - Remove redundant features
      - Create new features
        
        Combine existing features
        
        Transform features
        
        Use features from the context
        
        Integrate external sources
      - Modify feature types
        
        e.g. from binary to numeric
        
        Modify feature values
  - - - Rounding: when dealing with continuous numeric attributes such as proportions or percentages, we may not need the full precision of the raw values. It therefore often makes sense to round these high-precision values to integers, which can then be used directly as raw values or as categorical (discrete-class) features.
      - Raw measures
      - Binarization
      - Counts
      - Interaction
        
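        Supervised machine learning models usually try to model the output responses (discrete classes or continuous values) as a function of the input feature variables. Interaction features, such as the product of two input features, let the model capture effects that depend on several features jointly.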
      - Binning
        
        The problem with raw, continuous numeric features is that their value distributions are often skewed.
        
        A further problem is the varying range of values across these features.
        
        Using these features directly can cause many issues and adversely affect the model.
        
        Strategies to deal with this include binning and transformations.
        
        Types
        
        Fixed-Width Binning
        
        Adaptive Binning
        
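        Fixed-width bins can end up unevenly populated, with some bins dense and others nearly empty. Adaptive binning is a safer strategy in these scenarios, because we let the data speak for itself: the bin edges are derived from the distribution of the data.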
        
        Quantile based binning
        
        Binning produces discrete-valued categorical features, so you might need an additional feature engineering step on the categorical data before using it in any model.
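The rounding and binning strategies above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a reference implementation; the `ages` sample, the bin width of 10, and the helper names `fixed_width_bin`, `quantile_bin_edges` and `assign_bin` are all assumptions made for the example.

```python
import statistics

def fixed_width_bin(value, width):
    """Index k of the fixed-width bin [k*width, (k+1)*width) holding value."""
    return int(value // width)

def quantile_bin_edges(values, n_bins):
    """Interior cut points (n_bins - 1 of them) that split values into quantile bins."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")

def assign_bin(value, edges):
    """Index of the first bin whose upper edge is >= value (last bin otherwise)."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)

# Hypothetical, right-skewed sample (illustrative data only).
ages = [3, 7, 18, 21, 22, 25, 31, 39, 44, 87]

# Rounding: a proportion turned into an integer percentage.
percent = round(0.8734 * 100)

# Fixed-width binning: every bin spans 10 units, so bins may be unevenly filled.
fixed = [fixed_width_bin(a, 10) for a in ages]

# Quantile-based (adaptive) binning: edges come from the data itself.
edges = quantile_bin_edges(ages, 4)
adaptive = [assign_bin(a, edges) for a in ages]

print(percent, fixed, adaptive)
```

With the fixed width, the outlier 87 lands far from the rest and bins 5 to 7 stay empty, whereas the quartile edges give four bins of similar size. In both cases the result is a bin label, which is why a categorical encoding step may still be needed afterwards.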
    - - These transformations mainly help to stabilize variance, bring the data closer to a normal distribution, and make the spread of the data independent of its mean.
        
        Log Transform
        
        Box-Cox transformation
        
        The Box-Cox transform is another popular function belonging to the power transform family of functions.
        Prerequisite: the numeric values to be transformed must be positive (as the log transform also requires).
        If they are not positive, shifting them by a constant helps. Mathematically, the Box-Cox transform function can be denoted as follows.
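The formula itself did not survive in these notes. What follows is the standard textbook definition of the Box-Cox transform, written with symbols chosen here as assumptions: $x$ is a positive input value, $\lambda$ is the transform parameter, and $y$ is the transformed value.

```latex
y(\lambda) =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
  \ln x, & \lambda = 0
\end{cases}
\qquad x > 0
```

Setting $\lambda = 0$ recovers the log transform, which is why the two share the positivity prerequisite. In practice $\lambda$ is usually estimated from the data, for example by maximum likelihood.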
  - - - Typically, a categorical data attribute represents discrete values that belong to a specific finite set of categories or classes. These values are also known as classes or labels when the attribute is the one a model has to predict (the response variable). The discrete values can be text or numeric in nature.
    - - ordinal
        
        Ordinal categorical attributes have some sense or notion of order among their values
      - nominal
        
        Nominal categorical attributes have no concept of ordering among their values
    - - In general, there is no generic module or function that automatically maps these features to numeric representations based on their order. Hence we can use a custom encoding/mapping scheme.
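A custom mapping scheme for an ordinal attribute can be as simple as a dictionary from category to rank. The sketch below assumes a hypothetical shirt-size attribute; the names `size_order` and `encode_ordinal` and the chosen ranks are illustrative, not taken from the notes.

```python
# Hypothetical ordinal attribute: the order small < medium < large is
# domain knowledge that we must supply by hand.
size_order = {"small": 1, "medium": 2, "large": 3}

def encode_ordinal(values, mapping):
    """Replace each category with its rank; unknown categories raise KeyError."""
    return [mapping[v] for v in values]

sizes = ["medium", "small", "large", "medium"]
encoded = encode_ordinal(sizes, size_order)
print(encoded)
```

Failing loudly on an unknown category is a deliberate choice in this sketch: silently assigning a default rank would invent an ordering that the data does not support. The same approach would be misleading for nominal attributes, since the numbers would imply an order that does not exist.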