Feature engineering
What is it?
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.
It increases the predictive power of machine learning algorithms by creating features from raw data that facilitate the machine learning process.
What is a feature?
A feature is a specific representation built on top of raw data: an individual, measurable attribute, typically depicted by a column in a dataset. In a generic two-dimensional dataset, each observation is depicted by a row and each feature by a column, which holds a specific value for each observation.
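As a minimal illustration of the row/column view (the column names and values are hypothetical), a two-dimensional dataset can be held as a list of rows, where reading one key across all rows yields one feature. In pandas the equivalent would be a DataFrame, with `df["age"]` selecting a single feature column.

```python
# A tiny two-dimensional dataset: each dict is one observation (row),
# each key is one feature (column). Names and values are hypothetical.
dataset = [
    {"age": 34, "income": 52000.0, "city": "Paris"},
    {"age": 27, "income": 38500.0, "city": "Lyon"},
    {"age": 45, "income": 61200.0, "city": "Paris"},
]


def feature_column(rows, name):
    """Return the values of one feature across all observations."""
    return [row[name] for row in rows]


ages = feature_column(dataset, "age")  # one feature: [34, 27, 45]
first_observation = dataset[0]         # one observation: a value for every feature
```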
types
Derived features are usually obtained from feature engineering, where we extract features from existing data attributes.
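A minimal sketch of deriving features from an existing attribute, assuming a hypothetical raw timestamp column: several new features (month, weekday, hour, a weekend flag) are extracted from a single raw value using only the standard library.

```python
from datetime import datetime

# Raw attribute: ISO-8601 timestamp strings (hypothetical example data).
raw_timestamps = ["2021-03-14T09:26:53", "2021-07-05T18:05:00"]


def derive_time_features(ts: str) -> dict:
    """Derive several new features from one existing timestamp attribute."""
    dt = datetime.fromisoformat(ts)
    return {
        "year": dt.year,
        "month": dt.month,
        "weekday": dt.weekday(),         # Monday == 0 ... Sunday == 6
        "hour": dt.hour,
        "is_weekend": dt.weekday() >= 5,  # Saturday or Sunday
    }


derived = [derive_time_features(ts) for ts in raw_timestamps]
```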
Information Theory
Entropy, joint entropy, and conditional entropy
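A minimal sketch of these three quantities, estimated from observed frequencies (a plug-in estimate, assumed here for simplicity). It uses H(X) = -Σ p(x) log2 p(x), the joint entropy of paired observations, and the chain rule H(Y | X) = H(X, Y) - H(X). The example feature `x` and targets are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) in bits, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """Joint entropy H(X, Y): entropy of the paired observations."""
    return entropy(list(zip(xs, ys)))


def conditional_entropy(ys, xs):
    """Conditional entropy H(Y | X) via the chain rule H(X, Y) - H(X)."""
    return joint_entropy(xs, ys) - entropy(xs)


x = ["a", "a", "b", "b"]  # hypothetical feature
y_dep = [0, 0, 1, 1]      # target fully determined by x
y_ind = [0, 1, 0, 1]      # target unrelated to x

h_x = entropy(x)                         # 1.0 bit
h_dep = conditional_entropy(y_dep, x)    # 0.0: knowing x removes all uncertainty
h_ind = conditional_entropy(y_ind, x)    # 1.0: knowing x removes none
```

A lower H(Y | X) means the feature leaves less uncertainty about the target, which is why these measures are useful for judging how informative a candidate feature is.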
How to do it
continuous numeric data
values
Rounding: when dealing with continuous numeric attributes such as proportions or percentages, we may not need the raw values at a high level of precision. It therefore often makes sense to round these high-precision percentages to integers. These integers can then be used directly as raw values or even as categorical (discrete, class-based) features.
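A minimal sketch of rounding, assuming hypothetical percentage and proportion values: percentages are rounded to integers for direct use, and proportions are scaled to a 0 to 10 scale so the result can serve as a discrete class.

```python
# Hypothetical high-precision percentages.
raw_percentages = [12.3456, 67.8912, 45.5001, 99.9999]

# Round to integers so they can be used directly as numeric values.
rounded = [round(p) for p in raw_percentages]

# Hypothetical proportions, scaled to 0..10 and rounded so each
# integer can be treated as a discrete class.
raw_proportions = [0.7353, 0.1224, 0.5555]
as_classes = [round(p * 10) for p in raw_proportions]
```

Note that Python's built-in `round()` rounds exact halves to the nearest even integer (`round(2.5) == 2`), which may differ from the "round half up" convention used elsewhere.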
Interaction
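Supervised machine learning models usually try to model the output responses (discrete classes or continuous values) as a function of the input feature variables. Interaction features extend this by combining two or more input features (for example, as a product) so that a model can capture effects that depend on several features jointly.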
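A minimal sketch of pairwise interaction features, assuming products of numeric features as the combination rule (one common choice, not the only one). The feature names are hypothetical. In scikit-learn, `PolynomialFeatures(degree=2, interaction_only=True)` produces similar product terms.

```python
from itertools import combinations


def pairwise_interactions(row: dict, names: list) -> dict:
    """Return a copy of row with a product term x_i * x_j for every pair."""
    out = dict(row)
    for a, b in combinations(names, 2):
        out[f"{a}_x_{b}"] = row[a] * row[b]
    return out


sample = {"length": 2.0, "width": 3.0, "height": 4.0}
expanded = pairwise_interactions(sample, ["length", "width", "height"])
# adds length_x_width=6.0, length_x_height=8.0, width_x_height=12.0
```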
Binning
One problem with raw, continuous numeric features is that the distribution of their values is often skewed.
There are strategies to deal with this, including binning and transformations.
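A minimal sketch of both strategies on a hypothetical skewed attribute: fixed-width binning (one common binning variant, assumed here with a width of 10) and a log(1 + x) transformation. The fixed-width bins show the skew problem directly: the small values crowd into the first bins while a few large values land far away.

```python
import math

# Hypothetical skewed attribute (e.g., counts).
values = [1, 3, 7, 12, 25, 40, 95, 300, 1200]


def fixed_width_bin(value: float, width: float) -> int:
    """Fixed-width binning: bin index = floor(value / width)."""
    return math.floor(value / width)


def log_transform(value: float) -> float:
    """log(1 + x): compresses large values and is defined at x = 0."""
    return math.log1p(value)


bins = [fixed_width_bin(v, 10) for v in values]
logged = [round(log_transform(v), 3) for v in values]
```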
Types
Adaptive Binning
Quantile-based binning
The outcome of binning is a discrete-valued categorical feature, so you might need an additional feature-engineering step on the categorical data before using it in any model.
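A minimal sketch of quantile-based binning with the standard library, using hypothetical data. The cut points come from `statistics.quantiles` (default "exclusive" method), so the bin edges adapt to the data and each bin holds roughly the same number of observations. In pandas, `pd.qcut` serves the same purpose. The resulting integer bin indices are categorical and may need further encoding.

```python
import bisect
import statistics

# Hypothetical skewed attribute.
values = [1, 3, 7, 12, 25, 40, 95, 300, 1200]


def quantile_bin_edges(data, q=4):
    """Interior cut points that split the data into q roughly equal-sized groups."""
    return statistics.quantiles(data, n=q)


def quantile_bin(value, edges):
    """0-based bin index; a value equal to an edge goes to the upper bin."""
    return bisect.bisect_right(edges, value)


edges = quantile_bin_edges(values, q=4)        # quartile cut points
bins = [quantile_bin(v, edges) for v in values]
```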
categorical data
Definition
Typically, any data attribute that is categorical in nature represents discrete values belonging to a specific finite set of categories or classes. When such attributes are the variables a model must predict (popularly known as response variables), these values are also often called classes or labels. The discrete values themselves can be text or numeric.
Strategy: transform these categorical values into numeric labels, then apply an encoding scheme to those labels.
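A minimal sketch of this two-step strategy, assuming one-hot encoding as the encoding scheme (one common choice). Categories are first mapped to integer labels (sorted for determinism, as scikit-learn's `LabelEncoder` also does), then each label becomes a 0/1 indicator vector. The genre values are hypothetical.

```python
def fit_label_encoder(values):
    """Step 1: map each distinct category to an integer label."""
    return {cat: idx for idx, cat in enumerate(sorted(set(values)))}


def one_hot(label: int, n_classes: int) -> list:
    """Step 2: encode an integer label as a 0/1 indicator vector."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec


genres = ["Action", "Sports", "Action", "Puzzle"]
mapping = fit_label_encoder(genres)    # {'Action': 0, 'Puzzle': 1, 'Sports': 2}
labels = [mapping[g] for g in genres]  # [0, 2, 0, 1]
encoded = [one_hot(lab, len(mapping)) for lab in labels]
```

One-hot encoding avoids implying an order between nominal categories, which raw integer labels would otherwise suggest to many models.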
ordinal data
In general, there is no generic module or function that automatically maps these features to numeric representations based on their order, so we can use a custom encoding/mapping scheme.
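A minimal sketch of a custom ordinal mapping, using a hypothetical T-shirt size attribute whose order cannot be inferred from the strings themselves. The mapping is written by hand, and unknown categories raise an error rather than being silently dropped.

```python
# Hypothetical ordinal attribute with a hand-written order.
SIZE_ORDER = {"XS": 0, "S": 1, "M": 2, "L": 3, "XL": 4}


def encode_ordinal(values, order_map):
    """Replace each ordinal category with its rank from the mapping."""
    unknown = set(values) - set(order_map)
    if unknown:
        raise ValueError(f"categories missing from mapping: {sorted(unknown)}")
    return [order_map[v] for v in values]


sizes = ["M", "XS", "XL", "S", "M"]
encoded_sizes = encode_ordinal(sizes, SIZE_ORDER)  # [2, 0, 4, 1, 2]
```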