NORMALIZATION, TRANSFORMATION & PANDAS PROFILING
Normalization and Scaling
In this method, we convert variables with different scales of measurements into a single scale.
StandardScaler
- standardizes the data using the formula:
z = (x - mean) / standard deviation
- z is the number of standard deviations that x lies from the mean.
Positive means above the mean | Negative means below the mean
Essentially returns the z-scores of every attribute:
- from sklearn.preprocessing import StandardScaler
- std_scale = StandardScaler()
- df['Colname_Stdscale'] = std_scale.fit_transform(df[['Colname']])
returns z-scores of the values of the attribute
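A minimal sketch of the StandardScaler step on made-up data, checking the result against the formula by hand. The DataFrame values are hypothetical; 'Colname' is the same placeholder used in the notes. Note that StandardScaler divides by the population standard deviation (ddof=0), not the sample one that pandas uses by default.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; 'Colname' mirrors the placeholder in the notes.
df = pd.DataFrame({"Colname": [2.0, 4.0, 6.0, 8.0]})

std_scale = StandardScaler()
# fit_transform returns an (n, 1) array; flatten it before storing as a column.
df["Colname_Stdscale"] = std_scale.fit_transform(df[["Colname"]]).ravel()

# The same z-scores by hand: (x - mean) / standard deviation, with ddof=0.
manual = (df["Colname"] - df["Colname"].mean()) / df["Colname"].std(ddof=0)

print(df)
print(np.allclose(df["Colname_Stdscale"], manual))
```

After scaling, the column has mean 0 and (population) standard deviation 1, which is what makes columns with different units comparable.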
MinMaxScaler
- normalizes the data using the formula:
(x - min)/(max - min)
- The maximum maps to 1 and the minimum to 0
returns a value from 0 to 1 for every attribute:
- from sklearn.preprocessing import MinMaxScaler
- minmax_scale = MinMaxScaler()
- df['Colname_MinMaxScale'] = minmax_scale.fit_transform(df[['Colname']])
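A small sketch of the MinMaxScaler step, again on hypothetical values, compared with the formula (x - min)/(max - min) computed directly in pandas.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data; 'Colname' mirrors the placeholder in the notes.
df = pd.DataFrame({"Colname": [10.0, 20.0, 40.0]})

minmax_scale = MinMaxScaler()
# fit_transform returns an (n, 1) array; flatten it before storing as a column.
df["Colname_MinMaxScale"] = minmax_scale.fit_transform(df[["Colname"]]).ravel()

# The same result by hand: (x - min) / (max - min).
col = df["Colname"]
manual = (col - col.min()) / (col.max() - col.min())

print(df)
print(np.allclose(df["Colname_MinMaxScale"], manual))
```

The smallest value lands on 0 and the largest on 1; everything else keeps its relative position in between.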
Transformation
Transform an attribute by applying a mathematical function to it.
We might want to do that to change the shape of the data's distribution.
Exponential Transformation
Spreads out the values, useful when the x values are clumped too closely together
- import numpy as np
- from sklearn.preprocessing import FunctionTransformer
- exp_transformer = FunctionTransformer(np.exp)
- df['Colname_exptransform'] = exp_transformer.fit_transform(df[['Colname']])
- returns the exponential transform of the data
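A brief sketch, on hypothetical evenly spaced values, of how the exponential transform widens the gaps between neighbouring points. The variable names `gaps_before` and `gaps_after` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data: evenly spaced values that sit close together.
df = pd.DataFrame({"Colname": [0.0, 1.0, 2.0, 3.0]})

exp_transformer = FunctionTransformer(np.exp)
transformed = exp_transformer.fit_transform(df[["Colname"]])
# np.asarray handles both DataFrame and ndarray return types.
df["Colname_exptransform"] = np.asarray(transformed).ravel()

# Gaps between consecutive values before and after the transform.
gaps_before = np.diff(df["Colname"].to_numpy())
gaps_after = np.diff(df["Colname_exptransform"].to_numpy())

print(df)
print(gaps_before, gaps_after)
```

The original gaps are all equal, while the transformed gaps grow with x, so larger values are pulled further apart than smaller ones.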
Log Transformation
Deals with skewness and reduces variability, e.g. when x grows multiplicatively or in percentage terms
- import numpy as np
- from sklearn.preprocessing import FunctionTransformer
- log_transformer = FunctionTransformer(np.log1p)
- df['Colname_logtransform'] = log_transformer.fit_transform(df[['Colname']])
- returns log(1 + x) for each value of the attribute
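A short sketch of the log transform on hypothetical data that grows by a factor of ten at each step. `np.log1p` computes log(1 + x), so it is defined at x = 0, and `np.expm1` undoes it. The skewness comparison uses pandas' `Series.skew()`; the names `skew_before` and `skew_after` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data growing multiplicatively (x10 at each step).
df = pd.DataFrame({"Colname": [1.0, 10.0, 100.0, 1000.0, 10000.0]})

log_transformer = FunctionTransformer(np.log1p)
transformed = log_transformer.fit_transform(df[["Colname"]])
# np.asarray handles both DataFrame and ndarray return types.
df["Colname_logtransform"] = np.asarray(transformed).ravel()

# Skewness before and after: the long right tail is pulled in.
skew_before = df["Colname"].skew()
skew_after = df["Colname_logtransform"].skew()

# np.expm1 is the inverse of np.log1p, so the original values are recoverable.
recovered = np.expm1(df["Colname_logtransform"])

print(df)
print(skew_before, skew_after)
```

Because the transform is invertible, predictions made on the log scale can be mapped back to the original units with `np.expm1`.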
Pandas Profiling
Generates profile reports from a pandas DataFrame.
For each column, the report presents the statistics that are relevant to that column's type.
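The library itself is usually driven through its `ProfileReport` class. As a rough stand-in that needs only pandas, the sketch below computes a few per-column statistics of the kind such a report contains (type, missing values, distinct values, numeric summaries). It is not the library's implementation, and the data and the `summary` and `numeric_stats` names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one numeric and one categorical column.
df = pd.DataFrame({
    "age": [22, 35, 35, None],
    "city": ["Lima", "Quito", "Lima", "Lima"],
})

# Per-column essentials: inferred type, missing count, distinct count.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "unique": df.nunique(),
})

# Descriptive statistics; by default describe() keeps only numeric columns
# when the frame mixes numeric and non-numeric data.
numeric_stats = df.describe()

print(summary)
print(numeric_stats)
```

A full profile report goes further than this, but the same idea applies: which statistics appear depends on the type of each column.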