Preprocessing
Feature engineering
Aggregating numbers
For example, take the mean of multiple values when only a single value is needed, or aggregate by time period such as week or month
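A minimal sketch of both kinds of aggregation, using only the Python standard library. The `readings` data and the `aggregate_mean` helper are made up for illustration; in practice a dataframe library's group-by would likely do the same job.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily readings: (day, value).
readings = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 3), 14.0),
    (date(2024, 1, 9), 20.0),
    (date(2024, 2, 5), 30.0),
]

def aggregate_mean(rows, key):
    """Group (day, value) rows by key(day) and return the mean of each group."""
    groups = defaultdict(list)
    for day, value in rows:
        groups[key(day)].append(value)
    return {k: mean(v) for k, v in groups.items()}

# Aggregate by (ISO year, ISO week) and by (year, month).
weekly = aggregate_mean(readings, lambda d: tuple(d.isocalendar())[:2])
monthly = aggregate_mean(readings, lambda d: (d.year, d.month))
```

The two January readings in the same ISO week collapse into a single mean, which is the "many values to one value" case described above.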
Binning or discretization, binarization
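A small sketch of binning and binarization in plain Python. The `ages` values, the bin edges, and the threshold are illustrative assumptions, not taken from the notes.

```python
from bisect import bisect_right

ages = [3, 17, 18, 42, 65, 80]  # hypothetical values

# Binning (discretization): map each value to the index of its interval.
# Edges 18 and 65 give three bins: x < 18, 18 <= x < 65, x >= 65.
edges = [18, 65]
binned = [bisect_right(edges, a) for a in ages]

# Binarization: a single threshold turns the value into a 0/1 feature.
is_adult = [int(a >= 18) for a in ages]
```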
Text features
TF-IDF, bag-of-words counts
Remove redundant features: after transformation, remove the original columns that were transformed, plus ID columns and NaN columns
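A minimal sketch of both text features on a toy corpus. The TF-IDF formula here is one plain variant, tf × log(N / df), chosen as an assumption; libraries often use smoothed or normalized versions, so exact numbers will differ.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran fast"]  # toy corpus
tokens = [d.split() for d in docs]
vocab = sorted({w for doc in tokens for w in doc})

# Bag of words: raw term counts per document, one column per vocabulary word.
bow = [[Counter(doc)[w] for w in vocab] for doc in tokens]

# Document frequency: in how many documents each word appears.
df = {w: sum(w in doc for doc in tokens) for w in vocab}
n_docs = len(docs)

def tf_idf(doc):
    """Unsmoothed variant: (count / doc length) * log(N / df)."""
    counts = Counter(doc)
    return [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocab]

tfidf = [tf_idf(doc) for doc in tokens]
```

A word that appears in every document ("the") gets a weight of zero under this variant, while a rarer word ("dog") is weighted higher than a common one ("sat").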
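As a rough illustration of this cleanup step, the sketch below drops an identifier, an original column that has already been replaced by its transformed version, and a column that is entirely NaN. The column names and the `is_all_nan` helper are hypothetical.

```python
import math

# Hypothetical rows: "log_price" was derived from "price"; "notes" is all NaN.
rows = [
    {"id": 1, "price": 10.0, "log_price": math.log(10.0), "notes": float("nan")},
    {"id": 2, "price": 100.0, "log_price": math.log(100.0), "notes": float("nan")},
]

def is_all_nan(rows, col):
    """True when every value in the column is a float NaN."""
    return all(isinstance(r[col], float) and math.isnan(r[col]) for r in rows)

# Identifier plus the original column that log_price replaces.
to_drop = {"id", "price"}
# Add any column that carries no information at all.
to_drop |= {c for c in rows[0] if is_all_nan(rows, c)}

cleaned = [{k: v for k, v in r.items() if k not in to_drop} for r in rows]
```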
Use PCA to reduce correlated features and check which components contribute the most. PCA is a black-box linear transformation whose outputs are hard to interpret, so apply it at the end of feature engineering
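To make "which components contribute the most" concrete, here is a dependency-free sketch that finds only the first principal component by power iteration on the covariance matrix and reports its share of the total variance. The data and helper names are illustrative assumptions; a real workflow would more likely use a library PCA that returns all components.

```python
import math

# Hypothetical data: two strongly correlated features and one small noisy one.
X = [
    [2.0, 4.1, 0.5],
    [3.0, 6.2, 0.1],
    [4.0, 7.9, 0.4],
    [5.0, 10.1, 0.2],
    [6.0, 12.0, 0.3],
]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of row-wise observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def top_component(cov, iters=200):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v

cov = covariance_matrix(X)
eigenvalue, direction = top_component(cov)
# Share of total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Because the first two features move together, one component captures almost all of the variance, and `direction` shows how each original feature loads on it.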
Standardization
Rescale continuous variables to zero mean and unit variance. This changes the scale of the distribution, not its shape. It helps linear models and distance-based models, keeps high-variance features from dominating, and can fix slow convergence
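A minimal sketch of standardization, z = (x - mean) / std, in plain Python. The `incomes` values are made up, and using the population standard deviation is an assumption of this sketch.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to zero mean and unit variance: z = (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

incomes = [20_000.0, 35_000.0, 50_000.0, 65_000.0, 80_000.0]  # hypothetical
z = standardize(incomes)
```

After the transform the values are centered on zero with unit spread, so a feature measured in tens of thousands no longer outweighs one measured in single digits.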
Log normalization: works well for features with high variability and makes a right-skewed distribution approximately normal. It requires positive input values and keeps relative changes intact
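A short sketch of why the log transform keeps relative change intact: equal ratios in the raw values become equal steps after the transform. The `prices` values are illustrative only.

```python
import math
from statistics import pvariance

prices = [1.0, 10.0, 100.0, 1000.0, 10000.0]  # hypothetical, strictly positive

# Natural log; inputs must be > 0 or math.log raises ValueError.
log_prices = [math.log(p) for p in prices]

# Every 10x jump in the raw data becomes the same additive step, log(10).
steps = [b - a for a, b in zip(log_prices, log_prices[1:])]

# Spread before and after the transform.
raw_variance = pvariance(prices)
log_variance = pvariance(log_prices)
```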
Why feature selection: reduce noise (remove unwanted variables), remove correlated variables that can violate model assumptions, and reduce overall variance through dimensionality reduction
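One simple way to remove correlated variables is to keep a feature only if its absolute Pearson correlation with every feature kept so far stays below a threshold. The sketch below assumes a threshold of 0.95 and made-up feature values; both are illustrative choices, not recommendations from the notes.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, v)) < threshold for v in kept.values()):
            kept[name] = values
    return kept

# Hypothetical features: height in inches duplicates height in centimetres.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [70.0, 52.0, 81.0, 60.0, 75.0],
}
selected = drop_correlated(features)
```

The result depends on iteration order: the first feature of a correlated pair is the one that survives.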