Life Cycle of Data Science Project
Step 1
Feature Engineering
Exploratory data analysis (EDA)
EDA libraries for Python
D-Tale
Pandas-profiling
Pandas-visual-analysis
Autoviz
Sweetviz
Shapash monitor
High-level steps:
Inspect the data frame
Split into dependent and independent variables
Replace NaN/missing values
Split into train and test sets (or use cross-validation)
Check for an imbalanced dataset
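The high-level steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive workflow: the toy data frame and its column names (`age`, `income`, `target`) are made up, and filling gaps with the training-set median after the split is one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two features and a binary target, with some gaps.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29, 55, 38, np.nan, 47, 33],
    "income": [40, 52, 61, np.nan, 45, 90, 70, 48, 85, 58],
    "target": [0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})

# Inspect the data frame: size, column types, missing values per column.
print(df.shape)
print(df.dtypes)
missing_counts = df.isna().sum()

# Split into independent variables (X) and the dependent variable (y).
X = df.drop(columns="target")
y = df["target"]

# Split into train and test sets; stratify keeps the class ratio similar in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Replace missing values with medians computed on the training set only
# (assumption: median imputation; this avoids leaking test-set statistics).
train_medians = X_train.median()
X_train = X_train.fillna(train_medians)
X_test = X_test.fillna(train_medians)

# Check for an imbalanced dataset: share of each class in the target.
class_ratio = y.value_counts(normalize=True)
print(class_ratio)
```

If the class shares are far from equal, stratified cross-validation (listed under Step 3) is usually the safer way to evaluate.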
Handling missing values
Handling outliers
Categorical encoding
Normalization and standardization
Standardization
mean = 0 and SD = 1
Normalization
rescale to any range we choose
min-max scaling
common in deep learning, as pixel values range from 0 to 255
Algorithms based on Euclidean distance or gradient descent
use standardization or normalization
Decision trees or tree-based ensemble techniques
no scaling needed
Input features with large differences in range
scaling needed
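As a small sketch of the two rescaling methods: standardization produces z-scores (mean 0, SD 1), while min-max normalization maps values onto a chosen range. The function names and the sample pixel values here are illustrative assumptions, and only the Python standard library is used.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; must be non-zero
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values, low=0.0, high=1.0):
    """Rescale linearly so the minimum maps to `low` and the maximum to `high`."""
    v_min, v_max = min(values), max(values)  # v_max must differ from v_min
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

# Hypothetical 8-bit pixel intensities (valid range 0 to 255).
pixels = [0, 64, 128, 255]

z_scores = standardize(pixels)
scaled = min_max_normalize(pixels)            # mapped onto [0, 1]
centered = min_max_normalize(pixels, -1, 1)   # mapped onto [-1, 1]

print(z_scores)
print(scaled)
print(centered)
```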
Types of Transformation
Normalization and standardization
Scaling to minimum and maximum values
Scaling to median and quantiles
Robust scaler
good for data with outliers
Gaussian transformation
Logarithmic transformation
Reciprocal Transformation
Square root Transformation
Exponential Transformation
Box Cox Transformation
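The transformations listed above might look like this in plain Python. Treat it as a sketch under assumptions: the sample values are made up, "exponential transformation" is taken here to mean raising to a fractional power (the exponent 1/1.2 is arbitrary), and the Box-Cox lambda is fixed by hand rather than estimated from the data.

```python
import math
from statistics import median, quantiles

def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]

def box_cox(values, lam):
    """Box-Cox transform for strictly positive values with a fixed lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical right-skewed feature with one large outlier.
skewed = [1.0, 2.0, 3.0, 4.0, 100.0]

robust = robust_scale(skewed)                 # the outlier barely affects median and IQR
log_t = [math.log(v) for v in skewed]         # logarithmic (values must be > 0)
reciprocal_t = [1.0 / v for v in skewed]      # reciprocal (values must be non-zero)
sqrt_t = [math.sqrt(v) for v in skewed]       # square root (values must be >= 0)
exp_t = [v ** (1 / 1.2) for v in skewed]      # fractional power (assumed meaning)
box_cox_t = box_cox(skewed, lam=0.5)          # Box-Cox with a hand-picked lambda

print(robust)
print(box_cox_t)
```

In practice a library routine such as `scipy.stats.boxcox` can estimate lambda from the data instead of fixing it.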
Step 2
Feature Selection
Correlation
Forward selection
Backward elimination
Univariate selection
Random forest Importance
Feature selection with decision trees
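A rough sketch of several of these selection methods with scikit-learn and NumPy follows. The synthetic dataset, the choice of keeping three features, and the estimators used are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic classification data (hypothetical): 6 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

# Correlation: absolute Pearson correlation of each feature with the target.
corr_with_target = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)

# Univariate selection: keep the 3 features with the highest ANOVA F-score.
univariate = SelectKBest(score_func=f_classif, k=3).fit(X, y)
univariate_keep = univariate.get_support(indices=True)

# Forward selection: add one feature at a time, keeping the best CV score.
# direction="backward" would perform backward elimination instead.
forward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    cv=3,
).fit(X, y)
forward_keep = forward.get_support(indices=True)

# Random forest importance: rank features by impurity-based importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
forest_rank = np.argsort(forest.feature_importances_)[::-1]

print(corr_with_target.round(2))
print(univariate_keep, forward_keep, forest_rank[:3])
```

The methods do not always agree on which features to keep, so comparing their outputs is a useful sanity check.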
Step 3
Model Creation and Hyperparameter Tuning
GridSearchCV
RandomizedSearchCV
Keras Tuner
Bayesian Optimization
hyperopt
Genetic algorithms
Optuna
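As an illustration of the first two tuners in this list, the sketch below runs scikit-learn's `GridSearchCV` and `RandomizedSearchCV` over the same small search space. The dataset, the random forest model, and the parameter values are arbitrary assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data (hypothetical) so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Search space: 2 x 3 = 6 candidate combinations.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, 5, None]}

# Grid search evaluates every combination with 3-fold cross-validation.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
grid.fit(X, y)

# Randomized search samples only n_iter combinations from the same space.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=4,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

Grid search cost grows with the product of all option counts, which is why randomized or Bayesian approaches are often preferred for larger search spaces.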
Cross-validation for the train and test split
K-fold cross-validation (average accuracy across folds)
Stratified cross-validation (for imbalanced datasets)
Time series cross-validation
Leave-one-out cross-validation
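The four cross-validation schemes above correspond to splitter classes in scikit-learn. The following sketch uses a tiny made-up dataset and an arbitrary decision tree model, purely to show how the splits differ.

```python
import numpy as np
from sklearn.model_selection import (
    KFold, LeaveOneOut, StratifiedKFold, TimeSeriesSplit, cross_val_score
)
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 10 samples, imbalanced target (8 negatives, 2 positives).
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# K-fold: 5 folds, each sample is in the test set exactly once.
kfold_splits = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

# Stratified k-fold: each fold keeps roughly the same class ratio as y.
strat_cv = StratifiedKFold(n_splits=2)
strat_splits = list(strat_cv.split(X, y))

# Time series split: training indices always precede the test indices.
ts_splits = list(TimeSeriesSplit(n_splits=3).split(X))

# Leave-one-out: one sample per test set, so as many splits as samples.
loo_splits = list(LeaveOneOut().split(X))

# Average accuracy across the stratified folds.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=strat_cv)
mean_accuracy = scores.mean()
print(mean_accuracy)
```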
Step 4
Model Deployment
Step 5
Model Monitoring & Retraining
Data Gathering
Set of rules to agree on for data gathering