Processing Big Data for Analytics
Feature Engineering
Feature Extraction
PCA
(3 p46-57)
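A minimal sketch of PCA on two-dimensional data, assuming plain Python lists of (x, y) pairs. The function name `pca_2d` is hypothetical, and the slides' own derivation may differ. The function centers the data, builds the sample covariance matrix, and takes the eigenvector with the largest eigenvalue as the first principal component. The ratio of that eigenvalue to the total is the share of variance the component explains.

```python
import math


def pca_2d(points):
    """First principal component of 2-D data via the closed-form 2x2 eigendecomposition.

    Returns (explained_variance_ratio, unit_component, projections).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: mid +/- radius
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + radius, mid - radius

    # Eigenvector belonging to the largest eigenvalue lam1
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    total = lam1 + lam2
    ratio = lam1 / total if total > 0 else 0.0
    # Project each centered point onto the first component (1-D representation)
    projections = [x * vx + y * vy for x, y in centered]
    return ratio, (vx, vy), projections


ratio, component, scores = pca_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
print(round(ratio, 6))  # 1.0 (the points lie on a line, so one component explains all variance)
```

The closed form only works for two features. With more features you would use a general eigensolver or SVD, for example scikit-learn's `PCA`.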
LDA
t-SNE
Feature Selection
(3 p15-16)
Supervised
(3 p18)
(3 p41)
Filter
Univariate Feature Selection
(3 p20-28)
Looks at each feature separately and determines whether there is a significant relationship between that feature and the target
Does not take into account the interaction between features
Statistical Measures
(3 p25)
F-Test
Mutual Information
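The univariate filter idea above can be sketched in plain Python. Each feature is scored against the target on its own, and the best k are kept. This sketch assumes a classification target and uses two of the statistical measures listed: the one-way ANOVA F-statistic for numeric features and count-based mutual information for discrete features. All function names are hypothetical. Library versions, such as scikit-learn's `SelectKBest` with `f_classif` or `mutual_info_classif`, handle more edge cases.

```python
import math
from collections import Counter


def anova_f(feature, target):
    """One-way ANOVA F-statistic of a numeric feature across the target classes."""
    groups = {}
    for x, y in zip(feature, target):
        groups.setdefault(y, []).append(x)
    n, k = len(feature), len(groups)
    grand_mean = sum(feature) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((x - means[y]) ** 2 for y, g in groups.items() for x in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def mutual_information(feature, target):
    """Mutual information (in nats) between a discrete feature and a discrete target."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    px, py = Counter(feature), Counter(target)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def select_k_best(columns, target, score_fn, k):
    """Score every column independently and keep the names of the k highest scorers."""
    scores = {name: score_fn(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


target = [0, 0, 0, 1, 1, 1]
columns = {"informative": [1, 2, 1, 8, 9, 8], "noise": [5, 3, 4, 5, 3, 4]}
print(select_k_best(columns, target, anova_f, k=1))  # ['informative']
```

Because each column is scored alone, two features that are only predictive together (for example an XOR pattern) would both score low. This is the limitation about interactions noted above.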
Wrapper
(3 p30-31)
Backward elimination
(3 p33)
Exhaustive feature selection
(3 p34)
Recursive feature elimination
(3 p35)
Recursive feature elimination with cross validation
(3 p36-37)
Forward selection
(3 p32)
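The wrapper methods above all search over feature subsets and judge each subset by the model's performance. A minimal sketch of forward selection follows. It assumes a caller-supplied `score(subset)` function; in practice that function would train the model on the subset and return a validation or cross-validated metric. The toy score and all names here are hypothetical.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection: repeatedly add the feature that most improves score(subset)."""
    selected = []
    best_score = score(selected)
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every one-feature extension of the current subset
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no remaining feature improves the score, so stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Toy scoring function (hypothetical): per-feature "usefulness" minus a size penalty
usefulness = {"a": 0.5, "b": 0.3, "c": 0.01}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.05 * len(subset)


chosen, value = forward_selection(["a", "b", "c"], toy_score)
print(chosen)  # ['a', 'b']
```

Backward elimination is the mirror image: it starts from all features and removes the one whose removal hurts least. Scikit-learn offers `SequentialFeatureSelector`, `RFE` and `RFECV` for these searches.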
Embedded
(3 p40)
Unsupervised
Variance Threshold
(3 p17)
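A variance threshold is unsupervised because it never looks at the target. It simply drops features that barely vary. A minimal sketch, assuming features stored as a dict of numeric columns (the function name is hypothetical):

```python
def variance_threshold(columns, threshold=0.0):
    """Keep the columns whose (population) variance is strictly above threshold."""
    kept = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance > threshold:
            kept[name] = values
    return kept


data = {"constant": [1, 1, 1, 1], "flag": [0, 0, 0, 1], "age": [23, 35, 41, 58]}
print(list(variance_threshold(data, threshold=0.2)))  # ['age']
```

Variance depends on the unit of each feature, so one threshold is only comparable across features that share a scale. With the default threshold of 0, only constant columns are removed.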
Dimension Reduction
(3 p7-12)
Curse of Dimensionality
Improved Model Performance
Interpretability and Visualization
Computational Efficiency
Noise and Redundancy Reduction
Overfitting Prevention
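The curse of dimensionality can be made concrete with a small experiment. This sketch assumes uniformly random data in the unit hypercube. As the number of dimensions grows, the nearest and farthest points from a query end up at almost the same distance, which weakens distance-based methods. The function name is hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from a random query to uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    distances = [math.dist(query, p) for p in points]
    return (max(distances) - min(distances)) / min(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))  # contrast shrinks as dim grows
```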
Synthetic data generation
(3 p59-63)
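The specific generators covered in the slides are not listed here. One widely used approach to synthetic data generation, swapped in as an illustration, is SMOTE-style oversampling of a minority class. Each new point is a random interpolation between an existing sample and one of its nearest neighbours. A minimal sketch with hypothetical names; the imbalanced-learn library provides a full `SMOTE` implementation.

```python
import math
import random


def smote_like(samples, n_new, k=2, seed=0):
    """Create synthetic points by interpolating between a sample and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # k nearest neighbours of base, excluding base itself
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=3)
```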
Missing data imputation
(3 p68-72)
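Simple missing data imputation replaces each gap with a statistic of the observed values. A minimal sketch, assuming missing entries are represented as `None`. The strategy names mirror those of scikit-learn's `SimpleImputer`, but the function itself is hypothetical.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


print(impute([1.0, None, 3.0]))                         # [1.0, 2.0, 3.0]
print(impute([1, None, 5, 100], "median"))              # [1, 5, 5, 100]
print(impute(["a", None, "b", "a"], "most_frequent"))   # ['a', 'a', 'b', 'a']
```

In a real pipeline, the fill value should be computed on the training data only and reused on validation and test data, so that no information leaks from those sets.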
Categorical data encoding
(3 p74-78)
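Two common categorical data encodings are one-hot encoding for unordered categories and ordinal encoding for ordered ones. A minimal sketch with hypothetical function names; scikit-learn's `OneHotEncoder` and `OrdinalEncoder` are the library counterparts.

```python
def one_hot(values, categories=None):
    """One-hot encode a list of labels; returns (categories, rows of 0/1)."""
    cats = sorted(set(values)) if categories is None else list(categories)
    index = {c: i for i, c in enumerate(cats)}
    rows = []
    for v in values:
        row = [0] * len(cats)
        if v in index:  # unseen categories are encoded as an all-zero row
            row[index[v]] = 1
        rows.append(row)
    return cats, rows


def ordinal(values, order):
    """Map each label to its position in an explicit order (for ordered categories)."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]


print(one_hot(["red", "green", "red"]))  # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
print(ordinal(["low", "high", "medium"], ["low", "medium", "high"]))  # [0, 2, 1]
```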
Feature Transformation
(3 p79-82)
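The transformations covered in the slides are not listed here. This sketch assumes three common ones: z-score standardization, min-max scaling, and a log transform for skewed, non-negative features. Function names are hypothetical.

```python
import math


def standardize(values):
    """Z-score: subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values] if std > 0 else [0.0] * n


def min_max(values):
    """Rescale linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values] if hi > lo else [0.0] * len(values)


def log_transform(values):
    """log(1 + x): compresses right-skewed, non-negative features."""
    return [math.log1p(v) for v in values]


print(min_max([2, 4, 6]))  # [0.0, 0.5, 1.0]
```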
Data / AI Projects
Reason for Failure
Entanglement
(4 p9)
Undeclared Consumers
(4 p10)
Unstable Data Dependencies
(4 p11)
Glue Code
(4 p12)
Pipeline Jungle
(4 p13)
Culture
(4 p15)
Knowledge Gap
(4 p7)
Frameworks
CRISP-ML(Q)
(4 p24-38)
Microsoft TDSP
(4 p41-49)
Tools for project management
Value Proposition Canvas
(4 p51-54)
Business Model Canvas
(4 p55-57)
AI Canvas
(4 p58-62)
Machine Learning Canvas
(4 p63-64)
Workflow design
(4 p71-82)
Agile
DevOps
Kanban
Data Pipelines
(4 p84-92)
Machine Learning Lifecycle
Model deployment
(5 p12)
Types
(5 p23)
Model Embedded in Application
(5 p19)
Served via a dedicated service
(5 p20-21)
Batch prediction (offline process)
(5 p22)
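Batch prediction runs the model offline over a stored dataset instead of answering live requests. A minimal sketch, assuming the model is any callable that maps a list of records to a list of predictions. `double_model` is a hypothetical stand-in for a trained model; in a real job, records would come from a file or table and results would be written back to storage.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def batch_predict(records, model, batch_size=1000):
    """Score records chunk by chunk, yielding (record, prediction) pairs."""
    for chunk in batched(records, batch_size):
        predictions = model(chunk)  # one model call per chunk, not per record
        yield from zip(chunk, predictions)


def double_model(batch):
    # Hypothetical stand-in for a trained model's batch predict method
    return [2 * x for x in batch]


print(list(batch_predict(range(5), double_model, batch_size=2)))
# [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```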
Cloud-based deployment
Python web frameworks
(5 p24-30)
Full-stack
Micro
Big-Data-Driven Model Deployment
(5 p32-36)
Challenges
(5 p13-18)
Model Monitoring
(5 p38)
Reasons for Model Decay
(5 p39)
Data drift
(5 p40)
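Data drift means the distribution of the input features changes relative to the training data. The Population Stability Index (PSI) is one common way to quantify this for a single numeric feature, swapped in here as an illustration; the slides may use other statistics. It bins both samples using the reference range and computes the sum over bins of (c_i - r_i) * ln(c_i / r_i), where r_i and c_i are the reference and current bin proportions. A minimal sketch assuming equal-width bins; the function name is hypothetical. A frequently quoted rule of thumb reads values above roughly 0.25 as a significant shift.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a current sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference feature

    def bin_proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins
            i = int((v - lo) / width)
            counts[min(max(i, 0), n_bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_proportions(reference), bin_proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


baseline = list(range(100))
print(population_stability_index(baseline, baseline))  # 0.0
shifted = [v + 50 for v in baseline]
print(round(population_stability_index(baseline, shifted), 2))  # large value: strong drift
```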
Concept drift
(5 p41-42)
Sudden drift
Gradual drift
Incremental drift
Recurring concept
Detecting drift
(5 p43-51)
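Concept drift can leave the inputs looking unchanged while their relationship to the target moves, so one common detection approach is to monitor the model's error rate once true labels arrive. The sketch below is a simplified version of the Drift Detection Method (DDM) of Gama et al., swapped in as an illustration. It tracks the error rate p and its standard deviation s = sqrt(p(1 - p)/n). It remembers the point where p + s was lowest and signals a warning or drift when p + s rises 2 or 3 of those standard deviations above it. Class and method names are hypothetical. Stream-learning libraries such as River provide tested drift detectors.

```python
import math


class DriftDetector:
    """Simplified DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift when a new model is trained."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Add one outcome (1 = wrong prediction); return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"  # too few samples for reliable estimates
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # lowest error level seen so far
        # Strict '>' so an error-free start (p = s = 0) does not fire immediately;
        # note that the first error after such a start will still signal drift.
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


detector = DriftDetector()
# Synthetic error stream: about 10% errors, then the model fails on every sample
stream = [1 if i % 10 == 0 else 0 for i in range(300)] + [1] * 100
statuses = [detector.update(e) for e in stream]
print(statuses.index("drift"))  # first drift signal, shortly after position 300
```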
MLOps
(6 p5-15)