DATA SCIENCE
Machine Learning
Supervised
Classification
Metrics
Precision, Recall, F1 and Fbeta
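A minimal plain-Python sketch of how these four metrics relate, assuming a binary problem where the positive class is labelled `1`. The helper name `precision_recall_fbeta` and the toy labels are hypothetical. For real work, scikit-learn provides `precision_score`, `recall_score`, `f1_score` and `fbeta_score` in `sklearn.metrics`.

```python
def precision_recall_fbeta(y_true, y_pred, beta=1.0, positive=1):
    """Precision, recall and F-beta for a binary problem (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found

    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_fbeta(y_true, y_pred)       # F1
_, _, f2 = precision_recall_fbeta(y_true, y_pred, beta=2.0)          # weights recall more
_, _, f05 = precision_recall_fbeta(y_true, y_pred, beta=0.5)         # weights precision more
print(precision, recall, round(f1, 3), round(f2, 3), round(f05, 3))  # 0.75 0.6 0.667 0.625 0.714
```

With beta > 1 the score leans towards recall, with beta < 1 towards precision; F1 is the harmonic mean of the two.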
Regression
Metrics
- Mean Absolute Error
- Mean Squared Error
- R2 Score
- SSE (Sum of Squared Errors)
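A small plain-Python sketch computing the four regression metrics above from paired actual/predicted values. The function name `regression_metrics` and the sample numbers are hypothetical; `sklearn.metrics` has `mean_absolute_error`, `mean_squared_error` and `r2_score` for the same job.

```python
def regression_metrics(y_true, y_pred):
    """MAE, MSE, SSE and R2 for paired lists of actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]  # residuals

    sse = sum(e ** 2 for e in errors)      # Sum of Squared Errors
    mae = sum(abs(e) for e in errors) / n  # Mean Absolute Error
    mse = sse / n                          # Mean Squared Error = SSE / n

    # R2 = 1 - SSE / SST, where SST is the spread of y_true around its mean.
    # Assumes y_true is not constant (otherwise SST = 0).
    mean_y = sum(y_true) / n
    sst = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - sse / sst
    return {"MAE": mae, "MSE": mse, "SSE": sse, "R2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
print(regression_metrics(y_true, y_pred))
```

Here the residuals are 0.5, 0, -0.5 and -1, giving MAE 0.5, SSE 1.5, MSE 0.375 and an R2 of about 0.925.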
Algorithms that minimize SSE
- Ordinary Least Squares (built into sklearn's LinearRegression())
- GRADIENT DESCENT
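A sketch contrasting the two approaches for simple linear regression (one feature, y = w*x + b), assuming plain Python lists. OLS solves for the SSE minimum in closed form; gradient descent reaches (nearly) the same line iteratively by minimizing MSE, which is SSE / n and so has the same minimizer. The helper names, learning rate `lr=0.01` and `epochs=5000` are illustrative assumptions, not tuned values.

```python
def ols_fit(x, y):
    """Closed-form least-squares line y = w*x + b (one feature)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    w = sxy / sxx    # slope
    b = my - w * mx  # intercept
    return w, b


def gradient_descent_fit(x, y, lr=0.01, epochs=5000):
    """Minimize MSE = SSE / n by repeatedly stepping against the gradient."""
    n = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        residuals = [w * xi + b - yi for xi, yi in zip(x, y)]
        grad_w = 2 / n * sum(r * xi for r, xi in zip(residuals, x))  # dMSE/dw
        grad_b = 2 / n * sum(residuals)                              # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
print(ols_fit(x, y))               # exact minimizer, about (1.97, 1.09)
print(gradient_descent_fit(x, y))  # iterative, converges to nearly the same values
```

OLS is direct but needs the whole dataset at once; gradient descent trades exactness for an approach that scales to many features and to models with no closed-form solution. Too large a learning rate makes it diverge.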
Unsupervised
Dimensionality Reduction
PCA (Principal Component Analysis)
- Takes the full dataset and reduces it to the components that hold the most information (variance)
- Reduces the number of features while retaining as much of the variance in the data as possible
- In general, PCA is used to reduce the dimensionality of your data
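A minimal PCA sketch in plain Python that finds only the first principal component using power iteration on the covariance matrix, then projects 2-D points down to 1-D. Power iteration is a swapped-in technique chosen to avoid external libraries; the helper name and the toy points are hypothetical. For real data, `sklearn.decomposition.PCA` is the usual tool.

```python
import math


def pca_first_component(data, iters=500):
    """First principal component via power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]

    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Power iteration: repeatedly multiplying by cov converges to the eigenvector
    # with the largest eigenvalue (assumes the start vector is not orthogonal to it).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]

    # Eigenvalue = variance along v; the trace of cov is the total variance.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    explained_ratio = eigval / sum(cov[i][i] for i in range(d))

    # Project each centered row onto v: 2 features -> 1 feature.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, explained_ratio, scores


points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
component, ratio, scores = pca_first_component(points)
print(component, round(ratio, 3))  # roughly [0.678, 0.735] and 0.963
```

Here one component keeps about 96% of the variance, which is what "reduce features while keeping most of the information" means in practice.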
Independent Component Analysis (ICA) - ICA assumes that the observed features are mixtures of statistically independent sources and tries to separate (unmix) those independent sources
Statistics
Descriptive Stats
Descriptive statistics is about describing our collected data.
- Types of Data: Quantitative (Continuous/Discrete) and Categorical (Ordinal/Nominal)
ASPECTS OF QUANTITATIVE VARIABLES
- Centre
- Shape
- Spread
- Outliers
- Central Tendency (Mean, Median and Mode)
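A quick sketch of the three measures using Python's built-in `statistics` module; the data values are made up. The second list shows why the outlier guidelines further down matter: one extreme value moves the mean a lot but barely touches the median.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.0 (average of the two middle values, 3 and 5)
print(statistics.mode(data))    # 3 (most frequent value)

# One extreme value pulls the mean far more than the median.
with_outlier = [2, 3, 3, 5, 7, 100]
print(statistics.mean(with_outlier))    # 20
print(statistics.median(with_outlier))  # 4.0
```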
- Spread
a. Find the five-number summary (Min, Q1, Median, Q3, Max); the Range (Max - Min) and IQR (Q3 - Q1) follow from it
b. If you don't want to show the spread with the five-number summary, find the STANDARD DEVIATION instead, which also measures the spread
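A sketch of both options with the standard library, on made-up data. It assumes `statistics.quantiles` with its default "exclusive" method (Python 3.8+); other tools (e.g. NumPy's default) interpolate quartiles differently, so Q1/Q3 can differ slightly between tools.

```python
import statistics

data = [1, 3, 4, 6, 8, 9, 12, 15, 20]

# a. Five-number summary, plus Range and IQR derived from it.
q1, median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
five_number = (min(data), q1, median, q3, max(data))
data_range = max(data) - min(data)
iqr = q3 - q1
print(five_number)      # (1, 3.5, 8.0, 13.5, 20)
print(data_range, iqr)  # 19 10.0

# b. Standard deviation (sample version; use statistics.pstdev for a full population).
print(statistics.stdev(data))
```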
- SHAPE
A distribution typically has one of 3 shapes.
A. Left Skewed
B. Right Skewed
C. Symmetric (e.g. Normal Distribution)
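One way to put a number on shape is the moment coefficient of skewness: positive for right skew, negative for left skew, near zero for symmetric data. This is a sketch with hypothetical helper names; the 0.5 cut-off in `describe_shape` is an assumed rule of thumb, not a fixed standard, and plotting the data is still the first step.

```python
import statistics


def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (assumes data is not constant)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def describe_shape(data, threshold=0.5):
    """Label the shape; the 0.5 threshold is an assumed rule of thumb."""
    g1 = skewness(data)
    if g1 > threshold:
        return "right skewed"  # long tail on the right, mean usually > median
    if g1 < -threshold:
        return "left skewed"   # long tail on the left, mean usually < median
    return "symmetric"


right = [1, 2, 2, 3, 3, 3, 4, 10, 15]
left = [20 - x for x in right]  # mirror image of `right`
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right", right), ("left", left), ("symmetric", symmetric)]:
    print(name, describe_shape(values),
          round(statistics.mean(values), 2), statistics.median(values))
```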
- OUTLIERS
Below are my guidelines for working with any column (random variable) in your dataset.
1. Plot your data to identify if you have outliers.
2. Handle outliers accordingly via the methods above.
3. If there are no outliers and your data follow a normal distribution, use the mean and standard deviation to describe your dataset, and report that the data are normally distributed.
4. If you have skewed data or outliers, use the five-number summary to summarize your data and report the outliers.
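A sketch of guidelines 3 and 4, assuming outliers are flagged with the common 1.5 × IQR rule (Tukey's fences). That rule is a swapped-in convention, not necessarily one of the methods referred to above, and the helper names are hypothetical. Normality is not tested here; plotting (guideline 1) still comes first.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high], (low, high)


def summarize(data):
    """Mean/stdev when no outliers are found, five-number summary otherwise.

    Only outliers are checked; skew/normality should be checked by plotting.
    """
    outliers, _ = iqr_outliers(data)
    if outliers:
        q1, median, q3 = statistics.quantiles(data, n=4)
        return {"five_number": (min(data), q1, median, q3, max(data)),
                "outliers": outliers}
    return {"mean": statistics.mean(data), "stdev": statistics.stdev(data)}


data = [1, 3, 4, 6, 8, 9, 12, 15, 20, 95]
print(iqr_outliers(data))  # ([95], (-15.0, 35.0))
print(summarize(data))
```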
Inferential Statistics
Inferential Statistics is about using our collected data to draw conclusions about a larger population. We looked at specific examples that allowed us to identify the following:
- Population - our entire group of interest.
- Parameter - numeric summary about a population
- Sample - subset of the population
- Statistic - numeric summary about a sample
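A small simulation tying the four terms together, assuming a made-up population of 10,000 simulated heights (not real data). The population mean is the parameter; the mean of a random sample is a statistic used to estimate it. Drawing a few more samples shows that the statistic varies from sample to sample, which is the variation inferential statistics has to reason about.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm (not real data).
population = [random.gauss(170, 10) for _ in range(10_000)]
parameter = statistics.mean(population)  # numeric summary of the population

# A sample is a subset of the population (simple random sample, no replacement).
sample = random.sample(population, 100)
statistic = statistics.mean(sample)      # numeric summary of the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")

# Different samples give different statistics.
other_means = [statistics.mean(random.sample(population, 100)) for _ in range(5)]
print([round(m, 2) for m in other_means])
```

In practice the whole population is rarely available, so the parameter is unknown and only the statistic can be computed.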