Statistics Course - Coggle Diagram
Statistics Course
Week 1
Course Introduction
Summarizing Data
Frequency Table
Histograms
Types of Variables
Pie Charts
Scatter Plots
Week 2
Measures of Central Tendency
Mean
Median
Measures of Dispersion
Range
Variance
Standard deviation
Percentiles and Z-Score
Z-score = (Value of interest - Mean) / Standard deviation
Empirical Rule
68%-95%-99.7%
NORM.DIST
NORM.S.DIST
NORM.S.INV
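The z-score formula and Excel's NORM.S.DIST can be mirrored in plain Python (an illustrative sketch; the exam-score numbers are invented):

```python
import math

def z_score(x, mean, sd):
    # How many standard deviations x lies from the mean.
    return (x - mean) / sd

def norm_s_dist(z):
    # Standard normal CDF, equivalent to Excel's NORM.S.DIST(z, TRUE).
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Invented example: exam scores with mean 70 and standard deviation 10.
z = z_score(85, 70, 10)          # 1.5 SDs above the mean
p = norm_s_dist(z)               # P(score <= 85), about 0.933

# Empirical Rule check: about 68% of values fall within 1 SD of the mean.
within_one_sd = norm_s_dist(1) - norm_s_dist(-1)   # about 0.683
```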
Random Variables
Discrete
Continuous
Probability Distribution
Expected value (mean)
Normal Distribution
Standard Normal Curve
Less Than
Greater Than
Between two values
Finding Z for a given Probability
NORM.INV
NORM.S.INV
Standard Normal Table
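The three probability cases (less than, greater than, between two values) and a NORM.S.INV-style lookup can be sketched with the standard library; the mean and SD below are made up:

```python
import math

def norm_dist(x, mean, sd):
    # Cumulative probability P(X <= x); mirrors Excel's NORM.DIST(x, mean, sd, TRUE).
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def norm_s_inv(p):
    # Z for a given cumulative probability, like NORM.S.INV(p), via bisection.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_dist(mid, 0, 1) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mean, sd = 100, 15                                   # hypothetical distribution
less    = norm_dist(110, mean, sd)                   # P(X < 110)
greater = 1 - norm_dist(110, mean, sd)               # P(X > 110)
between = norm_dist(110, mean, sd) - norm_dist(90, mean, sd)  # P(90 < X < 110)
```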
Week 6
Testing for No Difference in Means
Test for a Specific Value in Means
Comparing two suppliers
Paired Tests
Product testing (Before & After)
Test for Proportion
Comparing two proportions
Hypothesis Test for Two Means
Comparing two brands
Confidence Interval for the Mean Difference
One Sample
Two Samples
In Excel, we use t-Test: Two-Sample Assuming Unequal Variances
One-Tail Test vs Two-Tail Test
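What Excel's t-Test: Two-Sample Assuming Unequal Variances tool reports (the Welch t-statistic and its degrees of freedom) can be reproduced by hand; the supplier delivery times below are invented:

```python
import math, statistics

def welch_t(sample1, sample2):
    # t-statistic and degrees of freedom for two samples with unequal variances.
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    se2 = v1 / n1 + v2 / n2          # squared standard error of the difference
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Invented delivery times (days) from two suppliers.
a = [5, 6, 7, 6, 5, 8]
b = [7, 8, 9, 8, 7, 9]
t, df = welch_t(a, b)   # negative t: supplier a is faster on average
```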
Week 5
Inferential & Predictive Statistics
Steps in scientific method
Formulating the Hypothesis
Null Hypothesis
Alternative Hypothesis
One-tail vs two-tail tests
Types of Errors
Type I
Reject the Null Hypothesis incorrectly
Type II
Retain the Null Hypothesis incorrectly
Test for Mean
SE = s/SQRT(n)
t-statistic = (x̄ - µ) / SE
p-value = T.DIST(t, degrees of freedom, 1)
For a two-tail test, we multiply the p-value by 2
For a right-tail test, p-value = 1 - T.DIST(t, degrees of freedom, 1)
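A sketch of the test for a mean with an invented sample (the p-value step is left to Excel's T.DIST, as above, since the standard library has no t-distribution CDF):

```python
import math, statistics

# Invented sample: test H0: µ = 50 against a two-tail alternative.
sample = [52, 48, 53, 55, 51, 49, 54, 52]
mu0 = 50

n    = len(sample)
xbar = statistics.mean(sample)
s    = statistics.stdev(sample)        # sample standard deviation
se   = s / math.sqrt(n)                # SE = s / SQRT(n)
t    = (xbar - mu0) / se               # t-statistic with n - 1 degrees of freedom
# p-value: in Excel, 1 - T.DIST(t, n-1, 1) for the right tail, doubled for two tails.
```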
Test for Proportion
SE = SQRT(P0*(1-P0)/n)
Z = (p̂ - P0)/SE
P-value = NORM.S.DIST(Z, 1)
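The proportion test maps directly to code; note that the SE uses the hypothesized P0, not the sample proportion. The survey numbers are invented:

```python
import math

def prop_z_test(successes, n, p0):
    # z-statistic and left-tail p-value for H0: p = p0.
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # SE uses the hypothesized P0
    z = (p_hat - p0) / se
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # NORM.S.DIST(z, 1)
    return z, p_value

# Invented example: 40 of 100 customers prefer the new design; H0: p = 0.5.
z, p = prop_z_test(40, 100, 0.5)
```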
Week 3
Producing Data
Response variable vs Independent variable
Experimental Study vs Observational Study
Cross-sectional Data vs Time-series Data
Sampling methods
Non-Probability Methods
Volunteer Sampling
Convenience Sampling
Probability Methods
Simple Random Sampling
Stratified Sampling
Cluster Sampling
Sample Size
Central Limit Theorem (CLT) and Means
Sampling Distribution and Empirical Rule
Distribution of the Sample Proportion
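A quick simulation (with a made-up, skewed exponential population) illustrates the CLT: sample means cluster around the population mean with spread close to sigma/SQRT(n):

```python
import random, statistics

random.seed(0)                      # fixed seed so the sketch is repeatable
n, trials = 30, 2000

# Population: exponential with mean 1 and sigma 1 (heavily right-skewed).
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

center = statistics.mean(means)     # close to the population mean, 1
spread = statistics.stdev(means)    # close to 1/sqrt(30), about 0.183
```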
Week 4
Confidence Interval Basics
Confidence interval for Mean
Sample Mean +/- Margin of Error
Equation for Confidence interval
Sigma is known
x̄ +/- Z * (sigma/SQRT(n))
Sigma is unknown
x̄ +/- t * (s/SQRT(n))
x̄ = sample mean (average)
s=standard deviation
Margin of Error (ME)
Standard Error (SE) = S/SQRT(n)
Margin of Error = SE * Z-score
Comparing z-score & t-score
=NORM.S.INV(probability)
=T.INV(probability,degree of freedom)
Z-score
90% z-score = 1.645
95% z-score = 1.96
98% z-score = 2.33
99% z-score = 2.576
We use the z-score with proportions
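The mean-interval recipe above can be sketched in Python; the measurements are invented, and the z critical value is used for simplicity (with a small n and unknown sigma, a t critical value from Excel's T.INV would give a slightly wider interval):

```python
import math, statistics

def ci_mean(sample, crit=1.96):
    # x̄ +/- crit * (s / SQRT(n)); crit = 1.96 gives a 95% interval.
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    me = crit * se                   # Margin of Error = critical value * SE
    return xbar - me, xbar + me

# Invented measurements.
lo, hi = ci_mean([10, 12, 11, 13, 12, 10, 11, 13, 12, 11])
```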
Confidence interval for a population proportion
Sample proportion +/- Margin of error
ME = Z * SE
Confidence interval for a population proportion = p-hat +/- ME
SE = SQRT (p-hat*(1-p-hat)/n)
p-hat: Sample proportion
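The proportion interval is the same recipe with the proportion's SE; the poll numbers below are invented:

```python
import math

def ci_proportion(successes, n, z=1.96):
    # p-hat +/- z * SQRT(p-hat*(1-p-hat)/n); z = 1.96 gives a 95% interval.
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Invented poll: 60 of 200 respondents say yes.
lo, hi = ci_proportion(60, 200)
```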
Sample size
Sample size - Mean
n = (Z * sigma / E)^2
E: Acceptable margin of error
Sample size - proportion
n = p * (1-p) * (Z/E)^2
E: desired margin of error
If p is not provided, we use 0.5
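Both sample-size formulas, rounded up to whole observations; the confidence level and error targets below are invented examples:

```python
import math

def n_for_mean(z, sigma, E):
    # n = (z * sigma / E)^2, rounded up to the next whole observation.
    return math.ceil((z * sigma / E) ** 2)

def n_for_proportion(z, E, p=0.5):
    # n = p(1-p)(z/E)^2; p = 0.5 is the conservative default when p is unknown.
    return math.ceil(p * (1 - p) * (z / E) ** 2)

# Invented targets: 95% confidence (z = 1.96).
n_mean = n_for_mean(1.96, sigma=15, E=3)       # 97 observations
n_prop = n_for_proportion(1.96, E=0.05)        # 385 respondents
```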
Factors that determine ME
Sample size
Level of confidence
Standard deviation of sample
Week 7
Regression Analysis
Simple Linear Regression
Dependent/Response Variable
Independent/Explanatory/Predictor Variable
Scatter Plot
No Relation
Strong Positive/Negative Correlation
Least Square Method
Regression Equation
Y=B0+B1X
Line of Best Fit/Least Square Line
Least Square Point Estimate
Assessing the Regression Model
Sources of Variations
Explained Variations (Regression)
Unexplained Variations (Error)
Calculating R^2
R^2=SSR/SST
Simple Correlation Coefficient
Multiple R --> Strength of relationship between observed & predicted values
Multiple R = 0.85 --> Strong Positive Correlation
Multiple R = 0.30 --> Weak Correlation
R Square --> How well the independent variable(s) explain the variability
Close to 1 --> The model explains most of the variability
Close to 0 --> The model explains very little of the variability
P-value: Whether the independent variables have a statistically significant relationship with the dependent variable
Small P-value (<=0.05) --> The independent variable has a significant effect on the dependent variable.
Large P-value (>0.05) --> The independent variable is not significant in predicting the dependent variable.
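The least-squares estimates and R^2 = SSR/SST can be computed directly; the data points below are invented:

```python
import statistics

def simple_linear_regression(x, y):
    # Least-squares estimates b0, b1 and R^2 = SSR/SST for y = b0 + b1*x.
    mx, my = statistics.mean(x), statistics.mean(y)
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    sst = sum((yi - my) ** 2 for yi in y)             # total variation
    ssr = sum((b0 + b1 * xi - my) ** 2 for xi in x)   # explained variation
    return b0, b1, ssr / sst

# Invented data with a moderate positive trend.
b0, b1, r2 = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```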
Model Assumptions and limitations
Week 8
Multiple Regression Model
Regression variables
Making a prediction
Most Impactful Variables
Qualitative Data
Dummy Variables
Modeling the Regression
Avoiding Collinearity
Many Categories
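Dummy coding for qualitative data can be sketched by hand: k categories become k-1 indicator columns, and dropping one baseline category avoids perfect collinearity. The color values are invented:

```python
def dummy_variables(values):
    # One-hot encode a qualitative variable, dropping the first (baseline)
    # category so the dummy columns are not perfectly collinear.
    categories = sorted(set(values))
    baseline, rest = categories[0], categories[1:]
    rows = [[1 if v == c else 0 for c in rest] for v in values]
    return rows, rest, baseline

# Invented qualitative data: 3 categories become 2 dummy columns.
rows, cols, base = dummy_variables(["red", "blue", "green", "blue"])
# base == "blue"; a baseline observation encodes as an all-zero row.
```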