Ch. 9: Introduction to the t statistic
Estimating Standard Error
Definition: The standard deviation of the sampling distribution of a statistic
Formula: SE = s / √n (for the sample mean)
Applications of Standard Error
Confidence intervals
Hypothesis testing
z-scores
t-scores
Interpreting Standard Error
Small SE: Sample mean is a good estimate of population mean
Large SE: Sample mean is a less reliable estimate
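The points above can be sketched in a few lines of Python (the sample data are hypothetical, chosen only for illustration):

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample of exam scores
sample = [72, 75, 78, 80, 83, 85, 88, 90]
s = stdev(sample)                    # sample standard deviation (n - 1 denominator)
se = s / sqrt(len(sample))           # standard error of the mean: SE = s / sqrt(n)
print(f"s = {s:.2f}, SE = {se:.2f}")

# Quadrupling n halves the SE: larger samples give more precise estimates
se_large = s / sqrt(4 * len(sample))
print(f"SE with 4x the sample size: {se_large:.2f}")
```

A small SE relative to s indicates the sample mean pins down the population mean tightly.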
Degrees of Freedom
Importance of Degrees of Freedom
Used in various statistical tests
Affects the shape of the sampling distribution
Essential for accurate estimates in hypothesis testing
Calculating Degrees of Freedom
For a single sample: df = n - 1
n: Sample size
For a t-test comparing two means: df = n1 + n2 - 2
n1: Sample size of group 1
n2: Sample size of group 2
For ANOVA: df = N - k
N: Total number of observations
k: Number of groups
Applications of Degrees of Freedom
t-tests
ANOVA
Chi-square tests
Higher df: More precise estimates
Lower df: Less precise estimates
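The df formulas above are simple arithmetic; a short sketch with hypothetical sample sizes:

```python
# Degrees of freedom for each design (sample sizes are hypothetical)
n = 12                       # single sample
df_single = n - 1            # df = n - 1

n1, n2 = 10, 14              # independent two-sample t-test
df_two_sample = n1 + n2 - 2  # df = n1 + n2 - 2

N, k = 30, 3                 # ANOVA: N observations across k groups
df_anova = N - k             # df = N - k

print(df_single, df_two_sample, df_anova)
```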
Confidence Interval
A range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter.
Point Estimate
Margin of Error
Formula: CI = Point Estimate ± Margin of Error
For population mean: CI = X̄ ± (Z* × (σ/√n))
X̄: Sample mean
Z*: Critical z-value (from the standard normal distribution)
σ: Population standard deviation
n: Sample size
For sample mean when population standard deviation is unknown: CI = X̄ ± (t* × (s/√n))
t*: Critical t-value (from the t-distribution with the appropriate df)
s: Sample standard deviation
Confidence Level (e.g., 95%, 99%)
If the confidence level is 95%, it means that 95% of intervals constructed by this procedure will contain the true population parameter.
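The z-based interval CI = X̄ ± (Z* × (σ/√n)) can be computed with the standard library alone; the sample mean, σ, and n below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values: sample mean, known population SD, sample size
x_bar, sigma, n = 100.0, 15.0, 36
confidence = 0.95

# Critical z* cuts off (1 - confidence)/2 in each tail
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sigma / sqrt(n)          # margin of error
ci = (x_bar - margin, x_bar + margin)      # point estimate +/- margin of error
print(f"z* = {z_star:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

When σ is unknown, the same structure applies with s in place of σ and a critical t* (which requires t-distribution tables or a library such as SciPy) in place of z*.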
t-distribution
A type of probability distribution that is symmetric and bell-shaped, but has heavier tails than the normal distribution. It is used when the sample size is small and/or when the population standard deviation is unknown.
Characteristics
Symmetric and bell-shaped
Heavier tails than the normal distribution
Depends on degrees of freedom (df)
As df increases, t-distribution approaches the normal distribution
t-tests
One-sample t-test
Independent two-sample t-test
Paired sample t-test
Critical values depend on df and the desired level of significance (α)
Used to determine the probability of observing a t-statistic at least as extreme as the one calculated
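A one-sample t statistic, t = (X̄ − μ) / (s/√n), can be computed directly; the data and null-hypothesis mean below are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data and null-hypothesis mean
sample = [5.1, 4.9, 5.6, 5.2, 5.0, 5.4, 5.3, 4.8]
mu_0 = 5.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)
t_stat = (x_bar - mu_0) / (s / sqrt(n))  # t = (x_bar - mu) / SE
df = n - 1                               # compare t_stat to a critical value with df = n - 1
print(f"t = {t_stat:.3f}, df = {df}")
```

The computed t is then compared with the critical value for the chosen α and df to decide whether to reject the null hypothesis.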
Ch. 14: Correlation & Regression
Correlation
is the statistical technique used to measure and describe the relationship between two variables
Types of Correlation
Positive Correlation: Both variables increase or decrease together.
Negative Correlation: One variable increases while the other decreases.
Zero Correlation: No linear relationship between the variables.
Correlation Coefficient
A numerical value that quantifies the degree of correlation between two variables.
Represented by 'r'.
Pearson Correlation
Definition:
measures the degree and the direction of the linear relationship between two variables
Range
Values range from -1 to 1
-1: Perfect negative linear relationship
0: No linear relationship
1: Perfect positive linear relationship
Positive value: Direct relationship
Negative value: Inverse relationship
How to Interpret:
Value close to 0: Weak linear relationship
How to Use:
Determining the strength and direction of a relationship between two continuous variables
Used in fields like psychology, finance, and other sciences
Examples:
Correlation between hours studied & exam scores
Correlation between height & weight
Assumptions
Linear relationship between variables
Continuous data
No significant outliers
Homoscedasticity (equal level of variance across the range of values)
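Pearson's r can be computed from the sum of products and sums of squares; a minimal sketch with hypothetical hours-studied and exam-score data:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: r = SP / sqrt(SSx * SSy)."""
    x_bar, y_bar = mean(x), mean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
scores = [55, 60, 64, 71, 75, 82]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring r gives the coefficient of determination discussed below; Python 3.10+ also provides `statistics.correlation` for the same computation.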
Coefficient of Determination
the value r²
it measures the proportion of variability in one variable that can be predicted from the relationship with the other variable
A correlation of r = 0.80 (or -0.80) means that r² = 0.64, so 64% of the variability in the Y scores can be predicted from the relationship with X
Regression Line
Regression
the statistical technique for finding the best-fitting straight line for a set of data
The straight line is called the regression line, or line of best fit
Regression equation of Y
Formula:
Ŷ = a + bX
Ŷ: Predicted value of Y (dependent variable)
a: Intercept
b: Slope
X: Independent variable
When to Use:
Predicting outcomes
Predicting sales based on advertising spending
Analyzing relationships
Analyzing the effect of study time on test scores
Forecasting trends
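The least-squares line can be fit with b = SP/SSx and a = ȳ − b·x̄; a minimal sketch using hypothetical advertising-spend and sales figures:

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit of Y = a + bX: b = SP / SSx, a = y_bar - b * x_bar."""
    x_bar, y_bar = mean(x), mean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp / ss_x
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: advertising spend vs. sales
spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 98]
a, b = fit_line(spend, sales)
predicted = a + b * 35       # predict sales for a new spend of 35
print(f"a = {a:.2f}, b = {b:.3f}, prediction = {predicted:.1f}")
```

Plugging a new X into the fitted equation gives the predicted outcome, which is the "predicting sales from advertising" use case above.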
Standard Error Estimate
Measures the accuracy of a sample mean as an estimate of the population mean
Indicates the precision of the sample mean
Used in confidence intervals
Used in hypothesis testing
Factors Affecting SE
Variability in the data (s or p): greater variability -> larger SE
Sample size (n): larger sample size -> smaller SE
For sample mean: SE = s / √n
s: Sample standard deviation
n: Sample size
For population proportion: SE = √[p(1 - p) / n]
n: Sample size
p: Population proportion
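The proportion formula above can be sketched the same way; p and n below are hypothetical:

```python
from math import sqrt

# SE of a sample proportion: SE = sqrt(p * (1 - p) / n)
p, n = 0.40, 200
se_prop = sqrt(p * (1 - p) / n)
print(f"SE(proportion) = {se_prop:.4f}")

# Quadrupling n halves the SE, mirroring the sample-size factor above
se_prop_big = sqrt(p * (1 - p) / (4 * n))
print(f"SE with 4x the sample size: {se_prop_big:.4f}")
```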