Ch. 9: Introduction to the t statistic

Estimating Standard Error

Definition: The standard deviation of the sampling distribution of a statistic

For sample mean: SE = s / √n

s: Sample standard deviation

n: Sample size

For a sample proportion: SE = √[p(1 - p) / n]

n: Sample size

p: Population proportion
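The two formulas above can be sketched in Python as follows. This is only an illustration: the function names and the sample data are hypothetical, not part of the original notes.

```python
import math
import statistics


def se_mean(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))


def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


scores = [5, 7, 8, 6, 9]            # hypothetical sample, n = 5
se_scores = se_mean(scores)         # s^2 = 2.5, so SE = sqrt(2.5 / 5), about 0.707
se_half = se_proportion(0.5, 100)   # sqrt(0.25 / 100) = 0.05
```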

Formula for SE

Factors Affecting SE

Greater variability -> larger SE

Variability in the data (s or p)

Larger sample size -> smaller SE

Sample size (n)
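A short numeric sketch of both factors, using the sample-mean formula SE = s / √n. The values of s and n are made up for illustration.

```python
import math


def se_from_s(s, n):
    """Standard error of the mean from a given s and n."""
    return s / math.sqrt(n)


se_small_n = se_from_s(10, 25)    # 10 / 5  = 2.0
se_large_n = se_from_s(10, 100)   # 10 / 10 = 1.0 (four times the n, half the SE)
se_more_var = se_from_s(20, 25)   # 20 / 5  = 4.0 (double the s, double the SE)
```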

Applications of Standard Error

Confidence intervals

Hypothesis testing

z-scores

t-scores

Interpreting Standard Error

Small SE: Sample mean is likely to be close to the population mean (a more reliable estimate)

Large SE: Sample mean is a less reliable estimate

Degrees of Freedom

Importance of Degrees of Freedom

Used in various statistical tests

Affects the shape of the sampling distribution

Essential for accurate estimates in hypothesis testing

Calculating Degrees of Freedom

For a single sample: df = n - 1

n: Sample size

For a t-test comparing two means: df = n1 + n2 - 2

n1: Sample size of group 1

n2: Sample size of group 2

For ANOVA (within-groups): df = N - k

N: Total number of observations

k: Number of groups
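The three df formulas in this section can be written as small helper functions. The function names and example sample sizes are hypothetical. Note that N - k is the within-groups df in ANOVA.

```python
def df_single_sample(n):
    """df for a single sample: n - 1."""
    return n - 1


def df_two_sample(n1, n2):
    """df for a t-test comparing two independent means: n1 + n2 - 2."""
    return n1 + n2 - 2


def df_anova_within(total_n, k):
    """Within-groups df for ANOVA: N - k."""
    return total_n - k


df_one = df_single_sample(25)      # 24
df_two = df_two_sample(12, 15)     # 25
df_anova = df_anova_within(30, 3)  # 27
```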

Applications of Degrees of Freedom

t-tests

ANOVA

Chi-square tests

Higher df: More precise estimates
Lower df: Less precise estimates

Confidence Interval

A range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter.

Point Estimate
Margin of Error

Formula: CI = Point Estimate ± Margin of Error

For population mean: CI = X̄ ± (Z* × (σ/√n))

X̄: Sample mean

Z*: Critical z-value (from standard normal distribution)

σ: Population standard deviation

n: Sample size

For population mean when population standard deviation is unknown: CI = X̄ ± (t* × (s/√n))

t*: t-value (from t-distribution)
s: Sample standard deviation

Confidence Level (e.g., 95%, 99%)
A 95% confidence level means that about 95% of the intervals constructed this way from repeated samples would contain the true population parameter.
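A minimal sketch of both interval formulas, using only the Python standard library. The function names and data are hypothetical. The standard library has no t-distribution, so t* is supplied by hand here, on the assumption that it was read from a standard t table.

```python
import math
import statistics
from statistics import NormalDist


def ci_z(mean, sigma, n, confidence=0.95):
    """CI for the population mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)
    return mean - margin, mean + margin


def ci_t(sample, t_star):
    """CI for the population mean when sigma is unknown (uses s and t*)."""
    mean = statistics.mean(sample)
    margin = t_star * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - margin, mean + margin


# Known sigma: margin of error is about 1.96 * (15 / 6) = 4.9
z_low, z_high = ci_z(mean=100, sigma=15, n=36)

# Unknown sigma: t* for df = 4 at 95% confidence, assumed from a t table
T_STAR_DF4_95 = 2.776
t_low, t_high = ci_t([5, 7, 8, 6, 9], T_STAR_DF4_95)
```

With the same data, the t-based interval is wider than a z-based one would be, because t* is larger than Z* at small df.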

t-distribution

A type of probability distribution that is symmetric and bell-shaped, but has heavier tails than the normal distribution. It is used when the sample size is small and/or when the population standard deviation is unknown.

Characteristics

Symmetric and bell-shaped

Heavier tails than the normal distribution

Depends on degrees of freedom (df)

As df increases, t-distribution approaches the normal distribution
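To illustrate the heavier tails and the convergence to the normal distribution, the sketch below evaluates the density of the t-distribution at a point far from the center. The function name t_pdf is hypothetical; the density formula is the standard one for Student's t.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma (log of the gamma function) avoids overflow when df is large
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))


normal_tail = NormalDist().pdf(3.0)   # standard normal density at x = 3
tail_df5 = t_pdf(3.0, 5)              # several times larger: heavier tail
tail_df1000 = t_pdf(3.0, 1000)        # nearly identical to the normal density
```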

t-tests

One-sample t-test

Independent two-sample t-test

Paired sample t-test

  • Critical values depend on df and the desired level of significance (α)
  • Used to determine the probability of observing a t-statistic at least as extreme as the one calculated, assuming the null hypothesis is true
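As a rough sketch of a one-sample t-test, the statistic is t = (X̄ - μ) / (s/√n) with df = n - 1. The data, the hypothesized mean, and the function name below are hypothetical, and the critical value is assumed to come from a t table.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, df) for testing a sample mean against mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1


t_stat, df = one_sample_t([5, 7, 8, 6, 9], mu0=6)  # t is about 1.414, df = 4

T_CRIT = 2.776  # assumed from a t table: df = 4, alpha = 0.05, two-tailed
reject_null = abs(t_stat) > T_CRIT                 # False for this sample
```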

Ch. 14: Correlation & Regression

Correlation

Pearson Correlation

Coefficient of Determination

Regression Line

Regression equation of Y

Standard Error of Estimate

Correlation: the statistical technique used to measure and describe the relationship between two variables

Coefficient of determination: the value r²

It measures the proportion of variability in one variable that can be determined from the relationship with the other variable

A correlation of r = 0.80 (or r = -0.80) means that r² = 0.64, so 64% of the variability in the Y scores can be predicted from the relationship with X
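The arithmetic in this example can be checked with a one-line function (the name is hypothetical). Squaring removes the sign, so r = 0.80 and r = -0.80 give the same proportion.

```python
def coefficient_of_determination(r):
    """Proportion of variability in Y predicted from X: r squared."""
    return r ** 2


r2_pos = coefficient_of_determination(0.80)    # about 0.64
r2_neg = coefficient_of_determination(-0.80)   # same value: the sign drops out
```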

Regression

the statistical technique for finding the best-fitting straight line for a set of data

The straight line is the regression line - line of best fit

Measures the accuracy of a sample mean as an estimate of the population mean

Definition:

Indicates the precision of the sample mean

Used in confidence intervals

Used in hypothesis testing

Formula:

Y = a + bX

Y: Dependent variable

a: Intercept

b: Slope

X: Independent variable
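One common way to obtain a and b is the least-squares solution: b = SP / SSx and a = (mean of Y) - b × (mean of X), where SP is the sum of products of deviations and SSx is the sum of squared deviations of X. The notes do not state this method, so treat the sketch below as an assumption. The data and function name are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ss_x = sum((x - mx) ** 2 for x in xs)                  # sum of squares for X
    b = sp / ss_x
    a = my - b * mx
    return a, b


hours = [1, 2, 3, 4, 5]    # hypothetical study time (X)
scores = [2, 4, 5, 4, 5]   # hypothetical test scores (Y)
a, b = fit_line(hours, scores)   # SP = 6, SSx = 10, so b = 0.6 and a = 2.2
predicted = a + b * 6            # predicted score for 6 hours, about 5.8
```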

When to Use:

Predicting outcomes

Analyzing relationships

Forecasting trends

Predicting sales based on advertising spending

Analyzing the effect of study time on test scores

Range

Values range from -1 to 1

-1: Perfect negative linear relationship

0: No linear relationship

1: Perfect positive linear relationship

Positive value: Direct relationship

Negative value: Inverse relationship

How to Interpret:

Value close to 0: Weak linear relationship

Pearson correlation: measures the degree and the direction of the linear relationship between two variables

How to Use:

  • Determining the strength and direction of a relationship between two continuous variables
  • Used in fields like psychology, finance, and other sciences

Examples:

  • Correlation between hours studied & exam scores
  • Correlation between height & weight

Assumptions

Linear relationship between variables

Continuous data

No significant outliers

Homoscedasticity (equal level of variance across the range of values)

Types of Correlation

Positive Correlation: Both variables increase or decrease together.

Negative Correlation: One variable increases while the other decreases.

Zero Correlation: No linear relationship between the variables.

Correlation Coefficient

  • A numerical value that quantifies the degree of correlation between two variables.
  • Represented by 'r'.
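A minimal sketch of computing r, using the standard definition r = SP / √(SSx × SSy). The function name and data are hypothetical.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation: SP / sqrt(SSx * SSy)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4, 5]    # hypothetical hours studied
scores = [2, 4, 5, 4, 5]   # hypothetical exam scores
r = pearson_r(hours, scores)                   # 6 / sqrt(60), about 0.775

r_perfect = pearson_r([1, 2, 3], [2, 4, 6])    # perfect positive relationship
r_inverse = pearson_r([1, 2, 3], [6, 4, 2])    # perfect negative relationship
```

The result always falls between -1 and 1, and its sign gives the direction of the relationship.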