Inference for Categorical & Quantitative Data: Chi-Square & Slopes
Chi-square Test
Goodness of Fit (one-way table)
Observed Count - The actual observed counts from a sample/study
Expected Count - The counts expected if the null hypothesis is true
One-way table - displays counts for categories of a single categorical variable
Chi-square Statistic: χ² = Σ (Observed − Expected)² / Expected
df: number of categories − 1
As df increases, the chi-square distribution becomes less right-skewed
Conditions for conducting a GOF test
Independence: Independent treatments, or N ≥ 10n when sampling without replacement
Large Counts: All expected counts are ≥ 5
Randomness: Random sampling/assignment
Ho: The distribution of the variable is the SAME as the claim
Ha: The distribution of the variable is DIFFERENT from the claim
p-value = area to the right of χ² under the chi-square curve = χ²cdf(lower: χ², upper: ∞, df)
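The goodness-of-fit steps above can be sketched in Python. This is only an illustration, not a required method: the die-roll counts are made-up data, and `chi2_sf` and `gof_test` are hypothetical helper names. The p-value helper uses a standard closed-form recurrence that works for whole-number df; a calculator's χ²cdf should give the same tail area.

```python
import math

def chi2_sf(x, df):
    """Area to the right of x under a chi-square curve (whole-number df)."""
    h = x / 2.0
    if df % 2 == 0:
        total, a = math.exp(-h), 1.0               # exact tail area for df = 2
    else:
        total, a = math.erfc(math.sqrt(h)), 0.5    # exact tail area for df = 1
    # Each pass of this recurrence raises df by 2 until the target df is reached.
    while a < df / 2.0:
        total += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return total

def gof_test(observed, expected):
    """Return (chi-square statistic, df, p-value) for a one-way table."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical data: 60 rolls of a die that is claimed to be fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / 6] * 6        # counts expected if Ho is true

large_counts_ok = all(e >= 5 for e in expected)   # Large Counts condition
stat, df, p_value = gof_test(observed, expected)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

A large p-value here means the observed counts are consistent with the claimed distribution, so we would fail to reject Ho.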
Chi-square test for Homogeneity (1 categorical variable, compared across 2 or more populations/groups)
Same Conditions as GOF: Randomness, Independence, Large Counts
Ho: The distribution of the variable is the same for every population/group
Ha: The distribution of the variable is not the same for every population/group
Expected Counts = (row total * column total) / (table total)
Data: 2-way table
Chi-square test for Independence (2 categorical variables, 1 population)
Same Conditions as GOF: Randomness, Independence, Large Counts
Ho: There is no association between var. A & var. B (they are independent)
Ha: There is an association between var. A & var. B (they are dependent)
Expected Counts = (row total * column total) / (table total)
Data: 2-way table
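As a rough illustration of the expected-count formula for a 2-way table, the sketch below computes the expected counts and the chi-square statistic. The table values are made up, `expected_counts` and `chi_square_two_way` are hypothetical helper names, and df = (rows − 1)(columns − 1) is the standard two-way formula, which these notes do not state explicitly.

```python
def expected_counts(table):
    """Expected count for each cell = (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Return (chi-square statistic, df, expected counts) for a 2-way table."""
    exp = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # assumed standard two-way df
    return stat, df, exp

# Hypothetical 2x2 table: rows are groups, columns are response categories.
table = [[20, 30],
         [30, 20]]

stat, df, exp = chi_square_two_way(table)
large_counts_ok = all(e >= 5 for row in exp for e in row)   # Large Counts condition
print(exp, stat, df)
```

The same arithmetic serves both the homogeneity and the independence test; only the hypotheses and the way the data were collected differ.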
Confidence Interval for Slopes
Conditions
Normality - no strong skew or outliers in the residuals
Equal Variance - no fanning in the residual plot / similar spread around residual = 0
Independence - N ≥ 10n, or treatments are independent of each other
Randomness - random assignment/random selection
Linear - the relationship between x and y is linear (NO leftover pattern in the residual plot)
Standard Deviation (of the sampling distribution of b): σ_b = σ / (σ_x √n)
Margin of Error: (t*)(SEb) = t* · s / (s_x √(n−1))
Confidence Interval: b ± (t*)(SEb) = b ± t* · s / (s_x √(n−1))
Interpretations
If all conditions are met, perform a linear regression t-interval for slope
Constant (α): when the explanatory variable (x) is 0, the predicted response (y) is ___
Slope (β): for every 1-unit increase in (x), the predicted (y) changes by ___
SD (σ): the actual (y) values typically vary by about ___ from the (y) values predicted by the regression line
Standard error of slope (SEb): the slope of the sample regression line typically varies by about ___ from the slope of the true regression line for predicting (y) from (x)
Conclusion for 4-step plan: We are (confidence level) confident that the interval from ___ to ___ captures the slope of the true regression line for predicting (y) from (x)
Mean (of the sampling distribution of b): μ_b = β
df = n-2
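The interval b ± (t*)(SEb) can be worked through in a short Python sketch. It is only an illustration under stated assumptions: the five data points are made up, `slope_interval` is a hypothetical helper name, and t* = 3.182 is the table value for 95% confidence with df = n − 2 = 3.

```python
import math

def slope_interval(x, y, t_star):
    """Return (b, SEb, (lower, upper)) for the slope of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                       # sample slope
    a = y_bar - b * x_bar               # sample intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)           # equals s / (s_x * sqrt(n - 1))
    margin = t_star * se_b
    return b, se_b, (b - margin, b + margin)

# Hypothetical data; t* comes from a t table with df = 5 - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, (lower, upper) = slope_interval(x, y, t_star=3.182)
print(f"b = {b:.3f}, SEb = {se_b:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

Because this interval contains 0, these made-up data would not give convincing evidence of a linear relationship.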
Test for Slopes
Test Statistic: t = (b − hypothesized slope) / SEb, where the hypothesized slope = 0 when checking for a linear association
Ho:
β = 0 (no linear relationship; use when testing for a linear association)
β = hypothesized slope (if given)
Ha:
β ≠ 0 (there is a linear relationship)
β > 0 (there is a positive linear relationship)
β < 0 (there is a negative linear relationship)
df = n-2
If testing for β ≠ 0, remember to multiply the tcdf tail area by 2 to get the p-value
Conditions: LINER (Linear, Independence, Normality, Equal variance, Randomness)
If all conditions are met, perform a linear regression t-test for slope
If p-value > α, we fail to reject Ho. Therefore we do not have convincing evidence for Ha (for example, that (y) tends to increase as (x) increases)
If p-value < α, we reject Ho. Therefore we do have convincing evidence for Ha (for example, that (y) tends to increase as (x) increases)
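The t-test for slope can be sketched the same way. This is a minimal illustration, not a definitive implementation: the slope and standard error below are made-up summary values, `t_pdf`, `t_upper_tail`, and `slope_t_test` are hypothetical helper names, and the tail area is approximated with Simpson's rule in place of a calculator's tcdf.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of |t|, using Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    if t == 0:
        return 0.5
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def slope_t_test(b, se_b, n, beta0=0.0):
    """Return (t, df, two-sided p-value) for Ho: beta = beta0 vs Ha: beta != beta0."""
    t = (b - beta0) / se_b
    df = n - 2
    p_two_sided = 2 * t_upper_tail(t, df)   # double the tail area for a two-sided Ha
    return t, df, p_two_sided

# Hypothetical summary values from a sample of n = 5 points.
t_stat, df, p_value = slope_t_test(b=0.6, se_b=0.282843, n=5)
alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}: {decision}")
```

For a one-sided Ha (β > 0 or β < 0), use the single tail area in the direction of Ha and do not double it.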