Chi-Squared Tests
Chi-Squared Goodness of Fit Test – used to determine whether observed data fit a given distribution. One categorical variable.
Graph – Bar chart
Assumptions – Random, independent, categorical data (based on counts), and all expected counts are at least 5.
H0: The provided expected counts are correct.
HA: The provided expected counts are incorrect.
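Test statistic: χ² = Σ (observed – expected)² / expected, summed over all categories. Larger values indicate that the observed counts are not close to the expected counts.
df = number of categories – 1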
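As a rough illustration of the test statistic and degrees of freedom, here is a minimal Python sketch. The die-roll counts and the helper name `chi_square_statistic` are hypothetical, invented only for this example.

```python
# Hypothetical data: a six-sided die rolled 60 times (counts invented for illustration).
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]  # fair die: 60 rolls / 6 faces


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2_calc = chi_square_statistic(observed, expected)
df = len(observed) - 1  # number of categories - 1

print(f"chi-squared = {chi2_calc:.2f}, df = {df}")
```

Every expected count here is 10, so the "at least 5" assumption holds for this made-up data.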
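Decision Rule – If the calculated test statistic is at or past the critical value, reject H0.
χ² calc ≥ χ² crit or p ≤ α: reject H0. There is sufficient evidence that the model is not a good fit.
χ² calc < χ² crit or p > α: fail to reject (FTR) H0. There is not sufficient evidence that the model is a poor fit, so the model is a plausible fit.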
The χ² distribution is skewed to the right, but it becomes less skewed as df increases. The test is always right-tailed: the p-value is the area to the right of (greater than) the test statistic, and the critical value is found from the degrees of freedom and the significance level.
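The right-tail area can be sketched in Python using only the standard library. The function name `chi2_upper_tail`, the statistic 4.2, and df = 5 are assumptions made for illustration. The function uses the standard recurrence for the upper incomplete gamma function and is not a replacement for a statistics package.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-squared curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Starting value: closed forms for df = 2 (even df) and df = 1 (odd df).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Each step of Q(a + 1, h) = Q(a, h) + h^a * e^(-h) / Gamma(a + 1) raises df by 2.
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


alpha = 0.05                          # significance level (assumed)
p_value = chi2_upper_tail(4.2, 5)     # hypothetical statistic 4.2 with df = 5
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"p = {p_value:.3f}: {decision}")
```

As a sanity check, the tail area at the usual table critical values (3.841 for df = 1, 5.991 for df = 2) comes out very close to 0.05. In practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used instead.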
Standardized Residuals – (observed – expected) / √expected. Squaring each residual and adding them up gives the χ² test statistic.
• Positive means more observations in the category than expected.
• Negative means fewer observations in the category than expected.
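A small Python sketch of the residuals, assuming the usual definition (observed – expected) / √expected and hypothetical die-roll counts invented for illustration:

```python
import math

# Hypothetical data: a six-sided die rolled 60 times (counts invented for illustration).
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

# One residual per category: (observed - expected) / sqrt(expected).
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

for face, r in enumerate(residuals, start=1):
    if r > 0:
        note = "more than expected"
    elif r < 0:
        note = "fewer than expected"
    else:
        note = "exactly as expected"
    print(f"face {face}: residual {r:+.2f} ({note})")

# The squared residuals add up to the chi-squared test statistic.
chi2_from_residuals = sum(r ** 2 for r in residuals)
```

The categories with the largest residuals in absolute value contribute the most to the test statistic.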
Chi-Squared Test for Independence – used to determine whether two categorical variables are independent.
H0: The row variable is independent of the column variable (no association / no relationship).
HA: The variables are NOT independent (there is an association / relationship).
Chi-Squared Test for Homogeneity – tests a single categorical variable over two populations to determine whether they have the same distribution. The procedure is the same as the test for independence.
H0: The two populations have the same distribution.
HA: The two populations do not have the same distribution.
Graph – Mosaic plot
Assumptions – Random, independent, categorical data (based on counts), and all expected counts are at least 5.
Test statistic: χ² = Σ (observed – expected)² / expected, summed over every cell of the table,
where expected = (row total × column total) / grand total
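The expected-count formula can be sketched in Python as follows. The 2 × 3 table is hypothetical, invented only to show the arithmetic.

```python
# Hypothetical 2 x 3 table of observed counts
# (rows: two groups; columns: three response categories).
table = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in table]        # totals across each row
col_totals = [sum(col) for col in zip(*table)]  # totals down each column
grand_total = sum(row_totals)

# expected = (row total * column total) / grand total, one value per cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Check the assumption that all expected counts are at least 5.
all_at_least_5 = all(e >= 5 for row in expected for e in row)

for row in expected:
    print(row)
print("all expected counts >= 5:", all_at_least_5)
```

For this made-up table both rows have the same total, so both rows get the same expected counts.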
χ² calc < χ² crit or p > α: fail to reject (FTR) H0. There is not sufficient evidence to believe the variables are associated/dependent.
χ² calc ≥ χ² crit or p ≤ α: reject H0. There is sufficient evidence to believe the variables are associated/dependent.
df = (rows – 1) (columns – 1)
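Putting the pieces together, a minimal Python sketch of the whole test for a two-way table might look like this. The table and the helper name `chi_square_two_way` are hypothetical. The critical value 5.991 is the standard table value for df = 2 at α = 0.05.

```python
def chi_square_two_way(table):
    """Return (chi-squared statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp

    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
    return stat, df


# Hypothetical 2 x 3 table of observed counts (invented for illustration).
table = [
    [20, 30, 50],
    [30, 30, 40],
]

chi2_calc, df = chi_square_two_way(table)
chi2_crit = 5.991  # table value for df = 2, alpha = 0.05
reject = chi2_calc >= chi2_crit

print(f"chi-squared = {chi2_calc:.3f}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

The same calculation serves for the test of homogeneity, since the procedure is identical and only the hypotheses and conclusion are worded differently.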