Chi-Squared Tests

Chi-Squared Goodness of Fit Test – used to determine whether observed counts fit a given (hypothesized) distribution. One categorical variable.

Graph – Bar chart

Assumptions – Random, independent, categorical data (based on counts), and all expected counts are at least 5.

H0: The provided expected counts are correct.
HA: The provided expected counts are incorrect.

Test statistic: χ² = Σ (observed − expected)² / expected. Larger values indicate that the observed counts are farther from the expected counts.

df = categories − 1

Decision Rule – If the calculated test statistic is at or past the critical value, reject H0.

χ² calc ≥ χ² crit or p < α
Reject H0
The model is not a good fit.

χ² calc < χ² crit or p > α
Fail to reject (FTR) H0
The model is a plausible fit (there is no evidence against it).
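
The goodness-of-fit steps can be sketched in a few lines of Python. This is only an illustration: the die-roll counts are made up, and the critical value 11.070 is the standard table entry for df = 5 at a 0.05 significance level.

```python
# Hypothetical data: 60 rolls of a die, testing H0 "the die is fair".
observed = [8, 9, 13, 7, 12, 11]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face under H0

# Condition check: every expected count should be at least 5.
assert all(e >= 5 for e in expected)

# Test statistic: sum of (observed - expected)^2 / expected over all categories.
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1   # categories - 1
chi2_crit = 11.070       # table value for df = 5, alpha = 0.05

reject_h0 = chi2_calc >= chi2_crit
print(f"chi2 = {chi2_calc:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: chi2 = 2.80, df = 5, reject H0: False
```

Here the statistic falls well short of the critical value, so we fail to reject H0 for this made-up sample.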

The χ² distribution is skewed to the right, but it becomes less skewed as df increases. The p-value is the area to the right of the test statistic (the upper tail). The critical value is found from the degrees of freedom and the significance level.
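
To make the "area to the right" idea concrete, here is a rough pure-Python sketch of the upper-tail area. The function name chi2_upper_tail is made up for this note. It uses a series for the incomplete gamma function, so it is fine for classroom-sized statistics but loses precision for extremely small p-values.

```python
import math

def chi2_upper_tail(stat, df):
    """Approximate area to the right of `stat` under a chi-squared curve with `df` degrees of freedom."""
    if stat <= 0:
        return 1.0  # the whole curve lies to the right of zero
    a, x = df / 2.0, stat / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x):
    # P(a, x) = e^(-x) * x^a * sum over n of x^n / Gamma(a + n + 1)
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)  # upper tail = 1 - lower tail

# A statistic equal to the table critical value should give p close to alpha.
print(round(chi2_upper_tail(3.841, 1), 3))  # prints 0.05
```

In practice a table, a calculator, or a statistics library does this lookup. The sketch only shows that the p-value is a right-tail area.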

Standardized Residuals – (observed − expected) / √expected
• Positive means more observations in the category than expected.
• Negative means fewer observations in the category than expected.
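
A short sketch of the residual calculation, assuming the common definition (observed − expected) / √expected and using made-up die-roll counts. With this definition, the squared residuals add up to the χ² statistic.

```python
import math

# Hypothetical data: 60 rolls of a die, 10 expected per face under H0.
observed = [8, 9, 13, 7, 12, 11]
expected = [10.0] * 6

# One residual per category; the sign shows the direction of the deviation.
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

for face, r in enumerate(residuals, start=1):
    direction = "more than expected" if r > 0 else "fewer than expected"
    print(f"face {face}: residual = {r:+.2f} ({direction})")

# The squared residuals sum to the chi-squared statistic.
chi2_calc = sum(r ** 2 for r in residuals)
```

The categories with the largest residuals in absolute value contribute most to the test statistic.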

Chi-Squared Test for Independence – used to determine whether two categorical variables, measured on a single sample, are independent.

Chi-Squared Test for Homogeneity – tests a single categorical variable across two (or more) populations to determine whether they have the same distribution. The procedure is the same as the test for independence.

Independence:
H0: The row variable is independent of the column variable (no association / no relationship).
HA: The variables are NOT independent (there is an association / relationship).

Homogeneity:
H0: The two populations have the same distribution.
HA: The two populations do not have the same distribution.

Graph – Mosaic plot
Assumptions – Random, independent, categorical data (based on counts), and all expected counts are at least 5.

Test statistic: χ² = Σ (observed − expected)² / expected
where expected = (row total × column total) / grand total
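
The expected-count formula can be sketched as follows. The 2 × 3 table is made-up data (two groups, three response categories), chosen so the arithmetic is easy to check by hand.

```python
# Hypothetical two-way table of observed counts: 2 rows (groups) x 3 columns (responses).
table = [[30, 20, 10],
         [20, 20, 20]]

row_totals = [sum(row) for row in table]          # [60, 60]
col_totals = [sum(col) for col in zip(*table)]    # [50, 40, 30]
grand_total = sum(row_totals)                     # 120

# expected = (row total * column total) / grand total, cell by cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Condition check: every expected count should be at least 5.
assert all(e >= 5 for row in expected for e in row)

# Test statistic: sum over every cell of (observed - expected)^2 / expected.
chi2_calc = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)              # [[25.0, 20.0, 15.0], [25.0, 20.0, 15.0]]
print(round(chi2_calc, 2))   # 5.33
```

Both rows get the same expected counts here because the two row totals happen to be equal.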

χ² calc < χ² crit or p > α
FTR H0
There is insufficient evidence to believe the variables are associated/dependent.

χ² calc ≥ χ² crit or p < α
Reject H0
There is sufficient evidence to believe the variables are associated/dependent.

df = (rows − 1) × (columns − 1)
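
As a rough sketch, the degrees of freedom and the decision for a two-way table could be wired together like this. The function name independence_decision is made up for this note, and the lookup holds only the standard 0.05-level table values for df 1 through 6.

```python
# Upper-tail critical values at alpha = 0.05 from a standard chi-squared table.
CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def independence_decision(n_rows, n_cols, chi2_calc):
    """Return (df, decision) for a two-way table at the 0.05 level."""
    df = (n_rows - 1) * (n_cols - 1)
    chi2_crit = CRIT_05[df]  # raises KeyError for df outside this small table
    if chi2_calc >= chi2_crit:
        return df, "reject H0: evidence of an association"
    return df, "fail to reject H0: insufficient evidence of an association"

# A 2 x 3 table with a statistic of 5.33 has df = 2 and falls below 5.991.
print(independence_decision(2, 3, 5.33))
# (2, 'fail to reject H0: insufficient evidence of an association')
```

The same decision logic applies to the homogeneity test, with the conclusion reworded in terms of the populations' distributions.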