Test 5 - Coggle Diagram
Testing for an association between two categorical variables
Two-way tables and multiple comparisons
If 0 is within the CI --> fail to reject H0
If 0 is not within the CI --> reject H0
Running a separate comparison test for each pair of cells in one table is not valid (each extra test inflates the chance of a Type I error). We need one test for the whole table.
Chi-Square test for association
Association
H0: (Explanatory variable) is NOT associated with (Response variable)
Ha: (Explanatory variable) IS associated with (Response variable)
Expected counts
Chi-Square test statistic
Where observed and expected counts are farther apart, that cell contributes more to χ²
The larger χ² is, the stronger the evidence against H0
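The expected-count and χ² calculations above can be sketched in plain Python; the two-way table counts here are made up purely for illustration:

```python
# Hypothetical two-way table of observed counts
# (rows = explanatory variable groups, columns = response categories)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell: (row total * column total) / overall total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi_sq = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(len(observed))
             for j in range(len(observed[0])))
```

Cells where observed and expected differ most (here, both cells of the first column) contribute the largest terms to the sum.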
Chi-Square distributions
Degrees of freedom
Blue: df=5
Green: df=10
Red: df=15
Mean of distribution = df (=center)
About half of distribution is on the left and the other half is on the right
The higher df, the more symmetrical the distribution
The larger the χ²-statistic, the stronger the evidence against H0
The χ² test always uses the right tail
Ha is always "some association" (it has no direction), yet the p-value still comes only from the right tail
Conditions for using a Chi-square distribution
Each expected count (the counts we calculated, not the observed ones) must be at least 5 to use the chi-square distribution
P-value estimate
1) Find χ² on the x-axis
2) Shade the area to the right
3) Estimate the p-value based on the % of area under the curve
4) Write conclusion (see underneath)
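Steps 1–3 can be done exactly (rather than by eye) with `scipy.stats.chi2`; the χ² value and df below are made-up illustrations:

```python
from scipy.stats import chi2

chi_sq = 16.67  # illustrative chi-square statistic
df = 1          # (rows - 1) * (columns - 1), e.g. for a 2x2 table

# "Shade the area to the right" is the survival function (right-tail area)
p_value = chi2.sf(chi_sq, df)
```

A χ² this large with df = 1 gives a tiny p-value, so the conclusion would report very strong evidence against H0.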
Two-part conclusions (expected in ANY multiple-comparison test)
1) Is there statistically significant evidence of a difference in proportions? (See p-value)
Ex.: We have weak evidence in support of an association between degree held and views about astrology.
2) If so, where is the difference observed? (Largest contributions to chi-square - compare expected and observed counts)
Ex.: The largest contributions to χ² come from the (Junior College, scientific) cell, where we saw more than expected, and the (Graduate, scientific) cell, where we saw fewer than expected
Statkey
1) χ² test for association
2) Edit data but leave (blank)
3) Separate the data values with commas
4) χ² is shown above the table
5) "Show details" is needed for the conclusion
6) See if conditions are met (all expected counts are at least 5)
7) In a new tab, go to the theoretical χ² distribution
8) Fill out df
9) Pick right-tail and enter χ²
10) Write down the p-value
11) Write conclusion --> if there is no evidence of an association, we leave out part 2
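The same workflow can be reproduced in one call with `scipy.stats.chi2_contingency`; the table is hypothetical, and `correction=False` makes the result match the plain χ² formula:

```python
from scipy.stats import chi2_contingency

# Hypothetical two-way table of observed counts
observed = [[30, 10],
            [20, 40]]

# Returns the statistic, p-value, df, and the table of expected counts
stat, p_value, df, expected = chi2_contingency(observed, correction=False)

# Condition check: every expected count should be at least 5
conditions_met = (expected >= 5).all()
```

This replaces Statkey steps 4–10: the statistic, expected counts (for the condition check), df, and right-tail p-value all come from the one function.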
Handedness and profession
H0: No association between explanatory and response variable
Ha: Some association
A closer look at testing
Type I and II errors
Type I error
We reject H0 even if it was true
What would an error of this type mean in context?
Find statistically significant evidence that (Ha is true), even though (H0 is true).
Type II error
We fail to reject H0 even if it was false
What would an error of this type mean in context?
Do not get statistically significant evidence of (Ha), even though (Ha is true).
Choosing the significance level
H0
Innocent (not guilty)
Do not reject H0
Type II error
Avoid Type II error by having a large α (significance level)
Ha
Guilty
Reject H0
Type I error
There is an α chance of (Type I error in context)
Avoid Type I error by having a small α (significance level)
P-Value
Smaller n
Larger standard error
more spread out distribution
Larger n
distribution is more concentrated at center
Smaller SE
Calculating p-value
1) Count the number of randomization statistics at least as extreme as the sample statistic
2) Divide by the total number of randomization statistics
3) Double the value if it's a two-sided test
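A minimal sketch of those three steps, using a made-up list of randomization statistics and a made-up sample statistic:

```python
# Illustrative statistics from a randomization distribution
randomization_stats = [0.1, -0.2, 0.5, 0.45, -0.5, 0.05, 0.3, -0.4, 0.6, -0.1]
sample_stat = 0.45  # illustrative observed sample statistic (upper tail)

# 1) Count values at least as extreme as the sample statistic
count = sum(1 for s in randomization_stats if s >= sample_stat)

# 2) Divide by the number of randomization statistics
p_one_sided = count / len(randomization_stats)

# 3) Double the value for a two-sided test (capped at 1)
p_two_sided = min(1.0, 2 * p_one_sided)
```

With a real randomization distribution there would be thousands of statistics, so the counting step is the same but the estimate is far more precise.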
If Ha is true, it is much easier to detect it with a large sample
Practical vs. statistical significance
Statistical significance
P-value shows us that we have very strong evidence of an association
We have very strong evidence that there is some effect but not how big the effect is
Practical significance
Very strong evidence of an association ≠ evidence of a very strong association (association could still be weak)
Even if the association is causal, the chance of developing schizophrenia is still quite low for those with and without cats.
An increase in the likelihood of developing schizophrenia may still outweigh the benefits of having a cat.
Analysis of Variance: ANOVA to compare means
Comparing several means
The idea of ANOVA
If each group's sample mean falls within the spread of the other groups' distributions, we have only weak or no evidence of a difference in means
When the sample means fall outside the other groups' distributions --> strong evidence of a difference in means
For the same sample means, less variability (spread) within each group gives more evidence of a difference in means
How far apart are the sample means from one another?
As this value gets larger, we have more evidence that the means are different
"Average" variability in each sample
As this variability gets smaller, the F-statistic gets bigger
The ANOVA F Statistic
Degrees of freedom
Degrees of freedom for groups (numerator)
k - 1
Degrees of freedom for error (denominator)
n - k
k = number of groups
n = total pooled sample size for all groups
By adding together both degrees of freedom we should get n-1
Both DF needed to establish F-curve
F (DF groups, DF error)
==> Find F-statistic on F-curve
==> Establish p-value based on that
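The df bookkeeping and the F-curve lookup can be sketched with `scipy.stats.f`; the values of k, n, and the F-statistic are illustrative:

```python
from scipy.stats import f

k = 3   # number of groups (illustrative)
n = 30  # total pooled sample size for all groups (illustrative)

df_groups = k - 1  # degrees of freedom for groups (numerator)
df_error = n - k   # degrees of freedom for error (denominator)
# Sanity check: the two df values add up to n - 1
assert df_groups + df_error == n - 1

F_stat = 4.2  # illustrative F-statistic
# p-value = right-tail area under the F(df_groups, df_error) curve
p_value = f.sf(F_stat, df_groups, df_error)
```

Here the p-value lands between 1% and 5%, so at the usual 5% level we would have moderately strong evidence against H0.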
F - equation
Variability BETWEEN groups
Some kind of difference between all means
Variability WITHIN groups
Some kind of average of all standard deviations
Conclusion
Part 1
Is there statistically significant evidence of a difference in means? (P-value)
Ex.: We have overwhelming evidence that Ha is true (for small p-value)
Part 2
If so, where is the difference observed? (Sample statistic of comparative graph)
Ex.: The good weather message brings in higher tips on average. The bad weather message may decrease tips.
Example
1) State hypothesis
H0: μ1=μ2=μ3=μ4
Ha: μi ≠ μj for some pair i, j
2) Check conditions for using an F-distribution
Statkey: Descriptive Statistics --> One quantitative and one categorical variable --> upload file
Normal distribution
Extreme outliers are cause for concern, even with n=30 in each group
Equal variance
Is the largest standard deviation more than double the smallest? If so, equal variance condition is not met
3) Use Statkey to find F-statistic
Statkey: ANOVA for difference in means --> upload file --> read the F-statistic
4) Calculate degrees of freedom
number of groups - 1
total pooled sample size of all groups - number of groups
5) Generate the F-curve
Statkey: Theoretical distributions --> F --> type in both DF values
6) Estimate p-value
Find the F-statistic on the curve
Shade the area to its right
Does it look like more than 10% / 5% / 2%...?
7) State parts 1 and 2 of your conclusion
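Steps 1–7 can be checked end to end with `scipy.stats.f_oneway`; the tip percentages below are invented to mimic the weather-message example:

```python
import statistics
from scipy.stats import f_oneway

# Hypothetical tip percentages under three message conditions
good_weather = [22, 25, 24, 27, 23]
bad_weather = [18, 17, 20, 16, 19]
no_message = [20, 21, 19, 22, 20]

# Equal-variance condition: largest SD at most twice the smallest
sds = [statistics.stdev(g) for g in (good_weather, bad_weather, no_message)]
equal_var_ok = max(sds) <= 2 * min(sds)

# F-statistic and p-value in one call (df are handled internally)
F_stat, p_value = f_oneway(good_weather, bad_weather, no_message)
```

With this invented data the p-value is tiny, so part 1 reports very strong evidence of a difference in means, and part 2 points at the good-weather group, whose sample mean is clearly the highest.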
F distributions and degrees of freedom
Normality condition
If each sample has n>30, assume use of F-distribution is ok. If not, watch out for signs of clear skewness or extreme outliers
Equal variance
Check that the largest standard deviation is no more than twice as large as the smallest standard deviation
If either of these conditions is not met, it might still be possible to do an ANOVA using a randomization distribution, but not using an F-distribution
Confidence and prediction Intervals
Confidence Interval (CI)
Generalizing to a population
Confidence about the mean
Prediction Interval (PI)
Larger Margin of Error (Wider CI)
Confidence about an individual
Single Point
Teal: CI for mean
Red: PI for individual
Black: predicted value
Interpretation
CI
With 95% confidence, the mean number of Facebook friends for all adults in the UK who are on FB with a grey matter density z-score of 1.0 is between 368 and 530 friends.
PI
With 95% confidence, an individual on FB in the UK with a grey matter density z-score of 1.0 would be expected to have between 103 and 794 friends.
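The CI-vs-PI contrast can be sketched for simple linear regression; the (z-score, friends) data below are invented, and the formulas are the standard ones for a mean response versus a new individual:

```python
import math
from scipy.stats import t, linregress

# Invented data: grey matter z-score (x) vs number of friends (y)
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
y = [150, 250, 300, 420, 450, 560]

n = len(x)
fit = linregress(x, y)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
resid = [yi - (fit.intercept + fit.slope * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))  # residual std error

x_new = 1.0  # predict at z-score 1.0
y_hat = fit.intercept + fit.slope * x_new
t_star = t.ppf(0.975, n - 2)  # 95% critical value

# SE for the mean response (CI) vs a new individual (PI):
# the PI adds a full "+1" under the square root, so it is always wider
se_mean = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)  # CI for the mean
pi = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)  # PI for an individual
```

The extra "+1" term is the variability of a single individual around the line, which is why the PI (red in the note above) always contains the CI (teal).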