Chapter 8. Hypothesis testing
Hypothesis testing. A statistical method that uses sample data to evaluate a hypothesis about a population
Step 1. State the hypothesis about a population. Ex. Students who take SAT prep courses score 30 points higher than those who do not.
Step 2. We expect the mean scores of students taking SAT prep courses to be 30 points higher than the mean scores of students not taking the prep courses.
Step 3. We obtain a random sample from the population of students, say 200. Calculate the z-score of the sample mean.
Step 4. We compare the random sample data with our hypothesis. If the sample mean is consistent with the expected population mean, then the hypothesis is reasonable. If the sample mean is not consistent, then the hypothesis is likely wrong. In practice, if the z-score falls in the critical region set by an alpha level of .05 or .01, the null hypothesis is rejected and we conclude that the test prep has an effect.
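The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population mean (500), the standard deviation (100), and the sample mean (530) are hypothetical numbers that do not come from these notes, and the helper names are made up for the example. Only the sample size of 200 is taken from Step 3.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z-score of a sample mean, given the population mean stated by the null hypothesis."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error


def reject_null(z, alpha=0.05):
    """Two-tailed decision: True if z falls in the critical region."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


# Hypothetical values, not taken from the notes.
mu = 500           # assumed mean score without the prep course
sigma = 100        # assumed population standard deviation
sample_mean = 530  # assumed mean of the sampled prep-course students
n = 200            # sample size from Step 3

z = z_for_sample_mean(sample_mean, mu, sigma, n)
print(round(z, 2), reject_null(z))  # prints 4.24 True
```

With these assumed numbers the z-score lands far inside the critical region, so the null hypothesis would be rejected at either .05 or .01.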
Hypothesis testing assumes random sampling. The sample must be representative of the population from which it is drawn.
Hypothesis testing assumes independent observations. It is important to use a random sample of unrelated individuals for a study. Sampling without replacement is one example where this principle is violated.
Hypothesis testing assumes the standard deviation of the population is unchanged by a treatment.
Hypothesis testing when using z-scores assumes a normal distribution.
The null hypothesis. A treatment or intervention has no effect.
A statistically significant result is one where the null hypothesis is rejected. Ex. A sample study with Z = 2.4 would likely be considered statistically significant, whereas a study with Z = .65 would not.
The scientific or alternative hypothesis. A treatment has an effect on the dependent variable. The alternative and null hypotheses must be mutually exclusive.
What impacts Z-scores?
Variability. Higher variability increases the standard error, which lowers the z-score and reduces the chances of detecting a treatment effect.
Number of scores in a sample. The higher the number of scores, the lower the standard error, which increases the Z-score.
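A short sketch of both influences, assuming a fixed 30-point mean difference. The standard deviations and sample sizes below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def z_score(mean_difference, sigma, n):
    """z = mean difference divided by the standard error (sigma / sqrt(n))."""
    return mean_difference / (sigma / sqrt(n))


# Hypothetical values: same 30-point difference throughout.
# More variability -> larger standard error -> smaller z.
for sigma in (50, 100, 200):
    print("sigma", sigma, "z", round(z_score(30, sigma, 25), 2))

# More scores -> smaller standard error -> larger z.
for n in (25, 100, 400):
    print("n", n, "z", round(z_score(30, 100, n), 2))
```

Doubling the standard deviation halves the z-score, while quadrupling the sample size doubles it.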
Alpha level. Also called the level of significance. Probability used to define "very unlikely" in a hypothesis test.
Critical region. Composed of extreme sample values that are very unlikely to be seen if the null hypothesis is true. If the sample data falls within the critical region, then the null hypothesis is rejected.
An alpha level of ⍺ = .01, representing 1%, means that in a distribution, the critical region appears in the .5% lower tail and .5% upper tail.
A typical alpha value is ⍺ = .05, representing 5%, which corresponds to critical z-scores of +/- 1.96 in a two-tailed test.
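The critical z-scores for a two-tailed test can be recovered from the alpha level by splitting alpha across both tails of the standard normal distribution. A minimal sketch using Python's standard library; the function name is made up for this example.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """Positive boundary of the critical region; the negative boundary is its mirror image."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01):
    print(alpha, round(two_tailed_critical_z(alpha), 2))
# prints 0.05 1.96, then 0.01 2.58
```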
Type 1 error. The sample data is misleading. It suggests the treatment has an effect (null hypothesis is false) when in fact it does not (null hypothesis is true).
The alpha level represents the probability of a Type 1 error. So, when ⍺ = .05, there is a 5% probability of a Type 1 error.
By selecting a small alpha level, the likelihood of Type 1 error goes down.
By convention, researchers typically use an alpha level of .05, or 95% confidence, because as alpha decreases, more resources are needed to conduct experiments.
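The link between alpha and the Type 1 error rate can be checked with a small simulation. The sketch below draws many samples from a population where the null hypothesis is true (no treatment effect) and counts how often a two-tailed z-test rejects it anyway. The population mean, standard deviation, sample size, and number of trials are all hypothetical choices for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type1_error_rate(alpha=0.05, n=25, trials=5_000, seed=0):
    """Fraction of null-true samples that a two-tailed z-test wrongly rejects."""
    rng = random.Random(seed)
    mu, sigma = 500, 100  # hypothetical population; there is no treatment effect
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a Type 1 error: the null is true but was rejected
    return rejections / trials


print(type1_error_rate(alpha=0.05))  # should land close to 0.05
print(type1_error_rate(alpha=0.01))  # should land close to 0.01
```

The simulated rates vary a little from run to run if the seed is changed, but they should stay near the chosen alpha level.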
Type 2 error. The sample data is also misleading. The treatment actually has an effect, but the sample data does not show an effect. This often happens when the effect of a treatment is small.
The probability of Type 2 error is represented by β.
A one-tailed test. Statistical hypothesis about an increase or decrease to a mean. Ex. SAT prep courses increase scores by 30 points.
In a one-tailed test, where ⍺ = .05, the entire probability of .05 lies in one tail, so the critical z-score is +1.65 or -1.65, depending on the predicted direction.
A one-tailed test should be used when there is a strong expected directional effect. Ex. SAT prep courses improving test scores.
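A brief sketch of how the one-tailed cutoff differs from the two-tailed one. The function name and the example z-score of 1.80 are hypothetical, chosen to show a result that is significant one-tailed but not two-tailed.

```python
from statistics import NormalDist


def one_tailed_critical_z(alpha, direction="increase"):
    """Critical z with all of alpha in one tail; negative if a decrease is predicted."""
    z = NormalDist().inv_cdf(1 - alpha)
    return z if direction == "increase" else -z


upper = one_tailed_critical_z(0.05)               # about 1.645 (1.65 in most tables)
lower = one_tailed_critical_z(0.05, "decrease")   # about -1.645
two_tailed = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96

z = 1.80  # hypothetical sample z-score in the predicted direction
print(z > upper, abs(z) > two_tailed)  # prints True False
```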
Effect size. Provides a measurement of the magnitude of a treatment effect.
Cohen's d measures the effect in terms of standard deviation. Ex. A d of .2 is considered a small effect, whereas a d of .8 or greater is considered a large effect.
Cohen's d is the mean difference / standard deviation of the population
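A small sketch of the calculation, assuming a hypothetical population standard deviation of 100. It also shows one reason effect size is reported alongside the z-score: d stays the same as the sample grows, while z keeps increasing.

```python
from math import sqrt


def cohens_d(mean_difference, sigma):
    """Effect size: mean difference in units of the population standard deviation."""
    return mean_difference / sigma


def z_from_d(d, n):
    """z-score implied by effect size d and sample size n (z = d * sqrt(n))."""
    return d * sqrt(n)


sigma = 100  # hypothetical population standard deviation
d_30 = cohens_d(30, sigma)  # 30-point difference
d_60 = cohens_d(60, sigma)  # 60-point difference
print(d_30, d_60)  # prints 0.3 0.6

# Same effect size, different sample sizes: d is unchanged, z grows with n.
for n in (25, 200):
    print(n, d_30, round(z_from_d(d_30, n), 2))
```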
Power of a statistical test is the probability that the test will correctly reject a false null hypothesis, that is, detect a treatment effect that really exists. Researchers can calculate the power of a test before running an experiment.
A greater sample size increases the power of a test.
As the effect increases (ex. SAT scores increase 60 points vs. 30 points), the power increases.
Reducing the alpha level reduces the power of a test.
Changing from a two-tailed to one-tailed test increases the power of a test.
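The four influences on power listed above can be explored with a rough calculation for a z-test. This is a sketch under assumptions, not a general-purpose power tool: it assumes a known population standard deviation (a hypothetical 100), a true effect in the positive direction, and a one-tailed test that predicts an increase. The function name is made up for the example. The probability of a Type 2 error, β, is 1 minus the power.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(effect, sigma, n, alpha=0.05, tails=2):
    """Probability that a z-test rejects the null when the true mean shift is `effect`."""
    shift = effect / (sigma / sqrt(n))  # true effect in standard-error units
    if tails == 1:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(z_crit - shift)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)  # rejections in the upper tail
    lower = STD_NORMAL.cdf(-z_crit - shift)     # rejections in the lower tail
    return upper + lower


# Hypothetical baseline: 30-point effect, sigma = 100, n = 25, alpha = .05, two-tailed.
base = z_test_power(30, 100, 25)
print("baseline", round(base, 3))
print("larger sample", round(z_test_power(30, 100, 100), 3))
print("larger effect", round(z_test_power(60, 100, 25), 3))
print("smaller alpha", round(z_test_power(30, 100, 25, alpha=0.01), 3))
print("one-tailed", round(z_test_power(30, 100, 25, tails=1), 3))
print("beta at baseline", round(1 - base, 3))
```

Compared with the baseline, the larger sample, larger effect, and one-tailed versions should all report higher power, and the smaller alpha should report lower power.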