Hypothesis Testing, Chapter Summary, Example Analogy - Coggle Diagram
Hypothesis Testing
Steps
- State the hypotheses
The first and most important of the two hypotheses is the null hypothesis, H_0. The null hypothesis states that the treatment has no effect. In general, the null hypothesis states that there is no change, no effect, no difference: nothing happened.
The second hypothesis is the scientific, or alternative, hypothesis, H_1. The alternative hypothesis states that there is a change, a difference, or a relationship for the general population. In the context of an experiment, H_1 predicts that the independent variable (treatment) does have an effect on the dependent variable.
The null hypothesis and the alternative hypothesis are mutually exclusive and exhaustive. They cannot both be true. The data will determine whether to reject or fail to reject the null hypothesis.
- Set the criteria for a decision
The data will either be consistent with the null hypothesis or tend to refute the null hypothesis. In particular, if there is a big discrepancy between the data and the hypothesis, we will conclude that the hypothesis is wrong.
To formalize the decision process, we use the null hypothesis to predict the kind of sample mean that ought to be obtained. Specifically, we determine exactly which sample means are consistent with the null hypothesis and which sample means are at odds with the null hypothesis.
Alpha Level
The alpha level, or the level of significance, is a probability value that is used to define the concept of “very unlikely” in a hypothesis test.
Critical Region
The critical region is composed of the extreme sample values that are very unlikely (as defined by the alpha level) to be obtained if the null hypothesis is true. The boundaries for the critical region are determined by the alpha level. If sample data fall in the critical region, the null hypothesis is rejected.
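As a rough illustration of how the alpha level fixes the boundaries of the critical region, the sketch below converts alpha into z-score boundaries. It assumes a two-tailed test on a standard normal distribution of z-scores, and the helper name `critical_z` is hypothetical, chosen only for this example.

```python
from statistics import NormalDist

def critical_z(alpha: float) -> float:
    """Return the positive boundary of a two-tailed critical region.

    Assumption: alpha is split evenly between the two tails of the
    standard normal distribution, so the boundary cuts off alpha / 2
    in the upper tail (and its negative cuts off alpha / 2 in the lower).
    """
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    z = critical_z(alpha)
    print(f"alpha = {alpha}: reject H_0 if z < {-z:.2f} or z > {z:.2f}")
```

A smaller alpha pushes the boundaries farther into the tails, so more extreme data are needed before the null hypothesis is rejected.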
- Collect data and compute sample statistics
The data are collected after the researcher has stated the hypotheses and established the criteria for a decision. This sequence of events helps ensure that a researcher makes an honest, objective evaluation of the data and does not tamper with the decision criteria after the experimental outcome is known.
Next, the raw data from the sample are summarized with the appropriate statistics. Now it is possible for the researcher to compare the sample mean (from the data) with the null hypothesis. This is the heart of the hypothesis test: comparing the data with the hypothesis.
The comparison is accomplished by computing a z-score that describes exactly where the sample mean is located relative to the hypothesized population mean from H_0.
The top of the z-score formula measures how much difference there is between the data and the hypothesis. The bottom of the formula measures the amount of error one should expect between the sample mean and the population mean.
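The computation can be sketched in a few lines. The numbers are hypothetical and chosen only for illustration, and the sketch assumes the population standard deviation is known, so the expected error (the standard error) is sigma divided by the square root of the sample size.

```python
import math

def z_statistic(sample_mean, mu_h0, sigma, n):
    """Locate a sample mean relative to the mean hypothesized by H_0.

    Numerator: difference between the data and the hypothesis.
    Denominator: standard error, the expected distance between a
    sample mean and the population mean for samples of size n.
    """
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu_h0) / standard_error

# Hypothetical study: H_0 says mu = 80, sigma = 20, and a treated
# sample of n = 25 individuals produces a mean of 89.
z = z_statistic(sample_mean=89, mu_h0=80, sigma=20, n=25)
print(z)  # standard error is 20 / 5 = 4, so z = 9 / 4 = 2.25
```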
- Make a decision
Two outcomes
The sample data are located in the critical region. By definition, a sample value in the critical region is very unlikely to occur if the null hypothesis is true. Therefore, we conclude that the sample is not consistent with H_0 and our decision is to reject the null hypothesis. Remember, the null hypothesis states that there is no treatment effect. By rejecting H_0 we are concluding there is evidence that the treatment had an effect.
The sample data are not in the critical region. In this case, the sample mean is reasonably close to the population mean specified in the null hypothesis (in the center of the distribution). Because the data do not provide strong evidence that the null hypothesis is wrong, our conclusion is to fail to reject the null hypothesis. This conclusion means that there is no evidence for a treatment effect.
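The two outcomes amount to a single comparison between the obtained z-score and the boundary of the critical region. The following is a minimal sketch, assuming a two-tailed test; the function name `decide` and the example z-scores are hypothetical.

```python
from statistics import NormalDist

def decide(z: float, alpha: float = 0.05) -> str:
    """Return the decision for a two-tailed z-test.

    The critical region is every z-score farther from zero than the
    boundary that leaves alpha / 2 in each tail.
    """
    boundary = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z) > boundary:
        return "reject H_0"
    return "fail to reject H_0"

print(decide(2.25))  # beyond the boundary for alpha = .05
print(decide(1.10))  # near the center of the distribution
```

Note that the second outcome is worded as "fail to reject" rather than "accept": the data are simply not strong enough to conclude that the null hypothesis is wrong.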
Z-Score Statistic
The z-score statistic that is used in the hypothesis test is the first specific example of what is called a test statistic. The term test statistic simply indicates that the sample data are converted into a single, specific statistic that is used to test hypotheses.
Formula as recipe
If you follow instructions and use all the right ingredients, the formula produces a z-score. In the hypothesis-testing situation, however, you do not have all the necessary ingredients. Specifically, you do not know the value for the population mean μ, which is one component or ingredient in the formula.
This situation is similar to trying to follow a cake recipe where one of the ingredients is not clearly listed. For example, the recipe may call for flour, but there is a grease stain on the page that makes it impossible to read how much flour. Faced with this situation, you might try the following steps:
- Make a hypothesis about the amount of flour. For example, hypothesize that the correct amount is 2 cups.
- To test your hypothesis, add the rest of the ingredients along with the hypothesized flour amount and bake the cake.
- If the cake turns out to be good, you can reasonably conclude that your hypothesis was correct. But if the cake is terrible, you conclude that your hypothesis was wrong.
In a hypothesis test with z-scores, we do essentially the same thing. We have a formula (recipe) for z-scores, but one ingredient is missing. Specifically, we do not know the value for the population mean, μ. Therefore, we try the following steps:
- Make a hypothesis about the value of μ. This is the null hypothesis.
- Plug the hypothesized value into the formula along with the other values (ingredients).
- If the formula produces a z-score near zero (which is where z-scores are supposed to be), we conclude that the hypothesis was correct. On the other hand, if the formula produces an extreme value (a very unlikely result), we conclude that the hypothesis was wrong.
Formula as ratio
In the context of a hypothesis test, the z-score formula has the following structure:
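- z = (M - μ) / σ_M = (sample mean - hypothesized population mean) / (standard error between M and μ)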
Measuring Effect Size
One concern with hypothesis testing is that a hypothesis test does not really evaluate the absolute size of a treatment effect. To correct this problem, it is recommended that whenever researchers report a statistically significant effect, they also report the effect size.
A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used.
One of the simplest and most direct methods for measuring effect size is Cohen’s d. Cohen (1988) recommended that effect size can be standardized by measuring the mean difference in terms of the standard deviation.
For the z-score hypothesis test, the mean difference is determined by the difference between the population mean before treatment and the population mean after treatment. However, the population mean after treatment is unknown. Therefore, we must use the mean for the treated sample in its place.
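A small sketch may help show why effect size is reported alongside the test. It computes Cohen's d as the mean difference divided by the standard deviation, using the treated sample mean in place of the unknown treated population mean, and compares it with the z-score for several sample sizes. All numbers and function names are hypothetical.

```python
import math

def cohens_d(treated_sample_mean, mu_before, sigma):
    """Mean difference expressed in standard-deviation units."""
    return (treated_sample_mean - mu_before) / sigma

def z_statistic(sample_mean, mu_h0, sigma, n):
    """Mean difference expressed in standard-error units."""
    return (sample_mean - mu_h0) / (sigma / math.sqrt(n))

# Hypothetical values: mean before treatment 80, sigma 20,
# treated sample mean 85, and three different sample sizes.
for n in (4, 25, 100):
    d = cohens_d(85, 80, 20)
    z = z_statistic(85, 80, 20, n)
    print(f"n = {n:3d}: d = {d:.2f}, z = {z:.2f}")
```

With the same 5-point mean difference, d stays at 0.25 for every sample size, while z grows from 0.50 to 1.25 to 2.50. The test statistic depends on the sample size; the effect-size measure does not.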
Errors
Type I
A Type I error occurs when a researcher rejects a null hypothesis that is actually true. In a typical research situation, a Type I error means the researcher concludes that there is evidence for a treatment effect when in fact the treatment has no effect.
The alpha level for a hypothesis test is the probability that the test will lead to a Type I error. That is, the alpha level determines the probability of obtaining sample data in the critical region even though the null hypothesis is true.
The probability of a Type I error is equal to the alpha level. By selecting a small alpha level, the researcher can minimize the probability of a Type I error.
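The link between alpha and the Type I error rate can be checked with a small simulation. This is only a sketch under stated assumptions: scores are drawn from a normal population in which the null hypothesis is true (no treatment effect), the test is two-tailed, and the population values, sample size, trial count, and seed are arbitrary choices made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, mu=80.0, sigma=20.0, n=25,
                      trials=4000, seed=1):
    """Estimate how often a two-tailed z-test rejects a true H_0."""
    rng = random.Random(seed)
    boundary = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # The sample comes from the untreated population, so H_0 is true
        # and every rejection counted below is a Type I error.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / standard_error
        if abs(z) > boundary:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"proportion of false rejections: {rate:.3f}")
```

The proportion of false rejections should land close to the chosen alpha level, and lowering `alpha` lowers it accordingly.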
Type II
A Type II error occurs when a researcher fails to reject a null hypothesis that is in fact false. In a typical research situation, a Type II error means that the hypothesis test has failed to detect a real treatment effect.
The consequences of a Type II error are usually not as serious as those of a Type I error. In general terms, a Type II error means that the research data do not show the results that the researcher had hoped to obtain. The researcher can accept this outcome and conclude that there is no evidence of a treatment effect, or the researcher can repeat the experiment (usually with some improvement, such as a larger sample) and try to demonstrate that the treatment really does work.
The probability of a Type II error is represented by the symbol β, the Greek letter beta.
Statistical Power
The power of a statistical test is the probability that the test will correctly reject a false null hypothesis. That is, power is the probability that the test will identify a treatment effect if one really exists.
Whenever a treatment has an effect, there are only two possible outcomes for a hypothesis test:
- The first outcome is failing to reject H_0 when there is a real effect, which was defined earlier as a Type II error.
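- The second outcome is rejecting H_0 when there is a real effect, which is the correct decision. Because these are the only two possible outcomes, their probabilities must add up to 1, so the power of the test is 1 - β.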
Calculating Power
Researchers typically calculate power as a means of determining whether an experiment is likely to be sensitive to detect a treatment effect when one exists.
Elements
Unknown Population
A population parameter (here, the population mean μ) is known or assumed for the population before treatment; the population after treatment is unknown. The purpose of the study is to determine whether the treatment has an effect on the population mean.
To simplify the hypothesis-testing situation, one basic assumption is made about the effect of the treatment: If the treatment has any effect, it is simply to add a constant amount to (or subtract a constant amount from) each individual’s score.
Sample
The research study involves selecting a sample from the original population, administering the treatment to the sample, and then recording scores for the individuals in the treated sample.
Chapter Summary
- Hypothesis testing is structured as a four-step process
State the hypotheses, and select an alpha level. The null hypothesis states that there is no effect or no change. In this case, H_0 states that the mean for the population after treatment is the same as the mean before treatment. The alpha level, usually α = .05 or α = .01, provides a definition of the term very unlikely and determines the risk of a Type I error. Also state an alternative hypothesis (H_1), which is the exact opposite of the null hypothesis.
Locate the critical region. The critical region is defined as sample outcomes that would be very unlikely to occur if the null hypothesis is true. The alpha level defines “very unlikely.”
Collect the data and compute the test statistic. The sample mean is transformed into a z-score by the formula z = (M - μ) / σ_M, where σ_M is the standard error of the sample mean.
The value of μ is obtained from the null hypothesis. The z-score test statistic identifies the location of the sample mean in the distribution of sample means. Expressed in words, the z-score formula is
- z = (sample mean - hypothesized population mean) / standard error
Make a decision. If the obtained z-score is in the critical region, reject H_0 because it is very unlikely that these data would be obtained if H_0 were true. In this case, conclude that the treatment has changed the population mean. If the z-score is not in the critical region, fail to reject H_0 because the data are not significantly different from the null hypothesis. In this case, the data do not provide sufficient evidence to indicate that the treatment has had an effect.
- As the size of the treatment effect increases, statistical power increases. Also, power is influenced by several factors that can be controlled by the experimenter:
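- Increasing the alpha level increases power.
- A one-tailed test has greater power than a two-tailed test, provided the effect is in the predicted direction.
- A large sample results in more power than a small sample.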
- Whatever decision is reached in a hypothesis test, there is always a risk of making the incorrect decision. There are two types of errors that can be committed:
A Type I error is defined as rejecting a true H_0. This is a serious error because it results in falsely reporting a treatment effect. The risk of a Type I error is determined by the alpha level and therefore is under the experimenter’s control.
A Type II error is defined as the failure to reject a false H_0. In this case, the experiment fails to detect an effect that actually occurred. The probability of a Type II error cannot be specified as a single value and depends in part on the size of the treatment effect. It is identified by the symbol β (beta).
- In addition to using a hypothesis test to evaluate the significance of a treatment effect, it is recommended that you also measure and report the effect size. One measure of effect size is Cohen’s d, which is a standardized measure of the mean difference. Cohen’s d is computed as
- Cohen's d = mean difference / standard deviation = (M_treatment - μ_no treatment) / σ
- Hypothesis testing is an inferential procedure that uses the data from a sample to draw a general conclusion about a population. The procedure begins with a hypothesis about an unknown population. Then a sample is selected, and the sample data provide evidence that either supports or refutes the hypothesis.
- In this chapter, we introduced hypothesis testing using the simple situation in which a sample mean is used to test a hypothesis about an unknown population mean. The goal for the test is to determine whether a treatment has an effect on the population mean.
- When a researcher expects that a treatment will change scores in a particular direction (increase or decrease), it is possible to do a directional, or one-tailed, test. The first step in this procedure is to incorporate the directional prediction into the hypotheses. To locate the critical region, you must determine what kind of data would refute the null hypothesis by demonstrating that the treatment worked as predicted. These outcomes will be located entirely in one tail of the distribution.
- The size of the sample influences the outcome of the hypothesis test, but has little or no effect on measures of effect size. As sample size increases, the likelihood of rejecting the null hypothesis also increases. The variability of the scores influences both the outcome of the hypothesis test and measures of effect size. Increased variability reduces the likelihood of rejecting the null hypothesis and reduces measures of effect size.
- The power of a hypothesis test is defined as the probability that the test will correctly reject a false null hypothesis.
- To determine the power for a hypothesis test, you must first identify the boundaries for the critical region. Then, you must specify the magnitude of the treatment effect, the size of the sample, and the alpha level. With these assumptions, the power of the hypothesis test is the probability of obtaining a sample mean in the critical region.
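The power calculation described in the last item can be sketched as follows. The sketch assumes a z-test with a known population standard deviation and a treatment that adds a constant to every score. The function name `z_test_power`, its parameters, and the example numbers are hypothetical; the one-tailed branch assumes the researcher predicted an increase.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05, tails=2):
    """Approximate power of a z-test for an assumed treatment effect.

    effect : assumed shift in the population mean (positive = increase)
    tails  : 2 for a two-tailed test, 1 for a one-tailed test that
             predicts an increase
    """
    std_normal = NormalDist()
    # Distance between the null and treated distributions of sample
    # means, measured in standard errors.
    shift = effect / (sigma / math.sqrt(n))
    if tails == 2:
        boundary = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(boundary - shift)
        lower = std_normal.cdf(-boundary - shift)
        return upper + lower
    if tails == 1:
        boundary = std_normal.inv_cdf(1 - alpha)
        return 1 - std_normal.cdf(boundary - shift)
    raise ValueError("tails must be 1 or 2")

baseline = z_test_power(effect=8, sigma=20, n=25)
print(f"two-tailed, n = 25, alpha = .05: {baseline:.3f}")
print(f"larger sample (n = 100):         {z_test_power(8, 20, 100):.3f}")
print(f"smaller alpha (alpha = .01):     {z_test_power(8, 20, 25, alpha=0.01):.3f}")
print(f"one-tailed test:                 {z_test_power(8, 20, 25, tails=1):.3f}")
```

Each result is the probability that a sample mean from the treated population falls in the critical region. Comparing the four lines shows power rising with sample size, falling with a stricter alpha, and rising when a correctly predicted one-tailed test is used. When `effect` is zero the function returns alpha, which is the Type I error rate.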
Example Analogy
The test begins with a null hypothesis stating that there is no treatment effect. The trial begins with a null hypothesis that the accused did not commit the crime (that is, innocent until proven guilty).
The research study gathers evidence to test whether the treatment actually does have an effect, and the police gather evidence to test whether the accused really committed a crime.
If there is enough evidence, the researcher rejects the null hypothesis and concludes that there is evidence for a treatment effect. If there is enough evidence, the jury rejects the hypothesis and concludes that the defendant is guilty of a crime.
If there is not enough evidence, the researcher fails to reject the null hypothesis. Note that the researcher does not conclude that there is no treatment effect, simply that there is not enough evidence to conclude that there is an effect. Similarly, if there is not enough evidence, the jury fails to find the defendant guilty. Note that the jury does not conclude that the defendant is innocent, simply that there is not enough evidence for a guilty verdict.