Chp. 12 - Intro to Analysis of Variance
An Overview of Analysis of Variance
Terminology
In analysis of variance, the variable (independent or quasi-independent) that designates the groups being compared is called a factor.
When a researcher uses a nonmanipulated variable to designate groups, the variable is called a quasi-independent variable.
The individual groups or treatment conditions that are used to make up a factor are called the levels of the factor.
A study that combines two factors is called a two-factor design or a factorial design.
Single-factor designs: studies that have only one independent variable (or only one quasi-independent variable).
Type I Errors & Multiple-Hypothesis Tests
Each time you do a hypothesis test, you select an alpha level that determines the risk of a Type I error.
Often, a single experiment requires several hypothesis tests to evaluate all the mean differences. However, each test has a risk of a Type I error, and the more tests you do, the more risk there is.
The testwise alpha level is the risk of a Type I error, or alpha level, for an individual hypothesis test.
When an experiment involves several different hypothesis tests, the experimentwise alpha level is the total probability of a Type I error that is accumulated from all of the individual tests in the experiment. Typically, the experimentwise alpha level is substantially greater than the value of alpha used for any one of the individual tests.
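To see how quickly the risk grows, here is a minimal sketch that assumes the individual tests are independent. That assumption rarely holds exactly (pairwise comparisons share data), so treat the numbers as an approximation. With c tests each run at testwise alpha α, the chance of at least one Type I error is 1 − (1 − α)^c. The function name `experimentwise_alpha` is illustrative, not from the notes.

```python
def experimentwise_alpha(testwise_alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across num_tests tests.

    Assumes the tests are independent (an approximation for real
    pairwise comparisons, which are correlated).
    """
    return 1.0 - (1.0 - testwise_alpha) ** num_tests


for c in (1, 3, 10):
    risk = experimentwise_alpha(0.05, c)
    print(f"{c:2d} tests at alpha = .05 -> experimentwise alpha ~ {risk:.3f}")
# 1 -> 0.050, 3 -> 0.143, 10 -> 0.401
```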
The advantage of ANOVA is that it evaluates all of the mean differences (for example, all three pairwise comparisons among three treatments) simultaneously in one hypothesis test. Thus, no matter how many different means are being compared, ANOVA uses one test with one alpha level to evaluate the mean differences and thereby avoids the problem of an inflated experimentwise alpha level.
The Test Statistic for ANOVA
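A t test compares exactly two sample means, so the difference between them is simply one mean minus the other. For ANOVA, however, we want to compare differences among two or more sample means. With more than two samples, the concept of “difference between sample means” becomes difficult to define or measure.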
The solution to this problem is to use variance to define and measure the size of the differences among the sample means.
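A small illustration of the idea, using made-up sample means (not from the notes): the variance of a set of means is a single number that is small when the means are close together and large when they are spread apart, however many means there are. This sketch uses Python's standard `statistics.variance` (sample variance, dividing by n − 1).

```python
from statistics import variance

# Hypothetical sample means for three treatments (illustration only).
close_means = [2.0, 2.5, 1.5]   # means that are similar to each other
spread_means = [1.0, 4.0, 1.0]  # means that differ a lot

# One number summarizes how far apart all of the means are.
print(variance(close_means))   # 0.25
print(variance(spread_means))  # 3.0
```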
Analysis of variance (ANOVA): a hypothesis-testing procedure that is used to evaluate mean differences between two or more treatments (or populations).
The Logic of Analysis of Variance
Between-Treatments Variance
When the sample means differ from one treatment to another, there are two possible explanations for these between-treatment differences:
The differences between treatments are not caused by any treatment effect but are simply the naturally occurring, random and unsystematic differences that exist between one sample and another. That is, the differences are the result of sampling error.
The differences between treatments have been caused by the treatment effects. For example, if treatments really do affect performance, then scores in one treatment should be systematically different from scores in another condition.
To demonstrate that there really is a treatment effect, we must establish that the differences between treatments are bigger than would be expected by sampling error alone.
To accomplish this goal, we determine how big the differences are when there is no systematic treatment effect; that is, we measure how much difference (or variance) can be explained by random and unsystematic factors. To measure these differences, we compute the variance within treatments.
Within-Treatments Variance
Inside each treatment condition, we have a set of individuals who all receive exactly the same treatment; that is, the researcher does not do anything that would cause these individuals to have different scores.
Why are the scores different? The answer is that there is no specific cause for the differences. Instead, the differences that exist within a treatment represent random and unsystematic differences that occur when there are no treatment effects causing the scores to be different.
The F-Ratio: The Test Statistic for ANOVA
Once we have analyzed the total variability into two basic components (between treatments and within treatments), we simply compare them. The comparison is made by computing an F-ratio.
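When there are no systematic treatment effects, the differences between treatments (numerator) are entirely caused by random, unsystematic factors, so the numerator and denominator measure the same kind of variability and the F-ratio should be close to 1.00.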
When the treatment does have an effect, causing systematic differences between samples, then the combination of systematic and random differences in the numerator should be larger than the random differences alone in the denominator.
Random, unsystematic variability can also come from participants being unintentionally treated in slightly different ways, which adds variability both between and within treatment groups.
Another possible example of random and unsystematic variability is error of measurement.
Because the denominator of the F-ratio measures only random and unsystematic variability, it is called the error term.
The error term provides a measure of the variance caused by random and unsystematic differences.
Remember that calculating variance is simply a method for measuring how big the differences are for a set of numbers.
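The logic of the F-ratio described in this section can be summarized as a conceptual formula (a restatement of the reasoning, not a computational recipe):

```latex
F = \frac{\text{variance between treatments}}{\text{variance within treatments}}
  = \frac{\text{systematic treatment effects} + \text{random, unsystematic differences}}
         {\text{random, unsystematic differences}}
```

If the treatment effect is zero, the numerator and denominator estimate the same random variability, so F is expected to be near 1.00; a real treatment effect pushes F above 1.00.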
ANOVA Notation & Formulas
Analysis of Sum of Squares (SS)
The ANOVA requires that we first compute a total sum of squares and then partition this value into two components: between treatments and within treatments.
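Total Sum of Squares, $SS_{\text{total}}$. As the name implies, $SS_{\text{total}}$ is the sum of squares for the entire set of N scores.
Within-Treatments Sum of Squares, $SS_{\text{within treatments}}$. Now we are looking at the variability inside each of the treatment conditions; $SS_{\text{within treatments}}$ is the sum of the SS values computed inside each treatment.
Between-Treatments Sum of Squares, $SS_{\text{between treatments}}$. This component measures the differences between the treatment means. Because the two components partition the total, $SS_{\text{total}} = SS_{\text{between treatments}} + SS_{\text{within treatments}}$.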
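A minimal sketch of the partition, using hypothetical scores for three treatments (made up for illustration). It computes each SS from its definition (sum of squared deviations) and checks that the between and within components add up to the total. Here SS between is computed as the sum of n·(M − grand mean)², one standard definitional form.

```python
# Hypothetical scores for three treatments (illustration only, not from the source).
treatments = [
    [0, 1, 3, 1, 0],
    [4, 3, 6, 3, 4],
    [1, 2, 2, 0, 0],
]

def mean(xs):
    return sum(xs) / len(xs)

def sum_of_squares(xs):
    """SS: sum of squared deviations from the mean of xs."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

all_scores = [x for group in treatments for x in group]
grand_mean = mean(all_scores)

ss_total = sum_of_squares(all_scores)                   # all N scores together
ss_within = sum(sum_of_squares(g) for g in treatments)  # variability inside each treatment
# Spread of the treatment means around the grand mean, weighted by n.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in treatments)

print(ss_total, ss_between, ss_within)  # 46.0 30.0 16.0
assert abs(ss_total - (ss_between + ss_within)) < 1e-9
```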
The Analysis of Degrees of Freedom (df)
The analysis of degrees of freedom (df) follows the same pattern as the analysis of SS. First, we find df for the total set of N scores, and then we partition this value into two components: degrees of freedom between treatments and degrees of freedom within treatments.
Each df value is associated with a specific SS value.
Normally, the value of df is obtained by counting the number of items that were used to calculate SS and then subtracting 1. For example, if you compute SS for a set of n scores, then $df = n - 1$.
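Applying the "count minus one" rule to each SS value gives the usual single-factor formulas, written here with the k, n, and N notation used in these notes:

```latex
df_{\text{total}} = N - 1, \qquad
df_{\text{within}} = \sum (n - 1) = N - k, \qquad
df_{\text{between}} = k - 1,
\qquad df_{\text{total}} = df_{\text{between}} + df_{\text{within}}
```

For instance, with k = 3 treatments of n = 5 scores each (N = 15), this gives 14 = 2 + 12.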
Calculation of Variances (MS) and the F-Ratio
The next step in the ANOVA procedure is to compute the variance between treatments and the variance within treatments, which are used to calculate the F-ratio.
In ANOVA, it is customary to use the term mean square, or simply MS, in place of the term variance.
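A short sketch of this step, assuming a single-factor independent-measures design: each mean square is its SS divided by its df, and F is the ratio of the two. The function name `f_ratio` and the input values (SS between = 30, SS within = 16, k = 3, N = 15) are hypothetical.

```python
def f_ratio(ss_between: float, ss_within: float, k: int, n_total: int) -> tuple[float, float, float]:
    """Return (MS_between, MS_within, F) for a single-factor independent-measures ANOVA."""
    df_between = k - 1                     # number of treatments minus 1
    df_within = n_total - k                # sum of (n - 1) over treatments
    ms_between = ss_between / df_between   # MS = SS / df
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical values: SS_between = 30, SS_within = 16, k = 3 treatments, N = 15 scores.
ms_b, ms_w, f = f_ratio(30.0, 16.0, k=3, n_total=15)
print(round(ms_b, 3), round(ms_w, 3), round(f, 3))  # 15.0 1.333 11.25
```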
The letter k is used to identify the number of treatment conditions—that is, the number of levels of the factor.
The number of scores in each treatment is identified by a lowercase letter n.
The total number of scores in the entire study is specified by a capital letter N.
The sum of the scores for each treatment condition is identified by the capital letter T (for treatment total).
The sum of all the scores in the research study (the grand total) is identified by G.
Although there is no new notation involved, we also have computed SS and M for each sample, and we have calculated $\Sigma X^2$ for the entire set of scores in the study.
Please note that there is no universally accepted notation for ANOVA. Although we are using Gs and Ts, for example, you may find that other sources use other symbols.
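Using this notation, the sums of squares can be computed from totals rather than from deviations. The sketch below uses hypothetical data and the common computational forms $SS_{\text{total}} = \Sigma X^2 - G^2/N$ and $SS_{\text{between}} = \Sigma (T^2/n) - G^2/N$, with $SS_{\text{within}}$ obtained by adding the SS inside each treatment.

```python
# Hypothetical data (illustration only): k treatments with n scores each.
treatments = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]

k = len(treatments)                                 # number of treatment conditions
N = sum(len(g) for g in treatments)                 # total number of scores
T = [sum(g) for g in treatments]                    # treatment totals
G = sum(T)                                          # grand total
sum_x2 = sum(x * x for g in treatments for x in g)  # ΣX² over all scores

ss_total = sum_x2 - G ** 2 / N
ss_between = sum(t ** 2 / len(g) for t, g in zip(T, treatments)) - G ** 2 / N
# SS inside each treatment is ΣX² - T²/n; add them up for SS_within.
ss_within = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in treatments)

print(k, N, T, G, sum_x2)               # 3 15 [5, 20, 5] 30 106
print(ss_total, ss_between, ss_within)  # 46.0 30.0 16.0
```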
Examples of Hypothesis Testing & Effect Size with ANOVA
The Distribution of F-Ratios
If the null hypothesis is true, the F-ratio should be around 1.00. If the null hypothesis is false, the F-ratio should be much greater than 1.00. The problem now is to define precisely which values are “around 1.00” and which are “much greater than 1.00.”
To answer this question, we need to look at all the possible F values that can be obtained when the null hypothesis is true, that is, the distribution of F-ratios.
F values are never negative: the F-ratio is a ratio of two variances, and variance can never be negative.
When the null hypothesis is true, the distribution of F-ratios should pile up around 1.00.
With very large df values, nearly all the F-ratios are clustered very near to 1.00. With the smaller df values, the F distribution is more spread out.
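One way to see the distribution of F-ratios is to simulate it. This sketch assumes H0 is true by drawing all k = 3 groups of n = 5 scores from the same normal population (mean 50 and SD 10 are arbitrary choices), so df = (2, 12). No F value comes out negative, the average lands near the theoretical mean df_within / (df_within − 2) = 1.2, and only about 5% of values exceed 3.89, the α = .05 critical value for df = (2, 12) in standard F tables.

```python
import random

def one_way_f(groups):
    """F-ratio (MS_between / MS_within) for a list of groups."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand = sum(scores) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

rng = random.Random(12)  # fixed seed so the run is repeatable
# H0 true: every group comes from the same normal population.
f_values = [
    one_way_f([[rng.gauss(50, 10) for _ in range(5)] for _ in range(3)])
    for _ in range(20_000)
]

mean_f = sum(f_values) / len(f_values)
tail_share = sum(f > 3.89 for f in f_values) / len(f_values)
print(min(f_values) >= 0)  # True: F is never negative
print(round(mean_f, 2))    # near 1.2 for df = (2, 12)
print(round(tail_share, 3))  # roughly 0.05
```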
The F Distribution Table
An F-ratio that is much larger than 1.00 is an indication that $H_0$ is not true.
In the F distribution, we need to separate those values that are reasonably near 1.00 from the values that are significantly greater than 1.00.
To use the F distribution table, you must know the df values for the F-ratio (numerator and denominator), and you must know the alpha level for the hypothesis test.
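A minimal sketch of the table lookup and decision. The dictionary `F_CRITICAL` is a hypothetical one-row excerpt of an F table: the values 3.89 (α = .05) and 6.93 (α = .01) are the standard critical values for df = (2, 12), but check them against your own table before relying on them.

```python
# Hypothetical excerpt of an F table, keyed by (df_between, df_within).
F_CRITICAL = {
    (2, 12): {0.05: 3.89, 0.01: 6.93},
}

def reject_h0(f: float, df_between: int, df_within: int, alpha: float = 0.05) -> bool:
    """Reject H0 when the obtained F exceeds the tabled critical value."""
    critical = F_CRITICAL[(df_between, df_within)][alpha]
    return f > critical

print(reject_h0(11.25, 2, 12))              # True
print(reject_h0(11.25, 2, 12, alpha=0.01))  # True
print(reject_h0(2.50, 2, 12))               # False
```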
Measuring Effect Size for ANOVA
A significant mean difference simply indicates that the difference observed in the sample data is very unlikely to have occurred just by chance.
Thus, the term significant does not necessarily mean large; it simply means larger than expected by chance.
To provide an indication of how large the effect actually is, it is recommended that researchers report a measure of effect size in addition to the measure of significance.
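The notes do not name a specific measure here. One widely used choice for ANOVA is eta squared (η²), the proportion of the total variability accounted for by differences between treatments: η² = SS between / SS total. A sketch with hypothetical SS values:

```python
def eta_squared(ss_between: float, ss_total: float) -> float:
    """Proportion of variance accounted for by treatment differences (eta squared)."""
    return ss_between / ss_total

# Hypothetical SS values from a three-treatment study.
print(round(eta_squared(30.0, 46.0), 3))  # 0.652
```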
Unequal Sample Sizes
The ANOVA procedure is most accurate when used to examine experimental data with equal sample sizes.
In situations where the samples contain unequal numbers of participants, ANOVA still provides a valid test, especially when the samples are relatively large and the discrepancy between sample sizes is not extreme.
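With unequal sample sizes, each treatment is simply weighted by its own n. This sketch uses hypothetical groups of sizes 3, 4, and 5 and checks that the computational form Σ(T²/n) − G²/N and the definitional form Σ n(M − grand mean)² give the same SS between treatments.

```python
# Hypothetical unequal-size groups (n = 3, 4, 5); illustration only.
groups = [[2, 3, 4], [5, 6, 6, 7], [1, 2, 2, 3, 2]]

N = sum(len(g) for g in groups)
G = sum(sum(g) for g in groups)

# Computational form: each treatment total is divided by its own n.
ss_between_totals = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N

# Definitional form: each treatment mean is weighted by its own n.
grand_mean = G / N
ss_between_means = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(round(ss_between_totals, 4), round(ss_between_means, 4))  # 36.9167 36.9167
```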
Assumptions for the Independent-Measures ANOVA
The observations within each sample must be independent.
The populations from which the samples are selected must be normal.
The populations from which the samples are selected must have equal variances (homogeneity of variance).
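The notes do not say how to check homogeneity of variance. One quick check that textbooks commonly pair with this assumption is Hartley's F-max test: divide the largest sample variance by the smallest. Values near 1 are consistent with equal variances; large values suggest a problem. Its critical values come from a separate F-max table, not shown here. A sketch on hypothetical samples:

```python
from statistics import variance

# Hypothetical samples from three treatments (illustration only).
samples = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]

# Sample variance s^2 = SS / (n - 1) for each treatment.
variances = [float(variance(s)) for s in samples]
f_max = max(variances) / min(variances)

print(variances)  # [1.5, 1.5, 1.0]
print(f_max)      # 1.5
```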
Post Hoc Tests
The primary advantage of ANOVA (compared to t tests) is that it allows researchers to test for significant mean differences when there are more than two treatment conditions.
When you obtain a significant F-ratio (reject $H_0$), it simply indicates that somewhere among the entire set of mean differences there is at least one that is statistically significant. In other words, the overall F-ratio only tells you that a significant difference exists; it does not tell exactly which means are significantly different and which are not.
Post hoc tests (or posttests) are additional hypothesis tests that are done after an ANOVA to determine exactly which mean differences are significant and which are not.
With only two treatments, a significant F-ratio already tells you that those two means differ. However, with three or more treatments ($k \ge 3$), the problem is to determine exactly which means are significantly different.
Posttests & Type I Errors
A post hoc test enables you to go back through the data and compare the individual treatments two at a time. In statistical terms, this is called making pairwise comparisons.
As you do more and more separate tests, the risk of a Type I error accumulates; this accumulated risk is called the experimentwise alpha level.
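The number of pairwise comparisons grows quickly with the number of treatments: k treatments give k(k − 1)/2 pairs, and each pair is one more chance for a Type I error. A small sketch using Python's `itertools.combinations` and hypothetical treatment labels:

```python
from itertools import combinations

def pairwise_comparisons(treatments):
    """Every distinct pair of treatments, i.e. k(k - 1)/2 comparisons."""
    return list(combinations(treatments, 2))

# Hypothetical treatment labels.
print(pairwise_comparisons(["T1", "T2", "T3"]))  # [('T1', 'T2'), ('T1', 'T3'), ('T2', 'T3')]
for k in (3, 4, 5, 6):
    labels = [f"T{i}" for i in range(1, k + 1)]
    print(k, "treatments ->", len(pairwise_comparisons(labels)), "pairwise tests")
# 3 -> 3, 4 -> 6, 5 -> 10, 6 -> 15
```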
Tukey's Honestly Significant Difference (HSD) Test
Tukey’s HSD test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance.
This value, called the honestly significant difference, or HSD, is then used to compare any two treatment conditions.
If the mean difference exceeds Tukey’s HSD, you conclude that there is a significant difference between the treatments. Otherwise, you cannot conclude that the treatments are significantly different.
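A sketch of the HSD computation using the usual formula HSD = q·√(MS within / n), which assumes equal n in every treatment. The value q comes from a studentized range table; the 3.77 used here (k = 3, df within = 12, α = .05) is an assumed table value, and the means and MS within are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tukey_hsd(q: float, ms_within: float, n: int) -> float:
    """HSD = q * sqrt(MS_within / n); assumes equal n in every treatment."""
    return q * sqrt(ms_within / n)

# Hypothetical study: k = 3 treatments, n = 5 per treatment, MS_within = 16/12.
means = {"T1": 1.0, "T2": 4.0, "T3": 1.0}
q_value = 3.77  # assumed studentized range value for k = 3, df_within = 12, alpha = .05
hsd = tukey_hsd(q_value, ms_within=16 / 12, n=5)
print(round(hsd, 2))  # 1.95

# Compare every pair of means against the single HSD value.
results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = diff > hsd
    print(a, b, diff, "significant" if results[(a, b)] else "not significant")
```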
The Scheffé Test
The Scheffé test is one of the safest of all possible post hoc tests (smallest risk of a Type I error).
Although you are comparing only two treatments, the Scheffé test uses the value of k from the original experiment to compute df between treatments. Thus, df for the numerator of the F-ratio is $k - 1$.
The critical value for the Scheffé F-ratio is the same as was used to evaluate the F-ratio from the overall ANOVA. Thus, Scheffé requires that every posttest satisfy the same criterion that was used for the complete ANOVA.
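A sketch of one Scheffé comparison following the rules above: SS between is computed from just the two treatments being compared, but it is divided by k − 1 from the full experiment, and the error term is MS within from the overall ANOVA. The data, MS within (16/12), and the critical value 3.89 (df = 2, 12; α = .05) are hypothetical inputs for illustration.

```python
def scheffe_f(group_a, group_b, k: int, ms_within: float) -> float:
    """Scheffé F for one pair of treatments.

    SS_between uses only the two treatments being compared, df_between
    is k - 1 from the full experiment, and the error term is the
    overall MS_within.
    """
    t_a, t_b = sum(group_a), sum(group_b)
    n_a, n_b = len(group_a), len(group_b)
    g = t_a + t_b
    ss_between = t_a ** 2 / n_a + t_b ** 2 / n_b - g ** 2 / (n_a + n_b)
    ms_between = ss_between / (k - 1)
    return ms_between / ms_within

# Hypothetical data: k = 3 treatments, overall MS_within = 16/12, df = (2, 12).
t1, t2 = [0, 1, 3, 1, 0], [4, 3, 6, 3, 4]
f = scheffe_f(t1, t2, k=3, ms_within=16 / 12)
critical = 3.89  # same critical value as the overall ANOVA (assumed table value)
print(round(f, 2), f > critical)  # 8.44 True
```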
More about ANOVA
The Relationship Between ANOVA and t Tests
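When there are exactly two treatments, the independent-measures t test and the ANOVA always reach the same statistical decision, because $F = t^2$ (for a two-tailed t test at the same alpha level). It makes no difference which one you choose.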
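A quick numeric check of the F = t² relationship on two hypothetical samples. The t statistic uses the pooled variance, as in the independent-measures t test; the sign of t depends only on which mean is subtracted from which, so squaring removes it.

```python
from math import sqrt

def independent_t(a, b):
    """Independent-measures t statistic using pooled variance."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - m_a) ** 2 for x in a)
    ss_b = sum((x - m_b) ** 2 for x in b)
    pooled = (ss_a + ss_b) / (n_a + n_b - 2)
    return (m_a - m_b) / sqrt(pooled / n_a + pooled / n_b)

def one_way_f(groups):
    """Single-factor ANOVA F = MS_between / MS_within."""
    scores = [x for g in groups for x in g]
    k, n_total = len(groups), len(scores)
    grand = sum(scores) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data for two treatments.
a, b = [0, 1, 3, 1, 0], [4, 3, 6, 3, 4]
t = independent_t(a, b)
f = one_way_f([a, b])
print(round(t ** 2, 6), round(f, 6))  # 15.0 15.0
```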