Ch. 3
Validity of Assessment Results

General Nature of Validity

Four Principles for Validation

  1. Use
  1. Values
  1. Interpretations
  1. Consequences

Validity of Teacher-Made Classroom Assessment Results

Validity of Large-Scale Assessments

Validity Issues When Accommodating Students With Disabilities

Conclusion

Appropriate Interpretations

Appropriate Uses

Appropriate Values

Appropriate Consequences

Content Representativeness and Relevance

  1. Does my assessment procedure emphasize what I have taught?
  1. Do my assessment tasks and scoring schemes accurately represent the outcomes specified in my school's and state's curriculum framework?
  1. Are my assessment tasks in line with the current thinking about what should be taught and how it should be assessed?
  1. Is the content in my assessment important and worth learning?

Thinking Processes and Skills Represented

  1. Does my assessment instrument represent the kinds of thinking skills that my school's curriculum framework and state's standards view as important?
  1. During the assessment, do students actually use the types of thinking I expect them to use?
  1. Do the tasks on my assessment instrument require students to use important thinking skills and processes?
  1. Do I allow enough time for students to demonstrate the type of thinking I am trying to assess?

Consistency With Other Classroom Assessments

  1. Is the pattern of results in the class consistent with what I expected based on my other assessments of them?
  1. Do I make the assessment tasks too difficult or too easy for my students?

Reliability and Objectivity

  1. Do I use a scoring guide for obtaining quality ratings or scores from students' performance on the assessment?
  1. Is my assessment instrument long enough to be a representative sample of the types of learning outcomes I am assessing?

Fairness to Different Types of Students

  1. Do I word the problems or tasks on my assessment so that students with different ethnic and socioeconomic backgrounds will interpret them in appropriate ways?
  1. Do I modify the wording or the administrative conditions of the assessment tasks to accommodate students with disabilities or special learning problems?
  1. Do the pictures, stories, verbal statements, or other aspects of my assessment procedure perpetuate racial, ethnic, or gender stereotypes?

Economy, Efficiency, Practicality, Instructional Features

  1. Is the assessment relatively easy for me to construct and not too cumbersome to use to evaluate students?
  1. Would the time needed to use this assessment be better spent directly teaching my students instead?
  1. Does my assessment represent the best use of my time?

Multiple Assessment Usage

  1. Do I use one assessment result in conjunction with other assessment results?

Positive Consequences for Learning

  1. Do my assessments give both my students and me information that helps students learn?
  1. Do my assessments avoid inappropriate negative consequences?

Construct an Argument for Validity, Supported by Evidence

  1. An interpretive argument
  1. The validity argument
  1. You can appropriately assess students' success in the calculus course (i.e., a suitable criterion assessment procedure is available).
  1. You can identify the algebra concepts and thinking skills that students will use frequently in the calculus course.
  1. The algebra content and thinking skills assessed by the placement test match those frequently used in the calculus course.
  1. The remedial course to which low-scoring students will be assigned will succeed in teaching students the algebra concepts and skills needed in the calculus course.
  1. Scores on the placement test are reliable (i.e., students' scores are consistent across different samples of test items, different testing occasions, and different persons scoring the test).
  1. It is not helpful for students with high ability in algebra to take the remedial algebra course (i.e., students who score high on the placement test will not significantly improve their chances of success in calculus by first taking this particular remedial algebra course).
  1. The placement test scores are not affected by systematic errors that would lower the validity of your interpretation that the placement test measures algebra knowledge and thinking skills.

Content Representativeness and Relevance: Content Evidence

  1. Content
  1. Depth
  1. Emphasis
  1. Performances
  1. Implied applicability

Thinking Skills and Processes: Substantive Evidence

Relationships Among Parts of Assessment: Internal Structure Evidence

Relationships of Results to Other Variables: External Structure Evidence

Correlation Coefficient

Students' Scores on Different Tests

Comparing Students' Rank Orders

Scatter Diagrams

Pearson Product-Moment Correlation Coefficients
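
As a minimal sketch of this coefficient, the function below computes the Pearson product-moment correlation between two lists of scores from the usual deviation-score formula. The function name `pearson_r` and the score lists are hypothetical and only illustrate the calculation; they are not taken from the chapter.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each set of scores.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")

    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five students on two tests (illustrative only).
test_a = [10, 20, 30, 40, 50]
test_b = [12, 24, 33, 46, 55]
print(round(pearson_r(test_a, test_b), 3))
```

A value near +1 indicates that students who rank high on one test also tend to rank high on the other, a value near -1 indicates the reverse, and a value near 0 indicates little linear relationship.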

Degrees of Relationships

Correlation and Causation

Correlation Coefficients and Sample Sizes

Factors That Raise or Lower Correlation Coefficients

Validity Coefficients

Expectancy Tables

The Criterion

Judging the Worth of Criteria

Low Criterion Reliability Limits Validity
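
One common way to express this limit is the classical test theory bound on a validity coefficient. The symbols here are assumptions introduced for illustration: $r_{xy}$ is the validity coefficient (the correlation between assessment scores and criterion scores), $r_{xx}$ is the reliability of the assessment scores, and $r_{yy}$ is the reliability of the criterion scores.

```latex
r_{xy} \le \sqrt{r_{xx}\, r_{yy}}
```

For example, with hypothetical reliabilities $r_{xx} = 0.81$ and $r_{yy} = 0.64$, the validity coefficient can be at most $\sqrt{0.81 \times 0.64} = 0.72$, however well the assessment matches the criterion in content.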

Systematic Errors

Practical Considerations

Reliability Over Time, Assessors, and Content Domains: Reliability Evidence

Generalization of Interpretation Over People, Conditions, or Special Instructions and Interventions: Generalization Evidence

Intended and Unintended Consequences: Consequential Evidence

Cost, Efficiency, Practicality, and Instructional Features: Practicality Evidence

Validity of Scores From Test Accommodations

How Should Accommodated Norm-Referenced Scores Be Reported?

How Should Accommodated Criterion-Referenced Scores Be Reported?

Measurement Perspective on Accommodations and Modifications

  1. Will changes in format or testing conditions change the skill being measured?
  1. Will the scores of examinees tested under standard conditions have a different meaning than scores for examinees tested with the requested accommodation?
  1. Would examinees who do not need accommodations benefit if they were nevertheless allowed the same accommodations?
  1. Do examinees requesting or granted accommodations have any capacity for adjusting to standard test administration conditions?
  1. Is the disability evidence or testing accommodation policy based on procedures with doubtful validity and reliability? (adapted from Phillips, 1994, p. 104).

The validity of classroom and large-scale assessment results depends on intended purposes and uses. This chapter has outlined the various types of evidence that should be considered in arguments that particular assessment results are valid for a particular purpose or use. We introduced the concept of reliability as a necessary but not sufficient condition for validity. Chapter 4 discusses reliability in more detail.