Twenty common testing mistakes for EFL teachers to avoid

General examination characteristics

Redundancy of test type

Lack of confidence measures

An insufficient number of items

Negative washback through non-occurrent forms

Tests which are too difficult or too easy

Boundary effects (accumulation of scores at the lower or higher ends of the scoring range)

Reduced capacity of the test to discriminate among students in their ability.

Unreliable and unsuitable test for evaluation

Care should be taken to prepare tests and items that have about a fifty percent average rate of student success.
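A minimal sketch of how the average rate of success might be checked after a trial administration. It assumes each answer has been scored 1 (correct) or 0 (incorrect). The function names, the sample scores, and the 0.3 and 0.7 cut-offs are illustrative assumptions, not taken from the source.

```python
def item_facility(responses):
    """Return the proportion of students answering each item correctly.

    `responses` is a list of rows, one per student; each row holds
    1 (correct) or 0 (incorrect) for every item, in the same order.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def flag_items(facilities, low=0.3, high=0.7):
    """Return indices of items whose success rate falls outside [low, high].

    The default cut-offs are assumptions chosen for illustration only.
    """
    return [i for i, p in enumerate(facilities) if p < low or p > high]


# Hypothetical scores: 4 students (rows) by 3 items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]

facilities = item_facility(scores)                   # success rate per item
mean_facility = sum(facilities) / len(facilities)    # average success rate
too_easy_or_hard = flag_items(facilities)            # items far from 0.5
```

In this made-up data the first item is answered correctly by everyone and the third by only one student in four, so both would be candidates for revision.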

Test reliability is directly related to the number of items occurring on the test.
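One common way to illustrate this relationship is the Spearman-Brown formula, which is not named in the source and is brought in here only as an example. It estimates the reliability of a lengthened or shortened test, assuming the added items are of comparable quality to the existing ones. The function name and the starting reliability of 0.60 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Estimate reliability after changing test length.

    `reliability` is the reliability of the current test (between 0 and 1).
    `length_factor` is new length divided by old length (2 means doubled).
    Assumes the added or removed items are comparable to the rest.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.60.
doubled = spearman_brown(0.60, 2)    # twice as many items
halved = spearman_brown(0.60, 0.5)   # half as many items
```

Under these assumptions, doubling the test raises the estimate and halving it lowers the estimate, which is why a test with too few items tends to be unreliable.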

Increasing the number of subtests adds no significant variance-explanatory information to the test battery beyond that which may be obtained from the three or four best, reliable subtests.

In terms of the ability measured, nothing is added beyond the existing components of the test.

The manual provides us with information about the reliability and validity of the tests, both what they are and how they were ascertained.

We need to ensure that the persons on whom the test was tried out in its evaluation stage are from the same general population as those with whom the test is ultimately used.

Through the use of inappropriate structures of the language, it is possible to teach errors to the students.

Although options must include incorrect forms as distractors, it is best if these forms have some possible appropriate environment in the language.

Item characteristics

Divergence cues

Convergence cues

Redundant wording

Option number

Trick questions

Trick questions must be avoided.

Such items impair the motivation of the students, the credibility of the teacher, and the quality of the test.

Needless repetition

Needless repetition reduces the amount of information that can be obtained in the time available for testing.

Care should be taken not to provide cues regarding the choice of the correct option.

Test-wise students can identify the correct option because of content overlap.

Irregularity in the numbers of options

It is best to be consistent in the number of options used for items within a test.
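A short sketch of how a draft test might be checked for this kind of irregularity. It assumes the items are held as a mapping from an item label to its list of options. The function name and the sample items are hypothetical.

```python
from collections import Counter


def irregular_items(items):
    """Return labels of items whose option count differs from the usual one.

    `items` maps an item label to its list of options. The "usual" count
    is taken to be the most frequent number of options in the test.
    """
    counts = Counter(len(options) for options in items.values())
    usual = counts.most_common(1)[0][0]
    return [label for label, options in items.items() if len(options) != usual]


# Hypothetical draft test: Q3 has only three options.
draft = {
    "Q1": ["go", "goes", "went", "gone"],
    "Q2": ["in", "on", "at", "by"],
    "Q3": ["much", "many", "few"],
}

odd_ones = irregular_items(draft)
```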

Test validity concerns

Common knowledge

Syllabus mismatch

Wrong medium

Mixed content

Sometimes tests have been claimed to measure something different from what many of their items are actually measuring.


Care must be taken that the response medium be representative of the skill being tested.

Items that require common-knowledge responses should also be avoided.

Syllabus mismatch is the failure of a test to measure adequately either instructional objectives or course content.

Content matching

Mere matching of a word or phrase in a test item with the exact counterpart in a comprehension passage does not necessarily entail comprehension.

Tests involving such content-matching tasks are usually invalid as measures of comprehension.

Administrative and scoring issues

Administrative inequities

Lack of piloting

Inadequate instructions

Subjectivity of scoring

Lack of cheating controls

The teacher should take care to separate students, and where possible use alternate forms of the test.

When students obtain higher scores through cheating, tests are neither reliable nor valid.

If the students fail to understand the task, their responses may be invalid, in the sense that the students would have been able to supply the correct answers if they had understood the procedure.

Procedures should be carefully standardized even if this requires special training sessions for test administrators.

Other factors as well may impair the reliability of the test.

It is important to try out the test on a restricted sample from the target population before it is put into general use.

Instructors give subjective, opinionated judgments of student performance.

If subjective judgment must be relied on, several mitigating procedures should be employed.

More than one judge should be consulted on marks assigned by other judges. The total of all judges’ ratings should determine the student’s mark.

Judges should make use of some precise rating schedule, so that they give equal weight to the same kinds of performance.

Sufficient samples of language should be elicited from the students.
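The procedure of letting the total of all judges' ratings determine the mark can be sketched as follows. It assumes every judge rates every student on the same rating schedule. The function name, judge labels, student labels, and ratings are all hypothetical.

```python
def total_marks(ratings_by_judge):
    """Sum each student's ratings across all judges.

    `ratings_by_judge` maps a judge label to a dict of student -> rating.
    All judges are assumed to use the same rating schedule.
    """
    totals = {}
    for ratings in ratings_by_judge.values():
        for student, rating in ratings.items():
            totals[student] = totals.get(student, 0) + rating
    return totals


# Hypothetical ratings from two judges on a 1 to 5 schedule.
ratings = {
    "judge_a": {"S1": 4, "S2": 3},
    "judge_b": {"S1": 5, "S2": 2},
}

marks = total_marks(ratings)
```

Summing in this way means that no single judge's opinion decides a student's mark alone.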

Ana Cristina Reyes Vera