Module 6: estimating reliability
introduction
the more the observed scores reflect true score variance rather than error variance, the more reliable the measure
the less that observed scores reflect true score variance, the more that observed scores reflect error variance, and the less reliable the measure
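a minimal sketch of the idea above, assuming the usual classical test theory setup in which observed score variance is the sum of true score variance and error variance. the function name and the variance values are made up for illustration.

```python
def reliability(true_var: float, error_var: float) -> float:
    """Proportion of observed score variance that is true score variance."""
    # Assumption: error is uncorrelated with true scores, so the variances add.
    observed_var = true_var + error_var
    return true_var / observed_var


# Mostly true score variance: a more reliable measure.
print(reliability(true_var=80.0, error_var=20.0))
# As much error variance as true score variance: a less reliable measure.
print(reliability(true_var=50.0, error_var=50.0))
```

as the error variance grows relative to the true score variance, the ratio falls toward 0.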
sources of measurement error
any variability in test administration procedures, including changes in the way instructions are provided, the environment in which test administration occurs, the administrator, and even the other test takers, can introduce random sources of measurement error
the more subjective the scoring procedures, the more likely errors in scoring will occur
when the same test taker completes parallel measures, we would hope the test scores would be the same; any difference would be error
types of reliability
test-retest reliability: the focus is on consistency of test scores over time.
measures memory effects, true score fluctuation, and guessing
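a small sketch of how a test-retest estimate might be computed, assuming the common approach of correlating scores from two administrations of the same test. the helper `pearson_r` and the scores are hypothetical, not taken from the module.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up scores for the same six test takers at two points in time.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 2))
```

the closer the correlation is to 1, the more consistent the scores are over time.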
parallel forms reliability: the reliability coefficient provides an estimate of the equivalence of two versions of the same test
the two versions must measure the same construct, be composed of the same types of items, and have the same number of items
the most difficult part of this type is developing two tests; it is hard enough to develop one psychometrically sound test
by dividing a test into two halves, we could derive an estimate of internal consistency reliability.
the primary benefits of internal consistency reliability are that there is no need to create two separate tests and the measure is administered just once to examinees
split-half reliability: an estimate of internal consistency; there are two concerns.
dividing a test in half actually reduces the reliability of the test as a whole because it reduces the total number of items that compose the test by half
the Spearman-Brown formula can be used to correct this
a second concern is that there are many ways to divide the items composing a measure into two separate halves. in addition to comparing odd-numbered items to even-numbered items, it is also possible to compare scores on the first 10 items with scores on the second 10 items. comparing odd-numbered with even-numbered items helps with fatigue effects, because fatigue then affects both halves about equally
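the two concerns above can be sketched together: an odd-even split followed by the Spearman-Brown correction. this is an illustration under assumptions, with hypothetical function names and made-up 0/1 item scores (rows are examinees, columns are items).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half, factor=2.0):
    """Project reliability to a test `factor` times as long (2 = full test)."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def odd_even_split_half(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, spearman_brown(r_half)


# Made-up data: six examinees, six items scored 0 (wrong) or 1 (right).
item_scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
]

r_half, corrected = odd_even_split_half(item_scores)
print(round(r_half, 2), round(corrected, 2))
```

for a positive half-test correlation, the corrected estimate is always the larger of the two, which reflects the point that halving the number of items lowers reliability.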
more often than not, coefficient alpha (Cronbach's alpha) will be used in this case
an alpha of about .70 is considered appropriate as well
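a sketch of coefficient alpha, assuming the standard formula based on the number of items, the sum of the item variances, and the variance of total scores. the function name and data are made up, and the code does not handle the edge case where every examinee has the same total score.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a matrix with examinees as rows, items as columns."""
    k = len(item_scores[0])  # number of items
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: six examinees, six items scored 0 (wrong) or 1 (right).
item_scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
]

alpha = cronbach_alpha(item_scores)
print(round(alpha, 2), "meets .70" if alpha >= 0.70 else "below .70")
```

unlike a split-half estimate, alpha does not depend on which particular way the items are divided into halves.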
what do we do with reliability estimates now that we have them?
we will need to report our reliability estimate in any manuscripts (technical manuals, conference papers, and articles) that we write
if we have followed sound basic test construction principles, someone who scores high on our test is likely to be higher on the underlying trait than someone who scores low on our test
concluding comments
there will always be some form of error in psychological measurement so we have to follow these rules
decide what form of error we are most interested in measuring. once we do this, we can choose the most appropriate method to estimate our reliability.
we can use the reliability estimate to build confidence intervals (CIs) around our observed scores to estimate the underlying true scores. in doing so, we will have much more confidence in the interpretation of our measurement instruments
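a sketch of how such a CI might be built, assuming the usual standard error of measurement, SEM = SD * sqrt(1 - reliability), and a normal-theory interval of the observed score plus or minus z * SEM. the notes do not give the formula, so the approach, the function names, and the numbers are assumptions for illustration.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_confidence_interval(observed, sd, reliability, z=1.96):
    """CI around an observed score; z=1.96 gives roughly a 95% interval."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Made-up example: observed score of 50, test SD of 10, reliability of .84.
lower, upper = score_confidence_interval(observed=50, sd=10, reliability=0.84)
print(round(lower, 2), round(upper, 2))  # approximately 42.16 and 57.84
```

the lower the reliability, the larger the SEM and the wider the interval around each observed score.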