Scale Reliability and Validity
We must also test these scales to ensure that they indeed measure the unobservable construct we intend to measure, and that they measure that construct consistently and precisely.
Reliability and validity, jointly called the "psychometric properties" of measurement scales, are the yardsticks against which the adequacy and accuracy of our measurement procedures are evaluated in scientific research.
Reliability is the degree to which the measure of a construct is consistent or dependable.
An example of an unreliable measurement is people guessing your weight.
A more reliable measurement may be to use a weight scale, where you are likely to get the same value every time you step on the scale, unless your weight has actually changed between measurements.
Reliability implies consistency but not accuracy.
Sometimes, reliability may be improved by using quantitative measures, for instance, by counting the number of grievances filed over one month as a measure of morale; such counts are less subject to human subjectivity and are therefore more reliable.
Inter-rater reliability, also called inter-observer reliability, is a measure of consistency between two or more independent raters (observers) of the same construct.
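For illustration, agreement between two raters assigning categorical codes is commonly summarized with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch follows; the rating data and category labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two observers to ten behaviors.
a = ["high", "high", "low", "med", "low", "high", "med", "low", "high", "med"]
b = ["high", "med",  "low", "med", "low", "high", "med", "high", "high", "med"]
print(f"Cohen's kappa = {cohens_kappa(a, b):.2f}")  # ~0.70 here
```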
Test-retest reliability is a measure of consistency between two measurements (tests) of the same construct administered to the same sample at two different points in time.
If the observations have not changed substantially between the two tests, then the measure is reliable.
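A minimal sketch of a test-retest check: administer the same scale to the same respondents twice and correlate the two sets of scores. The scores and the time interval below are hypothetical.

```python
import numpy as np

# Hypothetical scale scores for eight respondents, one month apart.
time1 = np.array([3.2, 4.1, 2.8, 3.9, 4.5, 2.5, 3.3, 4.0])
time2 = np.array([3.4, 4.0, 2.9, 3.7, 4.6, 2.7, 3.1, 4.2])

# High correlation suggests the measure is stable over time.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability r = {r:.2f}")
```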
Split-half reliability is a measure of consistency between two halves of a construct measure, for instance, the items of a ten-item scale split randomly into two sets of five.
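The sketch below splits a hypothetical six-item measure into odd- and even-numbered halves, correlates the half scores, and applies the Spearman-Brown correction (a standard step, though not named in this outline) to estimate the reliability of the full-length scale.

```python
import numpy as np

# Hypothetical responses: 8 respondents x 6 items (1-5 Likert scale).
items = np.array([
    [4, 5, 4, 4, 5, 4],
    [3, 3, 2, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [2, 1, 2, 2, 1, 2],
    [4, 4, 3, 4, 4, 3],
    [3, 2, 3, 3, 2, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 1, 2, 2],
])

# Scores on two halves: odd-numbered vs. even-numbered items.
half1 = items[:, ::2].sum(axis=1)
half2 = items[:, 1::2].sum(axis=1)

r = np.corrcoef(half1, half2)[0, 1]
# Spearman-Brown correction: estimate full-length reliability
# from the correlation between two half-length scales.
split_half = 2 * r / (1 + r)
print(f"split-half reliability = {split_half:.2f}")
```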
Internal consistency reliability is a measure of consistency between different items of the same construct.
If a multiple-item construct measure is administered to respondents, the extent to which respondents rate those items in a similar manner is a reflection of internal consistency.
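Internal consistency is most often quantified with Cronbach's alpha, which compares the sum of individual item variances to the variance of the total scale score. A minimal sketch with a hypothetical response matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 8 respondents x 6 items (1-5 Likert scale).
items = np.array([
    [4, 5, 4, 4, 5, 4],
    [3, 3, 2, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [2, 1, 2, 2, 1, 2],
    [4, 4, 3, 4, 4, 3],
    [3, 2, 3, 3, 2, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 1, 2, 2],
])
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```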
Validity, often called construct validity, refers to the extent to which a measure adequately represents the underlying construct that it is supposed to measure.
Validity assessed on theoretical grounds is called translational validity (or representational validity), and consists of two subtypes: face validity and content validity. Translational validity is typically assessed using a panel of expert judges, who rate each item (indicator) on how well it fits the conceptual definition of the construct, and a qualitative technique called Q-sort.
Validity assessed on empirical grounds, by observing how a measure relates to criterion measures, is called criterion-related validity, which includes four subtypes: convergent, discriminant, concurrent, and predictive validity.
Face validity refers to whether an indicator seems to be a reasonable measure of its underlying construct "on its face".
Content validity is an assessment of how well a set of scale items matches the relevant content domain of the construct that it is trying to measure.
Convergent validity refers to the closeness with which a measure relates to (or converges on) the construct that it is purported to measure, and discriminant validity refers to the degree to which a measure does not measure (or discriminates from) other constructs that it is not supposed to measure.
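One common way to examine both properties at once is to compare within-construct correlations (which should be high) against cross-construct correlations (which should be low). The construct names and simulated data below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical latent scores for two distinct constructs.
morale = rng.normal(size=n)
stress = rng.normal(size=n)

# Three noisy indicators per construct.
morale_items = np.stack([morale + rng.normal(scale=0.5, size=n) for _ in range(3)])
stress_items = np.stack([stress + rng.normal(scale=0.5, size=n) for _ in range(3)])

# Convergent validity: indicators of the same construct correlate highly.
within = np.corrcoef(morale_items)[np.triu_indices(3, k=1)].mean()
# Discriminant validity: indicators of different constructs do not.
across = np.corrcoef(morale_items, stress_items)[:3, 3:].mean()

print(f"mean within-construct r = {within:.2f}")  # expect ~0.8
print(f"mean cross-construct r  = {across:.2f}")  # expect ~0.0
```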
Predictive validity is the degree to which a measure successfully predicts a future outcome that it is theoretically expected to predict.
Concurrent validity examines how well one measure relates to another concrete criterion that is presumed to occur simultaneously.
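Criterion-related subtypes such as predictive validity are typically assessed by correlating the measure with its criterion. A minimal sketch with simulated data; the aptitude-score and later-GPA scenario is a hypothetical example, not taken from the outline above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150

# Hypothetical aptitude scores measured at admission ...
aptitude = rng.normal(size=n)
# ... and grade point average observed a year later, partly
# driven by aptitude plus unrelated influences.
gpa = 0.6 * aptitude + rng.normal(scale=0.8, size=n)

r = np.corrcoef(aptitude, gpa)[0, 1]
print(f"predictive validity r = {r:.2f}")
```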
Now that we know the different kinds of reliability and validity, let us try to synthesize our understanding of reliability and validity in a mathematical manner using classical test theory, also called true score theory.
Measurement errors can be of two types: random error and systematic error. Random error is the error that can be attributed to a set of unknown and uncontrollable external factors that randomly influence some observations but not others.
Systematic error is an error introduced by factors that systematically affect all observations of a construct across an entire sample.
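In symbols, true score theory decomposes an observed score into a true score plus error. The reliability expression below additionally assumes that random error is uncorrelated with the true score, a standard (if unstated here) assumption of classical test theory.

```latex
% Observed score = true score + error, with the error split into
% a random component and a systematic component.
\[
  X = T + E, \qquad E = E_r + E_s
\]
% Reliability is the share of observed-score variance attributable
% to the true score; random error lowers reliability, while
% systematic error (a constant bias) chiefly threatens validity.
\[
  \text{reliability}
    = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}
    = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E_r)}
\]
```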
An integrated approach to measurement validation starts in the theoretical realm. The first step is conceptualizing the constructs of interest.
Next, we select (or create) items or indicators for each construct based on our conceptualization of these constructs, as described in the scaling procedure.
A panel of expert judges (academics experienced in research methods and/or a representative set of target respondents) can be employed to examine each indicator and conduct a Q-sort analysis.
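Q-sort results are often summarized with an item-level "hit ratio": the share of judges who sorted an item into its intended construct. A minimal sketch; the item names, construct labels, and judgments below are all hypothetical.

```python
# For each item, the construct it was written for, and the
# construct each of three hypothetical judges assigned it to.
intended = {"item1": "morale", "item2": "morale",
            "item3": "stress", "item4": "stress"}
placements = {
    "item1": ["morale", "morale", "morale"],
    "item2": ["morale", "stress", "morale"],
    "item3": ["stress", "stress", "stress"],
    "item4": ["stress", "morale", "stress"],
}

# Hit ratio: proportion of judge placements matching the intended
# construct; low-ratio items are candidates for rewording or removal.
for item, judged in placements.items():
    hits = sum(c == intended[item] for c in judged) / len(judged)
    print(f"{item}: hit ratio = {hits:.2f}")
```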