TASK 1
Gregory - Concepts of Reliability
Classical Test Theory (theory of true scores)
2 factors
Consistency factors - stable traits of individual, true scores
Inconsistency factors - characteristics of individual that have nothing to do with the attribute being measured, but still affect test scores - error
unsystematic measurement errors
systematic measurement errors
item selection (not equally fair to all persons)
test administration (changing environment etc.)
test scoring (scoring the test differently for different people - for example, an issue for projective tests when the scoring changes)
when the test measures something other than what it is supposed to (a problem of validity and test development)
ASSUMPTIONS
Mean measurement error (ME) = 0
True scores and error scores are uncorrelated (if they were correlated, ME would be more systematic than unsystematic)
measurement errors (ME) are random
MEs are not correlated with errors in other tests (because then they would be systematic - they would relate to some other construct/factor)
Reliability coefficient r(XX) = ratio of true score variance to total variance of test scores: r(XX) = σ²(T) / [σ²(T) + σ²(e)]
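The ratio can be sketched as a tiny helper; the variance values below are hypothetical, for illustration only:

```python
# Reliability in CTT: the share of observed-score variance that is
# true-score variance. Input variances are made up for this example.

def reliability(var_true: float, var_error: float) -> float:
    """r(XX) = sigma^2(T) / (sigma^2(T) + sigma^2(e))"""
    return var_true / (var_true + var_error)

print(reliability(80.0, 20.0))  # 0.8 - 80% of score variance reflects true scores
```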
Reliability
Temporal
Internal Consistency
Test-retest reliability
Alternate forms reliability
Split-half reliability
o SPEARMAN-BROWN FORMULA r(SB) = 2r(hh) / [1 + r(hh)] whereby r(SB) is the estimated reliability of the full test, and r(hh) the half-test reliability
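A minimal sketch of the formula, using r(hh) = 0.5 as a made-up half-test correlation:

```python
def spearman_brown(r_hh: float) -> float:
    """Project a half-test reliability r(hh) up to the full test length."""
    return 2 * r_hh / (1 + r_hh)

# Lengthening the test raises the reliability estimate above r(hh) itself
print(spearman_brown(0.5))  # about 0.667
```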
Coefficient alpha (Cronbach's alpha), which is the mean of all the possible split-half combinations
r(α) = (N/(N-1)) * (1 - SUM σ²(j) / σ²), whereby r(α) = coefficient alpha, N = number of items, σ²(j) = variance of 1 item, σ² = total variance of scores
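The formula can be computed directly from raw item scores; the three-item, four-respondent data set below is invented for illustration:

```python
from statistics import pvariance  # population variance, matching the formula

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    n = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per person
    return (n / (n - 1)) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

# Three perfectly correlated items -> alpha = 1.0
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(round(cronbach_alpha(items), 3))  # 1.0
```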
Kuder-Richardson Formula 20 (KR20) = internal consistency for dichotomous tests
KR20 = (N/(N-1)) * (1 - (SUM pq)/total variance), whereby p = proportion passing an item and q = 1 - p
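The same computation for dichotomous (0/1) items, on invented response data:

```python
def kr20(responses):
    """responses: one 0/1 answer list per respondent."""
    n = len(responses[0])       # number of items
    n_resp = len(responses)
    p = [sum(col) / n_resp for col in zip(*responses)]  # pass rate per item
    pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n_resp
    var_total = sum((t - mean) ** 2 for t in totals) / n_resp
    return (n / (n - 1)) * (1 - pq / var_total)

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # a Guttman-like pattern
print(kr20(responses))  # 0.75
```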
Interscorer reliability = sample of tests is independently scored by 2+ examiners and scores for pairs of examiners are then correlated (nice supplement to other reliabilities)
WHICH RELIABILITY TO CHOOSE
- Tests designed to be administered to the same individual more than once = test-retest
- Tests that purport to possess factorial purity = coefficient alpha
- Factorially complex tests (e.g. general intelligence measures) = coefficient alpha not appropriate (an internal consistency measure is not enough)
- Tests whose items are carefully ordered according to difficulty level = split-half
- Tests with subjective scoring = interscorer reliability
Item Response Theory
Item response function/item characteristic curve = equation describing the relationship between the amount of latent trait (ability) an individual possesses and the probability that the individual will answer a given question correctly
- Assumptions: each respondent has a certain amount of the latent trait measured; latent trait influences directly the responses to the items on the trait in question
- Difficulty level: how much of the trait is needed to answer the item correctly (in contrast to CTT, where the difficulty of an item = proportion of examinees who pass the item)
- Item discrimination parameter: how well item differentiates among individuals at a specific level of the trait in question (high - good when separating wheat from chaff)
different models
- Rasch model / 1-parameter model: p(Φ) = 1 / (1 + e^(-(Φ - b))), whereby p(Φ) = probability of a respondent with trait level Φ correctly responding to an item with difficulty b, Φ = amount of trait one possesses, b = item difficulty
- 2-parameter model: adds item discrimination index to equation
- 3-parameter model: adds a guessing parameter
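The 1- and 2-parameter curves above can be sketched as follows (all parameter values are invented for illustration):

```python
import math

def rasch(theta: float, b: float) -> float:
    """1-parameter (Rasch) model: P depends only on trait level minus difficulty."""
    return 1 / (1 + math.exp(-(theta - b)))

def two_pl(theta: float, b: float, a: float) -> float:
    """2-parameter model: a (discrimination) steepens or flattens the curve."""
    return 1 / (1 + math.exp(-a * (theta - b)))

print(rasch(1.0, 1.0))  # 0.5 - when trait level equals difficulty, P(correct) = .5
# A higher discrimination parameter separates respondents above b more sharply:
print(two_pl(1.5, 1.0, 2.0) > two_pl(1.5, 1.0, 0.5))  # True
```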
Information functions: information reduces uncertainty; in tests it represents the capacity to differentiate among people
Special circumstances
Unstable characteristics - e.g. emotional reactivity fluctuates quickly in reaction to the environment: test and retest must be nearly instantaneous in order to provide an accurate index of reliability for such characteristics
o SEM = SD √ (1 – r), whereby SD = standard deviation, r = reliability coefficient (both derived from normative sample/ representative group)
Confidence interval: needed to get an estimate of whether obtained score is likely to be close to true score
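SEM and the confidence interval go together; the SD = 15 and r = .96 below are illustrative IQ-scale values, not taken from the notes:

```python
def sem(sd: float, r: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * (1 - r) ** 0.5

def ci95(score: float, sd: float, r: float) -> tuple[float, float]:
    """95% confidence interval around an obtained score (z = 1.96)."""
    margin = 1.96 * sem(sd, r)
    return (score - margin, score + margin)

print(sem(15, 0.96))        # about 3.0
print(ci95(112, 15, 0.96))  # roughly (106.1, 117.9)
```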
Standard error of difference: comparison between abilities within a subject; SEdiff can help determine whether the difference between two scores is significant
- SEdiff = √ ((SEM1)² + (SEM2)²)
- Assumed that the 2 scores are on the same scale
- Example: person scores IQverbal = 112 and IQperformance = 105; is the 7-point difference significant? Compute SEdiff = 4.74; since 1.96 × 4.74 ≈ 9.3 > 7, the difference is not significant
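The worked example can be reproduced under the assumption SEM1 = SEM2 ≈ 3.35 (IQ SD = 15, r = .95 for both scales; the notes give only the final SEdiff):

```python
def se_diff(sem1: float, sem2: float) -> float:
    """Standard error of the difference between two scores on the same scale."""
    return (sem1 ** 2 + sem2 ** 2) ** 0.5

sem_each = 15 * (1 - 0.95) ** 0.5      # about 3.354, assuming SD = 15, r = .95
sediff = se_diff(sem_each, sem_each)   # about 4.74, matching the notes
print(abs(112 - 105) > 1.96 * sediff)  # False: the 7-point gap is not significant
```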
CORTINA
Generalisability theory: aspects of tests/scales are sampled from a predefined domain; test variance can be broken into variance attributable to each of the aspects and the interactions between them; the reliability estimate depends on the sources of variance that one considers relevant
- The particular estimate of reliability depends on the particular error-producing factors one seeks to identify
- Errors associated with time = test-retest or multiple administrations of parallel tests
- Errors associated with use of different items = internal consistency estimates (e.g. Cronbach's alpha) or single administration of parallel tests
CRONBACH'S COEFFICIENT ALPHA IS
- α is a measure of first-factor saturation
o alpha is a measure of the extent to which there is a general factor present in a set of items and therefore, the extent to which the items are interrelated - TRUE FOR STANDARDIZED ALPHA
- α is equal to reliability in conditions of essential tau-equivalence
o “As the items in tests approach essential tau-equivalence, as they do when tests are composed of equal portions of general and group factor variance, alpha approaches reliability. When items are exactly tau-equivalent, alpha equals reliability”
- α is the lower bound of reliability of a test
o 2nd and 4th statements combined: measurements are tau-equivalent if they are linearly related and differ only by a constant; alpha is a lower bound because this tau-equivalence is seldom achieved
- α is the mean of all split-half reliabilities - not subject to randomness
α is a more general version of the Kuder-Richardson coefficient of equivalence
o Kuder-Richardson applies only to dichotomous items, whereas alpha applies to any set of items regardless of response scale
o Definition of equivalence: there is very little variance specific to individual items
Precision of alpha: interrelatedness and multidimensionality are issues of precision, not of the level of alpha
The higher the precision, the less overlap between different dimensions.
RESULTS OF STUDY
Number of items affects alpha
Number of dimensions affects alpha
Intercorrelation between items also affects alpha
o Although the table shows that alpha can be high in spite of low item intercorrelations and multidimensionality, alpha does increase as function of correlation and decreases as function of multidimensionality
Conclusion: alpha says only how correlated the halves are, nothing about whether the items measure what they are intended to measure
Useful for estimating reliability when item-specific variance in a unidimensional test is of interest
o Test has a large alpha = a large portion of variance in the test is attributable to general and group factors
o Very little item-specific variance (uniqueness) is then present