TASK 1

Gregory - Concepts of Reliability

Classical Test Theory (theory of true scores)

2 factors

Consistency factors - stable traits of individual, true scores

Inconsistency factors - characteristics of individual that have nothing to do with the attribute being measured, but still affect test scores - error

unsystematic measurement errors

systematic measurement errors

item selection (not equally fair to all persons)

test administration (changing environment, etc.)

test scoring (scoring the test differently for different people; an issue for projective tests, for example, when the scoring changes)

when the test measures something other than what it is supposed to (a problem of validity and test development)

ASSUMPTIONS

Mean measurement error (ME) = 0

True scores and error scores are uncorrelated (if they were correlated, ME would be systematic rather than unsystematic)

measurement errors (ME) are random

MEs are not correlated with errors in other tests (because correlated errors would be systematic and would relate to some other construct/factor)

Reliability coefficient r(XX) = ratio of true-score variance to total variance of test scores: r(XX) = σ²(T) / [σ²(T) + σ²(e)]
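
As a quick illustration, the ratio can be computed directly (the variance figures below are made up):

```python
def reliability(var_true: float, var_error: float) -> float:
    """r(XX): share of total score variance that is true-score variance."""
    return var_true / (var_true + var_error)

# Hypothetical example: 8 units of true-score variance, 2 of error variance.
print(reliability(8.0, 2.0))  # → 0.8
```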

Reliability

Temporal

Internal Consistency

Test-retest reliability

Alternate forms reliability

Split-half reliability

o SPEARMAN-BROWN FORMULA r(SB) = 2r(hh) / [1 + r(hh)] whereby r(SB) is the estimated reliability of the full test, and r(hh) the half-test reliability
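
A one-line Python sketch of the formula (the .80 half-test correlation is hypothetical):

```python
def spearman_brown(r_hh: float) -> float:
    """Estimated full-test reliability from the half-test correlation r(hh)."""
    return 2 * r_hh / (1 + r_hh)

# A hypothetical half-test correlation of .80 projects to a full-test
# reliability of about .89.
print(round(spearman_brown(0.80), 2))  # → 0.89
```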

Coefficient alpha (Cronbach's alpha): in effect the mean of all possible split-half coefficients

r(α) = (N / (N - 1)) * (1 - SUM σ²(j) / σ²)

r(α) = coefficient alpha, N = number of items, σ²(j) = variance of item j, σ² = total variance of scores
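
A minimal Python sketch of coefficient alpha, using the population-variance convention and a tiny made-up score table (rows = people, columns = items):

```python
def cronbach_alpha(scores):
    """Coefficient alpha from a table of scores (rows = people, cols = items)."""
    n_items = len(scores[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_vars = sum(variance([row[j] for row in scores]) for j in range(n_items))
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum_item_vars / total_var)

# Hypothetical data: three perfectly consistent items give alpha = 1.0.
print(round(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]), 6))  # → 1.0
```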

Kuder-Richardson Formula 20 (KR20) = internal consistency for dichotomous (0/1-scored) tests

KR20 = (N/(N-1)) * (1-(SUM pq)/total variance)



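
The KR20 computation can be sketched the same way; the 0/1 response table below is invented for illustration:

```python
def kr20(responses):
    """KR20 internal consistency for dichotomous (0/1) item responses."""
    n_items = len(responses[0])
    n_people = len(responses)
    # p = proportion passing each item, q = 1 - p
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_people
    total_var = sum((t - mean) ** 2 for t in totals) / n_people  # population variance
    return n_items / (n_items - 1) * (1 - sum_pq / total_var)

# Hypothetical responses from four examinees on three items.
print(kr20([[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]]))  # → 0.75
```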

Interscorer reliability = a sample of tests is independently scored by 2+ examiners, and the scores for pairs of examiners are then correlated (a nice supplement to the other reliability estimates)
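
Interscorer reliability boils down to correlating two scorers' totals; a plain-Python Pearson correlation with hypothetical ratings of the same five protocols:

```python
def pearson(x, y):
    """Pearson correlation between two scorers' ratings of the same tests."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical ratings by two examiners of the same five protocols.
scorer_1 = [12, 15, 11, 18, 14]
scorer_2 = [13, 16, 10, 19, 15]
print(round(pearson(scorer_1, scorer_2), 2))  # → 0.98
```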

WHICH RELIABILITY TO CHOOSE

  • Tests designed to be administered to the same individual more than once = test-retest
  • Tests that purport to possess factorial purity → coefficient alpha
  • Factorially complex tests (e.g. general intelligence measures) = coefficient alpha not appropriate (an internal consistency measure is not enough)
  • Tests whose items are carefully ordered according to difficulty level = split-half
  • Tests with subjective scoring = interscorer reliability

Item Response Theory

Item response function / item characteristic curve = equation describing the relationship between the amount of the latent trait (ability) an individual possesses and the probability that the individual will answer a given question correctly

  • Assumptions: each respondent has a certain amount of the latent trait measured; latent trait influences directly the responses to the items on the trait in question
  • Difficulty level: how much of the trait is needed to answer the item correctly (in contrast to CTT, where the difficulty of an item = the proportion of examinees who pass it)

  • Item discrimination parameter: how well the item differentiates among individuals at a specific level of the trait in question (high discrimination is good when separating the wheat from the chaff)

different models

  • Rasch model / 1-parameter model: p(Φ) = 1 / (1 + e^(-(Φ - b))), whereby p(Φ) = probability of a respondent with trait level Φ correctly responding to an item with difficulty b, Φ = amount of trait one possesses, b = item difficulty
  • 2-parameter model: adds an item discrimination index to the equation
  • 3-parameter model: adds a guessing parameter
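
The three models above can be captured in one small sketch: with the defaults the function below is the Rasch (1-parameter) model, and the extra parameters turn it into the 2PL (discrimination a) and 3PL (guessing c) variants. The numeric examples are illustrative only.

```python
import math

def irt_p(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of a correct answer. Defaults give the Rasch (1PL) model;
    a adds item discrimination (2PL) and c a guessing floor (3PL)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the Rasch model gives p = 0.5.
print(irt_p(0.0, 0.0))  # → 0.5
# A 3PL item with guessing c = .25 never drops below .25, even for an
# examinee far below the item's difficulty.
print(round(irt_p(-10.0, 0.0, a=1.0, c=0.25), 2))  # → 0.25
```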

Information functions: information reduces uncertainty → in tests, it represents the capacity to differentiate among people

Special circumstances

Unstable characteristics - emotional reactivity fluctuates quickly in reaction to the environment: test and retest must be nearly instantaneous in order to provide an accurate index of reliability for such characteristics

o SEM = SD * √(1 - r), whereby SD = standard deviation and r = reliability coefficient (both derived from a normative sample / representative group)

Confidence interval: needed to estimate whether the obtained score is likely to be close to the true score
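
A sketch of SEM and the resulting 95% confidence interval, assuming a familiar IQ-style scale (SD = 15) and a hypothetical reliability of .96:

```python
def sem(sd: float, r: float) -> float:
    """Standard error of measurement from the normative SD and reliability r."""
    return sd * (1 - r) ** 0.5

def ci95(score: float, sd: float, r: float):
    """Rough 95% confidence interval around an obtained score."""
    margin = 1.96 * sem(sd, r)
    return (score - margin, score + margin)

# Hypothetical IQ scale: SD = 15, reliability r = .96, obtained score 100.
print(round(sem(15.0, 0.96), 2))  # → 3.0
lo, hi = ci95(100.0, 15.0, 0.96)
print(round(lo, 1), round(hi, 1))  # → 94.1 105.9
```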


Standard error of the difference: comparison between abilities within a subject → SEdiff can help determine whether a difference between scores is significant

  • SEdiff = √ ((SEM1)² + (SEM2)²)
  • Assumed that the 2 scores are on the same scale
  • Example: a person scores IQverbal = 112 and IQperformance = 105 → is the 7-point difference significant? → compute SEdiff = 4.74 → since 7 < 1.96 × SEdiff ≈ 9.3, the difference is not significant
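
The notes give SEdiff = 4.74 but not the SEMs behind it; assuming SEM ≈ 3.35 for both subtests (a made-up but plausible value) reproduces that figure:

```python
def se_diff(sem1: float, sem2: float) -> float:
    """Standard error of the difference between two scores on the same scale."""
    return (sem1 ** 2 + sem2 ** 2) ** 0.5

# Assumed SEMs of about 3.35 for both subtests reproduce the 4.74 in the notes.
sed = se_diff(3.35, 3.35)
print(round(sed, 2))  # → 4.74
# A difference must exceed roughly 1.96 * SEdiff to be significant at p < .05;
# the 7-point verbal-performance gap falls short of that threshold (about 9.3).
print(7 > 1.96 * sed)  # → False
```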

CORTINA

Generalisability theory: aspects of tests/scales are sampled from a predefined domain; test variance can be broken into the variance attributable to each of the aspects and the interactions between them → the reliability estimate depends on the sources of variance that one considers relevant

  • A particular estimate of reliability depends on the particular error-producing factors that one seeks to identify
  • Errors associated with time are relevant → test-retest or multiple administrations of parallel tests
  • Errors associated with the use of different items → internal consistency estimates (e.g. Cronbach’s alpha) or a single administration of parallel tests

CRONBACH'S COEFFICIENT ALPHA IS

  1. α is a measure of first-factor saturation
    o alpha is a measure of the extent to which there is a general factor present in a set of items and therefore the extent to which the items are interrelated - TRUE FOR STANDARDIZED ALPHA
  2. α is equal to reliability in conditions of essential tau-equivalence
    o “As the items in tests approach essential tau-equivalence, as they do when tests are composed of equal portions of general and group factor variance, alpha approaches reliability. When items are exactly tau-equivalent, alpha equals reliability”
  3. α is the lower bound of the reliability of a test
    o 2nd and 4th statements combined: measurements are tau-equivalent if they are linearly related and differ only by a constant → alpha is a lower bound because this tau-equivalence is seldom achieved
  4. α is the mean of all split-half reliabilities - not subject to randomness

Alpha is a more general version of the Kuder-Richardson coefficient of equivalence

o Kuder-Richardson applies only to dichotomous items, whereas alpha applies to any set of items regardless of response scale

o Definition of equivalence: there is very little variance specific to individual items

Precision of alpha: interrelatedness and multidimensionality affect the precision of alpha, not its level

The higher the precision, the less overlap between different dimensions.

RESULTS OF STUDY

Number of items affects alpha

Number of dimensions affects alpha

intercorrelation between items also affects alpha

o Although the table shows that alpha can be high in spite of low item intercorrelations and multidimensionality, alpha does increase as a function of correlation and decrease as a function of multidimensionality

Conclusion: alpha says only how correlated the halves are, nothing about whether the items measure what they are intended to measure



Useful for estimating reliability when the item-specific variance in a unidimensional test is of interest

o Test has large alpha → a large portion of the variance in the test is attributable to general and group factors

o Very little item-specific variance (uniqueness is assessed by alpha)
