Validity and Reliability
Validity - confidence in measurement. Does the tool measure what it's supposed to measure?
Reliability - consistency. A measure can be completely invalid yet still reliable (consistently wrong).
Operationalize
Define variable
Measure
Error in measurement will occur:
Systematic error
Random error
Types of validity:
Face
Content
Criterion
Construct
Face - at face value, does the content of the tool seem suitable for achieving its aims? Quite subjective.
Content - is the test fully representative of what it aims to measure? Bring in specialists. More in-depth.
Criterion - how close am I getting to the underlying true value? Requires comparison against a gold standard.
Internal Validity
External Validity
Reproducibility - consistency between different observers; different tools (different indicators of the same concept) giving the same results.
Different dimensions of reliability
Stability - consistency across time: measurements at time 1 and time 2 should agree.
Reproducibility - different observers = same result.
Homogeneity - consistent result across different measures of the same concept
Accuracy - lack of mistakes in measurement
Errors in measurement
Random error
Systematic error
Systematic error refers to consistent, repeatable errors associated with faulty equipment or flawed experiment design. These errors lead to measurements that are consistently off in the same direction from the true value, either higher or lower. Systematic errors are predictable and typically caused by issues in the measurement system, such as calibration errors, biased sampling methods, or environmental factors affecting the measurement process.
Random error refers to the unpredictable and unavoidable fluctuations that occur in measurement processes. These errors are caused by random variations in the measurement environment, the measurement instrument, or the observer. Unlike systematic errors, random errors do not follow a consistent pattern and can cause measurements to vary in both directions around the true value.
Characteristics of random error:
Variability: The error varies in magnitude and direction with each measurement.
Lack of Bias: Random errors do not consistently skew measurements in one direction; they can lead to both overestimates and underestimates.
Reduction Through Averaging: Random errors can be reduced by increasing the number of measurements and averaging the results, as their effects tend to cancel out over many observations (see the sketch below).
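As a rough illustration, here is a minimal numpy sketch (the true value of 100, the noise level, and the sample sizes are made-up demonstration numbers) showing the error of the mean shrinking as more measurements are averaged:

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 100.0  # hypothetical underlying true value

# Each measurement = true value + zero-mean random error.
def measure(n):
    return true_value + rng.normal(loc=0.0, scale=5.0, size=n)

for n in (1, 10, 100, 10_000):
    sample_mean = measure(n).mean()
    print(f"n={n:>6}: mean={sample_mean:.2f}, error={sample_mean - true_value:+.2f}")
# The error of the mean shrinks roughly as 1/sqrt(n):
# random errors cancel out over many observations.
```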
Characteristics of systematic error:
Consistency: The error is consistent across repeated measurements.
Bias: Systematic error introduces a bias into the results, as it skews measurements in a particular direction.
Detectability and Correction: Once identified, systematic errors can often be corrected or minimized by recalibrating instruments, adjusting procedures, or using different methodologies (illustrated in the sketch below).
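For contrast, a sketch under the same made-up assumptions, now with a hypothetical constant +3-unit instrument offset: averaging removes the random noise but not the bias, which only a calibration correction fixes:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0
calibration_offset = 3.0  # hypothetical constant instrument bias

# Biased instrument: every reading is shifted by the same offset.
readings = true_value + calibration_offset + rng.normal(0.0, 5.0, size=100_000)

print(f"mean of 100,000 biased readings: {readings.mean():.2f}")  # ~103, not 100
# Averaging removed the random part, but the +3 bias remains.
# Once the offset is identified (e.g., against a reference standard),
# it can be corrected by recalibration:
print(f"after correction: {(readings - calibration_offset).mean():.2f}")  # ~100
```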
Relationship between validity and reliability
For a measure to produce a small amount of error, it must be both valid and reliable.
Content validity assesses whether a test or measurement covers the entire range of the concept it intends to measure. It ensures that the test items represent all aspects of the construct being measured.
Example: A math test intended to assess algebra skills should include questions covering all relevant algebra topics. If it only includes questions on a subset of topics, it lacks content validity.
Construct validity examines whether a test measures the theoretical construct it is intended to measure. This type of validity is concerned with how well the test relates to underlying theories and constructs.
Criterion-related validity assesses how well one measure predicts an outcome based on another measure (criterion). It evaluates the relationship between the test and an external criterion.
Face validity is the extent to which a test appears to measure what it is supposed to measure, based on subjective judgment. While it is the least scientific type of validity, it is important for ensuring that the test is taken seriously by participants.
External validity concerns the extent to which the results of a study can be generalized to other settings, populations, times, and measures.
Internal validity pertains to the extent to which a study can demonstrate a cause-and-effect relationship between variables. It assesses whether the study design, data, and analysis accurately establish a causal link.
Internal validity must be established before external validity; if internal validity does not hold, no claims can be made about external validity.
Two types of criterion validity:
Concurrent Validity: Determines how well a test correlates with a criterion measure taken at the same time.
Predictive Validity: Evaluates how well a test predicts future outcomes or behaviors.
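A minimal sketch of criterion-related validity (assuming scipy and toy, made-up scores): the Pearson correlation between a new test and a gold-standard criterion measured at the same time quantifies concurrent validity; the same calculation against a later-measured outcome gives predictive validity.

```python
import numpy as np
from scipy.stats import pearsonr

# Toy data: scores on a new screening test and on the gold-standard
# criterion measure, collected from the same people at the same time.
new_test = np.array([12, 15, 9, 20, 14, 18, 7, 16])
gold_standard = np.array([30, 35, 25, 45, 33, 41, 20, 38])

r, p = pearsonr(new_test, gold_standard)
print(f"concurrent validity coefficient r = {r:.2f} (p = {p:.3f})")
# For predictive validity, correlate the test with an outcome
# (the criterion) measured at a later point in time instead.
```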
Tools for reliability
Test-Retest Reliability: The consistency of results when the same test is administered to the same group at different points in time.
Inter-Rater Reliability: The level of agreement between different observers or raters.
Internal Consistency: The consistency of results across items within a test or measure; different items testing the same construct.
Note for test-retest: changes in values may reflect actual changes in the construct rather than unreliability.
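A minimal test-retest sketch (assuming numpy/scipy and toy scores for the same group at two time points): the correlation between the two administrations estimates stability.

```python
import numpy as np
from scipy.stats import pearsonr

# Same scale administered to the same respondents at two time points.
time1 = np.array([22, 30, 18, 25, 27, 35, 20, 29])
time2 = np.array([24, 29, 17, 26, 28, 33, 21, 30])

r, _ = pearsonr(time1, time2)
print(f"test-retest reliability r = {r:.2f}")
# Caveat from the note above: a low r may reflect true change in the
# construct between administrations, not just measurement unreliability.
```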
Inter-rater agreement: two data collectors - Cohen's kappa (κ); three or more data collectors - intra-class correlation (ICC).
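A hedged sketch of Cohen's kappa for two raters (plain numpy; the binary ratings are made-up toy data), computed as observed agreement corrected for chance agreement: κ = (p_o − p_e) / (1 − p_e).

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters beyond chance."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    labels = np.union1d(rater1, rater2)
    p_o = np.mean(rater1 == rater2)  # observed agreement
    # Expected chance agreement from each rater's marginal label frequencies.
    p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in labels)
    return (p_o - p_e) / (1 - p_e)

# Toy data: two data collectors classifying the same 10 cases (1 = present).
r1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
r2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(f"Cohen's kappa = {cohens_kappa(r1, r2):.2f}")
```

If a library implementation is preferred, scikit-learn's sklearn.metrics.cohen_kappa_score computes the same statistic.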
Measures of Internal consistency
- Corrected item-total correlation - correlation of each item with the sum of the remaining items in the scale
- Split-half reliability - Correlation between scores on two subsets of questions when an original set of items on a given topic is split in half
- Cronbach's alpha (α)
- KR-20: Kuder-Richardson formula 20 - the equivalent of Cronbach's alpha for dichotomous (yes/no) items
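A minimal sketch of these internal-consistency measures in plain numpy (the respondents × items matrix is made-up toy data; the formulas are the standard textbook ones, not taken from a specific library):

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def split_half(items):
    """Correlate odd vs even item halves, then apply the Spearman-Brown
    correction r_sb = 2r / (1 + r) to estimate full-length reliability."""
    items = np.asarray(items, dtype=float)
    half1, half2 = items[:, ::2].sum(axis=1), items[:, 1::2].sum(axis=1)
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)

def corrected_item_total(items):
    """Correlation of each item with the sum of the *other* items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return [np.corrcoef(items[:, j], total - items[:, j])[0, 1]
            for j in range(items.shape[1])]

# Toy data: 6 respondents x 4 Likert items on the same construct.
X = np.array([[4, 5, 4, 5],
              [2, 3, 2, 2],
              [5, 5, 4, 4],
              [1, 2, 1, 2],
              [3, 3, 4, 3],
              [4, 4, 5, 5]])

print("Cronbach's alpha:", round(cronbach_alpha(X), 2))
print("Split-half (Spearman-Brown):", round(split_half(X), 2))
print("Corrected item-total:", np.round(corrected_item_total(X), 2))
# KR-20 is the same alpha formula applied to 0/1 (dichotomous) items,
# where each item's variance reduces to p*q.
```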