Assessment - Concept Mapping - Group 3
Reliability - Consistency of results produced by measurement devices
Test length - longer tests are preferred as they reduce random measurement error
Test environment - external factors (e.g. noise, distractions)
Item quality - poorly written or ambiguous questions reduce reliability
Test-retest reliability: measures consistency of scores over time
Inter-rater reliability: how consistently different graders score the same test (improve by using rubrics with clear grading criteria)
Internal consistency reliability: measures how well different parts of a test measure the same concept (see the computation sketch after this list)
Alternate forms/versions - consistency of results between two or more different versions of an assessment (make sure the versions are equal in difficulty and perform statistical equating for minor differences between them)
Improving Reliability: pilot testing, standardized administration, refining test items, increasing test length, training for raters
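A minimal computation sketch of two of the reliability types listed above, assuming Python with NumPy and made-up score data; the numbers and variable names are illustrative only, not part of the original map.

```python
import numpy as np

# Test-retest reliability: correlate the same students' scores from two occasions.
# Hypothetical scores for eight students.
scores_time1 = np.array([78, 85, 62, 90, 71, 88, 66, 80])
scores_time2 = np.array([75, 88, 65, 92, 70, 85, 70, 78])
test_retest_r = np.corrcoef(scores_time1, scores_time2)[0, 1]

# Internal consistency (Cronbach's alpha): rows = students, columns = items (1 = correct).
item_scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
k = item_scores.shape[1]                               # number of items
item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
cronbach_alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"Test-retest r: {test_retest_r:.2f}")
print(f"Cronbach's alpha: {cronbach_alpha:.2f}")
```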
Validity - degree to which an assessment accurately measures the content it is intended to measure
Test content - How well a test covers the topics it's supposed to measure. It ensures that the questions reflect what students are actually expected to learn.
Response Processes - Whether the thought process used by test-takers matches what the test is trying to measure. In other words, does the test truly assess the skills or knowledge it aims to evaluate, without irrelevant features distracting test-takers?
Internal Structure - Whether the different parts of the test work together as expected to measure what they claim to measure. This is to make sure that the test is structured correctly and fairly.
Relations to Other Variables - Whether test results align with other related or unrelated measures as expected. For example, if a math test is valid, scores should relate to past math performance but not necessarily to unrelated skills like art or music (see the correlation sketch after this section). Sometimes prior knowledge of a sport like baseball is assumed in physics test items, so fairness needs to be checked.
Comparative data - reliability and validity established in one country are not necessarily the same in another country
Comparative analysis (even for the same content) cannot be performed in the following cases:
different countries
different levels within a country (e.g., middle school vs. college)
different types of assessments (e.g., SAT subject tests vs. AP exam scores, or ACT vs. SAT)
different forms of assessment (e.g., project-based or performance-based tasks, such as labs, vs. paper-and-pencil tests)
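A hedged illustration of the "Relations to Other Variables" evidence described above: with hypothetical data, scores on a valid math test should correlate strongly with prior math grades (a related measure) and only weakly with an unrelated measure such as art grades. All numbers and names below are invented for the example.

```python
import numpy as np

# Hypothetical scores for eight students.
math_test   = np.array([72, 85, 60, 90, 78, 55, 88, 67])
math_grades = np.array([70, 88, 58, 92, 75, 60, 85, 65])  # related measure
art_grades  = np.array([80, 62, 75, 70, 90, 68, 64, 85])  # unrelated measure

convergent_r   = np.corrcoef(math_test, math_grades)[0, 1]  # expected to be strong
discriminant_r = np.corrcoef(math_test, art_grades)[0, 1]   # expected to be much weaker

print(f"Correlation with prior math grades: {convergent_r:.2f}")
print(f"Correlation with art grades:        {discriminant_r:.2f}")
```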
Discrimination Index - a measure of how well a test item distinguishes between students who do well on the test overall and those who do poorly
helps to determine which questions are satisfactory
determine which questions need to be eliminated
helps determine if the assessment is measuring accurately
ranges from -1.0 to 1.0; a value of 0.3 or higher is generally considered acceptable
Calculate by subtracting the number of correct answers among the lowest-scoring 27% of students from the number of correct answers among the highest-scoring 27%, then dividing by the number of students in one group (see the sketch below)
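A small sketch of the discrimination-index calculation described above, assuming the common convention of comparing the top and bottom 27% of students and dividing by the size of one group; the student data are hypothetical.

```python
# Each entry is (student's total test score, 1 or 0 for whether this item was answered correctly).
students = [
    (95, 1), (92, 1), (90, 1), (88, 0), (85, 1), (82, 1),
    (70, 1), (68, 0), (65, 1), (60, 0),
    (55, 0), (50, 1), (48, 0), (45, 0), (40, 0), (35, 0),
]

# Rank students by total score and take the top and bottom 27%.
students.sort(key=lambda s: s[0], reverse=True)
group_size = max(1, round(0.27 * len(students)))
upper = students[:group_size]
lower = students[-group_size:]

# D = (correct answers in upper group - correct answers in lower group) / group size.
upper_correct = sum(correct for _, correct in upper)
lower_correct = sum(correct for _, correct in lower)
discrimination_index = (upper_correct - lower_correct) / group_size

print(f"Discrimination index: {discrimination_index:.2f}")  # 0.30 or higher is usually acceptable
```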