Module 15: developing measures of typical performance
measures of typical performance assess an individual's typical preferences, or how he or she normally behaves or performs
examples of typical performance tests include personality inventories, attitude surveys, and self-reports of behavior
Necessary forewarning
minor changes in question wording, item format, response options, or the ordering of questions can result in major changes in the responses obtained from measures of typical performance
respondents try to make sense of the question by using the rules of "cooperative conversational conduct"
test specifications
this is the first step in developing a typical performance test. its primary focus is to define the construct of interest and to delineate it from related (but distinct) constructs. specifying the construct in great detail makes the process of writing items far simpler and makes it easier to build a case for strong construct validity. it also encourages test takers to take the test more seriously
free-response versus constructed-response items
free-response items may be too time consuming, and respondents may find it difficult to answer the question adequately
the benefit of free-response items over constructed-response items is that free-response items let test takers answer however they want, without being influenced by the response options the test administrator supplies or the direction the administrator wants them to go. the test taker could go in a direction that is completely abstract and not at all what the test administrator had in mind
another issue with free-response items is that the test taker may go in a direction the test administrator did not intend. respondents may also have difficulty articulating themselves if there is a language barrier or if the test taker has cognitive deficits
an issue with constructed-response items is that they may offer answers the respondent would never have come up with on their own. they also need to be pilot tested to verify that respondents answer them in the way the test developer intended
additional test specification issues
does the ordering of items influence responses?
sometimes. order effects are relatively rare, but when they do occur they influence a large number of the responses test takers supply. order effects do not necessarily push all respondents toward the same response; rather, responses may be pulled in many different directions
order effects are most likely to occur when the test contains several items that ask essentially the same question in different ways
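one common safeguard against order effects (a general practice, not one these notes prescribe) is to present the items in an independent random order to each respondent, so that any order effect averages out across the sample. a minimal Python sketch, with invented example items:

```python
import random

items = [
    "I enjoy meeting new people.",
    "I prefer quiet evenings at home.",
    "I often take the lead in group settings.",
]

def randomized_form(items, seed=None):
    """Return a copy of the item list in a random order for one respondent.

    A seed can be recorded per respondent so the exact form each
    person saw can be reconstructed later.
    """
    rng = random.Random(seed)
    form = list(items)
    rng.shuffle(form)
    return form

# each respondent gets an independently shuffled form
print(randomized_form(items, seed=1))
print(randomized_form(items, seed=2))
```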
should items include a "don't know" response option?
sometimes this option makes the validity of the test stronger, and sometimes lower. oftentimes the option is offered on questions about issues the respondents may never have thought about, and when respondents take this option the data can show larger variance. i would side with making respondents choose an answer, even if they have not thought about the issue before; this also reduces the error variance in the final statistical analysis
will respondents make up a response if they know nothing about the question?
if there is no "don't know" option, respondents will most likely try to work out a reasonable answer rather than taking the easy route of choosing "don't know"
does acquiescence influence responses in attitude measurement?
acquiescence refers to the tendency of a respondent to agree with an attitude statement. this is most common when tests use a Likert scale. one suggested remedy is that, rather than providing a Likert scale, items could simply be answered in a yes/no format
item writing
the more items that can be generated during the brainstorming phase the better, because a large portion of them will often be discarded during review by subject matter experts (SMEs)
keep items as simple as possible:
respondents are likely to differ in education level, as well as in vocabulary and language abilities
avoid or define ambiguous terms:
respondents are often unfamiliar with terms that may be considered commonplace to the test developers. this concern speaks to the importance of pilot testing both items and instructions
assess choices respondents would make today, not what they plan to do in the future:
for example, inquire whom an individual would vote for if the election were held today, not whom they plan on voting for in an upcoming election. while individuals are notoriously poor at predicting their own future behavior, they can report what they would do right now.
carefully consider the advantages and disadvantages of using reverse coded items:
in an effort to guard against acquiescence and random responding, test development experts once routinely recommended that one-third to one-half of items be reverse coded. reverse-coded items are worded such that a favorable attitude requires respondents to disagree with them. this practice is no longer universally recommended, however. summarizing a large number of studies on the use of reverse-coded items, Hughes urges caution in the use of such items. among the concerns: reverse-coded items can have unexpected impacts on factor structure, such as the formation of separate factors composed solely of reverse-coded items
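if reverse-coded items are used, they must be re-keyed before scoring. a minimal sketch of the standard reverse-scoring formula (new value = scale minimum + scale maximum − response); the item names and responses are invented:

```python
def reverse_code(response, scale_min=1, scale_max=5):
    """Reverse-score a Likert response: on a 1-5 scale, 1 <-> 5, 2 <-> 4."""
    return scale_min + scale_max - response

# one respondent's raw answers; item2 is worded in the reverse direction
responses = {"item1": 4, "item2_reversed": 2, "item3": 5}
reverse_keyed = {"item2_reversed"}

scored = {
    item: reverse_code(r) if item in reverse_keyed else r
    for item, r in responses.items()
}
print(scored)  # {'item1': 4, 'item2_reversed': 4, 'item3': 5}
```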
ensure that response options (if provided) are logically ordered and mutually exclusive
keep in mind that respondents often view the scale midpoint as a neutral point or typical amount:
this is especially important to keep in mind when assessing frequency of behavior. balancing negative and positive response options can be helpful
include an "undecided" or "no opinion" response option along with the response scale
what to avoid when item writing
awkward vocabulary or phrases:
acronyms in particular should be avoided, as understanding of an acronym may not be universal in the sample. (the use of acronyms could be a major snafu for your data collection effort and may make your results fubar.) likewise, pay close attention to idiomatic phrases ("it's raining cats and dogs"), as the figurative meaning of the phrase may be lost on some respondents
double barreled items:
these are items that assess more than one thing. for example, "my favorite classes in high school were math and science."
double negatives:
respondents required to respond on an agreement scale often experience difficulty interpreting items that include the word "not"
false premises:
these are items that make a statement and then ask respondents to indicate their level of agreement with a second statement. for example, "although dogs make terrific pets, some dogs just don't belong in urban areas." if a respondent does not agree with the initial statement, how should he or she respond? notice that this item has the further complication of including a double negative
leading or emotionally loaded items:
these items implicitly communicate what the "right" answer should be. for example, "do you support or oppose restrictions on the sale of cancer-causing tobacco products to our state's precious youth?" the use of leading items is sometimes appropriate, however, when respondents might otherwise be uncomfortable reporting an attitude or behavior that might be considered socially deviant (e.g., self-reports of sexual practices)
asking questions about which the respondent is likely to have very little interest:
researchers all too often administer surveys to participants with little or no interest in the topic. one author of this textbook recently participated in a phone survey, sponsored by a local municipality, about converting wastewater into drinking water. while the author had never previously considered this topic, he was able to respond fairly confidently to the first few questions. twenty minutes later, however, when the phone interviewer was still inquiring about various attitudes on the topic, the quality of the responses provided could surely be considered questionable
rational or empirical test development
rational development is concerned with the internal integrity of the test
basically, rational development asks whether we are measuring the construct we are intending to measure, much like internal validity
empirically developed items are more focused on how well the test measures up against an external criterion of interest
the items that pass both the internal criterion testing (seen above) and the external criterion are then used to develop future tests, because they have proven to be good measures
Pilot testing
at a minimum, the test should go through a think-aloud study to determine whether it is appropriate for use