Coding and observation

Observational methods

Limitations
Questionnaires are of limited use (only applicable to humans)
Apparatus limited to animals habituated to its use (e.g. Skinner box)
Context-dependent phenomena can't be brought into the lab (e.g. riots & crowds)

Observation

Steps (Martin & Bateson, 2007)
Observational stream
Experimental stream

Observational stream
Ask question
Observe informally (ad libitum sampling)
Choose measures
Choose recording method
Collect and analyse data

Experimental stream
Hypothesise
Predict
Design
Experiment
Analyse
Interpret

Ask question
Usually, the more that is known in a published area, the more sophisticated the question
BUT simple questions are necessary

Observe informally
Get to know the population
Understand what's typical
Immerse

Choose measures
Don't code behaviour unrelated to the question
What you want to measure vs. what you can measure
Decide what to measure
Recording methods - how to sample behaviour

Operational definitions
specify physical requirements for coding
(e.g. button press, finger through cage, etc.)


Ostensive definitions
diagrams, descriptions of behaviour
(e.g. solitary vs. coordinated play)


Classify measures as events or states
Events - occurrences as points in time
States - long-duration events, e.g. sleep or play

Ethogram
Operationalised list of behaviours and how to code them
Enables replication (a sketch of one possible representation follows)
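
A minimal sketch of how an ethogram might be represented for coding software; the behaviour codes and definitions below are hypothetical examples, not from the source:

```python
# Hypothetical ethogram as a lookup table: each code pairs an
# operational definition with its classification as event or state.
ETHOGRAM = {
    "PECK":  {"definition": "beak contacts the feeder surface",  "type": "event"},
    "GROOM": {"definition": "bill drawn through feathers",       "type": "state"},
    "SLEEP": {"definition": "eyes closed, head tucked for >5 s", "type": "state"},
    "OTHER": {"definition": "any behaviour not listed above",    "type": "event"},
}

def describe(code: str) -> str:
    """Return the operational definition for a coded behaviour."""
    entry = ETHOGRAM[code]
    return f"{code} ({entry['type']}): {entry['definition']}"

print(describe("GROOM"))  # GROOM (state): bill drawn through feathers
```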

Types of measures
Latency - how long the subject takes to respond to a stimulus
Frequency - number of occurrences during the observation interval (always report per unit of time, except when intervals are equal across conditions)
Rate - frequency per unit time
Duration - total time of a single occurrence during the observation interval
Proportion - e.g. 30 minutes out of 45 = .67. Note .67 is a proportion, not a percentage (67%); a mean proportion can also be reported
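
A sketch of how these measure types could be computed from timestamped bouts; the bout times, interval length, and stimulus onset are made-up values for illustration:

```python
# Made-up (start, end) bout times in seconds, within a hypothetical
# 45-minute (2700 s) observation interval.
bouts = [(120.0, 150.0), (600.0, 700.0), (2000.0, 2030.0)]
interval = 2700.0        # observation interval length (s)
stimulus_onset = 100.0   # time the stimulus was presented (s)

latency = bouts[0][0] - stimulus_onset      # time to first response
frequency = len(bouts)                      # occurrences in the interval
rate = frequency / interval                 # occurrences per second
durations = [end - start for start, end in bouts]
total_duration = sum(durations)             # total time in the state (s)
proportion = total_duration / interval      # 0.059..., a proportion, not a %

print(latency, frequency, rate, total_duration, round(proportion, 3))
```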

Scales of measurement
Non-parametric
Nominal - categorical - is/isn't, presence/absence
Ordinal - ranking - relational concepts (greater than; 1st, 2nd, 3rd)


Parametric
Interval (0 is arbitrary) - 0 degrees is not zero heat; 60 degrees is not twice 30, but it is 30 more
Ratio (continuous) - 0 is real; multiplication and division apply (60 is twice 30)

Sampling

Ad libitum ("at leisure") - preferred method for an initial study; misses rare events and short durations, and underestimates the effect of smaller events
Focal sampling - a specific individual, dyad, etc. is chosen; large bias if that individual does (or doesn't do) something others do, or does it in private
Scan sampling - the group is sampled; same biases as ad libitum
Behaviour sampling - all-occurrences sampling: record every time something happens; possible to overestimate small events

Recording methods
Time sampling (sample periodically) - the two rules below are contrasted in the sketch after this block
Instantaneous sampling - is the behaviour happening at this instant?
One-zero sampling - did the behaviour occur at any point during the interval, yes/no?


Continuous recording, e.g. Skinner box
High-fidelity records are labour-intensive, meaning less coding can be done
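
A sketch contrasting the two time-sampling rules; the bout times, sample period, and session length are hypothetical:

```python
# Instantaneous sampling asks: is the behaviour occurring at the sample
# instant? One-zero sampling asks: did it occur at any point during the
# interval? One-zero tends to overestimate time spent in the behaviour.
bouts = [(5, 9), (22, 23), (41, 58)]  # hypothetical (start, end) times, s
sample_every = 15                      # sample period (s)
session = 60                           # session length (s)

def occurring_at(t):
    return any(start <= t < end for start, end in bouts)

def occurred_during(t0, t1):
    return any(start < t1 and end > t0 for start, end in bouts)

for t in range(0, session, sample_every):
    print(f"t={t:2d}s  instantaneous={occurring_at(t)}  "
          f"one-zero={occurred_during(t, t + sample_every)}")
```

Here the behaviour occupies 22 of 60 seconds (37%), but instantaneous sampling scores 1 of 4 samples (25%) and one-zero scores 4 of 4 intervals (100%).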

Coding

Reliability

Coding scheme = measuring instrument
Coding schemes provide data (always numerical)
Coding scheme → data → analysis

A scheme can be more or less accurate, and more or less precise
It is not efficient to use a scheme more (or less) accurate than required


It is not efficient to use a scheme more (or less) precise than required
e.g. for "taller or shorter?" you probably don't need 4 decimal places

Example: crying yes/no


Misses intensity of crying (could code quiet 0, whimper 1, scream 2)
Misses duration
Misses intensity relative to contextual events (could subdivide by stranger/caregiver)

Coding scheme design


Mutually exclusive - each behaviour fits exactly one category
Exhaustive - all behaviours have categories (use "other")
Can use parallel coding schemes


Although using tested coding schemes is good practice, it limits your questions to those the designer had in mind.

Precision is a property of the coding scheme, not the observer (although observer imprecision will show up as lower reliability)
Demonstrating good reliability in one context with one set of observers does not necessarily mean the scheme can be used elsewhere.

Intra-observer reliability


The same observer codes the same behavioural record at different times; only possible with recordings of some sort

Inter-observer reliability


Different observers code the same behaviour
No standard, but typically 15% of the record is independently coded

Can be assessed at the beginning
Once reliability has been established, several observers can then code independently


Can be assessed at the end
Risky, as the coding may turn out to be faulty and the time wasted


Can be assessed during coding

Consensus estimates are based on the assumption that two (or more) coders can come to exact agreement (percent agreement, kappa)


Consistency estimates are based on the assumption that coders need not agree exactly, only remain consistent within their own understanding (Pearson's r, Spearman's rho; Cronbach's alpha for two or more observers)
Perfect correlation between observations, even if the values differ
e.g. obs 1 says 4, 6, 8; obs 2 says 5, 7, 9; obs 3 says 6, 8, 10 - the relative magnitudes are the same
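
A minimal check of that example using Pearson's r (via statistics.correlation, available in Python 3.10+):

```python
# The three observers give different raw values but the same relative
# ordering, so their pairwise correlations are perfect (r = 1.0).
from statistics import correlation  # Python 3.10+

obs1 = [4, 6, 8]
obs2 = [5, 7, 9]
obs3 = [6, 8, 10]

print(correlation(obs1, obs2))  # 1.0 - consistent despite different values
print(correlation(obs1, obs3))  # 1.0
```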

Consensus
Percent agreement (doesn't account for agreement by chance)
Cohen's kappa (does take chance agreement into account)


Consistency
Correlation coefficient (doesn't take into account variance between coders)
Cronbach's alpha (corrects for variance; can assess more than 2 coders)
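
A sketch of Cronbach's alpha treating each coder as an "item" and each coded segment as a case; the ratings below are made-up:

```python
# alpha = (k / (k - 1)) * (1 - sum of per-coder variances / variance of
# per-segment totals). Population variance is used throughout; the ratio
# is the same whichever variance estimator is used consistently.
from statistics import pvariance

ratings = {  # hypothetical ratings of the same five segments
    "coder1": [2, 4, 3, 5, 1],
    "coder2": [3, 4, 3, 5, 2],
    "coder3": [2, 5, 4, 5, 1],
}
k = len(ratings)
item_vars = sum(pvariance(r) for r in ratings.values())
totals = [sum(vals) for vals in zip(*ratings.values())]
alpha = (k / (k - 1)) * (1 - item_vars / pvariance(totals))
print(round(alpha, 3))  # 0.959 - high consistency across the 3 coders
```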

Kappa
Pa = observed proportion of agreement
Pc = proportion of agreement expected by chance (the two coders' "yes" proportions multiplied, added to the product of their "no" proportions)


Kappa = (Pa - Pc) / (1 - Pc)
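
A worked sketch of the formula above for two coders; the yes/no codes are made up:

```python
# Cohen's kappa for two coders on made-up yes/no codes.
coder_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
coder_b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]
n = len(coder_a)

# Pa: observed proportion of agreement.
p_a = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Pc: chance agreement - product of the "yes" proportions plus the
# product of the "no" proportions.
a_yes = coder_a.count("yes") / n
b_yes = coder_b.count("yes") / n
p_c = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)

kappa = (p_a - p_c) / (1 - p_c)
print(p_a, p_c, kappa)  # 0.75 0.5 0.5
```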

No absolute ranges for what counts as good; Landis & Koch's benchmarks are widely quoted

Bias: “tendency of a measurement process to over- or under-estimate the value of a population parameter”

Marginal homogeneity
Kappa becomes more conservative as marginal homogeneity increases
Marginals are homogeneous when the yes/no totals at the edges of the table are nearly the same
Result: the same percentage agreement yields a lower kappa

Trait prevalance
Kappa becomes more conservative as the prevalence of a trait becomes very high or very low
i.e. a single cell (e.g. the yes/yes agreements) is notably larger than the others
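
A worked sketch of the prevalence effect with hypothetical counts: the coders agree on 90 of 100 cases, but because "yes" is rare, chance agreement is high and kappa is low:

```python
# 2x2 table of hypothetical counts: rows = coder A, columns = coder B.
yes_yes, yes_no, no_yes, no_no = 2, 4, 6, 88
n = yes_yes + yes_no + no_yes + no_no

p_a = (yes_yes + no_no) / n                      # 0.90 observed agreement
a_yes = (yes_yes + yes_no) / n                   # coder A's "yes" marginal
b_yes = (yes_yes + no_yes) / n                   # coder B's "yes" marginal
p_c = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # ~0.87 chance agreement
kappa = (p_a - p_c) / (1 - p_c)
print(round(p_a, 2), round(p_c, 3), round(kappa, 3))  # 0.9 0.87 0.233
```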