OBJECT RECOGNITION IN CONTEXT
B+B2 LECTURE 4
Search in the real world
When humans need to perceive multiple items simultaneously, performance deteriorates rapidly as a function of the number of items in the array.
Search in the real world is more like conjunction search.
Object detection in scenes is effortless
Real-world scenes contain a large number of distractor objects that are similar to potential targets on various dimensions.
However, in real-world scenes we seem to detect and recognise objects effortlessly, quite unlike what is found in search with simple stimuli.
Humans are very accurate in detecting and recognising objects within scenes, even when scenes are presented very briefly.
(Potter, 1976)
This suggests that scene backgrounds are not particularly detrimental to performance.
To test how effortful object recognition in natural scenes really is, one can manipulate the amount of attention available for doing the task
(Li et al., 2002)
Here, participants are required to do an animal detection task in the periphery or a letter detection task at fixation - or both at the same time.
If you do both at the same time, then you should have less attention available for each of the tasks, and task performance should therefore decrease (if the task depends a lot on attention).
Interestingly, binding attention at fixation during the dual task does not deteriorate performance in the object detection task to a great extent.
This suggests that not much attention is needed to perform object detection in the wild.
Compared to a simple task
If you give participants a peripheral task that requires discriminating a simple visual feature conjunction, things look quite different.
Performance on this task deteriorates sharply once participants are simultaneously engaged in the central task.
Fast detection of objects in the visual system
Thorpe et al. (1996)
Show an image for 20ms
Participants have to indicate whether there was an animal present in the scene or not.
EEG waveforms
Over the first 100-150 ms, the waveforms for animal and non-animal trials look quite similar.
After that, the waveforms diverge quickly depending on whether an animal was present or not.
In these tasks, the first signatures of object detection in scenes can be found as early as 150 ms post-stimulus.
This suggests that object information is extracted rapidly, despite variations in category examples and the presence of a scene background.
We are very efficient at these tasks, and good at perceiving objects embedded in complex scenes.
Processing Multiple Objects
In natural scenes, objects appear together with many others.
Different objects need to compete for processing resources
Competing for cortical resources
When multiple objects are present, neural processing is reduced, in turn hampering perceptual efficiency.
The more objects, the more severely processing should be impaired.
Single-cell recording in monkey brains - Similar results found in humans (fMRI)
When multiple objects are presented simultaneously, rather than sequentially, responses are reduced.
Kastner & Ungerleider (2001)
The researchers either presented four little texture patches, one after the other, each one for a quarter of a second, or they presented them all together for 250ms.
Participants fixated a point in the centre, while the periphery showed either the full stimulus or the four individual elements.
Sequential condition: over the period of one second, the stimuli are assumed to add up to the same total stimulation as the combined stimulus (the bottom left-hand picture).
However, at no time are there two pictures on the screen simultaneously in the sequential condition.
Stimuli should compete in the combined condition.
Higher responses in the sequential condition than in the simultaneous condition.
Processing Meaningful Object Groups
Researchers have used objects that were interacting with each other or just shown side by side to see the effects on competition.
Conditions: novel interaction, familiar interaction, novel side-by-side, and familiar side-by-side.
Objects that interact meaningfully yield stronger responses, suggesting that neural competition is relieved for objects that are interacting.
As many objects in real-life scenes go together in meaningful ways, we could expect that competition is greatly reduced.
Response normalisation for multiple objects
When multiple objects are present, information about them is processed independently in the visual system.
A sign of this concurrent, independent processing is the normalisation of response patterns across objects.
The response you observe is a mixture of the responses that you would get from one object when it is alone and the other object when it is alone.
Take the two responses of the individual objects and average them to predict the response of showing the objects simultaneously.
The response pattern for multiple objects can be described by the average of the individual patterns.
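The averaging rule can be sketched in a few lines of Python. This is only an illustration: the voxel values are made-up numbers, and `predict_pair_pattern` is a hypothetical helper name, not something taken from the lecture or from any library.

```python
def predict_pair_pattern(pattern_a, pattern_b):
    """Predict the response to two objects shown together as the
    element-wise average of the responses to each object alone."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same voxels")
    return [(a + b) / 2 for a, b in zip(pattern_a, pattern_b)]


# Made-up responses of four voxels (arbitrary units), for illustration only.
face_alone = [2.0, 0.5, 1.0, 0.2]
house_alone = [0.4, 1.8, 0.6, 1.4]

predicted_pair = predict_pair_pattern(face_alone, house_alone)
print(predicted_pair)
```

Each voxel's predicted value lies halfway between its two single-object responses, which is what "a mixture of the responses" means in its simplest, unweighted form.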
Multi-object patterns are biased by attention.
In real-life situations, directing our attention to particular objects may favour the attended over the unattended object.
Experimentally, one can test this by asking participants to attend to one object or to the other (either the face or the house).
When attention is directed to one of two objects in a pair, the resulting response pattern contains more of the response to the attended object.
Attention thereby biases perception in favour of the behaviourally relevant information, alleviating competition between representations.
This comes at the expense of the unattended items.
Data were collected from object-selective cortex, specifically the lateral occipital cortex, of these participants.
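One way to picture the attentional bias is as a weighted average in which the attended object gets a weight above 0.5. The sketch below assumes exactly that model (it is an assumption, not necessarily the analysis used in the study), uses made-up voxel values, and introduces the hypothetical helpers `mix` and `estimate_attention_weight`.

```python
def mix(attended, unattended, weight):
    """Weighted average; `weight` is the share of the attended object."""
    return [weight * a + (1 - weight) * u for a, u in zip(attended, unattended)]


def estimate_attention_weight(pair, attended, unattended):
    """Least-squares estimate of w in: pair = w*attended + (1-w)*unattended."""
    diff = [a - u for a, u in zip(attended, unattended)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("single-object patterns are identical")
    numer = sum((p - u) * d for p, u, d in zip(pair, unattended, diff))
    return numer / denom


# Made-up single-object patterns (arbitrary units), for illustration only.
face_alone = [2.0, 0.5, 1.0, 0.2]
house_alone = [0.4, 1.8, 0.6, 1.4]

# Simulate a pair response while the face is attended (assumed weight 0.7).
pair_attend_face = mix(face_alone, house_alone, 0.7)

w = estimate_attention_weight(pair_attend_face, face_alone, house_alone)
print(round(w, 3))  # recovers the simulated weight, 0.7
```

A weight of 0.5 corresponds to the plain average with no attentional bias; values above 0.5 mean the pair pattern contains more of the attended object's response.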
Kaiser & Peelen (2018)
The configuration of multiple objects has an influence on cortical processing.
This can be tested by using objects that are configured in typical or atypical pairs and then measuring how closely the responses to these pairs match the average of the responses to the two constituent objects.
E.g. TV located behind the sofa (atypical)
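The comparison between a pair response and the average of its constituents can be sketched as a correlation. Pearson correlation is used here as one plausible similarity measure (an assumption; the study's exact metric is not given in these notes), and all numbers are toy values chosen only to exercise the function.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant pattern")
    return cov / sqrt(var_x * var_y)


def match_to_average(pair, object_a, object_b):
    """How closely a pair pattern matches the average of its two objects."""
    average = [(a + b) / 2 for a, b in zip(object_a, object_b)]
    return pearson(pair, average)


# Toy single-object patterns (arbitrary units).
sofa_alone = [1.0, 2.0, 3.0, 4.0]
tv_alone = [3.0, 2.0, 1.0, 4.0]

# A pair pattern that equals the average, and one that deviates from it.
pair_like_average = [2.0, 2.0, 2.0, 4.0]
pair_unlike_average = [3.0, 1.0, 2.0, 2.5]

print(match_to_average(pair_like_average, sofa_alone, tv_alone))
print(match_to_average(pair_unlike_average, sofa_alone, tv_alone))
```

A lower value for one pair type than for another would indicate that those pairs are represented less like a simple average of their parts.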
Multi-object grouping in the visual cortex
Consistent combinations: familiar combinations may no longer be processed individually, but as one object.
Maybe there is a whole other set of neurons responsible for this.
In object-selective cortex (OSC), but not early visual cortex (EVC), the typically positioned pairs are more dissimilar from the average of the responses to the individual objects.
This suggests that meaningful groups of objects are more than the sum of their parts.
In OSC, they are processed together as a meaningful ensemble.
As a consequence, fewer objects need to be processed at a time.
Less strain on processing systems.
Natural scenes and competition for processing
Multi-object processing is a challenge in natural scenes, because a large number of objects compete for a limited amount of processing resources.
Competition can be reduced by (1) directing attention in smart ways, and (2) grouping objects that go together in meaningful ways.
Processing Objects within scenes
Objects are embedded in meaningful context.
Context matters for object processing.
Scene context influences object processing.
When objects are presented in intact scenes they are identified more correctly than when they are presented in jumbled scenes.
This shows that scene context (although essentially irrelevant) impacts object perception.
Semantic Relationships matter
Congruence of the object and the scene is important.
To test this, researchers have manipulated the semantic relationship between scenes and objects.
Objects that are placed within consistent scenes are identified more correctly.
Scenes are identified more correctly when they contain a semantically consistent object.
Semantic congruency facilitates interactions between object and scene processing: the two systems are linked.
On a neural level...
Researchers have used EEG while manipulating the congruence of objects and scenes, along semantic (i.e. content) and syntactic (i.e. placement) dimensions.
Congruence matters
Violations of scene-object congruence lead to characteristic deflections in EEG waveforms.
Most prominently, semantic violations lead to deviations in the N300 and N400 components.
Syntactic violations are coded later, and also prominently in the P600.
Context enhances "noisy" object processing
One could assume that context enhances object processing on a neural level.
This should be particularly pronounced in cases where there is some ambiguity or noise in the visual input.
First fMRI evidence for this came from interactions between face and body processing.
The researchers presented blurred faces in isolation or together with a typical versus displaced body “context”
FFA activations to noisy faces are strongest when they are presented together with a body and the face is positioned correctly.
Similar findings for objects in scenes.
Here, participants saw two types of degraded, “noisy” objects (animate vs inanimate)
These objects were presented with or without scene context.
The critical test is whether the type of object can be decoded from fMRI response patterns.
They found that a classifier can tell the animate and the inanimate objects apart, even though the objects are degraded.
You can also classify between the scenes that contained the animate objects and the scenes that contained the inanimate objects.
You can also let the classifier discriminate between the animate and the inanimate objects when they are embedded in the scene.
Strong increase in classification accuracy.
So when objects are embedded in the scene, the classifier is much more successful in telling whether it is an animate or an inanimate object.
Critically, decoding animate vs. inanimate objects works much better when they’re embedded within the scene.
Scene context thus directly facilitates the neural representation of objects.
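The decoding logic described above can be sketched with a very simple classifier. The example below swaps in a nearest-centroid classifier with leave-one-out cross-validation (the notes do not say which classifier the study used), and the two-voxel patterns are invented purely for illustration; `loo_accuracy` is a hypothetical helper name.

```python
def centroid(patterns):
    """Mean pattern across trials (column-wise mean)."""
    n = len(patterns)
    return [sum(column) / n for column in zip(*patterns)]


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def loo_accuracy(patterns, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (test_pattern, test_label) in enumerate(zip(patterns, labels)):
        # Train on every trial except the held-out one.
        train = [(p, l) for j, (p, l) in enumerate(zip(patterns, labels)) if j != i]
        centroids = {
            label: centroid([p for p, l in train if l == label])
            for label in sorted({l for _, l in train})
        }
        predicted = min(
            centroids, key=lambda lab: squared_distance(test_pattern, centroids[lab])
        )
        correct += predicted == test_label
    return correct / len(patterns)


labels = ["animate"] * 3 + ["inanimate"] * 3

# Invented two-voxel patterns: weakly separated without scene context...
isolated = [[1.0, 0.9], [0.8, 1.1], [1.1, 1.0], [0.9, 1.0], [1.0, 0.8], [1.2, 1.1]]
# ...and clearly separated when the degraded object sits in a scene.
in_scene = [[2.0, 0.5], [1.8, 0.6], [2.1, 0.4], [0.5, 2.0], [0.6, 1.8], [0.4, 2.1]]

print("isolated:", loo_accuracy(isolated, labels))
print("in scene:", loo_accuracy(in_scene, labels))
```

Running the same function separately on each condition mirrors the comparison in the notes: higher accuracy for objects in scenes would indicate that scene context sharpens the object representation.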
Bridging the object and scene processing pathways
The scene processing pathway may extract a rapid representation of scene ‘gist’ (i.e. the current context), perhaps from low spatial frequencies (LF)
This representation may refine object representations in the inferotemporal cortex, specifically when they are noisy.
Another pathway has been postulated, where (also based on LF) candidate objects are computed in the frontal cortex - however, this idea is quite disputed.