Chapter 4: Recognizing Visual Objects
object representation
general information
representation refers to pattern of neural activity in brain that contains information about stimulus and gives rise to subjective perceptual experience of stimulus
begins as soon as photoreceptors respond to light
as processing progresses through the visual pathway, representations are constructed that contain information about increasingly complex aspects of retinal image
aka perceptual organization; performs operations involved in identifying those portions of the retinal image that belong to one object or another or the background
principles of perceptual organization are adaptations to physical world in which we evolved; adaptive advantages of more successful navigation of environment have led principles to be built into our visual system
however, many 'general truths' that we have evolved to perceive more easily aren't always true
heuristics: in perceptual organization, rules of thumb based on evolved principles and on knowledge of physical regularities
perceptual inference: in vision, the interpretation of a retinal image using heuristics
visual system tends to interpret given scene as the one that is most probable given current retinal image
steps of perceptual organization
1) represent edges
edge extraction: the process by which the visual system determines the location, orientation, and curvature of edges in the retinal image
based on patterns of responses from neurons in areas V1, V2, and V4
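A minimal sketch of the idea behind edge extraction, assuming a tiny grayscale image stored as a list of rows. It is not a model of V1 neurons; it only shows how comparing neighbouring luminance values can locate a vertically oriented edge. The function names and the threshold are hypothetical.

```python
def vertical_edge_strength(image):
    """For each row, return absolute luminance differences between horizontal neighbours."""
    strengths = []
    for row in image:
        # A large difference between a pixel and its right-hand neighbour
        # marks a vertically oriented luminance edge at that location.
        strengths.append([abs(row[x + 1] - row[x]) for x in range(len(row) - 1)])
    return strengths


def edge_columns(image, threshold=0.5):
    """Return column indices x where an edge lies between columns x and x + 1 in every row."""
    strengths = vertical_edge_strength(image)
    n_cols = len(strengths[0])
    return [x for x in range(n_cols) if all(row[x] > threshold for row in strengths)]


# Dark region on the left, light region on the right (illustrative values).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(edge_columns(image))  # [1]: the edge lies between columns 1 and 2
```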
2) represent uniform regions bounded by edges
occlusion, shading, and shadows all contribute to the image clutter that the visual system must deal with when organizing scenes into regions
edges partition scene into regions that exhibit uniform connectedness
uniform connectedness: a characteristic of regions of the retinal image that have approximately uniform properties
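Partitioning an image into uniformly connected regions can be sketched as connected-component labeling. The sketch below assumes a small grid of integer "property" values and treats neighbouring cells as uniform only when their values are exactly equal, which simplifies the "approximately uniform" criterion above. The function name is hypothetical.

```python
def label_regions(image):
    """Label 4-connected regions of equal value; return (label grid, region count)."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue  # already assigned to an earlier region
            # Flood-fill outward from this cell across same-valued neighbours.
            labels[r][c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels, next_label


image = [
    [1, 1, 0],
    [1, 0, 0],
    [2, 2, 0],
]
labels, count = label_regions(image)
print(count)   # 3
print(labels)  # [[0, 0, 1], [0, 1, 1], [2, 2, 1]]
```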
3) divide regions into figure and ground, assign border ownership
figure: a region of an image that is perceived as being part of an object
ground: a region of an image that is perceived as part of the background
border ownership: the perception that an edge, or border, is “owned” by a particular region of the retinal image
when one object occludes another, border between them belongs to occluding object
different principles that visual system uses to assign border ownership and organize visual scene into figure and ground
depth- when one region perceived to be in front of another, region in front is perceived as owning border between regions and is perceived as figure, while other perceived as ground
surroundedness- if region completely surrounded by another, then surrounded region tends to be perceived as owning border and being the figure
without additional information about relative depth of two regions, figure-on-background perception usually more compelling than hole-in-a-surface perception
symmetry- region with symmetrical borders more likely to be seen as figure than ground
convexity- regions with convex (outward-bulging) borders are more likely to be perceived as figures than are regions with concave (inward-going) borders
thought to reflect that most objects have smooth, convex shapes
meaningfulness- regions with familiar, recognizable shapes tend to be perceived as figure, which implies that the visual system recognizes object shapes prior to assignment of border ownership and determination of figure-ground organization
simplicity- refers to the number and placement of shapes composing the image
tend to see images as containing the fewest shapes, and as containing shapes that would not change if the objects were moved about in relation to each other
area V2 very important in process of assigning border ownership
response of single cell in V2 with preferred orientation is greater when border shown in receptive field is part of figure on left than when it’s part of figure to the right
cell knows where figure is even though almost entire figure lies outside cell's receptive field
conclude that early visual areas (V1 and V2) include specialized networks that allow important information about border ownership and figure-ground organization to be computed and transmitted very rapidly among cells whose combined receptive fields cover large contiguous areas of visual scene
4) group together regions that have similar properties
perceptual grouping: the process by which the visual system combines separate regions of the retinal image that “go together” based on similar properties
attributes of regions that lead to them being grouped together
proximity- elements that are close together group more easily than elements that are far apart
similarity- similar elements tend to group together
common motion- (aka common fate) elements that move in unison are likely to be perceptually grouped
symmetry and parallelism- elements that are symmetrical or parallel tend to group together
good continuation- two edges that would meet if extended are perceived as single edge that has been partially occluded; also applies to curved edges
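As a rough illustration of one of these attributes, the proximity principle can be sketched as clustering by gap size. This assumes elements lie along a single dimension and that a fixed `max_gap` (a hypothetical parameter) decides when two neighbours are "close together"; the visual system's actual criterion is not specified in these notes.

```python
def group_by_proximity(positions, max_gap):
    """Group 1-D positions so that neighbours no more than max_gap apart share a group."""
    groups = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)   # close to the previous element: same group
        else:
            groups.append([p])     # large gap: start a new group
    return groups


dots = [1, 2, 3, 10, 11, 20]
print(group_by_proximity(dots, max_gap=2))  # [[1, 2, 3], [10, 11], [20]]
```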
proposed neural basis for perceptual grouping invokes phenomenon of synchronized neural oscillations- neural spikes often happen in temporal pattern, or oscillation, where spikes come in clumps
if two neurons are representing regions that belong together then those neurons could indicate that the regions belong together by synchronizing their oscillations- producing clumps of spikes at the same time
if two receptive fields with same orientation preference experience bar moving through each at the same time in the same direction, neural oscillations strongly synchronized
suggests that three principles of perceptual grouping (similarity, good continuation, common motion) may be represented by synchronized neural oscillations
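One simple way to put a number on "producing clumps of spikes at the same time" is to count coincident spikes. The sketch below assumes spike trains binned into equal time steps (1 = spike, 0 = no spike) and uses the fraction of active bins shared by both trains as a synchrony index. This measure is an illustrative choice, not the analysis used in the studies summarized above.

```python
def synchrony(train_a, train_b):
    """Fraction of bins with a spike in either train that contain a spike in both."""
    if len(train_a) != len(train_b):
        raise ValueError("spike trains must cover the same number of time bins")
    both = sum(1 for a, b in zip(train_a, train_b) if a and b)
    either = sum(1 for a, b in zip(train_a, train_b) if a or b)
    return both / either if either else 0.0


# Two neurons firing in clumps at the same times vs. at alternating times.
neuron_1 = [1, 1, 0, 0, 1, 1, 0, 0]
neuron_2 = [1, 1, 0, 0, 1, 1, 0, 0]
neuron_3 = [0, 0, 1, 1, 0, 0, 1, 1]
print(synchrony(neuron_1, neuron_2))  # 1.0
print(synchrony(neuron_1, neuron_3))  # 0.0
```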
5) fill in missing edges and surfaces to obtain complete representation of candidate objects
perceptual interpolation: the process by which the visual system fills in hidden edges and surfaces in order to represent the entirety of a partially visible object
uses visible parts of object with knowledge about object shape and how edges tend to relate to one another in real scenes
consists of two different operations with somewhat different perceptual consequences
edge completion: the perception of a partially hidden edge as complete; one of the operations involved in perceptual interpolation
can contribute to the perception of optical illusions
illusory contours: nonexistent but perceptually real edges perceived as a result of edge completion
example is Kanizsa triangle- three black circles with slices removed, giving the perception of there being a triangle between them, covering part of each
surface completion: the perception of a partially hidden surface as complete; one of the operations involved in perceptual interpolation
example: two black circles appear to be overlaid by a complete white bar; when a black line completes the edge of each black circle, the edges instead appear to be occluded by the surface of the page, so the white space between the disks is not experienced as brighter than the white of the page
neurons respond to an illusory contour produced by perceptual interpolation in the manner that would be expected if the contour were physically present
demonstrates that experience of illusory contour is not abstract expectation or inference, but result of explicit perceptual representation quite early in visual stream
shows that neurons tend to base responses partly on information from parts of the scene outside their receptive field
object recognition
general information
three aspects of object recognition make process complicated
object variety: refers to the fact that the world contains an enormous variety of objects
variable views: the different retinal images that can be projected by the same object or category of objects
image clutter: a characteristic of visual scenes in which many objects are scattered in 3-D space, with partial occlusion of various parts of objects by other objects
recognition refers to the process of matching the representation of a stimulus to a representation stored in long-term memory, based on previous encounters with that stimulus or with similar stimuli
shape is usually most important cue
uses higher-level processes to represent objects fully enough to recognize them, by matching representations to representations stored in memory
representation in memory has to take into account fact that object can be viewed from infinite number of angles
takes place mostly in ventral visual pathway (what pathway) where neurons respond selectively to objects with complex curved contours in specific configurations
visual system that can recognize object as being same despite changes in retinal image exhibits invariance
looking at same object in two different ways produces different patterns of action potentials in optic nerves and different neural activity in V1 and V4, but somewhere patterns must be the same or at least related to recognize different stimuli as same object
two proposed approaches to understanding invariance
recognition by components: a model of object recognition that proposes that recognizing an object depends on first identifying the object’s basic 3-D shapes and how they fit together
based on idea that single representation active whenever object is seen, regardless of viewpoint, and that single representation involves specifying parts of object and spatial relationships
representations from this model tend to be too abstract; provides no way to differentiate two dogs that are same shape
leads to too much invariance; an invariant representation of all objects with same shape
other approach based on idea that objects are represented in view-specific manner
for any given object, multiple representations would be stored, each corresponding to different possible view; representation of current view would be compared to all stored representations
participants take longer to recognize objects presented from novel viewpoint, suggesting representations created while learning to recognize objects weren’t viewpoint invariant, but matched specific views studied
single neurons in inferotemporal cortex respond differently to different views, suggesting existence of distinct, view-specific representations
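The view-specific approach can be sketched as nearest-neighbour matching against stored views. This assumes each view is summarized by a short feature vector; the vectors, object names, and view labels below are made up for illustration. A novel view lies farther from every stored view than a studied one does, which loosely parallels the slower recognition of novel viewpoints.

```python
import math


def match_view(current, stored_views):
    """Return (object, view, distance) for the stored view closest to the current one."""
    best = None
    for (obj, view), features in stored_views.items():
        distance = math.dist(current, features)  # Euclidean distance
        if best is None or distance < best[2]:
            best = (obj, view, distance)
    return best


# Hypothetical stored representations: several views per object.
stored_views = {
    ("mug", "front"): (1.0, 0.0),
    ("mug", "side"): (0.0, 1.0),
    ("bowl", "top"): (5.0, 5.0),
}

studied = match_view((0.0, 1.0), stored_views)
novel = match_view((0.1, 0.9), stored_views)
print(studied[:2], novel[:2])  # both match ('mug', 'side')
print(studied[2] < novel[2])   # True: the novel view is a poorer match
```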
shape representation in V4
each V1 neuron tuned to respond maximally to an edge within narrow range of orientations and locations; tuning results from way V1 neurons are connected to LGN
more complex contours could be represented by neurons in V4 that combine responses of multiple V1 neurons, perhaps with input from V2 neurons
V4 neuron could combine responses from V1 neurons that indicate presence of short, straight edge with specific orientation/position and conclude that there is a longer, curved edge
however, individual neurons in area V4 respond most strongly to edges that can be more complex than those in V1 in at least 3 ways
edges to which V4 neurons respond strongly can be straight or curved, from sharply to broadly
V4 neurons have preferred orientations like neurons in V1, but a contour with preferred orientation will elicit strong response from V4 neuron only if contour is at particular angular position relative to entire shape that contour belongs to
V4 neurons have preferred location in retinal image, but preferred location covers larger region of retinal image than V1 neurons
means individual V4 neurons show some degree of invariance with respect to location because response to edge with preferred curvature/orientation will be same across range of locations of edge within retinal image
led to conclusion that shapes are represented in V4 by combined activity of all neurons responding to contour fragments making up shape
V4 neurons have larger receptive fields and respond selectively to more complex characteristics of contour fragments, so representation of shape in V4 richer than in V1
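A toy sketch of how a larger receptive field can yield location invariance, assuming a row of contour fragments coded as symbols. The V1-like unit responds only at one position, while the V4-like unit takes the maximum response over a range of positions. Pooling by maximum is an assumption borrowed from computational models, not a claim made in these notes about how V4 combines its inputs.

```python
def v1_like_response(contour_row, position, preferred):
    """Respond (1.0) only if the preferred fragment sits at this exact position."""
    return 1.0 if contour_row[position] == preferred else 0.0


def v4_like_response(contour_row, positions, preferred):
    """Respond if the preferred fragment appears anywhere in the pooled positions."""
    return max(v1_like_response(contour_row, p, preferred) for p in positions)


# The same curved fragment "(" at two different locations.
row_a = ["-", "-", "(", "-"]
row_b = ["-", "(", "-", "-"]

print(v1_like_response(row_a, 2, "("), v1_like_response(row_b, 2, "("))                # 1.0 0.0
print(v4_like_response(row_a, range(4), "("), v4_like_response(row_b, range(4), "("))  # 1.0 1.0
```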
single neurons in inferotemporal (IT) cortex have much larger receptive fields than even V4 neurons, covering almost entire retinal image, and each neuron selective for much more complex shapes than V4 neurons
IT neurons respond most strongly to specific combinations of contour fragments, located almost anywhere in visual field
shape representation based on structural description (description that specifies set of parts/contour fragments and their spatial relations)
suggests that representation of shape in brain is both parts based and view specific; compromise of two approaches
nature of ventral pathway with increasing complexity in higher processes introduces question of whether there are grandmother cells
recognition of different categories of objects
lateral occipital cortex theorized to be part of higher-level processing since responds selectively to pictures of objects but not simple features or textures
also exhibits invariance- activity not dependent on size, position, or other features of pictured object
regions of IT cortex have been shown to respond selectively to specific categories of objects
two main ideas about how objects are processed
modular coding: representation of an object by a module, a region of the brain that is specialized for representing a particular category of objects
module is part of brain that represents a specific category of objects
modules located in IT and occipital cortex along ventral pathway
supporting evidence from fMRI showing that viewing certain categories of objects can strongly activate specific brain regions
identified fusiform face area (FFA) in this way; located on fusiform gyrus along lower surface of temporal lobe
other areas are parahippocampal place area (PPA), which responds to buildings/outdoor scenes, and extrastriate body area, which responds to human/animal body parts
also supporting evidence provided by patients with brain damage that results in visual agnosia
visual agnosia: an impairment in object recognition
patient can describe an object's appearance but cannot name it or describe its function; when given the name, can describe its function and shape
prosopagnosia: a type of visual agnosia in which the person is unable to recognize faces, with little or no loss of ability to recognize other types of objects
inherited or produced by damage to face-selective regions
topographic agnosia: a type of visual agnosia in which the person is unable to recognize spatial layouts such as buildings, streets, landscapes, and so on
most likely due to damage in PPA
possible opposing evidence showing that response to houses compared to scrambled objects looks similar to responses to faces compared to scrambled objects and responses to chairs
each module also has high response to other categories of objects, suggesting representation of viewed objects is both modular and distributed
distributed coding: representation of objects by patterns of activity across many regions of the brain
regions along cortex of ventral pathway
supporting evidence provided by experiment that tested idea that neurons outside category-specific modules carry information about whether viewed object belongs to category
could predict with 95% accuracy what object category had been viewed based on brain activity, even when primary response region had been hidden
supported by observation that number of objects humans can recognize makes it unlikely that each has its own module
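The logic of that decoding experiment can be sketched as a nearest-template classifier that ignores the most category-selective region. The activity values, region ordering, and function name below are invented for illustration and do not reproduce the study's data or method.

```python
import math


def decode_category(pattern, templates, exclude=()):
    """Pick the category whose template pattern is closest, ignoring excluded regions."""
    keep = [i for i in range(len(pattern)) if i not in exclude]

    def distance(template):
        return math.dist([pattern[i] for i in keep], [template[i] for i in keep])

    return min(templates, key=lambda name: distance(templates[name]))


# Hypothetical average activity in four regions; region 0 stands in for a
# face-selective module, region 1 for a place-selective module.
templates = {
    "face": [9.0, 3.0, 2.0, 4.0],
    "house": [2.0, 8.0, 3.0, 1.0],
}
observed = [8.5, 3.5, 2.0, 3.5]  # activity while viewing an unknown object

print(decode_category(observed, templates))               # face
print(decode_category(observed, templates, exclude={0}))  # face, even without region 0
```

In this toy case the remaining regions still carry enough information to identify the category, which is the pattern distributed coding predicts.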
believe that brain uses combination of modular and distributed coding
modules for specific categories may not be clustered together, but may be distributed over relatively large areas of brain
seems likely that brain represents certain types of objects via modular coding but that distributed coding also plays significant role even with those objects and all others
top-down information
bottom-up is flow from retina to V1, V4, and beyond
incomplete without considerations from top-down information- flow from higher regions to lower regions
involves perceiver's goals, attention, knowledge, expectations
neither perceiving the gist of a scene nor identifying the objects comes before the other- perception of the gist of a scene improves recognition of the objects in a scene and at the same time recognition of objects improves perception of a gist
identification of both object and background was more accurate when figure/ground were consistent
possible that visual system first creates representation of general, overall layout of scene and tries to match with representations of general layout of specific categories of scenes stored in memory when perceiving gist
general layout that can rapidly be matched to layout stored in memory provides gist that visual system can use to guide identification of specific objects in scene
then narrows down range of probable objects to be recognized
top-down information combines with bottom-up in ventral pathway to speed up process of fully recognizing objects in scene
unconscious inference and Bayesian approach
hypothesize that brain is always predicting immediate perceptual future, using both past and current scene as evidence
Helmholtz spoke of vision as process of 'unconscious inference'- mind always trying to infer what scene is that produced current retinal image based on information in image and information in memory
Bayesian approach: in object recognition, the use of mathematical probabilities to describe the process of perceptual inference
states that unconscious inference takes into account two probabilities in order to infer what type of scene produced the currently experienced retinal image
the prior probability of all possible scenes
the probability that each possible scene would produce the current retinal image (the likelihood)
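Combining these two probabilities follows Bayes' rule: the posterior probability of each scene is proportional to its prior multiplied by the probability that the scene would produce the current image. A minimal sketch, with made-up scene names and probability values:

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: posterior is prior * likelihood, normalized over all scenes."""
    unnormalized = {scene: priors[scene] * likelihoods[scene] for scene in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no candidate scene can explain the image")
    return {scene: value / total for scene, value in unnormalized.items()}


# Hypothetical numbers: both scenes explain the image equally well,
# so the prior decides which interpretation is perceived.
priors = {"convex bump": 0.75, "concave dent": 0.25}
likelihoods = {"convex bump": 0.5, "concave dent": 0.5}

result = posterior(priors, likelihoods)
print(max(result, key=result.get))  # convex bump
```

When the image is equally consistent with several scenes, the more probable scene a priori wins, which matches the idea that the visual system settles on the most probable interpretation of the retinal image.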