Chapter 6: Perceiving Depth
Learning to See in 3-D
both patients with strabismus- eyes didn’t point in same direction; produced double image and visual system automatically suppressed the vision from one eye or the other at all times
means she was stereoblind- unable to experience depth from brain combining two slightly different perceptions of retinas
only able to see depth with visual cues or relative motions
when given prism glasses in one eye, had to practice overcoming automatic suppression of one eye and then exercise in fusing (combining) images
only visual information we have that enables us to perceive where things are in 3-D space around us is information in retinal images in two eyes
vertical and horizontal dimensions of 3-D space are explicitly represented in each retinal image
fundamental goal of depth perception is to let us accurately perceive 3-D world on the basis of two 2-D retinal images- one in each eye
any given 2-D retinal image could be produced by infinite variety of 3-D scenes
representation of 3-D space in 2-D retinal image is many-to-one; many different 3-D scenes can produce one and the same retinal image
depth perception accomplished by using various properties of the retinal image as reliable (but not infallible) cues to depth
feedback from muscles in and around eyes also provides information about depth
depth cues
oculomotor cues
accommodation
convergence
cues based on retinal image
monocular cues
static cues
position-based cues
partial occlusion
relative height
size-based cues
familiar size
relative size
texture gradients
linear perspective
lighting-based cues
atmospheric perspective
shading
cast shadows
dynamic cues
motion parallax
deletion and accretion
optic flow
binocular cue
binocular disparity
oculomotor depth cues
oculomotor depth cues:
cues that are based on feedback from the oculomotor muscles controlling the shape of the lens and the position of the eyes
the adjustments your eyes make to focus on anything right in front of eyes to about 2m away
work because of two different sets of oculomotor muscles- those that control shape of lens and those that control position of eyes
accommodation
in accommodation, shape of lens adjusts to focus an image sharply on retina
something more than a few feet away, need flat lens so ciliary muscles relax; something closer, need thick lens so ciliary muscles contract
shape determined by autonomic nervous system
visual system uses information about focussing as cue to distance of object
accommodation provides depth information only for objects up to about 2m away, and even then information very imprecise
convergence
if looking at something very far away or more than a few meters away, lines of gaze about parallel
eyes must converge (turn inward, toward each other) to focus as object moves closer
angle between lines of gaze of two eyes decreases as distance increases
convergence changes very little when object is beyond 2m away, which is why convergence doesn’t provide much depth information beyond this point
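The relation between convergence angle and distance can be sketched numerically. This is a minimal sketch under simple assumptions: the fixated object is straight ahead, and the eye separation of 0.063 m is an illustrative value that is not given in these notes.

```python
import math

def convergence_angle_deg(distance_m, eye_separation_m=0.063):
    """Angle between the two lines of gaze when both eyes fixate a point
    straight ahead at distance_m (symmetric geometry; eye separation is an
    assumed, illustrative value)."""
    return math.degrees(2 * math.atan((eye_separation_m / 2) / distance_m))

# The angle shrinks as distance grows, and it changes less and less per metre.
for d in (0.25, 0.5, 1, 2, 4, 8):
    print(f"{d:>5} m -> {convergence_angle_deg(d):6.2f} deg")
```

Under these assumptions the angle changes by several degrees between 0.25 m and 0.5 m but by less than one degree between 2 m and 4 m, which is consistent with convergence being informative only at near distances.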
monocular depth cues
depth cues based on information in retinal image much more important than oculomotor cues because operate across much greater range of distance
monocular depth cues:
cues that are based on the retinal image and that provide information about depth even with only one eye open
static monocular cues (aka pictorial) are seen in motionless 2-D depictions of 3-D scenes
static cues: position, size, and lighting in the retinal image
position in the retinal image
two important cues based on position of objects in retinal image
partial occlusion (or interposition):
a position-based depth cue—in scenes where one object partially hides (occludes) another object, the occlusion indicates that the former is closer than the latter
works because almost all objects are opaque
requires assumptions of which we are mostly unconscious- assumptions about the nature of objects in a scene and assumptions about how those objects are arranged in 3-D space
T-junctions are intersections between the edges of the two objects
make most likely interpretation that object continues behind the one it is occluded by because other interpretations would require highly unlikely accidental alignment of shapes with line of sight
example of figure-ground organization
key assumption when perceiving depth on the basis of partial occlusion is that the objects in a scene and their arrangement with respect to each other and the observer are as simple and natural as possible
relative height:
a position-based depth cue: the relative height of the objects in the retinal image with respect to the horizon (or with respect to eye level if there is no visible horizon) provides information about the relative distances of the objects from the observer
lets us infer depth from the position of objects in relation to the horizon or eye level
below eye level, objects situated lower in the image are closer to the observer
power indicated by fact that relative height in retinal image affects depth perception even in scenes where there is no visible floor or ceiling
size in the retinal image
size-distance relation:
the farther away an object is from the observer, the smaller is its retinal image
retinal image size of an object can be measured in terms of its visual angle
visual angle:
the angle subtended (occupied) by an object in the field of view
the size-distance relation is exact—the size of the retinal image decreases in the same proportion as the distance to the object increases
scenes in which the size-distance relation is apparent are said to contain size perspective information
size perspective:
depth information in scenes in which the size-distance relation is apparent
refers to this regular decrease in the retinal image size of objects as their distance from the observer increases
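The size-distance relation can be illustrated by computing visual angle directly. A minimal sketch, assuming an object viewed face-on and centered on the line of sight; the function name and the example sizes are illustrative, not from the notes.

```python
import math

def visual_angle_deg(object_size_m, distance_m):
    """Visual angle subtended by an object of the given size at the given
    distance (object assumed face-on and centered on the line of sight)."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))

# Doubling the distance roughly halves the visual angle; the proportionality
# is closest when the angles involved are small.
for size, near, far in ((1.7, 4, 8), (0.1, 4, 8)):
    a_near = visual_angle_deg(size, near)
    a_far = visual_angle_deg(size, far)
    print(f"size {size} m: {a_near:.2f} deg at {near} m, "
          f"{a_far:.2f} deg at {far} m, ratio {a_near / a_far:.3f}")
```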
familiar size:
a size-based depth cue—knowing the retinal image size of a familiar object at a familiar distance lets us use its retinal image size to gauge its distance
if retinal image of person is about half as big as the familiar-size retinal image of them standing 4m away, then size-distance relation tells us they are about 8m away
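The arithmetic in the example above can be written out as a small helper. This is only a sketch of the size-distance relation (image size inversely proportional to distance); the function and parameter names are hypothetical.

```python
def distance_from_familiar_size(familiar_distance_m, familiar_image_size, observed_image_size):
    """Estimate distance from how much smaller or larger the retinal image is
    than the remembered image size at a known distance. Image sizes can be in
    any unit as long as both use the same one."""
    return familiar_distance_m * familiar_image_size / observed_image_size

# Image half as big as the familiar image at 4 m -> about 8 m away.
print(distance_from_familiar_size(4.0, 1.0, 0.5))
```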
relative size:
a size-based depth cue—under the assumption that two or more objects are about the same size, the relative size of their retinal images can be used to judge their relative distances
difference between depth cues of familiar size and relative size- the cue of familiar size is based on our familiarity with the retinal image size of objects with known sizes at known distances, whereas the cue of relative size requires only that we assume objects are of approximately equal size, but doesn't require that they be familiar
texture gradient:
a size-based depth cue—if surface variations or repeated elements of a surface are fairly regular in size and spacing, the retinal image size of these equal-size features decreases as their distance increases
many surfaces have visible texture, either because of variations in structure of surface or because surface is composed of repeated elements
surface variations or repeated elements usually regular in size and spacing
special case of relative size cue
linear perspective:
a size-based depth cue—parallel lines appear to converge as they recede in depth
reason is that fixed distance between two lines projects a smaller and smaller retinal image as it recedes from the observer
lighting in the retinal image
atmospheric perspective:
a lighting-based depth cue--the farther away an object is, the more air the light must pass through to reach us and the more that light can be scattered, with the result that distant objects appear less distinct than nearby objects
happens because particles in the air scatter light
distant objects can also appear more bluish because atmospheric haze tends to scatter short-wavelength blue light more than others
light falls on curved surfaces in ways that give rise to shading differences, because some parts of the surface are illuminated more directly than others; gives us information about relative depth and orientation of different parts of surface
unconscious assumptions play role in how interpret this depth cue; assume that just one primary light source and that it is above the scene
depth can also be signaled by the shadows cast by objects
static monocular depth cues:
cues that provide information about depth on the basis of the position of objects in the retinal image, the size of the objects in the retinal image, and the effects of lighting in the retinal image
dynamic monocular cues involve motion; provide information about depth when walking or watching moving scene
as move through scene, see it from constantly changing viewpoints; result in changes in positions of objects in retinal image relative to each other, provide information about layout of objects
three types of dynamic (motion-based) cues
motion parallax
motion parallax:
a dynamic depth cue—the difference in the speed and direction with which objects appear to move in the retinal image as an observer moves within a scene
involves observer movement from side to side
retinal image of closer objects moves further on retina as observer moves than does retinal image of further objects; means retinal image of closer objects moves at faster speed than retinal image of further objects
if focus on object at intermediate distance while moving in one direction, object closer appears to move opposite to your direction of motion while object further appears to move in same direction as your direction of motion
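The speed and direction pattern described above can be sketched with a small-angle approximation. Assumptions (not from the notes): the observer moves sideways at constant speed while tracking a fixated point, objects are roughly straight ahead, and the formula speed × (1/object distance − 1/fixation distance) is the standard approximation for that situation.

```python
def relative_image_velocity(observer_speed_m_s, object_distance_m, fixation_distance_m):
    """Approximate angular velocity (rad/s) of an object's image relative to
    the fixated point. Positive: object appears to move opposite to the
    observer's direction of motion. Negative: same direction as the observer."""
    return observer_speed_m_s * (1.0 / object_distance_m - 1.0 / fixation_distance_m)

speed = 1.0      # m/s, illustrative walking speed
fixation = 5.0   # m, illustrative fixation distance
for distance in (1.0, 2.0, 5.0, 20.0, 100.0):
    v = relative_image_velocity(speed, distance, fixation)
    if v > 0:
        direction = "opposite to observer's motion"
    elif v < 0:
        direction = "same direction as observer's motion"
    else:
        direction = "stationary (fixated distance)"
    print(f"{distance:>6} m: {v:+.3f} rad/s, {direction}")
```

In this sketch, objects nearer than the fixated distance get positive values that grow quickly as they get closer, and objects beyond it get small negative values.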
optic flow
type of motion parallax
optic flow:
a dynamic depth cue—the relative motions of objects and surfaces in the retinal image as the observer moves forward or backward through a scene
close objects approach you and disappear out edges of field of view rapidly
more distant landscape flows outward more slowly but also eventually disappears out of side of field of view
objects far away/in front of you seem fixed in view
objects and surfaces near point toward which you are heading (focus of expansion) move outward slowly in retinal image; objects closer to you move away from focus of expansion much more rapidly
deletion and accretion
deletion:
a dynamic depth cue: the gradual hiding (occlusion) of an object as it passes behind another one
accretion:
a dynamic depth cue—the gradual revealing (de-occlusion) of an object as it emerges from behind another one
as object moves behind and out from behind another object, cue that occluding object is closer to viewer than hidden object
binocular depth cue: disparity in the retinal images
to see monocular depth cues, person doesn’t have to view them with both eyes
stereopsis (or stereoscopic depth perception):
the vivid sense of depth arising from the visual system’s processing of the different retinal images in the two eyes
binocular disparity
corresponding and noncorresponding points, and the horopter
corresponding points:
a point on the left retina and point on the right retina that would coincide if the two retinas were superimposed—for example, the foveas of the two eyes
two points that are each 4mm to the left of the fovea in each eye are also corresponding points
noncorresponding points:
a point on the left retina and a point on the right retina that wouldn’t coincide if the two retinas were superimposed; for example, the fovea of one eye and a point 4 mm to the right of the fovea in the other eye
horopter:
an imaginary surface defined by the locations in a scene from which objects would project retinal images at corresponding points
whenever an observer fixates an object (points both eyes at it), a horopter is established
objects that are either further or closer than the horopter will project retinal images that fall on noncorresponding points and will be perceived as being either nearer or farther than objects on the horopter
crossed disparity, uncrossed disparity, and zero disparity
three types of binocular disparity: crossed disparity, uncrossed disparity, zero disparity
crossed disparity:
a type of binocular disparity produced by an object that is closer than the horopter; you would have to “cross” your eyes to look at it
in order to switch focus to an object that is closer than horopter, would have to increase angle of convergence between eyes
uncrossed disparity:
a type of binocular disparity produced by an object that is farther away than the horopter—you would have to ‘uncross’ your eyes to look at it
in order to switch focus to an object that is further than horopter, would have to decrease angle of convergence between eyes
zero disparity:
a type of binocular disparity in which the retinal image of an object falls at corresponding points in the two eyes
seen in any object that is being focussed on or any other object that lies on horopter
the distance between the images of an object in the views from two eyes (magnitude of binocular disparity) increases as distance of the object from the horopter increases
stereopsis typically provides information about relative depth out to distance of about 200 m
binocular disparity:
a depth cue based on differences in the relative positions of the retinal images of objects in the two eyes
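The three disparity types and the growth of disparity with distance from the horopter can be sketched for the simplest case. Assumptions (not from the notes): both the fixated object and the other object lie straight ahead on the midline, disparity is taken as the difference between the convergence angles the two objects would require, and the eye separation of 0.063 m is illustrative.

```python
import math

EYE_SEPARATION_M = 0.063  # assumed, illustrative value

def vergence_deg(distance_m):
    """Convergence angle needed to fixate a point straight ahead."""
    return math.degrees(2 * math.atan(EYE_SEPARATION_M / (2 * distance_m)))

def disparity_deg(object_distance_m, fixation_distance_m):
    """Positive when the object is nearer than fixation, negative when farther."""
    return vergence_deg(object_distance_m) - vergence_deg(fixation_distance_m)

def disparity_type(object_distance_m, fixation_distance_m):
    d = disparity_deg(object_distance_m, fixation_distance_m)
    if d > 0:
        return "crossed"
    if d < 0:
        return "uncrossed"
    return "zero"

fixation = 1.0  # metres, illustrative
for distance in (0.5, 0.8, 1.0, 1.5, 3.0):
    print(f"{distance} m: {disparity_type(distance, fixation):9s} "
          f"{disparity_deg(distance, fixation):+.2f} deg")
```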
correspondence problem
stereograms and anaglyphs
stereogram:
two depictions of a scene that differ in the same way as an observer’s two retinal images of that scene would differ; an observer who simultaneously views one image with one eye and the other image with the other eye (as in a stereoscope) will see a combined image in depth
happens because brain automatically interprets the retinal images of the photographs in terms of binocular disparity
anaglyph:
a stereogram in which the two photographs taken from adjacent camera positions are printed in contrasting colors and then superimposed; an observer who views an anaglyph with special glasses in which one lens filters out one of the colors and the other lens filters out the other color will see a single image in depth
one eye sees red image and one eye sees blue image; visual system interprets the difference between the images in terms of binocular disparity; see single combined image giving the vivid impression of depth
random dot stereograms
crucial experiment in solving correspondence problem
random dot stereogram (RDS):
a stereogram in which both images consist of a grid of randomly arranged dots, identical except for the displacement of a portion in one image relative to the other; an observer who views a random dot stereogram in a stereoscope or as an anaglyph will see a single image with the displaced portion in depth
difference isn’t apparent when simply comparing the two images, but when each image is viewed by a different eye (as in a stereoscope), background has zero disparity whereas central portion exhibits binocular disparity, since each dot in left image has pair in right image that is slightly displaced
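The construction of a random dot stereogram can be sketched in a few lines. This is a minimal sketch, not a reproduction of any published stimulus: grid size, square size, shift, and the function names are all illustrative choices. The central square is copied a few columns to the left in the right-eye image, and the strip it uncovers is filled with fresh random dots.

```python
import random

def make_random_dot_stereogram(size=20, square=8, shift=2, seed=0):
    """Return (left, right) dot grids that are identical except that a central
    square is displaced `shift` columns to the left in the right image."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    start = (size - square) // 2
    end = start + square
    for r in range(start, end):
        # Copy the square from the left image, displaced leftward.
        for c in range(start, end):
            right[r][c - shift] = left[r][c]
        # Fill the strip uncovered by the displacement with new random dots.
        for c in range(end - shift, end):
            right[r][c] = rng.randint(0, 1)
    return left, right

def show(image):
    """Render a dot grid as text, one character per dot."""
    return "\n".join("".join("#" if dot else "." for dot in row) for row in image)

left, right = make_random_dot_stereogram()
print(show(left))
print()
print(show(right))
```

Neither printed grid contains a visible square on its own; the square exists only in the relation between the two images.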
how RDS addresses question of whether correspondence matching precedes or follows object recognition
correspondence matching is necessary for perception of binocular disparity
if object recognition necessarily precedes correspondence matching, an RDS wouldn’t produce sense of depth, because doesn’t contain any objects
RDS does produce sense of depth, so correspondence matching must precede object recognition
Marr and Poggio suggest visual system solves correspondence problem by making two simple assumptions about world when matching features in left/right retinal images
each feature in one retinal image will match one and only one feature in other retinal image
visual scenes tend to consist of smooth and continuous surfaces with relatively few abrupt changes in depth; almost every point in field of view is surrounded by points that are about the same depth
if retinal image of object exhibits crossed disparity, visual system knows that object is closer than fixated object and knows that the greater the disparity, the greater the difference in depth from the fixated object
correspondence problem:
the problem of determining which features in the retinal image in one eye correspond to which features in the retinal image in the other eye
when produces retinal image, how does brain know that image on left retina was produced by same object as image on right retina
two very different ways in which visual system may solve correspondence problem
visual system surveys left and right retinal images and separately performs 2-D object recognition on them; ‘labels’ each feature of each retinal image as belonging to an object in the scene
object recognition precedes correspondence matching
the visual system matches parts of the retinal images based on very simple properties such as color or edge orientation before proceeding to object recognition (without assigning object labels)
matching precedes object recognition
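Matching on simple properties, without any object labels, can be sketched as window matching on a single row of random dots. This is a plain illustrative sketch, not the Marr and Poggio model and not a claim about how the brain does it; the function name, window size, and search range are hypothetical choices.

```python
import random

def best_shift(left_row, right_row, center, half_window=9, max_shift=4):
    """Return the leftward shift (in dots) at which a small window of the left
    row best matches the right row, comparing only raw dot values."""
    best, best_errors = 0, None
    for shift in range(max_shift + 1):
        errors = 0
        for c in range(center - half_window, center + half_window + 1):
            errors += left_row[c] != right_row[c - shift]
        if best_errors is None or errors < best_errors:
            best, best_errors = shift, errors
    return best

# One row of a random dot stereogram: dots 20..39 are displaced 2 to the left
# in the right-eye row, and the uncovered strip gets fresh random dots.
rng = random.Random(1)
left_row = [rng.randint(0, 1) for _ in range(60)]
right_row = left_row[:]
for c in range(20, 40):
    right_row[c - 2] = left_row[c]
for c in range(38, 40):
    right_row[c] = rng.randint(0, 1)

print(best_shift(left_row, right_row, center=30))  # window inside displaced part
print(best_shift(left_row, right_row, center=50))  # window in the background
```

The point of the sketch is that a disparity estimate can come out of dot-by-dot comparison alone, even though the row contains nothing that could be recognized as an object.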
neural basis of stereopsis
how the brain measures binocular disparity in order to extract depth information from binocular view
binocular cells:
neurons that respond best to the stimulation of their receptive fields in both eyes simultaneously
receptive field of neuron is region of the retina that, when stimulated, causes the neuron to change its firing rate
object must fall at certain point in left retina and certain point in right retina in order to stimulate cells
receptive fields in each retina are at noncorresponding points
if binocular cell receives stimulus from only one eye, cell doesn’t respond
binocular cell responds when receptive fields are stimulated by object that exhibits particular magnitude of binocular disparity
different binocular cells tuned to different disparities—crossed, uncrossed, zero—and cell tuned to particular disparity will be tuned to particular magnitude of disparity
binocular cells found in both ventral and dorsal pathways
serve an important role in allowing visual system to segment scene into distinct visual objects
integrating depth cues
in most real-world situations, use many cues simultaneously to obtain information about depth, usually cues consistent with one another
have redundant information from many different sources to ensure accurate perception of depth since evolutionarily important
different depth cues supply the most useful depth information under different conditions
drawn conclusions from manipulating number of depth cues and degree to which they conflict in order to discover how we combine different depth cues to yield single coherent interpretation of scene
no single depth cue dominates in all situations and no single depth cue is necessary in all situations
partial occlusion may be closest to being dominant
the more depth cues present in scene, the greater the likelihood that we’ll perceive depth and the greater the accuracy and consistency of depth perception
cues differ in kinds of information they provide, use differences to construct more accurate view of layout of scene
depth perception based on multiple cues is rapid, automatic process that occurs without conscious thought; employs ‘unconscious inference’ to make best guess about layout of scene based on current retinal images
similar to Bayesian approach of combining knowledge about probability of possible scene with information from current scene to deal with ambiguous visual information
combines depth estimate provided by each cue in weighted average taking into account reliability of each cue in given context
also assumed to take into account prior knowledge of the possible depth in scene when estimating depth
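A weighted average that respects cue reliability can be sketched as follows. The notes don't specify the weighting rule; this sketch assumes inverse-variance weighting (a common choice when each cue's estimate is treated as Gaussian noise around the true depth), and all numbers are made up for illustration. Under the same assumption, prior knowledge can be included as one more (estimate, variance) pair.

```python
def combine_depth_cues(estimates):
    """Combine (depth_estimate_m, variance) pairs into one estimate.
    Each cue is weighted by 1/variance, so more reliable cues count more.
    Returns (combined_estimate, combined_variance)."""
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    combined = sum(w * depth for w, (depth, _) in zip(weights, estimates)) / total
    return combined, 1.0 / total

# Illustrative values: a reliable cue says 2.0 m, a noisier cue says 2.6 m.
cues = [(2.0, 0.01), (2.6, 0.09)]
estimate, variance = combine_depth_cues(cues)
print(f"combined depth {estimate:.2f} m, variance {variance:.3f}")
```

The combined estimate sits much closer to the reliable cue, and the combined variance is smaller than that of either cue alone, which matches the idea that more cues give more accurate and consistent depth perception.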
depth and perceptual constancy
size constancy and size-distance invariance
visual system normally compensates for distance when perceiving size of object, but this compensation doesn’t always take place
depends on presence of depth cues that provide information about distance
experiment showed people able to judge true sizes of disks if they could get information about relative distances of disks from depth cues such as binocular disparity, motion parallax, linear perspective
as cues removed, judgement of disk sizes increasingly based on size of retinal image
the size-distance relation tells us that size of retinal image of any rigid object depends on two factors- object’s actual size and object’s distance from observer
retinal image of object becomes smaller as object recedes in depth
size constancy:
a type of perceptual constancy—the tendency to perceive an object’s size as constant despite changes in the size of the object’s retinal image due to the object’s changing distance from the observer
as object backs away, don’t see it as shrinking in size
size-distance invariance:
the relation between perceived size and perceived distance: the perceived size of an object depends on its perceived distance, and vice versa
Emmert’s law:
size-distance invariance of retinal afterimages—the perceived size of an afterimage is proportional to the distance of the surface on which it is ‘projected’
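Size-distance invariance and Emmert's law can be illustrated with the same geometry that defines visual angle. A minimal sketch, assuming perceived size is what an object of the given visual angle would measure at the perceived distance; the function name and the numbers are illustrative.

```python
import math

def perceived_size_m(visual_angle_deg, perceived_distance_m):
    """Size an object would need to have to subtend the given visual angle at
    the perceived distance."""
    return 2 * perceived_distance_m * math.tan(math.radians(visual_angle_deg) / 2)

# An afterimage has a fixed visual angle (here 2 degrees, illustrative).
# Projected on a surface three times farther away, it looks three times bigger.
for wall_distance in (1.0, 2.0, 3.0):
    print(f"surface at {wall_distance} m -> "
          f"perceived size {perceived_size_m(2.0, wall_distance):.3f} m")
```

The same relation underlies the size illusions in these notes: if retinal image size is fixed and perceived distance is wrong, perceived size is wrong in proportion.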
shape constancy and shape-slant invariance
retinal image shape also depends on two factors- object’s actual shape and object’s slant (its orientation relative to the observer’s line of sight)
only projects actual shape when object is perpendicular to line of sight of observer
shape constancy:
a type of perceptual constancy—the tendency to perceive an object’s shape as constant despite changes in the shape of the object’s retinal image due to the object’s changing orientation
shape-slant invariance:
the relation between perceived shape and perceived slant: the perceived shape of an object depends on its perceived slant, and vice versa
if looked at object and had no way of knowing what slant was, would have no way of knowing what the actual shape of the object is
in real life, can estimate slant by deriving information from cues such as shading, binocular disparity, texture gradients
once slant of object is judged correctly, perception of its true shape follows
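Shape-slant invariance can be sketched with a simple foreshortening model. Assumptions (not from the notes): slant is measured in degrees away from the orientation perpendicular to the line of sight (0 means facing the observer directly), the object is far enough away that its extent along the slanted direction is compressed by the cosine of the slant, and shape is summarized as an aspect ratio.

```python
import math

def image_aspect_ratio(true_aspect_ratio, slant_deg):
    """Aspect ratio of the retinal image of a surface slanted by slant_deg
    (extent along the slanted direction is foreshortened by cos(slant))."""
    return true_aspect_ratio * math.cos(math.radians(slant_deg))

def inferred_aspect_ratio(image_ratio, perceived_slant_deg):
    """Shape the observer would infer from the image, given a perceived slant."""
    return image_ratio / math.cos(math.radians(perceived_slant_deg))

# A circular disk (aspect ratio 1.0) slanted by 60 degrees projects an ellipse.
image = image_aspect_ratio(1.0, 60.0)
print(f"image aspect ratio: {image:.2f}")
print(f"slant judged correctly (60 deg): {inferred_aspect_ratio(image, 60.0):.2f}")
print(f"slant judged as 0 deg:           {inferred_aspect_ratio(image, 0.0):.2f}")
```

With the slant judged correctly the inferred shape is the true circle; with no slant information the inferred shape is just the shape of the retinal image.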
experience perceptual constancy when perceive some properties of an object as constant despite changes in sensory information used to perceive that property
includes color constancy and lightness constancy- perceived color and lightness of object tend to remain constant despite changes in wavelengths and intensity of light reflected into eyes due to changes in lighting conditions
illusions of depth, size, and shape
forced perspective
many visual illusions work by exploiting visual system’s use of size-distance invariance
forced perspective:
an illusion in which a near and a far object seem to be at about the same depth because of the way they’re aligned and the way they appear to be interacting, leading the observer to disregard other depth cues in the scene
Ponzo illusion
illustrates the principle of size-distance invariance and the powerful influence of linear perspective on size perception
two lines converge in the distance like railroad tracks; tend to see object placed farther away on tracks as bigger than identical object placed closer on tracks
visual system looks at photographs same way it looks at natural world, by seeking out the most likely natural interpretation, even if it involves depth and there is no true depth in photographs
can also use partial occlusion as well as linear perspective and other depth cues in illusion
depth perception believed to require processing in brain areas beyond V1
find larger area of activity in V1 when fixating on larger figure than when fixating on smaller figure, suggesting that information about depth was transmitted back to V1, influencing activity in early visual area
Ames room
Ames room:
a room specifically designed to create an illusory perception of depth; when viewed with one eye through a peephole, objects along the far wall look like they are all the same distance away, leading to a misperception of their relative size
works by leading observer to perceive the people in each corner as being at the same depth
look through peephole with just one eye to eliminate binocular disparity
can be seen as a failure of shape constancy; visual system evolved to seek most likely natural interpretation of any given scene, and arrangement of surfaces and shapes in Ames room is so unlikely that visual system is willing to accept that person could grow/shrink as they walk across room rather than accept that the complex combination of factors necessary to produce an Ames room has actually occurred
moon illusion
moon looks much bigger at horizon compared to when it’s high in the sky
physical size of retinal image of moon doesn’t change with position in the sky; perceived size difference is perceptual phenomenon
still don’t have definitive answer, but most widely accepted explanation is that it results from misperception of distance, like most other size illusions
argue that perceived distance is greater when moon at horizon perhaps because we perceive moon to be at the height of the clouds and depth cue of relative height tells us that clouds near horizon are farther away than clouds directly overhead
if moon is perceived to be at height of clouds, then fixed size of moon’s retinal image will lead us to perceive moon as bigger near the horizon, based on principle of size-distance invariance
however, find that illusion still stands when there are no clouds and that moon is actually perceived as closer when at horizon
tabletop illusion
two tabletops are identical in shape and size, but look different because of perceived slants
tabletop on left perceived as slanting back in depth lengthwise, which means perceive length as being foreshortened; assumes actual length of table is greater than its length on page
front edge of tabletop on right is perceived as being at constant depth, so visual system assumes table is exactly as long as it appears to be
opposite perceptual exaggerations of length and width change perceived shapes in way that cannot be overcome even if know retinal images are exact same shape/size