Chapter 6: Perceiving Depth

Learning to See in 3-D

both patients with strabismus- eyes didn’t point in same direction; produced double image and visual system automatically suppressed the vision from one eye or the other at all times

means she was stereoblind- unable to experience depth from the brain combining the two slightly different retinal images

only able to see depth with visual cues or relative motions

when given prism glasses in one eye, had to practice overcoming automatic suppression of one eye and then exercise in fusing (combining) images

only visual information we have that enables us to perceive where things are in 3-D space around us is information in retinal images in two eyes

vertical and horizontal dimensions of 3-D space are explicitly represented in each retinal image

fundamental goal of depth perception is to let us accurately perceive 3-D world on the basis of two 2-D retinal images- one in each eye

any given 2-D retinal image could be produced by infinite variety of 3-D scenes

representation of 3-D space in 2-D retinal image is many-to-one; many different 3-D scenes can produce one and the same retinal image

depth perception accomplished by using various properties of the retinal image as reliable (but not infallible) cues to depth

feedback from muscles in and around eyes also provides information about depth

depth cues

oculomotor cues

cues based on retinal image

accommodation

convergence

monocular cues

binocular cue

static cues

dynamic cues

position-based cues

partial occlusion

relative height

size-based cues

familiar size

relative size

texture gradients

linear perspective

lighting-based cues

atmospheric perspective

shading

cast shadows

motion parallax

deletion and accretion

optic flow

binocular disparity

oculomotor depth cues

oculomotor depth cues:
cues that are based on feedback from the oculomotor muscles controlling the shape of the lens and the position of the eyes

the adjustments your eyes make to focus on anything right in front of eyes to about 2m away

work because of two different sets of oculomotor muscles- those that control shape of lens and those that control position of eyes

accommodation

convergence

in accommodation, shape of lens adjusts to focus an image sharply on retina

something more than a few feet away, need flat lens so ciliary muscles relax; something closer, need thick lens so ciliary muscles contract

shape determined by autonomic nervous system

visual system uses information about focussing as cue to distance of object

accommodation provides depth information only for objects up to about 2m away, and even then information very imprecise

if looking at something very far away or more than a few meters away, lines of gaze about parallel

eyes must converge (turn inward, toward each other) to focus as object moves closer

angle between lines of gaze of two eyes decreases as distance increases

convergence change very little when object beyond 2m away, which is why convergence doesn’t provide much depth information beyond this point
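A minimal sketch of why convergence stops being informative past about 2m, assuming the object is straight ahead and assuming an interpupillary distance of 6.3 cm (an illustrative value, not from these notes). The function name and parameters are hypothetical.

```python
import math

def convergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two lines of gaze when both eyes fixate a point
    straight ahead at distance_m. ipd_m is an assumed eye separation."""
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))

# The angle drops steeply at near distances and flattens out beyond ~2 m.
for d in (0.25, 0.5, 1, 2, 4, 8):
    print(f"{d:>5} m -> {convergence_angle_deg(d):5.2f} deg")
```

Under these assumptions, most of the change in angle happens within the first couple of meters, which matches the point above about convergence carrying little depth information farther out.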

monocular depth cues

depth cues based on information in retinal image much more important than oculomotor cues because operate across much greater range of distance

monocular depth cues:
cues that are based on the retinal image and that provide information about depth even with only one eye open

static monocular cues (aka pictorial) are seen in motionless 2-D depictions of 3-D scenes

dynamic monocular cues involve motion; provide information about depth when walking or watching moving scene

static cues: position, size, and lighting in the retinal image

position in the retinal image

size in the retinal image

lighting in the retinal image

static monocular depth cues:
cues that provide information about depth on the basis of the position of objects in the retinal image, the size of the objects in the retinal image, and the effects of lighting in the retinal image

two important cues based on position of objects in retinal image

partial occlusion (or interposition):
a position-based depth cue—in scenes where one object partially hides (occludes) another object, the occlusion indicates that the former is closer than the latter

works because almost all objects are opaque

requires assumptions of which we are mostly unconscious- assumptions about the nature of objects in a scene and assumptions about how those objects are arranged in 3-D space

T-junctions are intersections between the edges of the two objects

make most likely interpretation that object continues behind the one it is occluded by because other interpretations would require highly unlikely accidental alignment of shapes with line of sight

example of figure-ground organization

key assumption when perceiving depth on the basis of partial occlusion is that the objects in a scene and their arrangement with respect to each other and the observer are as simple and natural as possible

relative height:
a position-based depth cue: the relative height of the objects in the retinal image with respect to the horizon (or with respect to eye level if there is no visible horizon) provides information about the relative distances of the objects from the observer

lets us infer depth from the position of objects in relation to the horizon or eye level

below eye level, objects situated lower in the image are closer to the observer

power indicated by fact that relative height in retinal image affects depth perception even in scenes where there is no visible floor or ceiling

size-distance relation:
the farther away an object is from the observer, the smaller is its retinal image

retinal image size of an object can be measured in terms of its visual angle

visual angle:
the angle subtended (occupied) by an object in the field of view

the size-distance relation is exact—the size of the retinal image decreases in the same proportion as the distance to the object increases

scenes in which the size-distance relation is apparent are said to contain size perspective information

size perspective:
depth information in scenes in which the size-distance relation is apparent

refers to this regular decrease in the retinal image size of objects as their distance from the observer increases
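The size-distance relation can be sketched with the standard visual angle formula. The object size (1.8 m) and distances are illustrative assumptions; the function name is hypothetical. Note that the strict proportionality is a close approximation that holds best for small visual angles.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Angle subtended at the eye by an object of size_m viewed
    face-on at distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# Doubling the distance roughly halves the visual angle.
for d in (4, 8, 16):
    print(f"{d:>3} m -> {visual_angle_deg(1.8, d):5.2f} deg")
```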

familiar size:
a size-based depth cue—knowing the retinal image size of a familiar object at a familiar distance lets us use its retinal image size to gauge its distance

if retinal image of person is about half as big as the familiar-size retinal image of them standing 4m away, then size-distance relation tells us they are about 8m away
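The arithmetic in the example above, as a small sketch. It assumes retinal image size is inversely proportional to distance (the size-distance relation); the function and parameter names are hypothetical.

```python
def distance_from_familiar_size(observed_size, familiar_size, familiar_distance_m):
    """Estimate distance from how the current retinal image size compares
    with the remembered image size at a known distance.
    Sizes can be in any unit as long as both use the same one."""
    return familiar_distance_m * familiar_size / observed_size

# Image half as big as the familiar image at 4 m
print(distance_from_familiar_size(observed_size=0.5, familiar_size=1.0,
                                  familiar_distance_m=4.0))
```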

relative size:
a size-based depth cue—under the assumption that two or more objects are about the same size, the relative size of their retinal images can be used to judge their relative distances

difference between depth cues of familiar size and relative size- the cue of familiar size is based on our familiarity with the retinal image size of objects with known sizes at known distances, whereas the cue of relative size requires only that we assume objects are of approximately equal size, but doesn't require that they be familiar

texture gradient:
a size-based depth cue—if surface variations or repeated elements of a surface are fairly regular in size and spacing, the retinal image size of these equal-size features decreases as their distance increases

many surfaces have visible texture, either because of variations in structure of surface or because surface is composed of repeated elements

surface variations or repeated elements usually regular in size and spacing

special case of relative size cue

linear perspective:
a size-based depth cue—parallel lines appear to converge as they recede in depth

reason is that fixed distance between two lines projects a smaller and smaller retinal image as it recedes from the observer
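A rough sketch of that reasoning, assuming a simple pinhole projection in which image size equals focal length times real size divided by depth. The 1.5 m separation and the focal length of 1.0 are arbitrary illustrative values.

```python
def projected_separation(separation_m, depth_m, focal_length=1.0):
    """Image-plane distance between two parallel lines that are
    separation_m apart, measured at depth_m from the observer
    (pinhole projection, an assumed simplification of the eye)."""
    return focal_length * separation_m / depth_m

# The same physical gap projects smaller and smaller, so the lines
# appear to converge toward a point as depth grows.
for depth in (5, 10, 20, 40):
    print(depth, projected_separation(1.5, depth))
```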

atmospheric perspective:
a lighting-based depth cue--the farther away an object is, the more air the light must pass through to reach us and the more that light can be scattered, with the result that distant objects appear less distinct than nearby objects

happens because particles in the air scatter light

distant objects can also appear more bluish because atmospheric haze tends to scatter short-wavelength blue light more than others

light falls on curved surfaces in ways that give rise to shading differences, because some parts of the surface are illuminated more directly than others; gives us information about relative depth and orientation of different parts of surface

unconscious assumptions play role in how interpret this depth cue; assume that just one primary light source and that it is above the scene

depth can also be signaled by the shadows cast by objects

as move through scene, see it from constantly changing viewpoints; results in changes in positions of objects in retinal image relative to each other, which provide information about layout of objects

three types of dynamic (motion-based) cues

motion parallax

optic flow

deletion and accretion

motion parallax:
a dynamic depth cue—the difference in the speed and direction with which objects appear to move in the retinal image as an observer moves within a scene

involves observer movement from side to side

retinal image of closer objects moves further on retina as observer moves than does retinal image of further objects; means retinal image of closer objects moves at faster speed than retinal image of further objects

if focus on object at intermediate distance while moving in one direction, object closer appears to move opposite to your direction of motion while object further appears to move in same direction as your direction of motion
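The direction reversal around the fixated object can be sketched with a small-angle approximation. Assumptions (not from the notes): the observer moves sideways at constant speed, all objects are momentarily straight ahead, and positive values mean apparent motion in the same direction as the observer. Names are hypothetical.

```python
def relative_angular_velocity(speed_m_s, object_dist_m, fixation_dist_m):
    """Approximate angular velocity (rad/s) of an object relative to the
    line of gaze while the observer translates sideways and keeps
    fixating a point at fixation_dist_m.
    Positive = moves with the observer, negative = moves against."""
    return speed_m_s * (1 / fixation_dist_m - 1 / object_dist_m)

walking_speed = 1.0  # m/s, illustrative
for dist in (1, 5, 20):
    print(dist, relative_angular_velocity(walking_speed, dist, fixation_dist_m=5))
```

In this sketch the object nearer than fixation gets a negative value (opposite to the observer's motion), the fixated object gets zero, and the farther object gets a positive value.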

type of motion parallax

optic flow:
a dynamic depth cue—the relative motions of objects and surfaces in the retinal image as the observer moves forward or backward through a scene

close objects approach you and disappear out edges of field of view rapidly

more distant landscape flows outward more slowly but also eventually disappears out of side of field of view

objects far away/in front of you seem fixed in view

objects and surfaces near point toward which you are heading (focus of expansion) move outward slowly in retinal image; objects closer to you move away from focus of expansion much more rapidly
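A sketch of the flow pattern under an assumed pinhole projection: for forward motion at speed v, a point imaged at offset x from the focus of expansion and lying at depth Z moves outward at x * v / Z. The function name and the numbers are illustrative assumptions.

```python
def radial_flow_speed(image_offset, depth_m, forward_speed_m_s):
    """Outward image speed of a point during forward motion.
    image_offset: distance of the point's image from the focus of
    expansion (image units); depth_m: distance ahead of the observer."""
    return image_offset * forward_speed_m_s / depth_m

v = 10.0  # m/s, illustrative
print(radial_flow_speed(0.0, 50, v))   # at the focus of expansion
print(radial_flow_speed(0.2, 50, v))   # distant, slightly off-center
print(radial_flow_speed(0.2, 5, v))    # close, same image offset
```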

deletion:
a dynamic depth cue: the gradual hiding (occlusion) of an object as it passes behind another one

accretion:
a dynamic depth cue—the gradual revealing (de-occlusion) of an object as it emerges from behind another one

as object moves behind and out from behind another object, cue that occluding object is closer to viewer than hidden object

binocular depth cue: disparity in the retinal images

to see monocular depth cues, person doesn’t have to view them with both eyes

stereopsis (or stereoscopic depth perception):
the vivid sense of depth arising from the visual system’s processing of the different retinal images in the two eyes

binocular disparity

corresponding and noncorresponding points, and the horopter

crossed disparity, uncrossed disparity, and zero disparity

correspondence problem

stereograms and anaglyphs

random dot stereograms

neural basis of stereopsis

binocular disparity:
a depth cue based on differences in the relative positions of the retinal images of objects in the two eyes

corresponding points:
a point on the left retina and point on the right retina that would coincide if the two retinas were superimposed—for example, the foveas of the two eyes

two points that are each 4mm to the left of the fovea in each eye are also corresponding points

noncorresponding points:
a point on the left retina and a point on the right retina that wouldn’t coincide if the two retinas were superimposed, for example, the fovea of one eye and a point 4 mm to the right of the fovea in the other eye

horopter:
an imaginary surface defined by the locations in a scene from which objects would project retinal images at corresponding points

whenever an observer fixates an object (points both eyes at it), a horopter is established

objects that are either further or closer than the horopter will project retinal images that fall on noncorresponding points and will be perceived as being either nearer or farther than objects on the horopter

three types of binocular disparity: crossed disparity, uncrossed disparity, zero disparity

crossed disparity:
a type of binocular disparity produced by an object that is closer than the horopter; you would have to “cross” your eyes to look at it

in order to switch focus to an object that is closer than horopter, would have to increase angle of convergence between eyes

uncrossed disparity:
a type of binocular disparity produced by an object that is farther away than the horopter—you would have to ‘uncross’ your eyes to look at it

in order to switch focus to an object that is further than horopter, would have to decrease angle of convergence between eyes

zero disparity:
a type of binocular disparity in which the retinal image of an object falls at corresponding points in the two eyes

seen in any object that is being focussed on or any other object that lies on horopter

the distance between the images of an object in the views from two eyes (magnitude of binocular disparity) increases as distance of the object from the horopter increases

stereopsis typically provides information about relative depth out to distance of about 200 m

if retinal image of object exhibits crossed disparity, visual system knows that object is closer than fixated object and knows that the greater the disparity, the greater the difference in depth from the fixated object
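A sketch of how disparity sign and magnitude relate to depth, for objects on the midline only. Disparity is approximated here as the difference between the convergence angle needed for the object and the one needed for the fixated point. The 6.3 cm eye separation and the sign convention (positive = crossed) are assumptions made for illustration.

```python
import math

def vergence_angle(distance_m, ipd_m=0.063):
    """Convergence angle (radians) needed to fixate a midline point."""
    return 2 * math.atan((ipd_m / 2) / distance_m)

def disparity_deg(object_dist_m, fixation_dist_m, ipd_m=0.063):
    """Approximate disparity of a midline object while fixating another.
    Positive = crossed (nearer), negative = uncrossed (farther)."""
    return math.degrees(vergence_angle(object_dist_m, ipd_m)
                        - vergence_angle(fixation_dist_m, ipd_m))

def classify(disparity, tol=1e-9):
    if disparity > tol:
        return "crossed"
    if disparity < -tol:
        return "uncrossed"
    return "zero"

for dist in (0.5, 1.0, 2.0):
    d = disparity_deg(dist, fixation_dist_m=1.0)
    print(dist, classify(d), round(d, 3))
```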

correspondence problem:
the problem of determining which features in the retinal image in one eye correspond to which features in the retinal image in the other eye

when an object produces a retinal image in each eye, how does brain know that image on left retina was produced by same object as image on right retina

two very different ways in which visual system may solve correspondence problem

visual system surveys left and right retinal images and separately performs 2-D object recognition on them; ‘labels’ each feature of each retinal image as belonging to an object in the scene

object recognition precedes correspondence matching

the visual system matches parts of the retinal images based on very simple properties such as color or edge orientation before proceeding to object recognition (without assigning object labels)

matching precedes object recognition

stereogram:
two depictions of a scene that differ in the same way as an observer’s two retinal images of that scene would differ; an observer who simultaneously views one image with one eye and the other image with the other eye (as in a stereoscope) will see a combined image in depth

happens because brain automatically interprets the retinal images of the photographs in terms of binocular disparity

anaglyph:
a stereogram in which the two photographs taken from adjacent camera positions are printed in contrasting colors and then superimposed; an observer who views an anaglyph with special glasses in which one lens filters out one of the colors and the other lens filters out the other color will see a single image in depth

one eye sees red image and one eye sees blue image; visual system interprets the difference between the images in terms of binocular disparity; see single combined image giving the vivid impression of depth

crucial experiment in solving correspondence problem

random dot stereogram (RDS):
a stereogram in which both images consist of a grid of randomly arranged dots, identical except for the displacement of a portion in one image relative to the other; an observer who views a random dot stereogram in a stereoscope or as an anaglyph will see a single image with the displaced portion in depth

difference between the two images isn’t apparent when looking at them normally, but when each eye views one image (as in a stereoscope), background has zero disparity whereas central portion exhibits binocular disparity, since each dot in central portion of left image has a counterpart in right image that is slightly displaced
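A minimal sketch of how such a pair could be built, using only the standard library. Grid size, patch size, shift, and seed are arbitrary illustrative choices, and the function name is hypothetical. The central patch is shifted leftward in the right-eye image, which under the usual geometry would correspond to crossed disparity.

```python
import random

def make_rds(size=20, patch=8, shift=2, seed=0):
    """Return (left, right) grids of 0/1 dots. The grids are identical
    except that a central square patch is displaced by `shift` columns
    in the right image."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]          # start as an exact copy
    start = (size - patch) // 2
    for r in range(start, start + patch):
        # copy the patch from the left image, displaced leftward
        for c in range(start, start + patch):
            right[r][c - shift] = left[r][c]
        # refill the strip uncovered by the shift with fresh random dots
        for c in range(start + patch - shift, start + patch):
            right[r][c] = rng.randint(0, 1)
    return left, right

left, right = make_rds()
for row_l, row_r in zip(left, right):
    print("".join(".#"[v] for v in row_l), " ", "".join(".#"[v] for v in row_r))
```

Neither printed grid contains any recognizable shape on its own; the square exists only in the relationship between the two.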

how RDS addresses question of whether correspondence matching precedes or follows object recognition

correspondence matching is necessary for perception of binocular disparity

if object recognition necessarily precedes correspondence matching, an RDS wouldn’t produce sense of depth, because doesn’t contain any objects

RDS does produce sense of depth, so correspondence matching must precede object recognition

Marr and Poggio suggest visual system solves correspondence problem by making two simple assumptions about world when matching features in left/right retinal images

each feature in one retinal image will match one and only one feature in other retinal image

visual scenes tend to consist of smooth and continuous surfaces with relatively few abrupt changes in depth; almost every point in field of view is surrounded by points that are about the same depth
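A toy illustration of matching on simple properties without any object labels. This is plain window matching, not the Marr and Poggio algorithm itself; it only loosely reflects the two assumptions (one chosen shift per location, and a window of neighbors assumed to share the same disparity). All names and the example rows are hypothetical.

```python
def best_shift(left_row, right_row, col, window=5, max_shift=3):
    """For the window of dots starting at `col` in the left row, find the
    leftward shift (0..max_shift) at which the right row matches best,
    scored by the number of mismatching dots."""
    patch = left_row[col:col + window]
    best, best_cost = 0, None
    for s in range(max_shift + 1):
        start = col - s
        if start < 0:
            break                      # window would fall off the row
        candidate = right_row[start:start + window]
        cost = sum(a != b for a, b in zip(patch, candidate))
        if best_cost is None or cost < best_cost:
            best, best_cost = s, cost  # keep the lowest-mismatch shift
    return best

# Right row is the left row displaced two positions to the left.
left_row  = [0,0,1,0,1,1,0,1,0,0,0,1,1,1,0,1,0,0,1,0]
right_row = [1,0,1,1,0,1,0,0,0,1,1,1,0,1,0,0,1,0,0,0]
print(best_shift(left_row, right_row, col=8))
```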

how the brain measures binocular disparity in order to extract depth information from binocular view

binocular cells:
neurons that respond best to the stimulation of their receptive fields in both eyes simultaneously

receptive field of neuron is region of the retina that, when stimulated, causes the neuron to change its firing rate

object must fall at certain point in left retina and certain point in right retina in order to stimulate cells

for cells tuned to nonzero disparity, receptive fields in the two retinas are at noncorresponding points

if binocular cell receives stimulus from only one eye, cell doesn’t respond

binocular cell responds when receptive fields are stimulated by object that exhibits particular magnitude of binocular disparity

different binocular cells tuned to different disparities—crossed, uncrossed, zero—and cell tuned to particular disparity will be tuned to particular magnitude of disparity

binocular cells found in both ventral and dorsal pathways

serve an important role in allowing visual system to segment scene into distinct visual objects

integrating depth cues

in most real-world situations, use many cues simultaneously to obtain information about depth, usually cues consistent with one another

have redundant information from many different sources to ensure accurate perception of depth since evolutionarily important

different depth cues supply the most useful depth information under different conditions

conclusions drawn from experiments that manipulate the number of depth cues and the degree to which they conflict, in order to discover how we combine different depth cues to yield single coherent interpretation of scene

no single depth cue dominates in all situations and no single depth cue is necessary in all situations

partial occlusion may be closest to being dominant

the more depth cues present in scene, the greater the likelihood that we’ll perceive depth and the greater the accuracy and consistency of depth perception

cues differ in kinds of information they provide, use differences to construct more accurate view of layout of scene

depth perception based on multiple cues is rapid, automatic process that occurs without conscious thought; employs ‘unconscious inference’ to make best guess about layout of scene based on current retinal images

similar to Bayesian approach of combining knowledge about probability of possible scene with information from current scene to deal with ambiguous visual information

combines depth estimate provided by each cue in weighted average taking into account reliability of each cue in given context

also assumed to take into account prior knowledge of the possible depth in scene when estimating depth
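One common way to formalize a reliability-weighted average is to weight each estimate by the inverse of its variance. This is a sketch under that assumption (roughly, independent Gaussian noise on each cue), with prior knowledge treated as just one more estimate with its own variance. The function name and the numbers are illustrative.

```python
def combine_depth_cues(estimates_m, variances):
    """Weighted average of depth estimates, with weights proportional
    to reliability (1 / variance)."""
    if not estimates_m or len(estimates_m) != len(variances):
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates_m)) / sum(weights)

# Two cues plus a vague prior; the more reliable cue pulls hardest.
cues      = [2.0, 4.0, 3.0]   # disparity, texture, prior (metres, made up)
variances = [0.5, 2.0, 4.0]
print(combine_depth_cues(cues, variances))
```

Changing a cue's variance to reflect context (for example, less reliable disparity at large distances) shifts the combined estimate toward the other cues.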

depth and perceptual constancy

size constancy and size-distance invariance

shape constancy and shape-slant invariance

experience perceptual constancy when perceive some properties of an object as constant despite changes in sensory information used to perceive that property

includes color constancy and lightness constancy- perceived color and lightness of object tend to remain constant despite changes in wavelengths and intensity of light reflected into eyes due to changes in lighting conditions

visual system normally compensates for distance when perceiving size of object, but this compensation doesn’t always take place

depends on presence of depth cues that provide information about distance

experiment showed people able to judge true sizes of disks if they could get information about relative distances of disks from depth cues such as binocular disparity, motion parallax, linear perspective

as cues removed, judgement of disk sizes increasingly based on size of retinal image

the size-distance relation tells us that size of retinal image of any rigid object depends on two factors- object’s actual size and object’s distance from observer

retinal image of object becomes smaller as object recedes in depth

size constancy:
a type of perceptual constancy—the tendency to perceive an object’s size as constant despite changes in the size of the object’s retinal image due to the object’s changing distance from the observer

as object backs away, don’t see it as shrinking in size

size-distance invariance:
the relation between perceived size and perceived distance: the perceived size of an object depends on its perceived distance, and vice versa

Emmert’s law:
size-distance invariance of retinal afterimages—the perceived size of an afterimage is proportional to the distance of the surface on which it is ‘projected’
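Size-distance invariance and Emmert's law can be sketched with one formula: for a fixed visual angle, perceived size scales with perceived distance. The 2 degree afterimage and the distances are illustrative assumptions; the function name is hypothetical.

```python
import math

def perceived_size_m(visual_angle_deg, perceived_distance_m):
    """Size an object must have to subtend visual_angle_deg at
    perceived_distance_m."""
    return 2 * perceived_distance_m * math.tan(math.radians(visual_angle_deg) / 2)

# Same afterimage (fixed retinal size) 'projected' on nearer and farther walls
for wall in (0.5, 1.0, 2.0, 4.0):
    print(wall, round(perceived_size_m(2.0, wall), 4))
```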

retinal image shape also depends on two factors- object’s actual shape and object’s slant (its orientation relative to the observer’s line of sight)

only projects actual shape when object is perpendicular to line of sight of observer

shape constancy:
a type of perceptual constancy—the tendency to perceive an object’s shape as constant despite changes in the shape of the object’s retinal image due to the object’s changing orientation

shape-slant invariance:
the relation between perceived shape and perceived slant: the perceived shape of an object depends on its perceived slant, and vice versa

if looked at object and had no way of knowing what slant was, would have no way of knowing what the actual shape of the object is

in real life, can estimate slant by deriving information from cues such as shading, binocular disparity, texture gradients

once slant of object is judged correctly, perception of its true shape follows
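A sketch of shape-slant invariance for a single dimension, assuming slant is measured in degrees away from perpendicular to the line of sight (0 = facing the observer) and ignoring perspective effects. Under those assumptions the projected extent is the actual extent times the cosine of the slant. Function names are hypothetical.

```python
import math

def projected_length(actual_length, slant_deg):
    """Foreshortened extent of a surface tilted slant_deg away from
    perpendicular to the line of sight."""
    return actual_length * math.cos(math.radians(slant_deg))

def inferred_length(projected, perceived_slant_deg):
    """Actual extent implied by a projected extent and a perceived slant."""
    return projected / math.cos(math.radians(perceived_slant_deg))

image_extent = projected_length(1.0, 60)          # true slant of 60 degrees
print(inferred_length(image_extent, 60))          # slant judged correctly
print(inferred_length(image_extent, 0))           # slant misjudged as 0
```

With the slant judged correctly the true length is recovered; with the slant misjudged, the same retinal extent leads to a different perceived shape.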

illusions of depth, size, and shape

forced perspective

Ponzo illusion

Ames room

moon illusion

tabletop illusion

many visual illusions work by exploiting visual system’s use of size-distance invariance

forced perspective:
an illusion in which a near and a far object seem to be at about the same depth because of the way they’re aligned and the way they appear to be interacting, leading the observer to disregard other depth cues in the scene

illustrates the principle of size-distance invariance and the powerful influence of linear perspective on size perception

two lines converging in the distance like railroad tracks; tend to see object placed farther away on tracks as bigger than same-size object placed closer on tracks

visual system looks at photographs same way it looks at natural world, by seeking out the most likely natural interpretation, even if it involves depth and there is no true depth in photographs

can also use partial occlusion as well as linear perspective and other depth cues in illusion

depth perception believed to require processing in brain areas beyond V1

find larger area of activity in V1 when fixating on larger figure than when fixating on smaller figure, suggesting that information about depth was transmitted back to V1, influencing activity in early visual area

Ames room:
a room specifically designed to create an illusory perception of depth; when viewed with one eye through a peephole, objects along the far wall look like they are all the same distance away, leading to a misperception of their relative size

works by leading observer to perceive the people in each corner as being at the same depth

look through peephole with just one eye to eliminate binocular disparity

can be seen as a failure of shape constancy; visual system evolved to seek most likely natural interpretation of any given scene, and arrangement of surfaces and shapes in Ames room is so unlikely that visual system is willing to accept that person could grow/shrink as they walk across room rather than accept that the complex combination of factors necessary to produce an Ames room has actually occurred

moon looks much bigger at horizon compared to when it’s high in the sky

physical size of retinal image of moon doesn’t change with position in the sky; perceived size difference is perceptual phenomenon

still don’t have definitive answer, but most widely accepted explanation is that it results from misperception of distance, like most other size illusions

argue that perceived distance is greater when moon at horizon perhaps because we perceive moon to be at the height of the clouds and depth cue of relative height tells us that clouds near horizon are farther away than clouds directly overhead

if moon is perceived to be at height of clouds, then fixed size of moon’s retinal image will lead us to perceive moon as bigger near the horizon, based on principle of size-distance invariance

however, find that illusion still stands when there are no clouds and that moon is actually perceived as closer when at horizon

two tabletops are identical in shape and size, but look different because of perceived slants

tabletop on left perceived as slanting back in depth lengthwise, which means perceive length as being foreshortened; assumes actual length of table is greater than its length on page

front edge of tabletop on right is perceived as being at constant depth, so visual system assumes table is exactly as long as it appears to be

opposite perceptual exaggerations of length and width change perceived shapes in way that cannot be overcome even if know retinal images are exact same shape/size