Chapter 4: Recognizing Visual Objects (object recognition (general…

- - - - begins as soon as photoreceptors respond to light
      - as processing progresses through visual pathway, representations are constructed that contain information about increasingly complex aspects of retinal image
    - - principles of perceptual organization are adaptations to physical world in which we evolved; adaptive advantage of more successful navigation of environment has led these principles to be built into our visual system
        
        however, many 'general truths' that we have evolved to perceive more easily aren't always true
        
        heuristics:
        in perceptual organization, rules of thumb based on evolved principles and on knowledge of physical regularities
        
        perceptual inference:
        in vision, the interpretation of a retinal image using heuristics
        
        visual system tends to interpret given scene as the one that is most probable given current retinal image
  - - - edge extraction:
        the process by which the visual system determines the location, orientation, and curvature of edges in the retinal image
        
        based on patterns of responses from neurons in areas V1, V2, and V4
    - - occlusion, shading, and shadows all contribute to the image clutter that the visual system must deal with when organizing scenes into regions
      - edges partition scene into regions that exhibit uniform connectedness
        
        uniform connectedness:
        a characteristic of regions of the retinal image that have approximately uniform properties
    - - figure:
        a region of an image that is perceived as being part of an object
      - ground:
        a region of an image that is perceived as part of the background
      - border ownership:
        the perception that an edge, or border, is “owned” by a particular region of the retinal image
        
        when one object occludes another, border between them belongs to occluding object
        
        different principles that visual system uses to assign border ownership and organize visual scene into figure and ground
        
        depth- when one region perceived to be in front of another, region in front is perceived as owning border between regions and is perceived as figure, while other perceived as ground
        
        surroundedness- if region completely surrounded by another, then surrounded region tends to be perceived as owning border and being the figure
        
        without additional information about relative depth of two regions, figure-on-background perception usually more compelling than hole-in-a-surface perception
        
        symmetry- region with symmetrical borders more likely to be seen as figure than ground
        
        convexity- regions with convex (outward-bulging) borders are more likely to be perceived as figures than are regions with concave (inward-going) borders
        
        thought to reflect that most objects have smooth, convex shapes
        
        meaningfulness- visual system recognizes object shapes prior to assignment of border ownership and determination of figure-ground organization
        
        simplicity- refers to the number and placement of shapes composing the image
        
        tend to see an image as containing the fewest possible shapes, and as shapes that would not change if the objects were moved about in relation to each other
        
        area V2 very important in process of assigning border ownership
        
        response of single cell in V2 with preferred orientation is greater when border shown in receptive field is part of figure on left than when it’s part of figure to the right
        
        cell knows where figure is even though almost entire figure lies outside cell's receptive field
        
        conclude that early visual areas (V1 and V2) include specialized networks that allow important information about border ownership and figure-ground organization to be computed and transmitted very rapidly among cells whose combined receptive fields cover large contiguous areas of visual scene
    - - perceptual grouping:
        the process by which the visual system combines separate regions of the retinal image that “go together” based on similar properties
        
        attributes of regions that lead to them being grouped together
        
        proximity- elements that are close together group more easily than elements that are far apart
        
        similarity- similar elements tend to group together
        
        common motion- (aka common fate) elements that move in unison are likely to be perceptually grouped
        
        symmetry and parallelism- elements that are symmetrical or parallel tend to group together
        
        good continuation- two edges that would meet if extended are perceived as single edge that has been partially occluded; also applies to curved edges
        
        proposed neural basis for perceptual grouping invokes phenomenon of synchronized neural oscillations- neural spikes often happen in temporal pattern, or oscillation, where spikes come in clumps
        
        if two neurons are representing regions that belong together then those neurons could indicate that the regions belong together by synchronizing their oscillations- producing clumps of spikes at the same time
        
        if two receptive fields with same orientation preference experience bar moving through each at the same time in the same direction, neural oscillations strongly synchronized
        
        suggests that three principles of perceptual grouping (similarity, good continuation, common motion) may be represented by synchronized neural oscillations
    - - perceptual interpolation:
        the process by which the visual system fills in hidden edges and surfaces in order to represent the entirety of a partially visible object
        
        uses visible parts of object with knowledge about object shape and how edges tend to relate to one another in real scenes
        
        consists of two different operations with somewhat different perceptual consequences
        
        edge completion:
        the perception of a partially hidden edge as complete; one of the operations involved in perceptual interpolation
        
        can contribute to the perception of optical illusions
        
        illusory contours:
        nonexistent but perceptually real edges perceived as a result of edge completion
        example is Kanizsa triangle- three black circles with slices removed, giving the perception of there being a triangle between them, covering part of each
        
        surface completion:
        the perception of a partially hidden surface as complete; one of the operations involved in perceptual interpolation
        
        example: two black disks with a seemingly complete white bar between them, except that a black line completes the edge of each disk; edges appear to be occluded by surface of the page, so white space between the disks is not experienced as brighter than white of the page
        
        an illusion that results from perceptual interpolation evokes actual neural responses of the kind that would be expected if the illusory edge were physically present
        
        demonstrates that experience of illusory contour is not abstract expectation or inference, but result of explicit perceptual representation quite early in visual stream
        
        shows that neurons tend to base responses on information of scene outside their receptive field
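
The proximity principle listed under perceptual grouping above can be illustrated with a small sketch. This is only a toy, assuming that "close together" means "within some fixed distance"; the function name `group_by_proximity`, the `max_gap` threshold, and the dot positions are all made up for illustration, and nothing here is meant as a model of how the visual system actually computes grouping.

```python
from math import dist  # Euclidean distance, Python 3.8+


def group_by_proximity(points, max_gap):
    """Put points in the same group when a chain of neighbors within
    max_gap connects them (single-linkage clustering).

    Toy stand-in for the proximity principle, not a model of vision.
    """
    groups = []  # each group is a list of points
    for p in points:
        # Every existing group with at least one member near p.
        near = [g for g in groups if any(dist(p, q) <= max_gap for q in g)]
        # Merge those groups, together with p, into a single group.
        merged = [p]
        for g in near:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups


# Hypothetical dot positions: two tight clusters separated by a wide gap.
dots = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
groups = group_by_proximity(dots, max_gap=1.5)
```

With these made-up dots, the three dots on the left end up in one group and the two on the right in another, because the gap between the clusters is much larger than the gaps within them.
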
- - - - object variety:
        refers to the fact that the world contains an enormous variety of objects
      - variable views:
        the different retinal images that can be projected by the same object or category of objects
      - image clutter:
        a characteristic of visual scenes in which many objects are scattered in 3-D space, with partial occlusion of various parts of objects by other objects
    - - shape is usually most important cue
      - uses higher-level processes to represent objects fully enough to recognize them, by matching representations to representations stored in memory
        
        representation in memory has to take into account fact that object can be viewed from infinite number of angles
      - takes place mostly in ventral visual pathway (what pathway) where neurons respond selectively to objects with complex curved contours in specific configurations
  - - - recognition by components:
        a model of object recognition that proposes that recognizing an object depends on first identifying the object’s basic 3-D shapes and how they fit together
        
        based on idea that single representation active whenever object is seen, regardless of viewpoint, and that single representation involves specifying parts of object and spatial relationships
        
        representations from this model tend to be too abstract; provides no way to differentiate two dogs that are same shape
        
        leads to too much invariance; an invariant representation of all objects with same shape
      - other approach based on idea that objects are represented in view-specific manner
        for any given object, multiple representations would be stored, each corresponding to different possible view; representation of current view would be compared to all stored representations
        
        participants take longer to recognize objects presented from novel viewpoint, suggesting representations created while learning to recognize objects weren’t viewpoint invariant, but matched specific views studied
        
        single neurons in inferotemporal cortex respond differently to different views, suggesting existence of distinct, view-specific representations
  - - - more complex contours could be represented by neurons in V4 that combine responses of multiple V1 neurons, perhaps with input from V2 neurons
        
        V4 neuron could combine responses from V1 neurons that indicate presence of short, straight edge with specific orientation/position and conclude that there is a longer, curved edge
        
        however, individual neurons in area V4 respond most strongly to edges that can be more complex than those in V1 in at least 3 ways
        
        edges to which V4 neurons respond strongly can be straight or curved, with curvature ranging from sharp to broad
        
        V4 neurons have preferred orientations like neurons in V1, but a contour with preferred orientation will elicit strong response from V4 neuron only if contour is at particular angular position relative to entire shape that contour belongs to
        
        V4 neurons have preferred location in retinal image, but preferred location covers larger region of retinal image than V1 neurons
        
        means individual V4 neurons show some degree of invariance with respect to location because response to edge with preferred curvature/orientation will be same across range of locations of edge within retinal image
        
        led to conclusion that shapes are represented in V4 by combined activity of all neurons responding to contour fragments making up shape
        
        V4 neurons have larger receptive fields and respond selectively to more complex characteristics of contour fragments, so representation of shape in V4 richer than in V1
        
        single neurons in inferotemporal (IT) cortex have much larger receptive fields than even V4 neurons, covering almost entire retinal image, and each neuron selective for much more complex shapes than V4 neurons
        
        IT neurons respond most strongly to specific combinations of contour fragments, located almost anywhere in visual field
        
        shape representation based on structural description (description that specifies set of parts/contour fragments and their spatial relations)
        
        suggests that representation of shape in brain is both parts based and view specific; compromise of two approaches
        
        nature of ventral pathway with increasing complexity in higher processes introduces question of whether there are grandmother cells
        
    - - lateral occipital cortex theorized to be part of higher-level processing since responds selectively to pictures of objects but not simple features or textures
        
        also exhibits invariance- activity not dependent on size, position, or other features of pictured object
      - regions of IT cortex have been shown to respond selectively to specific categories of objects
      - two main ideas about how objects are processed
        
        modular coding:
        representation of an object by a module, a region of the brain that is specialized for representing a particular category of objects
        
        module is part of brain that represents specific category of objects
        modules located in IT and occipital cortex along ventral pathway
        
        supporting evidence from fMRI showing that viewing certain categories of objects can strongly activate specific brain regions
        
        identified fusiform face area (FFA) in this way; located on fusiform gyrus along lower surface of temporal lobe
        
        other areas are parahippocampal place area (PPA), which responds to buildings/outdoor scenes, and extrastriate body area (EBA), which responds to human/animal body parts
        
        also supporting evidence provided by patients with brain damage that results in visual agnosia
        
        visual agnosia:
        an impairment in object recognition
        
        patient can describe an object's appearance but cannot name it or describe its function; when given the name, can describe function/shape
        
        prosopagnosia:
        a type of visual agnosia in which the person is unable to recognize faces, with little or no loss of ability to recognize other types of objects
        
        inherited or produced by damage to face-selective regions
        
        topographic agnosia:
        a type of visual agnosia in which the person is unable to recognize spatial layouts such as buildings, streets, landscapes, and so on
        
        most likely due to damage in PPA
        
        possible opposing evidence: pattern of responses to houses (compared to scrambled objects) looks similar to pattern of responses to faces and to chairs (each compared to scrambled objects)
        
        each module also has high response to other categories of objects, suggesting representation of viewed objects is both modular and distributed
        
        distributed coding:
        representation of objects by patterns of activity across many regions of the brain
        
        regions along cortex of ventral pathway
        
        supporting evidence provided by experiment that tested idea that neurons outside category-specific modules carry information about whether viewed object belongs to category
        
        could predict with 95% accuracy what object category had been viewed based on brain activity, even when primary response region had been hidden
        
        supported by observation that number of objects humans can recognize makes it unlikely that each has its own module
        
        believe that brain uses combination of modular and distributed coding
        
        modules for specific categories may not be clustered together, but may be distributed over relatively large areas of brain
        
        seems likely that brain represents certain types of objects via modular coding but that distributed coding also plays significant role even with those objects and all others
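
The distributed-coding idea above (category can be read from the overall pattern of activity, even with the most responsive region left out) can be sketched with a toy pattern classifier. Everything here is an assumption for illustration: the six "regions", the response values, and the helper names `pearson` and `classify` are invented, and the sketch is not the method of the experiment described in these notes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(pattern, templates, keep):
    """Return the category whose stored pattern correlates best with
    `pattern`, using only the region indices listed in `keep`."""
    def pick(values):
        return [values[i] for i in keep]
    return max(templates, key=lambda c: pearson(pick(pattern), pick(templates[c])))


# Hypothetical mean responses of six regions to two categories.
templates = {
    "face":  [9.0, 2.0, 5.0, 3.0, 6.0, 2.0],
    "house": [2.0, 9.0, 3.0, 5.0, 2.0, 6.0],
}
# Hypothetical responses on a new trial in which a face was viewed.
new_pattern = [8.0, 3.0, 6.0, 2.0, 5.0, 3.0]

all_regions = list(range(6))
# Drop each category's peak region (its "module" in this toy example).
peaks = {max(all_regions, key=t.__getitem__) for t in templates.values()}
without_peaks = [i for i in all_regions if i not in peaks]

label_full = classify(new_pattern, templates, all_regions)
label_masked = classify(new_pattern, templates, without_peaks)
```

In this made-up data the new pattern is still labeled "face" after the two peak regions are removed, because the weaker responses in the remaining regions still differ between categories.
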
- - - - involves perceiver's goals, attention, knowledge, expectations
  - - - general layout that can rapidly be matched to layout stored in memory provides gist that visual system can use to guide identification of specific objects in scene
      - then narrows down range of probable objects to be recognized
      - top-down information combines with bottom-up information in ventral pathway to speed up process of fully recognizing objects in scene
  - - - Helmholtz spoke of vision as process of 'unconscious inference'- mind always trying to infer what scene is that produced current retinal image based on information in image and information in memory
    - - states that unconscious inference takes into account two probabilities in order to infer what type of scene produced the currently experienced retinal image
        
        the prior probability of all possible scenes
        
        the probability that each possible scene produced the current retinal image
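
One common way to combine the two probabilities above is Bayes' rule: multiply each scene's prior probability by the probability that it would produce the current image, then normalize so the results sum to 1. The sketch below assumes that reading. The scene labels and all numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def posterior(priors, likelihoods):
    """Combine prior P(scene) with likelihood P(image | scene) and
    normalize, giving P(scene | image) for every candidate scene."""
    unnormalized = {s: priors[s] * likelihoods[s] for s in priors}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


# Hypothetical candidate scenes for one ambiguous shaded patch.
priors = {
    "bump lit from above": 0.8,   # assumed to be the more common scene
    "dent lit from below": 0.2,
}
# Assume both scenes would produce this retinal image equally well.
likelihoods = {
    "bump lit from above": 0.5,
    "dent lit from below": 0.5,
}

beliefs = posterior(priors, likelihoods)
most_probable = max(beliefs, key=beliefs.get)
```

Because the image fits both made-up scenes equally well, the prior decides the outcome: the more common scene gets a posterior of about 0.8 and is the interpretation chosen.
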