Chapter 11: The Auditory Brain and Perceiving Auditory Scenes
The Auditory Brain
Ascending Pathways: From the Ear to the Brain
structures in auditory pathways come in pairs, one for the right hemisphere and one for the left
Type I auditory nerve fibers carry signals from inner hair cells in cochlea to the ipsilateral cochlear nucleus in the brain stem
cochlear nucleus:
a structure in the brain stem (one on each side of the brain); it receives signals via Type I auditory nerve fibers from inner hair cells in the ipsilateral ear
from there, main pathways carry signals to the contralateral side of the brain, but secondary pathways remain mostly on the ipsilateral side
signals on the main pathways from the ipsilateral cochlear nucleus travel directly to the contralateral inferior colliculus (via a nerve tract called the lateral lemniscus), then to the contralateral medial geniculate body (MGB), and then to the contralateral auditory cortex
inferior colliculus:
a structure in the midbrain (one on each side of the brain); a stop on the ascending auditory pathway
medial geniculate body (MGB):
a structure in the thalamus (one on each side of the brain); the next stop on the ascending auditory pathway after the inferior colliculus
signals on the main pathways also travel directly and indirectly (via a synapse in the contralateral trapezoid body) to the contralateral superior olivary complex, then to the contralateral inferior colliculus, and then (like the direct signals from the cochlear nucleus to the inferior colliculus) to the contralateral MGB and auditory cortex
superior olivary complex:
a structure in the brain stem (one on each side of the brain); a stop on the ascending auditory pathway receiving signals from both cochlear nuclei
signals on secondary pathways from the ipsilateral cochlear nucleus travel to the ipsilateral superior olivary complex and then to the ipsilateral inferior colliculus, from which some signals cross over to the contralateral MGB while other signals travel to the ipsilateral MGB and then to the ipsilateral auditory cortex
additionally, some secondary-pathway signals cross from the contralateral inferior colliculus to the ipsilateral MGB
neurons within the subcortical structures on the auditory pathways differ in their responses to signals evoked by sounds of different frequencies and durations
in response to incoming sounds with their preferred frequency, inner hair cells typically produce a large burst of action potentials in the Type I auditory nerve fibers to which they’re connected, followed by a sustained slower firing rate still above baseline
within the cochlear nucleus, which receives these signals via Type I fibers, many neurons show a similar response; others show the same initial strong response but then quickly return to the baseline rate; and still others respond by gradually increasing their firing rate to a moderate level, without any initial burst of activity
neurons in later subcortical structures (trapezoid body, superior olivary complex, inferior colliculus, and MGB) show similarly varied patterns of response
don’t know exact function of different subcortical structures and different response patterns, but clear that they play critical role in encoding the rapidly changing stimuli that typically make up auditory environment
Descending Pathways: From the Brain to the Ear
numerous descending pathways carry signals between the auditory cortex, subcortical auditory structures, and the ears
inhibitory neural signals from the superior olivary complex to the outer hair cells cause a reduction in the motile response of the outer hair cells, possibly functioning to reduce the damaging effects of very loud sounds
descending signals also thought to help protect the ear from damage by activating the acoustic reflex
acoustic reflex:
a contraction of tiny muscles attached to the ossicles that limits their movement in the presence of loud sounds and hence prevents overstimulation of the cochlea
descending signals also involved in attention: they block task-irrelevant ascending signals while passing task-relevant ones
experiments on bats suggest feedback from auditory cortex to MGB reflects a top-down mechanism for increasing the bat’s ability to discriminate similar frequencies from one another
fMRI studies of humans indicate that feedback-related MGB activity correlates with the ability to discriminate different syllables
provides strong evidence that top-down signals from the cortex can affect subcortical structures according to the demands of the perceptual task at hand
Auditory Cortex
auditory cortex:
part of the cerebral cortex, tucked into the lateral sulcus on top of the temporal lobe; consists of the auditory core region, belt, and parabelt
auditory core region:
part of the auditory cortex, located within the transverse temporal gyrus in each hemisphere; consists of the primary auditory cortex, rostral core, and rostrotemporal core
primary auditory cortex (A1):
part of the auditory core region
signals from core flow to two regions wrapped around it
belt:
along with the parabelt, a region of cortex wrapped around and receiving signals from the auditory core region
parabelt:
along with the belt, a region of cortex wrapped around and receiving signals from the auditory core region
each contains distinct subareas
tonotopic map:
an arrangement of neurons within auditory brain regions such that the characteristic frequencies of the neurons gradually shift from lower at one end of the region to higher at the other end
neurons within each of the auditory core regions arranged into tonotopic map
echoes arrangement of frequencies along basilar membrane
in area A1 and in the rostrotemporal core, neurons with high characteristic frequencies are located at the posterior (back) end
in rostral core, arrangement is opposite
each of the subcortical structures in the ascending auditory pathway also has a tonotopic organization
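to make the low-to-high frequency gradient mentioned above concrete, here is a minimal Python sketch (not from the chapter) of the basilar-membrane frequency–position map using the Greenwood function, a standard published approximation for the human cochlea; the function name and sampled positions are just illustrative
```python
def greenwood_cf(position):
    """Approximate characteristic frequency (Hz) at a relative position along
    the human basilar membrane (0 = apex, 1 = base), using Greenwood's (1990)
    published constants for humans."""
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * position) - k)

for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"position {pos:.2f} -> ~{greenwood_cf(pos):8.0f} Hz")
```
low characteristic frequencies fall near the apex and high ones near the base, the same low-to-high ordering that the cortical tonotopic maps echo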
frequency tuning of neurons in auditory cortex and subcortical auditory regions can be broad or narrow
for narrowly tuned neurons, only a narrow band of frequencies on either side of the neuron's characteristic frequency produces a response, no matter how high the amplitude of the tone
for broadly tuned neurons, when the amplitude of the stimulating tone is fairly low, the band of frequencies surrounding the characteristic frequency that produces a response is fairly narrow, but as the amplitude of the stimulating tone increases, the band becomes much broader
neurons with broad tuning widths might be involved in integrating component frequencies of complex sounds
discrimination and recognition process appears to be carried forward in the belt and parabelt, thought to be analogous to areas beyond V1 in visual pathways
neurons in belt and parabelt don’t respond strongly to pure tones, but appear to be tuned to more complex stimuli containing multiple frequencies
"What" and "Where Pathways and Other Specialized Regions of the Auditory Brain
‘what’ pathway, specialized for representing the identity of sound sources, extends from the core regions into the belt and parabelt and then into anterior parts of the temporal cortex
‘where’ pathway specialized for representing the location of sound sources, extends from the core regions into posterior parts of the auditory cortex and eventually into the posterior parietal cortex
research showing that some people can localize sound location but not identify sounds while others can do the opposite supports idea that there are two separate pathways for this process
two distinct regions in auditory cortex selectively active when listeners performing either identity task or location task
more anterior region active when judging identity, more posterior region active when judging location
large areas of primate cerebral cortex are multimodal (respond both to auditory and visual stimuli)
responses to different types of auditory stimuli indicated that information about the identity and meaning of auditory input flows from the anterior part of the auditory core region into the anterior temporal cortex and then to the prefrontal cortex (ventral ‘what’ pathway)
information about the spatial location of the sound sources flows from the posterior part of the auditory core region into the posterior part of the temporal cortex and then to the parietal cortex (dorsal ‘where’ pathway)
neural signals arising from cochlea carry information about sound to brain via auditory nerve
processed and combined with signals from other senses in brain as they travel via ascending pathways from structures in the brain stem to auditory areas in the cerebral cortex
feedback signals from brain travel back to cochlea via descending pathways
beyond differences in the range of audible frequencies, structure and functioning of auditory system are similar across groups of mammals
Localizing Sounds
Perceiving Azimuth
Interaural Level Differences
sound emitted by source located on median plane (directly in front) is equally intense in two ears because equally distant from them
sources located to one side emit sounds that are more intense in the closer ear for two reasons
intensity of sounds decreases with distance from source according to the inverse square law
insignificant difference except when the sound source is relatively close to the head (see the worked sketch at the end of this subsection)
head produces an acoustic shadow that has much greater effect on high-frequency sounds than low-frequency sounds
acoustic shadow:
an area on the other side of the head from a sound source in which the loudness of the sound is reduced because the sound waves are partially blocked by the head; has a much greater effect on high-frequency sounds than on low-frequency sounds
interaural level difference (ILD):
the difference in the sound level of the same sound at the two ears
differs at each azimuth depending on frequency
regardless of frequency, sound at 0° (in front) azimuth exhibits zero ILD
ILD increases steadily for sound sources from 0° to 90° azimuth and then decreases steadily back to zero at 180° (behind)
because of the acoustic shadow, ILD at any given azimuth typically increases with frequency, and it's much easier for the auditory system to detect large ILDs than small ones
as a result, ILD is a good cue for perceiving the azimuth of pure tones at high frequencies but not as good at low frequencies
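worked sketch for the inverse-square contribution to ILD mentioned above (not from the chapter): the distances and the roughly 0.2 m extra path to the far ear are assumed illustrative values, but they show why this contribution matters only for nearby sources, leaving the acoustic shadow as the main factor otherwise
```python
import math

def inverse_square_ild_db(near_dist_m, far_dist_m):
    """Level difference (dB) between the two ears due solely to the inverse
    square law, given each ear's distance from the source; ignores the head's
    acoustic shadow, which dominates at high frequencies."""
    return 20 * math.log10(far_dist_m / near_dist_m)

# Illustrative distances only; assume the path to the far ear is ~0.2 m longer.
print(inverse_square_ild_db(0.25, 0.45))   # nearby source: ~5.1 dB -- noticeable
print(inverse_square_ild_db(10.0, 10.2))   # distant source: ~0.17 dB -- negligible
```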
Interaural Time Differences
sounds from an off-center source arrive at the two ears at different times because the sound must travel different distances to reach each ear (a path difference of up to about 20 cm)
interaural time difference (ITD):
the difference in arrival time of the same sound at the two ears
experiment where heard two closely spaced bursts of narrowband white noise, one in each ear
if time between sounds brief enough, perceives single sound
if the interaural delay exceeds the listener's ITD threshold, the single fused sound is perceived as coming from the left or right, according to which ear received the sound first
most people have an ITD threshold of 100 µs or less, which means that the 292 µs ITD of a sound at an azimuth of 45° is more than sufficient
ITDs for sounds at different azimuths vary from 0 µs to about 600 µs
sounds directly in front/behind listener (0° or 180° azimuth) arrive at both ears simultaneously
about 600µs for sounds directly to the side (90° azimuth)
together, ITD and ILD provide complementary sources of information about the location of a sound source in horizontal plane
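a hedged sketch of how ITD grows with azimuth, using Woodworth's spherical-head approximation rather than anything from the chapter; the assumed head radius and speed of sound make the numbers come out somewhat larger than the 292 µs and 600 µs figures above, but the pattern (zero straight ahead, maximum directly to the side) is the same
```python
import math

SPEED_OF_SOUND = 343.0   # m/s, in air at roughly room temperature
HEAD_RADIUS = 0.0875     # m, an assumed "typical" head radius

def woodworth_itd_us(azimuth_deg):
    """Approximate interaural time difference (microseconds) for a distant source
    at the given azimuth (0 = straight ahead, 90 = directly to one side), using
    Woodworth's spherical-head formula: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return 1e6 * (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 15, 45, 90):
    print(f"azimuth {az:3d} deg -> ITD ~{woodworth_itd_us(az):5.0f} microseconds")
```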
Head Motion and the "Cone of Confusion"
identical sounds from sources at 135° (behind and to left) and 45° (in front and to left) would have nearly identical ILD and ITD, wouldn’t be able to use those factors to disambiguate azimuth
cone of confusion:
a hypothetical cone-shaped surface in auditory space; when two equally distant sound sources are located on a cone of confusion, their locations are confusable because they have highly similar ILD and ITD
use head movement to deal with ambiguity
as soon as turn head to side or tilt, ILD and ITD of sound change in way that instantly disambiguates azimuth of source
ability to perceive azimuth accurately (to localize sound in the horizontal plane) can be quantified psychophysically, using method of constant stimuli, by situating listener at center of circle of closely spaced speakers
a reference pure tone is emitted by one speaker, followed by another tone of the same frequency from a different speaker; the listener has to say whether it came from the right or left of the reference tone
minimum audible angle:
the minimum angular separation between a reference sound source and a different sound source emitting a tone of the same frequency that yields 75% correct judgments about the relative horizontal positions of the two sources
the smaller the minimum audible angle, the more accurately the listener can perceive the azimuth of sound source
for wide variety of sounds, minimum audible angle is under 10°, in some cases as little as 1°
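a minimal sketch of how a 75%-correct minimum audible angle could be read off method-of-constant-stimuli data; the separations and proportions below are made up purely for illustration
```python
def minimum_audible_angle(separations_deg, proportions_correct, criterion=0.75):
    """Linearly interpolate the angular separation at which performance first
    reaches the criterion (75% correct, per the definition above)."""
    pairs = list(zip(separations_deg, proportions_correct))
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if p0 < criterion <= p1:
            return s0 + (criterion - p0) * (s1 - s0) / (p1 - p0)
    return None

# Hypothetical data: proportion of correct left/right judgments at each separation.
separations = [0.5, 1, 2, 4, 8]               # degrees
prop_correct = [0.52, 0.61, 0.74, 0.88, 0.97]
print(minimum_audible_angle(separations, prop_correct))  # ~2.1 degrees for these made-up data
```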
Perceiving Elevation
some animals, not humans, can move pinnae independently to focus on sources of sound without having to turn head
for humans, pinnae still provide information used to judge elevation
as incoming sound waves funneled by pinna into auditory canal, reflect off bumps and ridges and reverberate slightly, which amplifies some frequencies and attenuates others, changing shape of frequency spectrum
exact nature of modification depends a bit on azimuth of sound source, but depends even more on its elevation
sound directly in front of you at 30° elevation (above horizontal plane) will be distorted in certain way by pinnae amplifying some frequencies and attenuating others
same sound at −30° elevation will have different pattern of distortion
spectral shape cue:
a pinna-induced modification in a sound’s frequency spectrum; provides information about the elevation of the sound source
each person’s pinnae are unique, so spectral shape cues produced by own particular pinnae have to be learned
with artificial pinna, person’s accuracy at judging elevation (but not azimuth) at first greatly impaired but then improves within a few days as auditory system adapts to new spectral shape cues
since spectral shape cues depend on hearing how pinna modifies shape of sound wave across entire spectrum, cue works best for broadband sounds (contain wide range of frequencies) as opposed to pure tones
people quite poor at judging elevation of pure tones
Perceiving Distance
ILD, ITD, and spectral shape cue provide little information about distance
if know sound level of source, then perceived loudness can be used to judge at least whether source is relatively near/far
use inverse square law to judge distance of many familiar sounds
reduction in level greater for high frequencies than for low frequencies, which results in progressive ‘blurring’ of sound as source grows more distant
even if level of sound at source is unknown, can use ‘blurring’ cue to judge distance of sound source
important distance cue provided by echoes in situations where there are many hard surfaces to reflect sound waves
where you can distinguish the sound arriving directly from the source and the sound echoing off surfaces (because the spectral shape of the sound is modified by reflection), you can perceive the relative proportion of each type of sound energy, direct versus reflected
if proportion more direct than reflected, sound source is near
if proportion is more reflected than direct, sound source is farther away
other distance cues involving loudness and frequency are provided by the movement of sound sources toward/away from the listener
loudness cue also result of inverse square law
frequency cue is result of Doppler effect
Doppler effect:
the frequency of a sound emitted by a moving sound source is higher in front of the sound source than behind it; the frequency rapidly decreases as the sound source passes the listener
together, don’t tell you exactly how far away source is at any given moment, but change in loudness can tell you whether source is approaching/receding and both together tell you when approaching source reaches closest point, because that’s when loudness stops increasing and starts decreasing and when rate of change in frequency is maximal
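the Doppler frequency cue follows the standard physics formula for a stationary listener, f_heard = f_source · c / (c ∓ v); a quick sketch with illustrative numbers (not from the chapter):
```python
SPEED_OF_SOUND = 343.0   # m/s in air at roughly room temperature

def doppler_frequency(f_source_hz, source_speed_ms, approaching):
    """Frequency heard by a stationary listener from a source moving directly
    toward (approaching=True) or away from (approaching=False) the listener."""
    denom = SPEED_OF_SOUND - source_speed_ms if approaching else SPEED_OF_SOUND + source_speed_ms
    return f_source_hz * SPEED_OF_SOUND / denom

# A 440 Hz siren on a vehicle moving at 25 m/s (about 90 km/h):
print(round(doppler_frequency(440, 25, approaching=True)))    # ~475 Hz while approaching
print(round(doppler_frequency(440, 25, approaching=False)))   # ~410 Hz while receding
```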
Echolocation by Bats and Humans
echolocation:
sound localization based on emitting sounds and then processing the echoes to determine the nature and location of the object that produced the echoes
bats use this by emitting sequence of high-frequency sounds (20,000-100,000Hz) and processing echoes to determine whether the sound was reflected off potential prey and to continuously track prey’s location as they close in on it
provides information about azimuth, elevation, and distance, as well as information about size/shape of target and physical characteristics of target’s surface (hardness/texture)
humans can use echolocation quite accurately to judge distance from walls or other objects
blind and sighted participants blindfolded and pointed in direction of wall, asked to report when they could first detect there was a wall in front of them and then stop as close as possible to wall without touching it based on echoes of footsteps
blind participants could sense wall 2-5m away, stopped within 15cm
blindfolded sighted participants did not report sensing the wall until less than a meter away and walked into it more than half the time, but rapidly improved with practice
when wore earplugs, all unable to detect wall and collided with it every time
other experiment found that both blind and sighted individuals could use echoes to detect presence of small disk placed at different distances in front of them, although blind participants’ performance better since they had more practice with emitting clicks with tongue/mouth
Echoes and the Precedence Effect
we receive ILD, ITD, and spectral shape cues indicating that sound is coming from places from which it is really just echoing
ability to correctly localize sound as coming from direction of source relies on fact that sound coming directly from source follows shorter path than any reflected sound and direct sound arrives first
auditory system tuned to localize sounds as originating from source in direction from which sound first arrives
precedence effect:
the localization of a sound as originating from a source in the direction from which the sound first arrives; minimizes the effect of echoes on sound localization
exact mechanisms still unclear; some propose that echoes are suppressed in cochlea, but people with cochlear implants also demonstrate precedence effect, suggesting a more brain-centered mechanism
Looking While Listening: Vision and Sound Localization
efforts to localize sound sources often accompanied by looking; see apparent source of sound, which would seemingly make further localization efforts by auditory system unnecessary
when watching video of conversation between two people while listening to conversation in single earbud in right ear, sound originates from same location for both voices
if closed eyes, auditory system would localize sound source as earbud
as soon as opened eyes, would experience each person’s voice as coming from his/her mouth
when the visual system and auditory system give you conflicting information about the location of a sound source, perception tends to be dominated by the visual information, and you hear the sound as coming from the visually determined location
ventriloquism effect:
the tendency to localize sound on the basis of visual cues when visual and auditory cues provide conflicting information
aka visual capture
especially powerful when visual information matches perceived experience
degree to which vision can bias perceived location of sound depends on three factors
visual and auditory events must be reasonably close together in time (if anything, the visual event should precede the auditory event)
the more out of sync, more you’d tend to localize voices as coming from earbud
two events must be plausibly linked; sound must be something that the visual event could be the source of
two events must be plausibly close together in space
if computer hooked up to speaker behind you in another room, you’d be highly likely to localize sound at that speaker and not the moving mouths
Neural Basis of Sound Localization
in humans, brain structure called medial superior olive (MSO), part of superior olivary complex in brain stem, thought to contain neurons that function as mechanism for detecting specific ITDs and representing azimuth of sound sources
neurons (coincidence detectors) in both left and right MSO receive signals from both left and right cochlear nuclei
fire only if signals from the two cochlear nuclei arrive at the same time
each MSO would contain many such circuits, each tuned to a different ITD, enabling highly accurate auditory localization based on population coding
such a mechanism is known to exist in barn owls, which use hearing to locate prey
structure in brain stem of owls called nucleus laminaris contains neurons that function as coincidence detectors for encoding specific ITDs
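a toy sketch (purely illustrative, not the chapter's model) of the coincidence-detector idea above: each unit applies a different internal delay to one ear's signal and responds when the delayed signal lines up with the other ear's signal, so the best-matching unit encodes the ITD; spike times, delays, and the coincidence window are all made-up values
```python
def coincidence_counts(left_spikes_us, right_spikes_us, internal_delays_us, window_us=20):
    """For each candidate internal delay applied to the right-ear signal, count how many
    left-ear spikes coincide (within window_us) with a delayed right-ear spike."""
    counts = []
    for delay in internal_delays_us:
        delayed_right = [t + delay for t in right_spikes_us]
        hits = sum(any(abs(l - r) <= window_us for r in delayed_right)
                   for l in left_spikes_us)
        counts.append(hits)
    return counts

# Source off to the right: sound reaches the right ear first, so the left-ear
# spikes lag the right-ear spikes by ~300 microseconds.
right = [0, 1000, 2000, 3000]                 # spike times in microseconds
left = [t + 300 for t in right]
delays = [0, 100, 200, 300, 400, 500, 600]
print(dict(zip(delays, coincidence_counts(left, right, delays))))  # the 300-us unit wins
```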
neurons in auditory cortex are tuned for different ILDs, and responses of a population of differently tuned neurons provide a neural code for sound localization using ILDs
knowing where a sound is coming from lets us use other senses (like vision) to help identify the source and decide how to respond to it, which is why localization usually needs to come first
auditory brain evolved because needed the ability to localize sound sources rapidly and automatically in order to survive
less critical but still important to localize sound sources in noisy environment so you can direct attention and use vision to get information about sound source
in audition, no corresponding explicit representation of location like there is in retinal image with vision; cochlea organized tonotopically, not spatially
evolved sensitive method based on comparing aspects of the sound arriving at the two ears
polar coordinate system based on two mutually perpendicular planes centered on the head used to specify locations of sound sources in 3D space
azimuth:
in sound localization, the location of a sound source in the side-to-side dimension in the horizontal plane—that is, the angle left or right of the median plane
elevation:
in sound localization, the location of a sound source in the up-down dimension in the median plane—that is, the angle above or below the horizontal plane
distance:
in sound localization, how far a sound source is from the center of the head in any direction
Auditory Scene Analysis
Simultaneous Grouping
Grouping by Harmonic Coherence
when auditory system presented with sound wave containing fundamental frequency and series of harmonics that are all integer multiples of fundamental, can entertain two possibilities
all harmonics are coming from single source
many independent sources are emitting frequencies that, completely by chance, are related to each other in precisely this way
former much more likely and is the interpretation that the auditory system strongly tends to prefer
when presented with set of harmonics that go together, perceive them as single auditory stream
when presented with a set of harmonics that go together except for one that has been slightly altered (mistuned), the altered harmonic is perceived as a distinct auditory stream, separate from the remaining harmonics, which are still perceived as a single auditory stream
can pick out that one tone that is different while it is difficult to pick out individual tones when they all go together
when two sound sources produce tones that are all harmonics of the same fundamental frequency, will perceive the mixture as a single auditory stream coming from a single sound source (as long as they're reasonably close in space)
if one sound source produces harmonics of a different fundamental frequency, will perceive mixtures of tones from two sound sources as two distinct auditory streams
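a minimal synthesis sketch (not from the chapter) of grouping by harmonic coherence: eight harmonics of 200 Hz with the fourth harmonic mistuned by 4%; listening to the resulting file, the mistuned component tends to pop out as a separate stream while the coherent harmonics fuse into one; the file name and all parameters are arbitrary
```python
import math, wave, struct

SAMPLE_RATE = 44100
DURATION_S = 1.5
F0 = 200.0               # fundamental frequency (Hz)
MISTUNED_HARMONIC = 4
MISTUNING = 1.04         # 4% upward mistuning of that one harmonic

samples = []
for n in range(int(SAMPLE_RATE * DURATION_S)):
    t = n / SAMPLE_RATE
    value = 0.0
    for h in range(1, 9):    # harmonics 1-8 of the fundamental
        freq = F0 * h * (MISTUNING if h == MISTUNED_HARMONIC else 1.0)
        value += math.sin(2 * math.pi * freq * t)
    samples.append(value / 8)    # keep the mixture within [-1, 1]

with wave.open("mistuned_harmonic.wav", "w") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```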
Grouping by Synchrony or Asynchrony
two unrelated auditory events rarely begin/end/change at exactly the same time in same way
all frequency components of single auditory stream do start/stop/change in synchrony
uses synchrony/asynchrony as powerful grouping principles
when three harmonics of same frequency begin/end at same time, mixture perceived as single auditory stream
when second harmonic begins before/ends after other two (has asynchronous onset and offset) it's perceived as a distinct auditory stream ‘passing through’ mixture of other two
could also be grouped separately because of an asynchronous change in frequencies rather than asynchronous onset/offset
mixture of frequencies arriving at ear within given interval of time can be grouped according to two different principles
harmonic coherence- frequencies that are harmonics of same fundamental frequency tend to be grouped together as part of same auditory stream
synchrony- sounds that begin, end, change at same time also tend to be grouped together
Sequential Grouping
Grouping by Frequency Similarity
auditory system uses frequency similarity of sequential tones to group those tones into a single auditory stream or segregate them into multiple distinct auditory streams
when listening to a sequence of pure tones alternating between two nearby frequencies, perceive the sequence as a single auditory stream warbling up and down (see the sketch at the end of this subsection)
if two alternating tones are far apart in frequency, then perceive two separate auditory streams, one a low-frequency tone repeating and the other a high-frequency tone repeating
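the sketch referenced above (not from the chapter): it synthesizes an alternating two-tone sequence; with HIGH_HZ close to LOW_HZ most listeners hear one warbling stream, and with a large separation the sequence splits into two streams; tone durations, gaps, and frequencies are arbitrary, and the gap/tempo also matters (see the temporal-proximity notes below)
```python
import math, wave, struct

SAMPLE_RATE = 44100
TONE_MS, GAP_MS = 100, 20    # shorter gaps (faster tempo) also promote segregation
LOW_HZ = 500
HIGH_HZ = 550                # try 550 (close: one warbling stream) vs 2000 (far: two streams)

def tone(freq_hz, dur_ms):
    n = int(SAMPLE_RATE * dur_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

silence = [0.0] * int(SAMPLE_RATE * GAP_MS / 1000)
sequence = []
for _ in range(10):          # ...ABAB... alternation of low and high tones
    sequence += tone(LOW_HZ, TONE_MS) + silence + tone(HIGH_HZ, TONE_MS) + silence

with wave.open("alternating_tones.wav", "w") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(b"".join(struct.pack("<h", int(s * 32767 * 0.8)) for s in sequence))
```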
when presented with just two tones that differ in frequency, played one after the other, can easily tell whether the higher-frequency tone comes before or after the lower-frequency one
when a third, flanker tone is played before and after the pair of target tones, judging the order of tones A and B becomes much more difficult
can’t listen for isolated upward/downward transition, but must listen for/remember the middle of three transitions
when there is a fourth captor tone that is same frequency as flanker tone played between flanker tones and target tones, judging order of tones A and B is easy again
flanker and captor tones are perceived as a single auditory stream apart from A and B, making A and B easier to pick out
Grouping by Temporal Proximity
if there are two sequences of tones (one high- and one low-frequency) where relatively short time separates successive tones in each sequence, listener perceives two sequences as two distinct auditory streams
if successive tones in two sequences are further apart in time, listener tends to group together the two sequences into single auditory stream, a single sequence bounding up and down in frequency
auditory system uses both similarity in the frequencies of sequential tones and their temporal proximity (how close together in time they occur) to determine which tones should be grouped and perceived as part of single auditory stream
two tones that are sufficiently different in frequency to be segregated into distinct auditory streams may or may not actually be perceptually segregated, depending on the timing between successive tones
Perceptual Completion of Occluded Sounds
we often experience auditory scenes in which some sounds are occluded (hidden) by other sounds
however, if the interruption is brief enough (e.g., a cough briefly masking part of a word in a sentence), may not even notice it and seem actually to perceive the occluded words
when listener presented with sequence of pure-tone glides (pure tones that glide up/down in frequency), separated by brief gaps of silence, perceives both glides and gaps
when gaps are filled with broadband white noise, listener perceives glides as continuing ‘behind' the noise
similar to visual system, auditory system has evolved to make ‘best guesses’ about what’s really going on behind occluding sounds and shapes
auditory scene:
all the sound entering the ears during the current interval of time
we can extract from mixture the frequencies associated with each of the sound sources in the scene
auditory scene analysis:
the process of extracting and grouping together the frequencies emitted by specific sound sources from among the complex mixture of frequencies emitted by multiple sound sources within the auditory scene
basis for our use of hearing to make sense of the auditory world
first and most important step in auditory scene analysis is to organize the auditory scene perceptually into set of distinct auditory streams
auditory stream:
an assortment of frequencies occurring over time that were all emitted by the same sound source or related sound sources
auditory stream segregation:
the process of perceptual organization of the auditory scene into a set of distinct auditory streams
auditory system must group the sounds that go together as belonging to a single auditory stream, while separating out the sounds (with overlapping frequencies) that belong to other, simultaneous auditory streams, despite the occurrence of all these sounds in the same interval of time
perceptual grouping of sounds almost as good even when all sounds in scene are coming from single location, so localization doesn’t provide most important basis for grouping sounds into streams
when several frequency components begin all at the same time, likely that they were all produced by single sound source and marked the beginning of a single auditory stream
auditory system uses the simultaneous onset of all frequency components as a basis for concluding that they are part of a single auditory stream
other cues
features of single auditory stream (frequency, amplitude, timbre) tend to change slowly/gradually over time; abrupt change often signals new auditory stream
all frequency components of single auditory stream tend to change in same way at same time
auditory grouping principles aren’t perfect, but often correct, especially when several principles lead to same answer
principles also reflect certain physical regularities in the world that we have become expert at detecting as result of natural selection/evolution