Chapter 12: Perceiving Speech and Music
Speech
The Sounds of Speech: Phonemes
phonemes:
the smallest unit of sound that, if changed, would change the meaning of a word
‘cat’ and ‘bat’ differ in meaning, so the initial sounds /k/ and /b/ are different phonemes
not always one-to-one correspondence to letters
same letter in written words can often correspond to different sounds
International Phonetic Alphabet (IPA):
an alphabet in which each symbol stands for a different speech sound; provides a distinctive way to write each phoneme in all the human languages currently in use
Pronouncing the Sounds of Speech
Producing Vowels
vibrating vocal folds produce sounds containing harmonic frequencies in addition to the fundamental frequency
amplitudes of frequencies tend to decrease as frequency increases
to produce a vowel sound, have to modify the basic sound produced by the vocal folds by changing the shape of the oral cavity so as to attenuate certain harmonics more than others, with a different pattern of modification for each vowel
oral cavity (chamber through which sound wave travels from vocal folds to mouth) has different shapes, which have different resonances
particular resonance determines which frequencies are attenuated and by how much
modify shape of oral cavity by opening jaw to different degrees, adjusting shape/position of tongue, shaping lips
each particular shape of oral cavity serves as filtering function, so that sound wave emerging from mouth has distinctive harmonic spectrum associated with the vowel being produced
formants:
frequency bands with relatively high amplitude in the harmonic spectrum of a vowel sound
individual peaks in the harmonic spectrum
most vowel sounds contain 2-3 prominent formants
frequency spectrum of vowel sound about constant over time
in natural speech, frequencies corresponding to each vowel sound in the utterance change rapidly over time
sound spectrogram:
a graph that includes the dimensions of frequency, amplitude, and time, showing how the frequencies corresponding to each sound in an utterance change over time
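a minimal sketch of the source-filter idea described above, assuming NumPy/SciPy are available; the fundamental (120 Hz) and the formant center frequencies/bandwidths are illustrative values, not taken from the chapter

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                      # sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)   # half a second of synthetic "vowel"

f0 = 120.0                                     # vocal-fold fundamental (illustrative)
formants = [(700.0, 100.0), (1200.0, 150.0)]   # (center Hz, bandwidth) -- hypothetical

# Source: harmonics of f0 whose amplitude falls off as frequency increases
# Filter: each harmonic is weighted by its closeness to a formant peak
signal = np.zeros_like(t)
for k in range(1, 40):
    f = k * f0
    source_amp = 1.0 / k
    filt = sum(np.exp(-0.5 * ((f - fc) / bw) ** 2) for fc, bw in formants)
    signal += source_amp * filt * np.sin(2 * np.pi * f * t)

# Spectrogram: frequency x time, with amplitude shown as intensity
freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=1024)
strongest = freqs[Sxx.mean(axis=1).argsort()[-3:]]
print("highest-energy frequency bands (Hz):", sorted(strongest))
```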
Producing Consonants
produced by restricting the flow of air coming from the vocal folds by narrowing/completely closing the vocal tract at one/more points along the path of airflow
production of every consonant can be defined in terms of three characteristics
place of articulation:
in the production of consonants, the point in the vocal tract at which airflow is restricted, described in terms of the anatomical structures involved in creating the restriction
lips, tongue, alveolar ridge, velum used to create the restriction in English
manner of articulation:
in the production of consonants, the nature of the obstruction of airflow in the vocal tract
voicing:
in the production of consonants, specifies whether the vocal folds are vibrating or not (i.e., whether the consonant is voiced or voiceless)
the human vocal apparatus is the part of the body used in producing sounds of speech
most speech sounds begin with exhalation of air from lungs which then flows through trachea (windpipe) and into the larynx
larynx (or voice box):
the part of the vocal tract that contains the vocal folds
within larynx, air passes through pair of membranes called the vocal folds
vocal folds (or vocal cords):
a pair of membranes within the larynx
then flows from larynx into pharynx and from the pharynx up into the oral and nasal cavities, from which it exits the body through the mouth and nose
pharynx:
the uppermost part of the throat
uvula can hang down, leaving an open pathway from the pharynx into the nasal cavity, or bend upward against the back wall of the pharynx, closing off the nasal cavity and directing all exhaled air into the oral cavity and out the mouth
uvula:
a flap of tissue that hangs off the posterior edge of the soft palate; it can close off the nasal cavity
most English phonemes produced with nasal cavity closed off
vocal folds can be relaxed and open, allowing air to pass silently, or tensed, causing them to vibrate when air passes
fundamental frequency of vocal fold vibration depends on size/thickness of vocal folds and the size/shape of larynx as well as current degree of contraction/relaxation of muscles in the throat
most vocalization involves changing the fundamental frequency of vocal fold vibrations, done by contracting/relaxing muscles in the throat, which changes the tension of the vocal folds and their rate of vibration
the greater the tension, the faster the vibration and higher the pitch
speech sounds can be divided into vowels and consonants
vowels:
speech sounds produced with a relatively unrestricted flow of air through the pharynx and oral cavity
different vowels produced by varying size/shape of oral cavity
consonants:
speech sounds produced by restricting the flow of air at one place or another along the path of the airflow from the vocal folds
Perceiving the Sounds of Speech
Coarticulation and Perceptual Constancy
coarticulation:
the influence of one phoneme on the acoustic properties of another, due to the articulatory movements required to produce them in sequence
even as a speaker starts to say a syllable, the configuration required for the final vowel is already influencing the configuration used to form the initial consonant
coarticulation affects not only the flow across the transition from one phoneme to the next, but also the flow ‘backward’, from an upcoming phoneme to the phoneme currently being produced
because of coarticulation, a given phoneme is regularly associated with different acoustic signals, but listeners perceive the same phoneme
type of perceptual constancy
auditory constancy that leads us to hear two different sounds as same consonant is important mechanism underlying correct perception of speech
Categorical Perception of Phonemes
categorical perception:
the perception of different sensory stimuli as identical, up to a point at which further variation in the stimulus leads to a sharp change in the perception
different stimuli (different video images with different frame rates above 24 fps) are perceived as identical (motion looks smooth in all cases)
below rate of 24 fps, motion begins to look jumpy/uneven
opposed to continuous perception in which there are no sharp changes in perception as stimulus varies
two tones of different frequency are perceived as having different pitches with no sharp discontinuities in perceived pitch as frequency changes
research suggests that perception of certain speech sounds is categorical rather than continuous where categories are different phonemes
voice onset time (VOT):
in the production of stop consonants, the interval between the initial burst of frequencies and the onset of voicing
obvious difference between voiceless /p/ and voiced /b/
VOT for /ba/ much shorter than for /pa/ because voicing occurs during the /b/ sound of /ba/, because /b/ itself is voiced
pairs of voiceless and voiced stop consonants are always differentiated by this pattern of relatively long VOT for voiceless stop and relatively short VOT for voiced stop
when VOT less than 25 msec, virtually every listener labeled the syllable as /ba/, and when VOT more than 35 msec, virtually every listener labeled it as /pa/
as VOT increased from 25 to 35 msec, listeners gave a decreasing percentage of /ba/ responses and an increasing percentage of /pa/ responses; about 50/50 at 30 msec (see the sketch below)
phonemic boundary:
the voice onset time at which a stop consonant transitions from being mainly perceived as voiced to being mainly perceived as voiceless
transition near phonemic boundary very abrupt
abrupt transition characterizes categorical, as opposed to continuous, perception of speech sounds
theory behind categorical perception is that there are ‘detectors’ in the auditory system tuned to respond to certain ranges of VOTs
would explain why VOTs in 25-35 msec range lead to uncertainty- both types of detectors are responding
overall, suggests that perception of voiceless vs voiced stop consonants is categorical and based on VOT
believe other dimensions of speech are also categorical
categorical perception is also a form of perceptual constancy that facilitates accurate speech perception
stimuli that differ over a wide range of VOTs on one side of the phonemic boundary are perceived to be the same phoneme
categorical perception also occurs in nonspeech sounds, as well as for visual stimuli, so may reflect more general principle of perception
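a toy sketch of the abrupt /ba/-/pa/ labeling transition; the 25-35 msec range and ~30 msec boundary come from the description above, while the logistic shape and its steepness are assumptions made only for illustration

```python
import math

BOUNDARY_MS = 30.0   # approximate /ba/-/pa/ phonemic boundary described above
STEEPNESS = 1.0      # assumed; larger values give a sharper category boundary

def prob_pa(vot_ms: float) -> float:
    """Illustrative probability that a listener labels a syllable as /pa/."""
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (vot_ms - BOUNDARY_MS)))

for vot in (10, 20, 25, 30, 35, 40, 50):
    p = prob_pa(vot)
    print(f"VOT {vot:2d} msec -> P(/pa/) = {p:.2f}  ({'/pa/' if p > 0.5 else '/ba/'})")
```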
Vision and Perception: The McGurk Effect
good evidence that we use information from vision when available to help reinforce auditory perception, including speech perception
in some cases, seeing a person talking can strongly affect what phonemes we perceive
participants first listened to and then watched a clip in which the person pronounced one syllable on the soundtrack while mouthing a different one
reported almost perfectly what the sound was when not looking at screen but error rate over 90% when also watching the clip
errors quite systematic and related to the mismatches between the articulatory movements of the woman’s mouth in the clip and the place of articulation of the consonant in the syllables on the sound track
McGurk effect:
in the perception of speech sounds, when auditory and visual stimuli conflict, the auditory system tends to compromise on a perception that shares features with both the seen and the heard stimuli
if no good compromise perception is available, either the conflict is resolved in favor of the visual stimulus or there is a conflicting perceptual experience
when auditory information degraded/masked, the available visual information might help perceive what’s being said
Knowledge and Speech Perception
Syntax and Semantics
semantic information influences speech perception when we can anticipate what word will fill a gap where the speech has been interrupted in some way
to investigate whether syntactic rules (grammar) help us perceive speech correctly, use three types of sentences- grammatical sentences, anomalous sentences (follow rules but have no clear meaning), ungrammatical sentences (violate syntax rules and have no clear meaning)
shadowing performance best for grammatical sentences, but found that performance on the anomalous sentences better than on ungrammatical sentences, showing that knowledge of the syntactic rules of language can aid in speech perception, even when speech lacks meaning
Word Segmentation
stream of speech sounds uttered at normal conversational speed often contains no brief silent intervals to mark where one word ends/next begins
question of how we can segment words for our own language but not for others and how infants learn to segment words when only hear continuous stream
possible that listeners guess that a new word has started when they hear a sound that is unlikely to be part of the same word as the preceding sound
in any language, only certain phoneme sequences can occur in word-initial, midword, or word-final position; some possible sequences much more unlikely than others
phoneme transition probabilities:
for any particular sequence of phonemes, the chances that the sequence occurs at the start of a word, in the middle of a word, at the end of a word, or across the boundary between two words
thought to aid in efficient speech perception
8-month-olds listened to 2-minute stream of synthesized consonant-vowel syllables with no pauses after any of the syllables
within words, the chance that one syllable was followed by a particular other syllable was 100%, while across words, the chance that the last syllable of one word was followed by the first syllable of a particular other word was 33% (see the sketch below)
tested to see if they had picked out the words; listened significantly longer to new words, showing that they had registered the syllable transition probabilities and were able to distinguish familiar words from the new words
recent study revealed locations in the auditory cortex in which phoneme transition probability may be encoded in the brain, where neural responses correlated with transition probabilities
played some unlikely phoneme combinations and some likely ones
found a place where neural responses were correlated with the forward probability of a transition between two phonemes (the probability that the first phoneme will be followed by the second one)
another place where neural responses were correlated with the backward probability of a transition between two phonemes (the probability that the second phoneme was preceded by the first one)
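a small sketch of computing forward and backward transition probabilities from a continuous syllable stream; the three-syllable 'words' are made up, roughly in the spirit of the infant study described above

```python
import random
from collections import Counter

# Hypothetical three-syllable pseudo-words, concatenated with no pauses
words = ["bidaku", "padoti", "golabu"]

def syllables_of(word):
    """Split a six-letter pseudo-word into its three CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

random.seed(0)
stream = [s for w in random.choices(words, k=200) for s in syllables_of(w)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])
second_counts = Counter(stream[1:])

def forward_p(a, b):
    """P(next syllable is b | current syllable is a)."""
    return pair_counts[(a, b)] / first_counts[a]

def backward_p(a, b):
    """P(previous syllable was a | current syllable is b)."""
    return pair_counts[(a, b)] / second_counts[b]

print("within-word  P(da | bi):", round(forward_p("bi", "da"), 2))   # close to 1.0
print("across-word  P(pa | ku):", round(forward_p("ku", "pa"), 2))   # close to 1/3
print("backward     P(bi | da):", round(backward_p("bi", "da"), 2))  # close to 1.0
```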
Perceptual Completion: Phonemic Restoration
phonemic restoration:
a kind of perceptual completion in which listeners seem to perceive obscured or missing speech sounds
in many cases, don’t even become aware of disrupting noise
experiment where interrupted part of sentence with cough and asked where in sentence cough occurred and whether it disrupted the sound it coincided with
couldn’t correctly identify location and reported that no sound was disrupted
same results when interrupted by pure tone
when replaced with silence, correctly reported location and sound that had been disrupted
auditory system filled in missing phoneme automatically, without any accompanying conscious awareness, based on the auditory system’s inherent tendency toward perceptual completion of auditory stimuli (bottom-up) and the listeners’ knowledge about language and context (top-down)
phonemic restoration can also be affected by visual context
experiment where determined the noise duration threshold that would produce phonemic restoration 50% of the time when mouth movement was congruent, incongruent, or static
in congruent condition, had significantly higher threshold than in other two; could tolerate longer burst of noise interrupting word and still restore missing phoneme
no significant difference between static and incongruent condition
reinforced idea that knowledge of the mouth movements associated with specific words and their phonemes aided phonemic restoration
although top-down feedback plays role in phonemic restoration, clear that process is to some degree dependent on the particular sounds involved
either fricative or vowel was replaced by either white noise or pure tone
restoration of fricatives was better when the masking sound was white noise while restoration for vowels better when masking noise pure tone
frequency spectrum of fricatives resembles white noise, with energy distributed fairly randomly across spectrum
formant structure of vowels gives frequency spectrum with horizontal bands of energy at just a few frequencies
phonemic restoration works better when masking sound and masked sound have similar frequency spectra
listeners’ knowledge provides a basis for perceiving phonemes and words
this knowledge takes three forms
knowledge of the grammatical rules of the language and the context in which an utterance is produced
knowledge about the probability of various sequences of phonemes within words or across words in the language they’re hearing
knowledge of specific words that are expected in a particular situation
listener must analyze the stream into a sequence of phonemes, which must then be grouped into subsequences that match up with words
there is no one-to-one correspondence between the sounds produced and the phonemes that those sounds represent
different talkers produce sounds with different fundamental frequencies
another perceptual challenge is specific phonemes sounding different in different dialects
even same phonemes produced at different times by same speaker can differ significantly
another perceptual challenge relates to the indistinct boundaries between words in the sound stream of normal speech
when a sentence is spoken as it would be in normal conversation (without gaps between words), some of the phonemes represented in a carefully spoken-out version are missing/modified
can’t identify phonemes by simply mapping specific frequencies to specific phonemes
auditory system must be using relative positions of frequencies (such as formants in vowel sounds) in the context of the entire speech stream as well as other patterns of acoustic features to identify the phonemes
listener’s knowledge of language/understanding of context also provide important information
Brain Pathways for Speech Perception and Production
damage to left inferior frontal cortex (Broca’s area) and left superior temporal cortex (Wernicke’s area) showed these regions critical for production/comprehension of speech
aphasia:
an impairment in speech production or comprehension (or both) caused by damage to speech centers in the brain
speech sounds and nonspeech sounds are transduced by the cochlea in the same way and resulting neural signals are sent along same pathways to auditory cortex
in primary auditory cortex, activity no different for speech and nonspeech sounds
beyond that, brain regions primarily in left hemisphere are thought to form a specialized network for processing speech
speech-related neural signals channeled through regions via two distinct pathways- a ventral pathway and a dorsal pathway
ventral pathway includes regions involved in representing the meanings of words and of combinations of words
dorsal pathway includes regions dedicated to the production of speech by the motor system
region in left superior temporal sulcus (the phonological network) responded more when test sound was speech sound than nonspeech
important for processing phonemes
region in the location of the auditory cortex in both hemispheres responded about equally to both speech and nonspeech sounds, confirming this region isn’t specialized for speech
studies suggest that parts of the brain specialized for processing speech are more strongly represented in the left hemisphere than in the right hemisphere
dorsal pathway also thought to coordinate perceived speech with the production of speech
evidenced by experiment in which participants listened to spoken syllables and then produced the same syllables
motor regions of the prefrontal cortex that were activated during production of the syllables were also activated when participants listened to the syllables, but not when the sounds were nonspeech
simply hearing speech activates the motor centers of the brain responsible for producing speech
researchers suppose that a conceptual network creates representations of thoughts and ideas that flow into both pathways and provide a basis for spontaneous speech production
Music
Dimensions of Music: Pitch, Loudness, Timing, and Timbre
Pitch
the most fundamental dimension of music, distinguishes one musical composition from another
range of pitches produced by a piano spans range of pitches typically heard in Western music
octave:
a sequence of notes in which the fundamental frequency of the last note is double the fundamental frequency of the first note
in Western music, each octave consists of 13 notes separated by 12 proportionally equivalent intervals
semitones:
the 12 proportionally equivalent intervals between the notes in an octave
organization into octaves with notes separated by proportionally equivalent intervals has twofold perceptual basis
notes separated by octave are perceptually more similar than notes separated by some other interval
harmonics of the notes played by a musical instrument are integer multiples of the fundamental frequency
semitone intervals are perceptually equivalent to one another; the difference in pitch between any two successive notes is perceived as constant even though the actual difference in fundamental frequency between successive notes increases as the notes increase in frequency (see the sketch below)
pitch helix illustrates similarity among pitches geometrically
tone chroma refers to the differences in pitch within an octave
tone height refers to the octave in which a tone appears, which increases from the bottom to the top of the helix
notes that are one octave apart are vertically aligned and have same chroma, representing perceptual similarity
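a quick numerical illustration of the proportional spacing of semitones; the A4 = 440 Hz reference is standard tuning, an assumption not stated above

```python
A4 = 440.0                # reference pitch in Hz (standard tuning, assumed)
SEMITONE = 2 ** (1 / 12)  # ratio between adjacent notes, about 1.0595

# One full octave: 13 notes separated by 12 proportionally equal intervals
notes = [A4 * SEMITONE ** n for n in range(13)]

for n, f in enumerate(notes):
    step_hz = f - notes[n - 1] if n else 0.0
    print(f"semitone {n:2d}: {f:7.2f} Hz  (step from previous: {step_hz:5.2f} Hz)")

print("octave ratio:", notes[12] / notes[0])  # ~2.0: frequency doubles per octave
```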
Loudness and Timing
dynamics:
the manner in which loudness varies as a piece of music progresses
control over dynamics achieved by specifying the loudness of different sections and by indicating sequences of notes over which the loudness increases/decreases abruptly/gradually
often combined with changes in timing, in order to achieve particular artistic/emotional effects
rhythm:
the temporal patterning of events in a musical composition, encompassing tempo, beat, and meter
tempo of piece refers to its overall pace
beat refers to equally spaced pulses that can express fast/slow tempo
meter refers to temporal patterning of strong/weak pulses in beat over time
Timbre
the perceptual quality that distinguishes complex sounds that have the same pitch (fundamental frequency) and same loudness but don’t sound the same
difference attributed largely to differences in relative amplitudes of various overtones along with differences in attack/decay
musical composition consists of sequences and combinations of notes played with different durations and with different relative emphasis, unfolding over time in patterned ways
complete understanding of music requires an appreciation of the almost infinite variety of pitch, loudness, timing, and timbre combinations
Melody
melody (or tune):
a sequence of musical notes arranged in a particular rhythmic pattern, which listeners perceive as a single, recognizable unit
complex melodies consist of series of recognizable sequences, forming a combination that is perceived as a larger recognizable unit
most salient aspect of melody is relative positions of pitches in sequence, not absolute pitches
transpositions:
two versions of the same melody, containing the same intervals but starting at different notes
even young infants appear to respond to transpositions as being perceptually equivalent
looked longer toward speaker playing unfamiliar melody than they did toward speaker playing transposed version of familiar melody
when both speakers played the same melody (one untransposed, one a transposed version), infants looked equally long at both
suggests infants perceive/remember melody in terms of relative, not absolute, pitches (see the sketch below)
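a minimal sketch of why transpositions sound like the same melody: shifting every note by the same number of semitones leaves the intervals unchanged; the melody (in MIDI-style note numbers, 60 = middle C) is made up for illustration

```python
melody = [60, 62, 64, 60, 64, 62, 60]   # made-up tune in MIDI-style semitone numbers

def transpose(notes, semitones):
    """Shift every note up (or down) by the same number of semitones."""
    return [n + semitones for n in notes]

def intervals(notes):
    """Successive pitch differences -- the relative pattern listeners track."""
    return [b - a for a, b in zip(notes, notes[1:])]

shifted = transpose(melody, 5)                    # same melody, 5 semitones higher
print(intervals(melody) == intervals(shifted))    # True: intervals are preserved
```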
Scales and Keys: Consonance and Dissonance
scale:
a particular subset of the notes in an octave
named after note with which it starts/ends
key:
the scale that functions as the basis of a musical composition
composition in key of C major contains notes mostly from C major scale
major/minor scales are differentiated by pattern of intervals (number of semitones) between successive notes
when two/more notes played together (chord), combination may exhibit consonance or dissonance
consonance:
the quality exhibited by a combination of two or more notes from a scale that sounds pleasant, as if the notes “go together”
dissonance:
the quality exhibited by a combination of two or more notes from a scale that sound unpleasant or “off”
notes have peak in their acoustic spectrum at fundamental (lowest) frequency and peaks at integer multiples of fundamental (harmonics)
harmonicity:
the extent to which the harmonics of notes played in combination (simultaneously or sequentially) coincide with the harmonics of a note with a lower fundamental frequency
important factor in perception of consonance/dissonance
different combinations can exhibit greater/lesser degrees of consonance/dissonance, depending on degree of harmonicity (how closely harmonics of notes in combination coincide)
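a rough sketch of comparing how well the harmonics of two simultaneous notes coincide, a crude stand-in for the harmonicity idea above; the number of harmonics and the 1% tolerance are arbitrary choices, not values from the chapter

```python
def harmonics(f0, n=10):
    """First n integer-multiple harmonics of a fundamental frequency."""
    return [k * f0 for k in range(1, n + 1)]

def coincidence(f1, f2, tolerance=0.01):
    """Fraction of the higher note's harmonics lying within a relative
    tolerance of some harmonic of the lower note."""
    low, high = sorted([f1, f2])
    low_h = harmonics(low)
    hits = sum(
        any(abs(h - l) / l <= tolerance for l in low_h)
        for h in harmonics(high)
    )
    return hits / len(harmonics(high))

C4 = 261.63  # middle C, fundamental frequency in Hz
print("perfect fifth (C-G): ", coincidence(C4, C4 * 3 / 2))          # more coincidences
print("tritone (C-F#):      ", coincidence(C4, C4 * 2 ** (6 / 12)))  # fewer coincidences
```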
Neural Basis of Music Perception
beyond primary auditory cortex, specialized areas in two hemispheres play important roles in processing different types of sound
listened to sequences of pure tones that either had same pitch or increased/decreased in pitch
both left and right auditory cortex more active (and about equally active) when listening to fixed-pitch sequence versus silence
only right responded more strongly when listening to changing-pitch sequence
left auditory cortex responds about the same to both fixed-pitch and changing-pitch sequences
right auditory cortex especially active when listener processing changes in pitch
auditory areas in left hemisphere appear to be specialized for representing fine differences in timing of sounds whereas areas in right hemisphere specialized for representing fine differences in pitch
recent study revealed anterior regions of the auditory cortex that respond selectively to sound of music
nonmusicians who had learned to play a piece on the piano later listened to it; brain activity seen in both music perception and finger movement regions
there is close tie between perception and production of music
less than 1% of population have ability of absolute pitch- can listen to isolated notes and name them
others (4%) have profound impairments in perceiving/remembering melodies and in distinguishing one melody from another
amusia:
a profound impairment in perceiving and remembering melodies and in distinguishing one melody from another
can hear other sounds fine
can come from stroke, but more often congenital (present at birth)
difficulty in telling apart different notes and in judging whether sequence of notes increases/decreases in pitch
when congenital, exhibit thickening of right inferior frontal cortex and right auditory cortex, thought to be associated with reduced neural connections between auditory cortex and frontal cortex
differences in musical ability and in brain structures that support it arise through musical training/experience
when nonmusicians and musicians listened to pure tones, same patterns of activity, but greater magnitude of activity for musicians, an indication that a significantly larger population of neurons is activated by musical sounds in musicians
magnitude of activity depended on when person began musical training- largest in those who had started younger
Knowledge and Music Perception
knowledge that leads to musical expectations can be very specific
other, less specific expectations are based on the general knowledge that even non-musicians gain through repeatedly listening to compositions in the familiar scales/keys
from first few notes, can often unconsciously tell scale
tonic or base note of scale (note that scale begins with) tends to occur more often and is played for longer durations
experiment on how this type of knowledge is reflected in listeners’ perception of music: listeners had to rate how well various notes ‘fit’ with chords from various scales
notes of the chord itself were rated as fitting best with the chord, but those that belonged to the same scale in general were on average rated higher than those that didn’t
suggests music provides context in which hearing a chord from a particular scale makes all the notes in that scale become salient
perception of music also depends on our accumulated knowledge about how music works, together with what to expect in given musical composition
composers often take advantage of listeners’ musical knowledge by creating effects based on violating expectations