Dopamine, Reward and Reinforcement
involved in learning, cognition, behaviour
operant and classical conditioning
can we find neurons that respond to reward-associated stimuli?
Even better – neurons that encode the unpredictability of the reward? This way, they might be implicated in learning.
Dopamine neurons
Dopamine neurons release dopamine onto cells that have dopamine receptors
Learning
reward prediction errors (RPE)
Dopamine neurons and RPE - classical view
The midbrain, particularly the ventral tegmental area (VTA), contains a large proportion of dopamine neurons
These neurons project to many other regions in the brain. The striatum and anterior cingulate cortex are two such areas.
Dopamine neurons respond phasically to the presentation of rewards (burst of firing following event).
Their activity is best explained in terms of learning about events that predict rewards.
Three types of reward-related neuronal activity
The presentation of an unexpected/more than expected reward (activation)
Stimuli that predict rewards (activation)
The failure of an expected reward (inhibition)
Monkeys learned a task with apple juice as the reward
Before the task was learned, dopamine neurons responded to unpredicted rewards
During learning, the reward became increasingly predictable and neuronal activity gradually decreased to baseline levels
There was no phasic activity when fully predictable rewards were delivered.
When expected rewards were omitted, there was a phasic decrease in activity at the time that the reward was expected
During learning, the dopamine response switched from the reward (US) to the reward predictor (CS) (Schultz et al., 1997)
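The learning pattern described above can be sketched with a simple Rescorla-Wagner update, in which the prediction error is the reward received minus the reward expected. This is an illustrative model, not the analysis used by Schultz et al.; the learning rate `alpha`, the 50 rewarded trials and the function name are assumptions chosen for the example. It shows the error shrinking as the reward becomes predictable and turning negative when an expected reward is omitted. Capturing the shift of the response from US to CS would need a temporal-difference model, which is not shown here.

```python
def simulate_rescorla_wagner(rewards, alpha=0.2, initial_value=0.0):
    """Return the prediction error on each trial of a reward sequence.

    alpha is a hypothetical learning rate; initial_value is the reward
    expected before any learning has taken place.
    """
    value = initial_value
    errors = []
    for reward in rewards:
        error = reward - value      # reward prediction error (RPE)
        errors.append(error)
        value += alpha * error      # move the expectation toward the outcome
    return errors


# 50 rewarded trials followed by one trial on which the reward is omitted.
rewards = [1.0] * 50 + [0.0]
errors = simulate_rescorla_wagner(rewards)

print(f"first rewarded trial: {errors[0]:+.3f}")
print(f"last rewarded trial:  {errors[49]:+.3f}")
print(f"omitted reward:       {errors[50]:+.3f}")
```

The first error is large and positive (unpredicted reward), the error on the last rewarded trial is close to zero (fully predicted reward), and the omission trial gives a large negative error.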
Human VTA
Against classical view
Striatum
ACC
Drugs as rewards
Prediction errors in social domain
Same regions that are important for prosocial behaviour (making decisions to benefit others) receive projections from dopamine neurons that encode RPE
Reinforcement learning helps us learn what actions will benefit ourselves.
More empathetic people learn more quickly about outcomes for other people than those low in empathy (Lockwood et al 2016)
Can we use the same principle to understand how we learn what actions will benefit others?
Subgenual ACC only encodes prosocial prediction errors.
Ventral striatum encodes all prediction errors regardless of who’s benefitting.
Rats are hooked up intravenously or intracranially to equipment that releases dopamine when they make a lever press (an operant response)
Do drugs modify responses to natural rewards or constitute rewards in their own right – engaging existing mechanisms?
Over time, rats start to self-administer dopamine repeatedly. They find dopamine, in itself, rewarding
Cocaine works directly on dopamine neurons by blocking dopamine re-uptake and hence increasing the concentration of dopamine at the synapse
Drugs of abuse that mimic or boost the phasic dopamine reward prediction error might generate a powerful teaching signal and might even produce behavioural changes
Rats and temporal coding - cocaine quicker, marijuana slower
Another major target of dopamine neurons is the anterior and mid portions of the cingulate cortex
Engaged by many different processes, but also contains neurons which respond to rewarding stimuli.
Kennerley et al. (2011) recorded from the ACC while monkeys were choosing between options with different reward probabilities
They found neurons that responded to positive PEs, separate neurons that responded to negative PEs and a third set of neurons that responded to both.
fMRI in humans performing a task where there is a subgoal leading to an overall rewarding outcome (Ribas-Fernandes et al., 2011)
found that activity in the ACC signalled PEs for individual actions leading to a goal
thus PEs in the ACC may relate to the performance of individual actions that lead to a goal
A major target of dopamine neurons in the brain
• The striatum is part of the basal ganglia
• Can be divided into dorsal and ventral striatum.
• Dorsal: two adjacent but anatomically separated groups of neurons: the caudate and the putamen
• Ventral: the nucleus accumbens and the olfactory tubercle
Pessiglione et al. (2006)
Errors in the prediction of rewards signal the inappropriate nature of the actions performed to obtain them
can be + in response to unexpected rewards
can be - in response to the absence of a predicted reward
in both cases the outcomes are unexpected and drive learning
D'Ardenne et al 2008
Task: Guess whether the number on the right of the screen would be greater or less than the number on the left (max 10).
They won $1 if they guessed correctly and lost $1 if they guessed incorrectly
Participants played the game while undergoing an fMRI scan
VTA activity increased when participants unexpectedly won the $1, and the increase was greater as the probability of reward decreased
No response for negative prediction errors.
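The pattern reported here, a larger response to less probable wins and no response to losses, is what a positive-only (rectified) prediction error would look like. The sketch below assumes a win of +1 and a loss of -1 as in the task, and treats the probability of winning `p_win` as known on each trial; the function names are hypothetical and this is not the authors' analysis.

```python
def reward_prediction_error(outcome, p_win, win=1.0, loss=-1.0):
    """Outcome minus the expected value of the gamble."""
    expected = p_win * win + (1.0 - p_win) * loss
    return outcome - expected


def positive_rpe(outcome, p_win):
    """Keep only positive prediction errors; negative ones are set to zero."""
    return max(0.0, reward_prediction_error(outcome, p_win))


# Columns: probability of winning, rectified RPE after a win, after a loss.
for p_win in (0.25, 0.5, 0.75):
    print(p_win, positive_rpe(1.0, p_win), positive_rpe(-1.0, p_win))
```

A win at `p_win = 0.25` gives a larger positive error than a win at `p_win = 0.75`, while every loss maps to zero after rectification.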
Task: choice between two abstract stimuli associated with different probabilities of winning £1
Participants were given either L-DOPA or haloperidol
The signal in the ventral striatum was consistent with RPEs
L-DOPA enhanced the size of the RPE signal and its behavioural effects
Haloperidol reduced the magnitude of the RPE signal and its behavioural effects
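One way to picture these drug effects is a value-learning agent whose prediction error is scaled by a gain term, larger under L-DOPA and smaller under haloperidol. This is a minimal sketch under that assumption; the `gain` values, the learning rate, the softmax temperature and the 0.8/0.2 win probabilities are all hypothetical and are not taken from Pessiglione et al.

```python
import math
import random


def update_value(value, reward, alpha=0.3, gain=1.0):
    """Move a stimulus value toward the reward by a gain-scaled prediction error."""
    return value + alpha * gain * (reward - value)


def choose(values, temperature, rng):
    """Softmax choice between two stimuli; returns index 0 or 1."""
    p_first = 1.0 / (1.0 + math.exp((values[1] - values[0]) / temperature))
    return 0 if rng.random() < p_first else 1


def simulate(gain, n_trials=200, p_win=(0.8, 0.2), temperature=0.2, seed=0):
    """Return the fraction of trials on which the better stimulus (index 0) was chosen."""
    rng = random.Random(seed)
    values = [0.0, 0.0]
    n_best = 0
    for _ in range(n_trials):
        choice = choose(values, temperature, rng)
        reward = 1.0 if rng.random() < p_win[choice] else 0.0
        values[choice] = update_value(values[choice], reward, gain=gain)
        n_best += choice == 0
    return n_best / n_trials


# Hypothetical gains standing in for haloperidol, placebo and L-DOPA.
for label, gain in (("reduced gain", 0.5), ("baseline", 1.0), ("enhanced gain", 1.5)):
    print(label, simulate(gain))
```

A single run is noisy, so comparing conditions would need averaging over many seeds; the point of the sketch is only that one parameter on the prediction error changes both the size of the learning signal and the resulting choices.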
New data suggests dopamine isn’t just crucial for reward learning and doesn’t just signal RPEs
And further studies have shown a role for dopamine in effort-based decision-making (see Phillips, Walton et al., 2007)
Other studies have shown that dopamine neurons signal during cognitive tasks, which doesn’t fit with the classical view (Matsumoto & Takada, 2013)
Some studies have extended the hypothesis to include other influences being integrated with reward prediction, such as goal-directed movement (Syed et al., 2016)
Matsumoto 2013
Dopamine neurons signal the reward-task effort
Essay questions
Reading
Behrens et al 2008
Social learning is widely held to be distinct from other forms of learning in its mechanism and neural implementation; it is often assumed to compete with simpler mechanisms, such as reward-based associative learning, to drive behaviour
Recently, neural signals have been observed during social exchange reminiscent of signals seen in studies of associative learning
We find that key computational variables for learning in the social and reward domains are processed in a similar fashion, but in parallel neural processing streams
Two neighbouring divisions of the anterior cingulate cortex were central to learning about social and reward-based information, and for determining the extent to which each source of information guides behaviour
When making a decision, however, the information learnt using these parallel streams was combined within ventromedial prefrontal cortex
These findings suggest that human social valuation can be realized by means of the same associative processes previously established for learning other, simpler, features of the environment.
D'Ardenne et al. 2008
Current theories hypothesize that dopamine neuronal firing encodes reward prediction errors.
Although studies in nonhuman species provide direct support for this theory, fMRI studies in humans have focused on brain areas targeted by dopamine neurons (ventral striatum) rather than on the brainstem dopaminergic nuclei (VTA and substantia nigra).
When primary rewards were used in an experiment, the VTA blood oxygen level-dependent (BOLD) response reflected a positive reward prediction error, whereas the ventral striatum (VStr) encoded positive and negative reward prediction errors
When monetary gains and losses were used, VTA BOLD responses reflected positive reward prediction errors modulated by the probability of winning
We detected no significant VTA BOLD response to nonrewarding events.
Pessiglione et al. (2006) conclude that dopamine-dependent modulation of striatal activity can account for how the human brain uses reward prediction errors to improve future decisions.
Furthermore, the findings might provide insight into models of clinical disorders in which dopamine is implicated, and for which L-DOPA and haloperidol are used as therapeutic agents, such as Parkinson's disease and schizophrenia
For example, it offers a potential mechanism for the development of compulsive behaviours (such as overeating, hypersexuality and pathological gambling) induced by dopamine replacement therapy in patients with Parkinson's disease