Dopamine, Reward and Reinforcement

Involved in learning, cognition and behaviour

Operant and classical conditioning

Can we find neurons that respond to reward-associated stimuli?

Even better: can we find neurons that encode how unpredictable the reward is? If so, they might be implicated in learning.

Dopamine neurons

Dopamine neurons release dopamine onto cells that have dopamine receptors

Learning

reward prediction errors (RPE)

Dopamine neurons and RPE - classical view

The midbrain, particularly the ventral tegmental area (VTA), contains a large proportion of dopamine neurons

These neurons project to many other regions in the brain. The striatum and anterior cingulate cortex are two such areas.

Dopamine neurons respond phasically to the presentation of rewards (burst of firing following event).

Their activity is best explained in terms of learning about events that predict rewards.

Three types of reward-related neuronal activity

The presentation of an unexpected/more than expected reward (activation)

Stimuli that predict rewards (activation)

The failure of an expected reward (inhibition)
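The first and third types of activity can be summarised by the sign of a simple reward prediction error, delta = r - V, where r is the reward received and V the reward expected. A minimal Python sketch, assuming rewards are coded as 1.0 (juice) or 0.0 (nothing); the function name `prediction_error` is illustrative only:

```python
def prediction_error(reward: float, expected: float) -> float:
    """Reward prediction error: the reward received minus the reward expected."""
    return reward - expected


# Illustrative coding: 1.0 = reward delivered, 0.0 = no reward
cases = {
    "unexpected reward": prediction_error(reward=1.0, expected=0.0),       # positive -> activation
    "fully predicted reward": prediction_error(reward=1.0, expected=1.0),  # zero -> no phasic change
    "omitted reward": prediction_error(reward=0.0, expected=1.0),          # negative -> inhibition
}

for name, delta in cases.items():
    print(f"{name}: delta = {delta:+.1f}")
```

A reward that is larger than expected (e.g. reward 1.0, expected 0.5) gives a smaller positive error. The second type, responses to reward-predicting stimuli, needs a model that runs over time within a trial, such as temporal-difference learning.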

Monkeys learned a task in which apple juice was the reward

Before learning, dopamine neurons responded to unpredicted rewards

During learning, the reward became increasingly predictable and the neuronal response to it gradually decreased to baseline levels

There was no phasic activity when fully predictable rewards were delivered.

When expected rewards were omitted, there was a phasic decrease in activity at the time the reward was expected.

During learning, the dopamine response shifted from the reward (US) to the reward predictor (CS) (Schultz et al., 1997)
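The shift of the response from the US to the CS is commonly modelled with temporal-difference (TD) learning, where the prediction error at each moment is delta(t) = r(t) + gamma * V(t+1) - V(t). The sketch below is a minimal TD(0) simulation under stated assumptions: one value per time step after cue onset, a cue that arrives unpredictably (so the moment before it predicts nothing), and made-up timing and learning-rate constants that are not fitted to any recordings.

```python
import numpy as np

N_STEPS = 10      # hypothetical number of time steps from cue onset to end of trial
REWARD_STEP = 6   # hypothetical step (after cue onset) at which juice is delivered
ALPHA = 0.3       # hypothetical learning rate
GAMMA = 1.0       # no temporal discounting, for simplicity


def run_trial(V, reward_given=True, learn=True):
    """One TD(0) trial; V[t] is the predicted future reward at step t after the cue.

    Returns the prediction error at cue onset and the prediction error at each step.
    """
    # The pre-cue state is assumed to predict nothing, so the cue PE is just gamma * V[0]
    cue_delta = GAMMA * V[0]
    step_deltas = np.zeros(N_STEPS)
    for t in range(N_STEPS):
        r = 1.0 if (reward_given and t == REWARD_STEP) else 0.0
        v_next = V[t + 1] if t + 1 < N_STEPS else 0.0
        step_deltas[t] = r + GAMMA * v_next - V[t]
        if learn:
            V[t] += ALPHA * step_deltas[t]  # updates the array in place
    return cue_delta, step_deltas


V = np.zeros(N_STEPS)
first_cue, first_steps = run_trial(V)                   # naive animal: PE at reward time
for _ in range(200):
    run_trial(V)                                        # training trials
late_cue, late_steps = run_trial(V, learn=False)        # trained: PE has moved to the cue
_, omission_steps = run_trial(V, reward_given=False, learn=False)  # reward omitted

print(f"naive:   cue PE {first_cue:+.2f}, reward PE {first_steps[REWARD_STEP]:+.2f}")
print(f"trained: cue PE {late_cue:+.2f}, reward PE {late_steps[REWARD_STEP]:+.2f}")
print(f"omitted: PE at expected reward time {omission_steps[REWARD_STEP]:+.2f}")
```

In this sketch the naive trial gives a PE of about +1 at the reward and none at the cue; after training the positive PE sits at the cue, the predicted reward produces almost no PE, and omitting the juice produces a PE of about -1 at the usual reward time.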

dopamine

Human VTA

Against classical view

Striatum

ACC

Drugs as rewards

Prediction errors in social domain

The same regions that are important for prosocial behaviour (decisions that benefit others) receive projections from the dopamine neurons that encode RPEs

Reinforcement learning helps us learn which actions will benefit ourselves.

More empathetic people learn about outcomes for other people more quickly than people low in empathy (Lockwood et al., 2016)

Can we use the same principle to understand how we learn which actions will benefit others?

The subgenual ACC encodes only prosocial prediction errors.

The ventral striatum encodes all prediction errors, regardless of who benefits.
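One way to express "learning more quickly" about other people's outcomes is a larger learning rate alpha in the same update rule, V <- V + alpha * (r - V), applied to rewards that go to someone else. A minimal sketch, assuming hypothetical learning rates for a high- and a low-empathy learner and an option that always pays off for the other person; none of these values come from Lockwood et al. (2016).

```python
def learn_values(outcomes, alpha, v0=0.0):
    """Rescorla-Wagner learning: return the value estimate after each outcome."""
    v = v0
    trajectory = []
    for r in outcomes:
        delta = r - v       # prediction error, computed the same way whoever benefits
        v += alpha * delta  # the learning rate sets how fast the estimate moves
        trajectory.append(v)
    return trajectory


def trials_to_reach(trajectory, criterion=0.9):
    """First trial (1-based) on which the estimate reaches the criterion, else None."""
    for trial, v in enumerate(trajectory, start=1):
        if v >= criterion:
            return trial
    return None


outcomes_for_other = [1.0] * 30  # hypothetical: the chosen option always rewards the other person
high_empathy = learn_values(outcomes_for_other, alpha=0.5)  # hypothetical learning rate
low_empathy = learn_values(outcomes_for_other, alpha=0.1)   # hypothetical learning rate

print("high empathy reaches 0.9 on trial", trials_to_reach(high_empathy))
print("low empathy reaches 0.9 on trial", trials_to_reach(low_empathy))
```

The prediction error itself is identical in both learners; only how strongly each error updates the estimate differs.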

Rats are connected, intravenously or intracranially, to equipment that releases dopamine whenever they make a lever press (an operant response)

Do drugs modify responses to natural rewards, or do they constitute rewards in their own right by engaging existing mechanisms?

Over time, rats repeatedly self-administer dopamine. They find dopamine rewarding in itself

Cocaine acts directly on dopamine neurons by blocking dopamine reuptake (via the dopamine transporter), which increases the concentration of dopamine at the synapse

Drugs of abuse that mimic or boost the phasic dopamine reward prediction error might generate a powerful teaching signal and might even produce behavioural changes
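Why a drug-driven dopamine signal could be an unusually powerful teaching signal can be sketched with one assumption, similar to a computational account of addiction proposed by Redish (2004): the drug adds a dopamine boost D that can never be predicted away, so delta = max(r - V + D, D). For a natural reward, delta = r - V, which shrinks to zero as V approaches r. The constants below are hypothetical.

```python
ALPHA = 0.2       # hypothetical learning rate
DRUG_BOOST = 0.5  # hypothetical drug-induced dopamine signal D


def learned_value(reward, n_trials, drug=False):
    """Value of the lever press after n_trials of prediction-error learning."""
    v = 0.0
    for _ in range(n_trials):
        if drug:
            # Assumed: the drug's dopamine boost is never fully predicted away
            delta = max(reward - v + DRUG_BOOST, DRUG_BOOST)
        else:
            # Natural reward: the error disappears once the reward is predicted
            delta = reward - v
        v += ALPHA * delta
    return v


for n in (50, 100, 200):
    natural = learned_value(1.0, n)
    drug = learned_value(1.0, n, drug=True)
    print(f"{n} trials: natural {natural:.3f}, drug {drug:.3f}")
```

Because delta never drops below D in the drug case, the value of the lever press keeps rising instead of levelling off at the size of the reward.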

Rats and temporal coding - cocaine quicker, marijuana slower

Another major target of dopamine neurons is the anterior and mid cingulate cortex

Engaged by many different processes, but also contains neurons which respond to rewarding stimuli.

Kennerley et al. (2011) recorded from the ACC while monkeys chose between options with different reward probabilities

They found neurons that responded to positive PEs, separate neurons that responded to negative PEs and a third set of neurons that responded to both.

fMRI in humans performing a task in which a subgoal leads to an overall rewarding outcome (Ribas-Fernandes et al., 2011)

ACC activity signalled PEs for the individual actions leading to a goal

Thus, PEs in the ACC may relate to the performance of individual actions that lead to a goal

A major target of dopamine neurons in the brain

• The striatum is part of the basal ganglia

• Can be divided into dorsal and ventral striatum.

• Dorsal: two adjacent but anatomically separated groups of neurons, the caudate and the putamen

• Ventral: the nucleus accumbens and the olfactory tubercle

Pessiglione et al. (2006)

Errors in the prediction of rewards signal the inappropriate nature of the actions performed to obtain them

Positive in response to unexpected rewards

Negative in response to the absence of a predicted reward

In both cases the outcomes are unexpected and drive learning

D'Ardenne et al. (2008)

Task: guess whether the number on the right of the screen would be greater or less than the number on the left (maximum 10).

Participants won $1 if they guessed correctly and lost $1 if they guessed incorrectly

Participants played the game while undergoing an fMRI scan

VTA activity increased when participants unexpectedly won $1, and the increase was greater the lower the probability of reward

No response for negative prediction errors.
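The probability effect follows from the arithmetic of prediction errors. If the chance of a correct guess is p, the expected outcome is p(+$1) + (1 - p)(-$1) = 2p - 1, so a win produces a prediction error of 1 - (2p - 1) = 2(1 - p): the less likely the win, the bigger the positive error. A minimal sketch, assuming p is known to the participant (in this task it presumably depended on the number shown on the left):

```python
def outcome_prediction_errors(p_win, gain=1.0, loss=-1.0):
    """Prediction errors for a win and for a loss, given the probability of winning."""
    expected = p_win * gain + (1 - p_win) * loss  # expected value of the guess
    win_pe = gain - expected    # positive prediction error on a win
    loss_pe = loss - expected   # negative prediction error on a loss
    return win_pe, loss_pe


for p in (0.9, 0.5, 0.2):
    win_pe, loss_pe = outcome_prediction_errors(p)
    print(f"p(win) = {p}: win PE = {win_pe:+.2f}, loss PE = {loss_pe:+.2f}")
```

Only the first of these two quantities, the positive error on a win, matches the VTA response described above.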

Participants were given either L-DOPA or haloperidol

The signal in the ventral striatum was consistent with RPEs

Choice between two abstract stimuli associated with different probabilities of winning £1

L-DOPA enhanced the size of the RPE signal and its behavioural effects

Haloperidol reduced the magnitude of the RPE signal and its behavioural effects
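A standard action-value (Q-learning) model makes the drug result concrete: each stimulus has a value Q, the chosen stimulus is updated by alpha * delta with delta = r - Q, and choices follow a softmax. In the sketch below, a hypothetical `pe_gain` multiplier stands in for a drug that enlarges (L-DOPA) or shrinks (haloperidol) the prediction error. The parameters, the win probabilities of 0.8 and 0.2, and the gains-only version of the task are all assumptions for illustration; this is not the fitted model from Pessiglione et al. (2006).

```python
import math
import random


def fraction_best_choices(pe_gain, n_trials=30, alpha=0.3, beta=3.0,
                          p_win=(0.8, 0.2), seed=0):
    """Simulate one learner; return the fraction of trials choosing stimulus 0 (the better one)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]  # learned values of the two abstract stimuli
    n_best = 0
    for _ in range(n_trials):
        # Softmax choice: probability of picking stimulus 0
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_choose_0 else 1
        reward = 1.0 if rng.random() < p_win[choice] else 0.0  # win £1 or nothing
        delta = reward - q[choice]            # reward prediction error
        q[choice] += alpha * pe_gain * delta  # hypothetical drug gain on the RPE
        if choice == 0:
            n_best += 1
    return n_best / n_trials


def mean_fraction(pe_gain, n_learners=500):
    """Average over many simulated learners to smooth out randomness."""
    return sum(fraction_best_choices(pe_gain, seed=s) for s in range(n_learners)) / n_learners


for label, gain in (("haloperidol-like", 0.5), ("no drug", 1.0), ("L-DOPA-like", 1.5)):
    print(f"{label}: {mean_fraction(gain):.2f}")
```

Multiplying delta by a gain is mathematically the same as changing the learning rate here, so this is a crude stand-in for a drug effect; it only shows the direction of the effect, with a larger RPE leading to more choices of the better stimulus.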

New data suggest that dopamine is not only crucial for reward learning and does not only signal RPEs

Further studies have shown a role for dopamine in effort-based decision-making (see Phillips, Walton et al., 2007)

Other studies have shown that dopamine neurons signal during cognitive tasks, which does not fit the classical view (Matsumoto & Takada, 2013)

Some studies have extended the hypothesis to include other influences integrated with reward prediction, such as goal-directed movement (Syed et al., 2016)

Matsumoto & Takada (2013)

Dopamine neurons signal the reward-task effort

Essay questions

Reading

Behrens et al. (2008)


Social learning is widely held to be distinct from other forms of learning in its mechanism and neural implementation; it is often assumed to compete with simpler mechanisms, such as reward-based associative learning, to drive behaviour

Recently, neural signals have been observed during social exchange reminiscent of signals seen in studies of associative learning

We find that key computational variables for learning in the social and reward domains are processed in a similar fashion, but in parallel neural processing streams

Two neighbouring divisions of the anterior cingulate cortex were central to learning about social and reward-based information, and for determining the extent to which each source of information guides behaviour

When making a decision, however, the information learnt using these parallel streams was combined within ventromedial prefrontal cortex

These findings suggest that human social valuation can be realized by means of the same associative processes previously established for learning other, simpler, features of the environment.
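Two parallel learning streams whose outputs are combined only at the time of choice can be sketched as follows. Assumptions: on each trial one of two options (called blue and green here) is rewarded and an adviser recommends one of them; one stream learns how often blue is rewarded, the other how often the adviser is right, both with a Rescorla-Wagner rule; and the combination is a hypothetical weighted average with weight `w_social`. Behrens et al. used a Bayesian learner rather than fixed learning rates, so this is a simplified stand-in.

```python
def rw_update(estimate, outcome, alpha):
    """Rescorla-Wagner update: move the estimate toward the outcome by alpha * PE."""
    return estimate + alpha * (outcome - estimate)


def learn_streams(history, alpha_reward=0.2, alpha_social=0.2):
    """Run the two learning streams in parallel over past trials.

    history: list of (blue_rewarded, advice_was_blue) booleans, one pair per trial.
    Returns (p_blue_rewarded, p_adviser_correct).
    """
    p_blue, p_correct = 0.5, 0.5
    for blue_rewarded, advice_was_blue in history:
        # Reward stream: how often blue pays off
        p_blue = rw_update(p_blue, float(blue_rewarded), alpha_reward)
        # Social stream: how often the adviser's recommendation was right
        p_correct = rw_update(p_correct, float(advice_was_blue == blue_rewarded), alpha_social)
    return p_blue, p_correct


def combined_p_blue(p_blue, p_correct, advice_is_blue, w_social):
    """Combine both sources at decision time (hypothetical weighted average)."""
    p_blue_from_advice = p_correct if advice_is_blue else 1.0 - p_correct
    return w_social * p_blue_from_advice + (1.0 - w_social) * p_blue


# Hypothetical history: blue rewarded on 10 trials and the adviser always recommended it
p_blue, p_correct = learn_streams([(True, True)] * 10)

# On the next trial the adviser recommends green; how much that matters depends on w_social
for w in (0.2, 0.8):
    print(w, round(combined_p_blue(p_blue, p_correct, advice_is_blue=False, w_social=w), 3))
```

With a low social weight the learner still favours blue from its own reward history; with a high social weight it follows the adviser to green.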

D'Ardenne et al. (2008)

Current theories hypothesize that dopamine neuronal firing encodes reward prediction errors.

Although studies in nonhuman species provide direct support for this theory, fMRI studies in humans have focused on brain areas targeted by dopamine neurons (ventral striatum) rather than on the brainstem dopaminergic nuclei (VTA and substantia nigra).

When primary rewards were used in an experiment, the VTA blood oxygen level-dependent (BOLD) response reflected a positive reward prediction error, whereas the ventral striatum (VStr) encoded positive and negative reward prediction errors

When monetary gains and losses were used, VTA BOLD responses reflected positive reward prediction errors modulated by the probability of winning

We detected no significant VTA BOLD response to nonrewarding events.

Pessiglione et al. (2006)

We conclude that dopamine-dependent modulation of striatal activity can account for how the human brain uses reward prediction errors to improve future decisions.

Furthermore, the findings might provide insight into models of clinical disorders in which dopamine is implicated, and for which L-DOPA and haloperidol are used as therapeutic agents, such as Parkinson's disease and schizophrenia

For example, it offers a potential mechanism for the development of compulsive behaviours (such as overeating, hypersexuality and pathological gambling) induced by dopamine replacement therapy in patients with Parkinson's disease