Gatsby Computational Neuroscience Unit, UCL, London, UK

Wellcome Trust Centre for Neuroimaging, UCL, London, UK

Translational Neuromodeling Unit, University of Zurich and ETH Zurich, Switzerland

Department of Psychiatry, Psychotherapy and Psychosomatics, University Hospital of Psychiatry, Zurich, Switzerland

Department of Psychiatry, Harvard Medical School, MA, USA

Department of Psychology and Neuroscience, Duke University, NC, USA

Abstract

Background

Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge.

Methods

Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D_{2} agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate.

Results

MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D_{2} agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate.

Conclusion

Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior.

Background

Anhedonia is one of the cardinal symptoms for a clinical diagnosis of major depressive disorder (MDD;

Here, we attempt to distinguish two critical factors. The first factor is a reduction in the

The distinction between these two factors is sharp in the mathematical formulation of reward learning based on prediction errors that underpins the account of DA activity _{t}=1 if the participant did receive the reward on trial _{t}=0 if it did not. We write _{t} and expected

The critical factors that might be associated with anhedonia are the two parameters _{t}; thus, alterations in the amount of DA released per spike, or in the sensitivity of postsynaptic receptors should behave like a change in the

These quantities are formal parameters of a reinforcement learning rule. The question thus arises whether they can actually be distinguished in experimental practice. In this paper, we focus on objective measures of learning behavior. Crudely, since _{t}, whereas _{t} on


**Task and typical behaviour.** **A**: Task. Each trial had the following structure: 1) 500 ms presentation of a central fixation cross; 2) 500 ms presentation of a face without a mouth; 3) 100 ms presentation of a long (13 mm) or short (11.5 mm) mouth inside the face; 4) participants reported whether the mouth was long or short by key-press (‘Z’ or ‘/’ on a US keyboard, counterbalanced); 5) the face without a mouth remained on screen until the participant responded. Short and long stimuli were each presented 50 times per block in a pseudorandom sequence that avoided more than three repetitions in a row. Adapted from **B**: Reward schedule. One response (counterbalanced across participants) had a higher reward expectation. Correct identification of that “rich” stimulus was more likely to be rewarded (75% probability) than correct identification of the other, “lean”, stimulus (30% probability). There was no punishment. If in doubt, choosing the more rewarded stimulus was beneficial. **C**: Surrogate simulated data showing prototypical response evolution. The dark bars show a hypothetical control group, developing a strong response bias towards the more rewarded response over the three blocks of 100 trials. The light bars show a prototypical treatment group with a reduced response bias. **D-E**: Surrogate simulated data generated from a simple reinforcement learning (‘Stimulus-action’) model. Both a reduction in reward sensitivity (**D**) and a reduction in learning rate (**E**) can roughly reproduce the pattern in the data (**C**). **F**: Percent correct responses for each of the 392 experimental sessions. Each black point represents one experimental session. Vertical bars demarcate datasets. Red horizontal line represents chance performance for each session. Four participants performed below chance (red). Sixty-three out of 392 experimental sessions were not fitted better than chance by model ‘Belief’ (binomial test; blue). Of these, 58 out of 63 were in the Stress dataset, in which performance was generally worst.

The fact that varying either parameter can lead to similar qualitative patterns shows that the two parameters play partially replaceable roles and may not be fully separable
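The overlap between the two parameters can be illustrated with a small simulation. The following is a minimal sketch, not the fitted model: it assumes a simple stimulus-action learner with perfect perception of the stimulus, a fixed instruction weight `gamma`, and a two-choice softmax. The function name, parameter values and number of trials are illustrative assumptions; only the 75%/30% reward probabilities are taken from the task description.

```python
import math
import random


def simulate_bias(rho, eps, gamma=1.0, n_trials=300, seed=0):
    """Simulate one session and return the fraction of 'rich' responses.

    rho   : reward sensitivity (scales the effective size of a reward)
    eps   : learning rate (scales the prediction error)
    gamma : weight favouring the instructed (correct) response (assumed value)
    """
    rng = random.Random(seed)
    # q[stimulus][action] is the expected reward; index 0 = rich, 1 = lean
    q = [[0.0, 0.0], [0.0, 0.0]]
    p_reward = [0.75, 0.30]  # reward probability for a correct rich / lean response
    rich_responses = 0
    for _ in range(n_trials):
        s = rng.randrange(2)  # stimulus shown on this trial
        # Weight of each action: instruction bonus for the correct action plus learned value
        w = [gamma * (a == s) + q[s][a] for a in (0, 1)]
        p_rich = 1.0 / (1.0 + math.exp(-(w[0] - w[1])))  # two-choice softmax
        a = 0 if rng.random() < p_rich else 1
        rich_responses += (a == 0)
        # Only correct responses can be rewarded; there is no punishment
        r = 1.0 if (a == s and rng.random() < p_reward[s]) else 0.0
        # Prediction-error update: rho scales the reward, eps scales the error
        q[s][a] += eps * (rho * r - q[s][a])
    return rich_responses / n_trials


def mean_bias(rho, eps, n_sessions=400):
    """Average the fraction of rich responses over many simulated sessions."""
    return sum(simulate_bias(rho, eps, seed=k) for k in range(n_sessions)) / n_sessions
```

Under these assumptions, lowering either `rho` or `eps` relative to a reference setting reduces the average proportion of rich responses, which is why the raw response bias alone cannot say which parameter has changed.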

To maximise the chance of identifying specific contributions of learning rate and reward sensitivity, we jointly analysed a series of datasets that are likely to differentially affect the two parameters (Table

| **Name** | **Manipulation** | **Reference** | **Participants #** | **Description** |
|---|---|---|---|---|
| Healthy | High | | 57 | Healthy volunteers. Payment: US$5 and course credit. |
| MDD | MDD | | 48 | 23 participants during an episode of MDD and 25 controls matched for age, sex, education, ethnicity and marital status. MDD participants met DSM-IV criteria for MDD, had Hamilton Rating Scale for depression scores ≥ 17, and no other axis I comorbidities except for anxiety. Inclusion required a minimal drug-free period of 2 weeks. Payment: US$10/hour plus US$5 average task earnings. |
| Hx | History of MDD | | 85 | Currently healthy participants with and without a history of major depressive disorder (MDD). Participants received US$15/hr in compensation for their time, as well as their task “earnings” (on average, US$5). |
| BPD | BPD | | 19 | Euthymic outpatients (matched to the same 25 controls as in dataset ‘MDD’). The outpatients had a long-standing diagnosis of Bipolar Disorder, currently satisfying criteria on the Affective Disorder Evaluation, which contains modified SCID mood and psychosis modules. Patients were classified as euthymic if they currently Young Mania Rating Scale |
| PPX | Pramipexole | | 24 | Healthy volunteers received either placebo or a single dose of the D2/D3 agonist pramipexole hydrochloride (PPX) 0.5 mg 2 hours prior to the task. At this low dose, PPX is thought to reduce phasic DA release through autoreceptor stimulation. Payment: US$40 for the pharmacological session and US$24.60 for the task session. |
| Stress | Threat-of-shock acute | | 79(x2) +1 | Healthy volunteers took part in the task twice (one missing session), once in a no-stress condition and once in a stress condition. Participants were told that poor performance on the task might lead to a shock being delivered through electrodes attached to the back of their neck. In the stress condition, they were told that this was quite likely, whereas they were told that no shock would be delivered in the no-stress condition. No shocks were actually delivered. Notably, the version of the task used in this study was more difficult, with the difference in size between long and short mouth being smaller. This resulted in fewer correct discriminations (see Figure |

Full details of all the patient and control groups are provided in the original publications.

Our main result is that measures of anhedonia and depression preferentially affected the reward sensitivity

Methods

Task and data

In this paper, we re-analyse 392 sessions of behavioural data derived from a probabilistic reward task (

where _{r} and _{l} indicate presentation of the rich and lean stimulus, respectively, _{1} and _{2} are the two possible key presses, and _{2} agonist pramipexole. The low dose (0.5 mg) of pramipexole was assumed to reduce phasic DA bursts to unexpected rewards due to presynaptic (autoreceptor) activation

| **Measure** | **N** | **mean** | **median** | **1st quartile** | **3rd quartile** |
|---|---|---|---|---|---|
| BDI | 366 | 8.3 | 6 | 1 | 11 |
| BDA | 281 | 1.9 | 0 | 1 | 3 |
| BDI\A | 281 | 6.9 | 1 | 5 | 10 |
| Generalized distress depression (GDD) | 276 | 23.2 | 16 | 20 | 30 |
| Generalized distress anxiety (GDA) | 276 | 18.8 | 15 | 18 | 24 |
| Anxious anxiety (AA) | 276 | 53.7 | 45 | 53.5 | 65 |
| Anhedonic Depression (AD) | 276 | 21.1 | 17 | 20.5 | 25 |
| Full (BDI + BDA + MASQ subscores) | 255 | | | | |

BDI is the total Beck Depression Inventory II

Reinforcement learning models

Reinforcement learning models account for every choice on every trial for every participant individually. Here, we describe the model for one particular participant. ‘Weights’ for emitting a particular choice are updated after every trial to predict the next choice. We consider a set of factors that might affect the weights, and use complexity-sensitive model comparison methods to try to identify the importance of each. Briefly, write _{t} for the participant’s choice on trial _{t} (long or short mouth) was presented, the model assigns to _{t} a probability _{t}|_{t}). This probability depends on the ‘weights’ _{t}. The mapping from weight to probability is made via a ‘softmax’ function so that a choice _{t} will be expected to be emitted more frequently the bigger the difference between its weight and the weight of the alternative choice, or more specifically:
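As a hedged illustration of such a mapping, a two-choice softmax that depends only on the difference between the weight of a choice and the weight of the alternative can be sketched as follows. The function name and the absence of any additional scaling parameter are assumptions made for this sketch; it is not necessarily the exact expression used in the paper.

```python
import math


def choice_probability(w_chosen, w_other):
    """Two-choice softmax: probability of emitting the choice with weight w_chosen.

    Only the difference between the two weights matters; equal weights give 0.5.
    """
    return 1.0 / (1.0 + math.exp(-(w_chosen - w_other)))
```

With equal weights both choices are equally likely, and the probability of a choice approaches one as its weight increasingly exceeds that of the alternative.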

The choice weights themselves change over time (hence the subscript on

The first of these terms, _{t} is the instructed choice for stimulus _{t} (for instance pressing ‘z’ for the long mouth) and is zero otherwise. The parameter

The second and the third term depend on the expected reward _{t} up to that point in time, which indicates whether a reward was delivered or not, an initial

After every choice, this

That is, after every trial, the expected reward _{t}=1) but the expectation _{t}=0). The larger

In the task, the mouth is only shown for a very short period of time. Thus participants cannot be sure which stimulus was actually presented, and, as experimenters, we cannot know what the participants perceived. This uncertainty has two consequences. First, it implies that the factor

Equations 3 and 4 comprise the full model ‘Belief’. We also considered two simpler variants, both of which had one fewer free parameter, and one more complicated variant, with an extra parameter. First, at

**Supplementary methods and results.**


Model fitting & comparison

Bayesian model comparison at the group level and model fitting procedures are described in detail elsewhere. All parameters were represented as non-linearly transformed variables with support on the real line and normally distributed group priors.
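The idea of fitting unconstrained variables and transforming them into the feasible range can be sketched as follows. This is a minimal sketch under stated assumptions: the logistic transform for the learning rate, the exponential transform for the reward sensitivity, and all function names are illustrative choices, since the specific transformations are not given in this section.

```python
import math


def to_learning_rate(x):
    """Map an unconstrained real x to a learning rate in (0, 1) (assumed logistic transform)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_reward_sensitivity(x):
    """Map an unconstrained real x to a positive reward sensitivity (assumed exponential transform)."""
    return math.exp(x)


def log_normal_prior(x, mu, sigma):
    """Log density of a normal group prior, evaluated at the unconstrained value x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)
```

Because the prior is placed on the unconstrained variable, any value on the real line is admissible during fitting, while the transformed parameter always stays inside its feasible region.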

More complex models will often fit the data better because they have more freedom. However, model complexity is better assessed by methods other than counting parameters

The same principles of model comparison also apply to the categorical question whether two groups differ in terms of their parameters. That is, when asking whether group A and B differ in terms of their reward sensitivity

Regression analyses

After model validation, we first assessed inter-correlations between specific questionnaire measures (AD, BDA, GDD, BDI\A, AA, GDA) and reward sensitivity or learning rate in the entire sample using one multiple linear regression analysis for

Results

Model validation

We built a set of models that embody key hypotheses about the course of learning in the different groups and fitted them to the data. The models parameterize the

Since we are interested in understanding the characteristics of groups of individuals, we need to ascertain at the group, rather than at the individual level, which model does best

The results are shown in Figure


**Model performance.** **A**: Model comparison. Group-level log Bayes factors **B**: The parameter

Both the standard Rescorla-Wagner model ‘Stimulus-action’ with separate stimulus-choice values

Figure ^{−20}), which in turn frees the other parameters to capture trial-to-trial variation in the behaviour contingent on the reward outcomes.

Finally, Additional file

Regression analyses

Given that model ‘Belief’ captured the data satisfactorily, we proceeded to analyse the relationship between model parameters and self-report questionnaire measures. Our aims were primarily to identify correlations between measures of anhedonia and learning rates or reward sensitivities. A standard, unweighted, multiple linear regression analysis revealed a significant negative correlation between

As expected, questionnaire scores were substantially correlated (Figure


**Correlates of anhedonia.** **A**: Correlation coefficients for all pairwise correlations between questionnaire measures. All are highly significant (**B**: Hierarchical weighted regression analysis across all datasets, involving all 255 participants with a full set of BDI, BDA and MASQ scores. The plots show the linear coefficients between anhedonic depression (AD) score and the reward sensitivity and learning rate parameters **C**: Scatter plot of anhedonic depression against reward sensitivity. Dot size scales with weight (inference precision). **D**: Scatter plot of reward sensitivity vs. learning rate. **E**: Significance of correlations across parameter estimates from 70 surrogate datasets. There is a consistent and stably significant correlation between AD and reward sensitivity

Next, there was a negative linear correlation between

At least part of the correlation between

Finally, all correlation analyses using the reward sensitivity and learning parameters inferred from the second-best model ‘Action’ yielded the same results, showing that the results are not dependent on a particular model formulation.

Categorical comparisons

We next examined how learning rate and reward sensitivity were affected by the factors explored in each of the individual datasets. For each dataset, we compared two models: one which assumes that the two experimental groups differed in terms of

Figure


**Comparing models incorporating hypotheses about group differences in terms of reward sensitivity or learning rate.** Each bar of the large panel shows the Bayes factor comparing _{ρ} to _{ϵ}. Green bars indicate very strong evidence for model _{ρ} (Bayes factor ≥20), yellow bars weak evidence (Bayes factor 3−10) and cyan bars insufficient evidence (Bayes factor <3). MDD and high scores of anhedonic depression (AD) result in a reduction in

However, the Bayes factors comparing models

Discussion

Our results suggest that anhedonia (as measured by AD) and MDD affect appetitive learning more by reducing the primary sensitivity to rewards

Anhedonia

Two self-report measures of anhedonia were used in this paper: the anhedonic depression subscore of the MASQ questionnaire, and the anhedonic subscore of the BDI. The former was clearly related to the reward sensitivity

By contrast, in our paradigm,

At a neurobiological level, ‘liking’ has been linked to

Dopamine in depression

The temporal difference learning model of phasic dopamine signalling posits that DA reports the prediction error

In the current study, we report tentative findings suggesting that pramipexole reduced the learning rate rather than affecting the reward sensitivity, which might correspond to a direct reduction in the signal reported by phasic DA. Although pramipexole is a non-ergot D_{2}/D_{3} _{2} agonist cabergoline have been found to specifically reduce reward go learning _{2} receptors, which have a higher affinity for DA

As indicated in the introduction, DA has multiple and profound involvements with depression. These range from the fact that DA manipulations affect mood

Overall, our findings are consistent with the fact that DA by itself is not a major target for psychopharmacological treatment of anhedonia

Analysis methods

Our conclusions were derived from a detailed, model-based meta-analysis combining behavioural data from six datasets in 392 sessions. Several features and limitations of this analysis deserve comment.

First, the interpretability of the parameters is maximised by using rigorous model comparisons. The Bayesian approach we used prevents overfitting by

Second, our approach takes individual model fits into account. The regression analyses are true random effects analyses, weighting parameters by how strongly they are constrained by each participant’s own choice data, and by how well the model fits that particular participant. This, in combination with the explicit modelling of stimulus uncertainty (beliefs) and instruction weights, ensures that any non-specific performance variability does not unduly affect our parameters of interest. Furthermore, the weighted regression ensures that each participant influences the conclusions proportionally to how well they are fit by the model.
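The effect of such weighting can be illustrated with a minimal weighted least-squares sketch for a single predictor. It is only an illustration of the principle: the function name is hypothetical, and treating the weights as a per-participant measure of inference precision is an assumption; the hierarchical weighted regression used in the analysis is more elaborate.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y ~ intercept + slope * x.

    x : predictor values (e.g. a questionnaire score per participant)
    y : outcome values (e.g. an inferred model parameter per participant)
    w : non-negative weights (assumed here to reflect inference precision)
    Returns (intercept, slope).
    """
    sw = sum(w)
    # Weighted means of predictor and outcome
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted covariance and variance (unnormalised; the normalisation cancels)
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope
```

For example, with `x = [0, 1, 2, 3]` and `y = [0, 1, 2, 10]`, giving the last point a very small weight yields a slope close to 1, whereas equal weights let the poorly constrained point pull the slope up substantially.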

Third, it is standard practice to constrain the parameters when fitting models, for instance to avoid extreme outlier inference. We use two types of constraints. The parameter transformations generate hard constraints that force parameters to remain inside feasible regions. The empirical Bayesian inference of the group priors additionally yields the most appropriate soft constraints

Fourth, learning rate and reward sensitivity were correlated in all models tested. To alleviate this, we enforced independence at the group level. One standard approach would have been to compare models in which one dimension is constrained by forcing the parameters for all participants along that dimension to be equal. However, this a) is an unrealistic constraint; b) fails to address the fact that parameters may not give the model equal flexibility; and c) renders parameters hard to interpret as variability from one is squeezed into all the other parameters (which further aggravates point b). To circumvent these issues, we contrasted the parsimony of models that explicitly allowed participants to fall into distinct groups. Doing this at the group level addressed the questions at the group level, which is where we sought to draw conclusions. We also performed the regression analysis in a number of ways to assess the selectivity of the relationship with

Fifth, it is important to note that our failure to discover correlations between AD and learning rate, or between the effects of pramipexole and reward sensitivity, might be due to limits of power. This is particularly true for the categorical comparisons, on which the arguments about the differential effect of dopamine and anhedonia or depression mainly rest. It will be critical to replicate these findings in larger samples, potentially requiring paradigms explicitly adapted to separating the two factors.

Sixth, we find no evidence that either learning rate or reward sensitivity clearly separates any of the groups when comparing the basic model to models

Of course, even though our best fitting model did an excellent job predicting the data of a plurality of participants, there could be a model that we did not try that would do even better. This is particularly true of the participants in the Stress dataset, who were fit the worst. There is a conventional ANOVA-like procedure in these circumstances that involves assessing the extent to which responses are potentially predictable

Alternative models

One advantage of the simplicity of the task is that it is likely insensitive to several aspects of reinforcement learning that are under current investigation, such as goal-directed

This elaborate account raises an important question about the factor _{t}; but noted that this is exactly confounded in the magnitude of

Equally, in conventional reinforcement learning models, it is common to employ a variant of Equation 2 in which the terms _{t} and

Finally, two relevant findings may be important for future model development. First, positive affective responses to positive events in daily life have recently been found to be

Conclusions

This paper presented a model-based meta-analysis of behavioural data spanning several related manipulations and adds to a growing literature of behavioural correlates of depression

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

QJMH and PD performed the computational modelling analysis. DAP and RB devised the experiment and collected the data. All authors wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank the members of the Affective Neuroscience Laboratory for their assistance with collection of the data analysed in this manuscript.

**Funding**

This work was funded by the Gatsby Charitable Foundation (QH, PD) and Deutsche Forschungsgemeinschaft DFG GZ RA/1047/2-1 (QH). Collection of datasets presented in the current manuscript was supported by grants from the National Institute of Mental Health (R01 MH068376, R21 MH078979, R01 MH095809) and National Center for Complementary & Alternative Medicine (R21AT002974) awarded to DAP. DAP has also received consulting fees from ANT North America Inc. (Advanced Neuro Technology), AstraZeneca, Ono Pharma USA, Shire and Servier, as well as honoraria from AstraZeneca for projects unrelated to this study.