Researchers decode the brain mechanism that links repeated actions and reward to form stable habits

A study published in the journal Nature and coordinated by the Sainsbury Wellcome Centre in London, with the participation of IDIBAPS-Hospital Clínic, used computational models and animal experiments to demonstrate that the movement-related dopamine signal helps to consolidate actions regardless of whether they are rewarded. This mechanism plays a key role in habit formation.

Animals and humans rely on two main strategies when repeating actions. First, they repeat actions that previously brought them a reward, which corresponds to value-based learning. Learning here is based on a reward prediction error, which reflects the difference between the expected reward and the reward actually received. Second, they repeat actions that they have performed in the past, even if those actions were not associated with any reward. This strategy involves an action prediction error, which occurs when there is a mismatch between the action performed and the one expected to be carried out. From a computational point of view, remembering past actions is a simpler and more efficient approach for producing automated behaviours. Both of these learning mechanisms are controlled by dopamine, though each is associated with a different type of signal.
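The two prediction errors described above can be illustrated with a minimal sketch. It is not the model used in the study: the update rules, the learning rate and all names (rpe_update, ape_update, values, habits) are assumptions chosen for illustration. It shows that repeating an unrewarded action leaves its learned value unchanged while still strengthening the stimulus-action association.

```python
def rpe_update(values, state, action, reward, lr=0.1):
    """Value-based learning: move the stored value toward the reward received."""
    rpe = reward - values[state][action]  # reward prediction error
    values[state][action] += lr * rpe
    return rpe


def ape_update(habits, state, action, lr=0.1):
    """Value-free learning: move habit strengths toward the action just taken."""
    ape = 1.0 - habits[state][action]  # action prediction error for the chosen action
    for a in habits[state]:
        target = 1.0 if a == action else 0.0
        habits[state][a] += lr * (target - habits[state][a])
    return ape


# Hypothetical setup: one stimulus ("tone") and two possible responses.
values = {"tone": {"left": 0.0, "right": 0.0}}
habits = {"tone": {"left": 0.5, "right": 0.5}}

# Repeat the same action 20 times without ever delivering a reward.
for _ in range(20):
    rpe_update(values, "tone", "left", reward=0.0)
    last_ape = ape_update(habits, "tone", "left")
```

In this sketch the value of the repeated action stays at zero because no reward ever arrives, whereas its habit strength grows with every repetition and the action prediction error shrinks as the action becomes expected.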

The primary goal of this study was to investigate whether dopaminergic activity related to movement, rather than reward, can encode action prediction error and serve as a learning signal. Such a signal would reinforce repeated associations between a stimulus and an action, ultimately leading to habit formation. To test this hypothesis, the researchers used an auditory task in which mice had to discriminate between sounds and respond to each with a specific action. The team measured and manipulated dopaminergic activity during the task, while also developing computational models to better understand the underlying mechanisms.

‘This study demonstrates that there are two types of dopaminergic prediction errors that work in a complementary way to enhance learning: the reward prediction error and the action prediction error’, explains Hernando Martínez Vergara, one of the study's first authors and a former researcher at the Sainsbury Wellcome Centre in London, where he started this project. He is also a Ramón y Cajal researcher at IDIBAPS.

The results indicate that dopaminergic activity in the tail of the striatum is linked to movement and encodes action prediction error. This type of signal acts as a learning mechanism without the need for a reward, strengthening repeated associations that eventually become established as habits.

Image: Hernando Martínez Vergara, one of the study's researchers.

Reference paper: Action prediction error: a value-free dopaminergic teaching signal that drives stable learning. Francesca Greenstreet et al. Nature. 2025.