An optimal decision-making strategy emerges from non-stop learning

Unlike machines, the behaviour of animals and humans almost always has an element of unpredictability. Countless experiments have shown that our responses to the exact same challenge are sometimes faster, sometimes slower, sometimes correct and sometimes wrong.

In the field of neuroscience, this variability is often attributed to what is called "noise": an ever-present "neural babble" that influences the way brains process and respond to incoming information.

A new collaborative study in rodents by a team of scientists from the Champalimaud Centre for the Unknown in Portugal, Harvard Medical School in the US, and the University of Geneva in Switzerland shows that this variability may sometimes be wrongly interpreted as noise. Instead, it may actually reflect a behavioural strategy that was overlooked due to prior assumptions about how the subject should behave. Their results - published today (June 2nd) in the scientific journal Nature Communications - call into question what "optimal behaviour" really means.

An unexpected strategy

"It all started with a simple experiment", recalls Maria Inês Vicente, who collected the experimental data as part of her graduate work at the Champalimaud Centre for the Unknown and is currently working at Leiden University. "We took two different odours and created several mixtures of the two. During the experiment, the different mixtures were presented to the rats, one at a time. On each trial, the rat had to report which of the two odours was more dominant. If it thought the answer was odour A, it would approach a water spout on the right, and if it opted for odour B, it would go to the left. Some mixtures had much more of one odour compared with the other, making it easier to tell which was more salient, whereas in other mixtures the difference was more subtle. If the rat got the correct answer, it received a water reward."

The researchers recorded how quickly the rats responded and whether their answer was right or wrong. To their surprise, when they analyzed the data, they realized that the rats' behaviour didn't follow a common decision-making rule. "In these types of tasks we tend to see a clear dependency between difficulty and decision time: on the harder, more subtle trials, animals (and humans) take longer to decide than on easy trials", says André Mendonça of the Champalimaud Centre for the Unknown. "Instead, our rats would take, on average, the same amount of time to make both hard and easy decisions."

"The explanation for this unexpected observation wasn't easy to come by", adds Jan Drugowitsch, a co-author affiliated with Harvard Medical School. "Finally, we found it by constructing a mathematical model that united separate branches in the field of decision-making. In a sense, our goal was to replicate the rats' behaviour in a 'machine's brain' with the hope of discovering the underlying variables that produced this surprising result."

The model revealed an unexpected strategy. On each trial, the rat was readjusting its behaviour according to the results of the previous trial. If the rat was correct in one trial, it would be biased towards the same odour in the next one. And vice versa, an incorrect response in one trial would lead to switching in the next.
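The trial-by-trial readjustment described above can be illustrated with a minimal sketch. This is not the authors' model: the function names, the signed `bias` variable and the fixed `step` size are all assumptions made purely for illustration. The sketch only shows the qualitative rule: a rewarded choice pushes the bias towards the chosen odour, and an error pushes it away.

```python
def update_bias(bias, choice, rewarded, step=0.25):
    """Shift a choice bias after one trial (hypothetical rule, for illustration).

    bias > 0 favours odour A, bias < 0 favours odour B.
    choice is +1 (odour A) or -1 (odour B).
    rewarded is True when the choice was correct.
    """
    # Correct choice: move towards the chosen odour.
    # Incorrect choice: move away from it, favouring a switch.
    direction = choice if rewarded else -choice
    return bias + step * direction


def run_trials(trials, bias=0.0, step=0.25):
    """Apply the update over a sequence of (choice, rewarded) pairs."""
    history = []
    for choice, rewarded in trials:
        bias = update_bias(bias, choice, rewarded, step)
        history.append(bias)
    return history


# Two correct A choices, one incorrect A choice, then one correct B choice.
trials = [(+1, True), (+1, True), (+1, False), (-1, True)]
print(run_trials(trials))  # [0.25, 0.5, 0.25, 0.0]
```

Because the bias moves after every single trial, a sequence of choices produced this way can look erratic from the outside even though each step follows a fixed rule.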

Why did the animals adopt this particular strategy? "This strategy is consistent with a world-view where the environment is continuously changing, which leads the animals to update their decision-making approach on a trial-by-trial basis. From the outside, their behaviour appears highly variable but in fact they were just adapting too quickly. That is why it would have been easy to wrongly interpret this variability as 'just' noise", Drugowitsch points out.

Optimality is in the eye of the beholder

Why did the rats opt for a different strategy from the expected one? The authors explain that there are several reasons. The first is the nature of the task. "There isn't just one type of sensory discrimination task", says Mendonça. "Various elements in the design of the task may draw out different decision-making strategies. For instance, if we had asked rats to localize the side where a sound comes from instead of discriminating between odours, their strategy would have aligned with our initial expectation. This is because there is a 'built-in' right-left category in the brain for certain sensory modalities that are naturally spatially separated, but that's not the case for olfaction."

Another reason is confidence. "Just like humans, rats appear to evaluate their own decisions and change their behaviour accordingly. When you are very confident and end up making the correct decision, there's really not much to learn. But what happens when you're confident, but then find out that you're actually wrong? In this case, you should change your behaviour drastically. Which is precisely what we saw with our rats", says Zachary Mainen, one of the group leaders who headed the study and who is affiliated with the Champalimaud Centre for the Unknown.
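The link between confidence and the size of the behavioural change can be sketched with a simple "surprise" rule, in which the adjustment is proportional to the gap between the outcome and the confidence held before feedback. This is an illustrative assumption, not the rule used in the study: the logistic mapping from evidence to confidence, the `learning_rate` parameter and both function names are hypothetical.

```python
import math


def confidence(evidence):
    """Map evidence strength to a confidence between 0.5 and 1 (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-abs(evidence)))


def update_size(evidence, correct, learning_rate=1.0):
    """Size of the adjustment after feedback: learning rate times surprise."""
    outcome = 1.0 if correct else 0.0
    return learning_rate * abs(outcome - confidence(evidence))


# Strong evidence means high confidence before the feedback arrives.
confident_correct = update_size(3.0, correct=True)
confident_wrong = update_size(3.0, correct=False)

# Confident and correct: little to learn. Confident and wrong: a large change.
print(confident_correct < confident_wrong)  # True
```

Under this assumed rule, a confident correct choice produces an adjustment close to zero, while a confident error produces an adjustment close to the full learning rate.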

According to the authors, another explanation for the rats' choice of strategy is their "hard-wired" circuitry for learning. "Ironically, if they did not constantly readjust their responses according to the outcome of the last trial, they would actually do better. In fact, what we were originally expecting them to do is to construct an 'odour A - odour B' category and implement it", points out Alex Pouget, who is a group leader at the University of Geneva and co-author of the study. "Still, the rats' strategy makes sense."

As the authors explain, this observation doesn't mean the rat is a maladapted animal. On the contrary, they claim that the scientific community should reconsider what it defines as "optimal behaviour". "Rats have evolved over millions of years to search and explore an ever-changing environment. Therefore, when we assess the behaviour of these animals, we should remember that it's not necessarily only about performance per se. Optimality should depend both on the problem at hand and the nature of the problem-solver", Pouget argues.

"We believe that our work is a good starting point for exploring further how different subfields of decision-making may interact. We also hope that other scientists will use and refine our models in follow-up experiments. It would be fascinating and informative to see when, how and why our model starts to fail. Making an error is an opportunity for learning something new, and that is both the result and take-home message of our study", Mendonça concludes.

Credit: Champalimaud Centre for the Unknown