Decision making under partial uncertainty about action outcomes: a combined MEG and pupillometry study

How does learning occur under natural conditions, that is, by trial and error, in the absence of known rules and even without guaranteed reinforcement of the target behavior? In a large sample of healthy adults we have shown that, to develop a preference for one stimulus of a pair, it was sufficient to reinforce its choice at least twice as often as the choice of the other. During the experiment, the majority of participants experienced an illusion of regularity in the order of rewards and punishments. They persistently tried to detect a pattern in the reinforcements by switching from one stimulus to the other, although the objectively beneficial strategy would have been to always choose the more frequently reinforced item. Over the course of the experiment, the probability of choosing the more frequently reinforced item steadily increased, but only a small proportion of participants (20%) completely stopped choosing the unprofitable alternative. We found that, even before an item was presented, relative pupil dilation reliably predicted a subsequent unprofitable decision. This reaction was present after only a few trials, when a statistical preference for one stimulus over the other was not yet evident. Thus, we have obtained evidence that a pragmatic evaluation of the situation takes place very early, possibly even before it becomes conscious or manifests itself in behavior. We are currently analyzing the magnetoencephalographic signals associated with profitable and unprofitable choices, which should allow us to uncover more fully the mechanisms of spontaneous learning in a probabilistic environment.
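The claim that always choosing the more frequently reinforced item is the beneficial strategy can be illustrated with a short calculation. The sketch below is only an illustration: the reward probabilities (0.7 and 0.3) and the function name are assumptions chosen to satisfy the "at least twice as often" condition, not values or methods reported for the study.

```python
def expected_reward_rate(p_good, p_bad, q_good):
    """Expected fraction of rewarded trials when the better item
    is chosen with probability q_good on every trial."""
    return q_good * p_good + (1.0 - q_good) * p_bad


# Assumed reward probabilities (hypothetical, not taken from the study):
# the better item is rewarded more than twice as often as the other.
P_GOOD, P_BAD = 0.7, 0.3

always_best = expected_reward_rate(P_GOOD, P_BAD, q_good=1.0)  # never switch
mixed = expected_reward_rate(P_GOOD, P_BAD, q_good=0.7)        # switch on some trials
random_choice = expected_reward_rate(P_GOOD, P_BAD, q_good=0.5)

print(f"always choose better item: {always_best:.2f}")
print(f"choose better item on 70% of trials: {mixed:.2f}")
print(f"choose at random: {random_choice:.2f}")
```

Under these assumed values, any amount of switching to the less reinforced item lowers the expected reward rate, because the expression is linear and increasing in `q_good` whenever `p_good` exceeds `p_bad`.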