Negative Reinforcement Using Operant Conditioning

Operant conditioning is a powerful learning process in which a behavior becomes associated with its outcome. Our actions are followed by consequences; if the consequences are positive, we learn that the action is beneficial and we tend to perform it more often. Conversely, if an action is followed by negative consequences, we learn to avoid acting this way.

Let’s take a look at simple classical conditioning before reviewing operant conditioning and, specifically, negative reinforcement.

Pavlovian Classical Conditioning vs. Operant Conditioning

The key element in operant conditioning is the action. By contrast, other forms of learning, such as Pavlovian classical conditioning, create an association between two stimuli. For example, in his famous experiments with dogs, Pavlov associated the presentation of food with the ringing of a bell. Dogs normally salivate when food is presented. Pavlov always rang a bell before presenting food to the dogs, and eventually the dogs learned that the bell signals the arrival of food. They started to salivate when the bell rang, even without seeing or smelling food.

Brief Review of Operant Conditioning

In operant conditioning, every action has an effect, and this effect determines whether the action is encouraged or discouraged. In scientific terms, a behavior is encouraged through reinforcement and discouraged through punishment. Both reinforcement and punishment can be further subdivided into positive and negative. Keep in mind that positive and negative do not reflect whether we like the consequences. They refer to the outcome: whether something was added or removed, that is, whether we get something new or lose something that we already have.

Positive protocols shape an action so it ultimately results in the addition of a new stimulus or outcome. If the outcome is pleasurable/favorable, then we have a positive reinforcement protocol that strengthens this behavior. If the outcome is unpleasant/unfavorable, we have a positive punishment protocol that weakens the expression of the behavior.
Negative protocols are characterized by the removal of an existing stimulus upon the completion of the action. If what is being removed is pleasurable for the learner, such as access to food or money, then we have a negative punishment protocol and the expression of the conditioned behavior will decrease. If the action results in the removal of an aversive factor, such as a loud background noise, then the associated behavior will increase and we have a negative reinforcement protocol.

Both positive and negative reinforcement will always result in the increase of the behavior, while both positive and negative punishment will ultimately decrease the behavior.
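
The four protocols described above can be summarized in a small lookup. This is only an illustrative sketch: the names Change, Valence and classify_protocol are hypothetical and were chosen for this example, not taken from any library.

```python
from enum import Enum
from typing import Tuple


class Change(Enum):
    ADDED = "added"      # a stimulus is presented after the action
    REMOVED = "removed"  # an existing stimulus is taken away after the action


class Valence(Enum):
    PLEASANT = "pleasant"  # e.g. food or money
    AVERSIVE = "aversive"  # e.g. loud noise or electric shock


def classify_protocol(change: Change, valence: Valence) -> Tuple[str, str]:
    """Return (protocol name, expected effect on the behavior)."""
    # "Positive" and "negative" only describe whether a stimulus is added or removed.
    sign = "positive" if change is Change.ADDED else "negative"
    # The behavior is strengthened when something pleasant is added
    # or when something aversive is removed.
    strengthens = (change is Change.ADDED) == (valence is Valence.PLEASANT)
    kind = "reinforcement" if strengthens else "punishment"
    effect = "increases" if strengthens else "decreases"
    return f"{sign} {kind}", effect


for change in Change:
    for valence in Valence:
        print(change.value, valence.value, "->", classify_protocol(change, valence))
```

Removing an aversive stimulus is the only combination that yields negative reinforcement, which is the case discussed in the rest of this article.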

Now, it’s time to take a deep look at negative reinforcement. We will discuss experimental protocols, the neurobiological substrate of negative reinforcement and the translational value of negative reinforcement experiments for humans.

Negative Reinforcement Experiments

All negative reinforcement experiments with rodents are performed in the operant conditioning apparatus, also known as the Skinner box. The operant chamber is essentially a large box that contains one or more levers, an electrifiable floor and optionally a light or sound indicator.

The general principle of negative reinforcement experiments with rodents is that the rodents will learn to press the lever in order to avoid an aversive stimulus, such as a mild electric shock. As originally described by B.F. Skinner, negative reinforcement can be studied in two experimental paradigms, escape learning and active avoidance.

Escape Learning: Negative reinforcement experimental paradigm

In escape learning, a rodent is placed in the operant conditioning chamber and an aversive factor, such as a mild electric shock, is constantly presented. The rodent is not trained on how to terminate the aversive factor, so it just moves around until it accidentally presses the lever. When this happens, the aversive factor is removed; in our example, the electric shock stops. After the same process has been repeated several times, the rodent will move directly towards the lever as soon as it is placed in the experimental chamber and press it in order to stop the electric shock. The rodent learns to press the lever to escape from an existing aversive signal.

This is negative reinforcement because the rodent is learning an action (lever-pressing) in order to remove an aversive stimulus (foot-shock).

Avoidance Learning: Negative reinforcement experimental paradigm

In avoidance learning, the rodent associates the presentation of a neutral stimulus, for example a light or sound signal, with the onset of an aversive stimulus. First, the rodent must successfully associate the light signal with the electric shock. To achieve this, the experimenter trains the rodent by presenting the light signal shortly before the electric shock, at a consistent interval. For example, the electric shock can be delivered 5 seconds after the light signal. Once this association is made, the rodent will learn to press the lever upon seeing the light signal, in order to avoid the electric shock that would follow.

To summarize, in escape learning experiments, a rodent performs a behavior in order to escape an aversive stimulus. By contrast, in avoidance learning, a rodent learns to perform a behavior in order to avoid an aversive stimulus. Thus, in avoidance learning, the rodent presses the lever to prevent the occurrence of an aversive consequence that is not currently happening, whereas in escape learning the aversive stimulus is already happening and the rodent acts to stop it from continuing.
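
The difference between the two paradigms comes down to the timing of the response relative to the onset of the aversive stimulus. The following minimal sketch makes that explicit; the function name classify_response and the trial times are hypothetical and serve only as an illustration, with the shock assumed to start 5 seconds after the light signal.

```python
from typing import Optional


def classify_response(press_time: Optional[float], shock_onset: float) -> str:
    """Label one trial; times are in seconds from the onset of the warning signal.

    press_time is None when the animal never pressed the lever in the trial.
    """
    if press_time is None:
        return "no response"
    if press_time < shock_onset:
        # Lever pressed during the warning signal: the shock is never delivered.
        return "avoidance"
    # Lever pressed while the shock is on: the shock is terminated.
    return "escape"


# Hypothetical lever-press times for five consecutive trials.
trials = [None, 8.2, 6.1, 4.0, 2.5]
labels = [classify_response(t, shock_onset=5.0) for t in trials]
print(labels)  # ['no response', 'escape', 'escape', 'avoidance', 'avoidance']
```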

A Modified Experimental Protocol for Negative Reinforcement

Based on these principles, several modified experimental protocols have been developed to study negative reinforcement.

For example, Inozemtsev and colleagues used a dual compartment operant conditioning apparatus to study active avoidance.^[1] In their setup, the rodents were placed in one compartment and a mild electric shock was delivered. They could only avoid the shock by escaping to the other compartment.

Their experiment began with a training session, during which a 10-second light signal acted as a conditioned stimulus. Then, after a 10-second delay, a mild electric shock was administered. During this initial training session, the rodents associated the light signal with the administration of electric shock. Additionally, they learned that they could escape the shock by moving to the other compartment. Thus, they performed a simple version of escape learning.

As the training period continued, the rodents learned to move to the other compartment immediately upon the light signal. By doing so, they could avoid the electric shock entirely. Thus, they successfully achieved avoidance learning.

Then, the researchers proceeded with the testing session. They divided the rodents into two experimental groups, one control and one receiving a drug of interest. Then, they assessed the number of avoidance responses (when the rodent changed compartment upon light presentation), as well as the number of escape responses (when the rodent changed compartment upon electric shock presentation).

An index of learning was calculated, which expresses the rodents’ avoidance or escape responses as a percentage of the total number of presentations. The researchers concluded that as a result of the drug, the rodents had improved learning performance.
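
As a rough illustration of such an index, the percentage can be computed as below. This is a sketch under stated assumptions, not the authors' analysis code: the function name learning_index is hypothetical and the counts are invented for the example.

```python
def learning_index(responses: int, presentations: int) -> float:
    """Express a count of responses as a percentage of all stimulus presentations."""
    if presentations <= 0:
        raise ValueError("presentations must be a positive number")
    if not 0 <= responses <= presentations:
        raise ValueError("responses must lie between 0 and presentations")
    return 100.0 * responses / presentations


# Invented example: 12 avoidance and 6 escape responses in 20 presentations.
avoidance_index = learning_index(12, 20)
escape_index = learning_index(6, 20)
print(avoidance_index, escape_index)  # 60.0 30.0
```

A higher avoidance index in the treated group than in the control group would then be read as improved learning performance.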

Schedule of Reinforcement

Both positive and negative reinforcement protocols rely heavily on the schedule of reinforcement to produce results. The schedule of reinforcement is the rule that determines the relationship of the response with the outcome.

Specifically, the schedule of reinforcement can be either continuous or intermittent (fixed ratio, fixed interval, variable ratio, variable interval). Each schedule has its own characteristics in terms of how quickly the behavior is learned and forgotten.

Continuous reinforcement protocols instruct that the outcome occurs every time the response is made. For example, in a negative reinforcement experiment under a continuous schedule, every time the rodent presses the lever, it results in the elimination of the electric shock.
Fixed ratio protocols demand that the outcome occurs after a specific number of repetitions of the response. For example, the rodent must press the lever 5 times in order for the electric shock to stop.
Fixed interval protocols require that the outcome occurs after a predetermined period of time, provided that at least one response is made. For example, with a 10-second interval, the electric shock will not stop before 10 seconds have passed, even if the rodent presses the lever earlier. In other words, there is a minimum time period that must pass before the outcome is presented: the shock stops only once the rodent has pressed the lever at least once and the minimum time period has passed.
Variable ratio protocols are similar to fixed ratio protocols. The difference is that the number of responses necessary to elicit the outcome is not fixed, but changes according to a specific rule. This rule may be an increasing, decreasing or random pattern, depending on the protocol. So, the rodent may need to press the lever once at first, then twice, then 5 times in order to stop the shock.
Variable interval protocols follow the same logic. The response will elicit the outcome if it occurs at least once during the interval period, which now differs from trial to trial. For example, in the first trial the electric shock will stop after 10 seconds if the rodent has pressed the lever at least once. Then, in the following trial, the electric shock will stop after 20 seconds, provided that the rodent has pressed the lever.
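
The schedules listed above can be sketched as two small rules that return the moment at which the shock stops in a given trial. This is a simplified illustration that follows the wording of this article; the function names, the press times and the per-trial parameters are all hypothetical. Continuous reinforcement is treated as a ratio of 1, and the variable schedules simply use a different parameter in each trial.

```python
from typing import List, Optional


def ratio_offset(press_times: List[float], required_presses: int) -> Optional[float]:
    """Time (s after shock onset) at which the shock stops under a ratio rule.

    required_presses=1 corresponds to continuous reinforcement. Returns None
    if the animal never reaches the required number of presses in the trial.
    """
    if required_presses < 1:
        raise ValueError("required_presses must be at least 1")
    presses = sorted(press_times)
    if len(presses) < required_presses:
        return None
    return presses[required_presses - 1]


def interval_offset(press_times: List[float], interval: float) -> Optional[float]:
    """Time at which the shock stops under an interval rule as described above:
    at least one press is needed and `interval` seconds must have passed."""
    if not press_times:
        return None
    return max(min(press_times), interval)


# Hypothetical session: lever-press times (s after shock onset) in three trials.
session = [[3.0, 4.5, 6.0, 7.0, 9.5], [2.0, 2.5], [12.0]]

continuous = [ratio_offset(t, 1) for t in session]
fixed_ratio = [ratio_offset(t, 5) for t in session]
variable_ratio = [ratio_offset(t, n) for t, n in zip(session, [1, 2, 5])]
fixed_interval = [interval_offset(t, 10.0) for t in session]
variable_interval = [interval_offset(t, s) for t, s in zip(session, [10.0, 20.0, 5.0])]

print(continuous)         # [3.0, 2.0, 12.0]
print(fixed_ratio)        # [9.5, None, None]
print(variable_ratio)     # [3.0, 2.5, None]
print(fixed_interval)     # [10.0, 10.0, 12.0]
print(variable_interval)  # [10.0, 20.0, 12.0]
```

A value of None means that the criterion was never met, so the shock would run until the end of the trial.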

Learning under a continuous reinforcement protocol is fast. However, extinction of the learnt behavior also occurs quickly when the rule is removed, indicating that the learning is unstable. By contrast, intermittent protocols, and especially variable ratio and variable interval protocols, require more time for learning to be established, but the results last longer and extinction proceeds more slowly. It is up to the experimenter to determine which schedule of reinforcement best suits the specific parameters of each scientific question.

Neural Correlates of Negative Reinforcement Learning

Reinforcement learning relies on the reward system of the brain. This can be easily understood when we think of positive reinforcement. If by performing an action we get a pleasant reward, such as food or a sweet, it is obvious that our reward system is activated and participates in our learning experience. The case is similar for negative reinforcement. Here, the reward is less straightforward, since it is provided by removing or avoiding a disturbing factor. Yet, it is still enough to activate the reward system.

The Reward System

The central player of the reward system is the ventral tegmental area (VTA) and, specifically, its dopaminergic projections to the nucleus accumbens (NAcc). In our article on positive reinforcement we discussed the reward dopaminergic system and particularly the implication of the VTA and NAcc in reinforcement learning.

An interesting study aiming to reveal the underlying neural mechanisms of negative reinforcement used the following experimental approach. The authors hypothesized that relief of ongoing pain would act as a negative reinforcer and shape the animal’s behavior. Their experiment was not a typical operant conditioning experiment employing a lever and an electrifiable floor, but used a conditioned place preference (CPP) readout. Yet, their findings on the neuronal circuit of negative reinforcement can be extended and applied to negative reinforcement in operant conditioning. Using rats, they were able to show that pain relief acts as a negative reinforcer and induces CPP, a behavior that is abolished by drugs that interfere with the dopaminergic circuit. Their results suggest that midbrain dopamine neurons are activated following termination of an aversive state and may underlie negative reinforcement, similar to what has been characterized for positive reinforcement.^[2]

The Prefrontal Cortex

Apart from the dopaminergic neurons of the reward circuit, additional brain regions have been associated with negative reinforcement learning. First is the prefrontal cortex (PFC), a highly interconnected area that oversees executive functions, such as attention and behavioral control. Lesion studies in rodents, other mammals and human patients show that intact function of PFC structures, such as the orbitofrontal cortex, is necessary for the execution of negative reinforcement and specifically for avoidance learning using a discriminative stimulus.^[3]

Hippocampus and Entorhinal Cortex

Next on the list follow the hippocampus and entorhinal cortex. These two areas are the major sites of adult neurogenesis and synaptic plasticity, processes that have been characterized as the molecular mechanisms of learning. Studies in rodents have shown that these areas are necessary for forming associations between the action and the consequence.^[3] However, since the hippocampus also participates in coding environment-related information, it has been proposed that hippocampal lesions facilitate reinforcement learning when the outcome is delayed relative to the response. A delayed outcome may become associated with an independent environmental factor rather than with the response, and a dysfunctional hippocampus has been shown to protect against this wrong association, because the information about the environment is not properly encoded.^[4]

Scientific Questions Addressable by Negative Reinforcement Experiments

Negative reinforcement is a powerful learning technique that applies to a broad spectrum of everyday activities. We follow the rules because we want to avoid the negative consequences of breaking them. Punishment, such as paying a fine, is an effective way to learn what we should not do. But we also need to learn what we should do.

Reinforcement, both positive and negative, helps us to understand which actions we should perform more often. Even without our conscious understanding, we constantly act under negative reinforcement. In our cars we wear a seatbelt to avoid listening to the annoying warning sound. We stop at red lights, and we don’t drink and drive or speed, to avoid getting a ticket or causing an accident. We refuse to participate in “get rich quick” email schemes to avoid losing our money.

Negative Reinforcement Possibly Implicated in Alcohol Dependence

Reinforcement learning is a technique. As such, it does not discriminate whether what we are learning is beneficial or damaging. For example, several lines of evidence suggest that negative reinforcement is implicated in alcohol dependence. It is proposed that withdrawal symptoms act as negative reinforcers, and that under an escape learning paradigm individuals return to alcohol abuse in order to terminate or avoid the distressing physical and psychological symptoms.

Insights from the Iowa Gambling Test

Using a modified Iowa Gambling Test that separately evaluates positive and negative reinforcement, Thompson and colleagues showed that substance dependent individuals were impaired in learning to avoid negative outcomes when receiving negative feedback. As the magnitude of loss increased for the bad deck, the control individuals learned to pass on this deck, while the substance dependent individuals did not learn to pass, irrespective of the magnitude of loss. Interestingly, both the control and the substance dependent groups were sensitive to increasing frequencies of loss. Extrapolating these findings to the real world of alcohol dependence, the authors propose that repeated episodes of withdrawal, rather than a single heavy episode, may drive relapse to alcohol abuse.^[5]

Autism Spectrum Disorders

Moreover, several lines of evidence converge on the role of negative reinforcement learning in autism spectrum disorders (ASD). The circuitry that supports negative reinforcement is atypically developed in ASD patients.^[6] Furthermore, deficiencies in social negative reinforcement are hypothesized to contribute to the social motivational deficits observed in ASD patients.^[7]

Unanswered Questions

Even though negative reinforcement has been studied for many years, there are still a few open questions that warrant our investigation:

What is the exact neuronal circuit that functions during negative reinforcement? How is it different from the circuit responsible for positive reinforcement? Novel technological advances, such as optogenetic manipulation of specific neurons in vivo, could answer this question with unprecedented accuracy.
Which schedule of reinforcement is more suitable to help sensitive populations, such as alcohol dependent individuals or ASD patients? Which specific factors determine the efficacy of a negative reinforcement protocol? Is it related to our brain function? The development and wide use of functional and structural brain imaging techniques, such as fMRI and tractography, could help us delineate the complex underlying mechanisms, for healthy individuals and patients alike.
Are there any pharmacological interventions that could facilitate or block harmful negative reinforcement learning? The use of wild type or disease model rodents is critical to answer this question. By taking advantage of the variety of rodent models for the same disease, which have genetic or environmental etiology, one can extract valid conclusions that could be translated for human pathologies.

Criticism on the Binary Classification of Reinforcement

There is an ongoing discussion on whether positive and negative reinforcement should be studied as different entities. The criticism began in 1975 with a seminal paper published by J. Michael.^[8] More recently, A. Baron and M. Galizio revisited the question of whether positive and negative reinforcement should still be assessed separately or are, in essence, two sides of the same coin.^[9]

For example, food-motivated lever pressing is a major experimental paradigm of positive reinforcement. Yet, one could argue that the presentation of food ends a period of food deprivation, so the test subject learns to perform the task driven by negative reinforcement. The rodent presses the lever to avoid hunger; once phrased like this, it becomes a typical example of negative reinforcement.

Another example is provided by heat reinforcement experiments. Rodents kept in a cold chamber will learn to press a lever that turns on a heating lamp. This experiment is described as positive reinforcement, because lever pressing results in the presentation of heat. Conversely, viewing the experiment from a different perspective, one could say that lever pressing results in the termination of the cold environmental temperature, and so it is a negative reinforcement experiment. Similarly, electric shock avoidance could be explained as the onset of safety, which would act as a positive reinforcer.

Conclusion

The criticism mentioned above appears rational, and the scientific community has not yet settled whether there is an actual difference between positive and negative reinforcement. Yet, these terms are still being used in the majority of scientific publications. This is probably due to the long tradition behind this terminology, which extends back to the original description of operant conditioning by B.F. Skinner in 1938.^[10]

Nonetheless, negative reinforcement remains widely applicable in our everyday life and constitutes a valuable alternative to positive reinforcement learning. Overuse of positive reinforcers may lead to satiety effects, so that they ultimately lose their effectiveness.

Therefore, the combinatorial application of different operant conditioning techniques allows the effective learning of multiple rules at once. A strong understanding of negative reinforcement allows us to achieve more prominent results, ultimately benefiting ourselves and society.

References

1. Inozemtsev, A. N., Berezhnoy, D. S., Fedorova, T. N., & Stvolinsky, S. L. (2014). The effect of the natural dipeptide carnosine on learning of rats under the conditions of negative reinforcement. In Doklady Biological Sciences (Vol. 454, No. 1, p. 16). Springer Science & Business Media.
2. Navratilova, E., Xie, J. Y., Okun, A., Qu, C., Eyde, N., Ci, S., … & Porreca, F. (2012). Pain relief produces negative reinforcement through activation of mesolimbic reward–valuation circuitry. Proceedings of the National Academy of Sciences, 109(50), 20709-20713.
3. Guerra, L. G. G. C., & Silva, M. T. A. (2010). Learning processes and the neural analysis of conditioning. Psychology & Neuroscience, 3(2), 195.
4. Cheung, T. H., & Cardinal, R. N. (2005). Hippocampal lesions facilitate instrumental learning with delayed reinforcement but induce impulsive choice in rats. BMC Neuroscience, 6(1), 36.
5. Thompson, L. L., Claus, E. D., Mikulich-Gilbertson, S. K., Banich, M. T., Crowley, T., Krmpotich, T., … & Tanabe, J. (2012). Negative reinforcement learning is affected in substance dependence. Drug and Alcohol Dependence, 123(1-3), 84-90.
6. Schuetze, M., Rohr, C. S., Dewey, D., McCrimmon, A., & Bray, S. (2017). Reinforcement Learning in Autism Spectrum Disorder. Frontiers in Psychology, 8, 2035. doi:10.3389/fpsyg.2017.02035
7. Damiano, C. R., Cockrell, D. C., Dunlap, K., Hanna, E. K., Miller, S., Bizzell, J., … & Dichter, G. S. (2015). Neural mechanisms of negative reinforcement in children and adolescents with autism spectrum disorders. Journal of Neurodevelopmental Disorders, 7(1), 12. doi:10.1186/s11689-015-9107-8
8. Michael, J. (1975). Positive and negative reinforcement, a distinction that is no longer necessary; or a better way to talk about bad things. Behaviorism, 3(1), 33-44.
9. Baron, A., & Galizio, M. (2005). Positive and negative reinforcement: Should the distinction be preserved? The Behavior Analyst, 28(2), 85-98.
10. Skinner, B. F. (1938). The Behavior of Organisms: An Experimental Analysis. New York: Appleton-Century.