We all interact with willful creatures, human and otherwise, and we all have preferences about how these creatures behave.
With every interaction, we reinforce certain behaviors, whether or not we intend to (or even realize it).
Today, I want to start digging into reinforcement and what many of us get wrong about it, particularly when it comes to rewards and punishments.
The first thing we tend to get wrong is assuming that a reinforcer can make a new behavior spontaneously occur. It can’t. You cannot reinforce what does not yet exist; a behavior must occur before it can be reinforced. (However, you need not wait for a behavior to be fully realized before shaping it. You can start reinforcing behaviors that lead toward the desired outcome, even if that outcome is distant.)
Positive reinforcement is something that pleases us in response to a certain behavior. This can take many forms: Cooking a good meal has little positive reinforcers all along the way—the smells of good food cooking, the little tastes as the flavors come together—and ends in the positive reinforcer of enjoying good food, hopefully with good company. Feeling pleased (or even smug!) when you accomplish something is another form of positive reinforcement, as is a smile or a “nice job!” from someone else.
Please note: these pleasing events are either concurrent with or immediately after the behavior that they are reinforcing. This timing is key. Our brains don’t have a long memory when it comes to reinforcement.
It is not possible to overemphasize this: Timing is crucial.
The reinforcer, positive or negative, needs to occur as close to the desired behavior as possible. This is how we figure out which behavior we should do more of. The coach who yells “Good throw!” as soon as the throw is executed will have more success training the team than the one who waits until after practice to offer the same praise. When time elapses between the desired action and the supposed reinforcer, we may associate the reinforcer with the wrong behavior.[1]
Rewards are desired or pleasant things given transactionally: You do X, you get Y. Rewards are not positive reinforcement because they generally do not occur coincident with the behavior we want to see more of. It is nice to get a reward, but rewards don’t actually shape our behavior.
Consider an annual bonus: It’s nice to get an extra chunk of money, but what exactly did you do to get it? I mean exactly. The actual action or actions taken. Pretty obscure, right? A spot bonus, where a desired action is observed, recorded, and cash awarded, might seem like it would reinforce a behavior—but the cash isn’t the reinforcer! It’s the in-the-moment acknowledgment that reinforces the desired behavior (but heaven help you if an expected reward is not given in due course!). The idea of a bonus can act as an incentive to motivate certain actions, but that motivation is too diffuse to serve as a direct reinforcer.
If you live long enough, I guarantee you will have the bitter experience of doing everything “right” and not getting the expected reward or outcome. If this only happens occasionally, and the stakes are low enough that it doesn’t feel like an existential rip-off, you may well carry on. But if it happens too frequently or with too high stakes, it becomes fertile ground for bitterness, resentment, and general mulishness. Why should one do what is asked or expected when compliance doesn’t pay off anyway?
When you or someone you are interacting with is stuck in this place of recalcitrance, a random, unearned treat can go a long way towards restoring motivation and goodwill. A great truth of life is that sometimes, random chance works against us, and that is dispiriting. But sometimes, this same chance works in our favor, and that can be heartening enough that we can get unstuck and start moving again.
My hunch is that getting an unearned treat (not an earned reward!) nudges us out of a transactional frame of mind (e.g., I did X and therefore deserve Y; or conversely, you didn’t do X, so you don’t get Y) and into a more reciprocal one (e.g., you cooked, so I’ll do the dishes), or simply a more compassionate one (e.g., I see you’re struggling and frustrated, I hope this helps you feel a little better, and you’re worthy of being cared for even when you aren’t behaving exactly as I would prefer). Constant treats will dilute this effect, but sporadic treats, especially when someone is working in earnest but still struggling or failing to accomplish something, can be a powerful offering of caring and goodwill.
Punishment is often confused with negative reinforcement. The two are not the same. (Think about this—have you figured out why?)
Instead, punishment is a way to show that you have power over another being. You are demonstrating that you can cause them pain, shame, or deprivation. Many people use punishment thinking it will change behavior, that the one being punished will “learn their lesson.”
There is a causal problem with this logic: punishment is not a reinforcer. Because it is too far away in time from the behavior, punishment does not predictably influence future behavior.
Punishment (think of getting a parking ticket[2], scolding a small child for playing with an electrical outlet, or yelling at the cat for … literally anything) doesn’t mean the one being punished won’t repeat the behavior. Often the lesson learned from punishment is not “don’t do X,” it’s “don’t get caught doing X.”
Negative reinforcement is something that you don’t like that will stop as soon as you engage in the desired behavior. An example from my own life is that my little cat, Willie Pete, has used negative reinforcement to train me to play with him before I go to bed. As soon as he hears me wrapping up my evening bath, long after the spousal unit is asleep, Willie Pete sets himself on the bathmat and howls. As soon as I go downstairs and click on the laser pointer, the red dot appears and he stops howling. (He then offers me positive reinforcement for this behavior by being extremely entertaining.)
If you want to dig deeper into these concepts, or get tips on applying them with the willful creatures in your life, check out Karen Pryor’s excellent book Don’t Shoot the Dog.
[1] These behaviors are called “superstitions.” For example, if a pigeon learns to walk twice around its enclosure and flap its wings before tapping a lever to get a food pellet (but the only relevant part of the routine is tapping the lever), the walking and flapping behaviors are superstitions that were reinforced because the bird did them before having a successful outcome, so they got included in the food-getting routine, despite not being relevant to the outcome.
Before you decide the pigeon in question is stupid, consider that we all do this. Like the pigeon, we don’t know which parts are critical and which parts are optional, so we just do them all because we want the outcome.
One famous-to-people-who-watch-tennis example is Rafael Nadal’s pre-serve ritual. Early in his career, he would essentially adjust all of his clothing, touch his face and hair multiple times, bounce the ball several times, and then, finally, serve. While plenty of athletes find great value and mental centering in having consistent rituals before taking action, the consensus view was that Nadal’s ritual took too darn long. He abbreviated the ritual later in his career, and still managed to play brilliantly.
[2] Parking tickets are also unreliable, even as a punishment. I’m betting if you’ve parked a car enough times, at some point you’ve parked where you shouldn’t have and not gotten a ticket. Even if you don’t make a habit of parking illegally, the lesson isn’t “don’t ever park illegally,” it’s “try not to get caught parking illegally.”