Let’s get a bit theoretical tonight. Last week I had a long, late discussion with my friend Eva over some glasses of wine. There has been some discussion of NRMs (No Reward Markers) and LRS (Least Rewarding Stimulus) in Swedish blogs lately. The NRM is sometimes referred to as a positive punisher (you add something and the behavior gets less likely). Tonight, I stumbled on an old blog post (almost four years old) that I wrote in Swedish and that seemed to fit perfectly into the discussion. I will translate it for you:
Over the last year it has become very clear to me how reinforcing information is to dogs. It has long been known that it’s reinforcing to get a cue, or to have a target or lure presented. What you might not think so much about are the small signals that are so reinforcing to both dog and handler that you easily get caught in a vicious circle.
We use (sometimes, not that much anymore) a step that signals “change behavior” to the dog. If the dog has done (and been rewarded for) five sits and I now want the dog to lie down, I’ll take a step back when the dog starts on the sixth sit. A well-trained dog will take that hint and try another behavior instead. Some time ago, we had a discussion about what this signal is. A few people argued that this kind of no reward marker is a punisher. My conclusion now, a year later, is that this kind of NRM is reinforcing. Even then I argued that the step was functioning as a discriminative stimulus (i.e. a cue), but I didn’t think of it as reinforcing (though I didn’t believe it was punishing either).
There is no problem with using the kind of switch-cue that I described above. The reward does come when the dog does something we like (trying to sit when he has been rewarded for sitting five times before is a good behavior). The problem arises when you give this kind of information in response to behaviors that you don’t like. Say your dog has sat down five times, been rewarded for it, and then lies down on the sixth try. Or say the dog gets “stuck” during shaping and starts to bark: you take a step back, the dog stops barking and offers a behavior that you click. Everybody is happy: you’re happy because the dog stopped barking, and the dog is happy to get information and a treat. The question is just what happens with the barking in the future…
Many dogs that “get stuck”, “give up” or “get frustrated” during shaping wouldn’t do that if we didn’t reinforce it so well. I’m not suggesting that shaping is about waiting a lot, and I very rarely have to wait for my dogs to offer anything (they’ve learned that giving up earns them nothing, so they don’t). But sometimes when we teach classes, there is a fair amount of waiting and it can get frustrating for both dogs and people. The waiting comes from the need to extinguish behaviors that have been reinforced many, many times (both in humans and dogs). It’s so tempting to nudge the dog in the right direction (move a little, wave the target, tell the dog “come on!” etc.) since the dog often rewards our efforts (at least the first few times). If you’re really stuck, I think it’s better to just walk away, take a walk with the dog and make a new plan instead of constantly reinforcing helplessness.
People have a hard time buying this, since it doesn’t feel right to just wait the dog out. I do think that this is sometimes needed for the dog to understand that it is taking initiative and repeating rewarded behavior that gets him the reward, not barking or giving up. And a dog that understands this is a joy to work with, since you will rarely or never have to wait for the dog again. We sometimes wonder what makes our dogs so very easy to train and how come it feels like they’re reading our minds. I think the answer, at least partly, lies here. Of course, to be able to not help the dog out all the time, you have to plan your sessions well, end in time, set good criteria etc. Otherwise you’re just being unfair to the dog.
What are your thoughts on this subject? It only touches on the discussion of NRMs, but I’d love to hear your opinion on that as well. Are NRMs always positive punishers? Should you ever use them? Please leave your thoughts!
For my Swedish readers, this is the link to the original blog post: Hjälp är mer förstärkande än du tror