
22. Repeated games: cheating, punishment, and outsourcing


Professor Ben Polak:
So last time we were focusing on repeated interaction
and that’s what we’re going to continue with today.
There’s lots of things we could study under repeated interaction
but the emphasis of this week is can we attain–can we
achieve–cooperation in business or personal relationships
without contracts, by use of the fact that these
relationships go on over time? Our central intuition,
where we started from last time, was perhaps the future of
a relationship can provide incentives for good behavior
today, can provide incentives for
people not to cheat. So specifically let’s just
think of an example. We’ll go back to where we were
last time. Specifically suppose I have a
business relationship, an ongoing business
relationship with Jake. And each period I’m supposed to
supply Jake with some inputs for his business,
let’s say some fruit. And each period he’s supposed
to provide me with some input for my business,
namely vegetables. Clearly there are opportunities
here, in each period, for us to cheat.
We could cheat both on the quality of the fruit that I
provide or the quantity of the fruit that I provide to Jake,
and he can cheat on the quantity or quality of the
vegetables that he provides to me.
Our central intuition is: perhaps what can give us good
incentives is the idea that if Jake cooperates today,
then I might cooperate tomorrow, I might not cheat
tomorrow. Conversely, if he cheats and
provides me with lousy vegetables today I’m going to
provide him with lousy fruit tomorrow.
Similarly for me, if I provide Jake with lousy
fruit today he can provide me with lousy vegetables tomorrow.
So what do we need? We need the difference in the
value of the promise of good behavior tomorrow and the threat
of bad behavior tomorrow to outweigh the temptation to cheat
today. I’m going to gain by providing
him with the bad fruit or fewer fruit today–bad fruit because
those I would otherwise have to throw away.
So that temptation to cheat has to be outweighed by the promise
of getting good vegetables in the future from Jake and vice
versa. So here’s that idea on the
board. What we need is the gain if I
cheat today to be outweighed by the difference between the value
of my relationship with Jake after cooperating and the value
of my relationship with Jake after cheating tomorrow.
Now what we discovered last time–this was an idea I think
we kind of knew, we have kind of known it since
the first week–but we discovered last time,
somewhat surprisingly, that life is not quite so
simple. In particular,
what we discovered was we need these to be credible,
so there’s a problem here of credibility.
So in particular, if we think of the value of the
relationship after cooperating tomorrow as being a promise,
and the value of the relationship after cheating as
being a threat, we need these promises and
threats to be credible. We need to actually believe
that they’re going to happen. And one very simple area where
we saw that ran immediately into problems was if this repeated
relationship, although repeated,
had a known end. Why did known ends cause
problems for us? Because in the last period,
in the last period of the game we know that whatever we promise
to do or whatever we threaten to do,
in the last period, once we reached that last
period, in that sub-game we’re going to play a Nash
equilibrium. What we do has to be consistent
with our incentives in the last period.
So in particular, if there’s only one Nash
equilibrium in that last period, then we know in that last
period that’s what we’re going to do.
So if we look at the second to last period we might hope that
we could promise to cooperate, if you cooperate today,
tomorrow. Or you could promise to punish
tomorrow if you cheat today, but those threats won’t be
credible because we know that tomorrow you’re just going to
play whatever that Nash equilibrium is.
That lack of credibility means there’s no scope to provide
incentives today for us to cooperate and we saw things
unravel backwards. So the way in which we ensure
that we’re really focusing on credible promises and credible
threats here is by focusing on sub-game perfect equilibrium,
the idea that we introduced just before the Thanksgiving
break. We know that sub-game perfect
equilibria have the property that they have Nash behavior in
every sub-game, so in particular in the last
period of the game and so on. So what we want to be able to
do here, is try to find scope for cooperation in relationships
without contracts, without side payments,
by focusing on sub-game perfect equilibria of these repeated
games. Right at the end last time,
we said okay, let’s move away from the
setting where we know our game is going to end,
and let’s look at a game which continues, or at least might
continue. So in particular,
we looked at the problem of the Prisoner’s Dilemma which was
repeated with the probability that we called δ
each period, with the probability δ
of continuing. So every period we’re going to
play Prisoner’s Dilemma. However, with probability 1 –
δ the game might just end every period.
We already noticed last time some things about this.
The first thing we noticed was that we can immediately get away
from this unraveling argument because there’s no known end to
the game. We don’t have to worry about
that thread coming loose and unraveling all the way back.
So at least there’s some hope here to be able to establish
credible promises and credible threats later on in the game
that will induce good behavior earlier on in the game.
So that’s where we were last time. And here is the Prisoner’s
Dilemma, we saw this last time, and we actually focused on a
particular strategy. But before I come back to this
strategy that we focused on last time let’s just see some things
that won’t work, just to sort of reinforce the
idea. So here’s a possible strategy
in the Prisoner’s Dilemma. A possible strategy in the
Prisoner’s Dilemma would be cooperate now and go on
cooperating regardless of what anyone does.
So let’s just cooperate forever regardless of the history of the
game. Now if two players,
if Jake and I are involved in this business relationship,
which has the structure of a Prisoner’s Dilemma and both of
us play this strategy of cooperate now and cooperate
forever no matter what, clearly that will induce
cooperation. That’s the good news.
The problem is that isn’t an equilibrium, that’s not even a
Nash equilibrium, let alone a sub-game perfect
equilibrium. Why is it not a sub-game
perfect equilibrium? Because in particular,
if Jake is smart (and he is), Jake will look at this
equilibrium and say: Ben is going to cooperate no
matter what I do, so I may as well cheat,
and in fact, I may as well go on cheating.
So Jake has a very good deviation there which is simply
to cheat forever. So the strategy cooperate now
and go on cooperating no matter what doesn’t contain incentives
to support itself as an equilibrium.
And we need to focus on strategies that contain subtle
behavior that generates promises of rewards and threats of
punishment that induce people to actually stick to that
equilibrium behavior. So is everyone clear that
cooperating no matter what–it sounds good–but it isn’t going
to work. People aren’t going to stick
with that. So instead what we focused on
last time, and actually we had some players who seemed to
actually–they’ve moved now–but they seemed actually to be
playing this strategy. We focused on what we called
the grim trigger strategy. And the grim trigger strategy
is what? It says in the first period
cooperate and then go on playing cooperate as long as nobody has
ever defected, nobody has ever cheated.
But if anybody ever plays D, anybody ever plays the defect
strategy, then we just play D forever.
So this is a strategy, it tells us what to do at every
possible information set. It also, if two players are
playing the strategy, has the property that they will
cooperate forever: that’s good news.
And what we left ourselves last time was checking that this
actually is an equilibrium, or more generally,
under what conditions is this actually an equilibrium.
So we got halfway through that calculation last time.
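To make the grim trigger concrete, here is a minimal sketch of it as a rule that maps the history of play to a move. The names `grim_trigger` and `play` are hypothetical, chosen only for illustration, and the history is assumed to be a list of (my move, their move) pairs.

```python
# Moves: "C" = cooperate, "D" = defect.
# Hypothetical helper names; this is an illustrative sketch, not part of the lecture.

def grim_trigger(history):
    """Return this period's move given the list of past (my_move, their_move) pairs."""
    for mine, theirs in history:
        if mine == "D" or theirs == "D":
            # Anybody has ever played D: play D forever.
            return "D"
    # First period, or nobody has ever defected: cooperate.
    return "C"


def play(strategy_1, strategy_2, periods):
    """Play two history-based strategies against each other for a fixed number of periods."""
    history_1, history_2 = [], []
    path = []
    for _ in range(periods):
        m1 = strategy_1(history_1)
        m2 = strategy_2(history_2)
        path.append((m1, m2))
        history_1.append((m1, m2))  # player 1 sees (own move, other's move)
        history_2.append((m2, m1))  # player 2 sees the mirror image
    return path
```

If both players use `grim_trigger`, the path produced by `play` is (C, C) in every period, which is the cooperation the strategy is meant to induce.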
So what we need to do is we need to make sure that the
temptation of cheating today is less than the value of the
promise minus the value of the threat tomorrow.
We did parts of this already, let’s just do the easy parts.
So the temptation today is: if I cheat today I get 3,
whereas if I went on cooperating today I get 2.
So the temptation is just 1. What’s the threat?
The threat is playing D forever, so this is actually the
value of (D, D) forever. You’ve got to be careful about
for ever: when I say for ever, I mean until the game ends
because eventually the game is going to end,
but let’s use the code for ever to mean until the game ends.
What’s the promise? The promise is the value of
continuing cooperation, so the value of (C,C) for ever.
That’s what this bracket is, and it’s still tomorrow. So let’s go on working on this.
So the value of cooperating for ever is actually–let’s be a bit
more detailed–this is the value of getting 2 in every period,
so it’s value of 2 for ever; and this is the value of 0
forever. So the value of 0 forever,
that’s pretty easy to work out: I get 0 tomorrow,
I get 0 the day after tomorrow, I get 0 the day after the day
after tomorrow. Or more accurately:
I get 0 tomorrow, I get 0 the day after tomorrow
if we’re still playing, I get 0 the day after the day
after tomorrow if we’re still playing and so on.
But that isn’t a very hard calculation, this thing is going
to equal 0. So this object here is just 0.
This object here is 3 – 2, I can do that one in my head,
that’s 1. So I’m left with the value of
getting 2 for ever, and that requires a little bit
more thought. But let’s do that one bit of
algebra because it’s going to be useful throughout today.
So this thing here, the value of 2 for ever is
what? Well I get 2,
that’s tomorrow, and then, assuming I’m still
playing the day after tomorrow–so I need to discount
it–with probability of δ I’m still playing the day after
tomorrow–and I get 2 again. And the day after the day after
tomorrow I’m still playing with the probability that the game
didn’t end tomorrow and didn’t end the next day so that’s with
probability δ² and again I get 2.
And then the day after, what is it?
This is tomorrow, the day after tomorrow,
the day after the day after tomorrow: this is the day after
the day after the day after tomorrow which is δ³
2 and so on. Everyone happy with that?
So starting from tomorrow, if we play (C,
C) for ever, I’ll get 2 tomorrow,
2 the day after tomorrow, 2 the day after the day after
tomorrow, and so on. And I just need to take
account of the fact that the game may end between tomorrow
and the next day, the game may end between the
day after tomorrow and the day after the day after tomorrow and
so on. Everyone happy with that?
So what is the value, what is this thing?
Let’s call this X for a second. So we’ve done this once before
in the class but let’s do it again anyway.
This is the geometric sum, some of you may even remember
from high school how to do a geometric sum,
but let’s do it slowly. So to work out what X is what
I’m going to do is I’m going to multiply X by δ,
so what’s δX? So this 2 here will become a
2δ, and this δ2 here will become a δ²2,
and this δ²2 will become a δ³2,
and this δ³2 will become a δ⁴2,
and so on. Now what I’m going to do is I’m
going to subtract the second of those lines from the first of
those lines. So what I’m going to do is,
I’m going to compute X – δX.
So I’m going to subtract the second line from the first line.
And when I do that I’m going to notice I hope that this 2δ
is going to cancel with this 2δ,
and this δ²2 is going to cancel with this
δ²2, and this δ³2 is going
to cancel with this δ³2 and so on.
So what I’m going to get left with is what?
Everything’s going to cancel except for what?
Except for that first 2 there, so X – δX is just equal to 2.
Now this is a calculation I can do: (1-δ)X = 2.
So I’ve got X = 2/[1-δ]. So just to summarize the
algebra, getting 2 forever, that means 2 + δ2 +
δ²2 + δ³2 etc.
The value of that object is 2/[1-δ].
So we can put that in here as well.
This object here 2/[1-δ] is the value of 2 forever.
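As a quick sanity check on that algebra, the sketch below adds up the discounted payoffs term by term and compares the result with the closed form. The function names and the cutoff of 10,000 periods are assumptions made for illustration; a long truncated sum is only an approximation of the infinite one.

```python
# Hypothetical helper names; a numerical check of the geometric sum, not a proof.

def value_forever(payoff, delta, periods=10_000):
    """Add payoff + delta*payoff + delta^2*payoff + ... for a large number of periods."""
    total = 0.0
    weight = 1.0  # probability the game is still going in the current period
    for _ in range(periods):
        total += weight * payoff
        weight *= delta
    return total


def closed_form(payoff, delta):
    """The geometric-sum formula: payoff / (1 - delta), valid for 0 <= delta < 1."""
    return payoff / (1 - delta)
```

For example, with δ = 0.5 both `value_forever(2, 0.5)` and `closed_form(2, 0.5)` come out at 4, up to floating-point error.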
Now before I go onto a new board I want to do one other
thing. On the left hand side I’ve got
my temptation, that was 1, I’ve got the value
of cooperating forever starting from tomorrow which is
2/[1-δ] and I’ve got the value of
defecting forever starting from tomorrow which is 0.
However, all of these objects on the right hand side,
they start tomorrow, whereas, the temptation today
is today. Temptation today happens today.
These differences in value start tomorrow.
Since they start tomorrow I need to discount them because we
don’t know that tomorrow is going to happen.
The world may end, or more importantly the
relationship may end, between today and tomorrow.
So how much do I have to weight them by?
By δ, I need to multiply all of these lines by δ
and so on. Now this is now a mess so let’s
go to a new board. Now let’s summarize what we now
have. What we’re doing here is asking: is it the case that if
people play the grim trigger strategy that that is in fact an
equilibrium? That is a way of sustaining
cooperation. The answer is we need 1,
that’s our temptation, to be less than 2/[1-δ],
that’s the value of cooperating for ever starting from tomorrow,
minus 0, that’s the value of defecting forever starting
tomorrow, and this whole thing is
multiplied by δ because tomorrow may not
happen. Everyone happy with that so far?
I’m just kind of collecting up the terms that we did slowly
just now. So now what I want to do
is–question mark here because we don’t know whether it is–I’m
going to solve this for δ. So when I solve this for δ
I’ll probably get it wrong, but let’s be careful.
So this is equivalent to saying 1-δ < 2δ, which is the same as 1 < 3δ, or δ > 1/3.
Everyone happy with that? Let me just turn my own page.
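The condition we just collected up can also be checked numerically. This is a small sketch under the payoffs used in this lecture (temptation 1, cooperation worth 2 per period, mutual defection worth 0 per period). The function name and parameter names are hypothetical.

```python
# Hypothetical function; checks temptation < delta * (promise - threat).

def grim_trigger_holds(delta, temptation=1.0, reward=2.0, punishment=0.0):
    """True if cheating today is deterred under the grim trigger.

    temptation: one-period gain from cheating today (3 - 2 in the lecture).
    reward:     per-period payoff from (C, C) forever.
    punishment: per-period payoff from (D, D) forever.
    """
    promise = reward / (1 - delta)     # value of (C, C) forever, starting tomorrow
    threat = punishment / (1 - delta)  # value of (D, D) forever, starting tomorrow
    return temptation < delta * (promise - threat)
```

With the default payoffs, the check passes for δ = 0.5 and fails for δ = 0.2, in line with the 1/3 cutoff.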
So what have we shown so far? We’ve shown that if we’re
playing the grim trigger strategy, and we want to deter
people from doing what? From defecting from this
strategy in the very first period, then we’re okay provided
δ is bigger than 1/3. But at this point some of you
could say, yeah but that’s just one of the possible ways I could
defect from this strategy. After all, the defection we
just considered, the move away from equilibrium
we just considered was what? We considered my cheating
today, but thereafter, I reverted to doing
what I was supposed to do: I went along with playing D
thereafter. So the particular defection we
looked at just now was in Period 1, I’m going to defect,
but thereafter, I’m actually going to do what
the equilibrium strategy tells me to do.
I’m going to go along with the punishment and play my part of
(D,D) forever. So you might want to ask,
why would I do that? Why would I go along?
I cheated the first time but now I’m doing what the strategy
tells me to do. It tells me to play D.
Why am I going along with that? You could consider going away
from the equilibrium by defecting, for example in Period
1, and then in Period 2 do
something completely different like cooperating.
So we might want to worry, how about playing D now and
then C in the next period, and then D forever.
That’s just some other way of defecting.
So far we’ve said I’m going to defect by playing D and then
playing D forever, but now I’m saying let’s play D
now and then play a period of C and then D forever.
Is that going to be a profitable deviation?
Well let’s see what I’d get if I do that particular deviation.
What play is that going to induce?
Remember the other player is playing equilibrium,
so that is going to induce, in the first period,
I’m playing D and Jake’s playing C.
In the second period Jake’s going to start punishing me,
so he’s going to play D and according to this deviation I’m
going to play C. So in the second period I’ll
play C and Jake will play D, and in the third period and
thereafter, we’ll just play D, D, D, D, D, D.
So this is just another deviation, different from the one we
looked at. So what payoff do I get from
this? Okay, I get 3 in the first
period, just as I did for my original defection,
that’s good news. But now in the second period
discounted, I actually get -1, I’m actually doing even worse
in the second period because I’m cooperating while Jake’s
defecting, and then in the third period I
get 0 and in the fourth period I get 0 and so on.
So the total payoff to this defection is 3 – δ.
Now, that’s even worse than the defection we considered to start
with. The defection we considered to
start with, I got 3 in the first period and thereafter I got 0.
Now I got 3 in the first period, -1 in the second period,
and then 0 thereafter. So this defection in which I
defect–this move away from equilibrium–in which I cheat in
the first period and then don’t go along with the punishment,
I don’t in fact play D forever is even worse.
Is that right? It’s even worse. So what’s the lesson here?
The lesson here is the reason that I’m prepared to go along
with my own punishment and play D forever after a defection is
what? It’s if Jake is going to play D
forever I may as well play D forever.
Is that right? So another way of saying this
is the only way which I could possibly hope to have a
profitable deviation, given that Jake’s going to
revert to playing D forever is for me to defect on Jake once
and then go along with playing D forever.
There’s no point once he’s playing D, there’s no point me
doing anything else, so this is worse,
this is even worse. This defection is even worse.
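To see the comparison in numbers, here is a rough sketch that adds up my discounted payoffs along each of the paths we just discussed. The payoff table uses the numbers from this lecture's Prisoner's Dilemma; the function name `discounted_payoff` and the way a path is split into a finite start plus a repeated tail are assumptions made for illustration.

```python
# My payoff for each (my move, Jake's move) profile, as used in the lecture.
PAYOFF = {("C", "C"): 2, ("D", "C"): 3, ("C", "D"): -1, ("D", "D"): 0}


def discounted_payoff(path, delta, tail):
    """Value today of a finite list of profiles followed by `tail` repeated forever.

    path:  list of (my_move, their_move) profiles, starting today.
    delta: probability the game continues each period.
    tail:  the profile played in every period after the path ends.
    """
    total = 0.0
    weight = 1.0
    for profile in path:
        total += weight * PAYOFF[profile]
        weight *= delta
    # The tail is a constant payoff forever, starting at the current weight.
    total += weight * PAYOFF[tail] / (1 - delta)
    return total


delta = 0.5
cooperate = discounted_payoff([], delta, ("C", "C"))                        # 2/(1 - delta)
cheat_then_d = discounted_payoff([("D", "C")], delta, ("D", "D"))           # 3
cheat_c_then_d = discounted_payoff([("D", "C"), ("C", "D")], delta, ("D", "D"))  # 3 - delta
```

At δ = 0.5 the three values are 4, 3, and 2.5, so refusing to go along with the punishment does worse than cheating once and then playing D forever, and both do worse than cooperating.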
More generally, the reason this is even worse
is because the punishment we looked at before,
which was (D, D) for ever,
the punishment (D,D) forever is itself an equilibrium.
It’s credible because it’s itself an equilibrium. So unlike in the finitely
repeated games we did last time, unlike in the two period or the
five period repeated games, here the punishment really is a
credible punishment, because what I’m doing in the
punishment phase is playing an equilibrium.
There’s no point considering any other deviation other than
playing D once and then just going on playing D.
So that’s one other possible deviation, but there are others
you might want to consider. So far all we’ve considered is
what? We’ve considered the deviation
where I, in the very first period, I cheat on Jake and then
I just play D forever. But what about the second
period? Another thing I could do is how
about cheating not in the first period of the game but in the
second. So according to this strategy
what am I going to do? In the first period of the game
I’ll go along with Jake and cooperate, but in the second
period I’ll cheat on him. Now how am I going to check
whether that’s a good deviation or not?
How do I know that’s not going to be a good deviation?
Well we already know that I’m not going to want to cheat in
the first period of the game. I want to argue that exactly
the same analysis tells me I’m not going to want to cheat in
the second period of the game. Why?
Because once we reach the second period of the game,
it is the first period of the game.
Once we reach the second period of the game, looking from period
two onwards, it’s exactly the same as it was
when we looked from period one initially.
So to say it again, what we argued before was–on
the board that I’ve now covered up–what we argued before was,
I’m not going to want to cheat in the very first period of the
game provided δ>1/3.
I want to claim that that same argument tells me I’m not going
to want to cheat in the second period of the game provided
δ>1/3. I’m not going to want to cheat
in the fifth period of the game provided δ
>1/3. Because this game from the
fifth period on, or the five hundredth period
on, or the thousandth period on
looks exactly the same as it does from the beginning.
So what’s neat about this argument is the same analysis
says, this is not profitable if δ>1/3. So what have we learned here?
I want to show you some nerdy lessons and then some actual
sort of real world lessons. Let’s start with the nerdy
lessons. The nerdy lesson is this grim
strategy works because both–let’s put it up again so
we can actually see it–this grim strategy,
it works because both the play that it suggests if we both
cooperate and the play that it suggests if we both defect are
themselves equilibria. These are credible threats and
credible promises because what you end up doing both in the
promise and in the threat is itself equilibrium behavior.
That’s good. The second thing we’ve learned,
however, is for this to work we need δ>
1/3, we need the probability of continuation to be bigger than
1/3. So leaving aside the nerdy
stuff for a second–you have more practice on the nerdy stuff
on the homework assignment–the lesson is we can get cooperation
in the Prisoner’s Dilemma using the grim trigger.
Remember the grim trigger strategy is cooperate until
someone defects and then defect forever.
So you get cooperation in the Prisoner’s Dilemma using the
grim trigger as a sub-game perfect equilibrium.
So this is an equilibrium strategy, that’s good news,
provided the probability of continuation is bigger than 1/3. Let’s try and generalize that
lesson away from the Prisoner’s Dilemma.
So last time our lesson was about what in general could we
hope for in ongoing relationships?
So let’s put down a more general lesson that refines what
we learned last time. So the more general lesson is,
in an ongoing relationship–let me mimic exactly the words I
used last time–so for an ongoing relationship to provide
incentives for good behavior today,
it helps–what we wrote last time was–it helps for that
relationship to have a future. But now we can refine this,
it helps for there to be a high probability that the
relationship will continue. So the specific lesson for
Prisoner’s Dilemma and the grim trigger strategy is we need
δ, the probability of continuation,
to be bigger than 1/3. But the more general intuition
is, if we want the ongoing business relationship between me
and Jake to generate good behavior–so I’m going to
provide him with good fruit and he’s going to provide me with
good vegetables–we need the probability that that
relationship will continue to be reasonably high.
I claim this is a very natural intuition.
Why? Because the probability that
the relationship will continue is the weight that you put on
the future. The probability that the
relationship will continue, this thing, this is the weight
you put on the future. The more weight I put on the
future, the easier it is for the future to give me incentives to
behave well today, the easier it is for those to
overcome the temptations to cheat today.
That seems like a much more general lesson than just the
Prisoner’s Dilemma example. Let’s try to push this to some
examples and see if it rings true.
So the lesson we’ve got here is to get cooperation in these
relationships we need there to be a high probability,
a reasonably high probability that they’re going to continue.
We know exactly what that is for Prisoner’s Dilemma but the
lesson seems more general. So here’s two examples.
How many of you are seniors? One or two, quite a few are
seniors. Keep your hands up a second.
All of those of you who are seniors–we can pan these guys.
Let’s have a look at them. Actually, why don’t we get all
the seniors to stand up: make you work a bit here.
Now the tricky question, the tricky personal question.
How many of you who are seniors are currently involved in
personal relationships, you know: have a significant
other? Stay standing up if you have a
significant other. Look at this, it’s pathetic.
What have I been saying about economics majors?
All right, so let’s just think about, stay standing a second,
let’s get these guys to think about it a second.
So seniors who are involved in ongoing relationships with
significant others, what do we have to worry about
those seniors? Well these seniors are about to
depart from the beautiful confines of New Haven and
they’re going to take jobs in different parts of the world.
And the problem is some of them are going to take jobs in New
York while their significant other takes a job in San
Francisco or Baghdad or whatever,
let’s hope not Baghdad, London shall we say.
Now if it’s the case that you are going to take a job in New
York next year and your significant other is going to
take a job in Baghdad or London, or anyway far away,
in reality, being cynical a little bit, what does that do to
the probability that your relationship is going to last?
It makes it go down. It lowers the probability that
your relationship’s going to continue.
So what is the prediction–let’s be mean here.
These are the people with significant others who are
seniors, how many of you are going to be separated by a long
distance from your significant others next period?
Well one of them at the back, okay one guy,
at the back, two guys, honesty here,
three, four of you right? So what’s our prediction here?
What does this model predict as a social science experiment?
What does it predict? It predicts that for those of
you who just raised your hands, those seniors who just raised
their hands who are about to be separated by large distances,
those relationships, each player in that
relationship is going to have a lower value on the future.
So during the rest of your senior year, during the spring
of your senior year what’s the prediction of this model?
They’re going to cheat. So we could actually do a
controlled experiment, what we should do here is we
should keep track of the people here,
the seniors who are going to be separated–you can sit down now,
I’m sorry to embarrass you all. We could keep track of those
seniors who are about to be separated and go into
long-distance relationships, and those that are not.
The people who are not are our control group.
And we should see if during the spring semester the people who
are going to be separated cheat more often than the others.
So it’s a very clear prediction of the model that’s relevant to
some of your lives. Let me give you another example
that’s less exciting perhaps, but same sort of thing.
Consider the relationship that I have with my garage mechanic.
I should stress this is not a significant other relationship.
So I have a garage mechanic in New Haven, and that garage
mechanic fixes my car. And we have an ongoing business
relationship. He knows that whenever my car
needs fixing, even if it’s just a small thing
like an oil change, I’m going to go to him and have
him fix it, even though it might be cheaper for me to go to Jiffy
Lube or something. So I’m going to take my car to
him to be fixed, and he’s going to make some
money off me on even the easy things.
What do I want in return for that?
I want him to be honest and if all I need is an oil change I
want him to tell me that, and if what I actually need is
a new engine, he tells me I need a new engine.
So my cooperating with him, is always going to him,
even if it’s something simple; and his cooperating with me,
is his not cheating on fixing the car.
He knows more about the car than I do.
But now what happens if he knows either that I’m about to
leave town (which is the example we just did),
or, more realistically, he kind of knows that my car is
a lemon and I’m about to get rid of it anyway.
Once I get a new car I’m not going to go to him anymore
because I have to go to the dealer to keep the warranty
intact. So he knows that my car is
about to break down anyway, and he knows that I know that
the car is about to break anyway,
so my lemon of a car is about to be passed on–probably to one
of my graduate students–then what’s going to happen?
So I’m going to have an incentive to cheat because I’m
going to start taking my useless car to Jiffy Lube for the oil
changes. And he’s going to have an
incentive to cheat. He’s going to start telling me
you know you really need a new engine or a new clutch–it’s a
manual so I have a clutch: it’s a real car–so I’m going
to need a new clutch rather than just tightening up a bolt.
So once again the probability of the continuation of the
relationship, as it changes,
it leads to incentives to cheat.
It leads to that relationship breaking down.
That’s the content, that’s the real world content
of the math we just did. Let’s try and push this a
little further. Now what we’ve shown is that
the grim trigger works provided δ>1/3,
and δ being bigger than 1/3 doesn’t seem like a very
large continuation probability. So just having a probability a little above
1/3 that the relationship continues allows the grim
trigger to work, so that seems good news for the
grim trigger. However, in reality,
in the real world, the grim trigger might have
some disadvantages. So let’s just think about what
the grim trigger is telling us in the real world.
It’s telling us that if even one of us cheats just a little
bit–I just provide one item of rotten fruit to Jake or he gives
me one too few branches of asparagus in his provisions to
me–then we never do business with each other again ever.
It’s completely the end. We just never cooperate again.
That seems a little bit drastic. It’s a little bit draconian if
you like. So in particular,
in the real world, there’s a complication here,
in the real world every now and then one of us is going to “cheat”
by accident. That day that I didn’t have my
glasses on and I put a rotten apple in the apples I supplied
to Jake. Or
he was counting out the asparagus and he lost count at
1,405 and he gave me one too few.
So we might want to worry about the fact that the grim trigger,
it’s triggered by any amount of cheating and it’s very drastic:
it says we never do business again.
The grim trigger is the analog of the death penalty.
It’s the business analog of the death penalty.
It’s not that I’m going to kill Jake if he gives me one too few
branches of asparagus, but I’m going to kill the
relationship. For you seniors or otherwise,
who are involved in personal relationships,
it’s the equivalent of saying, if you even see your partner
looking at someone else, let alone sitting next to them
in the class, the relationship is over.
It seems drastic. So we might be interested
because mistakes happen, because misperceptions happen,
we might be interested in using punishments that are less
draconian than the grim trigger, less draconian than the death
penalty. Is that right?
So what I want to do is I want to consider a different
strategy, a strategy other than the grim trigger strategy,
and see if that could work. So where shall I start?
Let’s start here, so again what I’m going to
revert to is the math and the nerdiness of our analysis of the
Prisoner’s Dilemma but I want you to have in mind business
relationships, your own personal
relationships, your friendships and so on.
More or less everything you do in life involves repeated
interaction, so have that in the back of your mind,
but let’s be nerdy now. So what I want to consider is a
one period punishment. So how are we going to write
down a strategy that has cooperation but a one period
punishment. So here’s the strategy.
It says–it’s kind of a weird thing but it works–play C to
start and then play C if–this is going to seem weird but trust
me for a second–play C if either (C, C) or (D,D) were
played last. So, if in the previous period
either both people cooperated or both people defected,
then we’ll play cooperation this period.
And play D otherwise: play D if either (C,
D) or (D, C) were played last. Let’s just think about this
strategy for a second. What does that strategy mean?
So provided people start off cooperating and they go on
cooperating–if both Jake and I play this strategy–in fact,
we’ll cooperate forever. Is that right?
So I claim this is a one period punishment strategy.
Let’s just see how that works. So suppose Jake and I are
playing this strategy. We’re supposed to play C every
period. And suppose deliberately or
otherwise, I play D. So now in that period in which
I play D, the strategies played were D by me and C by Jake.
So next period what does this strategy tell us both to play?
So it was D by me and C by Jake, so this strategy tells us
to play D. So next period both of us will
play D. So both of us will be
uncooperative precisely for that period, that next period.
Now, what about the period after that?
The period after that, Jake will have played D,
I will have played D. So this is what will have
happened: we both played D, and now it tells us to
cooperate again. Everyone happy with that?
So this strategy I’ve written down–it seems kind of
cumbersome–but what it actually induces is exactly a one period
punishment. If Jake is the only cheat then
we both defect for one period and go back to cooperation.
If I’m the only person who cheats then we both defect for
one period and go back to cooperation.
It’s a one period punishment strategy.
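The path of play this strategy induces can be traced with a short simulation. This is only a sketch: the function names and the `deviations` argument are illustrative assumptions, not anything from the board.

```python
# Sketch of the one period punishment strategy described above.
# "C" = cooperate, "D" = defect. All names here are illustrative.

def next_move(last_profile):
    """Play C at the start, or if both players chose the same action
    last period, i.e. (C, C) or (D, D); otherwise play D."""
    if last_profile is None:          # first period: start by cooperating
        return "C"
    mine, theirs = last_profile
    return "C" if mine == theirs else "D"

def simulate(periods, deviations=None):
    """Both players follow the strategy. `deviations` maps a period index
    to a forced (player 1, player 2) action profile, which models a
    deliberate cheat or a mistake in that period."""
    deviations = deviations or {}
    history = []
    last = None
    for t in range(periods):
        # The rule only asks whether last period's actions matched,
        # so both players are told to play the same move.
        move = next_move(last)
        profile = deviations.get(t, (move, move))
        history.append(profile)
        last = profile
    return history

# Player 1 cheats in period 2 (counting from 0).
path = simulate(6, deviations={2: ("D", "C")})
print(path)
```

In this run the cheat at period 2 is followed by exactly one period of (D, D), after which both players return to (C, C).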
Of course the question is, the question you should be
asking is, is this going to work?
Is this an equilibrium? So let’s just check.
Is this an SPE? Is it an equilibrium?
So what do we need to check? We need to check,
as usual, that the temptation is less than or equal to the
value of the promise–the value of the promise of continuing in
cooperation–the value of the promise minus the value of the
threat. And once again we have to be
careful, because the temptation occurs today and this difference
between values occurs tomorrow. Is that right?
So this is nothing new, this is what we’ve always
written down, this is what we have to check.
So the temptation for me to cheat today, that’s the same as
it was before, it’s 3 – 2.
The fact that it’s tomorrow is going to give me a δ
here. Here’s our square bracket.
So what’s the value of the promise?
So provided we both go on cooperating, we’re going to go
on cooperating forever, in which case we’re going to
get 2 for ever. Is that right?
So this is going to be the value of 2 forever starting
tomorrow (and again for ever means until the game ends).
The value of the threat is what? Be a bit careful now.
It’s the value of–so what’s going to happen?
If I cheat then tomorrow we’re both going to cheat,
so tomorrow, what am I going to get
tomorrow? 0.
So it’s the value of 0 tomorrow: we’re both going to
cheat, we’re both going to play D.
And then the next period what’s going to happen?
We’re going to play C again, and from thereon we’re going to
go on playing C. So it’s going to be the value of 0
tomorrow and then 2 forever starting the next day.
That’s what we have to evaluate. So 3 – 2, I can do that one
again, that’s 1. So what’s the value of 2
forever, well we did that already today,
what was it? It’s in your notes.
Actually it’s on the board, it’s the X up there,
what is it? Here it is, 2 for ever:
we figured out the value of it before and it was
2/[1–δ]. So the value of 2 forever is
going to be 2/[1–δ]. How about the value of 0?
So starting from tomorrow I’m going to
period delay I’m going to get 2 for ever.
Well 2 forever, we know what the value of that
is, it’s 2/[1–δ], but now I get it with one
period delay, so what do I have to multiply
it by? By δ good.
So the value of 0 tomorrow and then 2 forever starting the next
day is δ × 2/[1–δ].
And here’s the δ coming from here which just
takes into account that all this analysis is starting tomorrow.
So to summarize, this is my temptation today.
This is what I’ll get starting tomorrow if I’m a good boy and
cooperate. And this is the value of what
I’ll get if I cheat today. Starting tomorrow I’ll get
nothing, and then I’ll revert back to cooperation.
And since all of these values in this square bracket start
tomorrow I’ve discounted them by δ.
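Written out in one line, the condition just described looks as follows. The labels under the braces are added here for readability; the payoffs 3, 2 and 0 are the ones used throughout this example.

```latex
\[
\underbrace{3 - 2}_{\text{temptation today}}
\;\le\;
\delta \left[
  \underbrace{\frac{2}{1-\delta}}_{\text{2 for ever}}
  \;-\;
  \underbrace{\delta \cdot \frac{2}{1-\delta}}_{\text{0 tomorrow, then 2 for ever}}
\right]
\]
```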
Now this requires some math so bear with me while I probably
get some algebra wrong–and please can I get the T.A.’s to
stare at me a second because I’ll probably get this wrong.
Okay so what I’m going to do is, I’m going to look at my
notes, I’m going to cheat, that’s what I’m going to do.
Okay, so what I’m going to do is I’m going to have 1 is less
than or equal to, I’m going to take a common
factor of 2 / [1–δ] and δ, so I’m going to
have 2δ/[1–δ], and that’s going to leave
inside the square brackets 1 – δ: this is a 1 and this is a
δ. So this δ
here was that δ there, and then I took out a
common factor of 2/[1–δ]
from this bracket. Everyone okay with the algebra?
Just algebra, nothing fancy going on there.
So that’s good because now the 1–δ cancels,
this cancels with this, so this tells us we’re okay
provided 1 ≤ 2δ, that is, provided δ ≥ 1/2. What did δ
need to be for the grim strategy?
1/3, so what have we learned here?
We learned–nerdily–what we learned was that for the grim
strategy we needed δ>1/3.
For the one period punishment we needed δ
>1/2, but what’s the more general lesson?
The more general lesson is, if you use a softer punishment,
a less draconian punishment, for that to work we’re going to
need a higher δ. Is that right?
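The two thresholds can be checked numerically. The sketch below uses the payoffs from this example (3 for cheating on a cooperator, 2 for mutual cooperation, 0 for mutual defection). The option of a punishment lasting k > 1 periods is an assumed extension that was not worked out on the board, and the function names are illustrative.

```python
# Payoffs from the lecture's Prisoner's Dilemma.
TEMPTATION = 3 - 2   # gain today from cheating instead of cooperating

def value_forever(x, delta):
    """Present value of receiving x every period, with weight delta on the future."""
    return x / (1 - delta)

def net_future_gain(delta, punish_periods=None):
    """delta * (value of the promise - value of the threat).
    punish_periods=None : grim trigger, the threat is 0 for ever.
    punish_periods=k    : 0 for k periods, then 2 for ever
                          (k = 1 is the lecture's case; k > 1 is an assumption)."""
    promise = value_forever(2, delta)
    if punish_periods is None:
        threat = 0.0
    else:
        threat = delta ** punish_periods * value_forever(2, delta)
    return delta * (promise - threat)

def min_delta(punish_periods=None, tol=1e-9):
    """Bisect for the smallest delta at which temptation <= net future gain."""
    lo, hi = 0.0, 0.999999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_future_gain(mid, punish_periods) >= TEMPTATION:
            hi = mid
        else:
            lo = mid
    return hi

print("grim trigger:", round(min_delta(None), 4))
print("one period punishment:", round(min_delta(1), 4))
print("two period punishment (assumed extension):", round(min_delta(2), 4))
```

The grim trigger comes out at about 1/3 and the one period punishment at 1/2, matching the algebra. The assumed two period case lands in between, which is the same trade off: the milder the punishment, the more weight on the future is needed.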
So what we’re learning here is there’s a trade off,
there’s a trade off in incentives.
And the trade off is if you use a shorter punishment,
a less draconian punishment–instead of cutting
people’s hands off or killing them,
or never dealing with them again, you just don’t deal with
them for one period–that’s okay provided there’s a slightly
higher probability of the relationship continuing.
So shorter punishments are okay but they need–the implication
sign isn’t really necessary there–they need more weight
δ on the future. I claim that’s very intuitive.
What it’s saying is, we’re always trading things off
in the incentives. We’re trading off the ability
to cheat and get some cookies today versus waiting and,
we hope, getting cookies tomorrow.
So if, in fact, the difference between the
reward and the punishment isn’t such a big deal,
isn’t so big–the punishment is just, I’m going to give you one
fewer cookies tomorrow–then you better be pretty patient not to
go for the cookies today. I was about to say,
those of you who have children. I’m probably the only person in
the room with children. That cookie example will
resonate for the rest of you–wait until you get
there–you’ll discover that, in fact, cookies are the right
example. So shorter punishment,
less draconian punishments, less reduction in your kid’s
cookie rations tomorrow is only going to work,
is only going to sustain good behavior provided those kids put
a high weight on tomorrow. In that case,
it isn’t that the kids will worry about the relationship
breaking down, you’re stuck with your kids,
it’s just that they’re impatient.
Okay, so we’ve been doing a lot of formal stuff here and I want
to go on doing formal stuff, but what I want to do now is
spend the rest of today looking at an application.
An application is, I hope going to convince you
that repeated interaction really matters.
So this is assuming that the one about the seniors and their
boyfriends and girlfriends wasn’t enough.
Okay, so the application is going to take us back a little
bit because what I want to talk about is repeated moral hazard. Moral hazard is something we
discussed the first class after the mid-term.
So what I want you to imagine is that you are running a
business in the U.S. and you are considering making
an investment in an emerging market, and again,
so as not to offend anybody who watches this on the video,
let’s just call that emerging market Freedonia,
rather than give it a name like Kazakhstan, a name like
something other than Freedonia. So Freedonia,
for those of you who don’t know, is a republic in a Marx
Brothers film. So you’re thinking of
outsourcing some production of part of what your business is to
Freedonia. The reason you’re thinking of
doing this outsourcing, what makes it attractive is
that wages are low in Freedonia. So you get this outsourced in
Freedonia. You think you’re going to get
it done cheaply. The down side is because
Freedonia is an emerging market, the court system,
it doesn’t operate very well. And in particular,
it’s going to be pretty hard to enforce contracts and to jail
people and so on in Freedonia. So you’re considering
outsourcing. The plus is,
from your point of view, the plus is wages are cheap
where you’re going to get this production done.
The down side is it’s going to be hard to enforce contracts
because this is an emerging market.
So what you’re considering doing is employing an agent and
you’re going to pay that agent W, so W is the wage if you
employ them. I’ll put this up in a tree in a
second. Let’s assume that the “going
wage” in Freedonia is 1: we’ll just normalize it.
So the going wage in Freedonia is 1, and let’s assume that to
get this outsourcing to work you’re going to have to send
some resources to your agent, your employee in Freedonia.
And let’s assume that the amount you’re going to have to
send over there is equivalent to another 1.
So the going wage in Freedonia is 1 and the amount you’re going
to have to invest in giving this agent materials or machinery is
another 1. Let’s assume that this project
is a pretty profitable project. So if the project succeeds,
if the project goes ahead and succeeds, it’s going to generate
a gross revenue of 4. Of course you have to invest 1
so that’s a net revenue of 3 for you, but nonetheless there’s a
big potential return here. The bad news is that your agent
in Freedonia can cheat on you. In particular,
what he can do is he can simply take the 1 that you’ve sent to
him, sell those materials on the
market and then go away and just work in his normal job anyway.
So he can get his normal wage of 1 for just going and doing
his normal job, whatever that was,
and he can steal the resources from you.
So let’s put this up as a kind of tree.
This is a slight cheat, this tree, but we’ll see why in
a second. So your decision is to invest
and set W. So if you invest in Freedonia,
you’ll invest and set W, set the wage you’re going to
pay him. The going wage is 1 but you can
set a different wage or you could just not invest.
If you don’t invest you get nothing and your agent in
Freedonia just gets the going wage of 1.
If you do invest in Freedonia and set a wage of W,
then your agent has a choice. Either he can be honest or he
can cheat. If he cheats,
what’s going to happen to you? You had to invest 1 in sending
it over there, you’re going to get nothing
back, so you’ll get -1. And he will go away and work
his normal job and get 1, and, in addition,
he’ll sell your materials so he’ll get a total of 1 + 1 is?
2, thank you. So he’ll get a total of 2.
On the other hand, if he’s honest,
then you’re going to get a return of 4 minus the 1 you had
to invest minus whatever wage you paid to him.
So your return will be 3 minus the wage you pay him.
You’re only going to pay him once the job’s done,
3 – W, and he’s going to get W. He’s done his job–he hasn’t
exercised his outside option, he hasn’t sold your
materials–so he’ll just get W. Now, I’m slightly cheating here
because this isn’t really the way the tree looks because I
could choose different levels of W.
So this upper branch where I invest and set W is actually a
continuum of such branches, one for each possible W,
I could set. But for the purpose of today
this is enough. This gives us what we needed to
see. So let’s imagine that this is a
one shot investment. What I want to learn is in this
one shot investment, I invest in Freedonia.
I hire my agent once, what I want to learn is how
much do I have to pay that agent to actually get the job done?
Remember the starting position. The starting position is it
looks very attractive. It looks very attractive
because the returns on this project are 4 or 4 – 1,
so that the surplus available on this project is 3 minus the
wage, and the going wage was just 1.
So it looks like there’s lots of profit around to make this
outsourcing profitable. I mumbled that so let me try it
again. So the reason this looks
attractive is the going wage is just 1, so if I just pay him 1
and he does the project then I’ll get a gross return of 4
minus the 1 I invested minus the 1 that I had to pay him for a
net return of 2. It seems like that’s a 100%
profitable project, so it looks very attractive.
What’s the problem? The problem is if I only
set–this is going to give us backward induction–if I set the
wage equal to the going wage, so if I set W=1 what will my
agent do? He’s going to cheat.
The problem is if I set W=1, which is the going wage,
the going wage in Freedonia, the agent will cheat.
If he cheats I just lose my investment.
So how much do I have to set the W to?
Let’s look at this. So we have to set W.
What I need is I need his wage to be big enough so that being
honest and going on with my project outweighs his incentive
to cheat. I need W to be bigger than 2.
Is that right? I need W to be at least as big
as 2. So in setting the wage,
in equilibrium, what are we going to do?
I’m going to set a wage, let’s call it W*=2 (plus a
penny), is that right? So this is an exercise which we
visited the first day after the mid-term.
This is about incentive design. In this one shot game,
which we can easily solve by backward induction,
I’m going to need to set a wage equal to 2, and then he’ll work. So in a minute,
we’re going to look at the repeated version of this,
but before we do let’s just sum up where we are so far.
What is this telling us? It’s telling us that when you
invest in an emerging market, where the courts don’t work so
they aren’t going to be able to force this guy to work
well–in particular, he can run off with your
investment–even though wages are low, so it seems very
attractive to do outsourcing, if you worry about getting
incentives right you’re going to have pay an enormous wage
premium to get the guy to work. So the going wage in Freedonia
was 1, but you had to set a wage equal to 2, a 100% wage premium,
to get the guy to work. So the wage premium in this
emerging market is 100%, you’re paying 2 even though the
going wage is 1. By the way, this is not an
unreasonable prediction. If you look at the wages paid
by European and American companies in some of these
emerging markets, which have very,
very low going wages, and if you look at the wages
that are actually being paid by the companies that are doing
outsourcing you see enormous wage premiums.
You see enormous premiums over and above the going wage.
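The backward induction for the one shot game can be sketched in a few lines. The numbers are the ones from the tree (going wage 1, investment 1, gross revenue 4). The wage grid and the tie-breaking rule, where an indifferent agent stays honest in place of "2 plus a penny", are assumptions made to keep the sketch simple.

```python
GOING_WAGE = 1      # agent's normal wage in Freedonia
INVESTMENT = 1      # materials sent over to the agent
GROSS_REVENUE = 4   # revenue if the project is completed

def agent_choice(wage):
    """Last mover: the agent compares the wage W (honest) with the going
    wage plus the resale value of the stolen materials (cheat).
    Assumption: an indifferent agent is honest."""
    cheat_payoff = GOING_WAGE + INVESTMENT
    return "honest" if wage >= cheat_payoff else "cheat"

def investor_payoff(wage):
    """First mover's payoff from investing at this wage, given the agent's reply."""
    if agent_choice(wage) == "honest":
        return GROSS_REVENUE - INVESTMENT - wage
    return -INVESTMENT

def best_wage(candidates):
    """Wage that maximizes the investor's payoff, or None if not
    investing (payoff 0) is at least as good."""
    best = max(candidates, key=investor_payoff)
    return best if investor_payoff(best) > 0 else None

wages = [w / 4 for w in range(0, 17)]   # hypothetical grid: 0, 0.25, ..., 4
w_star = best_wage(wages)
premium = (w_star - GOING_WAGE) / GOING_WAGE
print(w_star, premium)
```

On this grid the chosen wage is 2 and the premium over the going wage is 100%, as in the discussion above.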
Now what I want to do is I want to revisit exactly the same
situation, but now we’re going to introduce the wrinkle of the
day. What’s the wrinkle of the day?
The wrinkle of the day is you’re not only going to invest
in Freedonia today, but if things go well you’ll
invest tomorrow, and if things go well again
you’ll invest the day after at least with some significant
probability. So the wage premium we just
calculated was the one shot wage premium.
It was getting this job–this single one shot job–outsourced
to Freedonia. Now I want to consider how much
you’re going to have to pay, what are wages going to be in
Freedonia in the foreign investment sector,
if instead of just having a one shot, one job investment,
you’re investing for the long term.
You’re going to be in Freedonia for a while.
So consider repeated interaction with probability
δ of continuing. So we don’t know that you’re
going to go on in Freedonia. Things might break down in
Freedonia because there’s a coup.
It might break down in Freedonia because the American
administration says you’re not allowed to do outsourcing
anymore. All sorts of things might
happen, but with some probability δ
the relationship is going to continue.
So repeated interaction with probability of δ.
Let’s redo the exercise we did before to see what wage you’ll
have to pay. Our question is what
wage–let’s call it W**–what wage will you pay? The way we’re going to solve
this, is exactly using the methods we’ve learned in this
class. So what we’re going to compare
is the temptation to cheat today–and we better make sure
that that’s less than δ times the value of continuing
the relationship minus the value of ending the relationship.
Let’s call this tomorrow. So what’s happening now is,
once again, I’m employing my agent in Freedonia,
and provided he does a good job, I’ll employ him again
tomorrow, at least with probability δ.
But if he doesn’t do a good job, if he runs off with my
investment and doesn’t do my job, what am I going to do?
What would you do? You’d fire him.
So the punishment–it’s clear what the punishment’s going to
be here–the punishment is, if he doesn’t do a good job,
you fire him. The value of ending the
relationship. This is firing and this is
continuing. So let’s just work out what
these things are. So his temptation to cheat
today: if he cheats today, he doesn’t get my wage.
But he does run off with my cash, and he does go and do his
job at the going wage. So if he cheats today he gets
2, he stole all my cash, and he’s going off and working
at the going wage, but he doesn’t get what I would
have paid him W** if the job was well done.
We need this to be less than the value of continuing the
relationship. Let’s do the easy bit first.
What does he get if we end the relationship?
He’s been fired, so he’ll just work at the going
wage for ever. So this is the value of 1 for
ever, or at least until the end of the world.
This is the value of what? As long as he stayed employed
by me what’s he going to get paid every period?
What’s he going to get paid? W**.
So the value of W** for ever. Let me cheat a little bit and
assume that the probability of some coup happening that ends
our relationship exogenously is the same probability of the coup
happening and ending his ongoing wage exogenously,
so we can use the same δ. So let’s just do some math
here, what’s the value of W** forever?
So remember the value of 2 forever was what?
2/[1-δ]. So what’s the value of W**
forever? So this is going to be
W**/[1-δ]. What’s the value of 1 forever?
1/[1-δ]. The whole thing is multiplied
by δ and this is 2-W**. Now I need to do some algebra
to solve for W**. So let’s try and do that.
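Collected in one line, the condition assembled above (temptation today on the left, δ times the value of continuing minus the value of being fired on the right) reads:

```latex
\[
2 - W^{**} \;\le\; \delta \left[ \frac{W^{**}}{1-\delta} - \frac{1}{1-\delta} \right]
\]
```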
So I claim that this is the same as [1–δ]2 – [1–δ]W** ≤ δW** – δ.
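The condition can also be checked numerically rather than by algebra. The sketch below searches for the smallest wage at which the agent's temptation, 2 – W**, is no bigger than δ times the value of W** for ever minus the value of 1 for ever. The function names and the bisection bounds are assumptions of this sketch.

```python
def condition_holds(wage, delta, going_wage=1, cheat_payoff=2):
    """Temptation today <= delta * (value of staying - value of being fired)."""
    temptation = cheat_payoff - wage
    value_stay = wage / (1 - delta)         # W** for ever
    value_fired = going_wage / (1 - delta)  # going wage of 1 for ever
    return temptation <= delta * (value_stay - value_fired)

def min_wage(delta, tol=1e-9):
    """Bisect between the going wage (condition fails) and the one shot
    wage of 2 (temptation is zero, so the condition holds)."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if condition_holds(mid, delta):
            hi = mid
        else:
            lo = mid
    return hi

for delta in (0.25, 0.5, 0.9):
    print(delta, round(min_wage(delta), 4))
```

For these values the search lands on 2 – δ, for example about 1.5 when δ = 1/2, so the wage needed in the repeated relationship is lower than the one shot wage of 2.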

