6. Monte Carlo Simulation

The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or to
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu.

JOHN GUTTAG: Welcome
to Lecture 6. As usual, I want to start by
posting some relevant reading. For those who don't
know, this lovely picture is of the Casino at Monte
Carlo, and shortly you'll see why we're talking about
casinos and gambling today.

Not because I want to encourage
you to gamble your life savings away. A little history about
Monte Carlo simulation, which is the topic
of today's lecture. The concept was invented by the
Polish American mathematician, Stanislaw Ulam. He is probably better known for his
work on thermonuclear weapons than on mathematics,
but he did do a lot of very
important mathematics earlier in his life. The story here starts
that he was ill, recovering from some
serious illness, and was home and
was bored and was playing a lot of games
of solitaire, a game I suspect you've all played. Being a mathematician,
he naturally wondered, what's the probability of my
winning this stupid game which I keep losing? And so he actually spent
quite a lot of time trying to work out
the combinatorics, so that he could actually
compute the probability.

And despite being a really
amazing mathematician, he failed. The combinatorics were
just too complicated. So he thought, well suppose
I just play lots of hands and count the number I
win, divide by the number of hands I played. Well then he thought
about it and said, well, I've already played a lot
of hands and I haven't won yet. So it probably
will take me years to play enough hands to
actually get a good estimate, and I don't want to do that. So he said, well, suppose
instead of playing the game, I just simulate the
game on a computer.

He had no idea how
to use a computer, but he had friends
in high places. And actually talked
to John von Neumann, who is often viewed as the
inventor of the stored program computer. And said, John, could you do
this on your fancy new ENIAC machine? And on the lower
right here, you'll see a picture of the ENIAC. It was a very large machine. It filled a room. And von Neumann said,
sure, we could probably do it in only a few
hours of computation. Today we would think
of a few microseconds, but those machines were slow.

Hence was born Monte
Carlo simulation, and then they actually used it
in the design of the hydrogen bomb. So it turned out to be
not just useful for cards. So what is Monte
Carlo simulation? It's a method of
estimating the values of an unknown
quantity using what is called inferential statistics. And we've been using
inferential statistics for the last several lectures. The key concepts, and I want
to be careful about these because we'll be coming back to them, start with the population. So think of the
population as the universe of possible examples. So in the case of
solitaire, it's a universe of all possible
games of solitaire that you could possibly play. I have no idea how big that
is, but it's really big. Then we take that
universe, that population, and we sample it by
drawing a proper subset. Proper means not
the whole thing. Usually more than one
sample to be useful. Certainly more than 0.

And then we make an inference
about the population based upon some set of
statistics we do on the sample. So the population is typically
a very large set of examples, and the sample is a
smaller set of examples. And the key fact
that makes them work is that if we choose
the sample at random, the sample will tend to
exhibit the same properties as the population from
which it is drawn. And that's exactly what we did
with the random walk, right? There were a very large number
of different random walks you could take of
say, 10,000 steps. We didn't look at all possible
random walks of 10,000 steps. We drew a small sample
of, say 100 such walks, computed the mean of
those 100, and said, we think that's probably
a good expectation of what the mean would be of
all the possible walks of 10,000 steps.
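
The sampling principle can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the population here is a hypothetical list of uniformly distributed numbers, and the function name and parameters are assumptions.

```python
import random

def sample_mean_demo(pop_size=100_000, sample_size=100, seed=0):
    """Compare the mean of a random sample with the mean of the population."""
    rng = random.Random(seed)
    # Hypothetical population: pop_size values drawn uniformly from [0, 100).
    population = [rng.uniform(0, 100) for _ in range(pop_size)]
    # rng.sample draws without replacement, so the sample is a proper subset.
    sample = rng.sample(population, sample_size)
    pop_mean = sum(population) / len(population)
    samp_mean = sum(sample) / len(sample)
    return pop_mean, samp_mean

pop_mean, samp_mean = sample_mean_demo()
print(f"population mean = {pop_mean:.2f}, sample mean = {samp_mean:.2f}")
```

With a random sample of only 100 items, the sample mean usually lands within a few units of the population mean. A sample chosen non-randomly, for example the 100 largest values, would not.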

So we were depending
upon this principle. And of course the key fact
here is that the sample has to be random. If you start drawing the
sample and it's not random, then there's no
reason to expect it to have the same properties
as that of the population. And we'll go on
throughout the term, and talk about the various ways
you can get fooled and think you have a random sample
when in fact you don't. All right, let's look at
a very simple example. People like to use flipping
coins because coins are easy. So let's assume
we have some coin. All right, so I bought
two coins slightly larger than the usual coin. And I can flip it. Flip it once, and let's
consider one flip, and let's assume
it came out heads. I have to say the coin I flipped
is not actually a $20 gold piece, in case any of you
were thinking of stealing it.

All right, so we've got one
flip, and it came up heads. And now I can ask
you the question– if I were to flip the same coin
an infinite number of times, how confident would
you be about answering that all infinite
flips would be heads? Or even if I were to
flip it once more, how confident would you be that
the next flip would be heads? And the answer is not very. Well, suppose I
flip the coin twice, and both times it came up heads. And I'll ask you
the same question– do you think that the next
flip is likely to be heads? Well, maybe you would be
more inclined to say yes than after having only seen one
flip, but you wouldn't really jump to say, sure. On the other hand, if I flipped
it 100 times and all 100 flips came up heads, well,
you might be suspicious that my coin only has a head
on both sides, for example. Or is weighted in some funny way
that it mostly comes up heads. And so a lot of people,
maybe even me, if you said, I flipped it 100 times
and it came up heads.

What do you think
the next one will be? My best guess would
be probably heads. How about this one? So here I've
simulated 100 flips, and we have 50 heads here,
two heads here, and 48 tails. And now if I said, do you
think that the probability of the next flip
coming up heads– is it 52 out of 100? Well, if you had to guess, that
should be the guess you make. Based upon the
available evidence, that's the best guess
you should probably make. You have no reason to
believe it's a fair coin. It could well be weighted. We don't see it with coins,
but we see weighted dice all the time. We shouldn't, but they exist. You can buy them
on the internet. So typically our best
guess is what we've seen, but we really shouldn't
have very much confidence in that guess.
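
A minimal sketch of the coin experiment, assuming a simulated coin whose true probability of heads is a parameter (the function name and parameters are illustrative, not from the lecture). It shows how the estimate from a handful of flips can be far from the truth, while the estimate from many flips settles down.

```python
import random

def estimate_heads(num_flips, p_heads=0.5, seed=None):
    """Flip a simulated coin num_flips times; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(num_flips) if rng.random() < p_heads)
    return heads / num_flips

# The best guess for the next flip is the observed fraction, but with
# few flips that guess deserves very little confidence.
for n in (1, 2, 100, 10_000):
    print(n, 'flips ->', estimate_heads(n, seed=n))
```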

Because well, could've
just been an accident. Highly unlikely even
if the coin is fair that you'd get 50-50, right? So why when we see 100 samples
and they all come up heads do we feel better about
guessing heads for the 101st than we did when
we saw two samples? And why don't we feel so good
about guessing 52 out of 100 when we've seen a hundred
flips that came out 52 and 48? And the answer is
something called variance. When I had all heads, there was
no variability in my answer. I got the same
answer all the time. And so there was no variability,
and that intuitively– and in fact, mathematically–
should make us feel confident that, OK, maybe that's
really the way the world is. On the other hand, when almost
half are heads and almost half are tails, there's
a lot of variance. Right, it's hard to predict
what the next one will be.

And so we should have
very little confidence that it isn't an
accident that it happened to be 52-48 in one direction. So as the variance grows,
we need larger samples to have the same
amount of confidence. All right, let's look at
that with a detailed example. We'll look at roulette in
keeping with the theme of Monte Carlo simulation. This is a roulette wheel that
could well be at Monte Carlo.

There's no need to simulate
roulette, by the way. It's a very simple
game, but as we've seen with our earlier
examples, it's nice when we're learning about
simulations to simulate things where we actually can know
what the actual answer is so that we can then understand
our simulation better. For those of you who don't
know how roulette is played– is there anyone here who doesn't
know how roulette is played? Good for you.

You grew up virtuous. All right, so– well all right. Maybe I won't go there. So you have a wheel
that spins around, and in the middle are
a bunch of pockets. Each pocket has a
number and a color. You bet in advance
on what number you think is going to
come up, or what color you think is going to come up. Then somebody drops a ball in
that wheel, gives it a spin.

And through centrifugal
force, the ball stays on the
outside for a while. But as the wheel slows down,
the ball heads towards the middle and eventually settles
in one of those pockets. And you win or you lose. Now you can bet on
it, and so let's look at an example of that. So here is a roulette game. I've called it fair
roulette, because it's set up in such a way that
in principle, if you bet, your expected value should be 0.

You'll win some,
you'll lose some, but it's fair in the
sense that it's not either a negative or positive sum game. So as always, we have an
underbar underbar init, that is, __init__. Well we're setting up the
wheel with 36 pockets on it, so you can bet on the
numbers 1 through 36. That's the way range
works, you'll recall. Initially, we don't
know where the ball is, so we'll say it's None.

And here's the key thing
is, if you make a bet, this tells you
what your odds are. That if you bet on a
pocket and you win, you get len of pockets minus 1. So this is why it's
a fair game, right? You bet $1. If you win, you get $36,
your dollar plus $35 back. If you lose, you lose. All right, self dot
spin will be random dot choice among the pockets. And then there is simply
bet, where you just can choose an amount to bet and
the pocket you want to bet on. I've simplified it. I'm not allowing you
to bet here on colors.
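
The slide with the class is not part of this transcript, so the following is a sketch reconstructed from the description above. The method and attribute names are assumptions. It also assumes that a losing bet returns minus the stake, since that is what makes the expected value of a bet come out to 0.

```python
import random

class FairRoulette:
    """Roulette wheel with pockets 1..36 and no house edge (a sketch)."""

    def __init__(self):
        self.pockets = list(range(1, 37))         # range(1, 37) yields 1..36
        self.ball = None                          # no spin has happened yet
        self.pocket_odds = len(self.pockets) - 1  # a win pays 35 to 1

    def spin(self):
        self.ball = random.choice(self.pockets)

    def bet_pocket(self, pocket, amount):
        # Assumption: a win returns amount * odds, a loss returns -amount.
        if pocket == self.ball:
            return amount * self.pocket_odds
        return -amount

    def __str__(self):
        return 'Fair Roulette'

game = FairRoulette()
game.spin()
print('ball landed on', game.ball, '; betting $1 on 2 returns', game.bet_pocket(2, 1))
```

With 36 pockets, a $1 bet wins $35 with probability 1/36 and loses $1 with probability 35/36, so the expected value is (35 - 35)/36 = 0.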

All right, so then
we can play it. So here is play roulette. I've made game, the
class a parameter, because later we'll look at
other kinds of roulette games. You tell it how many spins. What pocket you want to bet on. For simplicity, I'm going
to bet on this same pocket all the time. Pick your favorite lucky number
and how much you want to bet, and then we'll have a
simulation just like the ones we've already looked at.

So the number you get
right starts at 0. For i in range number of
spins, we'll do a spin. And then totPocket plus
equals game dot betPocket.
either 0 if you've lost, or 35 if you've won. And then we'll just
print the results. So we can do it. In fact, let's run it. So here it is. I guess I'm doing a million
games here, so quite a few. Actually I'm going to do two. What happens when you
spin it 100 times? What happens when you
spin it a million times? And we'll see what we get. So what we see here is
that we do 100 spins. The first time I did it my
expected return was minus 100%.

I lost everything I bet. Not so unlikely,
given that the odds are pretty long that you could
do 100 times without winning. Next time I did a 100, my return
was a positive 44%, and then a positive 28%. So you can see, for 100 spins
it's highly variable what the expected return is. That's one of the
things that makes gambling attractive to people. If you go to a casino, 100 spins
would be a pretty long night at the table. And maybe you'd
won 44%, and you'd feel pretty good about it. What about a million spins? Well people aren't interested in
that, but the casino is, right? They don't really care what
happens with 100 spins. They care what happens
with a million spins. What happens when everybody
comes every night to play. And there what we see is– you'll notice much
less variance.

Happens to be minus
0.04 plus 0.6 plus 0.79. So it's still not 0,
but it's certainly, these are all closer to
0 than any of these are. We know it should
be 0, but it doesn't happen to be in these examples. But not only are they closer
to 0, they're closer together. There is much less variance
in the results, right? So here I show you
these three numbers, and ask what do you
expect to happen? You have no clue, right? So I don't know,
maybe I'll win a lot. Maybe I'll lose everything. I show you these three numbers,
you're going to look at it and say, well you
know, I'm going to be somewhere between
around 0 and maybe 1%. But you're never
going to guess it's going to be radically
different from that. And if I were to change this
number to be even higher, it would go even closer to 0. But we won't bother. OK, so these are
the numbers we just looked at, because I set
the seed to be the same.
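
The simulation just described might look roughly like the sketch below. The code shown in the lecture is not in the transcript, so the names are assumptions, the compact wheel class is repeated only to keep the example self-contained, and a losing bet is assumed to return minus the stake. The larger run uses 100,000 spins rather than a million to keep it quick.

```python
import random

class FairRoulette:
    """Compact fair wheel: pockets 1..36, a win pays 35 to 1 (a sketch)."""
    def __init__(self):
        self.pockets = list(range(1, 37))
        self.ball = None
        self.pocket_odds = len(self.pockets) - 1
    def spin(self):
        self.ball = random.choice(self.pockets)
    def bet_pocket(self, pocket, amount):
        # Assumption: a loss costs the stake, a win pays amount * odds.
        return amount * self.pocket_odds if pocket == self.ball else -amount
    def __str__(self):
        return 'Fair Roulette'

def play_roulette(game, num_spins, pocket, bet, to_print=True):
    """Bet the same amount on the same pocket num_spins times."""
    tot_pocket = 0
    for _ in range(num_spins):
        game.spin()
        tot_pocket += game.bet_pocket(pocket, bet)
    # Average winnings per spin, as a fraction of the amount bet.
    ret = tot_pocket / (num_spins * bet)
    if to_print:
        print(f'{num_spins} spins of {game}: return betting {pocket} = {100 * ret:.2f}%')
    return ret

random.seed(0)  # fixing the seed makes the run repeatable
for num_spins in (100, 100_000):
    for _ in range(3):
        play_roulette(FairRoulette(), num_spins, 2, 1)
```

Running it should show the pattern from the lecture: the three 100-spin returns are scattered widely, while the three long runs sit much closer to 0 and much closer to each other.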

So what's going on
here is something called the law of large numbers,
or sometimes Bernoulli's law. This is a picture of
Bernoulli on the stamp. It's one of the two most
important theorems in all of statistics, and we'll come
to the second most important theorem in the next lecture. Here it says, "in
repeated independent tests with the same actual
probability, the chance that the fraction of
times the outcome differs from p converges to 0
as the number of trials goes to infinity." So this says if I were to
spin this fair roulette wheel an infinite
number of times, the expected– the
return would be 0. The real true probability
from the mathematics. Well, infinite is a
lot, but a million is getting closer to infinite. And what this says is the
closer I get to infinite, the closer it will be
to the true probability.
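
Written as a formula, the law says the following. The notation is an assumption added here, not taken from the slide: $\hat{p}_n$ denotes the fraction of the first $n$ independent trials in which the outcome occurs, and $p$ is its true probability on each trial.

```latex
\[
  \text{for every } \varepsilon > 0: \qquad
  \lim_{n \to \infty} P\bigl( \lvert \hat{p}_n - p \rvert > \varepsilon \bigr) = 0 .
\]
```

Note that this is a statement about the fraction $\hat{p}_n$, not about the raw counts. It does not say that a run of one outcome will be compensated for by later trials.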

So that's why we did better with
a million than with a hundred. And if I did a 100
million, we'd do way better than I did with a million. I want to take a minute to
talk about a way this law is often misunderstood. This is something called
the gambler's fallacy. And all you have
to do is say, let's go watch a sporting event. And you'll watch a
batter strike out for the sixth consecutive time. The next time they
come to the plate, the idiot announcer says,
well he struck out six times in a row. He's due for a hit this
time, because he's usually a pretty good hitter. Well that's nonsense. It says, people somehow
believe that if deviations from expected occur, they'll
be evened out in the future. And we'll see something
similar to this that is true, but this is not true.

And there is a great
story about it. This is told in a
book by Huff and Geis. And this truly happened in
Monte Carlo, with Roulette. And you could either
bet on black or red. Black came up 26 times in a row. Highly unlikely, right? 2 to the 26th is a giant number. And what happened is, word
got out on the casino floor that black had kept
coming up way too often. And people more or less
panicked to rush to the table to bet on red, saying, well
it can't keep coming up black. Surely the next one will be red. And as it happened when the
casino totaled up its winnings, it was a record
night for the casino. Millions of francs got
bet, because people were sure it would have to even out. Well if we think
about it, probability of 26 consecutive reds is that. A pretty small number. But the probability
of 26 consecutive reds when the previous 25
rolls were red is what? No, that. AUDIENCE: Oh, I
thought you meant it had been 26 times again. JOHN GUTTAG: No, if you
had 25 reds and then you spun the wheel once
more, the probability of it having 26 reds is
now 0.5, because these are independent events.

Unless of course the wheel
is rigged, and we're assuming it's not. People have a hard
time accepting this, and I know it seems funny. But I guarantee there will be
some point in the next month or so when you will find
yourself thinking this way, that something has to even out. I did so badly on
the midterm, I will have to do better on the final. That was mean, I'm sorry. All right, speaking of means– see? Professor Grimson is not the only
one who can make bad jokes. There is something– it's
not the gambler's fallacy– that's often confused
with it, and that's called regression to the mean.

This term was coined in
1885 by Francis Galton in a paper, a page of which I've
shown you here. And the basic
conclusion here was– what this table says is
if somebody's parents are both taller than
average, it's likely that the child will be
smaller than the parents. Conversely, if the parents
are shorter than average, it's likely that the child
will be taller than the parents. Now you can think about this
in terms of genetics and stuff. That's not what he did. He just looked at
a bunch of data, and the data actually
supported this.

And this led him to this notion
of regression to the mean. And here's what
it is, and here's the way in which it is subtly
different from the gambler's fallacy. What he said here is,
following an extreme event– parents being unusually tall– the next random event is
likely to be less extreme. He didn't know much
about genetics, and he kind of assumed the
height of people were random. But we'll ignore that. OK, but the idea is here
that it will be less extreme. So let's look at it in roulette. If I spin a fair roulette
wheel 10 times and get 10 reds, that's an extreme event. Right, here's a probability
of basically 1/1024. Now the gambler's
fallacy says, if I were to spin it
another 10 times, it would need to even out. As in I should get more
blacks than you would usually get to make up for
these excess reds. What regression to the
mean says is different.

It says, it's likely that
in the next 10 spins, you will get fewer than 10 reds. You will get a
less extreme event. Now it doesn't have to be 10. If I'd gotten 7 reds instead of
5, you'd consider that extreme, and you would bet that the next
10 would have fewer than 7. But you wouldn't bet that
it would have fewer than 5. Because of this, if you now look
at the average of the 20 spins, it will be closer to
the mean of 50% reds than you got from the
extreme first spins. So that's why it's called
regression to the mean.
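
A small simulation can make the distinction concrete. This is a sketch under stated assumptions: a red/black-only fair wheel, blocks of 10 spins, and "extreme" taken to mean at least 8 reds in the first block. The function names and the threshold are illustrative choices.

```python
import random

def reds_in_block(rng, block_size=10):
    """Number of reds in block_size spins of a red/black-only fair wheel."""
    return sum(1 for _ in range(block_size) if rng.random() < 0.5)

def regression_demo(num_pairs=50_000, threshold=8, seed=0):
    """Average reds in the first and second blocks, given an extreme first block."""
    rng = random.Random(seed)
    first_total = second_total = count = 0
    for _ in range(num_pairs):
        first = reds_in_block(rng)
        second = reds_in_block(rng)
        if first >= threshold:      # keep only pairs with an extreme first block
            first_total += first
            second_total += second
            count += 1
    return first_total / count, second_total / count

first_avg, second_avg = regression_demo()
print(f'extreme first blocks average {first_avg:.2f} reds')
print(f'the blocks that follow them average {second_avg:.2f} reds')
```

The second blocks average about 5 reds: less extreme than the first blocks, as regression to the mean predicts, but not below 5, as the gambler's fallacy would require.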

The more samples you
take, the more likely you'll get to the mean. Yes? AUDIENCE: So,
roulette wheel spins are supposed to be independent. JOHN GUTTAG: Yes. AUDIENCE: So it seems
like the second 10– JOHN GUTTAG: Pardon? AUDIENCE: It seems like
the second 10 times that you spin it. Like that shouldn't
have to [INAUDIBLE].. JOHN GUTTAG: Has nothing
to do with the first one. AUDIENCE: But you said
it's likely [INAUDIBLE].. JOHN GUTTAG: Right, because you
have an extreme event, which was unlikely. And now if you
have another event, it's likely to be
closer to the average than the extreme
was to the average. Precisely because
it is independent. That makes sense to everybody? Yeah? AUDIENCE: Isn't that the same
as the gambler's fallacy, then? By saying that, because
this was super unlikely, the next one [INAUDIBLE].

JOHN GUTTAG: No, the
gambler's fallacy here– and it's a good question,
and indeed people often do get these things confused. The gambler's fallacy would
say that the second 10 spins would– we would expect to
have fewer than 5 reds, because you're trying to even
out the unusual number of reds in the first spins. Whereas here we're not saying
we would have fewer than 5. We're saying we'd probably
have fewer than 10. That it'll be
closer to the mean, not that it would
be below the mean. Whereas the gambler's
fallacy would say it should be below that mean to
quote, even out, the first 10. Does that make sense? OK, great questions. Thank you. All right, now you
may not know this, but casinos are not in the
business of being fair. And the way they avoid
being fair is that in Europe, the pockets are not all red and black. They sneak in one green.

And so now if you bet
red, well sometimes it isn't always red or black. And furthermore,
there is this 0. They index from 0 rather
than from one, and so you don't get a full payoff. In American roulette, they
manage to sneak in two greens. They have a 0 and a double 0. Tilting the odds even more
in favor of the casino. So we can do that
in our simulation. We'll look at European roulette
as a subclass of fair roulette. I've just added this
extra pocket, 0. And notice I have
not changed the odds. So what you get if you get
your number is no higher, but you're a little bit
less likely to get it because we snuck in that 0. Then American roulette is a
subclass of European roulette in which I add yet
another pocket. All right, we can
simulate those. Again, nice thing
about simulations, we can play these games.

So I've simulated 20 trials
of 1,000 spins, 10,000 spins, 100,000, and a million. And what do we see
as we look at this? Well, right away we can see
that fair roulette is usually a much better bet than
either of the other two. That even with only 1,000
spins the return is negative. And as we get more and
more as I got to a million, it starts to look much
closer to 0. And these, we have reason
to believe at least, are much closer to
true expectation saying that, while you
break even in fair roulette, you'll lose 2.7% in Europe
and over 5% in Las Vegas, or soon in Massachusetts. All right, we're
sampling, right? That's why the
results will change, and if I ran a
different simulation with a different seed I'd
get different numbers. Whenever you're sampling,
you can't be guaranteed to get perfect accuracy. It's always possible
you get a weird sample.

That's not to say that you won't
get exactly the right answer. I might have spun
the wheel twice and happened to get the exact
right answer of the return. Actually not twice,
because the math doesn't work out, but
35 times and gotten exactly the right answer. But that's not the point. We need to be able
to differentiate between what happens to be
true and what we actually know, in a rigorous sense, is true. Or maybe don't know it,
but have real good reason to believe it's true. So it's not just a
question of faith. And that gets us to
what's in some sense the fundamental question of
all computational statistics, is how many samples
do we need to look at before we can have real,
justifiable confidence in our answer? As we've just seen– not just, a few minutes
ago– with the coins, our intuition tells
us that it depends upon the variability in the
underlying possibilities. So let's look at
that more carefully.

We have to look at the
variation in the data. So let's look at first
something called variance. So this is variance of x. Think of x as just a list of
data examples, data items. And for the variance we
first compute the average value, that's mu. So mu is for the mean. For each little x in big
X, we compute the difference of that and the mean.

How far is it from the mean? We square the difference,
and then we just sum them. So this takes, how far is
everything from the mean? We just add them all up. And then we end up dividing
by the size of the set, the number of examples. Why do we have to
do this division? Well, because we don't want to
say something has high variance just because it has
many members, right? So this sort of normalizes
by the number of members, and this just sums how different
the members are from the mean. So if everything
is the same value, what's the variance going to be? If I have a set of 1,000
6's, what's the variance? Yes? AUDIENCE: 0.

JOHN GUTTAG: 0. You think this is going to
be hard, but I came prepared. I was hoping this would happen. Look out, I don't know
where this is going to go. [FIRES SLINGSHOT] AUDIENCE: [LAUGHTER] JOHN GUTTAG: All right, maybe
it isn't the best technology. I'll go home and practice. And then the thing
you're more familiar with is the standard deviation. And if you look at what the
standard deviation is, it's simply the square
root of the variance.
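
The definitions above translate directly into code. This is a minimal sketch with assumed function names, using the same convention as the lecture: divide by the number of items, not by one less.

```python
def variance(X):
    """Mean of the squared differences from the mean (divides by len(X))."""
    mean = sum(X) / len(X)
    tot = 0.0
    for x in X:
        tot += (x - mean) ** 2   # squared distance of each item from the mean
    return tot / len(X)          # normalize by the number of items

def std_dev(X):
    """Standard deviation: the square root of the variance."""
    return variance(X) ** 0.5

print(variance([6] * 1000))      # 0.0, since every item equals the mean
print(variance([1, 2, 3, 4]))    # 1.25
print(std_dev([1, 2, 3, 4]))
```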

Now, let's understand
this a little bit and first ask, why am
I squaring this here, especially because
later on I'm just going to take a square root anyway? Well squaring it has
one virtue, which is that it means I don't care
whether the difference is positive or negative. And I shouldn't, right? I don't care which side
of the mean it's on, I just care it's
not near the mean. But if that's all
I wanted to do I could take the absolute value. The other thing we
see with squaring is it gives the outliers
extra emphasis, because I'm squaring that distance. Now you can think
that's good or bad, but it's worth
knowing it's a fact. The more important
thing to think about is standard deviation all by
itself is a meaningless number. You always have to think about
it in the context of the mean. If I tell you the
standard deviation is 100, you then say, well– and I ask
you whether it's big or small, you have no idea.

If the mean is 100 and the
standard deviation is 100, it's pretty big. If the mean is a billion and
the standard deviation is 100, it's pretty small. So you should never want to look
at just the standard deviation. All right, here
is just some code to compute those, easy enough. Why am I doing this? Because we're now getting
to the punch line. We often try and estimate
values just by giving the mean. So we might report on an exam
that the mean grade was 80. It's better instead
of trying to describe an unknown value by it– an unknown parameter
by a single value, say the expected return on
betting a roulette wheel, to provide a
confidence interval.

So what a confidence
interval is is a range that's likely to
contain the unknown value, and a confidence that
the unknown value is within that range. So I might say on
a fair roulette wheel I expect that your
return will be between minus 1% and plus 1%, and I expect that
to be true 95% of the time you play the game if you
play 100 rolls, spins. If you take 100 spins
of the roulette wheel, I expect that 95% of
the time your return will be between this and that. So here, we're saying the return
on betting a pocket 10 times, 10,000 times in European
roulette is minus 3.3%. I think that was the
number we just saw. And now I'm going to add to
that this margin of error, which is plus or minus 3.5%
with a 95% level of confidence. What does this mean? If I were to conduct an
infinite number of trials of 10,000 bets each, my
expected average return would indeed be
minus 3.3%, and it would be between these
values 95% of the time. I've just subtracted
and added this 3.5, saying nothing about
what would happen in the other 5% of the time.

How far away I
might be from this, this is totally silent
on that subject. Yes? AUDIENCE: I think
you want 0.2 not 9.2. JOHN GUTTAG: Oh, let's see. Yep, I do. Thank you. We'll fix it on the spot. This is why you have
to come to lecture rather than just
reading the slides, because I make mistakes. Thank you, Eric. All right, so it's telling me
that, and that's all it means. And it's amazing how
often people don't quite know what this means. For example, when they
look at a political poll and they see how many votes
somebody is expected to get. And they see this
confidence interval and say, what does that really mean? Most people don't know. But it does have a very precise
meaning, and this is it. How do we compute
confidence intervals? Most of the time we compute
them using something called the empirical rule. Under some assumptions, which
I'll get to a little bit later, the empirical rule says that if
I take the data, find the mean, compute the standard
deviation as we've just seen, 68% of the data will be within
one standard deviation in front of or behind the mean.

Within one standard
deviation of the mean. 95% will be within 1.96
standard deviations. And that's what
people usually use. Usually when people talk
about confidence intervals, they're talking about the
95% confidence interval. And they use this 1.96 number. And 99.7% of the data
will be within three standard deviations. So you can see if you are
outside the third standard deviation, you are
a pretty rare bird, for better or worse
depending upon which side. All right, so let's
apply the empirical rule to our roulette game. So I've got my three
roulette games as before. I'm going to run a
simple simulation. And the key thing
to notice is really this print statement here. Right, that I'll print the
mean, which I'm rounding. And then I'm going to give
the confidence intervals, plus or minus, and I'll just
take the standard deviation times 1.96 times
100. Why times 100? Because I'm showing
you percentages. All right so again, very
straightforward code. Just simulation, just like the
ones we've been looking at. And well, I'm just going– I don't think I'll
bother running it for you in the interest of time.

You can run it yourself. But here's what I
got when I ran it. So when I simulated betting
a pocket for 20 trials, we see that the– of 1,000 spins each,
for 1,000 spins the expected return for fair
roulette happened to be 3.68%. A bit high. But you'll notice the confidence
interval plus or minus 27 includes the actual
answer, which is 0. And we have very large
confidence intervals for the other two games. If you go way down to the bottom
where I've spun, spun the wheel many more times,
what we'll see is that my expected return for fair
roulette is much closer to 0 than it was here. But more importantly,
my confidence interval is much smaller, 0.8. So now I really have
constrained it pretty well. Similarly, for the other
two games you will see– maybe it's more accurate,
maybe it's less accurate, but importantly the confidence
interval is smaller.

So I have good reason to believe
that the mean I'm computing is close to the true mean,
because my confidence interval has shrunk. So that's the really
important concept here, is that we don't just guess– compute the value
in the simulation. We use, in this case,
the empirical rule to tell us how much faith we
should have in that value. All right, the empirical
rule doesn't always work. There are a couple
of assumptions. One is that the mean
estimation error is 0. What is that saying? That I'm just as likely
to guess high as guess low. In most experiments of this
sort, most simulations, that's a very fair assumption.

There's no reason to guess
I'd be systematically off in one direction or another. It's different when you use
this in a laboratory experiment, where in fact, depending upon
your laboratory technique, there may be a bias in your
results in one direction. So we have to assume that
there's no bias in our errors. And we have to assume that
the distribution of errors is normal. And we'll come back to
this in just a second. But this is a
normal distribution, called the Gaussian. Under those two assumptions
the empirical rule will always hold. All right, let's talk
about distributions, since I just introduced one. We've been using a
probability distribution.

And this captures the notion
of the relative frequency with which some random variable
takes on different values. There are two kinds: discrete
and continuous. Discrete is when the values are drawn from a finite
set of values. So when I flip
these coins, there are only two possible
values, head or tails. And so if we look at the
distribution of heads and tails, it's pretty simple. We just list the
probability of heads. We list the
probability of tails. We know that those two
probabilities must add up to 1, and that fully describes
our distribution. Continuous random variables
are a bit trickier. They're drawn from a set of
reals between two numbers. For the sake of
argument, let's say those two numbers are 0 and 1. Well, we can't just
enumerate the probability for each number. How many real numbers are
there between 0 and 1? An infinite number, right? And so I can't say, for each of
these infinite numbers, what's the probability of it occurring? Actually the probability is
close to 0 for each of them.

Is 0, if they're truly infinite. So I need to do
something else, and what I use is what's called the
probability density function. This is a different kind of
PDF than the one Adobe sells. So there, we don't
give the probability of the random variable
taking on a specific value. We give the
probability of it lying somewhere between two values. And then we define a curve,
which shows how it works.

So let's look at an example. So we'll go back to
normal distributions. This is– for the continuous
normal distribution, it's described by this function. And for those of you who don't
know about the magic number e, this is one of many
ways to define it. But I really don't care
whether you remember this. I don't care whether
you know what e is. I don't care if you
know what this is. What we really want to say
is, it looks like this. In this case, the mean is 0.
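
The slide's function is not reproduced in the transcript. For reference, the standard form of the normal density with mean $\mu$ and standard deviation $\sigma$, together with one common series definition of $e$, is:

```latex
\[
  f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
         e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}},
  \qquad
  e = \sum_{n=0}^{\infty} \frac{1}{n!} .
\]
```

Setting $\mu = 0$ and $\sigma = 1$ gives the curve discussed next.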

It doesn't have to be 0. I've shown a mean of 0 and
a standard deviation of 1. This is the so-called
standard normal distribution. But it's symmetric
around the mean. And that gets back to,
it's equally likely that our errors are in
either direction, right? So it peaks at the mean. The peak is always at the mean. That's the most
probable value, and it's symmetric about the mean. So if we look at it,
for example, and I say, what's the probability of the
number being between 0 and 1? I can look at it here
and say, all right, let's draw a line
here, and a line here. And then I can integrate
the curve under here. And that tells me
the probability of this random variable
being between 0 and 1. If I want to know
between minus 1 and 1. I just do this and then I
integrate over that area. All right, so the area
under the curve in this case defines the likelihood. Now I have to divide and
normalize to actually get the answer between 0 and 1.

So the question
is, what fraction of the area under the curve
is between minus 1 and 1? And that will tell
me the probability. So what does the
empirical rule tell us? What fraction is between
minus 1 and 1, roughly? Yeah? 68%, right? So that tells me 68% of
the area under this curve is between minus 1 and 1,
because my standard deviation is 1, roughly 68%. And maybe your eyes
will convince you that's a reasonable guess. OK, we'll come back and look
at this in a bit more detail on Monday of next week.

And also look at
the question of, why does this work
in so many cases where we don't actually
have a normal distribution to start with?
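
The empirical-rule percentages can be checked numerically. Rather than integrating the curve, this sketch uses the closed form based on the error function in Python's standard library, which is a substitution made here and not the method from the lecture. The function name is an assumption.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    # For a normal distribution this area equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 3):
    print(f'within {k} standard deviations: {prob_within(k):.4f}')
```

The three values come out at roughly 0.68, 0.95, and 0.997, matching the empirical rule.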
