# 17. Stochastic Processes II

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: And today it's me, back again. And we'll study continuous-time stochastic processes. So far we were discussing
discrete time processes. We studied the basics like
variance, expectation, all this stuff– moments,
moment generating function, and some important concepts for
Markov chains, and martingales.

So I'm sure a lot of you will have forgotten what martingales
and Markov chains are, but try to review them
before the next few lectures. Because starting
next week we start discussing continuous-time
stochastic processes– but not from me. You're not going to hear
martingale from me that much. But from people– say,
outside speakers– they're going to use
this martingale concept to do pricing. So I will give you
some easy exercises. You will have some
problems on martingales. Just refer back to the notes
that I had like a month ago, and just review. They won't be difficult
problems, but try to get comfortable with the concepts. OK. And then Peter taught
some time series analysis. Time series is just the same
as discrete time process. And regression analysis, this
was all done on discrete time. That means the underlying space
was x_1, x_2, x_3, dot dot dot, x_t.

But now we're going to
talk about continuous time processes. What are they? They're just a collection
of random variables indexed by time. But now the time
is a real variable. Here, time was just
in integer values. Here, we have real variable. So a stochastic process
develops over time, and the time variable
is continuous now. It doesn't necessarily mean
that the process itself is continuous– it may as
well look like these jumps. It may as well have a
lot of jumps like this. It just means that
the underlying time variable is continuous. Whereas when it
was discrete time, you were only looking
at specific observations at some times. I'll draw it here. Discrete time looks
more like that. OK. So the first
difficulty when you try to understand continuous time
stochastic processes when you look at it is, how do
you describe the probability distribution? How to describe the
probability distribution? So let's go back to
discrete time processes.

So the universal example
was a simple random walk. And if you remember, we
described it by saying x_t minus x_(t-1) is either 1 or minus
1, with probability one-half each. This was how we described it. And if you think about it,
this is a slightly indirect way of describing the process. You're not describing
the probability of this process following
this path, it's like a path. Instead what you're
doing is, you're describing the probability
of this event happening. From time t to t plus 1,
what is the probability that it will go down? And at each step you describe
the probability altogether, when you combine them, you get
the probability distribution over the process.

But you can't do it for
continuous time, right? The time variable is
continuous, so you can't just take times t
and t-prime and describe the difference. If you want to do that, you have
to do it infinitely many times. You have to do it for
all possible values. That's the first difficulty. Actually, that's
the main difficulty. And how can we handle this? It's not an easy question. And you'll see a very
indirect way to handle it. It's somewhat in the
spirit of this thing. Ideally you would take some
path and write down the probability density of that path. That's the omega. What is the probability
density at omega? Of course, it's not
a discrete variable so you have a probability
density function, not a probability mass function. In fact, can we
even write it down? You'll later see
that we won't even be able to write this down. So just have this
in mind and you'll see what I was trying to say.

So finally, I get to talk about Brownian motion. It
may have come up before in other lectures, but
you'll see a lot more of it from now on. And let's see what
it actually is. So it's described
as the following, it actually follows
from a theorem. There exists a
probability distribution over the set of continuous
functions from positive reals to the reals such that
first, B(0) is always 0. So probability of B(0)
is equal to 0 is 1. Number two– we call
this stationary increments. For all s less than t,
B(t) minus B(s) has a normal distribution with mean
0 and variance t minus s. And the third–
independent increments. That means if the intervals
[s_i, t_i] are not overlapping, then the B(t_i) minus
B(s_i) are independent. So it's actually
a theorem saying that there is some
strange probability distribution over the
continuous functions from positive reals–
non-negative reals– to the reals. So if you look at some
continuous function, this theorem gives you a
probability distribution. It describes the probability
of this path happening.

It doesn't really describe it. It just says that there
exists some distribution such that it always starts at
0 and it's continuous. Second, the distribution for all
fixed s and t, the distribution of this difference is
normally distributed with mean 0 and variance
t minus s, which scales according to the time. And then third,
independent increment means what happened between
this interval, [s1, t1], and [s2, t2], this
part and this part, is independent as long as
intervals do not overlap.
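The three defining properties are easy to check against a simulation. A minimal sketch in Python (NumPy assumed; the horizon, grid size, and path count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample many Brownian paths on [0, T] at once.  Each increment over a
# step of length dt is an independent N(0, dt) draw (properties 2 and 3),
# and every path starts at B(0) = 0 (property 1).
T, n, n_paths = 2.0, 1000, 10000
dt = T / n
steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.hstack([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)])

# B(t) - B(s) should be N(0, t - s); here s = 0.5 and t = 1.5.
diff = B[:, 750] - B[:, 250]
print(round(float(diff.mean()), 2), round(float(diff.var()), 2))  # ≈ 0.0, ≈ 1.0
```

The sample mean and variance of the increment come out near 0 and near t minus s, matching the stationarity property on the board.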

It sounds very similar to
the simple random walk. But the reason we have to do
this very complicated process is because the
time is continuous. You can't really describe at
each time what's happening. Instead, what you're describing
is over all possible intervals what's happening. When you have a fixed interval,
it describes the probability distribution. And then when you have
several intervals, as long as they don't
overlap, they're independent. OK? And then by this theorem,
we call this probability distribution a Brownian motion. So probability distribution,
the definition, distribution given by this theorem is
called the Brownian motion.

That's why I'm
saying it's indirect. I'm not saying Brownian
motion is this probability distribution. It satisfies these conditions,
but we are reversing it. Actually, we have these
properties in mind. We're not sure if such a
probability distribution even exists or not. And actually this theorem
is very, very difficult. I don't know how to
prove it right now. I have to go through a book. And even graduate
probability courses usually don't cover it
because it's really technical. That means this just shows
how continuous time stochastic processes can be so much more
complicated than discrete time. Then why are you– why are
we studying continuous time processes when it's
so complicated? Well, you'll see in
the next few lectures.

Any questions? OK. So let's go through
this a little bit more. AUDIENCE: Excuse me. PROFESSOR: Yes. AUDIENCE: So when you talk about
the probability distribution, what's the underlying space? Is it the space of– PROFESSOR: Yes, that's
a very good question. The space is the space
of all functions. That means it's a space
of all possible paths, if you want to think
all possible ways your variable can
evolve over time. And for some fixed
drawing for this path, there's some probability
that this path will happen. It's not the probability spaces
that you have been looking at. It's not one point– well,
a point is now a path. And your probability
distribution is given over paths,
not for a fixed point.

And that's also a reason why
it makes it so complicated. Other questions? So the main thing you have to
remember– well, intuitively you will just know it. But one thing you want to try
to remember is this property. As your time scales, what
happens between that interval is it's like a normal variable. So this is a collection of
a bunch of normal variables. And the mean is always
0, but the variance is determined by the
length of your interval. Exactly that will
be the variance. So try to remember
this property. A few more things, it has
a lot of different names. It's also called Wiener process. And let's see,
there was one more.

Is there another name for it? I thought I had one more
name in mind, but maybe not. AUDIENCE: Norbert Wiener
was an MIT professor. PROFESSOR: Oh, yeah. That's important. AUDIENCE: Of course. PROFESSOR: Yeah, a
professor at MIT. But apparently he
wasn't the first person who discovered this process. It was some other person, in 1900. And actually, in the
first paper that appeared– of course, they didn't know
about each other's results– the
reason he studied this was to evaluate stock
prices and option prices. And here's another slightly
different description, maybe a more
intuitive description of the Brownian motion. So here is this philosophy. Philosophy is that Brownian
motion is the limit of simple random walks.

The limit– it's a
very vague concept. You'll see what I mean by this. So fix a time
interval of 0 up to 1 and slice it into
very small pieces. So I'll say, into n pieces. 1 over n, 2 over n, 3 over
n, dot dot dot, to n minus 1 over n. And consider a
simple random walk, n-step simple random walk. So from time 0 you go
up or down, up or down. Then you get
something like that. OK? So let me be a little
bit more precise. Let Y_0, Y_1, up to Y_n
be a simple random walk, and let Z be the function
such that at time t over n, Z is Y_t divided by the square root of n– that square-root scaling is what keeps the variance of Z(1) equal to 1. That's exactly just written
down in formula what it means. So this process is Z. I
take a simple random walk and scale it so that it
goes from time 0 to time 1. And then in the
intermediate values– for values that
are not this, just linearly extended– linearly
extend in intermediate values.

It's a complicated way of
saying just connect the dots. And take n to infinity. Then the resulting distribution
is a Brownian motion. So mathematically,
that's just saying the limit of simple random
walks is a Brownian motion. But it's more than that. That means if you
have some suspicion that some physical quantity
follows a Brownian motion, and then you
observe the variable at discrete times at
very, very fine scales– so you observe it really, really
often, like a million times in one second. Then once you see– if you see
that and take it to the limit, it looks like a Brownian motion.
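This limiting picture can be sketched numerically. The sketch below assumes the standard square-root normalization Z(k/n) = Y_k / sqrt(n), and just checks the endpoint:

```python
import numpy as np

rng = np.random.default_rng(1)

# n-step simple random walk, rescaled per the construction above:
# Z(k/n) = Y_k / sqrt(n), so Var Z(1) = 1.
n, n_paths = 1000, 20000
steps = rng.choice([-1.0, 1.0], size=(n_paths, n))
Z1 = steps.sum(axis=1) / np.sqrt(n)   # endpoint Z(1) of each path

# For large n, Z(1) should look like B(1) ~ N(0, 1).
print(round(float(Z1.mean()), 2), round(float(Z1.var()), 2))  # ≈ 0.0, ≈ 1.0
```

With plus-or-minus-one steps, the rescaled endpoint is already very close to a standard normal at n = 1000, which is the "limit of simple random walks" claim in miniature.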

Then now you can conclude
that it's a Brownian motion. What I'm trying to say is
this continuous time process, whatever the strange thing
is, it follows from something from a discrete world. It's not something new. It's the limit of these
objects that you already know. So this tells you that it might
be a reasonable model for stock prices because for
stock prices, no matter how– there's only a
finite amount of time scale that you can observe the prices. But still, if you
observe it infinitely as much as you can, and
the distribution looks like a Brownian motion,
then you can use a Brownian motion to model it.

So it's not only the
theoretical observation. It also has implication
when you want to use Brownian motion
as a physical model for some quantity. It also tells you why
Brownian motion might appear in some situations. So here's an example. Here's a completely
different context where Brownian motion
was discovered, and why it has the
name Brownian motion. So a botanist– I don't know if
I'm pronouncing it correctly– named Brown in the
1800s, what he did was he observed a pollen
particle in water. So you have a cup of water
and there's some pollen. Of course you have gravity
that pulls the pollen down. And pollen is heavier than
water so eventually it will go down, eventually.

But that only explains
the vertical action, it will only go down. But in fact, if you
observe what's happening, it just bounces back
and forth crazily until it finally reaches
down the bottom of your cup. And this motion,
if you just look at a two-dimension picture,
it's a Brownian motion to the left and right. So it moves as according
to Brownian motion.

Well, first of all, I should
say a little bit more. What Brown did was
he observed it. He wasn't able to explain the
horizontal actions because he only understood
gravity, but then people tried to explain it. They suspected that it was
the water molecules that caused this action, but weren't
able to really explain it. But the first person to
actually rigorously explain it was, surprisingly,
Einstein, that relativity guy, that famous guy.

So I was really surprised. He's really smart, apparently. And why? So why will this follow
a Brownian motion? Why is it a reasonable model? And this gives you a fairly
good reason for that. This description, where it's the
limit of simple random walks. Because if you think
about it, what's happening is there is a big
molecule that you can observe, this big particle. But in the water there are
tiny water molecules, ones you can't really
see, but they fill the space. And they're just moving crazily. Even though the water looks
still, what's really happening is these water
molecules are just crazily moving inside the cup.

And each water molecule, when
they collide with the pollen, it will change the action
of the pollen a little bit, by a tiny amount. So if you think about each
collision as one step, then each step will either
push this pollen to the left or to the right by
some tiny amount. And it just
accumulates over time. So you're looking at a
very, very fine time scale. Of course, the times
will differ a little bit, but let's just forget about
it, assume that it's uniform. And at each time it just
pushes to the left or right by a tiny amount.

And you look at what
accumulates, as we saw, the limit of a simple random
walk is a Brownian motion. And that tells you why
we should get something like a Brownian motion here. So the action of pollen
particle is determined by infinitesimal– I don't
know if that's the right word– but just, quote,
"infinitesimal" interactions with water molecules. That explains, at
least intuitively, why it follows Brownian motion. And the second example
is– any questions here– is stock prices. At least to give you some
reasonable reason, some reason that Brownian motion is not so
bad a model for stock prices. Because if you look
at a stock price, S, the price is determined by
buying actions or selling actions. Each action kind of
pushes the price down or pulls it up.

And if you look at very, very
tiny scales, what's happening is at a very tiny amount
they will go up or down. Of course, it doesn't go up
and down by a uniform amount, but just forget about
that technicality. It just bounces back and
forth infinitely often, and then you're taking
these tiny scales to be tinier, so
very, very small. So again, you see
this limiting picture. Where you have a discrete–
something looking like a random walk, and
you take t as infinity. So if that's the only
action causing the price, then Brownian motion will
be the right model to use. Of course, there are many
other things involved which makes this deviate
from Brownian motion, but at least, theoretically,
it's a good starting point.

Any questions? OK. So you saw Brownian motion. You already know that it's used
in the financial market a lot. It's also being used in science
and other fields like that. And really big names, like
Einstein, are involved. So it's a really, really
important theoretical thing. Now that you've learned it,
it's time to get used to it. So I'll tell you
some properties, and actually prove a little
bit– just some propositions to show you some properties. Some of them are quite
surprising if you never saw it before. OK. So here are some properties. Crosses the x-axis
infinitely often, or I should say the t-axis. Starting from 0, it
will never just run off to infinity, or to negative infinity.

It will always keep coming back, balancing
positive and negative infinitely often. And the second, it does
not deviate too much from the curve t equals y squared– in other words, y equals the square root of t. We'll call this y. Now, this is a very
vague statement. What I'm trying to say is,
draw this curve. If you start at time
0, at some time t_0, the probability
distribution here is given as a normal
random variable with mean 0 and variance t_0.

And because of that,
the standard deviation is square root t_0. So the typical value will be
around the standard deviation. And it won't deviate. It can be 100 times this. It won't really be a million
times that or something. So most likely it will
look something like that. So it plays around
this curve a lot, but it crosses the
axis infinitely often. It goes back and forth. What else? The third one is quite
really interesting. It's more theoretical
interest, but it also has real-life implications. It's not differentiable
anywhere. It's nowhere differentiable. So this curve,
whatever that curve is, it's a continuous path, but it's
nowhere differentiable, really surprising. It's hard to imagine
even one such path. What it's saying is if you
take one path according to this probability
distribution, then more than likely
you'll obtain a path which is nowhere differentiable.

That just sounds nice,
but why does it matter? It matters because we
can't use calculus anymore. Because all the
theory of calculus is based on differentiation. However, this process has some
nice properties– it's universal, and it appears in very
different contexts. But if you want to
do analysis on it, it's just not differentiable. So the standard
tools of calculus can't be used here, which
is quite unfortunate if you think about it. You have this nice model,
which can describe many things, but you can't really
do analysis on it. We'll later see
that actually there is a variant, a different
calculus that works. And I'm sure many of you
would have heard about it. It's called Ito's calculus. So we have this nice object. Unfortunately, it's
not differentiable, so the standard calculus
does not work here. However, there is
a modified version of calculus called
Ito's calculus, which extends the classical
calculus to this setting.

And it's really powerful
and it's really cool. But unfortunately, we don't
have that much time to cover it. I will only be able to tell
you really basic properties and basic computations of it. And you'll see how
this calculus is being used in the
financial world in the coming-up lectures. But before going
into Ito's calculus, let's talk about the property
of Brownian motion a little bit because we have
to get used to it. Suppose I'm using it as
a model of a stock price.

So I'm using– use
Brownian motion as a model for stock price–
say, daily stock price. The market opens at 9:30 AM. It closes at 4:00 PM. It starts at some
price, and then moves according to the
Brownian motion. And then you want to obtain the
distribution of the min value and the max value for the stock. So these are very
useful statistics. So a daily stock
price, what will the minimum and the
maximum– what will the distribution of those be? So let's compute it. We can actually compute it. What we want to do is– I'll
just compute the maximum.

I want to compute the maximum over all s smaller than t of the Brownian motion– M(t) is the maximum of B(s) over s less than or equal to t. So I define this new process
from the Brownian motion, and I want to compute
the distribution of this new stochastic process. And here's the theorem. So for all t and all positive a, the
probability that M(t) is greater than a
is equal to 2 times the probability that
B(t) is greater than a. It's quite surprising. If you just look
at this, there's no reason to expect that
such a nice formula should exist at all. And notice that maximum
is always at least 0, so we don't have to worry
about negative values. It starts at 0. How do we prove it? Proof. Take this tau. It's a stopping time, if
you remember what it is. It's a minimum value of t
such that the Brownian motion at time t is equal to a.

That's a complicated
way of saying, just record the first time
you hit the line a. Line a, with some
Brownian motion, and you record this time. That will be your tau of a. So now here's some
strange thing. The probability that B(t) is greater than
B(tau_a), given that tau_a is less than t, equals the probability that B(t) is less than B(tau_a) under the same conditioning– each is one-half. So what this is saying is, if
you're interested at time t, if your tau_a happened
before time t, so if your Brownian motion
hit the line a before time t, then afterwards you have the
same probability of ending up above a and ending up below a. The reason is because you
can just reflect the path. Whatever path that
ends over a, you can reflect it to obtain
a path that ends below a. And by symmetry, you
just have this property. Well, it's not obvious how
you'll use this right now. And then we're almost done. The probability that maximum
at time t is greater than a that's equal to the probability
that you're stopping time is less than t,
just by definition.

And that's equal to the
probability that B(t) minus B(tau_a) is positive given
tau_a is less than t. Because if you know
that tau is less than t, there's only two possible ways. You can either go up afterwards,
or you can go down afterwards. But these two are
the same probability. What you obtain is 2 times the
probability that– and that's just equal to 2
times the probability that B(t) is greater than a. What happened? Some magic happened. First of all, these two
are the same because of this property by symmetry. Then from here to here, B(tau_a)
is always equal to a, as long as tau_a is less than t.

This is just– I rewrote this
as a, and I got this thing. And then I can just remove
this because if I already know that tau_a is less
than t– order is reversed. If I already know that B at
time t is greater than a, then I know that
tau is less than t. Because if you want to reach
a because of continuity, if you want to go over a, you
have to reach a at some point. That means you hit
a before time t. So that event is already
inside that event. And you just get rid of it. Sorry, all this should
be– something looks weird. Not conditioned– intersected. OK. That makes more sense. Just the intersection
of two events. Any questions here? So again, you just want
to compute the probability that the maximum is
greater than a at time t.
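The theorem can also be sanity-checked by Monte Carlo simulation. A hedged sketch (the level a = 0.5 and horizon t = 1 are arbitrary choices; a discretized path slightly undershoots the true continuous-time maximum, so the match improves with a finer grid):

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of P(M(t) > a) = 2 P(B(t) > a) on [0, t].
t, a = 1.0, 0.5
n, n_paths = 2000, 10000
dt = t / n
steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(steps, axis=1)
lhs = float((B.max(axis=1) > a).mean())   # P(M(t) > a), estimated
rhs = float(2 * (B[:, -1] > a).mean())    # 2 P(B(t) > a), estimated
print(round(lhs, 2), round(rhs, 2))       # both ≈ 2(1 - Phi(0.5)) ≈ 0.617
```

Both estimates land near the exact value 2(1 minus Phi(0.5)), which is about 0.617 for a standard normal CDF Phi.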

In other words, just
by definition of tau_a, that's equal to the probability
that tau_a is less than t. And if tau_a is less
than t, afterwards, depending on afterwards
what happens, it increases or decreases. So there's only
two possibilities. It increases or it decreases. But these two events
have the same probability
because of this property. Here there's a bar and
there it's an intersection. But it doesn't matter, because
if you have P(X_1 given Y) equal to P(X_2 given Y),
then P(X_1 intersect Y) over P(Y) is equal to
P(X_2 intersect Y) over P(Y)– the P(Y)'s cancel. So the bar can just be
replaced by the intersection. That means these two events
have the same probability. So you can just take one. What I'm going to take
is one that goes above 0.

So after tau_a, it
accumulates more value. And if you rewrite it,
what that means is just B_t is greater than a given
that tau_a is less than t. But now that just
became redundant. Because if you already know
that B(t) is greater than a, tau_a has to be less than t. And that's just the conclusion. And it's just some nice
result about the maximum over some time interval. And actually, I think Peter uses
this distribution in his lecture, right? AUDIENCE: Yes. [INAUDIBLE] is that the
distribution of the max minus the minimum of
the Brownian motion. And use that range of
the process as a scaling for [INAUDIBLE] and get more
precise measures of volatility than just using, say,
the close-to-close price [INAUDIBLE]. PROFESSOR: Yeah. That was one property. And another property is– and
that's what I already told you, but I'm going to prove this. So at each time
the Brownian motion is not differentiable
at that time with probability equal to 1.

Well, not very
strictly, but I will use this theorem to prove it. OK? Suppose the Brownian motion
has a derivative at time t and it's equal to A. Then what you see is that
the Brownian motion at time t plus epsilon, minus
Brownian motion at time t, has to be less than or
equal to epsilon times a. Not precisely, so
I'll say just almost. Can make it
mathematically rigorous. But what I'm trying
to say here is by– is it mean value theorem? So from t to t plus epsilon, you
expect to gain about A times epsilon. You should have this for all epsilon-prime smaller than epsilon. Let's write it like that. So in other words, look at the
maximum over this interval of B(t plus epsilon-prime) minus B(t)– its
distribution is the same as that of the maximum M(epsilon-prime).

That has to be less
than epsilon times A. So what I'm trying to say is, if
this is differentiable, then depending on the slope, your Brownian
motion should have always been inside this cone from t
up to time t plus epsilon. If you draw this slope, it must
have been inside this cone. I'm trying to say that
this cannot happen. From here to here, it
should have passed this line at some point. OK? So to do that I'm looking
at the distribution of the maximum value
over this time interval. And I want to say that it's
even greater than that. So if your maximum
is greater than that, you definitely can't
have this control.

So if differentiable,
then maximum of epsilon prime– the maximum of epsilon,
actually, and just compute it. So the probability that M(epsilon)
is less than epsilon times A is equal to 2 times the
probability that the Brownian motion at epsilon
is less than or equal to epsilon times A. This has a normal distribution. And if you normalize
it to N(0, 1)– divide by the standard deviation–
you get the threshold square root of epsilon times A. As epsilon goes to
0, this threshold goes to 0. That means this probability goes to half, so the whole thing goes to 1. What am I missing? I did something wrong. I flipped it. These should be greater-thans. Now, if you combine it,
if it was differentiable, your maximum should have
been less than epsilon*A.
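The limit just computed on the board can be tabulated directly. A small sketch using the standard normal CDF (the slope bound A = 100 and the epsilon values are arbitrary choices):

```python
import math

def phi(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# P(M(eps) > eps*A) = 2 P(B(eps) > eps*A) = 2(1 - Phi(A*sqrt(eps))),
# since B(eps)/sqrt(eps) is N(0, 1).  Even for a huge slope bound A,
# this probability tends to 1 as eps -> 0.
A = 100.0
probs = [2.0 * (1.0 - phi(A * math.sqrt(eps)))
         for eps in (1.0, 1e-2, 1e-4, 1e-6, 1e-8)]
print([round(p, 4) for p in probs])  # rises toward 1: last entries 0.3173, 0.9203, 0.992
```

So no matter how large a candidate slope A you pick, the maximum over [t, t + epsilon] exceeds epsilon times A with probability approaching 1.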

But what we saw here is that your
maximum is greater than epsilon times A,
with probability tending to 1 as you take epsilon to 0. So the derivative cannot exist. Any questions? OK. So those are some
interesting things, properties of Brownian motion
that I want to talk about. I have one final thing,
and this one it's really important theoretically. And also, it will be the main
lemma for Ito's calculus. So the theorem is called
quadratic variation. And it's something that
doesn't happen that often. So let me write it down clearly: with t_i equal to i times T over n, the limit as n goes to infinity of the sum over i of B(t_(i+1)) minus B(t_i), squared, is equal to T. Now that's something strange. Let me just first parse
it before proving it. Think about it as just
a function, function f.

What is this quantity? This quantity means that
from 0 up to time T, you chop it up into n pieces. You get T over n, 2T
over n, 3T over n, and you look at the function. The difference between
each consecutive points, record these differences,
and then square it. And you sum it as
n goes to infinity. So you take smaller and smaller
scales take it to infinity. What the theorem says
is for Brownian motion this goes to T, the limit. Why is this something strange? Assume f is a lot
better function. Assume f is continuously
differentiable. That means it's differentiable,
and its differentiation is continuous. Derivative is continuous. Then let's compute the
exact same property, exact same thing. I'll use i as the index, with times t_i and t_(i+1), and look at the sum over i of f at t_(i+1) minus f at t_i, squared.

By the mean value theorem, there
exists a point s_i such that f(t_(i+1)) minus f(t_i) is equal
to f prime of s_i, times t_(i+1) minus t_i. So the sum is at most the sum from i equals 0 to n minus 1 of f prime of s_i squared, times t_(i+1) minus t_i squared.

s_i belongs to that interval. Yes. And then you take this term out. You take the maximum, over 0
up to T, of f prime of s squared, times the sum from i equals 0 to n minus 1 of
t_(i+1) minus t_i squared. This difference is T over n
because we chopped it up into n intervals. Each consecutive
difference is T over n. If you square it, that's equal
to T squared over n squared. If you had n of them,
you get T squared over n. So you get whatever that maximum
is times T squared over n. If you take n to
infinity, that goes to 0. So if you have a
reasonable function, which is differentiable,
this variation– this is called the quadratic
variation– quadratic variation is 0. So all these classical functions
that you've been studying have quadratic variation 0. But for Brownian
motion, what's happening is it just bounces back
and forth too much. Even if you scale it
smaller and smaller, the variations are big
enough to accumulate.

They won't disappear like if it
was a differentiable function. And this is pretty much
a slightly stronger version of the fact that it's
not differentiable. We saw that it's
not differentiable, and this is a different
way of saying it. It has very important
implications. And another way to write it is–
so here's a difference of B, it's dB squared is equal to dt. So if you take the
differential– whatever that means– if you take
the infinitesimal difference of each side, this part
is just dB squared, the Brownian motion difference
squared; this part is d of t. And that we'll see again. But before that, let's
just prove this theorem. So we're looking at the sum of
B of t_(i+1) minus B of t_i, squared, where t_i is i
over n times T. The sum runs from i equals 0 to n minus 1. OK.

What's the distribution of this? AUDIENCE: Normal. PROFESSOR: Normal, with mean 0 and
variance t_(i+1) minus t_i. But that was just T over n. That is the distribution. So I'll write it like this. You sum from i equals
0 to n minus 1, X_i squared, for X_i
a normal variable. OK? And what's the expectation
of X_i squared? It's T squared over n squared. OK. So maybe it's better
to write it like this. So I'll just write it again–
the sum from i equals 0 to n minus 1 of random variables Y_i,
such that expectation of Y_i– AUDIENCE: [INAUDIBLE]. PROFESSOR: Did I make
a mistake somewhere? AUDIENCE: The expected value
of X_i squared is the variance. PROFESSOR: It's T over n. Oh, yeah, you're right. Thank you. OK. So divide by n
and multiply by n.

What is this? What will this go to? AUDIENCE: [INAUDIBLE]. PROFESSOR: No. Remember strong law
of large numbers. You have a bunch of
random variables Y_i which are independent,
identically distributed, with mean T. You sum n of them
and divide by n. You know that it just
converges to T, just this one number. It doesn't– it's
a distribution, but in the limit
it's just T. OK? So that's
equal to T, because these are random variables
accumulating these squared terms. That's what's happened. Just a nice application of
strong law of large numbers, or just law of large numbers.
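Both halves of the argument show up clearly in a quick simulation: the Brownian sum of squared increments settles at T, while the same sum for a smooth function vanishes (sin is an arbitrary stand-in for a continuously differentiable f; the horizon T = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)

T = 2.0
for n in (100, 10000, 1000000):
    dt = T / n
    # Brownian increments over a grid of n steps: each is N(0, dt), so
    # the sum of squares has mean n * dt = T (law of large numbers).
    incr = rng.normal(0.0, np.sqrt(dt), size=n)
    qv_brownian = np.sum(incr ** 2)

    # Same sum for the smooth function f(t) = sin(t): it goes to 0.
    grid = np.linspace(0.0, T, n + 1)
    qv_smooth = np.sum(np.diff(np.sin(grid)) ** 2)
    print(n, round(float(qv_brownian), 3), round(float(qv_smooth), 6))
```

As n grows, the Brownian column concentrates around T = 2 and the smooth column shrinks toward 0, exactly the dichotomy the theorem describes.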

To be precise,
you'll have to use strong law of large numbers. OK. So I think that's enough
for Brownian motion. Any final questions? OK. Now, let's move on– AUDIENCE: I have a question. PROFESSOR: Yes. AUDIENCE: So this
[INAUDIBLE], is it for all Brownian motions B? PROFESSOR: Oh, yeah. That's a good question. This is what happens
with probability one. So always– I'll
just say always. It's not a very strict sense. But if you take one path
according to the Brownian motion, in that path
you'll have this. No matter what path you
get, it always happens. AUDIENCE: With probability one. PROFESSOR: With probability one.

So there's a hidden
statement– with probability one. And the reason you need
the "with probability one" is because we're using this
probabilistic statement here. But for all practical purposes,
like with probability one just means always. Now, I want to motivate
Ito's calculus. First of all, this. So now, I was saying that
Brownian motion, at least, is not so bad a model
for stock prices. But if you remember
what I said before, and what people
are actually doing, a better way to
describe it is instead of the differences being a
normal distribution, what we want is the
percentile difference. So for stock prices we want
the percentile difference to be normally distributed. In other words, you want to
find the distribution of S_t such that the difference
of S_t divided by S_t is a normal distribution.
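A small simulation sketch of that statement (the step size, starting price, and seed are my own assumptions): build a price path whose per-step percentage change is a normal draw; the percentage differences are then normal by construction, and the price itself stays positive.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 1e-4          # small time step, an arbitrary choice for illustration
n_steps = 50_000

# dS/S over each step: normally distributed with variance dt, as in the model.
pct_changes = rng.normal(0.0, np.sqrt(dt), size=n_steps)
S = 100.0 * np.cumprod(1.0 + pct_changes)  # price path built from percentage changes

print(pct_changes.mean(), pct_changes.std(), np.sqrt(dt))
```

The sample standard deviation of the percentage changes matches the model's square root of dt, while the path never goes negative, which plain Brownian motion cannot guarantee.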

So it's like a Brownian motion. That's the differential
equation for it. So the percentage difference
follows Brownian motion. That's what it's saying. Question, is S_t
equal to e to the B_t? Because in classical calculus
this is not a very absurd thing to say. If you differentiate each
side, what you get is dS_t equals e to the B_t, times dB_t.

That's S_t times dB_t. It doesn't look that wrong. Actually, it looks
right, but it's wrong. For reasons that you
don't know yet, OK? So this is wrong
and you'll see why. First of all, Brownian
motion is not differentiable. So what does it even
mean to say that? And then that means if you
want to solve this equation, or in other words, if you
want to model this thing, you need something else. And that's where Ito's
calculus comes in. OK. I'll try not to rush too much. So suppose– now we're
talking about Ito's calculus– you want to compute. So here is a motivation. You have a function f. I will call it a very
smooth function f. Just think about
the best function you can imagine, like
an exponential function. Then you have a Brownian
motion, and then you apply this function. As an input, you put
the Brownian motion inside the input. And you want to
estimate the outcome. More precisely, you
want to estimate infinitesimal differences.

Why would we want to do that? For example, f can be
the price of an option. More precisely, let
f be this thing. OK. You have some s_0. Up to s_0, the value
of f is equal to 0. After s_0, it's just
a line with slope 1. Then f of Brownian
motion is just the payoff– the
value of the option at the expiration. T is the expiration time. It's a call option.
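As a tiny illustration of the payoff just described (the strike and stock numbers are my own examples): f is zero up to s_0 and then a line with slope 1.

```python
def call_payoff(s, s0):
    # Value of the call at expiration: zero below the strike s0,
    # slope 1 above it, i.e. max(s - s0, 0).
    return max(s - s0, 0.0)

print(call_payoff(80.0, 100.0))   # stock finished below the strike
print(call_payoff(120.0, 100.0))  # stock finished above the strike
```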

That's the call option. So if your stock at time T goes
over s_0, you make that much. If it's below s_0,
you'll lose that much. More precisely, you have
to put it below like that. Let's just do it like that. And it looks like that. So that's like a
financial derivative. You have an underlying
stock and then some function applies to it. And then what you have, the
financial asset you have, actually can be described
as this function. A function of an
underlying stock, that's called financial derivatives. And then in the
mathematical world, it's just a function applied to
the underlying financial asset. And then, of course,
what you want to do is understand the
difference of the value, in terms of the difference
of the underlying asset. If B_t was a very
nice function as well. If B_t was differentiable, then
the classical world calculus tells us that df is equal to
f prime of B_t, times dB_t over dt, times dt. Yes. So if you can differentiate
it over the time difference, over a small time scale.
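Here is that classical statement checked numerically on a smooth stand-in path (the functions are my own choices, since Brownian motion itself will not allow this): with g(t) = sin(t) in place of B_t and f = exp, the change of f(g(t)) over a small dt matches f'(g(t)) times g'(t) times dt.

```python
import math

def f(x):
    return math.exp(x)

t, dt = 0.3, 1e-6
g = math.sin                         # a differentiable stand-in for the path
df = f(g(t + dt)) - f(g(t))          # actual change of f along the path
chain = f(g(t)) * math.cos(t) * dt   # f'(g(t)) * g'(t) * dt, the chain rule

print(df, chain)
```

The two numbers agree to well within the second-order error, which is exactly what fails once the path is a non-differentiable Brownian motion.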

All we have to do is
understand the derivative. Unfortunately, we can't do that, because we don't even have
this derivative. OK. Take one
failed; take two. Second try, OK? Brownian motion is not
differentiable, but we can still
write something: df is equal to
f prime of B_t, times dB_t. OK? What is this? We can't differentiate
Brownian motion, but we can still make sense of the
minuscule, infinitesimal difference of the
Brownian motion. So I just gave up trying to
compute the derivative. But instead, I'm going to just
compute how much the Brownian motion changed over this small
time scale, this difference, and describe the
change of our function in terms of the derivative
of our function f.

f is a very good function,
so it's differentiable. So we know this. This is computable. This is computable. It's the difference of Brownian
motion over a very small time scale. So that at least
now is reasonable. We can expect it. It might be true. Here, it didn't
make sense at all. Here, it at least makes
sense, but it's wrong. And why is it wrong? It's precisely because of this. The reason it's wrong,
the reason it is not valid is because of the fact
dB squared equals dt. And let's see how this comes
into play, this factor.

I think that will be the last
thing that we'll cover today. OK. So if you remember where
you got this formula from, you probably won't remember. But from calculus, this follows
from Taylor's expansion. f of t plus x, I'll say,
is equal to f of t, plus f prime of t times x, plus
f double prime of t over 2 times x squared, plus f triple prime of t
over 3 factorial times x cubed, and so on. df is just this difference. Over a very small
time increase, we want to understand the
difference of the function. That's approximately f
prime of t times x. OK. In classical calculus we were
able to ignore all these terms.
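The expansion above can be checked numerically for a smooth f (f = exp and the numbers here are my own choices): the first-order term misses the true difference by something of size x squared, which is why classical calculus can drop everything past it.

```python
import math

t, x = 1.0, 1e-3
f = math.exp                                   # f = f' = f'' = exp keeps the terms easy
actual = f(t + x) - f(t)                       # the true difference df
first_order = f(t) * x                         # f'(t) * x
with_second = first_order + f(t) * x**2 / 2    # add the f''(t)/2 * x^2 term

print(actual - first_order, actual - with_second)
```

The leftover after the first-order term is of order x squared; after adding the second-order term it shrinks to order x cubed.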

So in the classical world f(t+x)
minus f(t) was about f prime t times x. And that's precisely
this formula. But if you use Brownian
motion here– so what I'm trying to say is, if you take
f of B at some time t plus x, minus f of Brownian
motion B at time t, then let's just write
down the Taylor formula. We get f prime at B_t, times x, where x is this difference,
B at t plus x minus B at t.

That's like the
difference in B_t. So up to this much
we see this formula. And the next term, we
get the second derivative
of this function over
2, times x squared, where x is this difference. So what we get is dB_t squared. OK? But as you saw, this
is no longer ignorable. That is like a
dt, as we deduced. And that comes into play. So the correct– then by
Taylor expansion, the right way to do it is: df is equal to the
first derivative term, f prime of B_t times dB_t, plus the second derivative
term, f double prime of B_t over 2, times dt. This is called Ito's lemma. And now let's say if
you want to remember one thing from the math part,
try to make it this one.
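A simulation sketch of the lemma (the discretization, f = exp, and step count are my own choices): along one sampled Brownian path, summing f'(B) dB plus one half f''(B) dt reproduces f(B_T) minus f(B_0), while dropping the dt correction, as in the failed second try, leaves a visible gap.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 1_000_000
dt = T / n

dB = rng.normal(0.0, np.sqrt(dt), size=n)      # Brownian increments
B = np.concatenate(([0.0], np.cumsum(dB)))     # the path itself, B_0 = 0

fB = np.exp(B)                                  # f = f' = f'' = exp
actual = fB[-1] - fB[0]                         # f(B_T) - f(B_0)
ito = np.sum(fB[:-1] * dB) + 0.5 * np.sum(fB[:-1]) * dt  # Ito's lemma, summed up
naive = np.sum(fB[:-1] * dB)                    # "take two": no dt correction

print(actual, ito, naive)
```

With the correction the sum tracks the true change closely; without it, the error is exactly the accumulated one-half f'' dt term.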

Once you see the logic, it makes sense. It's really amazing how somebody
came up with it for the first time, because it all makes sense. It all fits together if you
think about it for a long time. But actually, I once
saw that Ito's lemma is one of the most cited
lemmas– like one of the most cited papers,
the paper containing this result. Because people think
it's nontrivial. Of course, there
are facts that are being used more than
this, classical facts, like trigonometric functions,
exponential functions. They are being used
a lot more than this, but people think that's
trivial, so they don't cite it in their research and papers. But this, people
respect the result. It's a highly nontrivial result.

And it's really amazing how
just by adding this term, all this theory of calculus
all now fits together. Without this– maybe that's
too strong a statement– but really, Brownian motion
becomes much richer because of this fact. Now we can do calculus with it. So there are two
things to remember. Well, if you want to remember
one thing, that's Ito's lemma. If you want to
remember two things, it's just quadratic variation,
dB_t squared is equal to dt.

And the way I remember that is
exactly because B_t is like a normal
random variable with mean 0 and variance t. dB_t squared is like
the variance of it. So it's t, and if you
differentiate it, you get dt. That was exactly
how we computed it. So, yeah, I'll just quickly
go over it again next time just to try to make it