# 21. Stochastic Differential Equations

The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: So far
we had a function, and then we differentiated to
get an equation of this type. But now, we're
given this equation. And we have to go
backwards, want to find the stochastic process
that satisfies this equation. So goal is to find
a stochastic process X(t) satisfying this equation. In other words, we want X of
t to be the integral of mu dS plus sigma… The goal is clear. We want that. And so these type
of equations are called differential equations,
I hope you already know that. Also for PDE, partial
differential equations, so even when it's
not stochastic, these are not easy problems. At least not easy
in the sense that, if you're given an
equation, typically you don't expect to have a
closed form solution. So even if you find this
X, most of the time, it's not in a very good form.

Still, a very important
result that you should first note before trying to solve any
of the differential equation is that, as long as mu and
sigma are reasonable functions, there does exist a solution. And it's unique. So we have the same
correspondence with this PDE. You're given a PDE, or given
a differential equation, not a stochastic
differential equation, you know that, if you're given
a reasonable differential equation, then a
solution exists. And it's unique. So the same principle
holds in stochastic world. Now, let me state it formally. This stochastic equation star
has a solution that is unique, of course with boundary
condition– has a solution. And given the
initial points– so if you're given the initial
point of a stochastic process, then a solution is unique. Just check, yes– as long as
mu and sigma are reasonable. One way it can be reasonable,
if it satisfies this conditions. These are very
technical conditions. But at least let me parse
to you what they are. They say, if you fix a time
coordinate and you change x, you look at the difference
between the values of mu when you change the second
variable and the sigma.

Then the change in your function
is bounded by the distance between the two points
by some constant K. So mu and sigma cannot change
too much when you change your space variable
a little bit. It can only change up to
the distance of how much you change the coordinate. So that's the first condition. Second condition says,
when you grow your x, very similar condition. Essentially it says it
cannot blow up too fast, the whole thing. This one is something about the
difference between two values. This one is about how it expands
as your space variable grows.

These are technical conditions. And many cases, they will hold. So don't worry too much about
the technical conditions. Important thing here is that,
given a differential equation, you don't expect to
have a good closed form. But you do expect to have
a solution of some form. OK, let's work
out some examples. Here is one of the few
stochastic differential equations that can be solved. This one can be solved. And you already know what X is.

But let's pretend you
don't know what X is. And let's try to solve it. I will show you
an approach, which can solve some differential
equations, some SDEs. But this really won't
happen that much. Still, it's like
a starting point. There's that. And assume X(0) [INAUDIBLE]. And mu, sigma are constants. Just like when solving
differential equations, first thing you'll do is
just guess, suppose, guess. If you want this to happen,
then d of X of t is– OK. x is just the second variable. And then these
two have to match. That has to be equal to that. That has to be equal to that. So we know that del f
over del t, mu of X of t is equal to mu of f. So we assumed that this is a
solution then differentiated that. And then, if that's a solution,
these all have to match. You get this equation. If you look at
that, that tells you that f is an exponential
function in the x variable.

So it's e to the sigma times x. And then we have a constant
term, plus some a times a function of t. The only way it can happen
is if it's in this form. So it's exponential
function in x. And in the time variable,
it's just some constant. When you fix a t,
it's a constant. And when you fix
a t and change x, it has to look like an
exponential function. It has to be in this form,
just by the second equation. Now, go back to this equation. What you get is partial f over
partial t is now a times g prime of t, f… AUDIENCE: Excuse me. PROFESSOR: Mm-hm. AUDIENCE: That one in
second to last line, yeah. So why is it minus
mu there at the end? PROFESSOR: It's equal. AUDIENCE: Oh, all right. PROFESSOR: Yeah. OK, and then let's plug it in. So we have a of g prime
t, f, plus 1/2 of sigma square f equals mu of f. In other words, a of g
prime of t is mu minus…

Square. g(t) is mu times some
constant c_1 plus c_2. OK, and then what we got is
original function f of t of x is e to the sigma*x plus mu
minus 1 over 2 sigma square t plus some constant. And that constant can
be chosen because we have the initial
condition, x of 0 equals 0. That means if t is equal to 0,
f(0, 0) is equal to e to the c. That has to be x_0. In different words, this is just
x_0 times e to the sigma*x plus mu minus 1 over 2
sigma square of t. Just as we expected,
we got this. So some stochastic
differential equations can be solved by analyzing it. But I'm not necessarily saying
that this is a better way to do it than just guessing. Just looking at
it, and, you know, OK, it has to be
exponential function.

You know what? I'll just figure
out what these are. And I'll come up
with this formula without going through
all those analysis. I'm not saying that's a
worse way than actually going through the analysis. Because, in fact, what we
did is we kind of already knew the answer and are
fitting into that answer. Still, it can be
one approach where you don't have a reasonable
guess of what the X_t has to be, maybe try to break it
down into pieces like this and backtrack to figure
out the function. Let me give you one
more example where we do have an explicit solution. And then I'll move
on and show you how to do when there
is no explicit solution or when you don't know how
to find an explicit solution. Maybe let's keep that there. Second equation is called this. What's the difference? The only difference is that
you don't have an X here.

So previously, our
main drift term also was proportional
to the current value. And the error was also
dependent on the current value or is proportional
to the current value. But here now the drift
term is something like an exponential–
minus exponential. But still, it's proportional
to the current value. But the error term
is just some noise. Irrelevant of what
the value is, this has the same variance as the error. So it's a slightly
different– oh, what is it? It's a slightly
different process. And it's known as
Ornstein-Uhlenbeck process. And this is used to model
a mean-reverting stochastic process. For alpha greater than 0, notice
that if X deviates from 0, this gives a force that drives
the stochastic process back to 0, that's
negatively proportional to your current value.

So yeah, this is used to model
some mean-reverting stochastic processes. And they first used it to
study the behavior of gases. I don't exactly see why. But that's what they say. Anyway, so this is
another thing that can be solved by doing
similar analysis. But if you try the same
method, it will fail. So as a test function, your
guess, initial guess, will be– or a(0) is equal to 1. Now, honestly, I don't know
how to come up with this guess. Probably, if you're
really experienced with stochastic
differential equations, you'll see some
form, like you'll have some feeling on how
this process will look like. And then try this, try that,
and eventually something might succeed. That's the best
explanation I can give. I can't really
give you intuition why that's the right guess.

Given some stochastic
differential equation, I don't know how to
say that you should start with this
kind of function, this kind of function. And it was the same
when, if you remember how we solved ordinary
differential equations or partial differential
equations, most of the time there is no good guess. It's only when your given
formula has some specific form such a thing happens. So let's see what happens here. That was given. Now, let's do exactly
the same as before. Differentiate it,
and let me go slow. So we have a prime of t. By chain rule, a prime
of t and that value, that part will be equal
to X(t) over a(t). This is chain rule. So I differentiate
that to get a prime t. That stays just as it was. But that can be rewritten
as X(t) divided by a(t).

And then plus a(t) times the
differential of that one. And that is just b(t)*dB(t). You don't have to differentiate
that once more, even though it's stochastic
calculus, because that's a very subtle point. And there's also one exercise
about it in your homework. But when you have a given
stochastic process written already in this integral
form, if we remember the definition of an integral,
at least how I defined it, is that it was an inverse
operation of a differential.

So when you differentiate
this, you just get that term. What I'm trying to
say is, there is no term, no term where you have
to differentiate this one more. Prime dt, something like
that, we don't have this term. This can be confusing. But think about it. Now, we laid it out
and just compare. So minus alpha of X of
t is equal to a prime t over a(t) X(t).

And your second term,
sigma*dB(t) is equal to a(t) times b(t). But these two cancel. And we see that a(t) has to
be e to the minus alpha t. This says that is
an exponential. Now, plug it in here. You get b(t). And that's it. So plug it back in. X of t is e to the minus alpha*t
of x of 0 plus 0 to t sigma e to the alpha*s. So this is a variance,
expectation is 0, because that's a
Brownian motion. This term, as we
expected, as time passes, goes to 0, exponential decay. And that is kind of hinted by
this fact, the mean reversion. So if you start from some
value, at least the drift term will go to 0 quite quickly. And then the important
term will be the noise term or the variance term. Any questions? And I'm really emphasizing
a lot of times today, but really you can
forget about what I did in the past two
boards, this board and the previous board. Because most of the
times, it will be useless.

And so now I will
describe what you'll do if you're given a stochastic
differential equation, and you have a computer
in front of you. What if such method fails? And it will fail
most of the time. That's when we use
these techniques called finite difference method,
Monte Carlo simulation, or tree method. The finite difference
method, you probably already saw it, if you took a
differential equation course.

But let me review it. This is for PDEs or
ODEs, for ODE, PDE, not stochastic
differential equations. But it can be adapted to work
for stochastic differential equations. So I'll work with an example. Let u of t be u prime
of t plus– u prime of t be u(t) plus 2,
u_0 is equal to 0. Now, this has an exact solution. But let's pretend that
there is no exact solution. And if you want to
do it numerically, you want to find the value of
u equals– u(1) numerically. And here's what
you're going to do. You're going to chop up
the interval from 0 to 1 into very fine pieces. So from 0 to 1, chop it
down into tiny pieces. And since I'm in
front of a blackboard, my tiniest piece will
be 1 over 2 and 1. I'll just take two steps. But you should think
of it as really repeating this a lot of times. I'll call my step to be h. So in my case, I'm increasing my
steps by 1 over 2 at each time.

So what is u of 1 over 2? Approximately, by
Taylor's formula, it's u(0) plus 1/2
times u prime of 0. That's Taylor approximation. OK, u(0) we already know. It's given to be equal to 0. u prime of 0, on the other hand,
is given by this differential equation. So it's 1 over 2 times
5 times u(0) plus 2. u(0) is 0. So we get equal to 1. Like we have this value
equal to 1, approximately. I don't know what happens. But it will be close to 1. And then for the next thing, u1. This one is, again by Taylor
approximation, is u of 1 over 2 plus 1 over 2 u
prime of 1 over 2. And now you know the value of u
of 1 over 2, approximate value, by this. So plug it in. You have 1 plus 1 over 2 and,
again, 5 times u(1/2) plus 2.

If you want to do
the computation, it should give 9 over 2. It's really simple. The key idea here
is just u prime is given by an
equation, this equation. So you can compute it once
you know the value of u at that point. And basically, the method is
saying take h to be very small, like 1 over 100. Then you just repeat
it 1 over 100 times. So the equation is the
i plus 1 step value can be approximated from
the i-th value plus h times u prime of h. Now repeat it and repeat it.

And you reach u of 1. And there is a
theorem saying, again, if the differential
equation is reasonable, then that will approach the
true value as you take h to be smaller and smaller. That's called the
finite difference method for differential equations. And you can do the exact
same thing for two variables, let's say. And what we showed was for one
variable, finite difference method, we want to find the
value of u, function u of t.

We took values at 0, h, 2h, 3h. Using that, we did some
approximation, like that, and found the value. Now, suppose we want
to find, similarly, a two-variable function,
let's say v of t and x. And we want to find
the value of v of 1, 1. Now the boundary
conditions are these. We already know
these boundaries. I won't really show
you by example. But what we're going to do
now is compute this value based on these two variables. So it's just the same.

Taylor expansion
for two variables will allow you to compute this
value from these two values. Then compute this from these
two, this from these two, and just fill out the
whole grid like that, just fill out layer by layer. At some point, you're
going to reach this. And then you'll have an
approximate value of that. So you chop up your
domain into fine pieces and then take the limit. And most cases, it will work. Why does it not work for
stochastic differential equations? Kind of works, but
the only problem is we don't know which
value we're looking at, we're interested in. So let me phrase it a
little bit differently. You're given a differential
equation of the form dX equals mu dt plus t dB of t
and time variable and space variable. Now, if you want to compute your
value at time 2h based on value h, in this picture, I
told you that this point came from these two points. But when it's stochastic, it
could depend on everything. You don't know
where it came from. This point could
have come from here. It could have came from here. It could have came from
here, came from here. You don't really know. But what you know is you have
a probability distribution. So what I'm trying
to say is now, if you want to adapt
this method, what you're going to do is take a
sample Brownian motion path. That means just, according
to the distribution of the Brownian motion, take
one path and use that path. Once we fix a path,
once a path is fixed, we can exactly know where
each value comes from. We know how to backtrack. That means, instead of
all these possibilities, we have one fixed
possibility, like that. So just use that finite
difference method with that fixed path.

That will be the idea. Let me do it a little
bit more formally. And here is how it works. If we have a fixed sample path
for Brownian motion of B(t), then X at time i plus 1 of h
is approximately equal to X at time a of h plus h times
dx at that time i of h, just by the exact
same Taylor expansion. And then d of X we know to
be– that is equal to mu of dt plus– oh, mu of dt is h. So let me write it like that,
sigma times d of X– dB. And these mu depend on
[? their paths, ?] x at i of h dt, sigma… With that, here to here
is Taylor expansion. Here to here I'm going
to use the differential equation d of X is equal
to mu dt plus sigma dB(t). Yes? AUDIENCE: Do we need
that h for [INAUDIBLE]? PROFESSOR: No, we
don't actually.

Oh, yeah, I was–
thank you very much. That was what confused me. Yes, thank you very much. And now we can
compute everything. This one, we're assuming
that we know the value. That one can be computed
from these two coordinates. Because we now have a fixed path
X, we know what X of i*h is. dt, we took it to be
h, approximated as h, or time difference. Again, sigma can be computed. dB now can be computed from B_t. Because we have a
fixed path, again, we know that it's equal to B of i
plus 1 of h minus B of i of h, with this fixed path. They're basically
exactly the same, if you have a fixed
path B. The problem is we don't have a fixed path
B. That's where Monte Carlo simulation comes in. So Monte Carlo simulation
is just a way to draw, from some probability
distribution, a lot of samples. So now, if you know how to
draw samples from the Brownian motions, then what you're going
to do is draw a lot of samples. For each sample, do this to
compute the value of X(0), can compute X of 1.

So, according to a different B,
you will get a different value. And in the end, you'll obtain
a probability distribution. So by repeating the
experiment, that means just redraw the path again
and again, you'll get different values of X of 1. That means you get a
distribution of X of 1, obtain distribution of X of 1. And that's it. And that will approach the
real distribution of X of 1. So that's how you numerically
solve a stochastic differential equation. Again, there's this
finite difference method that can be used to solve
differential equations. But the reason it doesn't apply
to stochastic differential equations is because there's
underlying uncertainty coming from Brownian motion. However, once you fix
a Brownian motion, then you can use that
finite difference method to compute X of 1.

So based on that
idea, you just draw a lot of samples of
the Brownian path, compute a lot of
values of X of 1, and obtain a probability
distribution of X of 1. That's the underlying principle. And, of course, you
can't do it by hand. You need a computer. Then, what is tree method? That's cool. Tree method is
based on this idea.

Remember, Brownian motion is
a limit of simple random walk. This gives you a kind
of approximate way to draw a sample from
Brownian motions. How would you do that? At time 0, you have 0. At time really tiny h,
you'll have plus 1 or minus 1 with same probability. And it goes up or down again,
up or down again, and so on. And you know exactly the
probability distribution. So the problem is that
it ends up here as 1/2, ends up here as 1/2,
1/4, 1/2, 1/4, and so on. So instead of drawing from
this sample path, what you're going to do is just compute
the value of our function at these points. But then the probability
distribution, because we know the
probability distribution that the path will end
up at these points, suppose that you computed
all these values here. I draw too many, 1, 2, 3, 4, 5. This 1 or 32 probability here. 5 choose 1, 5 over 32, 5
choose 2 is 10 over 32. Suppose that some stochastic
process, after following this, has value 1 here, 2 here,
3 here, 4, 5, and 6 here. Then, approximately, if
you take a Brownian motion, it will have 1
with probability 1 over 32, 2 with probability
5 over 32, and so on.

Maybe I didn't
explain it that well. But basically, tree
method just says, you can discretize the outcome
of the Brownian motion, based on the fact that it's a
limit of simple random walk. So just do the exact same method
for simple random walk instead of Brownian motion. And then take it to the limit. That's the principle. Yeah. Yeah, I don't know what's
being used in practice.

But it seems like these two
are the more important ones. This is more like if you
want to do it by hand. Because you can't really do
every single possibility. That makes you only
a finite possibility. Any questions? Yeah. AUDIENCE: So here you
said, by repeating the experiment we
get [INAUDIBLE] distribution for X(1). I was wondering if we could also
get the distribution for not just X(1) but also for X(i*h). PROFESSOR: All the
intermediate values? AUDIENCE: Yeah. PROFESSOR: Yeah,
but the problem is we're taking
different values of h. So h will be
smaller and smaller. But for those
values that we took, yeah, we will get
some distribution. AUDIENCE: Right, so we
might have distributions for X of d for many
different points, right? PROFESSOR: Yeah. AUDIENCE: Yeah. So maybe we could
uh– right, OK.

PROFESSOR: But one thing
you have to be careful is let's suppose you
take h equal 1 over 100. Then, this will give
you a pretty fairly good approximation for X of 1. But it won't give you
a good approximation for X of 1 over 50. So probably you can also get
distribution for X of 1 over 3, 1 over 4. But at some point, the
approximation will be very bad. So the key is to
choose a right h. Because if you pick
h to be too small, you will have a very
good approximation to your distribution. But at the same time, it
will take too much time to compute it. Any remarks from a
more practical side? OK, so that's
actually all I wanted to say about stochastic
differential equations. Really the basic
principle is there is such thing called stochastic
differential equation. It can be solved. But most of the time, it won't
have a closed form formula. And if you want to
do it numerically, here are some possibilities. But I won't go
any deeper inside.

So the last math lecture I will
conclude with heat equation. Yeah. AUDIENCE: The mean
computations of [INAUDIBLE], some of the derivatives are
sort of path-independent, or have path-independent
solutions, so that you
basically are looking at say the distribution
at the terminal value and that determines the
price of the derivative. There are other derivatives
where things really are path-dependent, like
with options where you have early exercise possibilities. When do you exercise,
early or not? Then the tree methods are really
good because at each element of the tree you can condition
on whatever the path was.

So keep that in mind,
that when there's path dependence in
the problem, you'll probably want to use
one of these methods. PROFESSOR: Thanks. AUDIENCE: I know that if
you're trying to break it down into simple random walks you
can only use [INAUDIBLE]. But I've heard of people
trying to use, instead of a binomial, a trinomial tree. PROFESSOR: Yes, so
this statement actually is quite a universal statement. Brownian motion is a limit
of many things, not just simple random walk. For example, if you take
plus 1, 0, or minus 1 and take it to the
limit, that will also converge to the Brownian motion. That will be the
trinomial and so on. And as Peter said,
if you're going to use tree method
to compute something, that will increase accuracy,
if you take more possibilities at each step. Now, there is two
ways to increase the accuracy is take more
possibilities at each step or take smaller time scales. OK, so let's move on to the
final topic, heat equation.

Heat equation is not a
stochastic differential equation, first of all. It's a PDE. That equation is known
as a heat equation where t is like
the time variable, x is like the space variable. And the reason we're
interested in this heat equation in this
course is, if you came to the previous lecture,
maybe from Vasily last week, Black-Scholes equation,
after change of variables, can be reduced to heat equation. That's one reason
we're interested in it. And this is a really,
really famous equation also in physics. So it was known before
Black-Scholes equation. Particularly, this can be
a model for– equations that model this situation. So you have an infinite
bar, very long and thin.

It's perfectly insulated. So heat can only travel
along the x-axis. And then at time 0, you
have some heat distribution. At time 0, you know
the heat distribution. Then this equation tells you the
behavior of how the heat will be distributed at time t. So u of t of x, for fixed
t, will be the distribution of the heat over the x-axis. That's why it's called
the heat equation. That's where the
name comes from. And this equation is
very well understood. It does have a
closed-form solution. And that's what I
want to talk about.

OK, so few observations
before actually solving it. Remark one, if u_1 and u_2
satisfies heat equation, then u_1 plus u_2 also
satisfies, also does. That's called linearity. Just plug it in. And you can figure it out. More generally
that means, if you integrate a family of functions
ds, where u_s all satisfy star, then this also
satisfies star, as long as you use reasonable function.

I'll just assume that we can
switch the order of integration and differentiation. So it's the same thing. Instead of summation,
I'm taking integration of lot of solutions. And why is that helpful? This is helpful
because now it suffices to solve for– what is it? Initial condition, u of t of x
equals delta, delta function, of 0. That one is a little bit subtle. The Dirac delta function is
just like an infinite ass at x equals 0.

It's 0 everywhere else. And basically, in this
example, what you're saying is, at time 0, you're putting
like a massive amount of heat at a single point. And you're observing what's
going to happen afterwards, how this heat will spread out. If you understand that,
you can understand all initial conditions. Why is that? Because if u sub s t, x–
u_0– is such solution, then integration of–
let me get it right– u of t of s minus x ds is a
solution with initial condition x(0, x).

What is it? Sorry about that. So this is really the key. If you have a solution to the
Dirac delta initial condition, then you can superimpose
a lot of those solutions to obtain a solution for
arbitrary initial condition. So this is based on that
principle, because each of them is now a solution. If you superpose this,
then that is a solution. And then if you plug
it in, you figure out that actually it has satisfied
this initial condition. That was my first observation. Second observation,
second remark, is for the initial value u(0, x)
equals a Dirac delta function, u of– is a solution. So we know the solution
for the Dirac delta part. First part, we figured out
that if we know the solution for the Dirac delta
function, then we can solve it for every
single initial value. And for the initial
value Dirac delta, that is the solution that solves
the differential equation. So let me say a few words

Have you seen this
equation before? It's the p.d.f. of
normal distribution. So what does it mean? It means, in this example,
if you have a heat traveling along the x-axis,
perfectly insulated, if you put a massive heat
at this 0, at one point, at time 0, then at
time t your heat will be distributed according
to the normal distribution. In other words, assume that
you have a bunch of particle. Heat is just like a
bunch of particles, say millions of particles
at a single point. And then you grab it. And then time t equals
0 you release it. Now the particle at time t
will be distributed according to a normal distribution. In other words, each particle
is like a Brownian motion. So for particle by
particle, the location of its particle at time t
will be kind of distributed like a Brownian motion. So if you have a massive
amount of particles, altogether their
distribution will look like a normal distribution. That's like its content. So that's also one way you see
the appearance of a Brownian motion inside of this equation.

It's like a bunch of Brownian
motions happening together at the exact same time. And now we can just
write down the solution. Let me be a little
bit more precise. OK, for the heat equation
delta u over delta t, with initial value u of 0,
x equals some initial value, let's say v of x, and t
greater than equal to 0. The solution is
given by integration. u at t of x is equal to e to
the minus– let me get it right. Basically, I'm just combining
this solution to there.

Plugging in that
here, you get that. So you have a explicit
solution, no matter what the initial conditions
are, initial conditions are given as, you can
find an explicit solution at time t for all x. That means, once you change
the Black-Scholes equation into the heat equation, you
now have a closed-form solution for it. In that case, it's like
a backward heat equation. And what will happen is
the initial condition you should think of as
a final payout function. The final payout
function you integrate according to this distribution. And then you get the
value at time t equals 0. The detail, one of
the final project is to actually carry
out all the details. So I will stop here. Anyway, we didn't see how the
Black-Scholes equation actually changed into heat equation. If you want to do
that project, it will be good to
have this in mind. It will help. Any questions? OK, so I think
that's all I have. I think I'll end a
little bit early today. So that will be the last math
lecture for the semester. From now on you'll only
have application lectures. There are great lectures
coming up, I hope, and I know.

So you should come
and really enjoy now. You went through
all this hard work. Now it's time to enjoy.