After seeing how we think about ordinary differential

equations in chapter 1, we turn now to an example of a partial differential equation,

the heat equation. To set things up, imagine you have some object

like a piece of metal, and you know how the heat is distributed across it at one moment;

what the temperature of every individual point is. You might think of that temperature here as

being graphed over the body.

The question is, how will that distribution

change over time, as heat flows from the warmer spots to the cooler ones. The image on the left shows the temperature

of an example plate with color, with the graph of that temperature being shown on the right,

both changing with time. To take a concrete 1d example, say you have

two rods at different temperatures, where that temperature is uniform on each one.

You know that when you bring them into contact,

the temperature will tend towards being equal throughout the rod, but how exactly? What will the temperature distribution be

at each point in time? As is typical with differential equations,

the idea is that it’s easier to describe how this setup changes from moment to moment

than it is to jump to a description of the full evolution. We write this rule of change in the language

of derivatives, though as you’ll see we’ll need to expand our vocabulary a bit beyond

ordinary derivatives. Don’t worry, we’ll learn how to read these

equations in a minute.

Variations of the heat equation show up in

many other parts of math and physics, like Brownian motion, the Black-Scholes equations

from finance, and all sorts of diffusion, so there are many dividends to be had from

a deep understanding of this one setup. In the last video, we looked at ways of building

understanding while acknowledging the truth that most differential equations are too difficult

to actually solve. And indeed, PDEs tend to be even harder than

ODEs, largely because they involve modeling infinitely many values changing in concert. But our main character now is an equation

we actually can solve. In fact, if you’ve ever heard of Fourier

series, you may be interested to know that this is the physical problem which baby-faced

Fourier over here was solving when he stumbled across the corner of math now so replete with

his name.

We’ll dig much more deeply into Fourier

series in the next chapter, but I would like to give at least a little hint of the beautiful

connection which is to come. This animation is showing how lots of little

rotating vectors, each rotating at some constant integer frequency, can trace out an arbitrary

shape. To be clear, what’s happening is that these

vectors are being added together, tip to tail, and you might imagine the last one as having

a pencil at its tip, tracing some path as it goes. This tracing usually won’t be a perfect

replica of the target shape, in this animation a lower case letter f, but the more circles

you include, the closer it gets. This animation uses only 100 circles, and

I think you’d agree the deviations from the real path are negligible. Tweaking the initial size and angle of each

vector gives enough control to approximate any curve you want.
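To make the idea concrete, here is a rough Python sketch of adding rotating vectors tip to tail. It is only an illustration, not the code behind the animation: the names `fourier_coefficients`, `trace`, and `square_path`, the choice of a square as the target shape, and the simple Riemann-sum estimate of each vector's initial size and angle are all my own assumptions.

```python
import cmath
import math

def square_path(t):
    """Hypothetical target shape: the perimeter of a square, traced once for t in [0, 1)."""
    corners = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
    s = (t % 1.0) * 4
    i = int(s)
    frac = s - i
    a, b = corners[i], corners[(i + 1) % 4]
    return a + (b - a) * frac

def fourier_coefficients(samples, max_freq):
    """Estimate one complex number per integer frequency n in [-max_freq, max_freq].
    Its magnitude is the vector's size and its argument is the vector's initial angle."""
    count = len(samples)
    coeffs = {}
    for n in range(-max_freq, max_freq + 1):
        total = 0j
        for k, z in enumerate(samples):
            total += z * cmath.exp(-2j * math.pi * n * k / count)
        coeffs[n] = total / count
    return coeffs

def trace(coeffs, t):
    """Add all the rotating vectors tip to tail at time t; the result is the pencil's position."""
    return sum(c * cmath.exp(2j * math.pi * n * t) for n, c in coeffs.items())

N = 400
samples = [square_path(k / N) for k in range(N)]

def max_error(max_freq):
    """Largest distance between the traced path and the target at the sample times."""
    coeffs = fourier_coefficients(samples, max_freq)
    return max(abs(trace(coeffs, k / N) - samples[k]) for k in range(N))

err_few = max_error(2)    # frequencies -2..2, so 5 vectors
err_many = max_error(50)  # frequencies -50..50, so 101 vectors
print(err_few, err_many)
```

Running this, the error with 101 vectors should come out noticeably smaller than with 5, which is the "more circles, closer fit" behavior described above.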

At first, this might just seem like an idle

curiosity; a neat art project but little more. In fact, the math underlying this is the same

as the math describing the physics of heat flow, as you’ll see in due time. But we’re getting ahead of ourselves. Step one is to build up to the heat equation,

and for that let’s be clear on what the function we’re analyzing is, exactly.

The heat equation

To be clear about what this graph represents,

we have a rod in one dimension, and we’re thinking of it as sitting on an x-axis, so

each point of the rod is labeled with a unique number, x. The temperature is some function of that position

number, T(x), shown here as a graph above it. But really, since this value changes over

time, we should think of this function as having one more input, t for time. You could, if you wanted, think of the input

space as a two-dimensional plane, representing space and time, with the temperature being

graphed as a surface above it, each slice across time showing you what the distribution

looks like at a given moment.

Or you could simply think of the graph of

the temperature changing over time. Both are equivalent. This surface is not to be confused with what

I was showing earlier, the temperature graph of a two-dimensional body. Be mindful of whether time is being represented

with its own axis, or if it’s being represented with an animation showing literal changes

over time. Last chapter, we looked at some systems where

just a handful of numbers changed over time, like the angle and angular velocity of a pendulum,

describing that change in the language of derivatives. But when we have an entire function changing

with time, the mathematical tools become slightly more intricate. Because we’re thinking of this temperature

as a function with multiple dimensions to its input space, in this case, one for space

and one for time, there are multiple different rates of change at play.

There’s the derivative with respect to x;

how rapidly the temperature changes as you move along the rod. You might think of this as the slope of our

surface when you slice it parallel to the x-axis; given a tiny step in the x-direction,

and the tiny change to temperature caused by it, what’s the ratio. Then there’s the rate of change with time,

which you might think of as the slope of this surface when we slice it in a direction parallel

to the time axis. Each one of these derivatives only tells part

of the story for how the temperature function changes, so we call them “partial derivatives”.
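If it helps to see the two slopes as literal ratios, here is a small Python sketch. The temperature function `T(x, t) = sin(x) * exp(-t)` is a made-up example chosen only because its exact slopes are easy to write down, and the helper names `ratio_x` and `ratio_t` are likewise my own.

```python
import math

def T(x, t):
    # Hypothetical temperature function, chosen only for illustration.
    return math.sin(x) * math.exp(-t)

def ratio_x(x, t, dx):
    """Tiny change in temperature divided by the tiny step in x that caused it (t held fixed)."""
    return (T(x + dx, t) - T(x, t)) / dx

def ratio_t(x, t, dt):
    """Tiny change in temperature divided by the tiny step in t that caused it (x held fixed)."""
    return (T(x, t + dt) - T(x, t)) / dt

x0, t0 = 1.0, 0.5

# Exact slopes of this particular example function, for comparison.
exact_dx = math.cos(x0) * math.exp(-t0)
exact_dt = -math.sin(x0) * math.exp(-t0)

for nudge in (0.1, 0.01, 0.001):
    print(nudge, ratio_x(x0, t0, nudge), ratio_t(x0, t0, nudge))
print("exact:", exact_dx, exact_dt)
```

The two ratios are computed from the same function but answer different questions, one per input, and each settles toward its exact slope as the nudge shrinks.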

To emphasize this point, the notation changes

a little, replacing the letter d with this special curly d, sometimes called “del”. Personally, I think it’s a little silly

to change the notation for this since it’s essentially the same operation. I’d rather see notation which emphasizes

that the del T terms in these numerators refer to different changes. One refers to a small change to temperature

after a small change in time, the other refers to the change in temperature after a small

step in space. To reiterate a point I made in the calculus

series, I do think it's healthy to initially read derivatives like this as a literal ratio

between a small change to a function's output, and the small change to the input that caused

it. Just keep in mind that what this notation

is meant to convey is the limit of that ratio for smaller and smaller nudges to the input,

rather than for some specific finitely small nudge. This goes for partial derivatives just as

it does for ordinary derivatives. The heat equation is written in terms of these partial derivatives.

It tells us that the way this function changes with respect to time depends on how it changes with respect to space. More specifically, it's proportional to the second partial derivative with respect to x. At a high level, the intuition is that at

points where the temperature distribution curves, it tends to change in the direction

of that curvature. Since a rule like this is written with partial

derivatives, we call it a partial differential equation. This has the funny result that to an outsider,

the name sounds like a tamer version of ordinary differential equations when, to the contrary,

partial differential equations tend to tell a much richer story than ODEs. The general heat equation applies to bodies

in any number of dimensions, which would mean more inputs to our temperature function, but

it’ll be easiest for us to stay focused on the one-dimensional case of a rod. As it is, graphing this in a way which gives

time its own axis already pushes the visuals into three dimensions.
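For reference, here is the one-dimensional heat equation described above written out in symbols. The letter alpha for the proportionality constant is a common convention, and T(x, t) is the temperature at position x and time t.

```latex
\frac{\partial T}{\partial t}(x, t) = \alpha \, \frac{\partial^2 T}{\partial x^2}(x, t)
```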

But where does an equation like this come

from? How could you have thought this up yourself? Well, for that, let’s simplify things by

describing a discrete version of this setup, where you have only finitely many points x

in a row. This is sort of like working in a pixelated

universe, where instead of having a continuum of temperatures, we have a finite set of separate

values. The intuition here is simple: For a particular

point, if its two neighbors on either side are, on average, hotter than it is, it will

heat up.

If they are cooler on average, it will cool

down. Focus on three neighboring points, x1, x2,

and x3, with corresponding temperatures T1, T2, and T3. What we want to compare is the average of

T1 and T3 with the value of T2. When this difference is greater than 0, T2

will tend to heat up. And the bigger the difference, the faster

it heats up. Likewise, if it’s negative, T2 will cool

down, at a rate proportional to the difference. More formally, the derivative of T2, with

respect to time, is proportional to this difference between the average value of its neighbors

and its own value. Alpha, here, is simply a proportionality constant. To write this in a way that will ultimately

explain the second derivative in the heat equation, let me rearrange this right-hand

side in terms of the difference between T3 and T2 and the difference between T2 and T1. You can quickly check that these two are the

same.
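Since the equations themselves appear only on screen, here are the two forms being compared, reconstructed from the description: the top line uses the average of the neighbors, the bottom line uses the two differences.

```latex
\begin{aligned}
\frac{dT_2}{dt} &= \alpha \left( \frac{T_1 + T_3}{2} - T_2 \right) \\
\frac{dT_2}{dt} &= \frac{\alpha}{2} \Bigl( (T_3 - T_2) - (T_2 - T_1) \Bigr)
\end{aligned}
```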

The top has half of T1, and in the bottom,

there are two minuses in front of the T1, so it’s positive, and that half has been

factored out. Likewise, both have half of T3. Then on the bottom, we have a negative T2

effectively written twice, so when you take half, it’s the same as the single -T2 up

top. As I said, the reason to rewrite it is that

it takes a step closer to the language of derivatives.

Let’s write these as delta-T1 and delta-T2. It’s the same number, but we’re adding

a new perspective. Instead of comparing the average of the neighbors

to T2, we’re thinking of the difference of the differences. Here, take a moment to gut-check that this

makes sense. If those two differences are the same, then

the average of T1 and T3 is the same as T2, so T2 will not tend to change. If delta-T2 is bigger than delta-T1, meaning

the difference of the differences is positive, notice how the average of T1 and T3 is bigger

than T2, so T2 tends to increase. Likewise, if the difference of the differences

is negative, meaning delta-T2 is smaller than delta-T1, it corresponds to the average of

these neighbors being less than T2. This is known in the lingo as a “second

difference”. If it feels a little weird to think about,

keep in mind that it’s essentially a compact way of writing this idea of how much T2 differs

from the average of its neighbors; the two differ only by a factor of 1/2. That factor doesn’t really matter, because

either way we’re writing our equation in terms of some proportionality constant.
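Here is a quick numerical gut-check of that claim, as a minimal Python sketch. The function names and the three sample triples of temperatures are made up for illustration.

```python
def average_difference(t1, t2, t3):
    """How much the average of the two neighbors exceeds the middle value."""
    return (t1 + t3) / 2 - t2

def second_difference(t1, t2, t3):
    """The difference of the differences: delta-T2 minus delta-T1."""
    delta_t1 = t2 - t1
    delta_t2 = t3 - t2
    return delta_t2 - delta_t1

# Hypothetical temperatures (T1, T2, T3) for three neighboring points.
examples = [
    (10.0, 20.0, 30.0),  # evenly spaced: both differences match, so no change
    (10.0, 12.0, 30.0),  # neighbors hotter on average: T2 heats up
    (30.0, 25.0, 10.0),  # neighbors cooler on average: T2 cools down
]

for t1, t2, t3 in examples:
    avg = average_difference(t1, t2, t3)
    second = second_difference(t1, t2, t3)
    print((t1, t2, t3), "average minus T2:", avg, "second difference:", second)
```

In each case the second difference is twice the gap between the neighbors' average and T2, so the two always share a sign and can be traded for one another by adjusting the constant.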

The upshot is that the rate of change for

the temperature of a point is proportional to the second difference around it. As we go from this finite context to the infinite

continuous case, the analog of a second difference is the second derivative. Instead of looking at the difference between

temperature values at points some fixed distance apart, you consider what happens as you shrink

the size of that step towards 0. And in calculus, instead of asking about absolute

differences, which would approach 0, you think in terms of the rate of change, in this case,

what’s the rate of change in temperature per unit distance. Remember, there are two separate rates of

change at play: How does the temperature change as time progresses, and how does the temperature

change as you move along the rod.
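One standard way to write this passage from second difference to second derivative, using h for the step size (a symbol I am introducing here), is the following limit. The numerator is the difference of the differences, and dividing by h twice turns each difference into a rate per unit distance.

```latex
\frac{\partial^2 T}{\partial x^2}(x, t)
  = \lim_{h \to 0}
    \frac{\bigl( T(x + h, t) - T(x, t) \bigr) - \bigl( T(x, t) - T(x - h, t) \bigr)}{h^2}
```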

The core intuition remains the same as what

we just looked at for the discrete case: To know how a point differs from its neighbors,

look not just at how the function changes from one point to the next, but at how that

rate of change changes. This is written as del^2 T / del-x^2, the

second partial derivative of our function with respect to x. Notice how this slope increases at points

where the graph curves upwards, meaning the rate of change of the rate of change is positive. Similarly, that slope decreases at points

where the graph curves downward, where the rate of change of the rate of change is negative. Tuck that away as a meaningful intuition for

problems well beyond the heat equation: Second derivatives give a measure of how a value

compares to the average of its neighbors. Hopefully, that gives some satisfying added

color to this equation. It’s pretty intuitive when reading it as

saying curved points tend to flatten out, but I think there’s something even more

satisfying about seeing a partial differential equation arise, almost mechanistically, from thinking

of each point as tending towards the average of its neighbors.

Take a moment to compare what this feels like

to the case of ordinary differential equations. For example, if we have multiple bodies in

space, tugging on each other with gravity, we have a handful of changing numbers: The

coordinates for the position and velocity of each body. The rate of change for any one of these values

depends on the values of the other numbers, which we write down as a system of equations. On the left, we have the derivatives of these

values with respect to time, and the right is some combination of all these values. In our partial differential equation, we have

infinitely many values from a continuum, all changing.

And again, the way any one of these values

changes depends on the other values. But helpfully, each one only depends on its

immediate neighbors, in some limiting sense of the word neighbor. So here, the relation on the right-hand side

is not some sum or product of the other numbers, it’s also a kind of derivative, just a derivative

with respect to space instead of time. In a sense, this one partial differential

equation is like a system of infinitely many equations, one for each point on the rod. When your object is spread out in more than

one dimension, the equation looks quite similar, but you include the second derivative with

respect to the other spatial directions as well. Adding all the spatial second derivatives

like this is a common enough operation that it has its own special name, the “Laplacian”,

often written as an upside-down triangle squared.
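In symbols, for a body in three dimensions with coordinates x, y, and z, and with alpha again standing for the proportionality constant, the equation would read:

```latex
\frac{\partial T}{\partial t}
  = \alpha \left(
      \frac{\partial^2 T}{\partial x^2}
    + \frac{\partial^2 T}{\partial y^2}
    + \frac{\partial^2 T}{\partial z^2}
    \right)
  = \alpha \, \nabla^2 T
```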

It’s essentially a multivariable version

of the second derivative, and the intuition for this equation is no different from the

1d case: This Laplacian still can be thought of as measuring how different a point is from

the average of its neighbors, but now these neighbors aren’t just to the left and right,

they’re all around. I did a couple of simple videos during my

time at Khan Academy on this operator, if you want to check them out. For our purposes, let’s stay focused on

one dimension. If you feel like you understand all this,

pat yourself on the back. Being able to read a PDE is no joke, and it’s

a powerful addition to your vocabulary for describing the world around you. But after all this time spent interpreting

the equations, I say it’s high time we start solving them, don’t you? And trust me, there are few pieces of math

quite as satisfying as what poodle-haired Fourier over here developed to solve this

problem. All this and more in the next chapter. I was originally inspired to cover this particular

topic when I got an early view of Steve Strogatz’s new book “Infinite Powers”.

This isn’t a sponsored message or anything

like that, but all cards on the table, I do have two selfish ulterior motives for mentioning

it. The first is that Steve has been a really

strong, perhaps even pivotal, advocate for the channel since its beginnings, and I’ve

had the itch to repay the kindness for quite a while. The second is to make more people love math. That might not sound selfish, but think about

it: When more people love math, the potential audience base for these videos gets bigger. And frankly, there are few better ways to

get people loving the subject than to expose them to Strogatz’s writing. If you have friends who you know would enjoy

the ideas of calculus, but maybe have been intimidated by math in the past, this book

really does an outstanding job communicating the heart of the subject both substantively

and accessibly.

Its core theme is the idea of constructing

solutions to complex real-world problems from simple idealized building blocks, which as

you’ll see is exactly what Fourier did here. And for those who already know and love the

subject, you will still find no shortage of fresh insights and enlightening stories. Again, I know that sounds like an ad, but

it’s not. I actually think you’ll enjoy the book.