I want to start this video with the idea of covariance between two random variables. More precisely, covariance is the expected value of the product of the distances of each random variable from its mean, that is, from its expected value. Let me write this down. First I have X (I'll do it in another color): the value of the random variable X minus the expected value of X. You can think of E[X] as the population mean of X. This is multiplied by the value of the random variable Y minus its expected value, the distance from Y to the population mean of Y. Taking the expected value of that product gives the definition: Cov(X, Y) = E[(X - E[X])(Y - E[Y])].

If this doesn't seem intuitive yet, you can always think of it as a game with some numbers. What it really tells you is how much the two variables vary together. For each data point you take both X and Y. Say we have the whole population: each data point is a linked (X, Y) pair, and those are the coordinates you plug in here. Suppose X is above its average and Y is below its average. Here is an example: you take one sample from the population of these random variables and get X = 1 and Y = 3. Suppose we know in advance that the expected value of X is 0 and the expected value of Y is 4. What happens in this situation? We can't compute the full covariance, because we have only one sample of these random variables. But let's look at what happens inside the expected value. We won't calculate the whole expected value; I just want to see what the expression inside it gives for this one sample.

We have 1 minus 0, which is 1, times 3 minus 4, which is -1. So we get 1 times -1, which is -1. What does this tell us? It tells us that, at least for this sample, this time we sampled the random variables X and Y, X was above its expected value while Y was below its expected value. If we kept doing this for the whole population and this pattern held, it would make sense for them to have a negative covariance: when one goes up, the other goes down, and when one goes down, the other goes up. If they both went up, or both went down, they would have a positive covariance. How strongly they move together is what the covariance measures.
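The single-sample arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name `deviation_product` is hypothetical, and the means 0 and 4 are the values assumed in the example.

```python
def deviation_product(x: float, y: float, mean_x: float, mean_y: float) -> float:
    """Product of the deviations of x and y from their (assumed known) means."""
    return (x - mean_x) * (y - mean_y)


# One observation from the example: X = 1, Y = 3, with E[X] = 0 and E[Y] = 4.
product = deviation_product(1, 3, 0, 4)
print(product)  # -1: X is above its mean while Y is below its mean
```

A negative product means the two deviations point in opposite directions; averaging this product over the whole population is what gives the covariance.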

Hopefully this shows you the logic of what covariance is trying to tell us. But the more important thing I want to do in this video is to connect this definition of covariance with everything we did when we found the regression line by the method of least squares. It's a fun bit of math: seeing all these connections, and seeing where the definition of covariance becomes genuinely useful. I think that is largely where it shows up in regression. We have actually seen all of this before; you'll just see it in a different form. For the rest of this video I'll simply rework this definition of covariance.

This is the same as the expected value of ... and I'm going to multiply out these two binomials. First, the expected value of X times Y. Then, doing the X term first, X times the negative of E[Y], which I'll just write as minus X times E[Y]; this negative sign comes from the negative sign here. Then we have minus E[X] times Y ...

We're just expanding the brackets and multiplying. Finally, there is the negative of E[X] times the negative of E[Y]. The two negative signs cancel, so you just get plus E[X] times E[Y]. And of course, we are taking the expected value of this whole thing. Let's see if we can write it differently. The expected value of a sum of random variables, or of sums and differences of random variables, is the sum or difference of their expected values. Remember, in many contexts you can think of the expected value as the arithmetic mean; for a discrete distribution it is a probability-weighted sum, and for a continuous distribution it is a probability-weighted integral. I think we've seen this before. Let me transform the expression. It equals the expected value of X times Y, E[XY]. I'll try to keep the matching colors.

Then we have minus the expected value of X times E[Y]. Then we have minus the expected value of (let me close this bracket) this thing here: E[X] times Y. I know all these nested expected values can look confusing. One way to think about it is that these E's are already values, so you can treat them as numbers. We've used this idea before: we can pull them out of the expected value, because the expected value of an expected value is just that expected value. Let me write this down as a reminder: E[E[X]] = E[X]. Think of it this way: E[X] is the population mean of the random variable, a known, fixed number for the whole population.

So its expected value is just that number itself. If the population mean, the expected value of X, is 5, then the expected value of 5 is 5, which is the same as E[X]. I hope that makes sense; we'll use it in a moment. We're almost there. We've written out these terms and have one term left. The last term is plus the expected value (I'll use big brackets) of this thing here: E[X] times E[Y]. Let's see if we can simplify all of this.
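In symbols, writing E for the expected value, the expansion of the definition and the use of linearity of expectation described above are:

```latex
\begin{aligned}
\operatorname{Cov}(X,Y) &= E\big[(X - E[X])(Y - E[Y])\big] \\
&= E\big[XY - X\,E[Y] - E[X]\,Y + E[X]\,E[Y]\big] \\
&= E[XY] - E\big[X\,E[Y]\big] - E\big[E[X]\,Y\big] + E\big[E[X]\,E[Y]\big].
\end{aligned}
```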

The first term is the expected value of the product of these two random variables, and I'll leave it as it is; I'll just carry over the parts that stay unchanged. So we have E[XY]. Next we have the expected value of X times E[Y]. Going back to what we just said, E[Y] is a number, so we can pull it out: if this were the expected value of 3X, it would be the same as 3 times E[X]. So we can rewrite this term as minus E[Y] times E[X]. In other words, we have simply moved the constant E[Y] outside the expected value. Then we have another minus.

Same thing here: E[X] is a constant, so we can pull it out of this expected value, which gives minus E[X] times E[Y]. All these E's get confusing for everyone. Finally, the expected value of this product of two expected values is simply the product of those two expected values, so we get plus E[X] times E[Y]. What do we have now? We have E[Y] times E[X], and then we subtract E[X] times E[Y]. These two are exactly the same product, just written in a different order. So look at this: we subtract this product twice and then add it back once.

In every case the product is E[Y] times E[X]. We subtract it twice and add it once, so this one cancels with that one (you could just as well pair this one with that one), leaving a single minus E[X]E[Y]. And what is on the left-hand side? The covariance of the two random variables X and Y. So, switching back to my colors because this is the final result: the covariance equals the expected value of the product XY minus E[Y] times E[X], that is, Cov(X, Y) = E[XY] - E[X]E[Y]. You can calculate these expected values if you know the joint probability distribution (or joint density function) of the two random variables, or if you had the whole population that you draw from each time you observe values of these random variables. But let's say you only had a sample of these random variables.
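The simplification above rests on two facts about a constant c: E[c] = c and E[cZ] = c E[Z]. Treating E[X] and E[Y] as constants, the steps can be summarized as:

```latex
\begin{aligned}
\operatorname{Cov}(X,Y)
&= E[XY] - E\big[X\,E[Y]\big] - E\big[E[X]\,Y\big] + E\big[E[X]\,E[Y]\big] \\
&= E[XY] - E[Y]\,E[X] - E[X]\,E[Y] + E[X]\,E[Y] \\
&= E[XY] - E[X]\,E[Y].
\end{aligned}
```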

How would you estimate these expected values from a sample? Say you have a set of points, several (x, y) coordinates. I think you'll start noticing how this connects to what we do in regression. The expected value of X times Y can be approximated by the sample mean of the products xy: you take each (x, y) pair, compute its product, and then average all of them. Then E[Y] can be approximated by the sample mean of y, and E[X] by the sample mean of x.
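Here is a minimal Python sketch of that sample estimate: the mean of the products xy minus the product of the means, checked against the mean of the products of deviations. The data points and function names are made up for illustration. Like the video, it divides by n; the usual unbiased sample covariance divides by n - 1 instead.

```python
from statistics import fmean


def covariance_estimate(xs: list[float], ys: list[float]) -> float:
    """Estimate Cov(X, Y) as mean(x*y) - mean(x) * mean(y), dividing by n."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    mean_xy = fmean(x * y for x, y in zip(xs, ys))
    return mean_xy - fmean(xs) * fmean(ys)


def covariance_from_deviations(xs: list[float], ys: list[float]) -> float:
    """Same quantity from the definition: mean of (x - mean_x) * (y - mean_y)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical sample of (x, y) points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 4.0]
print(covariance_estimate(xs, ys))         # 0.875
print(covariance_from_deviations(xs, ys))  # 0.875
```

Both forms agree because they are algebraically the same identity derived above, just with sample means in place of expected values.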

So how can the covariance of two random variables be estimated? It is approximately the sample mean of their product minus the sample mean of y times the sample mean of x. This should start to look familiar. Why? It is exactly the numerator we had when we found the slope of the least-squares regression line. Let me rewrite that formula here to remind you: the numerator was literally the mean of the products xy over all our points, minus the mean of all y times the mean of all x.

All of that is divided by the mean of x squared (you can think of it as the mean of x times x over all x, but I'll just write it as x squared) minus the square of the mean of x. That's how we found the slope of our regression line. Perhaps a better way to think about it: assume the points in our regression are a sample from a whole universe of possible points; then we are estimating the slope of the population regression line. That's why many books put a little mark that looks like a hat over the slope. Don't let that confuse you; it just indicates that you are estimating the population regression line from one of its samples. Now, everything we have just learned shows that this numerator is the covariance of X and Y, or more precisely an estimate of it. What about the denominator? As I just said, you can rework it very easily: the first part is the mean of x times x, which is simply the mean of x squared, and the second part is the mean of x times the mean of x.
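A minimal sketch of the slope formula just described, applied to hypothetical data. The function name is mine, not from the video; it computes (mean of xy - mean of x times mean of y) divided by (mean of x squared - square of the mean of x).

```python
from statistics import fmean


def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Slope m = (mean(xy) - mean(x)mean(y)) / (mean(x^2) - mean(x)^2)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    mean_xy = fmean(x * y for x, y in zip(xs, ys))
    mean_x_sq = fmean(x * x for x in xs)
    denominator = mean_x_sq - mean_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    return (mean_xy - mean_x * mean_y) / denominator


# Hypothetical sample of (x, y) points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 4.0]
print(least_squares_slope(xs, ys))  # 0.7
```

For points that lie exactly on a line, such as y = 3x + 1, the function returns that line's slope, 3.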

The second part of that denominator, the mean of x times the mean of x, is just the square of the mean of x. So what is the whole denominator? You can think of it as an estimate of the covariance of X with X. We've seen this before; I showed it to you many videos ago, when we first learned what covariance is. The covariance of a random variable with itself is just the variance of that random variable. You can check it yourself: if you replace Y with X in the formula for Cov(X, Y), it becomes the expected value of (X - E[X]) times (X - E[X]).

That is the expected value of (X - E[X]) squared, which is the definition of variance. So another way to think about the slope of our regression line is that it is literally the covariance of our two random variables divided by the variance of X, where X plays the role of the independent variable. With sample data, both are estimates: the estimated covariance divided by the estimated variance. This is the slope of our regression line. I think that was interesting. I wanted to connect ideas you see in different parts of statistics and show you that they really are connected.