WEBVTT - autoGenerated
00:00:00.000 --> 00:00:08.000
Okay, for this covariance matrix, I have introduced a separate symbol, sigma x. Sigma x denotes
00:00:08.000 --> 00:00:16.000
a K by K matrix, which is then called the covariance matrix with, as I said, the variances
00:00:16.000 --> 00:00:23.000
on the main diagonal and the property that sigma x is symmetric.
00:00:23.000 --> 00:00:32.000
We stopped last lecture at this slide here, where I would like to draw your attention
00:00:32.000 --> 00:00:41.000
to a particularly important distinction in notation, and actually also in real matters,
00:00:41.000 --> 00:00:48.000
because you always have to distinguish between the random variables, or in this case, a vector
00:00:48.000 --> 00:00:54.000
of possibly different random variables, which we have denoted by x. I write it here
00:00:54.000 --> 00:01:00.000
as a row vector transpose, so it is a column vector again, right? So this is the same vector
00:01:00.000 --> 00:01:08.000
x, which we have just looked at. x is a column vector consisting of the components x1, x2,
00:01:08.000 --> 00:01:16.000
and so forth, until xk. So this is a vector of random variables, meaning we do not yet
00:01:16.000 --> 00:01:25.000
know which particular value a random variable will assume as the result of some random event,
00:01:25.000 --> 00:01:35.000
These random variables are theoretical constructs. We never observe
00:01:35.000 --> 00:01:43.000
a random variable in the real world. What we observe are single observations or observations
00:01:43.000 --> 00:01:50.000
on single random variables after the random variable has taken on this particular value.
00:01:50.000 --> 00:01:59.000
So after a random event has occurred, the observation is the outcome of the random variable.
00:01:59.000 --> 00:02:05.000
So for instance, what we may have is that we just look at one single random variable
00:02:05.000 --> 00:02:12.000
called capital Y, which would not be a vector in this case, but say a scalar random variable,
00:02:12.000 --> 00:02:20.000
and we then have n observations, or sometimes I would say n realizations, of this random
00:02:20.000 --> 00:02:29.000
variable capital Y. Maybe the random variable capital Y generates a random event, a random
00:02:29.000 --> 00:02:38.000
observation, in each period of time, which we observe. Or we have n different units of
00:02:38.000 --> 00:02:46.000
observations, say n different people, and we say the random variable capital Y takes on a different
00:02:46.000 --> 00:02:56.000
value for each of the people. So for instance, let's assume that Y is the highest degree in
00:02:56.000 --> 00:03:02.000
education that a particular person has obtained during his life. So this may be a high school
00:03:02.000 --> 00:03:09.000
diploma as the highest degree of education, or it may be university education at the master's level,
00:03:09.000 --> 00:03:13.000
which is the highest degree of education, or perhaps the doctorate degree, or perhaps there's
00:03:13.000 --> 00:03:23.000
no education at all. So we may say that Y is the random variable education in general. And then we
00:03:23.000 --> 00:03:29.000
have different people, which we observe, let's say n people. And for each of these n people,
00:03:29.000 --> 00:03:38.000
we record the highest degree of education which this particular person has obtained. So we can
00:03:38.000 --> 00:03:47.000
then think of Y1, Y2, up to Yn, small y's in this case, as the particular outcomes of a random
00:03:47.000 --> 00:03:54.000
variable capital Y for n different experiments, for n different persons, which we see.
00:03:54.000 --> 00:04:03.000
But for the time being, I will always denote the observations or realizations of a random
00:04:03.000 --> 00:04:12.000
variable by small letters, like here Y1, Y2, Yn. I collect them all in a column vector again. So
00:04:12.000 --> 00:04:16.000
this here is a row vector, but here's the transpose. So it's a column vector just for
00:04:16.000 --> 00:04:21.000
notational convenience, to save space on my slide. I denote this as a row vector and then with a prime,
00:04:21.000 --> 00:04:26.000
but it's still a column vector. This Y here is also a vector of observations,
00:04:27.000 --> 00:04:34.000
unlike this X here, which was a vector of different random variables. Let's say X1 can
00:04:34.000 --> 00:04:43.000
be education, as we had it down here, but X2, X3, X4 up to Xk
00:04:43.000 --> 00:04:50.000
can be completely different random variables. For instance, there can be income here, or there can
00:04:50.000 --> 00:04:55.000
be savings, or there can be the wage, or whatever variables we may be interested in. We may here
00:04:55.000 --> 00:05:02.000
collect different random variables in one vector, whereas here we collect different observations on
00:05:02.000 --> 00:05:08.000
the same random variable in a vector of realizations, in a vector of observations.
00:05:08.000 --> 00:05:13.000
And it is very important that you understand this distinction between the concept of a random
00:05:13.000 --> 00:05:21.000
variable, which has not yet taken on its realization, and the outcome of a random event,
00:05:21.000 --> 00:05:30.000
which a random variable capital Y has generated. Now I will show you why it is important to
00:05:30.000 --> 00:05:37.000
properly think about these concepts and distinguish them well in terms of notation,
00:05:37.000 --> 00:05:44.000
because this can be a source of confusion, as I will tell you now. Suppose we have these
00:05:44.000 --> 00:05:52.000
observations here on a single random variable capital Y. And for ease of exposition, let us
00:05:52.000 --> 00:05:59.000
also assume that the observations have already been mean adjusted. So I have already subtracted
00:05:59.000 --> 00:06:07.000
the mean of all the ys from each single observation yi. I do not introduce separate
00:06:07.000 --> 00:06:14.000
notation for that, so we will just say the Y1, Y2, Yn have already been mean adjusted. So the
00:06:14.000 --> 00:06:19.000
mean has already been subtracted from all the original observations that we've made.
00:06:22.000 --> 00:06:28.000
If this is the case, then we know that the mean of all the observations, which I denote by Y bar,
00:06:28.000 --> 00:06:36.000
is equal to zero. The mean of all the observations is the sum of all the observations that we have
00:06:36.000 --> 00:06:46.000
made, so the index i in the sum runs from 1 to n. If I sum all the observations in this vector y
00:06:46.000 --> 00:06:52.000
and divide by the number of observations, so here's the factor 1 over n in front of the sum,
00:06:53.000 --> 00:06:59.000
then obviously the whole thing is zero because I have already done mean adjustment.
00:07:00.000 --> 00:07:07.000
In fact, I don't need to divide by n. Already the sum of the yi's will of course be equal to zero.
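As a small numerical illustration of this mean adjustment, here is a sketch in Python; the data values are made up purely for illustration:

```python
import numpy as np

# Hypothetical raw observations y1, ..., yn on a single random variable Y.
y_raw = np.array([2.0, 5.0, 1.0, 4.0, 3.0])

# Mean adjustment: subtract the sample mean y-bar from every observation.
y = y_raw - y_raw.mean()

# After adjustment both the mean and the plain sum of the yi are zero.
print(y.mean())  # 0.0
print(y.sum())   # 0.0
```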
00:07:08.000 --> 00:07:16.000
Now why do I do this? I do this because I actually want to talk about the variances of the Ys.
00:07:16.000 --> 00:07:24.000
In particular, I want to speak about now an estimator for the variance of the random variable
00:07:24.000 --> 00:07:33.000
Y. So we also always have to distinguish between the sample variance and the variance of the random
00:07:33.000 --> 00:07:39.000
variable. The sample variance is the variance which I can compute from the sample, from the
00:07:39.000 --> 00:07:46.000
observations that I have taken. The sample variance is in general not the same thing
00:07:46.000 --> 00:07:55.000
as the variance of the random variable, which is a theoretical construct. So what we think is that
00:07:56.000 --> 00:08:02.000
the way we have defined a random variable is that it has a variance, which is the expectation of
00:08:03.000 --> 00:08:09.000
the random variable Y minus its mean and this thing squared, expectation of the squared term,
00:08:10.000 --> 00:08:21.000
and this is in general different from the sample variance which we observe. But the sample variance
00:08:21.000 --> 00:08:30.000
is a common estimator for the theoretical construct of a variance of the random variable Y.
00:08:30.000 --> 00:08:36.000
So what we may do is that we compute this estimator, for instance in the form
00:08:37.000 --> 00:08:48.000
sigma y hat squared, which would be the mean of the squared yi's, that is, one over n times the sum of the squared yi's.
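This estimator can be written out directly; a minimal sketch with made-up, already mean-adjusted numbers:

```python
import numpy as np

# Mean-adjusted observations (made-up numbers; their mean is zero).
y = np.array([-1.0, 2.0, -2.0, 1.0, 0.0])
n = len(y)

# Estimator of the variance of Y: the average of the squared observations,
# i.e. (1/n) * sum_i yi^2.
sigma2_hat = (y ** 2).sum() / n
print(sigma2_hat)  # 2.0
```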
00:08:50.000 --> 00:08:57.000
Why do we do this? Well, you remember that the variance is actually a measure of the error we
00:08:58.000 --> 00:09:06.000
make when we observe the random variable. Let's say a random variable actually has a mean of
00:09:06.000 --> 00:09:14.000
zero, so the theoretical mean of Y, the expectation of Y, is equal to zero, and now we observe
00:09:14.000 --> 00:09:21.000
realizations of the random variable Y, which typically are different from zero. They are
00:09:21.000 --> 00:09:30.000
somehow distributed around the expectation of Y usually, so that we have some type of density
00:09:30.000 --> 00:09:41.000
function where zero is the expected value and the observations are located more or less close to the
00:09:42.000 --> 00:09:47.000
mean, to the expected value of zero. So some of the observations will be positive and other
00:09:47.000 --> 00:09:56.000
observations will be negative, and we can say that each single observation is in some way the error
00:09:56.000 --> 00:10:04.000
in estimating the mean of Y, because each single observation is not exactly zero, as we would
00:10:04.000 --> 00:10:13.000
expect Y to take on the value zero on average, but it is either positive or it is negative,
00:10:13.000 --> 00:10:22.000
depending on what we observe. So in this sense, you can think of the yi's as a collection
00:10:22.000 --> 00:10:33.000
of errors in estimating the expected value of Y. Now you know that if we just add up all the errors,
00:10:33.000 --> 00:10:39.000
then we will get zero, which is not informative. We are interested when we compute the sample
00:10:39.000 --> 00:10:50.000
variance in the average error that we make when we observe the random variable Y in n different
00:10:50.000 --> 00:10:58.000
realizations. And in order to prevent that we just add up positive and negative errors,
00:10:58.000 --> 00:11:04.000
you know that we take the square of the error. So all of them are positive values,
00:11:05.000 --> 00:11:13.000
measuring now the squared error. What we do here is that we take the mean of all the squared errors,
00:11:13.000 --> 00:11:23.000
so that we have an estimate of what type of squared error is made on average when we observe
00:11:23.000 --> 00:11:32.000
the realizations Y1, Y2 to Yn. So this thing here is an estimator for the variance of Y.
00:11:33.000 --> 00:11:39.000
You may know that this estimator is not a particularly good estimator because it is
00:11:40.000 --> 00:11:46.000
biased. We should actually divide by n minus one here rather than by n, so we should divide by the
00:11:46.000 --> 00:11:53.000
degrees of freedom rather than by the number of observations. We will come back to this issue later in the econometrics
00:11:53.000 --> 00:12:03.000
part of this lecture, yet still sigma Y squared hat, which I have defined here as one over n times
00:12:03.000 --> 00:12:10.000
sum of the squared small yi's, is an estimator for the variance of Y even though it is biased.
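The 1 over n versus 1 over n minus 1 point can be checked directly; NumPy's `var` exposes both conventions through its `ddof` parameter (data made up for illustration):

```python
import numpy as np

y = np.array([-1.0, 2.0, -2.0, 1.0, 0.0])  # made-up mean-adjusted observations
n = len(y)

biased   = np.var(y, ddof=0)  # divides by n     (the estimator above)
unbiased = np.var(y, ddof=1)  # divides by n - 1 (degrees-of-freedom version)

print(biased, unbiased)      # 2.0 2.5
# The two differ only by the factor n / (n - 1), which tends to 1 as n grows.
print(biased * n / (n - 1))  # 2.5
```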
00:12:10.000 --> 00:12:17.000
It is actually an estimator whose bias vanishes when n becomes very large because then the
00:12:17.000 --> 00:12:23.000
difference between one over n and one over n minus one is not so big anymore. So asymptotically,
00:12:23.000 --> 00:12:29.000
this estimator here is what we call consistent. We will come to this concept also in the econometrics
00:12:30.000 --> 00:12:36.000
part. What I just would like to draw your attention to is that this expression here,
00:12:36.000 --> 00:12:43.000
one over n times the sum over the squared yi's, is written in a slightly more compact way
00:12:44.000 --> 00:12:53.000
in vector notation as one over n times y prime y. So we can just take the vector of observations y,
00:12:53.000 --> 00:13:00.000
which as you recall is a column vector, and transpose it so that it
00:13:00.000 --> 00:13:05.000
becomes a row vector, and then multiply it with the column vector itself.
00:13:07.000 --> 00:13:11.000
This expression here, and if you are not so familiar with matrix algebra,
00:13:11.000 --> 00:13:17.000
then please make sure that you understand this. This expression y prime y is exactly the same
00:13:17.000 --> 00:13:26.000
thing as the sum here, which goes from one to capital N over yi squared. And you may perhaps
00:13:26.000 --> 00:13:32.000
think or believe that this y prime y is more compact. It looks actually less complicated
00:13:32.000 --> 00:13:41.000
perhaps than this sum here. I will very often write the sum of squared components of a vector
00:13:41.000 --> 00:13:47.000
just as y prime y as something which we call the inner product of a vector, and then
00:13:47.000 --> 00:13:52.000
often multiply by one over n, or, which is the same thing, by n inverse.
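The identity between the inner product and the sum of squares is easy to verify numerically; a sketch with made-up data:

```python
import numpy as np

y = np.array([-1.0, 2.0, -2.0, 1.0, 0.0])  # made-up column vector of observations

# y'y: the inner product of y with itself is the sum of the squared components.
inner = y @ y
print(inner)           # 10.0
print((y ** 2).sum())  # 10.0 -- the same number
print(inner / len(y))  # 2.0  -- n^{-1} y'y, the variance estimator
```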
00:13:55.000 --> 00:14:01.000
Now here comes a small exercise for you. Suppose that the observations are not mean adjusted.
00:14:01.000 --> 00:14:08.000
I have just assumed that the observations are mean adjusted for ease of notation,
00:14:08.000 --> 00:14:15.000
but there's no reason to do this except ease of notation. So just assume that we have
00:14:15.000 --> 00:14:22.000
observations which have non-zero mean, so y bar is different from zero, and then show that the
00:14:22.000 --> 00:14:32.000
analogous estimator for the variance of y is one over n y prime y minus y bar squared.
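This is not a proof of the exercise's claim, but a quick numerical check of it, with made-up non-mean-adjusted data:

```python
import numpy as np

y = np.array([2.0, 5.0, 1.0, 4.0, 3.0])  # made-up observations, y-bar != 0
n = len(y)
y_bar = y.mean()

# Claimed estimator for the variance without prior mean adjustment:
via_formula = (y @ y) / n - y_bar ** 2

# Direct (biased) sample variance of the same data:
direct = np.mean((y - y_bar) ** 2)

print(via_formula, direct)  # the two numbers coincide
```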
00:14:32.000 --> 00:14:43.000
This is again an estimator of the variance of the random variable capital Y
00:14:44.000 --> 00:14:51.000
constructed from the observations small y, which we have. And this here is actually what we call
00:14:51.000 --> 00:14:59.000
the sample variance. So we take the sample variance and use this as an estimator for the
00:14:59.000 --> 00:15:11.000
variance of the random variable. Now let's move on to a vector notation. So suppose
00:15:11.000 --> 00:15:18.000
in complete analogy to what I have just said, that all the components of the vector x, which was,
00:15:18.000 --> 00:15:26.000
as you recall, a column vector with k components, have an expected value of zero. So E of xi is
00:15:26.000 --> 00:15:36.000
equal to zero for all i's in our column vector x. Often I will not write E of xi is equal to zero
00:15:36.000 --> 00:15:43.000
for all i, but I would rather write E of x is equal to zero. You have to understand then that
00:15:43.000 --> 00:15:50.000
this zero here is different from that zero. Normally I would now ask you why, and I hope
00:15:50.000 --> 00:15:57.000
somebody would raise his or her hand. But since things are more complicated now in the digital
00:15:57.000 --> 00:16:03.000
recording, I will just tell you or you see it here on the slide already, this zero is supposed to
00:16:03.000 --> 00:16:11.000
indicate a null vector. So that's a vector with k components because E of x is a vector. That's
00:16:11.000 --> 00:16:18.000
the expectation of a vector with k components. Whereas this null here is a scalar, so it is a
00:16:18.000 --> 00:16:29.000
real number, whereas here we have k scalars in the null vector. So that's also a usual way to
00:16:29.000 --> 00:16:37.000
denote vectors and matrices whose entries are all zero: you just write one zero here,
00:16:38.000 --> 00:16:45.000
and the reader is supposed to understand that the zero indicates actually a null vector
00:16:45.000 --> 00:16:56.000
because the left-hand side of the equation tells you clearly what kind of vector you have here,
00:16:56.000 --> 00:17:01.000
what size the vector takes on. So obviously on the right-hand side you must have the same size.
00:17:02.000 --> 00:17:11.000
Now, suppose that our vector x has expectation zero. In this case, we know that the expectation
00:17:11.000 --> 00:17:22.000
of x x prime is equal to a matrix with the variances on the main diagonal and the covariances
00:17:22.000 --> 00:17:28.000
in the lower triangular part and in the upper triangular part. This was exactly what I have
00:17:28.000 --> 00:17:36.000
repeated at the beginning of this lecture, that the expectation of x x prime consists of a
00:17:37.000 --> 00:17:42.000
matrix of variances and covariances, and we call this matrix the covariance matrix.
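One way to see E of x x prime as the covariance matrix is by simulation: draw many mean-zero random vectors, average the outer products x x prime, and compare with a known covariance matrix. A sketch with an assumed two-dimensional example (the matrix Sigma is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true covariance matrix of a mean-zero random vector x with K = 2.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw many realizations of x and average the K x K outer products x x'.
draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
Sigma_hat = draws.T @ draws / len(draws)

# The average approaches Sigma: variances on the main diagonal,
# covariances off the diagonal, and the result is symmetric.
print(np.round(Sigma_hat, 2))
```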
00:17:43.000 --> 00:17:48.000
I have just here used a slightly different notation from the notation I had used in
00:17:48.000 --> 00:17:57.000
previous slides because all those covariances I just write here as sigma i j in parentheses,
00:17:57.000 --> 00:18:02.000
which shall indicate that these are all the possible combinations of i's and j's
00:18:03.000 --> 00:18:10.000
somehow organized here in this lower triangular part and the same thing up here. So I don't
00:18:10.000 --> 00:18:18.000
explicitly write down all the terms which have cross products as their origins, but just rather
00:18:18.000 --> 00:18:22.000
indicate by this notation here that they all appear in this matrix.
00:18:22.000 --> 00:18:34.000
So here I take the expectation of x x prime in order to denote the matrix of variances and
00:18:34.000 --> 00:18:41.000
covariances. Whereas on the previous slide, I had told you that you have to take y prime y
00:18:41.000 --> 00:18:49.000
as the basic component of the estimator of the variance of y. On the previous slide, we had,
00:18:49.000 --> 00:18:57.000
we divided this equation here by n, and then we had that sigma squared hat y is equal to 1 over n y prime
00:18:57.000 --> 00:19:06.000
y. What I would like to draw your attention to is the position of the prime here. When I estimate
00:19:06.000 --> 00:19:15.000
a variance from a vector of observations, I use y prime y, but when I look at the expectation of
00:19:15.000 --> 00:19:21.000
a vector of random variables, then I use the expectation of x x prime. So the prime is here
00:19:21.000 --> 00:19:28.000
with the second component of the product, whereas here the prime is with the first component of
00:19:28.000 --> 00:19:35.000
the vector product, of the inner product in this case. So the transpositions are different,
00:19:36.000 --> 00:19:40.000
and this is because we are talking of completely different concepts.
00:19:40.000 --> 00:19:48.000
X x prime is a matrix of covariances, or better, the expectation of x x prime,
00:19:48.000 --> 00:19:52.000
I should perhaps better write here, the expectation of x x prime is a matrix of
00:19:52.000 --> 00:19:59.000
variances and covariances, while y prime y is a scalar which just estimates a single
00:20:00.000 --> 00:20:10.000
variance, times n of course. So this may appear confusing when you are unfamiliar with
00:20:10.000 --> 00:20:18.000
vector notation. And this is why I go through this in some detail, and I really ask you
00:20:18.000 --> 00:20:27.000
to look at it again at home until you are really sure you have understood what the difference is,
00:20:27.000 --> 00:20:35.000
and that you can apply the rules of matrix algebra correctly when it comes to either
00:20:36.000 --> 00:20:44.000
constructing a covariance matrix of a vector of random variables or estimating a single covariance
00:20:44.000 --> 00:20:52.000
from a vector of observations. So why is that so, and what leads to the confusion?
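The position of the prime is visible in the shapes alone; a small sketch with made-up vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])         # one realization of a K = 3 random vector
y = np.array([-1.0, 2.0, -2.0, 1.0])  # n = 4 observations on one random variable

# x x': an outer product -> a K x K matrix (the shape of the covariance matrix).
outer = np.outer(x, x)
print(outer.shape)  # (3, 3)

# y'y: an inner product -> a scalar (n times the variance estimator).
inner = y @ y
print(np.ndim(inner))  # 0 -- just a number
print(inner)           # 10.0
```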
00:20:54.000 --> 00:21:02.000
Let's go into the issue again from a different point of view. As I had told you, y prime y
00:21:02.000 --> 00:21:11.000
is the sum over all the squared y's, and we have n observations on a single random variable,
00:21:11.000 --> 00:21:20.000
capital Y, as we had said. Why do we take y prime y, or why do we actually use the sum of the squared
00:21:20.000 --> 00:21:29.000
yi's here? Well note that each term yi squared in this sum here, each single term
00:21:29.000 --> 00:21:38.000
is already an estimator of the variance, sigma y squared. So here I use sigma y squared as the
00:21:38.000 --> 00:21:44.000
true variance of the random variable capital Y. There's no hat on it because it's not an estimator,
00:21:44.000 --> 00:21:52.000
but this denotes the true unknown variance of the random variable Y. And each single term,
00:21:52.000 --> 00:22:02.000
yi squared, is a particular estimator of this variance here. Because we know that the expectation
00:22:02.000 --> 00:22:12.000
of yi squared is sigma y squared. That's the thing we know from our analysis of second moments,
00:22:12.000 --> 00:22:20.000
or from the definition of a variance actually, the expectation of yi squared is equal to sigma y
00:22:20.000 --> 00:22:30.000
squared. Actually what I do here is slightly incorrect notation because it would be better
00:22:30.000 --> 00:22:36.000
to write a capital Y here, since yi squared is already the observation, and the observation
00:22:36.000 --> 00:22:45.000
doesn't really have an expectation anymore. What I mean is that when we take this yi squared here
00:22:45.000 --> 00:22:54.000
as an expression for an as-yet-unknown realization of the observation,
00:22:55.000 --> 00:23:01.000
then it would have an expected value, and the expectation of the squared y is actually the
00:23:01.000 --> 00:23:11.000
true variance. So we have each yi squared as an estimator of the variance of capital Y,
00:23:12.000 --> 00:23:21.000
and what we do here is that we take the sum of all those estimators, or when we divide this thing
00:23:21.000 --> 00:23:30.000
by n as we do it, we take the average of all the estimators. The reason for this is that the
00:23:30.000 --> 00:23:38.000
estimate of a single yi squared for sigma y squared is based on just one observation,
00:23:38.000 --> 00:23:44.000
and therefore it may involve huge errors if this particular observation was far off the
00:23:44.000 --> 00:23:50.000
expected value. So it is not a good estimator of the variance. It is an estimator of the variance,
00:23:50.000 --> 00:23:56.000
but it's not a good estimator of the variance. It is much better to use as an estimator for the
00:23:56.000 --> 00:24:04.000
true variance of capital Y the average of different estimates, because then the errors,
00:24:04.000 --> 00:24:11.000
the huge errors which are involved in using one particular yi squared, these errors are averaged
00:24:11.000 --> 00:24:20.000
out. So it is much better to use the mean of all the single estimates yi squared, so to use as an
00:24:20.000 --> 00:24:28.000
estimator for the true variance of capital Y, the average over the sum of all those estimates,
00:24:28.000 --> 00:24:35.000
and that would be 1 over n times the sum of the squared yi, so it would be y prime y divided by n.
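The averaging argument can be checked by simulation: each yi squared alone is an unbiased but very noisy estimate of the variance, while the mean of n of them scatters far less around the truth. A sketch with an assumed true variance and normal draws (all parameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0  # assumed true variance of Y, with E[Y] = 0
n = 100         # observations per sample
reps = 5_000    # number of simulated samples

# For every sample: n draws of Y, then two estimators of the variance.
draws = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
single = draws[:, 0] ** 2             # y1^2 alone: a one-observation estimator
averaged = (draws ** 2).mean(axis=1)  # (1/n) sum yi^2: average of n estimators

# Both are centred on the true variance, but averaging shrinks the error a lot.
print(single.mean(), averaged.mean())  # both near 4.0
print(single.std() / averaged.std())   # roughly sqrt(n), i.e. about 10 here
```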
00:24:39.000 --> 00:24:45.000
By contrast, the expression E of x, x prime has nothing to do with estimation,
00:24:47.000 --> 00:24:50.000
so here we were concerned with estimating the variance which we don't know,
00:24:51.000 --> 00:25:01.000
whereas here we just compute the variance as a theoretical construct, and this computation
00:25:01.000 --> 00:25:07.000
has nothing to do with estimation. It's just a way of writing down what the variance of a particular
00:25:07.000 --> 00:25:14.000
random variable x, in this case a random vector, is; namely, what the expectation of x x prime for this random vector is.
00:25:14.000 --> 00:25:24.000
We know that the typical element of this matrix here is the expectation of xi times xj.
00:25:25.000 --> 00:25:31.000
xi is a scalar random variable, xj is a scalar random variable, so the expectation of this thing
00:25:31.000 --> 00:25:40.000
is sigma xi xj provided that the x's all have zero expectation. On the main diagonal this would
00:25:40.000 --> 00:25:46.000
specialize to sigma xi squared, so we would have the variance on the main diagonal and we would have
00:25:46.000 --> 00:25:55.000
the covariances, for i different from j, in the lower and upper triangular part of the matrix.
00:25:56.000 --> 00:26:04.000
So these expressions here for single entries in the matrix x, x prime are completely analogous
00:26:04.000 --> 00:26:13.000
to the expectation of some unknown realization yi squared being the true
00:26:13.000 --> 00:26:18.000
variance sigma squared y as we had it before.
00:26:21.000 --> 00:26:29.000
The upshot of all of that is that we can always take the expectation of a single pair of random variables,
00:26:29.000 --> 00:26:36.000
for instance e of xi xj, to express the theoretical concept of a covariance or of a
00:26:36.000 --> 00:26:46.000
variance if i is equal to j. But that is just sort of the concept, the theoretical concept of a
00:26:46.000 --> 00:26:53.000
variance or of a covariance. However if we work with real world data, if we work with the
00:26:53.000 --> 00:26:59.000
observations which we have, then we have to estimate and if we want to estimate a covariance
00:26:59.000 --> 00:27:03.000
then we need more than just one observation. We need actually many observations on the same random
00:27:03.000 --> 00:27:10.000
variable or on the same pair of random variables xi and xj if we are to estimate the covariance.
00:27:10.000 --> 00:27:16.000
So in this case when we want to estimate we take the mean of many observations,
00:27:17.000 --> 00:27:25.000
observations either in pairs yi times yj for different values of i and j or if j is equal to
00:27:25.000 --> 00:27:32.000
i, observations on yi squared. So then we take the mean of those many observations to estimate
00:27:32.000 --> 00:27:40.000
just one particular variance in, say, a matrix of this type. Well, these here are x's, and I
00:27:40.000 --> 00:27:46.000
haven't denoted them by y's but in principle this could be y's as well. And if we just want to have
00:27:46.000 --> 00:27:55.000
the estimator of a particular variance say of y then we have to estimate this by a much greater
00:27:55.000 --> 00:28:02.000
number of observations than one. Typically we would like to have as many observations on a
00:28:02.000 --> 00:28:06.000
single random variable as possible in order to estimate the variance.
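Estimating a covariance thus needs many paired observations on the two variables. With mean-adjusted observation vectors, an estimator of the form one over n times xi prime xj (as used later in the lecture) can be sketched with made-up data, and it is symmetric in the two vectors:

```python
import numpy as np

# Made-up mean-adjusted observation vectors on two random variables,
# recorded for the same n = 5 units.
xi = np.array([-1.0, 2.0, -2.0, 1.0, 0.0])
xj = np.array([0.5, 1.5, -1.0, -0.5, -0.5])
n = len(xi)

# Covariance estimate: the average of the pairwise products, (1/n) xi'xj.
cov_hat = (xi @ xj) / n
print(cov_hat)  # 0.8

# xi'xj is a scalar, so transposing changes nothing: xj'xi is the same number.
print((xj @ xi) / n == cov_hat)  # True
```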
00:28:12.000 --> 00:28:15.000
Are there, I mean I know this is a little difficult if you are not used to that,
00:28:15.000 --> 00:28:21.000
are there any questions relating to what I have just said or have I expressed myself clearly?
00:28:25.000 --> 00:28:29.000
Let me see.
00:28:32.000 --> 00:28:36.000
Oh yeah somebody noted that the recording didn't start but that's good. Okay
00:28:39.000 --> 00:28:47.000
Now, in a regression analysis, which we will deal with in great parts of this lecture at least,
00:28:48.000 --> 00:28:55.000
we often have k explanatory random variables. So we do not just do bivariate regression where
00:28:55.000 --> 00:29:02.000
one variable explains some other variable but we often have multivariate regression where we have
00:29:02.000 --> 00:29:10.000
k different regressors, call them x1 to xk, and all of them explain one dependent variable.
00:29:11.000 --> 00:29:19.000
Now suppose that for each of these k random variables here we have n observations in mean
00:29:19.000 --> 00:29:27.000
adjusted column vectors x1 to xk. So now we combine the two ideas we have been dealing with
00:29:27.000 --> 00:29:35.000
separately so far. On the one hand we have k different random variables and on the other hand
00:29:35.000 --> 00:29:42.000
for each of those k random variables we have n observations. Either n observations through time
00:29:42.000 --> 00:29:51.000
or n observations as a cross section over people. So again if x1 is education we may have n people
00:29:51.000 --> 00:29:59.000
whose degree of education we observe and if xk is income for instance then we may have the
00:29:59.000 --> 00:30:08.000
income of the same n people. So for each of these single random variables which we may collect in one
00:30:08.000 --> 00:30:16.000
vector we have n observations each. These n observations I denote again here with small
00:30:17.000 --> 00:30:28.000
letters for each variable. So I would have a vector x1 for the scalar random variable
00:30:28.000 --> 00:30:39.000
capital X1, let's say education. This vector x1 I would denote as a vector with entries
00:30:39.000 --> 00:30:49.000
x11, x21, x31, all the way down to xn1. So the second index here always denotes
00:30:49.000 --> 00:30:57.000
the column and the first index denotes the row as you are used to it in matrix algebra indicating
00:30:57.000 --> 00:31:06.000
then that for the first random variable in our set of regressors so for the first random variable
00:31:06.000 --> 00:31:15.000
x1 we have observations ranging from individual one to individual n, and the same thing then of course
00:31:15.000 --> 00:31:21.000
holds for the second regressor for the second random variable which we observe where we again
00:31:21.000 --> 00:31:26.000
have n observations, hopefully on the same people; otherwise we cannot proceed this way, as
00:31:26.000 --> 00:31:31.000
we will see later but for the time being suppose these are the same people for which we also
00:31:31.000 --> 00:31:38.000
observe some second random variable. This could be for instance gender, right? We could have a
00:31:38.000 --> 00:31:44.000
variable which is denoted male and female if you just restrict gender to these two possibilities
00:31:45.000 --> 00:31:49.000
and typically we would then say well female is one and male is zero or something like this and
00:31:49.000 --> 00:31:57.000
we would have a vector of ones and zeros and would consider this vector of ones and zeros as random
00:31:57.000 --> 00:32:07.000
events where nature somehow determines at birth what the sex of a particular person is and that
00:32:07.000 --> 00:32:14.000
is then invariant throughout life, in the usual case at least. And so we may have k
00:32:14.000 --> 00:32:23.000
different explanatory variables, each of which has generated n observations. And the observations,
00:32:23.000 --> 00:32:31.000
unlike the random variables, I denote by small letters. Now obviously these regressors here,
00:32:32.000 --> 00:32:39.000
these observations which we have here, n observations on k different regressors,
00:32:39.000 --> 00:32:47.000
can be organized as a matrix. I will come to this in a minute. First I would like to note that
00:32:48.000 --> 00:32:56.000
when we just have those kind of vectors of observations for single random variables x1
00:32:56.000 --> 00:33:05.000
then the relationship between the capital X1 and the small x1 here is the same thing as the
00:33:05.000 --> 00:33:12.000
relation I had previously on my slide between the random variable capital Y and the observation small y,
00:33:12.000 --> 00:33:22.000
which I had just now. I replaced the y's by x's, and I allow for k different scalar random variables,
00:33:22.000 --> 00:33:31.000
each of which has n observations which some random event has generated. So like I did it with the
00:33:31.000 --> 00:33:39.000
y's, I can estimate the variances and the covariances of those random variables x1
00:33:40.000 --> 00:33:48.000
to xk, capital X1 to capital Xk. And the variance of course would be denoted by sigma squared hat
00:33:48.000 --> 00:33:55.000
xi, which is then one over n times the inner product of the observations on the i-th random
00:33:55.000 --> 00:34:06.000
variable, so xi prime xi. Or if I want to estimate the covariance, then I would have sigma hat xi xj,
00:34:07.000 --> 00:34:16.000
and this would be one over n times the product of xi prime with xj.
00:34:16.000 --> 00:34:29.000
Be aware that this product here may equally well be written as xj prime times xi. So this
00:34:29.000 --> 00:34:37.000
is the same thing, because this thing here is scalar, right? So it is just a real number: xi prime
00:34:37.000 --> 00:34:45.000
xj is a real number. a real number is by definition symmetric; you can think of it as a degenerate
00:34:45.000 --> 00:34:53.000
case of a symmetric matrix. so such a product xi prime times xj is always just a real number and
00:34:53.000 --> 00:35:00.000
therefore since a real number is by definition symmetric you can also take the transpose of this
00:35:00.000 --> 00:35:06.000
expression here the transpose of this expression here would be just xj prime times xi so this is
00:35:06.000 --> 00:35:14.000
the same thing and this is the reason actually why the covariance matrix is symmetric so there were
00:35:14.000 --> 00:35:26.000
a couple of questions coming in let me just take these the original y vector untransposed
00:35:26.000 --> 00:35:33.000
is a row vector no it's a column vector right i'm gonna go back here to show you that somebody
00:35:33.000 --> 00:35:39.000
was asking whether y is a row vector; that's not the case. all my vectors are typically column vectors
00:35:39.000 --> 00:35:52.000
unless i explicitly say so. oh, where was the original y? here, y was written as a row vector
00:35:52.000 --> 00:35:59.000
transposed so it is a column vector it is as i have already said just for notational convenience
00:35:59.000 --> 00:36:05.000
in order to save space in text or on my slide. but i don't write y as a column vector, which would
00:35:59.000 --> 00:36:05.000
take up vertical space all the way down here, and not much would fit on the slide,
00:36:06.000 --> 00:36:11.000
but rather i write it as a row vector, sorry, as a row vector transposed
00:36:19.000 --> 00:36:24.000
and a row vector transposed is a column vector so the y is a column vector the original y
00:36:27.000 --> 00:36:29.000
let's see what other questions there were
00:36:29.000 --> 00:36:38.000
oh yeah, somebody else answered that already: column, yes, these answers are correct
00:36:40.000 --> 00:36:47.000
okay so it seems to be just this one question here okay then let's move on as i have already said
00:36:47.000 --> 00:36:55.000
we can collect the observations which we have the column vectors x1 up to xk in one matrix
00:36:55.000 --> 00:37:02.000
and this collection is very easy the first column of this matrix which i now call x
00:37:03.000 --> 00:37:10.000
and i should perhaps indicate this x here is now a matrix it is not the same x which i have used
00:37:10.000 --> 00:37:17.000
in previous slides to indicate a vector of random variables but here please new use of the symbol
00:37:17.000 --> 00:37:24.000
capital x it is just a matrix which consists of the column vectors of observations
00:37:25.000 --> 00:37:32.000
so here we have the column vector x1 and next to it is the column vector x2
00:37:33.000 --> 00:37:43.000
and so forth, here we have the column vector xk. the column index always tells you to which of
00:37:33.000 --> 00:37:43.000
the column vectors this refers. so here's the column index one and here's the column index two
00:37:50.000 --> 00:37:59.000
the second subscript, and here's the column index k, also the second subscript, because this
00:37:59.000 --> 00:38:09.000
column of matrix x is equal to the column vector small x1 and the second column obviously is equal
00:38:09.000 --> 00:38:15.000
to the column vector x2 and the kth column is equal to the column vector xk
00:38:17.000 --> 00:38:22.000
now here again you have to be careful because i'm not completely consistent in my notation
00:38:23.000 --> 00:38:27.000
the reason is just that i lack sufficient possibilities in notation, sufficient
00:38:28.000 --> 00:38:36.000
symbols. here i use a capital x to denote a matrix of observations. i told you that in general in these
00:38:36.000 --> 00:38:43.000
first slides i will distinguish between random variables which receive capital symbols
00:38:43.000 --> 00:38:53.000
and observations which receive small letters as symbols here i violate this convention because
00:38:53.000 --> 00:39:00.000
matrices as a general rule are always denoted as capital almost always denoted as capital
00:39:01.000 --> 00:39:07.000
letters. so please don't take this here for a random variable; it is actually a matrix of
00:39:07.000 --> 00:39:16.000
observations, as you can see from the definition. okay, as i have already said, this x here
00:39:16.000 --> 00:39:24.000
is by notation a different x from the x i have used previously. it's not the vector of random variables
00:39:24.000 --> 00:39:33.000
anymore but rather it is a vector of observations in the former notation where we had used a vector
00:39:33.000 --> 00:39:42.000
x there were k scalar random variables in this vector x and here we have n observations on k
00:39:42.000 --> 00:39:47.000
random variables in the matrix capital x so it's a matrix of observations
00:39:47.000 --> 00:39:58.000
now convince yourself please, this is an exercise, that we can estimate all the covariances of the
00:39:58.000 --> 00:40:04.000
explanatory random variables, the random variables here. we can estimate all the variances and
00:40:04.000 --> 00:40:12.000
covariances by this expression here which looks fairly complicated but you will recognize that
00:40:12.000 --> 00:40:20.000
is the estimator of a covariance matrix because on the main diagonal you have the sigma square
00:40:20.000 --> 00:40:31.000
hats of the individual random variables x1 x2 x3 down to xk so these here are the estimators of
00:40:33.000 --> 00:40:40.000
actually the estimates, the estimates of the variances of the k different random variables,
00:40:41.000 --> 00:40:48.000
and in the off-diagonal elements, as usual, you have the estimates of the covariances
00:40:50.000 --> 00:40:58.000
here and there and again the matrix is symmetric convince yourself now that you can
00:40:58.000 --> 00:41:04.000
estimate all these covariances here by this simple expression 1 over n x prime x
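the same exercise can be sketched numerically; the data below are illustrative, and the observations are demeaned column by column so that 1 over n times X prime X indeed estimates the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3

# n observations on k regressors, demeaned column by column
X = rng.normal(size=(n, k))
X = X - X.mean(axis=0)

# note the position of the prime: X'X, a k-by-k matrix
S = (X.T @ X) / n

assert S.shape == (k, k)     # variances on the diagonal, covariances off it
assert np.allclose(S, S.T)   # symmetric, as argued above
```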
00:41:04.000 --> 00:41:12.000
now this is x prime x again be careful to observe the position of the prime here
00:41:12.000 --> 00:41:22.000
this is an estimate which we compute here so we use x prime x it is completely different from
00:41:22.000 --> 00:41:33.000
the concept of the expectation of the random vector x, which was
00:41:34.000 --> 00:41:41.000
the expectation of x x prime, because here again we deal with estimation and then we
00:41:41.000 --> 00:41:48.000
have to take x prime x things are a little confusing here because of the change of the
00:41:48.000 --> 00:41:58.000
symbol x that i have alluded to on the previous slide. initially we had a random vector x,
00:41:59.000 --> 00:42:05.000
here we have a matrix x of observations and we use the matrix x of observations in order
00:42:06.000 --> 00:42:14.000
to estimate the variances and the covariances of the k random components which we previously
00:42:14.000 --> 00:42:23.000
also had organized in the vector x so this expression here 1 over n x prime x is just
00:42:23.000 --> 00:42:31.000
a generalization of our concept of 1 over n y prime y see the same position of the prime
00:42:31.000 --> 00:42:40.000
yeah between the x's here or between the y's here so this just generalizes this idea here to vector
00:42:40.000 --> 00:42:46.000
format whereas here we were computing the variance of a single scalar random variable
00:42:47.000 --> 00:42:54.000
here we are computing the covariance matrix of a random vector which contains the components x1
00:42:54.000 --> 00:43:02.000
to xk otherwise this expression here is completely analogous to the expression here
00:43:02.000 --> 00:43:07.000
but obviously, as long as we just have one scalar random variable y, there is no covariance to
00:43:07.000 --> 00:43:14.000
compute, because there is just one random variable. as soon as we consider more than
00:43:14.000 --> 00:43:21.000
one random variable, we always have covariances to compute and to estimate, and this is why
00:43:21.000 --> 00:43:26.000
things move up from scalar expressions, from real numbers like we have it here, to matrix expressions
00:43:31.000 --> 00:43:40.000
okay, that was all of the confusing stuff. i admit it is difficult at first
00:43:40.000 --> 00:43:51.000
sight to see the systematics of these setups here, but i really encourage you to deal with that
00:43:51.000 --> 00:43:57.000
properly, because we will use it over and over again. any further questions relating to what
00:43:57.000 --> 00:44:06.000
i have said here? so this more difficult part of the lecture is thereby over; we are done
00:44:06.000 --> 00:44:16.000
with it and we move to the last section of the set of slides which deal with probability
00:44:16.000 --> 00:44:24.000
theory by introducing the normal distribution and some related distribution functions
00:44:26.000 --> 00:44:35.000
because you know in inference we very often use the normal distribution or related distribution
00:44:36.000 --> 00:44:42.000
functions the normal distribution has a key importance in statistics and in inference in
00:44:42.000 --> 00:44:49.000
particular and this is why i will also introduce it here already in the probability part we'll see
00:44:49.000 --> 00:44:57.000
it again in the next set of slides which then give you a review of statistics well the normal
00:44:57.000 --> 00:45:06.000
distribution is also often called a Gaussian distribution in honor of Carl Friedrich Gauss
00:45:06.000 --> 00:45:15.000
the German mathematician. it is very nice to relate the normal distribution to Gauss,
00:45:15.000 --> 00:45:22.000
but what is nice is not necessarily accurate, because the normal distribution is actually not
00:45:22.000 --> 00:45:30.000
due to Gauss but it is due to the French mathematicians de Moivre and Laplace still
00:45:30.000 --> 00:45:35.000
for some reason well the reason is actually that Gauss was able to solve a particular integral
00:45:35.000 --> 00:45:41.000
related to the normal distribution. but the distribution in itself has been discovered
00:45:41.000 --> 00:45:50.000
or developed by de Moivre and Laplace and Gauss only solved this integral which is related to
00:45:50.000 --> 00:45:57.000
computing the expected value of a normal random variable, which in some way was a certain
00:45:57.000 --> 00:46:01.000
breakthrough in understanding the normal distribution, and that is probably the reason why the
00:46:01.000 --> 00:46:09.000
distribution is called Gaussian, but it does not truly do justice to the earlier contributions of de
00:46:09.000 --> 00:46:18.000
Moivre and Laplace. Here's the definition of the normal distribution: a continuous random
00:46:18.000 --> 00:46:26.000
variable x is normally distributed if for some given real numbers mu and sigma so these are
00:46:26.000 --> 00:46:33.000
just some numbers the probability density function of x is given by this expression here
00:46:34.000 --> 00:46:44.000
1 over sigma times the square root of 2 pi times the exponential function of minus x minus mu
00:46:44.000 --> 00:46:51.000
squared divided by 2 sigma squared that's the probability density function of a normally
00:46:51.000 --> 00:47:02.000
distributed random variable. note: this is for a scalar random variable; you can write the whole thing
00:47:02.000 --> 00:47:09.000
also for random vectors which are normally distributed and we will have to deal with
00:47:09.000 --> 00:47:15.000
matrices like we just did it in the previous section of these slides but I won't do this
00:47:15.000 --> 00:47:21.000
here I will just introduce the normal distribution for a scalar random variable x
00:47:22.000 --> 00:47:30.000
which takes on every possible value in R so it is a continuous random variable and the probability
00:47:30.000 --> 00:47:39.000
density function, f subscript x of x, is given by this rather complicated expression here, or
00:47:39.000 --> 00:47:43.000
seemingly complicated expression and you'll get used to it I hope.
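as a minimal sketch, the density from the green box can be transcribed directly into code (the function name is mine, not notation from the slides):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x: (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# at x = mu the exponential term equals 1, so the peak height is 1 / (sigma * sqrt(2*pi))
assert abs(normal_pdf(0.0, 0.0, 1.0) - 1 / math.sqrt(2 * math.pi)) < 1e-15
```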
00:47:46.000 --> 00:47:58.000
Note that I often write exp of z to denote e, what we call the Euler number,
00:47:58.000 --> 00:48:09.000
to the power of z. so e is approximately equal to 2.71, with further digits behind the decimal
00:48:09.000 --> 00:48:18.000
point. so this is e to the power of z, written as exp of z, the exponential function.
00:48:18.000 --> 00:48:29.000
Now it is possible to show that this number mu which pops up here right this is just
00:48:29.000 --> 00:48:37.000
some real number that this number is actually the expected value of a normally distributed
00:48:37.000 --> 00:48:44.000
random variable and it is possible to show that sigma squared is actually the variance
00:48:45.000 --> 00:48:51.000
of this random variable, or sigma is the standard deviation of this random variable.
00:48:51.000 --> 00:49:04.000
so mu and sigma are some numbers which come up as parameters of this
00:49:04.000 --> 00:49:10.000
expression here and it turns out that if you compute the expected value and if you compute
00:49:10.000 --> 00:49:20.000
the variance of such a random variable then you have the result that mu is the expected value
00:49:20.000 --> 00:49:25.000
of a normally distributed random variable and sigma square is the variance.
00:49:27.000 --> 00:49:33.000
So mu is equal to e of x and sigma square is equal to e of x minus mu squared
00:49:34.000 --> 00:49:38.000
these computations to show these results are non-trivial. it was actually a
00:49:39.000 --> 00:49:48.000
contribution of Gauss to solve the integrals which come up when you write down the expectation
00:49:48.000 --> 00:49:55.000
of x or the expectation of x minus mu squared to solve these integrals and show that these
00:49:55.000 --> 00:50:02.000
properties here hold. so I will not prove that to you; please just believe it. you'll find
00:50:02.000 --> 00:50:08.000
the proofs of course in any good book on probability theory.
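without reproducing Gauss's integrals, a crude numerical check can at least confirm the two results; this is a sketch with a plain midpoint rule, not a proof:

```python
import math

def pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=100_000):
    # plain midpoint rule; crude, but enough for a sanity check
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

mu, sigma = 1.5, 2.0
lo, hi = mu - 10 * sigma, mu + 10 * sigma   # the tails beyond this are negligible

mean = integrate(lambda x: x * pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * pdf(x, mu, sigma), lo, hi)

assert abs(mean - mu) < 1e-5        # E[X]   comes out as mu
assert abs(var - sigma ** 2) < 1e-5  # Var(X) comes out as sigma squared
```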
00:50:11.000 --> 00:50:18.000
This means that a normal distribution is characterized by these numbers mu and sigma
00:50:18.000 --> 00:50:25.000
or sigma squared and therefore a normally distributed random variable x with mean mu
00:50:25.000 --> 00:50:32.000
and variance sigma squared is usually written in a shorthand notation as x is distributed as
00:50:32.000 --> 00:50:38.000
a normally distributed random variable this is what the n stands for with parameters mu
00:50:38.000 --> 00:50:43.000
and sigma squared or with expected value mu and variance sigma squared that's the same thing.
00:50:44.000 --> 00:50:53.000
So this symbol you read as is distributed as and the n here indicates the normal distribution.
00:50:53.000 --> 00:51:02.000
We know a little more than that because we not only know that mu and sigma squared are
00:51:02.000 --> 00:51:08.000
key parameters which describe the expectation and the variance of a normally distributed variable,
00:51:09.000 --> 00:51:17.000
we also know that a normally distributed random variable is fully and uniquely described
00:51:17.000 --> 00:51:26.000
by mu and sigma squared. So as you recall mu is the first non-central moment of a random variable
00:51:26.000 --> 00:51:32.000
sigma square is the second central moment of a random variable and you may say well
00:51:32.000 --> 00:51:36.000
what do we know about higher moments of such
00:51:38.000 --> 00:51:43.000
a normally distributed random variable? so what do we know about the third
00:51:43.000 --> 00:51:48.000
moment the skewness or the fourth moment the kurtosis or even higher moments so fifth sixth
00:51:48.000 --> 00:51:54.000
seventh moment of such a random variable. Well what can be proven is that the normal
00:51:54.000 --> 00:52:02.000
distribution is fully and uniquely described by mu and sigma squared which means that all the
00:52:02.000 --> 00:52:08.000
higher moments and measures in particular the skewness and the kurtosis depend only on mu and
00:52:08.000 --> 00:52:14.000
on sigma squared so if you know mu and sigma squared then you know all you need to know about
00:52:14.000 --> 00:52:23.000
a normal distribution everything else or other moments just depend on mu and on sigma squared.
00:52:27.000 --> 00:52:33.000
Okay as you probably are aware the normal distributions are a very important class
00:52:33.000 --> 00:52:39.000
of statistical distributions and they often serve as a type of benchmark distribution which is
00:52:40.000 --> 00:52:48.000
reasonable because in terms of kurtosis for instance there are other types of distributions
00:52:48.000 --> 00:52:55.000
which have higher kurtosis than the normal distribution and there are distributions which
00:52:55.000 --> 00:53:00.000
have lower kurtosis than the normal distribution so in some sense in terms of kurtosis at least
00:53:00.000 --> 00:53:08.000
the normal distribution lies between many other distributions and you may say then that the normal
00:53:08.000 --> 00:53:15.000
distribution is some type of benchmark similarly as you are certainly aware the normal distribution
00:53:15.000 --> 00:53:24.000
is a symmetric distribution so the skewness of the distribution is zero which again means that
00:53:24.000 --> 00:53:28.000
there are distributions which are skewed to the right and there are distributions
00:53:28.000 --> 00:53:33.000
which are skewed to the left. so there are distributions with positive or with negative
00:53:33.000 --> 00:53:40.000
skewness and the normal distribution has skewness of zero so it lies in between the possible other
00:53:41.000 --> 00:53:52.000
cases and may therefore again serve as a benchmark. There's another reason why the normal distribution
00:53:52.000 --> 00:53:57.000
is a benchmark and this is perhaps more important than what I just tried to explain in terms of
00:53:57.000 --> 00:54:09.000
skewness and kurtosis. Under certain conditions the sum of random
00:54:09.000 --> 00:54:15.000
variables following any distribution converges to a normal distribution so the normal distribution
00:54:15.000 --> 00:54:23.000
is also a limiting case for some sums of random variables as we will see when we discuss
00:54:23.000 --> 00:54:32.000
the central limit theorem. Note that all normal distributions are symmetrical around their mean
00:54:32.000 --> 00:54:39.000
as I already said this is the skewness property that the skewness is zero and that they have a
00:54:39.000 --> 00:54:47.000
bell-shaped density curve with a single peak at the center of the distribution. I show you examples
00:54:47.000 --> 00:54:54.000
of two normal distributions in this diagram here so for instance we have this blue probability
00:54:54.000 --> 00:55:03.000
density function which has a mean of 45 right so here's the scale for some reason here's 5 and here
00:55:03.000 --> 00:55:14.000
is 85, and here is 45, which is the mean of the blue normally distributed random variable
00:55:14.000 --> 00:55:21.000
so you see here the parameter mu is equal to 45 and the standard deviation is 10 so the variance
00:55:21.000 --> 00:55:32.000
is actually 100, and then we get this pdf here, this blue pdf, which as you can see is
00:55:32.000 --> 00:55:40.000
symmetric about the mean 45 and has a single peak. not all random variables need to have the
00:55:40.000 --> 00:55:45.000
same peaks in their pdfs. and here we have a second random variable which is normally
00:55:45.000 --> 00:55:51.000
distributed and even though at first sight it may seem that this looks very different it is actually
00:55:51.000 --> 00:55:58.000
generated by the same expression that we had in the green box when I introduced the normally
00:55:58.000 --> 00:56:06.000
distributed variable x with the type of probability density function which I showed you. I have just
00:56:06.000 --> 00:56:14.000
chosen other parameters here here the mean is 52 so the expected value is now 52 so this is where
00:56:14.000 --> 00:56:22.000
the peak of the pdf is at 52 and the standard deviation is only five so the variance is only
00:56:22.000 --> 00:56:31.000
25 which means that this distribution here is narrower than the blue distribution and the
00:56:31.000 --> 00:56:39.000
height of the distribution, the peak of the distribution, is much bigger than it was here for
00:56:39.000 --> 00:56:47.000
the blue distribution. this type of density function here is often called a
00:56:47.000 --> 00:56:54.000
bell-shaped density function and with some degree of fantasy you may think of this thing here as
00:56:54.000 --> 00:57:02.000
being like a church bell, not quite actually, but the expression is quite common to speak of a
00:57:02.000 --> 00:57:08.000
bell-shaped probability density function and what people mean when they speak of this bell-shaped
00:57:08.000 --> 00:57:14.000
form is precisely the type of form you see here in the blue and in the red pdf
00:57:14.000 --> 00:57:24.000
now there's a special case of a normally distributed random variable and this is called
00:57:24.000 --> 00:57:29.000
the standard normal distribution the standard normal distribution is just the special case of
00:57:29.000 --> 00:57:35.000
a normal distribution where mu the expected value is equal to zero and the standard deviation is
00:57:35.000 --> 00:57:40.000
equal to one so if the standard deviation is equal to one then the variance is also equal to one so
00:57:40.000 --> 00:57:44.000
it just makes no difference whether you say the variance is one or the standard deviation
00:57:44.000 --> 00:57:50.000
is one. a normal distribution with these two values, mu equal to zero and sigma equal to one,
00:57:50.000 --> 00:57:57.000
is called a standard normal distribution and the corresponding pdf has a particular symbol which
00:57:57.000 --> 00:58:07.000
is widely used in the literature, namely a phi. so phi of z is the pdf
00:58:08.000 --> 00:58:15.000
of a normally distributed random variable capital z with mu equal to zero and sigma equal to one
00:58:16.000 --> 00:58:23.000
phi of z here is defined as f, subscript capital z, of small z. so capital z is the random variable, and small z
00:58:23.000 --> 00:58:30.000
is a particular value the random variable may take on, actually with probability zero,
00:58:30.000 --> 00:58:36.000
or it may be greater or smaller than this z here. so that's a
00:58:36.000 --> 00:58:42.000
particular real number here, whereas that is a scalar random variable. and then we know of course
00:58:42.000 --> 00:58:48.000
if we set the mu equal to zero and the sigma equal to one the expression we had in the previous
00:58:48.000 --> 00:58:53.000
green box where we defined the random variable the normally distributed random variable
00:58:53.000 --> 00:59:00.000
simplifies to this expression here one over the square root of two pi times the exponential of
00:59:00.000 --> 00:59:06.000
negative one half z squared
00:59:09.000 --> 00:59:14.000
the corresponding cumulative distribution function then is often denoted as capital phi
00:59:14.000 --> 00:59:19.000
so, if you are not so familiar with greek letters: that is a small phi and this here is a capital phi
00:59:20.000 --> 00:59:25.000
the capital phi is the cumulative distribution function so the integral of the probability
00:59:25.000 --> 00:59:32.000
density function that's also conventional that we typically use small letters for pdfs and we use
00:59:32.000 --> 00:59:42.000
capital letters for cdfs so capital phi of z is equal to the probability of the random variable
00:59:42.000 --> 00:59:51.000
capital z being less than or equal to this real number z, and obviously this probability you
00:59:51.000 --> 01:00:00.000
will compute as the integral of the pdf running from minus infinity up to z as the upper limit
01:00:00.000 --> 01:00:14.000
of the integral. now the function phi of z is difficult to compute; that is done by computers
01:00:14.000 --> 01:00:21.000
and you can find values for the function phi of z which you typically need in statistical
01:00:21.000 --> 01:00:28.000
hypothesis testing either in statistic tables in textbooks or you can compute them with
01:00:28.000 --> 01:00:35.000
statistics computer packages. almost all statistical computer packages, at least all i know,
01:00:35.000 --> 01:00:43.000
have the function phi of z programmed as a built-in function, so that you can evaluate phi
01:00:43.000 --> 01:00:51.000
of z at any given point z. there are some important rules which actually follow from the
01:00:51.000 --> 01:00:59.000
properties of a cdf for instance the probability that the random variable z is greater than some
01:00:59.000 --> 01:01:08.000
real number small z is of course just one minus capital phi of z, because
01:01:08.000 --> 01:01:14.000
capital phi of z, as we had just said, is the probability that z is less than or equal
01:01:14.000 --> 01:01:23.000
to z so obviously this is just a complementary event and therefore the probability is one minus
01:01:23.000 --> 01:01:34.000
capital phi of z then since the normal distribution is symmetric it is true that the probability of
01:01:34.000 --> 01:01:43.000
z the random variable z being smaller than negative z is equal to the probability of z
01:01:43.000 --> 01:01:51.000
being greater than z regardless of whether small z is positive or negative but it's easier to think
01:01:51.000 --> 01:01:56.000
of it if you think that small z is a positive number then you here have a negative number and
01:01:56.000 --> 01:02:00.000
here you have a positive number but it would even be true if that is negative and here you have a
01:02:00.000 --> 01:02:04.000
negative number and here you have a positive number; it doesn't play a role. so the symmetry of
01:02:04.000 --> 01:02:12.000
the normal distribution implies this property here and then of course the probability that z lies
01:02:12.000 --> 01:02:18.000
between a and b. and i have already told you in a previous lecture that it doesn't really play a role
01:02:18.000 --> 01:02:25.000
of whether we have less than or equal to here or just a strict less than because the probability of
01:02:25.000 --> 01:02:33.000
the equal sign is always zero. so the probability of z being between a and b is just
01:02:33.000 --> 01:02:42.000
phi of b minus phi of a, both capital phis. you may easily derive this from
01:02:42.000 --> 01:02:47.000
the previous properties i would encourage you to do that at home if you like
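for the home derivation, these three rules can also be checked numerically; a minimal standard-library sketch builds capital phi from the error function, Phi(z) = (1 + erf(z / sqrt(2))) / 2:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z, a, b = 1.3, -0.5, 2.0

p_greater = 1.0 - Phi(z)       # P(Z > z)  = 1 - Phi(z)
p_mirror = Phi(-z)             # P(Z < -z) = P(Z > z), by symmetry
p_between = Phi(b) - Phi(a)    # P(a < Z <= b) = Phi(b) - Phi(a)

assert abs(p_greater - p_mirror) < 1e-12
assert 0.0 < p_between < 1.0
```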
01:02:47.000 --> 01:02:59.000
why is this standard normal random variable of particular importance?
01:02:59.000 --> 01:03:05.000
well the reason is actually that any random variable which is normally distributed so not
01:03:05.000 --> 01:03:12.000
standard normally distributed but somehow normally distributed with some mu and some sigma can always
01:03:12.000 --> 01:03:18.000
easily be transformed into a standard normal distribution, and the way to do this is the
01:03:18.000 --> 01:03:25.000
following if we have some normally distributed random variable x so x is distributed normally
01:03:25.000 --> 01:03:35.000
with mu x and variance sigma x squared, then we can compute the transformation z, a transformed,
01:03:35.000 --> 01:03:42.000
new random variable, which would be the random variable x minus its expected value mu x
01:03:42.000 --> 01:03:48.000
divided by its standard deviation and it's easy to see that this random variable here
01:03:48.000 --> 01:03:54.000
the standardized or transformed random variable has a standard normal distribution so it is
01:03:54.000 --> 01:04:02.000
distributed as a normal distribution with expectation zero and standard deviation
01:04:02.000 --> 01:04:10.000
and variance one. obviously the expectation is zero because the expectation of x is mu x and
01:04:10.000 --> 01:04:18.000
mu x minus mu x is zero and it's very easy also to compute that this thing has a standard deviation
01:04:18.000 --> 01:04:28.000
of one. we can generalize this result slightly: if x is distributed normally with mu and sigma squared, so just
01:04:29.000 --> 01:04:39.000
some normal distribution and we have scalars a and b real numbers then y which is defined as a
01:04:39.000 --> 01:04:47.000
times x plus some constant b, so some affine transformation of the normally distributed
01:04:47.000 --> 01:04:54.000
random variable x such an affine transformation would also follow a normal distribution
01:04:55.000 --> 01:05:03.000
namely a normal distribution with mean a times mu x plus b and with variance a squared times sigma x
01:05:03.000 --> 01:05:12.000
squared. the property is important because this tells us any linear transformation or any
01:05:12.000 --> 01:05:18.000
affine transformation, actually the difference between linear and affine is that affine also has a
01:05:18.000 --> 01:05:25.000
constant added here, any affine transformation of a normally distributed
01:05:25.000 --> 01:05:32.000
random variable is still normally distributed so the normal distribution is invariant to any type
01:05:32.000 --> 01:05:39.000
of affine transformations, and that makes it sometimes very easy to compute the distribution
01:05:39.000 --> 01:05:46.000
of test statistics because we know that the normal distribution property is preserved under
01:05:46.000 --> 01:05:56.000
affine transformations. now here is a small exercise: verify that equation four is a special
01:05:56.000 --> 01:06:06.000
case of five so look at equation four here and make it clear to you that this thing five is
01:06:06.000 --> 01:06:17.000
actually a generalization of equation four okay here's the pdf of the standard normal distribution
01:06:17.000 --> 01:06:26.000
so now we have mean zero, unlike the two previous examples i gave you for a
01:06:26.000 --> 01:06:31.000
normally distributed random variable, and then we have a variance of one and a standard deviation
01:06:31.000 --> 01:06:37.000
of one obviously half of the probability is to the left of the mean and the other half
01:06:37.000 --> 01:06:42.000
is to the right of the mean that's trivial because the pdf is symmetric
01:06:48.000 --> 01:06:57.000
now we may look at how much probability mass is within a certain interval around
01:06:57.000 --> 01:07:04.000
the expected value zero so for instance if we take an interval of one standard deviation
01:07:05.000 --> 01:07:13.000
then the probability mass which is the probability mass of this interval
01:07:14.000 --> 01:07:23.000
here so the probability of the random variable having a realization within plus or minus one
01:07:23.000 --> 01:07:34.000
standard deviation this probability is 0.68 so roughly it is two-thirds right with a probability
01:07:34.000 --> 01:07:45.000
of 68 percent, or precisely 68.26 percent, a standard normal random variable will generate
01:07:45.000 --> 01:07:50.000
observations which are less than one standard deviation apart from its expected value
01:07:51.000 --> 01:07:54.000
so this probability here is roughly 68 percent
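these interval probabilities follow directly from the cdf; a small sketch, again building capital phi from the error function, reproduces the 68.26 percent within one standard deviation and the roughly 95 percent within two:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

within_1sd = Phi(1.0) - Phi(-1.0)   # about 0.6827, roughly two-thirds
within_2sd = Phi(2.0) - Phi(-2.0)   # about 0.9545

assert abs(within_1sd - 0.6827) < 1e-3
assert abs(within_2sd - 0.9545) < 1e-3
```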
01:07:57.000 --> 01:08:02.000
this generalizes to normal variables which are not standard normal
01:08:04.000 --> 01:08:10.000
if we have the area between minus one standard deviation and plus one standard
01:08:10.000 --> 01:08:19.000
deviation then this is 68.26 percent so roughly two-thirds is the probability of being within
01:08:19.000 --> 01:08:28.000
plus or minus one standard deviation the area between the z scores minus two and two for
01:08:28.000 --> 01:08:37.000
standard normal distribution is roughly 95 percent right and this again generalizes to any type of
01:08:37.000 --> 01:08:44.000
standard deviation so if we have a normal distribution which is not the standard normal
01:08:44.000 --> 01:08:52.000
then we know the probability of observations lying between minus two standard deviations and
01:08:52.000 --> 01:09:00.000
plus two standard deviations is roughly 95 percent and you will certainly recall from elementary
01:09:00.000 --> 01:09:08.000
statistics that you have always had tests with a critical value of 1.96 or
01:09:09.000 --> 01:09:22.000
minus 1.96 these are actually related to the diagram here because they leave precisely 95
01:09:22.000 --> 01:09:30.000
percent probability mass, not 95.44 percent but exactly 95 percent, in between. so the remaining
01:09:30.000 --> 01:09:36.000
probability mass here is then 2.5 percent and 2.5 percent here actually it's much easier to
01:09:36.000 --> 01:09:43.000
just apply the value of plus or minus two standard deviations rather than this 1.96
01:09:43.000 --> 01:09:50.000
this just confuses students. so please bear in mind that a useful rule of thumb for the approximate
01:09:50.000 --> 01:09:55.000
five percent significance level is just plus or minus two standard deviations.
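the rule of thumb can be compared with the exact critical value using Python's standard-library NormalDist; a sketch, where the values 45 and 10 echo the blue pdf example from earlier and 60.0 is just an arbitrary point:

```python
from statistics import NormalDist

std = NormalDist()   # standard normal: mu = 0, sigma = 1

# the exact value leaving 2.5 percent in each tail, versus the rule of thumb of 2
z_crit = std.inv_cdf(0.975)
assert abs(z_crit - 1.96) < 0.01

# standardizing X ~ N(mu, sigma^2): z = (x - mu) / sigma carries the same probability
mu, sigma, x = 45.0, 10.0, 60.0
z = (x - mu) / sigma
assert abs(NormalDist(mu, sigma).cdf(x) - std.cdf(z)) < 1e-9
```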
01:09:55.000 --> 01:10:07.000
Oops yes and I think I leave it at this I have not quite completed this set of slides but there
01:10:07.000 --> 01:10:15.000
is not much left to be covered in the next lecture. the last minutes, as I told you, I would like to reserve
01:10:15.000 --> 01:10:24.000
for questions and answers. so I will stop the recording here and ask you to go into the
01:10:24.000 --> 01:10:29.000
meeting so click on the other link which we have used already last week Thursday
01:10:29.000 --> 01:10:34.000
and then we can see whether there are questions or comments on what I have presented in the lecture
01:10:35.000 --> 01:10:43.000
so I will now stop the recording and then stop this