WEBVTT - autoGenerated
00:00:30.000 --> 00:00:54.000
Hello and welcome to today's lecture on estimation and inference.
00:00:55.000 --> 00:01:01.000
Do you have any questions? Please raise your hand if you do.
00:01:10.000 --> 00:01:13.000
If you do not have any questions then I would have a question.
00:01:13.000 --> 00:01:19.000
We started yesterday with this review of basic econometrics and I started off with a very easy
00:01:19.000 --> 00:01:27.000
setup which surely all of you have seen of bivariate regressions in sort of standard
00:01:28.000 --> 00:01:34.000
scalar notation and then I moved on to multivariate regression which I'm sure you're also acquainted
00:01:34.000 --> 00:01:42.000
with but in matrix notation. Would those of you who are not familiar with
00:01:42.000 --> 00:01:48.000
matrix notation in econometrics please raise your hand.
00:01:59.000 --> 00:02:05.000
Okay so there are three participants who have raised their hand which is quite good I would say.
00:02:06.000 --> 00:02:13.000
So the majority I would say the overwhelming majority of you (currently there are 16 participants here)
00:02:13.000 --> 00:02:20.000
are acquainted with matrix notation. Those three who are not I can only encourage to
00:02:21.000 --> 00:02:28.000
review your knowledge of linear algebra, because, as I said already yesterday, I will
00:02:21.000 --> 00:02:28.000
do most of the analysis in basic econometrics, and actually also a substantial part later on,
00:02:28.000 --> 00:02:34.000
in matrix notation; we make use of results from linear algebra quite a bit in today's
00:02:34.000 --> 00:02:43.000
lecture already. So if you have trouble following my reasoning in today's lecture
00:02:49.000 --> 00:02:57.000
because you have deficiencies in your knowledge of linear algebra then please
00:02:57.000 --> 00:03:07.000
review your linear algebra. Look at any textbook in linear algebra and familiarize yourself with
00:03:07.000 --> 00:03:15.000
matrix notation and I'm sure after some time you will realize that matrix notation is much simpler
00:03:16.000 --> 00:03:25.000
than the kind of notation with summation signs which dominates
00:03:25.000 --> 00:03:32.000
undergraduate textbooks. So it is really worth the effort but there is certainly an effort in
00:03:33.000 --> 00:03:40.000
going into matrix notation and learning the rules which you need to know in order to follow
00:03:41.000 --> 00:03:50.000
this lecture here. So please, those three who have not seen this before or who are not
00:03:50.000 --> 00:03:57.000
acquainted with it try to familiarize yourself with that as soon as possible.
00:03:58.000 --> 00:04:05.000
Now what we did last time was among other things that I introduced you to the Moore-Penrose inverse
00:04:06.000 --> 00:04:14.000
the properties of which I have shown here. I showed to you once again the four properties which define
00:04:14.000 --> 00:04:21.000
a unique matrix A plus which functions as a replacement for the standard inverse in the case
00:04:21.000 --> 00:04:27.000
where the matrix to be inverted cannot be inverted because it is not regular
00:04:27.000 --> 00:04:34.000
because it is singular. In this case the Moore-Penrose inverse steps in and that is actually
00:04:34.000 --> 00:04:43.000
the case in econometrics where we have seen that the least squares estimator for some descriptive
00:04:44.000 --> 00:04:51.000
regression model where we estimate a parameter b relies strongly on the Moore-Penrose inverse of
00:04:51.000 --> 00:04:59.000
the data matrix x because b is just the Moore-Penrose inverse times the vector of observations
00:05:00.000 --> 00:05:08.000
y so the Moore-Penrose inverse of the data matrix x of explanatory variables times the observed
00:05:08.000 --> 00:05:18.000
dependent variables y and I noted that x plus in the case of a matrix x which has full column rank
00:05:19.000 --> 00:05:27.000
is equal to x prime x inverse times x prime right so this is the same thing now where we left
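As a small numerical aside (a sketch in Python with NumPy on made-up data; any matrix language would do), one can check that for a matrix x with full column rank the Moore-Penrose inverse coincides with x prime x inverse times x prime, and that b is just the Moore-Penrose inverse times y:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
# Made-up data: a constant column plus two random regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

X_plus = np.linalg.pinv(X)              # Moore-Penrose inverse X+
formula = np.linalg.inv(X.T @ X) @ X.T  # (X'X)^{-1} X' for full column rank
b = X_plus @ y                          # least squares coefficients b = X+ y

assert np.allclose(X_plus, formula)
assert np.allclose(b, np.linalg.lstsq(X, y, rcond=None)[0])
```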
00:05:27.000 --> 00:05:35.000
off is actually this type of properties here remember I defined a matrix m which is just
00:05:35.000 --> 00:05:45.000
the identity matrix minus x times x plus and this is matrix m and I noted that the
00:05:45.000 --> 00:05:56.000
residuals of the regression e can be written as m times y so basically this is you can see it
00:05:56.000 --> 00:06:05.000
from this expression here when you multiply from the right hand side y into this m matrix
00:06:05.000 --> 00:06:16.000
and you would get y here minus x times x plus y but x plus y is b so this is just y minus x b
00:06:16.000 --> 00:06:26.000
and this is of course just e the vector of regression residuals now this matrix m here has
00:06:26.000 --> 00:06:33.000
certain properties and this is where I left off yesterday one important property is that
00:06:33.000 --> 00:06:40.000
the regressor matrix x or x prime depending on from which side you multiply with m
00:06:41.000 --> 00:06:48.000
the regressor matrix x or the matrix x prime are both orthogonal to m so the product of x prime
00:06:48.000 --> 00:06:57.000
m is equal to zero and the product of m times x is also equal to zero and in addition m times m
00:06:57.000 --> 00:07:07.000
is m so the matrix is idempotent and i think i gave you already this exercise where you have to
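These properties of the matrix m can also be verified numerically; here is a small Python/NumPy sketch on made-up data (not part of the lecture's own exercises):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.pinv(X)  # M = I - X X+
e = M @ y                              # residuals e = M y = y - X b

assert np.allclose(M @ M, M)           # idempotent: M M = M
assert np.allclose(X.T @ M, 0.0)       # X'M = 0 (and M X = 0 by symmetry of X X+)
assert np.allclose(X.T @ e, 0.0)       # hence X'e = 0
```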
00:07:07.000 --> 00:07:17.000
show that x prime e is equal to zero that should not be very difficult and that the sum of the
00:07:18.000 --> 00:07:25.000
components of the e vector is equal to zero if the regression contains a constant term we will
00:07:25.000 --> 00:07:33.000
come back to this issue in today's lecture and then also as a third part this MATLAB exercise
00:07:33.000 --> 00:07:43.000
or whatever other software you want to use now as you had to show in this exercise or as you have
00:07:43.000 --> 00:07:55.000
to show in this exercise the vector of regression residuals e is orthogonal to the regressor matrix
00:07:55.000 --> 00:08:02.000
x typically in this regressor matrix x we have as one column mostly the first column just a
00:08:02.000 --> 00:08:11.000
vector of ones which is a column for the constant term so for the constant which we are to estimate
00:08:11.000 --> 00:08:21.000
in the regression equation as a control for the means of the variables we would have this vector
00:08:21.000 --> 00:08:28.000
of ones here which i in this lecture denote by the greek letter iota this little i here without
00:08:29.000 --> 00:08:40.000
the dot is a greek iota so this is just a vector of ones and clearly when x prime e is equal to zero
00:08:40.000 --> 00:08:49.000
then in particular it is true that iota prime e is equal to zero in any least squares estimation
00:08:49.000 --> 00:08:57.000
with a constant term so in any least squares estimation where you have this column vector of ones as one
00:08:57.000 --> 00:09:06.000
column of the regressor matrix x so that iota prime e is equal to zero implies or is equivalent
00:09:06.000 --> 00:09:15.000
to saying that the sum of all the regression residuals is always equal to zero by construction
00:09:15.000 --> 00:09:23.000
so whenever we sum all the regression residuals in a regression where we have used a constant term
00:09:23.000 --> 00:09:30.000
then we will see that the value of the sum is exactly equal to zero using this knowledge
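This zero-sum property is easy to see in a quick simulation; the following Python/NumPy sketch (made-up data, hypothetical coefficients) fits a regression with a constant and checks that iota prime e is zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
iota = np.ones(n)                                    # column of ones
X = np.column_stack([iota, rng.normal(size=(n, 2))])
y = 1.0 + X[:, 1] - 2.0 * X[:, 2] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# iota'e = 0: the residuals sum to zero when a constant is included
assert np.isclose(iota @ e, 0.0)
assert np.isclose(e.sum(), 0.0)
```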
00:09:31.000 --> 00:09:36.000
actually just using the result of the previous exercise where you had to show that x prime e
00:09:36.000 --> 00:09:44.000
is equal to zero of which of course this property here is a special case when x prime contains a
00:09:44.000 --> 00:09:51.000
the row vector iota prime and you have a second exercise here which i ask you to solve which
00:09:51.000 --> 00:09:59.000
says that if the regression contains a constant term then show that y prime y is equal to b prime
00:09:59.000 --> 00:10:09.000
x prime xb plus e prime e and the significance of this result here is basically that the cross
00:10:09.000 --> 00:10:20.000
product between b prime x prime and e or the cross product between e prime and x b does not appear
00:10:20.000 --> 00:10:28.000
in this decomposition that this is always zero right so y prime y can be decomposed into two squared terms
00:10:28.000 --> 00:10:35.000
one term being b prime x prime xb the other one being e prime e but we will make use of this
00:10:35.000 --> 00:10:46.000
result when we talk about the r squared okay using this result which i ask you to show in the
00:10:46.000 --> 00:10:54.000
exercise now we would have equation 14 which is just the same equation divided by n so we would
00:10:54.000 --> 00:11:01.000
have one over n y prime y is equal to one over n b prime x prime xb plus one over n e prime e
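The decomposition from the exercise, and its divided-by-n version in equation 14, can be confirmed numerically; this Python/NumPy sketch uses made-up data and is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# y'y = b'X'Xb + e'e (the cross terms vanish because X'e = 0)
assert np.isclose(y @ y, b @ X.T @ X @ b + e @ e)
# equation 14 is the same identity divided by n
assert np.isclose(y @ y / n, b @ X.T @ X @ b / n + e @ e / n)
```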
00:11:02.000 --> 00:11:11.000
and now probably you already sense what i'm aiming at this y prime y is a constituent element
00:11:11.000 --> 00:11:17.000
of the computation of the variance of y as you know right same is true for e prime e and this
00:11:17.000 --> 00:11:24.000
here will be something like the explained variance explained by the ols but by the least squares
00:11:24.000 --> 00:11:34.000
estimate b and the data matrix x the regressor matrix x now how can we use this equation here
00:11:34.000 --> 00:11:41.000
to come to something which is related to the variance well we will still have to subtract
00:11:42.000 --> 00:11:51.000
the squared means of the variables which we see here so what we do there is that we take the
00:11:52.000 --> 00:12:02.000
ordinary equation y is equal to xb plus e and pre-multiply it by one over n iota prime right
00:12:02.000 --> 00:12:10.000
so this is what you see here just the standard equation y is equal to xb plus e is pre-multiplied
00:12:10.000 --> 00:12:19.000
by the vector iota prime so by a row vector of ones so that essentially all those vectors here
00:12:19.000 --> 00:12:28.000
are added up iota prime y adds up all the elements of y iota prime xb adds up all the elements of
00:12:28.000 --> 00:12:35.000
the vector x times b right x times b is a vector and the same thing of course here where we then
00:12:35.000 --> 00:12:43.000
know that iota prime e is equal to zero by this orthogonality property which i have explained
00:12:43.000 --> 00:12:50.000
to you and then i also divide the whole equation by n and then of course adding up all the elements
00:12:50.000 --> 00:12:57.000
of a vector and dividing by n gives us just the mean of the elements in the vector so the result
00:12:57.000 --> 00:13:06.000
of this operation is that y bar is equal to x bar prime times b right the error component
00:13:06.000 --> 00:13:12.000
cancels or vanishes because iota prime e is equal to zero so we're just left with the y component
00:13:12.000 --> 00:13:19.000
and the x prime b component and in both cases obviously this operation of
00:13:19.000 --> 00:13:27.000
pre-multiplication by iota prime just results in the means if we divide by n so that we have
00:13:27.000 --> 00:13:37.000
y bar which is now a scalar is equal to x bar prime times b and please note that x bar prime
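As a quick numerical aside, this relation between the means, y bar equal to x bar prime b, is easy to confirm in a Python/NumPy sketch on made-up data (the coefficients here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]

x_bar = X.mean(axis=0)                  # row vector of column means (1, x̄.2, x̄.3)
assert np.isclose(y.mean(), x_bar @ b)  # ȳ = x̄'b with a constant term
```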
00:13:37.000 --> 00:13:43.000
is just a row vector it's not a matrix anymore it's a row vector and then each element in
00:13:43.000 --> 00:13:52.000
this one row which this vector x bar prime has we find the mean of the respective column of the x
00:13:52.000 --> 00:13:59.000
matrix you can also write it in this way that you say x bar prime is a row vector which has the
00:13:59.000 --> 00:14:09.000
scalar one and then something which i here call x dot two similarly x dot three up to x dot k always with
00:14:09.000 --> 00:14:18.000
a bar on top to indicate that these are means so this essentially means that x dot two is the mean
00:14:18.000 --> 00:14:25.000
of the second column of the x matrix right so this is why i use this dot symbol here to indicate
00:14:25.000 --> 00:14:32.000
that the index for the observation is going through all possible observations that we have
00:14:32.000 --> 00:14:38.000
but fixed is the index of the column namely in this case the second column so x dot two here x
00:14:38.000 --> 00:14:47.000
dot two bar is the mean of the second column of the x matrix so that's also a scalar and similarly
00:14:47.000 --> 00:14:54.000
all the other values here are scalars up to x dot k which is also a scalar so this is a one by k
00:14:54.000 --> 00:15:05.000
vector which we have here and this one by k vector is multiplied by the k by one column vector b so
00:15:05.000 --> 00:15:10.000
that the result one by k times k by one is just one by one right so on the left hand side of this
00:15:10.000 --> 00:15:14.000
equation is a one by one so a scalar and on the right hand side of this equation is also one by
00:15:14.000 --> 00:15:22.000
one so also a scalar since both the left hand side and the right hand side are scalar we can just
00:15:22.000 --> 00:15:30.000
square them right so for instance we can write y bar squared you may perhaps first think well
00:15:30.000 --> 00:15:35.000
why can we square this doesn't this look like a vector perhaps it does but it is not a vector
00:15:35.000 --> 00:15:43.000
it is a scalar and we can square it and we can write the square of the right hand side here in
00:15:43.000 --> 00:15:52.000
the form b prime x bar times x bar prime b because x bar prime b is a scalar and its transpose
00:15:52.000 --> 00:16:00.000
is then of course also a scalar so b prime times x bar is also a scalar two scalars multiplied by
00:16:00.000 --> 00:16:07.000
each other is just the square of the value so we can write it in this form here this would be
00:16:07.000 --> 00:16:14.000
equation 16 well now we are of course ready to move to the variance because we would have one
00:16:14.000 --> 00:16:24.000
over n y prime y here and we would have y bar squared here so if we subtract equation 16
00:16:25.000 --> 00:16:34.000
from equation 14 then one over n y prime y minus y bar squared is just an estimate of the variance
00:16:35.000 --> 00:16:42.000
and the same thing holds here of course one over n b prime x prime x b
00:16:43.000 --> 00:16:52.000
minus the square of b prime x bar would just be an estimate of the variance of x b
00:16:54.000 --> 00:17:04.000
so we use of course again the fact that e bar is equal to zero, just as we did for y bar, but this should be clear
00:17:04.000 --> 00:17:11.000
and subtract 16 from 14 as i have just explained so i get an estimate of the variance of y
00:17:11.000 --> 00:17:24.000
variance hat of y is equal to variance hat of x b plus variance hat of e why does variance hat of e
00:17:25.000 --> 00:17:32.000
pop up here again well obviously while it is true that the mean of the e's is equal to zero
00:17:33.000 --> 00:17:40.000
and therefore this term here was equal to zero i still have the one over n e prime e here
00:17:41.000 --> 00:17:46.000
and subtracting its squared mean would mean subtracting a squared zero
00:17:46.000 --> 00:17:55.000
so this would then give me the estimate of the variance which you see here so equation 17 tells
00:17:55.000 --> 00:18:02.000
me that the sample variance of the y's can be decomposed into the variance of the explanatory
00:18:02.000 --> 00:18:09.000
part of the regression which is the variance of this x b here and the variance of the error terms
00:18:10.000 --> 00:18:13.000
actually these are always just estimates of the variance
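The variance decomposition of equation 17 can be illustrated with a short Python/NumPy sketch on made-up data (sample variances computed by dividing by n, as in equation 14):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

def var_hat(v):                    # sample variance, dividing by n
    return v @ v / n - v.mean() ** 2

# equation 17: var-hat(y) = var-hat(Xb) + var-hat(e)
assert np.isclose(var_hat(y), var_hat(X @ b) + var_hat(e))
assert np.isclose(e.mean(), 0.0)   # ē = 0, so var-hat(e) is just e'e/n
```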
00:18:17.000 --> 00:18:23.000
as an exercise please make it clear to yourself why this decomposition in equation 17 is not
00:18:23.000 --> 00:18:28.000
generally true for regressions without a constant term well that should be easy
00:18:29.000 --> 00:18:32.000
i'll give you actually a hint how you can show this formally
00:18:34.000 --> 00:18:40.000
or perhaps you would find it helpful to show this formally
00:18:40.000 --> 00:18:45.000
for the case of y and x having just a sample average of zero
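A quick numerical check (not the formal argument the exercise asks for) also shows the point: in this Python/NumPy sketch with made-up data and no constant column, the residuals do not sum to zero and the decomposition of equation 17 fails:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n) + 1.0           # regressor with a nonzero mean
y = 3.0 + 2.0 * x + rng.normal(size=n)

X = x[:, None]                         # regression WITHOUT a constant column
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

def var_hat(v):
    return v @ v / n - v.mean() ** 2

# without iota among the regressors the residuals need not sum to zero,
# and the variance decomposition generally fails
assert not np.isclose(e.sum(), 0.0)
assert not np.isclose(var_hat(y), var_hat(X @ b) + var_hat(e))
```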
00:18:45.000 --> 00:18:55.000
okay this concludes the descriptive part of regression analysis we will now move on to
00:18:55.000 --> 00:19:01.000
actually a model for regression analysis so to the linear regression model
00:19:01.000 --> 00:19:06.000
unless you have any questions for this first descriptive part
00:19:11.000 --> 00:19:12.000
if so then please raise your hand
00:19:15.000 --> 00:19:23.000
okay apparently no questions then let's move on with the linear regression model
00:19:24.000 --> 00:19:29.000
so i just want to point out what we have done so far is just descriptive statistics we were given
00:19:30.000 --> 00:19:37.000
a sample of observations on y and on x and then we tried to find the best fit for the given
00:19:37.000 --> 00:19:45.000
observations so the errors e we can think of as just approximation errors for the model
00:19:45.000 --> 00:19:55.000
we have used right the model was merely descriptive but what we introduce now is the idea of a true
00:19:55.000 --> 00:20:02.000
model which generates all the observations both those observations which we have at our disposal
00:20:02.000 --> 00:20:09.000
which i call the current observations and possibly also future observations because
00:20:09.000 --> 00:20:16.000
if we want to do forecasting then we need to rely on a model which also generates observations which
00:20:16.000 --> 00:20:24.000
we don't know of as of today so which generates future observations in the merely descriptive
00:20:24.000 --> 00:20:30.000
approach which we had there was nothing which allowed us to forecast anything because we had
00:20:30.000 --> 00:20:35.000
not assumed the existence of some type of true model we were just trying to find the best fit
00:20:35.000 --> 00:20:43.000
for the observations which we had at hand but mostly economists hope that the information
00:20:43.000 --> 00:20:50.000
they have in their data is just a sample of a larger set of possible observations which they
00:20:50.000 --> 00:20:57.000
might also have at their hand generated by the same model so that they can actually make inference
00:20:57.000 --> 00:21:02.000
about the properties of the model and that they can do forecasting if this is desired
00:21:04.000 --> 00:21:11.000
so the true model or the underlying true model is actually the same thing as what i called the
00:21:11.000 --> 00:21:20.000
population in the first two sets of slides so the review of probability and the review
00:21:20.000 --> 00:21:26.000
of statistics the true model is a population it is not just the sample
00:21:29.000 --> 00:21:37.000
if we write it for just one observation the true model would look like this yi would be a scalar
00:21:37.000 --> 00:21:47.000
and we would say well we have linear model xi prime times beta plus ui is how we assume that
00:21:47.000 --> 00:21:57.000
yi is generated in the population so we would say yi in the population is generated by some systematic
00:21:57.000 --> 00:22:06.000
part xi prime times beta right xi prime is a row vector multiplied by a column vector here so
00:22:06.000 --> 00:22:15.000
that this is one by one the row vector would be one by k beta is k by one and then there is also an
00:22:15.000 --> 00:22:22.000
error term ui which is generated in the population right the population
00:22:22.000 --> 00:22:31.000
generates these error terms basically defines the distribution of those error terms ui here
00:22:32.000 --> 00:22:39.000
and so these ui's can be thought of as shocks for instance whereas the e's which i had in the
00:22:39.000 --> 00:22:47.000
descriptive part can only be thought of as approximation errors this representation
00:22:47.000 --> 00:22:55.000
here in equation 18 is just one single equation for one single observation yi but if we write
00:22:55.000 --> 00:23:00.000
it in matrix notation then the notation is actually easier because then we can suppose
00:23:00.000 --> 00:23:10.000
that there is a whole vector of observations y going from y1 over y2 over yi to yn and this is
00:23:10.000 --> 00:23:18.000
being explained or that the model is that this is being explained as x times beta plus u where of
00:23:18.000 --> 00:23:24.000
course x is now a matrix and not only a column vector sorry not only a row vector and u is also
00:23:25.000 --> 00:23:34.000
a column vector not a scalar like in equation 18 so the ui's can be thought of as economic
00:23:34.000 --> 00:23:40.000
shocks which distort some type of let's say equilibrium relationship between the y and the
00:23:40.000 --> 00:23:48.000
x's right so we can think perhaps there's some equilibrium between y and x variables the strength
00:23:48.000 --> 00:23:54.000
of the x variables being measured by the beta and the ui's are shocks which distort this
00:23:54.000 --> 00:24:08.000
equilibrium relationship which propel the yi's off the equilibrium trajectory described by the xi beta
00:24:11.000 --> 00:24:17.000
in order to perhaps point out even more clearly what's the difference between the descriptive
00:24:17.000 --> 00:24:27.000
model and the approach where we assume the existence of some population model think of it
00:24:27.000 --> 00:24:37.000
this way in equation 18 which was this equation here or similarly in 19 ui is actually causal
00:24:37.000 --> 00:24:47.000
for yi so in equation 18 an increase in ui would mean that yi increases at least as long as the xi
00:24:47.000 --> 00:24:57.000
beta is independent of the ui's right so ui would be causal for yi which means that the error process
00:24:57.000 --> 00:25:06.000
the u's is part of the data generating process for the yi's the ui's generate the yi's
00:25:08.000 --> 00:25:13.000
in the descriptive model which we analyzed last lecture and at the beginning of this lecture
00:25:13.000 --> 00:25:21.000
in equation 10 we did not know at all how the data were generated and the ei's were just
00:25:21.000 --> 00:25:31.000
implied by given values for yi and xi so there the ei could not be interpreted as something causal
00:25:31.000 --> 00:25:38.000
but rather this had to be interpreted as something which was implied by the difference between the
00:25:38.000 --> 00:25:45.000
yi's and the xi prime b the ei's measure just the quality of the fit but they did not allow any
00:25:45.000 --> 00:25:51.000
kind of inference on what happens if the economy or the market or the firm or whatever we study
00:25:52.000 --> 00:25:55.000
is hit by some type of shock
00:25:59.000 --> 00:26:06.000
now when we think that this model which we have just described the model of equation 18
00:26:06.000 --> 00:26:15.000
or 19 is valid for all i's even for future observations or unobserved observations which we
00:26:15.000 --> 00:26:24.000
just ignore because somebody has forgotten to record the data for instance we always think of
00:26:24.000 --> 00:26:32.000
it as having just n observations so in principle the model would say it is valid for all times
00:26:32.000 --> 00:26:39.000
and the model can generate infinitely many observations but we are in the situation where
00:26:39.000 --> 00:26:48.000
we have just a sample of n observations for the y's and for the x variables so we would have
00:26:48.000 --> 00:26:55.000
again a vector y which is a one by n sorry n by one vector and we would have a matrix x which is
00:26:55.000 --> 00:27:03.000
accordingly n by k matrix this describes our sample this describes the observation we have
00:27:03.000 --> 00:27:09.000
available for estimating the relationship for estimating the unknown beta but the model in
00:27:09.000 --> 00:27:18.000
principle would also be valid for other observations outside of this vector either observations further
00:27:18.000 --> 00:27:24.000
down in the past or observations in between which are missing or future observations
00:27:26.000 --> 00:27:36.000
in any case it is our aim to estimate the unknown vector beta now if we had additional observations
00:27:36.000 --> 00:27:43.000
on the regressors let's say we have x n plus one and x n plus two and so forth then we could use
00:27:43.000 --> 00:27:51.000
the vector beta if we knew it or if we have estimated it we could use either the vector beta
00:27:51.000 --> 00:28:00.000
or its estimate to arrive at some forecast of the y vector so this would allow us to forecast
00:28:00.000 --> 00:28:07.000
the y if we have knowledge of the beta and if we have more x observations than we have
00:28:07.000 --> 00:28:13.000
observations on the y as i have already pointed out this would not be true for a model
00:28:13.000 --> 00:28:16.000
which is merely descriptive such as the model in equation 10
00:28:16.000 --> 00:28:28.000
i hope i have made this idea clear if it is not then just ask me a question
00:28:28.000 --> 00:28:35.000
but if you don't ask then i move on to explain the most basic assumptions which we typically make
00:28:35.000 --> 00:28:41.000
in this type of model in this type of analysis namely the so-called gauss markov assumptions
00:28:42.000 --> 00:28:50.000
of which there are four and here they are so assumption a one is the assumption that the
00:28:50.000 --> 00:29:01.000
expectation of the shocks of the errors ui is equal to zero for all i so this means these shocks
00:29:01.000 --> 00:29:08.000
these errors cannot be forecasted or can only be forecasted with zero we have actually no knowledge
00:29:08.000 --> 00:29:16.000
about them at all we just know their mean the expected value is zero so the best thing
00:29:16.000 --> 00:29:21.000
when we do a forecast is just to ignore this error here because we don't know anything about it
00:29:21.000 --> 00:29:29.000
the expected value is actually zero this is quite an innocuous assumption because we do allow
00:29:29.000 --> 00:29:34.000
for non-zero mean of our variables when we include a constant in the regressor matrix
00:29:34.000 --> 00:29:43.000
so it is quite natural to suppose that the error by itself has just an expected value of zero
00:29:44.000 --> 00:29:53.000
because if it had some expected value different from zero then we could estimate this value
00:29:53.000 --> 00:29:59.000
different from zero as a part of our error term and define modified residuals which again have
00:29:59.000 --> 00:30:06.000
an expected value of zero so the expectation of ui is equal to zero for all errors i
00:30:06.000 --> 00:30:14.000
and this means that the expectation of the vector u is the zero vector all right so this zero here
00:30:14.000 --> 00:30:22.000
is a scalar and this zero denotes a vector namely an n by one vector since u is n by one
00:30:22.000 --> 00:30:30.000
this is assumption a1 and that is a very innocuous assumption the second assumption is by far not
00:30:30.000 --> 00:30:38.000
as innocuous it is the assumption on strict exogeneity of x it is the assumption that the
00:30:38.000 --> 00:30:46.000
matrix x of regressors is strictly exogenous now what exactly does this mean this means that the
00:30:46.000 --> 00:30:55.000
conditional expectation of u given x is the same thing as the expectation of u as the
00:30:55.000 --> 00:31:04.000
unconditional expectation of u right so that is what we call exogeneity x itself is exogenous
00:31:04.000 --> 00:31:13.000
and therefore it is not at all helpful in predicting u right the conditional
00:31:13.000 --> 00:31:19.000
expectation is equal to the unconditional expectation now note that in many textbooks
00:31:19.000 --> 00:31:27.000
you will find a combination of assumptions a1 and a2 by for instance stating assumption a2
00:31:27.000 --> 00:31:35.000
as saying that the expectation of u given x is equal to zero well that clearly involves already
00:31:35.000 --> 00:31:40.000
assumption a1 because the expectation of u the unconditional expectation of u is equal to zero
00:31:40.000 --> 00:31:46.000
so here by means of assumption a1 and a2 we have the same property right i noted here a1 and a2
00:31:46.000 --> 00:31:52.000
imply the conditional expectation of u given x is equal to zero and in many textbooks you actually
00:31:52.000 --> 00:31:59.000
find this property here conditional expectation of u given x is equal to zero as assumption a2
00:32:00.000 --> 00:32:07.000
but strictly speaking this melds assumptions a1 and a2 and i'd rather keep the two of them apart
00:32:07.000 --> 00:32:12.000
right so exogeneity just says that the conditional expectation is equal to the
00:32:12.000 --> 00:32:17.000
unconditional expectation and i make no assumption in a2 about what value the
00:32:17.000 --> 00:32:25.000
unconditional expectation has but rather this is assumption a1 third assumption is what we call
00:32:25.000 --> 00:32:31.000
homoscedasticity one of you yesterday already mentioned the phenomenon of heteroscedasticity
00:32:31.000 --> 00:32:37.000
i hadn't even defined what homoscedasticity was but i'm sure most of you know about heteroscedasticity
00:32:37.000 --> 00:32:46.000
from your previous econometrics lectures homoscedasticity says that the variance of the u i's
00:32:46.000 --> 00:32:54.000
is the same for all i so for all i we have the same variance which i call sigma square u right
00:32:55.000 --> 00:33:04.000
regardless of which u i we take all of the ui's shall have the same variance
00:33:05.000 --> 00:33:11.000
homoscedasticity has nothing to do with the covariances right homoscedasticity is just
00:33:11.000 --> 00:33:21.000
an assumption about the variance of the ui's the covariances i deal with in assumption a4
00:33:22.000 --> 00:33:27.000
and this assumption i call the uncorrelatedness assumption the uncorrelatedness assumption says
00:33:27.000 --> 00:33:35.000
that the covariance between some error u i and some other error u j is equal to zero
00:33:36.000 --> 00:33:45.000
for all possible pairs of i and j where i is different from j obviously i must be different
00:33:45.000 --> 00:33:51.000
from j because if i were the same thing as j then we would have here the covariance between
00:33:51.000 --> 00:33:57.000
u i and u i and that would be the variance of u i so it would be equal to sigma square u and
00:33:57.000 --> 00:34:03.000
this is not equal to zero but we have just assumed that it is greater than zero so the uncorrelatedness
00:34:03.000 --> 00:34:10.000
assumption is an assumption about different use which always shall have a covariance of zero
00:34:12.000 --> 00:34:16.000
what we do not do in the gauss markov assumptions but we'll do only a little later
00:34:16.000 --> 00:34:25.000
in an assumption a5 is that we assume a particular distribution for the shocks u so it is not part of
00:34:25.000 --> 00:34:31.000
the gauss markov assumptions for instance to assume that the u's are normally distributed
00:34:31.000 --> 00:34:37.000
for instance which would be our assumption a5 right the properties which i'm now going to prove
00:34:37.000 --> 00:34:45.000
using the gauss markov assumptions are completely independent of the distribution of the u's
00:34:45.000 --> 00:34:54.000
provided that these three or no actually four properties are satisfied expected value the
00:34:54.000 --> 00:34:58.000
unconditional expected value is zero the conditional expected value is the same as the
00:34:58.000 --> 00:35:05.000
unconditional expected value all of the u's have the same variance so we have homoscedasticity
00:35:05.000 --> 00:35:09.000
and all of the u's are uncorrelated but there's no assumption about normality here
00:35:09.000 --> 00:35:20.000
now when we have assumptions a3 and a4 which by the way are not at all innocuous assumptions
00:35:20.000 --> 00:35:26.000
so that you would find many data sets where these assumptions a3 and a4 are actually
00:35:26.000 --> 00:35:37.000
violated but if we do assume a3 and a4 then it is true that the covariance matrix of the u's
00:35:37.000 --> 00:35:46.000
which i denote v of u so the expectation of u times u prime is equal to an identity matrix
00:35:46.000 --> 00:35:54.000
multiplied by the variance of the ui's namely sigma square u so it would be a diagonal matrix
00:35:54.000 --> 00:36:02.000
which has zeros everywhere off the diagonal right and on the diagonal there would just be this
00:36:02.000 --> 00:36:08.000
sigma square u term here because the sigma square u would be multiplied by the ones in
00:36:08.000 --> 00:36:18.000
the main diagonal so assumptions a3 and a4 together imply that the covariance matrix of
00:36:19.000 --> 00:36:28.000
the shocks u is diagonal and that on the diagonal we always have the same element
00:36:28.000 --> 00:36:38.000
sigma square u again sometimes in textbooks you find that this property that the
00:36:40.000 --> 00:36:47.000
covariance matrix is equal to sigma square u times the identity matrix that this
00:36:47.000 --> 00:36:55.000
property is called homoscedasticity this is not true this is actually wrong to say it that way
00:36:55.000 --> 00:37:02.000
and homoscedasticity is just an assumption about the diagonal of this matrix here the fact that the
00:37:02.000 --> 00:37:08.000
off diagonal elements are zero has nothing to do with homo or heteroscedasticity it has something
00:37:08.000 --> 00:37:14.000
to do with the uncorrelatedness of the shocks right and this is an assumption about the shocks
00:37:14.000 --> 00:37:19.000
so this is assumption a4 the homoscedasticity assumption is just the assumption about the
00:37:19.000 --> 00:37:26.000
diagonal not the assumption about the structure of the whole matrix all right
00:37:28.000 --> 00:37:37.000
the covariance matrix v of u has the general form which you see here namely on the main
00:37:37.000 --> 00:37:44.000
diagonal you always have the variance of the u's right because when you compute this expected
00:37:44.000 --> 00:37:52.000
value here of u u prime then on the diagonal you will always have the expected value of for
00:37:52.000 --> 00:38:00.000
instance in this element here expected value of u1 squared right or here the expected value of u2
00:38:00.000 --> 00:38:09.000
squared or here the expected value of un squared and since the expectation of u is equal to
00:38:10.000 --> 00:38:19.000
zero the diagonal would then be filled by the variances
00:38:19.000 --> 00:38:27.000
of the shocks and similarly in the off diagonal elements you would have all kinds of
00:38:27.000 --> 00:38:34.000
cross products for instance here you would have the expectation of u2 times u1 which is equal to
00:38:34.000 --> 00:38:42.000
the covariance of u2 and u1 given that the expectation of u2 is
00:38:42.000 --> 00:38:48.000
equal to zero expectation of u1 is also equal to zero so that's the covariance and since the
00:38:48.000 --> 00:38:55.000
expectation of u2 times u1 is the same thing as the expectation of u1 times u2 the covariance
00:38:55.000 --> 00:39:00.000
which you find here is the same as the covariance which you find here right so it doesn't matter
00:39:00.000 --> 00:39:08.000
whether i first write u2 and u1 as i do here or first u1 and u2 as i write here this covariance
00:39:08.000 --> 00:39:16.000
is based on the product of the two terms so these two covariances are the same and this holds
00:39:16.000 --> 00:39:23.000
of course for all the off diagonal elements the covariance matrix is symmetric so
00:39:23.000 --> 00:39:31.000
for instance this element here is the same as that element down here and this element here is
00:39:31.000 --> 00:39:37.000
the same as this element there by the same argument this covariance matrix i will sometimes
00:39:37.000 --> 00:39:44.000
denote as variance of u if you please bear in mind that it's not only the variance of ui's
00:39:44.000 --> 00:39:50.000
but it is also the covariance which i mean when i speak of the variance of u sometimes i will also
00:39:50.000 --> 00:39:56.000
say the covariance matrix of u sometimes i will also use this notation here as sigma u
00:39:57.000 --> 00:40:01.000
same thing right this is just a different symbol for the same thing for the variance of u
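To make the structure of this covariance matrix concrete, here is a small numpy sketch (not part of the lecture; the dimension, sigma and the number of draws are illustrative choices) which estimates the expectation of u times u prime by simulation and compares it with sigma square u times the identity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_u, reps = 5, 2.0, 200_000   # illustrative values

# under A3 (homoscedasticity) and A4 (uncorrelatedness): V(u) = sigma_u^2 * I_n
V_theory = sigma_u**2 * np.eye(n)

# empirical check: average u u' over many independent draws of the shock vector u
U = rng.normal(0.0, sigma_u, size=(reps, n))  # each row is one draw of u
V_hat = U.T @ U / reps                        # sample average of u u'

# diagonal should be close to sigma_u^2, off-diagonal close to zero
print(np.round(V_hat, 2))
```

The diagonal entries converge to the common variance while the off-diagonal covariances converge to zero, separating assumption a3 (diagonal) from assumption a4 (off-diagonal) numerically.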
00:40:06.000 --> 00:40:11.000
now let me discuss assumption a2 i have not yet spoken much about this assumption
00:40:12.000 --> 00:40:15.000
except that i already warned you that assumption a2 is also not innocuous yes
00:40:16.000 --> 00:40:22.000
because assumption a2 may actually not be valid in many economic models i would
00:40:22.000 --> 00:40:32.000
even say as a rule it is not valid in economic models and the easiest example to see this would
00:40:32.000 --> 00:40:37.000
be the simple model of the Keynesian cross which you have learned in your first macro
00:40:38.000 --> 00:40:45.000
course where you say for instance that GDP if you denote it y in period i for instance
00:40:46.000 --> 00:40:53.000
is equal to consumption in period i and investment in period i we just forget about government
00:40:54.000 --> 00:41:01.000
exports and imports and say yi is equal to ci plus ii by definition consumption plus investment
00:41:01.000 --> 00:41:08.000
and then we use a very simple Keynesian type consumption function consumption in period i
00:41:08.000 --> 00:41:17.000
we explain as some parameter small c times income in period i plus some shock some ui
00:41:17.000 --> 00:41:26.000
something which prevents an exact linear relationship so there's also
00:41:26.000 --> 00:41:33.000
the shock ui here which accounts for every other thing which affects consumption rather than just
00:41:33.000 --> 00:41:41.000
income now if we have this model here and our aim is to estimate this parameter c to
00:41:41.000 --> 00:41:48.000
estimate this consumption function here then we have to realize already that assumption a2
00:41:48.000 --> 00:41:55.000
is violated it does not hold in this model the easiest way to see this is to assume that
00:41:55.000 --> 00:42:02.000
investment is just something exogenous okay so we really make life simple here say that investment
00:42:02.000 --> 00:42:08.000
is exogenous the argument could easily also be presented when investment also depends on income
00:42:08.000 --> 00:42:15.000
for instance now why does estimating the consumption function (21) this function here
00:42:15.000 --> 00:42:25.000
violate a2 observe that yi is by definition equal to ci plus ii but ci is by virtue of the second
00:42:25.000 --> 00:42:35.000
equation in this model equal to small c times yi plus ui so that yi is equal to small c times yi plus investment plus ui you can take this
00:42:35.000 --> 00:42:43.000
equation here left hand side here right hand side here the second line and solve for yi right you
00:42:43.000 --> 00:42:48.000
have the yi on the left hand side you have the yi on the right hand side so for yi then you will
00:42:48.000 --> 00:42:59.000
find yi is equal to 1 over 1 minus c times investment plus the shock now this shows you that
00:42:59.000 --> 00:43:07.000
the yi which should be the regressor in our regression for estimating the consumption
00:43:07.000 --> 00:43:17.000
function here that this yi correlates with the error term ui right yi is the same thing as 1
00:43:17.000 --> 00:43:25.000
over 1 minus c times investment plus ui so the regressor yi would change when there is some shock
00:43:25.000 --> 00:43:34.000
ui so when this ui here is shocked then also the regressor yi changes and therefore yi is not
00:43:34.000 --> 00:43:42.000
strictly exogenous and assumption a2 is violated this is really important that you understand this
00:43:42.000 --> 00:43:50.000
example well um i asked for that in one of the last exams actually and was disappointed that
00:43:50.000 --> 00:43:55.000
many students were not able to reproduce it so please make sure that you can reproduce it for
00:43:55.000 --> 00:44:00.000
the next exam just in case i post a similar question i would certainly not post the same one
00:44:00.000 --> 00:44:06.000
but the idea is quite general and you should have understood it well
00:44:08.000 --> 00:44:13.000
which gives me an opportunity to ask you if you have any questions as far as this example is
00:44:13.000 --> 00:44:25.000
concerned or anything else no question okay just as an exercise show what the correlation between
00:44:25.000 --> 00:44:34.000
yi and ui is based on namely the expectation of yi and ui so the expectation of the product yi and ui
00:44:34.000 --> 00:44:40.000
is equal to sigma square u over 1 minus c so this is different from zero so
00:44:41.000 --> 00:44:47.000
very clearly the covariance and the correlation between yi and ui is different from zero
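This covariance can also be checked by simulation; the following numpy sketch (with illustrative values for c, sigma u and the sample size, not from the lecture) generates data from the Keynesian cross model and compares the sample mean of yi times ui with sigma square u over 1 minus c:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, sigma_u = 500_000, 0.8, 1.0    # illustrative values

inv = rng.uniform(1.0, 2.0, n)       # exogenous investment i_i
u = rng.normal(0.0, sigma_u, n)      # shocks u_i
y = (inv + u) / (1.0 - c)            # reduced form: y_i = (i_i + u_i) / (1 - c)

# E[y_i u_i] should equal sigma_u^2 / (1 - c), here 1 / 0.2 = 5
emp = np.mean(y * u)
theory = sigma_u**2 / (1.0 - c)
print(emp, theory)
```

The regressor yi clearly co-moves with the shock ui, which is exactly the violation of strict exogeneity described above.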
00:44:47.000 --> 00:44:57.000
now we have introduced uh the least squares estimator already in the descriptive model we do
00:44:57.000 --> 00:45:03.000
not need to go again through the derivation of the least squares estimator because this is exactly
00:45:03.000 --> 00:45:07.000
the same as in the descriptive model that's actually why i presented it already when i
00:45:07.000 --> 00:45:12.000
discussed the descriptive model so we know the least squares estimator which i denote here by
00:45:12.000 --> 00:45:20.000
beta hat is defined as x prime x inverse times x prime y which is the same thing as the
00:45:20.000 --> 00:45:26.000
moore-penrose inverse x plus times y right so this is the estimator i started this lecture with
00:45:28.000 --> 00:45:38.000
when we denote the least squares estimator by beta hat then it is natural to also denote
00:45:38.000 --> 00:45:47.000
the estimated residuals by u hat so u hat are the residuals of the regression not the true errors
00:45:47.000 --> 00:45:56.000
because the true errors u are unobserved and unobservable but we can estimate their size
00:45:56.000 --> 00:46:04.000
of course this estimate typically is not exactly equal to the true shocks so u hat is defined as y
00:46:04.000 --> 00:46:13.000
minus what we have explained in the regression namely minus x beta hat right so this is just
00:46:13.000 --> 00:46:23.000
the definition here u hat is y minus x beta hat and similarly i then set y hat as the explained
00:46:23.000 --> 00:46:31.000
systematic part of the regression so y hat is equal to x beta hat so that we could also write
00:46:31.000 --> 00:46:42.000
the u hat here as y minus y hat right so u hat would be the y we have observed in the sample
00:46:42.000 --> 00:46:50.000
we have minus what we have explained in the regression via x times beta hat because y hat
00:46:50.000 --> 00:46:59.000
is the same thing as x times beta hat right okay so this is what we call the residuals
00:47:00.000 --> 00:47:05.000
the word residuals already indicates that this is what we have estimated for the true errors right
00:47:06.000 --> 00:47:12.000
if i'm careful enough then i will always call the u the errors or the shocks and i will call the
00:47:12.000 --> 00:47:18.000
u hats the residuals and you should know that the residuals are always what is left over
00:47:18.000 --> 00:47:24.000
what is left unexplained right the explained part is y hat the unexplained part is the difference
00:47:24.000 --> 00:47:34.000
between y and y hat and this unexplained thing we call u hat note that x prime u hat is equal to zero
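As a quick numerical illustration of these definitions (the data and dimensions are artificial, chosen only for the sketch), the following numpy code computes beta hat via the Moore-Penrose inverse, forms the residuals u hat and fitted values y hat, and checks that x prime u hat is zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3

X = rng.normal(size=(n, k))
beta = np.ones(k)                       # arbitrary true parameters for the sketch
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.pinv(X) @ y        # X^+ y = (X'X)^{-1} X' y
u_hat = y - X @ beta_hat                # residuals, not the true errors
y_hat = X @ beta_hat                    # explained, systematic part

# X' u_hat = 0: the residuals are orthogonal to every regressor
print(X.T @ u_hat)
```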
00:47:34.000 --> 00:47:40.000
and i ask you again why that should be very easy for you to answer if you listen carefully to the
00:47:40.000 --> 00:47:49.000
first part of today's lecture now the gauss markov assumptions imply certain properties of the least
00:47:49.000 --> 00:47:58.000
squares estimator in particular they imply the property that the least squares estimator is the
00:47:58.000 --> 00:48:06.000
best estimator which we can find in a certain class of estimators which i will explain in
00:48:06.000 --> 00:48:15.000
the following i will now derive this so-called gauss markov theorem for you
00:48:16.000 --> 00:48:23.000
so starting from the gauss markov assumptions i will at the end of this lecture have proven a
00:48:23.000 --> 00:48:30.000
theorem which is called the gauss markov theorem which says that the least squares estimator
00:48:30.000 --> 00:48:39.000
is in some well-defined sense the best linear and unbiased estimator
00:48:39.000 --> 00:48:46.000
the best linear unbiased estimator so the best estimator in a particular class of estimators
00:48:46.000 --> 00:48:52.000
namely in the class of linear and unbiased estimators and in order to do this we should
00:48:52.000 --> 00:49:00.000
first think about where what do i actually mean by best what kind of criterion can we use to say
00:49:00.000 --> 00:49:08.000
that one estimator is better than another estimator obviously one criterion could be
00:49:08.000 --> 00:49:15.000
bias i already introduced you to the concept of a bias but i will not bother much about the
00:49:15.000 --> 00:49:24.000
bias here because i will require that the bias is zero right from the start so all the estimators
00:49:24.000 --> 00:49:30.000
we will consider in this gauss markov theorem will be unbiased estimators and therefore the bias
00:49:30.000 --> 00:49:37.000
is no criterion for distinguishing one estimator from another among the many unbiased
00:49:37.000 --> 00:49:45.000
estimators we would like to know which of the many unbiased estimators in the class of linear
00:49:45.000 --> 00:49:52.000
estimators is the best one so we need an additional criterion for what is good or what is best
00:49:54.000 --> 00:50:00.000
in order to define such a criterion we first consider some arbitrary estimator which i denote
00:50:00.000 --> 00:50:08.000
by beta tilde so beta tilde is not necessarily equal to beta hat quite to the contrary beta tilde
00:50:08.000 --> 00:50:15.000
can be any estimator and beta hat is some particular estimator namely the least squares
00:50:15.000 --> 00:50:22.000
estimator x plus y and as i say there are many other possibilities for choosing certain estimators
00:50:22.000 --> 00:50:33.000
now it should be clear to you that any sensible estimator beta tilde depends on the errors u
00:50:34.000 --> 00:50:40.000
and is therefore by itself a random variable which has an expected value only then is it
00:50:40.000 --> 00:50:46.000
sensible to speak of an unbiased estimator so there's something like the expectation of beta tilde
00:50:46.000 --> 00:50:52.000
and since beta tilde this estimator is a random variable the beta tilde also has a covariance
00:50:52.000 --> 00:51:00.000
matrix which i denote v of beta tilde but of course v of beta tilde the covariance matrix
00:51:00.000 --> 00:51:06.000
is nothing else but the expectation of beta tilde minus its expectation
00:51:07.000 --> 00:51:14.000
times beta tilde minus its expectation transposed right so that would be the covariance
00:51:14.000 --> 00:51:25.000
matrix of any arbitrary estimator beta tilde i recall here for you the definition of unbiasedness
00:51:25.000 --> 00:51:29.000
which we have had in previous lectures an estimator beta tilde is unbiased for the true
00:51:29.000 --> 00:51:37.000
parameter beta if and only if which is why i write the
00:51:37.000 --> 00:51:42.000
iff with two f's if and only if the expectation of beta tilde is equal to the
00:51:42.000 --> 00:51:50.000
true parameter beta right so if we know that this estimator beta tilde is unbiased
00:51:50.000 --> 00:51:58.000
then rather than writing e of beta tilde here we could also write beta so we could write beta tilde
00:51:58.000 --> 00:52:03.000
minus the true parameter vector beta here in the case of unbiasedness
00:52:03.000 --> 00:52:13.000
the second property which i introduce you to now and which will allow us to identify a well-defined
00:52:13.000 --> 00:52:22.000
best estimator is the definition of variance minimality so we want to find the estimator which
00:52:22.000 --> 00:52:31.000
has a minimal variance given that the estimator is
00:52:31.000 --> 00:52:41.000
unbiased now by the way let b be just a set of estimators for the parameter vector beta so b contains
00:52:41.000 --> 00:52:46.000
many different estimators for the parameter vector beta some of them are good estimators
00:52:46.000 --> 00:52:54.000
others are bad estimators many are mediocre we will then say that an estimator
00:52:54.000 --> 00:53:02.000
which i now call beta bar an estimator beta bar from the set b is variance minimal
00:53:03.000 --> 00:53:12.000
with respect to this set of estimators b if and only if the difference between the covariance
00:53:12.000 --> 00:53:20.000
matrix of beta tilde and the covariance matrix of this particular estimator beta bar of which we
00:53:21.000 --> 00:53:27.000
say that it is variance minimal that this difference between the two covariance matrices is positive
00:53:27.000 --> 00:53:32.000
semi-definite for all estimators beta tilde from the set b
00:53:34.000 --> 00:53:42.000
this perhaps sounds very technical the idea though is really easy the idea basically
00:53:42.000 --> 00:53:51.000
says that the estimator beta bar is variance minimal if its covariance matrix is smaller than
00:53:51.000 --> 00:53:56.000
any other covariance matrix just that smaller is something which is only well defined in the
00:53:56.000 --> 00:54:02.000
scalar case we can say one is smaller than two but it's not so clear what we say for matrices
00:54:02.000 --> 00:54:12.000
when is a matrix smaller than another matrix well for matrices which are symmetric
00:54:13.000 --> 00:54:21.000
there is a convention which is actually a generalization of the idea of positiveness
00:54:21.000 --> 00:54:30.000
in the scalar case which says that one matrix is smaller than another matrix in
00:54:30.000 --> 00:54:35.000
this case the covariance matrix of beta bar is smaller than the covariance matrix of
00:54:36.000 --> 00:54:43.000
beta tilde if the difference between the two is positive semi-definite
00:54:45.000 --> 00:54:52.000
and i review here for you if you don't remember it a symmetric matrix
00:54:52.000 --> 00:55:01.000
a is positive semi-definite if and only if small a prime times capital a times small a
00:55:02.000 --> 00:55:10.000
is greater or equal to zero for any vector a so regardless which vector a we take
00:55:11.000 --> 00:55:17.000
a prime a a is greater than or equal to zero for any vector a
00:55:17.000 --> 00:55:26.000
okay so basically the definition of variance minimality says it is the smallest possible
00:55:27.000 --> 00:55:32.000
covariance matrix which is associated with the variance
00:55:27.000 --> 00:55:32.000
minimal estimator beta bar and any other estimator
00:55:32.000 --> 00:55:39.000
from the set of estimators capital b any other such estimator beta tilde has a covariance
00:55:39.000 --> 00:55:47.000
matrix which is at least as large as the covariance matrix of beta bar so either it is the same
00:55:47.000 --> 00:55:55.000
size in which case the difference between these two matrices would be zero
00:55:57.000 --> 00:56:04.000
or it is even bigger so that we would have a positive definite difference between
00:56:12.000 --> 00:56:22.000
v of beta tilde and v of beta bar as an exercise please prove that if the difference between
00:56:22.000 --> 00:56:27.000
the covariance matrix of beta tilde and the covariance matrix of beta bar is positive
00:56:27.000 --> 00:56:35.000
semi-definite then the variance of beta bar is minimal in each component so it means the
00:56:36.000 --> 00:56:45.000
variance of beta bar k is less than or equal to the variance of
00:56:45.000 --> 00:56:54.000
beta tilde k for all k's going from one to capital k so for all elements on the main diagonal of the
00:56:54.000 --> 00:57:01.000
covariance matrix it would be true that the variance of the estimator beta bar is smaller
00:57:01.000 --> 00:57:06.000
than the variance of the other estimators beta tilde smaller or equal to
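The positive semi-definiteness criterion can be checked numerically; the following numpy sketch (with two illustrative 2 by 2 matrices of my own choosing) uses the eigenvalue characterization of a symmetric PSD matrix, and also shows that choosing a equal to a unit vector in a prime A a picks out a diagonal element, which is the idea behind the exercise:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    # a symmetric matrix A is positive semi-definite iff a'Aa >= 0 for every
    # vector a, which for symmetric A is equivalent to all eigenvalues >= 0
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

B = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3: PSD
C = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3: not PSD

# choosing a = e_k (the k-th unit vector) in a'Aa picks out the k-th
# diagonal element of A
e0 = np.array([1.0, 0.0])
print(is_psd(B), is_psd(C), e0 @ B @ e0)
```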
00:57:10.000 --> 00:57:18.000
now let us return to the least squares estimator beta hat the first thing we want to do is that
00:57:18.000 --> 00:57:25.000
we want to check for the unbiasedness of beta hat so we want to check is the least squares estimator
00:57:25.000 --> 00:57:34.000
actually unbiased in order to prove the unbiasedness of beta hat we only need the two assumptions a1
00:57:34.000 --> 00:57:42.000
and a2 so we do not need any assumption about homoscedasticity or about the uncorrelatedness
00:57:42.000 --> 00:57:51.000
of the error terms right a1 and a2 are sufficient how does the proof go we look at the expectation
00:57:51.000 --> 00:57:58.000
of beta hat well beta hat by definition is the moore-penrose inverse times y so it is x prime x
00:57:58.000 --> 00:58:07.000
inverse x prime y and we take the expectation of that now next line is just replacing y here
00:58:08.000 --> 00:58:15.000
by what y is equal to namely x beta plus u so we have the expectation of x prime x
00:58:15.000 --> 00:58:24.000
inverse x prime times x beta plus u next line takes this sum apart here
00:58:25.000 --> 00:58:32.000
and pre multiplies from the left hand side x prime x inverse x prime with x beta and with u
00:58:33.000 --> 00:58:40.000
so we have the expectation of x prime x inverse x prime x beta plus the expectation of x prime x
00:58:40.000 --> 00:58:49.000
inverse x prime u now note x prime x inverse times x prime x is equal to the identity matrix
00:58:49.000 --> 00:58:55.000
so these two terms here cancel so that the first term here is just the expectation of beta
00:58:56.000 --> 00:59:02.000
beta is the unknown vector of true parameters so beta is not a random variable beta is just a
00:59:02.000 --> 00:59:09.000
vector of values so the expectation of beta is equal to beta right so now we are here the
00:59:09.000 --> 00:59:17.000
expectation of beta hat is equal to beta plus the expectation well actually the expectation of this
00:59:17.000 --> 00:59:29.000
term here but here we use the law of iterated expectations so we use that the expectation
00:59:29.000 --> 00:59:36.000
of x prime x inverse x prime u can be written as the expectation of the expectation
00:59:37.000 --> 00:59:46.000
of x prime x inverse x prime u given x right so the expectation actually this is what i
00:59:47.000 --> 00:59:54.000
taught you that the expectation of a conditional expectation is equal to the unconditional
00:59:54.000 --> 01:00:06.000
expectation that is the law of iterated expectations so this would then mean that this is beta
01:00:06.000 --> 01:00:15.000
plus the expectation of and now in this second expected value here i can factor out all the x's
01:00:16.000 --> 01:00:23.000
because this is the expectation for a given x so i can write this as x prime x inverse x prime
01:00:24.000 --> 01:00:32.000
times the expectation of u given x and as you recall by assumptions a1 and a2 of the set of
01:00:32.000 --> 01:00:39.000
gauss markov assumptions this conditional expectation of u given x is equal to zero
01:00:40.000 --> 01:00:47.000
because assumptions a1 and a2 say that x is strictly exogenous a2 says that
00:59:47.000 --> 00:59:51.000
the conditional expectation is the same thing as the unconditional expectation of
01:00:51.000 --> 01:00:57.000
u and the unconditional expectation of u by virtue of assumption a1 is equal to zero
01:00:58.000 --> 01:01:04.000
so this here is zero if we multiply x prime x inverse x prime by zero then this
01:01:04.000 --> 01:01:11.000
is also zero so the whole second term here cancels vanishes and we're just left with beta
01:01:12.000 --> 01:01:17.000
so we have shown that the expectation of beta hat is equal to beta
01:01:17.000 --> 01:01:25.000
by the way such a proof is also something which i may ask for in a written exam so you should be
01:01:25.000 --> 01:01:38.000
able to reproduce it all right here's an exercise which you can do on your computer to
01:01:39.000 --> 01:01:46.000
do a Monte Carlo study which replicates this result for instance what you can do is
01:01:47.000 --> 01:01:53.000
just generate a hundred by three matrix x of random numbers so we would have three
01:01:53.000 --> 01:02:02.000
regressors for instance and then you write some loop over the following steps and go through this
01:02:02.000 --> 01:02:11.000
loop say a thousand times in each of the repetitions of this loop you generate a hundred by one
01:02:11.000 --> 01:02:18.000
random vector of disturbances so in loop j you generate a vector of disturbances uj
01:02:19.000 --> 01:02:26.000
and then you compute a vector yj is equal to x times some parameters and i have just set all
01:02:26.000 --> 01:02:32.000
the parameters equal to one here plus the disturbances uj the errors right disturbances
01:02:32.000 --> 01:02:40.000
is another synonym for errors or shocks okay plus the errors uj here so in
01:02:40.000 --> 01:02:46.000
this case the beta vector would just be a vector of ones you can also choose any other parameter
01:02:46.000 --> 01:02:54.000
values here or whatever you like but should be the same over the thousand loops and then given
01:02:54.000 --> 01:03:03.000
these artificially generated data in loop j you estimate an ols regression in matlab or r or
01:03:03.000 --> 01:03:13.000
whatever you use estimate beta hat j as x prime x inverse x prime times yj and let's save the
01:03:13.000 --> 01:03:20.000
estimated beta hat j well when you do this a thousand times and you have arrived at a thousand
01:03:13.000 --> 01:03:20.000
estimated beta hat j's then you compute the average over all of those beta hat j's
01:03:20.000 --> 01:03:27.000
what should be the case is that you find that the average is close to the true beta
01:03:34.000 --> 01:03:39.000
so should be close to this beta vector one one one if you have generated all your data
01:03:40.000 --> 01:03:43.000
with this one one one beta vector
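A possible implementation of this Monte Carlo exercise in python with numpy could look as follows (the seed, the standard normal disturbances and the exact loop count are illustrative choices, not prescribed by the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, loops = 100, 3, 1_000

X = rng.normal(size=(n, k))             # fixed 100-by-3 regressor matrix
beta = np.ones(k)                       # true parameters: (1, 1, 1)'
Xplus = np.linalg.pinv(X)               # X^+ = (X'X)^{-1} X', computed once

estimates = np.empty((loops, k))
for j in range(loops):
    u_j = rng.normal(0.0, 1.0, n)       # fresh disturbances in loop j
    y_j = X @ beta + u_j                # generate y_j = X beta + u_j
    estimates[j] = Xplus @ y_j          # save beta_hat_j

print(estimates.mean(axis=0))           # should be close to (1, 1, 1)
```

Rerunning with more loops should move the average closer to the true beta, as described next.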
01:03:43.000 --> 01:03:50.000
you may actually look at how close it is and it will depend on what kind of variance you have
01:03:50.000 --> 01:03:58.000
chosen for the uj here so if you find that the average is not yet close to the true beta
01:03:59.000 --> 01:04:03.000
and this is probably the case because you have chosen some rather big value for the variance of the
01:04:03.000 --> 01:04:10.000
uj which is fine which you may do but then just go on and rerun your program with many more loops
01:04:10.000 --> 01:04:17.000
let's say 10 000 loops and see again how the average of the beta j's develops by the law of
01:04:17.000 --> 01:04:24.000
large numbers you should see that the larger the number of loops is the closer your average
01:04:24.000 --> 01:04:33.000
estimated beta gets to the true value of beta which shows you that the ols estimator is actually
01:04:33.000 --> 01:04:44.000
unbiased so we know now the least squares estimator is unbiased but as i told you there
01:04:44.000 --> 01:04:52.000
are many unbiased estimators around so we may consider a whole class of estimators which are
01:04:52.000 --> 01:05:00.000
unbiased we will actually not look at the class of all unbiased estimators but just
01:05:00.000 --> 01:05:08.000
confine our attention to the class of linear estimators for beta which are unbiased now the
01:05:08.000 --> 01:05:16.000
first thing is to ask what exactly is a linear estimator of beta a linear estimator of beta is
01:05:16.000 --> 01:05:26.000
of the form beta tilde is equal to some known matrix d times the random variable y plus some
01:05:26.000 --> 01:05:37.000
constant small d so d is just some k by n non-random matrix and small d is a k by 1 non-random
01:05:37.000 --> 01:05:45.000
vector y is the only random variable we have observed because the x's are by the assumption
01:05:45.000 --> 01:05:52.000
of strict exogeneity not random variables in their own right they are known to us we can just treat
01:05:52.000 --> 01:06:01.000
them as if they were numbers which are given to us fixed exogenous regressors okay so the vector
01:06:01.000 --> 01:06:08.000
beta tilde if it is a linear estimator necessarily takes the form d times y plus small d there are
01:06:08.000 --> 01:06:16.000
many possible choices for the capital d and for the small d so the class of all linear estimators
01:06:16.000 --> 01:06:20.000
for beta good estimators and bad estimators is really huge
01:06:23.000 --> 01:06:29.000
since we have assumed that the x is strictly exogenous or deterministic
01:06:30.000 --> 01:06:36.000
which i treat here as a synonym for strict exogeneity even though there are slight
01:06:36.000 --> 01:06:43.000
differences but let's take it here as just deterministic if the x is deterministic then
01:06:43.000 --> 01:06:50.000
the least squares estimator beta hat is a linear estimator where d capital d is just the matrix x
01:06:50.000 --> 01:06:57.000
plus so it's just x prime x inverse times x prime right and the small d is equal to zero
01:06:58.000 --> 01:07:05.000
so capital d would be x plus and small d is equal to zero so we see the least squares
01:07:05.000 --> 01:07:10.000
estimator is a linear estimator for beta one of many possible estimators
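A small numpy sketch of this observation (with artificial data of my own choosing): setting capital D equal to x plus and small d equal to zero, the general linear form D times y plus small d reproduces the least squares estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)          # any observed y will do for this check

# least squares as a linear estimator: capital D = X^+ = (X'X)^{-1} X', small d = 0
D = np.linalg.pinv(X)
d = np.zeros(k)

beta_tilde = D @ y + d                          # general linear form D y + d
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X' y directly
print(np.allclose(beta_tilde, beta_hat))
```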
01:07:13.000 --> 01:07:18.000
now let us look at the class of all linear estimators which are unbiased
01:07:18.000 --> 01:07:27.000
when is a linear estimator unbiased if beta tilde is a general linear estimator
01:07:28.000 --> 01:07:36.000
then it has this form of d times y plus small d so the expectation of beta tilde is d times the
01:07:36.000 --> 01:07:45.000
expectation of y plus small d and for y we can substitute in x beta plus u because y is equal
01:07:45.000 --> 01:07:50.000
to x beta plus u so it would be d times the expectation of x beta plus u plus d
01:07:52.000 --> 01:08:00.000
well the expectation of u is of course zero by virtue of assumption a1 right so the expectation
01:08:00.000 --> 01:08:09.000
of x beta is given that x is deterministic the same thing as x beta so the expectation of beta
01:08:09.000 --> 01:08:20.000
tilde is equal to d x beta plus small d so we know that an arbitrary linear estimator beta tilde
01:08:20.000 --> 01:08:27.000
is unbiased for every value of beta if and only if the following two things are true
01:08:27.000 --> 01:08:34.000
namely d times x must be equal to the identity matrix right d times x here must be identity
01:08:34.000 --> 01:08:41.000
matrix so that we get beta here and the small d must be equal to zero okay so these two conditions
01:08:41.000 --> 01:08:49.000
26 and 27 are the conditions which are equivalent to saying that a linear estimator is unbiased
01:08:49.000 --> 01:08:58.000
and we can check again if the least squares estimator is unbiased we have proven that already
01:08:58.000 --> 01:09:06.000
but just another take on that the least squares estimator has as capital d the matrix x prime x
01:09:06.000 --> 01:09:15.000
inverse times x prime which is x plus the moore-penrose inverse so d times x is x
01:09:15.000 --> 01:09:21.000
prime x inverse times x prime x which is equal to i and the least squares estimator has small d equal to zero so
01:09:21.000 --> 01:09:31.000
the least squares estimator is unbiased as an exercise for you please
01:09:32.000 --> 01:09:36.000
prove the following lemma which i will use in the proof of the gauss markov
01:09:37.000 --> 01:09:46.000
theorem the lemma says that if it is true that d times x is equal to i so that property 26 on
01:09:46.000 --> 01:09:56.000
the previous slide holds then we can decompose d times d prime as x plus x plus prime plus
01:09:56.000 --> 01:10:03.000
d minus x plus times d minus x plus prime okay so this is similar to the decomposition i talked about
01:10:03.000 --> 01:10:14.000
earlier when i decomposed y into x beta plus u you have to show here again that the cross terms
01:10:14.000 --> 01:10:24.000
x plus times d minus x plus prime for instance that these terms are all zero
01:10:24.000 --> 01:10:28.000
there are two cross terms here and you just have to see that these are zero so that's easy to prove
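as a quick numerical check of this lemma, here is a short sketch in python with numpy; this is not part of the lecture, whose own exercises use matlab, and the matrix x and the estimator d below are arbitrary illustrative choices:

```python
import numpy as np

# sketch (illustrative, not from the lecture): check numerically that
# d x = i implies d d' = x+ x+' + (d - x+)(d - x+)'
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.standard_normal((n, k))

Xplus = np.linalg.pinv(X)              # moore penrose inverse (x'x)^{-1} x'

# build an arbitrary linear estimator d with d x = i:
# any d = x+ + m (i - x x+) works, since (i - x x+) x = 0
M = rng.standard_normal((k, n))
D = Xplus + M @ (np.eye(n) - X @ Xplus)

assert np.allclose(D @ X, np.eye(k))   # the unbiasedness condition d x = i

lhs = D @ D.T
rhs = Xplus @ Xplus.T + (D - Xplus) @ (D - Xplus).T
assert np.allclose(lhs, rhs)           # the lemma holds
```

the construction d = x plus plus m times (i minus x x plus) is one convenient way to generate an arbitrary linear estimator satisfying d x = i, because (i minus x x plus) times x is exactly zero.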
01:10:28.000 --> 01:10:41.000
now look at the least squares estimator again and subtract its expected value so here we have beta
01:10:41.000 --> 01:10:47.000
hat minus beta which has an expected value of zero of course this is equal to x prime x inverse
01:10:47.000 --> 01:10:56.000
the x prime y minus the true vector beta for y we can again substitute in x beta plus u so this
01:10:56.000 --> 01:11:06.000
is x prime x inverse x prime times x beta plus u minus beta then same argument as before x prime x
01:11:06.000 --> 01:11:12.000
inverse times x prime x is equal to the identity matrix so the first component here is just beta
01:11:13.000 --> 01:11:22.000
plus x prime x inverse x prime u minus beta so the beta here and the minus beta here cancel
01:11:22.000 --> 01:11:26.000
and we see that beta hat minus beta is equal to x prime x inverse the x prime u
01:11:30.000 --> 01:11:37.000
this we need for the purpose of deriving a convenient expression of
01:11:37.000 --> 01:11:42.000
the covariance matrix of the least squares estimator beta hat so how does the covariance
01:11:42.000 --> 01:11:49.000
matrix of beta hat look like well by definition the covariance matrix of beta hat is equal to
01:11:49.000 --> 01:11:55.000
the expectation of beta hat minus its expectation times beta hat minus its expectation transposed
01:11:57.000 --> 01:12:02.000
so we know the expectation of beta hat is equal to beta so it's the expectation of beta hat minus
01:12:02.000 --> 01:12:07.000
beta times beta hat minus beta prime right this is the covariance matrix of beta hat
01:12:09.000 --> 01:12:15.000
here we have this difference beta hat minus beta for which we have computed on the previous
01:12:15.000 --> 01:12:21.000
page that this is equal to x prime x inverse x prime u so basically x plus times u
01:12:23.000 --> 01:12:30.000
so we can write this product here as the expectation of x prime x inverse x prime u
01:12:31.000 --> 01:12:35.000
times u prime x x prime x inverse so just the transpose of the first term
01:12:35.000 --> 01:12:48.000
well since the x's are all strictly exogenous in fact deterministic we can pull the expectation operator
01:12:48.000 --> 01:12:54.000
just in the middle here or we can factor out all those x's so that we can write this as x prime x
01:12:54.000 --> 01:13:00.000
inverse x prime times the expectation of u u prime times x x prime x inverse
01:13:00.000 --> 01:13:07.000
but what is the expectation of u u prime well that's the covariance matrix of u
01:13:08.000 --> 01:13:16.000
the covariance matrix of u is sigma square u times identity matrix sigma square u is a scalar
01:13:16.000 --> 01:13:24.000
if assumption a3 is satisfied or actually the expectation of u u prime is sigma square u
01:13:24.000 --> 01:13:30.000
times i only if assumption a3 and a4 hold right so if assumptions a3 and a4 hold then
01:13:30.000 --> 01:13:37.000
this is sigma square u times i sigma square u is a scalar so we can factor that out and the i
01:13:37.000 --> 01:13:44.000
of course disappears has no significance so we are left with sigma square u times x prime x
01:13:44.000 --> 01:13:51.000
inverse x prime x x prime x inverse you see that the factors of one of these products here
01:13:51.000 --> 01:13:57.000
are just offsetting each other x prime x inverse times x prime x
01:13:57.000 --> 01:14:02.000
is just the identity matrix and we are left with sigma square u times x prime x inverse
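this sandwich simplification can be checked numerically; here is a hedged sketch in python with numpy, with a simulated x matrix and an arbitrary sigma square u, neither of which is from the lecture:

```python
import numpy as np

# sketch (assumed setup, not from the lecture): plug E[uu'] = sigma^2 I
# into the sandwich (x'x)^{-1} x' E[uu'] x (x'x)^{-1} and check that it
# collapses to sigma^2 (x'x)^{-1}
rng = np.random.default_rng(1)
n, k = 40, 3
X = rng.standard_normal((n, k))
sigma2 = 2.5                           # hypothetical error variance

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ X.T @ (sigma2 * np.eye(n)) @ X @ XtX_inv
V_betahat = sigma2 * XtX_inv

assert np.allclose(sandwich, V_betahat)
```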
01:14:04.000 --> 01:14:12.000
so we know the variance the covariance matrix of beta hat is sigma square u times x prime x
01:14:12.000 --> 01:14:25.000
inverse what we are mostly interested in when we use a covariance matrix is the main diagonal of
01:14:25.000 --> 01:14:29.000
the covariance matrix because the main diagonal contains the
01:14:29.000 --> 01:14:36.000
variances of the estimated coefficients beta hat k and these variances we need when we want to
01:14:36.000 --> 01:14:41.000
construct confidence intervals right now we have to take the square root of the elements on the
01:14:41.000 --> 01:14:48.000
main diagonal of the covariance matrix the off diagonal elements are of lesser interest and only
01:14:48.000 --> 01:14:56.000
come into play in applications which we will not cover in this lecture such as sometimes in wald tests
01:14:56.000 --> 01:15:03.000
and maximum likelihood estimation these kinds of things now the square root of the variance of
01:15:03.000 --> 01:15:08.000
beta hat k is of course the standard deviation and i have said this would be the standard error of
01:15:08.000 --> 01:15:14.000
beta hat k and this we need for constructing the confidence bounds on the true
01:15:14.000 --> 01:15:21.000
coefficients beta k where you know that the rule of thumb which i explained to you last week
01:15:21.000 --> 01:15:29.000
is that the probability that the estimated coefficient beta hat k deviates by more than
01:15:29.000 --> 01:15:37.000
plus minus two standard errors from the true coefficient beta k is less than approximately
01:15:37.000 --> 01:15:47.000
5 percent so this is how we arrive in the regression output which i showed you yesterday at the standard
01:15:47.000 --> 01:15:56.000
errors right this is again the regression i showed you yesterday the log of private consumption
01:15:56.000 --> 01:16:02.000
being regressed on the constant and on the log of income we had these coefficients i had highlighted
01:16:02.000 --> 01:16:09.000
on a previous slide yesterday now i highlight the standard errors right these standard errors
01:16:09.000 --> 01:16:18.000
here are exactly the square roots of the elements on the main
01:16:18.000 --> 01:16:27.000
diagonal of the covariance matrix of beta hat so the square root of v of beta hat k here
01:16:27.000 --> 01:16:36.000
this is where the standard errors come from actually it is an estimate of the true standard
01:16:36.000 --> 01:16:41.000
error we never know what this true standard error is but we estimate it as being that big
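the two standard error rule of thumb mentioned above can be checked against the standard normal distribution; this is a small editor's sketch in python, not part of the lecture:

```python
import math

# sketch (not from the lecture): probability that a normally distributed
# estimate deviates from its mean by more than two standard deviations,
# i.e. 2 * (1 - Phi(2))
def std_normal_cdf(x: float) -> float:
    # Phi(x) expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = 2.0 * (1.0 - std_normal_cdf(2.0))
print(round(p, 4))  # about 0.0455, slightly under 5 percent
```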
01:16:43.000 --> 01:16:48.000
how exactly this estimate is derived we will discuss a little later because now i will continue
01:16:48.000 --> 01:17:01.000
with the gauss markov theory two more exercises here prove for any given linear estimator beta tilde
01:17:01.000 --> 01:17:09.000
that the covariance matrix of beta tilde is sigma square u times d times d prime right
01:17:09.000 --> 01:17:17.000
so if the linear unbiased estimator is beta tilde is equal to d times y then the covariance matrix
01:17:17.000 --> 01:17:26.000
is v of beta tilde equal to sigma square u times d d prime and there's another exercise in MATLAB
01:17:26.000 --> 01:17:31.000
try to reproduce the highlighted standard errors in the previous output i just showed you
01:17:32.000 --> 01:17:42.000
so compute them directly from x and u hat now here is the Gauss Markov theorem which i already
01:17:42.000 --> 01:17:49.000
announced suppose that the four gauss markov assumptions a1 through a4 hold then it is true
01:17:49.000 --> 01:17:57.000
that the variance of the least squares estimator beta hat is minimal in the set of all linear
01:17:57.000 --> 01:18:05.000
unbiased estimators beta tilde is equal to d times y so i'm now looking just at estimators beta tilde
01:18:06.000 --> 01:18:13.000
is equal to d times y and i omit the plus small d because we know the small d must be equal to
01:18:13.000 --> 01:18:20.000
zero for the estimator to be unbiased right so the
01:18:20.000 --> 01:18:30.000
Gauss Markov theorem says that beta hat is the best estimator in the set of all linear unbiased
01:18:30.000 --> 01:18:37.000
estimators and this is why beta hat is called the blue the best linear unbiased estimator
01:18:40.000 --> 01:18:49.000
now let us prove this theorem by showing that the covariance matrix of beta hat is minimal
01:18:50.000 --> 01:18:58.000
in the sense in which we have defined this see we use the lemma which i gave you as an exercise
01:18:58.000 --> 01:19:04.000
to prove by the lemma we have the following property if we take any covariance matrix
01:19:04.000 --> 01:19:09.000
or if we take the covariance matrix of any linear unbiased estimator beta tilde
01:19:09.000 --> 01:19:17.000
and subtract from it the covariance matrix of the particular estimator beta hat the least squares
01:19:17.000 --> 01:19:26.000
estimator of which we want to prove the property of variance minimality then clearly we have that
01:19:26.000 --> 01:19:32.000
the first covariance matrix is equal to sigma square u times d d prime minus the covariance
01:19:26.000 --> 01:19:32.000
matrix of beta hat now this we can write as sigma square u times
01:19:32.000 --> 01:19:42.000
the d d prime where d d prime we can write as x plus x plus prime plus d minus x plus d minus x plus prime
01:19:51.000 --> 01:19:56.000
by virtue of the lemma minus the covariance matrix of beta hat
01:19:59.000 --> 01:20:09.000
this is the same thing as sigma square u times x prime x inverse plus sigma square u times
01:20:09.000 --> 01:20:18.000
the d minus x plus times d minus x plus prime minus the covariance matrix of beta hat so i
01:20:18.000 --> 01:20:26.000
haven't yet done anything to the covariance matrix of the least squares estimator
01:20:28.000 --> 01:20:34.000
why is the first term coming up here x prime x inverse well because if you look at the
01:20:34.000 --> 01:20:41.000
definition of the moore penrose inverse x plus times x plus prime simplifies to x prime x
01:20:41.000 --> 01:20:49.000
inverse right just check that if it is not clear to you so x plus x plus prime is equal to x prime x
01:20:49.000 --> 01:20:55.000
inverse and the rest here is just repeating the second line in this proof
01:20:56.000 --> 01:21:02.000
but now you see that sigma square u times x prime x inverse is exactly the same thing as the
01:21:02.000 --> 01:21:09.000
variance the covariance matrix of beta hat so this negative term here cancels against this
01:21:09.000 --> 01:21:15.000
positive term here and we are left with sigma square u times d minus x plus times d minus x
01:21:15.000 --> 01:21:24.000
plus prime now you see that you have here the product of a matrix with itself transposed
01:21:25.000 --> 01:21:30.000
right so such a product is always necessarily positive semi-definite so this is a square term
01:21:30.000 --> 01:21:37.000
it cannot be negative right this is the intuition d minus x plus times d minus x plus prime is
01:21:37.000 --> 01:21:45.000
necessarily positive semi-definite and therefore we have shown that for any linear unbiased
01:21:45.000 --> 01:21:52.000
estimator beta tilde it is true that v of beta tilde minus v of the least squares estimator
01:21:52.000 --> 01:22:00.000
is equal to a positive semi-definite matrix and therefore the least squares estimator is
01:22:00.000 --> 01:22:07.000
variance minimal so it is the best estimator in the class of all linear unbiased estimators
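the inequality just proven can also be illustrated numerically; here is a hedged sketch in python with numpy, where the design matrix x, the value of sigma square u, and the competing estimator d are all arbitrary illustrative assumptions, not from the lecture:

```python
import numpy as np

# sketch (illustrative assumptions): for an arbitrary linear unbiased
# estimator beta_tilde = d y, check that v(beta_tilde) - v(beta_hat)
# = sigma^2 (d - x+)(d - x+)' is positive semi-definite
rng = np.random.default_rng(2)
n, k = 60, 4
X = rng.standard_normal((n, k))
Xplus = np.linalg.pinv(X)
sigma2 = 1.7                                    # hypothetical error variance

M = rng.standard_normal((k, n))
D = Xplus + 0.1 * M @ (np.eye(n) - X @ Xplus)   # satisfies d x = i

V_tilde = sigma2 * D @ D.T
V_hat = sigma2 * Xplus @ Xplus.T                # equals sigma^2 (x'x)^{-1}

eigvals = np.linalg.eigvalsh(V_tilde - V_hat)
assert eigvals.min() >= -1e-8                   # positive semi-definite
```

since the difference of the two covariance matrices has only nonnegative eigenvalues, every diagonal element, that is every coefficient variance, of the competing estimator is at least as large as that of least squares.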
01:22:11.000 --> 01:22:12.000
any question here
01:22:17.000 --> 01:22:26.000
so this is a very important theorem because it tells us how we can get a good estimate of beta
01:22:26.000 --> 01:22:32.000
actually the best estimate of beta which is possible in the class of linear and unbiased
01:22:32.000 --> 01:22:40.000
estimators provided that the Gauss Markov assumptions are satisfied there is one problem
01:22:40.000 --> 01:22:50.000
still remaining which is that we cannot compute the covariance matrix v of beta hat unless we know
01:22:50.000 --> 01:22:59.000
what sigma square u is like we don't have an estimate of sigma square u yet see when i went
01:22:59.000 --> 01:23:06.000
through these computations here i computed the covariance matrix of beta hat is equal to sigma
01:23:06.000 --> 01:23:14.000
square u times x prime x inverse now we have the data in the x prime x matrix we observe the x
01:23:14.000 --> 01:23:20.000
matrix so we can compute x prime x inverse no problem but we don't know yet what sigma square
01:23:20.000 --> 01:23:26.000
u is so we cannot compute any standard errors unless we have an estimate of the sigma square
01:23:26.000 --> 01:23:31.000
u sigma square u is an unobserved parameter exactly in the same way in which the beta
01:23:32.000 --> 01:23:37.000
vector here consists of unknown parameters right we also have to estimate the sigma square u
01:23:37.000 --> 01:23:41.000
so that's the last problem we have to solve
01:23:46.000 --> 01:23:53.000
one idea for doing that could be the following we could say sigma square u is as we know the
01:23:53.000 --> 01:24:02.000
expectation of the squared ui's right so what we may do is that we say well if we have
01:24:02.000 --> 01:24:10.000
the u vector let's say n u's then we can just build the sum of all the squares
01:24:11.000 --> 01:24:19.000
and take the mean of that so one over n u prime u that should be approximately equal to
01:24:19.000 --> 01:24:25.000
the expectation of ui squared and therefore approximately equal to sigma square u by the
01:24:25.000 --> 01:24:31.000
law of large numbers right the law of large numbers would imply that if we know those u's here
01:24:32.000 --> 01:24:40.000
and if the number of u's gets very large then one over n u prime u is equal to or very close
01:24:40.000 --> 01:24:48.000
to the value sigma square u we are looking for but the problem is that we do not observe the u
01:24:48.000 --> 01:24:55.000
the only thing we do observe is an estimate of the u's from using the least squares estimator so
01:24:55.000 --> 01:25:02.000
what we do have is u hat but we do not have the u this is actually important that you distinguish
01:25:02.000 --> 01:25:08.000
very properly between the u and the u hat the u hat is y minus x beta hat
01:25:10.000 --> 01:25:18.000
so one idea could be to say well why don't we replace the u's here in this expression
01:25:18.000 --> 01:25:28.000
by the u hats which we have and then we estimate the variance sigma square u as one over n
01:25:28.000 --> 01:25:36.000
u hat prime u hat so just as i do it here right i call this estimator here again sigma square u
01:25:37.000 --> 01:25:45.000
with a tilde on top and so this is one estimator of the variance sigma square u
01:25:45.000 --> 01:25:53.000
right so question is can we do this yes we can do it but it would not be a good idea to do so
01:25:53.000 --> 01:26:00.000
so we shouldn't use this estimator why should we not use this estimator
01:26:00.000 --> 01:26:09.000
well first observe that x beta plus u is equal to y by definition and it is also true that y
01:26:09.000 --> 01:26:18.000
is equal to x beta hat plus u hat right so we know from this equation here that u
01:26:19.000 --> 01:26:29.000
the thing we are actually interested in is equal to u hat plus something right so u is not sort of
01:26:29.000 --> 01:26:34.000
the same thing as u hat because there's something else here namely x times beta hat minus beta
01:26:35.000 --> 01:26:47.000
okay now pre-multiply in this equation 29 each side with its own transpose so pre-multiply
01:26:47.000 --> 01:26:55.000
the left side by u prime and pre-multiply the right hand side by this term here
01:26:55.000 --> 01:27:01.000
that is this term transposed this of course we can do because we would pre-multiply by the same thing
01:27:01.000 --> 01:27:06.000
on both sides of the equation just written in a different way so doing that we would get u prime
01:27:06.000 --> 01:27:16.000
u is equal to u hat plus x times beta hat minus beta all transposed times u hat plus x multiplied
01:27:16.000 --> 01:27:25.000
by beta hat minus beta this looks ugly indeed here but simplifies quite a bit because we know
01:27:25.000 --> 01:27:33.000
that x prime times u hat is equal to zero and u hat prime x is also equal to zero due to this
01:27:33.000 --> 01:27:41.000
property of the m matrix i talked about earlier in today's lecture so the expression simplifies
01:27:41.000 --> 01:27:53.000
as u prime u is equal to u hat prime u hat plus beta hat minus beta prime x prime x beta hat
01:27:53.000 --> 01:28:02.000
minus beta note that this thing here is a quadratic expression so what we know is u prime u
01:28:03.000 --> 01:28:13.000
is greater than u hat prime u hat okay it is greater by something here which is always
01:28:13.000 --> 01:28:22.000
greater than or equal to zero so u prime u is systematically larger than u hat prime u hat
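this decomposition of u prime u can be verified on simulated data; here is a hedged sketch in python with numpy, where the true beta, the design matrix, and the sample size are arbitrary choices made only for illustration:

```python
import numpy as np

# sketch with simulated data (arbitrary true parameters): verify
# u'u = u_hat'u_hat + (beta_hat - beta)' x'x (beta_hat - beta),
# so u'u is always at least as large as u_hat'u_hat
rng = np.random.default_rng(3)
n, k = 100, 3
X = rng.standard_normal((n, k))
beta = np.array([1.0, -2.0, 0.5])
u = rng.standard_normal(n)
y = X @ beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate
u_hat = y - X @ beta_hat                       # residuals

lhs = u @ u
rhs = u_hat @ u_hat + (beta_hat - beta) @ (X.T @ X) @ (beta_hat - beta)
assert np.isclose(lhs, rhs)        # the decomposition holds
assert lhs >= u_hat @ u_hat        # u'u exceeds the residual sum of squares
```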
01:28:24.000 --> 01:28:31.000
and this result warns you against using u hat prime u hat in an estimator
01:28:32.000 --> 01:28:40.000
where actually we would need to use u prime u right so this here would be an excellent
01:28:40.000 --> 01:28:48.000
estimator of the variance if we could observe the u's but the idea to replace the u prime u
01:28:48.000 --> 01:28:57.000
by u hat prime u hat is not as good as one may perhaps initially think because u prime u is
01:28:57.000 --> 01:29:04.000
larger than u hat prime u hat this is what we have shown here right since u prime u is
01:29:04.000 --> 01:29:11.000
systematically larger than u hat prime u hat this estimator would not be a good estimator of the
01:29:12.000 --> 01:29:21.000
variance sigma square u if we used u hat prime u hat just in the place of u prime u
01:29:24.000 --> 01:29:31.000
we would actually end up with an estimate of sigma square u which is systematically
01:29:31.000 --> 01:29:36.000
lower than the true value of sigma square u so the estimate would be biased
01:29:37.000 --> 01:29:44.000
now probably you know already what we have to do the good the unbiased estimator of sigma square u
01:29:44.000 --> 01:29:51.000
is given by sigma hat square u defined as one over n minus k times u hat prime u hat
01:29:51.000 --> 01:29:57.000
right so not dividing by n as in the expression i showed you before which was appropriate for
01:29:57.000 --> 01:30:04.000
using u prime u but when we use u hat prime u hat we have to divide by less than that
01:30:05.000 --> 01:30:11.000
to make the expression a little bit bigger again so we have to divide by n minus k right
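the matlab exercise earlier asked you to reproduce the standard errors from x and u hat; here is a hedged editor's sketch of the same computation in python with numpy, on simulated data with arbitrary true parameters, which also illustrates why dividing by n minus k removes the downward bias:

```python
import numpy as np

# sketch with simulated data (arbitrary true parameters, not from the
# lecture): compare the naive estimator u_hat'u_hat / n with the
# corrected u_hat'u_hat / (n - k), then compute standard errors as the
# square roots of the main diagonal of sigma_hat^2 (x'x)^{-1}
rng = np.random.default_rng(4)
n, k, sigma2 = 30, 3, 4.0
X = rng.standard_normal((n, k))
beta = np.array([1.0, 0.5, -1.0])

naive, corrected = [], []
for _ in range(5000):
    u = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = X @ beta + u
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    rss = (y - X @ beta_hat) @ (y - X @ beta_hat)
    naive.append(rss / n)
    corrected.append(rss / (n - k))

# E[rss/n] = sigma^2 (n - k)/n, so the naive mean sits below sigma^2,
# while the corrected estimator is unbiased
print(np.mean(naive), np.mean(corrected))

# standard errors from the last simulated sample
sigma2_hat = rss / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
print(se)
```

averaged over many replications the naive estimate comes out systematically below the true sigma square u while the n minus k version centers on it, exactly as the theorem asserts.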
01:30:13.000 --> 01:30:17.000
and this theorem i cannot prove anymore in today's lecture so i will do this on monday
01:30:18.000 --> 01:30:24.000
but i have introduced the theorem already we will continue here next week unless there are
01:30:24.000 --> 01:30:36.000
any remaining questions i don't see anybody raising his hand
01:30:38.000 --> 01:30:48.000
so apparently no question then thank you very much for your attention and see you next week