WEBVTT - autoGenerated
00:01:00.000 --> 00:01:08.000
Good morning, everybody, and welcome to the lecture on estimation and inference.
00:01:09.000 --> 00:01:13.000
Are there any questions? Please raise your hand.
00:01:21.000 --> 00:01:29.000
It seems that there are no questions, so I will briefly repeat where we left off on Tuesday.
00:01:30.000 --> 00:01:37.000
We are currently in a part of the lecture which is a little technical, as you have noticed.
00:01:38.000 --> 00:01:47.000
And I'm not going down to the greatest level of technical detail, but at least on a,
00:01:48.000 --> 00:01:55.000
well, somewhat more than intuitive level, I would like to explain to you how econometric
00:01:55.000 --> 00:02:04.000
arguments are made and what kind of properties can be established with what kind of assumptions.
00:02:05.000 --> 00:02:11.000
And in order to understand that well, when we talk about asymptotic properties, as we do here,
00:02:11.000 --> 00:02:18.000
so about properties which estimators will attain when the number of observations
00:02:18.000 --> 00:02:25.000
increases greatly, so that it is really a huge number of observations which we can employ,
00:02:26.000 --> 00:02:34.000
basically, in our imagination approaching infinity, and then we can have these type of
00:02:35.000 --> 00:02:43.000
properties under certain assumptions. And I go back to the setup here, which we studied already
00:02:43.000 --> 00:02:50.000
last week, where we actually started out from the normal formula of the least squares estimator,
00:02:50.000 --> 00:02:56.000
of which you know that we can write least squares estimator beta hat as the true parameter beta
00:02:57.000 --> 00:03:06.000
plus x prime x inverse times x prime u. So in order to have an unbiased estimator,
00:03:06.000 --> 00:03:13.000
we would need an assumption which implies that the expectation of this x prime x inverse times x prime u term is equal to
00:03:13.000 --> 00:03:20.000
zero. And we know that our assumption A2 is precisely the assumption which we need, namely
00:03:20.000 --> 00:03:28.000
the assumption that the matrix of regressors is strictly exogenous to the vector of errors.
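NOTE
As an illustration of this decomposition, here is a minimal numerical sketch in Python with NumPy; the lecture itself works in MATLAB later on, and the particular beta, sample size, and random numbers are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3

# Illustrative data for the linear model y = X beta + u.
X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5, 2.0])
u = rng.normal(size=n)
y = X @ beta + u

# Least squares estimator from the normal equations.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# The decomposition from the lecture: beta_hat = beta + (X'X)^(-1) X'u.
decomposition = beta + XtX_inv @ X.T @ u
print(beta_hat)
print(decomposition)
```

Both printed vectors agree up to floating-point error, so the sampling error in beta hat is exactly the x prime x inverse times x prime u term.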
00:03:29.000 --> 00:03:36.000
However, this is a very strong assumption, and for many data sets that may just not be true,
00:03:36.000 --> 00:03:42.000
I will explain in a little more detail later why this is such a problematic assumption,
00:03:42.000 --> 00:03:51.000
assumption A2. And so we look for a weaker set of assumptions. I ask here on the slide,
00:03:51.000 --> 00:03:58.000
what happens to the least squares estimator if A1 to A5 are not satisfied. Actually, I will just
00:03:58.000 --> 00:04:05.000
talk about assumption A2 not being satisfied, or assumption A5 not being satisfied, or the two of
00:04:05.000 --> 00:04:11.000
them. But I will not really discuss what happens if assumption A1, which is a completely harmless
00:04:11.000 --> 00:04:20.000
assumption, or assumptions A3 and 4 are not satisfied. Our focus is on A2 and on A5. And
00:04:20.000 --> 00:04:27.000
first we talk about A2, about the assumption of strict exogeneity of the x matrix. Now,
00:04:28.000 --> 00:04:35.000
as I pointed out last time, and as I had pointed out already in the previous lecture, x prime x
00:04:35.000 --> 00:04:46.000
can also be written as the sum of small xi times small xi prime, where xi
00:04:47.000 --> 00:04:58.000
is the observations we make on the explanatory variables for the ith observation. So all the
00:04:58.000 --> 00:05:06.000
variables which explain or may potentially explain the behavior of the
00:05:07.000 --> 00:05:18.000
ith observation, out of the n observations which we have.
00:05:19.000 --> 00:05:25.000
I'm having trouble speaking clearly today, apparently. Once again, i is the number of the observation,
00:05:25.000 --> 00:05:35.000
and xi collects the regressors, which potentially explain the ith observation, out of n observations,
00:05:35.000 --> 00:05:42.000
which we have. So xi is a column vector, and xi prime is a row vector. When we multiply xi
00:05:42.000 --> 00:05:51.000
by xi prime, we get a matrix. And this matrix would be then obviously k by k matrix, because xi
00:05:51.000 --> 00:05:59.000
is a k by one column vector, and xi prime is a one by k row vector. So that's a k by k matrix,
00:05:59.000 --> 00:06:08.000
which we get here, exactly like x prime x is a k by k matrix. Now, x prime x is equal to this sum
00:06:08.000 --> 00:06:16.000
here, the sum over xi times xi prime. The factor one over n is of course not included in this
00:06:16.000 --> 00:06:25.000
x prime x matrix; the factor one over n in these parentheses here came in just by dividing
00:06:25.000 --> 00:06:33.000
by n here, and since the inverse of one over n is equal to n, the n times one over n
00:06:33.000 --> 00:06:40.000
cancels. And the sum over xi xi prime is precisely equal to x prime x. The only reason
00:06:40.000 --> 00:06:48.000
why I wrote this in this more complicated way here, you know, I'm usually supportive of writing
00:06:48.000 --> 00:06:55.000
everything in matrices, because it's much easier than all those summation signs. But the only reason
00:06:55.000 --> 00:07:03.000
for me to write it as a summation of a vector product here is that I want to show to you that
00:07:03.000 --> 00:07:10.000
one over n times the sum of xi xi prime is actually just an average which we take, right?
00:07:10.000 --> 00:07:17.000
It's the average over all those matrices xi xi prime. We add all of those matrices up,
00:07:17.000 --> 00:07:24.000
we have n terms to add up here, and then we divide by n. So this is an average which we compute here.
00:07:25.000 --> 00:07:33.000
And the same thing happens here with xi times ui. Again, we have for n observations,
00:07:33.000 --> 00:07:41.000
n terms of the form xi times ui. I add them all up and then I divide by n, so this is just an average.
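NOTE
This average interpretation is easy to verify numerically. A short sketch in Python (random numbers simply standing in for the regressors and errors):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))
u = rng.normal(size=n)

# Average of the n outer products xi xi' ...
avg_xx = sum(np.outer(X[i], X[i]) for i in range(n)) / n
# ... equals the matrix expression (1/n) X'X.
assert np.allclose(avg_xx, X.T @ X / n)

# Likewise the average of the n vectors xi * ui equals (1/n) X'u.
avg_xu = sum(X[i] * u[i] for i in range(n)) / n
assert np.allclose(avg_xu, X.T @ u / n)
```

So both expressions really are plain averages of k by k matrices and of k by 1 vectors, respectively.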
00:07:42.000 --> 00:07:48.000
The argument we want to make, and we have already made in the last minutes of last lecture,
00:07:48.000 --> 00:07:55.000
is that these averages converge to something. Actually, what we would like to show is that
00:07:55.000 --> 00:08:03.000
this matrix average here converges just to some finite matrix. Any matrix, the only requirement
00:08:03.000 --> 00:08:10.000
being that it is invertible, and that this here converges to zero. So then we would know that some
00:08:10.000 --> 00:08:18.000
matrix times zero is zero, and this would imply that the limit, the plim of beta hat, would be
00:08:18.000 --> 00:08:23.000
equal to the true parameter beta, so we would have established the consistency of the least
00:08:23.000 --> 00:08:30.000
squares estimator. That was where we left off last lecture. Now, I define what a consistent estimator
00:08:30.000 --> 00:08:39.000
is. Recall, consistency means that some sequence of random variables xn converges in probability to
00:08:39.000 --> 00:08:45.000
a certain limit, which we denote as x here; then x is the plim of xn.
00:08:48.000 --> 00:08:54.000
I introduced you to the law of large numbers, which essentially says that under certain conditions,
00:08:55.000 --> 00:09:03.000
the sample average has as its plim the expectation of the random variable, which
00:09:03.000 --> 00:09:12.000
we have underlying our samples of which we take the average here.
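NOTE
A quick simulation of the law of large numbers, sketched in Python; the exponential distribution and the value of its mean are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 3.0  # true expectation of the underlying random variable

# Sample averages for growing n: by the law of large numbers
# they should settle down near mu.
for n in (10, 1_000, 100_000):
    z = rng.exponential(scale=mu, size=n)  # exponential with mean = scale
    print(n, z.mean())
```

The printed averages wander for small n and hug the true expectation for large n, which is the plim statement in action.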
00:09:13.000 --> 00:09:21.000
Now, this law of large numbers doesn't help us directly in our estimation setting, because the
00:09:21.000 --> 00:09:29.000
law of large numbers actually requires that we add up observations which are independently
00:09:29.000 --> 00:09:35.000
and identically distributed. That need not be true in the regression context, but there are
00:09:35.000 --> 00:09:40.000
weaker forms of the law of large numbers, which have the same implication, which then allow also
00:09:40.000 --> 00:09:48.000
for some type of weak dependence. This is one of the technical details which I leave out of this
00:09:48.000 --> 00:09:55.000
lecture. Here, I just inform you that such conditions exist. So under proper conditions
00:09:55.000 --> 00:10:01.000
of weak dependence, the sample average of a series of random variables, which are not
00:10:02.000 --> 00:10:08.000
identically and independently distributed, at least not necessarily so, also converges
00:10:08.000 --> 00:10:13.000
to the expectation of the random variable, which I denote here by mu z.
00:10:14.000 --> 00:10:21.000
And then essentially, we left off at three different possibilities of assumptions,
00:10:21.000 --> 00:10:29.000
which would still deliver the preconditions for this weak law of large numbers.
00:10:30.000 --> 00:10:37.000
One assumption would be assumption 6.1, where we would require that the conditional expectation
00:10:37.000 --> 00:10:47.000
of the error term ui given just the observation on regressor xi is equal to the unconditional
00:10:47.000 --> 00:10:56.000
expectation of ui. And this is, of course, just zero by assumption A1. So we would only require
00:10:56.000 --> 00:11:11.000
that the single observation xi is exogenous to ui. But we would no longer require that the
00:11:11.000 --> 00:11:21.000
whole matrix of x's is strictly exogenous to the whole vector of shocks in our model. So this is a
00:11:21.000 --> 00:11:29.000
much, much weaker assumption than assumption A2. And in this case, we say that the regressors xi are
00:11:29.000 --> 00:11:40.000
predetermined. So you can think of the xi as realizing before the ui realizes;
00:11:40.000 --> 00:11:48.000
the xi's are not affected by the value of the ui. And the ui then realizes without
00:11:48.000 --> 00:11:58.000
any reference to what value xi has taken on. So there is no feedback mechanism from xi to ui either.
00:12:01.000 --> 00:12:08.000
Another alternative assumption is assumption 6.2. And this is actually a weaker assumption
00:12:08.000 --> 00:12:16.000
than assumption 6.1, so it is implied by 6.1. This assumption would say that for each observation
00:12:16.000 --> 00:12:25.000
i, we have the property that the expectation of the product xi times ui. So actually the covariance
00:12:25.000 --> 00:12:31.000
of xi and ui under assumption A1 is equal to zero. And in this case, we say that the
00:12:31.000 --> 00:12:39.000
regressors and the error terms are contemporaneously uncorrelated. So note that 6.2 is implied by 6.1.
00:12:41.000 --> 00:12:51.000
And moreover, it is the case that both 6.1 and 6.2 are implied
00:12:52.000 --> 00:13:00.000
by assumption A1 along with the new assumption A6.3, which would say that for each observation i,
00:13:00.000 --> 00:13:06.000
xi and ui are just independent. So that's the strongest of all the assumptions, right? So
00:13:06.000 --> 00:13:16.000
to put it in the right order: A6.3 implies A6.1, and A6.1 implies A6.2. So the weakest of those
00:13:16.000 --> 00:13:26.000
assumptions is A6.2. All three assumptions state in slightly different terms how disconnected
00:13:26.000 --> 00:13:36.000
the regressors xi are from the errors ui. Now, what do we need this for? Or better, what do we
00:13:36.000 --> 00:13:44.000
need to have the desired property that the plim of the least squares estimator is equal to the true
00:13:44.000 --> 00:13:51.000
value of the parameter. So it's equal to beta. As I pointed out just a couple of minutes ago,
00:13:51.000 --> 00:13:58.000
this is the formula for our estimator. Beta hat is equal to the true parameter plus the product
00:13:58.000 --> 00:14:08.000
of these two terms. So obviously it is sufficient for consistency to have that the plim of this
00:14:08.000 --> 00:14:17.000
first matrix here, 1 over n times the sum of xi xi prime, that this is some finite and invertible
00:14:17.000 --> 00:14:24.000
matrix. And I call this sigma xx here. I don't make any assumptions about this matrix sigma xx
00:14:24.000 --> 00:14:29.000
other than that it be invertible. Of course, we do need the invertibility because we have to take
00:14:29.000 --> 00:14:36.000
the inverse here. But apart from that, sigma xx can be just any matrix. The only requirement being
00:14:36.000 --> 00:14:43.000
it must be a finite matrix. Why must it be a finite matrix? It must be a finite matrix because
00:14:43.000 --> 00:14:51.000
we want to multiply it by a matrix whose limit is 0. And obviously, a finite number times 0 is
00:14:51.000 --> 00:15:00.000
always 0. However, if one element of this matrix here were to converge to infinity, then infinity
00:15:00.000 --> 00:15:10.000
times 0 is not defined. So we cannot be sure that the product is 0 or that the product would converge
00:15:11.000 --> 00:15:18.000
to 0. This converges to infinity and this converges to 0. There are many things which can happen,
00:15:18.000 --> 00:15:26.000
right? So we may be in trouble in this case actually. Therefore, we do need the property
00:15:26.000 --> 00:15:32.000
that this matrix here is finite. And we'll come back to this finiteness property a little later
00:15:32.000 --> 00:15:38.000
and you'll see how important it is. All right. So assumption A7 would say,
00:15:38.000 --> 00:15:44.000
well, this is a finite matrix. It is invertible. So the inverse of this matrix is also a finite
00:15:44.000 --> 00:15:52.000
matrix. And then assumption A8 would say that this term here, basically one over n times x prime u, converges
00:15:52.000 --> 00:16:00.000
to 0 as the number of observations goes to infinity. So the plim of this term is 0.
00:16:01.000 --> 00:16:07.000
So there are two assumptions about the plims of sequences of random variables. One assumption
00:16:07.000 --> 00:16:14.000
says that the plim of one over n times x prime x is equal to sigma xx and that this matrix is invertible.
00:16:14.000 --> 00:16:19.000
And the other assumption says that the plim of one over n times x prime u is equal to 0.
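NOTE
The second of these plim assumptions can be illustrated with a small simulation, sketched in Python; two regressors drawn independently of the errors are an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(5)

# Regressors drawn independently of the errors: the average (1/n) X'u
# should shrink toward the zero vector as n grows (assumption A8).
for n in (100, 10_000, 1_000_000):
    X = rng.normal(size=(n, 2))
    u = rng.normal(size=n)
    print(n, np.abs(X.T @ u / n).max())
```

The largest entry of the average shrinks roughly like one over the square root of n, consistent with a plim of zero.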
00:16:22.000 --> 00:16:28.000
Now this assumption A8 is actually the decisive assumption which we need to get the property of
00:16:28.000 --> 00:16:35.000
consistency for the least squares estimator, right? Because we do need that one of these
00:16:35.000 --> 00:16:42.000
matrices here is 0 asymptotically, right? Converges to 0 asymptotically has a plim of 0.
00:16:43.000 --> 00:16:48.000
Then it's not so important anymore what happens with the other matrix, except that
00:16:48.000 --> 00:16:57.000
it must not become infinite. So the decisive assumption is assumption A8 here,
00:16:57.000 --> 00:17:04.000
which guarantees us that the second matrix here becomes 0 asymptotically and then 0 times
00:17:04.000 --> 00:17:10.000
something which is finite is equal to 0. So all these ugly terms here just fade away
00:17:10.000 --> 00:17:18.000
asymptotically and we have the asymptotic result that the least squares estimator is equal to beta
00:17:18.000 --> 00:17:25.000
asymptotically or converges to beta as the number of observations increases. So we would know when
00:17:25.000 --> 00:17:31.000
we have sufficiently many observations that then we are probably quite close already to the true
00:17:31.000 --> 00:17:41.000
value of the parameter vector. Now this assumption A8 here is actually implied by assumption A1,
00:17:41.000 --> 00:17:47.000
which says just that the expected value of the error term is 0, so harmless assumption,
00:17:48.000 --> 00:17:57.000
and assumption A6.2, so the weakest of the three assumptions which I have explained before.
00:17:57.000 --> 00:18:04.000
It suffices that the correlation or the covariance between xi and ui is equal to 0,
00:18:04.000 --> 00:18:13.000
so it suffices for consistency to have uncorrelatedness between the error term and the regressor.
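NOTE
To see why this uncorrelatedness is really needed, here is a hedged sketch in Python; the way the regressor is constructed here is made up purely for illustration, deliberately building in correlation with the error so that A6.2 fails.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 1.0
n = 200_000  # huge sample, so beta_hat sits close to its plim

u = rng.normal(size=n)
x = rng.normal(size=n) + u   # regressor built to correlate with the error
y = beta * x + u

# Least squares with a single regressor.
beta_hat = (x @ y) / (x @ x)

# Here plim beta_hat = beta + cov(x, u) / var(x) = 1 + 1/2 = 1.5,
# so the estimator converges, but to the wrong value.
print(beta_hat)
```

Even with very many observations the estimate stays near 1.5 rather than the true beta of 1, so more data does not cure the contemporaneous correlation.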
00:18:16.000 --> 00:18:22.000
Conversely, we are always in trouble when we see that the regressor correlates with
00:18:23.000 --> 00:18:29.000
an error term, and you will get this as an exercise, actually, to show that this can happen
00:18:29.000 --> 00:18:38.000
very easily in simple macroeconomic models. So when we have assumption A1 and assumption A6.2,
00:18:40.000 --> 00:18:46.000
then we can use a weak law of large numbers to show that then assumption A8 holds. It is
00:18:46.000 --> 00:19:01.000
actually implied by A1 and by A6.2. Now perhaps some comments on what the difference between
00:19:01.000 --> 00:19:09.000
these assumptions A2 on the one hand and A6.1, 6.2, 6.3 on the other hand is.
00:19:10.000 --> 00:19:20.000
Assumption A2 postulates that the complete x matrix is exogenous and therefore it is uncorrelated
00:19:20.000 --> 00:19:29.000
with a complete u vector. So when you think of let's say macroeconomic data, you will
00:19:30.000 --> 00:19:38.000
basically never have assumption A2 satisfied because macroeconomic data are very often
00:19:38.000 --> 00:19:46.000
some kind of time series data or some type of cross-section data. And in both cases,
00:19:46.000 --> 00:19:53.000
assumption A2 is not likely to be satisfied. Think first of time series data. When you have
00:19:53.000 --> 00:20:00.000
time series data, then typically what happens in an economy in some period t minus one
00:20:01.000 --> 00:20:09.000
affects what happens in the economy in period t. Certain decisions made in period t minus one
00:20:09.000 --> 00:20:17.000
will almost always influence what kind of magnitude the variables in period t take on.
00:20:17.000 --> 00:20:25.000
So if you have a shock which hits the economy in period t minus one, then you will almost
00:20:25.000 --> 00:20:34.000
certainly see that this affects the regressors in period t. And when we have this dependency
00:20:34.000 --> 00:20:41.000
between regressors in period t and shocks in period t minus one, then we have already a
00:20:41.000 --> 00:20:48.000
violation of assumption A2 because then it is not true anymore that the complete x matrix is
00:20:48.000 --> 00:21:00.000
exogenous with respect to the u vector because the x t observations incorporate the effects of the
00:21:00.000 --> 00:21:07.000
u t minus one shock. So there is no complete exogeneity and assumption A2 fails to hold.
00:21:08.000 --> 00:21:15.000
Now, how about cross-section data? Would that make things better? Well, mostly not. Think,
00:21:15.000 --> 00:21:24.000
for instance, of a sample of countries which you have and whose relationships you study.
00:21:25.000 --> 00:21:33.000
We are in a connected world where countries trade with each other. So when one country is hit by a
00:21:33.000 --> 00:21:39.000
shock, then economic variables in this one country will be affected by this shock.
00:21:40.000 --> 00:21:47.000
And as a consequence of this shock, the export and import decisions of this country will be changed.
00:21:48.000 --> 00:21:52.000
For the better or for the worse, it doesn't matter. There will be changes. There will be
00:21:52.000 --> 00:21:59.000
effects of the shock which has hit a specific country on the exports and imports of goods
00:21:59.000 --> 00:22:07.000
and services of this country. That's almost certain. But since this country trades with other
00:22:07.000 --> 00:22:14.000
countries, which is the very essence of what it means that the country exports and imports, this other
00:22:14.000 --> 00:22:20.000
country, which is a trading partner of the first country, will also be affected by the shock because
00:22:20.000 --> 00:22:28.000
export and import demand of the first country will change. So clearly, other countries are also
00:22:28.000 --> 00:22:34.000
affected by a shock which hits one specific country. And therefore, we do not have the fact
00:22:34.000 --> 00:22:42.000
that the X matrix is completely exogenous to the shocks which have hit the economy,
00:22:43.000 --> 00:22:51.000
even if perhaps the shock occurred in a previous period. It will nevertheless affect
00:22:52.000 --> 00:22:58.000
the current period. And there is not only one country, but many countries which trade with
00:22:58.000 --> 00:23:06.000
this one country. So assumption A2 is a very strong assumption which usually in economic data
00:23:06.000 --> 00:23:17.000
you will find violated. There are some conditions, however, where assumption A2 may hold. For instance,
00:23:17.000 --> 00:23:25.000
if you have regressors which, say, just indicate the sex of persons. You have some type of
00:23:26.000 --> 00:23:34.000
regression where you want to explain whether wages differ for people who are male or people
00:23:34.000 --> 00:23:41.000
who are female, so whether there is wage equality for men and women. Then it may be that you have a
00:23:42.000 --> 00:23:49.000
regressor matrix in which you just have a regressor or let's say two regressors which
00:23:49.000 --> 00:23:56.000
indicate whether a person is male or female. So if it's a man, then you have a regressor
00:23:56.000 --> 00:24:02.000
which has a one for each person who is male and a zero for each person who is female. And
00:24:02.000 --> 00:24:10.000
if you have a woman, it's just the opposite. Obviously, these types of regressor matrices
00:24:10.000 --> 00:24:17.000
where you just indicate the sex of certain people would be exogenous to the shocks because the shocks
00:24:17.000 --> 00:24:25.000
can't change the sex of the people. Or if you speak of regions or countries which trade with each
00:24:25.000 --> 00:24:31.000
other, then it may be that you have certain regressors which indicate which regions are
00:24:31.000 --> 00:24:40.000
regions in mountain areas, for instance, or which are islands, remote islands perhaps, or which
00:24:40.000 --> 00:24:47.000
share a certain border with another country with which it trades. So these kind of geographical
00:24:47.000 --> 00:24:52.000
regressors would be regressors which are truly exogenous to the shocks because they cannot be
00:24:52.000 --> 00:24:57.000
changed by the shocks. And in this case, it may be the case that you have a regressor matrix
00:24:57.000 --> 00:25:04.000
where assumption A2 holds. But this happens only rarely that the complete regressor matrix is made
00:25:04.000 --> 00:25:14.000
up of such regressors. In most cases with economic data, you must at least check whether there is not
00:25:14.000 --> 00:25:22.000
some dependence between the U vector and the complete X matrix. Now, since it is such a
00:25:22.000 --> 00:25:29.000
strong assumption, assumption A2, economists typically replace it by a weaker requirement,
00:25:30.000 --> 00:25:36.000
and a much weaker requirement would be that of assumption A6.2, that it is only
00:25:37.000 --> 00:25:43.000
the observation i which we have to look at. For each observation i,
00:25:43.000 --> 00:25:55.000
we require that xi and ui are uncorrelated, which basically means that whoever decides on xi does so
00:25:55.000 --> 00:26:04.000
before he or she knows that ui has materialized in a certain way. In that case, there cannot be
00:26:04.000 --> 00:26:12.000
any correlation between xi and ui. And this is why we sometimes speak of xi being predetermined.
00:26:12.000 --> 00:26:18.000
xi is determined before people know how ui is determined. In such a case, that would be
00:26:18.000 --> 00:26:25.000
assumption A6.1 already, so a little stronger. In such a case, this would also imply that we have
00:26:25.000 --> 00:26:36.000
uncorrelatedness of the xi and ui variables, so assumption A6.2 would also be satisfied.
00:26:37.000 --> 00:26:47.000
And we just need this uncorrelatedness between xi and ui. We do not require that xi be uncorrelated
00:26:47.000 --> 00:26:57.000
with uj for some j which differs from i, or that xt be uncorrelated with ut minus 1, which is very
00:26:57.000 --> 00:27:03.000
unlikely for time series data. So assumption A6.2, and this is very important to understand, is much,
00:27:03.000 --> 00:27:10.000
much weaker than assumption A2, and the same holds for assumption A6.1 or A6.3.
00:27:10.000 --> 00:27:18.000
So in particular, when you work with time series data, you always have to check if shocks in t
00:27:18.000 --> 00:27:27.000
do not correlate with the regressors of period t. The easiest specification is that typically when
00:27:27.000 --> 00:27:35.000
you have a dependent variable of period t, try to explain it just by variables dated period t minus
00:27:35.000 --> 00:27:42.000
1, so dated with a preceding period, but not with variables from the same period. However,
00:27:44.000 --> 00:27:49.000
while this is easily said, it is not so easily done because most economic models would imply
00:27:49.000 --> 00:27:57.000
that any dependent variable of period t is also affected by other variables which
00:27:58.000 --> 00:28:04.000
were determined just in period t and not in period t minus 1.
00:28:07.000 --> 00:28:13.000
The contemporaneous correlation may also occur in other settings if we have
00:28:13.000 --> 00:28:20.000
measurement errors in the explanatory variables. So even if we have a setup where the true variables
00:28:20.000 --> 00:28:26.000
would have this property of being uncorrelated with the error terms, so that when
00:26:26.000 --> 00:26:33.000
you just look at your model you could think of xi as being uncorrelated with ui, then it's not
00:28:33.000 --> 00:28:39.000
clear whether in the data this property is also fulfilled, because as you will see when we have
00:28:39.000 --> 00:28:45.000
measurement errors in our variables, then we will also get this type of correlation. I'll show this
00:28:45.000 --> 00:28:55.000
later to you. But anyway, as long as assumptions A7 and A8 are satisfied, we can establish then
00:28:55.000 --> 00:29:01.000
that the plim of beta hat is equal to beta, because the plim of beta hat is equal to beta
00:29:01.000 --> 00:29:09.000
plus the plim of this product of matrices here, and the plim of a product of matrices is equal to
00:29:09.000 --> 00:29:16.000
the product of the plims of these matrices. So this is equal to sigma xx inverse times 0,
00:29:16.000 --> 00:29:23.000
and this is under assumptions A7 and A8 equal to 0, and therefore the plim of beta hat is equal
00:29:23.000 --> 00:29:29.000
to beta, so beta hat would be consistent for beta if assumptions A7 and A8 are satisfied.
00:29:30.000 --> 00:29:37.000
However, as I hope you have understood, beta hat is not unbiased anymore in finite samples,
00:29:37.000 --> 00:29:43.000
so having a consistent estimator does not mean that we have an unbiased estimator. These are different
00:29:43.000 --> 00:29:48.000
concepts, unbiasedness being a concept in finite samples, and this can be true under strong
00:29:48.000 --> 00:29:55.000
assumptions like A2, but if A2 is violated, then typically our beta hat, our beta hat here,
00:29:55.000 --> 00:30:01.000
the least squares estimator, is beta plus this product here, and this product is typically non-zero
00:30:02.000 --> 00:30:10.000
unless we have so many observations that we can invoke asymptotic arguments.
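NOTE
The distinction between biasedness in finite samples and consistency can be simulated. Below is a sketch in Python using a simple autoregressive model as an example of a regressor that is predetermined but not strictly exogenous; this specific model and its parameter values are my illustration, not a slide from the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = 0.9

def ols_ar1(n):
    # y_t = beta * y_{t-1} + u_t: the regressor y_{t-1} is predetermined
    # (it realizes before u_t) but not strictly exogenous, since it
    # depends on all past shocks, so A2 fails while A6.1 holds.
    u = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = beta * y[t - 1] + u[t]
    x, ylead = y[:-1], y[1:]
    return (x @ ylead) / (x @ x)

# Average over many short samples: noticeably biased downward ...
small = np.mean([ols_ar1(30) for _ in range(500)])
# ... yet one long sample lands very close to the true beta.
large = ols_ar1(100_000)
print(small, large)
```

The short-sample average sits visibly below the true beta, while the long-sample estimate is essentially on top of it: biased in finite samples, yet consistent.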
00:30:10.000 --> 00:30:19.000
I think now I have explained this at some level of detail, and actually for the second time since
00:30:19.000 --> 00:30:24.000
I also explained it already on Tuesday. Are there any questions?
00:30:31.000 --> 00:30:37.000
I don't see any, so I move on to an exercise where you may try to get a feeling of what
00:30:37.000 --> 00:30:44.000
happens when you do this in MATLAB or any other programming language, so what you can do is similar
00:30:44.000 --> 00:30:51.000
to exercises of this type which I gave you before. Generate in MATLAB again some regressor matrix x,
00:30:51.000 --> 00:30:58.000
let's say with 100 observations and three regressors, just take random numbers as regressors,
00:30:58.000 --> 00:31:07.000
and in this case you just compute the expression 1 over 100 times the sum over the 100 terms xi
00:31:07.000 --> 00:31:14.000
times xi prime. So you just compute this here and as a result you'll get a matrix which is the
00:31:14.000 --> 00:31:22.000
average over the first 100 observations which you have. Now add another 100 observations to the
00:31:22.000 --> 00:31:31.000
matrix x, so make x matrix of 200 by 3 without changing the first 100 observations. So leave
00:31:31.000 --> 00:31:39.000
the first 100 observations unaltered and add another 100 observations in terms of just random numbers
00:31:39.000 --> 00:31:47.000
and again compute now the average over all those observations and continue like this to see whether
00:31:47.000 --> 00:31:55.000
1 over n times the sum of xi xi prime converges to a finite and non-singular matrix. You can just do this
00:31:55.000 --> 00:32:03.000
for as many observations as you want, and what you should check for is if this average matrix which
00:32:03.000 --> 00:32:14.000
you compute here or there, if this changes by a great deal going from one number of observations
00:32:14.000 --> 00:32:20.000
to the next number of observations and if these changes become smaller over time and you can
00:32:20.000 --> 00:32:26.000
basically see that the matrix converges to some limit and then you can check if this limit which
00:32:26.000 --> 00:32:32.000
you find there if this is actually a non-singular matrix and if you do this correctly you should
00:32:32.000 --> 00:32:39.000
see that it is indeed a non-singular matrix. So the assumption A7 which we basically test here or
00:32:39.000 --> 00:32:48.000
simulate here seems to be a reasonable assumption. However there are some exceptions and one exception
00:32:48.000 --> 00:32:56.000
is of particular importance. Suppose in your regressor matrix you have both a constant which
00:32:56.000 --> 00:33:03.000
is normally the case and a linear trend in x which is very often the case. So your regressor
00:33:03.000 --> 00:33:10.000
matrix x is made up of a first column which is just a column of ones for the constant and somewhere
00:33:10.000 --> 00:33:17.000
you have a second column or the last column which is a linear trend so it's a column of entries one
00:33:17.000 --> 00:33:24.000
two three four five and so forth up to observation n, and that would be the linear trend. If you have
00:33:24.000 --> 00:33:34.000
this type of matrix, then note that the matrix 1 over n times x prime x does not converge to a finite matrix
00:33:34.000 --> 00:33:43.000
anymore right so there is an immediate violation of assumption A7 because x prime x as we know
00:33:43.000 --> 00:33:52.000
is the sum over all the n terms xi xi prime. Now what is the entry in question?
00:33:53.000 --> 00:33:58.000
For the entry belonging to the constant and the trend,
00:34:00.000 --> 00:34:09.000
when you compute x prime x you will get a
00:34:09.000 --> 00:34:21.000
multiplication of a row of ones with the column vector of trend entries, so this thing here, which
00:34:21.000 --> 00:34:31.000
means that you sum all the numbers from one to n. So that is equal to the sum over all i when i
00:34:31.000 --> 00:34:37.000
runs from one to n, and we know that this is equal to n times n plus one over two.
00:34:39.000 --> 00:34:45.000
Now watch out that this element, which is just one element of the matrix x prime x,
00:34:45.000 --> 00:34:52.000
converges to infinity, right,
00:34:55.000 --> 00:35:01.000
but recall we multiply the whole thing by one over n,
00:35:02.000 --> 00:35:09.000
right so this n factor here cancels against the one over n but then we are still left with
00:35:09.000 --> 00:35:18.000
n plus one over two so we see that one over n times x prime x contains an entry n plus one over
00:35:18.000 --> 00:35:28.000
two and that converges to infinity so the matrix sigma x x is not finite in this case right so when
00:35:28.000 --> 00:35:33.000
we have this set up that there is a constant and a linear trend in the regressor matrix
00:35:34.000 --> 00:35:43.000
then our assumption A7 here is violated, because the matrix sigma xx is not finite.
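NOTE
The divergent entry can be verified directly. A short sketch in Python, mirroring the constant-plus-trend regressor matrix just described:

```python
import numpy as np

def xtx_over_n(n):
    # Regressor matrix with a constant and a linear trend 1, 2, ..., n.
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), t])
    return X.T @ X / n

# The constant-trend entry of (1/n) X'X is (1/n) * n(n+1)/2 = (n+1)/2,
# which grows without bound, so sigma_xx is not a finite matrix here.
for n in (10, 100, 1000):
    M = xtx_over_n(n)
    print(n, M[0, 1], (n + 1) / 2)
```

Each printed entry matches (n+1)/2 exactly and keeps growing with n, which is precisely the violation of assumption A7.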
00:35:46.000 --> 00:35:51.000
Now you may be worried about that because you think that happens rather often that you include
00:35:51.000 --> 00:35:55.000
a constant and a linear trend; actually, you typically always include a constant and very often you
00:35:55.000 --> 00:36:03.000
include a linear trend the good news about the bad news which i just communicated to you is that
00:36:03.000 --> 00:36:11.000
actually the consistency property can be maintained or is maintained even if we have
00:36:11.000 --> 00:36:19.000
a constant and a linear trend and the reason for this is the following when we have such a sigma
00:36:19.000 --> 00:36:29.000
x x matrix where one element becomes infinite then this means that basically the inverse here
00:36:29.000 --> 00:36:38.000
when we compute the inverse of sigma x x, it converges to zero in at least one entry, and that's fine,
00:36:38.000 --> 00:36:43.000
we don't have a problem with that because we want this product here to become zero,
00:36:43.000 --> 00:36:48.000
right, we multiply by zero, and multiplying zero by zero doesn't cause any
00:36:48.000 --> 00:36:55.000
problem actually it implies that the rate of convergence the speed of convergence
00:36:55.000 --> 00:37:02.000
in the plim convergence property here is faster than usual. So
00:37:03.000 --> 00:37:07.000
including a constant and a linear trend leads to faster convergence to beta
00:37:08.000 --> 00:37:15.000
and not to slower or no convergence to beta. So we still have the property that the estimator is
00:37:15.000 --> 00:37:21.000
consistent but you should be aware of the fact that this then uses different arguments than the
00:37:22.000 --> 00:37:26.000
ones i have established here because we cannot make use of assumption a seven
00:37:26.000 --> 00:37:35.000
assumption a seven is violated and then different techniques for proving consistency have to be used
00:37:35.000 --> 00:37:40.000
but the result will be yes there's still the consistency of the least squares estimator even
00:37:40.000 --> 00:37:46.000
if we have a constant and linear trend in our regressor matrix and the speed of convergence
00:37:46.000 --> 00:37:53.000
is even higher than in the usual case so there is no problem but as i say the proof is different
00:37:54.000 --> 00:37:56.000
more technical so i leave it out here
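Although the proof is left out, a quick simulation can illustrate the claim. The sketch below (Python; the parameter values are invented for illustration) regresses on a constant and a linear trend and shows the trend coefficient homing in on the true value very quickly, consistent with the faster, n to the three halves, rate of convergence.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([2.0, 0.5])  # illustrative intercept and trend coefficient

def ols_with_trend(n):
    # constant plus linear trend regressors, iid standard normal shocks
    X = np.column_stack([np.ones(n), np.arange(1, n + 1)])
    y = X @ beta + rng.standard_normal(n)
    return np.linalg.solve(X.T @ X, X.T @ y)

# the trend coefficient converges very fast (rate n^(3/2), not sqrt(n)),
# despite the violation of assumption A7
err_small = abs(ols_with_trend(100)[1] - beta[1])
err_large = abs(ols_with_trend(10_000)[1] - beta[1])
```

Already at n = 10000 the estimation error of the trend coefficient is far below what the usual square root of n rate would give.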
00:37:59.000 --> 00:38:05.000
The second thing, which I already talked about: x i and u i may easily be correlated in macroeconomic
00:38:05.000 --> 00:38:12.000
models so when x i and u i are correlated then we know that the expectation of the product x i times
00:38:12.000 --> 00:38:18.000
u i is different from zero and then we have no consistency right and i asked you here to
00:38:18.000 --> 00:38:25.000
demonstrate that this assumption a seven is violated for the estimation of the consumption
00:38:25.000 --> 00:38:34.000
function. Actually, I think I misspoke here, or miswrote this exercise: it should be assumption a
00:38:34.000 --> 00:38:40.000
eight right demonstrate please that assumption a eight is violated for the estimation of the
00:38:40.000 --> 00:38:47.000
consumption function in the Keynesian cross model of equations 20 and 21. Remember that I introduced
00:38:47.000 --> 00:38:52.000
you to this model there, or reintroduced you to it: the income expenditure
00:38:52.000 --> 00:38:59.000
model which you've learned in macro well show that for the estimation of the consumption function
00:38:59.000 --> 00:39:07.000
in this model you would not have consistency because the expectation of y i income and the
00:39:07.000 --> 00:39:14.000
shock u i is equal to sigma squared u divided by one minus c, and this is different from zero, c
00:39:14.000 --> 00:39:21.000
being the marginal propensity to consume. So you see that here economics and econometrics meet,
00:39:22.000 --> 00:39:28.000
and you cannot do econometrics by saying, well, I just assume certain things and then I have
00:39:28.000 --> 00:39:35.000
certain results but you always have to check whether in the specific model you work in or
00:39:35.000 --> 00:39:41.000
in the specific data sets you work with the assumptions which i introduced you to here in
00:39:41.000 --> 00:39:49.000
this lecture can actually be thought to hold and that is not necessarily the case in many instances
00:39:49.000 --> 00:39:56.000
this is actually not the case and after some thought you will realize that those assumptions
00:39:56.000 --> 00:40:01.000
which make life easy in econometrics are not satisfied and that we have to use econometric
00:40:01.000 --> 00:40:07.000
techniques which are a little bit more advanced in particular instrumental variables techniques
00:40:07.000 --> 00:40:14.000
to which i will introduce you later and where i will give you many examples of why and how
00:40:14.000 --> 00:40:23.000
these assumptions here may be violated. In practice you have to use these types of techniques in order
00:40:23.000 --> 00:40:29.000
to produce reasonable econometric estimates whereas if you just use the basic techniques
00:40:29.000 --> 00:40:35.000
like the least squares estimator i talk about so much in this lecture now then every referee
00:40:35.000 --> 00:40:42.000
will turn down your paper and will say, well, go back and study econometrics. All right, any questions?
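The Keynesian cross example can be checked numerically. The sketch below (Python; the values of a, c and the investment distribution are invented for illustration) simulates the model and shows that the sample analogue of the expectation of y i times u i settles near sigma squared u over one minus c rather than zero, so least squares on the consumption function is inconsistent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a, c, sigma_u = 10.0, 0.8, 1.0          # illustrative values; c is the MPC

# Keynesian cross: C_i = a + c*Y_i + u_i and Y_i = C_i + I_i,
# so in reduced form Y_i = (a + I_i + u_i) / (1 - c)
I = rng.uniform(5.0, 15.0, n)           # exogenous investment
u = rng.normal(0.0, sigma_u, n)
Y = (a + I + u) / (1 - c)
C = a + c * Y + u

# sample covariance of Y and u: close to sigma_u^2 / (1 - c) = 5, not zero
cov_Yu = np.mean((Y - Y.mean()) * u)

# OLS of C on a constant and Y therefore does not converge to c = 0.8
X = np.column_stack([np.ones(n), Y])
c_hat = np.linalg.solve(X.T @ X, X.T @ C)[1]
```

Even with two hundred thousand observations the estimated marginal propensity to consume stays away from 0.8; more data does not cure endogeneity.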
00:40:47.000 --> 00:40:57.000
now let's turn to a violation of assumption a5 you recall assumption a5 was the assumption which
00:40:57.000 --> 00:41:05.000
we added later to the Gauss Markov assumptions a1 to a4 we did not need a5 to establish the blue
00:41:05.000 --> 00:41:13.000
property of the least squares estimator because a5 is the assumption on the normality of the
00:41:13.000 --> 00:41:20.000
error term, so the assumption of the error term following a Gaussian distribution, being normally
00:41:20.000 --> 00:41:25.000
distributed and this assumption we did not need for the blue property of the least squares estimator
00:41:25.000 --> 00:41:30.000
in that sense the least squares estimator was as i said a non-parametric estimator or is a
00:41:30.000 --> 00:41:37.000
non-parametric estimator but normality is a useful assumption to have when you want to test
00:41:37.000 --> 00:41:42.000
when you want to apply standard tests like the t test or the f test or the chi-square test then
00:41:42.000 --> 00:41:49.000
you always need a normal distribution and we made already an argument that you may get a normal
00:41:49.000 --> 00:41:58.000
distribution asymptotically even if the distribution in finite samples of the error terms is not normal,
00:41:58.000 --> 00:42:07.000
so if assumption a5 does not hold there i will repeat this argument now with reference to our
00:42:08.000 --> 00:42:15.000
model to our econometric model and therefore i first introduce a new convergence concept
00:42:15.000 --> 00:42:22.000
for random variables because so far i have basically assumed plim convergence for random
00:42:22.000 --> 00:42:31.000
variables once i also mentioned that there is a different convergence concept like convergence
00:42:31.000 --> 00:42:39.000
in mean squares so that the mean squared error converges to zero that is even stronger than
00:42:40.000 --> 00:42:48.000
plim convergence but here i introduce you to a concept for the convergence of random variables
00:42:48.000 --> 00:42:56.000
which is even weaker than plim convergence namely convergence in distribution and the
00:42:56.000 --> 00:43:02.000
definition of convergence in distribution says that a series of random variables xn
00:43:02.000 --> 00:43:09.000
converges in distribution to a random variable x so if your random variable x would be the limit
00:43:09.000 --> 00:43:16.000
distribution if it is true that the distribution function so the cdf the cumulative distribution
00:43:16.000 --> 00:43:26.000
function of this series of random variables here converges to the cdf of the random variable x
00:43:27.000 --> 00:43:36.000
so if it is true that the limit of Fn of x is equal to F of x for every point x at which F is
00:43:36.000 --> 00:43:44.000
continuous now that's a technicality continuity here and then of course fn and f denote the cdfs
00:43:44.000 --> 00:43:51.000
of xn and of x respectively as a shorthand notation what you typically read in the literature
00:43:51.000 --> 00:43:59.000
and what I'll use in this lecture here is that we write xn converges in distribution,
00:43:59.000 --> 00:44:10.000
that's what this little d here says, to x, meaning that the sequence of cdfs converges to the cdf of
00:44:10.000 --> 00:44:18.000
capital x now as i already said plim convergence implies this convergence in distribution
00:44:18.000 --> 00:44:24.000
so if a variable a sequence of variables converges in the plim sense to some limit
00:44:25.000 --> 00:44:32.000
then it also converges in distribution, but the reverse is generally not true, so a convergence
00:44:32.000 --> 00:44:40.000
in distribution is a weaker concept now suppose that our assumption a5 about the normality of
00:44:40.000 --> 00:44:47.000
the error terms does not hold which is often the case in empirical work then you recall that we
00:44:47.000 --> 00:44:53.000
had the central limit theorem which says that if a random variable epsilon i is iid
00:44:55.000 --> 00:45:01.000
and satisfies the properties that its expectation is zero and its variance is constant as sigma
00:45:01.000 --> 00:45:10.000
square then it is true that one over the square root of n times the sum of the epsilon i converges
00:45:10.000 --> 00:45:17.000
in distribution to a
00:45:17.000 --> 00:45:24.000
normally distributed variable with expectation zero and variance sigma square and that as i
00:45:24.000 --> 00:45:32.000
have already emphasized many times in this lecture is a very strong result because this holds for any
00:45:32.000 --> 00:45:39.000
distribution of the epsilon i's right regardless of how the epsilon i's are distributed as long as
00:45:39.000 --> 00:45:45.000
the epsilon i's are all iid so independently identically distributed we would know that one
00:45:45.000 --> 00:45:54.000
over the square root of n times the sum over the n epsilon i's converges to a normally distributed
00:45:54.000 --> 00:46:03.000
variable as the number of observations n grows to infinity i give you here as an exercise
00:46:04.000 --> 00:46:10.000
a partial proof of the central limit theorem actually the part of the proof which is hard
00:46:10.000 --> 00:46:18.000
to prove, namely that it is truly a normal distribution to which the scaled sum converges in distribution.
00:46:18.000 --> 00:46:24.000
This I do not ask you to show as an exercise, that would be too hard, but what I ask you to show are
00:46:24.000 --> 00:46:30.000
the easy parts of the proof namely that the expectation of one over square root of n times
00:46:30.000 --> 00:46:37.000
the sum of all the epsilon i's is equal to zero for all n regardless how many observations we have
00:46:38.000 --> 00:46:42.000
and that the variance of one over square root of n times the sum of all the epsilon i's
00:46:42.000 --> 00:46:49.000
is equal to sigma square for all n's so essentially what you are to prove is that the zero here is
00:46:49.000 --> 00:46:56.000
true and the sigma square is true but what we are not to prove is that the distribution function is
00:46:56.000 --> 00:47:01.000
normally distributed, right? So that's the hard part, and the other parts a and b here are very easy.
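Parts a and b can also be checked numerically. The sketch below (Python; the shifted exponential is my choice of a deliberately non-normal distribution, not the lecture's) confirms that one over square root of n times the sum has mean zero and variance sigma squared for a fixed n.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0
reps = 50_000

# epsilon_i iid with E = 0 and Var = sigma^2, but clearly non-normal:
# an exponential with scale sigma, shifted to have mean 0 and variance sigma^2
def scaled_sum(n):
    eps = rng.exponential(sigma, size=(reps, n)) - sigma
    return eps.sum(axis=1) / np.sqrt(n)

Z = scaled_sum(50)
mean_Z, var_Z = Z.mean(), Z.var()  # near 0 and sigma^2 = 4, for any n
```

The hard part of the central limit theorem, that the distribution of Z also becomes normal, is exactly what this check does not prove; it only verifies the two moments.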
00:47:01.000 --> 00:47:09.000
actually I realized that some of you expect that the exercises and the proofs I ask you for are much
00:47:09.000 --> 00:47:15.000
more difficult than they actually are. Mostly the proofs I ask you for and the
00:47:15.000 --> 00:47:22.000
exercises are easy proofs sometimes just one or two lines and so if you find an easy proof
00:47:22.000 --> 00:47:29.000
then you can be quite confident that at least no greater level of difficulty is required i cannot
00:47:29.000 --> 00:47:35.000
promise that it is correct but you you may be completely satisfied with the one or two line
00:47:35.000 --> 00:47:46.000
proof in many cases of my exercises all right um now look at assumptions a7 and a8 again of which
00:47:47.000 --> 00:47:54.000
we know that they are sufficient assumptions for the convergence in the plim sense of the least
00:47:54.000 --> 00:48:02.000
squares estimator. As plim convergence implies convergence in distribution, we would also know that
00:48:02.000 --> 00:48:09.000
these assumptions a7 and a8 are sufficient for convergence in distribution of the least squares
00:48:09.000 --> 00:48:18.000
estimator. So where exactly does the least squares estimator beta hat converge to when the number
00:48:18.000 --> 00:48:27.000
of observations increases you may possibly now say well it converges to a normal distribution
00:48:28.000 --> 00:48:36.000
i want to show you that this is not the case right for very easy reason actually but be warned
00:48:36.000 --> 00:48:43.000
that you have to be careful here look at the distribution of beta hat well we know
00:48:44.000 --> 00:48:54.000
that under assumptions a7 and a8 the asymptotic expectation of beta hat is actually
00:48:55.000 --> 00:49:03.000
beta, because we know that the estimator beta hat has beta as its plim. I should actually change
00:49:03.000 --> 00:49:10.000
the slide a little bit and not only write the symbol here which means that beta hat is distributed
00:49:11.000 --> 00:49:19.000
as something which has this expectation and this variance but i should put a little a over this
00:49:19.000 --> 00:49:25.000
tilde here to make clear that it is the asymptotic distribution it is not the finite sample
00:49:25.000 --> 00:49:29.000
distribution you may misread this line actually as saying that it would be the finite sample
00:49:29.000 --> 00:49:35.000
distribution that's wrong and this what i mean here by this tilde is the asymptotic distribution
00:49:35.000 --> 00:49:40.000
and now that i read it again i see that perhaps it would be better if i put a little a over it to
00:49:40.000 --> 00:49:48.000
indicate asymptotic distribution anyway the asymptotic value to which beta hat converges is
00:49:48.000 --> 00:49:54.000
beta as we know and we also know asymptotically the variance would be sigma squared u times x prime x
00:49:54.000 --> 00:50:04.000
inverse. Now beta hat minus beta therefore is asymptotically distributed
00:50:04.000 --> 00:50:12.000
as a variable with asymptotic expectation zero and finite asymptotic variance sigma squared u times x
00:50:12.000 --> 00:50:18.000
prime x inverse as before, so I just subtract beta on both sides of the sign here and get beta hat minus
00:50:18.000 --> 00:50:28.000
beta now what does this imply this implies that beta hat minus beta is distributed as a random
00:50:28.000 --> 00:50:37.000
variable with expectation asymptotic expectation zero and then well i introduce those factors one
00:50:37.000 --> 00:50:45.000
over n and one over n here where again please you remember that the one over n in those parentheses
00:50:45.000 --> 00:50:53.000
taken inverse right is equal to n so n times one over n cancels and actually i haven't changed
00:50:53.000 --> 00:51:00.000
anything in this notation as opposed to this notation here because the factor one over n
00:51:00.000 --> 00:51:09.000
cancels against the factor one over n inverse here, but writing it this way you see that this
00:51:09.000 --> 00:51:18.000
term here, one over n x prime x inverse, by assumption a seven converges to the matrix sigma
00:51:18.000 --> 00:51:25.000
x x inverse, and that's a finite matrix as we know, right? So essentially now we see here
00:51:25.000 --> 00:51:33.000
that the variance is one over n sigma square u times a finite matrix now what happens with sigma
00:51:33.000 --> 00:51:40.000
squared u over n times a finite matrix when n goes to infinity? Obviously, the matrix which is
00:51:40.000 --> 00:51:47.000
the covariance matrix of beta hat minus beta becomes smaller and smaller as n increases
00:51:47.000 --> 00:51:52.000
and when n goes to infinity the asymptotic covariance matrix here will be just zero
00:51:53.000 --> 00:52:00.000
right so this whole thing converges to zero which means that we would have a degenerate
00:52:00.000 --> 00:52:06.000
distribution, actually no normal distribution or anything like that, but just a degenerate
00:52:06.000 --> 00:52:13.000
distribution beta hat minus beta would asymptotically have an expectation of zero
00:52:14.000 --> 00:52:23.000
and no variance anymore right so basically the least squares estimator shrinks to exactly the
00:52:23.000 --> 00:52:31.000
true parameter beta with no variation anymore right so the distribution is asymptotically
00:52:31.000 --> 00:52:41.000
degenerate does any of you have an idea how this relates to the result which i had previously
00:52:43.000 --> 00:52:50.000
stated? Where did I state it? Now perhaps I haven't even stated it in that form yet; it is
00:52:50.000 --> 00:52:57.000
still to come but um i had at least given you an idea of the result which i'm aiming at
00:52:57.000 --> 00:53:04.000
that we have asymptotically a normal distribution for our least squares estimator what i have shown
00:53:04.000 --> 00:53:10.000
you here is that the least squares estimator does not asymptotically follow a normal distribution
00:53:10.000 --> 00:53:16.000
but rather shrinks to the true or the distribution shrinks to the true parameter value beta which is
00:53:16.000 --> 00:53:22.000
completely in line with the fact that the plim of beta hat is equal to beta right because as you know
00:53:22.000 --> 00:53:29.000
plim convergence says that the probability of beta hat being at least one epsilon apart from beta
00:53:29.000 --> 00:53:35.000
that this probability goes to zero so beta hat actually becomes closer and closer to beta with
00:53:35.000 --> 00:53:41.000
no variance at all asymptotically so where does now the normal distribution come in is there
00:53:41.000 --> 00:53:48.000
anybody of you who recalls this from undergraduate econometrics or has an idea from what i said before?
00:54:00.000 --> 00:54:06.000
No volunteer? Well, the thing is actually trivial, but I just want to draw your attention
00:54:06.000 --> 00:54:11.000
to the fact the non-degenerate distribution which is then actually an asymptotically normal
00:54:11.000 --> 00:54:19.000
distribution is only obtained if we scale the beta hat minus beta by a factor of square root of n
00:54:20.000 --> 00:54:26.000
right so it is not beta hat minus beta which is normally distributed it is not beta hat which is
00:54:26.000 --> 00:54:33.000
normally distributed but only square root of n times beta hat minus beta is normally distributed
00:54:34.000 --> 00:54:42.000
by multiplying beta hat minus beta by square root of n we prevent
00:54:43.000 --> 00:54:51.000
this term here from shrinking to a single point right when n increases
00:54:52.000 --> 00:54:58.000
this term here actually increases too whereas beta hat shrinks to beta as we know
00:54:58.000 --> 00:55:05.000
the product of the two then converges to a non-degenerate distribution and in particular to a
00:55:05.000 --> 00:55:14.000
normal distribution right so we need the scaling factor square root of n to get another distribution
00:55:15.000 --> 00:55:23.000
and when we replace the assumption a 6.2 by the slightly stronger assumption 6.3 so when we really
00:55:23.000 --> 00:55:30.000
assume stochastic independence of xi and ui and not only that the variables be
00:55:30.000 --> 00:55:36.000
contemporaneously uncorrelated then it is possible to show that under assumption a1
00:55:37.000 --> 00:55:45.000
multi-two multi-two a3 a4 and a6.3 we have the asymptotic normality of the least squares
00:55:45.000 --> 00:55:51.000
estimator in the sense that the square root of n times beta hat minus beta converges in distribution
00:55:52.000 --> 00:56:01.000
to a normal distribution right with expectation zero and with covariance matrix sigma square u
00:56:01.000 --> 00:56:11.000
times sigma xx inverse so that's the limit the plim of this matrix x prime x divided by n
00:56:11.000 --> 00:56:19.000
inverse okay but you need the scaling factor square root of n and i mean this is actually
00:56:20.000 --> 00:56:26.000
clear if you look at this at this expression here again because you see this term here converges to
00:56:26.000 --> 00:56:35.000
something which is finite namely sigma xx inverse and the fact that beta hat shrinks to a degenerate
00:56:35.000 --> 00:56:42.000
distribution at the point beta comes from the fact that in the variance term we have one over n here
00:56:43.000 --> 00:56:51.000
right so when we primarily apply beta hat minus beta by square root of n then this increases the
00:56:51.000 --> 00:56:57.000
standard deviation of beta hat minus beta so taking the square of this standard deviation
00:56:58.000 --> 00:57:05.000
we would have to square the square root of n to get a factor n and this factor n in terms of
00:57:05.000 --> 00:57:12.000
variances then exactly cancels against the factor one over n here leaving just as the variance sigma
00:57:12.000 --> 00:57:22.000
square u times this fixed matrix sigma xx inverse c which you see here right so when we want to run
00:57:22.000 --> 00:57:31.000
tests invoking asymptotic arguments we can make use of this result here but we always have to
00:57:31.000 --> 00:57:39.000
pre-multiply our test statistic by a factor square root of n right typically the test statistic would
00:57:39.000 --> 00:57:48.000
be constructed as beta hat minus the suspected vector of value of beta or the hypothesized
00:57:48.000 --> 00:57:53.000
value of beta which we test under the null hypothesis so it can easily be that we test
00:57:53.000 --> 00:58:01.000
a null hypothesis of beta being equal to zero right so beta hat minus the suspected value of beta
00:58:01.000 --> 00:58:07.000
typically divided by the standard deviation and then pre-multiplied by the square root of n
00:58:10.000 --> 00:58:18.000
so the practical relevance of this result perhaps in somewhat easier terms is the following
00:58:18.000 --> 00:58:25.000
even if the shocks ui are not normally distributed we may use the normal distribution for inference
00:58:25.000 --> 00:58:35.000
in large samples under assumptions a1, a3, a4 and a6.3 because then each regression coefficient beta
00:58:35.000 --> 00:58:42.000
k hat which we have estimated is approximately normally distributed with an expected value of
00:58:42.000 --> 00:58:49.000
beta k so of the true parameter and a standard error of sigma divided by the square root of n
00:58:51.000 --> 00:58:58.000
times well the square root of c bar kk and c bar kk was the diagonal element of the sigma xx inverse
00:58:59.000 --> 00:59:08.000
matrix. Therefore for sufficiently large number of observations n we can use the t-test and we can
00:59:08.000 --> 00:59:14.000
use the f-test and even chi-square tests so all these tests which are based on the normal distribution
00:59:14.000 --> 00:59:19.000
we can use if the number of observations is sufficiently large and then of course the question
00:59:19.000 --> 00:59:23.000
is what exactly does this mean sufficiently large and there is no rule of thumb which I can
00:59:24.000 --> 00:59:31.000
give you there but certainly not an exact mathematical result because it always depends
00:59:31.000 --> 00:59:37.000
on how far is the distribution of the shocks ui from the normal distribution right if these shocks
00:59:37.000 --> 00:59:44.000
ui were normally distributed well then we didn't even need to invoke any asymptotic arguments then
00:59:44.000 --> 00:59:50.000
we can just use the t-st distribution as you know use f-tests and t-distribution is very close to
00:59:50.000 --> 00:59:56.000
the normal distribution already for 25 or 30 observations so no problem if the ui's are
00:59:56.000 --> 01:00:02.000
normally distributed if they are not normally distributed it may be that still their distribution
01:00:02.000 --> 01:00:07.000
is quite similar to the normal distribution so it's quite close to the normal distribution
01:00:07.000 --> 01:00:14.000
and it may well be that with say 30 40 50 observations you can already invoke asymptotic
01:00:14.000 --> 01:00:20.000
arguments with very little error but if it is the case that the ui's are distributed
01:00:22.000 --> 01:00:29.000
in a way which is very very far off the normal distribution then obviously it takes more
01:00:29.000 --> 01:00:37.000
observations than just 30 40 50 to have a good approximation of the distribution of the regression
01:00:37.000 --> 01:00:44.000
coefficients through the normal distribution there's no rule of thumb which may tell us when
01:00:44.000 --> 01:00:50.000
we have sufficiently many observations who generate or to use the normal distribution
01:00:50.000 --> 01:00:59.000
for testing purposes in order to give you some feeling for that i again ask you to
01:01:00.000 --> 01:01:10.000
explore this issue in MATLAB or R or whatever you want to use so the setup is as in the previous
01:01:10.000 --> 01:01:15.000
exercises and you can probably make use of parts of the code you've written for previous exercises
01:01:16.000 --> 01:01:24.000
generate again let's say 100 by 3 matrix of regressors called x just use random numbers
01:01:24.000 --> 01:01:30.000
for this and then again use a do loop which you run through let's say a thousand
01:01:31.000 --> 01:01:41.000
times so in each loop you generate 100 by one random vector of shocks uj but now don't use
01:01:41.000 --> 01:01:46.000
normally distributed shocks because then the issue is trivial and we have a normal distribution of
01:01:46.000 --> 01:01:53.000
the irrigation coefficients but here this exercise i asked you to use uniformly distributed
01:01:53.000 --> 01:02:00.000
disturbances right uniform distribution would mean on some interval let's say from negative
01:02:00.000 --> 01:02:10.000
a to a you have the same probability constant density for each possible value of the error term
01:02:10.000 --> 01:02:15.000
and that would be a uniform distribution and typically in MATLAB and in Gauss there are
01:02:15.000 --> 01:02:22.000
commands which just generate you uniformly distributed disturbances right so and generate
01:02:22.000 --> 01:02:32.000
the observations yj this whole vector the j refers to the loop which you run through a thousand times
01:02:32.000 --> 01:02:40.000
so that's a vector y in loop j generated as x times some parameters and you may take them
01:02:40.000 --> 01:02:46.000
as one one one but you can also use different values here class shocks which are uniformly
01:02:46.000 --> 01:02:51.000
distributed okay so and then you have the regressor matrix and you have your
01:02:53.000 --> 01:03:01.000
dependent variable which you observe then estimate beta that would be beta hat in loop j
01:03:01.000 --> 01:03:08.000
beta hat j as x prime x inverse the x prime yj and save this estimate and do this a thousand times
01:03:08.000 --> 01:03:17.000
and then take all the beta hat j's and plot a histogram for each component right you in each
01:03:17.000 --> 01:03:23.000
beta hat j you have three estimates uh for this coefficient that coefficient and that coefficient
01:03:24.000 --> 01:03:30.000
so for each of these coefficients plot a histogram and have a look at it whether the
01:03:30.000 --> 01:03:36.000
histograms are approximately normal right if it looks like the central limit theorem already holds
01:03:36.000 --> 01:03:44.000
for hundred observations if the arrows follow a uniform distribution rather than a normal
01:03:44.000 --> 01:03:51.000
distribution how can you check that well either you just look at the shape of the the histogram
01:03:51.000 --> 01:03:57.000
and see whether this has this kind of typical bell shape for the normal distribution or if you want
01:03:57.000 --> 01:04:04.000
to be a little more exact then you can also check whether the third and fourth moment of the
01:04:04.000 --> 01:04:09.000
distribution are close to those of a normal distribution by computing the schooness of the
01:04:09.000 --> 01:04:17.000
distribution which should be zero if the distribution is is normal and by computing the
01:04:17.000 --> 01:04:22.000
kurtosis of the fourth moment which should actually be three on that normal distribution and you will
01:04:22.000 --> 01:04:30.000
have to see whether the kurtosis is approximately three for your three estimates here okay
01:04:30.000 --> 01:04:38.000
and then of course you can do the same thing with uh higher numbers of n's so that would be a little
01:04:38.000 --> 01:04:43.000
effort once you've made the first programming uh for the previous exercise it's very easy to just
01:04:43.000 --> 01:04:50.000
increase n right and use n equal to two hundred five hundred thousand two thousand five thousand
01:04:50.000 --> 01:04:57.000
whatever you want to see and observe how the histograms change as n increases right and
01:04:57.000 --> 01:05:01.000
observe what happens to the schooness and to the kurtosis of the histograms if n increases
01:05:02.000 --> 01:05:10.000
when you do this exercise you should see that the schooness approaches zero when you use the
01:05:12.000 --> 01:05:16.000
when you use more and more observations and the kurtosis approaches three right and of course the
01:05:16.000 --> 01:05:21.000
histograms should look more and more like the histogram of a normal distribution so i think
01:05:21.000 --> 01:05:27.000
this would be a very useful exercise to get some understanding of what happens if the shocks are not
01:05:27.000 --> 01:05:35.000
normal in this case uh uniformly distributed now the uniform distribution is in some kind of very
01:05:35.000 --> 01:05:42.000
special distribution because if you have a normal if you have a uniform distribution distribution
01:05:42.000 --> 01:05:49.000
which ranges let's say from negative a to plus a then you are sure that there will never be any
01:05:49.000 --> 01:05:58.000
shock which is larger in size than a or smaller in size than negative a so the uniform distribution
01:05:58.000 --> 01:06:05.000
really fixes you a certain range for your shocks and you know shocks extremer the negative a or
01:06:05.000 --> 01:06:15.000
plus a will never occur this is a property which in some ways nice for the uniform distribution it
01:06:15.000 --> 01:06:21.000
is also a little unrealistic we must say because we never have this knowledge in reality and even
01:06:21.000 --> 01:06:28.000
under a normal distribution you could never fix any number saying well i'm sure that the shock will
01:06:28.000 --> 01:06:34.000
not exceed this number in absolute value there will always be some probability that even more
01:06:34.000 --> 01:06:41.000
extreme shocks occur than any given number so in that sense not this uniform distribution
01:06:41.000 --> 01:06:50.000
is very far off the properties of a normal uh so you can redo now the previous two exercises
01:06:50.000 --> 01:06:56.000
using a distribution for the error which in some sense is just the opposite of the uniform
01:06:56.000 --> 01:07:03.000
distribution. While the uniform distribution basically has no probability mass in the distant
01:07:03.000 --> 01:07:09.000
tails of the distribution, right, more distant than negative a and a there is no probability mass for
01:07:09.000 --> 01:07:16.000
any event anymore, there are distributions which have considerably more probability mass in the
01:07:16.000 --> 01:07:21.000
tails of the distribution than the normal distribution. And one such distribution, which
01:07:21.000 --> 01:07:28.000
you may easily simulate in a computer, is the so-called Cauchy distribution. The Cauchy distribution
01:07:28.000 --> 01:07:36.000
is just the t distribution with one degree of freedom, and any program like GAUSS, MATLAB, or R
01:07:36.000 --> 01:07:43.000
gives you commands to generate the t distribution and specify the degrees of freedom.
01:07:44.000 --> 01:07:50.000
So specify a Student's t distribution with just one degree of freedom;
01:07:50.000 --> 01:07:57.000
that is a very particular case of the t distribution, actually, called the Cauchy distribution,
01:07:58.000 --> 01:08:05.000
which is difficult, because mathematically a random variable which is Cauchy distributed
01:08:05.000 --> 01:08:14.000
does not even have an expected value. There are technical reasons for that. The Cauchy distribution
01:08:14.000 --> 01:08:19.000
has no well-defined moments at all: it doesn't have an expected value, it doesn't have a variance,
01:08:19.000 --> 01:08:25.000
it doesn't have a skewness, it doesn't have a kurtosis. So it is indeed a distribution which
01:08:25.000 --> 01:08:32.000
is very, very different from the normal distribution. And actually, basically just by changing one
01:08:32.000 --> 01:08:40.000
command, hardly more than two or three keystrokes, you can change the programs you have used in the
01:08:40.000 --> 01:08:47.000
previous exercises to have the error terms generated by a Cauchy distribution rather than by
01:08:48.000 --> 01:08:57.000
a uniform distribution, and then run the exercises again and see how fast the distribution of the
01:08:57.000 --> 01:09:05.000
estimated regression coefficients beta hat converges to a normal distribution, even though the
01:09:05.000 --> 01:09:13.000
errors which generate your observations are coming from a Cauchy distribution, from a t distribution with
01:09:13.000 --> 01:09:23.000
just one degree of freedom. Okay, I really recommend that you do those exercises. Are there any
01:09:23.000 --> 01:09:36.000
questions? It always worries me when there are no questions. I hope you can follow what I've said
01:09:36.000 --> 01:09:42.000
here, but I cannot do more than just offer you the possibility to pose questions.
01:09:46.000 --> 01:09:49.000
All right, if not now, you have another chance tomorrow afternoon.
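As a concrete sketch of the recommended exercise: the lecture suggests GAUSS, MATLAB, or R, so the following is only a hypothetical pure-Python version, in which the sample size, the number of replications, the no-intercept design, and the true coefficient of 2 are all arbitrary choices for illustration.

```python
import math
import random
import statistics

def cauchy_draw(rng):
    # Standard Cauchy (= Student's t with 1 degree of freedom) via the
    # inverse CDF: F^{-1}(p) = tan(pi * (p - 1/2))
    return math.tan(math.pi * (rng.random() - 0.5))

def slope_estimate(rng, n, beta=2.0):
    # One replication: y_i = beta * x_i + u_i with Cauchy errors;
    # the least squares slope (no intercept) is (x'y) / (x'x)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + cauchy_draw(rng) for xi in x]
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(42)
estimates = [slope_estimate(rng, n=200) for _ in range(500)]

# The Cauchy has no finite moments, so the usual variance-based arguments
# are in trouble; the heavy tails show up as occasional wildly extreme
# slope estimates, while the distribution of estimates stays centered,
# by symmetry, around the true value 2.
print(statistics.median(estimates))
print(max(abs(b - 2.0) for b in estimates))
```

Replacing `cauchy_draw` by a uniform draw reproduces the earlier exercise, so the two error distributions can be compared with the same simulation code.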
01:09:49.000 --> 01:09:56.000
Let's turn to section two of this set of slides, estimation problems. So we have basically now
01:09:56.000 --> 01:10:03.000
established the theory of least squares estimation, both the finite sample properties and the asymptotic
01:10:03.000 --> 01:10:08.000
properties of the least squares estimator. Now let's talk about estimation problems, and the
01:10:08.000 --> 01:10:13.000
first problem I would like to discuss is multicollinearity. You're probably aware of
01:10:13.000 --> 01:10:20.000
multicollinearity. My experience is that with undergraduates, professors seem to overemphasize
01:10:20.000 --> 01:10:27.000
multicollinearity, because when I speak to students in the bachelor's program,
01:10:28.000 --> 01:10:34.000
say they want to write their bachelor thesis with me and I ask them to do some estimation, or when
01:10:34.000 --> 01:10:38.000
they are in a seminar where they have to do some econometric estimation, then they always come up
01:10:38.000 --> 01:10:44.000
with multicollinearity, and basically if they have any problems in their estimates, then they
01:10:44.000 --> 01:10:50.000
always first suspect the familiar multicollinearity. Actually, this is usually not the case. While
01:10:50.000 --> 01:10:56.000
multicollinearity is a relevant problem, it is typically by far not the most important
01:10:56.000 --> 01:11:03.000
problem, and multicollinearity can typically be dealt with really easily. It is easy to
01:11:03.000 --> 01:11:11.000
detect multicollinearity and it is easy to cure the problem, and so mostly you should not have
01:11:11.000 --> 01:11:19.000
problems with multicollinearity. But of course you should be well equipped to detect multicollinearity,
01:11:19.000 --> 01:11:25.000
or approximate multicollinearity, and to do what needs to be done if there is such a problem. So
01:11:25.000 --> 01:11:32.000
I will discuss multicollinearity here, but I will also warn you: when you have certain problems
01:11:32.000 --> 01:11:39.000
in your estimation, if it is multicollinearity, then you should really find that out easily and
01:11:39.000 --> 01:11:47.000
solve it easily. If it is not solved easily, then typically it is not multicollinearity which is
01:11:47.000 --> 01:11:54.000
the source of your problem. Okay, so I use new notation here: the regressor matrix X I now write
01:11:54.000 --> 01:12:03.000
as x index one, x index two, up to x index k. So x one up to x k, the small x's, are just the columns
01:12:03.000 --> 01:12:09.000
of our regressor matrix X. Previously I used the index i to denote the rows of the regressor
01:12:09.000 --> 01:12:15.000
matrix X, right, when I had this sum of x i x i prime I talked about in today's lecture, but now
01:12:15.000 --> 01:12:22.000
these are the columns. Okay, I change the notation. Suppose that two of those columns are linearly
01:12:22.000 --> 01:12:32.000
dependent, so for instance x i is equal to alpha times x j. Obviously this is the easiest case
01:12:32.000 --> 01:12:40.000
of multicollinearity, which I now discuss with you. As you know, linearly dependent can also mean
01:12:40.000 --> 01:12:48.000
that x i is a weighted sum of more than just one column, so we could also look at multicollinearity
01:12:48.000 --> 01:12:56.000
as x i being equal to alpha one x j plus alpha two x k, for instance, plus
01:12:57.000 --> 01:13:02.000
further alphas and further columns, right? So any kind of linear combination of other columns
01:13:03.000 --> 01:13:09.000
being equal to one specific column like x i would be a case of multicollinearity. But to save on
01:13:09.000 --> 01:13:16.000
notation and to expose the problem at hand most clearly, I just use this completely
01:13:16.000 --> 01:13:24.000
trivial form of multicollinearity, where one column is just some constant times a second
01:13:24.000 --> 01:13:31.000
column; i and j are of course different indices. Now our regression model is y equal to beta one
01:13:31.000 --> 01:13:39.000
x one plus ... plus beta i times column x i plus beta j times column x j, so I single out here the two columns
01:13:39.000 --> 01:13:46.000
which are related by this multicollinearity problem, plus ... plus beta k
01:13:46.000 --> 01:13:55.000
times x k plus the vector u. What I want to show you is: this regression model here is completely
01:13:55.000 --> 01:14:02.000
indistinguishable from the following regression model: y is equal to beta one x one plus ... plus,
01:14:02.000 --> 01:14:11.000
nothing changes, and now beta i plus one times x i, so I just increase this regression coefficient
01:14:11.000 --> 01:14:22.000
here by one, I have beta i plus one here now, plus beta j minus alpha times x j. So,
01:14:23.000 --> 01:14:29.000
in order to compensate for the increase in this regression coefficient here, I decrease
01:14:30.000 --> 01:14:37.000
this regression coefficient beta j and transform it to beta j minus alpha,
01:14:38.000 --> 01:14:44.000
plus ... plus, the rest is unchanged. What have we done? We have added zero
01:14:44.000 --> 01:14:52.000
to the equation, because I have added x i and I have subtracted alpha times x j.
01:14:53.000 --> 01:15:00.000
Since x i is the same thing as alpha x j, this regression equation is equally true
01:15:01.000 --> 01:15:07.000
as this regression equation here; it explains the observations equally well. And then the problem
01:15:07.000 --> 01:15:16.000
is, of course, for the least squares procedure: which regression coefficients shall it identify
01:15:16.000 --> 01:15:23.000
as the true regression coefficients, as the best regression coefficients for the objective to
01:15:23.000 --> 01:15:29.000
minimize the sum of squares? And the least squares procedure, if you just think of it as a
01:15:29.000 --> 01:15:36.000
person for a moment, right, would say: well, the beta i and the beta j are just as good as the
01:15:36.000 --> 01:15:44.000
regression coefficients beta i plus one and beta j minus alpha; they give exactly the same sum
01:15:44.000 --> 01:15:51.000
of squares as the minimum, right? So I have two minima. And actually, when the least squares
01:15:51.000 --> 01:15:56.000
procedure thinks about the problem a little longer, it will find out there are not just two minima
01:15:56.000 --> 01:16:02.000
which give exactly the same value of the sum of squares, but there are infinitely many,
01:16:03.000 --> 01:16:12.000
because also this model here is indistinguishable from this and that model. This model here would
01:16:12.000 --> 01:16:20.000
say: y is equal to beta one x one plus ... plus, nothing changed, now beta i plus some lambda, some
01:16:20.000 --> 01:16:27.000
arbitrary lambda taken from the space of real numbers, times x i, plus beta j minus alpha times
01:16:27.000 --> 01:16:37.000
lambda, times x j, right? So we can take any real number lambda here and increase this regression coefficient
01:16:37.000 --> 01:16:45.000
by lambda, and then decrease this regression coefficient by alpha times lambda, right? And
01:16:45.000 --> 01:16:51.000
again we would have added just zero to the initial regression equation, because lambda times x i is
01:16:51.000 --> 01:17:00.000
equal to lambda times alpha times x j, right? So the regression equation has not changed, but the coefficients
01:17:00.000 --> 01:17:04.000
are different, of course, from this coefficient here, as long as lambda is different from one,
01:17:04.000 --> 01:17:08.000
right, so that these two coefficients are also different; and of course they are different from
01:17:08.000 --> 01:17:15.000
beta i and beta j. So essentially any regression coefficient can be chosen for beta i, and we would
01:17:15.000 --> 01:17:22.000
always find a regression coefficient for beta j which exactly offsets the beta i here in such a
01:17:22.000 --> 01:17:28.000
way that the sum of the squares is minimal. Therefore, what we say is that the regression
01:17:28.000 --> 01:17:36.000
coefficients are just not identified, right? The least squares procedure is unable to find a unique
01:17:37.000 --> 01:17:43.000
minimum for the sum of the squared residuals, and therefore it cannot
01:17:44.000 --> 01:17:49.000
estimate beta i and beta j, because infinitely many values of beta i and beta j
01:17:50.000 --> 01:17:54.000
would attain exactly the minimum of the least squares problem.
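The claim that infinitely many coefficient pairs give exactly the same sum of squared residuals is easy to verify numerically. Here is a hypothetical sketch in Python (the true coefficients 3 and 1, the value alpha = 2, the sample size, and the noise scale are all arbitrary choices for illustration):

```python
import random

rng = random.Random(7)
n, alpha = 50, 2.0
x_j = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_i = [alpha * v for v in x_j]        # exact multicollinearity: x_i = alpha * x_j
y = [3.0 * a + 1.0 * b + rng.gauss(0.0, 0.1) for a, b in zip(x_i, x_j)]

def ssr(beta_i, beta_j):
    # Sum of squared residuals, keeping only the two collinear regressors
    return sum((yi - beta_i * a - beta_j * b) ** 2
               for yi, a, b in zip(y, x_i, x_j))

# Shifting beta_i up by lambda and beta_j down by alpha * lambda adds
# lambda * x_i - alpha * lambda * x_j = 0 to the fitted values, so the
# sum of squares cannot distinguish any of these coefficient pairs.
base = ssr(3.0, 1.0)
for lam in (-5.0, 1.0, 100.0):
    assert abs(ssr(3.0 + lam, 1.0 - alpha * lam) - base) < 1e-6
print(base)
```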
01:17:56.000 --> 01:18:03.000
So that's the main message here: in the case of exact multicollinearity, the coefficients of x i
01:18:03.000 --> 01:18:11.000
and x j are not identified. Note that it is only a problem for the coefficients of x i and x j in
01:18:11.000 --> 01:18:16.000
this case, because I have assumed that only x i and x j are connected to each other by this linear
01:18:16.000 --> 01:18:21.000
relationship here. All the other regression coefficients are identified; there is no problem
01:18:21.000 --> 01:18:29.000
estimating them, but these here are not. So what would happen to the least
01:18:29.000 --> 01:18:37.000
squares estimator if we just plug in some matrix X which has this multicollinearity problem, that
01:18:37.000 --> 01:18:43.000
x i is equal to alpha times x j, or some more involved form of multicollinearity with more than
01:18:43.000 --> 01:18:54.000
just one column of X being able to reproduce another column of X, taking the form of a
01:18:54.000 --> 01:19:02.000
weighted average? Now the problem is, in this case here, x i equal to alpha x j, that the X matrix does not
01:19:02.000 --> 01:19:11.000
have full rank, right? The rank of X is smaller than k, because we have
01:19:11.000 --> 01:19:20.000
the exact linearity here between x i and x j, and therefore, when the rank of X is less than
01:19:20.000 --> 01:19:28.000
k, so when the matrix X doesn't have full column rank, then the rank of X prime X is smaller than k,
01:19:28.000 --> 01:19:34.000
and therefore X prime X is not invertible, and our least squares estimator just is not defined,
01:19:35.000 --> 01:19:42.000
right? So when you try something like this, when you set up a regressor matrix which has an exact
01:19:42.000 --> 01:19:47.000
linearity and you try to estimate something, your estimation routine will break down, and the
01:19:47.000 --> 01:19:52.000
estimator will return an error message saying that X prime X is not invertible,
01:19:52.000 --> 01:19:59.000
right? So you will not get any least squares result. Does this happen? Yes, indeed. It typically
01:19:59.000 --> 01:20:06.000
happens for beginners in econometrics, who make, excuse me for saying it, stupid mistakes. So
01:20:06.000 --> 01:20:14.000
one thing which happens for beginners sometimes is that they have dummy variables, let's say for
01:20:14.000 --> 01:20:24.000
male and female. So for instance they want to check the wages of workers and test whether male workers
01:20:24.000 --> 01:20:32.000
are receiving better pay than female workers. For instance, for all the wages which they have as
01:20:32.000 --> 01:20:38.000
dependent observations, they have as regressors information on whether the wage is paid to a male
01:20:38.000 --> 01:20:45.000
worker or to a female worker. So they specify a dummy variable which takes on the value of one
01:20:45.000 --> 01:20:55.000
if worker i is male and takes on the value zero if worker i is female, and then they say: okay,
01:20:55.000 --> 01:21:01.000
this is a dummy variable for the male workers; I also construct a dummy variable for the female
01:21:01.000 --> 01:21:09.000
workers, so d two i is equal to zero if the worker is male and is one if the worker
01:21:09.000 --> 01:21:15.000
is female. Actually, so far this is not even a stupid mistake, right? You can do it that
01:21:15.000 --> 01:21:20.000
way; you can define a dummy variable for the male workers and a dummy variable for the female workers.
01:21:20.000 --> 01:21:28.000
But what you must not do is then use these two dummy variables joined with a constant
01:21:28.000 --> 01:21:35.000
term in your regressor matrix. Because if you did that, if you had a constant, so just a vector of ones,
01:21:36.000 --> 01:21:38.000
and you have one dummy variable for the male workers
01:21:40.000 --> 01:21:46.000
and you have one dummy variable for the female workers, then obviously there is an exact linear
01:21:46.000 --> 01:21:51.000
dependency in your regressor matrix, regardless of what other regressors you still have in this
01:21:51.000 --> 01:21:59.000
matrix, because it is the case that the constant x one is equal to the male dummy plus the female
01:21:59.000 --> 01:22:06.000
dummy, right? x one is equal to d one plus d two. So these three columns here just have rank two;
01:22:06.000 --> 01:22:13.000
they do not have rank three, right? There is an exact linear relationship. And this already
01:22:13.000 --> 01:22:18.000
shows you what the cure of the problem is: if you have an exact linear relationship,
01:22:18.000 --> 01:22:26.000
then the easy cure is that you just delete one of those three regressors, and it is completely
01:22:26.000 --> 01:22:31.000
unimportant which of these three you throw out, right? You can throw out the constant, you can throw
01:22:31.000 --> 01:22:36.000
out the male dummy, or you can throw out the female dummy. It doesn't play a role, it just plays
01:22:36.000 --> 01:22:41.000
a role for the interpretation of your regression coefficients. But in terms of regression results,
01:22:41.000 --> 01:22:47.000
you can easily compute the regression coefficients which you would get for
01:22:48.000 --> 01:22:53.000
any regressor you have deleted, and the interpretation is always exactly the same,
01:22:53.000 --> 01:23:00.000
regardless of which regressor you delete. So this type of problem is easily cured.
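The dummy-variable trap is easy to reproduce numerically. A minimal, hypothetical sketch in Python (made-up data for six workers; the lecture itself works with GAUSS/MATLAB/R-style routines) showing that a constant plus a male dummy plus a female dummy makes X prime X singular:

```python
# Hypothetical sample of six workers: 1 = male, 0 = female
d1 = [1, 0, 1, 1, 0, 0]                 # male dummy
d2 = [1 - v for v in d1]                # female dummy
const = [1] * len(d1)                   # constant term: a column of ones
X = [list(row) for row in zip(const, d1, d2)]   # n x 3 regressor matrix

def xtx(X):
    # X'X for an n x k matrix stored as a list of rows
    k = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(k)]
            for a in range(k)]

def det3(m):
    # Determinant of a 3 x 3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# const = d1 + d2 in every row, so the three columns have rank two,
# det(X'X) = 0, and the inverse of X'X does not exist.
print(det3(xtx(X)))  # 0
```

Deleting any one of the three columns removes the linear dependency and restores an invertible X prime X.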
01:23:01.000 --> 01:23:08.000
Note that in my example here I have not given you an example where x i is equal to alpha times x j,
01:23:08.000 --> 01:23:18.000
but I have already used a somewhat more involved example of x one being the linear combination
01:23:18.000 --> 01:23:24.000
of two columns, right, a linear combination with coefficient alpha one equal to one here and
01:23:24.000 --> 01:23:29.000
alpha two equal to one here. So it is slightly more complicated, but it is in principle the same thing,
01:23:29.000 --> 01:23:36.000
right? If you had this simple form of multicollinearity, x i being equal to alpha times x j,
01:23:36.000 --> 01:23:41.000
well, then throw out variable x j, or throw out variable x i;
01:23:41.000 --> 01:23:44.000
you are just allowed to keep one variable in there, but then everything is fine.
01:23:44.000 --> 01:23:52.000
Okay, so these kinds of things happen typically with dummy variables, male and female, or with seasonal
01:23:52.000 --> 01:23:57.000
dummy variables, right? You get quarterly data, you put in a dummy variable for the first quarter, for
01:23:57.000 --> 01:24:02.000
the second quarter, for the third quarter, and for the fourth quarter. In principle you could do this,
01:24:02.000 --> 01:24:07.000
but then you are not allowed to put in a constant. Or, what most people do: put in a constant and a dummy
01:24:07.000 --> 01:24:12.000
variable for the first quarter, for the second quarter, for the third quarter, but not anymore
01:24:12.000 --> 01:24:17.000
for the fourth quarter, because the dummy variable for the fourth quarter would add up with the other
01:24:17.000 --> 01:24:22.000
three dummy variables to the constant, right? So these types of errors sometimes occur with
01:24:24.000 --> 01:24:31.000
beginners in econometrics, but they are easily cured. Now, more common and more difficult is
01:24:31.000 --> 01:24:39.000
the case of approximate multicollinearity. So let's say x i is just approximately equal to alpha times
01:24:39.000 --> 01:24:50.000
x j. In this case, when this is not an exact relationship here, then the X matrix will still
01:24:50.000 --> 01:25:00.000
have rank k, but it will be the case that the X prime X matrix has a determinant which is close to
01:25:00.000 --> 01:25:07.000
zero. You know a singular matrix has a determinant which is equal to zero; so the X prime X matrix
01:25:07.000 --> 01:25:15.000
in this case would be close to being not invertible, so the determinant would be close to zero.
01:25:16.000 --> 01:25:23.000
What does this mean? Well, the determinant is very important for determining the inverse of a matrix.
01:25:24.000 --> 01:25:31.000
We need X prime X inverse in our least squares estimator, and recall from your linear algebra
01:25:31.000 --> 01:25:39.000
lectures that the inverse of a matrix is always the inverse of the determinant times some matrix
01:25:39.000 --> 01:25:45.000
which is called the adjoint. The adjoint is the matrix of cofactors, and I'm not sure if you recall
01:25:45.000 --> 01:25:49.000
this or if you have had this in linear algebra, but if you haven't, it is not so important right here;
01:25:49.000 --> 01:25:56.000
just remember that this is some matrix which always exists. The adjoint of X prime X exists even in
01:25:56.000 --> 01:26:04.000
the case when X prime X is not invertible, so one can always define it; the adjoint is well defined.
01:26:05.000 --> 01:26:11.000
And the inverse is always the inverse of the determinant times the adjoint. The inverse of
01:26:11.000 --> 01:26:16.000
the determinant does not exist if the determinant is zero, because you cannot divide by zero,
01:26:16.000 --> 01:26:25.000
right? So in this case X prime X inverse would be very large, because if the determinant is close
01:26:25.000 --> 01:26:32.000
to zero, then we would have to take the inverse of the determinant here, so we would divide something
01:26:33.000 --> 01:26:40.000
by a number which is very close to zero, which gives a huge number here, right? So one over
01:26:40.000 --> 01:26:45.000
the determinant of X prime X
01:26:45.000 --> 01:26:52.000
would be a huge number, and that will be multiplied by some matrix which
01:26:52.000 --> 01:26:58.000
is the adjoint of X prime X. So we would get a huge matrix X prime X inverse.
01:26:59.000 --> 01:27:06.000
Now, actually, the estimate of the covariance matrix would be sigma hat squared u times X prime X
01:27:06.000 --> 01:27:11.000
inverse, this matrix here, so the estimated standard errors of the regression coefficients
01:27:11.000 --> 01:27:18.000
will be large, and therefore the regression coefficients tend to be insignificant. That is
01:27:18.000 --> 01:27:27.000
a typical phenomenon of approximate multicollinearity: you see that two or even more regressors are
01:27:27.000 --> 01:27:33.000
insignificant, but when you eliminate just one of the two regressors, then the remaining one suddenly
01:27:33.000 --> 01:27:40.000
becomes significant. By just experimenting with the specification
01:27:40.000 --> 01:27:46.000
of the regression, this problem is easily seen and easily cured: well, eliminate one of the regressors,
01:27:46.000 --> 01:27:53.000
right? Or at least eliminate the common component which it shares with another regressor, which you
01:27:53.000 --> 01:27:59.000
can do by some type of auxiliary regression, for instance. And then you should see that the other
01:27:59.000 --> 01:28:07.000
regressor becomes significant, if it actually is an important regressor and if multicollinearity
01:28:07.000 --> 01:28:13.000
was at the root of the problem. So that is easily found out by just
01:28:13.000 --> 01:28:22.000
experimenting with the specification. I have an exercise here for you. Suppose that y is equal to
01:28:22.000 --> 01:28:29.000
beta one x one plus beta two x two plus u, and for simplicity also the means are equal to zero and
01:28:29.000 --> 01:28:37.000
all the variances are equal to one, right? In this case the sample covariance
01:28:37.000 --> 01:28:46.000
between x one and x two is equal to the sample coefficient of correlation r one two. So when these
01:28:46.000 --> 01:28:52.000
assumptions here hold, then you will get that the covariance between x one and x two, speaking of
01:28:52.000 --> 01:28:58.000
the sample, is equal to the coefficient of correlation. So basically covariance is equal
01:28:58.000 --> 01:29:04.000
to coefficient of correlation, so one over n x one prime x two, or one over n x two prime x one, is
01:29:04.000 --> 01:29:10.000
equal to r one two. Now we have approximate multicollinearity, of course, when the correlation
01:29:10.000 --> 01:29:17.000
is large in absolute value, so either when it is close to plus one or close to negative one. What you
01:29:17.000 --> 01:29:25.000
shall show in this exercise here is that the estimated variance of beta one hat is equal to
01:29:25.000 --> 01:29:34.000
the estimated variance of beta two hat, namely sigma hat squared u divided by n, further divided by
01:29:34.000 --> 01:29:43.000
one minus the squared coefficient of correlation. And then of course when the squared coefficient
01:29:43.000 --> 01:29:50.000
of correlation converges to one, when it is close to one, then you see of course that the variance
01:29:50.000 --> 01:29:56.000
increases towards infinity, so you would get huge standard errors.
01:29:56.000 --> 01:30:00.000
And when you get huge standard errors, then of course your variables become insignificant.
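A quick numeric check of this formula, with hypothetical numbers: in the standardized case X prime X is just n times the two-by-two correlation matrix, so the relevant entry of its inverse follows from the usual two-by-two inverse formula.

```python
# Standardized case: (1/n) x1'x1 = (1/n) x2'x2 = 1 and (1/n) x1'x2 = r12
n, r12, sigma2_hat = 100, 0.99, 1.0

# X'X = n * [[1, r12], [r12, 1]]; by the 2x2 inverse formula, the
# top-left entry of the inverse of X'X is 1 / (n * (1 - r12**2))
inv_top_left = 1.0 / (n * (1.0 - r12 ** 2))
var_beta1_hat = sigma2_hat * inv_top_left

print(var_beta1_hat)       # about 0.5025
print(sigma2_hat / n)      # 0.01, the variance the coefficient would have if r12 were 0
# As r12 approaches plus or minus 1, the factor 1 / (1 - r12**2) explodes,
# and with it the standard error of beta one hat: here the near-collinearity
# inflates the variance by a factor of about 50.
```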
01:30:02.000 --> 01:30:10.000
You can do the same thing in MATLAB; here is an exercise which you can read, I've written down
01:30:10.000 --> 01:30:18.000
everything. I would just like to conclude today's lectures by giving you an example which we have
01:30:18.000 --> 01:30:26.000
already studied, namely our earlier regression of stock prices on total factor productivity,
01:30:26.000 --> 01:30:34.000
and I consider, in addition to total factor productivity, a quality-adjusted version of total
01:30:34.000 --> 01:30:40.000
factor productivity. Such data exist; I call it TFPQ, for quality-adjusted, and this is how the data
01:30:40.000 --> 01:30:46.000
look. You see, the blue line is the regular TFP line, which we had already used in this regression,
01:30:46.000 --> 01:30:52.000
and the red line is quality-adjusted total factor productivity. So both have an upward
01:30:52.000 --> 01:30:59.000
trend; they don't look that similar, actually, and therefore we may think, well, why don't we test
01:30:59.000 --> 01:31:07.000
them both in our regression. Here is the regression of stock prices on a constant, on a linear trend,
01:31:07.000 --> 01:31:14.000
on total factor productivity, and on the quality-adjusted measure of total factor productivity. And now you
01:31:14.000 --> 01:31:23.000
see: both TFP and TFPQ are insignificant, the t statistics are smaller than two. And note the
01:31:23.000 --> 01:31:30.000
standard errors, just to remember this perhaps for the next outputs: 2.6 here, 3.8 there. We don't know
01:31:30.000 --> 01:31:36.000
whether this is big or not, but we will just compare. Important here is: both variables are insignificant.
01:31:38.000 --> 01:31:44.000
Now we may suspect that there is approximate multicollinearity due to TFP and TFPQ being similar
01:31:44.000 --> 01:31:50.000
variables, and actually measuring similar things. So let's eliminate one of the regressors. Here I
01:31:50.000 --> 01:31:58.000
eliminate TFPQ, and suddenly, as you see, TFP is significant, strongly indicating that we had a
01:31:58.000 --> 01:32:06.000
multicollinearity problem. And the standard error is much, much smaller: it is 0.8 now, before it was
01:32:06.000 --> 01:32:16.000
2.6. So the standard error is much smaller now, less than
01:32:16.000 --> 01:32:25.000
one third of what it was before, so that is also nice. And if I kick out TFP and rather leave TFPQ
01:32:25.000 --> 01:32:31.000
in the regression, then the standard error is again much smaller than the TFPQ standard error before,
01:32:31.000 --> 01:32:36.000
which was 3 point something, and again the variable is significant, highly significant actually: at the
01:32:36.000 --> 01:32:43.000
one percent level it is significant. So both variables, taken as a single regressor in the
01:32:43.000 --> 01:32:48.000
regression, are significant, but if we have them jointly in the regression, then both of them are
01:32:48.000 --> 01:32:56.000
not significant. So that is a typical phenomenon of approximate multicollinearity,
01:32:57.000 --> 01:33:04.000
right? Okay, this concludes today's lectures. Are there any remaining questions?
01:33:06.000 --> 01:33:06.000
Yes, please?
01:33:13.000 --> 01:33:17.000
How much do the editions of the book by Greene differ from each other? Do you recommend buying
01:33:17.000 --> 01:33:21.000
the eighth edition? Well, no, actually. I mean, I have said this already at the beginning of the lecture:
01:33:22.000 --> 01:33:27.000
they don't differ by much; you can use old editions of those textbooks, so there is no
01:33:27.000 --> 01:33:31.000
problem with that. The basic things which I explain here are just the same in each edition
01:33:31.000 --> 01:33:36.000
of the textbook, so take the cheapest one you would like to take, take an older edition, that is all right.
01:33:38.000 --> 01:33:40.000
Okay, any other question?
01:33:42.000 --> 01:33:48.000
Then thanks for your attention, and see you tomorrow afternoon.