WEBVTT - autoGenerated
00:00:00.000 --> 00:00:11.000
Good. We started the discussion of asymptotic properties of the least squares estimator
00:00:11.000 --> 00:00:19.000
at the end of the last lecture. You see the slide with which I stopped last Tuesday, which
00:00:19.000 --> 00:00:27.000
poses the question what happens to the properties of the least squares estimator if some or
00:00:27.000 --> 00:00:35.000
all of the basic assumptions, A1 to A5, which gave us the property, for instance, of the
00:00:35.000 --> 00:00:44.000
least squares estimator being a blue estimator, are not satisfied.
00:00:44.000 --> 00:00:49.000
In order to study this question, I introduced a slightly different representation of the
00:00:49.000 --> 00:00:59.000
least squares estimator than the one we have used so far. So not the matrix expression,
00:00:59.000 --> 00:01:06.000
which I hope you are now used to, x prime x inverse x prime y, or if you want to write
00:01:06.000 --> 00:01:13.000
it in terms of the error terms, beta plus x prime x inverse x prime u, which would
00:01:13.000 --> 00:01:19.000
be the same thing. This we have used many times now in previous lectures, but I showed
00:01:19.000 --> 00:01:24.000
to you, and I hope you have convinced yourself of the truth of my demonstration, that you
00:01:24.000 --> 00:01:32.000
can equally write this least squares estimator. Excuse me. This least squares estimator is
00:01:32.000 --> 00:01:41.000
beta plus a product of two matrices, or actually these two matrices, x prime x inverse and
00:01:41.000 --> 00:01:50.000
x prime u. In this representation where you make the sums explicit, so here we sum vectors
00:01:50.000 --> 00:02:00.000
xi and xi prime, both of them k by 1 or 1 by k, and here xi and ui are multiplied by
00:02:00.000 --> 00:02:06.000
each other and then summed over all n, and this is exactly the same expression as the
00:02:06.000 --> 00:02:13.000
one here, just in different notation. So we do this because we want to make explicit
00:02:13.000 --> 00:02:21.000
what happens when the number of observations n grows and goes to infinity. So now we have
00:02:21.000 --> 00:02:28.000
the explicit notation here, which shows us where n plays a role in these expressions.
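The equality of the two representations just described can be checked numerically. Here is a minimal sketch in Python with NumPy (the lecture itself works with MATLAB later on; the simulated design, seed, and dimensions here are illustrative assumptions, not from the slides):

```python
import numpy as np

# Sketch: verify numerically that the matrix form (X'X)^{-1} X'y equals
# beta + (sum_i x_i x_i')^{-1} (sum_i x_i u_i), i.e. the two representations
# of the least squares estimator discussed above.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])
u = rng.normal(size=n)
y = X @ beta + u

# Matrix representation: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Summation representation: beta + (sum x_i x_i')^{-1} (sum x_i u_i)
Sxx = sum(np.outer(X[i], X[i]) for i in range(n))
Sxu = sum(X[i] * u[i] for i in range(n))
beta_hat_sum = beta + np.linalg.solve(Sxx, Sxu)

print(np.allclose(beta_hat, beta_hat_sum))
```

Both lines evaluate the same algebraic object two ways, so the check should succeed up to floating point error; the factor one over n mentioned next cancels between the two sums.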
00:02:28.000 --> 00:02:33.000
This is the upper limit of these sums here, and then I have also introduced the factor 1 over
00:02:33.000 --> 00:02:39.000
n to get sample means essentially, at least here where we have sample observations. Of
00:02:39.000 --> 00:02:47.000
course, we do not observe the ui, so we cannot have sample observations of this u term here,
00:02:47.000 --> 00:02:52.000
but we do know the xi's. We can then think about the question, what happens when the
00:02:52.000 --> 00:03:01.000
sample size goes to infinity? Now I have already given you the definition of convergence in
00:03:01.000 --> 00:03:06.000
probability. I repeat it here for your convenience, but won't discuss it any longer.
00:03:06.000 --> 00:03:12.000
You know, there is the notion of a random variable converging in probability if the
00:03:12.000 --> 00:03:23.000
probability of the random variable being more than epsilon
00:03:23.000 --> 00:03:30.000
apart from some value x, if this probability goes to zero. Then we say that the plim of
00:03:30.000 --> 00:03:36.000
xn is equal to x. And this we had already, and here it is precisely the same definition
00:03:36.000 --> 00:03:44.000
just written in matrix terms as the one we have already discussed. We also know that
00:03:44.000 --> 00:03:51.000
an estimator is called consistent if it is true that the plim of this estimator is equal
00:03:51.000 --> 00:03:58.000
to x. So this definition is also not new. A random variable xn is a consistent estimator
00:03:58.000 --> 00:04:08.000
for a constant x if and only if the plim of xn is equal to x. So if the random variable
00:04:08.000 --> 00:04:18.000
converges for n going to infinity to the limit x, if convergence is defined in the plim sense,
00:04:18.000 --> 00:04:30.000
meaning that the probability of the estimator being more than epsilon away from x goes to zero. Now obviously xn can
00:04:30.000 --> 00:04:39.000
for instance be an expression, something like that here. So the sum over xi xi prime divided
00:04:39.000 --> 00:04:48.000
by n is just a random variable, a random variable which depends on n. So one example of an estimator
00:04:48.000 --> 00:04:55.000
for some unknown parameter or matrix of parameters in this case could be such an expression here
00:04:55.000 --> 00:05:01.000
and we could denote this just by xn. You will notice that this expression here is precisely
00:05:01.000 --> 00:05:07.000
the same expression as the one we have here where we take the inverse of x prime x. So
00:05:07.000 --> 00:05:16.000
this is actually x prime x, same thing here. This is x prime x divided by n. We also had
00:05:16.000 --> 00:05:23.000
already the law of large numbers which I also repeat here for your convenience which
00:05:23.000 --> 00:05:31.000
says that the sample average of independently and identically distributed random variables
00:05:31.000 --> 00:05:40.000
z converges, so the sample average converges in probability to the expectation of z when
00:05:40.000 --> 00:05:52.000
the number of observations goes to infinity. So we know that the plim of the sample average
00:05:52.000 --> 00:06:00.000
is equal to the expectation of z if the sample average is the average over many, many, many
00:06:00.000 --> 00:06:07.000
independently and identically distributed random variables z. Now as I already mentioned
00:06:07.000 --> 00:06:15.000
in the probability section of this lecture there are weaker forms of the law of large
00:06:15.000 --> 00:06:23.000
numbers which do not necessarily assume that the zi random variables here are independently
00:06:23.000 --> 00:06:28.000
and identically distributed. And actually we will need such a weaker form of the law
00:06:28.000 --> 00:06:37.000
of large numbers. These weaker forms of the law of large numbers allow for certain
00:06:37.000 --> 00:06:46.000
forms of weak dependence between the zi's. And what exactly weak dependence is, is unfortunately
00:06:46.000 --> 00:06:54.000
beyond the scope of this lecture so I will not define this. These are technical conditions
00:06:54.000 --> 00:07:05.000
which imply that the correlation between zi and zi minus n converges to
00:07:05.000 --> 00:07:13.000
zero if n goes to infinity. So the correlation shall be practically zero when the observations
00:07:13.000 --> 00:07:22.000
zi and zi minus n are far apart, sufficiently far apart. So this is some kind of intuition
00:07:22.000 --> 00:07:28.000
for what is meant by weak dependence but the technical conditions are more complicated,
00:07:28.000 --> 00:07:33.000
considerably more complicated actually, and so I will not bother to give them to you in detail. You find
00:07:33.000 --> 00:07:41.000
them in advanced statistics textbooks. Just believe it please that if proper conditions
00:07:41.000 --> 00:07:49.000
for weak dependence are stated then the law of large numbers still holds. So assuming independently
00:07:49.000 --> 00:07:56.000
and identically distributed random variables z is safe, but it is stronger than we actually
00:07:56.000 --> 00:07:59.000
need in order to have the property of the law of large numbers.
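As a quick hedged illustration of the law of large numbers just restated (the exponential distribution, seed, and sample sizes are arbitrary choices for the sketch):

```python
import numpy as np

# Sketch: sample averages of i.i.d. draws approach the expectation.
# Exponential variables with mean 2 are an arbitrary illustrative choice.
rng = np.random.default_rng(42)
for n in (100, 10_000, 1_000_000):
    z = rng.exponential(scale=2.0, size=n)
    print(n, z.mean())  # the average settles near the expectation 2 for large n
```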
00:08:04.000 --> 00:08:10.000
And I'll just state it assuming that weak dependence is at least on an intuitive level
00:08:10.000 --> 00:08:17.000
known to you and as I say it can be technically defined. So a weak law of large numbers would
00:08:17.000 --> 00:08:23.000
state that under a proper condition of weak dependence the sample average z n bar of
00:08:23.000 --> 00:08:30.000
a series of random variables zi with the same expected value converges in probability to the
00:08:30.000 --> 00:08:38.000
expectation of the zi's. So again we would have: the plim of z n bar is equal to the expectation,
00:08:38.000 --> 00:08:46.000
which I here denote by mu z. Note for instance that in this formulation of the weak law of large
00:08:46.000 --> 00:08:54.000
numbers I have not assumed that the zi have identical distribution. I weaken this assumption
00:08:54.000 --> 00:09:00.000
of identical distribution which we had in the standard form of the law of large numbers to the
00:09:01.000 --> 00:09:08.000
condition that the expectations are all the same. So it says here with the same expected value
00:09:08.000 --> 00:09:14.000
for all the zi's but while the expected value can be the same for all the variables obviously
00:09:14.000 --> 00:09:21.000
the variances can for instance be different and I also do not assume anymore that the zi's are
00:09:21.000 --> 00:09:28.000
all independent. Some type of dependence is allowed but the dependence must be some type of weak
00:09:28.000 --> 00:09:33.000
dependence and then we would still have the weak law of large numbers.
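A hedged sketch of what weak dependence looks like, using an AR(1) series as a stand-in (the model and the coefficient 0.8 are illustrative assumptions, not the formal conditions from the statistics literature):

```python
import numpy as np

# Sketch: z_t = 0.8 z_{t-1} + e_t is serially dependent, but the dependence
# between z_t and z_{t-m} dies out geometrically (like 0.8^m), i.e. it is
# weak in the intuitive sense above, and the sample average still converges
# to the common expectation, which is 0 here.
rng = np.random.default_rng(1)
n = 200_000
e = rng.normal(size=n)
z = np.empty(n)
z[0] = e[0]
for t in range(1, n):
    z[t] = 0.8 * z[t - 1] + e[t]
print(z.mean())  # close to 0 despite the serial dependence
```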
00:09:35.000 --> 00:09:43.000
Now you remember assumption a2 which was the strict exogeneity assumption stating that the
00:09:43.000 --> 00:09:52.000
expectation of u given x is equal to the expectation of u. This assumption essentially
00:09:52.000 --> 00:10:07.000
says that all values in the matrix x are in some sense exogenous to those values of u here so that
00:10:07.000 --> 00:10:18.000
actually it is implied that the correlation of u and x is zero even for values of u and x which
00:10:18.000 --> 00:10:23.000
come from different observations or different time periods possibly even from time periods which
00:10:23.000 --> 00:10:30.000
are very far apart or observations which have very little to do with each other we would still
00:10:31.000 --> 00:10:41.000
suppose that the correlation is zero. This is actually more than we need to assume.
00:10:41.000 --> 00:10:51.000
For consistency as a property of the least squares estimator, it suffices to assume something which
00:10:51.000 --> 00:11:00.000
is much weaker than this assumption here and one such assumption is assumption a6.1 which says
00:11:00.000 --> 00:11:10.000
that for each observation i the expectation of ui given xi is equal to the expectation of ui.
00:11:11.000 --> 00:11:19.000
This assumption is, please make this clear to yourself, much weaker than this assumption here,
00:11:20.000 --> 00:11:27.000
because for instance we would not assume that the expectation of ui given some xj
00:11:28.000 --> 00:11:37.000
with j different from i, that this is also the expectation of ui. Rather, it could be that xj
00:11:38.000 --> 00:11:48.000
really contributes information to ui. For instance, I think it is best imagined with time series data:
00:11:49.000 --> 00:12:01.000
if we have ut here, so some disturbance for period t, and here's x t minus one, we would not assume
00:12:01.000 --> 00:12:11.000
with assumption a6.1 that the expectation of ut given x t minus one is equal to zero so equal to
00:12:11.000 --> 00:12:21.000
the expectation of ut, because x t minus one may contain important information about what type of shock
00:12:21.000 --> 00:12:28.000
or error we expect in the next period. So it may well be the case that the regressor actually
00:12:28.000 --> 00:12:37.000
contains information about the next shock. In empirical data, we would not assume that this is
00:12:37.000 --> 00:12:45.000
the case that ut given x t minus one, or x t minus k, any lagged value of x, for instance, is equal to
00:12:45.000 --> 00:12:52.000
the expectation of ut. We do assume this in this assumption here, so this is why this assumption
00:12:52.000 --> 00:13:01.000
here is much, much stronger than this assumption here. The assumption in a6.1 relates just to
00:13:01.000 --> 00:13:07.000
x's and u's of the same observation and does not say anything about the relationship between
00:13:07.000 --> 00:13:16.000
x's and u's of different observations whereas here we have the complete exogeneity of u with respect
00:13:16.000 --> 00:13:27.000
to x, or of x with respect to u. So here all relationships between u's and x's, regardless
00:13:27.000 --> 00:13:32.000
of whether they come from the same observation or from different observations have to satisfy
00:13:32.000 --> 00:13:38.000
this property, and therefore this assumption here is so much stronger than assumption a6.1.
00:13:38.000 --> 00:13:46.000
This is why we call assumption a2 the strict exogeneity assumption. We say of assumption
00:13:46.000 --> 00:13:59.000
a6.1 that the regressors xi are predetermined: xi does not give us any knowledge of ui, because
00:13:59.000 --> 00:14:07.000
xi has already been determined before we knew about ui, before ui was revealed. This is why we
00:14:07.000 --> 00:14:11.000
call xi then predetermined rather than strictly exogenous.
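One way to see the distinction between predetermined and strictly exogenous in numbers is a small simulation; the AR(1) design below is a hypothetical example in the spirit of the time series discussion above, not one from the slides:

```python
import numpy as np

# Sketch: in y_t = 0.5 y_{t-1} + u_t, the regressor x_t = y_{t-1} is
# predetermined: it is fixed before u_t is drawn, so corr(x_t, u_t) = 0.
# It is not strictly exogenous, because x_{t+1} = y_t contains u_t.
rng = np.random.default_rng(7)
n = 100_000
u = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + u[t]
x = y[:-1]     # regressor x_t = y_{t-1}
shock = u[1:]  # contemporaneous shock u_t

print(np.corrcoef(x, shock)[0, 1])           # roughly 0: predetermined
print(np.corrcoef(x[1:], shock[:-1])[0, 1])  # clearly positive: x_{t+1} vs u_t
```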
00:14:15.000 --> 00:14:23.000
now if we have this assumption 6.1 and if we impose like we always actually do the additional
00:14:23.000 --> 00:14:28.000
assumption that the expectation of ui is equal to zero so a1 will actually be an assumption which
00:14:28.000 --> 00:14:34.000
we always make that an error term has a zero expectation which is a completely innocuous
00:14:34.000 --> 00:14:42.000
assumption. If a6.1 holds and we also have a1, then this implies a weaker
00:14:42.000 --> 00:14:50.000
assumption than predeterminedness, which I call assumption a6.2: for each observation i, the
00:14:50.000 --> 00:14:59.000
expectation of xi times ui which is of course equal to the covariance of xi and ui this expectation
00:14:59.000 --> 00:15:10.000
is zero. So assumption a6.1 implies assumption a6.2, which means that assumption a6.2 is weaker
00:15:10.000 --> 00:15:20.000
than assumption a6.1. Conversely, both assumptions a6.1 and a6.2 are of course implied,
00:15:21.000 --> 00:15:29.000
assuming again that a1 holds, by assumption a6.3, which would state that for each observation i, xi
00:15:29.000 --> 00:15:40.000
and ui are independent. Assumption a6.3 is still weaker than assumption a2, right, because here again
00:15:40.000 --> 00:15:47.000
we just have an assumption concerning x's and u's from the same observation i. If these are
00:15:47.000 --> 00:15:55.000
independent, this does not yet say that xi and uj or ui and xj are independent for j different
00:15:55.000 --> 00:16:01.000
from i. So assumption a6.3 is still an assumption which is much weaker than the strict
00:16:01.000 --> 00:16:11.000
exogeneity assumption of a2. Put in the right order, we would say a2 is much stronger than assumption
00:16:11.000 --> 00:16:20.000
a6.3, even though a2 does not exactly imply assumption a6.3, but it makes a lot of other
00:16:20.000 --> 00:16:29.000
implications which assumption a6.3 does not make. And assumption a6.3 implies a6.1 and a6.2,
00:16:30.000 --> 00:16:37.000
and assumption a6.1 implies assumption a6.2. For all of this, always suppose that assumption a1 holds.
00:16:39.000 --> 00:16:45.000
Okay, so these are different sets of assumptions which are used in the econometrics literature
00:16:45.000 --> 00:16:53.000
to prove the properties of particular estimators and we will mostly make use of assumption
00:16:53.000 --> 00:17:01.000
a6.1, noting that assumption a6.1 actually also implies assumption a6.2.
00:17:03.000 --> 00:17:08.000
Very rarely we will also make use of assumption a6.3, the stronger assumption, but mostly this is not
00:17:09.000 --> 00:17:22.000
necessary. In some sense, these assumptions a6.1, a6.2, a6.3, and by contrast a2, make different modeling
00:17:22.000 --> 00:17:30.000
choices about how disconnected the regressors are from the error terms, and as I have said,
00:17:31.000 --> 00:17:39.000
the weakest assumption here is a6.2, so this would allow the greatest degree of dependence
00:17:39.000 --> 00:17:47.000
between xi and ui, since it only stipulates that the covariance between xi and ui is zero.
00:17:47.000 --> 00:17:54.000
But I have given you, in the review of probability, examples of random variables where the covariance
00:17:54.000 --> 00:18:00.000
between two variables was zero but the variables were not independent, right, they were dependent.
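A hedged one-screen reminder of that point, with a standard normal variable and its square as an illustrative choice of distributions:

```python
import numpy as np

# Sketch: zero covariance does not imply independence.  For a standard
# normal z, cov(z, z^2) = E[z^3] = 0, yet z^2 is completely determined by z.
rng = np.random.default_rng(3)
z = rng.normal(size=1_000_000)
w = z ** 2
print(np.cov(z, w)[0, 1])                 # close to 0: uncorrelated
print(w.mean(), w[np.abs(z) > 2].mean())  # conditional mean differs a lot: dependent
```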
00:18:03.000 --> 00:18:08.000
Now, after having stated these assumptions, we may ask: what do we actually need to ensure consistency?
00:18:09.000 --> 00:18:15.000
So what do we need to ensure that the plim of beta hat, the plim of the least squares estimator,
00:18:15.000 --> 00:18:24.000
is equal to beta? Well, this question is now relatively easy to answer, because we know from
00:18:24.000 --> 00:18:28.000
the representation which i already discussed at the beginning of this lecture that beta hat is
00:18:28.000 --> 00:18:34.000
equal to beta plus this matrix product here which is actually x prime x inverse times x prime u
00:18:35.000 --> 00:18:42.000
written just in a different way. So it would suffice, it would be sufficient, not necessary
00:18:42.000 --> 00:18:49.000
but sufficient for consistency to have the property that this thing here converges to
00:18:49.000 --> 00:18:58.000
some finite matrix and this matrix here converges to zero. Obviously the converse would also be true,
00:18:59.000 --> 00:19:05.000
but since this here involves quadratic products, it is not very likely that this goes to zero,
00:19:05.000 --> 00:19:13.000
and quadratic products are typically non-negative, mostly positive, so it is not to be expected that
00:19:13.000 --> 00:19:19.000
this thing here goes to zero. But here are error terms, and error terms have an expectation of zero,
00:19:19.000 --> 00:19:29.000
so there is some chance of this thing asymptotically approaching a value of zero. So if we think of
00:19:29.000 --> 00:19:35.000
this term here as approaching zero then it would suffice to have the property that x prime x
00:19:35.000 --> 00:19:45.000
inverse is a finite matrix. I would still need the assumption of a finite and actually
00:19:45.000 --> 00:19:53.000
invertible matrix because if this matrix here were to converge towards something infinite
00:19:54.000 --> 00:20:02.000
then obviously the consistency is not ensured if this were to converge to something infinite
00:20:02.000 --> 00:20:06.000
then we would have the product of a term which becomes infinite asymptotically
00:20:06.000 --> 00:20:13.000
with a term which becomes zero asymptotically, and it is not clear where this would go,
00:20:14.000 --> 00:20:22.000
where this would converge to. So the sufficient condition is that this matrix here is finite,
00:20:22.000 --> 00:20:28.000
this inverse matrix here is finite, or that the matrix in parentheses is finite and
00:20:29.000 --> 00:20:35.000
invertible, and then with the additional assumption that this matrix here converges to zero
00:20:35.000 --> 00:20:40.000
in the plim sense, we would have established the property of consistency.
00:20:41.000 --> 00:20:47.000
Well, before I go to assumptions a7 and a8, I should perhaps
00:20:47.000 --> 00:20:55.000
explain why it may be reasonable to assume that this matrix here is finite or stays finite when
00:20:55.000 --> 00:21:02.000
the number of observations goes to infinity. First, forget about the factor one over n here;
00:21:02.000 --> 00:21:10.000
then we have here a sum over a product of two vectors, or one vector multiplied
00:21:10.000 --> 00:21:18.000
with its own transpose, so x i x i prime. This vector product here is some type of quadratic
00:21:18.000 --> 00:21:25.000
product, so it usually will have positive entries; each single matrix for component i here
00:21:25.000 --> 00:21:35.000
will have positive entries. So if I sum such observations and let the number of components
00:21:35.000 --> 00:21:43.000
which I add up go to infinity, then I would expect actually that the sum also goes to infinity.
00:21:44.000 --> 00:21:53.000
So for this reason, if we just look at the sum of the x i x i primes here and let n go to infinity,
00:21:53.000 --> 00:21:58.000
we would have all reason to expect that this matrix here converges to something infinite
00:22:01.000 --> 00:22:10.000
This is why we divide by n, why we look at the sample mean. So with the factor one over n
00:22:10.000 --> 00:22:17.000
in front of the sum we know that we always have a sample mean here and the assumption would just
00:22:17.000 --> 00:22:25.000
be that by adding a new observation the sample mean would be approximately still the same would
00:22:25.000 --> 00:22:30.000
not change by much the sample mean would not with increasing observations become greater and greater
00:22:30.000 --> 00:22:37.000
but rather the sample mean would stay somewhere in the neighborhood of, well, the expectation of x i x i prime
00:22:37.000 --> 00:22:44.000
essentially. That is quite a reasonable assumption actually, that the sample mean does not diverge, that
00:22:44.000 --> 00:22:52.000
the sample mean does not converge to infinity but rather goes to some finite number and therefore
00:22:52.000 --> 00:22:59.000
we make this assumption here that the plim of the sample mean of the x i x i primes with n going to
00:22:59.000 --> 00:23:07.000
infinity, so more terms being added to the sum, divided by a greater number n here, so these two
00:23:07.000 --> 00:23:13.000
developments compensate or hopefully cancel each other, so that the plim of the sample mean
00:23:13.000 --> 00:23:20.000
is some matrix which I here call sigma x x, and sigma x x shall be some finite and invertible
00:23:20.000 --> 00:23:28.000
matrix. I need of course the invertibility of the matrix sigma x x, because sigma x x is the
00:23:28.000 --> 00:23:35.000
plim of the sample mean of the term in parentheses here, and I have to take the inverse of this thing, so
00:23:35.000 --> 00:23:41.000
invertibility is something which at least is helpful. I will later give you an example
00:23:41.000 --> 00:23:46.000
where we will not have invertibility of this matrix here; that is actually a fairly relevant
00:23:48.000 --> 00:23:52.000
example which I will give you. But currently we make our lives easier by assuming, in assumption
00:23:52.000 --> 00:24:00.000
a7, that sigma x x is not only finite but also invertible. So if the matrix is finite and invertible,
00:24:00.000 --> 00:24:06.000
then the inverse of the matrix is also finite. Then the second assumption is assumption
00:24:06.000 --> 00:24:14.000
a8, which concerns this term here, which says that the plim of the sample mean of the x i times u i
00:24:14.000 --> 00:24:21.000
is the expectation of x i times u i, and that this expectation is zero.
00:24:23.000 --> 00:24:30.000
That is also quite plausible under certain assumptions, because assumption a8 is actually
00:24:30.000 --> 00:24:38.000
implied by, again, assumption a1, that u has an expected value of zero, and assumption
00:24:38.000 --> 00:24:47.000
a6.2, along with a weak law of large numbers. So assumption a8 is not strictly speaking a new
00:24:47.000 --> 00:24:55.000
assumption, but rather just a summary of assumptions a1 and a6.2 plus the property of a weak law of large
00:24:55.000 --> 00:25:01.000
numbers, assuming that the conditions for the weak type of dependence which we need in the
00:25:01.000 --> 00:25:10.000
weak law of large numbers are given, that this holds. I will come back to this issue again and
00:25:10.000 --> 00:25:15.000
explain to you why we need a weak law of large numbers and why we can't use the standard law of large
00:25:15.000 --> 00:25:21.000
numbers just in a minute but for the time being just note that if the weak law of large numbers
00:25:21.000 --> 00:25:27.000
holds, then assumptions a1 and a6.2 already imply assumption a8. So assumption a8 is actually not
00:25:28.000 --> 00:25:37.000
a new assumption but just another weak one. Then we would have that the plim of x prime u is zero
00:25:37.000 --> 00:25:43.000
if regressors and error terms are contemporaneously uncorrelated, so if it is true that the expectation
00:25:43.000 --> 00:25:49.000
of the product of x i and u i is equal to zero for all i.
00:25:49.000 --> 00:26:00.000
Let me pause here for a minute. Are there any questions on what I have presented?
00:26:07.000 --> 00:26:12.000
Please raise your hand or wave to me if you want to pose a question and don't
00:26:12.000 --> 00:26:16.000
have the time to do it right away or need some time to type it in
00:26:16.000 --> 00:26:26.000
But I don't see any sign here. Okay, I hope that this is clear, and we will move on with some
00:26:26.000 --> 00:26:33.000
remarks on what I have just presented. The first thing I have already said: assumption a2 postulates
00:26:33.000 --> 00:26:39.000
that the complete regressor matrix x is exogenous, strictly exogenous actually, and is therefore
00:26:39.000 --> 00:26:45.000
uncorrelated with the complete u vector. Now these are relationships between the complete set of
00:26:45.000 --> 00:26:52.000
regressors and the complete set of disturbances which we assume in assumption a2, but now we
00:26:52.000 --> 00:26:59.000
replace it by the much weaker requirement of assumption a6.2 that only x i and u i, so only
00:26:59.000 --> 00:27:10.000
the terms of the same observation, are uncorrelated for each i. For j not equal to i, x i and u j may be
00:27:10.000 --> 00:27:16.000
correlated, right? So very clearly a much weaker assumption than assumption a2, and as I have
00:27:16.000 --> 00:27:23.000
already said, this is particularly important for time series data, because shocks in t are very
00:27:23.000 --> 00:27:30.000
likely to be correlated with variables in period t plus one. If I have a shock in period t, like, say,
00:27:31.000 --> 00:27:38.000
the corona pandemic coming up, then it is very possible that my GDP, my investment, my consumption in the
00:27:38.000 --> 00:27:42.000
next period are affected by the shock. So very clearly there's a correlation between the shock
00:27:42.000 --> 00:27:48.000
in period t and the variables in period t plus one, and it would be a strong assumption to assume
00:27:48.000 --> 00:27:54.000
that they are independent of each other, or that the regressors, that GDP, consumption, investment,
00:27:54.000 --> 00:27:59.000
are exogenous and not affected by the coronavirus shock. So this is an assumption which
00:27:59.000 --> 00:28:10.000
one simply cannot justify in this particular context. This is why the concept of unbiased estimators
00:28:10.000 --> 00:28:16.000
is not particularly helpful in the time series context; rather, the most we can hope for when
00:28:16.000 --> 00:28:24.000
we estimate an equation by OLS is consistency. And in order to have consistency, it is important that
00:28:24.000 --> 00:28:33.000
we always check that the shocks in period t do not correlate with the regressors
00:28:33.000 --> 00:28:40.000
of period t, so with xt. And I told you already that even this assumption is not innocuous, and that
00:28:40.000 --> 00:28:48.000
easily there may be settings or specifications of regression equations which would have this
00:28:48.000 --> 00:28:55.000
undesirable property that shocks ut in period t are likely to correlate with
00:28:55.000 --> 00:29:03.000
the regressors in period t. I gave you the example of the Keynesian cross, of the Keynesian multiplier,
00:29:03.000 --> 00:29:12.000
where you saw that the regressor y, income, in a simple Keynesian-type consumption function
00:29:13.000 --> 00:29:19.000
correlated with the disturbance of consumption in each particular period.
00:29:19.000 --> 00:29:24.000
so in this model one would have to resort to different estimation techniques in order to
00:29:24.000 --> 00:29:28.000
estimate this type of consumption function, but we will come to this later when we talk about
00:29:28.000 --> 00:29:39.000
instrumental variables. It is, however, not only the time series context where the correlation between
00:29:39.000 --> 00:29:46.000
shocks and the contemporaneous regressors may pop up; rather, we may also have this in settings
00:29:46.000 --> 00:29:52.000
where we have measurement errors in the explanatory variables. This setting we will also study, and
00:29:52.000 --> 00:30:01.000
see what we can do about that. Now, as I have already explained, we have the consistency of
00:30:01.000 --> 00:30:09.000
the least squares estimator under assumptions a7 and a8, in addition to a6.2, because the plim of
00:30:09.000 --> 00:30:16.000
beta hat would be beta plus the plim of this product here and you know that product terms can be
00:30:17.000 --> 00:30:25.000
separated in the plim. So this would be equal to the plim of this inverse here times the plim
00:30:25.000 --> 00:30:30.000
of this matrix here and we know the plim of the inverse is sigma xx inverse which is a finite
00:30:30.000 --> 00:30:38.000
matrix, times zero, which is the plim of this term here. So this thing cancels and we are just left with
00:30:38.000 --> 00:30:46.000
beta. So we have proven that beta hat is consistent for beta, where you always have to remember:
00:30:46.000 --> 00:30:54.000
consistency does not mean unbiasedness. In finite samples, consistent estimators are often biased,
00:30:55.000 --> 00:30:59.000
but the bias becomes smaller and smaller the more observations we have, the closer we come
00:31:00.000 --> 00:31:06.000
to infinity with the number of observations. Infinity, of course, is a
00:31:06.000 --> 00:31:13.000
very great number. Fortunately, we can often invoke asymptotic properties already for
00:31:13.000 --> 00:31:20.000
rather moderate sample sizes. So, as you know, the t distribution for instance converges to
00:31:20.000 --> 00:31:28.000
the normal distribution safely already for 25 or 30 observations. For estimators, the
00:31:28.000 --> 00:31:34.000
convergence speed is perhaps not quite as fast as for the t distribution but if you have 100 or 200
00:31:34.000 --> 00:31:43.000
observations, you are often already quite close to the asymptotic distribution or
00:31:44.000 --> 00:31:49.000
quite close to the asymptotic estimator, so that you can invoke asymptotics and say: well, while
00:31:49.000 --> 00:31:56.000
there may still be a small bias in my estimate, this bias is likely to be small because I have a
00:31:56.000 --> 00:32:01.000
fairly high number of observations and I know that the estimator is consistent.
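This "biased in small samples but consistent" behaviour can be made visible with a small Monte Carlo sketch (the AR(1) design, sample sizes, and replication count are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Sketch: OLS on y_t = 0.5 y_{t-1} + u_t is biased in short samples
# but consistent, so the average estimate moves toward the true 0.5 as T grows.
rng = np.random.default_rng(0)

def ols_ar1(T, phi=0.5):
    u = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + u[t]
    x, yy = y[:-1], y[1:]
    return (x @ yy) / (x @ x)   # OLS slope, no intercept

for T in (20, 100, 1000):
    est = np.mean([ols_ar1(T) for _ in range(2000)])
    print(T, est)  # the average estimate approaches 0.5 as T grows
```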
00:32:05.000 --> 00:32:11.000
I would encourage you to try out what I have just said by experimenting with artificial data in this
00:32:11.000 --> 00:32:18.000
exercise. So once again, in the way which I have already explained many times: generate in MATLAB or
00:32:18.000 --> 00:32:29.000
any other matrix program a matrix x of, say, three regressors and 100 observations,
00:32:29.000 --> 00:32:38.000
and then compute x prime x in this way here and then add another 100 observations to the matrix
00:32:38.000 --> 00:32:45.000
x. So the first 100 observations you leave unaltered, but you generate another 100 observations,
00:32:45.000 --> 00:32:52.000
and then you compute this term here: one over 200 times the sum over 200 terms xi xi prime.
00:32:52.000 --> 00:33:01.000
Then continue like this many times and observe whether one over n times this sum here
00:33:02.000 --> 00:33:09.000
converges to a finite non-singular matrix. You can just print out the matrices, look at the matrices,
00:33:09.000 --> 00:33:15.000
and see whether there is movement in the matrix in the sense that there are bigger changes or
00:33:15.000 --> 00:33:24.000
constant growth in the components of the matrix as you add further observations or if the matrix
00:33:24.000 --> 00:33:30.000
actually converges to something which becomes more or less constant over time so that we would
00:33:30.000 --> 00:33:35.000
expect to have a finite limiting matrix and then you can also check whether this limiting matrix
00:33:35.000 --> 00:33:38.000
is non-singular, and I bet actually you would find that it is.
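For anyone working in Python instead of MATLAB, a hedged sketch of this exercise (standard normal regressors are an illustrative choice, so the limit matrix here would be the identity, E[x_i x_i'] = I):

```python
import numpy as np

# Sketch of the exercise: grow the sample in blocks of 100 and watch
# (1/n) X'X settle down to a finite, non-singular matrix.
rng = np.random.default_rng(0)
X = np.empty((0, 3))
for block in range(5):
    X = np.vstack([X, rng.normal(size=(100, 3))])  # append 100 new observations
    n = X.shape[0]
    M = X.T @ X / n
    print(n, np.linalg.det(M))  # the determinant stays away from 0
    # print(np.round(M, 2))     # uncomment to inspect the matrix itself
```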
00:33:38.000 --> 00:33:46.000
Oh, there was a question or comment coming in; let me just take that.
00:33:48.000 --> 00:33:54.000
Where does the zero come from, why do we multiply the matrix sum over x x by zero? Well,
00:33:54.000 --> 00:34:01.000
that I have explained at least two times, if not three, but let me explain it again. This zero here
00:34:01.000 --> 00:34:09.000
comes from the plim of this term, right, and we have assumed in assumption A8, which I invoke
00:34:09.000 --> 00:34:17.000
here, right, in assumption A8 we have assumed that this term here, which was the second term in the
00:34:17.000 --> 00:34:25.000
parentheses on the slide where we started the explanation, has a plim of zero. So this is
00:34:25.000 --> 00:34:29.000
where the zero comes from. I hope this answers your question.
00:34:31.000 --> 00:34:41.000
okay now a really important note on the non-singularity property which you may check here
00:34:41.000 --> 00:34:49.000
and which i have assumed here where i said the matrix is invertible this matrix sigma x x
00:34:49.000 --> 00:34:59.000
we now study a case where the sigma x x matrix is not invertible and that is a fairly common
00:34:59.000 --> 00:35:06.000
case suppose we have a regressor matrix which contains a linear trend so suppose the regressor
00:35:06.000 --> 00:35:15.000
matrix X has as its first column, say, a vector of ones, the usual constant, and as, say, its last column
00:35:15.000 --> 00:35:20.000
(it doesn't depend on being the last column, but suppose it is like this)
00:35:20.000 --> 00:35:27.000
a linear trend: here we have a vector one, two, three, four, five, six and so forth up to n. That is
00:35:27.000 --> 00:35:32.000
a very common setting actually that you estimate something like this if you have trending data
00:35:32.000 --> 00:35:41.000
that you include a linear trend. Now observe what happens to X prime X. X prime X, as we know, is this
00:35:41.000 --> 00:35:50.000
sum here, the sum over x i x i prime, and this sum over x i x i prime now contains an entry where
00:35:50.000 --> 00:36:01.000
one column is multiplied by another, with the factor one over n in front. Sorry,
00:36:01.000 --> 00:36:11.000
let me get this right: this is just one entry which we have there,
00:36:18.000 --> 00:36:26.000
the entry where the vector product of this vector of ones
00:36:26.000 --> 00:36:36.000
with the vector containing the linear trend comes into play. And summing all these trend components
00:36:36.000 --> 00:36:45.000
here gives us the sum over i, with i going from one to n, and we know this sum to be n times n plus one
00:36:45.000 --> 00:36:56.000
over two, so that we know that one over n times X prime X actually contains a term which is n plus one over
00:36:56.000 --> 00:37:08.000
two, because if I take one over n times X prime X here, then I have n times n plus one over two, divided by n, so
00:37:08.000 --> 00:37:20.000
this n here cancels against that n, and we are left with n plus one over two. Okay, so we have
00:37:20.000 --> 00:37:29.000
at least one element in the one over n X prime X matrix here which is not finite, and therefore the whole matrix
00:37:29.000 --> 00:37:39.000
sigma x x is not finite. I should note, actually, that I was a little surprised myself
00:37:39.000 --> 00:37:44.000
when I saw my formula here just a minute ago. I should perhaps explain again: why does this element
00:37:44.000 --> 00:37:50.000
come into existence in this matrix product well think of it in terms of the x prime x
00:37:50.000 --> 00:37:55.000
rather than in terms of the sum even though the sum is the same thing but it's easier to see with
00:37:55.000 --> 00:38:04.000
the x prime x when we look at the x prime x product here then x prime is actually this matrix
00:38:04.000 --> 00:38:10.000
transposed so the first column here is being transposed and will be a vector of ones in the
00:38:10.000 --> 00:38:19.000
first row, and this vector of ones in the first row will at some point be multiplied with the last
00:38:19.000 --> 00:38:27.000
column of the matrix x so with this column here and this particular entry i have picked out here
00:38:27.000 --> 00:38:35.000
so the first row of the x prime matrix multiplied by the last column of the x matrix gives us the
00:38:35.000 --> 00:38:42.000
sum which is n times n plus one over two and if i divide now x prime x by n if i take the sample
00:38:42.000 --> 00:38:49.000
mean then i have that the n cancels then i have n plus one over two and this goes to infinity
00:38:50.000 --> 00:38:58.000
so sigma x x is not finite and therefore we are confronted with the question and with the problem
00:38:58.000 --> 00:39:04.000
do we actually have the consistency property of the least squares estimator if our regression
00:39:04.000 --> 00:39:16.000
model contains a linear trend? You see, assumption A7 is not satisfied in a model in
00:39:04.000 --> 00:39:16.000
which you include a linear trend in your regressor matrix, because the matrix is not
00:39:16.000 --> 00:39:21.000
invertible, sorry, the sigma x x matrix will not be, excuse me, it will not
00:39:21.000 --> 00:39:29.000
be finite. Invertibility is a different issue, but sigma x x will not be finite.
00:39:36.000 --> 00:39:41.000
So we do not know anymore, if the regressor matrix has a linear
00:39:41.000 --> 00:39:50.000
trend, whether this here is true: that the product of sigma x x inverse times zero is equal to zero,
00:39:50.000 --> 00:40:01.000
because if this matrix sigma x x is not finite, then we may have concerns that the product of
00:40:01.000 --> 00:40:12.000
sigma x x inverse times zero is infinite, or is not zero. Okay, fortunately
00:40:12.000 --> 00:40:18.000
these concerns are ungrounded; there is no substance to them,
00:40:21.000 --> 00:40:24.000
for the following reason i thought i had a slide on this but i will explain it
00:40:26.000 --> 00:40:32.000
the reason is that when we have this setting here with a linear trend in the regressor matrix
00:40:32.000 --> 00:40:39.000
then as we know in sigma x x we have a component n plus one over two which goes to infinity
00:40:41.000 --> 00:40:49.000
So sigma x x goes to infinity. This has the implication that, in this particular component
00:40:49.000 --> 00:40:54.000
and other components affected by it, sigma x x inverse actually goes to zero,
00:40:55.000 --> 00:41:05.000
right, because we have here sigma x x inverse, not sigma x x. If sigma x x has an entry which goes to
00:41:06.000 --> 00:41:14.000
infinity, as this term n plus one over two does, then this actually increases the speed of
00:41:14.000 --> 00:41:21.000
convergence by which this term here goes to zero because sigma x x inverse will then be closer to
00:41:21.000 --> 00:41:29.000
zero, since by intuition sigma x x inverse is something like one over sigma x x. So we divide,
00:41:30.000 --> 00:41:38.000
very loosely formulated, by the entries of sigma x x, and if one of those entries
00:41:38.000 --> 00:41:44.000
is infinite then the corresponding entries of sigma x x inverse will be close to zero
00:41:44.000 --> 00:41:51.000
so they will not be positive numbers but they will be zero or be close to zero so that this
00:41:51.000 --> 00:41:59.000
product will be zero, or approach zero, even faster than when this matrix here converges
00:41:59.000 --> 00:42:07.000
to a finite matrix with positive or negative, but non-zero, entries. So in advanced
00:42:07.000 --> 00:42:15.000
econometrics it is actually possible to show that this assumption here with a finite sigma x x matrix
00:42:15.000 --> 00:42:22.000
induces a certain speed of convergence towards the true value beta and the speed of convergence is
00:42:22.000 --> 00:42:29.000
the square root of t. But if we have a linear trend in the regressor matrix, then the speed of
00:42:29.000 --> 00:42:35.000
convergence is even faster: the speed of convergence is t, and not only the square root
00:42:35.000 --> 00:42:43.000
of t. So the consistency is stronger, not weaker, than before, so that our concerns are actually
00:42:43.000 --> 00:42:49.000
groundless we do not need to bother about the linear trend in the data matrix but we should be
00:42:49.000 --> 00:42:58.000
aware that certain regressors may actually cause problems which may give rise to violations of
00:42:58.000 --> 00:43:06.000
assumption A7, this assumption here that this is a finite, invertible matrix. And by just reversing
00:43:06.000 --> 00:43:13.000
the argument which I just gave, any regressor matrix would cause
00:43:13.000 --> 00:43:21.000
trouble which would have the property that one of the columns here added up to zero, right? So if you
00:43:21.000 --> 00:43:30.000
had a column of one, minus one, one, minus one, one, minus one, then multiplying the first column here
00:43:30.000 --> 00:43:37.000
with this column of one, minus one, one, minus one, one, minus one, you would have something which
00:43:37.000 --> 00:43:46.000
comes close to zero, or is actually zero, at every second observation. And if this shrinks
00:43:46.000 --> 00:43:53.000
at a certain speed, you would actually have the property that an entry becomes zero in the matrix,
00:43:53.000 --> 00:44:00.000
and then you could have trouble with the property that the sigma x x inverse matrix is finite,
00:44:00.000 --> 00:44:06.000
and then consistency may not be given. So the assumption is not as innocuous
00:44:06.000 --> 00:44:14.000
as you possibly may think it is. Well, the second thing to note I have already mentioned,
00:44:14.000 --> 00:44:21.000
so I won't spend much time on that: x i and u i may easily be correlated, so it is not an
00:44:21.000 --> 00:44:29.000
assumption which can be made without thinking thoroughly. The example of the Keynesian cross
00:44:29.000 --> 00:44:35.000
model which I already mentioned showed to you that we had a correlation between income, the regressor,
00:44:35.000 --> 00:44:42.000
and the disturbance u i, of sigma squared u over one minus c, so this was different from zero.
00:44:42.000 --> 00:44:48.000
So apparently it is very well possible that this expectation here is non-zero, and then consistency
00:44:48.000 --> 00:44:55.000
is not just in doubt; rather, it is fairly clear that the estimator is not consistent. So not only
00:44:55.000 --> 00:44:59.000
do we not know if it is consistent, but we do know that it is not consistent, because the plim
00:44:59.000 --> 00:45:06.000
will be beta plus something non-zero. Well, if you add something non-zero to beta, then it is
00:45:06.000 --> 00:45:11.000
different from beta, and therefore consistency is violated: the estimator would not be consistent.
00:45:12.000 --> 00:45:19.000
any questions before i move to asymptotic normality
00:45:25.000 --> 00:45:34.000
I don't see any, so let's continue and go to asymptotic normality. As I said, we have not made
00:45:34.000 --> 00:45:41.000
assumption A5, so we have not assumed the disturbances to be normally distributed. We
00:45:41.000 --> 00:45:49.000
know that we do not need this assumption for the BLUE property of the least squares estimator; A1 to
00:45:49.000 --> 00:45:54.000
A4 would be sufficient to prove the BLUE property of the least squares estimator, but we
00:45:54.000 --> 00:45:59.000
do know that we need the normality assumption in order to construct confidence intervals and do
00:45:59.000 --> 00:46:08.000
hypothesis testing in models which otherwise would not need the normality assumption now people tend
00:46:08.000 --> 00:46:16.000
to assume that disturbances are normally distributed just for convenience but actually
00:46:16.000 --> 00:46:23.000
in empirical work you will often find that disturbances are not normal, so it is important
00:46:23.000 --> 00:46:29.000
to think about what happens to our methods of inference when the disturbances are not
00:46:29.000 --> 00:46:37.000
normal and here i will teach you one method which may apply if the number of observations
00:46:37.000 --> 00:46:45.000
is sufficiently large, namely that you invoke asymptotic arguments again. In order to do this, we
00:46:45.000 --> 00:46:53.000
first introduce a different convergence concept for random variables. We have so far had two
00:46:53.000 --> 00:46:58.000
convergence concepts for random variables, namely the plim convergence, which we just used for the
00:46:58.000 --> 00:47:06.000
consistency property, and we also had convergence in the l.i.m. sense, that the squared
00:47:07.000 --> 00:47:14.000
deviations from the expected value converge to zero. But now we have a third concept here, which
00:47:14.000 --> 00:47:21.000
says that a series of random variables xn converges in distribution to a random variable
00:47:21.000 --> 00:47:27.000
so that's different from plim and l.i.m. convergence: a series of random variables
00:47:27.000 --> 00:47:33.000
converges in distribution to a random variable x if the limit of the distribution function
00:47:34.000 --> 00:47:44.000
of n observations, F n of x, is equal to F of x for every point x of the distribution function, so
00:47:44.000 --> 00:47:52.000
for every point x where F is continuous. Right, F n of x is typically not continuous; it is a step
00:47:52.000 --> 00:47:57.000
function, because we have n observations, so we have n steps of the distribution function, of the
00:47:57.000 --> 00:48:04.000
empirical distribution function but when the number of observations increases then these steps
00:48:04.000 --> 00:48:10.000
become smaller and smaller, and when the number of observations goes to infinity, then the function
00:48:10.000 --> 00:48:17.000
may become continuous and what we want to have as a property is that the limit of this distribution
00:48:17.000 --> 00:48:25.000
function of the empirical distribution function for n given observations is some function f of x
00:48:26.000 --> 00:48:29.000
at every point where f of x is continuous
00:48:32.000 --> 00:48:39.000
A shorthand notation for that is that we just write: xn converges in distribution
00:48:39.000 --> 00:48:47.000
to x so eventually the random variable xn has the same cumulative distribution function as
00:48:47.000 --> 00:48:54.000
some random variable x and then we may say that this random variable xn has converged to
00:48:54.000 --> 00:49:01.000
x as the number of observations approaches infinity. What I cannot prove to you, but what
00:49:01.000 --> 00:49:08.000
you have to believe me is that the plim convergence implies the convergence in distribution
00:49:09.000 --> 00:49:15.000
so convergence in distribution is a weaker concept than plim convergence. The reverse is
00:49:15.000 --> 00:49:20.000
not true, so the two are not equivalent; we would not need to define convergence in distribution
00:49:20.000 --> 00:49:24.000
if they were equivalent, then this would be the same thing. Convergence in distribution is truly
00:49:24.000 --> 00:49:29.000
a weaker convergence concept for random variables than plim convergence.
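A small illustration of why the reverse implication fails (an example of my own, not from the lecture): take X standard normal and set X_n = minus X for every n. Each X_n has exactly the same distribution as X, so X_n converges in distribution to X trivially; yet X_n minus X equals minus two X, which never settles down, so X_n does not converge in probability to X:

```python
import random
import statistics

random.seed(2)
draws = [random.gauss(0, 1) for _ in range(100_000)]  # draws of X
xn = [-z for z in draws]                              # X_n = -X, same distribution as X

# Same distribution: means near 0 and variances near 1 for both X and X_n.
print(round(statistics.mean(draws), 2), round(statistics.pvariance(draws), 2))
print(round(statistics.mean(xn), 2), round(statistics.pvariance(xn), 2))

# But X_n - X = -2X does not concentrate around zero, so there is no plim
# convergence of X_n to X: the share of draws with |X_n - X| > 1 stays large.
share_far = sum(abs(a - b) > 1 for a, b in zip(xn, draws)) / len(draws)
print(round(share_far, 2))
```

So matching distribution functions in the limit says nothing about the two random variables being close to each other, which is exactly why convergence in distribution is the weaker concept.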
00:49:32.000 --> 00:49:38.000
now let's assume that assumption a5 the normality of the disturbances is not
00:49:39.000 --> 00:49:45.000
satisfied i had introduced you already to the central limit theorem which again i repeat here
00:49:45.000 --> 00:49:52.000
for your convenience you know if a random variable epsilon i is independently and identically
00:49:52.000 --> 00:50:00.000
distributed with mean zero and variance sigma square for all i's then we know that one over
00:50:00.000 --> 00:50:09.000
the square root of n times the sum of all the epsilon i's converges in distribution to a
00:50:09.000 --> 00:50:16.000
normal distribution, so a normal distribution with expected value zero and variance
00:50:16.000 --> 00:50:24.000
sigma squared. This is, as I have already pointed out when I discussed the central limit theorem earlier,
00:50:24.000 --> 00:50:30.000
a very important theorem, because we have made no distribution assumptions on the epsilon i's:
00:50:30.000 --> 00:50:36.000
they may have just any distribution, and nevertheless one over the square root of n times
00:50:36.000 --> 00:50:42.000
the sum of those epsilon i's converges to a normal distribution. So we generate the normal
00:50:42.000 --> 00:50:47.000
distribution basically out of nothing, without making any distribution assumptions at all. We
00:50:47.000 --> 00:50:53.000
may actually assume very weird distributions for the epsilon i's; then typically it would take
00:50:53.000 --> 00:50:59.000
longer to have the convergence result, so we need a greater number for n to have satisfactory
00:50:59.000 --> 00:51:08.000
convergence, but eventually, for any distribution, however weird it may be, we would find that this
00:51:08.000 --> 00:51:14.000
expression here, the sum over the epsilon i's divided by the square root of n, converges
00:51:14.000 --> 00:51:19.000
to a normal distribution. Understand that: not a standard normal distribution, but a normal
00:51:19.000 --> 00:51:27.000
distribution with variance sigma squared and expected value zero. Now I'll give you an exercise
00:51:27.000 --> 00:51:34.000
here if you would like to have at least a partial proof of the central limit theorem so
00:51:34.000 --> 00:51:42.000
try to prove that the expectation of this term which converges to a normal distribution
00:51:42.000 --> 00:51:48.000
is equal to zero and that the variance of this term converges to sigma square right so this is
00:51:48.000 --> 00:51:53.000
a and b these are the easy parts of the central limit theorem you would have proven that this is
00:51:53.000 --> 00:51:59.000
the limit for the expectation, this is the limit for the variance. What you cannot prove so easily
00:51:59.000 --> 00:52:04.000
is that this is then actually a normal distribution, so that you have to believe me,
00:52:06.000 --> 00:52:08.000
but you can do the other parts of the proof.
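As a numerical sanity check on parts (a) and (b), not a proof: by independence, E[(1/sqrt(n)) sum epsilon_i] = (1/sqrt(n)) sum E[epsilon_i] = 0, and Var[(1/sqrt(n)) sum epsilon_i] = (1/n) sum Var(epsilon_i) = sigma squared. The sketch below (my own, with a deliberately non-normal distribution: uniform on (-1, 1), whose variance is 1/3) simulates many draws of the scaled sum and checks both moments:

```python
import math
import random
import statistics

random.seed(3)
n, reps = 400, 5000
sigma2 = 1.0 / 3.0  # variance of Uniform(-1, 1)

def scaled_sum():
    """One draw of (1/sqrt(n)) * sum of n i.i.d. Uniform(-1, 1) disturbances."""
    return sum(random.uniform(-1, 1) for _ in range(n)) / math.sqrt(n)

draws = [scaled_sum() for _ in range(reps)]

# Part (a): the expectation of the scaled sum is zero.
# Part (b): its variance equals sigma squared, here 1/3.
print(round(statistics.mean(draws), 3))
print(round(statistics.pvariance(draws), 3))
```

The sample mean comes out near zero and the sample variance near 1/3, as the exercise predicts; what the simulation cannot establish, of course, is the normality of the limiting distribution itself.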
00:52:10.000 --> 00:52:16.000
now how does this help us well let us look at the distribution of the least squares estimator
00:52:17.000 --> 00:52:25.000
we know the least squares estimator has an expected value of beta or has asymptotically
00:52:25.000 --> 00:52:32.000
an expected value of beta i should actually write here asymptotically this is not good
00:52:32.000 --> 00:52:38.000
because this reads like it has always an expected value of beta but that's only true in the limit
00:52:38.000 --> 00:52:46.000
of course, if we are not assuming A1 to A4. So I would change this here and write a little
00:52:46.000 --> 00:52:55.000
"asymptotic" underneath this tilde: asymptotically, beta hat has an expectation of
00:52:55.000 --> 00:53:03.000
beta, and we know that the covariance matrix of beta hat is equal to sigma squared u times x prime x inverse.
00:53:03.000 --> 00:53:12.000
so therefore beta hat minus beta is asymptotically distributed as a random variable with expectation
00:53:12.000 --> 00:53:20.000
zero and the same covariance matrix but we know this covariance matrix here involves our term
00:53:20.000 --> 00:53:31.000
x prime x inverse and we can then write one over n x prime x inverse and divide by one over n here
00:53:31.000 --> 00:53:37.000
multiply by one over n here, so that the two n factors cancel each other, right? Because this one
00:53:37.000 --> 00:53:45.000
over n is taken to the inverse, so it's actually n, and then, divided by n, it cancels: the two n's cancel.
00:53:45.000 --> 00:53:52.000
so it's the same thing as here but now we know that asymptotically of course this thing here
00:53:52.000 --> 00:54:02.000
converges to sigma x x inverse which is finite and since one over n is going to zero we would
00:54:02.000 --> 00:54:08.000
have the product of something which is zero asymptotically and something which is finite
00:54:08.000 --> 00:54:16.000
asymptotically so we would know that the whole term here actually converges to zero
00:54:16.000 --> 00:54:25.000
so the asymptotic variance of beta hat minus beta is zero and this means that there is no
00:54:25.000 --> 00:54:30.000
distribution asymptotically right the distribution shrinks and shrinks and shrinks in the sense that
00:54:30.000 --> 00:54:36.000
the variance becomes smaller and smaller and smaller and asymptotically the distribution
00:54:36.000 --> 00:54:43.000
is just a single point this is what i mean by degenerate distribution the distribution would
00:54:43.000 --> 00:54:52.000
just be a single point, zero. Okay, is this a welcome result? Not really. I mean, it tells us something
00:54:52.000 --> 00:54:58.000
about the speed of convergence by which beta hat converges to beta and it is fine that we have this
00:54:58.000 --> 00:55:06.000
convergence, but we will never find a distribution of beta hat which has shrunk
00:55:06.000 --> 00:55:10.000
to a single point because we never will have infinitely many observations
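The shrinking distribution can also be seen numerically. Here is a sketch of my own (a simple no-intercept regression with x_i standard normal, sigma u equal to one, and an illustrative true slope): the sampling variance of beta hat minus beta falls roughly like one over n, so the distribution contracts toward the single point zero:

```python
import random
import statistics

random.seed(4)
beta = 1.5  # illustrative true slope (not from the lecture)

def slope_error(n):
    """One draw of beta_hat - beta for a no-intercept regression y = beta*x + u."""
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + random.gauss(0, 1) for xi in x]
    bhat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return bhat - beta

# The spread of beta_hat - beta shrinks as n grows: its variance behaves
# roughly like 1/n, which is the degenerate limit described above.
variances = {}
for n in (25, 250, 2500):
    errs = [slope_error(n) for _ in range(2000)]
    variances[n] = statistics.pvariance(errs)
    print(n, round(variances[n], 4))
```

At each tenfold increase in n, the simulated variance drops by roughly a factor of ten, illustrating why, for any finite sample, beta hat minus beta still has a genuine distribution even though the limiting one is a point mass.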
00:55:11.000 --> 00:55:19.000
and so in that sense the result is not really useful we know that we will always have some
00:55:19.000 --> 00:55:27.000
distribution of this random variable beta hat minus beta here, and we would like to be able
00:55:27.000 --> 00:55:36.000
to carry out significance tests or any other types of tests so we would like to