WEBVTT - autoGenerated
00:00:30.000 --> 00:00:40.000
Welcome, everybody, to today's lecture on estimation and inference.
00:00:40.000 --> 00:00:46.000
We are about to start, as you see, with hypothesis testing.
00:00:46.000 --> 00:00:51.000
Much of this will be known to you already from the review of statistics that we have
00:00:51.000 --> 00:00:52.000
gone through.
00:00:52.000 --> 00:01:00.000
So today's lecture is more of a review than earlier lectures perhaps were.
00:01:00.000 --> 00:01:04.000
But before I start with that, let me ask you if there are questions.
00:01:04.000 --> 00:01:07.000
Yes, I see a question.
00:01:07.000 --> 00:01:19.000
Please write it down.
00:01:19.000 --> 00:01:35.000
Okay, that's an acute A. Use the chat, please.
00:01:35.000 --> 00:01:43.000
On slide 61, in the proof, why can we conclude that all other eigenvalues must be
00:01:43.000 --> 00:01:45.000
zero?
00:01:45.000 --> 00:01:49.000
I'm afraid you may have typed a little too fast.
00:01:49.000 --> 00:01:59.000
Perhaps you can retype your question because it's not quite clear to me what you want to
00:01:59.000 --> 00:02:08.000
ask me.
00:02:08.000 --> 00:02:11.000
But let me go to page 61 already.
00:02:11.000 --> 00:02:23.000
Perhaps I do sense after reading it for the second time what you are about to ask.
00:02:23.000 --> 00:02:30.000
So this is the proof I explained yesterday.
00:02:30.000 --> 00:02:36.000
And I said that A has as many eigenvalues equal to 1 as there are linearly independent
00:02:36.000 --> 00:02:40.000
columns of A. So this, I think, is clear.
00:02:40.000 --> 00:02:43.000
And now I think I do understand the question.
00:02:43.000 --> 00:02:47.000
The question is, why can we say that the other eigenvalues are zero?
00:02:47.000 --> 00:02:55.000
Well, the reason is the number of zero eigenvalues is always equal to the rank deficiency of
00:02:55.000 --> 00:02:57.000
a matrix.
00:02:57.000 --> 00:03:05.000
So for instance, if a matrix which is 4 by 4 has rank 3, then we know that there is one
00:03:05.000 --> 00:03:06.000
zero eigenvalue.
00:03:06.000 --> 00:03:13.000
If a matrix which is 4 by 4 has rank 2, then we know there are two zero eigenvalues.
00:03:13.000 --> 00:03:20.000
So always the dimension of the matrix minus the number of linearly independent columns
00:03:20.000 --> 00:03:23.000
in this matrix; whether we count columns or rows plays no role.
00:03:23.000 --> 00:03:29.000
If we talk about a square matrix, then the number of linearly independent columns
00:03:29.000 --> 00:03:35.000
subtracted from the dimension of the matrix gives us exactly the number of zero eigenvalues
00:03:35.000 --> 00:03:36.000
for this matrix.
00:03:36.000 --> 00:03:43.000
And this is why we can conclude that all the other eigenvalues are zero.
00:03:43.000 --> 00:03:49.000
So in the proof, after we derived that there are eigenvalues with value 1, why can we conclude
00:03:49.000 --> 00:03:52.000
that the other eigenvalues must be zero?
00:03:52.000 --> 00:03:53.000
You mean non-zero?
00:03:53.000 --> 00:03:56.000
No, I mean that the other eigenvalues must be zero.
00:03:56.000 --> 00:04:01.000
I think, actually, I have explained it now.
00:04:01.000 --> 00:04:04.000
The other eigenvalues must be zero.
00:04:04.000 --> 00:04:07.000
OK, so that's the answer I have.
00:04:07.000 --> 00:04:08.000
All right.
00:04:08.000 --> 00:04:10.000
Thank you very much.
00:04:10.000 --> 00:04:11.000
Good.
00:04:11.000 --> 00:04:12.000
Any other questions?
00:04:12.000 --> 00:04:24.000
I do not see any other questions.
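Before moving on, the rank and eigenvalue fact from that question can be checked numerically. The following is my own small Python sketch, not part of the lecture materials, for the case relevant to the proof: a symmetric idempotent projection matrix P = X(X'X)⁻¹X'. With n = 5 rows and k = 3 linearly independent columns, P should have exactly three eigenvalues equal to 1 and 5 − 3 = 2 eigenvalues equal to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # 5x3, full column rank almost surely
P = X @ np.linalg.inv(X.T @ X) @ X.T       # symmetric idempotent projection matrix
eig = np.linalg.eigvalsh(P)                # real eigenvalues of the symmetric matrix

ones = int(np.sum(np.isclose(eig, 1.0)))   # eigenvalues equal to 1
zeros = int(np.sum(np.isclose(eig, 0.0)))  # eigenvalues equal to 0: the rank deficiency
print(ones, zeros)                         # 3 and 2
```

The count of zero eigenvalues equals the dimension of the matrix minus its rank, exactly as stated in the answer above.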
00:04:24.000 --> 00:04:29.000
So then let's start with hypothesis testing.
00:04:29.000 --> 00:04:32.000
We will make a new assumption now.
00:04:32.000 --> 00:04:38.000
So far we have just had the four Gauss Markov assumptions.
00:04:38.000 --> 00:04:46.000
The one assumption was a really strong assumption, namely the assumption that the X matrix is
00:04:46.000 --> 00:04:58.000
strictly exogenous, so that the expectation of U of the shocks given X is equal to the
00:04:58.000 --> 00:05:04.000
unconditional expectation of U. This was assumption A2, strict exogeneity.
00:05:04.000 --> 00:05:08.000
We needed this assumption to establish the BLUE property.
00:05:08.000 --> 00:05:14.000
We will today actually relax this assumption and make much weaker assumptions than assumption
00:05:14.000 --> 00:05:21.000
A2 at the price of not being able to prove anymore that the least squares estimator is
00:05:21.000 --> 00:05:24.000
unbiased.
00:05:24.000 --> 00:05:30.000
But we will then replace the property of an unbiased estimator by a weaker requirement,
00:05:31.000 --> 00:05:34.000
namely the requirement of being consistent.
00:05:34.000 --> 00:05:40.000
So we would find a consistent estimator.
00:05:40.000 --> 00:05:44.000
Anyway, we have had those four assumptions, the four Gauss Markov assumptions.
00:05:44.000 --> 00:05:51.000
And they allowed us to conclude that the least squares estimator is BLUE, the best linear
00:05:51.000 --> 00:05:59.000
unbiased estimator or the best unbiased estimator in the class of linear estimators.
00:05:59.000 --> 00:06:09.000
And these four assumptions have not included any assumption about the distribution of the
00:06:09.000 --> 00:06:10.000
shocks.
00:06:10.000 --> 00:06:18.000
So the errors were just errors; we did not make any more assumptions about the errors
00:06:18.000 --> 00:06:25.000
than just the fact that they have an expectation of zero and that they are uncorrelated with
00:06:25.000 --> 00:06:26.000
each other.
00:06:26.000 --> 00:06:31.000
We did not assume what kind of distribution they follow.
00:06:31.000 --> 00:06:38.000
This is very important actually that you understand this well, because this is a very strong
00:06:38.000 --> 00:06:47.000
property, positive property of the least squares estimator, that it does not rely on any assumption
00:06:47.000 --> 00:06:49.000
about the distribution of the errors.
00:06:49.000 --> 00:06:54.000
So we have in particular not assumed that the errors are normally distributed or follow
00:06:54.000 --> 00:07:01.000
any other type of distribution, rather the result that the least squares estimator is
00:07:01.000 --> 00:07:09.000
BLUE under assumptions A1 to A4 holds for any given distribution of the errors, regardless of
00:07:09.000 --> 00:07:11.000
what the distribution may be.
00:07:11.000 --> 00:07:20.000
So this makes the least squares estimator applicable in many, many different settings.
00:07:20.000 --> 00:07:27.000
And actually, one could say that the least squares estimator is a non-parametric estimator,
00:07:27.000 --> 00:07:35.000
because the term non-parametric is often used when estimators are established without making
00:07:35.000 --> 00:07:37.000
assumptions about the distribution of the error.
00:07:37.000 --> 00:07:48.000
So in that sense, the OLS estimator here, the least squares estimator, is a non-parametric
00:07:48.000 --> 00:07:49.000
estimator.
00:07:49.000 --> 00:07:53.000
It does not make any distributional assumptions.
00:07:53.000 --> 00:08:00.000
It is very remarkable that we can establish such a result without assuming that the errors
00:08:00.000 --> 00:08:06.000
follow a particular distribution, because we could think of very, very strange distributions
00:08:06.000 --> 00:08:14.000
of the error, and still the least squares estimator would be the best linear unbiased estimator.
00:08:14.000 --> 00:08:19.000
So we can use it with any type of error distribution we may think of.
00:08:19.000 --> 00:08:22.000
So that's very, very convenient.
00:08:22.000 --> 00:08:27.000
Anyway, now we make an assumption about the distribution of the errors.
00:08:27.000 --> 00:08:37.000
So assumption A5 says that the shocks u are normally distributed.
00:08:37.000 --> 00:08:43.000
To be more specific, I assume that all the ui, so all the components of the vector u,
00:08:43.000 --> 00:08:49.000
are independent of each other, and that they are normally distributed, so that we can write
00:08:49.000 --> 00:08:56.000
this either as for a single ui, so for just a scalar component, ui follows a normal distribution
00:08:56.000 --> 00:09:02.000
with expectation 0, so that's a scalar, and variance sigma square u.
00:09:02.000 --> 00:09:08.000
And this shall hold for all the i's, or I can write it in vector notation here, u, the
00:09:08.000 --> 00:09:17.000
whole vector of all the ui's is distributed as a normal distribution with an expected
00:09:17.000 --> 00:09:24.000
value of 0, where this 0 is now a vector of 0's, now conformable of course with u, and
00:09:24.000 --> 00:09:32.000
the covariance matrix is just sigma square u times the identity matrix.
00:09:32.000 --> 00:09:39.000
Now as we have already discussed in the review of statistics, a normal distribution has
00:09:39.000 --> 00:09:45.000
the nice property that a linear transformation of a normally distributed variable is also
00:09:45.000 --> 00:09:48.000
normally distributed.
00:09:48.000 --> 00:09:54.000
So the least squares estimator, as you know, is a linear estimator.
00:09:54.000 --> 00:10:00.000
And since this is so, we can immediately conclude that if assumption A5 holds, then we know
00:10:00.000 --> 00:10:06.000
that the least squares estimator beta hat is normally distributed.
00:10:06.000 --> 00:10:13.000
We can write the least squares estimator as x prime x inverse x prime, and then substituting
00:10:13.000 --> 00:10:16.000
in for y, x beta plus u.
00:10:16.000 --> 00:10:22.000
And then you know of course when you multiply this out, the x prime x inverse times
00:10:22.000 --> 00:10:28.000
x prime x here cancel, so that we are just left with the true value of beta plus x prime
00:10:28.000 --> 00:10:31.000
x inverse x prime u.
00:10:31.000 --> 00:10:38.000
So beta hat, the OLS estimator, differs from the true parameter beta by precisely this
00:10:38.000 --> 00:10:45.000
term here, x prime x inverse x prime u, which is x plus, the generalized inverse, times the unobserved
00:10:45.000 --> 00:10:48.000
errors.
00:10:48.000 --> 00:10:55.000
We know that under assumptions A1 to A4 beta hat is unbiased, actually just under assumptions
00:10:55.000 --> 00:10:57.000
A1 and A2, it is unbiased.
00:10:57.000 --> 00:11:06.000
And the covariance matrix then using assumptions A3 and A4 is v of beta hat, which is equal
00:11:06.000 --> 00:11:08.000
to sigma square u times x prime x inverse.
00:11:08.000 --> 00:11:17.000
So that we know beta hat is normally distributed with expectation beta and with covariance
00:11:17.000 --> 00:11:23.000
matrix sigma square u times x prime x inverse.
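This sampling distribution can be verified by simulation. The following Python sketch is my own, with illustrative numbers: it draws many samples from a fixed design and compares the Monte Carlo covariance of the OLS estimates with sigma square u times x prime x inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma_u, reps = 100, 3, 2.0, 20000
X = rng.standard_normal((n, k))                # fixed regressor matrix
beta = np.array([1.0, -0.5, 0.25])             # true parameters (illustrative)
C = np.linalg.inv(X.T @ X)                     # the matrix (X'X)^(-1)
V_theory = sigma_u**2 * C                      # theoretical covariance of beta hat

U = sigma_u * rng.standard_normal((n, reps))   # one column of normal shocks per sample
Y = (X @ beta)[:, None] + U                    # simulated dependent variables
B = (C @ X.T @ Y).T                            # OLS estimates, one row per sample

V_mc = np.cov(B, rowvar=False)                 # Monte Carlo covariance matrix
print(np.abs(V_mc - V_theory).max())           # small: simulation matches the formula
```

The simulated mean of the estimates also sits at the true beta, matching unbiasedness.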
00:11:23.000 --> 00:11:32.000
This matrix x prime x inverse here, we need it to get the standard deviations
00:11:32.000 --> 00:11:39.000
of the estimated coefficients in the beta hat vector, and for this purpose it is useful
00:11:39.000 --> 00:11:47.000
to write it in a directly accessible form where you don't need to take the inverse of
00:11:47.000 --> 00:11:48.000
some matrix.
00:11:48.000 --> 00:11:56.000
So I denote now x prime x inverse just as a matrix C with typical element small
00:11:56.000 --> 00:11:57.000
c ij.
00:11:57.000 --> 00:12:04.000
So we know x prime x inverse is a k by k matrix, so the typical element here is c
00:12:04.000 --> 00:12:10.000
ij, and the indices i and j for the rows and the columns go from 1 to k each.
00:12:10.000 --> 00:12:17.000
So in this case we could say that the single estimate of a single coefficient beta hat
00:12:17.000 --> 00:12:26.000
small k is distributed normally with expectation being the true parameter beta k and the variance
00:12:26.000 --> 00:12:29.000
being sigma square u times the ckk element.
00:12:29.000 --> 00:12:39.000
So the kk diagonal element of this matrix C, that is, of the matrix x prime x inverse.
00:12:39.000 --> 00:12:50.000
Or with a probability of 95 percent we know that an estimate beta hat k lies in the interval
00:12:50.000 --> 00:12:58.000
of the true parameter beta k minus 1.96 times the standard deviation sigma u times square
00:12:58.000 --> 00:13:08.000
root of ckk and on the other end of the interval beta k plus 1.96 sigma u square root of ckk.
00:13:08.000 --> 00:13:14.000
So the sigma u square root of ckk is of course just the square root of the variance which
00:13:14.000 --> 00:13:21.000
we have here; taking the square root of this we get sigma u times the square root of
00:13:21.000 --> 00:13:29.000
ckk, which would be the standard deviation of the estimate beta hat k, or actually of the
00:13:29.000 --> 00:13:33.000
estimator beta hat k.
00:13:33.000 --> 00:13:43.000
So given that, for a given true model, if we draw many different samples and estimate
00:13:43.000 --> 00:13:50.000
each time, then in 95 percent of the cases it should
00:13:50.000 --> 00:13:57.000
be true that beta hat k lies in this interval, we can write this equivalently as beta hat
00:13:57.000 --> 00:14:04.000
k minus beta k, the true parameter, lying in the interval between minus 1.96 times the
00:14:04.000 --> 00:14:13.000
standard deviation and plus 1.96 times the standard deviation. Or, dividing by the standard deviation,
00:14:13.000 --> 00:14:19.000
so dividing by sigma u times the square root of ckk, we can equivalently say that the variable
00:14:19.000 --> 00:14:23.000
which is standardized in the way in which we have standardized these variables already
00:14:23.000 --> 00:14:29.000
in the review of statistics and I call it here zk being defined as beta hat k minus
00:14:29.000 --> 00:14:36.000
the true parameter divided by the true standard deviation sigma u square root of ckk that
00:14:36.000 --> 00:14:45.000
the standardized variable zk is in the interval between minus 1.96 and plus 1.96,
00:14:45.000 --> 00:14:52.000
because we know that the standardized variable is distributed
00:14:52.000 --> 00:14:59.000
according to a standard normal distribution with expectation zero and variance
00:14:59.000 --> 00:15:07.000
one. Now that could in principle be used for testing, but there are two problems. The first
00:15:07.000 --> 00:15:13.000
problem is that we do not know the true value of beta k. So while we know that we can write down
00:15:13.000 --> 00:15:20.000
such an expression here and call it zk, we cannot compute this expression because we do not
00:15:20.000 --> 00:15:29.000
know what the true parameter beta k is; this thing here is unobserved. But what we can do is
00:15:29.000 --> 00:15:37.000
that we can formulate hypotheses about the true value and test them so we can formulate a null
00:15:37.000 --> 00:15:44.000
hypothesis h naught which says that the true parameter beta k is some value which we now
00:15:44.000 --> 00:15:54.000
basically guess beta k superscript zero right so we fix this beta k superscript zero here
00:15:54.000 --> 00:16:03.000
and formulate the hypothesis this is the true parameter and with this parameter being fixed
00:16:03.000 --> 00:16:12.000
and inserted here for the beta k, we can then in principle formulate the test,
00:16:12.000 --> 00:16:20.000
but actually we cannot quite yet, because we are still lacking knowledge of sigma u times the square root of ckk. But
00:16:20.000 --> 00:16:26.000
assume that we had this and we would be able to compute this quantity here and check whether it
00:16:26.000 --> 00:16:38.000
is in the closed interval of negative 1.96 plus 1.96 right so we would know that at a five percent
00:16:39.000 --> 00:16:48.000
level of significance we would have negative 1.96 less than or equal to this zk superscript
00:16:48.000 --> 00:16:56.000
zero, and this less than or equal to 1.96, if the zk superscript zero is defined as I have done
00:16:56.000 --> 00:17:01.000
here with the beta k superscript zero replacing the true parameter beta k
00:17:01.000 --> 00:17:10.000
so as far as the problem is concerned that we do not know the true beta k we can solve this
00:17:10.000 --> 00:17:17.000
problem or circumvent it by just formulating hypotheses about the true value of beta k
00:17:19.000 --> 00:17:27.000
problem number two is that we also do not observe the sigma u right in this expression here
00:17:28.000 --> 00:17:36.000
the square root of ckk we do observe because this is a diagonal element of x prime x inverse
00:17:36.000 --> 00:17:42.000
and we do know what x prime x is since we have the regressor matrix x so this thing here is
00:17:42.000 --> 00:17:48.000
not a problem this one we have but we do not know how big sigma u is we have to estimate
00:17:48.000 --> 00:17:56.000
sigma u usually so the idea would then be of course to replace the unknown sigma u by its
00:17:57.000 --> 00:18:04.000
estimate sigma u hat and sigma u hat is of course the square root of u hat prime u hat
00:18:04.000 --> 00:18:09.000
divided by n minus k this is what we were talking about yesterday because this here is the unbiased
00:18:09.000 --> 00:18:15.000
estimator of the variance so we'll take the square root of the unbiased estimator of the variance
00:18:15.000 --> 00:18:23.000
and this would give us then the estimator for the standard deviation of the error terms u.
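In code, the unbiased variance estimate and the resulting coefficient standard errors look as follows. This is a hypothetical Python sketch of my own with made-up data, not the lecture's example; the last line anticipates the t statistics discussed next.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # constant + 2 regressors
beta = np.array([0.5, 1.0, 0.0])                                # true values (made up)
y = X @ beta + rng.standard_normal(n)                           # sigma_u = 1 here

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
u_hat = y - X @ beta_hat                                        # residuals
sigma_hat = np.sqrt(u_hat @ u_hat / (n - k))                    # sqrt of the unbiased variance estimate
se = sigma_hat * np.sqrt(np.diag(C))                            # sigma_hat * sqrt(c_kk) for each k
t_stats = beta_hat / se                                         # t statistics for H0: beta_k = 0
```

Dividing by n minus k rather than n is exactly the degrees-of-freedom correction from yesterday's lecture.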
00:18:24.000 --> 00:18:34.000
replacing sigma u by sigma hat u would lead to a statistic which i call tk rather than zk which
00:18:34.000 --> 00:18:43.000
would be beta hat k minus beta k divided by sigma hat u times the square root of ckk and the problem
00:18:43.000 --> 00:18:49.000
is of course that this thing here is not normally distributed anymore and that's why i gave it the
00:18:49.000 --> 00:18:56.000
name t because what we know is this statistic is distributed as a student's t distribution
00:18:56.000 --> 00:19:03.000
but not like a normal distribution at least in finite samples there's a difference which as you
00:19:03.000 --> 00:19:09.000
know becomes very small quite fast: with 30 degrees of freedom, not 30 observations,
00:19:03.000 --> 00:19:09.000
the t distribution is already almost
00:19:09.000 --> 00:19:15.000
indistinguishable from the normal. But still there is a difference and you should be aware of it. In this case we
00:19:19.000 --> 00:19:27.000
would have to use the t distribution and not the normal distribution so to put it more precisely
00:19:27.000 --> 00:19:39.000
we have to use a t distribution with n minus k degrees of freedom. But that is no problem: we do know what the
00:19:39.000 --> 00:19:46.000
t distribution looks like; it is tabulated. So using this t test here and formulating some
00:19:46.000 --> 00:19:53.000
hypothesis about the true value of beta k we can actually then run the test usually we will
00:19:53.000 --> 00:19:58.000
formulate the hypothesis that beta k is equal to zero so we will test whether the regressor is
00:19:58.000 --> 00:20:07.000
important at all. So we use this test with the null hypothesis H0
00:20:07.000 --> 00:20:15.000
that beta k is equal to zero. Then our t statistic is tk, the regression
00:20:15.000 --> 00:20:22.000
coefficient, the estimated regression coefficient beta hat k, divided by the estimated standard
00:20:22.000 --> 00:20:29.000
deviation and by the square root of ckk, and this thing here is then distributed as
00:20:29.000 --> 00:20:37.000
Student's t with n minus k degrees of freedom; this n minus capital K of course has nothing to do with
00:20:37.000 --> 00:20:48.000
the small k index. And then we can just look into the tabulated distribution of the t statistic and we
00:20:48.000 --> 00:20:55.000
will find that there is a certain p value, so that the probability of observing such a t statistic if the null hypothesis is true
00:20:56.000 --> 00:21:04.000
can be taken from the tabulation of the t statistic. In regression analysis when you
00:21:04.000 --> 00:21:11.000
use commercial software typically the regression output gives you both the t statistic and the p
00:21:11.000 --> 00:21:18.000
value for each estimated coefficient so the computer program already has the information
00:21:18.000 --> 00:21:29.000
about the t statistic programmed in; the t distribution is included in the software
00:21:29.000 --> 00:21:36.000
and therefore it's easy for the computer to pick the right p value we typically do not use those
00:21:36.000 --> 00:21:44.000
tables anymore but rather let the computer do it but as a rule of thumb you should recall that
00:21:44.000 --> 00:21:49.000
the regressor is typically significant at the five percent level if its t statistic is greater in absolute value than
00:21:49.000 --> 00:21:59.000
two. The exact two-sided critical value is 1.96, but usually it works well if you just use this
00:21:59.000 --> 00:22:07.000
value of two as a rule of thumb. And this is now another part of the EViews output
00:21:59.000 --> 00:22:07.000
which I have already explained to you; I highlighted different parts of it before, and now I highlight
00:22:07.000 --> 00:22:13.000
yet another component. So for the constant in our regression, where we estimated the coefficient
00:22:14.000 --> 00:22:20.000
of 0.767, we have a standard error of 0.179, and this results in a t statistic of 4.276.
00:22:32.000 --> 00:22:39.000
so you can easily make the test you can divide this coefficient here by the estimated standard
00:22:39.000 --> 00:22:44.000
error and this is the estimated standard error of course and then you should arrive at exactly
00:22:44.000 --> 00:22:49.000
this t statistic here. And then the computer evaluates what the probability is
00:22:50.000 --> 00:22:58.000
to observe such a t statistic if the null hypothesis is true if it were true that
00:22:58.000 --> 00:23:05.000
the constant were zero, and comes up with the result that the probability is essentially zero. So we can be very
00:23:05.000 --> 00:23:13.000
sure actually that there is a positive constant in this case for this regression and the same thing
00:23:13.000 --> 00:23:20.000
here with the log of income you see the standard error is very small the t statistic consequently
00:23:20.000 --> 00:23:29.000
is huge, really massive, 42.8 essentially. Right, this t statistic here clearly indicates
00:23:29.000 --> 00:23:39.000
that the coefficient estimate is significant so in our sample regression we have two significant
00:23:39.000 --> 00:23:47.000
regressors if we look at this other kind of peculiar regression I discussed yesterday where I
00:23:47.000 --> 00:23:58.000
regress stock prices on a constant, on a linear trend, and on TFP, and I then look at the t statistics
00:23:58.000 --> 00:24:03.000
here then you see this t statistic is negative because of course we have a negative coefficient
00:24:03.000 --> 00:24:11.000
estimate divided by a positive estimated standard error, which gives negative 5.6. But no matter: we have a
00:24:11.000 --> 00:24:18.000
two-sided test, so again negative 5.6 is clearly smaller than negative
00:24:18.000 --> 00:24:24.000
1.96, and in absolute value clearly bigger than 1.96, so the probability is near zero.
00:24:24.000 --> 00:24:34.000
for the coefficient of the linear trend we get a t statistic of 3.46 this is still much bigger
00:24:35.000 --> 00:24:42.000
than 1.96 and you see the probability here while not exactly equal to zero it is very very small
00:24:42.000 --> 00:24:50.000
and a probability of five percent would show a five here, right? So this one is about one hundredth of
00:24:51.000 --> 00:24:59.000
five percent, in the order of magnitude of one hundredth of five percent. But for total factor
00:24:59.000 --> 00:25:04.000
productivity in this regression again with negative coefficient we get of course a negative
00:25:04.000 --> 00:25:13.000
t statistic, and this is negative 1.94. And you see negative 1.94 is a little bigger than
00:25:13.000 --> 00:25:23.000
negative 1.96, or in absolute terms 1.94 is a little smaller than 1.96, so it's not quite
00:25:23.000 --> 00:25:29.000
significant at the five percent level and this is why the p value here is greater than five percent
00:25:29.000 --> 00:25:38.000
it's 5.37 percent, actually. The p value for the null hypothesis that this coefficient here
00:25:38.000 --> 00:25:45.000
is actually zero is still greater than five percent,
00:25:45.000 --> 00:25:51.000
even though just marginally greater. And this means that the estimated coefficient for TFP
00:25:51.000 --> 00:25:58.000
is insignificant here at the five percent level. Clearly, what you should always do is that you
00:25:58.000 --> 00:26:05.000
decide on which significance level you would like to use before you actually look at the data so
00:26:05.000 --> 00:26:11.000
mostly in economics people use five percent levels of significance. This is actually different
00:26:11.000 --> 00:26:23.000
in other sciences. In pharmacy, for instance, you want to have much lower levels
00:26:23.000 --> 00:26:33.000
of significance, so a much lower probability of rejecting a null hypothesis
00:26:33.000 --> 00:26:42.000
that is in fact true. But this is more or less a question of the culture in each science. In economics we
00:26:42.000 --> 00:26:47.000
usually use five percent but sometimes there are people who also use a 10 percent level or a one
00:26:47.000 --> 00:26:54.000
percent level or something like this. And what you should never do is first estimate your
00:26:54.000 --> 00:27:01.000
equation and then say: I really regret that the p value here is a little higher
00:27:01.000 --> 00:27:05.000
than five percent, so why don't I use a 10 percent level of significance? This is not good
00:27:05.000 --> 00:27:11.000
statistics, you know. You have to fix the significance level before you look at the data and before these
00:27:11.000 --> 00:27:20.000
statistics are being computed and then rigorously apply what your significance levels imply
00:27:21.000 --> 00:27:27.000
namely reject the null hypothesis or accept the null hypothesis exactly at the level of significance
00:27:27.000 --> 00:27:35.000
which you have as we have already discussed there are two possible errors in hypothesis
00:27:35.000 --> 00:27:40.000
testing, I just repeat this here. There is the possibility of a type one error, which means we
00:27:40.000 --> 00:27:48.000
reject the null hypothesis even though the null hypothesis is true and the probability of such a
00:27:48.000 --> 00:27:54.000
type one error is of course the significance level alpha so at a significance level alpha of five
00:27:54.000 --> 00:28:02.000
percent there is a five percent chance for you to reject the null hypothesis even if it is true
00:28:04.000 --> 00:28:11.000
okay the type two error would be to fail to reject the null hypothesis even though the
00:28:11.000 --> 00:28:18.000
null hypothesis is wrong and the probability of a type two error is of course the greater
00:28:19.000 --> 00:28:25.000
the smaller the probability of a type one error is there was a small misprint in the slides by
00:28:25.000 --> 00:28:30.000
the way which you perhaps still have in your slides I had type two here and type two here
00:28:30.000 --> 00:28:37.000
which doesn't make any sense so it must read type one error here moreover the probability of a type
00:28:37.000 --> 00:28:44.000
two error cannot be easily assessed because it depends on the true parameter so for instance
00:28:45.000 --> 00:28:50.000
suppose you test a null hypothesis that a parameter is equal to zero
00:28:52.000 --> 00:29:00.000
and suppose the null hypothesis is not true and the true parameter is different from zero
00:29:01.000 --> 00:29:08.000
but if the true parameter is very close to zero let's say it's 0.0001 then it is very
00:29:09.000 --> 00:29:16.000
difficult to reject the null hypothesis even though technically speaking the null hypothesis
00:29:16.000 --> 00:29:23.000
is wrong. So the probability of a type two error in such a case
00:29:23.000 --> 00:29:31.000
where the true parameter is close to the parameter which we have tested this probability is quite high
00:29:31.000 --> 00:29:41.000
conversely if the true parameter is far off the tested parameter then the probability of a type
00:29:41.000 --> 00:29:49.000
two error is rather small and we distinguish here between the level of a test which is the
00:29:49.000 --> 00:29:57.000
probability of a type one error and the level of a test should always be small and second the power
00:29:57.000 --> 00:30:05.000
of a test the power is the probability to reject the null hypothesis if a specific alternative h1
00:30:05.000 --> 00:30:12.000
is true so it is the probability of avoiding a type two error and this probability should be
00:30:12.000 --> 00:30:22.000
large in all tests we have a trade-off between level and power the lower the level of a test is
00:30:23.000 --> 00:30:33.000
the less is the power. So the lower the probability of a type one error is, the smaller is the probability
00:30:33.000 --> 00:30:45.000
to reject h zero if some alternative hypothesis h1 is actually
00:30:45.000 --> 00:30:49.000
true. Any questions?
00:30:55.000 --> 00:31:01.000
I don't see any, so let me continue with confidence intervals, which we have also already
00:31:01.000 --> 00:31:08.000
covered. So when we have a significance level of alpha, then we know that the test statistic, which
00:31:08.000 --> 00:31:17.000
I call tk for the kth regression coefficient, lies between the critical values of the t distribution
00:31:19.000 --> 00:31:25.000
with n minus k degrees of freedom and significance level alpha over two
00:31:27.000 --> 00:31:33.000
with a negative sign here and a positive sign here, with probability one minus
00:31:33.000 --> 00:31:40.000
alpha. So these are the two critical values of the t distribution with n minus k degrees of
00:31:40.000 --> 00:31:48.000
freedom. They are symmetric, so they are just the same except for their sign. Therefore, with
00:31:48.000 --> 00:31:54.000
this probability of one minus alpha we know that the standardized variable beta hat k minus beta
00:31:54.000 --> 00:32:00.000
k divided by the estimated standard deviation lies with this probability one minus alpha so let's
00:32:00.000 --> 00:32:07.000
say 95 percent in this interval here and therefore we can just rearrange the numbers here
00:32:07.000 --> 00:32:17.000
to arrive at a confidence interval which has as its lower bound the estimated value beta hat k
00:32:17.000 --> 00:32:25.000
plus the estimated standard deviation sigma hat u times the square root of ckk times
00:32:25.000 --> 00:32:32.000
the critical value. And this thing is, well, the upper bound of the interval; I think I said the
00:32:32.000 --> 00:32:37.000
lower bound, but it's of course the upper bound. And here, with a negative sign, we would have the lower
00:32:37.000 --> 00:32:44.000
bound of the interval. The two bounds are completely analogous, actually the same except for the
00:32:44.000 --> 00:32:53.000
sign of this term, which uses the critical value times the standard deviation. So this expression 36 here
00:32:53.000 --> 00:33:02.000
describes the one minus alpha confidence interval for beta k. Now here is another MATLAB exercise
00:33:03.000 --> 00:33:11.000
for you, which I describe to you briefly. In MATLAB or any other programming language, which as I said many
00:33:11.000 --> 00:33:18.000
times are all very similar, generate some matrix x of random numbers, let's say a hundred by three matrix, and then
00:33:19.000 --> 00:33:24.000
loop over the following steps, let's say a thousand times. You had a similar exercise
00:33:24.000 --> 00:33:32.000
yesterday so you generate some hundred by one random vector of disturbances i'll call them
00:33:32.000 --> 00:33:41.000
uj: for each loop j you generate uj, and then you compute from the generated
00:33:41.000 --> 00:33:57.000
matrix of random numbers the dependent variable yj as x times the beta vector, which I
00:33:57.000 --> 00:34:05.000
have here fixed at one one zero and you add the random numbers uj
00:34:09.000 --> 00:34:16.000
so the beta vector beta one beta two beta three is just one in component one one in component two
00:34:17.000 --> 00:34:25.000
and zero in component three and then each time you estimate beta hat j as x prime x inverse
00:34:26.000 --> 00:34:32.000
x prime yj and test at the five percent level the hypothesis that beta three is equal to zero
00:34:33.000 --> 00:34:40.000
now the question is how often do you commit a type one error that is to say how often do you now
00:34:40.000 --> 00:34:47.000
erroneously reject the null and what would happen if you increase the number of loops
00:34:48.000 --> 00:34:53.000
or what would happen if you increase the number of observations you could also do this right the
00:34:53.000 --> 00:34:57.000
number of observations is hundred here in my setting you can of course also use higher
00:34:57.000 --> 00:35:05.000
numbers of observations so you can test this out now for this exercise i would like to present
00:35:05.000 --> 00:35:15.000
you the solution because one of you was writing to me asking or indicating that he at least has
00:35:15.000 --> 00:35:22.000
little experience with doing these type of programming exercises and i see that this
00:35:22.000 --> 00:35:30.000
may be a problem then if you have not yet worked with such software like matlab or gauss or r so
00:35:30.000 --> 00:35:39.000
what i will do now i will show you the solution to this exercise however not in matlab but in gauss
00:35:40.000 --> 00:35:46.000
because i don't work in matlab i do work in gauss but the solution would be very similar in terms
00:35:46.000 --> 00:35:55.000
matlab or in terms of r as all these mathematical programming languages follow the same type of idea
00:35:55.000 --> 00:36:04.000
so what i will do now is that i will discontinue the screen sharing here and rather share the screen
00:36:04.000 --> 00:36:18.000
for my gauss which i should have somewhere let me see show all windows here
00:36:26.000 --> 00:36:28.000
so where is my gauss screen
00:36:35.000 --> 00:36:37.000
there it is all right
00:36:40.000 --> 00:36:48.000
you should see the gauss screen now and for some reason you don't see my content
00:36:49.000 --> 00:36:59.000
yes now i see it all right good um so here is the gauss program so what i do is that i first fix
00:37:00.000 --> 00:37:07.000
the parameter vector beta as 1 1 0 then i fix some value for the standard deviation of
00:37:08.000 --> 00:37:16.000
the u's of the errors in this case i've just set this at 0.1 fix the number of observations n
00:37:16.000 --> 00:37:23.000
equal to 100 fix the number of regressors equal to three and then say well i go through a thousand
00:37:23.000 --> 00:37:31.000
loops of my loop and here i have a variable which i call rejection right this will count
00:37:31.000 --> 00:37:42.000
the number of rejections for the test which i'm going to carry through and now what i see actually
00:37:42.000 --> 00:37:50.000
is that i should also fix the regressor matrix outside of the loop i will just change this
00:37:51.000 --> 00:38:00.000
sorry let's put this here to be conformable with the exercise so i fix a regressor matrix here
00:38:00.000 --> 00:38:05.000
right so rndn is a gauss command this sounds a little bit different in other
00:38:05.000 --> 00:38:13.000
programming languages which creates a matrix of type n by 3 i should perhaps write n by k here
00:38:14.000 --> 00:38:22.000
uh because k is the number of regressors right so i would have a matrix of type n by
00:38:22.000 --> 00:38:30.000
k of random numbers and r and d here is for random and n is for normally distributed random numbers
00:38:30.000 --> 00:38:38.000
okay and then i go into a do loop so i fix the index i at zero and go through this loop here
00:38:38.000 --> 00:38:46.000
until the index i is equal to a thousand and in this loop now what i first do is that i generate
00:38:46.000 --> 00:38:54.000
random numbers u right here and then i generate my observations y y is x times beta
00:38:54.000 --> 00:39:02.000
plus u this would be the dependent variable and then i estimate by least squares beta hat
00:39:02.000 --> 00:39:10.000
beta hat is equal to the inverse of x prime x times x prime y then i compute y hat which is x
00:39:10.000 --> 00:39:18.000
times beta hat then i compute u hat so the estimated residuals which would be y minus y hat
00:39:20.000 --> 00:39:28.000
then i estimate the variance of the u's sigma square u hat is u hat prime u hat divided
00:39:28.000 --> 00:39:35.000
by n minus k and then i compute the covariance matrix of the beta hats which i call cv here for
00:39:35.000 --> 00:39:43.000
covariance matrix hat for estimation this would be the sigma u square hat times the inverse of x
00:39:43.000 --> 00:39:50.000
prime x and then i can compute the t statistic for the third coefficient which is why i call it t3
00:39:50.000 --> 00:39:55.000
beta hat in the third component so this would be the gauss way to address the third component of
00:39:55.000 --> 00:40:05.000
beta hat divided by the square root of the third diagonal element of the covariance matrix note that the estimate of sigma
00:40:05.000 --> 00:40:13.000
u squared is already included in this cv definition here and then i just check is the
00:40:13.000 --> 00:40:18.000
absolute value of the test statistic of the t statistic which i just have computed is this
00:40:18.000 --> 00:40:26.000
greater than 1.96 if this is true then i increase my variable rejection by one so rejection is equal
00:40:26.000 --> 00:40:34.000
to rejection plus one if it is not true then nothing happens at this place the following is
00:40:34.000 --> 00:40:42.000
optional this computes a histogram you can look at this at home at the end of the loop i increase
00:40:42.000 --> 00:40:49.000
the index i by one then here's the end do and the loop starts over until i is equal to thousand
00:40:51.000 --> 00:40:58.000
and when all thousand iterations have been done well then in this case i compute the percentage
00:40:58.000 --> 00:41:04.000
of the null hypothesis being rejected by dividing the number of rejections which i have
00:41:04.000 --> 00:41:18.000
counted by the number of loops here so this whole thing would look like this let me just run it
00:41:21.000 --> 00:41:27.000
nice why doesn't it run okay that is right so i ran it here and it gives me now the
00:41:27.000 --> 00:41:35.000
message back percentage of h zero being rejected is six point six percent that's not exactly five
00:41:35.000 --> 00:41:40.000
percent because well there are still random numbers in there and we just have a hundred
00:41:40.000 --> 00:41:45.000
observations and a thousand loops so when we rerun it it will be a different percentage
00:41:46.000 --> 00:41:54.000
let's see for some reason i don't get perhaps i should do it like this okay there it is in this
00:41:54.000 --> 00:42:00.000
case it's five point six percent right and i can do this many times now it's four point six percent
00:42:00.000 --> 00:42:05.000
and i do it again and then it's four point nine percent you see these are all numbers which
00:42:05.000 --> 00:42:11.000
center around five percent right here it's just three point seven percent next is again five point
00:42:11.000 --> 00:42:17.000
nine percent so i can do this many times and you see that on average this will be something like
00:42:17.000 --> 00:42:24.000
five percent of rejections of the null hypothesis because the null hypothesis here was true right
00:42:25.000 --> 00:42:31.000
the third component of beta was equal to zero and so we should find that the frequency of rejections
00:42:31.000 --> 00:42:37.000
of the null hypothesis which we know to be true is five percent you can now play around
00:42:37.000 --> 00:42:41.000
with this program and increase the number of loops or increase the number of observations
00:42:42.000 --> 00:42:47.000
which i will not do with you this you can do when you program this yourself and possibly in a
00:42:47.000 --> 00:42:55.000
different programming language the only thing which i will still do is that i uncomment this
00:42:55.000 --> 00:43:06.000
thing here because what i wrote in this program as optional is that i also collected all the test
00:43:06.000 --> 00:43:13.000
statistics all the thousand test statistics and then let gauss put these in a histogram of all
00:43:13.000 --> 00:43:20.000
these test statistics so this may be interesting for you to look at i run the program again now
00:43:25.000 --> 00:43:32.000
and here is the histogram right and this histogram should actually follow a normal
00:43:32.000 --> 00:43:41.000
distribution or a t distribution but with hundred observations or in this case 97 degrees of freedom
00:43:41.000 --> 00:43:47.000
this should basically be a normal distribution you see it doesn't quite look like a normal
00:43:47.000 --> 00:43:55.000
distribution yet there is too much variance in there still because we just have 100 observations
00:43:55.000 --> 00:44:05.000
and just a thousand loops i will just redo the same thing with 10 000 loops let's go back here
00:44:05.000 --> 00:44:17.000
increase the number of loops to 10 000 and then re-run the program and now you see the result is
00:44:17.000 --> 00:44:23.000
already much closer to the shape of a normal distribution right and what i have counted as
00:44:23.000 --> 00:44:31.000
rejections are the 2.5 percent of events which are greater than 1.96 sorry this should be somewhere
00:44:31.000 --> 00:44:39.000
here right two is here so this branch and of course smaller than negative two or negative 1.96
00:44:40.000 --> 00:44:46.000
here these i have counted and then i came up to close to five percent rejections typically
00:44:48.000 --> 00:44:54.000
so this program i have already uploaded to steeda but not yet with the change i have just made
00:44:54.000 --> 00:45:03.000
regarding the x matrix i will upload this one again here with these changes after the lecture
00:45:03.000 --> 00:45:09.000
and then you can download it and look at it in detail and try to reproduce it or write it yourself
00:45:09.000 --> 00:45:18.000
in a programming language of your own choice i really recommend to you that you do these
00:45:18.000 --> 00:45:25.000
type of exercises because i think it will enhance your understanding of the econometric theory by much
00:45:26.000 --> 00:45:34.000
if you really try to reproduce this in terms of artificial numbers and your own programs you'll
00:45:34.000 --> 00:45:41.000
certainly commit certain errors when you write this for the first second third time and so forth
00:45:41.000 --> 00:45:46.000
but you learn from those errors right and when you find that the program doesn't deliver the
00:45:47.000 --> 00:45:52.000
results which it should deliver then you have to go back into your program and find the error
00:45:52.000 --> 00:45:58.000
because the theory is correct and then in this case your program won't be correct but it will
00:45:58.000 --> 00:46:05.000
help you to identify the error you have in your program and better understand the theory than
00:46:05.000 --> 00:46:10.000
by just looking at the formula any questions here
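For those who want to follow along in Python rather than GAUSS or MATLAB, here is a rough sketch of the same Monte Carlo exercise. This is my own translation, not the lecturer's program; variable names and the random seed are illustrative choices. It fixes the regressor matrix outside the loop (as corrected in the lecture), generates fresh disturbances each iteration, and counts how often the t test at the 5% level rejects the true null that beta 3 equals zero.

```python
import numpy as np

rng = np.random.default_rng(12345)

n, k = 100, 3                     # observations and regressors
beta = np.array([1.0, 1.0, 0.0])  # true coefficients; beta_3 = 0, so H0 is true
sigma_u = 0.1                     # standard deviation of the disturbances
loops = 1000

# fix the regressor matrix once, outside the loop, as in the lecture
X = rng.standard_normal((n, k))
XtX_inv = np.linalg.inv(X.T @ X)

rejections = 0
for _ in range(loops):
    u = sigma_u * rng.standard_normal(n)    # fresh disturbances each loop
    y = X @ beta + u
    b_hat = XtX_inv @ X.T @ y               # least squares estimate
    u_hat = y - X @ b_hat                   # residuals
    s2_hat = u_hat @ u_hat / (n - k)        # unbiased estimate of sigma_u^2
    cov_hat = s2_hat * XtX_inv              # estimated covariance of b_hat
    t3 = b_hat[2] / np.sqrt(cov_hat[2, 2])  # t statistic for H0: beta_3 = 0
    if abs(t3) > 1.96:                      # 5% two-sided critical value
        rejections += 1                     # a type I error, since H0 is true

rejection_rate = rejections / loops
print(f"share of H0 rejections: {rejection_rate:.3f}")
```

As in the GAUSS demonstration, repeated runs with different seeds give rejection shares scattered around five percent; increasing the number of loops tightens the scatter.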
00:46:15.000 --> 00:46:22.000
i will discontinue the screen sharing of this screen and go back to
00:46:24.000 --> 00:46:33.000
the pdf slides which are here where we stopped so this would be this exercise
00:46:36.000 --> 00:46:43.000
and there's actually a second exercise to follow where you could do the same thing now testing
00:46:43.000 --> 00:46:53.000
the power of the test by finding out how often one commits a type 2 error because in this setting
00:46:53.000 --> 00:47:02.000
here the beta 3 coefficient would not be equal to zero but it would be equal to
00:47:02.000 --> 00:47:09.000
0.1 so testing the null hypothesis that beta 3 is equal to 0 would be wrong
00:47:10.000 --> 00:47:15.000
or the testing would not be wrong but the null hypothesis would be wrong and if you test it
00:47:15.000 --> 00:47:22.000
then the question is how often would you actually reject the null hypothesis given that the true
00:47:22.000 --> 00:47:29.000
value of beta 3 is equal to 0.1 and of course you can experiment then here again with different
00:47:29.000 --> 00:47:35.000
parameters of the program and change the 0.1 to values which are closer to zero or which are
00:47:35.000 --> 00:47:43.000
farther off from zero to see how the probability of type 2 errors or in this case the frequency of
00:47:43.000 --> 00:47:54.000
type 2 errors changes with these changes you may also vary the standard deviation of the errors
00:47:54.000 --> 00:48:01.000
remember I had fixed the sigma u at 0.1 you can use any other value and see how the
00:48:02.000 --> 00:48:07.000
estimation results change when let's say the standard deviation is greater than 0.1
00:48:07.000 --> 00:48:13.000
actually with 0.1 the estimation results are very precise but if you increase the
00:48:14.000 --> 00:48:19.000
sigma then you'll see that they become less reliable actually
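The second exercise can be sketched in the same Python style (again my own illustrative translation, not the course code): set beta 3 to 0.1 so that the null hypothesis beta 3 = 0 is now false, and count how often the test fails to reject it, which is the frequency of type II errors.

```python
import numpy as np

rng = np.random.default_rng(7)

n, k, loops = 100, 3, 1000
beta = np.array([1.0, 1.0, 0.1])  # beta_3 = 0.1, so H0: beta_3 = 0 is false
sigma_u = 0.1

X = rng.standard_normal((n, k))
XtX_inv = np.linalg.inv(X.T @ X)

type2_errors = 0
for _ in range(loops):
    y = X @ beta + sigma_u * rng.standard_normal(n)
    b_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ b_hat
    s2_hat = u_hat @ u_hat / (n - k)
    t3 = b_hat[2] / np.sqrt(s2_hat * XtX_inv[2, 2])
    if abs(t3) <= 1.96:        # failing to reject the false H0 is a type II error
        type2_errors += 1

type2_rate = type2_errors / loops
print(f"type II error frequency: {type2_rate:.3f}")
```

With sigma_u = 0.1 and a hundred observations the test almost always rejects, so the type II frequency is close to zero; moving beta 3 toward zero or raising sigma_u makes type II errors more frequent, which is exactly the experiment suggested in the lecture.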
00:48:19.000 --> 00:48:29.000
all right and now suppose that we want to test hypotheses for more than just one regression
00:48:29.000 --> 00:48:36.000
coefficient all right so far we have just singled out one particular entry of the beta vector let's
00:48:36.000 --> 00:48:44.000
say beta k and we have tested the hypothesis that beta k is equal to zero but it may be sometimes
00:48:44.000 --> 00:48:51.000
that we want to test an hypothesis which concerns more than just one coefficient
00:48:51.000 --> 00:48:57.000
so for instance one hypothesis could be the null hypothesis that a couple of coefficients
00:48:57.000 --> 00:49:04.000
in my regression are equal to zero or actually there are many other possibilities for how you could
00:49:05.000 --> 00:49:12.000
test other types of hypotheses in exactly the same way which I explained to you
00:49:12.000 --> 00:49:18.000
now for instance you could also test an hypothesis which says that the sum of
00:49:19.000 --> 00:49:25.000
certain coefficients add up to one or add up to something else or that the difference is so and so
00:49:25.000 --> 00:49:31.000
big or something like this these would be all certain types of linear hypotheses which you
00:49:31.000 --> 00:49:38.000
formulate and I just take the type of hypothesis which is standard which is tested as a standard
00:49:38.000 --> 00:49:45.000
in commercial econometric software namely that a number of regression coefficients are equal
00:49:45.000 --> 00:49:53.000
to zero so this will be our null hypothesis to be more precise about this suppose we have
00:49:53.000 --> 00:50:00.000
the regression model in the usual form y is equal to x beta plus u and what I now do is that I
00:50:00.000 --> 00:50:09.000
partition the x matrix so the x matrix has k columns in our notation and now I split this up
00:50:09.000 --> 00:50:21.000
into one matrix which has k minus j columns I call this matrix x zero and a second matrix
00:50:21.000 --> 00:50:29.000
which has j columns and I call this matrix x one so k minus j columns here and j columns here
00:50:29.000 --> 00:50:37.000
add up to k columns so exactly x zero and x one together are the same matrix as this x here
00:50:39.000 --> 00:50:45.000
just in terms of notation I now distinguish between the first k minus j columns and the next
00:50:45.000 --> 00:50:54.000
j columns and I split up the beta vector in exactly the same way so beta is a column vector
00:50:54.000 --> 00:51:02.000
as you know and I split it now up in two smaller column vectors beta zero and beta one the first
00:51:03.000 --> 00:51:11.000
part here the beta zero is then a column vector with k minus j entries so this of type k minus j
00:51:11.000 --> 00:51:21.000
by one and the beta one vector here is of type j by one so that in total this is again k rows
00:51:21.000 --> 00:51:27.000
and one column like in this beta vector here it's just a new notation nothing has changed
00:51:27.000 --> 00:51:33.000
but now I distinguish x zero and x one and the corresponding coefficients beta zero and beta one
00:51:34.000 --> 00:51:41.000
plus u because what I want to test is whether this beta one here is
00:51:41.000 --> 00:51:47.000
possibly equal to zero in all components equal to zero so the whole beta one vector being equal to
00:51:48.000 --> 00:51:55.000
zero this is what the null hypothesis says right all betas starting at k minus j plus one up to beta
00:51:55.000 --> 00:52:04.000
k are equal to zero so this is translated here or rewritten here as beta one and the hypothesis
00:52:04.000 --> 00:52:10.000
would say beta one is equal to the zero vector the beta zero here can be whatever it wants but
00:52:10.000 --> 00:52:17.000
I would like to test whether beta one is equal to zero okay so how is this test run because obviously
00:52:17.000 --> 00:52:24.000
we cannot use a t test here anymore we want to test all these hypotheses here jointly there's
00:52:24.000 --> 00:52:31.000
just one hypothesis which we test namely that all these coefficients are jointly equal to
00:52:31.000 --> 00:52:38.000
zero so don't now do j t tests or something like this this would create a problem of multiple
00:52:38.000 --> 00:52:45.000
testing you would not know what the level of your test is anymore but you have to run just
00:52:45.000 --> 00:52:52.000
one single test and this will be an f test okay and the f test proceeds as follows you actually
00:52:52.000 --> 00:53:00.000
do two estimations first you estimate the regular model y is equal to x beta plus u without any kind
00:53:00.000 --> 00:53:07.000
of restrictions on the beta right so you estimate the full model full regressor matrix here let the
00:53:07.000 --> 00:53:13.000
beta be whatever the beta is namely x prime x inverse times x prime y right so you estimate
00:53:13.000 --> 00:53:21.000
this beta here and let's call this estimate beta hat the next thing you do is that you calculate
00:53:21.000 --> 00:53:28.000
the sum of squared residuals for this beta hat so you calculate s one which i define to be u hat
00:53:28.000 --> 00:53:34.000
prime u hat and the u hats are of course the residuals for this particular regression we
00:53:34.000 --> 00:53:42.000
have just run without any restriction on beta so it's y minus x beta hat prime y minus x beta hat
00:53:43.000 --> 00:53:51.000
right so this is the scalar which you have here s one gives you just the sum of squared
00:53:51.000 --> 00:54:00.000
residuals for this first regression next step is that you re-run a regression but not the same of
00:54:00.000 --> 00:54:09.000
course but you re-estimate equation 38 now under the restriction that beta one is equal to zero
00:54:10.000 --> 00:54:17.000
so when beta one is equal to zero in this notation here then x one is being multiplied by beta one
00:54:17.000 --> 00:54:23.000
but beta one is equal to zero so x one can't play a role so actually what we have to do then is that
00:54:23.000 --> 00:54:32.000
we regress y just on x zero and estimate beta zero right under the null hypothesis beta one would be
00:54:32.000 --> 00:54:37.000
equal to zero so x one times beta one would be equal to zero so this doesn't play a role
00:54:38.000 --> 00:54:45.000
therefore we just regress y on x zero and on beta zero so we estimate this equation here
00:54:45.000 --> 00:54:55.000
y is equal to x zero times beta zero plus u this is equation 39 now call the estimate of beta zero
00:54:55.000 --> 00:55:03.000
which you obtain here beta zero tilde so the tilde is like the hat here right just an indication of
00:55:03.000 --> 00:55:10.000
that this is an estimate the least squares estimate is x zero prime times x zero
00:55:11.000 --> 00:55:18.000
inverse of the whole matrix times x zero prime y and the sum of the squared residuals for this
00:55:18.000 --> 00:55:26.000
regression would be s zero equal to u tilde prime u tilde where u tilde is defined as y minus x
00:55:26.000 --> 00:55:32.000
zero times beta zero tilde and so just analogously to what we have done up there
00:55:33.000 --> 00:55:40.000
just one question can you tell me do we know which of these two magnitudes is greater than
00:55:40.000 --> 00:55:47.000
the other s one and s zero which of these two magnitudes would you think is greater
00:55:47.000 --> 00:55:50.000
than the other or can't we just tell
00:56:00.000 --> 00:56:08.000
there is an answer coming yes s zero is greater and why can you also type in why s zero is greater
00:56:17.000 --> 00:56:36.000
that's the answer and the answer is approximately correct because we have fewer parameters to modify
00:56:36.000 --> 00:56:42.000
it's similar to r square is what you say yes i think you mean exactly the right thing see
00:56:42.000 --> 00:56:53.000
the point is whatever we achieve for s zero in the restricted regression we could also achieve
00:56:53.000 --> 00:57:01.000
in the unrestricted regression so we can always of course set the beta zero part of our parameter
00:57:02.000 --> 00:57:10.000
oh let's put it this way when we minimize s one we can always pick the same values for beta zero
00:57:11.000 --> 00:57:19.000
as we have done when we minimized s zero so we can always pick the beta zero tilde values here
00:57:19.000 --> 00:57:26.000
and have all the other the beta one values equal to zero so for s one it is very easy to achieve
00:57:26.000 --> 00:57:35.000
at least as little as the sum s zero but obviously we then have more degrees of freedom because we
00:57:35.000 --> 00:57:45.000
can also vary the coefficients of beta one so we will be able to minimize further than just with
00:57:45.000 --> 00:57:52.000
s zero so that was the correct answer we will always have that s zero is greater than or equal
00:57:52.000 --> 00:57:57.000
to s one actually in fact usually we will have s zero is strictly greater than s one and we've
00:57:57.000 --> 00:58:05.000
just answered the question why is that now this is trivial that s zero is greater than or equal to
00:58:05.000 --> 00:58:13.000
s one the interesting question is whether the difference between s zero and s one is significant
00:58:14.000 --> 00:58:21.000
is significantly different from zero and this is what the f test tests
00:58:22.000 --> 00:58:29.000
and here's how the f test is constructed i'm trying to give you an intuitive explanation of
00:58:29.000 --> 00:58:37.000
that so first look at the denominator of this ratio here look at s one over n minus k what is
00:58:37.000 --> 00:58:47.000
s one over n minus k well s one is the sum of squared residuals in the unrestricted regression
00:58:49.000 --> 00:58:54.000
forget for the time being about that it is the unrestricted regression but think of it just as
00:58:54.000 --> 00:59:01.000
the sum of squared residuals which we have achieved now when we divide
00:59:01.000 --> 00:59:11.000
this by n minus k then essentially what we compute here is the average squared residual the size of
00:59:11.000 --> 00:59:18.000
the average squared residual which we have right you have to read this in such a way that you
00:59:18.000 --> 00:59:26.000
say well this is essentially what given this type of data is normal to obtain for a squared
00:59:26.000 --> 00:59:36.000
residual this magnitude here expresses how big on average a squared residual should be
00:59:37.000 --> 00:59:41.000
the average however is not taken with reference to all observations but with reference to the
00:59:41.000 --> 00:59:48.000
number of degrees of freedom as you see here i won't go into the details why it must be degrees
00:59:48.000 --> 00:59:55.000
of freedom rather than observations but for all practical purposes the difference between n minus
00:59:55.000 --> 01:00:02.000
k and n is usually not so big right so mostly we have many more observations than we have regressors
01:00:02.000 --> 01:00:07.000
that should be the case so whether we divide by n minus k or by n is not so important but if you
01:00:07.000 --> 01:00:17.000
want to phrase it correctly then you would say this is the average square of the residuals which
01:00:17.000 --> 01:00:30.000
we receive per degree of freedom okay now what is this here in the numerator of this ratio here
01:00:30.000 --> 01:00:40.000
s zero minus s one over j we have the increase in squared residuals the increase due to
01:00:42.000 --> 01:00:52.000
restricting the regression the increase due to imposing j restrictions so we lose actually j
01:00:52.000 --> 01:01:03.000
degrees of freedom because we restrict j of our regressors to have no effect so
01:01:03.000 --> 01:01:08.000
this is also about degrees of freedom here j is degrees of freedom and n minus k is degrees of
01:01:08.000 --> 01:01:19.000
freedom and what we compute here is basically how big the average increase per degree of freedom
01:01:19.000 --> 01:01:27.000
is when we impose restrictions right so this here is an increase in squared
01:01:30.000 --> 01:01:39.000
residuals an increase in squared residuals per degree of freedom relative to the normal level
01:01:39.000 --> 01:01:48.000
of squared residuals per degree of freedom now if this ratio here is large then we would say
01:01:48.000 --> 01:01:54.000
that probably these restrictions were not so appropriate this would mean that we have
01:01:55.000 --> 01:02:05.000
lots of additional unexplained errors in our regression so if the numerator here
01:02:06.000 --> 01:02:12.000
of this f ratio is greater than the denominator here or much greater than the denominator then
01:02:12.000 --> 01:02:20.000
we would expect that this is a significant f statistic note by the way that s one over n
01:02:20.000 --> 01:02:24.000
minus k is a familiar expression because it's nothing else than the unbiased variance
01:02:20.000 --> 01:02:24.000
estimator sigma u squared hat right but in order to understand the workings of the f statistic i think
01:02:31.000 --> 01:02:39.000
it is more useful to write it this way well what can be shown is that this test statistic f which
01:02:39.000 --> 01:02:46.000
i have now tried to explain to you on an intuitive basis that this follows a well-defined distribution
01:02:46.000 --> 01:02:53.000
if the variables which we consider here have normal distributions and if the u's are normally
01:02:53.000 --> 01:03:01.000
distributed then the sum of squared u's is always distributed as a chi-square distribution and the
01:03:01.000 --> 01:03:08.000
ratio of two independent chi-square variables each divided by its degrees of freedom is distributed like an f distribution so this
01:03:08.000 --> 01:03:14.000
distribution is well known it has been tabulated and we can just look at the critical values of an
01:03:14.000 --> 01:03:24.000
f distribution with j and n minus k degrees of freedom respectively right so this small f here follows
01:03:24.000 --> 01:03:33.000
a standard capital f distribution with j degrees of freedom in the numerator and n minus k degrees
01:03:33.000 --> 01:03:40.000
of freedom in the denominator of the f statistic and as i have already said we reject the null
01:03:40.000 --> 01:03:49.000
hypothesis when the small f is too large so in this case we just accept h one right the alternative
01:03:49.000 --> 01:03:55.000
hypothesis and the alternative hypothesis is of course that at least one of the coefficients
01:03:55.000 --> 01:04:02.000
which we have tested for being equal to zero is not equal to zero so at least one of the beta j's
01:04:02.000 --> 01:04:07.000
must be different from zero this would be the alternative hypothesis
01:04:07.000 --> 01:04:14.000
which we would have to accept if the value of the test statistic small f is too large
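The mechanics just described can be sketched in a few lines of Python; this is my own illustrative example on simulated data (not the course data set), with hypothetical parameter choices: run the unrestricted regression to get S1, rerun it without the j regressors under test to get S0, and form the f ratio.

```python
import numpy as np

rng = np.random.default_rng(42)

n, k, j = 100, 5, 2               # j = number of restrictions (last 2 coefficients)
beta = np.array([1.0, 0.5, -0.5, 0.0, 0.0])  # H0: beta_4 = beta_5 = 0 is true here
X = rng.standard_normal((n, k))
y = X @ beta + 0.5 * rng.standard_normal(n)

def ssr(X, y):
    """Sum of squared residuals of an OLS regression of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    return resid @ resid

S1 = ssr(X, y)             # unrestricted regression: all k regressors
S0 = ssr(X[:, :k - j], y)  # restricted regression: drop the j regressors under test

# f follows an F(j, n-k) distribution if H0 and the normality assumption hold
f = ((S0 - S1) / j) / (S1 / (n - k))
print(f"S0 = {S0:.3f}, S1 = {S1:.3f}, f = {f:.3f}")
```

One would then compare f with the critical value of the F distribution with j and n minus k degrees of freedom (roughly 3.1 for j = 2 and about 95 denominator degrees of freedom at the 5% level) and reject the null when f exceeds it. Note that S0 can never be smaller than S1, exactly as argued above.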
01:04:16.000 --> 01:04:21.000
now why do i explain this in the first place i explain this to show you what standard
01:04:21.000 --> 01:04:27.000
regression output informs you about because in standard regression output from computer
01:04:27.000 --> 01:04:35.000
software you always get a report on the f statistic and you may wonder what exactly is
01:04:35.000 --> 01:04:42.000
the f statistic that is being tested well the f statistic which is routinely tested by commercial
01:04:42.000 --> 01:04:50.000
software is the hypothesis that all regression coefficients except the constant
01:04:51.000 --> 01:04:57.000
are equal to zero right so if the constant term which is usually included in the regression is
01:04:57.000 --> 01:05:03.000
the first regressor which is also some kind of standard then the null hypothesis is that beta
01:05:03.000 --> 01:05:11.000
two beta three beta four up to beta k are all equal to zero this is what the f statistic tests
01:05:11.000 --> 01:05:17.000
in any regression which you run with a commercial software mostly this is not a very interesting
01:05:17.000 --> 01:05:25.000
statistic because you would have a very badly specified regression if indeed all your coefficients
01:05:25.000 --> 01:05:31.000
are actually equal to zero all coefficients except the constant this doesn't happen very often
01:05:31.000 --> 01:05:36.000
right so most people don't really pay much attention to this f statistic it is in fact not
01:05:36.000 --> 01:05:44.000
very informative but the principle of how f tests are carried out is very important because you
01:05:44.000 --> 01:05:52.000
often have hypotheses which relate to more than one regression coefficient so very often you run
01:05:52.000 --> 01:06:00.000
your own f tests and how you have to calculate the sum of squared residuals then in the unrestricted
01:06:00.000 --> 01:06:05.000
estimation and the restricted estimation depends on what kind of hypothesis you want to test
01:06:06.000 --> 01:06:12.000
commercial software programmers don't know what you want to test but some software allows you to
01:06:12.000 --> 01:06:18.000
specify your hypothesis right away and then computes the f statistic for you like
01:06:18.000 --> 01:06:24.000
if you test this for instance and then you get the appropriate p value without having to bother
01:06:24.000 --> 01:06:30.000
about all these kind of sums of squares and correction for degrees of freedom and so forth
01:06:30.000 --> 01:06:34.000
but the standard output as i say is what you get here when it writes under
01:06:36.000 --> 01:06:43.000
regression results f statistic for our sample regressions here log private consumption explained
01:06:43.000 --> 01:06:52.000
by a constant and log income we have a huge f statistic as you see 1827.9 and the probability
01:06:52.000 --> 01:06:57.000
of the f statistic is zero which is what you usually find in a well-specified regression
01:06:57.000 --> 01:07:03.000
so we can very clearly reject the hypothesis that
01:07:04.000 --> 01:07:10.000
well actually in this case just this coefficient here is equal to zero because the f statistic in
01:07:10.000 --> 01:07:17.000
this case just tests the hypothesis that the coefficient of log income is equal to zero
01:07:17.000 --> 01:07:25.000
as i have told you the coefficient on the constant is not being included in the hypothesis of the f
01:07:25.000 --> 01:07:31.000
statistic so in some way for a regression with just two regressors one constant and one additional
01:07:31.000 --> 01:07:38.000
regressor the f statistic is kind of ridiculous kind of redundant i actually wanted to say
01:07:39.000 --> 01:07:45.000
because we have the same information already here in the t statistic and in fact the t statistic is
01:07:45.000 --> 01:07:51.000
closely related to the f statistic basically the t statistic is the square root of the f.
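That relation is easy to check numerically. In this hypothetical Python setup (my own sketch, not the course code), testing a single coefficient with the F machinery of restricted and unrestricted sums of squared residuals reproduces the squared t statistic exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 100, 3
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(n)

# unrestricted regression and the t statistic for the last coefficient
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u_hat = y - X @ b
S1 = u_hat @ u_hat
s2 = S1 / (n - k)
t = b[2] / np.sqrt(s2 * XtX_inv[2, 2])

# restricted regression (drop the last column) and the F statistic with j = 1
X0 = X[:, :2]
b0 = np.linalg.solve(X0.T @ X0, X0.T @ y)
r0 = y - X0 @ b0
S0 = r0 @ r0
f = ((S0 - S1) / 1) / (S1 / (n - k))

print(t**2, f)   # the two numbers coincide up to floating-point rounding
```

This is why, with a single restriction, the t test and the F test always deliver the same p-value.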
01:07:53.000 --> 01:08:01.000
All right and then again with Gauss, MATLAB, R or whatever you can try as an exercise to reproduce
01:08:01.000 --> 01:08:06.000
the f statistic for this benchmark regression you have the data on log consumption of income
01:08:06.000 --> 01:08:12.000
and you can just reproduce the regression results and compute the f statistic in the p-value menu.
01:08:16.000 --> 01:08:20.000
This is it for hypothesis testing, do you have any questions?
01:08:25.000 --> 01:08:28.000
I don't see any indication, no hands rising.
01:08:28.000 --> 01:08:37.000
Okay then we move on with 3.5 where we now talk about asymptotic properties. I spoke about that
01:08:37.000 --> 01:08:45.000
already at the beginning of today's lecture that we will now try to look at different
01:08:47.000 --> 01:08:54.000
properties of the estimators than the ones we have studied so far so under the aspect of
01:08:54.000 --> 01:09:02.000
optimality we have looked at the best linear unbiased estimator and we will now move away
01:09:02.000 --> 01:09:09.000
from the requirement of unbiasedness and rather move to a requirement which is asymptotic in
01:09:09.000 --> 01:09:17.000
nature namely consistency of an estimator. And the reason for doing that is that assumption
01:09:17.000 --> 01:09:25.000
A2 is very often not satisfied in reality or there are good reasons to assume that assumption
01:09:25.000 --> 01:09:34.000
A2 is not satisfied in reality, it is an assumption which is actually too strong for many data sets.
01:09:34.000 --> 01:09:38.000
It's a nice assumption, we can derive nice results with it, among other things that
01:09:39.000 --> 01:09:46.000
the least squares estimator is BLUE, but in many, many settings it is simply not true that
01:09:46.000 --> 01:09:55.000
assumption A2 holds so we have to suppose that the least squares estimator will be biased in
01:09:55.000 --> 01:10:01.000
finite samples and if this is the case then the question is does this mean that we can forget
01:10:01.000 --> 01:10:10.000
about the least squares estimator altogether or are there other properties which still make the
01:10:10.000 --> 01:10:17.000
least squares estimator desirable and yes there are there is this property of consistency which I
01:10:17.000 --> 01:10:26.000
will introduce you to now and where I will explain how we can replace the strong assumption A2 by an
01:10:26.000 --> 01:10:32.000
assumption which is much weaker or actually there are different assumptions which we can use which
01:10:32.000 --> 01:10:39.000
are all much weaker than A2 and we can use them and they would be sufficient to ensure the
01:10:39.000 --> 01:10:50.000
consistency of the estimator. Now what actually is the problem we deal with? Let's go back to the
01:10:50.000 --> 01:10:54.000
least squares estimator in the way in which we have handled it already: beta hat is x prime x
01:10:50.000 --> 01:10:54.000
inverse times x prime y, and you know we can rewrite this form here to get beta plus x prime x
01:10:54.000 --> 01:11:02.000
inverse times x prime u. Now, this far we moved with the unbiasedness property: if x prime x
01:11:02.000 --> 01:11:12.000
inverse times x prime u, or basically if the x matrix is strictly exogenous, then we know that the expected
01:11:21.000 --> 01:11:27.000
value of beta hat will be equal to beta plus the expectation of this term and the expectation of
01:11:27.000 --> 01:11:36.000
this term is zero due to the strong exogeneity of the x matrix but as I said mostly the x matrix is
01:11:36.000 --> 01:11:42.000
not strongly exogenous at least for many many data sets it will not be true that the x matrix is
01:11:42.000 --> 01:11:51.000
strictly exogenous to u; rather it will be the case that some errors u may have an influence
01:11:51.000 --> 01:11:59.000
on some of the regressors x and then there is no strict exogeneity anymore. So if this is true then
01:11:59.000 --> 01:12:08.000
we see that beta hat won't be unbiased anymore but rather we have this term here as the bias or
01:12:08.000 --> 01:12:17.000
actually the expectation of this term would be the bias which we have to account for
01:12:17.000 --> 01:12:25.000
and all we can do then is that we say well perhaps when the number of observations increases
01:12:26.000 --> 01:12:31.000
perhaps this bias here becomes smaller and smaller and this is what we aim for now that
01:12:31.000 --> 01:12:40.000
we reduce the size of this bias here via large sample properties, by hoping that if we have enough
01:12:40.000 --> 01:12:46.000
data that this bias here will become less and less important and in order to see how this
01:12:46.000 --> 01:12:51.000
works we will rewrite this expression here now in a different way in particular
01:12:51.000 --> 01:13:01.000
where we rewrite these matrix products x prime x and x prime u in this form here. I start with
01:13:01.000 --> 01:13:08.000
a second term here because it's perhaps easier to understand recall that earlier I had introduced
01:13:08.000 --> 01:13:20.000
the symbol xi as the explanatory variables for the i-th observation right we have n observations
01:13:20.000 --> 01:13:27.000
i is running from one to n let's say we observe n people or n regions or something like this
01:13:27.000 --> 01:13:37.000
so for each person or each region we have a number of explanatory variables and in xi we just collect
01:13:37.000 --> 01:13:49.000
the explanatory variables for observation i. Now this x prime here is a matrix which is of
01:13:49.000 --> 01:14:02.000
type k by n, right, so in each column of the matrix x prime we have a k by one vector, and this xi here is
01:14:02.000 --> 01:14:12.000
just one representative column of this x prime matrix here so x prime u actually is nothing else
01:14:12.000 --> 01:14:21.000
but all those xi vectors which make up matrix x prime here x prime u is nothing else but all
01:14:21.000 --> 01:14:30.000
those xi vectors multiplied by the respective scalar ui, by this respective error term, which
01:14:30.000 --> 01:14:36.000
of course we do not observe but we can write down the expressions so x prime u can be written
01:14:36.000 --> 01:14:45.000
as the sum over all observations of those column vectors xi which contain the regressors for the
01:14:46.000 --> 01:14:54.000
observation times the shock for the observation and the same thing i can do here with x prime x
01:14:54.000 --> 01:15:01.000
just that then the sum here is xi times xi prime, but same argument, and i have to take
01:15:01.000 --> 01:15:08.000
the inverse of that because as you see this x prime x inverse here now in the next line i
01:15:08.000 --> 01:15:15.000
do something very trivial actually because i multiply both things by one over n so both
01:15:15.000 --> 01:15:19.000
sums here are being multiplied by one over n, and you may ask, well, why can i just
01:15:20.000 --> 01:15:27.000
multiply by one over n in both terms and still claim that this is equal to the previous line
01:15:27.000 --> 01:15:34.000
well for the simple reason that here we have the inverse right this is the inverse of one over n
01:15:34.000 --> 01:15:42.000
the inverse of one over n is n and here i divide by n so n divided by n is equal to one so those
01:15:42.000 --> 01:15:50.000
two terms one over n here and one over n inversely here they cancel right so nothing has actually
01:15:50.000 --> 01:15:57.000
changed now this equality is correct why do i do this because i want to emphasize that i take an
01:15:57.000 --> 01:16:05.000
average here and i take an average here right what i do here is that i sum over all the observations
01:16:05.000 --> 01:16:15.000
from one to n of these terms xi xi prime or here i sum over all the xis times ui but always over n
01:16:15.000 --> 01:16:22.000
observations which i have and then both cases i divide by n so i get just the average across
01:16:22.000 --> 01:16:31.000
all individuals or across all regions for these products xi ui or xi xi prime and then the question
01:16:31.000 --> 01:16:38.000
is, if i do this, if i compute these averages, whether these averages converge to anything
01:16:38.000 --> 01:16:44.000
as the number of observations becomes greater and greater right so the questions which we have is
01:16:44.000 --> 01:16:54.000
where does this all converge to right when when i have this matrix here and this vector here
01:16:54.000 --> 01:17:03.000
now what would we hope for what we would hope for is that actually this converges to zero
01:17:03.000 --> 01:17:09.000
because if this converges to zero then we would know that, at least for large numbers of
01:17:03.000 --> 01:17:09.000
observations, beta hat would converge to the true parameter beta, and this is what we will later
01:17:09.000 --> 01:17:16.000
call consistency. So the
01:17:24.000 --> 01:17:30.000
question is do we have reasons to believe or can we make reasonable assumptions which would imply
01:17:30.000 --> 01:17:38.000
that what we add here, this bias, is equal to zero. Now let's look at the terms
01:17:38.000 --> 01:17:48.000
which we have in here this here is xi multiplied by xi prime right so these are just regressors
01:17:48.000 --> 01:17:55.000
multiplied by themselves but taking transposes of them we know nothing about those regressors
01:17:55.000 --> 01:18:03.000
but we can assume that these are regressors which are different from zero right so multiplying
01:18:03.000 --> 01:18:10.000
those regressors by the same vector transposed will give us at least a number of squared
01:18:03.000 --> 01:18:10.000
terms, and squares are typically positive; when those numbers here are different from zero,
01:18:10.000 --> 01:18:19.000
the squares will all be positive, so actually we have no hope that this term here will converge
01:18:26.000 --> 01:18:33.000
to zero right even if we take the average value it may converge to something there may be a well
01:18:33.000 --> 01:18:38.000
defined average for for all the observations here but it will certainly not converge to zero
01:18:39.000 --> 01:18:45.000
how about this term here? well, the chances are better, because i have here
01:18:46.000 --> 01:18:51.000
regressors xi for all the persons and i multiply them by
01:18:51.000 --> 01:19:01.000
shocks, by u's which are centered around zero. Now those xi's may sometimes be greater
01:19:01.000 --> 01:19:06.000
or may sometimes be smaller but the ui's would sometimes be positive and they will sometimes
01:19:06.000 --> 01:19:16.000
be negative, and on average they will be zero. So when i compute a big sum of these xi's, all
01:19:16.000 --> 01:19:23.000
weighted by shocks which are sometimes positive sometimes negative and on average they are zero
01:19:23.000 --> 01:19:29.000
it's actually quite reasonable to expect that this average here will converge to zero
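This heuristic can be checked in a small simulation. A sketch in Python with a single scalar regressor; the distributions and numbers are illustrative assumptions, not data from the lecture: the average of xi times ui shrinks toward zero as n grows, while the average of xi squared settles at a nonzero limit.

```python
import random

random.seed(1)

def averages(n):
    # Draw n regressor values and n zero-mean shocks, then form the two
    # sample averages appearing in beta hat minus beta (scalar-regressor case).
    xs = [random.gauss(1.0, 1.0) for _ in range(n)]    # regressors, mean 1
    us = [random.gauss(0.0, 1.0) for _ in range(n)]    # shocks, centered at zero
    avg_xx = sum(x * x for x in xs) / n                # (1/n) sum of xi * xi
    avg_xu = sum(x * u for x, u in zip(xs, us)) / n    # (1/n) sum of xi * ui
    return avg_xx, avg_xu

for n in (100, 10_000, 1_000_000):
    avg_xx, avg_xu = averages(n)
    print(n, round(avg_xx, 4), round(avg_xu, 4))
# avg_xx settles near E[x^2] = 2, a nonzero limit, while avg_xu shrinks
# toward 0, so the bias term avg_xu / avg_xx vanishes as n grows.
```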
01:19:30.000 --> 01:19:34.000
and this suffices so if this thing here converges to zero
01:19:36.000 --> 01:19:43.000
then zero times something is zero unless the something which i have in here converges to
01:19:43.000 --> 01:19:49.000
infinity which is of course also a possibility but let's suppose that since this is just an
01:19:49.000 --> 01:19:54.000
average over all regressors that this is a well-defined average which does not explode
01:19:54.000 --> 01:20:04.000
but rather converges to something finite. This is what we aim for, and to
01:20:04.000 --> 01:20:11.000
handle this properly we have to use some properties for convergence of random variables.
01:20:12.000 --> 01:20:17.000
the first definition we have already seen, i just repeat it here for your convenience; it
01:20:17.000 --> 01:20:23.000
defines the plim convergence which i have already introduced right so recall a series of random
01:20:23.000 --> 01:20:32.000
variables xn converges in probability to a constant x if it is true that for every number epsilon greater
01:20:32.000 --> 01:20:39.000
than zero, and we think of this epsilon as being something very small, the probability for the
01:20:39.000 --> 01:20:50.000
event that xn deviates from x by more than epsilon converges to zero, right. So this is the plim
01:20:51.000 --> 01:20:56.000
convergence which we already know shorthand notation for this is that plim xn is equal to
01:20:56.000 --> 01:21:00.000
x, right, and the plim is the limit in probability, so the limiting probability.
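The definition can be illustrated numerically. A Python sketch with illustrative choices (xn is the mean of n uniform draws, x = 0.5, epsilon = 0.05, none of which are from the slides): the probability of an epsilon-deviation is estimated by repeated sampling and drops as n grows.

```python
import random

random.seed(42)

def deviation_prob(n, eps=0.05, reps=2000):
    # Monte Carlo estimate of P(|Xbar_n - 0.5| > eps), where Xbar_n is the
    # mean of n Uniform(0, 1) draws; plim Xbar_n = 0.5 says this goes to 0.
    hits = 0
    for _ in range(reps):
        xbar = sum(random.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            hits += 1
    return hits / reps

for n in (10, 100, 1000):
    print(n, deviation_prob(n))
# the estimated deviation probability drops toward zero as n grows
```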
01:21:03.000 --> 01:21:08.000
well this we have already seen and now here's the second definition the random variable xn
01:21:08.000 --> 01:21:16.000
is a consistent estimator for some constant x if and only if the plim of xn is equal to x
01:21:17.000 --> 01:21:23.000
right so consistency is a property which is defined in terms of plim uh convergence and
01:21:23.000 --> 01:21:30.000
this is what we aim for when we look at this thing here we want to have the property that
01:21:31.000 --> 01:21:36.000
with increasing number of observations with the numbers of observations actually approaching
01:21:36.000 --> 01:21:45.000
infinity the probability that this thing here is different from zero or is this greater than some
01:21:45.000 --> 01:21:52.000
very small epsilon, that this probability is zero or converges to zero, and we want to have the
01:21:52.000 --> 01:21:59.000
fact that this thing here has a plim which is finite, right. These are the two things we aim for.
01:21:59.000 --> 01:22:10.000
now obviously if we have xn defined as one over n times the sum of the xi xi primes, this is a series of random
01:22:10.000 --> 01:22:19.000
variables for different values of n so this xn here i may evaluate for n equal to one
01:22:19.000 --> 01:22:27.000
then it's just x1 times x1 prime, and then we may evaluate it for n equal to two, that would
01:22:27.000 --> 01:22:38.000
be one over two so 0.5 times the sum of x1 x1 prime and x2 x2 prime so this would be the second
01:22:38.000 --> 01:22:43.000
value in this series of random variables here and i can do this for any value of n
01:22:43.000 --> 01:22:51.000
so xn is actually a series of random variables that is defined that way where i let n grow over
01:22:51.000 --> 01:22:58.000
time i just let n grow and then the question is whether this xn converges in probability to
01:22:58.000 --> 01:23:08.000
some fixed limit x right there's good reason to hope for that because we have already encountered
01:23:08.000 --> 01:23:15.000
the law of large numbers when we were talking about probability. We know the sample average of
01:23:16.000 --> 01:23:24.000
an independently and identically distributed random variable z converges in probability to the
01:23:24.000 --> 01:23:31.000
expectation of z when the number of observations n approaches infinity so this law of large numbers
01:23:31.000 --> 01:23:40.000
we have already had and we know that plim of zn bar of the sample average is equal to the expectation
01:23:40.000 --> 01:23:45.000
of z so this could be helpful right all right at least it smells like it is helpful
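To make the object concrete, a Python sketch of the series xn itself, using iid draws, which, as discussed next, is exactly the assumption in question; with two regressors, a constant and a standard normal variable (an illustrative choice), the limit matrix is the identity.

```python
import random

random.seed(5)

def xn_matrix(n):
    # Xn = (1/n) * sum of xi xi' for k = 2 regressors, xi = (1, zi) with
    # zi standard normal, so that E[xi xi'] is the 2x2 identity matrix.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        xi = (1.0, random.gauss(0.0, 1.0))
        for r in range(2):
            for c in range(2):
                s[r][c] += xi[r] * xi[c]
    return [[v / n for v in row] for row in s]

for n in (100, 100_000):
    print(n, [[round(v, 3) for v in row] for row in xn_matrix(n)])
# each entry of Xn approaches the corresponding entry of the identity matrix
```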
01:23:47.000 --> 01:23:54.000
can we apply the law of large numbers, which i have re-stated here for your convenience,
01:23:55.000 --> 01:23:58.000
to this series xn
01:24:02.000 --> 01:24:04.000
if anybody knows the answer please raise your hand
01:24:04.000 --> 01:24:13.000
You see, this is also an average, right: one over n times the sum over xi xi prime.
01:24:20.000 --> 01:24:27.000
can we apply this law of large numbers here to this expression here
01:24:28.000 --> 01:24:41.000
well this may be too difficult but unfortunately we cannot at least not directly because here in
01:24:41.000 --> 01:24:49.000
the law of large numbers the zi's are assumed to be independently and identically distributed
01:24:49.000 --> 01:24:57.000
random variables but we have no such assumption that the xi's here are independently and identically
01:24:57.000 --> 01:25:04.000
distributed random variables so this assumption we could make but it is not an innocuous assumption
01:25:04.000 --> 01:25:13.000
it may easily be violated in empirical data sets so actually what we need and what i cannot
01:25:14.000 --> 01:25:21.000
introduce exactly in this lecture but i will inform you that such results exist what we need
01:25:21.000 --> 01:25:28.000
are weaker forms of the law of large numbers which do not assume that the zi's in
01:25:29.000 --> 01:25:36.000
the law of large numbers are independent and identically distributed, and there are actually
01:25:36.000 --> 01:25:43.000
in the statistics literature such weaker forms for the law of large numbers this is the form
01:25:43.000 --> 01:25:50.000
which is usually used when the law of large numbers is introduced in introductory statistics
01:25:50.000 --> 01:25:57.000
or econometrics lectures but on a more advanced level there are weaker forms of the law of large
01:25:57.000 --> 01:26:06.000
numbers and it turns out that it is not necessary to have iid observations zi for this average here
01:26:06.000 --> 01:26:16.000
but if there is some dependence between zi and zj this may still allow for the law of large numbers
01:26:16.000 --> 01:26:23.000
to hold provided that this dependence is not too big basically provided that the dependence
01:26:23.000 --> 01:26:32.000
fades out when zi is distant from zj, right, so there are technical conditions
01:26:32.000 --> 01:26:40.000
one can formulate to get weak dependence and essentially they imply that the correlation
01:26:40.000 --> 01:26:51.000
between zi and zi minus n converges to zero when zi and zi minus n are quite far apart, that is,
01:26:51.000 --> 01:27:00.000
when this n here goes to infinity, something like that. Okay, the conditions are rather technical, so
01:27:00.000 --> 01:27:11.000
i don't state them here and we omit the details, but rather immediately state the weak law of large
01:27:11.000 --> 01:27:17.000
numbers, which i call WLLN. Under the proper condition of weak dependence, which i don't spell
01:27:11.000 --> 01:27:17.000
out here, the sample average zn bar of a series of random variables zi with the same expected value,
01:27:26.000 --> 01:27:33.000
so not anymore requiring the same and independent distribution but just the same expected value
01:27:34.000 --> 01:27:41.000
converges in probability to exactly this expected value, which i define as mu z here. So we would have,
01:27:41.000 --> 01:27:48.000
as in the law of large numbers, that the plim of zn bar is equal to mu z under these conditions of
01:27:48.000 --> 01:27:56.000
weak dependence now how can we ensure that these conditions of weak dependence actually
01:27:56.000 --> 01:28:04.000
hold one way is to look at our set of assumptions and there are now three different assumptions
01:28:04.000 --> 01:28:10.000
which i introduce you to which all have the same effect that these conditions of weak dependence
01:28:10.000 --> 01:28:19.000
hold. One assumption is assumption 6.1, which we will sometimes use; this says something about
01:28:19.000 --> 01:28:27.000
the regressor being predetermined. Predeterminedness is much weaker than the strict exogeneity of the
01:28:27.000 --> 01:28:34.000
x matrix which we have assumed in assumption A2. Predeterminedness says that for each
01:28:34.000 --> 01:28:43.000
observation i we have the property that e of ui given xi is the same thing as e of ui as the
01:28:43.000 --> 01:28:52.000
expectation of ui so only for each single observation is it necessary that the conditional
01:28:52.000 --> 01:28:59.000
expectation e of ui given xi equals the unconditional expectation e of ui which is typically zero
01:29:00.000 --> 01:29:07.000
if this assumption here holds then the regressors are already predetermined, and that is already a
01:29:07.000 --> 01:29:14.000
helpful assumption. This assumption is particularly relevant when you work with time series data,
01:29:15.000 --> 01:29:22.000
because in time series data it is often the case that you have certain regressors which, well, are
01:29:15.000 --> 01:29:22.000
recorded in time, so that you can replace the index i by t and say x t is the value of x i
01:29:22.000 --> 01:29:33.000
observe in period t, and then you want that the shock which hits the economy in period t, given
01:29:33.000 --> 01:29:41.000
x t, has the same expectation as the unconditional shock, basically saying that the regressor x t
01:29:42.000 --> 01:29:54.000
does not respond to the shock ut in this period, but the regressor x t may actually have responded
01:29:54.000 --> 01:30:03.000
to shocks in previous periods. This is quite common in the analysis of time series data: if a
01:30:03.000 --> 01:30:09.000
shock, let's say the corona shock, hits our economy in 2020, it may be that certain variables like,
01:30:18.000 --> 01:30:27.000
say capital stock are not yet affected in 2020 by the corona shock but in 2021 or 2022 or 2023
01:30:27.000 --> 01:30:34.000
they will also be affected by this shock however this would not do any harm here in this case we
01:30:34.000 --> 01:30:43.000
would still have the predeterminedness condition: it suffices if, not full independence, but the
01:30:34.000 --> 01:30:43.000
conditional expectation of the u i given x i is
01:30:43.000 --> 01:30:48.000
equal to the unconditional expectation of the u i. And the other two
01:30:55.000 --> 01:31:08.000
assumptions are similar. Assumption 6.2 would say that the covariance between x i and u i
01:31:08.000 --> 01:31:13.000
is zero that's a slightly different assumption from the assumption of predeterminedness
01:31:13.000 --> 01:31:18.000
we then call regressor and error term contemporaneously uncorrelated,
01:31:18.000 --> 01:31:27.000
and a third assumption, which is stronger and implies assumptions 6.1 and 6.2, would say that for
01:31:27.000 --> 01:31:35.000
each observation i x i and u i are just independent well you know that independence implies zero
01:31:35.000 --> 01:31:41.000
correlation, so very clearly 6.3 implies 6.2, and independence also implies that the conditional
01:31:35.000 --> 01:31:41.000
expectation is equal to the unconditional expectation, so very clearly 6.3 is the strongest of the three
01:31:47.000 --> 01:31:51.000
assumptions we look actually for the weakest of the three assumptions which still guarantees
01:31:51.000 --> 01:32:00.000
us consistency, and we will do this next week, where we then continue the lecture with the
01:31:51.000 --> 01:32:00.000
assumptions A7 and A8, which basically state that the plims of the two matrices i've been talking
01:32:00.000 --> 01:32:06.000
about a couple of minutes ago, so the average over the x i x i primes and the average over the x i
01:32:06.000 --> 01:32:13.000
u i's, are such that the average of the x i x i primes converges just to some finite
01:32:13.000 --> 01:32:20.000
invertible matrix, and it doesn't play a role what matrix that is, as long as it is invertible
01:32:20.000 --> 01:32:26.000
and finite, so it is not allowed that it explodes and diverges to infinity,
01:32:27.000 --> 01:32:32.000
and that the average of the x i u i converges to zero. So that would then already help
01:32:41.000 --> 01:32:50.000
and allow us to establish the property of consistency. Sorry for having taken a little
01:32:51.000 --> 01:32:55.000
longer, but i found it better to take some time to explain these rather difficult issues a little
01:32:55.000 --> 01:33:03.000
more in detail; i hope that you've benefited from that. Are there any remaining questions? If so, please
01:33:03.000 --> 01:33:17.000
raise your hand. That is not the case, so goodbye for today and until Monday.
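As a closing exercise sketch, in Python and with an illustrative model that is not the lecture's consumption example: regressing y_t on its own lag gives a regressor that is predetermined but not strictly exogenous, so least squares is biased in small samples yet the estimate settles near the true coefficient as n grows, which is the consistency property to be formalized next week.

```python
import random

random.seed(123)

def ols_ar1(n, beta=0.5):
    # Regress y_t on its own lag: the regressor y_{t-1} is predetermined
    # (it depends only on past shocks, not on u_t), but it is not strictly
    # exogenous, so least squares is biased in finite samples yet consistent.
    y = 0.0
    sxy = sxx = 0.0
    for _ in range(n):
        x = y                          # regressor: lagged value of y
        y = beta * x + random.gauss(0.0, 1.0)
        sxy += x * y
        sxx += x * x
    return sxy / sxx

for n in (25, 250, 25_000):
    print(n, round(ols_ar1(n), 3))
# with growing n the estimate settles near the true coefficient 0.5
```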