WEBVTT - autoGenerated
00:00:00.000 --> 00:00:09.000
We will talk today about hypothesis testing, or at least this is what we do at the
00:00:09.000 --> 00:00:16.000
start of this lecture, I move through hypothesis testing rather quickly because we have done
00:00:16.000 --> 00:00:21.000
this in principle already in slides 2 of this lecture.
00:00:21.000 --> 00:00:30.000
I just would like to remind you that we have our estimation, our regression, which we
00:00:30.000 --> 00:00:38.000
talk about, here is a variant of it, the regression of the log of private consumption on the log
00:00:38.000 --> 00:00:44.000
of income with cross-section data from various countries, and that I am in the process of
00:00:44.000 --> 00:00:52.000
explaining to you the meaning of the regression output, which we get from regression packages
00:00:52.000 --> 00:01:02.000
like EViews, which I take here as an example; other econometrics programs give similar output,
00:01:02.000 --> 00:01:06.000
and I want you to understand what this means.
00:01:06.000 --> 00:01:11.000
We have covered already a number of the statistics that you see, for instance, on the slides
00:01:11.000 --> 00:01:16.000
here, such as the coefficient estimates and the standard errors; you should
00:01:16.000 --> 00:01:19.000
now know how they are computed, where they come from.
00:01:19.000 --> 00:01:25.000
We've talked last lecture about the R-squared and the adjusted R-squared, you know what
00:01:25.000 --> 00:01:31.000
the standard error of the regression is, and you know, of course, what the sum of squared
00:01:31.000 --> 00:01:36.000
residuals is, or these statistics like the mean of the dependent variable and standard
00:01:36.000 --> 00:01:39.000
deviation of the dependent variable.
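The statistics in this output block can all be reproduced by hand. As a minimal sketch, here is how the coefficient estimates, standard errors, R-squared, and sum of squared residuals would be computed, using synthetic data (not the lecture's actual consumption/income dataset; all variable names and numbers here are made up for illustration):

```python
# Sketch of how the main regression-output statistics are computed.
# Data are synthetic, generated to resemble a log-consumption on
# log-income regression; the true coefficients 0.5 and 0.9 are invented.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2                                 # n observations, k parameters
log_income = rng.normal(10.0, 0.5, n)        # hypothetical regressor
u = rng.normal(0.0, 0.1, n)                  # errors, drawn as normal
log_cons = 0.5 + 0.9 * log_income + u        # hypothetical true relation

X = np.column_stack([np.ones(n), log_income])  # regressor matrix with constant
y = log_cons

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate
resid = y - X @ beta_hat
ssr = resid @ resid                            # sum of squared residuals
sigma2_hat = ssr / (n - k)                     # unbiased variance estimate
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))  # standard errors
r2 = 1.0 - ssr / np.sum((y - y.mean()) ** 2)   # R-squared

print(beta_hat, se, r2)
```

The square root of `sigma2_hat` is what the output labels the standard error of the regression.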
00:01:39.000 --> 00:01:45.000
We will now move to basically this block here, t-statistics and probabilities, when we talk
00:01:45.000 --> 00:01:56.000
about hypothesis testing, which is section 1.4, I think it should be 3.4, but never mind,
00:01:56.000 --> 00:02:02.000
and so the fourth subsection in slides 3.
00:02:03.000 --> 00:02:10.000
Now, as you will recall, we have so far not made any distributional assumption, rather
00:02:10.000 --> 00:02:17.000
we proved the BLUE property of the least squares estimator, that is to say, we proved that
00:02:17.000 --> 00:02:24.000
the least squares estimator is the best in the sense of variance minimal estimator in
00:02:24.000 --> 00:02:29.000
the class of linear and unbiased estimators of our parameter beta.
00:02:29.000 --> 00:02:35.000
Now, this property, the BLUE property, is valid for any distribution of the errors, so we do
00:02:35.000 --> 00:02:42.000
not need to make a particular assumption about the distribution, like, for instance,
00:02:42.000 --> 00:02:45.000
that the errors are normally distributed.
00:02:45.000 --> 00:02:52.000
It is a property which is valid regardless of what distributions the errors follow, and
00:02:52.000 --> 00:02:56.000
therefore it's a very strong property, actually, of the least squares estimator.
00:02:57.000 --> 00:03:02.000
Now, in terms of hypothesis testing, we need more than that.
00:03:02.000 --> 00:03:07.000
We need to make an assumption about the distribution of the errors, and this is what we do in
00:03:07.000 --> 00:03:14.000
assumption A5, which you see in the green box here; we now assume that all the errors
00:03:14.000 --> 00:03:22.000
ui are independent of each other, and that they are normally distributed.
00:03:22.000 --> 00:03:27.000
That is to say, each single ui follows a normal distribution with an expected value
00:03:27.000 --> 00:03:34.000
of zero and a variance of sigma square u, so we have the same and constant variance
00:03:34.000 --> 00:03:36.000
for all the ui's.
00:03:36.000 --> 00:03:42.000
Obviously, these two properties here reflect assumptions we've made already earlier.
00:03:42.000 --> 00:03:47.000
We've made already the assumption that the errors have a mean of zero, this was assumption
00:03:47.000 --> 00:03:54.000
A1, which is thereby implicitly contained in assumption A5 by saying that the normal
00:03:54.000 --> 00:04:01.000
distribution of the ui's has an expected value of zero, and we have assumed that the errors
00:04:01.000 --> 00:04:04.000
are homoscedastic.
00:04:04.000 --> 00:04:10.000
That is to say that they all have the same variance sigma square u, so the variance does
00:04:10.000 --> 00:04:17.000
not vary with index i, but rather all ui's have the same variance.
00:04:17.000 --> 00:04:26.000
This was assumption A3, which we had there, and we also had assumption A4, which said
00:04:26.000 --> 00:04:32.000
that the ui's are all uncorrelated with each other, here it is even a little stronger by
00:04:32.000 --> 00:04:36.000
saying that they should all be independent of each other.
00:04:36.000 --> 00:04:45.000
In some sense, A5 summarizes assumptions A1, A3, and A4, and adds a new property, which
00:04:45.000 --> 00:04:50.000
is that the distribution of the errors is actually normal.
00:04:50.000 --> 00:04:57.000
I write the normality assumption here for a single observation ui, so this is a scalar,
00:04:57.000 --> 00:05:04.000
this scalar ui follows a scalar normal distribution, and I say that this shall be the case for
00:05:04.000 --> 00:05:05.000
all i.
00:05:05.000 --> 00:05:12.000
Here, I write it alternatively as an assumption on the whole vector of the u's, so that's
00:05:12.000 --> 00:05:20.000
a n by 1 vector u here, and this n by 1 vector u follows a vector normal distribution with
00:05:20.000 --> 00:05:27.000
an expected value of 0, so that's also an n by 1 vector of 0's here, and a covariance
00:05:27.000 --> 00:05:33.000
matrix sigma square u, where sigma square u is a scalar, the one from up here, so the
00:05:33.000 --> 00:05:40.000
same symbol, times the identity matrix of type n, so an n-dimensional identity matrix,
00:05:40.000 --> 00:05:46.000
which again summarizes the assumptions A3 and A4.
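For reference, assumption A5 as described here can be written in the two equivalent forms (my transcription of the slide's notation, which may differ in typography from the original):

```latex
% Scalar form: each error is normal, mean zero, common variance sigma_u^2,
% and the u_i are independent across observations:
u_i \sim N(0, \sigma_u^2), \quad i = 1, \dots, n, \quad u_i \text{ independent}
% Vector form: the n-by-1 error vector is multivariate normal,
% with an n-by-1 zero mean vector and covariance matrix sigma_u^2 I_n:
u \sim N\!\left(0_n, \; \sigma_u^2 I_n\right)
```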
00:05:46.000 --> 00:05:52.000
Now we've talked about the normal distribution already, and I told you when I covered this
00:05:52.000 --> 00:05:56.000
in the lecture that one important property of the normal distribution is the fact that
00:05:56.000 --> 00:06:03.000
linear combinations of normally distributed variables are also normally distributed, so
00:06:03.000 --> 00:06:11.000
we will make use of this property now by looking at the least squares estimator.
00:06:11.000 --> 00:06:20.000
As you recall, the estimator is a random variable; this distinguishes it from an estimate, which would just
00:06:20.000 --> 00:06:28.000
be a number, so this is a random variable here, beta hat, which of course also has a distribution.
00:06:28.000 --> 00:06:37.000
We can now conclude from the property that the errors are normally distributed, that
00:06:37.000 --> 00:06:47.000
the least squares estimator is also normally distributed, because we know beta hat is x
00:06:47.000 --> 00:06:55.000
prime x inverse x prime times y; this term here in parentheses is y, and this is equal to
00:06:55.000 --> 00:07:03.000
x beta plus u, so in the way we've done this already many times now, multiplying out this
00:07:03.000 --> 00:07:10.000
expression here, we know beta hat is equal to beta, because x prime x inverse times
00:07:10.000 --> 00:07:15.000
x prime x is the identity matrix, so this gives us just a beta here, the true value
00:07:15.000 --> 00:07:22.000
of the parameter, the unknown true value beta, which we want to estimate, plus x prime x
00:07:22.000 --> 00:07:32.000
inverse times x prime u, so this is sort of the random component of the estimator beta
00:07:32.000 --> 00:07:40.000
hat, and you see now that this is a term which is linear in u; think of the x's as being
00:07:40.000 --> 00:07:48.000
exogenous, possibly non-random, matrices; then we can take the x prime x inverse x prime
00:07:48.000 --> 00:07:56.000
expression here, just as some type of constant by which the u is being multiplied. The u
00:07:56.000 --> 00:08:04.000
appears in this expression just in linear form, so there is nowhere u squared or ui squared in
00:08:04.000 --> 00:08:11.000
this expression, so this is just linear expression which we have here, and since we now know by
00:08:11.000 --> 00:08:20.000
assumption a5 that u is normally distributed, it follows that beta hat is also normally distributed
00:08:20.000 --> 00:08:29.000
with expected value of beta, so beta hat as we know is unbiased, and with an error term here x
00:08:29.000 --> 00:08:37.000
prime x inverse x prime u, whose covariance matrix is sigma square u times x prime x inverse,
00:08:37.000 --> 00:08:46.000
as we have computed this in the last lecture, so if you don't recall, please try to reproduce
00:08:46.000 --> 00:08:54.000
the result. The covariance matrix of this term here is sigma square u times x prime x inverse,
00:08:54.000 --> 00:09:02.000
so summarizing what I've just said, we know that the least squares estimator beta hat follows a
00:09:02.000 --> 00:09:09.000
normal distribution with expected value of beta, so of the true parameter, and covariance matrix
00:09:09.000 --> 00:09:18.000
sigma square u times x prime x inverse. Now let us denote the typical element of x prime x
00:09:18.000 --> 00:09:27.000
inverse by cij, so obviously this is some complicated expression here, which is rather
00:09:27.000 --> 00:09:35.000
difficult to write down in terms of the x observations, that is, the observations we have
00:09:35.000 --> 00:09:41.000
in our regressor matrix, because we have to form the product x prime x here and then take the inverse of that,
00:09:41.000 --> 00:09:48.000
so we just denote the typical element of x prime x inverse by cij, where i is the row index and j
00:09:48.000 --> 00:09:58.000
is the column index. Then this notation here implies that each single coefficient in the
00:09:58.000 --> 00:10:05.000
estimator, or each single component I should say of the estimator, let's say the kth component of
00:10:06.000 --> 00:10:14.000
this vector beta hat here, would be beta hat k, and that this is also normally distributed
00:10:14.000 --> 00:10:23.000
with expectation of beta k, the kth component of the true parameter vector, and then a variance
00:10:23.000 --> 00:10:34.000
of sigma square u times ckk, so that is the kth element on the main diagonal of the x prime x
00:10:34.000 --> 00:10:44.000
inverse matrix. So this here would be the variance of beta hat k, this is the true
00:10:44.000 --> 00:10:53.000
variance of beta hat k, because here we still use sigma square u as if we knew the true value of
00:10:53.000 --> 00:10:59.000
sigma square u. Obviously we usually do not know what the true value of sigma square u is, and then
00:10:59.000 --> 00:11:05.000
we have to estimate what sigma square u is, we talked about the estimator in the last lecture,
00:11:06.000 --> 00:11:12.000
but the true distribution of course relates to the true value of sigma square u. So while we
00:11:12.000 --> 00:11:19.000
write a hat here to denote the estimator, this here is just a real number, it's the true but
00:11:19.000 --> 00:11:27.000
unfortunately unknown value of sigma square u, as this is also the true but unfortunately unknown
00:11:27.000 --> 00:11:34.000
value of beta k, these characterize the distribution and when we want to do hypothesis
00:11:34.000 --> 00:11:40.000
testing we will have to replace this value here by its estimate and this value here also by its
00:11:40.000 --> 00:11:46.000
estimate depending on the question we want to analyze in our hypothesis testing.
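The distributional claim just derived, that beta hat is normal with mean beta and covariance sigma square u times x prime x inverse, can be checked by simulation. A minimal Monte Carlo sketch with synthetic data (all parameter values here are invented for illustration):

```python
# Monte Carlo check that beta_hat ~ N(beta, sigma_u^2 (X'X)^{-1}) when the
# errors satisfy A5. Regressors are held fixed across replications.
import numpy as np

rng = np.random.default_rng(1)
n, sigma_u = 40, 2.0
beta = np.array([1.0, 3.0])                       # invented true parameters
X = np.column_stack([np.ones(n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)
c_kk = np.diag(XtX_inv)                           # the c_kk diagonal elements

draws = []
for _ in range(20000):
    u = rng.normal(0.0, sigma_u, n)               # A5: normal, iid errors
    y = X @ beta + u
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))
draws = np.array(draws)

# empirical mean should be near beta (unbiasedness), and the empirical
# variance of each component near sigma_u^2 * c_kk
print(draws.mean(axis=0), draws.var(axis=0), sigma_u**2 * c_kk)
```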
00:11:49.000 --> 00:11:55.000
Now from the fact that we now know what the distribution of the estimator is, we know that
00:11:55.000 --> 00:12:07.000
the coefficient estimator beta hat k is included in an interval which may be defined as the true
00:12:07.000 --> 00:12:14.000
coefficient, unknown to us, minus 1.96 sigma u times the square root of ckk,
00:12:15.000 --> 00:12:23.000
so that's the standard deviation of course of beta hat k and the same thing with plus 1.96
00:12:25.000 --> 00:12:33.000
times the standard deviation of beta hat k. Okay, so this is almost already a confidence
00:12:33.000 --> 00:12:39.000
interval, it is not yet the confidence interval but we'll derive it from this expression here
00:12:39.000 --> 00:12:49.000
knowing that the estimator is contained in this interval here with the probability of 95 percent.
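This 95 percent statement can also be verified by simulation: with known sigma u, the estimator falls inside the true coefficient plus or minus 1.96 true standard deviations about 95 percent of the time. A sketch with synthetic data (all numbers invented):

```python
# Coverage check: beta_hat_k lies in beta_k +/- 1.96 * sigma_u * sqrt(c_kk)
# with probability 0.95 under A5, when the true sigma_u is known.
import numpy as np

rng = np.random.default_rng(2)
n, sigma_u, beta_k = 30, 1.0, 2.0                 # invented values
X = np.column_stack([np.ones(n), rng.normal(size=n)])
c_kk = np.linalg.inv(X.T @ X)[1, 1]
sd = sigma_u * np.sqrt(c_kk)                      # true sd of beta_hat_k

reps, hits = 20000, 0
for _ in range(reps):
    y = X @ np.array([1.0, beta_k]) + rng.normal(0.0, sigma_u, n)
    bk = np.linalg.solve(X.T @ X, X.T @ y)[1]
    hits += (beta_k - 1.96 * sd) <= bk <= (beta_k + 1.96 * sd)

print(hits / reps)   # close to 0.95
```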
00:12:49.000 --> 00:12:57.000
So equivalently we may write that beta hat k minus beta k, the true value, is contained
00:12:57.000 --> 00:13:05.000
in an interval which is plus minus 1.96 times the true standard deviation of the beta hat k
00:13:06.000 --> 00:13:17.000
or writing this now in standardized magnitudes we can define a magnitude zk as beta hat k minus
00:13:17.000 --> 00:13:25.000
beta k divided by the standard deviation sigma u times square root of ckk which would be contained
00:13:25.000 --> 00:13:37.000
in the interval minus 1.96 and plus 1.96 since we know that this magnitude here zk follows a
00:13:37.000 --> 00:13:43.000
standard normal distribution, so zk is distributed as N(0, 1); the expected value of
00:13:44.000 --> 00:13:52.000
this term here is zero because the expected value of the numerator is already zero and we have
00:13:52.000 --> 00:13:57.000
standardized by dividing through by the true standard deviation so variance and standard
00:13:57.000 --> 00:14:06.000
deviation are one for the standardized variable zk. Problem number one is that we do not know what
00:14:06.000 --> 00:14:15.000
the true value of the parameter beta k is, so we cannot actually compute zk, since
00:14:15.000 --> 00:14:22.000
beta k is contained in zk; since we don't know this value here, zk is actually unknown to us.
00:14:24.000 --> 00:14:31.000
however what we can do is that we can formulate hypotheses about the true value of beta k so we
00:14:31.000 --> 00:14:42.000
can say let us assume that beta k is some beta k zero so that would be the null hypothesis h naught
00:14:43.000 --> 00:14:53.000
beta k is equal to beta k zero and assuming that this null hypothesis is true we can then compute
00:14:54.000 --> 00:15:01.000
zk in this way using beta k zero up here in the numerator of zk.
00:15:05.000 --> 00:15:12.000
So when we want to test this hypothesis here, we would say our test does not reject the null hypothesis
00:15:13.000 --> 00:15:21.000
if the estimate beta hat k is not too far off the hypothesized value beta k zero
00:15:21.000 --> 00:15:33.000
which means that at a 5% significance level we should see that zk zero is between
00:15:33.000 --> 00:15:44.000
minus 1.96 and plus 1.96, if zk zero is defined as beta hat k minus beta k zero, so minus the hypothesized
00:15:44.000 --> 00:15:51.000
value of the true parameter so minus the value which we assume to be
00:15:51.000 --> 00:15:57.000
and test to be the true value of the parameter, all divided again by sigma u times the square root of ckk.
00:16:00.000 --> 00:16:09.000
Problem number two as I already said is that we do not observe zk or we're still unable to compute
00:16:09.000 --> 00:16:17.000
either zk zero or zk in this form here, because we do not know what sigma u is; we do know the square root
00:16:17.000 --> 00:16:23.000
of the ckk's, because this depends on the observations x only, and we have the observations
00:16:23.000 --> 00:16:30.000
x, but we do not know what sigma u is, sigma square u. So the second problem is that in general we do not know
00:16:30.000 --> 00:16:36.000
sigma u and then of course we will replace sigma u by its estimate and the estimate of sigma u as
00:16:36.000 --> 00:16:43.000
I have laid out last lecture, is actually u hat prime u hat divided
00:16:43.000 --> 00:16:48.000
by n minus k and now of course we take the square root here because we do not want to estimate the
00:16:48.000 --> 00:16:54.000
variance but rather, in this case, the standard deviation. So this here would be the
00:16:54.000 --> 00:17:01.000
appropriate estimate to use because it's the unbiased estimate on this thing here the unbiased
00:17:01.000 --> 00:17:10.000
estimate of the variance. Now as we have also already discussed in the section on hypothesis
00:17:11.000 --> 00:17:20.000
testing in slides 2 of this lecture, the standardized ratio beta hat k minus beta k divided
00:17:20.000 --> 00:17:29.000
by sigma hat, note the hat here, sigma hat u times the square root of ckk, is not normally distributed
00:17:29.000 --> 00:17:37.000
anymore because now we have left the world of linear transformations of u beta hat k was still
00:17:37.000 --> 00:17:45.000
just a linear transformation of u so we know that beta hat k the estimator of beta k is normally
00:17:45.000 --> 00:17:54.000
distributed but now we divide through by a parameter or by an estimator which is not a linear
00:17:54.000 --> 00:18:01.000
transformation of u anymore you see there is the sum of squared u's involved here and then we take
00:18:01.000 --> 00:18:08.000
the square root of the sum of squared u's so this does not cancel the squaring since not each single
00:18:08.000 --> 00:18:16.000
observation is taken back to its square root, but it is the sum of squared values of which we take the
00:18:16.000 --> 00:18:21.000
square root here so that's a highly non-linear transformation of the u's which we do here
00:18:21.000 --> 00:18:27.000
and dividing one linear transformation of the u's by some non-linear transformation of the u's gives
00:18:27.000 --> 00:18:33.000
us something which is not a linear transformation anymore so this standardized value is not normally
00:18:33.000 --> 00:18:42.000
distributed anymore tk is not normally distributed while zk was indeed normally distributed
00:18:43.000 --> 00:18:51.000
while tk is actually just an estimate of the unknown zk, and the distribution
00:18:51.000 --> 00:18:58.000
changes: we move of course to a t distribution here, as we have also already discussed in a
00:18:58.000 --> 00:19:07.000
previous lecture, so tk follows the Student t distribution with n minus k degrees of freedom.
00:19:07.000 --> 00:19:13.000
It's possible to show this; I won't prove it to you, you just have to
00:19:13.000 --> 00:19:21.000
believe it, but it is as we have already discussed in the previous lecture on hypothesis testing.
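The practical consequence of moving from the normal to the t distribution is that the exact 5% critical value depends on the degrees of freedom n minus k, and only approaches 1.96 as the sample grows. A quick sketch (the degrees-of-freedom values below are arbitrary examples):

```python
# Two-sided 5% critical values: Student t with n - k degrees of freedom
# versus the standard normal value 1.96.
from scipy.stats import norm, t

for df in (10, 30, 100, 1000):
    print(df, t.ppf(0.975, df))   # t critical value for this df
print(norm.ppf(0.975))            # normal benchmark, about 1.96
```

For small degrees of freedom the t critical value is noticeably larger than 1.96, so using 1.96 there would reject too often.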
00:19:24.000 --> 00:19:30.000
now when we hypothesize a particular value of beta k, the standard hypothesis to test
00:19:30.000 --> 00:19:37.000
is of course that the true parameter is zero, so that a regressor is not important in the regression;
00:19:37.000 --> 00:19:44.000
if the true value of the parameter is zero, then this means this particular kth regressor
00:19:44.000 --> 00:19:51.000
we should rather not include in our regression because it has no explanatory content so the
00:19:51.000 --> 00:20:00.000
standard test which we do in regression output is to test the null hypothesis that beta k is equal
00:20:00.000 --> 00:20:08.000
to zero so this would be this null hypothesis here h zero is that beta k is equal to zero
00:20:08.000 --> 00:20:15.000
and therefore the test statistic now written as the t statistic which we can observe which we
00:20:15.000 --> 00:20:23.000
can compute tk would be just the coefficient estimate divided by its standard error or better
00:20:23.000 --> 00:20:29.000
by its estimated standard error, because we use the estimate of the standard deviation of the u's here.
00:20:30.000 --> 00:20:40.000
now this t statistic has of course a certain p value, which is the probability, under the null
00:20:40.000 --> 00:20:46.000
hypothesis, of observing a t statistic at least this large in absolute value, and we know where the p value comes from, namely from the Student t
00:20:46.000 --> 00:20:54.000
distribution. Most regression packages that I know of give the regression output in such a
00:20:54.000 --> 00:21:02.000
form that they report both the t statistic and the p value for each estimated coefficient
00:21:02.000 --> 00:21:09.000
which is in some way of course redundant, because if we know the t statistic, then the p value follows
00:21:09.000 --> 00:21:17.000
immediately: if we know that the t statistic is greater than 1.96 in absolute value, we know that the coefficient
00:21:17.000 --> 00:21:23.000
is significant at the five percent level and if it is smaller than that then it is insignificant
00:21:23.000 --> 00:21:29.000
at the five percent level so in principle the p values are not really necessary but often people
00:21:29.000 --> 00:21:36.000
are interested in the p value in its own right, in order to see whether it is, well, perhaps just eight
00:21:36.000 --> 00:21:43.000
percent, and then you might say: it's not really significant, but the likelihood that the null
00:21:43.000 --> 00:21:52.000
hypothesis is true is rather small, just eight percent, so perhaps I do leave the
00:21:52.000 --> 00:22:00.000
regressor in the regression. Sometimes people do say something like that, so they are sometimes
00:22:00.000 --> 00:22:08.000
interested in the p value for judgmental issues. However, as a matter of experience, it usually does
00:22:08.000 --> 00:22:17.000
not make a great difference whether you have a regressor in a regression with, say, a p value of
00:22:17.000 --> 00:22:25.000
eight percent, so somewhat above five percent, or whether you don't have it in the regression; usually
00:22:25.000 --> 00:22:30.000
most other coefficient estimates do not change by much and the significance levels do not change
00:22:30.000 --> 00:22:36.000
by much, and the r squared doesn't change by much if you just take such a weak regressor out of the
00:22:37.000 --> 00:22:44.000
regression. Even at the five percent level the explanatory content is usually rather low, so
00:22:45.000 --> 00:22:50.000
changing the specification and eliminating a regressor which is just barely significant at
00:22:50.000 --> 00:22:57.000
the five percent level does also in most cases not make a big difference with respect to the
00:22:57.000 --> 00:23:05.000
explanatory power of the regression as you know the rule of thumb is that a regressor is significant
00:23:05.000 --> 00:23:11.000
if its t statistic is greater than two in absolute value; if you don't want to bother about 1.96, then let's use
00:23:11.000 --> 00:23:19.000
two as a rule of thumb now this explains this block of regression output here you see the t
00:23:19.000 --> 00:23:26.000
statistics here and you now know how they have been computed namely that's the coefficient estimate
00:23:26.000 --> 00:23:33.000
so this value here that's the beta hat k or for the second regressor this is the beta hat k
00:23:34.000 --> 00:23:39.000
divided by its standard error and the standard error is of course the square root of
00:23:40.000 --> 00:23:50.000
the variance of the coefficient estimate so it's the sigma hat u times the square root of the ckk
00:23:51.000 --> 00:23:58.000
if you divide this coefficient by the standard error here you arrive at the t statistic 4.28
00:23:59.000 --> 00:24:06.000
and if you divide this coefficient by its standard error, then you arrive at the t statistic of 43,
00:24:06.000 --> 00:24:11.000
and you can use the t distribution which is tabulated in most statistical textbooks
00:24:12.000 --> 00:24:19.000
to find out that for these two t statistics here the p value, for the null hypothesis that the true coefficient is
00:24:19.000 --> 00:24:27.000
actually zero, is essentially zero, so we can reject the null hypothesis in favor
00:24:27.000 --> 00:24:34.000
of the alternative hypothesis, which says these coefficients here are different from zero. So now you are in
00:24:34.000 --> 00:24:39.000
a position to understand exactly how this output here is generated and how it is related to each
00:24:39.000 --> 00:24:45.000
other you know what the r squared and the adjusted r squared is you know what the standard error of
00:24:45.000 --> 00:24:50.000
the regression is and where it comes from how it is computed of course you know what the sum of
00:24:50.000 --> 00:24:56.000
squared residuals is, namely the objective function which we try to minimize in least squares estimation.
00:24:58.000 --> 00:25:03.000
The log likelihood I have not yet explained, and I will not explain for some time,
00:25:03.000 --> 00:25:08.000
how this is exactly computed well basically we set up the likelihood function and then
00:25:09.000 --> 00:25:16.000
insert the sum of squared errors. We will still talk about the F statistic here and about the
00:25:16.000 --> 00:25:22.000
distribution, the probability, of the F statistic; so that's actually the next step in explaining
00:25:22.000 --> 00:25:30.000
regression output. Here is the same thing for our regression on stock prices, which I have
00:25:30.000 --> 00:25:37.000
introduced last time. The dependent variable is the index of stock prices, the Standard and Poor's
00:25:37.000 --> 00:25:46.000
index of stock prices, and we have 210 observations; here we have three regressors in this regression:
00:25:46.000 --> 00:25:52.000
a constant, a linear trend, and total factor productivity. We see the t statistics, which are computed in
00:25:52.000 --> 00:25:58.000
exactly the same way as i just explained for the previous regression and we see the associated
00:25:58.000 --> 00:26:07.000
probability levels, which here are not all simply zero, in particular for
00:26:07.000 --> 00:26:15.000
the last regressor, total factor productivity, which is the only economic regressor, the constant
00:26:15.000 --> 00:26:21.000
and the trend t being more or less some kind of mechanical regressors included in this regression here,
00:26:22.000 --> 00:26:30.000
which is not exactly significant, as you see: its p value is very slightly above five percent,
00:26:30.000 --> 00:26:39.000
namely five point four percent, so we cannot quite reject the null hypothesis, the null hypothesis
00:26:39.000 --> 00:26:44.000
meaning that this coefficient here is possibly just zero and total factor productivity doesn't
00:26:44.000 --> 00:26:53.000
add anything to the explanation of stock prices you also see it from the t statistic here which
00:26:53.000 --> 00:26:59.000
is just a little lower in absolute value than one point nine six that's one point nine four
00:26:59.000 --> 00:27:03.000
in this case, so it is very marginally not significant at the five percent level.
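The reported p value of 5.4 percent can be recovered from the t statistic alone. A sketch using the numbers quoted here, assuming k = 3 estimated parameters so that the degrees of freedom are 210 minus 3:

```python
# Two-sided p value for the TFP coefficient from its t statistic of about
# 1.94, with 210 observations and 3 regressors (so 207 degrees of freedom).
from scipy.stats import t

t_stat, df = 1.94, 210 - 3
p = 2 * t.sf(abs(t_stat), df)   # two-sided tail probability
print(round(p, 3))              # close to 0.054, the 5.4% on the slide
```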
00:27:06.000 --> 00:27:12.000
i alerted you to the fact that this regression is not very well specified and actually the
00:27:12.000 --> 00:27:19.000
residuals in this regression here, the u's, are certainly not independent; further tests would
00:27:19.000 --> 00:27:23.000
reveal this. Those of you who know what the Durbin-Watson statistic is, we'll cover this a little
00:27:23.000 --> 00:27:29.000
later, see this immediately from this value of the Durbin-Watson statistic, which should actually
00:27:29.000 --> 00:27:35.000
be somewhere between one point six and two point four, and here it is far off one point six, at zero
00:27:35.000 --> 00:27:41.000
point zero two, so there's very high and strong autocorrelation in the residuals, as the
00:27:41.000 --> 00:27:49.000
Durbin-Watson statistic informs us, so the assumptions of A5 are certainly violated in this
00:27:49.000 --> 00:27:55.000
regression here, and that's another reason why we cannot trust these results here. But taking them
00:27:55.000 --> 00:28:01.000
at face value and assuming that assumption A5 does hold, an assumption which is immediately
00:28:01.000 --> 00:28:07.000
falsified by the Durbin-Watson statistic, as we will later understand, assuming nevertheless
00:28:07.000 --> 00:28:13.000
that assumption A5 is satisfied, we would here say that TFP is just barely not significant.
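The Durbin-Watson statistic mentioned here is computed from the residuals as the sum of squared first differences divided by the sum of squared residuals; values near 2 indicate little first-order autocorrelation, values near 0 strong positive autocorrelation. A sketch with synthetic residual series (the helper `durbin_watson` is my own illustration, not a package function):

```python
# Durbin-Watson statistic: sum of squared first differences of the
# residuals over their sum of squares.
import numpy as np

def durbin_watson(resid):
    d = np.diff(resid)
    return (d @ d) / (resid @ resid)

rng = np.random.default_rng(3)
white = rng.normal(size=500)                 # independent residuals -> DW near 2
trending = np.cumsum(rng.normal(size=500))   # strongly autocorrelated -> DW near 0
print(durbin_watson(white), durbin_watson(trending))
```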
00:28:16.000 --> 00:28:22.000
now recall that in hypothesis testing there are typically two possible errors which we can
00:28:23.000 --> 00:28:31.000
encounter. One is the type one error, which is, I repeat this here, to reject the null hypothesis
00:28:31.000 --> 00:28:39.000
although the null hypothesis is true and this probability of a type one error is always equal
00:28:39.000 --> 00:28:46.000
to the significance level alpha, because we reject the null hypothesis when we have a p value lower than
00:28:46.000 --> 00:28:55.000
alpha for the null hypothesis so alpha exactly indicates the probability for rejection of the
00:28:55.000 --> 00:29:04.000
null in the case in which the null hypothesis is true. The type two error is sort of the complement
00:29:04.000 --> 00:29:12.000
of the type one error; that would be the failure to reject the null hypothesis although the null
00:29:12.000 --> 00:29:19.000
hypothesis is wrong, which of course can also be the case. Now the type two errors are
00:29:19.000 --> 00:29:27.000
more complicated than the type one errors, because to compute the probability of a type two error
00:29:27.000 --> 00:29:34.000
would actually require knowledge of the true value of the parameter which we test so let's say under
00:29:34.000 --> 00:29:43.000
h zero under the null hypothesis we had assumed that the true value of the parameter is zero
00:29:44.000 --> 00:29:51.000
now suppose this null hypothesis is wrong and the true value of the parameter is not zero.
00:29:52.000 --> 00:29:59.000
but the probability that we fail to reject the null hypothesis then of course depends on
00:30:00.000 --> 00:30:07.000
how close the true parameter is to zero so for instance when the true parameter is very close
00:30:07.000 --> 00:30:17.000
to zero but different from zero nevertheless then the probability of failure to reject
00:30:17.000 --> 00:30:23.000
h zero is of course very different from the case where the true parameter would be far off
00:30:24.000 --> 00:30:34.000
zero, so where the null hypothesis is very clearly not true, is far off reality; that would be a
00:30:34.000 --> 00:30:40.000
different case from the case where perhaps the true parameter is not exactly zero but very very close
00:30:40.000 --> 00:30:49.000
to zero in general it is true that the probability of a type two error is the greater the smaller
00:30:49.000 --> 00:30:57.000
the probability of a type one error is, sorry, that's a misprint here, that should be type one error
00:30:57.000 --> 00:31:03.000
here or the other way around right the probability of a type one error is the greater the smaller
00:31:03.000 --> 00:31:09.000
the probability of a type two error is so the two of them are exactly complements you can either
00:31:09.000 --> 00:31:15.000
write the type one here and the type two here, or the type one here and the type two here,
00:31:15.000 --> 00:31:21.000
but you cannot, as I have done here erroneously, excuse the mistake, write type two errors in both
00:31:21.000 --> 00:31:28.000
cases; that does of course not make sense, and I will correct it. Okay, moreover the probability
00:31:28.000 --> 00:31:36.000
of a type two error depends on the true parameter as i have already explained up here so as i said
00:31:36.000 --> 00:31:42.000
if the true parameter is far off the hypothesized parameter, then the probability of a type
00:31:42.000 --> 00:31:53.000
two error is rather small. There are two terms which you may encounter in reading the econometrics
00:31:53.000 --> 00:32:00.000
literature which you should know of. Sometimes people speak of the level of a test; the level of
00:32:00.000 --> 00:32:08.000
a test is the probability of a type one error and sort of the way you can memorize that is that we
00:32:08.000 --> 00:32:15.000
speak of the significance level right we know the significance level is the probability of a type one
00:32:15.000 --> 00:32:22.000
error so that's the level of a test and such a level of a test should ideally be rather small
00:32:23.000 --> 00:32:28.000
say we choose five percent or one percent or something like this, so a low percentage.
00:32:30.000 --> 00:32:37.000
we cannot however just take a very very small probability level because
00:32:38.000 --> 00:32:45.000
the smaller this level is the smaller is also the power of a test and this is the second term
00:32:45.000 --> 00:32:51.000
i want to acquaint you with if you are not yet acquainted to this term here the power of a test
00:32:51.000 --> 00:33:00.000
is the probability to reject H0 if the null hypothesis is wrong so this is one minus the probability
00:33:00.000 --> 00:33:07.000
of a type two error right the level of a test is the probability of a type one error and the
00:33:07.000 --> 00:33:16.000
power of a test is one minus the probability of a type two error obviously we want from any test that the
00:33:16.000 --> 00:33:25.000
power of this test is large so we want that the probability of a rejection of the null hypothesis
00:33:25.000 --> 00:33:29.000
if the null hypothesis is actually wrong that this probability is high
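In symbols, the two definitions just given (with power written explicitly as the complement of the type II error probability):

```latex
\text{level:}\quad \alpha \;=\; P(\text{reject } H_0 \mid H_0 \text{ true})
               \;=\; P(\text{type I error})
\qquad
\text{power:}\quad \pi \;=\; P(\text{reject } H_0 \mid H_0 \text{ false})
               \;=\; 1 - P(\text{type II error})
```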
00:33:31.000 --> 00:33:37.000
now the power of a test and the level of a test must be balanced and there's always a trade-off
00:33:37.000 --> 00:33:43.000
between the two so the lower the level of the test is which is in general good to have a low
00:33:43.000 --> 00:33:50.000
level of the test the less unfortunately will be the power of the test and the higher we drive the
00:33:50.000 --> 00:33:58.000
power of a test well the higher is the probability of a type one error so the higher is the level
00:33:58.000 --> 00:34:05.000
and we don't want a high level so some compromise must be drawn between the level of
00:34:05.000 --> 00:34:12.000
a test and the power of a test and when people devise new tests in the econometrics literature
00:34:13.000 --> 00:34:19.000
then they usually will report both the level and the power of a test and they will actually
00:34:19.000 --> 00:34:28.000
argue for certain specifications of the test in order to achieve a satisfactory power for a given
00:34:28.000 --> 00:34:35.000
level of the test or they will fix the level of the test say at five percent and say now let's see
00:34:35.000 --> 00:34:40.000
how we can devise the test in such a way that the power is maximal, as good as possible
00:34:42.000 --> 00:34:47.000
so that's some kind of optimality criterion when designing new tests
00:34:49.000 --> 00:34:55.000
we will not design any new tests here we will just use standard tests one of which is the T-test
00:34:55.000 --> 00:35:03.000
which you know of which we have already talked about now as the most basic test to be applied
00:35:03.000 --> 00:35:12.000
to regression results so let's continue the discussion of T-statistics here
00:35:12.000 --> 00:35:20.000
by repeating what I have already said on confidence intervals but now to familiarize
00:35:20.000 --> 00:35:29.000
you with the regression framework let's do it in terms of regression theory so as you know for
00:35:29.000 --> 00:35:36.000
a test level alpha for a significance level alpha we would have that the test statistic Tk
00:35:37.000 --> 00:35:46.000
which we can compute should lie between two critical values which I denote here as T index
00:35:46.000 --> 00:35:55.000
n minus k alpha the negative of that and the positive of that value so the test statistic which
00:35:55.000 --> 00:36:01.000
we compute from the regression results should lie between the two critical values here with
00:36:01.000 --> 00:36:09.000
probability one minus alpha because there would be just probability of alpha over two for a test
00:36:09.000 --> 00:36:15.000
statistic which is even smaller than the negative of this critical value or the same probability
00:36:15.000 --> 00:36:23.000
alpha over two for Tk being greater than this critical value here so the probability for Tk
00:36:23.000 --> 00:36:30.000
to lie between these two critical values here is just one minus alpha and obviously the critical
00:36:30.000 --> 00:36:37.000
values are then taken from a t distribution with n minus k degrees of freedom and obviously we have
00:36:37.000 --> 00:36:49.000
a two-sided test here so that we have the residual probability alpha over two for very small
00:36:49.000 --> 00:36:59.000
and same probability alpha over two for very large values of the test statistic now Tk is of course
00:36:59.000 --> 00:37:07.000
just this expression here which we have already derived the estimator beta hat k for the kth
00:37:07.000 --> 00:37:13.000
coefficient of parameter vector beta minus beta k minus the true value which we do not
00:37:14.000 --> 00:37:22.000
know but on which we can have certain ideas where we have certain hypotheses what the true value
00:37:22.000 --> 00:37:30.000
of beta k may be divided by the true standard error here sorry this is not exactly the t statistic
00:37:31.000 --> 00:37:44.000
here because i use the true standard deviation of the u's sigma u
00:37:44.000 --> 00:37:52.000
and not the hatted value so that's not exactly correct what i have written here which i think
00:37:52.000 --> 00:38:02.000
i should change the formulas and insert a hat here that makes more sense so that's another typo in
00:38:02.000 --> 00:38:07.000
the slides for which i apologize so please correct this for yourself and i will upload a corrected
00:38:07.000 --> 00:38:18.000
file here that should be sigma hat u here and here and there right and then in the usual way
00:38:18.000 --> 00:38:29.000
when we write it this way here we can transform these two inequalities such that we know that
00:38:29.000 --> 00:38:40.000
basically the value of the estimator beta hat k plus the standard deviation times the critical
00:38:40.000 --> 00:38:46.000
value must be greater or equal than the true parameter value and this is greater or equal
00:38:46.000 --> 00:38:54.000
than beta hat k minus the critical value times one standard error here so this
00:38:54.000 --> 00:39:04.000
expression 36 is what we would describe as a confidence interval for beta k it would tell us that in 95
00:39:04.000 --> 00:39:13.000
percent of the cases the true parameter beta k would lie between these two bounds which i have stated
00:39:14.000 --> 00:39:19.000
here with sigma hat u rather than sigma u in the formula
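Written out, the interval just described — a reconstruction from the spoken description, using the lecture's notation t with n minus K and alpha for the two-sided critical value (alpha over two probability in each tail) and the estimated standard error with sigma hat u:

```latex
% t_{n-K,\alpha}: two-sided critical value, \alpha/2 probability in each tail
P\!\left(-t_{n-K,\alpha} \;\le\; T_k \;\le\; t_{n-K,\alpha}\right) = 1-\alpha,
\qquad
T_k=\frac{\hat\beta_k-\beta_k}{\hat\sigma_u\sqrt{\left[(X'X)^{-1}\right]_{kk}}}
```
```latex
% Rearranging the two inequalities gives the confidence interval (36):
\hat\beta_k - t_{n-K,\alpha}\,\widehat{SE}(\hat\beta_k)
\;\le\; \beta_k \;\le\;
\hat\beta_k + t_{n-K,\alpha}\,\widehat{SE}(\hat\beta_k)
```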
00:39:21.000 --> 00:39:30.000
you can verify what i have just said by doing another MATLAB exercise much in the spirit
00:39:30.000 --> 00:39:39.000
of previous MATLAB exercises that i gave to you so again generate some matrix of random numbers
00:39:39.000 --> 00:39:46.000
let's say 100 observations and three regressors in the regressor matrix x and then again you
00:39:46.000 --> 00:39:54.000
run a thousand regressions of a particular type so you loop over a thousand regressions
00:39:54.000 --> 00:40:01.000
where in each of the loops you generate a random vector of disturbances u j
00:40:01.000 --> 00:40:06.000
which follows a normal distribution so that would be important in this case
00:40:07.000 --> 00:40:14.000
that you actually use a random number generator which generates normally distributed
00:40:15.000 --> 00:40:23.000
numbers and then you compute in each of the thousand loops your dependent variable y j
00:40:24.000 --> 00:40:33.000
from the following regression so x times parameters 1 1 and 0.1 plus u j so we would
00:40:33.000 --> 00:40:41.000
know that the true parameter vector beta is equal to 1 1 and 0.1 and obviously you may now
00:40:41.000 --> 00:40:48.000
test in each of the regressions whether this parameter is actually zero
00:40:49.000 --> 00:40:58.000
so that would be testing a null hypothesis which is not true right but from the regression results
00:40:58.000 --> 00:41:05.000
it may be that you think perhaps the last coefficient is actually zero and not 0.1 so you
00:41:05.000 --> 00:41:13.000
may actually run now tests in each loop of this sample program here which tests the hypothesis
00:41:13.000 --> 00:41:20.000
that the third coefficient beta three is equal to zero you know by construction that the hypothesis
00:41:20.000 --> 00:41:26.000
is not true so you test a null hypothesis which is not true and you should now check whether your
00:41:26.000 --> 00:41:32.000
results reject the wrong null hypothesis in approximately five percent of the cases
00:41:33.000 --> 00:41:40.000
right — no, not in approximately five percent of the cases; you should
00:41:42.000 --> 00:41:52.000
check how often with what probability the beta three coefficient here is detected as being
00:41:52.000 --> 00:42:01.000
non-zero if you test at the five percent level of significance so you would study the power of
00:42:01.000 --> 00:42:09.000
the test in this exercise here you would find out what the probability is to reject the null
00:42:09.000 --> 00:42:17.000
hypothesis of which you know that it is not true and thereby you can get a feeling what the power
00:42:17.000 --> 00:42:28.000
of the test is if we know that the true parameter is 0.1 you may also when you write such
00:42:28.000 --> 00:42:36.000
a program check how the power changes when you use other values for beta three so not use 0.1
00:42:36.000 --> 00:42:45.000
but use 0.2 or 0.05 or whatever you can then see how the power of the test changes with the true
00:42:45.000 --> 00:42:56.000
parameter which you use in this regression here. I think that was why I was just confused: by
00:42:56.000 --> 00:43:05.000
some mistake I skipped this exercise here when I changed the slides so actually we had discussed
00:43:05.000 --> 00:43:17.000
slide page 95 here and I should first have come to this exercise which relates to the level of a test
00:43:17.000 --> 00:43:23.000
and then comes the next exercise basically modification of the first one which I have
00:43:23.000 --> 00:43:29.000
just introduced which relates to the power of a test so let me go back one slide here
00:43:29.000 --> 00:43:37.000
the first exercise actually that I recommend is of the same design but using a value of 0 for beta
00:43:38.000 --> 00:43:46.000
three so in this case we would know that the third regressor which we have in our regressor matrix
00:43:46.000 --> 00:43:56.000
x is actually not explaining or is not related to the dependent variable yj because the true
00:43:56.000 --> 00:44:03.000
value of the parameter is zero but if you include this third regressor nevertheless so you
00:44:03.000 --> 00:44:10.000
estimate the regression with three regressors not knowing that the third regressor is actually not
00:44:10.000 --> 00:44:17.000
needed then you can run the t-test at the five percent level and you can check how often in your
00:44:17.000 --> 00:44:26.000
thousand replications the hypothesis that beta three is equal to zero
00:44:26.000 --> 00:44:34.000
is rejected by the data and it should actually be the case that only in approximately five percent
00:44:34.000 --> 00:44:41.000
of your thousand replications so only for about 50 of these thousand loops you would reject the
00:44:41.000 --> 00:44:50.000
null hypothesis that beta three is equal to zero and in 950 cases you should
00:44:50.000 --> 00:44:57.000
accept this hypothesis so you would get a feeling of what a type one error is if you do this
00:44:57.000 --> 00:45:05.000
exercise and then here the same exercise is repeated with a non-zero value for beta three
00:45:05.000 --> 00:45:09.000
zero point one as I have just explained and that would give you a feeling what the power
00:45:09.000 --> 00:45:13.000
of a test is and how it changes when you change this coefficient here
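The two MATLAB exercises just described — level with beta three equal to zero, power with beta three equal to 0.1 — can be sketched as a Monte Carlo in Python with NumPy (an illustrative stand-in, not the lecture's own code; the two-sided 5% critical value 1.9847 of the t distribution with 97 degrees of freedom is hard-coded rather than looked up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3                        # 100 observations, 3 regressors
X = rng.standard_normal((n, K))      # fixed regressor matrix, as in the exercise
T_CRIT = 1.9847                      # two-sided 5% critical value of t(n - K) = t(97)

def rejection_rate(beta3, reps=1000):
    """Share of replications in which H0: beta3 = 0 is rejected at the 5% level."""
    beta = np.array([1.0, 1.0, beta3])
    XtX_inv = np.linalg.inv(X.T @ X)
    rejections = 0
    for _ in range(reps):
        u = rng.standard_normal(n)          # normally distributed disturbances
        y = X @ beta + u
        b_hat = XtX_inv @ X.T @ y           # least squares estimator
        resid = y - X @ b_hat
        s2 = resid @ resid / (n - K)        # unbiased variance estimator
        se3 = np.sqrt(s2 * XtX_inv[2, 2])   # standard error of beta3-hat
        if abs(b_hat[2] / se3) > T_CRIT:
            rejections += 1
    return rejections / reps

level = rejection_rate(0.0)   # true beta3 = 0: rejection rate should be near 5%
power = rejection_rate(0.1)   # true beta3 = 0.1: rejection rate estimates the power
```

Rerunning with 0.2 or 0.05 in place of 0.1 shows, as suggested in the lecture, how the power moves with the true parameter.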
00:45:16.000 --> 00:45:24.000
now what we have talked about is testing one particular coefficient beta k with a hypothesis
00:45:24.000 --> 00:45:34.000
of beta k being equal to zero suppose now that we have a joint hypothesis so suppose that we
00:45:34.000 --> 00:45:43.000
want to test an hypothesis for a regression equation which says that more than one coefficient
00:45:43.000 --> 00:45:51.000
is equal to zero so it could be a hypothesis of the following form another hypothesis would be
00:45:51.000 --> 00:46:01.000
that beta k minus j plus one up to beta capital k are all equal to zero or it's perhaps easier to
00:46:01.000 --> 00:46:07.000
see what I mean by looking at equation 38 we have the typical regression model y is equal to
00:46:07.000 --> 00:46:18.000
x beta plus u and now we split the x matrix column-wise so the first k minus j columns
00:46:18.000 --> 00:46:30.000
are in matrix x zero and the next j columns are in matrix x one so we just split here vertically
00:46:30.000 --> 00:46:37.000
basically — is this vertical or is this horizontal, I don't know — so we split along the
00:46:37.000 --> 00:46:45.000
columns so that the first k minus j columns of matrix x are matrix x zero and the
00:46:45.000 --> 00:46:54.000
last j columns of x are in matrix x one and we split the beta vector accordingly so that matrix
00:46:54.000 --> 00:47:02.000
x zero is multiplied by parameter subvector beta zero and matrix x one is multiplied by
00:47:02.000 --> 00:47:15.000
subvector beta one so the vector beta zero would be of type k minus j by
00:47:15.000 --> 00:47:23.000
one so one column k minus j rows and the subvector beta one would be j rows and one column of course
00:47:23.000 --> 00:47:32.000
plus error term now the hypothesis that we would like to test is that this vector beta one here
00:47:33.000 --> 00:47:40.000
is actually identical to zero so that all the regressors contained in matrix x one are actually
00:47:40.000 --> 00:47:49.000
not needed and have no explanatory power right so that's the null hypothesis of a joint hypothesis
00:47:49.000 --> 00:47:58.000
on coefficients in this case on j coefficients in our regressor matrix how would we estimate
00:47:58.000 --> 00:48:06.000
or sorry how would we test such an hypothesis well the first thing would be that we estimate
00:48:06.000 --> 00:48:13.000
the usual regression so just compute beta hat in the usual way as the least squares estimator beta
00:48:13.000 --> 00:48:22.000
x prime x inverse times x prime y would be beta hat we call this the unrestricted regression
00:48:22.000 --> 00:48:28.000
because there are no restrictions placed on our regressor matrix here we just use
00:48:29.000 --> 00:48:34.000
all the regressors even though we know that some of them are actually superfluous and we don't need them
00:48:35.000 --> 00:48:42.000
but we have them included in our regressor matrix nevertheless and then just compute
00:48:42.000 --> 00:48:50.000
usual least squares regression denote the estimator by beta hat so no restrictions on beta
00:48:50.000 --> 00:48:56.000
one here and not imposing the restriction that beta one is actually equal to zero as
00:48:56.000 --> 00:49:08.000
h zero specifies that to be the case now for the least squares estimator denote by s one the sum
00:49:08.000 --> 00:49:14.000
of squared residuals u hat prime u hat is the sum of squared residuals as you know so s one is
00:49:14.000 --> 00:49:22.000
u hat prime u hat or you can also write it as y minus x beta hat prime times y minus x beta hat that's the
00:49:23.000 --> 00:49:34.000
sum of squared residuals for the unrestricted regression and now we re-estimate the regression
00:49:34.000 --> 00:49:41.000
under the restriction of the null hypothesis so under the restriction that beta one is equal to
00:49:41.000 --> 00:49:49.000
zero what does this mean if beta one is equal to zero then we can impose the restriction on the
00:49:49.000 --> 00:49:57.000
regression by not including matrix x one in our regression right if beta one is equal to zero then
00:49:57.000 --> 00:50:03.000
x one times beta one is equal to zero so the regression model in this case with this restriction
00:50:03.000 --> 00:50:13.000
being imposed is just that y is equal to x zero times beta zero plus u and this term here would be
00:50:13.000 --> 00:50:24.000
zero so we would have this model here this is the restricted model which imposes the
00:50:24.000 --> 00:50:29.000
restriction beta one is equal to zero and this gives us an estimate which i here denote by beta
00:50:29.000 --> 00:50:36.000
zero tilde right which is actually the least squares estimator
00:50:36.000 --> 00:50:44.000
of the restricted model so it's x zero prime x zero inverse times x zero prime y and the
00:50:44.000 --> 00:50:51.000
corresponding sum of squared residuals would be s zero and this would be u tilde prime
00:50:51.000 --> 00:50:59.000
u tilde so u tilde are the residuals from this regression here and we can write this as y minus
00:50:59.000 --> 00:51:11.000
x zero beta zero tilde prime times the same expression not transposed so s one is the sum of squared errors
00:51:12.000 --> 00:51:20.000
from the unrestricted regression s zero is the sum of squared errors from the restricted
00:51:20.000 --> 00:51:27.000
regression and you can memorize the significance of the zero index here and the one index here
00:51:27.000 --> 00:51:33.000
by saying well this corresponds to the null hypothesis this is why i have an index of null
00:51:34.000 --> 00:51:40.000
here of zero here right so this sum of squared residuals corresponds to the null hypothesis
00:51:41.000 --> 00:51:47.000
and this here corresponds to h one to the alternative hypothesis that at least one of the
00:51:48.000 --> 00:51:51.000
beta one coefficients is different from zero
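Collected in one place, the setup just described (restating the model split, the null hypothesis, and the two sums of squared residuals from equations (38) onward):

```latex
y = X_0\beta_0 + X_1\beta_1 + u, \qquad H_0:\ \beta_1 = 0
```
```latex
% Unrestricted regression (all K regressors):
\hat\beta=(X'X)^{-1}X'y, \qquad
S_1=\hat u'\hat u=(y-X\hat\beta)'(y-X\hat\beta)
```
```latex
% Restricted regression (only the K-J columns in X_0):
\tilde\beta_0=(X_0'X_0)^{-1}X_0'y, \qquad
S_0=\tilde u'\tilde u=(y-X_0\tilde\beta_0)'(y-X_0\tilde\beta_0)
```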
00:51:53.000 --> 00:51:59.000
now the question that i would usually ask you now if we had normal lectures would be
00:52:00.000 --> 00:52:06.000
what do we know about s one and s zero in particular can we state that one of these
00:52:06.000 --> 00:52:15.000
magnitudes here is greater or smaller than the other in all possible specifications of the
00:52:15.000 --> 00:52:21.000
regressor matrix x does anyone know the correct answer for this question then please raise your
00:52:21.000 --> 00:52:33.000
hand do you have any idea whether s one or s zero is systematically greater or smaller
00:52:33.000 --> 00:52:36.000
than the other sum of squared residuals
00:52:36.000 --> 00:52:46.000
please wave to me raise your hand if you want to answer the question
00:52:53.000 --> 00:53:02.000
i don't see any sign so the answer is easy sorry oh yeah there's somebody
00:53:03.000 --> 00:53:07.000
uh please um yeah uh there comes the answer in the chat
00:53:08.000 --> 00:53:13.000
s zero is smaller um why do you think that s zero is smaller
00:53:23.000 --> 00:53:27.000
because the results are as small as possible no that's unfortunately not
00:53:27.000 --> 00:53:34.000
the correct answer wait a minute i have to go back to the right slide
00:53:37.000 --> 00:53:44.000
so the true answer is actually that s zero is always larger than s one so just the opposite
00:53:44.000 --> 00:53:54.000
of what you just tried to say and well yeah my question is why is this the case now given that
00:53:54.000 --> 00:54:01.000
i told you what the true the correct answer is uh what do you think again about it why is it true
00:54:01.000 --> 00:54:08.000
that s zero sum of squared residuals in the restricted regression is always greater than
00:54:08.000 --> 00:54:21.000
or equal to s one
00:54:24.000 --> 00:54:35.000
well observe that in both regressions regression one and regression zero if i may call them so
00:54:35.000 --> 00:54:39.000
we want to explain the same dependent variable right we want to explain y
00:54:40.000 --> 00:54:49.000
and in uh equation 39 here which gives rise to sum of squared residuals s zero we are
00:54:50.000 --> 00:54:58.000
able to explain y by choosing coefficient vector beta zero freely basically we may choose
00:54:58.000 --> 00:55:09.000
any value of beta zero such that it minimizes the residuals whereas in regression one we may minimize
00:55:09.000 --> 00:55:15.000
the sum of squared residuals choosing any value for the beta zero so we may just make the same
00:55:15.000 --> 00:55:25.000
choice as before as in the restricted regression plus we are able to choose beta one freely in the
00:55:25.000 --> 00:55:32.000
restricted regression we are not able to choose beta one freely because we have restricted the
00:55:32.000 --> 00:55:42.000
beta one coefficients to be all zero obviously in the unrestricted regression we are free to choose
00:55:42.000 --> 00:55:48.000
the beta ones as being equal to zero if this should be optimal if this should minimize the
00:55:48.000 --> 00:55:54.000
sum of squared residuals so it's completely possible to reproduce exactly the same estimate
00:55:54.000 --> 00:56:01.000
that we would have in the restricted regression but we may also choose non-zero coefficients
00:56:01.000 --> 00:56:08.000
for the beta ones so we have more flexibility in this regression because we can choose any
00:56:08.000 --> 00:56:15.000
value of beta one which allows us to minimize the squared errors of this regression
00:56:15.000 --> 00:56:22.000
so when we have more flexibility we can of course achieve a better value a smaller value for the
00:56:22.000 --> 00:56:29.000
sum of squared residuals so the minimum will be smaller in the unrestricted regression than it will
00:56:29.000 --> 00:56:37.000
be in the restricted regression and because of this it is always true that s zero is greater than
00:56:37.000 --> 00:56:44.000
or equal to s one all right so we can explain less in the restricted regression than we can in
00:56:44.000 --> 00:56:51.000
the unrestricted regression so the difference s zero minus s one is always non-negative right
00:56:51.000 --> 00:56:56.000
it's positive or in the borderline case even though this practically never happens it would be exactly equal to zero
00:56:56.000 --> 00:57:04.000
but it certainly cannot be negative now let us for simplicity say this term
00:57:04.000 --> 00:57:11.000
here is positive we know this term is positive but the question actually is is it significantly
00:57:11.000 --> 00:57:19.000
different from zero so is this positive value here uh large enough to differ significantly
00:57:19.000 --> 00:57:25.000
from a value of zero because if it were not significantly different from zero then we would
00:57:25.000 --> 00:57:32.000
say okay it may still be true that all the coefficients
00:57:32.000 --> 00:57:39.000
of the beta one vector are actually equal to zero since we don't really make much progress
00:57:39.000 --> 00:57:45.000
in reducing the sum of squared residuals when we move to the flexible approach in which the beta
00:57:45.000 --> 00:57:52.000
one coefficients can be chosen in a way which would give non-zero values for the
00:57:52.000 --> 00:58:00.000
beta one coefficients so we actually want to formally test whether the
00:58:00.000 --> 00:58:08.000
difference between s zero and s one is significantly different from zero and the way to do this is to
00:58:08.000 --> 00:58:15.000
construct a statistic which here is called small f which actually will give rise to an f test which
00:58:15.000 --> 00:58:23.000
i usually write with a capital f here but this is just the test statistic that's not yet the
00:58:23.000 --> 00:58:29.000
distribution that's why i distinguish it in notation with a small f here now how is such
00:58:29.000 --> 00:58:37.000
an f statistic constructed well the idea is not as complicated as it may perhaps at first
00:58:37.000 --> 00:58:44.000
appear to you when you are introduced to f tests with two different numbers for the degrees of
00:58:44.000 --> 00:58:53.000
freedoms what we look at is a ratio so this here indicates this ratio we have something which is
00:58:53.000 --> 00:58:59.000
in the numerator of the ratio which is s zero minus s one over j and we have something which
00:58:59.000 --> 00:59:07.000
is in the denominator of this ratio which is s one over n minus k what do we do here why do we
00:59:07.000 --> 00:59:15.000
choose these funny looking numerators and denominators well we try to standardize our
00:59:16.000 --> 00:59:25.000
results and perhaps it is best to start with the denominator here s one over n minus k
00:59:26.000 --> 00:59:35.000
because what does this exactly tell us s one is the sum of squared residuals uh as you know
00:59:35.000 --> 00:59:45.000
in the unrestricted regression and here i uh divide this by the number of degrees of freedom
00:59:45.000 --> 00:59:54.000
so i divide by the degrees of freedom n minus k so basically you can think of this as
00:59:55.000 --> 01:00:04.000
the average squared error that we have for each degree of freedom in our regression if it is
01:00:04.000 --> 01:00:13.000
unrestricted and now we would like to know how much additional squared error do we get when we
01:00:13.000 --> 01:00:22.000
introduce restrictions and here it is of course clear that the more
01:00:22.000 --> 01:00:28.000
restrictions we place on our model so the more parameters are restricted to be equal to zero
01:00:29.000 --> 01:00:36.000
the higher will be the value of the squared residuals which we obtain in the restricted
01:00:36.000 --> 01:00:45.000
regression so it makes sense to divide the difference between s zero and s one by the number
01:00:45.000 --> 01:00:57.000
of restrictions so in a sense this here tells us the increase in the value of the squared residuals
01:00:57.000 --> 01:01:07.000
per restriction the average increase in the value of the squared residuals
01:01:08.000 --> 01:01:16.000
when we add one additional restriction or when we set one of the coefficients equal to zero
01:01:16.000 --> 01:01:23.000
exogenously right so that's the average error which we commit or the average squared error
01:01:23.000 --> 01:01:31.000
which we commit per restricted coefficient and this we relate to the average error which we
01:01:31.000 --> 01:01:39.000
have per observation and then we see whether this error here the average squared error per
01:01:39.000 --> 01:01:47.000
restriction is significantly greater than the average error per observation which we have had
01:01:47.000 --> 01:01:58.000
in the unrestricted regression so that's the principal idea of an f statistic to see how much
01:02:00.000 --> 01:02:07.000
additional error variance do we get when we impose additional restrictions on the regression
01:02:08.000 --> 01:02:16.000
and we compare the additional error variance with — well, what is this here this is basically
01:02:16.000 --> 01:02:24.000
the estimator of the variance right this is the unbiased variance estimator sigma square
01:02:24.000 --> 01:02:32.000
u hat s one over n minus k the denominator of the f statistic here is just the unbiased
01:02:32.000 --> 01:02:39.000
variance estimator sigma square u hat so basically we ask here how much additional
01:02:39.000 --> 01:02:47.000
variance do we get per imposed restriction relative to the level of variance
01:02:47.000 --> 01:02:52.000
which we already have in the unrestricted regression this is why this is a sensible
01:02:54.000 --> 01:03:00.000
test statistic which we have here and the distribution of this test statistic follows an
01:03:00.000 --> 01:03:07.000
f distribution which can be shown but which i will not show in this lecture
01:03:09.000 --> 01:03:17.000
it's not that trivial to show it but in general it is true that if we look at the ratio of two
01:03:17.000 --> 01:03:26.000
numbers where each number is the sum of squared random variables and each of the squared random
01:03:26.000 --> 01:03:34.000
variables is normally distributed then the ratio of such sums each divided by its degrees of freedom is distributed
01:03:35.000 --> 01:03:44.000
as an f distribution with particular degrees of freedom provided that the numerator is independent
01:03:44.000 --> 01:03:51.000
of the denominator which i would have to show and which i will not show that this is the case
01:03:52.000 --> 01:03:58.000
but it can be shown and then you have the result that the distribution of this test statistic here
01:03:58.000 --> 01:04:05.000
follows an f distribution so perhaps just memorize if you have the ratio of the sums of
01:04:05.000 --> 01:04:12.000
squared normally distributed variables divided by each other and granted that
01:04:12.000 --> 01:04:17.000
numerator and denominator are independent of each other then the variable is f distributed
01:04:17.000 --> 01:04:27.000
so what i've just said is summarized in one sentence on this slide here it is possible to
01:04:27.000 --> 01:04:36.000
show that f follows a standard capital f distribution with j and n minus k degrees of
01:04:36.000 --> 01:04:44.000
freedom so j degrees of freedom in the numerator and n minus k degrees of freedom in the denominator
01:04:44.000 --> 01:04:52.000
of the test statistic f and then obviously like for any test we reject the null hypothesis
01:04:52.000 --> 01:05:01.000
when f is large sorry this was my fault we reject the null hypothesis when f is too large
01:05:03.000 --> 01:05:09.000
so in this case we would accept the alternative hypothesis h1
01:05:09.000 --> 01:05:19.000
what does the alternative hypothesis h1 state well it basically tells us that there is at least
01:05:19.000 --> 01:05:26.000
one coefficient beta j in the coefficient vector beta 1 which is different from zero
01:05:26.000 --> 01:05:34.000
so when our null hypothesis was actually that this beta 1 vector here was equal to zero
01:05:34.000 --> 01:05:42.000
it means that all of the components in beta 1 are equal to zero so the alternative hypothesis tells
01:05:42.000 --> 01:05:49.000
us that at least one of these components is non-zero right maybe that all
01:05:49.000 --> 01:05:55.000
the others are still zero but at least one of them is non-zero this would be the alternative
01:05:55.000 --> 01:06:02.000
hypothesis and this alternative hypothesis h1 i have stated here says that one of the beta j's
01:06:02.000 --> 01:06:10.000
for j running from k minus j plus 1 up to k that one of these beta j's is non-zero
01:06:12.000 --> 01:06:17.000
now in the standard regression output which you get from econometrics packages there's always
01:06:17.000 --> 01:06:24.000
something reported on the f statistic and never do they state in the regression output which
01:06:24.000 --> 01:06:31.000
precise hypothesis is being tested because obviously there are many ways to split a
01:06:31.000 --> 01:06:41.000
regressor matrix like here between x0 and x1 many ways to have a hypothesis in such a way
01:06:41.000 --> 01:06:48.000
that a particular subset of the coefficients is equal to zero so the standard f test is actually
01:06:48.000 --> 01:06:56.000
the f test which tests the hypothesis that all the coefficients in the beta vector are zero
01:06:56.000 --> 01:07:01.000
with the exception of the first one which is the coefficient on the constant
01:07:02.000 --> 01:07:10.000
because in almost any regression we know that researchers will include a constant so that's
01:07:10.000 --> 01:07:19.000
a regressor which almost reliably is in the regression while the other regressors may vary
01:07:19.000 --> 01:07:25.000
depending on the research topic so the standard f statistic reported in regression output always
01:07:25.000 --> 01:07:32.000
tests the hypothesis that with the exception of the constant as the first regressor all other
01:07:32.000 --> 01:07:40.000
beta coefficients are equal to zero this is what i have written here so the standard null hypothesis
01:07:40.000 --> 01:07:47.000
in regression output is that beta two up to beta capital k are all equal to zero and just beta one
01:07:47.000 --> 01:07:51.000
is allowed to be non-zero and beta one is usually the constant
01:07:54.000 --> 01:08:03.000
so this is what you see here in our consumption income regression the f statistic is enormous
01:08:03.000 --> 01:08:09.000
one thousand eight hundred and thirty and the probability of the f statistic is zero
01:08:09.000 --> 01:08:19.000
so the probability that the coefficient of log y is actually equal
01:08:19.000 --> 01:08:26.000
to zero is zero as we knew already from the t test because in this particular case the
01:08:26.000 --> 01:08:31.000
f statistic doesn't really test a different hypothesis than the t test for this coefficient
01:08:32.000 --> 01:08:39.000
tests since we just have two regressors here and the f statistic tests the hypothesis
01:08:39.000 --> 01:08:46.000
that the coefficient on log y is equal to zero
01:08:51.000 --> 01:08:56.000
you could again as an exercise in this context reproduce the f statistic for the benchmark regression
01:08:57.000 --> 01:09:04.000
benchmark regression meaning log private consumption regressed on log income reproduce
01:09:04.000 --> 01:09:11.000
the f statistic and its p value manually so by computing it by hand in order to see
01:09:11.000 --> 01:09:13.000
that you have understood what the f statistic actually means
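as a sketch of that exercise (on simulated stand-in data, NOT the country dataset from the slides), the standard F statistic can be reproduced by hand from the restricted and unrestricted sums of squared residuals:

```python
# Hand computation of the standard regression F statistic, illustrated on
# simulated stand-in data (hypothetical log income and log consumption).
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
n, k = 50, 2                                    # observations; coefficients (constant + slope)
log_y = rng.normal(10.0, 1.0, size=n)           # hypothetical log income
log_c = 0.5 + 0.9 * log_y + rng.normal(0.0, 0.1, size=n)  # hypothetical log consumption

X = np.column_stack([np.ones(n), log_y])
beta_hat = np.linalg.solve(X.T @ X, X.T @ log_c)
ssr_u = np.sum((log_c - X @ beta_hat) ** 2)     # unrestricted sum of squared residuals

# restricted model under H0: beta_2 = 0, i.e. a constant-only regression
ssr_r = np.sum((log_c - log_c.mean()) ** 2)

q = k - 1                                       # number of restrictions
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
p_value = f_dist.sf(f_stat, q, n - k)           # upper-tail probability of F(q, n-k)
print(f_stat, p_value)
```

with a strong slope and little noise the statistic is huge and the p value is numerically zero, mirroring the kind of output discussed on the slide.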
01:09:17.000 --> 01:09:22.000
any questions which relate to hypothesis testing
01:09:27.000 --> 01:09:35.000
this is not the case so we move on to asymptotic properties again the numbering should be 3.5
01:09:38.000 --> 01:09:44.000
and now what we want to do is that we want to look at the least squares estimator when the
01:09:44.000 --> 01:09:52.000
assumptions a1 to a5 or some of these assumptions are actually not satisfied i told you already
01:09:52.000 --> 01:10:00.000
some of the assumptions in particular assumption a2 a3 and a4 are rather strong assumptions
01:10:00.000 --> 01:10:08.000
oh well a5 is actually also strong so it is legitimate to raise the question of what happens
01:10:08.000 --> 01:10:20.000
if a1 a2 a3 a4 a5 or some of them are actually not satisfied when we are
01:10:20.000 --> 01:10:27.000
faced with scenarios in which we have to believe that these assumptions a1 to a5 are not satisfied
01:10:27.000 --> 01:10:34.000
it is of help to resort to asymptotic properties of the estimator because we will usually not be
01:10:34.000 --> 01:10:42.000
able to derive finite sample properties anymore when a1 to a5 are not satisfied or at least a1 to
01:10:42.000 --> 01:10:52.000
a4 are not satisfied so let's look at what happens asymptotically first have a look at the first two
01:10:52.000 --> 01:10:59.000
lines in this expression here these are transformations of the least squares estimator which
01:10:59.000 --> 01:11:06.000
you already know beta hat is equal to the parameter vector plus x prime x inverse times x prime u
01:11:07.000 --> 01:11:15.000
so this we are familiar with this property now what we have done so far is that we have usually
01:11:15.000 --> 01:11:21.000
argued that x prime u has an expected value of zero right and the assumption which we needed
01:11:21.000 --> 01:11:30.000
for this was the strict exogeneity of x so we needed to have the assumption that the expectation
01:11:30.000 --> 01:11:40.000
of u given x is actually equal to the unconditional expectation of u which would be zero
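spelled out, the unbiasedness step being recalled here uses the law of iterated expectations:

```latex
\hat{\beta} = \beta + (X'X)^{-1}X'u, \qquad
E\left[(X'X)^{-1}X'u\right]
  = E\left[(X'X)^{-1}X'\,E(u \mid X)\right]
  = E\left[(X'X)^{-1}X' \cdot 0\right] = 0,
```

so that the expectation of beta hat equals beta whenever the expectation of u given x is zero.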
01:11:40.000 --> 01:11:47.000
and then the whole thing here is zero but this conditional independence which we had assumed
01:11:48.000 --> 01:11:56.000
here the fact that the expectation of x prime u is equal to zero
01:11:57.000 --> 01:12:05.000
this assumption is a strong assumption because it implies that the u here is in no way related to
01:12:06.000 --> 01:12:14.000
the x matrix here and this is not necessarily true in particular in time series context
01:12:14.000 --> 01:12:22.000
it is not true for instance it is often the case that in a time series context
01:12:22.000 --> 01:12:30.000
the x matrix contains lagged values of the dependent variable so suppose we want to explain
01:12:30.000 --> 01:12:40.000
y then we would have lagged values of y in the x matrix well if we have lagged values of y in the
01:12:40.000 --> 01:12:47.000
x matrix then we know of course that lagged values of u of the errors are also in the x matrix
01:12:47.000 --> 01:12:57.000
because y is equal to x beta plus u so when we have lagged values of y in the regressor matrix
01:12:57.000 --> 01:13:03.000
then we also have lagged values of u in the regressor matrix so we would have lagged values
01:13:03.000 --> 01:13:10.000
of u in here and we would have the non-lagged values of u here but since we multiply out this
01:13:10.000 --> 01:13:19.000
matrix product here we would get terms in which a value of u is multiplied by its own lagged
01:13:19.000 --> 01:13:27.000
values and it is not clear that the expectation of this is always zero
01:13:29.000 --> 01:13:37.000
or we would have perhaps this is easier to see then we would have lagged values of y multiplied
01:13:37.000 --> 01:13:47.000
with lagged values of u in a different row of this u vector here which would have an expectation
01:13:47.000 --> 01:13:53.000
which is non-zero because this involves the variance of the u's actually that's the better
01:13:53.000 --> 01:14:00.000
way to frame it i think the first formulation was not so exact so that would be the right way
01:14:00.000 --> 01:14:11.000
to look at it if you have a lagged y here let's say in row two then this depends on a lagged error
01:14:12.000 --> 01:14:21.000
and this lagged error would be in this vector of all the u values which we
01:14:23.000 --> 01:14:30.000
observe or which we implicitly observe in our sample so that there would exist a correlation
01:14:30.000 --> 01:14:37.000
of this lagged u term here with the corresponding different row in this u matrix in this u vector
01:14:37.000 --> 01:14:46.000
here so that we would when multiplying out x prime u encounter a term which is u i squared
01:14:47.000 --> 01:14:55.000
and this would have an expectation of sigma squared u so it would be non-zero therefore this
01:14:55.000 --> 01:15:04.000
term here would not vanish basically what we want to do now is that we want to boil down
01:15:04.000 --> 01:15:12.000
our assumption on x prime u on the strict exogeneity of the x's here to an assumption
01:15:12.000 --> 01:15:19.000
which is weaker in the sense that we only want to have the property that the x's in a
01:15:20.000 --> 01:15:27.000
particular row of x are independent of the u disturbance in the same row
01:15:28.000 --> 01:15:34.000
but not independent of all the u disturbances in this vector so on all rows in the u vector
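the finite-sample consequence of this failure of strict exogeneity can be seen in a small simulation (my own illustration, not from the slides): in a regression of y on its own lagged value, the regressor is uncorrelated with the current error but correlated with past errors, and OLS is biased in finite samples, although it remains consistent:

```python
# Illustration: OLS in the AR(1) model y_t = beta * y_{t-1} + u_t is biased
# in small samples, because strict exogeneity fails when the regressor matrix
# contains lagged values of the dependent variable.
import numpy as np

rng = np.random.default_rng(1)
beta_true, n_obs, n_reps = 0.9, 30, 5000
estimates = np.empty(n_reps)

for r in range(n_reps):
    u = rng.normal(size=n_obs)
    y = np.zeros(n_obs)
    for t in range(1, n_obs):
        y[t] = beta_true * y[t - 1] + u[t]
    y_lag, y_cur = y[:-1], y[1:]
    estimates[r] = (y_lag @ y_cur) / (y_lag @ y_lag)  # OLS slope, no constant

print(estimates.mean())   # the sample mean falls noticeably below the true 0.9
```

with only 30 observations the average estimate sits clearly below 0.9, even though each u_t is independent of y_{t-1}.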
01:15:35.000 --> 01:15:43.000
and in order to do this it is helpful to write the x prime x inverse representation here
01:15:43.000 --> 01:15:52.000
and the x prime u expectation here in terms of vectors which we add up so
01:15:54.000 --> 01:16:01.000
at the beginning of the regression model i had already informed you or asked you in an exercise
01:16:01.000 --> 01:16:11.000
to verify that x prime x is actually the same thing as the sum over all observations of x i
01:16:11.000 --> 01:16:19.000
times x i prime where x i are just the explanatory values for the ith observation
01:16:20.000 --> 01:16:25.000
right these are just the explanatory variables for the ith observation here
01:16:25.000 --> 01:16:35.000
and i multiply this vector here by its own cross product so this x i is a k by one vector
01:16:36.000 --> 01:16:42.000
being multiplied by a one by k vector which gives me then that the product here is k by k
01:16:43.000 --> 01:16:51.000
exactly as x prime x is k by k and well then this is added up over n observations so n k by k
01:16:51.000 --> 01:16:59.000
matrices added up is still a k by k matrix so x prime x is the same thing as this expression here
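both rewrites used on the slide, x prime x as the sum of the outer products x_i x_i' and (just below) x prime u as the sum of x_i u_i, are easy to confirm numerically; a minimal check on random matrices:

```python
# Numerical check of the two identities:
#   X'X = sum_i x_i x_i'   (sum of k-by-k outer products)
#   X'u = sum_i x_i u_i    (sum of k-by-1 vectors scaled by the scalar u_i)
# where x_i denotes row i of X written as a column vector.
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
X = rng.normal(size=(n, k))
u = rng.normal(size=n)

xtx_sum = sum(np.outer(X[i], X[i]) for i in range(n))
xtu_sum = sum(X[i] * u[i] for i in range(n))

assert np.allclose(X.T @ X, xtx_sum)
assert np.allclose(X.T @ u, xtu_sum)
print("identities verified")
```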
01:17:00.000 --> 01:17:05.000
and then i have to take the inverse of this expression and the same thing holds true for
01:17:05.000 --> 01:17:15.000
x prime u which can be written as the sum of x i times the scalar value u i where x i is a k by one vector
01:17:17.000 --> 01:17:29.000
and this is being multiplied by the observation u i so we get the product x prime u which as you
01:17:29.000 --> 01:17:41.000
know is a k by one vector since x has k columns x prime has k rows so this is k times n or k by n
01:17:41.000 --> 01:17:50.000
matrix multiplied by an n by one matrix gives us a k by one matrix so here we also have a k by one
01:17:50.000 --> 01:17:58.000
matrix since x i is a vector with k rows being multiplied by a scalar is still a vector of k rows
01:17:59.000 --> 01:18:08.000
added up n times the sum of n k by one vectors is a k by one vector so it seems that the formats
01:18:08.000 --> 01:18:13.000
here are correct the dimensions of the matrices are correct and then we can just write our matrix
01:18:13.000 --> 01:18:20.000
products in the somewhat more complicated way of adding up vector products like i've written it
01:18:20.000 --> 01:18:27.000
here and the importance of that is illustrated by the fact that we now see that there are just the
01:18:27.000 --> 01:18:37.000
products of the same observations involved in these sums here so it's always x i
01:18:37.000 --> 01:18:43.000
multiplied by u i we never have x i multiplied by u i minus one or minus two or minus three or
01:18:43.000 --> 01:18:49.000
something like this always the same observation index or in the time series context it would be
01:18:49.000 --> 01:18:59.000
the same time index x t and u t or here that is also the case the correlation between different
01:18:59.000 --> 01:19:07.000
observations only comes up when we multiply this term here with this term here and we are
01:19:07.000 --> 01:19:13.000
now looking for conditions under which the product of these terms here is actually zero
01:19:16.000 --> 01:19:19.000
and therefore we will use asymptotic arguments
01:19:19.000 --> 01:19:32.000
well what we do here is essentially that we introduce the average over these sums
01:19:34.000 --> 01:19:42.000
so you see that this here is an inverse by the superscript minus one and this here is not so
01:19:42.000 --> 01:19:51.000
we may actually introduce one over n here and one over n here because the inverse of the factor one
01:19:51.000 --> 01:19:59.000
over n is n right if i pull out this factor one over n out of these parentheses here then since
01:19:59.000 --> 01:20:06.000
this is the inverse i would have to write an n between these two matrices the inverse of one
01:20:06.000 --> 01:20:13.000
over n is n and if i pull out the one over n out of these parentheses since this is not an
01:20:13.000 --> 01:20:18.000
inverse it's just a one over n factor so actually pulling out these two factors here out of the
01:20:18.000 --> 01:20:27.000
parentheses gives me n times one over n so this cancels right so clearly this type of product
01:20:27.000 --> 01:20:35.000
here is the same as this type of product here and why do we do this well because we want to
01:20:36.000 --> 01:20:45.000
use some type of law of large numbers we know from the law of large numbers that the sum over
01:20:45.000 --> 01:20:55.000
independent observations converges or rather the average over independent observations
01:20:55.000 --> 01:21:04.000
converges to the true value asymptotically right and here now we take the average of well some
01:21:04.000 --> 01:21:12.000
sort of squared values of the xi's over all the observations i so i running from one to n and here
01:21:12.000 --> 01:21:18.000
we do the same thing for the product of the observations of the regressor matrix times
01:21:18.000 --> 01:21:25.000
the error and we again compute the average of all of that so what we will do in the next lecture
01:21:25.000 --> 01:21:33.000
is that we invoke the law of large numbers to argue that this thing here under certain conditions
01:21:33.000 --> 01:21:41.000
converges to zero and we will also argue that this thing here under certain conditions converges
01:21:41.000 --> 01:21:47.000
to a finite matrix so if this is a finite matrix to which it converges
01:21:47.000 --> 01:21:54.000
and this here is zero or has a limit of zero then the product of something finite
01:21:54.000 --> 01:22:01.000
times something which is zero is zero and the whole term here vanishes this is what we
01:22:03.000 --> 01:22:08.000
have as additional properties when the number of observations
01:22:10.000 --> 01:22:16.000
converges to infinity or goes to infinity and we will use this next time
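the argument previewed for next lecture can be sketched in a simulation (my own illustration, under the assumption that the x_i are drawn independently of the u_i with mean-zero errors): the average (1/n) X'u shrinks toward the zero vector as n grows, the average (1/n) X'X settles at a finite matrix, and the 1/n factors cancel in the estimator exactly as argued above:

```python
# Sketch of the asymptotic argument: with x_i independent of u_i and E[u_i] = 0,
# (1/n) X'u approaches the zero vector as n grows, while (1/n) X'X approaches
# a finite matrix (here the 2x2 identity, since the regressors are standard
# normal draws).
import numpy as np

rng = np.random.default_rng(4)
for n in (100, 10_000, 1_000_000):
    X = rng.normal(size=(n, 2))
    u = rng.normal(size=n)            # independent of X, mean zero
    avg_xu = X.T @ u / n              # shrinks toward (0, 0)
    avg_xx = X.T @ X / n              # settles near the identity matrix
    print(n, np.abs(avg_xu).max(), np.abs(avg_xx - np.eye(2)).max())

# The 1/n factors cancel, so the averaged form equals the usual estimator:
assert np.allclose(np.linalg.inv(X.T @ X) @ (X.T @ u),
                   np.linalg.inv(X.T @ X / n) @ (X.T @ u / n))
```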
01:22:17.000 --> 01:22:21.000
for today i will stop the recording