WEBVTT - autoGenerated
00:00:00.000 --> 00:00:10.000
Today we start and we finish with the last section of the review of basic econometrics
00:00:10.000 --> 00:00:17.000
in which again we talk about a violation of the classical assumptions which we have made
00:00:17.000 --> 00:00:27.000
to prove the blue property of the ordinary least squares estimator.
00:00:27.000 --> 00:00:36.000
You will have noted that I spoke of the least squares estimator when I introduced this estimator
00:00:36.000 --> 00:00:42.000
but now I distinguish always between the ordinary least squares estimator OLS and the generalized
00:00:42.000 --> 00:00:50.000
least squares estimator GLS, which I introduced to you in the last lecture,
00:00:50.000 --> 00:00:58.000
and you know that OLS is basically just a special
00:00:58.000 --> 00:01:07.000
case of GLS because what GLS does is that it weights the regressors or the regressor
00:01:07.000 --> 00:01:13.000
matrix with the inverse of the covariance matrix of the errors and since the covariance
00:01:13.000 --> 00:01:20.000
matrix of the errors is scalar and diagonal in the case of the classical linear regression
00:01:20.000 --> 00:01:28.000
model the ordinary least squares estimator is also weighting the regressors with the inverse
00:01:28.000 --> 00:01:35.000
of the covariance matrix just that this inverse is an identity matrix so there's no need to
00:01:35.000 --> 00:01:41.000
have a special symbol for it, we just write x prime x rather than x prime I inverse x,
00:01:41.000 --> 00:01:48.000
since I inverse is I and this thing does not play a role, but with GLS it does, and
00:01:48.000 --> 00:01:54.000
therefore in my notation and like most of the literature we are specific in saying that
00:01:54.000 --> 00:02:00.000
we either use OLS or GLS, and we've covered a number of cases in which GLS may be a better
00:02:00.000 --> 00:02:08.000
estimator than OLS, which of course are cases in which the covariance matrix of the errors
00:02:08.000 --> 00:02:19.000
is not scalar diagonal, in which errors are heteroscedastic or correlated, so that the classical assumptions are violated.
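The point just made can be sketched numerically. Here is a minimal illustration of my own (not part of the lecture; data and names are made up) showing that GLS, which weights with the inverse of the error covariance matrix, reduces to OLS when that covariance matrix is scalar:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

# GLS: beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
def gls(X, y, omega):
    omega_inv = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# OLS: beta = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# With Omega = sigma^2 * I the weighting drops out and GLS equals OLS
beta_gls = gls(X, y, 4.0 * np.eye(n))
print(np.allclose(beta_ols, beta_gls))  # True
```

With a non-scalar Omega (heteroscedastic or correlated errors) the two estimates would differ, which is exactly when the GLS notation earns its keep.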
00:02:19.000 --> 00:02:26.000
Now we know that both the OLS and the GLS estimator are consistent and asymptotically
00:02:26.000 --> 00:02:36.000
normally distributed if the expectation of the product terms so the scalar product ut
00:02:36.000 --> 00:02:46.000
times xt is equal to zero so that is a specific property which follows from assumption a2
00:02:46.000 --> 00:02:54.000
which is a more general assumption on the strict exogeneity of x with respect to u but
00:02:54.000 --> 00:03:02.000
the property we actually need is just that this product term here so u of period t and
00:03:02.000 --> 00:03:11.000
x period t their product should have an expectation of zero for all observations t if we index
00:03:11.000 --> 00:03:21.000
observations now by time rather than indexing them from one to n because the problem of
00:03:21.000 --> 00:03:27.000
this assumption here being violated appears often in the time series context so i chose
00:03:28.000 --> 00:03:33.000
time series notation here but you may also have them in a cross-section context as you will learn
00:03:33.000 --> 00:03:42.000
in subsequent sections of this lecture when we talk about causality analysis but the point
00:03:42.000 --> 00:03:50.000
which i would like to make here is and perhaps it is easier to understand what i'm talking about
00:03:50.000 --> 00:03:56.000
the point here is that we only need that the correlation between ut and xt or more precisely
00:03:56.000 --> 00:04:04.000
the expectation of the product of ut and xt is zero but we may allow actually for consistency
00:04:04.000 --> 00:04:14.000
that xt is correlated with ut minus one but that would not constitute a problem for consistency
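This distinction can be illustrated with a small simulation (my own sketch, not from the lecture; all parameter values are hypothetical): the regressor below is built from the lagged error, so x t correlates with u t minus one, but the expectation of x t times u t is zero, which is all consistency requires.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 0.5

u = rng.normal(size=n)
# Regressor depends on the *lagged* error: corr(x_t, u_{t-1}) != 0,
# but E[x_t * u_t] = 0, which is all that consistency requires.
x = np.empty(n)
x[0] = rng.normal()
x[1:] = 0.8 * u[:-1] + rng.normal(size=n - 1)

y = beta * x + u
beta_hat = (x @ y) / (x @ x)  # OLS slope without intercept
print(round(beta_hat, 3))     # close to the true value 0.5
```

Despite the correlation with the previous period's error, the OLS estimate converges to the true beta as n grows.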
00:04:18.000 --> 00:04:26.000
this fact we will exploit today for settings in which
00:04:27.000 --> 00:04:36.000
the expectation of this product term here is not equal to zero so for cases in which xt is correlated
00:04:36.000 --> 00:04:45.000
with ut and this phenomenon that xt and ut are correlated is called the endogeneity or the
00:04:45.000 --> 00:04:51.000
simultaneity problem so we will address this problem that the expectation of this product term
00:04:51.000 --> 00:04:58.000
is different from zero the first thing i would like to do is that i give you an example of when
00:04:58.000 --> 00:05:04.000
this may be the case there are actually many settings in which this may occur but perhaps
00:05:04.000 --> 00:05:13.000
at the outset it is difficult to see why xt should be correlated with ut at all, so i will actually
00:05:13.000 --> 00:05:19.000
give you two basic examples one of which we have already covered in previous lectures namely the
00:05:19.000 --> 00:05:24.000
Keynesian cross i will not start with this example but now give you a different example of
00:05:24.000 --> 00:05:30.000
errors in variables and that's actually a very relevant example because it is quite plausible
00:05:30.000 --> 00:05:38.000
that we do have errors in variables problems in our data so suppose that the true model is of
00:05:38.000 --> 00:05:46.000
the following form, where I here write down just the equation for a single observation indexed with t,
00:05:47.000 --> 00:05:55.000
so these are not vectors as I usually use them in econometric notation, but just the scalar
00:05:56.000 --> 00:06:03.000
version of such an equation, where yt is a single observation in period t and it depends
00:06:03.000 --> 00:06:12.000
on a true regressor wt so also something we would like to observe in period t
00:06:13.000 --> 00:06:19.000
with coefficient beta so beta here is also just a scalar and then there is an error term ut
00:06:20.000 --> 00:06:31.000
the problem now is that wt is not observed instead of observing wt we observe something else namely xt
00:06:32.000 --> 00:06:44.000
and xt is a measure of wt but we measure wt with some error epsilon t which probably is the case
00:06:44.000 --> 00:06:53.000
in many econometric settings for instance if you think of human capital which is an important
00:06:53.000 --> 00:06:59.000
regressor in many production-oriented econometric approaches the human capital of course
00:06:59.000 --> 00:07:07.000
corresponds to production and is a factor of production we are very well aware of the
00:07:07.000 --> 00:07:15.000
fact that we do not have a good measure of human capital because all we have are typically certain
00:07:15.000 --> 00:07:20.000
educational degrees which people have obtained during their lifetimes
00:07:21.000 --> 00:07:26.000
but the educational degrees are of course just an imperfect measure of the human capital
00:07:27.000 --> 00:07:35.000
going along with a certain worker, since workers may have acquired other capabilities
00:07:36.000 --> 00:07:44.000
by i don't know learning by doing for instance or by self-instruction or because formal degrees
00:07:44.000 --> 00:07:53.000
measure their ability inadequately; sometimes perhaps workers
00:07:53.000 --> 00:08:01.000
have been granted formal degrees in their time of schooling even though they did not know
00:08:01.000 --> 00:08:09.000
quite as much as the exam seemed to find out when they took it. Whatever the case, human capital is
00:08:09.000 --> 00:08:15.000
certainly always measured with error, and probably with quite a bit of error, but the same is of
00:08:15.000 --> 00:08:20.000
course true for almost any kind of variable you may think of if we have time series on consumption
00:08:20.000 --> 00:08:24.000
for instance, this is almost certainly not the true consumption which people have
00:08:24.000 --> 00:08:32.000
had in a certain period, but a measure of consumption, and it may deviate from the true
00:08:32.000 --> 00:08:38.000
consumption depending on how we measure consumption we know if we use different approaches to
00:08:38.000 --> 00:08:45.000
measuring consumption we will get different figures for that so we don't really know what
00:08:45.000 --> 00:08:49.000
the true consumption is and we know that we have committed some error when measuring
00:08:49.000 --> 00:08:55.000
consumption, or income, or capital, or whatever variable you may think of. Actually the most
00:08:55.000 --> 00:09:04.000
natural assumption to make is that we almost never have the true variable wt at our disposal so we
00:09:04.000 --> 00:09:10.000
probably never observe it but always we will observe some other variable which i here denote
00:09:10.000 --> 00:09:17.000
xt, which is measured with error. I here make the assumption that this error is additive, which is
00:09:17.000 --> 00:09:22.000
not an innocuous assumption; why should the error be additive? It could also be multiplicative
00:09:22.000 --> 00:09:30.000
or coming in any other type of form or combinations of these types of forms so there are many
00:09:30.000 --> 00:09:42.000
possibilities why there are errors, and it is not evident that the form of the error
00:09:42.000 --> 00:09:49.000
should be additive but we will assume it for simplicity here i saw a comment coming in or
00:09:49.000 --> 00:09:58.000
question i'll just take that yeah somebody asked whether what we are talking about is only valid
00:09:58.000 --> 00:10:03.000
for time series models actually i just said that and no it is not only valid for time series models
00:10:03.000 --> 00:10:10.000
it's also valid for cross-section models and as i already said please listen to what i say in the
00:10:10.000 --> 00:10:19.000
lecture we will cover this in subsequent sections of the lecture so you will get many examples
00:10:19.000 --> 00:10:24.000
actually where the same type of problem also occurs with cross-section data but the notation
00:10:24.000 --> 00:10:32.000
is somewhat easier when you have time series data because it is quite natural then to suppose
00:10:32.000 --> 00:10:38.000
or quite natural then to speak about xt and the neighboring observation xt minus one, the
00:10:38.000 --> 00:10:46.000
previous observation, rather than picking just one other
00:10:46.000 --> 00:10:53.000
cross-section observation, of which we don't know which of the many other observations it is and with which
00:10:53.000 --> 00:10:59.000
there perhaps is or is not a certain correlation; this is why typically these types of problems are
00:10:59.000 --> 00:11:08.000
illustrated in the time series model all right now what we have here are three unobservable
00:11:08.000 --> 00:11:15.000
variables we have wt which is unobserved we have the error ut and we have the error epsilon t
00:11:16.000 --> 00:11:22.000
we make another simplifying assumption to have our setup as easy as possible by saying
00:11:22.000 --> 00:11:29.000
that all these three unobserved quantities shall be independent and we assume that they are not
00:11:29.000 --> 00:11:35.000
autocorrelated so in some way we make our world already as easy as it can be and you have
00:11:35.000 --> 00:11:40.000
to bear in mind that probably real problems are more complicated in their structure than
00:11:40.000 --> 00:11:45.000
the structure we discuss here but already in this simplified structure you'll see the problem quite
00:11:46.000 --> 00:11:50.000
clearly, and it should be clear afterwards that in more complicated structures it is even worse
00:11:51.000 --> 00:12:01.000
all right, now what happens when we estimate this equation here using as regressor
00:12:01.000 --> 00:12:08.000
the variable xt rather than the variable wt, which as I said is not at our
00:12:08.000 --> 00:12:17.000
disposal so one idea you might come up with is that you say well i don't have wt but i know xt
00:12:17.000 --> 00:12:24.000
is an attempt to measure wt okay there is an error so i have no better possibility so i just
00:12:24.000 --> 00:12:32.000
plug in xt for wt in this equation here and what the heck epsilon t is uncorrelated with ut so
00:12:32.000 --> 00:12:38.000
let's hope that everything goes well. Well, in fact you can do this, but not everything goes well,
00:12:39.000 --> 00:12:47.000
and we see this very easily here: we know that yt, in truth, in the true model,
00:12:47.000 --> 00:12:57.000
is equal to beta times xt minus epsilon t plus ut why is that well xt minus epsilon t is wt
00:12:57.000 --> 00:13:07.000
right wt is equal to xt minus epsilon t so in place of wt we may also write xt minus epsilon t
00:13:07.000 --> 00:13:14.000
plus then ut so this equality here is true that's the true model which we see here
00:13:15.000 --> 00:13:22.000
so we may rearrange terms and say this is equal to beta times xt that would be now the systematic
00:13:22.000 --> 00:13:29.000
part of our regression we regress on xt because xt is observable we can estimate beta and we would
00:13:29.000 --> 00:13:37.000
know that as the error term we would have ut minus beta epsilon t this term here would be
00:13:37.000 --> 00:13:44.000
completely unobservable because both ut and epsilon t are unobservable note that the beta
00:13:44.000 --> 00:13:51.000
in here is the parameter we are actually interested in so there is information on beta also in the
00:13:51.000 --> 00:14:00.000
error term the combination of ut and minus beta epsilon t we now call just vt so vt is our new
00:14:00.000 --> 00:14:08.000
symbol for the error term in the regression which we now estimate since we do not know the true
00:14:08.000 --> 00:14:18.000
regressor wt but rather use xt in its place so what we estimate would be equation 56 where xt
00:14:18.000 --> 00:14:25.000
is observable, and we know what this error vt is about. So if we
00:14:25.000 --> 00:14:31.000
want to know whether the estimate of beta is consistent, the question we have to ask is whether
00:14:31.000 --> 00:14:41.000
the expectation of xt times the error term vt is equal to zero. If it were equal to
00:14:41.000 --> 00:14:48.000
zero, then obviously the estimate of beta would be consistent and that's fine, but as you
00:14:48.000 --> 00:14:57.000
would see now unfortunately this expectation is not equal to zero because expectation of xt times
00:14:57.000 --> 00:15:06.000
vt is by definition the expectation of wt plus epsilon t, which is xt, times vt, since
00:15:06.000 --> 00:15:16.000
xt is wt measured with an error and vt is ut minus beta epsilon t so we've made the assumption
00:15:16.000 --> 00:15:24.000
that wt and ut are independent, so the expectation of the product is
00:15:24.000 --> 00:15:32.000
zero and epsilon t and ut we also have assumed to be independent so the expectation of the product
00:15:32.000 --> 00:15:40.000
is equal to zero so no problem with that but there is a problem with epsilon t and minus beta
00:15:40.000 --> 00:15:47.000
epsilon t here because if we multiply this out then we see that the result is negative beta times
00:15:47.000 --> 00:15:55.000
epsilon t squared and taking the expectation of negative beta times epsilon t squared we would
00:15:55.000 --> 00:16:02.000
of course get negative beta times sigma squared epsilon, and this is different from zero as long as
00:16:02.000 --> 00:16:07.000
beta is different from zero obviously in the special case that beta is equal to zero
00:16:08.000 --> 00:16:13.000
the expectation would be zero, but then of course we wouldn't have a problem with errors in variables,
00:16:13.000 --> 00:16:18.000
since beta being equal to zero would mean that this term here is actually zero; in the true model
00:16:18.000 --> 00:16:25.000
yt is then just an error term, but that case is irrelevant for our purposes. But as long as beta is different
00:16:25.000 --> 00:16:32.000
from zero, this term here will be different from zero, and therefore we know that the regressor xt
00:16:33.000 --> 00:16:41.000
correlates with the error term by construction. So a simple errors-in-variables specification
00:16:41.000 --> 00:16:47.000
as this one here as simple as it is with just an added observational error
00:16:49.000 --> 00:16:56.000
implies necessarily that we have a correlation between xt and the error term and therefore the
00:16:56.000 --> 00:17:06.000
OLS estimator for beta in equation 56 is inconsistent, so the plim of beta hat is equal to
00:17:06.000 --> 00:17:14.000
beta, as usual, plus the expectation of xt times vt divided by the variance of xt
00:17:15.000 --> 00:17:22.000
well what is the expectation of xt times vt we know it is different from zero and we know that
00:17:22.000 --> 00:17:29.000
this expectation here is negative if beta is positive, or positive if beta is negative,
00:17:29.000 --> 00:17:38.000
obviously and so has the reverse sign from the true value of beta so for beta greater than zero
00:17:38.000 --> 00:17:46.000
we would know that the plim of beta hat is smaller than beta so even if we have
00:17:47.000 --> 00:17:53.000
almost arbitrarily many observations, so that we can invoke asymptotic arguments, we
00:17:53.000 --> 00:18:02.000
would know that the OLS estimator converges to a value which is smaller than the true value
00:18:02.000 --> 00:18:07.000
even asymptotically. So there is no convergence to the true value when we have errors in variables.
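This attenuation result can be checked numerically. A minimal sketch of my own (not from the lecture; the parameter values are hypothetical, but the symbols match the lecture's w, x, epsilon, u and beta):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
beta = 1.0
sigma_w2, sigma_eps2 = 1.0, 0.5  # variances of w_t and epsilon_t

w = rng.normal(scale=np.sqrt(sigma_w2), size=n)      # true, unobserved regressor
eps = rng.normal(scale=np.sqrt(sigma_eps2), size=n)  # measurement error
u = rng.normal(size=n)                               # regression error

y = beta * w + u   # true model
x = w + eps        # what we actually observe

# OLS of y on x (no intercept, everything mean zero)
beta_hat = (x @ y) / (x @ x)

# Theoretical plim: beta * sigma_w^2 / (sigma_w^2 + sigma_eps^2)
plim_theory = beta * sigma_w2 / (sigma_w2 + sigma_eps2)
print(beta_hat, plim_theory)  # both near 0.667, well below beta = 1
```

The simulated estimate sits right at the theoretical probability limit, which here is only two thirds of the true beta, so no sample size rescues OLS in this setting.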
00:18:08.000 --> 00:18:17.000
This is quite a depressing finding, given the fact that errors in variables are quite
00:18:18.000 --> 00:18:29.000
a likely structure to be found in real data. Okay, as an exercise here I ask you to reconsider
00:18:29.000 --> 00:18:35.000
the Keynesian cross example which I introduced to you in the beginning of the section,
00:18:35.000 --> 00:18:44.000
and in this exercise you shall demonstrate the simultaneity problem.
00:18:44.000 --> 00:18:51.000
Basically the exercise asks you to do it without scrolling back to equations 20
00:18:51.000 --> 00:18:59.000
to 22 in this set of slides; just try to reconstruct from your mind how the
00:18:59.000 --> 00:19:06.000
Keynesian cross example worked, and make it clear to yourself again that there was a simultaneity
00:19:06.000 --> 00:19:14.000
problem in the Keynesian cross model, and that the simultaneity problem is of exactly the same type
00:19:14.000 --> 00:19:21.000
as the errors-in-variables structure here, namely implying a correlation between the regressor
00:19:21.000 --> 00:19:30.000
and the error term, so that the plim of the estimated marginal propensity to consume,
00:19:21.000 --> 00:19:30.000
this little c, will be smaller than the true marginal propensity to consume if
00:19:30.000 --> 00:19:39.000
we have this positive correlation, which of course is clear for consumption since c is certainly a
00:19:44.000 --> 00:19:54.000
positive parameter now what can we do about this problem which i here call the simultaneity
00:19:54.000 --> 00:20:02.000
problem well let's first look at the structure of the problem suppose we have in a general set up
00:20:02.000 --> 00:20:09.000
now not anymore looking at the time series set up but just some general set up be it either time
00:20:09.000 --> 00:20:20.000
series or cross sections in which we estimate by OLS then we know that the plim of the OLS estimator
00:20:20.000 --> 00:20:28.000
is the plim of x prime x inverse x prime y and this plim we know we can decompose into two terms
00:20:29.000 --> 00:20:39.000
by substituting in for y x beta plus u so we will have the sum of two terms here and we then know
00:20:39.000 --> 00:20:46.000
that x prime x inverse times x prime x is just the identity matrix so the first term is just the
00:20:46.000 --> 00:20:53.000
true value of beta the second term we would like to be equal to zero but that we cannot take for
00:20:53.000 --> 00:21:01.000
granted in this particular setting here the second term is the plim of x prime x inverse times x prime
00:21:01.000 --> 00:21:09.000
u. You know that we can introduce a factor one over n in both of these terms because one over n
00:21:09.000 --> 00:21:15.000
inverse times one over n cancels, and by assumptions we've made previously we would
00:21:15.000 --> 00:21:23.000
then know that this first component here, one over n x prime x inverse, converges to
00:21:23.000 --> 00:21:31.000
a finite matrix; sigma x x inverse was our notation, so something finite. And by other
00:21:31.000 --> 00:21:37.000
assumptions we have made we always wanted to ensure that the plim of x prime u or the plim
00:21:37.000 --> 00:21:45.000
plim of one over n x prime u converges to zero, but in the setting we are currently
00:21:45.000 --> 00:21:51.000
considering with the simultaneity problem we would know that the plim is different from zero
00:21:52.000 --> 00:21:58.000
so if this here is a finite matrix and it is multiplied by something which is different from
00:21:58.000 --> 00:22:04.000
zero then the whole second term here will be different from zero which means that the plim
00:22:04.000 --> 00:22:11.000
of beta hat is not equal to beta but is beta plus something, something which is different
00:22:11.000 --> 00:22:23.000
from zero, so the OLS estimator is not consistent. Now this exposes neatly the type of problem
00:22:23.000 --> 00:22:30.000
we have essentially the problem is the non-convergence of this term here to zero
00:22:31.000 --> 00:22:38.000
and this non-convergence is due to the fact that the product terms x t times u t if we would go
00:22:38.000 --> 00:22:44.000
back to the time series context or let's say for the same observation in the cross section
00:22:46.000 --> 00:22:53.000
setting x n times u n however you would like to frame it that the expectation of this product
00:22:53.000 --> 00:23:03.000
term is different from zero so in order to cure the problem the natural thing to think of
00:23:03.000 --> 00:23:11.000
is to change this term somehow and the only thing we can change is actually our choice of regressors
00:23:11.000 --> 00:23:16.000
because the u is an error term which we don't observe which we don't control it is just the
00:23:16.000 --> 00:23:22.000
error term so we may think about changing something about this regressor matrix x here
00:23:23.000 --> 00:23:31.000
in order to ensure that this plim one over n modified x prime u is equal to zero
00:23:32.000 --> 00:23:40.000
the idea which is the fundamental idea of what we call instrumental estimation is to replace the
00:23:40.000 --> 00:23:50.000
matrix x prime by different variables z prime which are not correlated with u anymore so that's
00:23:50.000 --> 00:23:58.000
basically just the first part of the idea we could just you know replace x prime by something which
00:23:58.000 --> 00:24:05.000
we call z prime by some other matrix which has no correlation with u so that the plim of one over
00:24:05.000 --> 00:24:13.000
n z prime u is equal to zero however there's no point in replacing the regressor matrix by a
00:24:13.000 --> 00:24:20.000
different matrix if this different matrix doesn't have any explanatory power for y; obviously what we
00:24:20.000 --> 00:24:28.000
always need to have is that x prime is correlated with y otherwise we cannot estimate beta so there
00:24:28.000 --> 00:24:33.000
are two requirements: when we replace x prime by a matrix z prime, then we need to have the
00:24:33.000 --> 00:24:41.000
property that z prime is uncorrelated with u to make this term here disappear in the limit
00:24:42.000 --> 00:24:48.000
but at the same time it must be ensured that z prime is highly correlated with y
00:24:49.000 --> 00:24:55.000
and thereby it must be highly correlated with x now these are two different things
00:24:55.000 --> 00:25:02.000
it is possible to have variables which are uncorrelated with u but they are still correlated
00:25:02.000 --> 00:25:10.000
with x and if they are highly correlated with x then we may be hopeful that we can still estimate
00:25:10.000 --> 00:25:15.000
beta appropriately because we have something which is very close to the correlation between
00:25:15.000 --> 00:25:23.000
x prime and y when we look at z prime and y right so if z is highly correlated with x
00:25:24.000 --> 00:25:30.000
then the correlation between x and y will be very similar to the correlation between z
00:25:30.000 --> 00:25:38.000
and y so we may actually have z prime y here so that's the general idea of the instrumental
00:25:38.000 --> 00:25:45.000
variables estimator: to find variables z prime which can serve as so-called instruments for x
00:25:48.000 --> 00:25:54.000
in general we will just denote this in such a way that we say we replace the regressor matrix
00:25:54.000 --> 00:26:03.000
x, or rather the transposed part x prime, by a different matrix z prime; but of course
00:26:03.000 --> 00:26:13.000
x is a matrix which involves, let's say, k regressors, and it need not be true for each of the k columns
00:26:13.000 --> 00:26:24.000
that the respective entry in x is correlated with u, that is, that x k t is correlated with ut
00:26:25.000 --> 00:26:34.000
so in practice it is only necessary to replace those columns in x by some different variable
00:26:35.000 --> 00:26:40.000
which have the property that there is contemporaneous correlation between
00:26:41.000 --> 00:26:48.000
the period t observation in this column of x with the period t error term ut
00:26:49.000 --> 00:26:56.000
and for other columns we do not need to change the column if there is no problem
00:26:56.000 --> 00:27:03.000
of correlation between the xt of this particular column and ut
00:27:03.000 --> 00:27:14.000
so in this case when we do not replace a column in the x matrix by some instrument we say that
00:27:14.000 --> 00:27:23.000
this particular column is instrumented by itself so we use the exact column of x as an instrument
00:27:23.000 --> 00:27:30.000
so we just make no change but we do instrument with different variable those columns of x which
00:27:30.000 --> 00:27:37.000
display a contemporaneous correlation with u and since then in this case there is a change in
00:27:37.000 --> 00:27:47.000
the x matrix we use a different notation namely z this is what i have summarized in this box here
00:27:47.000 --> 00:27:55.000
z prime is called the matrix of instrumental variables and all of the columns in z are
00:27:55.000 --> 00:28:02.000
actually instrumental variables if you accept that a variable can also instrument itself
00:28:02.000 --> 00:28:06.000
namely in the case when there is no contemporaneous correlation with the error term
00:28:08.000 --> 00:28:20.000
the column, say the kth column z k of Z, corresponding to the column x k of X, is called the instrument
00:28:20.000 --> 00:28:27.000
for x k and obviously x k can also instrument itself because x k is correlated with itself
00:28:29.000 --> 00:28:37.000
I just note that for some reason this z here seems to be a little smaller than the z here, but
00:28:38.000 --> 00:28:44.000
this has no significance this here also means a matrix z it's the same matrix as this matrix
00:28:44.000 --> 00:28:51.000
here just it has been transposed here i don't know why in the printout this is a little smaller
00:28:51.000 --> 00:29:00.000
but please do not mistake this matrix Z here for a column of Z, which are denoted by small letters,
00:29:00.000 --> 00:29:06.000
so that's a capital z here and this is small letter z and this is a capital z again here it
00:29:06.000 --> 00:29:13.000
is clear, but perhaps between the two z's it is not quite as clear, so that's why I emphasized it.
00:29:15.000 --> 00:29:24.000
now we will make three different assumptions which replace the assumptions we have used so far
00:29:24.000 --> 00:29:32.000
on the plims of certain products between z and x, between z and u, and between z and z,
00:29:32.000 --> 00:29:38.000
and these are assumptions a 9 a 10 and a 11 they replace assumptions a 7 and a 8
00:29:38.000 --> 00:29:45.000
so the setup of the assumptions is actually completely analogous to assumptions a 7 and a 8
00:29:46.000 --> 00:29:53.000
we will assume that the plim of 1 over n times the sum of z i prime x i
00:29:54.000 --> 00:30:04.000
is a finite invertible matrix capital sigma z x so we just assume that this sum here converges
00:30:04.000 --> 00:30:10.000
in probability to a finite invertible matrix that's all so we just assume that it doesn't explode
00:30:11.000 --> 00:30:16.000
and that it doesn't become singular okay so that's a rather weak assumption typically
00:30:17.000 --> 00:30:27.000
Assumption A10 is that the plim of 1 over n times the sum over z i times u i
00:30:27.000 --> 00:30:35.000
is equal to the expectation of z i times u i, and that this expectation is zero.
00:30:36.000 --> 00:30:44.000
this is the important assumption which we need in order to guarantee that this term here with x
00:30:44.000 --> 00:30:51.000
being replaced by z converges in probability to zero, whereas with x it does not
00:30:51.000 --> 00:30:59.000
converge to zero. So assumption A10 basically ensures the consistency of our estimator
00:31:01.000 --> 00:31:07.000
by ensuring that the second component here when we replace x by z converges to zero
00:31:07.000 --> 00:31:16.000
And assumption A9 is just the assumption which guarantees that 1 over n z prime x inverse
00:31:16.000 --> 00:31:23.000
converges to something which stays finite, because when we multiply by zero we have to ensure that
00:31:23.000 --> 00:31:28.000
this matrix here is finite if it were to become infinite then obviously we wouldn't really know
00:31:28.000 --> 00:31:35.000
where it converges to since infinity times zero is not well defined in that case i mean it depends
00:31:35.000 --> 00:31:39.000
on the speed of convergence then, but there are cases in which the product would not be zero,
00:31:40.000 --> 00:31:49.000
so to be on the safe side we assume in a 9 that sigma z x is finite and in a 10 we assume
00:31:49.000 --> 00:31:52.000
that the second term will actually converge to zero
00:31:54.000 --> 00:31:59.000
and for the asymptotic covariance matrix we also need to assume that the
00:31:59.000 --> 00:32:08.000
plim of 1 over n times the sum of z i prime z i, which I denote by sigma z z, is a finite and invertible matrix,
00:32:09.000 --> 00:32:16.000
so that's an assumption completely analogous to a 9 but now relating to the product of the z i
00:32:17.000 --> 00:32:18.000
columns here
00:32:22.000 --> 00:32:29.000
Okay, with these assumptions we can define the instrumental
00:32:29.000 --> 00:32:37.000
variables estimator, which I do down here, where the linear estimator beta hat IV, for instrumental
00:32:37.000 --> 00:32:46.000
variables, defined as z prime x inverse z prime y, is called the instrumental variables estimator of
00:32:46.000 --> 00:32:53.000
beta, and the instrumental variables estimator is typically abbreviated IV, so often you will
00:32:53.000 --> 00:33:00.000
read just 'the IV estimator'. Note that the difference between the IV estimator and the OLS
00:33:00.000 --> 00:33:09.000
estimator is that we replace the x prime terms in the normal OLS estimator
00:33:09.000 --> 00:33:17.000
by z prime terms the x matrix here is preserved so the instrumental variables estimator is not z
00:33:17.000 --> 00:33:25.000
prime z inverse z prime y you'll see in a minute why we preserve the x here but you have to replace
00:33:25.000 --> 00:33:35.000
the x prime matrix both in this inverse here and in its product with y by a z prime
00:33:36.000 --> 00:33:39.000
matrix in order to construct the IV estimator.
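A compact numerical sketch of the IV estimator just defined (my own illustration, not part of the lecture; as a hypothetical instrument I use a second, independent noisy measurement of the true regressor, which is correlated with x but uncorrelated with the regression error):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
beta = 1.0

w = rng.normal(size=n)                 # true, unobserved regressor
u = rng.normal(size=n)                 # regression error
x = w + rng.normal(scale=0.7, size=n)  # observed, error-ridden regressor
z = w + rng.normal(scale=0.7, size=n)  # instrument: correlated with x, not with the error

y = beta * w + u

X = x.reshape(-1, 1)
Z = z.reshape(-1, 1)

# OLS: (X'X)^{-1} X'y -- attenuated towards zero by the measurement error
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)[0]
# IV:  (Z'X)^{-1} Z'y -- consistent for beta
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)[0]
print(beta_ols, beta_iv)  # roughly 0.67 and 1.0
```

Note how the x matrix is preserved in the Z'X product, exactly as in the definition; only the transposed part is swapped for Z'.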
00:33:42.000 --> 00:33:49.000
now we've always studied two properties of the least squares estimator namely the property of
00:33:49.000 --> 00:33:55.000
being unbiased and the property of being consistent and the first thing to note is
00:33:55.000 --> 00:34:02.000
that the i v estimator is still a biased estimator in general which is easy to see
00:34:02.000 --> 00:34:10.000
look at this equation here the expected value of beta hat i v is equal to the expectation
00:34:10.000 --> 00:34:20.000
of z prime x inverse z prime y and we can then substitute for the y and write in place of the y
00:34:20.000 --> 00:34:30.000
x beta plus u like usual so it's the expectation of z prime x inverse z prime x beta plus u
00:34:31.000 --> 00:34:37.000
here you see already why we preserve the x in the instrumental variables estimator at this place
00:34:37.000 --> 00:34:45.000
because we have z prime x inverse in our here and this matches well to the z prime x which we obtain
00:34:46.000 --> 00:34:55.000
by substituting of y for x beta plus u in this expression here so these two matrices then cancel
00:34:55.000 --> 00:35:03.000
because this is just a matrix and its inverse z prime x inverse times z prime x beta gives just beta
00:35:03.000 --> 00:35:13.000
which is the true value but what we remain with is the expectation of z prime x inverse z prime u
00:35:14.000 --> 00:35:23.000
z prime x inverse z prime u is the second term here and this expectation here is in general not
00:35:23.000 --> 00:35:30.000
equal to zero right there's no reason to suppose that it would be equal to zero because we would have
00:35:30.000 --> 00:35:36.000
some x terms in here and we would have some u terms in here and we know the x is correlated
00:35:36.000 --> 00:35:44.000
with the u so this expectation here is different from zero and therefore the iv estimator is in
00:35:44.000 --> 00:35:52.000
most cases a biased estimator thus the instrumentation does not solve the problem of
00:35:53.000 --> 00:36:00.000
of biasedness of the estimator the iv estimator usually stays biased there are only a few and very
00:36:00.000 --> 00:36:05.000
special cases in which you may show that the iv estimator is an unbiased estimator
00:36:07.000 --> 00:36:14.000
so the best we can hope for in a setting where we have correlation between x and u is that we have
00:36:14.000 --> 00:36:22.000
consistency and this indeed is the case if assumptions a nine and a ten hold because then
00:36:22.000 --> 00:36:29.000
we have the asymptotic properties that the plim of one over n z prime x is equal to sigma z x
00:36:29.000 --> 00:36:37.000
and the plim of one over n z prime u is equal to zero and these are just the properties we need
00:36:37.000 --> 00:36:45.000
to ensure that the term in the expectations operator here converges asymptotically to zero
00:36:45.000 --> 00:36:52.000
because the first term here would converge to a finite matrix and the second term here would
00:36:52.000 --> 00:36:59.000
converge to zero and then the whole thing vanishes asymptotically right so that's what i have
00:36:59.000 --> 00:37:06.000
written down here if assumptions a nine and a ten hold then we know that z prime x inverse z prime
00:37:06.000 --> 00:37:13.000
u can be written as one over n z prime x and the whole thing taking the inverse of which converges
00:37:13.000 --> 00:37:25.000
to sigma z x inverse and one over n z prime u which converges to zero so that's
00:37:25.000 --> 00:37:31.000
something finite here this is zero so the product of the two will be zero and in this case we have
00:37:32.000 --> 00:37:35.000
the consistency of the iv estimator is established
00:37:35.000 --> 00:37:45.000
um that's basically already the formal proof or at least these are the core ingredients of
00:37:45.000 --> 00:37:50.000
the proof i won't work it out because i think it is sufficient here if you have this understanding
00:37:50.000 --> 00:37:59.000
of how consistency is established and the result then is that beta hat iv is consistent for the
00:37:59.000 --> 00:38:05.000
true parameter beta if a nine and a ten hold and in addition of course we need some standard
00:38:05.000 --> 00:38:12.000
regularity conditions which we always assume but the two important conditions are a nine and a ten
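the two plim properties just used can also be seen numerically; the sketch below (with an assumed data-generating process) shows that one over n z prime x settles near a finite value while one over n z prime u shrinks toward zero as n grows, which is exactly why the bias term vanishes asymptotically

```python
import numpy as np

# Sketch of the consistency argument with simulated data: (1/n) Z'X settles
# near its plim (here 1) while (1/n) Z'u shrinks toward zero as n grows,
# so the product of the two vanishes; the setup is an assumed toy example.
rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    z = rng.normal(size=n)     # instrument
    u = rng.normal(size=n)     # error
    x = z + u                  # endogenous regressor
    print(n, (z @ x) / n, (z @ u) / n)
```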
00:38:15.000 --> 00:38:22.000
that's about the consistency so about the question when the estimator converges to the true value
00:38:22.000 --> 00:38:30.000
asymptotically then we may also ask about the variance of this estimator and we can prove in
00:38:30.000 --> 00:38:40.000
similar way using assumptions a nine and a ten um that the asymptotic covariance matrix of beta hat
00:38:40.000 --> 00:38:55.000
iv um can be established in such a way that for large n the covariance matrix of beta hat iv is
00:38:55.000 --> 00:39:04.000
approximately equal to the variance of the error term u times z prime x inverse z prime
00:39:04.000 --> 00:39:16.000
z x prime z inverse note the wording i have chosen here i have not said that asymptotically the
00:39:16.000 --> 00:39:25.000
covariance matrix of beta iv hat is some particular matrix because that would not be true
00:39:26.000 --> 00:39:34.000
for the following reason we know that beta hat iv converges asymptotically in the plim sense
00:39:34.000 --> 00:39:40.000
to the true parameter so asymptotically actually this estimator doesn't have any variance anymore
00:39:41.000 --> 00:39:47.000
the probability of the estimator being different from the true value is zero asymptotically that's
00:39:47.000 --> 00:39:54.000
precisely what plim convergence says so asymptotically there is no covariance of beta hat
00:39:55.000 --> 00:40:02.000
iv there's no covariance matrix no non-zero covariance matrix of beta hat iv anymore
00:40:03.000 --> 00:40:11.000
so what i just say is that when n becomes large then the variance the covariance matrix of beta
00:40:11.000 --> 00:40:19.000
hat iv is approximately equal to this matrix here so we can use this expression here to compute
00:40:19.000 --> 00:40:25.000
standard errors of the estimated coefficients for large n and that's completely sufficient
00:40:25.000 --> 00:40:31.000
for our purposes because we never have infinitely many observations we always at most have large
00:40:31.000 --> 00:40:38.000
numbers of observations so large n and then we may use this expression here to compute standard
00:40:38.000 --> 00:40:47.000
errors i will give you the expression for a covariance matrix of the transformed estimator
00:40:47.000 --> 00:40:54.000
beta hat iv multiplied by the square root of n in two or three slides so that of course can also
00:40:54.000 --> 00:41:00.000
be established but what you may use as the as the matrix to compute the standard errors of the
00:41:00.000 --> 00:41:09.000
coefficients is actually this expression 57 now here comes the proof we know that beta hat iv is
00:41:09.000 --> 00:41:18.000
consistent and therefore we have if n is large that the covariance matrix of beta hat iv is of
00:41:18.000 --> 00:41:27.000
course just well the expectation of the mean adjusted beta hat iv mean adjusted beta hat iv
00:41:27.000 --> 00:41:33.000
means beta hat iv minus the expectation of beta hat iv multiplied by its own prime
00:41:34.000 --> 00:41:40.000
multiplied by its own transpose so that would be the covariance matrix the difficult thing here
00:41:40.000 --> 00:41:47.000
is that the expectation of beta hat iv is as we know not the true beta because the estimator is
00:41:47.000 --> 00:41:53.000
biased as i have shown to you so this thing here cannot be replaced by beta as we have usually done
00:41:53.000 --> 00:42:07.000
in the ols context however we know for large n the expectation of uh beta hat iv will be close
00:42:07.000 --> 00:42:16.000
to beta because we know that beta hat iv converges in probability to beta so that's why i use an
00:42:16.000 --> 00:42:24.000
approximately equal sign here for large n when we are already close to the true parameter we can
00:42:24.000 --> 00:42:32.000
safely say that the expectation of beta hat iv is approximately equal to beta and therefore we can
00:42:32.000 --> 00:42:40.000
use this expression here well this expression is much easier to deal with because we can replace
00:42:40.000 --> 00:42:48.000
this in the usual way by z prime x inverse z prime u u prime z x prime z inverse so that's the
00:42:48.000 --> 00:42:54.000
usual expression i mean if you just look first you first forget this part here which is just the
00:42:54.000 --> 00:43:02.000
transpose of the first part here if you just look at the first part here up to uh this u term here
00:43:02.000 --> 00:43:11.000
then you will recall that z prime x inverse z prime u is just the expression for beta hat
00:43:12.000 --> 00:43:19.000
for the estimator minus the true value right if there were z prime x inverse z prime y here
00:43:19.000 --> 00:43:26.000
then we would have beta hat and if we subtract the true value then we just have the u value here
00:43:26.000 --> 00:43:32.000
so z prime x inverse z prime u is the same thing as beta hat iv minus the true value
00:43:32.000 --> 00:43:40.000
and then multiplied by its transpose well the expectation of this thing here is again assuming
00:43:40.000 --> 00:43:47.000
that we have large n or exploiting the assumption that we have large n the expectation of this is
00:43:47.000 --> 00:43:53.000
the same expectation that we would have if actually z prime and x were variables which we could pull
00:43:53.000 --> 00:44:01.000
outside of the expectations operator so that we just take the expectation of u u prime in here
00:44:01.000 --> 00:44:06.000
the expectation of u u prime is by our assumptions that just sigma square u times
00:44:06.000 --> 00:44:12.000
identity matrix so the whole thing simplifies to approximately sigma square u
00:44:12.000 --> 00:44:20.000
z prime x inverse z prime z times x prime z inverse so this here would be the covariance
00:44:20.000 --> 00:44:31.000
matrix of beta hat iv for finite but large n you'll find this expression often in the literature
00:44:31.000 --> 00:44:38.000
but you also find a different expression than this one here as the covariance matrix of the
00:44:38.000 --> 00:44:44.000
instrumental variables estimator and i will give you this other expression also it is actually of
00:44:44.000 --> 00:44:48.000
course the same value but the notation is so different that perhaps you may think it is a
00:44:48.000 --> 00:44:58.000
different matrix you know from matrix algebra that the product of matrices or the inverse of
00:44:58.000 --> 00:45:04.000
a product of matrices say a b c inverse is the same thing as c inverse b inverse a inverse
00:45:05.000 --> 00:45:12.000
for invertible and conformable matrices a b c so the order of multiplication is reversed when we
00:45:12.000 --> 00:45:20.000
take the inverse right so this result you should actually know so when we look at this thing here
00:45:22.000 --> 00:45:33.000
z prime x inverse z prime z x prime z inverse then by the same token as this here we can conclude
00:45:33.000 --> 00:45:40.000
that these are three inverses multiplied with each other so we also can write this as one inverse
00:45:40.000 --> 00:45:47.000
so we also can write this as x prime z which would be the inverse of this term here
00:45:48.000 --> 00:45:57.000
times z prime z inverse which would be the inverse of z prime z here times z prime x which
00:45:57.000 --> 00:46:05.000
is the inverse of z prime x here and the whole thing taken as an inverse and now you also see
00:46:05.000 --> 00:46:12.000
that in the middle of this expression we have z z prime z inverse z prime and the last part
00:46:12.000 --> 00:46:19.000
of this is z prime z inverse z prime which is just z plus which is just the pseudo inverse the
00:46:19.000 --> 00:46:27.000
generalized inverse so you can also write the covariance matrix as x prime z times z plus times
00:46:27.000 --> 00:46:36.000
x and the inverse thereof this expression here is exactly the same thing as this expression here
00:46:38.000 --> 00:46:42.000
except for the sigma square u which i have suppressed right but the same thing as this
00:46:42.000 --> 00:46:48.000
matrix expression here so you can also denote in this form you should not be confused if in the
00:46:48.000 --> 00:46:53.000
literature sometimes you find as the covariance matrix of the instrumental variables estimator
00:46:53.000 --> 00:46:58.000
this expression here rather than that expression or vice versa they are both identical
00:47:00.000 --> 00:47:04.000
and both are frequently used actually so you may easily encounter both of them
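the equivalence of the two covariance formulas can be verified numerically in a few lines; this is a sketch with arbitrary simulated full-rank matrices (an assumption, any conformable z and x would do) and the factor sigma square u suppressed as in the lecture

```python
import numpy as np

# Numerical check that the two textbook forms coincide:
#   (Z'X)^{-1} Z'Z (X'Z)^{-1}  =  ( X'Z (Z'Z)^{-1} Z'X )^{-1}
# Z and X are simulated matrices with full column rank (an assumption).
rng = np.random.default_rng(2)
n, k = 500, 3
Z = rng.normal(size=(n, k))
X = Z + rng.normal(size=(n, k))    # as many instruments as regressors

form1 = np.linalg.inv(Z.T @ X) @ (Z.T @ Z) @ np.linalg.inv(X.T @ Z)
form2 = np.linalg.inv(X.T @ Z @ np.linalg.inv(Z.T @ Z) @ Z.T @ X)
print(np.max(np.abs(form1 - form2)))
```

the maximum entry-wise difference is at the level of floating-point noise, confirming the inversion-order identity used on the slide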
00:47:08.000 --> 00:47:13.000
okay as i have already said this matrix which i have just spoken of sigma square u times z prime
00:47:13.000 --> 00:47:19.000
x inverse z prime z x prime z inverse is a large sample approximation to the covariance
00:47:19.000 --> 00:47:26.000
matrix of beta hat iv it is obviously not the asymptotic covariance matrix of beta hat iv for
00:47:26.000 --> 00:47:31.000
the reason i have already given beta hat iv converges in probability to beta so there's
00:47:31.000 --> 00:47:40.000
actually no asymptotic variance of beta hat iv anymore everything sort of goes to a central
00:47:40.000 --> 00:47:46.000
value of beta and the probability of beta hat iv deviating from beta is zero
00:47:46.000 --> 00:47:53.000
well here is this reason stated again in a formal way z prime x inverse z prime z x prime
00:47:53.000 --> 00:48:04.000
z inverse we can look at where this matrix converges to well look we again introduce
00:48:04.000 --> 00:48:13.000
factors n or one over n in this expression so we write one over n in this inverse expression here
00:48:13.000 --> 00:48:20.000
the inverse of one over n is n so this cancels against the one over n which i have here right
00:48:21.000 --> 00:48:27.000
and then i have one over n an inverse in here which cancels against another one over n here
00:48:27.000 --> 00:48:33.000
so all those n factors actually cancel against each other and then we have one over n z prime
00:48:33.000 --> 00:48:42.000
x inverse here of which we know by our assumption a nine that it converges to a finite matrix
00:48:43.000 --> 00:48:50.000
and we have this matrix one over n z prime z here which by assumption a 11 converges to a
00:48:50.000 --> 00:49:00.000
finite matrix sigma zz and we have this matrix here which again by assumption a nine converges
00:49:00.000 --> 00:49:07.000
against a finite matrix so this thing and this thing and this thing all of them converge in
00:49:07.000 --> 00:49:14.000
probability against something which is finite but then we are left with a fourth factor one over n
00:49:14.000 --> 00:49:22.000
here and this goes to zero so we see immediately that the whole covariance matrix here
00:49:23.000 --> 00:49:30.000
converges to zero as the number of observations n goes to infinity the factor one over n then goes
00:49:30.000 --> 00:49:40.000
to zero so asymptotically we don't really have a covariance matrix of beta hat i v rather than
00:49:40.000 --> 00:49:47.000
what we would need to do is in order to have a well defined covariance matrix asymptotically
00:49:47.000 --> 00:49:55.000
is that we look at square root of n times beta hat i v whose covariance matrix would then be this thing
00:49:56.000 --> 00:50:09.000
here times n so without this one over n factor here
00:50:10.000 --> 00:50:16.000
the intuition of this is that with increasing n the precision of the estimates increases
00:50:17.000 --> 00:50:23.000
so that means that the variance of the estimates decreases and in the limit then the estimation
00:50:23.000 --> 00:50:28.000
variance is zero and the estimates converge to the true value beta in probability
00:50:31.000 --> 00:50:39.000
so it follows then that for large n we have that the variance of square root of n beta hat i v
00:50:39.000 --> 00:50:48.000
is the same thing as n times the variance of beta hat i v which would be n times sigma square u times
00:50:48.000 --> 00:50:55.000
this covariance matrix here and of this matrix we would know that it converges to this well defined
00:50:55.000 --> 00:51:03.000
limit here which is finite because now this factor n here cancels against the one over n factor that
00:51:03.000 --> 00:51:10.000
we have there so this expression here would be the asymptotic covariance matrix of the i v
00:51:10.000 --> 00:51:20.000
estimates multiplied by the square root of n now under some regularity conditions like
00:51:20.000 --> 00:51:26.000
weak dependence of the error terms there are certain versions of the law of large numbers
00:51:26.000 --> 00:51:32.000
and the central limit theorem which hold in our setting i do not give you the formal proofs there
00:51:32.000 --> 00:51:40.000
they are technical and we won't bother with them it suffices that you just understand that basically
00:51:40.000 --> 00:51:49.000
what happens here is analogous to the OLS case which i already refrained from giving you the
00:51:49.000 --> 00:51:56.000
proof for the asymptotic normality and then just try to make it plausible so the same thing holds
00:51:56.000 --> 00:52:03.000
here and we can then show that square root of n times the i v estimator has non-degenerate
00:52:03.000 --> 00:52:12.000
asymptotic normal distribution where square root of n times beta hat i v minus beta converges
00:52:12.000 --> 00:52:20.000
in distribution to a normal distribution with expectation zero and this covariance matrix
00:52:21.000 --> 00:52:29.000
here so we know what the asymptotic distribution of square root of n beta hat i v minus beta is
00:52:29.000 --> 00:52:36.000
so we can actually use normal testing procedures based on a normal distribution
00:52:36.000 --> 00:52:47.000
if the sample size is large but this is a restriction of the i v estimator or a detriment
00:52:47.000 --> 00:52:54.000
of simultaneity settings that we can establish only asymptotic properties for i v
00:52:54.000 --> 00:53:00.000
estimators and not finite sample properties because the finite sample properties typically
00:53:00.000 --> 00:53:04.000
strongly depend on the specific instruments and on the sample size
00:53:08.000 --> 00:53:15.000
well a short exercise for you um think about why we don't use the estimator x prime x inverse z
00:53:15.000 --> 00:53:23.000
prime y so why do we have z prime in here in the i v estimator rather than x prime x
00:53:23.000 --> 00:53:31.000
inverse z now let's look at another example namely the example of the Keynesian cross again
00:53:32.000 --> 00:53:38.000
remember the Keynesian cross example um perhaps i'll briefly go back to it to show you the setting
00:53:38.000 --> 00:53:48.000
at consumption and investment and output let's see here it is right so this was uh the model
00:53:48.000 --> 00:53:54.000
uh y is equal to consumption plus investment investment is assumed to be exogenous
00:53:54.000 --> 00:54:00.000
consumption depends on income with marginal propensity c and then we have an error
00:54:00.000 --> 00:54:09.000
term here the objective of this model would be to estimate the consumption because this here is a
00:54:09.000 --> 00:54:15.000
definition this is true there's no parameter to be estimated that's an identity but this thing here
00:54:15.000 --> 00:54:20.000
is a behavioral equation which we would like to estimate so we would like to get an estimate of c
00:54:20.000 --> 00:54:26.000
right so that's the Keynesian cross example the most uh the simplest macroeconomic model
00:54:26.000 --> 00:54:36.000
you may think of we go back to this uh model now oops where are we here where we want to estimate
00:54:36.000 --> 00:54:41.000
c now i use the index t assuming that we have time series data need not be the case you may
00:54:41.000 --> 00:54:46.000
also do this in a cross section of countries for instance but most often just done with time
00:54:46.000 --> 00:54:54.000
series data ct is equal to marginal propensity to consume small c times yt plus error ut
00:54:56.000 --> 00:55:04.000
the problem is that when solving the model we know that yt is equal to 1 over 1 minus c times
00:55:04.000 --> 00:55:15.000
investment plus ut and this means that we have a correlation between uh the regressor yt and the
00:55:15.000 --> 00:55:20.000
error term which of course here should be a ut and not an epsilon t excuse me that's an error here
00:55:20.000 --> 00:55:30.000
right so yt as a regressor here consists already of ut and ut is simultaneously the error term
00:55:30.000 --> 00:55:35.000
i showed you this already when i discussed the Keynesian cross example and this here should
00:55:35.000 --> 00:55:43.000
of course be ut now what may we do we may say well if the model is true then investment is
00:55:43.000 --> 00:55:51.000
exogenous so investment does not correlate with ut right investment is exogenous ut is
00:55:52.000 --> 00:55:58.000
an error term a random event that they do not correlate with each other but obviously yt
00:55:58.000 --> 00:56:07.000
correlates with it because yt is a transformation of i t plus ut so what we may think of is that
00:56:07.000 --> 00:56:14.000
we use i t as an instrument in our regression to estimate the marginal propensity
00:56:14.000 --> 00:56:26.000
to consume so um again please this is a ut and not an epsilon t um we can construct the
00:56:26.000 --> 00:56:36.000
estimator for the marginal propensity to consume which i here denote by civ hat as i prime y
00:56:36.000 --> 00:56:44.000
inverse times i prime c where the notation should be clear i comprises all the time series
00:56:44.000 --> 00:56:50.000
observations on investment and y comprises all the time series observations of income and the same
00:56:50.000 --> 00:56:56.000
thing for consumption here so these are vectors which i denote here i prime y i prime c that would
00:56:56.000 --> 00:57:04.000
be the instrumental variables estimator so we may use this estimator in equation 58 rather than
00:57:04.000 --> 00:57:11.000
estimating c hat as y prime y inverse y prime c which would be the ols estimator we know that the
00:57:11.000 --> 00:57:18.000
ols estimator is inconsistent because there is the correlation between y and the error term u
00:57:18.000 --> 00:57:27.000
so easily we can derive an instrument which at least if the model is true would be a valid
00:57:27.000 --> 00:57:33.000
instrument for the regressor yt which happens to be correlated with ut
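the contrast between the inconsistent ols estimator and the iv estimator with investment as instrument can be sketched directly; the numbers below (true marginal propensity 0.6, the error scale, the investment distribution) are all simulation assumptions, not estimates

```python
import numpy as np

# Sketch of the Keynesian-cross estimators: OLS of c_t on y_t is biased
# upward because y_t contains u_t, while c_hat_IV = (i'y)^{-1} i'c with
# exogenous investment i is consistent; all parameter values are assumed.
rng = np.random.default_rng(4)
n, c_true = 50_000, 0.6
i = rng.normal(loc=5.0, size=n)            # exogenous investment
u = rng.normal(scale=2.0, size=n)          # consumption shock
y = (i + u) / (1.0 - c_true)               # reduced form for income
c_obs = c_true * y + u                     # consumption equation

c_ols = (y @ c_obs) / (y @ y)              # inconsistent: y correlates with u
c_iv = (i @ c_obs) / (i @ y)               # instrument: investment
print(c_ols, c_iv)
```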
00:57:35.000 --> 00:57:43.000
however excuse me my epsilon here that should again be a u in this simple model we have
00:57:43.000 --> 00:57:49.000
made this assumption that it is exogenous and independent of u this is not necessarily
00:57:49.000 --> 00:57:56.000
true in reality actually it will typically not be true in reality usually investment is also
00:57:57.000 --> 00:58:04.000
endogenous and may well correlate with the error term so it may actually not be a good instrument
00:58:04.000 --> 00:58:11.000
to estimate the marginal propensity to consume only in this model it is but not in reality so
00:58:11.000 --> 00:58:18.000
we may also think of other instruments to instrument y and one alternative may be that we
00:58:18.000 --> 00:58:26.000
use the lagged value of y i use here slightly different notation than i had in previous versions
00:58:26.000 --> 00:58:31.000
of these slides and you may still have the old versions of the slides um excuse me for changing
00:58:31.000 --> 00:58:39.000
it here but i found it clearer to express it this way as a vector y index minus one meaning this is
00:58:39.000 --> 00:58:46.000
a vector comprising all the observations on income but lagged by one period for this reason
00:58:46.000 --> 00:58:51.000
the index minus one i use the same notation by the way in the sample exam questions as you
00:58:51.000 --> 00:59:01.000
may perhaps already have noted so not using i as instrument rather using y minus one is
00:59:01.000 --> 00:59:10.000
also possible because y minus one is a valid instrument since the lagged value of y does not
00:59:10.000 --> 00:59:17.000
correlate with the contemporaneous value of u it cannot because it
00:59:17.000 --> 00:59:22.000
realized itself one period earlier than the error term so uh there cannot be any correlation for
00:59:22.000 --> 00:59:30.000
that reason so we have a different possibility which i denote civ double hat a different possibility
00:59:30.000 --> 00:59:38.000
for an instrumental variables estimator where i use y minus one prime y inverse times y
00:59:38.000 --> 00:59:45.000
minus one prime c so that's also a possibility to estimate the marginal propensity to consume
00:59:45.000 --> 00:59:52.000
and i would think it is a safer possibility to do so because of y minus one we surely know that
00:59:52.000 --> 00:59:59.000
there is no correlation with the error term of the next period but for investment we cannot be that
00:59:59.000 --> 01:00:03.000
sure actually it is quite possible in reality that there will be a correlation
01:00:05.000 --> 01:00:11.000
so that's actually something which is done uh really often in empirical work that you just
01:00:11.000 --> 01:00:20.000
use the lagged value of a variable in order to instrument this variable because by using the
01:00:20.000 --> 01:00:27.000
lag you can usually be sure that there is no correlation with the error term anymore
01:00:27.000 --> 01:00:33.000
and at least in the time series framework it is usually the case that y and its own past are
01:00:33.000 --> 01:00:41.000
highly autocorrelated so we would have the two desired properties that the instrument is
01:00:41.000 --> 01:00:49.000
uncorrelated with the error term u but it is highly correlated often with the regressor it is
01:00:49.000 --> 01:00:56.000
replacing in the instrumental variables estimation of course the fact that y minus
01:00:56.000 --> 01:01:04.000
one is uncorrelated with u is only given if the error u is itself not autocorrelated
01:01:05.000 --> 01:01:12.000
but typically in estimation you would specify the equation in
01:01:12.000 --> 01:01:19.000
such a way that you add as many regressors as are necessary to ensure that the error term
01:01:19.000 --> 01:01:26.000
is not autocorrelated anymore this you will learn in time series econometrics if you are interested
01:01:26.000 --> 01:01:32.000
in time series econometrics um i offer a course on that this next semester i won't go into the
01:01:32.000 --> 01:01:38.000
details here it suffices here that you just um note that of course we need the property of u
01:01:38.000 --> 01:01:45.000
not being autocorrelated in order to guarantee that y minus one has no contemporaneous correlation
01:01:45.000 --> 01:01:52.000
anymore with the error term u there's a small exercise on that here to suppose that ut is not
01:01:52.000 --> 01:01:59.000
autocorrelated then show the consistency of civ double hat in the keynesian cross model
01:02:01.000 --> 01:02:08.000
as i said already it is a very popular approach in econometrics to instrument a regressor by its own lagged value
01:02:08.000 --> 01:02:12.000
so usually taking just one period lag
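the lagged-instrument idea can be sketched as well; here investment is given an assumed ar one structure so that income is autocorrelated while the error u is i.i.d., which are exactly the two conditions just discussed for y minus one to be a valid and relevant instrument

```python
import numpy as np

# Sketch of instrumenting y_t by its own lag y_{t-1}: valid here because
# u_t is i.i.d. (not autocorrelated) while income inherits autocorrelation
# from an AR(1) investment process; all parameter values are assumptions.
rng = np.random.default_rng(5)
n, c_true = 100_000, 0.6
e = rng.normal(size=n)
i = np.zeros(n)
for t in range(1, n):
    i[t] = 0.9 * i[t - 1] + e[t]       # autocorrelated exogenous investment
u = rng.normal(size=n)                 # i.i.d. consumption shock
y = (i + u) / (1.0 - c_true)
c_obs = c_true * y + u

# civ double hat: (y_{-1}' y)^{-1} y_{-1}' c in the lecture's notation
y_lag, y_now, c_now = y[:-1], y[1:], c_obs[1:]
c_iv2 = (y_lag @ c_now) / (y_lag @ y_now)
print(c_iv2)
```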
01:02:14.000 --> 01:02:22.000
and another exercise going back to the errors in variables example now suppose that xt has
01:02:22.000 --> 01:02:30.000
autocorrelation so xt is correlated with xt minus one try to develop an
01:02:30.000 --> 01:02:39.000
IV estimator for beta exploiting this undesirable but in empirical practice often encountered property
01:02:42.000 --> 01:02:50.000
the last thing i would like to cover in this slide is the related approach of two-stage least squares
01:02:50.000 --> 01:02:57.000
estimation you'll probably have heard of two-stage least squares estimators also in your basic
01:02:57.000 --> 01:03:03.000
econometrics in your undergraduate econometrics and you may already know that two-stage least
01:03:03.000 --> 01:03:10.000
squares is actually a special case of iv estimation but in case that is not so clear to you
01:03:10.000 --> 01:03:17.000
i think here is a good opportunity to show to you or to review this property and put it into
01:03:18.000 --> 01:03:24.000
a perspective so last few slides on two-stage least squares estimation
01:03:26.000 --> 01:03:34.000
because historically people have taken different approaches to dealing with the simultaneity bias
01:03:34.000 --> 01:03:41.000
as we have found it in the keynesian cross and this is why we will return to the keynesian cross example
01:03:42.000 --> 01:03:50.000
and the idea was the idea of two-stage least squares estimation was that we estimate the model
01:03:50.000 --> 01:03:59.000
in two steps and the first step was actually only designed to construct a regressor which is
01:03:59.000 --> 01:04:07.000
uncorrelated with the error term of the second step regression so knowing that there is a problem
01:04:07.000 --> 01:04:13.000
of contemporaneous correlation between the regressor and the error term in one step estimation
01:04:13.000 --> 01:04:21.000
and usual OLS estimation people have said can we not first construct an appropriate regressor
01:04:21.000 --> 01:04:26.000
given that the regressor we currently have let's say y is not a good regressor since it is
01:04:26.000 --> 01:04:32.000
contemporaneously correlated with the error term and when we have constructed such a regressor
01:04:32.000 --> 01:04:38.000
we can use it in the second step and we are rid of the problem that the regressor correlates
01:04:38.000 --> 01:04:46.000
contemporaneously with the error term so here is the idea illustrated at the example of the keynesian
01:04:46.000 --> 01:04:55.000
cross so in step one we would rewrite the keynesian cross model which is expressed in equations 20 and 21
01:04:55.000 --> 01:05:02.000
in the reduced form that is we solve for the endogenous variables as functions of the
01:05:02.000 --> 01:05:07.000
exogenous variables and the shocks that is called the reduced form when you have solved the model in
01:05:07.000 --> 01:05:12.000
such a way that all the endogenous variables are on the left hand side and all the exogenous
01:05:12.000 --> 01:05:17.000
variables are on the right hand side of the equations and the exogenous variables in the
01:05:17.000 --> 01:05:23.000
keynesian cross example are of course investment because this is assumed to be exogenous need not
01:05:23.000 --> 01:05:30.000
be exogenous in reality probably is not definitely is not exogenous in reality and the error term
01:05:30.000 --> 01:05:36.000
which is also an exogenous variable so in this very simple model actually we're just interested
01:05:36.000 --> 01:05:44.000
in the reduced form of the regressor y we know the regressor y is equal to one over one minus c i t
01:05:44.000 --> 01:05:53.000
plus one over one minus c u t so we may write this as gamma times i t plus u tilde t so the idea of
01:05:53.000 --> 01:06:02.000
the first step regression is to first estimate this reduced form here so regress y t on just
01:06:02.000 --> 01:06:10.000
exogenous variables right so that should be possible because there is no correlation between regressor
01:06:10.000 --> 01:06:18.000
and error term since i t is exogenous so it does not depend at all on the error term at least not
01:06:18.000 --> 01:06:26.000
if the model is true right so we could regress y t on i t estimate gamma and thereby estimate
01:06:26.000 --> 01:06:34.000
also the new error term u tilde which is a simple transformation of the error term u so we get an
01:06:34.000 --> 01:06:42.000
estimator gamma hat and that's an unbiased estimator of this gamma here actually it is also
01:06:42.000 --> 01:06:46.000
an unbiased estimator of one over one minus c because this is precisely what gamma is and you
01:06:46.000 --> 01:06:53.000
might say well why not stop here and then compute c from the fact that gamma hat is one over one minus c
01:06:53.000 --> 01:06:57.000
that would give you an estimate of c but that's a non-linear
01:06:57.000 --> 01:07:01.000
transformation so it would be a non-linear estimator and we don't really know exactly
01:07:01.000 --> 01:07:07.000
what properties it has the blue property is just a property of linear estimators so let's not go
01:07:07.000 --> 01:07:14.000
into this type of research dealing here with non-linear estimation problems but just stay
01:07:14.000 --> 01:07:22.000
with the classical approach of linear estimators and take now this gamma hat here as something we
01:07:22.000 --> 01:07:32.000
may use in the second step estimation so in step two what do you do given gamma hat we can compute
01:07:32.000 --> 01:07:44.000
y hat as i t times gamma hat right we know that y is i t times gamma plus some error now we have
01:07:44.000 --> 01:07:51.000
estimated gamma so we have estimated gamma hat therefore we may say gamma hat times i t is the
01:07:51.000 --> 01:08:01.000
systematic component of y t and u t tilde is the random component of y t by using gamma hat in this
01:08:01.000 --> 01:08:07.000
equation here we can decompose y t into its systematic part and into its random part
01:08:09.000 --> 01:08:15.000
and obviously it is the case that only the random part of y t correlates with u t
01:08:17.000 --> 01:08:25.000
but gamma hat times i t does not correlate with u t right so that's the idea of two-stage least
01:08:25.000 --> 01:08:33.000
squares estimation in some way you may think of this y t hat as a forecast based on the exogenous
01:08:33.000 --> 01:08:44.000
value i t and based on a coefficient estimate gamma hat so y t hat is uncorrelated with u t but
01:08:44.000 --> 01:08:53.000
is correlated with y t and therefore it can serve as an instrument basically so what we do now is
01:08:53.000 --> 01:09:04.000
that we replace y t in the regression equation c t is equal to c y t plus u t by y hat t and then
01:09:04.000 --> 01:09:12.000
we estimate by OLS this is the second step of the two-step or two-stage least squares estimator
01:09:13.000 --> 01:09:22.000
so estimating this relationship here with y t being replaced by y hat t gives us the two-stage
01:09:22.000 --> 01:09:27.000
least squares estimator which i denote here as c hat 2SLS for two-stage least squares
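The two steps just described can be sketched in a few lines of Python on simulated data; the coefficient values, sample size, and variable names below are illustrative assumptions, not the lecture's actual data:

```python
import numpy as np

# Simulated Keynesian cross (no constant, for simplicity):
#   consumption:  cons_t = c * y_t + u_t
#   identity:     y_t = cons_t + i_t
#   reduced form: y_t = (i_t + u_t) / (1 - c)
rng = np.random.default_rng(0)
c_true = 0.6
n = 10_000
i = rng.uniform(1.0, 2.0, n)        # exogenous investment
u = rng.normal(0.0, 1.0, n)         # structural error
y = (i + u) / (1.0 - c_true)        # reduced form for income
cons = c_true * y + u               # consumption

# Step 1: regress y on i (the reduced form); gamma_hat estimates 1/(1-c) = 2.5
gamma_hat = (i @ y) / (i @ i)
y_hat = gamma_hat * i               # systematic part of y, uncorrelated with u

# Step 2: regress consumption on y_hat instead of y -> 2SLS estimate of c
c_2sls = (y_hat @ cons) / (y_hat @ y_hat)

# Naive OLS of the structural equation is biased upward, since y and u correlate
c_ols = (cons @ y) / (y @ y)
print(c_ols, c_2sls)                # OLS lies above 0.6; 2SLS is close to 0.6
```

With a large sample, the first-stage coefficient lands near 1/(1 - 0.6) = 2.5 and the second-stage estimate near the true marginal propensity to consume, while plain OLS on the structural equation stays biased away from it.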
01:09:30.000 --> 01:09:41.000
okay we now have four variables actually which i collect in vectors so y y hat c and i
01:09:41.000 --> 01:09:47.000
are the four variables and i construct corresponding time series vectors of the individual
01:09:47.000 --> 01:09:55.000
observations y t y hat t c t i t so these are scalar values and these here are vectors
01:09:55.000 --> 01:10:01.000
please do not confuse the i which is the investment data vector with the identity matrix so this has
01:10:01.000 --> 01:10:10.000
nothing to do with the matrix here now the gamma hat is actually i prime i inverse times i prime
01:10:10.000 --> 01:10:19.000
y, because this here is gamma, and we know gamma hat is the OLS estimate of this relationship here
01:10:20.000 --> 01:10:28.000
right estimating the gamma here would just give i prime i inverse times i prime y as the
01:10:28.000 --> 01:10:38.000
estimate of gamma so gamma hat is this expression here the two-stage least squares estimator on the
01:10:38.000 --> 01:10:47.000
other hand is y hat prime y hat inverse times y hat prime c that's the OLS estimator of the
01:10:47.000 --> 01:10:58.000
consumption equation using as regressor y hat rather than y now we know what y hat is y hat is
01:10:58.000 --> 01:11:09.000
i times gamma hat so we may actually write this whole thing as y prime i times i prime i inverse times i
01:11:09.000 --> 01:11:18.000
prime i times i prime i inverse times i prime y right because, perhaps i'm going a bit
01:11:18.000 --> 01:11:27.000
fast now, y hat is i times gamma hat and gamma hat is i prime i inverse times i prime y
01:11:28.000 --> 01:11:36.000
so in place of y hat i may also write i times i prime i inverse times i prime y
01:11:37.000 --> 01:11:43.000
and that's what i do here and if i do it this way you see that this expression here
01:11:45.000 --> 01:11:52.000
evolves and in this expression i prime i inverse and i prime i cancel obviously
01:11:52.000 --> 01:11:58.000
here we have the other part of the expression y prime i times i prime i inverse times i prime c
01:11:59.000 --> 01:12:08.000
so that simplifies now first we let these two terms cancel here and then we use the fact that
01:12:08.000 --> 01:12:14.000
the inverse of the product is the product of the inverses in reverse order so this is i prime y
01:12:14.000 --> 01:12:22.000
inverse times i prime i times y prime i inverse times y prime i times i prime i inverse times
01:12:22.000 --> 01:12:29.000
i prime c here again you see that terms cancel like for instance y prime i inverse here cancels
01:12:29.000 --> 01:12:37.000
against y prime i here and when these two terms go away then the i prime i inverse here cancels
01:12:37.000 --> 01:12:44.000
against the i prime i here and we are finally left with i prime y inverse times i prime c
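The cancellation chain can also be checked numerically in the scalar-regressor case: computing the second-stage OLS coefficient on y hat gives exactly the same number as the instrumental variables formula. The simulated series below are illustrative assumptions:

```python
import numpy as np

# Check (scalar-regressor case): the 2SLS estimator
#   (y_hat' y_hat)^(-1) y_hat' c
# equals the IV estimator
#   (i' y)^(-1) i' c,
# where y_hat = i * gamma_hat and gamma_hat = (i' i)^(-1) i' y.
rng = np.random.default_rng(1)
n = 200
i = rng.normal(size=n)            # instrument / exogenous variable
y = 2.0 * i + rng.normal(size=n)  # endogenous regressor
c = 0.5 * y + rng.normal(size=n)  # dependent variable

gamma_hat = (i @ y) / (i @ i)     # first-stage OLS coefficient
y_hat = gamma_hat * i             # fitted systematic part of y

c_2sls = (y_hat @ c) / (y_hat @ y_hat)  # second-stage OLS on y_hat
c_iv = (i @ c) / (i @ y)                # direct IV formula

print(np.isclose(c_2sls, c_iv))   # the two estimators coincide
```

Algebraically this is the same cancellation as above: gamma_hat times i prime i equals i prime y, so the gamma hat factors drop out and only i prime y inverse times i prime c remains.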
01:12:45.000 --> 01:12:53.000
and this is the instrumental variables estimator
01:12:53.000 --> 01:12:59.000
that's exactly the estimator which perhaps i should show
01:12:59.000 --> 01:13:09.000
this way here which we would have if we replace
01:13:09.000 --> 01:13:19.000
y prime by the instrument i prime the OLS estimator of c would be y prime y inverse times
01:13:20.000 --> 01:13:30.000
y prime c and in this expression we have replaced the y prime by i prime so we have i prime y
01:13:30.000 --> 01:13:37.000
inverse times i prime c which means that we have instrumented the y by i as we have already
01:13:37.000 --> 01:13:45.000
thought about doing so given that or assuming that the Keynesian cross model is true so this
01:13:45.000 --> 01:13:52.000
here is the instrumental variables estimator and well we start off with this two-stage least squares
01:13:52.000 --> 01:13:57.000
estimator so we see that at least in this example the two-stage least squares estimator is the same
01:13:57.000 --> 01:14:04.000
thing as the instrumental variables estimator and that actually holds in general i won't prove it
01:14:04.000 --> 01:14:10.000
to you i think the example here is illustrative enough the two-stage least squares estimator is
01:14:10.000 --> 01:14:15.000
an instrumental variables estimator with a specific choice of instruments so instrumental
01:14:15.000 --> 01:14:20.000
variables is more general than two-stage least squares because two-stage least squares tells us
01:14:20.000 --> 01:14:25.000
exactly how to construct a certain instrument right so it's a special case of the instrumental
01:14:25.000 --> 01:14:33.000
variables estimator but it is indeed an instrumental variables estimator the last thing for this lecture
01:14:34.000 --> 01:14:42.000
here is that we return to our example in which i had regressed stock prices on TFP
01:14:42.000 --> 01:14:49.000
you recall this example it was a very easy regression in which i had stock prices being
01:14:49.000 --> 01:14:58.000
regressed on a constant, on a linear trend and on TFP i noted that when i let the sample end in 2007
01:14:58.000 --> 01:15:07.000
q2 TFP is not quite significant with a p-value of somewhat more than five percent, five
01:15:07.000 --> 01:15:14.000
point four percent essentially, and a coefficient estimate of minus one point five five and also i
01:15:14.000 --> 01:15:21.000
already told you well this is not a good regression actually for various reasons and we will now deal
01:15:21.000 --> 01:15:26.000
with just one of the reasons it remains a bad regression afterward but with one reason less
01:15:27.000 --> 01:15:34.000
this one reason we deal with now is that we say well in this setup here actually it is quite
01:15:34.000 --> 01:15:44.000
likely that TFP correlates with the error which affects sp so correlates with u t here why is that
01:15:45.000 --> 01:15:54.000
so well obviously it is the case that TFP is just a measure of the state of technology
01:15:55.000 --> 01:16:01.000
so there may very well be an errors-in-variables
01:16:01.000 --> 01:16:10.000
problem here we may measure the state of knowledge which is a main contributor to stock prices
01:16:11.000 --> 01:16:17.000
the technological possibilities which we have, a main contributor to stock prices, we may very well
01:16:17.000 --> 01:16:23.000
measure this with error here and we know from the errors-in-variables model that measuring with
01:16:23.000 --> 01:16:31.000
error means we have contemporaneous correlation between regressor and error term so we must
01:16:31.000 --> 01:16:40.000
assume that this estimate here is biased and not even consistent asymptotically so this value here
01:16:40.000 --> 01:16:50.000
may be misleading in size right what kind of correlation would we assume well TFP and stock
01:16:50.000 --> 01:16:57.000
prices should actually be positively correlated but one thing we were wondering about is that we
01:16:57.000 --> 01:17:04.000
had a negative coefficient estimate here right whereas economic theory would tell us when
01:17:04.000 --> 01:17:14.000
technological possibilities increase then the stock prices would increase too
01:17:14.000 --> 01:17:24.000
well we know that if the correlation between the dependent variable and the independent variable
01:17:24.000 --> 01:17:32.000
is positive then this coefficient will be estimated smaller than the true value so what we can do now
01:17:32.000 --> 01:17:41.000
is that we actually instrument the TFP variable here and what can we use as an instrument well
01:17:41.000 --> 01:17:48.000
that's difficult to find a good instrument for TFP which would not correlate with the error term
01:17:48.000 --> 01:17:56.000
anymore unless we use the past TFP value so TFP lag by one period and that's what I do here
01:17:56.000 --> 01:18:04.000
I instrument the TFP variable by TFP minus one and use two stage least squares or instrumental
01:18:04.000 --> 01:18:14.000
variables regression so EViews gives me here the instrument list C T and TFP minus one
01:18:14.000 --> 01:18:21.000
right the regressor matrix consisted of C T and TFP and now I have
01:18:21.000 --> 01:18:26.000
instrumented the constant by itself and the linear trend by itself because both of them don't
01:18:26.000 --> 01:18:33.000
correlate with the error term and I have instrumented TFP by TFP minus one and estimate the same thing
01:18:33.000 --> 01:18:44.000
again and now the coefficient estimate for TFP is minus 1.97 this is not really satisfactory
01:18:44.000 --> 01:18:50.000
because the coefficient estimate is even smaller than the coefficient estimate was before so
01:18:50.000 --> 01:18:56.000
it's actually the converse of what we would expect to get in a properly specified regression
01:18:57.000 --> 01:19:04.000
so there is still quite some indication that the model is not well specified and in fact it isn't
01:19:04.000 --> 01:19:09.000
well specified for a number of reasons as I have already told you one reason being that look at the
01:19:09.000 --> 01:19:16.000
Durbin-Watson statistic here that is almost zero so it tells us there's very high autocorrelation
01:19:16.000 --> 01:19:22.000
in the errors and with autocorrelation in the errors the regression equation is not well
01:19:22.000 --> 01:19:27.000
specified so it's not really a great surprise that the coefficient estimate is still not positive
01:19:28.000 --> 01:19:34.000
it is absolutely unimportant that this term suddenly appears to be significant whereas
01:19:34.000 --> 01:19:39.000
it was marginally insignificant before what is important however is that the coefficient
01:19:39.000 --> 01:19:47.000
estimate has changed and it has changed by quite a bit if you compare minus 1.55 with minus 1.97
01:19:47.000 --> 01:19:53.000
so there is a considerable change in the TFP coefficient in this regression here even though
01:19:53.000 --> 01:19:57.000
it's not going the same way as we would have expected even though there are still
01:19:58.000 --> 01:20:06.000
problems with autocorrelation in the errors here but the fact that the coefficient changes by so
01:20:06.000 --> 01:20:14.000
much, that the initial coefficient estimate was unstable, already tells us that this equation
01:20:14.000 --> 01:20:21.000
is not yet well estimated and that there's quite a bit to do about it before we arrive
01:20:21.000 --> 01:20:26.000
at a good estimate of how stock prices depend on technological knowledge
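The errors-in-variables logic behind instrumenting TFP by its own lag can be illustrated on simulated data. The AR(1) persistence, noise variances, and coefficient below are assumptions for illustration, not the lecture's stock-price regression:

```python
import numpy as np

# Errors-in-variables sketch: the regressor is measured with noise, so OLS is
# attenuated toward zero; instrumenting the mismeasured regressor by its own
# lag removes the bias, because the lagged measurement error is uncorrelated
# with the current one while the true series is persistent.
rng = np.random.default_rng(2)
n = 20_000
beta = 1.5

eps = rng.normal(size=n)
x = np.empty(n)                       # true, persistent "state of technology"
x[0] = eps[0]
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + eps[t]

x_obs = x + rng.normal(0.0, 1.0, n)   # observed TFP-like series: true x + noise
y = beta * x + rng.normal(0.0, 1.0, n)

y_, x_, z_ = y[1:], x_obs[1:], x_obs[:-1]   # instrument: first lag of x_obs

beta_ols = (x_ @ y_) / (x_ @ x_)      # attenuated below the true 1.5
beta_iv = (z_ @ y_) / (z_ @ x_)       # consistent for the true 1.5
print(beta_ols, beta_iv)
```

The lag works as an instrument here because it correlates with the true regressor (persistence) but not with the current period's measurement error, which is exactly the argument for using TFP(-1) above.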
01:20:28.000 --> 01:20:38.000
this concludes finally the section review of basic econometrics we will now move briefly
01:20:38.000 --> 01:20:45.000
to the interactive mode so that you can ask me questions on this section or on anything else
01:20:45.000 --> 01:20:53.000
and on Thursday we will then move to micro econometrics so the last third of the semester
01:20:53.000 --> 01:21:00.000
I will teach you methods of micro econometrics many of which actually will have to deal with
01:21:00.000 --> 01:21:07.000
the problem of simultaneity that we have just covered in theory so we will do causality analysis
01:21:07.000 --> 01:21:14.000
and we will see that in terms of causal questions being asked we often run into problems of
01:21:15.000 --> 01:21:20.000
simultaneity or endogeneity bias and often need to resort to estimation techniques
01:21:21.000 --> 01:21:26.000
which make use of instruments so which are variants of instrumental variables estimators
01:21:28.000 --> 01:21:32.000
so much for this webinar I stop the recording here
01:21:32.000 --> 01:21:39.000
and ask you to come to the interactive mode please