WEBVTT - autoGenerated
00:00:30.000 --> 00:00:39.000
Welcome to the lecture on Estimation and Inference in Econometrics.
00:00:39.000 --> 00:00:46.000
This is the lecture scheduled for January 4th.
00:00:46.000 --> 00:00:52.000
It is recorded and not live.
00:00:52.000 --> 00:01:00.000
We have dealt with the issue of matching in the last lecture.
00:01:00.000 --> 00:01:07.000
Now we turn back to a topic which we have already discussed briefly in the review of basic econometrics.
00:01:07.000 --> 00:01:14.000
We will return to this issue now, namely to instrumental variables, since instrumental
00:01:14.000 --> 00:01:24.000
variables offer different possibilities for dealing with problems that occur with OLS
00:01:24.000 --> 00:01:33.000
estimation in the case of, say, measurement errors or any other kind of endogeneity in
00:01:33.000 --> 00:01:35.000
the regressor matrix.
00:01:35.000 --> 00:01:41.000
The relevant readings you find here: again, Jeffrey Wooldridge's textbook Introductory
00:01:41.000 --> 00:01:50.000
Econometrics, in particular chapter 15, and the two books by Angrist and Pischke:
00:01:50.000 --> 00:01:54.000
in the 2009 book, Mostly Harmless Econometrics,
00:01:54.000 --> 00:02:05.000
it is chapter 4, and in the 2015 book, Mastering Metrics, it is chapter 3.
00:02:05.000 --> 00:02:12.000
Now let's have a look back at the instrumental variables approach, which, as I say, we have
00:02:12.000 --> 00:02:16.000
already dealt with in the review of basic econometrics.
00:02:16.000 --> 00:02:25.000
Our notation there was that we have a matrix of instruments, Z, which replaces the matrix
00:02:25.000 --> 00:02:30.000
X in the usual formula of the least squares estimator.
00:02:30.000 --> 00:02:39.000
So while the OLS estimator is (X prime X) inverse X prime Y, the two X prime matrices are
00:02:39.000 --> 00:02:43.000
replaced by Z prime matrices.
00:02:43.000 --> 00:02:50.000
And as you may recall, this does not necessarily say that all the columns of the X matrix are
00:02:50.000 --> 00:02:53.000
replaced by something else.
00:02:53.000 --> 00:03:00.000
It is quite possible that certain regressors or a certain column of X is contained in the
00:03:00.000 --> 00:03:04.000
same form in the Z matrix.
00:03:04.000 --> 00:03:12.000
But for other columns, in particular for regressors, which suffer from endogeneity or measurement
00:03:12.000 --> 00:03:20.000
error, which is a specific type of endogeneity, it would be possible to replace the original
00:03:20.000 --> 00:03:26.000
regressor, which is contained in the X matrix, by some appropriate instrument Z.
00:03:26.000 --> 00:03:32.000
And then we have the estimator, which I call the instrumental variables estimator, beta
00:03:32.000 --> 00:03:35.000
hat IV.
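As a numerical illustration of this beta hat IV formula, here is a minimal sketch of my own with simulated data; the data-generating process, the coefficient values, and all variable names are illustrative assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated data: the error u enters both the regressor d and the outcome y,
# so d is endogenous; the instrument z moves y only through d.
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * u + rng.normal(size=n)    # correlated with z and with u
y = 1.0 + 2.0 * d + u                         # true beta0 = 1, beta1 = 2

X = np.column_stack([np.ones(n), d])          # regressor matrix [iota, d]
Z = np.column_stack([np.ones(n), z])          # instrument matrix [iota, z]

# OLS: (X'X)^{-1} X'y -- inconsistent here; IV: (Z'X)^{-1} Z'y -- consistent
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print("OLS:", beta_ols, "IV:", beta_iv)
```

With this design the OLS slope drifts away from the true value 2, while the IV slope stays close to it, mirroring the consistency argument made in the lecture.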
00:03:35.000 --> 00:03:40.000
This method of instrumental variables is the most well-known method to address endogeneity
00:03:40.000 --> 00:03:41.000
problems.
00:03:41.000 --> 00:03:48.000
And this is why I return to this issue here, since I think it is important that you understand
00:03:48.000 --> 00:03:50.000
it well.
00:03:50.000 --> 00:03:58.000
It is widely used in all areas of empirical economics and econometrics.
00:03:58.000 --> 00:04:06.000
The IV methods allow us to get unbiased estimators, even though endogeneity of certain variables
00:04:06.000 --> 00:04:08.000
is present.
00:04:08.000 --> 00:04:13.000
But actually, unbiasedness is very hard to prove.
00:04:13.000 --> 00:04:21.000
So the more important property is actually that IV allows us to get consistent estimators.
00:04:21.000 --> 00:04:29.000
And unbiasedness may hold in particular cases, but it is difficult to establish a theory for that,
00:04:29.000 --> 00:04:38.000
since it cannot really be generalized what
00:04:38.000 --> 00:04:45.000
type of instruments we choose for a specific regressor which is endogenous.
00:04:45.000 --> 00:04:54.000
IV techniques provide a solution for estimating causal effects in certain settings which
00:04:54.000 --> 00:05:05.000
we may deal with, namely in the case of omitted variables or in the case of selection on unobservables.
00:05:05.000 --> 00:05:12.000
Because recall, we have so far stated the conditional independence assumption, the CIA,
00:05:12.000 --> 00:05:20.000
or I should better say, we have interpreted the CIA as being an assumption which guarantees
00:05:20.000 --> 00:05:26.000
conditional independence by conditioning on observable variables.
00:05:26.000 --> 00:05:33.000
But it is, of course, possible that the selection of treatment also occurs with respect to certain
00:05:33.000 --> 00:05:39.000
unobservable variables, and these unobservable variables would then be part of the error
00:05:39.000 --> 00:05:40.000
term.
00:05:40.000 --> 00:05:46.000
And then there's always the possibility that the error term correlates with certain observable
00:05:46.000 --> 00:05:54.000
regressors, and therefore we would have an endogeneity problem.
00:05:54.000 --> 00:06:01.000
The general idea of the instrumental variables approach is that the endogenous treatment
00:06:02.000 --> 00:06:11.000
regressor D contains some part which we may consider as exogenous, so that this exogenous
00:06:11.000 --> 00:06:16.000
part is being captured by an instrument which is exogenous.
00:06:16.000 --> 00:06:23.000
So the IV estimator basically uses the exogenous source of variation in the treatment regressor
00:06:23.000 --> 00:06:32.000
D, which affects the outcome variable Y only through this D then.
00:06:32.000 --> 00:06:41.000
So the variable providing the exogenous variation in D is then what we call the instrument.
00:06:41.000 --> 00:06:48.000
Actually one instrument may perhaps provide only for a certain part of the exogenous variation
00:06:48.000 --> 00:06:56.000
in the variable D, and it is quite possible to have more than one instrument for a specific
00:06:56.000 --> 00:07:00.000
regressor or in this case for the treatment regressor.
00:07:00.000 --> 00:07:06.000
And as I said, the instrumental variable will be denoted by Z in our case here.
00:07:06.000 --> 00:07:12.000
I will just skim through the causal graphs because I haven't introduced them in the lecture,
00:07:12.000 --> 00:07:18.000
but you may have looked at it, they are probably self-explanatory, but I won't comment on them,
00:07:18.000 --> 00:07:28.000
but move on to theory of instrumental variable estimation, which I motivate by the following example.
00:07:28.000 --> 00:07:38.000
Suppose we have some outcome variable Y, and the population model is that Y depends on
00:07:38.000 --> 00:07:46.000
some constant beta naught and on the treatment variable D, and then on some error term U.
00:07:46.000 --> 00:07:51.000
So that's a very standard setup, which we have here, and we do not have any covariates in this
00:07:51.000 --> 00:07:59.000
population model. But what I now assume is something which is not the standard assumption.
00:07:59.000 --> 00:08:07.000
The standard assumption would be that the covariance of D i and U i is equal to zero.
00:08:07.000 --> 00:08:13.000
So if that were the case, if this were an equal sign here, then we would know that we can just
00:08:13.000 --> 00:08:19.000
estimate this equation consistently here. But now I allow for the possibility, or actually I
00:08:19.000 --> 00:08:26.000
specifically assume, that there is a non-zero covariance between D i and U i, which means
00:08:26.000 --> 00:08:33.000
that the treatment indicator, the treatment regressor D i, is endogenous. And therefore,
00:08:33.000 --> 00:08:42.000
we cannot estimate equation 2 by OLS without causing an inconsistency for the estimator of
00:08:42.000 --> 00:08:49.000
beta 1. So when we apply OLS, of course, we can estimate the equation 2 by OLS, but we would know
00:08:49.000 --> 00:08:55.000
that the estimator for beta 1 would then be inconsistent, which means that asymptotically
00:08:55.000 --> 00:09:05.000
it would not converge to the correct value. Suppose now that we have some variable Z, which
00:09:05.000 --> 00:09:14.000
is correlated with D, but uncorrelated with the variable U. So formally, we would have that the
00:09:14.000 --> 00:09:22.000
covariance between Z i and U i is equal to zero, and the covariance between Z i and D i is non-zero.
00:09:23.000 --> 00:09:32.000
Actually, we would want the covariance between Z i and D i to be really high. So if we express
00:09:32.000 --> 00:09:38.000
it in terms of correlations, we would like to have a correlation close to
00:09:38.000 --> 00:09:43.000
plus 1 (not exactly plus 1, since that would mean it's the same variable) or a correlation
00:09:43.000 --> 00:09:51.000
close to negative 1. This is what I mean by high correlation. So we want a
00:09:51.000 --> 00:09:58.000
high degree of correlation between Z i and D i, and we want no correlation at all between Z i and
00:09:58.000 --> 00:10:06.000
U i across all i's, across all individuals or all units we are having in our sample here.
00:10:08.000 --> 00:10:17.000
We generally allow that correlations for Z i and U j with j different from i are non-zero. So
00:10:18.000 --> 00:10:24.000
we're currently just aiming at a consistent estimation, not at unbiased estimation.
00:10:26.000 --> 00:10:34.000
Now in this case, the IV estimator, which I established in equation one, takes the following
00:10:34.000 --> 00:10:43.000
form if I write it in a partitioned matrix. Our regressor matrix is actually consisting of just
00:10:43.000 --> 00:10:49.000
two columns. One column is the iota column, the vector of ones. So that's one column of ones,
00:10:49.000 --> 00:10:54.000
and the other one is the treatment variable, which we can think of as a dummy variable
00:10:55.000 --> 00:11:02.000
indicating treatment with one and non-treatment with zero. So that's a very simple type of regressor
00:11:02.000 --> 00:11:10.000
matrix which we have here. And for the IV estimator, we would have as the transposed matrix here,
00:11:10.000 --> 00:11:16.000
not the transpose of this X matrix, that's the X matrix here, but we would have the
00:11:16.000 --> 00:11:24.000
instrument in this matrix in place of the D vector. So we would have iota prime as the
00:11:25.000 --> 00:11:32.000
transpose of the first column here and then Z prime rather than D prime, and we would hope that
00:11:32.000 --> 00:11:40.000
Z is correlated with D. And here again, we have iota prime and Z prime
00:11:40.000 --> 00:11:52.000
Y. Of course, this matrix here must be inverted, so I write inverse here, and the way to do that
00:11:52.000 --> 00:11:58.000
is rather straightforward. I have shown you already the theorem which tells you how to
00:11:58.000 --> 00:12:06.000
invert a partitioned matrix, so let us first multiply out these two matrices here. Clearly,
00:12:06.000 --> 00:12:11.000
the 1-1 element, first row times first column here, is just iota prime iota,
00:12:13.000 --> 00:12:20.000
then first row times second column would be iota prime D, so we would have this here,
00:12:20.000 --> 00:12:28.000
second row times first column is Z prime iota, and then of course second row times second column
00:12:28.000 --> 00:12:37.000
gives Z prime D. So we have here the Z prime X matrix written in partitioned form, and we have
00:12:37.000 --> 00:12:44.000
to take the inverse of that. Multiplying out the second component here gives me Yota prime Y and
00:12:44.000 --> 00:12:53.000
Z prime Y, of course. Observe that iota prime iota is always equal to the number of observations,
00:12:53.000 --> 00:13:01.000
so to the number N, because iota is just a vector of ones, and obviously iota prime Y
00:13:01.000 --> 00:13:10.000
or similarly iota prime Z or iota prime D is just the average of Y, Z, or D multiplied by N, right?
00:13:11.000 --> 00:13:19.000
So iota prime Y just sums all the Y i's, and if I divided this by N, I would get Y bar,
00:13:19.000 --> 00:13:22.000
and the same is then true for D and for Z.
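This iota arithmetic can be checked with a toy vector; the numbers here are my own tiny example, not from the lecture:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0])   # a tiny sample
iota = np.ones_like(y)          # the vector of ones ("iota")

n = iota @ iota                 # iota'iota = N, the number of observations
total = iota @ y                # iota'y = sum of all the y_i

print(n, total, total / n)      # total / n is exactly y-bar
```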
00:13:26.000 --> 00:13:32.000
Now we know that N times N inverse is equal to 1, right? N, the number of observations, divided
00:13:32.000 --> 00:13:41.000
by N is equal to 1, obviously. So nothing changes if I multiply N divided by N in between those two
00:13:41.000 --> 00:13:50.000
matrices we have just discussed. And then, of course, we can pull this factor N into the inverse
00:13:50.000 --> 00:13:59.000
matrix here. Well, then, of course, we have to invert the N and arrive at 1 over N times this
00:13:59.000 --> 00:14:06.000
Z prime X matrix. Nothing has changed here. It's the same matrix as this one here, and we have to
00:14:06.000 --> 00:14:12.000
take the inverse of the whole expression, including the 1 over N. So this 1 over N would
00:14:12.000 --> 00:14:18.000
then be, again, transformed to an N, which cancels against the 1 over N here. But in this way of
00:14:18.000 --> 00:14:25.000
writing it, I have 1 over N in front of this matrix, which should be inverted, and I have 1
00:14:25.000 --> 00:14:35.000
over N here. And as I already told you, 1 over N iota prime Y is just Y bar. So the same
00:14:35.000 --> 00:14:43.000
thing I can do here by factoring in the 1 over N into this Z prime X matrix here.
00:14:43.000 --> 00:14:49.000
So I would have, well, for the first component iota prime iota, this is N; divided by N gives
00:14:49.000 --> 00:15:00.000
me just 1. And iota prime D is the sum over all the entries in the D vector. So essentially,
00:15:00.000 --> 00:15:08.000
it is the number of treated units; divided by N, this gives me the share of treated units,
00:15:08.000 --> 00:15:15.000
which is D bar. Z prime iota is the same thing as iota prime Z because these are just two
00:15:15.000 --> 00:15:22.000
vectors. It's a scalar product of these two, the inner product of these two vectors here.
00:15:23.000 --> 00:15:33.000
So divided by N would give me Z bar. And here I have Z prime times D. Well, this has to be
00:15:33.000 --> 00:15:41.000
multiplied by N inverse. Note that Z prime times D is a scalar. Well, this is a
00:15:42.000 --> 00:15:48.000
row vector here, and this is a column vector. Don't confuse them with matrices now because
00:15:48.000 --> 00:15:55.000
I use capital letters, but these are just simple vectors, which we have here. So Z prime D is just
00:15:55.000 --> 00:16:02.000
some number from the space of real numbers. And then we have Y bar
00:16:02.000 --> 00:16:07.000
and N inverse times Z prime Y. Z prime Y is again just a real number.
00:16:09.000 --> 00:16:15.000
Now, it's very easy to invert this matrix here. Probably all of you know how to invert a two by
00:16:15.000 --> 00:16:23.000
two matrix. What we have to do is just that we exchange the position of the elements on the main
00:16:23.000 --> 00:16:33.000
diagonal. So we get N inverse Z prime D here, and the 1 goes to the 2-2 position. And we have
00:16:23.000 --> 00:16:33.000
to multiply the elements on the off diagonal with negative one. So we have a negative sign here.
00:16:33.000 --> 00:16:40.000
Otherwise, the Z bar stays in place and the D bar also stays in place. So this here would be the
00:16:41.000 --> 00:16:46.000
adjoint of the matrix which we're going to invert. And we have to divide the adjoint by the covariance,
00:16:46.000 --> 00:16:53.000
or rather, in this case, by the determinant, not by the covariance. The determinant
00:17:01.000 --> 00:17:10.000
of this matrix is of course one times this thing here minus Z bar times D bar. So that's the
00:17:10.000 --> 00:17:16.000
determinant. And as you see (that's why I just said covariance a moment ago), this is nothing
00:17:16.000 --> 00:17:29.000
else but the estimated covariance of Z and D. So one over N Z prime D minus the product of
00:17:30.000 --> 00:17:37.000
one over N iota prime Z and one over N iota prime D, which are the two means here,
00:17:37.000 --> 00:17:42.000
is just the estimate of the covariance if we do not correct for degrees of freedom,
00:17:42.000 --> 00:17:47.000
so just divide by N. So that's why it's covariance hat here, covariance hat of Z and D.
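The identity just derived, that the determinant one times (1/N) Z prime D minus Z bar times D bar equals the uncorrected sample covariance of Z and D, is easy to verify numerically. A sketch of my own, with simulated data chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
d = (0.7 * z + rng.normal(size=n) > 0).astype(float)  # a binary treatment

# The (1/n)-scaled Z'X matrix from the lecture: [[1, d-bar], [z-bar, z'd/n]]
m = np.array([[1.0, d.mean()],
              [z.mean(), z @ d / n]])

det = np.linalg.det(m)                      # 1 * (z'd/n) - z-bar * d-bar
cov_hat = z @ d / n - z.mean() * d.mean()   # ML covariance (divide by n)

print(det, cov_hat, np.cov(z, d, bias=True)[0, 1])  # all three agree
```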
00:17:49.000 --> 00:17:56.000
Using what we have just derived, we can just continue: the instrumental variables estimator
00:17:56.000 --> 00:18:03.000
is one over the estimated covariance between Z and D. And remember, we have assumed that the
00:18:03.000 --> 00:18:09.000
covariance between Z and D is non-zero. And we hope actually that it is a sizable covariance
00:18:10.000 --> 00:18:15.000
in the sense that the correlation between Z and D is rather high, as I said, or very low,
00:18:15.000 --> 00:18:24.000
close to negative one. Here is then the adjoint. So the two of them together are the inverse of
00:18:24.000 --> 00:18:32.000
Z prime X. And then here we have the Z prime Y vector, as we have multiplied out of the previous
00:18:32.000 --> 00:18:39.000
slide. Well, now what we have to do is that we multiply this matrix here with this column
00:18:39.000 --> 00:18:46.000
vector here. And if we do this, then of course we arrive at N inverse Z prime D times Y bar.
00:18:46.000 --> 00:18:53.000
So this element here times Y bar plus the multiplication of these two elements,
00:18:53.000 --> 00:19:01.000
which then gives negative D bar times N inverse Z prime Y, which is this thing here. And for the
00:19:01.000 --> 00:19:08.000
second line, we have negative Z bar times Y bar, which I write with a negative sign here,
00:19:09.000 --> 00:19:17.000
plus one times N inverse Z prime Y gives N inverse Z prime Y here. And again, we see that is an
00:19:17.000 --> 00:19:24.000
estimated covariance. Again, not taking into account degrees of freedom, but using the maximum
00:19:24.000 --> 00:19:30.000
likelihood estimator for the covariance by dividing through by N. In this case, it is the
00:19:30.000 --> 00:19:36.000
estimated covariance between Z and Y, where here it was the estimated covariance between Z and D.
00:19:38.000 --> 00:19:44.000
Okay, can we simplify this any further? Yes, indeed we can, because we can
00:19:46.000 --> 00:19:57.000
rearrange this expression here. Since, look, we have one over N Z prime D here. This smells like
00:19:57.000 --> 00:20:08.000
the estimated covariance between Z and D again, but what we lack are the means of these variables.
00:20:08.000 --> 00:20:18.000
So why not just add the means here and subtract them later, right? I add the means here and
00:20:18.000 --> 00:20:27.000
actually I subtract later, because what I add here is Z bar D bar times Y bar, and here negative
00:20:28.000 --> 00:20:37.000
Z bar Y bar D bar is the same thing, right? So this term here cancels against that term here.
00:20:38.000 --> 00:20:43.000
And then you see that what I have here is actually the estimated covariance between Z and D,
00:20:43.000 --> 00:20:52.000
again times Y bar and of course plus this term which cancels. And then here I have the estimated
00:20:52.000 --> 00:21:00.000
covariance between Z and Y because again I have this product Z prime Y here and then I added the
00:21:00.000 --> 00:21:08.000
means of Z and Y here, which exactly compensate for this term here after having been multiplied by
00:21:08.000 --> 00:21:17.000
the D bar. Okay, so that's rather easy. We see immediately that this term here
00:21:19.000 --> 00:21:25.000
cancels against this term here multiplied by the D bar so that we actually in this
00:21:25.000 --> 00:21:30.000
first row of this vector have Y bar times the estimated covariance between Z and D
00:21:31.000 --> 00:21:37.000
minus D bar times the estimated covariance between Z and Y. And here the second component
00:21:37.000 --> 00:21:44.000
we have already computed it, this was the estimated covariance between Z and Y. So now
00:21:44.000 --> 00:21:52.000
we have expressed the instrumental variables estimator just in terms of estimated covariances.
00:21:52.000 --> 00:21:57.000
There's an estimated covariance here, there's an estimated covariance here, here and here,
00:21:58.000 --> 00:22:05.000
plus, I have to say, not just estimated covariances but also two means: the mean of Y here and the mean
00:22:05.000 --> 00:22:17.000
of the D's here. What does this imply? Well, let's first look at the estimator of the beta one.
00:22:17.000 --> 00:22:24.000
Beta one, if you recall, is the coefficient of the treatment variable. So this is the estimator
00:22:24.000 --> 00:22:34.000
of our causal effect which we have now estimated in a consistent way because we have used the
00:22:34.000 --> 00:22:44.000
instrument in place of the endogenous variable D in the instrumental variables formula. So the beta
00:22:44.000 --> 00:22:49.000
one component is this component down here, so the component in the second row and the component in
00:22:49.000 --> 00:22:55.000
the first row is the estimated constant. The expression for beta one is actually easier
00:22:55.000 --> 00:23:00.000
because as we see it is just the estimated covariance between Z and Y divided by the
00:23:00.000 --> 00:23:08.000
estimated covariance between Z and D. So beta one IV is the estimated covariance between Z and Y
00:23:08.000 --> 00:23:16.000
divided by the estimated covariance between Z and D. Now we have the treatment effect estimated here
00:23:16.000 --> 00:23:24.000
just as the ratio of two covariances, and in both covariances, rather than using the endogenous
00:23:24.000 --> 00:23:33.000
D regressor, we replace it in the transposed position by the instrument Z.
00:23:34.000 --> 00:23:43.000
The estimate for the constant seems to be a little bit more complicated. We have three covariance
00:23:43.000 --> 00:23:51.000
terms here but it is actually easy to see that this simplifies greatly because here we have the
00:23:51.000 --> 00:23:56.000
covariance between Z and D, the estimated covariance, and here we have to divide
00:23:56.000 --> 00:24:03.000
through by this. So this is just one times Y bar. So the constant, the instrumental variables
00:24:03.000 --> 00:24:11.000
estimator of the constant is Y bar minus, well and then comes D bar times the covariance between
00:24:11.000 --> 00:24:19.000
Z and Y divided by the estimated covariance between Z and D. This is exactly the beta one
00:24:19.000 --> 00:24:25.000
coefficient which we have estimated. So we can write the constant term as the estimate of the
00:24:25.000 --> 00:24:31.000
constant term as Y bar minus D bar times the estimate of the causal effect.
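Putting the two results together, the covariance form of the estimator can be checked against the matrix form. This is my own simulated sketch; the data-generating process and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = (z + u + rng.normal(size=n) > 0).astype(float)  # endogenous binary treatment
y = 1.0 + 2.0 * d + u

# Matrix form: beta_IV = (Z'X)^{-1} Z'y with X = [iota, d], Z = [iota, z]
X = np.column_stack([np.ones(n), d])
Z = np.column_stack([np.ones(n), z])
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

# Covariance form from the lecture (ML covariances, i.e. divide by n)
cov_zy = z @ y / n - z.mean() * y.mean()
cov_zd = z @ d / n - z.mean() * d.mean()
b1 = cov_zy / cov_zd              # slope: ratio of the two covariances
b0 = y.mean() - d.mean() * b1     # constant: y-bar minus d-bar times slope

print(beta_iv, [b0, b1])          # the two forms agree
```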
00:24:35.000 --> 00:24:41.000
However, our main interest is always the beta one. So our coefficient estimate of interest is beta one
00:24:41.000 --> 00:24:50.000
IV. This estimate is of course well defined only if the covariance between Z and D is different
00:24:50.000 --> 00:24:57.000
from zero, which was one of the assumptions we have made. Look at the formula again.
00:24:59.000 --> 00:25:07.000
The estimate of this covariance is here in the denominator of this term.
00:25:08.000 --> 00:25:15.000
Clearly if the true covariance were zero then we would expect that the estimated covariance
00:25:15.000 --> 00:25:22.000
asymptotically converges to zero, and then actually the IV estimator wouldn't even be defined
00:25:23.000 --> 00:25:29.000
or would explode towards plus or minus infinity. So this would not be a reasonable estimate.
00:25:30.000 --> 00:25:36.000
We rely in instrumental variables estimation on the property that the covariance between Z and D
00:25:36.000 --> 00:25:45.000
is significantly, let's say greatly, different from zero. So that was the assumption we have made.
00:25:45.000 --> 00:25:50.000
So only if the instrument is correlated with the regressor would we have a well-defined
00:25:50.000 --> 00:26:02.000
estimate of this coefficient here. Now, just as practice for you: rewrite the estimator of beta
00:26:02.000 --> 00:26:12.000
one such that beta one IV hat is equal to X divided by the estimated correlation
00:26:12.000 --> 00:26:20.000
between Z and D, and determine what X is. So that would be an empirical estimate of the correlation,
00:26:20.000 --> 00:26:26.000
and then find out what X would actually be, and what you expect to happen with this estimator
00:26:27.000 --> 00:26:32.000
if the true population correlation between Z and D is zero. That's actually a very easy
00:26:33.000 --> 00:26:41.000
exercise just to perhaps make you understand a little better what the meaning of a non-zero
00:26:41.000 --> 00:26:46.000
covariance is. I rephrased the question here in terms of correlation coefficient because a
00:26:46.000 --> 00:26:52.000
correlation coefficient is probably easier to grasp than the covariance since the correlation
00:26:52.000 --> 00:26:55.000
coefficient is normalized to be between negative one and one.
00:26:58.000 --> 00:27:05.000
Now provided that the covariance between Z and D is different from zero
00:27:05.000 --> 00:27:10.000
we know that the beta one hat estimator with instrumental variables is a consistent estimator
00:27:10.000 --> 00:27:18.000
of beta one, if it is true that the covariance between u i and z i is equal to zero, so that
00:27:10.000 --> 00:27:18.000
z i is exogenous, not endogenous. To rephrase this, the necessary but not yet sufficient
00:27:29.000 --> 00:27:36.000
requirements for a good instrument are two conditions, which are called the relevance condition and the
00:27:36.000 --> 00:27:42.000
validity condition. The relevance condition says that an instrument is highly correlated
00:27:42.000 --> 00:27:48.000
with the endogenous regressor so the higher the correlation between Z and D is the better.
00:27:49.000 --> 00:27:55.000
Here the correlation is taken in absolute terms, so we want it to be as close as possible to one;
00:27:56.000 --> 00:28:04.000
it shall at least be quite a bit bigger than zero. And the validity condition is that the instrument is
00:28:04.000 --> 00:28:12.000
uncorrelated with the error term so that the correlation between Z and u is equal to zero.
00:28:17.000 --> 00:28:21.000
Yes, actually, it would be sufficient if I wrote here,
00:28:21.000 --> 00:28:35.000
well, if I wrote this in terms of the random variables, then I could just write it with z i and
00:28:35.000 --> 00:28:40.000
u i, so that if this is meant to indicate the population correlation, then
00:28:41.000 --> 00:28:49.000
it would be sufficient to have the correlation of z i and u i being equal to zero. I didn't quite make it
00:28:49.000 --> 00:28:56.000
clear here whether I mean the population correlation or the sample correlation.
00:28:56.000 --> 00:29:03.000
For the sample correlation of course we would just need that the correlation is approximately equal
00:29:04.000 --> 00:29:09.000
to zero. Actually, we can't really take the sample correlation, because
00:29:09.000 --> 00:29:14.000
the u is not measured but if we could measure it then it should be approximately
00:29:15.000 --> 00:29:21.000
zero. So perhaps I should make clear in the next update of these slides that what I mean
00:29:21.000 --> 00:29:30.000
here is the population correlation, and then zero correlation of z i and u i would suffice. Now if the validity condition, the
00:29:30.000 --> 00:29:39.000
second condition, holds, then we know that z affects the dependent variable y
00:29:40.000 --> 00:29:47.000
only through d, right, so it does not affect the
00:29:47.000 --> 00:29:55.000
dependent variable through the error term u. Look back please to this equation here which
00:29:55.000 --> 00:30:04.000
is the population equation. There are basically two channels through which y can be changed.
00:30:04.000 --> 00:30:10.000
One channel would be the treatment d and the other channel would be u. Now what we assume
00:30:10.000 --> 00:30:19.000
is that there's high correlation between z and d here so when z changes then quite likely d
00:30:19.000 --> 00:30:27.000
also changes and thereby affects y whereas simultaneously we assume that z is uncorrelated
00:30:27.000 --> 00:30:35.000
with u, so even if z changes, u does not change, and therefore changes in z do not affect
00:30:35.000 --> 00:30:46.000
y through u but the only transmission channel is through d. This is what I mean by the sentence
00:30:46.000 --> 00:30:52.000
down here: if this validity condition holds, z affects y only through d and not through u.
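The relevance and validity conditions can be made concrete with a small simulation contrasting a strong and a nearly irrelevant instrument. This is entirely my own illustration; the instrument strengths 1.0 and 0.05 and all names are arbitrary choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
results = {}

for strength in (1.0, 0.05):        # strong vs. nearly irrelevant instrument
    z = rng.normal(size=n)
    u = rng.normal(size=n)          # error term, uncorrelated with z (validity holds)
    d = strength * z + u + rng.normal(size=n)
    y = 1.0 + 2.0 * d + u           # true causal effect is 2

    relevance = np.corrcoef(z, d)[0, 1]   # first-stage correlation (relevance check)
    b1 = (z @ y / n - z.mean() * y.mean()) / (z @ d / n - z.mean() * d.mean())
    results[strength] = (relevance, b1)
    print(f"corr(z,d) = {relevance:6.3f}   beta1_IV = {b1:6.3f}")
```

With the strong instrument the estimate sits near 2; as the covariance between z and d approaches zero, the denominator of the ratio collapses and the estimate becomes erratic, which is exactly the well-definedness problem discussed a few minutes ago.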
00:30:52.000 --> 00:31:00.000
Now this is the first part of the theory we come back to theory later. Let us now first
00:31:00.000 --> 00:31:07.000
reconsider an example we have already considered several times actually namely the example of
00:31:07.000 --> 00:31:14.000
wage or earnings determination. Suppose that the true model, the population model,
00:31:14.000 --> 00:31:21.000
postulates that wages are determined by a constant, by education, by ability, and by
00:31:21.000 --> 00:31:28.000
some shock. So this model we have already had; there's little to be said about it. We know
00:31:28.000 --> 00:31:36.000
that beta one and beta two are plausibly assumed to be greater than zero. Again we assume that we
00:31:36.000 --> 00:31:42.000
can observe education the formal education let's say in terms of schooling or college education
00:31:42.000 --> 00:31:52.000
but we cannot observe ability. So we have an omitted variables problem here, because, as you know,
00:31:53.000 --> 00:32:02.000
this beta two times ability would enter the unobservable error term of this equation here,
00:32:02.000 --> 00:32:09.000
which in the previous setup of this model I called v. So if we just estimate by OLS
00:32:09.000 --> 00:32:15.000
the equation wage equal to beta naught times iota plus beta one times the education vector
00:32:15.000 --> 00:32:22.000
plus v then we know that there would be a correlation between v and education if we
00:32:22.000 --> 00:32:27.000
assume that the correlation between education and ability is positive which is a very plausible
00:32:28.000 --> 00:32:35.000
assumption. I've commented on that already extensively in previous lectures. So since v
00:32:35.000 --> 00:32:42.000
is correlated with education, OLS estimation of beta one would not be consistent. Therefore
00:32:42.000 --> 00:32:48.000
we need to find a good instrument for education and if we do find a good instrument for education
00:32:48.000 --> 00:32:57.000
we may use the instrumental variables estimator. But the trouble is actually what could we consider
00:32:57.000 --> 00:33:05.000
to be a good instrument for education? What may we use as an instrument in an IV estimator? So we
00:33:05.000 --> 00:33:13.000
need to ask the question which instrument would be correlated highly hopefully with education but
00:33:13.000 --> 00:33:21.000
would not be correlated with ability. There are four ideas, not all of them very sensible,
00:33:21.000 --> 00:33:28.000
but we can just discuss them. Would, for instance, the number of siblings have any correlation
00:33:29.000 --> 00:33:38.000
with education or with ability. Well you can never exclude anything right and perhaps the number of
00:33:38.000 --> 00:33:45.000
siblings if there are many siblings has a negative impact on education because perhaps the mother and
00:33:45.000 --> 00:33:53.000
the father can't devote much time to each single child, so education in the family perhaps
00:33:53.000 --> 00:33:59.000
is not as good as with fewer children. Perhaps if you have a great number of siblings,
00:33:59.000 --> 00:34:06.000
there are fewer financial means to finance education; education is costly. That's quite possible.
00:34:07.000 --> 00:34:12.000
But you could also make reverse arguments you could make arguments like you ought to learn from
00:34:12.000 --> 00:34:18.000
your siblings right and your siblings deal with certain issues they are interested in they tell
00:34:18.000 --> 00:34:23.000
you about it, you learn something, you get interested in certain things, so it may well be that education
00:34:23.000 --> 00:34:31.000
is actually improved, family education is improved, if you have siblings with which you deal. And it
00:34:31.000 --> 00:34:35.000
may be that education is actually costlessly provided by the government so that the number
00:34:35.000 --> 00:34:40.000
of siblings doesn't affect the quality of your education, or you get special support or
00:34:40.000 --> 00:34:45.000
scholarships if you have many children in the family so if you personally have many siblings
00:34:46.000 --> 00:34:51.000
so there are many things one could argue about the influence of siblings on education
00:34:51.000 --> 00:34:57.000
but by and large I think probably one would say it is not really to be expected that there is a
00:34:57.000 --> 00:35:04.000
high correlation between the number of siblings and education and since we do need an instrument
00:35:04.000 --> 00:35:08.000
which has a high correlation with education the number of siblings is probably not very good.
00:35:10.000 --> 00:35:17.000
The next issue is the education of the father that could probably be a more interesting idea
00:35:17.000 --> 00:35:24.000
if the father is well educated it is quite likely I would say that he also invests into
00:35:24.000 --> 00:35:32.000
the education of his offspring so sons and daughters may first receive good education
00:35:32.000 --> 00:35:37.000
at home in the family and they may also receive good education in schools or at college because
00:35:37.000 --> 00:35:43.000
the father knows that a good education is worth something or conversely if the education of the
00:35:43.000 --> 00:35:49.000
father is at a very low level perhaps he doesn't care about the education of his children then
00:35:49.000 --> 00:35:55.000
there would be a correlation between the education of the offspring and the education
00:35:55.000 --> 00:36:01.000
of the father and this would mean that the relevance criterion is likely to be satisfied
00:36:01.000 --> 00:36:09.000
for this possible instrument education of the father. The critical part though is the question
00:36:09.000 --> 00:36:15.000
whether the education of the father is possibly correlated with the ability of the offspring of
00:36:15.000 --> 00:36:25.000
the children, and this is quite possible. It depends a little bit on what exactly you count
00:36:25.000 --> 00:36:32.000
as ability but if it is the case that the education of the father
00:36:32.000 --> 00:36:43.000
encourages the father to also educate his children then it is quite possible that since
00:36:43.000 --> 00:36:48.000
education is correlated with ability that ability also correlates with the education of the father
00:36:48.000 --> 00:36:53.000
and in this case education of the father would not be a really good instrument.
00:36:53.000 --> 00:37:03.000
On the other hand one may say that at least former times much of family life was actually
00:37:05.000 --> 00:37:10.000
governed by the mothers and the fathers were on their jobs and perhaps not taking too much
00:37:10.000 --> 00:37:15.000
interest in the education of the children and then perhaps the education of the father does not
00:37:15.000 --> 00:37:25.000
really have a big correlation with the ability of sons or daughters. So education of the
00:37:25.000 --> 00:37:33.000
father is an instrument which may be useful but it's not really clear whether it is unproblematic.
00:37:36.000 --> 00:37:43.000
Next issue would be geographical proximity to university. One may probably consider that too
00:37:44.000 --> 00:37:50.000
if people live close to a university they have lower expenses, for housing for instance, so it
00:37:50.000 --> 00:37:57.000
may be that education is actually provided on a higher level if they live close to a university
00:37:58.000 --> 00:38:05.000
or it may also be that they have more contact with people who have university education and this
00:38:05.000 --> 00:38:12.000
improves their own education. Quite clearly I think one could argue that geographical proximity
00:38:12.000 --> 00:38:20.000
to the university is uncorrelated with ability, basically uncorrelated with ability. So we might
00:38:20.000 --> 00:38:28.000
have an instrument here which satisfies this exclusion restriction that ability and geographical
00:38:28.000 --> 00:38:35.000
proximity have low and hopefully zero correlation, even though one may also doubt whether the
00:38:35.000 --> 00:38:40.000
correlation between education and geographical proximity to university is very high because
00:38:40.000 --> 00:38:45.000
clearly there are also people who come from very remote places and go to any university and earn
00:38:45.000 --> 00:38:52.000
college degrees and have good education even though they live far away from a university. So that
00:38:52.000 --> 00:38:59.000
perhaps in terms of the relevance criterion this possible instrument is not the strongest.
00:39:01.000 --> 00:39:09.000
Sorry for that. Tuition fees are another possibility. We would probably expect that tuition fees are
00:39:09.000 --> 00:39:17.000
negatively correlated with education but that's fine. As long as the correlation is high in
00:39:17.000 --> 00:39:24.000
absolute terms then it is still a valid instrument even though it may be a negative correlation here.
00:39:24.000 --> 00:39:29.000
So if it is a correlation close to negative one between tuition fees and education, perfect.
00:39:29.000 --> 00:39:37.000
The relevance criterion would be satisfied and one could also argue very well that tuition fees
00:39:37.000 --> 00:39:44.000
don't have anything to do with the ability of the prospective student so that this would be a valid
00:39:44.000 --> 00:39:53.000
instrument if the correlation with education is sufficiently high. In fact as far as the relevance
00:39:53.000 --> 00:40:00.000
criterion is concerned we can always check that because we can just run an auxiliary regression
00:40:00.000 --> 00:40:10.000
in which we regress the endogenous regressor like education on the instrument or on the set
00:40:10.000 --> 00:40:19.000
of instruments that we use. People often do this. They take the regressor they cannot use in the
00:40:19.000 --> 00:40:24.000
ordinary least squares estimation because of its endogeneity. They take this regressor,
00:40:24.000 --> 00:40:31.000
so in our case education, and regress it in an auxiliary regression on the instrument or the
00:40:31.000 --> 00:40:38.000
set of instruments. And then there is a rule of thumb which also has some theoretical basis but
00:40:38.000 --> 00:40:45.000
is mostly just applied as a rule of thumb which says that the instruments are good if the F statistic
00:40:45.000 --> 00:40:51.000
of this regression exceeds the number 10. So perhaps you want to remember this the F statistic
00:40:51.000 --> 00:40:58.000
higher than 10 is usually taken as indicating good instruments and the advice is then conversely
00:40:58.000 --> 00:41:03.000
that if your instruments do not produce an F statistic of at least 10, you better forget
00:41:03.000 --> 00:41:11.000
them and don't use them. There is actually quite a bit of work on how badly IV estimations
00:41:11.000 --> 00:41:19.000
perform if the instruments are weak and instruments with an F statistic smaller than 10 are considered
00:41:19.000 --> 00:41:31.000
weak instruments. All right, in this next section 7.2 I would like to explore again because that
00:41:31.000 --> 00:41:37.000
also we have already touched in the review of basic econometrics, the relationship between
00:41:37.000 --> 00:41:47.000
instrumental variables and two-staged least squares estimation. We have already seen that
00:41:47.000 --> 00:41:51.000
the two-stage least squares estimator is a special case of an IV estimator. That was what
00:41:51.000 --> 00:41:58.000
I showed you in the review of basic econometrics. We will now return to it because just, sorry,
00:41:58.000 --> 00:42:02.000
just on the previous slide I spoke about this auxiliary regression here and this
00:42:02.000 --> 00:42:10.000
auxiliary regression is actually very closely related to the first step in two-stage least squares
00:42:10.000 --> 00:42:19.000
estimation procedure which we can conceive as consisting of the following two steps. I just
00:42:19.000 --> 00:42:26.000
present this to you intuitively now. The first stage would be to identify some exogenous variation
00:42:26.000 --> 00:42:34.000
in D. So in the treatment variable, in the endogenous treatment variable, we identify
00:42:34.000 --> 00:42:41.000
the component in D which we can think of as being exogenous. So while we know that D
00:42:41.000 --> 00:42:50.000
correlates with the error term V, we may hope that some of the variation in D is actually uncorrelated
00:42:50.000 --> 00:42:58.000
with the error term V so that we can consider this part of the variation as being exogenously
00:42:59.000 --> 00:43:05.000
caused and therefore we may search for an instrument which explains precisely this
00:43:05.000 --> 00:43:11.000
exogenous variation. This instrument would be Z and therefore in the first stage what we do
00:43:11.000 --> 00:43:19.000
is we just regress D on this instrument either on one instrument Z or on an instrument matrix
00:43:20.000 --> 00:43:29.000
Z which is exactly what I have described here and in the second stage we then use this exogenous
00:43:29.000 --> 00:43:37.000
variation in D as the regressor in the causal regression which explains the influence of D on Y.
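The two stages just described can be sketched numerically. The following is a minimal numpy illustration on simulated data (the data-generating process, all coefficient values, and variable names are invented for this sketch), including the first-stage F statistic from the rule of thumb mentioned earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                        # instrument, exogenous by construction
u = rng.normal(size=n)                        # error of the causal equation
d = 0.5 + 0.8 * z + u + rng.normal(size=n)    # treatment: endogenous, it contains u
y = 1.0 + 2.0 * d + u                         # causal equation with true beta1 = 2

# First stage: regress D on a constant and Z, keep the fitted values D hat
X1 = np.column_stack([np.ones(n), z])
delta_hat = np.linalg.lstsq(X1, d, rcond=None)[0]
d_hat = X1 @ delta_hat

# First-stage F statistic for the "F > 10" rule of thumb (one instrument)
ssr = ((d - d_hat) ** 2).sum()
sst = ((d - d.mean()) ** 2).sum()
r2 = 1 - ssr / sst
F = r2 / (1 - r2) * (n - 2)

# Second stage: regress Y on a constant and D hat
X2 = np.column_stack([np.ones(n), d_hat])
beta_2sls = np.linalg.lstsq(X2, y, rcond=None)[0]

# Naive OLS of Y on D for comparison: biased because cov(D, U) > 0
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), d]), y, rcond=None)[0]

print(F)              # far above 10 here: a strong instrument
print(beta_2sls[1])   # close to the true value 2
print(beta_ols[1])    # pushed away from 2 by the endogeneity
```

In this simulation the naive OLS slope is visibly biased upward, while the two-stage estimate recovers the true coefficient up to sampling error.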
00:43:38.000 --> 00:43:44.000
So here's how the two-stage least squares estimator works in practice.
00:43:44.000 --> 00:43:53.000
The causal regression which suffers from the endogeneity of treatment variable was equation
00:43:53.000 --> 00:44:00.000
two, and we've seen that already: Y is equal to a constant plus beta one times D plus U, and then D and U
00:44:00.000 --> 00:44:10.000
unfortunately correlate in some way. In our refined setup
00:44:10.000 --> 00:44:19.000
I had actually called the error term V, and here U has a different meaning, sorry for that, so just leave
00:44:19.000 --> 00:44:25.000
it at that. U is here simply the error term which correlates with the regressor D.
00:44:26.000 --> 00:44:32.000
Now in the first stage we would regress the endogenous regressor D on the instrument matrix
00:44:32.000 --> 00:44:40.000
Z, so we would run this regression here: D is some constant delta naught plus delta one times
00:44:40.000 --> 00:44:46.000
the instrument Z plus a new error term which I now call V and this V has nothing to do with the V I
00:44:46.000 --> 00:44:54.000
had in this previous example of wage determination, where the ability variable was in the error term, right.
00:44:54.000 --> 00:44:59.000
So actually the notation here is using the same symbols as in the example but of course it is
00:44:59.000 --> 00:45:05.000
completely detached from the example but it's again the general framework in which I present
00:45:05.000 --> 00:45:12.000
the theory of instrumental variables estimation here two stage least squares estimation so please
00:45:12.000 --> 00:45:17.000
don't confuse this V here with the V in the previous example, it has nothing to do with it.
00:45:18.000 --> 00:45:24.000
all right this is a very simple regression which we have here actually just a bivariate regression
00:45:24.000 --> 00:45:31.000
and we know, if we run this regression here by OLS, simple least squares estimation,
00:45:31.000 --> 00:45:39.000
then we know the estimate delta one hat of this true parameter delta one is just the estimated
00:45:39.000 --> 00:45:45.000
covariance between D and Z divided by the estimated variance of Z, right.
00:45:46.000 --> 00:45:52.000
We've seen this formula in the review of basic econometrics in the bivariate regression the
00:45:52.000 --> 00:45:59.000
OLS estimate is as simple as that, and we also know that the estimated constant delta naught hat
00:45:59.000 --> 00:46:08.000
is equal to the mean of the dependent variable, D bar, minus Z bar times delta one hat, which is
00:46:08.000 --> 00:46:15.000
completely analogous to the formula I showed you at the beginning of this lecture for the partitioned
00:46:16.000 --> 00:46:25.000
representation of the beta hat IV of the instrumental variables estimator.
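These two bivariate OLS formulas are easy to verify numerically. A small sketch on made-up data, comparing the covariance formulas against a standard least-squares fit (numpy's polynomial fit is used here purely as a reference):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=500)
d = 0.3 + 1.5 * z + rng.normal(size=500)   # invented first-stage relationship

# Formulas from the slide: delta1_hat = cov(D, Z) / var(Z),
# delta0_hat = D bar - Z bar * delta1_hat (ddof=0: divide by n, no correction)
delta1 = np.cov(d, z, ddof=0)[0, 1] / np.var(z)
delta0 = d.mean() - z.mean() * delta1

# Reference: ordinary least-squares fit of D on a constant and Z
slope, intercept = np.polyfit(z, d, 1)

print(delta1, slope)       # identical up to rounding
print(delta0, intercept)
```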
00:46:27.000 --> 00:46:34.000
So given that, we easily obtain the estimates delta naught hat and delta one hat by a simple OLS
00:46:34.000 --> 00:46:43.000
regression the prediction of D given the instrument is simply this D hat here defined as
00:46:44.000 --> 00:46:52.000
well the constant delta not hat plus delta one hat times the instrument Z right. So you can think
00:46:52.000 --> 00:46:59.000
of that as a forecast or prediction, right, and given that, we have some prediction of what the
00:46:59.000 --> 00:47:07.000
treatment would look like as much as the treatment can be accounted for by exogenous variation.
00:47:07.000 --> 00:47:13.000
And then in the second stage we simply do the following thing we return to the main
00:47:13.000 --> 00:47:22.000
regression equation two but we replace D by this prediction D hat from the first stage.
00:47:22.000 --> 00:47:29.000
So we now run a slightly modified regression: we do not regress Y on a constant and on D
00:47:30.000 --> 00:47:39.000
but we regress it on a constant and on D hat, and the D hat here is then uncorrelated with
00:47:39.000 --> 00:47:48.000
the error term in this regression which I now call W. This W is of course typically different
00:47:49.000 --> 00:47:56.000
from the original error term U because I have a different regressor, D hat, so this is why I use
00:47:56.000 --> 00:48:03.000
a new symbol here. The estimates of this two-stage least squares approach are beta one
00:48:04.000 --> 00:48:12.000
hat for the two-stage least squares estimate: that's the covariance of Y and D hat, the
00:48:12.000 --> 00:48:19.000
estimated covariance of Y and D hat, without correction for degrees of freedom, so just divided
00:48:19.000 --> 00:48:27.000
through by the number of observations, and here the simple variance, the estimate of the
00:48:27.000 --> 00:48:33.000
variance of D hat, also divided through just by the number of observations. And then again the
00:48:33.000 --> 00:48:38.000
constant term is beta naught hat with two-stage least squares that's the mean of the dependent
00:48:38.000 --> 00:48:47.000
variable minus the mean of the D hat times the beta one hat from the two-stage least squares
00:48:47.000 --> 00:48:58.000
regression. Just keep this in mind and now look at the alternative estimate which we may use
00:48:59.000 --> 00:49:05.000
by estimating the causal regression by instrumental variables using instrument Z.
00:49:07.000 --> 00:49:14.000
As we have seen the estimates would in this case be beta one hat IV is equal to the covariance
00:49:15.000 --> 00:49:20.000
the estimated covariance between Z and Y divided by the estimated covariance between Z and D, and
00:49:21.000 --> 00:49:30.000
this is exactly same thing as in the two-stage least squares estimator because this estimated
00:49:30.000 --> 00:49:37.000
covariance here this estimated variance here and this estimated covariance here are the same as the
00:49:37.000 --> 00:49:45.000
estimated covariances here and there, as I will prove to you now. The same is true for the
00:49:45.000 --> 00:49:54.000
estimate of the constant beta naught hat IV is equal to Y bar minus D bar times beta one hat IV
00:49:55.000 --> 00:50:03.000
and this thing again is equal to this thing here. Currently you don't see that right I'm going to
00:50:03.000 --> 00:50:09.000
prove it I just want to state what I'm going to prove the formulas at first sight look different
00:50:09.000 --> 00:50:14.000
or at least they look like you still have to analyze them whether this ratio here is indeed
00:50:14.000 --> 00:50:26.000
the same thing as this ratio here where I have the D in here rather than having a D hat in here
00:50:26.000 --> 00:50:35.000
and similarly here I have a D hat rather than having a D in the formula. Well the first thing
00:50:35.000 --> 00:50:45.000
to note in this proof is that D bar and D hat bar are just the same. Why is that so? Well,
00:50:45.000 --> 00:50:50.000
it follows immediately from the fact that the regression residuals always sum to zero when we
00:50:50.000 --> 00:50:58.000
have a constant in the regression. See, we know that iota prime V hat is equal to zero.
00:50:59.000 --> 00:51:05.000
V hat is just the residual from the first stage regression right we have a constant in here
00:51:05.000 --> 00:51:12.000
so we know the V hats, the residuals of this regression, sum to zero, therefore iota prime
00:51:12.000 --> 00:51:20.000
V hat is equal to zero. Now also we do know that D is just
00:51:20.000 --> 00:51:29.000
the same thing as D hat plus V hat this follows from our prediction right. This here is D hat
00:51:29.000 --> 00:51:37.000
as you know; well, what is missing here? Precisely V hat. So V hat
00:51:38.000 --> 00:51:46.000
would be D minus delta naught hat times iota
00:51:47.000 --> 00:51:56.000
minus delta one hat times Z; this would be V hat. So we know that D is the same thing as D hat plus
00:51:56.000 --> 00:52:05.000
V hat, so if we pre-multiply this equation here by iota prime, so if we just sum all the components
00:52:05.000 --> 00:52:11.000
in the D vector sum all the components in the D hat vector and sum all the components in the
00:52:11.000 --> 00:52:17.000
V hat vector, then we know iota prime V hat is equal to zero. So we find that the sum over all
00:52:17.000 --> 00:52:23.000
of the D's is the same thing as the sum over all of the D hats and therefore dividing through by N
00:52:23.000 --> 00:52:32.000
we know that D bar is the same thing as D hat bar. So this is already settled, we can actually
00:52:32.000 --> 00:52:39.000
replace this D hat bar here just by a D bar like we have it here.
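This step, that D bar equals D hat bar whenever the first stage contains a constant, can be illustrated with a few lines of numpy on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=300)
d = 1.0 + 0.7 * z + rng.normal(size=300)   # invented first-stage data

X = np.column_stack([np.ones_like(z), z])          # constant (iota) and instrument
d_hat = X @ np.linalg.lstsq(X, d, rcond=None)[0]   # fitted values D hat
v_hat = d - d_hat                                  # first-stage residuals V hat

print(v_hat.sum())              # essentially zero: iota' V hat = 0
print(d.mean() - d_hat.mean())  # hence D bar = D hat bar
```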
00:52:42.000 --> 00:52:48.000
So we know then that the constant which we estimate in the IV regression is equal to the
00:52:48.000 --> 00:52:54.000
constant which we estimate in the two stage least squares regression if and only if it is true that
00:52:54.000 --> 00:53:02.000
the first beta so that the beta one coefficient estimated in the IV regression is equal to the
00:53:02.000 --> 00:53:10.000
beta one coefficient estimate in the two stage least squares regression. This follows from the
00:53:10.000 --> 00:53:19.000
fact that the estimate of the constant in the two stage least squares regression is Y bar minus D
00:53:19.000 --> 00:53:27.000
bar times the beta one hat from the two stage least squares whereas here it is also Y bar minus D bar
00:53:28.000 --> 00:53:33.000
times the estimated beta one coefficient from the instrumental variables regression.
00:53:34.000 --> 00:53:41.000
So our proof is now already a little less demanding because we only need to show that this
00:53:41.000 --> 00:53:48.000
coefficient here this estimated coefficient here is exactly the same thing as this estimated
00:53:48.000 --> 00:53:52.000
coefficient here when we have shown that the two estimates of beta one are the same
00:53:52.000 --> 00:53:57.000
then it immediately follows that the two estimates of the constant are also the same.
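The equality of the two beta one estimates, which the rest of the proof establishes algebraically, can also be checked numerically. A small simulated example (all numbers and the data-generating process are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.4 * z + u + rng.normal(size=n)   # endogenous treatment
y = 1.0 + 2.0 * d + u                  # causal equation

def cov(a, b):
    # estimated covariance without degrees-of-freedom correction
    return ((a - a.mean()) * (b - b.mean())).mean()

# IV estimates: cov(Z, Y) / cov(Z, D) and the implied constant
beta1_iv = cov(z, y) / cov(z, d)
beta0_iv = y.mean() - d.mean() * beta1_iv

# Two-stage least squares estimates via the first-stage fitted values
X1 = np.column_stack([np.ones(n), z])
d_hat = X1 @ np.linalg.lstsq(X1, d, rcond=None)[0]
beta1_2sls = cov(y, d_hat) / cov(d_hat, d_hat)
beta0_2sls = y.mean() - d_hat.mean() * beta1_2sls

print(beta1_iv - beta1_2sls)   # zero up to floating-point error
print(beta0_iv - beta0_2sls)   # likewise for the constants
```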
00:54:00.000 --> 00:54:06.000
Yeah so we can drop the constants here or the estimates of the constants here and just look
00:54:06.000 --> 00:54:12.000
at the slope coefficients which we have here and prove that this relationship is true then immediately
00:54:12.000 --> 00:54:20.000
follows that this relationship is also true. Now in order to prove this latter fact the
00:54:20.000 --> 00:54:26.000
equality of the two beta one estimates here start with a beta one in the two stage least squares
00:54:26.000 --> 00:54:31.000
regression which was estimated as the estimated covariance between Y and D hat divided by the
00:54:31.000 --> 00:54:39.000
estimated variance of D hat. Well what is D hat? We just use the definition of D hat here and write
00:54:39.000 --> 00:54:46.000
this as the estimated covariance between Y and well this here is just the definition of D hat
00:54:46.000 --> 00:54:54.000
right this is the prediction which we used to define our D hat here and here in the estimated
00:54:54.000 --> 00:54:59.000
variance we have the same thing right we just replaced the D hat by its definition.
00:54:59.000 --> 00:55:10.000
Now delta naught hat times iota is just a constant, right, so this is a
00:55:12.000 --> 00:55:19.000
scalar multiplied with a vector of ones, so that's just a constant; for the variance this
00:55:19.000 --> 00:55:25.000
is completely unimportant and for covariance this is also completely unimportant it would go into
00:55:26.000 --> 00:55:33.000
the means of these variables so actually we can simplify the expression which we have
00:55:33.000 --> 00:55:40.000
up here to the expression that beta one hat two stage least squares is the covariance between Y
00:55:41.000 --> 00:55:48.000
and just the component delta one hat times Z, so just this component matters because only this
00:55:48.000 --> 00:55:53.000
has variance; the constant component doesn't have any variance so it doesn't covary with Y, right,
00:55:54.000 --> 00:55:59.000
and divided then by the estimated variance of delta one hat Z for the same reason
00:56:00.000 --> 00:56:04.000
this component just doesn't have any variance so all the variance is in here.
00:56:06.000 --> 00:56:14.000
Well and then you apply the rules for covariances where you know that you can factor out constant
00:56:14.000 --> 00:56:20.000
factors, so this delta one hat here you can factor out: the covariance of Y,
00:56:21.000 --> 00:56:27.000
delta one hat Z is the same thing as the delta one hat times the covariance of Y and Z I always
00:56:27.000 --> 00:56:34.000
mean estimated covariances, and similarly for the variance I can factor out the delta one hat too, and
00:56:34.000 --> 00:56:41.000
then it is being squared so that one of the delta one hats here cancels and we are left with
00:56:41.000 --> 00:56:49.000
covariance hat of Y and Z divided by delta one hat times the variance hat of Z.
00:56:50.000 --> 00:56:57.000
We do know what the estimated delta one coefficient is that was an OLS estimate
00:56:57.000 --> 00:57:02.000
if you recall we had the formula on one of the previous slides delta one hat was the
00:57:02.000 --> 00:57:10.000
estimated covariance between D and Z divided by the variance of Z so just plug this here in
00:57:10.000 --> 00:57:18.000
for delta one hat and you see that variance hat of Z cancels against variance hat of Z here
00:57:18.000 --> 00:57:24.000
and we are just left with the estimated covariance between D and Z so we have covariance hat between
00:57:24.000 --> 00:57:33.000
Y and Z divided by covariance hat between D and Z well that's the formula here it's equal to
00:57:33.000 --> 00:57:41.000
the ratio which I just described verbally and that is as you see the IV estimator and this completes
00:57:41.000 --> 00:57:50.000
the proof right so we have proven that the IV estimator is the same thing as the two stage least
00:57:50.000 --> 00:58:01.000
squares estimator. Note here that it is also possible to estimate a so-called reduced
00:58:02.000 --> 00:58:12.000
form, which is obtained by plugging equation six into equation two. So equation six is this equation
00:58:12.000 --> 00:58:21.000
here, which in the first stage we have estimated. The alternative way to go forward would be that
00:58:21.000 --> 00:58:29.000
we say well hey why don't we just plug in for this D here this expression right so sort of in terms
00:58:29.000 --> 00:58:38.000
of population models prior to any estimation we just replace this D here by this expression which
00:58:38.000 --> 00:58:47.000
we postulate in equation six. If we do this, well, then we start out with equation two, which is
00:58:47.000 --> 00:58:54.000
this equation here and now we replace the D by this expression from equation six as I have just
00:58:55.000 --> 00:59:02.000
explained. What does this mean? This means we get a new constant here, right: so we have the constant
00:59:02.000 --> 00:59:08.000
beta naught being multiplied by iota and beta one times delta naught being multiplied by iota, so we
00:59:08.000 --> 00:59:14.000
can write this as beta naught plus beta one delta naught this thing here which I now call gamma naught
00:59:15.000 --> 00:59:23.000
times iota, and then we have a new coefficient here for the new regressor Z, which is the instrument
00:59:23.000 --> 00:59:30.000
for the D. The D had the coefficient beta one; the Z would have a coefficient of beta one times
00:59:30.000 --> 00:59:38.000
delta one this coefficient here which I call now gamma one and we have a new error term this was
00:59:38.000 --> 00:59:47.000
the U here but now it is amended by V times beta one so a new error term would be beta one times
00:59:47.000 --> 00:59:55.000
V plus U this I call epsilon which means that by plugging six into two I have on the right hand
00:59:55.000 --> 01:00:00.000
side of the equation gamma naught times Yota so it's just a constant then gamma one times the
01:00:00.000 --> 01:00:07.000
instrument Z plus epsilon. Well, this Z here of course should be exogenous so it doesn't
01:00:07.000 --> 01:00:14.000
correlate with the epsilon anymore therefore we can just estimate this reduced form equation
01:00:14.000 --> 01:00:21.000
eight by regressing Y on the right hand side here so on a constant and on Z
01:00:22.000 --> 01:00:28.000
what is the OLS estimator of this gamma one coefficient we apply the same formulas as we
01:00:28.000 --> 01:00:34.000
have done before gamma one hat is of course the covariance the estimated covariance between Y and
01:00:34.000 --> 01:00:44.000
Z divided by the estimated variance of Z so that's the OLS formula right how can we rewrite this
01:00:44.000 --> 01:00:50.000
well we can write this as the estimated covariance of Y and Z same thing as before
01:00:51.000 --> 01:00:59.000
multiplied by the covariance between D and Z divided by the same covariance, so that this
01:00:59.000 --> 01:01:05.000
thing here actually cancels again against this thing and then divided by the estimated
01:01:05.000 --> 01:01:12.000
variance of Z which we had here so obviously this more complicated term here is the same thing
01:01:12.000 --> 01:01:19.000
as this one here. Why do I do the complication here? Because I want to interpret the gamma one
01:01:19.000 --> 01:01:27.000
hat in terms of our previous estimates from the IV or two-stage least squares regression IV is the
01:01:27.000 --> 01:01:31.000
same thing as two-stage least squares as we have shown so you may actually say this here is the
01:01:31.000 --> 01:01:37.000
two-stage least squares estimator in the second stage and the delta one hat the two-stage least
01:01:37.000 --> 01:01:46.000
squares estimator in the first stage, and you see that what we have here, the covariance hat of Y
01:01:46.000 --> 01:01:53.000
and Z divided by the covariance hat of D and Z, that was precisely our beta one IV or beta one
01:01:53.000 --> 01:02:01.000
two-stage least squares, and covariance hat of D and Z divided by variance hat of Z is precisely
01:02:01.000 --> 01:02:08.000
our delta one hat so we have shown here that the OLS estimator gamma one hat of the reduced
01:02:08.000 --> 01:02:17.000
form equation is the same thing as the IV estimator beta one hat multiplied by the delta one hat
01:02:19.000 --> 01:02:26.000
now that's very intuitive of course that gamma one hat is beta one hat times delta one hat
01:02:26.000 --> 01:02:32.000
because we know that gamma one was defined as beta one times delta one right so this is like
01:02:32.000 --> 01:02:41.000
it actually should be, which allows us to say that the two-stage least squares estimator
01:02:42.000 --> 01:02:49.000
of beta one right is the ratio of the OLS reduced form estimator for gamma one this gamma one hat
01:02:49.000 --> 01:02:57.000
here and the OLS first stage estimator for delta one, so this delta one hat here. So that's a different
01:02:57.000 --> 01:03:04.000
representation of the two-stage least squares estimator for beta one or equally equivalently
01:03:04.000 --> 01:03:16.000
for the IV estimator of beta one. All right, so that's all very nice and
01:03:16.000 --> 01:03:26.000
very much in line with each other, and not completely new to you because you have seen
01:03:26.000 --> 01:03:33.000
similar things already in the review of basic econometrics. Now let's talk about the standard
01:03:33.000 --> 01:03:39.000
errors of these estimates, because currently we have only determined how to find point
01:03:39.000 --> 01:03:44.000
estimates but we would not know anything about the significance of these point estimates so we
01:03:44.000 --> 01:03:51.000
need standard errors for any type of hypothesis testing we would need the standard errors of the
01:03:51.000 --> 01:03:56.000
estimated coefficients let's say two-stage least squares coefficients or IV coefficients
01:03:57.000 --> 01:04:03.000
well obviously the standard errors are the square roots of the diagonal of the covariance matrix
01:04:03.000 --> 01:04:10.000
of the two-stage least squares estimator right this covariance matrix we denote by V of beta hat
01:04:10.000 --> 01:04:16.000
two-stage least squares. This doesn't help us much because we would need to know what actually
01:04:16.000 --> 01:04:21.000
is the covariance matrix here, and if we have it then it's easy to pull off the diagonal elements
01:04:21.000 --> 01:04:28.000
and take the square roots of the diagonal elements well so let's turn to the question what is the
01:04:28.000 --> 01:04:37.000
covariance matrix of beta hat two-stage least squares. Well, this covariance matrix is equal to
01:04:37.000 --> 01:04:41.000
the covariance matrix of the corresponding IV estimates because the two-stage least squares
01:04:41.000 --> 01:04:46.000
estimates are the same as the IV estimates, so we would know that this covariance matrix which we
01:04:46.000 --> 01:04:53.000
look for is the same thing as the covariance matrix of the IV estimator but in the review
01:04:53.000 --> 01:04:58.000
of basic econometrics we had already derived a large sample approximation for the covariance
01:04:58.000 --> 01:05:06.000
matrix of beta hat IV so we know the covariance matrix of the two-stage least squares estimator
01:05:06.000 --> 01:05:13.000
is equal to the covariance matrix of the instrumental variables estimator and that
01:05:13.000 --> 01:05:20.000
is then approximately equal to sigma squared u times (Z prime X) inverse, Z prime Z, (X prime Z)
01:05:20.000 --> 01:05:27.000
inverse, if the number of observations is large, so we can invoke asymptotic arguments.
01:05:29.000 --> 01:05:35.000
Now clearly we not only need to estimate this matrix here, which is easily done since we know
01:05:35.000 --> 01:05:40.000
the Z matrix and the X matrix, but we also need to estimate the sigma squared u. That's a parameter
01:05:40.000 --> 01:05:47.000
which we haven't estimated so far. But very clearly, the usual estimate would be sigma squared u hat,
01:05:47.000 --> 01:05:52.000
which is one over n minus k so divided by the number of degrees of freedom
01:05:53.000 --> 01:05:59.000
u hat prime u hat right and u hat would then be the residuals from the IV regression
01:05:59.000 --> 01:06:08.000
So given such an estimate sigma squared u hat, we would obtain the standard errors of the IV
01:06:08.000 --> 01:06:13.000
estimates and these are identical to the standard errors of the two-stage least squares estimates
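As a sketch, the covariance matrix and standard errors just described might be computed as follows on simulated data (just-identified case; the data-generating process and all coefficient values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
z1 = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.6 * z1 + u + rng.normal(size=n)  # endogenous treatment
y = 1.0 + 2.0 * d + u

X = np.column_stack([np.ones(n), d])   # regressor matrix: constant and D
Z = np.column_stack([np.ones(n), z1])  # instrument matrix: constant and Z

# IV estimator (Z'X)^{-1} Z'y and its residuals -- residuals use X, not X hat
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
u_hat = y - X @ beta_iv

# sigma squared u hat = u_hat' u_hat / (n - k)
k = X.shape[1]
sigma2 = u_hat @ u_hat / (n - k)

# Large-sample covariance matrix: sigma2 * (Z'X)^{-1} Z'Z (X'Z)^{-1}
V = sigma2 * np.linalg.inv(Z.T @ X) @ (Z.T @ Z) @ np.linalg.inv(X.T @ Z)
se = np.sqrt(np.diag(V))   # standard errors of constant and slope
print(se)
```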
01:06:15.000 --> 01:06:22.000
Now why do I do this in such detail? Because I want to warn you that if you compute a two-stage
01:06:22.000 --> 01:06:28.000
least squares estimator in a computer by separately running the first stage and the
01:06:28.000 --> 01:06:34.000
second stage regression in the way i just taught it to you then the standard computer routines will
01:06:34.000 --> 01:06:42.000
return wrong estimates of the standard errors in the second stage why well because the computer
01:06:42.000 --> 01:06:46.000
does not recognize that the second stage is actually the second stage of a two-stage least squares
01:06:46.000 --> 01:06:53.000
regression the command which you use in your computer to run the second stage regression
01:06:53.000 --> 01:07:00.000
would just be an OLS command and therefore the standard computer routines would just compute
01:07:00.000 --> 01:07:06.000
the standard covariance matrix which is applicable for an OLS regression but not the
01:07:06.000 --> 01:07:12.000
slightly more complicated covariance matrix which is the large sample approximation
01:07:12.000 --> 01:07:18.000
for IV or two-stage least squares estimates right the computer in the second stage of the
01:07:19.000 --> 01:07:26.000
two-stage least squares regression would compute the covariance matrix as sigma u squared hat
01:07:27.000 --> 01:07:34.000
like this thing here and then just times x prime x inverse not this expression here
01:07:37.000 --> 01:07:44.000
okay to see this formally suppose that we first run the general first stage regression which is
01:07:44.000 --> 01:07:55.000
analogous to our equation six so we just run a regression of x the
01:07:56.000 --> 01:08:06.000
regressor matrix which is possibly endogenous on z and some matrix of coefficients delta plus
01:08:06.000 --> 01:08:14.000
some error term v for this regression here well perhaps the easiest would be to think of x not
01:08:14.000 --> 01:08:21.000
as a matrix but as a vector right so just confine your attention to the case where x is just one
01:08:21.000 --> 01:08:28.000
single regressor so then we would project the x on the instrument which means that we would run a
01:08:28.000 --> 01:08:34.000
first stage regression of x on the instrument z estimating a coefficient delta and we have some
01:08:35.000 --> 01:08:42.000
error v here from this regression 10 we would then get the projections of x on
01:08:43.000 --> 01:08:54.000
z so the predicted value of the regressor x given z this would be x hat x hat would be z times delta
01:08:54.000 --> 01:09:02.000
hat and delta hat as we know for such a regression is equal to z prime z inverse z prime x
01:08:54.000 --> 01:09:02.000
because z is the regressor matrix and x is the dependent variable so rather than using
01:09:03.000 --> 01:09:08.000
the usual ols formula x prime x inverse times x prime y in this case we would have z prime
01:09:15.000 --> 01:09:25.000
z inverse z prime x because x is the endogenous variable so this thing here is delta hat and this
01:09:25.000 --> 01:09:32.000
is the z matrix by which we pre-multiply the delta hat recall that z prime z inverse z prime
01:09:32.000 --> 01:09:38.000
is just z plus the Moore-Penrose inverse so we can also write this as z plus times x then
01:09:41.000 --> 01:09:49.000
the second stage regression would be a projection of y on x hat quite analogous to equation seven
01:09:49.000 --> 01:09:55.000
so again we use ols to run a regression and in this case we would regress y which should
01:09:55.000 --> 01:10:05.000
actually be capital y here on x hat times beta so on the predicted values of x
01:10:07.000 --> 01:10:18.000
x hat times beta plus w so equation 12 clearly follows from 10 and 11 because we have
01:10:18.000 --> 01:10:27.000
y is equal to x beta plus u and we know that x is equal to x hat plus v hat much like we had
01:10:18.000 --> 01:10:27.000
the property that d was equal to d hat plus v hat so y is x hat plus v hat times beta plus u and
01:10:27.000 --> 01:10:35.000
in terms of observables this gives the component x hat times beta while the v hat beta plus u
01:10:35.000 --> 01:10:41.000
is unobservable so we collect it in a new error term w what is the v hat from this regression
01:10:51.000 --> 01:11:01.000
well v hat is x minus x hat so it is x minus z z plus x since we have computed x hat as z z plus x
01:11:02.000 --> 01:11:09.000
so we can factor out the x vector here and we get i minus z z plus which is the well-known m matrix
01:11:09.000 --> 01:11:15.000
which we have encountered in the review of basic econometrics m now with reference to matrix z
01:11:15.000 --> 01:11:21.000
times x so these are the residuals from estimating equation 10 by ols
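The decomposition just stated, v hat equal to (I minus Z Z plus) x, that is M times x, can be checked directly. A small numpy sketch, with illustrative dimensions and data of my own:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=(n, 2))      # instrument matrix (two columns, illustrative)
x = rng.normal(size=n)           # some regressor vector

P = Z @ np.linalg.pinv(Z)        # Z Z^+ : projection onto the column space of Z
M = np.eye(n) - P                # M = I - Z Z^+ , the residual-maker matrix w.r.t. Z

x_hat = P @ x                    # fitted values from regressing x on Z
v_hat = M @ x                    # residuals: v_hat = (I - Z Z^+) x = M x
```

By construction x equals x_hat plus v_hat, M is idempotent, and the residuals are orthogonal to the columns of Z.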
01:11:24.000 --> 01:11:31.000
the result of the second stage regression 12 is then this two stage least squares estimator
01:11:31.000 --> 01:11:39.000
so it would be x hat prime x hat inverse times x hat prime y and therefore it would be x prime
01:11:39.000 --> 01:11:49.000
z times z plus z z plus x and we know from the properties of the Moore-Penrose inverse that z plus
01:11:49.000 --> 01:11:57.000
times z times z plus simply simplifies to z plus we take the inverse of this whole thing here
01:11:58.000 --> 01:12:08.000
write again the x hat prime here as x prime z z plus so just transposing our x hat
01:12:08.000 --> 01:12:15.000
matrix and multiplying by y and then we would get x prime z z prime z inverse z prime x
01:12:15.000 --> 01:12:26.000
inverse times here x prime z z prime z inverse z prime y that's the result of second stage regression
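As a numerical check of this closed-form expression, one can verify that it coincides with running the two stages explicitly. This sketch uses simulated data of my own (two instruments, one endogenous regressor, so an overidentified case):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments
v = rng.normal(size=n)
u = 0.6 * v + rng.normal(size=n)                  # structural error, correlated with v
x = 0.8 * z1 + 0.5 * z2 + v                       # one endogenous regressor
y = 1.5 * x + u                                   # true beta = 1.5

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

# Closed form: (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y
ZZinv = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ ZZinv @ Z.T @ X
beta_closed = np.linalg.solve(A, X.T @ Z @ ZZinv @ Z.T @ y)

# Two explicit stages: project X on Z, then regress y on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
```

The two point estimates agree up to floating-point error, since the second stage regresses y on P_Z X and X hat prime X hat equals X prime P_Z X.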
01:12:28.000 --> 01:12:36.000
now as an exercise look at what we have done here and assume assumptions a9 a10 and a11
01:12:36.000 --> 01:12:44.000
from the review of basic econometrics after doing that use this
01:12:44.000 --> 01:12:52.000
expression in 14 to verify that for large n we would have that the covariance matrix of beta hat
01:12:52.000 --> 01:12:58.000
two stage least squares is equal to the covariance matrix of beta hat iv which would approximately be
01:12:58.000 --> 01:13:06.000
sigma square u times z prime x inverse z prime z x prime z inverse so we have verified then again
01:13:06.000 --> 01:13:11.000
that the covariance matrix of the two stage least squares estimator is the same thing as
01:13:11.000 --> 01:13:19.000
the covariance matrix of the iv estimator but the standard computer output that was the point
01:13:19.000 --> 01:13:24.000
i wanted to make we compute the standard errors of the second stage regression using the usual
01:13:24.000 --> 01:13:33.000
ols type formula so for equation 12 for this regression here which as i said we estimate by
01:13:33.000 --> 01:13:41.000
ols the standard computer output would compute the covariance matrix as w hat prime w hat
01:13:41.000 --> 01:13:51.000
divided by n minus k so that's the estimate of sigma square w hat times x hat prime x hat
01:13:51.000 --> 01:13:57.000
inverse and this is a different expression because this would be again the estimate
01:13:57.000 --> 01:14:05.000
of the variance use the definition of the x hats here to write this as x prime z z plus z z plus x
01:14:07.000 --> 01:14:14.000
again we see of course by the properties of the Moore-Penrose inverse that z z plus z is equal to z
01:14:14.000 --> 01:14:21.000
or if you like z plus z z plus is equal to z plus in any case you will arrive at just z z plus here
01:14:22.000 --> 01:14:31.000
so you get this expression here the estimate of the residual variance times x prime z z plus x inverse
01:14:33.000 --> 01:14:46.000
now 16 and 15 differ by the variances sigma square w and sigma square u
01:14:46.000 --> 01:14:58.000
respectively from equation 13 we know how w is related to u because w is equal to v hat times beta plus u
01:14:58.000 --> 01:15:06.000
so we know that sigma square w is equal to beta squared times sigma square v
01:15:07.000 --> 01:15:13.000
the variance of v plus two beta times the covariance between v and u plus sigma square u
01:15:14.000 --> 01:15:21.000
so sigma square w is in general different from sigma square u and therefore the estimate which
01:15:21.000 --> 01:15:26.000
the computer routine gives you is the estimate of the standard errors is different from the
01:15:26.000 --> 01:15:35.000
estimates which are actually applicable now for simplicity in this calculation i have assumed
01:15:35.000 --> 01:15:45.000
that v and u are homoskedastic as a small exercise show that the covariance between vi
01:15:45.000 --> 01:15:51.000
and ui is in general different from zero this is not difficult and it is also not difficult to compute the correct
01:15:51.000 --> 01:15:57.000
standard errors some programs like for instance Stata have specific commands to compute
01:15:57.000 --> 01:16:02.000
the correct standard errors but this is not a general feature of econometric software so
01:16:02.000 --> 01:16:05.000
other programs may not have these commands and therefore you should be aware of the fact
01:16:06.000 --> 01:16:11.000
that in the second stage of a two-stage least squares regression you may encounter
01:16:11.000 --> 01:16:15.000
wrong estimates of the standard errors since the residual variance is estimated incorrectly
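This warning can be made concrete in a simulation: the naive second-stage residual variance estimates sigma square w, not sigma square u. A numpy sketch with made-up data of my own:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)        # cov(v, u) > 0 in this simulation
x = z + v                               # endogenous regressor
y = 2.0 * x + u                         # true beta = 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
k = X.shape[1]

beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second-stage OLS estimate
XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)

# Naive output: residual variance from the second-stage residuals w_hat,
# i.e. an estimate of sigma^2_w, not of sigma^2_u.
w_hat = y - X_hat @ beta
se_naive = np.sqrt(np.diag((w_hat @ w_hat) / (n - k) * XhXh_inv))

# Correct: residuals computed with the ORIGINAL regressors X, giving sigma^2_u hat.
u_hat = y - X @ beta
se_correct = np.sqrt(np.diag((u_hat @ u_hat) / (n - k) * XhXh_inv))
```

Since beta is nonzero and cov(v, u) is positive in this simulation, sigma square w exceeds sigma square u, so the naive standard errors come out too large here; in general they are simply wrong, in either direction.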
01:16:19.000 --> 01:16:27.000
obviously the z matrix can include more than one and in principle any valid exogenous variable
01:16:27.000 --> 01:16:33.000
right we do not need to instrument one column of the regressor matrix by just one instrument we can
01:16:33.000 --> 01:16:42.000
use more than just one instrument it is important to include all relevant controls
01:16:42.000 --> 01:16:50.000
in the first and in the second stage regression while the instrument z itself is not included in the second
01:16:51.000 --> 01:16:58.000
stage regression this is also referred to as the exclusion restriction z does not have an
01:16:58.000 --> 01:17:05.000
independent effect on y except through the x and in the specific example z does not have an
01:17:05.000 --> 01:17:10.000
independent effect on capital y i distinguish here between the general y which i write in small
01:17:10.000 --> 01:17:16.000
letters and the specific example where we had the treatment variable d rather than
01:17:17.000 --> 01:17:23.000
the general regressor matrix x so in the specific example z doesn't have an
01:17:23.000 --> 01:17:31.000
independent effect on y except through d so this two-stage least squares estimation can
01:17:31.000 --> 01:17:37.000
also be used if there is more than one endogenous variable provided there are at least as many instruments as endogenous variables
01:17:37.000 --> 01:17:50.000
well i think this is it for today i will next lecture return to the example of wage determination
01:17:51.000 --> 01:17:59.000
i'm aware that this was again a lecture which was somewhat heavy on theory even though in principle
01:17:59.000 --> 01:18:05.000
you should have been aware of this theory already you have seen that already in the review of basic
01:18:05.000 --> 01:18:12.000
econometrics so partially it was just a repetition of what we have already done and at some length
01:18:12.000 --> 01:18:18.000
but nevertheless of course it may be that you have questions relating to what i have just said and
01:18:18.000 --> 01:18:22.000
since this is a recorded lecture you didn't have any opportunity to pose questions you're
01:18:22.000 --> 01:18:28.000
welcome to do this in the next lecture then which will be live again i will ask you as always at the
01:18:28.000 --> 01:18:34.000
beginning of the lecture if you have questions concerning material of past lectures and please
01:18:34.000 --> 01:18:39.000
feel free just to speak out in particular with reference to the two lectures which i have
01:18:39.000 --> 01:18:46.000
now recorded and we can clarify anything which may perhaps need clarification before we continue
01:18:46.000 --> 01:18:59.000
the lecture here with our return to the example of wage determination