WEBVTT - autoGenerated
00:00:00.000 --> 00:00:23.000
And we can continue where we left off last time, which was actually a discussion of the instrumental variables estimator in a bivariate model so just one explanatory variable which we can observe, which has a coefficient of beta one.
00:00:23.000 --> 00:00:41.000
This coefficient is the coefficient we want to estimate, or where our prime interest lies. Of course there's also a constant in the regression with a coefficient of beta naught, but we're not so interested in the constant, so the discussion focuses on the estimate
00:00:41.000 --> 00:00:56.000
of beta one, which, as we know, will not be consistent if there is a second variable which is unobserved, or which will not be consistent
00:00:57.000 --> 00:01:22.000
if the regressor correlates for some other reason with the error term. So one way out of this is to find a suitable instrument and then perform an instrumental variables estimation, as we have already introduced in the review of basic econometrics and then discussed again last Thursday.
00:01:22.000 --> 00:01:47.000
Now in this specific case we are dealing with in these last lectures, we would have some type of treatment D, and it may be that the treatment D is itself partially endogenous, meaning that the treatment D may perhaps correlate with the error term of the regression,
00:01:47.000 --> 00:01:58.000
in which case the consistency of the estimate of the coefficient of the treatment indicator, so the consistency of the estimate of beta one, would not be granted.
00:01:58.000 --> 00:02:16.000
So the instrumental variables approach, as you recall, consists in finding some instrument Z, which would be correlated with the treatment indicator D, and then pursuing instrumental variables estimation.
00:02:16.000 --> 00:02:36.000
However, it needs to be ensured not only that the covariance between Z and D is different from zero so that there's correlation between the endogenous regressor and the instrument, but we also need to ensure that the instrument is uncorrelated with the disturbance of the regression.
00:02:36.000 --> 00:02:54.000
And we will in this lecture now discuss this again using the already very familiar, I hope, example of the regression of wages on education and ability or just on education if ability is unobservable.
00:02:54.000 --> 00:02:59.000
So we come to this shortly.
00:03:00.000 --> 00:03:20.000
As I said, provided that there is a nonzero covariance between the instrument and the treatment variable, the regressor, and provided that the covariance between the error term ui for individual i and the instrument for individual i, zi,
00:03:20.000 --> 00:03:31.000
is equal to zero, then the instrumental variables estimator which I denote as beta one IV hat is consistent. It's a consistent estimator of beta one.
00:03:31.000 --> 00:03:39.000
So there are two conditions which are necessary for a good instrument.
00:03:39.000 --> 00:03:54.000
These are actually those two conditions here, which I now repeat, giving them certain names. First, there is the relevance condition. The relevance condition means that the instrument is highly correlated
00:03:54.000 --> 00:04:10.000
with the endogenous regressor. It is important that the correlation is pronounced, so let's say the absolute value of the correlation shall be far from zero.
00:04:10.000 --> 00:04:25.000
I take the absolute value of the correlation here because it doesn't matter whether there's positive or negative correlation. What is important is that there is substantial correlation between the instrument and treatment indicator in this case.
00:04:25.000 --> 00:04:33.000
So that's called the relevance condition, the instrument needs to measure something which is relevant about the treatment indicator.
00:04:33.000 --> 00:04:51.000
But second, there's also the validity condition. The validity condition means that the instrument is uncorrelated with the error term so that the correlation between z and u or better perhaps I should have written here zi and ui is equal to zero.
00:04:51.000 --> 00:05:06.000
So only in this case, where the correlation between zi and ui is equal to zero, would we have a valid instrument; otherwise it just wouldn't be an instrument.
00:05:06.000 --> 00:05:23.000
Okay, now if this validity condition holds, then this means that z affects the dependent variable y only through the channel of d, there is no independent influence of z on y.
00:05:23.000 --> 00:05:48.000
Because, as you can clearly imagine, if what we have as the true regression model is that y is affected by d, but not by z separately, then this precisely means that the only effect a variable z, which is correlated with d, may have on y goes through d.
00:05:48.000 --> 00:06:01.000
So very clearly, that's the only channel which z has to affect y given that y depends in the true model only on d.
00:06:01.000 --> 00:06:10.000
Now as I said we will discuss this again with the example of wage determination which actually in this lecture will be split in two parts.
00:06:10.000 --> 00:06:29.000
Here I first give just a very quick review of what the setup of the model is and add some more information on how to use instrumental variables; then we do some theory again, and then we return to the model and continue the discussion in terms of the model.
00:06:29.000 --> 00:06:45.000
So, let's assume that the true model has this particular form of wage determination. So wages are determined by a constant, plus education, plus ability, plus of course an error term.
00:06:45.000 --> 00:07:03.000
And again, as always, we assume that beta one and beta two as the measures of the influence of education and ability respectively on wages are positive, so there's positive correlation between wage and education and there's positive correlation between wage and ability.
00:07:03.000 --> 00:07:26.000
Also again we assume of course that ability cannot be measured, is not observed, so we just have the observed variable education, and we also suppose that ability and education are positively correlated, as seems plausible, the more able people typically have higher levels of education.
00:07:26.000 --> 00:07:50.000
So in this setup we know that OLS estimation of a regression equation of type five, where wages would just be regressed on a constant and on education, that OLS estimation of this equation here is inconsistent because the regressor education would correlate with the error term V.
00:07:51.000 --> 00:08:03.000
Since V is actually equal to beta two ability plus U, so V would be exactly what we do not observe and among the things we do not observe is ability.
00:08:03.000 --> 00:08:19.000
Now since ability correlates with education, we know that V correlates with education, and this would be a correlation even on the individual level, so for each observation it would be true, for each individual it would be true that V correlates with education
00:08:19.000 --> 00:08:25.000
and therefore the OLS estimate is neither unbiased nor is it consistent.
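The inconsistency just described can be made concrete in a short simulation. This is only an illustrative sketch with made-up coefficients and variable names of my own choosing, not the lecture's data: wage depends on education and on unobserved ability, ability is folded into the error, and OLS on education alone picks up part of ability's effect.

```python
# Illustrative simulation (hypothetical numbers): omitted-variable bias when
# ability is unobserved and positively correlated with education.
import random

random.seed(0)
n = 10000
beta0, beta1, beta2 = 1.0, 0.5, 0.3          # assumed true coefficients
ability = [random.gauss(0, 1) for _ in range(n)]
# education is positively correlated with ability
educ = [12 + 2 * a + random.gauss(0, 1) for a in ability]
wage = [beta0 + beta1 * e + beta2 * a + random.gauss(0, 1)
        for e, a in zip(educ, ability)]

def mean(xs): return sum(xs) / len(xs)
def cov(xs, ys):
    # covariance estimator dividing by the sample size n
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# OLS of wage on education alone: the slope also absorbs part of beta2
beta1_ols = cov(wage, educ) / cov(educ, educ)
print(beta1_ols)   # larger than the true beta1, since both correlations are positive
```

With positive beta2 and positive correlation between ability and education, the OLS slope is biased upward, exactly as argued above.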
00:08:27.000 --> 00:08:38.000
However, we may think about finding a good instrument for education, which we then may use in the context of an instrumental variable estimate.
00:08:39.000 --> 00:09:02.000
I give you here some examples of what might be a valid instrument, and perhaps you can just think about it for a minute, which of the following four possible instruments would be correlated with education, but uncorrelated with ability.
00:09:02.000 --> 00:09:20.000
These are the two checks we have to make. So we need to check the relevance condition and the validity condition. One possible instrument could be the number of siblings a person has.
00:09:21.000 --> 00:09:34.000
Another instrument could be the education of the father, or also of the mother, but I take the father here because later, in an empirical data set, we will have the information on the education of the father.
00:09:34.000 --> 00:10:00.000
Or there may be the geographical proximity to a university, which accounts for, or which is correlated with, education; or it may be the tuition fees, whether they are high or low at the university which the individual has had access to.
00:10:00.000 --> 00:10:14.000
So the question that we actually have to consider is: which of these four instruments is highly correlated with education and uncorrelated with ability?
00:10:14.000 --> 00:10:29.000
Now, in class I would ask you to discuss this, but here that is a little difficult, so let me do the discussion myself. If you think of the possible instrument number of siblings, then we may well say that the number of
00:10:29.000 --> 00:10:39.000
siblings is probably uncorrelated with ability and therefore uncorrelated with the error term.
00:10:39.000 --> 00:10:49.000
The number of siblings a person has probably does not correlate with his or her ability.
00:10:49.000 --> 00:11:00.000
However, it is difficult to argue that the number of siblings would be correlated with education, at least not to a high degree.
00:11:00.000 --> 00:11:18.000
And it would not even be really clear whether this would be a positive or negative correlation, if there were any correlation. You might of course say that in a family with many siblings, parents do not devote, or cannot devote, as much attention
00:11:18.000 --> 00:11:31.000
to each single child as if there were just one child, which is then perhaps very well educated; so you might think of a negative impact of the number of siblings on education.
00:11:31.000 --> 00:11:42.000
But actually that would be probably informal education and not the formal education that we measure like school degrees or university degrees.
00:11:42.000 --> 00:12:01.000
Plus, you could also make the reverse argument that with many siblings, perhaps each single child receives multiple aspects of education, also as a sort of spillover from the education
00:12:01.000 --> 00:12:18.000
the siblings receive. So perhaps this is quite inspiring if somebody has had many siblings, perhaps with different fields of interest, and therefore perhaps education is actually higher in families with many siblings.
00:12:18.000 --> 00:12:28.000
In general, I think it is fair to say that we cannot expect a high degree of correlation between the number of siblings and education.
00:12:28.000 --> 00:12:41.000
So while it is true that the number of siblings is probably not correlated with ability, it may well be that we would also find it is not correlated with education, which actually we could of course test, right? We could just compute it.
00:12:41.000 --> 00:12:52.000
We could just observe education and compute the correlation between education and the number of siblings and see whether there is any, or we could run a regression of education on the number of siblings.
00:12:52.000 --> 00:13:18.000
And that's actually a fairly common tool: we test the relevance of an instrument by just running a regression of the variable to be instrumented, in this case education, on the instrument, and see how well the instrument explains the endogenous regressor.
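As a minimal sketch of this check, here is the sample correlation between education and the number of siblings on a handful of made-up observations. All numbers are hypothetical illustration, not the lecture's data set.

```python
# Toy relevance check: correlation between education and number of siblings.
# The data below are invented for illustration only.

def mean(xs):
    return sum(xs) / len(xs)

def cov_hat(xs, ys):
    # covariance estimator that divides by the sample size n,
    # as used throughout the lecture (no degrees-of-freedom correction)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    return cov_hat(xs, ys) / (cov_hat(xs, xs) ** 0.5 * cov_hat(ys, ys) ** 0.5)

educ     = [12, 16, 10, 13, 18, 11, 15, 12]   # hypothetical years of schooling
siblings = [ 3,  1,  4,  2,  0,  3,  1,  2]   # hypothetical sibling counts

print(round(corr(educ, siblings), 3))   # strongly negative in this toy sample
```

A correlation far from zero would speak for relevance; in real data one would rather run the first-stage regression and look at its F statistic, as discussed below.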
00:13:18.000 --> 00:13:41.000
How about the education of the father? For the education of the father, one may say that it was determined before the ability of the individual was determined, assuming that the father went through his education before he fathered children.
00:13:41.000 --> 00:14:02.000
So the ability was determined after the education of the father and therefore one might argue that the education of the father is uncorrelated with the ability and the education of the father may well be correlated with the education of the children.
00:14:02.000 --> 00:14:23.000
So here we may perhaps expect a high degree of correlation: well educated fathers may have well educated children, so that's rather possible. Less plausible is perhaps the missing link between the education of the father and ability.
00:14:23.000 --> 00:14:48.000
Basically, perhaps, because there is the question hanging in the room of whether there is something like a genetic influence: if the education of the father is positively correlated with his ability, and ability is for genetic reasons transmitted to the children,
00:14:48.000 --> 00:14:57.000
then there may actually be correlation between the education of the father and ability.
00:14:57.000 --> 00:15:02.000
Geographical proximity to a university.
00:15:02.000 --> 00:15:22.000
Is this a valid instrument? Well, the geographical proximity to a university is certainly, or almost certainly, not correlated with the ability of a student, even though perhaps even here one could doubt this, arguing that perhaps in the proximity
00:15:22.000 --> 00:15:38.000
of universities there probably live people who work at the university, and perhaps people who work at a university have a higher ability than the average population, so perhaps even here we would have a correlation between geographical proximity
00:15:38.000 --> 00:15:59.000
and ability. But even if this correlation were absent, it is quite doubtful whether education really depends on geographical proximity to a university, at least in a society where students don't mind traveling, or don't mind moving
00:15:59.000 --> 00:16:20.000
to another place and experience a different place to go through their university education. So, I would personally be very doubtful that education has a high degree of correlation with the geographical proximity to the university.
00:16:20.000 --> 00:16:41.000
There would be tuition fees as a possibility to instrument the education variable, and that would probably be a negative correlation, the higher the tuition fees are the more costly it is for individuals to get education.
00:16:41.000 --> 00:17:01.000
And therefore, one might say that high tuition fees correlate negatively with education. Here, however, the problem is which tuition fees we would actually take. Would we take any kind of tuition fees anywhere in the country, including those institutions
00:17:01.000 --> 00:17:22.000
which perhaps have no tuition fees at all, like public schools, or would we just use the tuition fees at the university where the individual underwent his or her education, where one might then perhaps say that the higher the tuition fees, the better was the university,
00:17:22.000 --> 00:17:40.000
and therefore the better is the degree of education. So it is not quite clear which way the correlation would go here, whereas it is fairly clear that tuition fees are probably not correlated with ability, as should be the case.
00:17:40.000 --> 00:18:06.000
So, it is not so easy in general to find a valid instrument, and we will later, in an empirical example, work with the education of the father as an instrument, even though there may be doubts about this instrument, particularly actually about its validity, not so much about its relevance.
00:18:06.000 --> 00:18:10.000
And we'll see how far we get with this.
00:18:10.000 --> 00:18:33.000
As I have already said, in general, we can test the relevance criterion by regressing education on a possible instrument and then checking how well the instrument explains the endogenous regressor by just looking at the t statistic or in this case one typically looks at the F statistic.
00:18:33.000 --> 00:18:44.000
There is a rule of thumb, which says that an instrument is good, only if the F statistic of such a regression exceeds the value of 10.
00:18:44.000 --> 00:18:49.000
That's a mere rule of thumb, but it's fairly often used in econometrics.
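The first-stage regression and the F > 10 rule of thumb can be sketched in a few lines. This is a simulation with invented parameters; with a single instrument, the F statistic of the first stage is just the squared t statistic of the instrument's coefficient.

```python
# Sketch of the first-stage relevance check and the F > 10 rule of thumb,
# with one instrument. The data-generating process is invented for illustration.
import random

random.seed(1)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]               # instrument
d = [0.8 * zi + random.gauss(0, 1) for zi in z]          # endogenous regressor

def mean(xs): return sum(xs) / len(xs)
def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# OLS slope of d on z and the regression R^2
delta1 = cov(d, z) / cov(z, z)
delta0 = mean(d) - delta1 * mean(z)
resid = [di - delta0 - delta1 * zi for di, zi in zip(d, z)]
r2 = 1 - sum(e * e for e in resid) / sum((di - mean(d)) ** 2 for di in d)

# F statistic for H0: delta1 = 0 (one restriction, n - 2 residual dof)
F = (r2 / 1) / ((1 - r2) / (n - 2))
print(F > 10)   # the rule-of-thumb check for a "strong" first stage
```

Here the instrument is strong by construction, so the check passes; with a weak instrument the F statistic would fall below 10.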
00:18:49.000 --> 00:19:01.000
Why do we look at the F statistic and not at the t statistic? Well, for the simple reason that often we actually use more than just one instrument. So we would
00:19:01.000 --> 00:19:12.000
regress education not only on one of those four instruments, but possibly on more than that, perhaps on the latter three, because the number of siblings is not so promising an instrument.
00:19:12.000 --> 00:19:24.000
But we might pick the latter three and then regress education on three instruments at a time, that is, actually use three instruments at a time to instrument the variable education.
00:19:24.000 --> 00:19:48.000
And then, of course, we would end up with three t statistics and it's not easy or not possible to formulate a rule of thumb there, but we would have one F statistic for the three instruments and the rule of thumb would tell us if the F statistic exceeds 10, then the instruments are relevant.
00:19:48.000 --> 00:20:05.000
Now, as you know, and as we have already covered in the review of basic econometrics, there's a close relationship between instrumental variables and two-stage least squares estimation, 2SLS.
00:20:05.000 --> 00:20:26.000
And in the two-stage least squares estimation, we do precisely what I just explained, that we first regress the endogenous regressor on the instrument. So that would be the first step in the two-stage least squares estimation.
00:20:26.000 --> 00:20:44.000
And hence it is useful to look at 2SLS estimation as one way to perform instrumental variables estimation. So as you recall, the two-stage least squares estimator is actually an instrumental variables estimator.
00:20:44.000 --> 00:20:55.000
And the first stage of the two-stage least squares estimator is precisely what I discussed here, regressing the endogenous regressor on the instrument.
00:20:55.000 --> 00:21:01.000
So let's have a look at two-stage least squares estimation now.
00:21:01.000 --> 00:21:11.000
First stage, as I said, is to identify the exogenous variation in the endogenous regressor, in this case in our causal variable D.
00:21:11.000 --> 00:21:20.000
So the variable which indicates the treatment. And we do this by regressing the treatment variable on the set of instruments Z.
00:21:20.000 --> 00:21:31.000
In general, this can be a set of instruments, so more than just one, but mostly in this lecture, I would consider the case where we have just one instrument to explain D.
00:21:31.000 --> 00:21:48.000
Now, if this instrument here is exogenous, in the sense that it doesn't correlate with the contemporaneous error terms, then obviously Z captures the exogenous variation in the treatment variable, but not the endogenous variation in the treatment variable.
00:21:48.000 --> 00:22:01.000
And this is precisely why we can, in the formula of an instrumental variables estimator, replace the D, or in general the regressor X, by the instrument Z.
00:22:01.000 --> 00:22:09.000
And therefore, identify what part of the variation in the regressor is exogenous.
00:22:09.000 --> 00:22:26.000
And then in the second stage of the two-stage least squares estimator, as you may recall, we use this exogenous variation in D as a regressor in the causal regression, which then explains the influence of D on Y.
00:22:26.000 --> 00:22:46.000
So look at the setup of this case here: we have some outcome variable Y, as we had before, and we have some treatment variable D, and we would like to know what the causal effect of D on Y is.
00:22:46.000 --> 00:22:50.000
That would usually be measured by this beta one here.
00:22:50.000 --> 00:22:59.000
But now assume that the treatment variable suffers itself from some kind of endogeneity bias.
00:22:59.000 --> 00:23:14.000
Like for instance, if the treatment is education, we may have the fact that this error term here, which used to be called V in the previous example, that this error term U here also includes ability.
00:23:14.000 --> 00:23:25.000
And therefore, there would be correlation between education, the treatment variable, and the error term, which biases the estimate of the coefficient beta one.
00:23:25.000 --> 00:23:42.000
In the two-stage least squares approach, we would now in the first stage regress the regressor D on a constant and on the instrument Z. Let's say it's the education of the father, plus an error term, of course, which I call here V.
00:23:43.000 --> 00:24:02.000
As you know, in such a bivariate regression as equation six, the estimates are easily computed as the ratio of a covariance and a variance, namely the estimated covariance between the treatment variable and the instrument, divided by the estimated variance of the instrument.
00:24:02.000 --> 00:24:17.000
Where again, this notation here, I would like to remind you of that covariance hat and variance hat is the estimator of a variance which uses the full number of observations. It does not do any kind of
00:24:17.000 --> 00:24:24.000
correction for the number of degrees of freedom. So this always divides by the sample size here.
00:24:24.000 --> 00:24:49.000
So we know that delta one hat is the ratio of the covariance divided by the variance, and the estimate of the constant is, as always (we have had this formula too), the mean of the dependent variable in regression six, so the mean of D, minus the mean of the regressor Z, the instrument,
00:24:49.000 --> 00:24:57.000
multiplied by the estimate delta one hat, which was defined here.
00:24:57.000 --> 00:25:13.000
So, the prediction of D given Z is obviously D hat, which would be the estimate delta naught hat times the constant, plus delta one hat times Z.
00:25:13.000 --> 00:25:25.000
This is the prediction of D, or the exogenous part of the variation in D, explained by exogenous variation in the exogenous instrument Z.
00:25:25.000 --> 00:25:38.000
In the second part, the second stage of the two-stage least squares estimator, we would then replace the endogenous regressor D by its prediction D hat from the first stage.
00:25:38.000 --> 00:25:48.000
So precisely by this D hat here and run regression seven, which regresses Y again on a constant, but then on D hat, rather than on D.
00:25:48.000 --> 00:26:04.000
And obviously we would have a new error term, which I now call W. W is typically different from the error term U, which we had here when we used the regressor D, simply because these are different regressors.
00:26:04.000 --> 00:26:20.000
Now we have also already discussed what the estimate of beta one would look like for this regression here. It's the same formula for beta one, now indexed with two stage least squares here.
00:26:20.000 --> 00:26:32.000
So the two stage least squares estimator of beta one is equal to the estimated covariance between Y and D hat divided by the variance of D hat.
00:26:32.000 --> 00:26:51.000
And obviously the constant then follows again analogously from the same formula, the mean of Y minus the mean of D hat times the two stage least squares estimator of coefficient beta one.
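The two stages can be written out directly in code. The following is a self-contained sketch on simulated data of my own invention: the treatment D shares an unobserved component with the error (so OLS is biased), while Z is correlated with D but not with the error, and the 2SLS formulas above are applied literally.

```python
# Two-stage least squares by hand, on simulated data (illustrative numbers only).
import random

random.seed(42)
n = 5000
beta0, beta1 = 1.0, 2.0                                   # assumed true values
z = [random.gauss(0, 1) for _ in range(n)]                # instrument
a = [random.gauss(0, 1) for _ in range(n)]                # unobserved "ability"
d = [0.7 * zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]
y = [beta0 + beta1 * di + ai + random.gauss(0, 1) for di, ai in zip(d, a)]

def mean(xs): return sum(xs) / len(xs)
def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (s - my) for x, s in zip(xs, ys)) / len(xs)

# first stage: regress D on Z, form the fitted values D hat
delta1 = cov(d, z) / cov(z, z)
delta0 = mean(d) - delta1 * mean(z)
d_hat = [delta0 + delta1 * zi for zi in z]

# second stage: regress Y on D hat
beta1_2sls = cov(y, d_hat) / cov(d_hat, d_hat)
beta0_2sls = mean(y) - mean(d_hat) * beta1_2sls

beta1_ols = cov(y, d) / cov(d, d)    # biased comparison estimate
print(beta1_ols, beta1_2sls)
```

In this setup the plain OLS slope lands well above the true beta one, while the 2SLS estimate is close to it, which is the whole point of the procedure.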
00:26:51.000 --> 00:27:03.000
Alternatively, we could also estimate the causal regression two by using the instrumental variables approach, using Z as an instrument.
00:27:03.000 --> 00:27:29.000
And we derived last week the IV estimator beta one hat IV as the estimated covariance between Z and Y divided by the estimated covariance between Z and D, and then the constant follows again by the same formula.
00:27:29.000 --> 00:27:48.000
What is interesting and what we have already also seen in the review of basic econometrics, but what I would like to explain to you again once more and actually prove it once more to you in this particular setting is that this estimate here, the IV estimate
00:27:48.000 --> 00:28:02.000
is exactly the same as the two stage least squares estimate that we had here. So what we may prove and what we will prove now is that this estimate of beta one
00:28:02.000 --> 00:28:07.000
is the same thing as this estimate of beta one.
00:28:07.000 --> 00:28:17.000
So here we have two covariances which we divide by each other, two estimated covariances and here we have a covariance, an estimated covariance divided by an estimated variance.
00:28:17.000 --> 00:28:33.000
I claim that the two are the same, and I also claim that the estimate of the constant which we derived here in the two stage least squares case is the same thing as the instrumental variables estimator's estimate of the constant.
00:28:33.000 --> 00:28:46.000
The estimate of the constant is the same as the instrumental variables estimate of the constant, which I denote here as beta naught hat IV and which follows from this formula.
00:28:46.000 --> 00:28:56.000
Now, how can we prove that? Here is the proof. First note that the mean of d, d bar, is the same thing as the mean of d hat.
00:28:56.000 --> 00:29:13.000
So the mean of d, denoted d bar, is the same as the mean of d hat, denoted d hat bar. Why is this so? It follows immediately because we know that the regression residuals always sum to zero when we have a constant in the regression.
00:29:13.000 --> 00:29:31.000
So what we do know is that iota prime V hat is equal to zero, where V hat is obviously just the residual of equation six, so of this equation here, right?
00:29:31.000 --> 00:29:47.000
In general, actually, all those estimated residuals here, U hat, V hat and W hat, will sum to zero. So in particular, iota prime V hat is equal to zero.
00:29:47.000 --> 00:29:57.000
Now we know that d is decomposed into the fitted value d hat and the estimated residual V hat.
00:29:57.000 --> 00:30:16.000
So by pre-multiplying this equation here by iota prime, we get that iota prime d is the same thing as iota prime d hat, and then, dividing through by the sample size, we get that the mean of d is the same as the mean of d hat.
00:30:16.000 --> 00:30:22.000
Therefore, we have proven this equality here.
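The argument of the last few cues can be restated compactly in the lecture's notation, with iota the vector of ones and n the sample size:

```latex
\iota'\hat{v} = 0
\quad\text{and}\quad
d = \hat{d} + \hat{v}
\;\Longrightarrow\;
\iota' d = \iota'\hat{d} + \iota'\hat{v} = \iota'\hat{d}
\;\Longrightarrow\;
\bar{d} = \frac{\iota' d}{n} = \frac{\iota'\hat{d}}{n} = \bar{\hat{d}}.
```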
00:30:22.000 --> 00:30:44.000
So, it follows now that the two estimates of the constant, once in the IV case and then in the two-stage least squares case, are equal if and only if the estimates of beta one in the IV case and in the two-stage least squares case are equal.
00:30:44.000 --> 00:31:03.000
Because if you look again at the estimates which we have here, this is the estimate of the constant in the two-stage least squares case. There is Y bar minus this term.
00:31:03.000 --> 00:31:17.000
Compare this with the estimate of the constant in the IV case: there is also Y bar, so that's the same thing. And then there is d bar times the estimate of beta one in the IV case.
00:31:17.000 --> 00:31:28.000
Now we know that this d bar is equal to this d hat bar. So the Y bar and the d hat bar or d bar are the same in both cases.
00:31:28.000 --> 00:31:43.000
So what needs to be proven is that the estimates of beta one in this two-stage least squares case and in the IV case are the same. If this is the case, then obviously the constants are also identical.
00:31:43.000 --> 00:31:46.000
The estimates of the constants are also identical.
00:31:47.000 --> 00:32:02.000
So we have to prove this here and then we are done. Now, the estimate of beta one in the two-stage least squares case is, as I already said, this estimated covariance divided by this variance here.
00:32:02.000 --> 00:32:13.000
And for d hat, we can now substitute in its definition, which would be delta naught hat times iota plus delta one hat times Z.
00:32:13.000 --> 00:32:22.000
And the same thing we can do here. This is just the definition of d hat, which I have plugged in here.
00:32:22.000 --> 00:32:30.000
And when you look at the covariance between Y, which is the vector of observations, this is something which varies.
00:32:30.000 --> 00:32:35.000
And look just at the first term here; then you see this first term is a constant.
00:32:35.000 --> 00:32:41.000
So something which has some variation and something which has a constant have a covariance of zero, obviously.
00:32:41.000 --> 00:32:56.000
Right. And same thing here: this is a constant, and the constant doesn't play a role when computing the variance. So we can actually forget these constant terms here and conclude that beta one in the two-stage least squares case,
00:32:56.000 --> 00:33:09.000
so the same thing as here, is just the estimated covariance of Y and delta one hat Z. So this component here matters, not this one.
00:33:09.000 --> 00:33:20.000
And same thing down here. It's the variance of delta one hat Z, which matters. So forget about this thing. It's just delta one hat Z, which matters.
00:33:20.000 --> 00:33:29.000
Okay. Now, we also know that such constant terms like delta one hat can be pulled out of covariance and variance expressions.
00:33:29.000 --> 00:33:44.000
So it's the same thing as delta one hat times the covariance hat of Y and Z, divided by delta one hat squared (because this is a variance, where the delta one hat comes up twice) times the estimated variance of Z.
00:33:44.000 --> 00:33:57.000
So the square here cancels against the delta one hat here. We are left with covariance hat of Y and Z divided by delta one hat times the estimated variance of Z.
00:33:57.000 --> 00:34:07.000
Well, we know what delta one hat is: the estimate delta one hat was the estimated covariance of D and Z divided by the estimated variance of Z.
00:34:07.000 --> 00:34:17.000
So we can plug this into this expression here, and then we get that beta one hat two stage least squares, which was this expression here.
00:34:17.000 --> 00:34:29.000
Plugging in the delta one hat gives us this expression here from which the variance of Z, the estimated variance of Z cancels in numerator and denominator.
00:34:29.000 --> 00:34:51.000
So we're left with the ratio of the estimated covariances of Y and Z and estimated covariance of D and Z. And this is just the IV estimator of beta one, and thereby we have proven that two stage least squares and the instrumental variables estimator are just the same thing.
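The equivalence just proven can also be checked numerically. Here is a minimal sketch on a handful of arbitrary illustrative numbers: the 2SLS estimate cov(Y, D hat)/var(D hat) and the IV estimate cov(Z, Y)/cov(Z, D) agree up to floating-point precision.

```python
# Numerical check: the 2SLS estimate equals the IV estimate.
# All data points below are arbitrary illustrative numbers.

def mean(xs): return sum(xs) / len(xs)
def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (t - my) for x, t in zip(xs, ys)) / len(xs)

z = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5]
d = [1.2, 2.1, 0.9, 2.8, 1.1, 2.0]
y = [3.0, 5.2, 2.4, 6.9, 3.1, 5.0]

# first stage and fitted values
delta1 = cov(d, z) / cov(z, z)
delta0 = mean(d) - delta1 * mean(z)
d_hat = [delta0 + delta1 * zi for zi in z]

beta1_2sls = cov(y, d_hat) / cov(d_hat, d_hat)   # second stage
beta1_iv = cov(z, y) / cov(z, d)                 # direct IV formula

print(abs(beta1_2sls - beta1_iv) < 1e-9)
```

Since the two formulas are algebraically identical, the difference is zero up to rounding error for any data set.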
00:34:51.000 --> 00:35:01.000
Any questions so far?
00:35:01.000 --> 00:35:15.000
Okay, and what we could also do, just as a note, we could also estimate the reduced form, which we get by plugging equation six into equation two.
00:35:15.000 --> 00:35:34.000
So what we can also do, rather than estimating in two stages, so first stage here and then second stage on the next page, is to plug this equation six into two directly,
00:35:34.000 --> 00:35:46.000
and then estimate this, which would be called the reduced form, because we have substituted the endogenous variable and just explain Y by the exogenous variable Z.
00:35:46.000 --> 00:36:03.000
So that's what I remind you of in this note here. This is equation two, where Y is explained by the treatment variable D. Now I can plug in the equation six, which would be this equation here.
00:36:03.000 --> 00:36:15.000
For D, where D is explained by Z. And then I can collect terms because obviously this here, for instance, is the constant of the new regression.
00:36:15.000 --> 00:36:24.000
So this would be beta naught plus beta one delta naught, times iota; that's the constant. I call this new coefficient of the constant gamma naught.
00:36:25.000 --> 00:36:34.000
And then we would have the coefficient beta one delta one times Z. This is beta one times delta one times Z here, and I call the coefficient gamma one.
00:36:34.000 --> 00:36:50.000
And then we would have a new error term, beta one V plus U, which I would define as epsilon. So that would be the reduced form regression here, where Y would be regressed directly just on the instrument.
00:36:50.000 --> 00:37:08.000
So no instrumenting, no combination of D and Z in the estimator, but just estimating gamma naught and gamma one by the usual expression Z prime Z inverse Z prime Y, so by an OLS estimator.
00:37:08.000 --> 00:37:10.000
Right?
00:37:10.000 --> 00:37:24.000
So this OLS estimator in the case of the coefficient of interest of gamma one would be again, of course, the covariance, the estimated covariance of Y and Z divided by the estimated variance of Z.
00:37:24.000 --> 00:37:38.000
And well, this thing we can of course also rewrite by multiplying both numerator and denominator by the estimated covariance of D and Z.
00:37:38.000 --> 00:37:50.000
And when we have done it this way, we see that's here, this here is just the IV estimate, beta one IV hat. So this term here.
00:37:50.000 --> 00:38:03.000
And this thing here is just delta one hat, which of course makes sense, because we know that when estimating gamma one, we estimate beta one times delta one.
00:38:03.000 --> 00:38:23.000
So we would have an estimate here which actually consists of two estimates, the IV estimate of beta one and the auxiliary regression estimate of delta one; their product would be the same thing as our gamma one hat here.
00:38:23.000 --> 00:38:36.000
So, as I say, that gamma one hat is the product of these two estimates is very intuitive because gamma one is just the product of the two coefficients.
00:38:36.000 --> 00:38:53.000
Therefore, we also know that the two stage least squares estimator of beta one, beta one two stage least squares hat is just the ratio of the reduced form estimator for gamma one and the OLS first stage estimator for delta one.
00:38:53.000 --> 00:39:02.000
So we can derive the two stage least squares estimator also as the ratio of two ordinary least squares estimators.
00:39:02.000 --> 00:39:19.000
We actually do not need to go through the two stages, but rather we can simply run two OLS regressions very easily and then divide the two estimated coefficients by each other and then we also have the two stage least squares estimator.
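This indirect least squares identity can be checked numerically. Below is a minimal simulation sketch (the data-generating numbers and variable names are my own, not from the lecture): it verifies that the IV estimate, the ratio of sample covariances, coincides exactly with the ratio of the reduced-form slope gamma one hat to the first-stage slope delta one hat.

```python
# Sketch under assumed simulated data: D = delta0 + delta1*Z + V (first stage),
# Y = beta0 + beta1*D + U (structural), with U correlated with V so D is endogenous.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                  # instrument
v = rng.normal(size=n)
d = 0.5 + 1.2 * z + v                   # first stage, delta1 = 1.2
u = 0.8 * v + rng.normal(size=n)        # error correlated with D
y = 1.0 + 2.0 * d + u                   # structural equation, beta1 = 2

def ols_slope(x, y):
    """OLS slope in a bivariate regression: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

beta1_iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]   # IV: cov(Z,Y)/cov(Z,D)
gamma1 = ols_slope(z, y)                             # reduced-form slope
delta1 = ols_slope(z, d)                             # first-stage slope

# the two agree by construction, and both are close to the true beta1 = 2
assert np.isclose(beta1_iv, gamma1 / delta1)
print(beta1_iv)
```

The identity holds exactly in the sample, not just asymptotically, because the variance of Z cancels in the ratio of the two OLS slopes.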
00:39:19.000 --> 00:39:28.000
Now the question is, of course, which standard errors apply in the case of IV or two stage least squares.
00:39:29.000 --> 00:39:39.000
We need these standard errors because typically we would like to do hypothesis testing on the estimated coefficients.
00:39:39.000 --> 00:39:52.000
We know that the standard errors can be estimated by looking at the diagonal of the estimated covariance matrix of beta hat in the two
00:39:52.000 --> 00:40:12.000
stage least squares estimator, which I call V of beta hat 2SLS. Actually, we do not know the true variance of beta hat in the two stage least squares case, but this true variance is what would give the true standard errors.
00:40:12.000 --> 00:40:35.000
What we have, however, is at best the estimated covariance matrix, some V hat of beta hat two stage least squares, which we may use as a basis for deriving the standard errors; these would then be the square roots of the diagonal of this estimated matrix.
00:40:35.000 --> 00:40:46.000
But here I have written down the true matrix, without the hat, and then the sentence is actually true in the way it is stated.
00:40:46.000 --> 00:40:59.000
The true standard errors are the square roots of the true matrix and the estimated standard errors are the square roots of the diagonal of the estimated covariance matrix.
00:40:59.000 --> 00:41:27.000
Now, since the two stage least squares estimates are the same as the IV estimates, obviously the true variance of the two stage least squares coefficient is the same thing as the true variance of the instrumental variables estimation so V of beta hat two stage least squares is the same as V of beta hat instrumental variables.
00:41:28.000 --> 00:41:40.000
For this covariance matrix V of beta hat IV we had already derived a large sample approximation in the review of basic econometrics.
00:41:40.000 --> 00:41:58.000
So now we know that for large sample sizes, so n going to infinity, the covariance matrix of beta hat in the two stage least squares approach is the same thing as the covariance matrix of beta hat in the IV case.
00:41:58.000 --> 00:42:19.000
And it is then approximately equal to sigma squared u times this matrix here, which is (Z prime X) inverse times Z prime Z times (X prime Z) inverse. So this approximate equality here holds for a sufficiently large sample size n.
00:42:19.000 --> 00:42:44.000
Unfortunately, the sigma squared u is unknown, so we will have to replace it by an estimate, and then we usually use the standard estimate sigma u hat squared, which would be one over n minus k (sample size minus the number of regressors, so we divide by the degrees of freedom) times the sum of squared residuals u hat prime u hat.
00:42:44.000 --> 00:42:59.000
And given this estimate of sigma squared u, we would obtain the standard errors of the IV estimates and also the standard errors of the two stage least squares estimates.
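As a sketch of this recipe on simulated data (all names and numbers illustrative, not from the lecture): compute beta hat IV = (Z'X)^(-1) Z'y, estimate sigma squared u from the structural residuals with the n minus k correction, and plug into the large-sample covariance formula sigma squared u (Z'X)^(-1) Z'Z (X'Z)^(-1).

```python
# Assumed setup: one endogenous regressor d, one instrument z0, and a constant
# that instruments itself.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
z0 = rng.normal(size=n)
v = rng.normal(size=n)
d = 0.5 + 1.0 * z0 + v                  # first stage
u = 0.7 * v + rng.normal(size=n)        # correlated with v, so d is endogenous
y = 1.0 + 2.0 * d + u                   # structural equation, beta1 = 2

X = np.column_stack([np.ones(n), d])    # regressors including the constant
Z = np.column_stack([np.ones(n), z0])   # instruments (constant instruments itself)

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^{-1} Z'y

u_hat = y - X @ beta_iv                 # residuals of the structural equation
k = X.shape[1]
sigma2_u = (u_hat @ u_hat) / (n - k)    # u_hat'u_hat / (n - k)

ZX_inv = np.linalg.inv(Z.T @ X)
V_iv = sigma2_u * ZX_inv @ (Z.T @ Z) @ ZX_inv.T   # sigma^2 (Z'X)^-1 Z'Z (X'Z)^-1
se = np.sqrt(np.diag(V_iv))             # IV standard errors from the diagonal
print(beta_iv, se)
```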
00:42:59.000 --> 00:43:17.000
Now what is important is that this expression here which we have used to derive the standard errors of the two stage least squares estimate is different from the formula which we apply in the OLS case for computing standard errors.
00:43:17.000 --> 00:43:38.000
Recall in the OLS case, when we use OLS as the method of estimation and when the Gauss-Markov assumptions are satisfied, then the true covariance matrix is actually sigma squared u times X prime X inverse, and that's it.
00:43:38.000 --> 00:44:01.000
There is no Z in there, right? This would just be X prime X inverse. It would be this expression if Z is replaced everywhere by X: then Z prime Z here would be X prime X, and this would be X prime X inverse, so these cancel and we would just be left with X prime X inverse here.
00:44:01.000 --> 00:44:14.000
So in the case of no instruments, we would have a different matrix here than in the case of using z to instrument all of the regressors.
00:44:14.000 --> 00:44:29.000
And the problem now is that when we do two stage least squares estimation in two steps, in the second step we estimate the regression equation by the method of OLS.
00:44:29.000 --> 00:44:42.000
So the computer will compute the OLS estimate of the covariance matrix and therefore will give us standard errors which are incorrect.
00:44:42.000 --> 00:44:51.000
This would be the case if we really ran the two stages that two stage least squares is made up of separately.
00:44:51.000 --> 00:45:06.000
So standard computer routines will return the wrong estimates of the standard errors in the second stage if we compute the two stage least squares estimates by literally running the two stages as two separate OLS steps, right?
00:45:06.000 --> 00:45:25.000
Which you can see very easily: suppose that we run a general first stage regression where we regress the regressor matrix X on the instrument matrix Z, estimating some coefficient matrix delta and having some error term V.
00:45:25.000 --> 00:45:43.000
This would give us the fitted values X hat, what is called the projection of X on Z: X hat would be Z times delta hat, and delta hat is of course (Z prime Z) inverse Z prime X.
00:45:43.000 --> 00:46:00.000
So you recognize immediately that this term here, Z times (Z prime Z) inverse Z prime, is the same thing as Z times the Moore-Penrose inverse Z plus, so this would be Z Z plus X.
00:46:00.000 --> 00:46:15.000
And in the second stage regression we would then project y on the projection of X on Z, so we would run an OLS regression where y, that should actually be capital.
00:46:15.000 --> 00:46:36.000
No, that's fine: this is the sampled y, the capital one would be the random variable, so this is why I have a different notation here. We project y on X hat by means of a coefficient beta and an error term w.
00:46:36.000 --> 00:46:57.000
And then we would have y equals X beta plus u, but X is obviously equal to X hat plus v hat. This gets multiplied by beta, plus u. So we have X hat beta plus an error term v hat beta plus u, which is actually the w.
00:46:57.000 --> 00:47:00.000
Right.
00:47:00.000 --> 00:47:26.000
So the v hat, as you can reproduce from this equation here, is X minus X hat, and this is X minus Z Z plus X. So it is (I minus Z Z plus) times X. And this is the well known idempotent matrix M sub Z, which we have already encountered in many cases, multiplied by X.
00:47:26.000 --> 00:47:36.000
So these are the residuals which we get in the first stage of the two stage least squares approach.
00:47:36.000 --> 00:47:50.000
Therefore, the estimate we obtain beta hat in the two stage least squares approach is x hat prime x hat inverse the x hat prime y.
00:47:50.000 --> 00:48:07.000
And for this, we can then rewrite this in terms of the X and Z matrices rather than in terms of the X hat matrices, using the fact that X hat is Z Z plus X.
00:48:07.000 --> 00:48:21.000
So we get this term Z Z plus Z Z plus here, and you know from the properties of the Moore-Penrose inverse that Z plus Z Z plus is the same thing as just Z plus.
00:48:21.000 --> 00:48:35.000
And this is multiplied by X prime Z Z plus y. So we can decompose the Z plus here to get X prime Z times (Z prime Z) inverse Z prime X, and the inverse of that.
00:48:35.000 --> 00:48:46.000
And here we also decompose the Z plus and write it out: that's X prime Z, and then, for the Z plus, (Z prime Z) inverse Z prime, times y.
00:48:46.000 --> 00:48:56.000
So that's the two stage least squares estimate which we would get in the second stage regression.
00:48:56.000 --> 00:49:07.000
And here we have to be careful: if we run this regression with computer software, then it would not recognize that we are using the instrument Z in here.
00:49:07.000 --> 00:49:22.000
So computer software which we do not tell that we are doing a two stage least squares regression would compute the wrong estimate of the standard errors.
00:49:22.000 --> 00:49:43.000
So here a little exercise: if you use our assumptions A9 through A11, then try to verify that the variance covariance matrix of beta hat in the two stage least squares approach is equal to sigma squared u times this expression here.
00:49:43.000 --> 00:50:01.000
So it is equal to the result we obtained for the IV estimator, which of course must be the result, because we know that the IV estimator and the two stage least squares estimator are identical.
00:50:01.000 --> 00:50:11.000
Yeah, as I say, the standard computer output will compute the standard errors of the second stage regression using the normal OLS-type formula.
00:50:11.000 --> 00:50:24.000
So, for equation 12, which was this equation here, which we would estimate by OLS, right, for equation 12 in the second stage.
00:50:24.000 --> 00:50:36.000
We would get as an estimate by the computer routine w hat prime w hat over n minus k, times X hat prime X hat inverse.
00:50:36.000 --> 00:50:48.000
And if you compute this, then you can again substitute the X hats by the corresponding expression made up of X's and Z's that we have already had.
00:50:48.000 --> 00:50:59.000
So that would be equal to (X prime Z Z plus X) inverse, multiplied by the estimate of the variance of the residuals.
00:50:59.000 --> 00:51:20.000
Now you see that this matrix here is actually the same matrix as the matrix which we would get in the two stage least squares case, which would be the one we calculated in this exercise here
00:51:20.000 --> 00:51:45.000
would be the same matrix as this one, but the difference is which estimate we use as a scaling factor: the estimated variance of the w residuals, or the estimated variance of the u residuals.
00:51:45.000 --> 00:51:59.000
So, what is the difference between this variance here and that variance here? We know that w is equal to v hat times beta plus u.
00:51:59.000 --> 00:52:10.000
And therefore, the variance of w is equal to beta squared times the variance of v. So this accounts for this part here.
00:52:10.000 --> 00:52:15.000
Plus the variance of u, which is this one here.
00:52:15.000 --> 00:52:19.000
And then of course plus the covariance here.
00:52:19.000 --> 00:52:31.000
And therefore we see that sigma squared W is in general different from sigma squared U.
00:52:31.000 --> 00:52:42.000
Okay, and you may also show in an exercise that the covariance between v i and u i is different from zero, which is however trivial.
00:52:42.000 --> 00:52:53.000
Now, as I said, standard computer routines would give you the wrong standard errors in the second stage of the two stage least squares estimation.
00:52:53.000 --> 00:53:05.000
It is of course a very familiar problem, and therefore most computer packages have their own two stage least squares routines; for instance, Stata does have this.
00:53:05.000 --> 00:53:23.000
And therefore you can use specific commands to compute the correct standard errors in the two stage least squares setting. So Stata will readily give you the correct standard errors when you use the commands which are pre-programmed
00:53:23.000 --> 00:53:28.000
for this two stage least squares estimation. And there's no problem.
00:53:28.000 --> 00:53:45.000
But you have to use those commands. If you perhaps are not so familiar with a computer package like Stata and you don't really know what was the command for two stage least squares and you think well I can also do
00:53:45.000 --> 00:53:56.000
two stage least squares quickly by just running the two stages with OLS estimates, then you run into the problem that in the second stage the standard errors will be wrong.
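The following sketch on simulated data (my own numbers, not the lecture's) makes the problem concrete: the hand-rolled second stage recovers the correct 2SLS coefficients, but its OLS residual variance is based on the second-stage residuals w hat rather than the structural residuals u hat, so the reported standard errors are wrong.

```python
# Assumed setup: one endogenous regressor d, one instrument z0, plus a constant.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
z0 = rng.normal(size=n)
v = rng.normal(size=n)
d = 0.5 + 1.0 * z0 + v
u = 0.7 * v + rng.normal(size=n)
y = 1.0 + 2.0 * d + u                   # true beta1 = 2

X = np.column_stack([np.ones(n), d])
Z = np.column_stack([np.ones(n), z0])

# first stage: project X on Z to get X_hat = Z (Z'Z)^{-1} Z'X
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# second stage: OLS of y on X_hat gives the correct 2SLS coefficients
beta_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

k = X.shape[1]
# WRONG scaling: second-stage residuals w_hat = y - X_hat @ beta
w_hat = y - X_hat @ beta_2sls
sigma2_w = (w_hat @ w_hat) / (n - k)

# CORRECT scaling: structural residuals u_hat = y - X @ beta
u_hat = y - X @ beta_2sls
sigma2_u = (u_hat @ u_hat) / (n - k)

XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
se_wrong = np.sqrt(np.diag(sigma2_w * XhXh_inv))   # what naive OLS output reports
se_right = np.sqrt(np.diag(sigma2_u * XhXh_inv))   # correct 2SLS standard errors
print(se_wrong[1], se_right[1])
```

Dedicated routines such as Stata's two stage least squares command apply the correct scaling automatically; only the manual two-step shortcut produces the wrong one.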
00:53:56.000 --> 00:54:02.000
And you should be aware of that.
00:54:02.000 --> 00:54:10.000
Now as I have already pointed out the Z matrix is a matrix so it can include more than one instrument.
00:54:10.000 --> 00:54:24.000
I have structured my examples here always just with one instrument but there is no conceptual problem in using more than one instrument.
00:54:24.000 --> 00:54:38.000
In fact, for the causal analysis that we aim at it is important to include all relevant controls in both the first and in the second stage regression.
00:54:38.000 --> 00:54:54.000
For only if we do this can we make sure that any possible instrument Z, which is by definition not included in the second stage regression, satisfies what is called an exclusion restriction.
00:54:54.000 --> 00:55:07.000
Z as an instrument operates or takes effect on the dependent variable y just through the regular regressors, so just through the instrumented regressor and the relevant controls.
00:55:07.000 --> 00:55:19.000
And it is important that Z does not have any additional explanatory potential for y beyond the explanatory potential which is captured in the included regressors.
00:55:19.000 --> 00:55:45.000
But for this reason, of course, it is also important that we do really include all relevant controls in the first stage and in the second stage regression, because otherwise it would appear as if Z had an independent effect on y apart from the effect through x.
00:55:45.000 --> 00:55:59.000
Okay, yeah, in our initial example of course Z just had an effect on y through D, which is the simplest case.
00:55:59.000 --> 00:56:17.000
If we have more than one endogenous regressor, then we need at least as many instruments as we have endogenous regressors, and then two stage least squares estimation would be a possible way to go forward.
00:56:17.000 --> 00:56:26.000
Now let us return to the example of wage determination, which I have already discussed at the beginning of this lecture and which you know from previous lectures.
00:56:26.000 --> 00:56:40.000
So, the setup is well known: wage depends on education and ability, and we have the same assumptions as always, the correlation between education and the unobserved variable ability is positive.
00:56:40.000 --> 00:56:50.000
So, let's say we estimate: wage is equal to a constant plus beta one times education plus V.
00:56:50.000 --> 00:57:07.000
Then we know estimating this equation by OLS would give us wrong estimates of beta one; they wouldn't be unbiased, they wouldn't be consistent, because we have the correlation between education and V, which includes ability.
00:57:08.000 --> 00:57:20.000
Since we have positive correlation between education and wage and positive correlation between education and ability. We also know from our analysis of the
00:57:20.000 --> 00:57:35.000
omitted variables problem that the OLS estimate of beta one, which does not take into account this correlation between education and V, would overstate the true importance of education.
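A hedged simulation sketch of this omitted-variable story (all numbers and variable names are my own, not the actual data discussed next): education loads positively on unobserved ability, so the bivariate OLS slope overstates the true return to education, while IV with the father's education as instrument recovers it.

```python
# Illustrative simulation only; the coefficients below are assumptions.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
father_educ = rng.normal(12, 2, size=n)          # instrument (assumed exogenous)
ability = rng.normal(size=n)                     # unobserved
educ = 2.0 + 0.3 * father_educ + 1.0 * ability + rng.normal(size=n)
log_wage = 0.5 + 0.10 * educ + 0.5 * ability + rng.normal(scale=0.5, size=n)

def ols_slope(x, y):
    # bivariate OLS slope: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

beta_ols = ols_slope(educ, log_wage)             # biased upward by ability
beta_iv = (np.cov(father_educ, log_wage)[0, 1]
           / np.cov(father_educ, educ)[0, 1])    # consistent for the true 0.10

print(beta_ols, beta_iv)   # OLS well above 0.10, IV close to 0.10
```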
00:57:37.000 --> 00:57:47.000
So, there is a real world estimation which we look at now, which is taken from Angrist and Pischke.
00:57:47.000 --> 00:57:57.000
So they had a specific set of real world data and they were estimating just for instructional purposes just to show students what happens.
00:57:57.000 --> 00:58:07.000
This type of regression by OLS. Here they took the log of wages, which is usually done to scale the variables properly.
00:58:07.000 --> 00:58:33.000
The log of wages, run as a regression on education and a constant, gives the estimate for the log of wages as being equal to a coefficient of negative 0.185 plus, and this is the coefficient of interest, 0.109 times education.
00:58:33.000 --> 00:58:54.000
They have 428 individuals here which they include in this regression, and the R squared is about 12 percent, which may seem low to you, but this is actually not so relevant for this regression. In
00:58:54.000 --> 00:59:02.000
cross section data, you may very well end up with low R squareds even though nothing is wrong with the regression.
00:59:02.000 --> 00:59:12.000
Now, we're not in the interactive part but perhaps you can just answer my question by means of the chat.
00:59:12.000 --> 00:59:27.000
Can any of you perhaps tell me what we seem to learn from the coefficient estimates and the estimated standard errors which you find down here in parentheses?
00:59:27.000 --> 00:59:35.000
What does this regression equation tell us, or what does it seem to indicate?
00:59:35.000 --> 00:59:53.000
And who would please like to give an answer to that.
00:59:53.000 --> 01:00:05.000
Who would like to answer this question please raise your hand and then type your answer, I will wait until you're done.
01:00:05.000 --> 01:00:11.000
But if nobody wants to answer the question, then there is no point in waiting.
01:00:11.000 --> 01:00:30.000
So I can answer the question what can we see or what do we seem to see from this regression here.
01:00:30.000 --> 01:00:45.000
So I mean, this would be an example of a question which you may encounter in a written exam or in an oral exam, actually, and you should be able to answer it. Right, so what we see here is a coefficient estimate for the
01:00:45.000 --> 01:00:57.000
constant of negative point 185, which has coincidentally, the same number as the standard error so it's very easy to see that the t statistic on the constant is just one.
01:00:57.000 --> 01:01:01.000
Right. And therefore the constant is apparently not significant.
01:01:01.000 --> 01:01:12.000
We cannot make much out of this result, but just to get some practice: when the coefficient isn't at least twice as big as the standard error, then it is insignificant.
01:01:12.000 --> 01:01:17.000
So the constant, we can safely say is zero.
01:01:18.000 --> 01:01:28.000
How about the influence of education on wages, where we see this is a positive influence or it is estimated as being positive.
01:01:28.000 --> 01:01:34.000
And you see that the coefficient estimate is much larger than the standard error.
01:01:34.000 --> 01:01:49.000
So we can say roughly eight times as large, right: 0.109 is approximately eight times as large as 0.014. So we have a t statistic of approximately eight, and eight is highly significant.
01:01:49.000 --> 01:02:09.000
If you come up to, I don't know, your head of research and tell him proudly you've estimated this regression equation here and you find a significant influence of education on wages, then you may expect, you know, that he praises your scientific
01:02:09.000 --> 01:02:13.000
education and how well you did your job.
01:02:13.000 --> 01:02:21.000
In fact, you didn't do your job very well as you will see. Right. So this result here is not truly reliable.
01:02:21.000 --> 01:02:30.000
Because, as we have discussed now many times education may be correlated with the unobserved variable ability.
01:02:30.000 --> 01:02:36.000
So what may be an appropriate instrument for this regressor, education?
01:02:36.000 --> 01:02:51.000
We thought about that already at the beginning of this lecture, it must be highly correlated with education, and it should have been determined before ability was determined to make sure it doesn't correlate with ability.
01:02:51.000 --> 01:03:01.000
And therefore, we may hope that it's uncorrelated with the error term, so with the unexplained part of the log of wages.
01:03:02.000 --> 01:03:16.000
As I already said, we will think of or we will use the education of the father as an instrument for education, which we hope is uncorrelated with ability.
01:03:16.000 --> 01:03:30.000
This instrument, as I already pointed out, has the advantage of being observable, of course a necessary condition for an instrument. It is typically highly correlated with the education of the children.
01:03:30.000 --> 01:03:40.000
So, this relevance criterion is probably satisfied education of the father is highly correlated with education of the children.
01:03:40.000 --> 01:03:54.000
And it certainly precedes the determination of the children's ability if we assume that the father went through his education before he fathered children.
01:03:54.000 --> 01:04:09.000
However, if ability is not purely genetically determined in some type of pure random process, but if ability is also a result of social circumstances.
01:04:09.000 --> 01:04:30.000
So if ability of a person is also formed, say in childhood education, then it may actually be that the father's education correlates with the child's ability and the father may in his parental education within the family have had a positive influence,
01:04:30.000 --> 01:04:36.000
or sometimes also negative influence on the child's ability.
01:04:36.000 --> 01:04:52.000
This obviously depends on how actively the father takes part in raising the children, and in times in which the men did not engage as much in the raising of the children as the women did.
01:04:52.000 --> 01:05:04.000
It may be that we can argue the father's education doesn't play such a big role in the ability of the children, perhaps this is more passed on by the mother.
01:05:04.000 --> 01:05:16.000
So with older data, if you, say, have data from the 1950s or 1960s, when fathers perhaps didn't take as much part in the education of the children as the mothers did.
01:05:16.000 --> 01:05:33.000
The assumption that the education of the father does not correlate with the ability of the children can probably be defended more easily than today, where fathers typically take more part in raising children.
01:05:33.000 --> 01:05:53.000
So it is not absolutely beyond criticism to use such an instrument as the education of the father, even though, most likely the education of the father was completed before the ability of the offspring was determined.
01:05:53.000 --> 01:06:13.000
It is also not clear whether the education of the father is uncorrelated with u for different reasons, because if the children work in a similar profession as the father, then it may well be that the father has had some valuable information on where to find good
01:06:13.000 --> 01:06:26.000
and well paying jobs, so that perhaps in this case the father's education has some impact on the wages the children earn.
01:06:26.000 --> 01:06:37.000
And that may actually be something which was more common in the old times than today that the children took up the same type of job as the father did.
01:06:37.000 --> 01:06:53.000
So perhaps that would be a counterargument against using this instrument, specifically if we use old data, in contrast to the earlier argument about the father's influence, by means of his education, on the ability of the children.
01:06:53.000 --> 01:07:13.000
Father's education as an instrument for children's education is perhaps more easily justified with old data in terms of the ability argument; but then there is this argument about whether the father and children have the same type of education, the same type of jobs.
01:07:13.000 --> 01:07:22.000
It is perhaps just the other way around. So it is in general difficult to find instruments which are beyond any doubt.
01:07:22.000 --> 01:07:26.000
There was a question coming in.
01:07:26.000 --> 01:07:44.000
How about taking the test scores at high school as an instrumental variable for ability? It is determined way before a person applies for work. Yes, that is true. If we have information on test scores at high school, then that is possibly a good proxy.
01:07:44.000 --> 01:07:55.000
And we discussed the case of proxy variables and actually we discussed the case of a measured IQ as proxy variable in the last week, as you may recall.
01:07:55.000 --> 01:08:09.000
In this particular example, which I discussed now, I just assume that we lack information on test scores at high school. I mean, this always depends on the data set which you have at your disposal.
01:08:09.000 --> 01:08:25.000
You may have data on education of 800 people and on their wages, but perhaps you lack the data on their IQ scores at high school. And if this is the case, that is the case which I am discussing, then we have to look for instruments.
01:08:25.000 --> 01:08:34.000
If we do have a good proxy variable for ability, then there are other ways around this, as we have already discussed.
01:08:34.000 --> 01:08:47.000
All right. So, now, in this setup now we use the father's education despite possible objections as an instrument for education.
01:08:47.000 --> 01:08:52.000
And the first stage regression would now tell us the following.
01:08:53.000 --> 01:09:01.000
Education of the offspring, children, is regressed on father's education.
01:09:01.000 --> 01:09:07.000
What we see here is we estimate a significant constant, very highly significant.
01:09:07.000 --> 01:09:12.000
Standard error is just 0.28 and the estimate is 10 point something.
01:09:12.000 --> 01:09:31.000
A highly significant estimate of the constant. But the constant is actually not what is important. We have an estimate of 0.27 for the education of the father, and the standard error is 0.03.
01:09:31.000 --> 01:09:55.000
So, we have approximately a t-statistic of nine here, which is very, very high, right. So, clearly, the education of the children is very strongly correlated with the education of the father and it is positively correlated as we have assumed.
01:09:55.000 --> 01:10:07.000
So, this is now the second stage regression or the IV regression; either the second stage of the two stage least squares approach, or the full IV approach, is what we do here.
01:10:07.000 --> 01:10:18.000
We regress the log of wages on a constant and on the education using the education of the father as an instrument.
01:10:18.000 --> 01:10:25.000
We see that the coefficient for the education is 0.06.
01:10:25.000 --> 01:10:32.000
And the standard error is 0.035. So essentially 0.04.
01:10:32.000 --> 01:10:47.000
But let us leave it at 0.035. In any case, the coefficient estimate is less than two times the standard error, which means that education is not significant anymore.
01:10:47.000 --> 01:11:07.000
So, despite the fact that in the first naive OLS regression we had a t-statistic of eight, which seemed to prove beyond any doubt that education has a positive impact on wages.
01:11:07.000 --> 01:11:20.000
And despite the fact that father's education is, with a t-statistic of about nine, strongly correlated with the education of the children.
01:11:20.000 --> 01:11:32.000
In the IV estimation, or in the two stage least squares estimation, we do not find a significant influence of education on the log of wages anymore.
01:11:32.000 --> 01:11:44.000
This is quite disappointing. This is quite disappointing because actually is very plausible that education has a positive impact on the log of wages.
01:11:44.000 --> 01:11:57.000
And actually we do have a point estimate which is still positive, but we estimate it very imprecisely, with a large standard error, in the IV or two stage least squares approach.
01:11:57.000 --> 01:12:12.000
And that is something you should be aware of, even in such a simple setting where it seems actually clear even before we start estimating that there is a positive influence of education on the log of wages.
01:12:12.000 --> 01:12:25.000
We are unable to verify this statistically, because the standard error is relatively large as compared to the estimated coefficient.
01:12:25.000 --> 01:12:44.000
And this is unfortunately often encountered in econometrics: when we use instrumental variables techniques or two stage least squares techniques, which in many cases is just the same thing, we then find that standard errors become bigger
01:12:44.000 --> 01:12:53.000
and coefficient estimates are not significant anymore in many cases.
01:12:53.000 --> 01:13:08.000
However, I mean, we are good statisticians, we have to take the facts as they are; we know the standard errors which would be computed on the basis of just the OLS formula would be too small.
01:13:08.000 --> 01:13:19.000
What we have here is an unbiased, or at least a consistent, estimate of the true standard error, and we have produced just this result for the influence of education on the log of wages.
01:13:19.000 --> 01:13:30.000
So we have to take these results at face value and say it is not beyond doubt to hypothesize that education has a positive impact on wages.
01:13:30.000 --> 01:13:42.000
Perhaps there are other things which we also need to take into account when we want to explain the development of wages; perhaps education is not as important as everybody seems to think.
01:13:42.000 --> 01:14:03.000
That may be disappointing for those who spend a lot of time and money on education, but at least for this particular sample and with the techniques that I illustrated here, it is not beyond doubt that education has an effect on wages.
01:14:03.000 --> 01:14:16.000
That is, it's not beyond doubt that education has a positive effect on wages, and it may actually be the case that it has just no effect on wages, with some probability.
01:14:16.000 --> 01:14:17.000
Okay.
01:14:17.000 --> 01:14:26.000
Further... oh yes, I should perhaps also mention: look just at the size of the coefficient. It's 0.06.
01:14:26.000 --> 01:14:31.000
In the first regression we had 0.11.
01:14:31.000 --> 01:14:47.000
So the OLS regression estimated an influence of education on wages which is almost twice the size of what we have estimated in this case here.
01:14:47.000 --> 01:15:00.000
We know that in the omitted variables problem, we have a positive bias to the coefficient estimated by means of OLS.
01:15:00.000 --> 01:15:13.000
So actually, what we find here confirms our theoretical analysis, but it also comes as a warning not to take the OLS coefficient at face value, because
01:15:13.000 --> 01:15:30.000
at least in terms of point estimates, we can say that education does not seem to have as strong an influence on the log of wages as the OLS estimate has suggested.
01:15:30.000 --> 01:15:40.000
Actually, it may just have half the influence that the OLS estimate seems to have suggested.
01:15:40.000 --> 01:15:49.000
Yes. So this is what I have already said, if I see this correctly. Yes.
01:15:49.000 --> 01:15:52.000
This we just leave out.
01:15:52.000 --> 01:15:57.000
Yeah, in general, it is difficult to find good instruments.
01:15:58.000 --> 01:16:12.000
One problem which gives rise to rather large standard errors in instrumental variables approaches is precisely the fact that we may be working with weak instruments.
01:16:12.000 --> 01:16:31.000
If an instrument is weak, in a sense to which we will try to give a more precise meaning in a minute, then the likelihood that the standard errors are large, and the coefficient estimates therefore not significant, is rather high.
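This point can be illustrated with a small Monte Carlo sketch; the setup, the instrument strengths 0.6 and 0.05, and all other numbers are illustrative assumptions of mine, not numbers from the lecture:

```python
import numpy as np

# Monte Carlo sketch: the weaker the instrument, the larger the spread of the
# IV estimate. All parameter values here are illustrative assumptions.
rng = np.random.default_rng(7)

def iv_draws(strength, n=2000, reps=500):
    """Return `reps` IV estimates of beta1 for a given first-stage strength."""
    est = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)                    # instrument (exogenous here)
        u = rng.normal(size=n)                    # error term
        d = strength * z + rng.normal(size=n)     # first stage
        y = 1.0 * d + u                           # true beta1 = 1
        est[r] = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]
    return est

strong = iv_draws(0.6)    # reasonably strong instrument
weak = iv_draws(0.05)     # weak instrument
print(strong.std(), weak.std())   # the weak-instrument spread is far larger
```

Even though both instruments are perfectly exogenous in this sketch, the spread of the IV estimates blows up when the first-stage correlation is small.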
01:16:32.000 --> 01:16:46.000
So we have to deal with the question of what quality our instrument actually has, and I already gave you the rule of thumb: make sure that the F statistic is greater than 10.
01:16:46.000 --> 01:17:02.000
Actually, even an F statistic greater than 10 is not necessarily a guarantee that the estimate is of good quality, but you can be sure that the instrument is of bad quality, that is, a weak instrument, if
01:17:02.000 --> 01:17:20.000
the F statistic is much below 10. So you should not use instruments which are weak in this sense, because the quality of the estimates can in this case be even worse than that of the biased and inconsistent OLS estimates.
01:17:20.000 --> 01:17:32.000
So, weak instruments are what we call instruments whose correlation with the regressor they are supposed to instrument is low.
01:17:32.000 --> 01:17:44.000
Now the question is what exactly counts as low. As I said, the rule of thumb is that the F statistic of the instrument should exceed 10 in the first-stage regression.
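As a rough sketch of how this rule of thumb can be checked, the first-stage F statistic for a single instrument equals the squared t statistic of the instrument in the regression of D on a constant and Z. The function name `first_stage_f` and the simulated data are my own illustrative choices:

```python
import numpy as np

def first_stage_f(d, z):
    """F statistic of the instrument z in the first-stage regression
    d = pi0 + pi1 * z + v. With a single instrument, F equals t squared."""
    n = len(d)
    X = np.column_stack([np.ones(n), z])            # constant plus instrument
    coef, *_ = np.linalg.lstsq(X, d, rcond=None)
    resid = d - X @ coef
    s2 = resid @ resid / (n - 2)                    # residual variance
    var_pi1 = s2 * np.linalg.inv(X.T @ X)[1, 1]     # variance of the slope
    return (coef[1] / np.sqrt(var_pi1)) ** 2

rng = np.random.default_rng(0)
z = rng.normal(size=1000)
d = 0.5 * z + rng.normal(size=1000)                 # fairly strong first stage
print(first_stage_f(d, z) > 10)                     # rule of thumb satisfied here
```

In practice one would use a regression package for this; the manual computation just makes explicit what the rule of thumb tests.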
01:17:44.000 --> 01:18:07.000
If we, however, have instruments which do not satisfy this and are weak in this sense, then even a small violation of the exogeneity assumption of the instruments leads to an exploding bias of the estimate, which actually disqualifies the estimates.
01:18:07.000 --> 01:18:24.000
This is rather technical, but let's try to understand what happens in our example of equation two, where we started out with this simple regression here: y is regressed on a constant and on a treatment variable only.
01:18:24.000 --> 01:18:34.000
And we have the problem that the treatment variable is endogenous in the sense that it correlates, for one reason or another, with the error term.
01:18:34.000 --> 01:18:50.000
Now we focus again on beta one, and we know what the OLS estimate of beta one is: the ratio of the estimated covariance of y and d to the estimated variance of d, in the familiar form.
01:18:50.000 --> 01:19:13.000
We can now replace the y by d beta one plus u, because the constant does not play a role when computing the covariance. So it is just d beta one plus u, or beta one d plus u, which we have to plug in here for the y variable.
01:19:13.000 --> 01:19:23.000
So we have an expression just in terms of d and u. We can multiply this out. This gives us beta one times the estimated variance of d.
01:19:23.000 --> 01:19:36.000
Right: beta one here times d, and d times d is d squared, so that is the variance of d; and here we would have the estimated covariance of u and d, all divided by the variance of d.
01:19:36.000 --> 01:19:57.000
So you see that the variance of d here cancels, and we get the true coefficient beta one plus this term here, which I have now written in a somewhat extended form.
01:19:57.000 --> 01:20:22.000
You see the estimated covariance here, the same as this one. Then I have split up the variance of d into its square root here and there, and I have multiplied numerator and denominator by the square root of the estimated variance of u, so that here I actually get the correlation
01:20:22.000 --> 01:20:31.000
between u and d, times the ratio of the two standard deviations.
01:20:31.000 --> 01:20:51.000
So we see that the OLS estimate beta one hat is equal to the true value beta one plus the estimated correlation between the error term and the regressor, the causal variable. So that is the problem:
01:20:51.000 --> 01:21:03.000
this correlation here, which biases our estimates, times the ratio of the standard deviations of u and d.
01:21:03.000 --> 01:21:19.000
If we take the sample size to infinity, then of course we get that the plim of beta one hat is equal to the true value beta one plus the true ratio of the standard deviations of u and d times the plim of the correlation of u and d.
01:21:19.000 --> 01:21:39.000
So we can do this. So basically what I am aiming at is saying: well, this here is the asymptotic bias. It shows us how much, even with large sample sizes, the OLS estimate of beta one is still biased away from the true value.
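This asymptotic bias formula can be checked numerically. The following sketch (the true beta one of 1.0 and the correlation structure are illustrative assumptions of mine) simulates an endogenous regressor and compares the OLS estimate with beta one plus the bias term from the lecture:

```python
import numpy as np

# Simulate an endogenous regressor: d and u share a common component, so OLS
# is biased. beta1 = 1.0 and the correlation structure are illustrative.
rng = np.random.default_rng(42)
n = 200_000
beta1 = 1.0
common = rng.normal(size=n)                  # source of the endogeneity
d = common + rng.normal(size=n)              # regressor, correlated with u
u = common + rng.normal(size=n)              # error term, correlated with d
y = 2.0 + beta1 * d + u                      # the constant plays no role below

# OLS slope in the familiar form: Cov(d, y) / Var(d)
beta1_hat = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# Bias term from the lecture: (sigma_u / sigma_d) * corr(u, d)
bias = (np.std(u) / np.std(d)) * np.corrcoef(u, d)[0, 1]
print(beta1_hat, beta1 + bias)               # the two agree almost exactly
```

With this design the correlation of u and d is about 0.5 and the two standard deviations are equal, so the OLS estimate settles near 1.5 rather than the true value of 1.0.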
01:21:39.000 --> 01:21:59.000
Now we may look at whether things become better under IV estimation. We know the formula for the IV estimator, which here would be the covariance of Z and Y divided by the covariance of Z and D, both covariances of course estimated; then the same procedure as we have already done.
01:21:59.000 --> 01:22:28.000
We would replace the Y here by beta one D plus u and then multiply out. So we would get beta one times the estimated covariance of Z and D (from Z and beta one D here), plus the estimated covariance of Z and U (from Z and u here), all divided by the estimated covariance of Z and D.
01:22:29.000 --> 01:22:47.000
Then again, this covariance here cancels against that covariance there, so that we arrive at beta one plus some bias, and this bias is the ratio of the estimated covariances of Z and U and of Z and D.
01:22:48.000 --> 01:23:07.000
And these we can very easily transform into a ratio of correlations this way here, by dividing through by the standard deviations of U and D in the appropriate form. So we get the ratio of the correlations here, multiplied by the estimated standard
01:23:08.000 --> 01:23:25.000
deviations, the ratio of sigma U over sigma D, so that the plim of the IV estimate of beta one is equal to beta one plus sigma U over sigma D times the plim of the ratio of the two correlations here.
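The plim formula for the IV estimator can also be checked numerically. In the sketch below (all parameter values are illustrative assumptions of mine), the instrument is deliberately made slightly invalid so that the bias term is nonzero:

```python
import numpy as np

# IV estimation with a slightly invalid instrument (corr(z, u) != 0).
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(1)
n = 500_000
beta1 = 1.0
z = rng.normal(size=n)
u = 0.05 * z + rng.normal(size=n)            # small remaining endogeneity of z
d = 0.6 * z + rng.normal(size=n)             # first stage: z is relevant for d
y = beta1 * d + u

# IV slope: Cov(z, y) / Cov(z, d)
beta1_iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

# Plim formula from the lecture:
# beta1 + (sigma_u / sigma_d) * corr(z, u) / corr(z, d)
pred = beta1 + (np.std(u) / np.std(d)) * (
    np.corrcoef(z, u)[0, 1] / np.corrcoef(z, d)[0, 1]
)
print(beta1_iv, pred)                        # the two agree almost exactly
```

Shrinking the first-stage loading of 0.6 toward zero while keeping the 0.05 fixed makes the correlation ratio, and with it the IV bias, explode; that is exactly the weak-instrument problem discussed next.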
01:23:26.000 --> 01:23:43.000
Now what I want to do here is compare which estimate is actually better and which is worse in particular situations. This is the simple OLS estimate, of which we know that it is asymptotically biased.
01:23:44.000 --> 01:23:59.000
This here is the IV estimate of which we know it is asymptotically biased if there's nonzero correlation between Z and U, which of course should not be the case.
01:24:00.000 --> 01:24:20.000
The idea of the IV estimate is to have an instrument Z which does not correlate with U. So if this term here is zero, then obviously this whole plim here is zero and the whole bias term is zero and the IV estimate is consistent.
01:24:21.000 --> 01:24:38.000
However, if the Z is some type of imperfect instrument, so if some correlation is remaining between Z and U, then we may have a problem here if the correlation between Z and D is low.
01:24:39.000 --> 01:24:49.000
Because if the correlation between Z and D is low, then this here is close to zero. So this would be the case of a weak instrument.
01:24:49.000 --> 01:25:13.000
The relevance criterion would then not be very well met. So we would have here a value which is rather low, and dividing the perhaps weak correlation which remains between Z and U, that is, dividing something which is small by something which is close to zero, lets this term explode.
01:25:14.000 --> 01:25:31.000
I mean dividing by zero is always a disaster. So even if the remaining correlation between Z and U is small, we may have a big problem if the instrument is weak and the correlation between Z and D is actually not very strong.
01:25:32.000 --> 01:25:48.000
Because then we would have a bias term here, and it may well be that the asymptotic bias of the instrumental variables estimator is even worse than the bias of the OLS estimator.
01:25:49.000 --> 01:26:01.000
It depends on what the initial correlation between U and D is as compared to the ratio of two correlations here where one correlation perhaps is close to zero.
01:26:02.000 --> 01:26:16.000
Okay, that's what I write on this slide here. We compare the two terms here and come to the conclusion that there is a big problem in the case of weak instruments.
01:26:21.000 --> 01:26:22.000
Yeah.
01:26:23.000 --> 01:26:48.000
Yeah. Now, that's again the same point I have already made, just with a numerical example here. If the correlation between Z and U is small, let's say 0.03, then we have only very little remaining endogeneity of the instrument, but a little bit is there.
01:26:48.000 --> 01:27:12.000
But we have a correlation between D and U of 0.1, so a higher correlation of the endogenous regressor D with the error term U than of the instrument Z with the error term U, so we may think: well, perhaps we had better take Z as an instrument rather than using OLS, where D would be an instrument for itself.
01:27:13.000 --> 01:27:26.000
Then let's study what happens for different values of the correlation between Z and D. If the correlation between Z and D is equal to 0.1,
01:27:26.000 --> 01:27:44.000
then the asymptotic bias of the IV estimator depends on the ratio 0.03 divided by 0.1, which is 0.3, and this is much greater than the correlation of D and U, on which the bias of the OLS estimator hinges.
01:27:45.000 --> 01:28:05.000
If the correlation between Z and D is 0.2, then the same applies: we divide 0.03 by 0.2, which gives 0.15, still greater than the correlation between D and U, and therefore the OLS estimator would still be the estimator with the smaller asymptotic bias.
01:28:06.000 --> 01:28:31.000
Only if the correlation between Z and D is substantial, say 0.6 in this example, do we get 0.03 divided by 0.6, which is 0.05, and that would be smaller than the corresponding correlation in the expression for the bias of the OLS estimator. So in this case, the instrumental variables estimator would be the preferred estimator.
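The arithmetic of this numerical example can be written out directly; the sketch below just reproduces the three cases from the lecture (the common factor sigma U over sigma D cancels in the comparison):

```python
# Numerical example from the lecture: the OLS bias is driven by
# corr(d, u) = 0.1, the IV bias by corr(z, u) / corr(z, d) with
# corr(z, u) = 0.03. Whichever term is smaller gives the smaller
# asymptotic bias, since sigma_u / sigma_d multiplies both.
corr_zu = 0.03   # small remaining endogeneity of the instrument
corr_du = 0.10   # endogeneity of the regressor (drives the OLS bias)

for corr_zd in (0.1, 0.2, 0.6):
    iv_term = corr_zu / corr_zd
    better = "IV" if iv_term < corr_du else "OLS"
    print(f"corr(z,d) = {corr_zd}: IV bias term = {iv_term:.2f} -> prefer {better}")
```

This prints "prefer OLS" for first-stage correlations of 0.1 and 0.2, and "prefer IV" only for the substantial correlation of 0.6, matching the conclusion above.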
01:28:32.000 --> 01:28:53.000
For your own research question, please think about potential instruments in the way in which we have done this in the wage and education example and discuss for yourself the relevance and the validity of the assumptions for this type of instruments.
01:28:53.000 --> 01:29:19.000
That's it for today. Sorry, I have used a little more time than actually was intended, but I wanted to conclude this here. Are there any questions now or is there any wish to go to interaction? Please raise your hand if you either want to pose a question or if you want to go into interactive mode.
01:29:23.000 --> 01:29:35.000
It seems that this is not the case. Then I will stop the recording here and say bye-bye until Thursday afternoon.