WEBVTT - autoGenerated
00:00:00.000 --> 00:00:08.000
Good. So before I move to the actual lecture, let me go to the question which came in over
00:00:08.000 --> 00:00:15.000
that. The first question is, are there any limitations in interpreting causality from
00:00:15.000 --> 00:00:22.000
a model that incorporates lagged variables? Now, I understand your question, but obviously
00:00:22.000 --> 00:00:29.000
the question is not quite as precise as it must be to give a precise answer. When you
00:00:29.000 --> 00:00:36.000
say the model incorporates lagged variables, you probably mean that among the regressors,
00:00:36.000 --> 00:00:43.000
there are variables which do not have the same time index as the dependent variable has,
00:00:43.000 --> 00:00:48.000
and that is in general possible. There's no problem with it. Actually, it may be that
00:00:49.000 --> 00:01:00.000
lagged variables do indeed give you a better explanation of the dependent variable of
00:01:00.000 --> 00:01:09.000
this model. So for instance, if you look at the income effects of schooling, then you would
00:01:09.000 --> 00:01:18.000
typically have as the dependent variable the income, that is, the salary which people earn,
00:01:18.000 --> 00:01:28.000
and it may be very reasonable to explain this salary partially by a variable which indicates
00:01:29.000 --> 00:01:36.000
schooling, perhaps in different levels also, and the schooling obviously occurred prior to
00:01:36.000 --> 00:01:41.000
the salary being earned. So these are, in that sense, lagged variables.
00:01:43.000 --> 00:01:49.000
Perhaps, however, you have something different in mind and have in mind that you have a
00:01:51.000 --> 00:01:59.000
dependent variable and you include the lagged dependent variable among the covariates of the
00:01:59.000 --> 00:02:07.000
regression. That can in some settings also make sense. But in both cases,
00:02:07.000 --> 00:02:14.000
whether the regressor is some other variable lagged by a number of time periods,
00:02:14.000 --> 00:02:18.000
or whether the regressor is the lagged dependent variable, what you always have to make sure is
00:02:18.000 --> 00:02:26.000
that there is no correlation between the lagged endogenous variable and the error term of the
00:02:26.000 --> 00:02:33.000
equation. So that's the normal property you need to have satisfied in order to make sure
00:02:33.000 --> 00:02:39.000
that the estimate is consistent. I hope this answers question one. Question two,
00:02:39.000 --> 00:02:44.000
does including a lagged variable create a linear trend? Answer, no, it does not.
00:02:46.000 --> 00:02:54.000
Unless you really have a time series model in mind where you would regress a dependent variable
00:02:54.000 --> 00:03:02.000
on its own path, so on its own lagged values, and it happens to be the case that the coefficient
00:03:02.000 --> 00:03:10.000
on the lagged variable is one, exactly one, and you have a constant in the regression. Then you
00:03:10.000 --> 00:03:15.000
have a simple time series model, actually, and this would create something like a linear trend,
00:03:15.000 --> 00:03:23.000
namely a random walk with drift, but this is not usually the case in causality analysis.
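To see the drift case concretely, here is a small simulation sketch (the drift constant, sample length, and seed are made-up illustration values): a series regressed on its own lagged value with a coefficient of exactly one and a constant c accumulates that constant, so the path drifts along a linear trend with slope c.

```python
import numpy as np

# Hypothetical illustration: y_t = c + y_{t-1} + e_t (unit root with constant).
# The constant c accumulates period by period, so the series drifts
# along a line with slope c, on top of a stochastic trend.
rng = np.random.default_rng(0)
c, T = 0.5, 1000
e = rng.normal(0, 1, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = c + y[t - 1] + e[t]

# Fit a linear trend to the path; the slope should be close to the drift c.
slope = np.polyfit(np.arange(T), y, 1)[0]
print(round(slope, 2))
```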
00:03:26.000 --> 00:03:30.000
You gave an example of GDP; I think this is what you had in mind with this question.
00:03:30.000 --> 00:03:36.000
Third question, is the use of lagged dependent variables a method to rid the model
00:03:36.000 --> 00:03:42.000
of autocorrelation? Yes, in a time series context that is the case, and you're obviously currently
00:03:42.000 --> 00:03:50.000
posing your questions from a time series context or with a time series model in mind. And yes,
00:03:50.000 --> 00:03:54.000
it's true. If you have the lagged dependent variables among the regressors, then you typically
00:03:54.000 --> 00:03:59.000
capture parts or all of the autocorrelation. But that is not really the setting we are dealing
00:03:59.000 --> 00:04:04.000
with because we usually have cross-section data here and not time series data.
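The point about question three can be illustrated with a quick sketch (a made-up AR(1) series, not real data): regressing on a constant only leaves strongly autocorrelated residuals, while adding the lagged dependent variable as a regressor captures that autocorrelation.

```python
import numpy as np

# Hypothetical illustration: an autocorrelated series, y_t = 0.8*y_{t-1} + e_t.
rng = np.random.default_rng(4)
T = 5000
e = rng.normal(0, 1, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + e[t]

def acf1(r):
    # First-order autocorrelation of a residual series.
    r = r - r.mean()
    return (r[1:] * r[:-1]).sum() / (r * r).sum()

# Residuals from a regression on a constant only: strongly autocorrelated.
resid_const = y - y.mean()

# Residuals after including the lagged dependent variable as a regressor:
# the lag captures the autocorrelation and the residuals are close to white.
X = np.column_stack([np.ones(T - 1), y[:-1]])
beta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
resid_lag = y[1:] - X @ beta
print(round(acf1(resid_const), 1), round(acf1(resid_lag), 1))
```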
00:04:04.000 --> 00:04:13.000
So this was my answer to the three questions. I didn't see any other indication that you would
00:04:13.000 --> 00:04:19.000
like to go into interactive mode, and I didn't see any other questions. So let's come back to where
00:04:19.000 --> 00:04:27.000
we stopped on Tuesday, which was regression analysis
00:04:27.000 --> 00:04:34.000
of experiments. And what I already introduced to you was what you see again on this slide here.
00:04:35.000 --> 00:04:42.000
We have the two potential outcomes, y1i and y0i. So the outcomes in the case of treatment or in the
00:04:42.000 --> 00:04:50.000
case of non-treatment for individual i. And we can say that the difference between the two is the
00:04:50.000 --> 00:04:57.000
causal effect of treatment. That's almost by definition. So we would say the causal effect
00:04:57.000 --> 00:05:03.000
of treatment is delta i. That would be the causal effect of treatment on individual i or observation
00:05:03.000 --> 00:05:09.000
i. Here we make the courageous assumption that the effect of treatment is the same for all
00:05:09.000 --> 00:05:16.000
individuals. So we drop the i here and say for all people, the difference between the two potential
00:05:16.000 --> 00:05:24.000
outcomes is the same, namely delta. And if this is the case, which is, of course, not evident that
00:05:24.000 --> 00:05:33.000
it is the case, then we could rewrite the equation which describes the relationship between the
00:05:33.000 --> 00:05:43.000
potential outcomes and the actual observations, yi in form one, where the observations yi would
00:05:43.000 --> 00:05:51.000
be equal to a constant alpha, plus the treatment effect delta, which is constant across individuals,
00:05:51.000 --> 00:05:57.000
times the indicator variable for treatment. So di is one in the case of treatment or zero in the
00:05:57.000 --> 00:06:03.000
case of non-treatment, plus an error term. And the error term would be the difference between
00:06:04.000 --> 00:06:13.000
the potential outcome y0i and the unconditional expectation of y0i. Because here I
00:06:13.000 --> 00:06:22.000
write alpha as the unconditional expectation. So basically, as the mean effect or the mean of all
00:06:22.000 --> 00:06:27.000
the potential outcomes in the case of non-treatment, that would be alpha. And the difference between
00:06:27.000 --> 00:06:33.000
actual potential outcome in the case of non-treatment and the mean in the case of non-treatment would
00:06:33.000 --> 00:06:40.000
be more or less the individual effect. That is then the error term ui, which
00:06:40.000 --> 00:06:46.000
we can estimate as a residual in our regression equation, but which we cannot observe directly.
00:06:46.000 --> 00:06:56.000
So ui is the random individual effect. And clearly now this equation here is exactly the same equation
00:06:56.000 --> 00:07:02.000
as this one here, even though it is written in a completely different form. And it should be
00:07:02.000 --> 00:07:10.000
obvious to you that equation one seems to lend itself to estimation by the least squares estimator,
00:07:10.000 --> 00:07:17.000
because we just would have to regress the observations yi on a constant and on a dummy
00:07:17.000 --> 00:07:22.000
variable di, which indicates treatment. And that's it. Then we estimate the delta as a
00:07:22.000 --> 00:07:27.000
constant parameter for all the individuals. We estimate some constant as the expectation of
00:07:28.000 --> 00:07:32.000
a potential outcome in the case of non-treatment. And then we have some errors. All right.
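As a sketch of how equation one lends itself to least squares, consider the following simulation (alpha, delta, and the sample size are hypothetical illustration values, not from the lecture data): with a randomized treatment dummy, regressing yi on a constant and di recovers the constant treatment effect delta, and the slope equals the difference in group means.

```python
import numpy as np

# Sketch of equation (1): y_i = alpha + delta * d_i + u_i with randomized treatment.
# alpha, delta, and n are made-up illustration values.
rng = np.random.default_rng(1)
n, alpha, delta = 10_000, 2.0, 1.5
d = rng.integers(0, 2, n)           # randomized treatment indicator
u = rng.normal(0, 1, n)             # independent of d by construction
y = alpha + delta * d + u

# OLS of y on a constant and d; the slope is the estimated treatment effect.
X = np.column_stack([np.ones(n), d])
alpha_hat, delta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(delta_hat, 1))          # close to the true delta
```

For a single dummy regressor, the OLS slope is numerically identical to the difference between the treated and untreated group means, which is why this regression estimates the treatment effect.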
00:07:37.000 --> 00:07:45.000
Now let us evaluate the conditional expectation of this equation for the two possible cases
00:07:45.000 --> 00:07:51.000
that either the individual has been subjected to treatment or that he has not been subjected to
00:07:51.000 --> 00:07:59.000
treatment. So I take the conditional expectation of equation one, conditional on individual i
00:07:59.000 --> 00:08:07.000
having received treatment. Then this expectation is of course
00:08:07.000 --> 00:08:14.000
alpha plus delta, because di is now equal to one, and this is given, right? So the regressor was
00:08:14.000 --> 00:08:22.000
actually di times the coefficient delta, and di is equal to one. So alpha plus delta plus the expectation of ui
00:08:23.000 --> 00:08:30.000
for those people which have received treatment. And in the case of non-treatment, we have the
00:08:30.000 --> 00:08:36.000
same thing. Essentially the expectation is the constant alpha plus zero, because di is equal to
00:08:36.000 --> 00:08:44.000
zero. So delta times zero is zero. So alpha plus the expectation of ui for those people who have
00:08:44.000 --> 00:08:52.000
not received treatment. So that for these two equations here, we can look at the difference
00:08:52.000 --> 00:08:58.000
between treatment and non-treatment. Just subtract the second equation from the first equation.
00:08:59.000 --> 00:09:05.000
What happens then is obviously that the alpha is eliminated. It cancels, right? When we subtract
00:09:05.000 --> 00:09:10.000
the second equation from
00:09:10.000 --> 00:09:18.000
the first equation, we are left with the expectation of yi conditional on having received
00:09:18.000 --> 00:09:24.000
treatment minus the expectation of yi conditional on not having received treatment, which is just the
00:09:24.000 --> 00:09:32.000
treatment effect plus the difference between the two expectations of the individual error terms
00:09:33.000 --> 00:09:40.000
conditional on either having received treatment or having not received treatment. So this difference
00:09:40.000 --> 00:09:50.000
here is the selection bias, because this tells me what kind of error do we expect or what kind of
00:09:50.000 --> 00:09:58.000
difference between potential outcome and the mean potential outcome both in the case of non-treatment
00:09:58.000 --> 00:10:06.000
do we expect for the group of treated and what kind of error do we expect for the case of
00:10:06.000 --> 00:10:13.000
non-treated individuals. So
00:10:14.000 --> 00:10:21.000
if we pick for the treatment group people which are likely to have, say, a greater deviation
00:10:22.000 --> 00:10:29.000
of the individual potential outcome from the mean potential outcome than in the group of people
00:10:29.000 --> 00:10:37.000
which do not receive treatment, then we have a selection bias. However, if we randomize,
00:10:37.000 --> 00:10:42.000
then obviously the expectation in the case of treatment and the expectation in the case of
00:10:42.000 --> 00:10:47.000
non-treatment should be the same and the selection bias vanishes. This was basically the lesson
00:10:47.000 --> 00:10:56.000
of the last lecture. So the selection bias actually amounts then to the correlation between
00:10:56.000 --> 00:11:05.000
the regression error term ui and the regressor di, because the question is whether the ui's are
00:11:05.000 --> 00:11:12.000
different for those people who have di is equal to 1 from those people which have di is equal to 0.
00:11:13.000 --> 00:11:20.000
When there is no difference between them, then apparently there is no correlation between di
00:11:20.000 --> 00:11:26.000
and ui. But if there is selection bias, then apparently there must be some correlation between
00:11:27.000 --> 00:11:34.000
the indicator of the treatment and the size of the error we have. Therefore, we see that selection
00:11:34.000 --> 00:11:40.000
bias is basically the same thing as correlation between regression error term ui and regressor di.
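The point that selection bias is the same thing as correlation between ui and di can be illustrated with a small simulation (all numbers are hypothetical): if treatment is taken up precisely by those with a below-average non-treatment outcome, as in the hospital example, the OLS estimate of delta is badly biased.

```python
import numpy as np

# Hypothetical illustration of selection bias: treatment is taken up by
# individuals with a low non-treatment outcome y0 (e.g. sicker people go
# to the hospital), so d_i is correlated with the error u_i = y0_i - alpha.
rng = np.random.default_rng(2)
n, alpha, delta = 10_000, 2.0, 1.5
y0 = alpha + rng.normal(0, 1, n)    # potential outcome without treatment
d = (y0 < alpha).astype(float)      # selection: treat those with below-average y0
y = y0 + delta * d                  # constant treatment effect delta

# OLS of y on a constant and d: the slope no longer estimates delta,
# because E(u | d=1) - E(u | d=0) is negative here.
X = np.column_stack([np.ones(n), d])
delta_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(round(delta_hat, 1))          # far below the true delta of 1.5
```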
00:11:41.000 --> 00:11:47.000
Now you know that a correlation between the regressor of a regression and the error term
00:11:47.000 --> 00:11:54.000
of the regression violates the assumptions which we have used in the review of basic econometrics
00:11:55.000 --> 00:12:03.000
in the form of A6.2. By the way, also A6.1, but A6.2 was more specifically formulated for this
00:12:03.000 --> 00:12:10.000
case here. We would have correlation between the regressor and the error term,
00:12:10.000 --> 00:12:13.000
and therefore the estimate would not be consistent.
00:12:20.000 --> 00:12:27.000
In the case of A6.1 actually, where we would assume that E(u|x) = 0,
00:12:28.000 --> 00:12:34.000
we would have the direct implication actually that both of these terms here are 0.
00:12:35.000 --> 00:12:43.000
So the selection bias would be 0, because E(u|x) translates in this setting to E(u|d)
00:12:43.000 --> 00:12:52.000
or E(u|di). Regardless of whether di is equal to 1 or equal to 0, E(u|d) = 0 would mean
00:12:52.000 --> 00:12:59.000
that this thing is 0 and that thing is 0, so 0 minus 0 is 0, and therefore the selection bias
00:12:59.000 --> 00:13:07.000
is 0. So A6.1 immediately implies this, but consistency we would also get with A6.2,
00:13:07.000 --> 00:13:11.000
so with the assumption that there's no correlation. Even if the conditional independence assumption
00:13:11.000 --> 00:13:18.000
were violated, it suffices to have no correlation between regressor and error term.
00:13:18.000 --> 00:13:29.000
Now, we know that ui is equal to the potential outcome of individual i minus the expectation
00:13:29.000 --> 00:13:36.000
of this potential outcome over all people in our population, which we have denoted by alpha,
00:13:36.000 --> 00:13:43.000
which is a constant, right? So for this reason, we know that the expectation of ui given di is
00:13:43.000 --> 00:13:50.000
equal to 1 minus the expectation of ui given that di is equal to 0 is the same thing as the
00:13:50.000 --> 00:13:57.000
expectation of y0i, so the potential outcome in the case of non-treatment, given di is equal to 1
00:13:57.000 --> 00:14:05.000
minus the expectation of y0i given the non-treatment, so di is equal to 0, right? The
00:14:05.000 --> 00:14:13.000
constant here would just drop out when we replace in these cases here, and in these expectations
00:14:13.000 --> 00:14:20.000
here, the ui by y0i minus alpha. So we would have the minus alpha here and the minus alpha here,
00:14:20.000 --> 00:14:26.000
so that would cancel, and therefore we can associate the ui's here with a potential outcome
00:14:26.000 --> 00:14:35.000
y0i's. Both of them, of course, are not observable. From this, we can conclude that the correlation
00:14:35.000 --> 00:14:42.000
between the ui's and the di's reflects a difference in non-treatment potential outcomes
00:14:42.000 --> 00:14:49.000
between those who get treated and those who don't get treated, right? These would be the
00:14:49.000 --> 00:14:53.000
potential outcome for those people who get treated, and these would be the potential outcome
00:14:53.000 --> 00:15:03.000
for those who don't get treatment. So if we have a selection bias, then we have correlation between
00:15:03.000 --> 00:15:10.000
ui and di, and this, as I just said, means that there is a difference in the non-treatment
00:15:10.000 --> 00:15:17.000
potential outcomes between the treated and the non-treated. In our hospital example,
00:15:17.000 --> 00:15:24.000
which you may recall, this means that those who were treated in the hospital had poorer health
00:15:25.000 --> 00:15:30.000
in the non-treatment state than those people who did not have hospital treatment. I mean,
00:15:30.000 --> 00:15:36.000
because their health was better, they didn't go to hospital, right? Or to give you a second example,
00:15:36.000 --> 00:15:41.000
students in smaller classes tend to have intrinsically lower test scores in a simple
00:15:41.000 --> 00:15:50.000
comparison without randomization for the simple reason that weak students may be placed with a
00:15:50.000 --> 00:15:54.000
higher likelihood in small classes by their teachers, because teachers think these students
00:15:54.000 --> 00:15:59.000
need more help, they need more attention, they need better care, so they should go into the smaller
00:15:59.000 --> 00:16:05.000
classes, and for this reason, students in these smaller classes may have actually lower test scores
00:16:05.000 --> 00:16:10.000
if we have a simple comparison, which then is biased by the selection bias.
00:16:12.000 --> 00:16:19.000
Now in the STAR experiment, the treatment, the di, was randomly assigned, and therefore the selection
00:16:19.000 --> 00:16:26.000
term disappears. Actually, it is not quite true that the di was completely randomly assigned in
00:16:26.000 --> 00:16:30.000
the STAR experiment. I will explain that a little later, but for the time being, just accept it that
00:16:30.000 --> 00:16:38.000
way, as I have said it: the STAR experiment was an experiment with randomization. Since it is the
00:16:38.000 --> 00:16:46.000
case that the STAR experiment had randomization, we know that the estimate of the causal
00:16:46.000 --> 00:16:54.000
effect is unbiased. So when we now regress yi in the STAR experiment, so the cognitive ability
00:16:54.000 --> 00:17:04.000
of the students on the regressor, small classes or bigger classes, then we would estimate the
00:17:04.000 --> 00:17:10.000
causal effect of interest, the delta, without selection bias or indeed without any bias at all.
00:17:12.000 --> 00:17:21.000
I'll show you the results of the random class size assignment on the test scores.
00:17:22.000 --> 00:17:29.000
So what you see in this table here are actually four different attempts to estimate the causal
00:17:29.000 --> 00:17:36.000
effect of smaller classes with the data of the STAR experiment. So with data which were
00:17:39.000 --> 00:17:47.000
organized or which were measured in an experiment with randomization. And there are four attempts
00:17:47.000 --> 00:17:52.000
here to do this. The first attempt is actually the attempt which I showed you on the previous
00:17:54.000 --> 00:18:04.000
slide, basically this thing here, regressing yi on a constant and then on treatment and then nothing
00:18:04.000 --> 00:18:12.000
else. So on the dummy variable treatment, which means small class or larger class, this is the
00:18:12.000 --> 00:18:18.000
regressor small class. The constant is not reported here because it's not important,
00:18:19.000 --> 00:18:25.000
but it was in the regression. Here is the small class regressor, which was the dummy variable for
00:18:25.000 --> 00:18:32.000
whether a child in kindergarten attended a small-sized class or a regular-sized class.
00:18:33.000 --> 00:18:40.000
And you see that the result, the coefficient we estimate here for the delta, so this is our
00:18:40.000 --> 00:18:50.000
estimate for the delta is 4.82, which means that we have close to 5% better test scores on cognitive
00:18:50.000 --> 00:18:58.000
ability for those students which went to the small classes than for those which went to the regular
00:18:58.000 --> 00:19:06.000
classes. This estimate of 4.82 is significant as you see because here is the standard error
00:19:06.000 --> 00:19:16.000
and the coefficient is more than twice the standard error. So we know that the t statistic is
00:19:16.000 --> 00:19:22.000
greater than two and therefore assuming that the distribution of the coefficient is normal,
00:19:23.000 --> 00:19:30.000
we would conclude that this is a significant effect. Note however that the r squared of this
00:19:30.000 --> 00:19:36.000
regression is very very low, so it's 0.01. So just one percent of the variation in test scores
00:19:36.000 --> 00:19:44.000
is explained in this regression by the small class size. For this reason, or potentially also for
00:19:44.000 --> 00:19:49.000
other reasons, researchers have also tried other specifications. The next specification you find
00:19:49.000 --> 00:19:59.000
in column two of this table. There you again have
00:19:59.000 --> 00:20:05.000
the regressor small class size, so the same dummy variable as in regression one,
00:20:06.000 --> 00:20:14.000
but now they also include school fixed effects. This means that for every student
00:20:15.000 --> 00:20:24.000
it is indicated which school he or she attended by including additional dummy variables,
00:20:24.000 --> 00:20:28.000
as many dummy variables as there were different schools or different kindergartens,
00:20:29.000 --> 00:20:37.000
and each dummy variable has a one in
00:20:38.000 --> 00:20:46.000
the entry of observation i if child i attended this particular school and for all the other
00:20:46.000 --> 00:20:52.000
children which did not attend this school there was a zero. So this is called a school fixed effect
00:20:52.000 --> 00:20:57.000
and there are many of them if there are many schools so there are as many regressors in here
00:20:57.000 --> 00:21:04.000
as there are schools in the STAR experiment. We indicate in this table only whether there were
00:21:04.000 --> 00:21:13.000
school fixed effects or whether there weren't school fixed effects. There would be many many
00:21:13.000 --> 00:21:17.000
parameters to report if we wanted to report all the parameters and they would not tell us anything
00:21:17.000 --> 00:21:24.000
because we don't know the schools. You see that including the school fixed effects raises the r
00:21:24.000 --> 00:21:33.000
squared substantially from 1% to 25% so that's a great achievement in terms of r squared and
00:21:33.000 --> 00:21:39.000
actually the regression coefficient for the causal effect also changes, not by a great amount,
00:21:33.000 --> 00:21:39.000
but it increases from about 4.8 to about 5.4. Even more remarkably,
00:21:39.000 --> 00:21:49.000
the standard error for the small class regressor decreases by a lot, by almost half of its size.
00:22:00.000 --> 00:22:08.000
So this coefficient here is of much higher significance than this coefficient here.
00:22:08.000 --> 00:22:15.000
It still tells us that test scores are about 5% better, so in terms of magnitude,
00:22:08.000 --> 00:22:15.000
okay, this is a little less than 5% and this is a little more than 5%, but we may still say it's
00:22:15.000 --> 00:22:19.000
about 5%. So the conclusion is actually not that different from regression one, but the precision
00:22:19.000 --> 00:22:27.000
of the estimate is much higher, with the standard error now 1.26, and also we explain much more variance, so we can
00:22:27.000 --> 00:22:35.000
feel more comfortable with the result. Now the question is why are school fixed effects
00:22:44.000 --> 00:22:52.000
included? Well, school fixed effects provide control for certain characteristics of schools
00:22:52.000 --> 00:23:00.000
which may differ across schools so there may be schools which have certain facilities which
00:23:00.000 --> 00:23:06.000
other schools do not have, or some schools may have better teachers or higher salaries than
00:23:06.000 --> 00:23:11.000
other schools; because of the higher salaries they attract better teachers, or something
00:23:11.000 --> 00:23:18.000
like this. So the schools may not be completely equal.
00:23:19.000 --> 00:23:26.000
Now this actually runs counter to randomization. Usually we would think that if we have a
00:23:26.000 --> 00:23:33.000
completely randomized experiment then we would think that also the schools are completely
00:23:34.000 --> 00:23:43.000
randomly chosen and therefore there should be no systematic influence of the schools
00:23:43.000 --> 00:23:50.000
or their characteristics on the causal effect which we estimate here, because we have complete
00:23:50.000 --> 00:23:58.000
randomization. But that was not the case in the STAR experiment. The number of schools was not
00:23:58.000 --> 00:24:05.000
so big that we could assume that we have sort of all possible characteristics of schools
00:24:05.000 --> 00:24:13.000
more or less randomly for both the treated students and the non-treated students. Rather
00:24:13.000 --> 00:24:20.000
what the STAR experiment did was to pick a number of schools, somehow, I don't really know how
00:24:20.000 --> 00:24:26.000
they did that, and then within the schools they randomly assigned
00:24:26.000 --> 00:24:32.000
the students to either small classes or regularly sized classes. So in that sense the STAR experiment
00:24:32.000 --> 00:24:37.000
was randomized: the assignment of students to small or regular classes was completely
00:24:37.000 --> 00:24:44.000
random but picking the schools was not completely random and therefore there may be systematic
00:24:45.000 --> 00:24:52.000
influences there of characteristics of the schools on the causal effect.
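A minimal sketch of the fixed-effects specification, with made-up numbers rather than the actual STAR data: one dummy per school absorbs school-level differences, and since treatment is randomized within schools, the small-class coefficient is still estimated consistently.

```python
import numpy as np

# Hypothetical illustration: school-level heterogeneity plus a constant
# treatment effect delta. None of these numbers are from the STAR data.
rng = np.random.default_rng(3)
n_schools, per_school, delta = 20, 100, 5.0
school = np.repeat(np.arange(n_schools), per_school)
school_effect = rng.normal(0, 10, n_schools)          # school-level differences
d = rng.integers(0, 2, n_schools * per_school)        # randomized within schools
y = school_effect[school] + delta * d + rng.normal(0, 8, n_schools * per_school)

# Regress y on the treatment dummy plus one dummy per school (no overall
# constant, so the school dummies act as school-specific intercepts).
S = (school[:, None] == np.arange(n_schools)).astype(float)
X = np.column_stack([d, S])
delta_hat = np.linalg.lstsq(X, y, rcond=None)[0][0]
print(round(delta_hat, 1))
```

Because the school dummies soak up the school-level variance, the standard error on the treatment coefficient shrinks relative to the regression without them, which mirrors what happens between columns one and two of the table.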
00:24:52.000 --> 00:24:57.000
Apparently this is not too much of a problem because the causal effect doesn't change
00:24:57.000 --> 00:25:04.000
dramatically. As I say it is still around five percent it has increased in size but it is still
00:25:04.000 --> 00:25:12.000
around five percent so this indicates that there are some non-random influences in the
00:25:12.000 --> 00:25:21.000
characteristics of the schools which we now can identify and project on the regressors which
00:25:21.000 --> 00:25:29.000
measure the school fixed effect and thereby we arrive at an improved estimate which we can see
00:25:29.000 --> 00:25:37.000
from the reduced standard error. The researchers then went on and tried to condition on other
00:25:38.000 --> 00:25:46.000
variables. For instance, in regression three they also included covariate dummy variables which
00:25:47.000 --> 00:25:53.000
gave the information on whether the child was white or Asian on the one hand
00:25:53.000 --> 00:25:59.000
or non-white and non-Asian on the other. So that was a dummy variable which was one in the case of
00:25:59.000 --> 00:26:09.000
white or Asian and zero otherwise, and you see that here also this explains
00:26:09.000 --> 00:26:16.000
parts of the variance, this ethnicity or race or however you want to call it. The coefficient which
00:26:09.000 --> 00:26:16.000
is estimated here is highly significant: we estimate more than eight, the standard error is just 1.3, so
00:26:16.000 --> 00:26:24.000
the t statistic is about six, I would say. So it's highly significant. It indicates that students
00:26:31.000 --> 00:26:38.000
which are either white or Asian have better test scores than students which are not white or Asian
00:26:38.000 --> 00:26:43.000
for whatever reason; there is no causality claim about this here, so we don't
00:26:43.000 --> 00:26:51.000
specify why this is the case, but we just take note of the fact that white or Asian students
00:26:51.000 --> 00:26:57.000
perform better in the kindergartens than those students which come from families which are not
00:26:57.000 --> 00:27:06.000
white or Asian. Moreover there was a regressor which indicated the sex of the student so girl
00:27:06.000 --> 00:27:15.000
was one and boy was zero and you see that girls fared substantially better in the test score
00:27:15.000 --> 00:27:21.000
than the boys did. This regression coefficient is also highly significant, as you
00:27:21.000 --> 00:27:28.000
see here: the standard error is 0.6, the estimate is 4.5, so that's a t statistic of
00:27:28.000 --> 00:27:33.000
about seven or eight, which makes this coefficient rock solid. Apparently girls are better,
00:27:33.000 --> 00:27:41.000
for whatever reason, in the test scores than the boys are. And then there is the regressor
00:27:41.000 --> 00:27:45.000
which I also already explained: free lunch, basically measuring the income of the parents,
00:27:45.000 --> 00:27:53.000
so poor parents would receive free lunch for their children. And apparently children from
00:27:54.000 --> 00:28:00.000
families with low income, children which were granted free lunch, had substantially
00:28:00.000 --> 00:28:08.000
worse test scores than other children in kindergarten. And this coefficient is even
00:28:08.000 --> 00:28:12.000
more significant than all the other covariate coefficients which I have discussed
00:28:12.000 --> 00:28:19.000
here. Now the interesting thing about this is actually that the estimated
00:28:19.000 --> 00:28:26.000
causal effect hardly changes, and that is not only interesting, it is also what was to be expected
00:28:26.000 --> 00:28:32.000
due to the fact that we had randomization. So actually it should not matter whether somebody
00:28:32.000 --> 00:28:39.000
was white or Asian or of some other skin color, or whether he was a boy or a girl,
00:28:39.000 --> 00:28:46.000
or whether he or she received free lunch, because randomization
00:28:47.000 --> 00:28:56.000
should have ensured that the percentage of people having a certain characteristic was the same
00:28:57.000 --> 00:29:04.000
in the treatment group as it was in the non-treatment group. So we should not see
00:29:04.000 --> 00:29:12.000
that controlling for these characteristics here would change the causal effect, and in fact it
00:29:12.000 --> 00:29:19.000
doesn't. So this basically tells us randomization has worked as far as these categories here are
00:29:19.000 --> 00:29:27.000
concerned. The regressors here are basically uncorrelated with the regressor small class, which
00:29:19.000 --> 00:29:27.000
is why the estimated coefficient for the small class regressor does not change: because there
00:29:27.000 --> 00:29:33.000
is no correlation here. And the fact that they are not correlated is of course the result of
00:29:33.000 --> 00:29:40.000
randomization. We also see that the r squared increases again, due to the fact that we control
00:29:40.000 --> 00:29:48.000
for these influences here. But as I say, it is not that the estimated coefficient changes greatly,
00:29:48.000 --> 00:29:55.000
almost not at all. We see, however, that by controlling for these influences here the standard
00:29:55.000 --> 00:30:02.000
error becomes a little smaller again, so this coefficient estimate is even a little more
00:30:03.000 --> 00:30:08.000
precise than this coefficient estimate here, and the r squared is a little bit improved.
00:30:13.000 --> 00:30:20.000
Well, and then there were further controls which were added, and you can go through this yourself.
00:30:20.000 --> 00:30:26.000
White teacher apparently does not play a big role; it is insignificant here, coefficient 0.6,
00:30:26.000 --> 00:30:33.000
standard error two, completely insignificant. Teacher experience matters; it's not overly
00:30:33.000 --> 00:30:40.000
strong, but it is significant with a t statistic of 2.6. Whether the teacher has a master's
00:30:40.000 --> 00:30:46.000
degree does not matter, surprisingly perhaps. I don't know what kind of degree teachers must have
00:30:46.000 --> 00:30:50.000
in the United States, but apparently a master's degree is not necessary, so perhaps a bachelor's
00:30:50.000 --> 00:30:57.000
degree would suffice, and then apparently the master's education is not too valuable in terms
00:30:57.000 --> 00:31:05.000
of improving kindergarten children's test scores, which perhaps again is not so surprising, because
00:31:05.000 --> 00:31:10.000
I think with a bachelor's degree you would also know enough to educate kindergarten children.
00:31:11.000 --> 00:31:16.000
whatever school fixed effects are included in all three regressions two three and four the r
00:31:16.000 --> 00:31:22.000
square doesn't change by these weaker covariates here of which actually one adds a little bit of
00:31:23.000 --> 00:31:29.000
explanatory potential we see the standard error of the causal effect decreases again a tiny little
00:31:29.000 --> 00:31:38.000
bit and the coefficient estimate is again basically unchanged 5.37 so in regressions two
00:31:38.000 --> 00:31:44.000
three and four regardless of which controls we add we get the same causal effect however
00:31:44.000 --> 00:31:51.000
estimated with higher precision if we also control for certain regressors so if we include
00:31:51.000 --> 00:31:58.000
regressors which in this setting are typically called covariates and that we have done and this
00:31:58.000 --> 00:32:03.000
has helped us in reducing the standard error that's the only purpose of this otherwise there
00:32:03.000 --> 00:32:09.000
should not be any correlation between these regressors here and the small class regressor
00:32:09.000 --> 00:32:17.000
therefore the coefficient estimate should not be changed and as we see there is
00:32:17.000 --> 00:32:26.000
no change so randomization apparently worked very well here okay and most of what is written here
00:32:26.000 --> 00:32:33.000
i have probably already said let me just say the estimated treatment control differences
00:32:33.000 --> 00:32:41.000
for the kindergarten children show now this is the result of the previous page a small class
00:32:41.000 --> 00:32:48.000
effect of between five and six percentage points and this effect is significantly different from
00:32:48.000 --> 00:32:54.000
zero this i have said and in columns two three and four there are additional explanatory
00:32:54.000 --> 00:32:59.000
variables covariates then i've also said adding such variables does not change the
00:32:59.000 --> 00:33:05.000
coefficient estimates greatly in two three and four the fixed effects changed a little bit
00:33:05.000 --> 00:33:11.000
or more than a little bit but also not dramatically but in two three and four the coefficient
00:33:11.000 --> 00:33:16.000
estimate was basically unchanged and standard errors decrease as more covariates are included as i
00:33:16.000 --> 00:33:28.000
have said now you may ask if we have a randomized controlled trial rct why do we actually include
00:33:28.000 --> 00:33:35.000
the covariates because randomization should actually make this unnecessary randomization
00:33:35.000 --> 00:33:42.000
should ensure that all the characteristics are distributed in the treatment group exactly the same
00:33:42.000 --> 00:33:48.000
as in the non-treatment group and therefore since they are distributed exactly the same
00:33:48.000 --> 00:33:59.000
they should not have any effect on the measured outcome so the question is then
00:33:59.000 --> 00:34:07.000
why it is useful to include the covariates in the regression now let us look at this problem in
00:34:07.000 --> 00:34:20.000
a more general notation call those covariates xi so xi is now a vector of covariates for
00:34:20.000 --> 00:34:26.000
individual i so this vector includes information on race and on teachers on free lunches and
00:34:26.000 --> 00:34:33.000
these kind of things so these are the covariates if we have randomization then the xis should be
00:34:33.000 --> 00:34:40.000
uncorrelated with the other regressor treatment di so when we estimate what is called a long
00:34:40.000 --> 00:34:46.000
regression where we have constant and treatment variable with coefficient delta which is the
00:34:46.000 --> 00:34:53.000
causal effect and then we include certain covariates then these covariates should actually
00:34:53.000 --> 00:35:01.000
not change by much the estimate of the delta here because there should be no
00:35:01.000 --> 00:35:09.000
correlation between the xis and the di if we have randomization therefore we should see that
00:35:09.000 --> 00:35:16.000
estimating this long regression results in a very similar estimate for delta as estimating
00:35:16.000 --> 00:35:24.000
the short equation so why actually do we estimate the long regression well one thing is
00:35:24.000 --> 00:35:30.000
it is always interesting to estimate a long regression and compare the estimate of delta
00:35:30.000 --> 00:35:35.000
with the estimate of the short regression because we can check whether randomization has actually
00:35:35.000 --> 00:35:41.000
worked or basically we could even construct a test a formal test on whether randomization
00:35:41.000 --> 00:35:47.000
possibly failed because if the delta here is very different so the delta hat the estimated delta
00:35:47.000 --> 00:35:51.000
if the delta hat in the long regression is very different from the delta hat in the short
00:35:51.000 --> 00:35:57.000
regression then we must suspect that randomization did not work the way it was supposed to work so
00:35:57.000 --> 00:36:02.000
that failed somehow so we can actually construct a test of that we won't do this here but just to
00:36:02.000 --> 00:36:08.000
give you the idea in general it is quite satisfactory already to just
00:36:08.000 --> 00:36:14.000
look at the delta estimates and see well are they different or are they not so different
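The long-versus-short comparison just described can be sketched in a small simulation; every number below (sample size, effect sizes, the covariate) is invented for illustration and is not the STAR data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Randomized treatment: independent of everything else by construction.
d = rng.integers(0, 2, n).astype(float)
# A covariate that predicts the outcome but, thanks to randomization, not treatment.
x = rng.normal(0, 1, n)
# True causal effect of treatment is 5.
y = 10 + 5 * d + 3 * x + rng.normal(0, 2, n)

def ols(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta, se

ones = np.ones(n)
beta_short, se_short = ols(np.column_stack([ones, d]), y)   # short regression
beta_long, se_long = ols(np.column_stack([ones, d, x]), y)  # long regression

# The two delta estimates should be close; the long one has the smaller SE.
print(beta_short[1], se_short[1])
print(beta_long[1], se_long[1])
```

Under successful randomization the two treatment estimates agree up to sampling noise, while the covariate soaks up outcome variance and shrinks the standard error, which is exactly the pattern in the table discussed above.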
00:36:17.000 --> 00:36:25.000
there is one reason why despite random assignment the randomization may not quite
00:36:25.000 --> 00:36:31.000
work so it may not be possible to balance all characteristics exactly simply due to the reason
00:36:31.000 --> 00:36:38.000
that we have finite samples as always and in finite samples we will never be able to have
00:36:38.000 --> 00:36:45.000
all characteristics in the treatment group with exactly the same relative frequency as in the
00:36:45.000 --> 00:36:51.000
non-treatment group that just won't work this works asymptotically but it doesn't work with
00:36:51.000 --> 00:36:56.000
finite samples so there are always slight differences in the composition of the treatment
00:36:56.000 --> 00:37:04.000
group and of the control group and these slight differences can be captured by the covariates
00:37:05.000 --> 00:37:10.000
to make it even a little better than the randomization which is somewhat imperfect in
00:37:12.000 --> 00:37:21.000
finite samples so that's one reason why actually the inclusion of covariates makes sense in
00:37:21.000 --> 00:37:28.000
spite of randomization moreover the inclusion of the xi variables may generate
00:37:28.000 --> 00:37:34.000
more precise estimates of the causal effect which we have seen when we
00:37:35.000 --> 00:37:44.000
observed that the standard error of the estimate of delta decreased the more covariates we included
00:37:44.000 --> 00:37:54.000
in the regression now although the covariates xi are uncorrelated with the di they have substantial
00:37:54.000 --> 00:38:03.000
explanatory power for yi we saw this in the increase of the r squared right so the covariates
00:38:03.000 --> 00:38:11.000
do not really contribute to the estimated delta so if randomization worked properly then
00:38:12.000 --> 00:38:19.000
delta hat should not really change by much as i have explained but we can explain more variance
00:38:20.000 --> 00:38:27.000
of the dependent variable of the yi's therefore since the covariates reduce
00:38:28.000 --> 00:38:35.000
variance of the yi or explain the variance of the yi the covariates also reduce the residual
00:38:35.000 --> 00:38:42.000
variance and the residual variance as you know from our view of basic econometrics is key to
00:38:42.000 --> 00:38:48.000
the standard error of the coefficient estimates because you remember the residual variance
00:38:48.000 --> 00:38:56.000
this is sigma square u which is reduced and this sigma square u also affects the standard
00:38:56.000 --> 00:39:03.000
errors of the estimated coefficients because the variance covariance matrix of the estimated
00:39:03.000 --> 00:39:09.000
coefficients was sigma square u times x prime x inverse right now there is the sigma square
00:39:09.000 --> 00:39:15.000
u in the case of homoskedasticity as a constant factor by which the x prime x inverse matrix
00:39:15.000 --> 00:39:21.000
is multiplied so reducing the estimate of sigma square u or having smaller errors actually
00:39:21.000 --> 00:39:27.000
increases the precision of the estimate of the causal effect and that is of course something
00:39:27.000 --> 00:39:35.000
which we are interested in that's by the way also the reason why we include school fixed
00:39:35.000 --> 00:39:43.000
effects they explain an important part of the variance in student performance as we have seen
00:39:43.000 --> 00:39:50.000
increasing the r squared from 1 to 25 percent so they have decreased the residual variance and therefore
00:39:50.000 --> 00:40:01.000
also the standard errors of the estimated causal effect there are some settings in which you
00:40:01.000 --> 00:40:06.000
should better not include control variables one is of course when the control variable is strongly
00:40:06.000 --> 00:40:11.000
affected by missing values obviously this doesn't make much sense because then you lose many
00:40:11.000 --> 00:40:18.000
observations and then you should not lose too many of them if you have only a few observations where
00:40:18.000 --> 00:40:24.000
the covariate variable has a missing value then it may perhaps be reasonable to drop these
00:40:24.000 --> 00:40:31.000
few observations for the reduction in the standard error which you may achieve by including the
00:40:31.000 --> 00:40:37.000
covariate but there is no general rule on that and you actually have to decide this on your own
00:40:37.000 --> 00:40:44.000
beware of dropping certain observations with missing values in covariates because this can
00:40:44.000 --> 00:40:51.000
destroy randomization obviously right so it's a really tricky issue to drop
00:40:51.000 --> 00:40:56.000
observations from a randomized experiment from a randomized controlled trial
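The point about missing covariate values can be made concrete with a quick diagnostic; the data and the missingness rate below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
d = rng.integers(0, 2, n)        # randomized treatment indicator
x = rng.normal(0, 1, n)          # a covariate
# Suppose the covariate is missing for roughly 5% of observations.
x_obs = np.where(rng.random(n) < 0.05, np.nan, x)

# How many observations would a complete-case analysis drop?
share_missing = np.isnan(x_obs).mean()

# Does missingness differ by treatment status? A large gap would warn us
# that dropping incomplete rows could undo the randomization.
gap = np.isnan(x_obs[d == 1]).mean() - np.isnan(x_obs[d == 0]).mean()
print(share_missing, gap)
```

If missingness were much more common in one group than in the other, dropping those rows would change the composition of the groups, which is exactly the danger described above.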
00:40:58.000 --> 00:41:04.000
and of course you should never include a control variable if the control variable is affected by
00:41:04.000 --> 00:41:11.000
the treatment itself that is called a bad control and we will talk about this at length in later
00:41:11.000 --> 00:41:22.000
slides now from merely a statistical point of view randomized controlled trials are generally
00:41:22.000 --> 00:41:32.000
desirable but they come at considerable costs and in the star experiment for instance it was
00:41:32.000 --> 00:41:41.000
logistically difficult to find sufficiently many schools and students who would voluntarily
00:41:41.000 --> 00:41:50.000
participate in such an experiment moreover the experiment involved observation over a rather long
00:41:50.000 --> 00:41:57.000
period of time namely four years and this can be very costly to have this long time span and
00:41:57.000 --> 00:42:04.000
control that the experiment runs smoothly and as designed over such a long period of time and
00:42:04.000 --> 00:42:11.000
therefore this can potentially give rise to high costs of randomized trials which is why often
00:42:11.000 --> 00:42:19.000
researchers think about using observational data and about whether these data
00:42:19.000 --> 00:42:26.000
can be interpreted as some kind of natural experiment in which data were
00:42:28.000 --> 00:42:33.000
generated in such a way that we can say it is almost like randomization and then we don't need
00:42:33.000 --> 00:42:38.000
to conduct a big experiment on an actual population but we can just take the data that have been
00:42:38.000 --> 00:42:45.000
collected by somebody and argue that for some reason we have something similar to randomization
00:42:45.000 --> 00:42:55.000
there. Now in many cases such randomized controlled trials are impractical
00:42:57.000 --> 00:43:01.000
and there are a number of reasons which I will give you.
00:43:04.000 --> 00:43:11.000
It may also be that the time span is not sufficiently long to design and apply for
00:43:11.000 --> 00:43:18.000
I mean you probably also need authorization by some government authority and then conduct
00:43:18.000 --> 00:43:23.000
actually the experiment before you get the answer so sometimes it may just be better to have
00:43:24.000 --> 00:43:30.000
observational data which you can use right away rather than designing an experiment randomized
00:43:30.000 --> 00:43:37.000
controlled trial so many researchers as I say try to exploit cheaper and more readily available sources of
00:43:37.000 --> 00:43:44.000
information. This is then sometimes called a natural experiment or a quasi experiment
00:43:44.000 --> 00:43:51.000
if a randomized trial would have generated data similar to the data that the researcher now
00:43:51.000 --> 00:43:56.000
uses from mere observation. So a question coming in.
00:43:56.000 --> 00:44:10.000
Can I give an example of observational data that are similar to randomization? Yes I can.
00:44:10.000 --> 00:44:18.000
For instance there has been a study on the effect of minimum wage legislation
00:44:19.000 --> 00:44:25.000
and minimum wage legislation has always been viewed rather skeptically by economists because
00:44:25.000 --> 00:44:34.000
economists worried that minimum wage legislation turns against the least qualified in the labor
00:44:34.000 --> 00:44:43.000
market arguing that every worker is being paid his or her marginal product so when minimum wages
00:44:43.000 --> 00:44:50.000
are effective in raising the wage then this would mean that those workers which have a marginal
00:44:50.000 --> 00:44:57.000
product which is lower than the minimum wage are being laid off and this would increase
00:44:57.000 --> 00:45:06.000
then unemployment for the less qualified less skilled workers whereas the workers with a higher
00:45:06.000 --> 00:45:11.000
marginal product don't actually have much benefit from it because they would have earned a higher
00:45:11.000 --> 00:45:18.000
wage anyway. So there was and still is I would say among economists quite a degree of skepticism
00:45:18.000 --> 00:45:27.000
towards minimum wage legislation. Then there was the following natural experiment that in the
00:45:27.000 --> 00:45:34.000
United States of America one state introduced minimum wage laws and the neighboring state
00:45:34.000 --> 00:45:42.000
did not. I think it was the state of Pennsylvania which introduced minimum wage law
00:45:43.000 --> 00:45:50.000
and I am not quite sure but I think it was West Virginia which did not. In any case what researchers then did was
00:45:50.000 --> 00:46:00.000
that they identified a number of McDonald's shops which were located close to the border
00:46:01.000 --> 00:46:08.000
between Pennsylvania and let's say it was West Virginia. So basically they would say we now have
00:46:08.000 --> 00:46:16.000
a natural experiment because for some workers at McDonald's typically low qualified workers we had
00:46:16.000 --> 00:46:23.000
minimum wage laws in effect suddenly being introduced and for others we had not and the
00:46:23.000 --> 00:46:28.000
distribution of the workers the characteristics of the workers should not change or should not be
00:46:28.000 --> 00:46:33.000
very different between areas in Pennsylvania which border on West Virginia and areas in West Virginia
00:46:33.000 --> 00:46:41.000
which border on Pennsylvania. So the workers which apply for jobs at McDonald's or which
00:46:41.000 --> 00:46:48.000
actively work at McDonald's should be at least if the sample is good almost randomly distributed
00:46:48.000 --> 00:46:59.000
across McDonald's shops which fell under minimum wage legislation and those which did not. So
00:46:59.000 --> 00:47:05.000
in this case we had something like a natural experiment there and that was a study by
00:47:05.000 --> 00:47:10.000
Card and Krueger which received much attention and has been published in the American Economic
00:47:10.000 --> 00:47:16.000
Review so a very good journal and it came surprisingly to the conclusion that the minimum
00:47:16.000 --> 00:47:23.000
wage law did not have a negative impact on employment in McDonald shops in Pennsylvania
00:47:24.000 --> 00:47:31.000
so employment actually increased in Pennsylvania which was a little bit more positive than in
00:47:31.000 --> 00:47:40.000
West Virginia despite the fact that the McDonald's restaurants had to increase the wage
00:47:40.000 --> 00:47:44.000
for some of the really low-qualified workers. They also always had workers
00:47:47.000 --> 00:47:53.000
who earned more than the minimum wage which Pennsylvania enacted so not all workers
00:47:53.000 --> 00:47:58.000
were affected but some were and it doesn't seem to have had a negative impact on those.
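The comparison behind such a border study is a difference-in-differences calculation; the employment figures below are made up for illustration and are not the numbers from the actual study:

```python
# Difference-in-differences with invented employment figures
# (average employment per store before and after the law change).
emp_treated_before = 20.0   # state that introduced the minimum wage
emp_treated_after = 21.0
emp_control_before = 23.0   # neighboring state, law unchanged
emp_control_after = 21.5

change_treated = emp_treated_after - emp_treated_before   # change with the law
change_control = emp_control_after - emp_control_before   # change without it

# The difference of the two changes is the estimated effect of the law.
did = change_treated - change_control
print(did)  # 2.5
```

The control state's change stands in for what would have happened in the treated state without the law, so their common trends are differenced away and only the effect of the law remains.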
00:48:08.000 --> 00:48:17.000
Now are there also weaknesses or disadvantages of randomized trials indeed there may be and even
00:48:17.000 --> 00:48:23.000
though from the statistical point of view randomization seems to be the best thing we can
00:48:23.000 --> 00:48:29.000
achieve so sometimes we say that's the gold standard that's the best setting we may actually
00:48:29.000 --> 00:48:38.000
have but there may be say ethical financial or political reasons why randomization doesn't
00:48:38.000 --> 00:48:46.000
work well for instance ethical means or an example of ethical reasons why randomization is not good
00:48:46.000 --> 00:48:54.000
is we cannot really say we assign hospital treatment randomly to people because this would
00:48:54.000 --> 00:49:00.000
mean we refuse hospital treatment to people who are seriously ill we just tell them well you haven't
00:49:01.000 --> 00:49:11.000
been assigned to hospital treatment you have drawn the wrong lottery ticket
00:49:11.000 --> 00:49:15.000
so you're not allowed to go to hospital even though you are seriously ill so this is completely
00:49:15.000 --> 00:49:24.000
impossible right so in that sense there are ethical reasons why we cannot use a randomization
00:49:25.000 --> 00:49:32.000
in the case of hospital treatment this is quite clear but in the case of experimenting on a new
00:49:32.000 --> 00:49:39.000
drug this is much less clear actually the ethical point is still there and still randomizations
00:49:39.000 --> 00:49:46.000
are being done because when some pharmaceutical company has developed a new drug it needs to
00:49:46.000 --> 00:49:55.000
provide a study on the effectiveness of the drug and the study must be randomized otherwise people
00:49:55.000 --> 00:50:00.000
will argue well this was not a randomized controlled trial so we don't really know whether
00:50:00.000 --> 00:50:07.000
you have selection rights so they try to randomize or they do randomize their studies by addressing
00:50:07.000 --> 00:50:16.000
a number of people who shall receive treatment for some type of illness they have and some of
00:50:16.000 --> 00:50:22.000
these people now receive the drug and others receive a placebo but the people who receive the
00:50:22.000 --> 00:50:29.000
placebo don't know that they receive the placebo they think they also receive the drug and they
00:50:29.000 --> 00:50:35.000
hope to be cured by receiving the drug whereas in fact they may not be cured at all since the
00:50:35.000 --> 00:50:43.000
placebo by definition doesn't have any clinical effect so their illness goes
00:50:43.000 --> 00:50:50.000
untreated which of course if the illness is serious is a big ethical concern
00:50:52.000 --> 00:50:57.000
there may also be financial reasons why a randomized controlled trial is not possible
00:50:58.000 --> 00:51:03.000
for instance if we want to study the effects of certain types of social support i mean shall you
00:51:03.000 --> 00:51:09.000
really give social support to people who do not qualify for social support who are not in need of
00:51:09.000 --> 00:51:14.000
social support this would make such an experiment very expensive and speaks against the randomized
00:51:14.000 --> 00:51:21.000
controlled trials there or there may be political obstacles like for instance there may be parents
00:51:21.000 --> 00:51:26.000
in say the star experiment you know who are really angry that their children are not assigned
00:51:21.000 --> 00:51:26.000
to attend small sized classes where presumably the quality of instruction is better and the
00:51:32.000 --> 00:51:38.000
learning effect is better whereas their neighbors are lucky they have been randomly assigned to send
00:51:38.000 --> 00:51:44.000
their children to such a class and they say well the neighbors are exactly like we are and why is our child
00:51:44.000 --> 00:51:49.000
discriminated against effectively not intentionally but effectively discriminated against by being
00:51:49.000 --> 00:51:55.000
refused the opportunity to attend a small sized class whereas my neighbor's son or daughter may
00:51:55.000 --> 00:52:04.000
attend this class so this may actually stir up political opposition and people will write to
00:52:04.000 --> 00:52:07.000
you know their local representative and the representative will talk to the governor
00:52:07.000 --> 00:52:12.000
and the governor may say well i do need votes in this precinct so better stop the experiment
00:52:12.000 --> 00:52:17.000
so these kinds of weaknesses and disadvantages of randomized trials are certainly there
00:52:19.000 --> 00:52:25.000
there's another problem with randomized trials which i should mention and this is dropout
00:52:25.000 --> 00:52:34.000
from the trial that's a big problem actually that you have a nice randomization design you have
00:52:34.000 --> 00:52:40.000
control group and treatment group such that characteristics are completely random and therefore
00:52:40.000 --> 00:52:48.000
hopefully equal to each other and now you subject the treatment group to the
00:52:48.000 --> 00:52:54.000
treatment for instance you have low qualified workers and some receive additional training
00:52:54.000 --> 00:53:00.000
but training means effort right so people have to get up early in the morning and they have to go
00:53:00.000 --> 00:53:06.000
to some classes and learn something and perhaps do exercises and these kinds of things and after some time
00:53:06.000 --> 00:53:11.000
some of the treated people just drop out they don't want to do it anymore and the question
00:53:11.000 --> 00:53:18.000
then is whether the dropouts are also a random selection of the treated people or whether
00:53:18.000 --> 00:53:26.000
perhaps just some people with certain characteristics drop out for instance the laziest
00:53:26.000 --> 00:53:31.000
right the laziest drop out of the experiment because the training involves
00:53:31.000 --> 00:53:37.000
effort and then of course we do not have randomization anymore we cannot compare
00:53:38.000 --> 00:53:43.000
the results of the treatment group under the assumption of randomization with the results of
00:53:43.000 --> 00:53:53.000
the control group this is the so-called attrition problem when dropout from a randomized controlled
00:53:54.000 --> 00:54:02.000
trial is non-random so then we have what is called panel attrition
00:54:04.000 --> 00:54:10.000
and the next problem is that there may be a disappointment effect in the group of
00:54:11.000 --> 00:54:17.000
controls so if people know they are in such an experiment and they have been assigned
00:54:17.000 --> 00:54:24.000
non-treatment whereas they know others receive treatment then it may be that the people who do not
00:54:24.000 --> 00:54:30.000
receive treatment are disappointed and this disappointment may actually change their behavior
00:54:30.000 --> 00:54:36.000
in some way and therefore there would not be complete randomization anymore there would be
00:54:36.000 --> 00:54:42.000
a structural difference between the control group and the treatment group which distorts
00:54:43.000 --> 00:54:51.000
the results and the last thing is that internal validity of a randomized controlled trial does
00:54:51.000 --> 00:54:59.000
not necessarily imply external validity what I mean by this you can probably best imagine
00:54:59.000 --> 00:55:05.000
when you think of this minimum wage study which I explained which as I said was not a randomized
00:55:05.000 --> 00:55:12.000
controlled trial but you could perhaps design it as a randomized controlled trial right so suppose
00:55:12.000 --> 00:55:18.000
such an experiment was done in a randomized controlled trial setting then what would you
00:55:18.000 --> 00:55:25.000
actually know you would know that for workers working in fast food stores like McDonald's or
00:55:25.000 --> 00:55:32.000
perhaps just McDonald's stores the minimum wage legislation does not have a detrimental effect
00:55:32.000 --> 00:55:40.000
on employment but does this imply that minimum wage laws in general have no effect on employment
00:55:40.000 --> 00:55:47.000
actually we have the result just internally for employees of McDonald's shops but we don't
00:55:47.000 --> 00:55:55.000
have it for employees at large in any kind of occupation and we just don't know whether we can
00:55:55.000 --> 00:56:02.000
infer from the small group of people who work at McDonald's stores that the same thing is also true
00:56:02.000 --> 00:56:09.000
for people who work in garages for instance or who work on construction sites or wherever low
00:56:09.000 --> 00:56:17.000
qualified workers may work which may have the benefit or the doubtful benefit of
00:56:17.000 --> 00:56:28.000
minimum wage laws so whether
00:56:17.000 --> 00:56:28.000
the internal validity which we may have in the randomized controlled trial whether
00:56:39.000 --> 00:56:46.000
this can actually be extrapolated to settings which are similar but different that is
00:56:46.000 --> 00:56:56.000
absolutely not clear okay now think of your research question please remember the question
00:56:56.000 --> 00:57:04.000
you picked from the list also mentioned in the last lecture and just try to think for yourself
00:57:04.000 --> 00:57:13.000
what would the ideal experiment look like in your case if you wanted to research your question
00:57:13.000 --> 00:57:19.000
what kind of an experiment would you want to devise you don't need to bother about costs you
00:57:19.000 --> 00:57:25.000
just should think through how you would devise an experiment such that you could conduct a randomized
00:57:25.000 --> 00:57:31.000
controlled trial and then you may also think about the question would that be feasible or what kind
00:57:31.000 --> 00:57:39.000
of difficulties might arise if you go and try to put the randomized controlled trial into action
00:57:39.000 --> 00:57:49.000
and yeah well give reasons for that we skip the slides on visualization and move on to the next
00:57:50.000 --> 00:57:57.000
set of slides which continues on this line of thought with systematic investigation of the
00:57:58.000 --> 00:58:02.000
estimator for these types of questions but before i start with this let me
00:58:02.000 --> 00:58:09.000
ask you if there are any questions concerning the content of slide set five
00:58:16.000 --> 00:58:24.000
it seems that this is not the case at least i don't see anybody raising the hand then let's move on
00:58:29.000 --> 00:58:34.000
so now we speak about the question how we can use ols estimation
00:58:34.000 --> 00:58:40.000
and i give you here again the readings the two books by angrist and pischke
00:58:42.000 --> 00:58:47.000
chapter three in mostly harmless econometrics or chapter two in mastering metrics
00:58:47.000 --> 00:58:54.000
by these two authors you may also consult the textbook by wooldridge introductory econometrics
00:58:54.000 --> 00:58:59.000
there it is chapter three section three point three or chapter nine section nine point two which you
00:58:59.000 --> 00:59:09.000
may consult on the content of this rather lengthy chapter of the lecture what we do here is actually
00:59:09.000 --> 00:59:15.000
that we bring together the results of chapter three so a review of basic econometrics with
00:59:15.000 --> 00:59:22.000
the results of chapter four the causality and counterfactual setting and we have already seen
00:59:22.000 --> 00:59:25.000
chapter five which i might have also mentioned here so we also bring together this with chapter
00:59:25.000 --> 00:59:34.000
five we have seen that the causality framework the potential outcome framework actually may
00:59:35.000 --> 00:59:42.000
flow very nicely into a regression setting which is not only true in the examples i gave you but
00:59:42.000 --> 00:59:49.000
is true in general so regression plays a very important role in
00:59:49.000 --> 00:59:58.000
empirical research with experimental data and also with observational data
00:59:59.000 --> 01:00:07.000
and the question is essentially whether regression analysis can be used in cases of
01:00:07.000 --> 01:00:15.000
non-random assignment so when we don't have an rct at our disposal then it depends on the
01:00:15.000 --> 01:00:23.000
circumstances whether ols regression can be used in a useful way or whether it cannot so it is not
01:00:23.000 --> 01:00:29.000
always clear without further thought whether we have a causal interpretation of the regression
01:00:29.000 --> 01:00:35.000
equation or not in general as you know regression equations just give us information on
01:00:37.000 --> 01:00:44.000
correlations and not on causality so the question is when does ols have a causal
01:00:44.000 --> 01:00:53.000
interpretation suppose we just estimate a regression and now you may ask whether this ols
01:00:53.000 --> 01:01:02.000
regression has a causal interpretation there's a special assumption actually which guarantees us
01:01:02.000 --> 01:01:08.000
a causal interpretation of a suitably formulated regression equation and this is the so-called
01:01:08.000 --> 01:01:19.000
conditional independence assumption cia which says conditional on a matrix of covariates xi
01:01:20.000 --> 01:01:27.000
the treatment d i is independent of the potential outcomes symbolically
01:01:28.000 --> 01:01:36.000
if we condition the d i's on observables on data which we have observed
01:01:38.000 --> 01:01:44.000
so if we control for the impact of observed data on the treatment status
01:01:45.000 --> 01:01:53.000
then the treatment status shall be independent of the potential outcomes so basically we can control
01:01:53.000 --> 01:02:02.000
for or let's suppose it's the other way around suppose the treatment status is not independent
01:02:02.000 --> 01:02:07.000
of the potential outcomes but then we have selection bias as we know right and then
01:02:07.000 --> 01:02:15.000
we would not be able to estimate the causal effect consistently now what we can do is that we say
01:02:15.000 --> 01:02:24.000
let's try to explain parts of the correlation of d i with the potential outcomes by controlling
01:02:24.000 --> 01:02:30.000
for some influences of the selection bias by controlling for some determinants of this
01:02:31.000 --> 01:02:43.000
selection bias so let us look at d i given x i in the sense of d i being conditioned on x i
01:02:44.000 --> 01:02:54.000
all the selection bias which can be captured by the x i variables can by means of a regression
01:02:54.000 --> 01:03:02.000
be controlled for and if you want to imagine it that way we can run an auxiliary
01:03:03.000 --> 01:03:11.000
regression of d i on x i and if the residual of such an auxiliary
01:03:11.000 --> 01:03:17.000
regression is then independent of the potential outcomes then the conditional independence
01:03:17.000 --> 01:03:26.000
assumption is satisfied then it holds and that is actually the condition for a causal interpretation
01:03:26.000 --> 01:03:36.000
of regression results sometimes this is just implicit in the interpretation of a regression equation
01:03:36.000 --> 01:03:44.000
that the effect is a causal effect, but sometimes we may actually explicitly head for a causal effect,
01:03:44.000 --> 01:03:53.000
and then we need to ensure that the CIA holds. This CIA assumption is often called the selection
01:03:53.000 --> 01:04:01.000
on observables assumption, because it basically means we select the treatment group and the
01:04:01.000 --> 01:04:12.000
non-treatment group, the control group, on certain observables. So if certain observable data help us
01:04:12.000 --> 01:04:20.000
in deciding who is being treated and who is not being treated, then we have selection on observables.
01:04:20.000 --> 01:04:27.000
This means the selection bias can be explained, and actually be explained away, by the observable
01:04:27.000 --> 01:04:33.000
variables which have been used to assign people either to the control group or to the
01:04:33.000 --> 01:04:40.000
treatment group. That is of course very helpful if we think of our hospital example. So if we can
01:04:40.000 --> 01:04:48.000
observe the health status of people already before they go to hospital, which should in principle be
01:04:48.000 --> 01:04:55.000
possible, then we can assign just the ill people to the treatment group. Just ill people are being
01:04:55.000 --> 01:05:00.000
treated, but we already know what kind of illness they have, or at least we do know that
01:05:00.000 --> 01:05:07.000
they are ill, and therefore we have observed the criteria by which people are selected to be either
01:05:07.000 --> 01:05:12.000
in the treatment group or in the control group. And in the control group there should be just the
01:05:12.000 --> 01:05:19.000
healthy people. Obviously this is easier said than done, because we cannot ensure that those people
01:05:19.000 --> 01:05:25.000
who do not go to hospital are actually not ill. There may be many cases, in particular in the United
01:05:25.000 --> 01:05:29.000
States there have been many cases, where people did not go to hospital when they were ill because they
01:05:29.000 --> 01:05:36.000
didn't have sufficient health care; there is no health insurance for some people. So it's not
01:05:36.000 --> 01:05:42.000
really clear that all the ill people truly were in the treatment group. And of course we may not
01:05:42.000 --> 01:05:49.000
know what the health status of somebody is who goes to the hospital. There may be people who go
01:05:49.000 --> 01:05:57.000
to the hospital who are not ill at all; there may be people who just have certain symptoms which seem
01:05:57.000 --> 01:06:04.000
harmless, some stomach ache perhaps, but then it turns out only in hospital that the status,
01:06:04.000 --> 01:06:08.000
the health status, is really bad, when they have a tumor in their stomach or something like that.
01:06:08.000 --> 01:06:16.000
So all of this creates problems in selecting on observables. But be that as it may, in principle, if we
01:06:16.000 --> 01:06:21.000
know the criteria by which people have been assigned to either the treatment group or the
01:06:21.000 --> 01:06:28.000
control group, then we may hope that the CIA assumption is valid.
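The hospital discussion above can be made concrete with a small simulation. This is a sketch under assumptions of my own (all numbers and variable names are hypothetical, not from the lecture): treatment di depends on an observed health status xi, so the naive treated-versus-untreated comparison is contaminated by selection bias, while an OLS regression that controls for xi recovers the causal effect, as the CIA suggests.

```python
import numpy as np

# Hypothetical illustration of selection on observables.
# x: observed prior health status; sicker people (low x) tend to be treated.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                               # observed characteristic
d = (x + rng.normal(size=n) < 0).astype(float)       # treatment depends on x only
delta = 2.0                                          # true causal effect
y = 1.0 + delta * d + 3.0 * x + rng.normal(size=n)   # outcome depends on x too

# Naive difference in means: contaminated by selection bias,
# because treated individuals have systematically lower x.
naive = y[d == 1].mean() - y[d == 0].mean()

# Controlling for x: OLS of y on a constant, d and x.
X = np.column_stack([np.ones(n), d, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(naive)    # far below the true effect of 2.0
print(beta[1])  # close to 2.0: the CIA holds conditional on x
```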
01:06:28.000 --> 01:06:39.000
So what the CIA, the conditional independence assumption, asserts is that, conditional
01:06:39.000 --> 01:06:46.000
on observed characteristics, selection bias disappears. The question is then of course:
01:06:46.000 --> 01:06:52.000
which covariates are these? Which control variables do we need, or what should the control
01:06:52.000 --> 01:06:57.000
variables be, which tell us that the selection bias disappears, that we have captured everything
01:06:57.000 --> 01:07:04.000
which was influential in separating people between treatment and non-treatment, so the control group?
01:07:06.000 --> 01:07:12.000
Let us go back to a previous example which I had already in my slides, the returns to
01:07:12.000 --> 01:07:17.000
education. So you remember: students either going to college or not going to college, and then
01:07:17.000 --> 01:07:24.000
later looking at the salaries of those who went to college and those who did not go to
01:07:24.000 --> 01:07:31.000
college. If the conditional independence assumption holds, if we can take this as
01:07:31.000 --> 01:07:38.000
given, then, conditional on xi, comparisons of average earnings across schooling levels
01:07:38.000 --> 01:07:48.000
have a causal interpretation. So if we have xi variables which are such that the CIA assumption
01:07:48.000 --> 01:07:56.000
is valid, then we know that differences in average earnings across schooling levels can be interpreted as the
01:07:56.000 --> 01:08:02.000
causal effects of schooling on average income, on average earnings.
01:08:04.000 --> 01:08:09.000
In many randomized experiments the CIA prevails simply because the di is randomly assigned
01:08:09.000 --> 01:08:17.000
conditional on xi. So far we have always assumed that we have complete randomization.
01:08:18.000 --> 01:08:24.000
That may be the case in some settings; for instance, in the STAR experiment students were just
01:08:24.000 --> 01:08:31.000
randomly drawn and assigned to go to smaller-sized classes or regular classes. But it may also be that
01:08:31.000 --> 01:08:41.000
we first separate the population on some observable characteristics, and then we assign, from those which
01:08:41.000 --> 01:08:48.000
satisfy certain characteristics, some to the treatment randomly and others not to the treatment,
01:08:48.000 --> 01:08:58.000
and the same thing we also do for the people who do not satisfy the xi criteria. But then,
01:08:58.000 --> 01:09:06.000
while there is a selection bias determined by the xi status, we can still correct for this selection
01:09:06.000 --> 01:09:12.000
bias by including the xi regressors in the regression and thereby ensuring that
01:09:12.000 --> 01:09:24.000
the conditional independence assumption holds. Okay. Now if we do not have an RCT
01:09:24.000 --> 01:09:30.000
but an observational study, so something which we may call a
01:09:30.000 --> 01:09:36.000
natural experiment, then the conditional independence assumption basically means that
01:09:37.000 --> 01:09:44.000
the treatment status di is as good as randomly assigned conditional on the
01:09:44.000 --> 01:09:55.000
observations which we have in xi. Okay, let's go to section one of this chapter: potential
01:09:55.000 --> 01:10:03.000
outcomes and regression analysis. The potential outcomes approach, as you may have understood,
01:10:03.000 --> 01:10:10.000
should have understood, involves variables which are observable, so the yi's, and then
01:10:10.000 --> 01:10:21.000
the potential variables which are unobservable, y1i and y0i. Or, more precisely, the yi always
01:10:21.000 --> 01:10:29.000
includes one potential outcome for an individual, so this is observed, and the other potential
01:10:29.000 --> 01:10:35.000
outcome is regularly not observed. So, for instance, we observe for individual i
01:10:35.000 --> 01:10:43.000
potential outcome y1i because individual i is treated,
01:10:43.000 --> 01:10:51.000
but then we do not observe potential outcome y0i for individual i. Now this seems to pose a big
01:10:51.000 --> 01:10:57.000
problem for regression analysis, because regression analysis can only be done on data we observe.
01:10:57.000 --> 01:11:02.000
So the usual setting which we dealt with in chapter three was y equals x beta plus u,
01:11:02.000 --> 01:11:10.000
and obviously we needed all the data, y and x, to run the regression. So how does the potential
01:11:10.000 --> 01:11:17.000
outcomes approach, where we do not observe some interesting data, some potential outcomes, for each
01:11:17.000 --> 01:11:28.000
individual, how does this match with the regular regression setup? Well, fairly easily actually,
01:11:28.000 --> 01:11:36.000
because we have an observation for all individuals. So in the potential outcomes
01:11:36.000 --> 01:11:42.000
setup we just collect all the observed outcomes in an n by 1 vector which we call y, which is equal
01:11:42.000 --> 01:11:51.000
to y1 through yn. So these are the observed outcomes; some of them are y1i's and others are y0j's.
01:11:54.000 --> 01:12:03.000
We know we observe the y1 for some individuals and the y0 for others. So I
01:12:03.000 --> 01:12:08.000
introduce here a notation which you may not be acquainted with: the dot basically means
01:12:09.000 --> 01:12:13.000
that there should be an index, but I don't specify the index. So there should be either
01:12:13.000 --> 01:12:19.000
an i or j or k or something like this, but here I want to emphasize this can be any index, so I just
01:12:19.000 --> 01:12:26.000
set this dot here. So y1. means for some individuals we have the treatment outcome,
01:12:27.000 --> 01:12:35.000
and for other individuals we have the non-treatment outcome. Therefore, as you have already seen, yi is
01:12:35.000 --> 01:12:45.000
equal to di y1i plus (one minus di) y0i. Depending on whether individual i was treated
01:12:45.000 --> 01:12:53.000
or was not treated, we either get outcome y1i or we get outcome y0i. So apparently the treatment
01:12:53.000 --> 01:12:59.000
variable di is the prime explanatory variable for explaining the different outcomes for different
01:12:59.000 --> 01:13:05.000
individuals i and j, if we assume that individual i was treated and individual j was not treated,
01:13:05.000 --> 01:13:15.000
that di was equal to one and dj was equal to zero. So what we can do is create a column vector
01:13:15.000 --> 01:13:24.000
d collecting the individual treatment indicators di. So d is then just a vector d1 up to dn:
01:13:24.000 --> 01:13:28.000
for each individual we now know whether it was treated, and we collect this information in this
01:13:28.000 --> 01:13:36.000
vector d, a column vector which contains the individual di's. Here I always write the column
01:13:36.000 --> 01:13:42.000
vector, to save space, as a row vector transposed, so it's a column. This is a dummy variable:
01:13:42.000 --> 01:13:47.000
d is a binary variable, and such binary variables we call dummy variables.
01:13:47.000 --> 01:13:54.000
It is binary because it consists of just two possible entries, either zero or one; that's why
01:13:54.000 --> 01:14:00.000
it's called binary, and such binary variables as explanatory variables are called dummy variables.
01:14:01.000 --> 01:14:08.000
Now be aware of the fact that binary variables can be tricky in regressions.
01:14:09.000 --> 01:14:15.000
For instance, if the dependent variable of a regression, so if the y, is a binary variable,
01:14:15.000 --> 01:14:21.000
then we need special econometric modeling, because the regressors, or the linear combination
01:14:21.000 --> 01:14:30.000
of the regressors, x beta, or x beta plus u, usually are not binary. But we do need, of course, that y is
01:14:30.000 --> 01:14:39.000
equal to x beta plus u. So if y is just one or zero, how can we ensure that x beta plus u is always
01:14:39.000 --> 01:14:47.000
exactly one or zero? Note that this is u, not u-hat; u-hat of course always satisfies
01:14:47.000 --> 01:14:54.000
this equation, but u does not. Ideally we would like to have that x beta is always either close
01:14:54.000 --> 01:15:00.000
to zero or close to one, but if x is a continuous variable, why should it actually be the case that
01:15:00.000 --> 01:15:06.000
x beta is close to zero or one? So this needs special econometric modeling and techniques:
01:15:06.000 --> 01:15:14.000
logit models, probit models, or something like this. I have this also on my agenda for this
01:15:14.000 --> 01:15:19.000
lecture, but I'm afraid we won't have enough time to cover it, so you may deal with this
01:15:19.000 --> 01:15:26.000
in a later lecture. We won't usually have a dependent variable in a regression which is
01:15:26.000 --> 01:15:31.000
binary; we will have independent variables which are binary, so regressors which are binary.
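As a side sketch of the logit idea just mentioned (my own illustration, not the lecture's material): instead of forcing x beta plus u to hit zero or one, the logit model specifies P(y = 1 | x) as 1/(1 + exp(-x'beta)), which always lies strictly between zero and one. A minimal Newton-Raphson fit on simulated data:

```python
import numpy as np

# Hypothetical illustration of a logit model for a binary dependent variable.
rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 1.5])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))          # probabilities in (0, 1)
y = (rng.uniform(size=n) < p).astype(float)       # binary outcomes

# Fit by Newton-Raphson on the (concave) logit log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p_hat)                      # score vector
    W = p_hat * (1.0 - p_hat)                     # observation weights
    hess = X.T @ (X * W[:, None])                 # negative Hessian
    beta = beta + np.linalg.solve(hess, grad)

print(beta)  # estimates approach beta_true
```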
01:15:33.000 --> 01:15:39.000
So for us the d is an explanatory variable, and this explains why we observe the potential
01:15:39.000 --> 01:15:45.000
outcome y1. for some individuals and y0. for others.
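The switching rule yi = di y1i + (1 - di) y0i can be written out for a toy sample (hypothetical numbers): the dummy d picks which of the two potential outcomes enters the observed vector y.

```python
import numpy as np

# Toy example of the switching equation y_i = d_i * y1_i + (1 - d_i) * y0_i.
y1 = np.array([5.0, 7.0, 6.0, 8.0])   # potential outcomes under treatment
y0 = np.array([3.0, 4.0, 2.0, 5.0])   # potential outcomes without treatment
d = np.array([1.0, 0.0, 1.0, 0.0])    # treatment indicators (dummy variable)

y = d * y1 + (1.0 - d) * y0           # observed outcomes: one per individual
print(y)                              # [5. 4. 6. 5.]
```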
01:15:45.000 --> 01:15:56.000
Also note that the d is a qualitative variable. It is only informative on whether or not an
01:15:56.000 --> 01:16:03.000
individual has received treatment; it does not give us quantitative information on how much treatment
01:16:03.000 --> 01:16:10.000
an individual has received. So, for instance, if you test a new drug as a pharmaceutical company,
01:16:11.000 --> 01:16:19.000
then you may specify your d as the variable which indicates whether some person has taken the drug
01:16:19.000 --> 01:16:26.000
or has not taken the drug, but you may not know how much of the drug the person has taken.
01:16:27.000 --> 01:16:33.000
In a good experiment, of course, the person would receive the drug just under controlled
01:16:33.000 --> 01:16:40.000
circumstances, so there you would need to have staff which administers the drug regularly
01:16:40.000 --> 01:16:45.000
to the person, at quantities which you know of. But if you just give the pills to the
01:16:45.000 --> 01:16:49.000
person and say, well, take one per day, then you don't really know whether the person takes one
01:16:49.000 --> 01:16:55.000
pill per day, and actually some people in the control group may take just one pill or no pill
01:16:55.000 --> 01:17:01.000
at all; we never know. So we don't have quantitative information on how much of a drug
01:17:01.000 --> 01:17:08.000
people have actually received. So d is a qualitative variable. But a qualitative variable
01:17:08.000 --> 01:17:15.000
need not be a binary variable. For instance, we may have a variable which indicates ethnicity,
01:17:15.000 --> 01:17:22.000
and let's say if the person is white then this is a zero, and black is one, and Hispanic is
01:17:22.000 --> 01:17:28.000
two, and Chinese is three, or something like that. So that would be a variable which is coded in
01:17:28.000 --> 01:17:34.000
the numbers zero, one, two, three. So it's not binary, because there are more than two values which
01:17:34.000 --> 01:17:40.000
are being assumed by the variable. Still, it is a qualitative variable; it does not measure anything
01:17:40.000 --> 01:17:46.000
quantitatively. It doesn't measure how much black and how much white ancestry
01:17:46.000 --> 01:17:51.000
you have; it's just a qualitative variable, either you are black or you are white, so it's
01:17:51.000 --> 01:18:00.000
a very simple-minded view of the world. In causal regressions we very often use
01:18:00.000 --> 01:18:06.000
binary variables as explanatory variables. So in the simplest setting, as we have already discussed,
01:18:06.000 --> 01:18:12.000
the treatment variable d is the only explanatory variable apart from the constant. So this
01:18:12.000 --> 01:18:18.000
was the case I started with in this lecture: y is equal to beta one times a vector of ones, iota,
01:18:18.000 --> 01:18:25.000
recall iota is a vector of ones, plus delta d plus u. In the previous set of slides we said
01:18:25.000 --> 01:18:30.000
this is alpha, the constant, and then there was the treatment effect, which usually is called
01:18:30.000 --> 01:18:37.000
delta, times the dummy variable, plus the u, which as you know is the difference between the
01:18:37.000 --> 01:18:41.000
individual potential outcome in the case of non-treatment and the expected value of the
01:18:41.000 --> 01:18:47.000
potential outcome in the case of non-treatment. So this equation, equation three, as we have learned
01:18:47.000 --> 01:18:51.000
in the previous set of slides, is appropriate if the treatment was randomly assigned.
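A sketch of estimating equation three under random assignment (hypothetical numbers, my own illustration): with d randomly assigned and u uncorrelated with d, OLS on a constant and the dummy recovers the treatment effect delta.

```python
import numpy as np

# Hypothetical sketch of y = beta_1 * iota + delta * d + u with random assignment.
rng = np.random.default_rng(2)
n = 10_000
d = rng.integers(0, 2, size=n).astype(float)   # randomly assigned treatment dummy
delta = 1.5                                    # true treatment effect
y = 2.0 + delta * d + rng.normal(size=n)       # u is independent of d

X = np.column_stack([np.ones(n), d])           # iota (vector of ones) and d
beta1_hat, delta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(delta_hat)  # close to the true delta of 1.5
```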
01:18:51.000 --> 01:19:02.000
But there may be other characteristics which influence observed outcomes. The important thing
01:19:02.000 --> 01:19:10.000
is that these other characteristics do not correlate with the di. So they may in fact
01:19:10.000 --> 01:19:18.000
affect the yi, but they shall not correlate with the di, or with the vector d;
01:19:18.000 --> 01:19:21.000
otherwise we would not have randomization.
01:19:26.000 --> 01:19:32.000
Oh yes, perhaps I should mention that if there are other influences on y which do not correlate
01:19:32.000 --> 01:19:39.000
with d, then we can nevertheless estimate equation three and get a consistent estimate of delta,
01:19:39.000 --> 01:19:47.000
because in this case, if the true model is actually beta one times iota plus delta d plus
01:19:47.000 --> 01:19:55.000
x prime times gamma plus an error term, then we would just have the x prime times gamma in the error
01:19:55.000 --> 01:20:03.000
here, in the u. This would be part of the error; it's part of a neglected variables problem.
01:20:03.000 --> 01:20:10.000
We discussed this problem in the review of basic econometrics: as long as the variables
01:20:10.000 --> 01:20:16.000
which we neglect are not correlated with the regressors we have in the regression,
01:20:16.000 --> 01:20:23.000
specifically not correlated with the regressor d, nothing happens; we will still estimate the delta
01:20:23.000 --> 01:20:29.000
correctly. We will have greater standard errors, we will have higher residual variance, all that is
01:20:29.000 --> 01:20:34.000
clear, so we will estimate the delta less precisely, but the estimate will be consistent.
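The neglected-variables point can be checked in a small simulation (hypothetical numbers, my own sketch): an omitted regressor x that is uncorrelated with d is absorbed into the error term without biasing the estimate of delta; it only inflates the residual variance.

```python
import numpy as np

# Hypothetical illustration: omitted regressor x is uncorrelated with d.
rng = np.random.default_rng(3)
n = 200_000
d = rng.integers(0, 2, size=n).astype(float)   # randomly assigned treatment
x = rng.normal(size=n)                          # influence on y, omitted below
delta = 1.0
y = 0.5 + delta * d + 2.0 * x + rng.normal(size=n)

short = np.column_stack([np.ones(n), d])        # x left in the error term
long_ = np.column_stack([np.ones(n), d, x])     # x included as a regressor
b_short = np.linalg.lstsq(short, y, rcond=None)[0]
b_long = np.linalg.lstsq(long_, y, rcond=None)[0]

# Both estimates of delta are consistent; the short regression just has
# a larger residual variance (less precision, but no asymptotic bias).
var_short = np.var(y - short @ b_short)
var_long = np.var(y - long_ @ b_long)
print(b_short[1], b_long[1])   # both close to the true delta of 1.0
print(var_short > var_long)    # True: higher residual variance when x is omitted
```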
01:20:35.000 --> 01:20:41.000
There will be no asymptotic bias, so we are still on fairly safe territory. So we may treat
01:20:41.000 --> 01:20:49.000
influences which we have not included in the regression as part of the error term, as long as
01:20:49.000 --> 01:20:56.000
these influences do not correlate with d. And that's quite good to know, because in many settings we
01:20:56.000 --> 01:21:02.000
will actually think, well, there are other factors influencing the y, we can think of these, but we don't have
01:21:02.000 --> 01:21:12.000
data on them. Like, for instance, if you look at salaries: suppose y is a salary and delta
01:21:12.000 --> 01:21:20.000
measures the effect of schooling. Then you may say, well, salary is not only affected by schooling,
01:21:20.000 --> 01:21:27.000
but it is also affected by ability, the intellectual, cognitive ability of a person.
01:21:28.000 --> 01:21:34.000
This ability may be imperfectly measured by schooling; there are people who are very able
01:21:34.000 --> 01:21:37.000
but have not received schooling, because they were, I don't know, not in a position to go to
01:21:37.000 --> 01:21:47.000
school, or didn't want to, or couldn't for financial reasons, whatever. And in this case we would not
01:21:47.000 --> 01:21:54.000
have any data on ability; ability is typically not measured for people. So we would need to
01:21:54.000 --> 01:22:02.000
say ability is part of the error term here. What we know is that this would not cause problems as long
01:22:02.000 --> 01:22:09.000
as ability is not correlated with the decision to go to college or not. Well, unfortunately ability
01:22:09.000 --> 01:22:15.000
would typically be correlated with the decision to go to college or not. But in, say, a
01:22:15.000 --> 01:22:23.000
dictatorial society, where people are assigned college education on some arbitrary basis which
01:22:23.000 --> 01:22:29.000
has nothing to do with ability, you may perhaps have the setting that ability is uncorrelated
01:22:29.000 --> 01:22:37.000
with d, and then you would still estimate the causal effect correctly. Which is not policy
01:22:37.000 --> 01:22:46.000
advice to have dictatorial societies. Okay, in the case of random assignment now, equation three
01:22:46.000 --> 01:22:54.000
implies that the expectation of yi, given that individual i belongs to the treatment group, is equal to
01:22:55.000 --> 01:23:02.000
beta zero plus delta. Given that di is equal to one, we know the expectation of
01:23:02.000 --> 01:23:07.000
this thing here is just, well, I wrote beta zero, but here it is beta one, so it should be beta one,
01:23:08.000 --> 01:23:15.000
beta one plus delta, since d is one for those which are in the treatment group
01:23:15.000 --> 01:23:21.000
and the expectation of u is of course zero. And the expectation of yi, given that di is equal to
01:23:21.000 --> 01:23:27.000
zero, is, well, I call it now beta zero, so perhaps on the former slide I should change this to beta
01:23:27.000 --> 01:23:35.000
zero; it's just beta zero, and there is no delta because di is zero. Moreover, due to equation
01:23:35.000 --> 01:23:43.000
two, that we can decompose the observations yi into di times the potential outcome in the case
01:23:43.000 --> 01:23:49.000
of treatment plus (one minus di) times the potential outcome in the case of non-treatment, we have
01:23:49.000 --> 01:23:55.000
that the expectation of yi, given that individual i belongs to
01:23:55.000 --> 01:24:02.000
the treatment group, is equal to the expectation of y1i given that the individual belongs to
01:24:02.000 --> 01:24:08.000
the treatment group. Very clear. And the same thing then of course for non-treatment: the
01:24:08.000 --> 01:24:14.000
expectation of yi given non-treatment is the expectation of the potential outcome given non-treatment. So
01:24:14.000 --> 01:24:24.000
there we can write: the delta is both the difference of the expectation of yi given treatment minus the expectation of yi
01:24:24.000 --> 01:24:29.000
given non-treatment, and that is equal to the expectation of the potential outcome in the case of
01:24:29.000 --> 01:24:36.000
treatment given treatment minus the expectation of the potential outcome for non-treatment given non-treatment.
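The coincidence of ATT, ATNT and ATE under random assignment can be checked in a simulation where, unlike in real data, both potential outcomes are known (hypothetical numbers, my own sketch):

```python
import numpy as np

# Hypothetical sketch: both potential outcomes are simulated, so the
# (normally unobservable) ATT, ATNT and ATE can all be computed directly.
rng = np.random.default_rng(4)
n = 200_000
y0 = rng.normal(size=n)                          # potential outcome, untreated
y1 = y0 + 2.0 + 0.5 * rng.normal(size=n)         # heterogeneous effects, mean 2.0
d = rng.integers(0, 2, size=n).astype(float)     # random: d independent of (y0, y1)
y = d * y1 + (1.0 - d) * y0                      # observed outcomes

att = (y1 - y0)[d == 1].mean()                   # average effect on the treated
atnt = (y1 - y0)[d == 0].mean()                  # average effect on the non-treated
ate = (y1 - y0).mean()                           # average effect over everyone
diff = y[d == 1].mean() - y[d == 0].mean()       # E[y|d=1] - E[y|d=0], the delta

print(att, atnt, ate, diff)  # all close to 2.0 under random assignment
```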
01:24:40.000 --> 01:24:46.000
Since we now have random assignment, the ATT and the ATNT and the ATE all coincide
01:24:47.000 --> 01:24:54.000
and are just equal to the right-hand side of equation eight, so to this side here.
01:24:55.000 --> 01:25:03.000
Given that we have random assignment, we know that the causal effect, which is this one here,
01:25:03.000 --> 01:25:12.000
the expectation of y1i given treatment minus the expectation of y0i given non-treatment,
01:25:13.000 --> 01:25:20.000
is the same as this difference here: the expectation of y1i given
01:25:20.000 --> 01:25:28.000
treatment minus the expectation of y0i given treatment. So this here is the ATT, the
01:25:28.000 --> 01:25:34.000
average treatment effect on the treated. Here we have the same expression, just given non-treatment,
01:25:34.000 --> 01:25:41.000
di is equal to zero here and di is equal to zero here, so this is apparently the ATNT, the average
01:25:41.000 --> 01:25:48.000
treatment effect on the non-treated. And here we would have the ATE, the expectation of y1i
01:25:48.000 --> 01:25:55.000
minus the expectation of y0i across all individuals. So, more compactly, the regression
01:25:55.000 --> 01:26:02.000
coefficient delta, which is actually this thing here, is either equal to the expectation of the difference of the
01:26:02.000 --> 01:26:08.000
potential outcomes given treatment, which would be the ATT, or it is the expectation of the difference
01:26:08.000 --> 01:26:14.000
between the potential outcomes given non-treatment, which would be the ATNT, or it is just the
01:26:14.000 --> 01:26:20.000
unconditional expectation of the difference in potential outcomes. So with random assignment all
01:26:20.000 --> 01:26:26.000
these three things are the same, but in general of course that's not the case, and we will deal
01:26:26.000 --> 01:26:35.000
with these issues in the next lecture. We are at the end of today's lecture time. Is there a
01:26:35.000 --> 01:26:41.000
wish to go to interaction? Then please wave or raise your hand.
01:26:46.000 --> 01:26:52.000
Yes, is this a wish for interaction, or can you pose the question also in the chat?
01:26:55.000 --> 01:26:56.000
It is already in the chat?
01:26:58.000 --> 01:27:00.000
Okay, you can type the question, yes, okay.
01:27:00.000 --> 01:27:05.000
If anybody else would like to go to interaction, we can still do that,
01:27:05.000 --> 01:27:11.000
but then you need to raise your hand. And Esla, please type your question.
01:27:13.000 --> 01:27:20.000
I will release... that was wrong. I hope you're still there, are you?
01:27:22.000 --> 01:27:24.000
I actually wanted to stop the recording,
01:27:30.000 --> 01:27:36.000
which I do here.