WEBVTT - autoGenerated
00:01:00.000 --> 00:01:27.000
Hello and welcome to this lecture of Estimation and Inference.
00:01:27.000 --> 00:01:35.000
I just scanned through a couple of questions which you raised, more or less everybody there
00:01:35.000 --> 00:01:45.000
was saying that he or she has questions relating also to previous material which I covered
00:01:45.000 --> 00:01:51.000
in this lecture and whether we can have an interactive session, yes indeed we can.
00:01:51.000 --> 00:02:00.000
I suggest that I will just proceed as planned today in this lecture and that you please send
00:02:00.000 --> 00:02:08.000
me your questions by email and then in tomorrow's lecture I will reserve enough time for us
00:02:08.000 --> 00:02:14.000
to have an interactive session or perhaps even start with the interactive session to
00:02:14.000 --> 00:02:20.000
make sure that we have enough time to cover your questions and then move on with the lecture
00:02:21.000 --> 00:02:31.000
material. I hope that is all right for you. Are there any questions which relate to the DID method?
00:02:34.000 --> 00:02:41.000
Okay this was just a thank you. Any questions relating to the stuff which I covered in the
00:02:41.000 --> 00:02:50.000
last lecture. I just repeat for you this incinerator example which you probably recall.
00:02:51.000 --> 00:02:58.000
The key question the question of interest was what kind of damage is inflicted on the house owners
00:02:58.000 --> 00:03:06.000
who own property close to the site where an incinerator is to be constructed and then starts
00:03:06.000 --> 00:03:14.000
operating a little bit earlier and we took up this question and came up with an estimate
00:03:14.000 --> 00:03:26.000
of almost twelve thousand dollars in 1978 prices which is the effect of the incinerator being close
00:03:26.000 --> 00:03:37.000
to the real estate so this would be the damage which the owners of the real estate have on average
00:03:37.000 --> 00:03:44.000
of course and one key question that we discussed then was how can we actually study the significance
00:03:44.000 --> 00:03:53.000
of this estimate here and what we did then was basically that we remodeled the regression model
00:03:54.000 --> 00:04:04.000
and used eventually well I think actually this here is the right slide for that a reformulation
00:04:04.000 --> 00:04:12.000
which you see here in equation six which came up with a regressor matrix of this type
00:04:13.000 --> 00:04:22.000
for the vector of prices which we observed for houses sold in 1978 and for houses sold in 1981
00:04:23.000 --> 00:04:29.000
and I just want to go through the design of this regressor matrix once again perhaps the first
00:04:29.000 --> 00:04:38.000
point to repeat is that this is a very simple regressor matrix in the sense that all entries
00:04:38.000 --> 00:04:47.000
in this matrix are just either zeros or ones so nothing else but ones or zeros in the first column
00:04:47.000 --> 00:04:54.000
obviously we just have as usual the constant terms so just a vector of ones and then these
00:04:54.000 --> 00:05:01.000
n78 and n81 vectors as you may recall are just dummy variables which indicate whether a
00:05:02.000 --> 00:05:10.000
house which has been sold in either year is close to the incinerator so closer than three miles to
00:05:10.000 --> 00:05:17.000
the incinerator or whether it is farther away in the case where it is close to the incinerator
00:05:17.000 --> 00:05:25.000
this variable will have an index of one or an entry of one and otherwise it will be zero
00:05:26.000 --> 00:05:34.000
and then we have another dummy variable here which is essentially dummy variable for year 1981
00:05:34.000 --> 00:05:42.000
so for all 78 prices this variable just assumes the value zero and for all 1981 prices it is just
00:05:42.000 --> 00:05:48.000
a vector of ones so the iota here is just a vector of ones so that's a dummy variable for
00:05:48.000 --> 00:05:55.000
the question whether did the house sell in 1981 or did it sell in 1978 and if it sold in 1981 then
00:05:55.000 --> 00:06:02.000
we have a one here and this vector here is then again of course a vector of zeros and ones because
00:06:02.000 --> 00:06:10.000
these are all zeros and this is again the dummy for the closeness of the property to the incinerator
00:06:10.000 --> 00:06:17.000
site so it's exactly the same dummy as this one here and by the way it is also the product of the
00:06:17.000 --> 00:06:25.000
entries of this dummy variable and that dummy variable because obviously this is just a vector
00:06:25.000 --> 00:06:31.000
of ones being multiplied by n81 it gives n81 and here we just have a vector of zeros so
00:06:31.000 --> 00:06:39.000
multiply this by n78 then obviously everything is zero and we noted that with
00:06:39.000 --> 00:06:46.000
this design of the regressor matrix we actually estimate the effect delta one we are interested
00:06:47.000 --> 00:06:56.000
in namely the difference between the two regression coefficients gamma one 81 and gamma one 78
00:06:57.000 --> 00:07:03.000
where the gamma one coefficients always are the coefficients of the variable which measures
00:07:03.000 --> 00:07:11.000
the closeness of the property to the incinerator site so the coefficients of this n78 or n81
00:07:12.000 --> 00:07:19.000
variable here the question then is whether actually this coefficient has changed due to the
00:07:19.000 --> 00:07:27.000
fact that the incinerator has been built right so our hypothesis was well it may well be the case
00:07:27.000 --> 00:07:34.000
that real estate close to the future site of the incinerator was already less valuable in 1978
00:07:35.000 --> 00:07:42.000
than other property was worth but the question is
00:07:42.000 --> 00:07:47.000
whether the news about the incinerator being built decreased the value of property close
00:07:47.000 --> 00:07:55.000
to the incinerator site even further so we are interested in this change here and now let us
00:07:55.000 --> 00:08:04.000
perhaps just have a look again at the regressor matrix to understand well what the idea of this
00:08:04.000 --> 00:08:13.000
design here is by looking at the coefficients which are being estimated when we have regressors
00:08:13.000 --> 00:08:21.000
as modeled in this regressor matrix clearly uh when we have a constant term here or let's just
00:08:21.000 --> 00:08:28.000
first look at the constant term the first row of prices so the constants uh which are for the 1978
00:08:29.000 --> 00:08:37.000
prices then we would say well this here is the average price of houses which are
00:08:37.000 --> 00:08:46.000
sold in uh 1978 and this variable here makes adjustment for those houses which are close to
00:08:47.000 --> 00:08:55.000
the incinerator so which are on land on which the incinerator will be built later on but
00:08:56.000 --> 00:09:02.000
in 1978 it was not yet known that the incinerator was to be built so this adjusts for differences
00:09:02.000 --> 00:09:10.000
in value of houses close to the site without any information about the site being used for
00:09:10.000 --> 00:09:19.000
the construction of an incinerator and now what happens in the next uh row clearly um if this
00:09:19.000 --> 00:09:31.000
vector of ones here captures the basic price of real estate then this additional vector of ones
00:09:31.000 --> 00:09:45.000
here allows the data to um estimate a different average price in 1981 than was estimated for 1978
00:09:46.000 --> 00:09:54.000
so think of this first column here as just the column of average 1978 prices then when you add
00:09:54.000 --> 00:10:01.000
another vector of ones here then apparently it is possible to make an adjustment for house prices
00:10:01.000 --> 00:10:09.000
as we measure them in 1981 so the average price of houses may have changed from 1978 to 1981
00:10:12.000 --> 00:10:21.000
and this change is being measured by this variable here and the same thing is the case for n81
00:10:21.000 --> 00:10:28.000
here and there we have again the same type of regressor in the second column and in the
00:10:28.000 --> 00:10:35.000
fourth column if we just look at the second block of entries in this matrix here so it's just the
00:10:35.000 --> 00:10:43.000
same thing n81 here and n81 there much like we had just ones here and just ones there so this
00:10:43.000 --> 00:10:52.000
regressor here allows us to estimate a different effect for houses close to the uh incinerator in
00:10:52.000 --> 00:11:00.000
1981 than we have estimated for 1978 and since no adjustment is possible in 1978 apparently the
00:11:01.000 --> 00:11:09.000
coefficient for this regressor here will be the change in value when we move from houses
00:11:09.000 --> 00:11:14.000
far from the incinerator to houses close to the incinerator so that's the 1978 effect here
00:11:15.000 --> 00:11:22.000
and this here gives the second change when we move on in time to 1981 when everybody knows
00:11:22.000 --> 00:11:30.000
that there's an incinerator being built or having been built okay i see there are a number of
00:11:30.000 --> 00:11:41.000
questions or comments um let me just look at this um now there are again questions which
00:11:42.000 --> 00:11:52.000
relate to the um the regression discontinuity design again my suggestion is please send me
00:11:52.000 --> 00:11:57.000
your questions by email and i will then respond to them in tomorrow's interactive
00:11:57.000 --> 00:12:05.000
uh session and so i won't answer this question uh now but please my encouragement um send it to
00:12:05.000 --> 00:12:15.000
me by email and i will answer it tomorrow and um this is apparently again a question which is
00:12:15.000 --> 00:12:23.000
concerned with the uh regression discontinuity design yes it is um so again i come back to these
00:12:23.000 --> 00:12:28.000
questions um tomorrow all right any more questions here
00:12:32.000 --> 00:12:39.000
no? all right then let's go on where i left off which essentially was this equation seven here
00:12:39.000 --> 00:12:45.000
which is just a scalar representation of the system of uh equations which i just explained
00:12:46.000 --> 00:12:51.000
actually i forgot to mention this yesterday last week in the lecture that there is the simple
00:12:51.000 --> 00:12:56.000
exercise that you just check that this representation seven is exactly equal to
00:12:56.000 --> 00:13:03.000
the vector representation six which i have just gone through so please do this check in order to
00:13:03.000 --> 00:13:09.000
make sure that you understood well what i was talking about now i would now
00:13:09.000 --> 00:13:16.000
like to move on and depart from the incinerator example i will return to it a little
00:13:16.000 --> 00:13:22.000
later but now talk about the difference in differences estimator somewhat more uh generally
00:13:22.000 --> 00:13:31.000
and here before i move into the formulas let us just recall what we know from
00:13:31.000 --> 00:13:37.000
the review of basic econometrics that when we have an ols regression of the typical linear model
00:13:37.000 --> 00:13:45.000
y is equal to x beta plus u then we know that the estimate of the uh error terms so the
00:13:45.000 --> 00:13:52.000
residuals the estimated residuals uh u hat are always orthogonal to the regressor matrix x so
00:13:52.000 --> 00:13:59.000
that was a property of the ols estimation which we have widely discussed uh we used this type
00:13:59.000 --> 00:14:07.000
of formula here which i have just reproduced for convenience so x prime times u hat is the same
00:14:07.000 --> 00:14:15.000
thing as x prime times this m matrix i minus x x plus times the vector of dependent variables y
00:14:15.000 --> 00:14:23.000
and that is always zero this we have proven and it implies that each column of x is orthogonal
00:14:24.000 --> 00:14:32.000
to u hat orthogonality in vector notation or matrix notation means that the inner product is
00:14:32.000 --> 00:14:39.000
zero so each column of x be multiplied by the u hat vector has a value of zero by construct
00:14:40.000 --> 00:14:47.000
and we are going to make use of this fact now in order to understand better why the
00:14:47.000 --> 00:14:53.000
difference-in-differences estimator is called difference-in-differences estimator so let's go back first
00:14:53.000 --> 00:15:01.000
to the regression which we had in the incinerator example uh for 1978 but now with the understanding
00:15:01.000 --> 00:15:07.000
that while i still use the notation of the incinerator example what i say here is actually
00:15:07.000 --> 00:15:14.000
completely general and would apply to any type of estimation and any kind of data set which has the
00:15:14.000 --> 00:15:19.000
same type of features as the incinerator example so just for convenience in order to avoid
00:15:19.000 --> 00:15:27.000
introducing a new notation i still talk about n 78 and gamma 1 78 and so forth um but since the
00:15:27.000 --> 00:15:34.000
regressor matrix is basically the same in all kinds of settings it just consists of zeros
00:15:34.000 --> 00:15:41.000
and ones obviously the context is much more general uh than uh for the specific example so
00:15:42.000 --> 00:15:49.000
in the incinerator example we had as a first regression so our first attempt to attack the
00:15:49.000 --> 00:15:56.000
problem a regression where we just regressed the 78 prices so we have ignored the 1981 prices so far
00:15:57.000 --> 00:16:04.000
just regressed the 78 prices on a constant and on a dummy variable for well how close uh the
00:16:04.000 --> 00:16:09.000
property is to the incinerator and then we estimated a regression here are already the
00:16:09.000 --> 00:16:17.000
estimated coefficients denoted by hats and here is the residual now as i just pointed out we know
00:16:17.000 --> 00:16:25.000
each column in the regressor matrix is orthogonal to the residual so the iota is orthogonal to
00:16:25.000 --> 00:16:33.000
u hat 78 and the same thing is true for the n 78 and what i want to show you is that the
00:16:34.000 --> 00:16:44.000
parameter of interest this delta one hat can actually be represented as the difference
00:16:44.000 --> 00:16:51.000
between average prices so i'm now starting to compute average prices and the first thing we
00:16:51.000 --> 00:16:56.000
have to do when we compute average prices is that we sum all the prices that we add them all up
00:16:56.000 --> 00:17:04.000
right so iota prime times p 78 is just the sum of all the 78 prices clearly
00:17:05.000 --> 00:17:10.000
and for p 78 i can of course use the right hand side of this expression here of expression
00:17:11.000 --> 00:17:17.000
three so i can write this the same thing as iota prime now the regressor matrix and here the
00:17:17.000 --> 00:17:25.000
vector of estimated coefficients plus zero because iota prime times u hat 78 is zero due to
00:17:25.000 --> 00:17:31.000
this orthogonality property right so this is where this zero here comes from and then in the next
00:17:31.000 --> 00:17:39.000
line also this zero here i just use this orthogonality property that the residuals are orthogonal
00:17:39.000 --> 00:17:45.000
to the columns of the regressor matrix so we just need to compute this regression here
00:17:46.000 --> 00:17:54.000
now multiply the first column of x by iota prime so iota prime iota is obviously n the total number
00:17:54.000 --> 00:18:01.000
of observations we have in 1978 remember i made the somewhat uh peculiar but completely innocent
00:18:01.000 --> 00:18:10.000
assumption that we have the same number of houses being sold
00:18:10.000 --> 00:18:16.000
in 1978 and 1981 and also the same number of houses being close to the incinerator
00:18:17.000 --> 00:18:24.000
in 1978 and in 1981 this facilitates notation here and nothing else and doesn't change any
00:18:24.000 --> 00:18:30.000
essence of the problem so we know this here is the number of houses being sold the total number
00:18:30.000 --> 00:18:39.000
of houses being sold namely iota prime iota and iota prime times the second column here so times n 78
00:18:39.000 --> 00:18:45.000
counts the number of houses which are close to the incinerator this number we had denoted as n
00:18:45.000 --> 00:18:54.000
index nr for near so they are the houses near the future incinerator so this product
00:18:54.000 --> 00:19:00.000
here is just total number of houses and then houses close to the incinerator multiplied by
00:19:00.000 --> 00:19:08.000
the vector of coefficients gives us of course n times gamma hat 0 78 plus n nr times gamma hat 1
00:19:08.000 --> 00:19:15.000
78 now we leave this equation as it is and do the same operation for the second column of the
00:19:15.000 --> 00:19:28.000
regressor matrix so now we multiply n 78 prime p 78 and replace the p 78 the prices in 1978
00:19:28.000 --> 00:19:34.000
again by the right hand side of equation three so we get n 78 prime times the regressor matrix
00:19:34.000 --> 00:19:40.000
times the vector of estimated coefficients plus the zero due to the orthogonality condition
00:19:41.000 --> 00:19:48.000
now what is this product here n 78 prime iota counts the number of houses which are
00:19:48.000 --> 00:19:55.000
near the incinerator right which are not farther away than three miles from the incinerator
00:19:55.000 --> 00:20:06.000
site so this is just n nr the number of houses near the incinerator and n 78 prime n 78
00:20:06.000 --> 00:20:14.000
is the same number so we can just write this as n near times the sum of the two regression
00:20:14.000 --> 00:20:23.000
coefficients gamma hat 0 and gamma hat 1 in 1978 so now let's just subtract equation nine
00:20:24.000 --> 00:20:29.000
from equation eight both on the left hand side and on the right hand side of the equation
00:20:29.000 --> 00:20:36.000
so when i subtract the n 78 prime p 78 from this expression here i get
00:20:37.000 --> 00:20:43.000
the iota vector minus the n 78 vector prime times p 78 on the left hand side of the equation
00:20:45.000 --> 00:20:51.000
and on the right hand side of the equation we see that the term n near times gamma hat 1 78
00:20:51.000 --> 00:21:00.000
cancels right because i also have n near times gamma hat 1 78 here and therefore i just get n
00:21:00.000 --> 00:21:10.000
minus n near times gamma hat 0 78 well obviously the total number of houses minus the houses close
00:21:10.000 --> 00:21:16.000
to the incinerator is the number of houses far from the incinerator so this is n far right
00:21:17.000 --> 00:21:21.000
which means that on the right hand side we have the number of houses far from the incinerator times
00:21:21.000 --> 00:21:32.000
gamma hat 0 78 and what do we have here well here we count all the prices of houses far from
00:21:32.000 --> 00:21:40.000
the incinerator because iota is a vector of ones and n 78 is a vector of ones and zeros
00:21:40.000 --> 00:21:48.000
so when n 78 is one so when the property is close to the incinerator it's near the incinerator
00:21:48.000 --> 00:21:52.000
then this difference would be zero so nothing is being counted because the price has been
00:21:52.000 --> 00:21:59.000
multiplied by zero but if n 78 is zero then the house is far from the incinerator site
00:21:59.000 --> 00:22:04.000
so i would have a one here and i sum all the prices of houses which are far from the incinerator
00:22:05.000 --> 00:22:10.000
so this here is the sum of the prices of all the houses sold in 1978 which are far from the incinerator
00:22:11.000 --> 00:22:17.000
if i divide this equation now by n far i get the average price of houses which are
00:22:17.000 --> 00:22:22.000
far from the incinerator and that's apparently measured by gamma hat 0 78
00:22:24.000 --> 00:22:32.000
so we conclude gamma hat 0 78 is the average of 1978 prices for all the houses which are far
00:22:32.000 --> 00:22:39.000
from the incinerator we can denote this expression by p bar 78 far right it's the
00:22:39.000 --> 00:22:50.000
average of the far houses in 1978 now what can we make out of equation nine which was the equation
00:22:50.000 --> 00:22:59.000
which we just subtracted from equation eight well nine implies that the sum of gamma
00:22:59.000 --> 00:23:07.000
hat 0 78 and gamma hat 1 78 is the average of 1978 house prices for houses near the
00:23:07.000 --> 00:23:16.000
incinerator right because well look at this equation again we have here the number of
00:23:16.000 --> 00:23:23.000
houses which are near the incinerator and here apparently we count or add up all the prices
00:23:23.000 --> 00:23:30.000
which are prices for houses near the incinerator because n 78 is one just for the houses which
00:23:30.000 --> 00:23:35.000
are close to the incinerator so that's the sum of all the prices which have been obtained for
00:23:35.000 --> 00:23:40.000
houses close to the incinerator if i divide this by the number of such houses which have been sold
00:23:40.000 --> 00:23:47.000
then i see that the sum of these two coefficients here is just the average price for houses which
00:23:47.000 --> 00:23:54.000
are close to the incinerator site and i denote this then as p bar 78 near
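[Editor's sketch] The two results just derived (gamma hat 0 78 is the average price of far houses, and gamma hat 0 78 plus gamma hat 1 78 is the average price of near houses) can be verified on toy numbers. This is a minimal Python/NumPy sketch with made-up prices, not the lecture's data; the variable names are chosen here only to mirror the spoken notation.

```python
import numpy as np

# Made-up 1978 prices and the 0/1 "near the incinerator" dummy (hypothetical data)
p78 = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 100.0])
n78 = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
iota = np.ones_like(p78)

# Regress prices on a constant and the near dummy
X = np.column_stack([iota, n78])
gamma_hat, *_ = np.linalg.lstsq(X, p78, rcond=None)
gamma0_hat, gamma1_hat = gamma_hat

# Counts obtained as inner products, as in the derivation
n = iota @ iota          # total number of houses
n_nr = iota @ n78        # number of houses near the incinerator
n_fr = n - n_nr          # number of houses far from the incinerator

# Group averages obtained by summing prices with 0/1 weights
p_bar_far = (iota - n78) @ p78 / n_fr
p_bar_near = n78 @ p78 / n_nr

print(gamma0_hat, p_bar_far)                 # intercept vs. far average
print(gamma0_hat + gamma1_hat, p_bar_near)   # sum of coefficients vs. near average
```

With these toy numbers the far average is 85 and the near average is 55, so the dummy coefficient should come out at about minus 30.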
00:23:56.000 --> 00:24:04.000
now when i subtract equation 11 from equation 12 then i get that gamma 1
00:24:05.000 --> 00:24:13.000
hat 78 so this coefficient here is the same thing as the difference between two average prices
00:24:14.000 --> 00:24:24.000
right subtract equation 11 from equation 12 then we see that the gamma hat 0 78
00:24:24.000 --> 00:24:32.000
cancels right and i'm just left with gamma hat 1 78 so with this thing here which apparently
00:24:32.000 --> 00:24:39.000
then is the difference between the average price for houses near the incinerator and the
00:24:39.000 --> 00:24:45.000
average price for houses which are far from the incinerator both in 1978
00:24:46.000 --> 00:24:55.000
so this is our interpretation of gamma hat 1 78 equation 11 gives us an interpretation of gamma hat 0
00:24:55.000 --> 00:25:06.000
78 and this gives us the interpretation for gamma hat 1 78 right the difference between
00:25:06.000 --> 00:25:11.000
the two is of course the price change due to the fact that the house is close to the incinerator
00:25:11.000 --> 00:25:19.000
site before the incinerator was actually known to be built now we get of course the same
00:25:19.000 --> 00:25:25.000
results if we do the same operations for the 1981 regression so for the regression where we just
00:25:25.000 --> 00:25:32.000
regress the 1981 prices on a constant and on a dummy variable for being close or not close
00:25:32.000 --> 00:25:39.000
to the incinerator site so then we get that gamma hat 1 81 is the difference between the
00:25:39.000 --> 00:25:47.000
average price in 81 for houses near the incinerator minus the average price for houses far from the
00:25:47.000 --> 00:25:54.000
incinerator and now we know that the difference-in-differences estimator was the difference
00:25:54.000 --> 00:26:05.000
between gamma hat 1 81 and gamma hat 1 78 so delta one hat the parameter we are actually interested
00:26:05.000 --> 00:26:12.000
in is the difference between this parameter gamma hat 1 81 which we have computed here
00:26:13.000 --> 00:26:20.000
and the other parameter which we had just computed on the previous slide so it is the difference
00:26:20.000 --> 00:26:26.000
between the differences of average prices and that's where the name comes from right
00:26:27.000 --> 00:26:37.000
so it's p81 near average price minus p81 far average price minus in parentheses the difference
00:26:37.000 --> 00:26:47.000
between the same type of prices in 1978 this expression here applies to all estimators
00:26:47.000 --> 00:26:56.000
so to all DID estimators whose regressor matrix is of the said structure so just with zeros and
00:26:56.000 --> 00:27:03.000
ones for those dummy variables constant and the dummy variables of course for time
00:27:03.000 --> 00:27:09.000
and for treatment namely the question whether it is close or not close to the incinerator
00:27:11.000 --> 00:27:16.000
if we have additional covariates the simple decomposition does not apply anymore right
00:27:16.000 --> 00:27:24.000
but as an intuition it is always useful to think of just the average prices and the difference
00:27:24.000 --> 00:27:30.000
between average prices i see there's a question coming in because somebody raised his hand
00:27:31.000 --> 00:27:34.000
so i will wait for a minute until the question is there
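[Editor's sketch] The claim that the interaction coefficient equals the difference of the two differences in average prices can be checked with a stacked regression. This is a minimal Python/NumPy sketch under stated assumptions: the prices are made up, the column order (constant, near dummy, 1981 dummy, product of the two) follows the regressor matrix as described in the lecture, and the names `const`, `near`, `d81`, `inter` are chosen here for illustration only.

```python
import numpy as np

# Made-up prices (hypothetical data); first two houses in each year are "near"
p78 = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 100.0])
p81 = np.array([55.0, 65.0, 95.0, 105.0, 115.0, 125.0])
n78 = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
n81 = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
iota = np.ones(6)
zeros = np.zeros(6)

# Stack the 1978 block on top of the 1981 block
p = np.concatenate([p78, p81])
const = np.concatenate([iota, iota])    # constant term
near = np.concatenate([n78, n81])       # closeness dummy
d81 = np.concatenate([zeros, iota])     # dummy for year 1981
inter = near * d81                      # product: zeros in 1978, n81 in 1981
X = np.column_stack([const, near, d81, inter])

coef, *_ = np.linalg.lstsq(X, p, rcond=None)
beta0_hat, beta1_hat, delta0_hat, delta1_hat = coef

# Difference of differences of group averages
diff_81 = p81[n81 == 1].mean() - p81[n81 == 0].mean()
diff_78 = p78[n78 == 1].mean() - p78[n78 == 0].mean()
did_means = diff_81 - diff_78

print(delta1_hat, did_means)
```

With these toy numbers the near-minus-far gap is minus 50 in 1981 and minus 30 in 1978, so both printed values should be about minus 20. The agreement relies on the regressor matrix containing only the constant and the dummies; with further covariates it would not hold exactly.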
00:27:36.000 --> 00:27:44.000
that is we have to assume that house prices are not increasing exponentially no we don't have to
00:27:44.000 --> 00:27:51.000
assume that there's no no reason to do that we will later see that we have to have an assumption
00:27:51.000 --> 00:28:00.000
which implies a common trend in both groups of prices so it is very well possible that house
00:28:00.000 --> 00:28:06.000
prices do increase exponentially but it would be important to ensure that the trend which
00:28:06.000 --> 00:28:14.000
characterizes house prices in the near group is the same as the trend which characterizes house
00:28:14.000 --> 00:28:22.000
prices in the far group actually it need not even be a trend which is monotonic it could be that
00:28:22.000 --> 00:28:27.000
they have just the same underlying movement right or could be a movement which is non-monotonic
00:28:27.000 --> 00:28:33.000
and whatever happens as long as it is the same in both groups of prices we have no problem with
00:28:33.000 --> 00:28:40.000
the DID approach because the common component would cancel by taking the difference of prices
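[Editor's sketch] The answer to the question, that any movement common to both groups cancels in the difference, can be illustrated with toy numbers. This is a minimal Python/NumPy sketch, not the lecture's data: the prices, the helper name `did_from_means` and the chosen shift values are all hypothetical, and the common movement is modelled as an additive shift applied to every 1981 price.

```python
import numpy as np

def did_from_means(p78_near, p78_far, p81_near, p81_far):
    """Difference-in-differences computed from the four group averages."""
    diff_81 = np.mean(p81_near) - np.mean(p81_far)
    diff_78 = np.mean(p78_near) - np.mean(p78_far)
    return diff_81 - diff_78

# Made-up prices for the four groups (hypothetical data)
p78_near = np.array([50.0, 60.0])
p78_far = np.array([70.0, 80.0, 90.0, 100.0])
p81_near = np.array([55.0, 65.0])
p81_far = np.array([95.0, 105.0, 115.0, 125.0])

baseline = did_from_means(p78_near, p78_far, p81_near, p81_far)

# Add the same shift c to ALL 1981 prices, near and far alike.
# The shift may be large, small or negative; it cancels in diff_81.
shifted = [
    did_from_means(p78_near, p78_far, p81_near + c, p81_far + c)
    for c in (10.0, -7.5, 1000.0)
]

print(baseline, shifted)
```

Every entry of `shifted` should equal `baseline`. If the shift differed between the near and the far group, the estimate would change, which is why the common-trend assumption matters.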
00:28:42.000 --> 00:28:44.000
all right good
00:28:46.000 --> 00:28:55.000
now this property makes the DID framework very flexible and we can expand it for instance we
00:28:55.000 --> 00:29:00.000
can include further explanatory variables and we will do this in the incinerator example in just
00:29:00.000 --> 00:29:07.000
a minute of course everything i said already in the lecture about the issue of bad controls
00:29:07.000 --> 00:29:14.000
also applies for possible covariates which we may include in the DID framework so we have to think
00:29:14.000 --> 00:29:18.000
a little bit about which variables we use as covariates but in principle there's no
00:29:19.000 --> 00:29:27.000
problem in including further explanatory variables and as i will also show you we can easily extend
00:29:27.000 --> 00:29:32.000
the framework to multiple time periods so far we have just compared two points in time but we can
00:29:32.000 --> 00:29:40.000
do this with many points in time now let's first move to the first issue how would we proceed when
00:29:40.000 --> 00:29:48.000
we include further explanatory variables and why actually should we do this well one reason is of
00:29:48.000 --> 00:29:56.000
course that when we have more explanatory variables then the error term shrinks so its standard
00:29:56.000 --> 00:30:01.000
deviation and its variance shrink and this means that we get smaller standard errors of the
00:30:01.000 --> 00:30:08.000
coefficients because recall from review of basic econometrics the covariance matrix of the
00:30:09.000 --> 00:30:17.000
estimated coefficients is sigma hat squared times x prime x inverse and the sigma hat squared is
00:30:17.000 --> 00:30:22.000
the estimated variance of the residuals so when the residuals become smaller obviously the sigma
00:30:22.000 --> 00:30:27.000
hat squared term becomes smaller and therefore the standard errors of the estimated coefficients
00:30:27.000 --> 00:30:35.000
become smaller because the x prime x doesn't change and there's a second important reason to
00:30:35.000 --> 00:30:43.000
include covariates as you will see in a minute also in the example because interpretation is
00:30:43.000 --> 00:30:54.000
much easier since it may well be that price differences have several causes and we only
00:30:54.000 --> 00:31:00.000
are interested in the cause of the incinerator site so we should control for everything else
00:31:00.000 --> 00:31:08.000
which perhaps explains different price developments so we can distinguish the different causal effects
00:31:08.000 --> 00:31:14.000
much more easily or actually only then we can distinguish the different causal effects
00:31:14.000 --> 00:31:25.000
when we have further covariates. Now here's another recap and I will go through this rather
00:31:25.000 --> 00:31:31.000
quickly because basically I've said that already for the incinerator example you see here the
00:31:31.000 --> 00:31:37.000
regression equation in scalar formulation which I started the lecture with or in the last lecture
00:31:37.000 --> 00:31:45.000
with so nothing new and the interpretation is very simple the constant term the beta naught
00:31:45.000 --> 00:31:54.000
is the average house price in 1978 and far from the incinerator site right because when n i is
00:31:54.000 --> 00:32:01.000
zero then we are far so then this doesn't play a role and when the d is zero then we are in 1978
00:32:01.000 --> 00:32:09.000
and obviously then this product is also zero so for houses far from the incinerator and in 1978
00:32:09.000 --> 00:32:14.000
we just have the beta naught and therefore beta naught is the measure of the average house price
00:32:14.000 --> 00:32:21.000
in 1978 for houses far from the incinerator site and then clearly the delta naught captures the
00:32:21.000 --> 00:32:29.000
average change in all housing values from 1978 to 1981 right this just then adds to the beta naught
00:32:29.000 --> 00:32:41.000
here when time progresses when we move from 1978 to 1981 and beta 1 measures
00:32:41.000 --> 00:32:49.000
the location effect that is not yet due to the presence of the incinerator so if n i is equal
00:32:49.000 --> 00:32:56.000
to 1 then the beta 1 will come into play and that will also be the case in 1978 when nothing is
00:32:56.000 --> 00:33:02.000
yet known about the incinerator so this is the location effect that is not due to the presence
00:33:02.000 --> 00:33:06.000
of the incinerator that answers the question whether land close to the future incinerator
00:33:06.000 --> 00:33:13.000
site was already less valuable in 1978 than land elsewhere in North Andover and then of course delta
00:33:13.000 --> 00:33:19.000
1 measures the decline in housing values due to the new incinerator so that's the coefficient
00:33:19.000 --> 00:33:28.000
of the interaction effect as we have discussed obviously the key assumption for this interpretation
00:33:28.000 --> 00:33:34.000
is that the average house prices in 1978 and in 1981 are not different for any other reason
00:33:35.000 --> 00:33:43.000
than first the passage of time and second the construction of the incinerator but very clearly
00:33:43.000 --> 00:33:52.000
this need not be the case and we can only reasonably compare the averages if the composition
00:33:52.000 --> 00:33:59.000
of the sample of houses in 1978 is the same as the composition of the sample of houses in 1981
00:34:00.000 --> 00:34:07.000
now we know this is basically a random sampling which we do here we have no control over which
00:34:07.000 --> 00:34:12.000
houses have been sold in 1978 and which houses have been sold in 1981 and it may well be that
00:34:12.000 --> 00:34:22.000
the houses in 1978 are well differently equipped than the houses in 1981 for instance so that the
00:34:22.000 --> 00:34:27.000
distribution of the characteristics of the houses is not the same and that would then of course
00:34:27.000 --> 00:34:37.000
jeopardize the validity of our comparison so for instance suppose that the houses sold in 1978
00:34:38.000 --> 00:34:44.000
were at the time of the selling of course on average older than the houses sold in 1981
00:34:45.000 --> 00:34:51.000
then it is clear that the average house price will also be different due to the difference
00:34:52.000 --> 00:35:00.000
in age between 1978 and 1981 houses so let's say houses sold in 1978 were on average 15 years old
00:35:00.000 --> 00:35:06.000
and houses sold in 1981 were on average just 10 years old both of them at the time of their
00:35:06.000 --> 00:35:11.000
selling right then clearly you would expect that the newer houses are more valuable and achieve a
00:35:11.000 --> 00:35:17.000
higher price so the price difference would not only be due to the incinerator but would also be due
00:35:17.000 --> 00:35:25.000
to age and that of course asks then for further covariates in the regression same thing would
00:35:25.000 --> 00:35:32.000
apply to the size of houses so if 1978 houses were smaller for instance than 1981 houses then again
00:35:32.000 --> 00:35:36.000
we would need to control for the size of the houses in order to have a valid comparison here
00:35:40.000 --> 00:35:49.000
then we should look for covariates which may explain such differences in prices which are not
00:35:49.000 --> 00:35:55.000
due to the incinerator and people who have worked with this data set have actually found a number
00:35:55.000 --> 00:36:02.000
of covariates which seem to be rather reasonable covariates for instance they have information on
00:36:02.000 --> 00:36:11.000
the age of each house that has been sold in years right or to the location with respect to
00:36:11.000 --> 00:36:17.000
other aspects than just the incinerator site in this case they measure the distance to the
00:36:17.000 --> 00:36:22.000
interstate i'm not sure whether people want to live close to an interstate but perhaps
00:36:23.000 --> 00:36:29.000
someone would appreciate that so i don't really know what they expect with a different distance
00:36:29.000 --> 00:36:35.000
to the interstate whether they expect a positive or negative sign here because some people could
00:36:35.000 --> 00:36:40.000
find that an interstate close to one's home is actually a nuisance and others would enjoy
00:36:40.000 --> 00:36:49.000
easy access to the interstate and fast travel to work and things like this but we can leave this
00:36:49.000 --> 00:36:57.000
open that is probably something which is relevant for prices when a house is being sold how far
00:36:57.000 --> 00:37:02.000
the interstate is then of course the question is how big is actually the property what is the land
00:37:02.000 --> 00:37:10.000
area and what is the house area how much space do you have in your house the number of rooms
00:37:10.000 --> 00:37:17.000
is also an example that we have information about that and same with the number of baths right so
00:37:19.000 --> 00:37:24.000
the regression has been re-run and i will now show you three different specifications of the
00:37:25.000 --> 00:37:30.000
regression where all of these covariates just enter linearly with the exception of age
00:37:31.000 --> 00:37:40.000
age also enters as a squared term to allow for some kind of concave form of depreciation in
00:37:40.000 --> 00:37:48.000
the value of houses yeah let's just move to these three specifications here
00:37:48.000 --> 00:37:57.000
uh so um independent variable here is a little misleading this means of course the dependent
00:37:57.000 --> 00:38:04.000
variable is always the column header here in column one two and three i mean the dependent
00:38:04.000 --> 00:38:09.000
variable namely the house prices are being explained sorry no i'm not what am i talking
00:38:09.000 --> 00:38:14.000
about independent variable is correct here and these are the explanatory variables sorry i was
00:38:15.000 --> 00:38:19.000
uh confusing dependent and independent variable no of course not these are the
00:38:19.000 --> 00:38:27.000
independent variables and we have the constant here as usual we have uh the year 81 dummy variable
00:38:27.000 --> 00:38:33.000
which indicates whether the house is being sold in uh 81 we have the dummy variable for near
00:38:33.000 --> 00:38:41.000
incinerator um here so this is the n 78 or n 81 variable and then we have here the cross product
00:38:42.000 --> 00:38:48.000
of the time dummy variable and uh the location dummy variable which gives rise to our
00:38:48.000 --> 00:38:55.000
uh coefficient of interest delta one hat and here's just one column note for all the other
00:38:55.000 --> 00:39:04.000
controls right and and we see how the regression results for the four coefficients which we have
00:39:04.000 --> 00:39:12.000
discussed change when we increase um other control when we add other controls so in the
00:39:12.000 --> 00:39:18.000
first column no covariates have been added in the second column just age and the squared age term
00:39:18.000 --> 00:39:26.000
have been added and here the full set of of covariates including land area house area
00:39:26.000 --> 00:39:33.000
number of baths has been added and we can then just see how uh the regression results change
00:39:34.000 --> 00:39:39.000
first thing to look at here is the r squared now we see in the first regression
00:39:39.000 --> 00:39:47.000
without covariates we just have 17 percent of observed variance being explained um here is 41
00:39:47.000 --> 00:39:53.000
percent so sizably more and here's actually 66 percent so already two thirds of all of the housing
00:39:53.000 --> 00:39:59.000
price variance can be explained when we consider this full set of covariates so this is a very
00:39:59.000 --> 00:40:06.000
impressive increase actually in terms of r squared number of observations as you see is always
00:40:07.000 --> 00:40:15.000
the same now and let's look at the constant the constant term with the interpretation that I
00:40:15.000 --> 00:40:22.000
already mentioned as being the average price for 1978 houses which are far from the incinerator
00:40:23.000 --> 00:40:36.000
changes from 82,500 to 89,000 here and then down to 13,800 there that is something which is perhaps
00:40:36.000 --> 00:40:41.000
surprising at first sight and I will actually ask you now perhaps I should ask you right away
00:40:42.000 --> 00:40:51.000
why do you think that in the third equation this constant is so much less than the constant in
00:40:51.000 --> 00:40:59.000
equation two and in equation one and also note that the standard error here is much larger
00:41:00.000 --> 00:41:08.000
than it was in equation two and in equation one it's 2,700 2,400 it's 11,000 right so you have
00:41:08.000 --> 00:41:14.000
any idea think about it for a minute while I discuss the other uh results and you can raise
00:41:14.000 --> 00:41:21.000
your hand when i raise the question again all right so that's perhaps surprising at first sight
00:41:21.000 --> 00:41:30.000
even though there's a very natural explanation for this effect here is the year 81 dummy so this
00:41:30.000 --> 00:41:38.000
dummy basically tells us how house prices have increased between 1978 and 1981 for houses which
00:41:39.000 --> 00:41:48.000
were not affected by the incinerator so that was in our first regression 18,790 it's 21,000
00:41:48.000 --> 00:41:56.000
a little different but not gravely so here it is quite a bit less about 14,000 observe that the
00:41:56.000 --> 00:42:04.000
standard error goes down here from one column to the other so unlike what we saw for the constant
00:42:04.000 --> 00:42:10.000
we see that this coefficient is estimated more and more precisely like this standard
00:42:10.000 --> 00:42:17.000
error being just 2,800 whereas this here is 4,000 so quite a decrease in standard errors
00:42:18.000 --> 00:42:26.000
being near to the incinerator also has a decrease in standard errors from one to two and to three
00:42:26.000 --> 00:42:35.000
even though it is not as dramatic but there is a dramatic effect on the size and the sign
00:42:35.000 --> 00:42:45.000
of the coefficient when we include further covariates so here we had negative 18,000 or
00:42:45.000 --> 00:42:53.000
actually negative 19,000 this moves down to 9,000 when we control just for age and it goes down to
00:42:53.000 --> 00:43:04.000
3,800 when we control for the full set of covariates here and now yeah perhaps
00:43:07.000 --> 00:43:15.000
you can also think about how do we interpret this result here the 3,800
00:43:15.000 --> 00:43:21.000
in relation to the initial result of negative 19,000 you have an explanation
00:43:22.000 --> 00:43:27.000
for that if so then I will come up with this question again in a minute
00:43:27.000 --> 00:43:33.000
and you can already think about it whether you have a good explanation for that and here's our
00:43:33.000 --> 00:43:38.000
delta right this is the coefficient we are actually interested in you recall we estimate
00:43:38.000 --> 00:43:48.000
this initially at 11,800 now note I had raised the question again actually at the beginning
00:43:48.000 --> 00:43:54.000
of this lecture whether this effect is significant or whether we know that it is significant we had
00:43:54.000 --> 00:44:02.000
not yet computed the standard error for the 11,800 here and now we see that the coefficient is
00:44:02.000 --> 00:44:09.000
actually not significant in the first regression because the standard error is 7,500 so clearly the
00:44:09.000 --> 00:44:17.000
coefficient is less than twice the standard error and from regression one we would not say anymore
00:44:17.000 --> 00:44:24.000
that the harm inflicted on house owners close to the incinerator is in the order of 12,000 dollars
00:44:24.000 --> 00:44:30.000
but we would say it's in the order of zero right because the coefficient is not significant if we
00:44:30.000 --> 00:44:38.000
just look at equation one and in equation two however when we control for age with this concave
00:44:38.000 --> 00:44:46.000
modeling here then we see that the effect is indeed significant and it is even much larger
00:44:46.000 --> 00:44:51.000
damage inflicted on house owners by the incinerator is now estimated to be even much larger
00:44:51.000 --> 00:45:00.000
than in the first regression here it's now almost 22,000 dollars but this again is not robust because
00:45:00.000 --> 00:45:08.000
when we include all the covariates here then the effect is estimated at some 14,000 dollars of
00:45:08.000 --> 00:45:14.000
damage so between the initial point estimate here and the second point estimate here we are now at
00:45:14.000 --> 00:45:21.000
negative 14,000 the standard error is again much smaller than in the former estimate so this is the
00:45:21.000 --> 00:45:28.000
most precise estimate that we can get and this is clearly significant because the t-statistic is
00:45:28.000 --> 00:45:35.000
almost three as you see right so we would now conclude that house owners near the incinerator
00:45:35.000 --> 00:45:42.000
have had damage from the construction of the incinerator on their personal wealth in the order
00:45:42.000 --> 00:45:52.000
of 14,000 dollars so that would be the final conclusion now and let me perhaps come to the
00:45:52.000 --> 00:46:01.000
first question why is the constant in regression three so much smaller than the estimated constants
00:46:01.000 --> 00:46:10.000
in regressions one and two i see already three comments here let me see whether one of the
00:46:12.000 --> 00:46:15.000
comments answers already this question
00:46:17.000 --> 00:46:23.000
there's one question i will come back to later that's not an answer to my question
00:46:24.000 --> 00:46:31.000
houses near the incinerator in 1978 are more valuable but are for example on average smaller
00:46:31.000 --> 00:46:35.000
or older this could be the case because the incinerator is built in the central region
00:46:35.000 --> 00:46:44.000
for example yeah this is an answer to the question why we have this price change here near incinerator
00:46:46.000 --> 00:46:54.000
example at the near incinerator regressor there the explanation provided by one of you is and that
00:46:54.000 --> 00:47:03.000
is corrected by controlling for age apparently we see that this price difference here of negative
00:47:03.000 --> 00:47:12.000
19 000 was overwhelmingly due to the fact that houses close to the incinerator are on average
00:47:12.000 --> 00:47:18.000
older than houses farther away from the incinerator and this is the right explanation
00:47:19.000 --> 00:47:25.000
here assuming that old houses sell for lower prices than new houses obviously it can in some instances
00:47:25.000 --> 00:47:31.000
also be different but i think that's the natural assumption to make that there are no let's say
00:47:31.000 --> 00:47:38.000
important historic monuments which would actually sell higher with higher age and but that this
00:47:38.000 --> 00:47:43.000
basically the age effect is an effect of depreciation and therefore by not controlling
00:47:43.000 --> 00:47:54.000
for age we have completely wrongly estimated the um closeness to the incinerator to be the
00:47:54.000 --> 00:48:01.000
root of lower house prices in fact now it appears that property close to the incinerator
00:48:02.000 --> 00:48:10.000
is or was more valuable in 1978 um when we control for the age of of the houses
00:48:10.000 --> 00:48:17.000
than it was for houses farther from the incinerator and as the student
00:48:17.000 --> 00:48:21.000
correctly pointed out this could be the case because the incinerator is in some let's say
00:48:21.000 --> 00:48:29.000
older or central region of town where property is in general actually worth more than in
00:48:30.000 --> 00:48:37.000
more peripheral areas of North Andover okay that is not yet a question though so this was the
00:48:37.000 --> 00:48:41.000
answer to my second question it is not yet the answer to our first question
00:48:41.000 --> 00:48:49.000
about the reduced coefficient for the constant um and the answer here is i think it's all right
00:48:49.000 --> 00:48:53.000
it is not significantly different from zero because everything is controlled for the constant
00:48:53.000 --> 00:48:59.000
three gives the price for a house when all controls are zero and yes and that is completely
00:48:59.000 --> 00:49:09.000
correct right since we have here the full set of regressors um counting let's say land area
00:49:09.000 --> 00:49:17.000
and house area and age of course and and the number of baths and the number of rooms
00:49:18.000 --> 00:49:26.000
this coefficient here would actually measure the price of a house which has no land area
00:49:26.000 --> 00:49:33.000
which has no area to live in which has no rooms at all which has no baths at all right
00:49:33.000 --> 00:49:38.000
and for such a house i think it would be reasonable to say that the price should be zero
00:49:38.000 --> 00:49:43.000
and that is exactly what we find here right because uh this coefficient is not even significant
00:49:43.000 --> 00:49:50.000
we have some $14,000 here but the standard error is $11,000 so the coefficient is insignificant
00:49:50.000 --> 00:49:56.000
actually we should think of this coefficient here as being uh zero and this is the appropriate
00:49:56.000 --> 00:50:03.000
price for a house which has no rooms no baths no land to live on and so forth okay
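A minimal sketch of the point about the constant, using synthetic data rather than the incinerator data set. The variable `area`, the slope of 5.0 and all other numbers are assumptions chosen only for illustration. Without covariates the constant is simply the average price; once a covariate whose values lie far from zero is included, the constant becomes the predicted price at covariate value zero, and its standard error rises.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical house area, with values far away from zero.
area = rng.normal(20.0, 2.0, size=n)
# Hypothetical prices: a house with zero area would be worth zero.
price = 5.0 * area + rng.normal(0.0, 5.0, size=n)


def ols_with_se(y, X):
    """Return OLS coefficients and their conventional standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))


ones = np.ones(n)

# Regression on a constant only: the constant equals the average price.
coef_a, se_a = ols_with_se(price, ones[:, None])

# Regression on a constant and area: the constant is the predicted
# price of a house with area zero.
coef_b, se_b = ols_with_se(price, np.column_stack([ones, area]))

print("constant without covariates:", coef_a[0], "se:", se_a[0])
print("constant with area:         ", coef_b[0], "se:", se_b[0])
```

In this sketch the second constant is an extrapolation to a point far outside the observed data, which is one reason why its standard error is so much larger than that of the plain average.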
00:50:04.000 --> 00:50:08.000
what is interesting is actually the fact that the standard error moves up so strongly
00:50:08.000 --> 00:50:15.000
so we have 2,700 here 2,400 here and then it moves up to 11,200
00:50:16.000 --> 00:50:24.000
which is only rarely the case when you get more precise estimates right in general estimates
00:50:24.000 --> 00:50:29.000
have become more precise as we see when we look at these standard errors here so the residual
00:50:29.000 --> 00:50:36.000
variance is actually much lower not surprising because the r squared is much higher and the fact
00:50:36.000 --> 00:50:43.000
that here the standard error moves um up so greatly suggests actually that we also have a
00:50:43.000 --> 00:50:51.000
problem of collinearity here so perhaps in this full set of regressors we have one
00:50:51.000 --> 00:50:58.000
regressor which is almost constant not exactly of course because otherwise the regression wouldn't
00:50:58.000 --> 00:51:05.000
run but perhaps almost constant so that we operate with almost two constants here and then we have
00:51:05.000 --> 00:51:11.000
of course the problem of collinearity and standard errors may explode as i've also explained to you
00:51:11.000 --> 00:51:19.000
in the review of basic econometrics but i haven't checked this whether we can identify this type of
00:51:19.000 --> 00:51:29.000
control good and on the next slide i just asked you to interpret these results and i've done that
00:51:29.000 --> 00:51:38.000
already for you and i've also provided to you the answers um one particular answer i think is
00:51:38.000 --> 00:51:44.000
important i mentioned it already but i repeat it again for convenience the land close to the site
00:51:44.000 --> 00:51:52.000
was not significantly more valuable in 1978 so it is exactly the opposite of what the
00:51:52.000 --> 00:51:58.000
result in column one suggested and i had not properly phrased this in the previous version of
00:51:58.000 --> 00:52:06.000
the slides i corrected that because we can also not conclude that it's more valuable when the
00:52:06.000 --> 00:52:12.000
estimated coefficient is insignificant so the only thing we can say is that the land close to
00:52:12.000 --> 00:52:22.000
the site was not significantly more valuable in 1978 okay and here was another misprint
00:52:22.000 --> 00:52:29.000
by the way i think i had 11,200 which is of course nonsense the correct value is 14,200 so our best
00:52:29.000 --> 00:52:35.000
estimate is now that house owners close to the site lost about fourteen thousand dollars in value
00:52:35.000 --> 00:52:45.000
due to the incinerator okay any further questions relating to the incinerator example or the set of
00:52:45.000 --> 00:52:49.000
regression results please raise your hand if you have any questions
00:52:54.000 --> 00:53:01.000
apparently no question good and so um what we have here is what we sometimes call a natural
00:53:01.000 --> 00:53:09.000
experiment it's not an experiment designed by somebody but it is what we call an experiment
00:53:09.000 --> 00:53:16.000
by nature or a quasi experiment it looks like an experiment but it isn't a carefully designed
00:53:16.000 --> 00:53:24.000
experiment but it's just something which happened all by itself and we can interpret it as if it
00:53:24.000 --> 00:53:29.000
were an experiment where we analyze the effect of some intervention in this case the intervention
00:53:29.000 --> 00:53:37.000
of constructing an incinerator and see what kind of effect this has on house prices such natural
00:53:37.000 --> 00:53:44.000
experiments typically occur when we have some exogenous event which changes some circumstance
00:53:44.000 --> 00:53:55.000
changes the environment in the sense that circumstantial aspects of the data change and then we can observe
00:53:55.000 --> 00:54:03.000
certain things on individuals families firms or whatever regional units and observe whether they
00:54:03.000 --> 00:54:12.000
change their behavior so we can try to identify the causal effect of the exogenous event
00:54:14.000 --> 00:54:20.000
exogeneity of course refers to the question of whether the individuals whose behavior we observe
00:54:20.000 --> 00:54:27.000
have any influence on what is happening exogenously well if it is happening exogenously
00:54:27.000 --> 00:54:34.000
then they must not have any influence on that so we would suppose that the house owners
00:54:34.000 --> 00:54:42.000
in North Andover had no say in the decision to locate the incinerator on the site where it was
00:54:42.000 --> 00:54:50.000
constructed which is well not so clear whether they had no say right i mean maybe that this was a
00:54:50.000 --> 00:54:57.000
democratic decision so each and everybody would have had some say in that and then it would not be
00:54:57.000 --> 00:55:03.000
completely exogenous but let's not discuss this further because we have no information on
00:55:03.000 --> 00:55:09.000
what kind of democratic possibilities there were to decide for an incinerator site and perhaps some
00:55:09.000 --> 00:55:17.000
other location important for a natural experiment is of course the fact that we have a
00:55:17.000 --> 00:55:22.000
control group which is not affected by the policy change and a treatment group which is affected
00:55:22.000 --> 00:55:27.000
by the policy change and in the incinerator example these two types of groups were provided to us
00:55:27.000 --> 00:55:34.000
simply by the passage of time because before the incinerator was known about we had the control
00:55:34.000 --> 00:55:40.000
group and later we had sort of the treatment group when and if we have this type of setup in
00:55:40.000 --> 00:55:46.000
the natural experiment then we can use the DID estimator in order to assess or estimate
00:55:46.000 --> 00:55:57.000
the effects of the exogenous event and there is a question which i don't completely understand
00:55:58.000 --> 00:56:06.000
but isn't this an omitted variable bias what do you mean by this does it change the interpretation
00:56:06.000 --> 00:56:11.000
of the coefficient isn't that unrealistic i'm sorry i lost the context here of your
00:56:13.000 --> 00:56:15.000
question perhaps you can provide this again
00:56:17.000 --> 00:56:22.000
i was referring to your first answer which was the answer
00:56:24.000 --> 00:56:33.000
that uh that the houses are on average uh older and you are asking is this
00:56:33.000 --> 00:56:41.000
uh um uh omitted variables bias well in fact i would say it is an omitted variables bias
00:56:42.000 --> 00:56:50.000
i mean we have omitted in this case the variable age right and this means um that age differences
00:56:50.000 --> 00:57:02.000
are either projected into uh the near incinerator variable or into the error term and then sort of the near
00:57:02.000 --> 00:57:09.000
incinerator and into the the error term and then the near incinerator um variable
00:57:12.000 --> 00:57:17.000
the near incinerator uh variable then correlates with the error term so yes i would
00:57:17.000 --> 00:57:24.000
say it's an omitted variables bias which is responsible for this misestimation of
00:57:26.000 --> 00:57:31.000
the influence of the location of the incinerator on the house prices
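The omitted variables bias described here can be illustrated with a small simulation. This is only a sketch on synthetic data, not on the incinerator data set: the variable names, the age gap of 15 years and the depreciation of 1.0 per year are all hypothetical. By construction location has no effect on price, but houses near the site are older, so leaving out age pushes the age effect into the location coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Location dummy: 1 if the house is near the site, 0 otherwise.
near = rng.integers(0, 2, size=n).astype(float)

# Assumption: houses near the site are on average 15 years older.
age = 10.0 + 15.0 * near + rng.normal(0.0, 3.0, size=n)

# Assumed true model: price depreciates with age, location has no effect.
price = 100.0 - 1.0 * age + 0.0 * near + rng.normal(0.0, 5.0, size=n)


def ols(y, *regressors):
    """OLS coefficients for a regression of y on a constant and the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


near_short = ols(price, near)[1]       # age omitted
near_long = ols(price, near, age)[1]   # age controlled for

print("near coefficient without age:", near_short)
print("near coefficient with age:   ", near_long)
```

In the short regression the location dummy picks up the depreciation of the older houses and comes out strongly negative, while it is close to its assumed true value of zero once age is included.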
00:57:32.000 --> 00:57:41.000
now apparently i uh this was the wrong question and she says
00:57:45.000 --> 00:57:53.000
because before it captures a lot uh but is it constant wait there was another answer of
00:57:53.000 --> 00:58:00.000
yours now because it captures a lot of the effects of the important characteristics of
00:58:01.000 --> 00:58:02.000
the house
00:58:08.000 --> 00:58:14.000
sorry now now i miss again the context of your reply here to which question uh do you
00:58:14.000 --> 00:58:22.000
refer to with it because before it captures a lot of the effects of the important oh i think this
00:58:22.000 --> 00:58:31.000
probably referred to the constant here right to me again yeah to the change of the constant exactly
00:58:36.000 --> 00:58:46.000
and the constant is this an omitted variables bias no no here you are right this is not an
00:58:46.000 --> 00:58:53.000
omitted variables bias because the constant doesn't correlate with uh the with the regressor
00:58:53.000 --> 00:58:58.000
with it with the error term by definition it cannot correlate with the regressor rather the
00:58:58.000 --> 00:59:04.000
constant was here defined to be the average price across all houses regardless of what other
00:59:04.000 --> 00:59:11.000
characteristics they have so um the interpretation of the constant is different and i think this was
00:59:11.000 --> 00:59:15.000
what you wrote further on below the interpretation of the constant here is different from the
00:59:15.000 --> 00:59:20.000
interpretation of the constant there and that is fine this is legitimate and that's correct
00:59:20.000 --> 00:59:27.000
to do it that way but this constant here is actually estimated in the correct way and is not
00:59:27.000 --> 00:59:33.000
biased if it is given the interpretation that is the average price of all houses regardless of what
00:59:33.000 --> 00:59:45.000
the characteristics are yes you're right there yeah okay so you agree good um this is uh that
00:59:46.000 --> 00:59:53.000
um yeah and uh now let's uh move on to other examples um apart from this
00:59:54.000 --> 01:00:01.000
incinerator uh example uh so as i already pointed out we always need a treatment group and a control
01:00:01.000 --> 01:00:08.000
group but this need not be a difference in time which defines the treatment group and the control
01:00:08.000 --> 01:00:16.000
group such as we have had it in the uh incinerator example but it can also be let's say a
01:00:16.000 --> 01:00:22.000
difference in region or let's say there are two different states right one state enacts
01:00:22.000 --> 01:00:28.000
a certain policy measure the other state doesn't uh now we clearly have different treatment for
01:00:28.000 --> 01:00:36.000
different groups of units and we can compare uh the effect uh on the two different groups by means of
01:00:37.000 --> 01:00:46.000
DID so it need not necessarily be a difference in time one example here is suppose that the
01:00:46.000 --> 01:00:52.000
government cuts the level of unemployment benefits only for some group a which then is of course
01:00:52.000 --> 01:00:58.000
the treatment group and uh suppose that group a normally has longer unemployment
01:00:58.000 --> 01:01:06.000
durations than group b whose benefits have not been cut which then is of course uh the control
01:01:06.000 --> 01:01:13.000
group then we can look at the unemployment duration and if the difference in unemployment
01:01:13.000 --> 01:01:20.000
durations between group a and group b becomes smaller after the reform then obviously reducing
01:01:20.000 --> 01:01:25.000
unemployment benefits seems to reduce unemployment duration for those affected
01:01:27.000 --> 01:01:34.000
the key assumption here again is that changes in the outcome so let's say here in
01:01:35.000 --> 01:01:43.000
unemployment durations would be the same in both groups um if there were no reform
01:01:43.000 --> 01:01:51.000
that is um often phrased as the common trends uh assumption so any other type of change
01:01:51.000 --> 01:01:58.000
which occurs in unemployment duration for completely different uh reasons than a policy
01:01:58.000 --> 01:02:06.000
measure which has been enacted should then uh prolong or change unemployment duration in the
01:02:06.000 --> 01:02:11.000
same way in both the treatment group and in the control group and this is the common trends
01:02:11.000 --> 01:02:16.000
assumption obviously if this assumption doesn't hold then difference in differences doesn't make
01:02:16.000 --> 01:02:23.000
any sense and unfortunately since we do not observe the treatment group without
01:02:23.000 --> 01:02:34.000
the treatment we cannot test this counterfactual situation directly um actually i think i don't i
01:02:34.000 --> 01:02:40.000
have somewhere i have a formal definition oops where is it here's the the common trends
01:02:40.000 --> 01:02:45.000
assumption i think i will show to you now since i mentioned it uh already so sorry for
01:02:46.000 --> 01:02:52.000
jumping forward with uh the slides but it fits well actually where i just uh was and the
01:02:52.000 --> 01:02:57.000
common trends assumption says that in the absence of treatment the treatment group would have followed
01:02:57.000 --> 01:03:03.000
the same trend as the control group and formally you would write it in such a way you have treatment
01:03:03.000 --> 01:03:10.000
group a and control group b and let's say unemployment benefits uh were changed for group
01:03:10.000 --> 01:03:16.000
a as i just uh explained right but here we look at the potential outcome of the group a
01:03:18.000 --> 01:03:25.000
people uh in the case of non-treatment so this is what the zero stands for and then the question
01:03:25.000 --> 01:03:33.000
would be whether the expectation of the difference of outcomes in the case of non-treatment
01:03:33.000 --> 01:03:40.000
for those which are treated is the same as the expectation in the difference of outcomes
01:03:41.000 --> 01:03:48.000
in the case of non-treatment for those which are not treated so v equal to zero here and v equal
01:03:48.000 --> 01:03:58.000
to one here as the conditioning variables and yeah so that would be the formal way to write
01:03:58.000 --> 01:04:02.000
the common trends assumption and without this common trends assumption
01:04:02.000 --> 01:04:12.000
the DID framework would just not make any sense now and here in this table i summarize once again
01:04:12.000 --> 01:04:17.000
the working of the DID estimator in this simple case where we do not have any
01:04:17.000 --> 01:04:26.000
covariates so basically we have some dependent variable y regress it on a constant regress it on
01:04:26.000 --> 01:04:34.000
a time variable which basically tells me whether it is before the treatment which has been enacted
01:04:34.000 --> 01:04:44.000
or after and then we regress on a treatment dummy beta one times the treatment dummy and then
01:04:44.000 --> 01:04:49.000
regress on the interaction dummy where the time dummy and the treatment dummy are being multiplied
01:04:49.000 --> 01:04:56.000
by each other and we estimate the coefficient of this product then there may be other factors
01:04:56.000 --> 01:05:04.000
which we don't discuss yet and the table shows you the interpretation of the coefficients here
01:05:04.000 --> 01:05:11.000
for instance in the control group which is not treated at all before the treatment actually
01:05:11.000 --> 01:05:18.000
was applied clearly the outcome would be just the beta naught just the constant so this is
01:05:18.000 --> 01:05:27.000
similar to 1978 prices far from the incinerator site and after the treatment has been enacted
01:05:27.000 --> 01:05:32.000
but not applied of course to the people in the control group so after passage of time essentially
01:05:32.000 --> 01:05:39.000
we get this time effect right so before it is beta naught and here it is beta naught plus
01:05:39.000 --> 01:05:45.000
delta naught because the after variable would take on value one the difference between after
01:05:45.000 --> 01:05:52.000
and before is of course delta naught for the treatment group we would have beta naught plus
01:05:52.000 --> 01:05:58.000
beta one so plus the coefficient for those pieces of real estate which are close to the
01:05:58.000 --> 01:06:04.000
incinerator site before we know that the incinerator will be built there so that is the before situation
01:06:04.000 --> 01:06:10.000
before this treatment actually occurs and then the after situation is of course the beta
01:06:12.000 --> 01:06:19.000
naught plus the passage of time plus the beta one people are close to the incinerator site
01:06:20.000 --> 01:06:24.000
plus they are still close to the incinerator site after they know that the incinerator has been
01:06:24.000 --> 01:06:30.000
built so plus the delta one effect so the difference between after and before is delta naught
01:06:30.000 --> 01:06:39.000
plus delta one and this thing plus this thing here okay and if you now take the difference
01:06:39.000 --> 01:06:44.000
between these two differences here then you see delta naught plus delta one minus delta naught
01:06:44.000 --> 01:06:50.000
it's just delta one and this is our key parameter of interest and you get the same key parameter of
01:06:50.000 --> 01:06:54.000
interest when you take the difference between treatment and control first
01:06:55.000 --> 01:07:03.000
right so when you take the difference between the treatment outcome here and the control outcome
01:07:03.000 --> 01:07:10.000
here then the difference is obviously beta one in the before time period when you do the same
01:07:10.000 --> 01:07:15.000
thing in the after time period and you take the difference between this row here this entry here
01:07:15.000 --> 01:07:22.000
and this entry here then the difference is beta one plus delta one because the beta naught now
01:07:22.000 --> 01:07:28.000
of course cancels as does the delta naught right and then when you take the difference between
01:07:28.000 --> 01:07:33.000
after and before for the difference between treatment and control then again the difference
01:07:33.000 --> 01:07:38.000
is delta one so there are basically two ways to compute the difference of the differences
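The two ways of computing the difference of the differences can be sketched in a few lines of Python. This is only an illustration: the numeric values for beta naught, delta naught, beta one and delta one below are hypothetical and are not taken from the incinerator data.

```python
# Hypothetical coefficient values, chosen only for illustration.
beta0, delta0, beta1, delta1 = 10.0, 2.0, -1.5, -3.0

# Expected outcome in each cell of the two-by-two table.
control_before = beta0
control_after = beta0 + delta0
treated_before = beta0 + beta1
treated_after = beta0 + delta0 + beta1 + delta1

# Way 1: after minus before within each group, then treatment minus control.
did_time_first = (treated_after - treated_before) - (control_after - control_before)

# Way 2: treatment minus control within each period, then after minus before.
did_group_first = (treated_after - control_after) - (treated_before - control_before)

print(did_time_first, did_group_first, delta1)
```

Whatever values are plugged in, both orders return delta one, because beta naught, delta naught and beta one each cancel in one of the two differencing steps.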
01:07:38.000 --> 01:07:41.000
it doesn't play a role which difference you compute first and which difference you compute
01:07:41.000 --> 01:07:47.000
second obviously you can change that and you will always arrive at delta one this is essentially
01:07:48.000 --> 01:07:55.000
the basis now here comes another example an application of the difference-in-differences
01:07:55.000 --> 01:08:03.000
estimation strategy which now relates to labor markets again and specifically to
01:08:03.000 --> 01:08:11.000
workers' compensation laws the issue here is that workers who do physical
01:08:12.000 --> 01:08:20.000
work on the job are sometimes injured and if they are injured during work so due to the
01:08:20.000 --> 01:08:26.000
work which they have carried out then they are entitled to receive benefits right so they are
01:08:26.000 --> 01:08:34.000
then unable to work for some time and during this time they receive some benefits now in 1980
01:08:34.000 --> 01:08:44.000
Kentucky raised the maximum benefit from 131 dollars per week to 217 dollars per week so this
01:08:44.000 --> 01:08:52.000
was a sizable increase obviously of the maximum benefits which workers would enjoy when they have
01:08:52.000 --> 01:09:00.000
been injured while working and are unable to work however there was always a cap on this because no
01:09:00.000 --> 01:09:08.000
worker may receive more than 66 percent of his regular pay per week well why is this cap important
01:09:08.000 --> 01:09:16.000
obviously we may get an incentive problem if let's say workers would receive 100 percent of their
01:09:16.000 --> 01:09:23.000
regular pay per week one could be afraid that workers inflict minor injuries to themselves
01:09:23.000 --> 01:09:29.000
in order to enjoy some days off at full pay right so this is probably the reason why
01:09:29.000 --> 01:09:35.000
Kentucky had some cap here and this cap was at 66 percent of the regular pay per week and the cap
01:09:35.000 --> 01:09:42.000
of 66 percent has not been changed when they increased the maximum benefit from 131 dollars
01:09:42.000 --> 01:09:52.000
to 217 dollars per week and now obviously this at first seems like a well generous and
01:09:53.000 --> 01:10:02.000
approvable measure which receives probably quite a bit of public support that we give more support
01:10:02.000 --> 01:10:09.000
to workers who have been injured who have had the bad luck of being injured during
01:10:10.000 --> 01:10:17.000
work they carry out for some companies so this is some type of social support which has been
01:10:17.000 --> 01:10:24.000
increased here so that we think that most people would be in favor of such a change in legislation
01:10:27.000 --> 01:10:34.000
the specific way how legislation was changed here however allows us to test some interesting
01:10:34.000 --> 01:10:44.000
hypotheses which may cast a different light on the desirability of this policy here
01:10:45.000 --> 01:10:51.000
and why can we carry out this analysis and why can we consider this a natural
01:10:52.000 --> 01:10:58.000
experiment basically for the reason that the
01:11:00.000 --> 01:11:07.000
increase in the benefit was beneficial only for those workers who had higher pay
01:11:08.000 --> 01:11:18.000
than other workers right so the high income workers benefited from higher benefits but the
01:11:18.000 --> 01:11:27.000
low income workers were possibly already at the cap of 66 percent when the maximum benefit was
01:11:27.000 --> 01:11:36.000
still 131 dollars and were therefore not affected at all by the increase in benefits
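Why only the high income workers are treated can be seen from a small sketch of the benefit rule as it is described here: the benefit is 66 percent of regular weekly pay, but never more than the maximum benefit. The function name, the example earnings levels and the omission of any minimum benefit are simplifying assumptions made for illustration, not details taken from the Kentucky law.

```python
def weekly_benefit(weekly_earnings, max_benefit, replacement_rate=0.66):
    """Benefit is a share of regular pay, but never more than the maximum."""
    return min(replacement_rate * weekly_earnings, max_benefit)

OLD_MAX, NEW_MAX = 131.0, 217.0  # maximum weekly benefit before and after 1980

# Earnings level above which the old maximum was binding.
threshold = OLD_MAX / 0.66

for earnings in (150.0, 250.0, 400.0):  # hypothetical weekly earnings
    before = weekly_benefit(earnings, OLD_MAX)
    after = weekly_benefit(earnings, NEW_MAX)
    print(earnings, round(before, 2), round(after, 2), round(after - before, 2))
```

Under this rule a worker whose 66 percent share stays below 131 dollars receives the same benefit before and after the reform, while a worker above that threshold gains, by up to 86 dollars per week at the top.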
01:11:38.000 --> 01:11:44.000
now the legislation was also such that injured workers are entitled to receive benefits as long
01:11:44.000 --> 01:11:51.000
as they need before they return to work so they could decide on their own when they thought or basically
01:11:51.000 --> 01:11:57.000
on their own probably when they thought that the injury had healed and that they were able to
01:11:57.000 --> 01:12:03.000
resume their work so there was no cap on the duration of weeks out of work
01:12:03.000 --> 01:12:11.000
there are three economists Meyer Viscusi and Durbin who published a natural experiment analysis
01:12:11.000 --> 01:12:20.000
of this type of data in the American Economic Review in 1995 and they were highly critical
01:12:20.000 --> 01:12:25.000
of what the state of Kentucky had done there even though they truly enjoyed the natural experiment
01:12:25.000 --> 01:12:32.000
design and the opportunity to publish that in the American Economic Review
01:12:32.000 --> 01:12:40.000
and so they pointed out that higher benefits may decrease workers incentives to avoid
01:12:41.000 --> 01:12:47.000
injuries and so perhaps workers are less careful because they think well i would receive
01:12:48.000 --> 01:12:57.000
quite a bit of support when i'm out of work it may also increase the incentives to file for
01:12:57.000 --> 01:13:04.000
compensation for any given job injury so some workers would otherwise possibly decide not to
01:13:04.000 --> 01:13:12.000
actually have this injury registered but just appear at work every day and work despite
01:13:12.000 --> 01:13:21.000
the injury because they would not want to lose their income the higher benefits may also foster
01:13:21.000 --> 01:13:27.000
more claims for non-work injuries so the problem of cheating is of course there workers may have
01:13:27.000 --> 01:13:32.000
had some injury somewhere in their leisure time and then go to work and claim that this happened
01:13:32.000 --> 01:13:38.000
during work because they would like to take advantage of the higher benefits which are being
01:13:38.000 --> 01:13:45.000
paid out and of course higher benefits may make extending the duration of a claim more attractive
01:13:45.000 --> 01:13:54.000
so they may stay out of work for longer because the level of support is higher so all of these four
01:13:54.000 --> 01:14:01.000
elements might lead to an increase in the average duration of injured workers being out of work
01:14:02.000 --> 01:14:10.000
so the question here is did the change in kentucky legislation actually lead to longer weeks out of
01:14:10.000 --> 01:14:16.000
work due to injuries and Meyer Viscusi and Durbin suggested that high-income workers are the
01:14:16.000 --> 01:14:21.000
treatment group because they benefited from the increase in benefits and low-income workers
01:14:23.000 --> 01:14:29.000
are the control group specifically look at this diagram here which explains the design of
01:14:29.000 --> 01:14:36.000
treatment and control group rather well so we have here on the y-axis the weekly benefit
01:14:36.000 --> 01:14:49.000
amount and initially this weekly benefit amount was capped here at some level so if you had an
01:14:49.000 --> 01:14:59.000
earning of less than e1 for instance you would at most get this type of benefit here WBA min
01:14:59.000 --> 01:15:09.000
and if you have a high earning between e2 and e3 no sorry a high earning beyond e3 let's first
01:15:09.000 --> 01:15:18.000
look at beyond e3 so really high earnings so sorry i think i made a mistake when i showed you
01:15:18.000 --> 01:15:24.000
this thing here this was the minimum level and i actually wanted to stress the maximum level
01:15:24.000 --> 01:15:32.000
prior to reform which is this level here so the low earnings group is actually this earnings
01:15:32.000 --> 01:15:43.000
group here which would have this type of support in case they are injured right and the group which
01:15:43.000 --> 01:15:51.000
has higher earnings so earnings higher than income level e3 they would have at most this
01:15:51.000 --> 01:15:57.000
type of benefit here because this is the cap of 66 percent of their earnings right
01:15:59.000 --> 01:16:09.000
so after the reform for the high earnings group this level of support increased to maximum
01:16:09.000 --> 01:16:17.000
the 217 dollars i think it was so the high earnings group really has a substantial
01:16:17.000 --> 01:16:26.000
increase in benefits due to the reform whereas the low earnings group still has just a constant
01:16:26.000 --> 01:16:39.000
percentage of their income being handed out to them in case of injury and so nothing changed
01:16:39.000 --> 01:16:45.000
here for the low earnings group but quite a bit changed for the high earnings group and this in
01:16:45.000 --> 01:16:52.000
between earnings group between e2 and e3 we just ignore Meyer Viscusi and Durbin just ignore that
01:16:53.000 --> 01:16:55.000
there's a question apparently
01:16:57.000 --> 01:17:02.000
um somebody asked what was the duration of the study i must say i don't know
01:17:03.000 --> 01:17:09.000
for how long the workers were observed wouldn't it make sense to have a long time study because
01:17:09.000 --> 01:17:14.000
there might be a positive effect because if you let your injuries heal you could avoid
01:17:14.000 --> 01:17:20.000
injuries that are caused by not letting your prior injury fully heal yes the argument i think
01:17:20.000 --> 01:17:29.000
is in principle correct and i know that this has not been taken into account so this is a criticism
01:17:29.000 --> 01:17:37.000
which you may raise if you are a referee to this paper this would require that one looks at the
01:17:37.000 --> 01:17:45.000
cumulative number of weeks out of work over a specific time period and this is why you asked
01:17:45.000 --> 01:17:50.000
for the duration of the study so you could for instance say you look at a long time period of
01:17:50.000 --> 01:17:56.000
five years and then count the number of weeks out of work and use this then as the outcome
01:17:56.000 --> 01:18:03.000
variable this has not been done by the authors they were looking at just a specific number of
01:18:03.000 --> 01:18:09.000
weeks out of work for a specific industry over some time period which i forget i'm sorry i just
01:18:09.000 --> 01:18:15.000
don't know but you can look it up in the paper by Durbin Viscusi and Watson
01:18:18.000 --> 01:18:26.000
and somebody else says this is like a regression discontinuity design
01:18:27.000 --> 01:18:34.000
not quite because it is not really a discontinuity as you see here right the benefit is always
01:18:34.000 --> 01:18:44.000
continuous so we do not have a jump in there unless you really cut out this E2 to E3 income
01:18:44.000 --> 01:18:51.000
group here but then you would have a jump also in the independent variable which is not
01:18:51.000 --> 01:18:59.000
the case in the regression discontinuity design so um no i would say it's not RD it's really
01:18:59.000 --> 01:19:04.000
a case for difference in differences but good comment nevertheless
01:19:08.000 --> 01:19:14.000
okay what Meyer Viscusi and uh Durbin sorry i said Watson Durbin of course um provide uh they
01:19:14.000 --> 01:19:20.000
provide information on the mean of the log duration in the Kentucky uh data set so for the
01:19:20.000 --> 01:19:28.000
high earnings group we know or they publish in their paper that before the change in the law
01:19:28.000 --> 01:19:39.000
the average duration was 1.38 um that's the log of weeks they stayed out of work
01:19:40.000 --> 01:19:45.000
and after the increase for the high earnings workers so for those workers who benefited
01:19:45.000 --> 01:19:53.000
from the legislation the duration out of work was higher it was 1.58 which in logs is quite a bit
01:19:53.000 --> 01:20:03.000
right this is like a 20 percent increase in the duration of weeks out of work right if we look at the low
01:20:03.000 --> 01:20:11.000
earnings group then we see before the increase it was 1.13 and after the increase it was 1.13
01:20:11.000 --> 01:20:17.000
so no change at all uh there therefore we don't even need to run a regression and Meyer
01:20:17.000 --> 01:20:22.000
Viscusi and Durbin don't do this or don't provide uh the results on that they just compute the
01:20:22.000 --> 01:20:29.000
differences in differences right uh like the way i told you to do it with the incinerator example
01:20:29.000 --> 01:20:34.000
where you compute the differences of prices of houses and then the difference of the differences
01:20:34.000 --> 01:20:40.000
in house prices as the difference-in-differences estimator so the first difference we may look at is the
01:20:40.000 --> 01:20:46.000
difference after increase minus before increase for the high earnings this is column two minus column one
01:20:46.000 --> 01:20:56.000
right two minus one we see this difference is 0.20 here right um same thing for the low earnings uh
01:20:56.000 --> 01:21:01.000
column four minus column three the difference is virtually zero uh insignificantly different from
01:21:01.000 --> 01:21:08.000
zero by the way you see here and then we can compute the difference between these differences
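As a small worked version of this step, the sketch below takes the rounded differences quoted in this lecture (0.20 for the high earnings group and 0.01 for the low earnings group) and also converts the result from log points into an approximate percentage change. The variable names are made up for illustration, and the conversion via the exponential function is a standard reading of log differences, not something taken from the paper.

```python
import math

# Rounded after-minus-before differences in mean log duration, as quoted in the lecture.
diff_high = 0.20  # high earnings group (treated)
diff_low = 0.01   # low earnings group (control)

# Difference-in-differences in log points.
did = diff_high - diff_low

# Implied percentage change in duration: exp(did) - 1.
pct_change = (math.exp(did) - 1.0) * 100.0

print(round(did, 2), round(pct_change, 1))
```

A difference of 0.19 in logs corresponds to an increase of roughly 21 percent in weeks out of work, which is why reading log differences directly as percentages is a reasonable approximation here.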
01:21:08.000 --> 01:21:18.000
so 0.20 minus 0.01 is equal to 0.19 and actually there's a standard error affiliated here with it
01:21:18.000 --> 01:21:25.000
which they probably have computed by uh the DID regression and this shows us that this effect here is
01:21:25.000 --> 01:21:34.000
significant now this is what you find in the published uh paper and you can also download
01:21:34.000 --> 01:21:42.000
the uh data set from the Wooldridge web page and then run the DID regression on the same
01:21:42.000 --> 01:21:50.000
data so you would take the log of the duration in weeks out of work and regress that on a constant
01:21:51.000 --> 01:21:58.000
on the dummy variable for time so in this case denoted by after so after the reform right on a dummy
01:21:58.000 --> 01:22:05.000
variable for treatment so for the high earnings because only the high earnings people were treated
01:22:06.000 --> 01:22:11.000
and then on a dummy variable which interacts the after variable with the high earnings
01:22:11.000 --> 01:22:16.000
variable and if you do this um then you will find that the regression coefficient for the
01:22:16.000 --> 01:22:26.000
interaction term is 0.191 which corresponds exactly to this term there are two things which are
01:22:26.000 --> 01:22:33.000
noteworthy for that one is which i ask you actually to verify in an exercise that the
01:22:33.000 --> 01:22:41.000
coefficients which we have in this regression here are exactly compatible with the average
01:22:42.000 --> 01:22:46.000
values which are given in this table and i would like you to verify that please
01:22:46.000 --> 01:22:51.000
at home that these are exactly the same coefficients there and the second thing to
01:22:51.000 --> 01:22:57.000
note is that we can apparently have valid inference on the effect of this kentucky
01:22:58.000 --> 01:23:05.000
law or modification of law at a significant level right almost a t statistic of three here so quite
01:23:05.000 --> 01:23:13.000
a bit of significance there even if we explain only a very very low fraction of the observed
01:23:13.000 --> 01:23:20.000
variance so you see the r squared here is just 2 percent i mean very very low and still apparently
01:23:20.000 --> 01:23:28.000
we get highly significant results here which indicate that due to the change in law the
01:23:29.000 --> 01:23:37.000
amount of weeks out of work increased of course one must be careful with the interpretation of
01:23:37.000 --> 01:23:46.000
what exactly is the reason why this increases i had given you four possible reasons but this
01:23:46.000 --> 01:23:57.000
is not a complete list of possible reasons so one other explanation for the result that
01:24:00.000 --> 01:24:06.000
duration has increased could for instance be that prior to the reform workers concealed injuries
01:24:08.000 --> 01:24:14.000
for fear of reduced income in that case one would of course say that the change in the
01:24:14.000 --> 01:24:20.000
legislation was good to do because we do not want workers to work when they are actually
01:24:20.000 --> 01:24:28.000
injured we do not want them to to conceal their injuries but there are also the more negative
01:24:28.000 --> 01:24:34.000
interpretations like did workers perhaps claim more or more often injuries which were actually
01:24:35.000 --> 01:24:43.000
uh caused in non-work events right so did they abuse the benefits which were now being paid out
01:24:43.000 --> 01:24:49.000
to them because there was some higher incentive for abuse due to the higher level of support
01:24:49.000 --> 01:24:54.000
this type of question has not been answered the only thing we do answer is there is an increase
01:24:54.000 --> 01:25:02.000
in the number of weeks out of work and this should be further
01:25:02.000 --> 01:25:09.000
investigated to find out what is the reason for that because it could hint at a misuse at an
01:25:09.000 --> 01:25:17.000
abuse of the support but it is not necessarily an abuse there are also more innocuous interpretations
01:25:17.000 --> 01:25:23.000
of what may have happened so further scrutiny would be desired for this question here
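Coming back to the DID regression of log duration on a constant, after, high earnings and their interaction: the following sketch shows on a tiny made-up data set that the OLS coefficient on the interaction term equals the difference-in-differences of the four cell means. The eight observations are hypothetical and are not the Kentucky data, and the small solver is a plain Gauss-Jordan elimination written out only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical toy data: (after, high_earner, log_duration).
data = [
    (0, 0, 1.0), (0, 0, 1.2),
    (1, 0, 1.1), (1, 0, 1.3),
    (0, 1, 1.3), (0, 1, 1.5),
    (1, 1, 1.7), (1, 1, 1.9),
]

# Regressors: constant, after, high earnings, after times high earnings.
X = [[1.0, a, h, a * h] for a, h, _ in data]
y = [d for _, _, d in data]
b0, d0, b1, d1 = ols(X, y)

def cell_mean(after, high):
    vals = [d for a, h, d in data if a == after and h == high]
    return sum(vals) / len(vals)

did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
print(round(d1, 6), round(did, 6))
```

Because the model has one parameter per cell, the fitted values are exactly the four cell means, so the constant reproduces the low earnings mean before the change and the interaction coefficient reproduces the difference of the differences.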
01:25:26.000 --> 01:25:30.000
one will note the coefficient on after is small and statistically insignificant
01:25:30.000 --> 01:25:38.000
as you can verify uh here right this here is very small the after coefficient and clearly
01:25:38.000 --> 01:25:45.000
insignificant the standard error is much uh larger so the increase in the earnings cap has
01:25:45.000 --> 01:25:52.000
no effect on the duration for the low income uh workers all right the remark on the r squared
01:25:52.000 --> 01:26:01.000
i have provided now let me briefly discuss how we treat multiple periods so suppose for
01:26:01.000 --> 01:26:07.000
instance we are again in the incinerator example and we have more data than before
01:26:07.000 --> 01:26:15.000
for instance a third year 1984 so still 1978 1981 and 1984 so basically we have two years of
01:26:15.000 --> 01:26:22.000
after so one way would be to just run two regressions of the type we have run in equation
01:26:22.000 --> 01:26:34.000
seven one for 1981 versus 1978 and the other one for 1984 versus 1978 in this case we would
01:26:34.000 --> 01:26:45.000
of course have two estimates which are interpretable in different ways so we would
01:26:45.000 --> 01:26:52.000
have a beta naught as the coefficient gamma naught 78 and the beta one as the coefficient gamma
01:26:52.000 --> 01:27:01.000
one 78 so we would estimate the same parameters with two different sets of data which makes
01:27:01.000 --> 01:27:10.000
interpretation then tricky if not impossible or at least very difficult and the delta parameters
01:27:11.000 --> 01:27:19.000
also are not necessarily the same because they refer either to 1981 or to 1984 so essentially
01:27:19.000 --> 01:27:26.000
they will always be different i mean it would be very very unlikely and actually have probability
01:27:26.000 --> 01:27:33.000
zero that we estimate exactly the same delta parameters for 1981 and for 1984 so we would have to distinguish
01:27:33.000 --> 01:27:44.000
between a delta naught parameter for 1981 and a delta naught parameter for 1984 and that is completely
01:27:46.000 --> 01:27:52.000
legitimate because the incinerator effect may not be constant over time it may be that
01:27:52.000 --> 01:28:00.000
after some time people got used to the incinerator and house prices did not deteriorate as much
01:28:00.000 --> 01:28:08.000
anymore as they initially did or it may be that garbage trucks drive with motors which are less
01:28:08.000 --> 01:28:18.000
noisy for instance as technical progress makes such trucks available right or it could be just
01:28:18.000 --> 01:28:25.000
the opposite that even more and more garbage is delivered to the incinerator site and there's
01:28:25.000 --> 01:28:32.000
more traffic and there's more bad smell for instance and that therefore in 1984 house prices
01:28:32.000 --> 01:28:40.000
have decreased even more so very natural that we have differences in the time evolution of the
01:28:40.000 --> 01:28:49.000
delta naught coefficient and that is also perfectly explainable from the fact that the delta one
01:28:50.000 --> 01:28:57.000
coefficients are just the differences of average prices from the formulas that i have described
01:28:57.000 --> 01:29:08.000
because clearly well there are different average prices in 81 and 84 here and there so it is very
01:29:08.000 --> 01:29:15.000
clear that we will typically have different estimates of delta one for 1981 and delta one for 1984
01:29:15.000 --> 01:29:22.000
one way to deal with this issue would be that we just modify our regressor matrix and now
01:29:22.000 --> 01:29:31.000
provide for two delta naught and two delta one coefficients one for 81 and one for 84 each
01:29:32.000 --> 01:29:40.000
and basically what happens is just that the regressor matrix now has a new block here which
01:29:40.000 --> 01:29:48.000
mirrors the structure of this block here so while we have iota n81 here in the second
01:29:48.000 --> 01:29:56.000
row we have now iota n84 here in the third row and thereby we would just estimate this vector
01:29:56.000 --> 01:30:03.000
in the same way with the same principle as we have done when we estimated the simpler model
01:30:03.000 --> 01:30:15.000
okay and when we have a single observation we can of course write this regression equation
01:30:15.000 --> 01:30:26.000
in this way here where i have now in curly brackets summarized the year effects here
01:30:26.000 --> 01:30:33.000
and summarized the effects of being near the incinerator in 1981 and 1984 and an interesting
01:30:33.000 --> 01:30:40.000
hypothesis would be the question of whether the delta one is unchanged between 1981 and 1984
01:30:40.000 --> 01:30:50.000
or whether it has some change so the suggestive way to do this would be to use an f test then
01:30:50.000 --> 01:30:56.000
and test on the equality of this coefficient here and that coefficient there
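Written out, the model with two after periods and the test of equal incinerator effects might look as follows. This is a sketch under assumed notation: the regressor names y81, y84 and near, the superscripts on the delta coefficients and the outcome label price are chosen here for illustration and need not match the slides.

```latex
\begin{align*}
\text{price}_i &= \beta_0 + \delta_0^{81}\, y81_i + \delta_0^{84}\, y84_i + \beta_1\, \text{near}_i \\
&\quad + \delta_1^{81}\, (y81_i \cdot \text{near}_i) + \delta_1^{84}\, (y84_i \cdot \text{near}_i) + u_i, \\[4pt]
H_0 &: \ \delta_1^{81} = \delta_1^{84}, \\[4pt]
F &= \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k)}, \qquad q = 1,\ k = 6 .
\end{align*}
```

Here the restricted sum of squared residuals comes from the regression with a single interaction term, near times the sum of y81 and y84, and the unrestricted one from the full equation. With only one restriction the F statistic is the square of the t statistic for the difference of the two delta one coefficients.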
01:30:58.000 --> 01:31:07.000
if the hypothesis cannot be rejected so then we accept it typically then we could also estimate
01:31:07.000 --> 01:31:13.000
the regression equation in such a way that we just have one effect of being near the incinerator
01:31:14.000 --> 01:31:20.000
we can do the same thing of course for the year effects in 1981 and 1984 for the delta zero
01:31:21.000 --> 01:31:32.000
coefficients yeah that i think is all i have to say there are many possibilities of doing such
01:31:32.000 --> 01:31:38.000
type of analysis with many points in time and nothing there will surprise you but i don't want
01:31:38.000 --> 01:31:43.000
to go into the details here because it's more important that you understand the principle
01:31:43.000 --> 01:31:48.000
also in the final exam i will not ask you about more than two time periods so you should know
01:31:49.000 --> 01:31:55.000
that i don't ask difficult questions there when it is actually just important to understand the
01:31:55.000 --> 01:32:02.000
easy stuff but this stuff you should have well understood and i think it's really not that
01:32:02.000 --> 01:32:10.000
difficult good i think i'll stop here unless there are further questions and remind you all
01:32:10.000 --> 01:32:18.000
of those of you who would like to post questions relating to older material in tomorrow's lecture
01:32:18.000 --> 01:32:27.000
or interaction session to please send them to me by email since we want to do the interaction
01:32:28.000 --> 01:32:37.000
i ask you to log in then under the interaction link i sent this already to you once but perhaps
01:32:37.000 --> 01:32:41.000
it's better i send it to you again so i will write an email right after this lecture and send you the
01:32:41.000 --> 01:32:46.000
link and then please log in for tomorrow's lecture first under the interaction link where there is no
01:32:46.000 --> 01:32:54.000
recording by the way and then when i have answered your questions and we have discussed
01:32:54.000 --> 01:33:01.000
whatever needs to be discussed i will move on and present to you the last set of slides for this
01:33:02.000 --> 01:33:09.000
lecture and again please send the question by email the questions i receive by chat are sometimes
01:33:09.000 --> 01:33:16.000
not so easy to understand because obviously you don't have much time to write the question
01:33:16.000 --> 01:33:22.000
and they are not well formulated sometimes so if there are remaining questions send them by
01:33:22.000 --> 01:33:28.000
email and i will discuss them tomorrow i don't see any further question for today's lecture
01:33:28.000 --> 01:33:38.000
so i wish you a good afternoon and see you then tomorrow under the interaction link