WEBVTT - autoGenerated
00:00:30.000 --> 00:00:59.000
Welcome to today's lecture on estimation and inference in econometrics.
00:00:59.000 --> 00:01:07.000
We concluded last week the second set of slides; you still see here the last
00:01:07.000 --> 00:01:12.000
slide I presented on the usage of p-values.
00:01:12.000 --> 00:01:17.000
I think that for most of you this is familiar territory.
00:01:17.000 --> 00:01:26.000
Let me nevertheless ask you, are there any questions which relate to this review of statistics?
00:01:26.000 --> 00:01:28.000
If you have any, then please raise your hand.
00:01:28.000 --> 00:01:30.000
I see at least one question.
00:01:30.000 --> 00:01:57.000
Please type it in the chat.
00:01:57.000 --> 00:02:02.000
So the question is, on the one hand, we say a random sample is drawn, but on the other
00:02:02.000 --> 00:02:10.000
hand, we define it as a set of random variables.
00:02:10.000 --> 00:02:23.000
Not sure I completely understand what is being asked, really not clear.
00:02:23.000 --> 00:02:28.000
So perhaps let me just comment on that.
00:02:28.000 --> 00:02:35.000
A random sample is indeed something which is drawn from the population.
00:02:35.000 --> 00:02:40.000
So there's some explanation coming in, let me see, shouldn't the sample be a realization
00:02:40.000 --> 00:02:41.000
of those random variables?
00:02:41.000 --> 00:02:43.000
Yes, this is true.
00:02:43.000 --> 00:02:46.000
The sample is the realization of those random variables.
00:02:46.000 --> 00:02:54.000
So the sample is actually what we measure in terms of variables, what is the realization
00:02:54.000 --> 00:02:56.000
of the random variables.
00:02:56.000 --> 00:03:04.000
And you say that we define it as a set of random variables.
00:03:04.000 --> 00:03:13.000
Probably the confusion here arises from the fact that of course you can look at a sample
00:03:13.000 --> 00:03:21.000
also prior to its realization and then interpret it as a set of random variables.
00:03:21.000 --> 00:03:32.000
So if we were linguistically absolutely exact, we should always distinguish between a random
00:03:32.000 --> 00:03:39.000
variable prior to its realization, so a true random variable, and then the realization
00:03:39.000 --> 00:03:44.000
of the random variable, which is actually not a random variable anymore, but it's just
00:03:44.000 --> 00:03:45.000
a real number.
00:03:45.000 --> 00:03:49.000
And the same thing also applies to samples.
00:03:49.000 --> 00:03:57.000
So usually when I speak of samples, I do indeed mean the realization of the random variables.
00:03:57.000 --> 00:04:05.000
But it may well occur, I'm sorry for some confusion, some confusion arises there, that
00:04:05.000 --> 00:04:14.000
sometimes I mean the sample prior to its realization and then indeed this is a set
00:04:14.000 --> 00:04:17.000
of random variables.
00:04:17.000 --> 00:04:26.000
I see that this may sound a little inexact and a little confusing at the outset, but
00:04:26.000 --> 00:04:31.000
I hope you can get used to it and always infer from the context what is meant.
00:04:31.000 --> 00:04:39.000
In terms of notation, I do distinguish very clearly between those two possibilities
00:04:39.000 --> 00:04:47.000
in that I always denote the sample prior to its realization, the set of random variables
00:04:47.000 --> 00:04:51.000
I should better say, which give rise to a sample.
00:04:51.000 --> 00:04:58.000
This I denote by capital letters and the realized sample I denote by small letters.
00:04:58.000 --> 00:05:02.000
So I hope this helps.
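To make this notational convention concrete, here is a small Python illustration (not part of the lecture materials; the distribution and numbers are made up). Before drawing, the sample is a set of random variables (capital letters); after drawing, it is just a set of real numbers (small letters).

```python
import numpy as np

rng = np.random.default_rng(123)

def draw_sample(n):
    # "X_1, ..., X_n": the sample prior to its realization, still random
    return rng.normal(loc=0.0, scale=1.0, size=n)

# "x_1, ..., x_n": one realization, ordinary real numbers
x = draw_sample(5)
print(x.round(2))  # five fixed numbers; calling draw_sample again gives another realization
```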
00:05:02.000 --> 00:05:09.000
Any other questions?
00:05:09.000 --> 00:05:11.000
A question, when will you release the student contributions result?
00:05:11.000 --> 00:05:15.000
I cannot see the first week's results on Steene.
00:05:15.000 --> 00:05:16.000
I haven't checked on that.
00:05:16.000 --> 00:05:18.000
This is done by my secretary.
00:05:18.000 --> 00:05:24.000
I thought that the student contributions results are already on Steene.
00:05:24.000 --> 00:05:29.000
I will ask my secretary after the lecture why this is not yet the case, if it is true
00:05:29.000 --> 00:05:35.000
that it isn't yet the case, but they should of course be published on Steene.
00:05:35.000 --> 00:05:42.000
So perhaps you check again in two or three days and I'm sure they will be there then.
00:05:42.000 --> 00:05:51.000
Okay, no further questions, then let me just go to one more kind of work incentive for
00:05:51.000 --> 00:05:52.000
you.
00:05:52.000 --> 00:06:01.000
Certainly, throughout this lecture, I will appeal to what I call your own research question,
00:06:01.000 --> 00:06:05.000
which basically is just a thought experiment.
00:06:05.000 --> 00:06:06.000
And the thought experiment goes like this.
00:06:06.000 --> 00:06:14.000
I will now present you a list of possible topics for research.
00:06:14.000 --> 00:06:22.000
And I will ask you to pick out one of those questions, not to conduct actual research
00:06:22.000 --> 00:06:34.000
on this question, but rather to follow the contents of this lecture
00:06:34.000 --> 00:06:35.000
with this question in mind.
00:06:35.000 --> 00:06:41.000
Because every once in a while, I will ask you at the end of some material which I have
00:06:41.000 --> 00:06:50.000
covered, well, how would you deal with this newly learned material in terms of your research
00:06:50.000 --> 00:06:51.000
question?
00:06:51.000 --> 00:06:57.000
What kind of research design do you have and how would you apply the methods or the contents
00:06:57.000 --> 00:07:02.000
that I have taught you to this specific research question?
00:07:02.000 --> 00:07:11.000
And therefore, you are asked to think about a real application with real world data of
00:07:11.000 --> 00:07:14.000
the methods that I teach in this lecture.
00:07:14.000 --> 00:07:17.000
So it is only a hypothetical research question.
00:07:17.000 --> 00:07:27.000
You're not asked to do this research as true econometric research, but as a hypothetical
00:07:27.000 --> 00:07:34.000
question, you should decide on one of the questions which I now present you in the list
00:07:34.000 --> 00:07:40.000
and apply the methods which I teach in this lecture throughout the lecture to this particular
00:07:40.000 --> 00:07:43.000
research question of yours.
00:07:43.000 --> 00:07:50.000
So I will every once in a while ask you about how this application would work in your particular
00:07:50.000 --> 00:07:51.000
research question.
00:07:51.000 --> 00:07:58.000
You do not have to bother about real world data and the availability of such data.
00:07:58.000 --> 00:08:03.000
For the time being, you just have to choose a question and then I will confront you with
00:08:03.000 --> 00:08:13.000
the first questions that I have for you when you apply the concepts of this review of statistics
00:08:13.000 --> 00:08:15.000
to your research question.
00:08:15.000 --> 00:08:19.000
So here's a long list of possible research questions like, for instance, what is the
00:08:19.000 --> 00:08:24.000
effect of a minimum wage on unemployment?
00:08:24.000 --> 00:08:26.000
Reasonable research question.
00:08:26.000 --> 00:08:31.000
Lots of this research has already been done, but you could think about how you would address
00:08:31.000 --> 00:08:37.000
this question in your own research and how you apply methods which you hopefully will become
00:08:37.000 --> 00:08:38.000
familiar with.
00:08:38.000 --> 00:08:43.000
Or does access to electricity improve health?
00:08:43.000 --> 00:08:47.000
Does it pay off for companies to advertise on Facebook?
00:08:47.000 --> 00:08:52.000
Do tuition fees reduce the number of students?
00:08:52.000 --> 00:08:59.000
Does course attendance improve your grade in this course or any other course?
00:08:59.000 --> 00:09:04.000
Does increasing alcohol taxes reduce binge drinking?
00:09:04.000 --> 00:09:10.000
Does playing a music instrument increase cognitive skills?
00:09:10.000 --> 00:09:16.000
Does pay for performance increase stress levels of workers?
00:09:16.000 --> 00:09:23.000
Do students' jobs decrease the performance of students at university?
00:09:23.000 --> 00:09:32.000
Does increasing the generosity of paid sick leave improve the health of employees?
00:09:32.000 --> 00:09:36.000
Does retirement make people happy?
00:09:36.000 --> 00:09:39.000
Does unemployment make you sick?
00:09:39.000 --> 00:09:45.000
Does graduation with honors improve your wages?
00:09:45.000 --> 00:09:50.000
Does an internship increase the chance of finding a job immediately after graduation?
00:09:50.000 --> 00:09:58.000
So long list of questions, just pick one of these, right, but then decide to have this
00:09:58.000 --> 00:10:06.000
question throughout the course of this lecture and then apply the methods which I teach to
00:10:06.000 --> 00:10:13.000
this question whenever I ask you to do this under the heading your research question.
00:10:13.000 --> 00:10:20.000
So here's the first section, and if you have chosen a topic from the list of possible research
00:10:20.000 --> 00:10:26.000
questions which I gave to you, then please answer the following questions.
00:10:26.000 --> 00:10:32.000
For your particular research question, what would be the population of interest?
00:10:32.000 --> 00:10:39.000
So this shall teach you or make you more familiar with the concept of a population that you
00:10:39.000 --> 00:10:42.000
now think in terms of your particular research question.
00:10:42.000 --> 00:10:46.000
What actually is the population of interest?
00:10:46.000 --> 00:10:53.000
How would you draw a sample from this population?
00:10:53.000 --> 00:10:59.000
Define some interesting hypothesis for your research question and state the null hypothesis
00:10:59.000 --> 00:11:08.000
and the alternative hypothesis for the hypothesis which you want to research.
00:11:08.000 --> 00:11:13.000
What would a type one error look like in your case and what would be a type two error
00:11:13.000 --> 00:11:16.000
given the hypothesis which you have formulated?
00:11:16.000 --> 00:11:19.000
So it's just a thought experiment, as I said.
00:11:19.000 --> 00:11:26.000
Don't bother to find real world data on your question, assume the availability of data
00:11:26.000 --> 00:11:34.000
and then try to answer these questions as precisely as possible.
00:11:34.000 --> 00:11:37.000
This is it for the review of statistics.
00:11:37.000 --> 00:11:47.000
I will now move on to the review of basic econometrics, where I actually will start
00:11:47.000 --> 00:11:51.000
with really very basic stuff which all of you have already had.
00:11:51.000 --> 00:11:58.000
But then I think I will move rather quickly to basic econometrics and matrix notation,
00:11:58.000 --> 00:12:04.000
which is perhaps not so familiar to all of you but will be the basic method of how I
00:12:04.000 --> 00:12:13.000
teach econometrics in many of the chapters you will have to go through in this lecture.
00:12:13.000 --> 00:12:21.000
So please see where everything is still well known to you and where you sense that perhaps
00:12:21.000 --> 00:12:28.000
you should invest some more time to familiarize yourself more with what I do here.
00:12:28.000 --> 00:12:35.000
The beginning of this review of basic econometrics is certainly by content familiar to you even
00:12:35.000 --> 00:12:39.000
though perhaps not by notation.
00:12:39.000 --> 00:12:45.000
Later on perhaps there are also some aspects of basic econometrics which you may not have
00:12:45.000 --> 00:12:49.000
covered in your undergraduate lectures.
00:12:49.000 --> 00:12:55.000
Typically I think you should have had this covered but perhaps that's not true for everybody.
00:12:55.000 --> 00:12:59.000
So if you have deficiencies there, then see that you close them.
00:12:59.000 --> 00:13:04.000
I will spend quite a bit of time on the review of basic econometrics.
00:13:04.000 --> 00:13:12.000
So I think I will take the whole month of November for this review of basic econometrics
00:13:12.000 --> 00:13:15.000
and possibly even the first week of December we'll see.
00:13:15.000 --> 00:13:19.000
But now it's November 1st and we start with it.
00:13:19.000 --> 00:13:24.000
Here are two references which you may consult; there are many econometric textbooks around
00:13:24.000 --> 00:13:28.000
and perhaps you have used other textbooks in your undergraduate studies then that's
00:13:28.000 --> 00:13:29.000
also fine.
00:13:29.000 --> 00:13:32.000
Typically they all teach the same type of content.
00:13:32.000 --> 00:13:38.000
But if you want to use one of these two I would recommend either using William Green's
00:13:38.000 --> 00:13:44.000
econometric analysis textbook or Jeffrey Wooldridge's introductory econometrics where the latter
00:13:44.000 --> 00:13:47.000
is easier to read.
00:13:47.000 --> 00:13:50.000
William Green is certainly a little bit more advanced.
00:13:50.000 --> 00:13:55.000
So if you want to have something which is closer to what you did probably in undergraduate
00:13:55.000 --> 00:13:58.000
then the Wooldridge text may be quite good for that.
00:13:58.000 --> 00:14:02.000
If you want to come a little bit closer to what we do now on the master's level then
00:14:02.000 --> 00:14:06.000
the Green text may be the appropriate choice for you.
00:14:06.000 --> 00:14:14.000
Okay, as I said we start really easy with a bivariate regression without any statistical
00:14:14.000 --> 00:14:15.000
model yet.
00:14:15.000 --> 00:14:19.000
So what we now do is just descriptive analysis.
00:14:19.000 --> 00:14:24.000
So suppose we have some observations which we call y and we have other observations we
00:14:24.000 --> 00:14:36.000
call x and we can match the observations y and x in such a way that we think we can describe
00:14:36.000 --> 00:14:45.000
the observed values of y by some function of the observed values of x.
00:14:45.000 --> 00:14:49.000
This is perhaps just an approximate description.
00:14:49.000 --> 00:14:55.000
We don't really think that it is necessarily exact relationship which we have there but
00:14:55.000 --> 00:15:01.000
somehow we have some hypothesis that y and x are related.
00:15:01.000 --> 00:15:05.000
We don't say anything about causality so we don't know whether y causes x or x causes
00:15:05.000 --> 00:15:07.000
y and these kinds of things.
00:15:07.000 --> 00:15:14.000
We just say that the information included in x is similar to the information included
00:15:14.000 --> 00:15:23.000
in y and we can retrieve the information in y approximately at least by just using
00:15:23.000 --> 00:15:30.000
the data x assuming that there's some function f which retrieves this kind of information
00:15:30.000 --> 00:15:34.000
on y from the data in x.
00:15:34.000 --> 00:15:40.000
As always we try to make life as simple as possible and then make the first assumption
00:15:40.000 --> 00:15:42.000
that f is a linear function.
00:15:42.000 --> 00:15:49.000
So in this case we could write equation one as yi so the i-th observation on variable
00:15:49.000 --> 00:16:03.000
y is equal to some constant b1 plus some constant b2 times the observation xi so the i-th observation
00:16:03.000 --> 00:16:09.000
we have for variable x plus some approximation error which we call ei.
00:16:09.000 --> 00:16:15.000
So this is why I can write an exact equality sign here now whereas I had approximate equality
00:16:15.000 --> 00:16:22.000
in equation one because everything which was not included in the information we have on
00:16:22.000 --> 00:16:31.000
x is now projected on this error term here which includes actually two different components
00:16:31.000 --> 00:16:38.000
of an error namely first the component that well whatever the difference between the information
00:16:38.000 --> 00:16:44.000
in x and the information in y maybe which gives rise to the fact that I have approximate
00:16:44.000 --> 00:16:53.000
equality here only whatever is this difference this is in the ei here and then second there's
00:16:53.000 --> 00:16:59.000
also the approximation part because when I assume that f is a linear function then obviously
00:16:59.000 --> 00:17:07.000
I approximate possibly non-linear function f here by some linear function and this gives
00:17:07.000 --> 00:17:15.000
rise to an approximation error so in as much as this approximation is inappropriate we
00:17:15.000 --> 00:17:23.000
would have a second source of error in equation two and this is also included in the approximation
00:17:23.000 --> 00:17:26.000
error ei here.
00:17:26.000 --> 00:17:33.000
Of course this formulation already suggests that we can match observations from y one
00:17:33.000 --> 00:17:41.000
for one with observations on x so we would have this index i which always tells us that
00:17:41.000 --> 00:17:48.000
observation xi is matched to observation yi and observation xj is not matched to yi but
00:17:48.000 --> 00:17:53.000
it would be matched to yj of course this also means that I assume we have the same number
00:17:53.000 --> 00:18:01.000
of observations on variable y as we have on variable x but this is the usual setup.
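To make this setup concrete, here is a small simulated illustration (all numbers below are made-up placeholders, not the lecture's data): n matched observations, where each yi is a linear function of the matched xi plus an approximation error ei.

```python
import numpy as np

# Simulated illustration of the setup: n matched observations (xi, yi),
# where yi = b1 + b2*xi + ei with an approximation error ei.
rng = np.random.default_rng(7)
n = 50
b1_true, b2_true = 2.0, 0.8   # made-up "true" coefficients

x = rng.uniform(0.0, 10.0, n)            # observations on variable x
e = rng.normal(0.0, 1.0, n)              # approximation errors ei
y = b1_true + b2_true * x + e            # matched observations yi

# same number of observations on y as on x, matched one for one by index i
assert x.shape == y.shape == (n,)
```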
00:18:01.000 --> 00:18:07.000
Now let me give you an example for that data which I have uploaded to Steene please download
00:18:07.000 --> 00:18:12.000
them you will get exercises for these data.
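For exercises of this kind, a computation such as the correlation between two series, say private consumption and GDP, takes only a couple of lines. Here is a hedged sketch with simulated stand-in data, since the actual file and its column names are not shown in the transcript.

```python
import numpy as np

# Simulated stand-in for the cross-country data: 187 "countries" with
# per-capita GDP and consumption. The real exercise uses the downloaded file.
rng = np.random.default_rng(0)
gdp_pc = rng.uniform(1_000.0, 60_000.0, 187)
cons_pc = 0.7 * gdp_pc + rng.normal(0.0, 3_000.0, 187)

# Pearson correlation between income and consumption
corr = np.corrcoef(gdp_pc, cons_pc)[0, 1]
print(f"correlation: {corr:.3f}")
```

With the actual Penn World Table file the lecture reports a correlation of about 92.3 percent; the simulated numbers here are only meant to show the computation.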
00:18:12.000 --> 00:18:19.000
The example is taken from the Penn World Table, a collection of macroeconomic data
00:18:19.000 --> 00:18:26.000
which are adjusted in terms of purchasing power parity so PPP here for purchasing power
00:18:26.000 --> 00:18:34.000
parity these are PPP adjusted data for basically all of the countries people could get data
00:18:34.000 --> 00:18:40.000
from so in this case 187 countries we know there are a little bit more than 200 countries
00:18:40.000 --> 00:18:47.000
in the world so it covers a large share of the countries and I have picked here private consumption
00:18:47.000 --> 00:18:54.000
data and GDP data where GDP data and private consumption data are not only real in the
00:18:54.000 --> 00:19:00.000
sense that they are PPP adjusted but they are also taken on a per capita level and then
00:19:00.000 --> 00:19:05.000
I chose a particular year in this case I chose 2007 so before all this mess with the financial
00:19:05.000 --> 00:19:15.000
crisis and so forth started and I will look at a cross section now over all of the countries
00:19:15.000 --> 00:19:22.000
over all of the 187 countries so I have no time dimension here I just used year 2007
00:19:22.000 --> 00:19:29.000
in order to study the relationship between income and consumption so the most basic relationship
00:19:29.000 --> 00:19:36.000
which you have always encountered in undergraduate studies first macro sessions a consumption
00:19:36.000 --> 00:19:41.000
function basically so question is can we estimate a Keynesian type of world consumption function
00:19:41.000 --> 00:19:48.000
where consumption would just be a function of current income so we have no permanent
00:19:48.000 --> 00:19:55.000
income idea here just take the very simple Keynesian approach that current consumption
00:19:55.000 --> 00:20:04.000
depends on just current income now as I said please download the data and compute the correlation
00:20:04.000 --> 00:20:11.000
between private consumption and GDP you should get as a result that this correlation is rather high
00:20:11.000 --> 00:20:21.000
namely 92.3 percent you see this here in the scatter plot of the data where you have GDP
00:20:21.000 --> 00:20:27.000
and real per capita terms on the horizontal axis and you have private consumption and real per
00:20:27.000 --> 00:20:35.000
capita terms on the vertical axis and very clearly there is a high degree of correlation between GDP
00:20:35.000 --> 00:20:45.000
and consumption so between income and consumption now this is an example of the type of exercise we
00:20:45.000 --> 00:20:54.000
are doing merely descriptive for the relationship between income and private consumption and one
00:20:54.000 --> 00:21:02.000
idea one could now have is that why it seems clear that consumption can be written as some function
00:21:02.000 --> 00:21:09.000
f of GDP it is also suggestive at least as a first approach to say this is perhaps a linear
00:21:09.000 --> 00:21:16.000
relationship which we have here right and therefore we would use this linear approach which I have
00:21:16.000 --> 00:21:26.000
shown on the previous slide and I would have to look for the coefficients b1 and b2 which I had
00:21:26.000 --> 00:21:34.000
in this linear formulation obviously we can estimate b1 and b2 in many different forms and
00:21:34.000 --> 00:21:41.000
such estimates can be either good or bad we are of course looking for the best estimate we may possibly
00:21:41.000 --> 00:21:51.000
have so what is the criterion for a good estimate as you know the criterion is that the error term
00:21:51.000 --> 00:22:00.000
this ei which I had in my linear formulation should be rather small and therefore we use the
00:22:00.000 --> 00:22:08.000
common approach of squaring the ei values and then looking at the mean squared error mean squared
00:22:08.000 --> 00:22:17.000
error is one over n, for n observations, and n is 187 in this case, and I take the sum of the squared
00:22:17.000 --> 00:22:24.000
errors which we have here which are as I say approximation errors or even substantial
00:22:24.000 --> 00:22:32.000
errors so errors which relate to the fact that perhaps income doesn't explain all of
00:22:32.000 --> 00:22:39.000
consumption all of this is in this ei here I square the ei's which are of course unobservable
00:22:40.000 --> 00:22:46.000
currently but as a concept I can square them and then I take the mean of these values
00:22:47.000 --> 00:22:55.000
very clearly n is a given number so it is not really important whether we
00:22:56.000 --> 00:23:04.000
minimize one over n times the sum of the squared errors or whether we just minimize the sum
00:23:04.000 --> 00:23:10.000
of the squared errors so we will do the latter and then this is the so-called least squares
00:23:11.000 --> 00:23:17.000
problem which also gives its name to the estimator based on the least squares principle
00:23:18.000 --> 00:23:25.000
so I define a sum s which depends on the choice of the coefficients b1
00:23:26.000 --> 00:23:36.000
and b2 in my simple linear hypothesis the sum s is the sum of the squared errors and I know the
00:23:36.000 --> 00:23:47.000
error ei is just my observation yi minus b1 minus b2 xi and of this expression I have to
00:23:47.000 --> 00:23:54.000
take the square and then I sum all those squares here the least squares problem consists in choosing
00:23:54.000 --> 00:24:02.000
b1 and b2 in such a form that the whole expression becomes minimal so this notation minimize
00:24:02.000 --> 00:24:11.000
over appropriate choice of b1 and b2 this expression here which is exactly the sum of the
00:24:11.000 --> 00:24:20.000
squared error terms now obviously what we have to do when we want to minimize we have to take
00:24:20.000 --> 00:24:30.000
the derivative with respect to b1 and with respect to b2 I do this here step by step first I look at
00:24:30.000 --> 00:24:37.000
the derivative with respect to b1 setting this equal to zero so I get the first order condition
00:24:37.000 --> 00:24:43.000
we get two first order conditions but this is the first first order condition the partial derivative
00:24:43.000 --> 00:24:51.000
of s with respect to b1 so when I take the derivative then obviously I will have to
00:24:51.000 --> 00:24:59.000
differentiate with respect to b1 so the squared term becomes a coefficient of two here
00:25:00.000 --> 00:25:06.000
and the inner derivative is negative one which is this one here so the derivative as a whole is
00:25:06.000 --> 00:25:14.000
the sum over two times yi minus b1 minus b2 xi times the inner derivative negative one
00:25:14.000 --> 00:25:20.000
and this shall be equal to zero clearly since it is equal to zero the negative one here and the
00:25:20.000 --> 00:25:27.000
two here don't really play a role so I can break the sum apart and move the negative components
00:25:27.000 --> 00:25:35.000
to the right hand side of the equation so that I would get the sum over the yi's shall be equal
00:25:35.000 --> 00:25:43.000
if I have an optimal choice of b1 and b2 shall be equal to the sum of the components b1 plus b2
00:25:43.000 --> 00:25:53.000
times xi and now summing this b1 coefficient here over n units is of course nothing else but
00:25:53.000 --> 00:26:02.000
n times b1 moreover the b2 is just a constant which I can factor out of the sum so the same
00:26:02.000 --> 00:26:10.000
condition can be written as sum over the yi's is equal to n times b1 plus b2 times the sum over
00:26:10.000 --> 00:26:19.000
all of the xi's and now dividing this equation here through by n I would get on the left hand
00:26:19.000 --> 00:26:28.000
side of this equation here the mean of the yi's which is y bar so y bar shall be equal to b1
00:26:29.000 --> 00:26:38.000
because the n goes away since we divide through by n plus b2 times the mean of all the x observations
00:26:38.000 --> 00:26:48.000
of all the xi's here so this is our first order condition y bar shall be equal to b1 plus b2 times
00:26:48.000 --> 00:26:59.000
x bar and the bars of course are just the sample averages then we take the derivative also with
00:26:59.000 --> 00:27:05.000
respect to the other parameter namely with respect to b2 so same procedure we take the partial
00:27:05.000 --> 00:27:11.000
derivative of s with respect to b2 then derivative in this case looks very similar to the first
00:27:11.000 --> 00:27:19.000
derivative the only difference being that now the inner derivative is not
00:27:11.000 --> 00:27:19.000
negative one but negative xi because now we differentiate with respect to b2 xi
00:27:27.000 --> 00:27:35.000
here so the inner derivative is negative xi when we take the derivative with respect to b2
00:27:36.000 --> 00:27:43.000
okay and so in this case we can just divide the whole equation by two so the two doesn't play
00:27:43.000 --> 00:27:51.000
a role zero divided by two is zero and we can again take it apart and would in this case
00:27:52.000 --> 00:28:01.000
get that the sum over the products xi and yi shall be equal to the sum over b1 xi
00:28:02.000 --> 00:28:11.000
plus the sum over b2 xi squared right the xi is multiplied into this first term here in the
00:28:11.000 --> 00:28:18.000
parentheses and then we get negative xi yi this gives rise to this term if we move it over to
00:28:18.000 --> 00:28:28.000
the other side and here i would have plus terms xi times b1 and xi times xi gives rise to xi
00:28:28.000 --> 00:28:33.000
squared this then being multiplied by the coefficients we are looking for b2 and b1
00:28:35.000 --> 00:28:44.000
dividing this equation here by n which is just a constant factor we would get one over n times the sum over xi yi
00:28:44.000 --> 00:28:54.000
is equal to b1 times x bar plus b2 times the mean of the squared values of the xis
00:28:54.000 --> 00:29:03.000
here so one over n sum over xi squared now this expression looks still a little complicated with
00:29:03.000 --> 00:29:11.000
these two sums in there but this can be written in a much easier way as you are probably aware
00:29:11.000 --> 00:29:20.000
we'll see now in the following because we can exploit the fact that one over n times the sum over xi yi
00:29:21.000 --> 00:29:30.000
which is exactly this expression here minus x bar y bar is actually an estimator of the covariance
00:29:30.000 --> 00:29:37.000
between x and y i denote this estimator by covariance with a little hat to indicate that it
00:29:37.000 --> 00:29:47.000
is an estimator the same thing i can use for the term one over n sum over xi squared here which i
00:29:47.000 --> 00:29:56.000
write here because i know that this term here minus x bar squared is an estimator
00:29:56.000 --> 00:30:04.000
of the variance of x note that these estimators which i denote by variance with a little hat
00:30:04.000 --> 00:30:14.000
and covariance with a little hat are biased estimators they are unbiased only asymptotically
00:30:14.000 --> 00:30:22.000
but they are biased in finite samples because i divide here by n here or there but in order to
00:30:22.000 --> 00:30:28.000
have an unbiased estimator i would actually need to divide by n minus one so they are not the best
00:30:28.000 --> 00:30:34.000
estimators which we can have for the variance and the covariance but i will make use of these
00:30:34.000 --> 00:30:42.000
type of estimators sometimes in this review of basic econometrics and as i say their notation
00:30:42.000 --> 00:30:48.000
is then covariance hat or variance hat and for the unbiased estimator of the variance and the
00:30:48.000 --> 00:30:54.000
covariance i will use a different notation so just bear in mind that covariance hat and
00:30:54.000 --> 00:31:03.000
variance hat are the biased estimators of the covariance and of the variance of two
00:31:03.000 --> 00:31:10.000
variables they are not completely bad estimators however they are actually the maximum likelihood
00:31:10.000 --> 00:31:18.000
estimators of the covariance and the variance so they appear quite often in econometrics even though
00:31:18.000 --> 00:31:24.000
we know in finite samples we can improve on these estimators by dividing through by n minus one
00:31:24.000 --> 00:31:33.000
rather than by n okay and so why do i relate to these estimators of covariance and variance
00:31:34.000 --> 00:31:41.000
simply because i want to simplify my second first order equation which currently takes
00:31:41.000 --> 00:31:50.000
the form of equation five by getting rid of these ugly summation terms here and there and my
00:31:50.000 --> 00:31:58.000
proposal to get rid of these terms is just to make use of these estimators here which however
00:31:58.000 --> 00:32:07.000
are not exactly the same thing as the summation terms here because they differ by these terms
00:32:07.000 --> 00:32:13.000
which are subtracted from them namely x bar y bar so the product of x bar y bar and the product
00:32:13.000 --> 00:32:22.000
of x bar with itself x bar squared we'll see how we can transform equation five such that these things
00:32:22.000 --> 00:32:31.000
actually are included in equation five namely what we do is we go back to the first first
00:32:31.000 --> 00:32:38.000
order equation which was equation four and multiply this first order equation just by x bar
00:32:38.000 --> 00:32:45.000
right so the first first order condition
00:32:46.000 --> 00:32:56.000
was y bar is equal to b1 plus b2 x bar and now i have multiplied the whole first first order
00:32:56.000 --> 00:33:05.000
condition by x bar so that i get x bar times y bar is equal to b1 x bar plus b2 x bar squared
00:33:05.000 --> 00:33:11.000
you see immediately that you could just divide through by x bar and then you retrieve the first
00:33:11.000 --> 00:33:18.000
first order condition this equation six here which is an inflated first first order equation
00:33:18.000 --> 00:33:26.000
i now subtract from equation five to get this expression here one over n times the sum over xi yi
00:33:28.000 --> 00:33:34.000
minus the term x bar y bar here so i have on the left hand side of this equation
00:33:34.000 --> 00:33:40.000
the estimator the maximum likelihood estimator actually of the covariance shall be equal to
00:33:41.000 --> 00:33:54.000
b1 x bar plus b2 times one over n times the sum of the squared xi's minus b1 x bar
00:33:55.000 --> 00:34:02.000
so minus the right hand side of equation six minus b1 x bar minus b2 x bar squared
00:34:03.000 --> 00:34:10.000
and then you see the negative b1 x bar here and the positive b1 x bar here cancel precisely
00:34:11.000 --> 00:34:17.000
and what remains on the right hand side of this equation is just the expression for the
00:34:17.000 --> 00:34:24.000
maximum likelihood estimator of variance so it's variance hat of x so the second first order
00:34:24.000 --> 00:34:30.000
condition using the first first order condition becomes that the estimator of the covariance
00:34:30.000 --> 00:34:37.000
between x and y shall be equal to b2 which is the parameter we're looking for times the estimated
00:34:37.000 --> 00:34:45.000
variance of x so that's the second optimality condition we have essentially this condition here
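Written out, the step just described is: multiply the first first-order condition $\bar{y} = b_1 + b_2\bar{x}$ by $\bar{x}$ and subtract the result from equation five, which gives

```latex
\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}
  \;=\; b_2\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2\right),
\qquad\text{i.e.}\qquad
\widehat{\operatorname{Cov}}(x,y) \;=\; b_2\,\widehat{\operatorname{Var}}(x).
```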
00:34:45.000 --> 00:34:56.000
and equation four so using now equation four in this last condition for b1 and b2 we get
00:34:57.000 --> 00:35:07.000
the optimal choices for b1 and b2 as b2 is equal to the ratio between the
00:35:07.000 --> 00:35:19.000
estimated covariance and the estimated variance of x right and observe this problematic factor
00:35:19.000 --> 00:35:30.000
one over n which gave rise to the fact that the covariance hat is not unbiased in finite samples
00:35:30.000 --> 00:35:37.000
or rather that the estimator of the variance is not unbiased in finite samples this
00:35:38.000 --> 00:35:45.000
factor one over n appears here both in the numerator and in the denominator of this ratio
00:35:45.000 --> 00:35:51.000
so it actually cancels so we could equally well use the unbiased estimators of covariance
00:35:51.000 --> 00:35:58.000
and variance just doesn't play a role b2 is the ratio of the estimate of the covariance
00:35:58.000 --> 00:36:07.000
and the estimate of the variance and from this we can easily retrieve what b1 is because b1 was y bar
00:36:07.000 --> 00:36:16.000
minus b2 times x bar but b2 we have computed here so we just replace b2 by the ratio of covariance
00:36:16.000 --> 00:36:25.000
hat divided by variance hat so this y bar minus covariance hat divided by variance hat times x bar
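A minimal Python sketch of these closed-form estimators, on made-up placeholder data (the moment-based slope and intercept only, not any particular software's routine):

```python
import numpy as np

# Hypothetical placeholder data standing in for x (GDP) and y (consumption).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Moment estimators with the 1/n factor; since 1/n cancels in the ratio,
# the unbiased 1/(n-1) versions would give the same b2.
cov_hat = np.mean(x * y) - np.mean(x) * np.mean(y)
var_hat = np.mean(x ** 2) - np.mean(x) ** 2

b2 = cov_hat / var_hat             # slope: covariance hat over variance hat
b1 = np.mean(y) - b2 * np.mean(x)  # intercept: y bar minus b2 times x bar
```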
00:36:25.000 --> 00:36:41.000
okay and perhaps it is noteworthy to see that this second first order condition
00:36:41.000 --> 00:36:49.000
is a first order condition which relates the coefficient b2 to the second moments of x and y
00:36:49.000 --> 00:36:57.000
so here is the variance of x right it's the second central moment of x and here's the covariance of x
00:36:57.000 --> 00:37:06.000
and y which is also a second central moment of now the joint distribution of x and y so b2 is
00:37:06.000 --> 00:37:13.000
something which tells us something about the second moments of the variables b1 is something
00:37:13.000 --> 00:37:20.000
which is basically related to the first moments of the variables b1 is related to the mean y bar
00:37:20.000 --> 00:37:28.000
which is an estimator of the expected value of y and it's related to x bar which is an estimator of
00:37:28.000 --> 00:37:37.000
the expected value of x so b1 is related to first moments of the variables
00:37:38.000 --> 00:37:46.000
using b2 in order to determine the weight between y bar and x bar b2 is a coefficient which is related
00:37:46.000 --> 00:37:54.000
to the second moments of the variables but as a general rule you can say the intercept
00:37:54.000 --> 00:38:01.000
in a regression as the one we have just gone through is related to the first moment and
00:38:01.000 --> 00:38:05.000
the other regression coefficient or the other regression coefficients
00:38:05.000 --> 00:38:10.000
if we have many of them are related to the second moments of the variables
00:38:12.000 --> 00:38:19.000
moreover the estimate b2 is similar to a correlation coefficient and on the next
00:38:19.000 --> 00:38:25.000
slide you will get as an exercise to find out what is actually the dissimilarity why is this
00:38:25.000 --> 00:38:33.000
not truly a correlation coefficient but just something similar to it yeah and the estimate b1
00:38:33.000 --> 00:38:38.000
as i said is the average of the observations y adjusted for the average of the explanatory variable x
00:38:41.000 --> 00:38:48.000
here's the exercise i spoke of first exercise how does b2 differ from the correlation coefficient
00:38:48.000 --> 00:38:53.000
between x and y if you don't see it directly then please go back to the formula which i have
00:38:53.000 --> 00:39:00.000
given you in the previous lectures and on the second exercise please do the following
00:39:01.000 --> 00:39:10.000
compute x bar y bar the variance of x and the covariance of x and y for this Penn World
00:39:10.000 --> 00:39:17.000
Table data set of GDP and private consumption data which i've used in the example
00:39:17.000 --> 00:39:28.000
and then i ask you to compute the estimates b1 and b2 manually so by manually i mean of course
00:39:28.000 --> 00:39:35.000
that you may use computer software but don't use sort of this fixed setup regression package like
00:39:35.000 --> 00:39:43.000
the one which i will now use when i show you the OLS results but rather compute in some software
00:39:43.000 --> 00:39:50.000
actually for this example it may still just be Excel for instance compute the mean
00:39:50.000 --> 00:39:57.000
over the x's compute the mean over the y's easily done in Excel compute the variance and compute the
00:39:57.000 --> 00:40:04.000
covariance let's say in excel and then use these data which you have in excel to compute estimates
00:40:04.000 --> 00:40:13.000
the optimal estimates b1 and b2 to verify that what you have done say in Excel yields exactly the
00:40:13.000 --> 00:40:19.000
same results as the ones which i will now show you using an econometric software namely EViews okay
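The manual-versus-package comparison asked for here can be sketched like this, with NumPy's generic least-squares routine standing in for EViews and the numbers invented for illustration:

```python
import numpy as np

# Invented stand-ins for the GDP (x) and private consumption (y) series.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.1, 4.2, 6.3, 7.4])

# "Manual" route: estimates built from the sample moments.
b2_manual = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)
b1_manual = y.mean() - b2_manual * x.mean()

# "Package" route: a canned least-squares solver (stand-in for EViews).
X = np.column_stack([np.ones_like(x), x])
b1_pkg, b2_pkg = np.linalg.lstsq(X, y, rcond=None)[0]

# The two routes agree up to floating-point error.
print(np.allclose([b1_manual, b2_manual], [b1_pkg, b2_pkg]))
```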
00:40:20.000 --> 00:40:27.000
so just to verify that EViews is doing its job properly i will rather often in this lecture
00:40:27.000 --> 00:40:34.000
give you exercises of this type where i write do something manually manually always means that you can
00:40:34.000 --> 00:40:43.000
use any kind of software but you should really compute estimates then from the moments of
00:40:43.000 --> 00:40:51.000
the data rather than just running a regression uh routine so basically verifying that the
00:40:51.000 --> 00:40:59.000
regression routines which are in your software do the right job later on you may find it perhaps
00:40:59.000 --> 00:41:09.000
more convenient to use matlab or r or gauss or any kind of mathematical matrix oriented language
00:41:09.000 --> 00:41:16.000
to do these kind of manual operations but for the time being you can still do it in excel and
00:41:16.000 --> 00:41:21.000
actually many of these exercises you could also do in excel if you are not so familiar with r or
00:41:21.000 --> 00:41:28.000
matlab or gauss i don't care what you use use any package which is familiar to you in a way if
00:41:28.000 --> 00:41:36.000
you're comfortable doing these kind of calculations anyway when i use a commercial econometric
00:41:36.000 --> 00:41:46.000
software like e-views i get this output here i will over the course of the semester rather often
00:41:46.000 --> 00:41:52.000
present you output from e-views um output from other packages stata
00:41:53.000 --> 00:42:00.000
uh matlab gauss or whatever you use is of course formatted in a different way
00:42:01.000 --> 00:42:06.000
but mostly gives the same kind of information in particular on the highlighted coefficients here
00:42:06.000 --> 00:42:13.000
you should get exactly the same type of result even if you do it in different software so the
00:42:13.000 --> 00:42:19.000
e-views output is structured as follows we have a dependent variable which is private consumption
00:42:19.000 --> 00:42:28.000
i denote it here as c private we have 187 included observations namely for 187 countries
00:42:28.000 --> 00:42:36.000
in the year 2007 and then we estimate that private consumption is explained by
00:42:36.000 --> 00:42:44.000
some coefficient c some constant c which is just our b1 coefficient so this b1 coefficient would
00:42:44.000 --> 00:42:55.000
be 1518 right and y is the symbol for gdp for income and so the coefficient here is actually
00:42:55.000 --> 00:43:06.000
the b2 coefficient which would be estimated as 0.514 and you can always round after some digits
00:43:06.000 --> 00:43:14.000
okay so these would be the two coefficients which we have estimated now as the latter one the ratio
00:43:14.000 --> 00:43:20.000
between the estimated covariance between private consumption and the GDP data y in
00:43:20.000 --> 00:43:30.000
this case and the estimated variance and here then the difference of the means mean of private
00:43:30.000 --> 00:43:37.000
consumption adjusted for the mean of GDP weighted with the b2 coefficient in the way you have seen
00:43:37.000 --> 00:43:44.000
all the other statistics in this output i will explain later so just don't bother about them
00:43:44.000 --> 00:43:49.000
you'll see this screen or a similar screen quite a few times in the following lectures
00:43:53.000 --> 00:43:58.000
here is again the scatter plot of the data you've seen the data already nothing has changed there
00:43:58.000 --> 00:44:08.000
i just put the regression line in the scatter plot the regression line in red is now
00:44:08.000 --> 00:44:15.000
the linear function which i estimate for this regression here
00:44:18.000 --> 00:44:25.000
now i'm not sure how much experience you have with econometric work in practice
00:44:26.000 --> 00:44:30.000
if you have some experience you should note that there's a problem here
00:44:30.000 --> 00:44:34.000
does anybody see a problem in the scatter plot
00:44:40.000 --> 00:44:47.000
please raise your hand if you do see a problem yes please uh i see two people uh raising their
00:44:47.000 --> 00:44:54.000
hand please write in the chat what kind of problem you see the first is already there
00:44:56.000 --> 00:45:02.000
there's somebody writing there's perhaps heteroskedasticity yes
00:45:02.000 --> 00:45:08.000
probably there's also heteroskedasticity there what was the second suggestion
00:45:08.000 --> 00:45:27.000
i haven't introduced heteroskedasticity yet but i'm sure you have encountered the term in
00:45:27.000 --> 00:45:34.000
the undergraduate econometrics we'll come to this later heteroskedasticity was actually not the
00:45:34.000 --> 00:45:40.000
problem i was aiming at when i asked you this question here there is even a more fundamental
00:45:40.000 --> 00:45:50.000
problem and since the other participant does apparently not write his opinion yet
00:45:51.000 --> 00:45:57.000
let me just explain what it is see when you look closely you see that for very low incomes
00:45:58.000 --> 00:46:08.000
uh the data are almost all below the regression line so the errors which we make are all negative
00:46:08.000 --> 00:46:17.000
for the low incomes whereas for the higher incomes most of the errors are above the regression line
00:46:17.000 --> 00:46:23.000
there are a few also down here right but not so very many most of them are actually above
00:46:23.000 --> 00:46:32.000
the regression line so it seems that the error itself still depends on income low incomes give
00:46:32.000 --> 00:46:39.000
rise to negative errors and high incomes or middle incomes give rise to positive errors mostly at
00:46:39.000 --> 00:46:47.000
least right so this suggests that we have not explained everything yet in private consumption
00:46:47.000 --> 00:46:53.000
which we may be able to explain by income and i will show you a different graph which reveals
00:46:53.000 --> 00:47:00.000
this problem much more strikingly than it is done in the scatter plot here namely when i plot
00:47:00.000 --> 00:47:09.000
the estimated residuals ei against the logarithm of GDP then it looks like this
00:47:09.000 --> 00:47:18.000
see that's an EViews graph which you can produce you have here the residuals this is the ei
00:47:19.000 --> 00:47:26.000
here is the zero line obviously the eis are centered around the zero line but we see that
00:47:26.000 --> 00:47:33.000
for low incomes and i measure income now in logs in order to make the phenomenon
00:47:33.000 --> 00:47:39.000
uh to come out of the data a little bit more clearly um you see for low incomes almost all
00:47:39.000 --> 00:47:46.000
of the residuals are negative up to here approximately right and for higher incomes
00:47:46.000 --> 00:47:51.000
many of them at least are positive not all of them as you see there are also quite a few which are
00:47:51.000 --> 00:47:59.000
negative but on balance you could verify that many more are positive than there are negative
00:47:59.000 --> 00:48:07.000
errors in here the answer given by one of you that there's also evidence of
00:48:07.000 --> 00:48:13.000
heteroskedasticity was also correct because you see that the size of the error seems to depend
00:48:13.000 --> 00:48:24.000
on the size of income on the magnitude of income so with smaller incomes the errors are
00:48:24.000 --> 00:48:32.000
in absolute value smaller than when income is large so that is also evidence of heteroskedasticity
00:48:32.000 --> 00:48:39.000
right and we probably deal with it by also dealing with this other problem which i
00:48:39.000 --> 00:48:45.000
pointed out namely that the errors for the low incomes are negative all of them whereas the
00:48:45.000 --> 00:48:53.000
errors for the middle and higher incomes are mostly positive so and perhaps we should go back
00:48:53.000 --> 00:49:02.000
to this graph here again because this will already hint a
00:49:02.000 --> 00:49:10.000
little bit what kind of modification uh we may take because actually when you look at the data
00:49:10.000 --> 00:49:17.000
you may say perhaps the idea of taking linear approximation through the data was not the best
00:49:17.000 --> 00:49:24.000
idea at all perhaps actually uh this scatter plot doesn't really suggest something linear
00:49:24.000 --> 00:49:29.000
but rather suggests something with a little curvature perhaps it's even easier to see when
00:49:29.000 --> 00:49:36.000
you don't have this misleading linear regression line in there but rather look
00:49:36.000 --> 00:49:42.000
at this scatter here but if you really had a linear relationship starting out here with
00:49:42.000 --> 00:49:49.000
the slope suggested by the lowest income observations then actually a linear line would
00:49:49.000 --> 00:49:56.000
almost always move like my little hand here i can't do it quite exactly but after some time you would
00:49:56.000 --> 00:50:03.000
think well a linear line is perhaps a little bit off the line is more going that way rather than
00:50:04.000 --> 00:50:11.000
describing good fit for the data here for which perhaps you're better advised to take something
00:50:11.000 --> 00:50:18.000
which has curvature so some type of concave relationship between GDP and private consumption
00:50:18.000 --> 00:50:24.000
in the cross section could be appropriate here and this is exactly what we'll do now
00:50:27.000 --> 00:50:32.000
we seem to have a systematic error right poor countries consume less than a linear
00:50:32.000 --> 00:50:39.000
consumption function would suggest so we try a concave uh specification for our function f
00:50:40.000 --> 00:50:49.000
by uh taking the logarithms of the data so now i denote by lny the log of GDP and by lnc private
00:50:49.000 --> 00:50:58.000
the log of private consumption and again my regression is actually a linearly specified
00:50:58.000 --> 00:51:07.000
regression but now in the logarithms of the data so the fact that the data suggest some non-linear
00:51:07.000 --> 00:51:12.000
relationship that does not necessarily imply that we have to use a non-linear function f
00:51:14.000 --> 00:51:21.000
it may suffice and actually in many cases it does suffice to non-linearly transform the data
00:51:22.000 --> 00:51:31.000
and use the non-linearly transformed data in a linear regression approach so in a linear
00:51:31.000 --> 00:51:38.000
function like this one here so the approach is now non-linear in the data but uh the relationship
00:51:38.000 --> 00:51:45.000
between uh variables uh the transformed variables is still a linear relationship log of private
00:51:45.000 --> 00:51:53.000
consumption is equal to b1 plus b2 times log of private income plus some regression error ei
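A sketch of this log-log specification in Python (data invented for illustration; the point is that only the variables are transformed, the regression itself stays linear):

```python
import numpy as np

# Invented positive GDP and consumption figures (logs need positive data).
gdp = np.array([500.0, 1500.0, 4000.0, 12000.0, 40000.0])
cons = np.array([400.0, 1100.0, 2800.0, 8000.0, 25000.0])

# Non-linear transformation of the data ...
ln_y = np.log(gdp)
ln_c = np.log(cons)

# ... followed by an ordinary linear regression on the transformed data:
# ln_c = b1 + b2 * ln_y + e
X = np.column_stack([np.ones_like(ln_y), ln_y])
b1, b2 = np.linalg.lstsq(X, ln_c, rcond=None)[0]
e = ln_c - (b1 + b2 * ln_y)  # residuals of the log specification
```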
00:51:55.000 --> 00:52:01.000
now obviously the b1 and the b2 and the ei here denote now something completely different
00:52:01.000 --> 00:52:07.000
from what the same symbols have denoted in the first approach in the linear approach just for
00:52:07.000 --> 00:52:13.000
convenience i still use the same notation right but please bear in mind this b1 which we
00:52:13.000 --> 00:52:18.000
estimate now is of course something not only numerically but also conceptually very different
00:52:18.000 --> 00:52:24.000
from the b1 we have dealt with in the previous approach and the same holds for b2 and the
00:52:24.000 --> 00:52:29.000
eis are of course also different i've noted this down here we just use the same
00:52:29.000 --> 00:52:36.000
symbols but they mean something different okay here is uh the regression result and now you see
00:52:36.000 --> 00:52:41.000
we have again a constant which takes a completely different value from what we have before and here
00:52:41.000 --> 00:52:50.000
so this is the estimate of b1 and here we have the estimate of b2 which describes us uh how the log
00:52:50.000 --> 00:52:57.000
of GDP translates into private consumption or better into the log of private consumption
00:52:57.000 --> 00:53:05.000
i have highlighted now a different part of the output here namely this part where EViews writes
00:53:05.000 --> 00:53:12.000
sum of squared residuals right so this is the value of our objective function we wanted to minimize
00:53:12.000 --> 00:53:18.000
our objective function so this is the lowest value which EViews could achieve in terms of the
00:53:18.000 --> 00:53:27.000
squared residuals right so it's this sum here the sum over all of the ei squared which is given
00:53:27.000 --> 00:53:37.000
in the output there but just looking at this one output um this value is of no informative uh
00:53:37.000 --> 00:53:44.000
content uh we can't really use it only if we run different specifications on the same set of data
00:53:44.000 --> 00:53:52.000
so on the same logarithmic data then we could compare the sum of squared residuals with other
00:53:53.000 --> 00:53:59.000
sums of squared residuals in other approaches to see where we achieve the minimum value
00:53:59.000 --> 00:54:06.000
there's no point of course in comparing the sum of squared residuals here which is basically 20.3
00:54:06.000 --> 00:54:13.000
with the sum of squared residuals which we had here which was a huge number as you see because
00:54:13.000 --> 00:54:19.000
here we have completely different units in which we worked so this has nothing to do with each other
00:54:19.000 --> 00:54:26.000
because first they were not logarithms but rather PPP-adjusted dollars now they are in logs so these
00:54:26.000 --> 00:54:31.000
cannot be compared but if we have a second specification in terms of logs then we could
00:54:31.000 --> 00:54:36.000
actually compare the sum of squared residuals with the other sum of squared residuals we would
00:54:37.000 --> 00:54:45.000
obtain in the second approach all right an exercise for you also reproduce in
00:54:46.000 --> 00:54:49.000
some software the value of the sum of squared residuals
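Reproducing the sum of squared residuals is a one-liner once the residual vector is available; a sketch on invented log-scale numbers (remember that SSRs are only comparable across specifications fitted to the same transformed data):

```python
import numpy as np

# Invented stand-ins for ln(GDP) and ln(consumption).
ln_y = np.array([5.0, 6.2, 7.1, 8.3, 9.0, 9.8])
ln_c = np.array([4.6, 5.9, 6.9, 7.9, 8.8, 9.5])

X = np.column_stack([np.ones_like(ln_y), ln_y])
b = np.linalg.lstsq(X, ln_c, rcond=None)[0]

e = ln_c - X @ b      # estimated residuals e_i
ssr = np.sum(e ** 2)  # the objective function that least squares minimized
```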
00:54:52.000 --> 00:55:00.000
if i now plot the data uh so another scatter plot i use the logs of the data so this is the log
00:55:00.000 --> 00:55:08.000
of GDP this is the log of private consumption here are my data and you see that the regression
00:55:08.000 --> 00:55:17.000
line seems to be quite good in the sense that it really looks like this is a linear relationship
00:55:18.000 --> 00:55:25.000
true we still have let's say one two three four negative observations at the start of the sample
00:55:25.000 --> 00:55:31.000
for very low incomes but there are also already two observations exactly on the straight line and
00:55:31.000 --> 00:55:37.000
then at the latest here we have quite a few positive observations so this can well be a random
00:55:37.000 --> 00:55:44.000
event that actually just the first observation was negative and the residual was
00:55:44.000 --> 00:55:50.000
then already zero here the residual is negative negative negative zero again
00:55:50.000 --> 00:55:56.000
then positive negative positive negative so that can well be the result of some random event
00:55:56.000 --> 00:56:01.000
that we have this kind of sequence of negative and positive residuals suggested by the graph
00:56:02.000 --> 00:56:09.000
here we would probably not take this as evidence uh for some kind of further hidden non-linearity
00:56:09.000 --> 00:56:16.000
but it rather seems that this regression explains the data rather well and now coming back
00:56:16.000 --> 00:56:24.000
to this one answer which was supplied to my question the evidence of heteroskedasticity has
00:56:24.000 --> 00:56:30.000
also vanished now it doesn't look like the size of the errors for low incomes
00:56:31.000 --> 00:56:38.000
is smaller than the size of the errors for high uh incomes so basically probably by taking the
00:56:38.000 --> 00:56:43.000
logs of the data we have also dealt with the problem of heteroskedasticity
00:56:43.000 --> 00:56:52.000
so it seems quite uh satisfactory um still one should have a look at the distribution of the
00:56:52.000 --> 00:57:01.000
errors and the distribution of the estimated ei terms and i do this here by looking at a plot of
00:57:01.000 --> 00:57:10.000
the histogram of the residuals and now you see well the histogram is like these light blue bars
00:57:10.000 --> 00:57:19.000
here i can also estimate with certain methods um which we will discuss in this lecture but
00:57:19.000 --> 00:57:25.000
just to mention it here because EViews provides this possibility i can estimate a density
00:57:25.000 --> 00:57:33.000
function pdf for the residuals and when you look at this either at the blue bars or at the red
00:57:33.000 --> 00:57:42.000
kernel estimate of the pdf then you see that this is certainly not a normal pdf so it doesn't seem
00:57:42.000 --> 00:57:50.000
that the residuals have a normal distribution now i i told you that there is no necessity
00:57:50.000 --> 00:57:57.000
in empirical work that the residuals should have normal distribution this distribution here is
00:57:57.000 --> 00:58:06.000
clearly skewed to the left right and this is fine i mean this may occur there is
00:58:06.000 --> 00:58:12.000
as i say no law which requires the residuals to have a normal distribution still often it is
00:58:12.000 --> 00:58:20.000
advisable to study this issue and see well have we really explained everything in the dependent
00:58:20.000 --> 00:58:27.000
observations um or can we perhaps improve on this estimate here we won't go down further
00:58:27.000 --> 00:58:34.000
in this route because this would require quite a bit more empirical exercises here but we leave
00:58:34.000 --> 00:58:39.000
it at this because there's nothing wrong with a non-normal residual distribution
00:58:41.000 --> 00:58:48.000
rather we move on to multivariate regression so far we just had a bivariate regression one
00:58:48.000 --> 00:58:53.000
explanatory variable and one dependent variable and now we suppose that we have one dependent
00:58:53.000 --> 00:59:00.000
variable which we denote y and we assume that there is more than one explanatory variable
00:59:01.000 --> 00:59:09.000
x1 to xk where i already use this kind of terminology calling the x's here explanatory
00:59:09.000 --> 00:59:17.000
variables which in some sense may already suggest that i interpret them as causal variables which
00:59:17.000 --> 00:59:24.000
explain y but it is not meant this way it's just the usual language to speak of the x's as
00:59:24.000 --> 00:59:31.000
explanatory variables um but i do not want to suggest that this is already causality which we
00:59:31.000 --> 00:59:40.000
establish here the only change is we have more than one x so we have k x's the k can of course still be
00:59:40.000 --> 00:59:46.000
just equal to one but what is interesting to us now is k greater than or equal to one so
00:59:46.000 --> 00:59:53.000
actually what is interesting is k greater than one so y is approximately some function f of x1 x2
00:59:53.000 --> 01:00:04.000
up to xk an example may for instance be that y measures the wages of workers and you may explain
01:00:04.000 --> 01:00:10.000
the observed wages by different factors for instance by education or by experience or by
01:00:10.000 --> 01:00:16.000
the age of the worker or by the sex of the worker whatever so there may be different explanatory
01:00:16.000 --> 01:00:25.000
variables for the y now suppose again we have n observations on the workers for instance right
01:00:25.000 --> 01:00:32.000
so y i is the observation on one worker and we have n such observations
01:00:33.000 --> 01:00:41.000
in this case for the explanatory variables we would use two indices x i one is the first
01:00:41.000 --> 01:00:52.000
explanatory variable for the i-th worker right and x i k is the kth explanatory variable
01:00:53.000 --> 01:00:59.000
for the i-th worker let's say variable one is education and variable k is sex and in between
01:00:59.000 --> 01:01:04.000
we also have the variables for experience and age and possibly other variables
01:01:05.000 --> 01:01:13.000
so our regression would be of the form y i is equal to b one times one so just providing for
01:01:13.000 --> 01:01:21.000
constant here b one is not a coefficient of any of the observed x's plus b two times x i two
01:01:22.000 --> 01:01:32.000
plus b three times x i three plus and so on plus b k times x i k but actually i have now not
01:01:32.000 --> 01:01:41.000
used an x i one here so this one you have to think of as the x i one in this case x i one would
01:01:41.000 --> 01:01:50.000
just be a constant and x i two for instance would be education all right it is very convenient to have
01:01:50.000 --> 01:01:59.000
a matrix notation for this type of data if we just speak about the observations we have for worker i
01:02:00.000 --> 01:02:07.000
we would use a vector x i for all the variables which characterize
01:02:07.000 --> 01:02:18.000
possible reasons why worker i has a certain wage so in this case x i would be a vector one
01:02:18.000 --> 01:02:28.000
x i two x i three x i k so this would be a vector which always has some component x i in it and then
01:02:28.000 --> 01:02:35.000
the column the second index would indicate which variable it is it is a vector of individual
01:02:35.000 --> 01:02:42.000
characteristics in this example setting of explaining wages for the wage of worker number i
01:02:43.000 --> 01:02:50.000
and we would similarly have a vector of unknown parameters b one up to b k and we denote this
01:02:50.000 --> 01:03:01.000
vector by b so observation i can then be described as y i let's say the wage of worker i is equal to
01:03:01.000 --> 01:03:12.000
x i prime times b plus the error epsilon i which is a scalar right so we would take the transpose
01:03:12.000 --> 01:03:20.000
here of vector x i i denote transposes by primes i would take the transpose of x i x i prime and then
01:03:20.000 --> 01:03:30.000
multiply by b you see this vector here has k rows so x i prime would have k columns this would match
01:03:30.000 --> 01:03:38.000
well with the fact that the vector b has k rows so x i prime times b is just a scalar again right
01:03:38.000 --> 01:03:47.000
that's the matrix or the vector product of a one by k vector with a k by one
01:03:47.000 --> 01:03:55.000
vector so it is of dimension one by one this is however just for observation i and we have
01:03:55.000 --> 01:04:01.000
n observations so what we may do is that we collect all the wages which we have in one
01:04:01.000 --> 01:04:09.000
vector y then we have y one y two y three up to the wage of the nth worker and then we have
01:04:09.000 --> 01:04:15.000
all the workers' characteristics which are now written as x one prime for the first worker x two
01:04:15.000 --> 01:04:24.000
prime for the second worker and finally x n prime for the last the nth worker all these
01:04:24.000 --> 01:04:31.000
row vectors x one prime x two prime x n prime are multiplied by a column vector b of unknown
01:04:31.000 --> 01:04:39.000
coefficients and then i collect all the error terms in a vector n by one vector of error terms
01:04:39.000 --> 01:04:47.000
e one up to en and i denote this vector here by just e so more compactly i can write this as y is
01:04:47.000 --> 01:04:57.000
equal to x b plus e and x is this matrix here because this is truly a matrix with n rows and k
01:04:57.000 --> 01:05:04.000
columns whereas y is a n by one column vector and e is an n by one column vector
01:05:05.000 --> 01:05:12.000
x is called the regressor matrix and it is made up of a column of ones which by convention is
01:05:12.000 --> 01:05:20.000
typically the first column and then come all the other explanatory variables ordered by variable
01:05:20.000 --> 01:05:26.000
so the columns they are all the different variables like education experience age
01:05:26.000 --> 01:05:35.000
sex and the rows they are all the workers so the second row would give us the characteristics
01:05:35.000 --> 01:05:43.000
of the second worker the sum of the squared residuals in matrix notation would be e prime e
01:05:44.000 --> 01:05:49.000
this is just the same thing as the sum of the squared residuals right e prime e is the
01:05:49.000 --> 01:05:54.000
inner product of the vector e so that's the sum of the squared residuals
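These matrix objects can be set up directly; a sketch with invented wage data, where the first column of X is the conventional column of ones:

```python
import numpy as np

# Invented data: wages y for n = 4 workers, with education and age as
# explanatory variables (plus the constant term).
y = np.array([2100.0, 2600.0, 3200.0, 2900.0])
educ = np.array([10.0, 12.0, 16.0, 14.0])
age = np.array([30.0, 35.0, 40.0, 38.0])

# Regressor matrix: n rows (workers), k columns (constant, education, age).
X = np.column_stack([np.ones_like(y), educ, age])

b = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares coefficient vector
e = y - X @ b                             # residual vector
ssr = e @ e                               # e'e, the sum of squared residuals
```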
01:05:57.000 --> 01:06:01.000
please verify the following identity as an exercise if we take this matrix x in the way
01:06:01.000 --> 01:06:10.000
in which i have defined it then x prime x is the same thing as the sum over all i from one to n of the small
01:06:10.000 --> 01:06:20.000
x's x i x i prime and x prime y is the same thing as the sum over all i of x i y i
01:06:22.000 --> 01:06:29.000
is this x i y i actually well defined yes indeed it is x i is a k by one vector and this is just
01:06:29.000 --> 01:06:34.000
multiplied by a scalar
01:06:34.000 --> 01:06:43.000
so a scalar we can always multiply with a column vector right and x prime e so the product of the
01:06:43.000 --> 01:06:53.000
regressor matrix and the vector of errors is just the same thing as the sum over all the x i's
01:06:53.000 --> 01:06:58.000
which are k by one column vectors times the e i
01:07:04.000 --> 01:07:11.000
and the e i is again a scalar okay so we'll use these identities sometimes so please make
01:07:11.000 --> 01:07:17.000
it clear that you understood that this is true and after you've understood this so basically
01:07:17.000 --> 01:07:23.000
gone through it in your mind then verify the first two equations in some software
01:07:23.000 --> 01:07:31.000
MATLAB or R or whatever and using any kind of data for x and y obviously this third equation
01:07:31.000 --> 01:07:36.000
you cannot verify because you don't have any data on e typically you don't know what the true errors
01:07:36.000 --> 01:07:46.000
are so we will now write the least squares problem as a problem written in matrix notation
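The two verifiable identities from the exercise can be checked numerically before moving on (random data; the third identity cannot be checked because the true errors e are unobservable):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
# Random regressor matrix with the conventional column of ones, random y.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

# X'X as a sum of k-by-k outer products x_i x_i'.
sum_outer = sum(np.outer(X[i], X[i]) for i in range(n))
print(np.allclose(X.T @ X, sum_outer))  # True

# X'y as a sum of k-by-1 vectors x_i scaled by the scalar y_i.
sum_xy = sum(X[i] * y[i] for i in range(n))
print(np.allclose(X.T @ y, sum_xy))     # True
```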
01:07:47.000 --> 01:07:54.000
so our problem is to find the vector b such that the vector product e prime e is minimal
01:07:55.000 --> 01:07:59.000
so the same type of problem as the one we had written in scalars before is now written as
01:07:59.000 --> 01:08:07.000
minimize over choice of vector b e prime e and what is e prime e where we know e the vector of
01:08:07.000 --> 01:08:15.000
um errors is the same thing as y minus x b i'm just using this
01:08:16.000 --> 01:08:26.000
simple equation here e is equal to y minus x b so substituting for e y minus x b
01:08:27.000 --> 01:08:35.000
or for e prime y minus x b prime i get this type of product um you know probably how the prime
01:08:35.000 --> 01:08:43.000
uh is translating to matrix products we would in this case have the same thing as y prime
01:08:43.000 --> 01:08:50.000
minus b prime times x prime because the order of multiplication is reversed when you take the
01:08:50.000 --> 01:08:57.000
transpose of a matrix product of x b the parenthesis prime is the same thing as b prime x prime
01:08:57.000 --> 01:09:04.000
times y minus x b and in this case you can then just multiply out component wise so you would get
01:09:04.000 --> 01:09:13.000
y prime times y is y prime y and y prime times x b with a negative sign is negative y prime x b
01:09:15.000 --> 01:09:23.000
and b prime x prime y with a negative sign is negative b prime x prime y and finally negative
01:09:23.000 --> 01:09:32.000
b prime x prime times negative x b is plus b prime x prime x b it's the usual arithmetic
01:09:32.000 --> 01:09:38.000
of equations which you see here in vector four now there's one thing which is important to note
01:09:38.000 --> 01:09:46.000
because you have y prime X b here and b prime X prime y here, both with a negative sign.
01:09:47.000 --> 01:09:51.000
At first sight it seems these are different expressions, right? But in fact they are not.
01:09:52.000 --> 01:09:59.000
Why are they not different expressions? We first note that this expression here is exactly the
01:09:59.000 --> 01:10:06.000
transpose of this expression here, or the opposite way: this expression here
01:10:06.000 --> 01:10:12.000
is the transpose of this expression here, right? Taking the transpose of this thing here would
01:10:12.000 --> 01:10:20.000
mean that you have to reverse the order of the vectors, so starting with y it would be y prime times
01:10:20.000 --> 01:10:29.000
X times b, which is exactly the thing here, y prime times X times b. So these two are
01:10:29.000 --> 01:10:37.000
transposes of each other. And moreover, y prime X b is a scalar, so this is just a one by one matrix,
01:10:37.000 --> 01:10:47.000
right? Because y is an n by one vector, so y prime is one by n in terms
01:10:47.000 --> 01:10:56.000
of dimension; one by n multiplied by the X matrix, which is n by k, gives a one by k
01:10:56.000 --> 01:11:04.000
matrix; the one by k matrix multiplied with b, which is a k by one vector, gives me a one by one
01:11:04.000 --> 01:11:13.000
result. So this is one by one, and it is the transpose of this thing here, which is also one by one.
01:11:13.000 --> 01:11:19.000
Obviously the two things are the same, because for a scalar, a one by one matrix, taking
01:11:19.000 --> 01:11:24.000
its transpose doesn't change anything, right? So we can also write this as y prime y,
01:11:25.000 --> 01:11:33.000
this thing here, minus two times b prime X prime y, plus then the fourth term, b prime X prime X b.
01:11:35.000 --> 01:11:41.000
And we shall minimize, we want to minimize this thing here by appropriate choice of b.
01:11:41.000 --> 01:11:48.000
Obviously, minimizing this expression here is done only with respect to the second term
01:11:49.000 --> 01:11:52.000
and the third term, because these terms depend on the choice of b,
01:11:53.000 --> 01:11:59.000
right? We have it here: y prime y is completely independent of b, so we can just
01:11:59.000 --> 01:12:05.000
throw it away. We just have to minimize negative two b prime X prime y plus b prime X prime X b.
01:12:05.000 --> 01:12:14.000
How is this done? Well, minimizing always means taking a derivative. There are
01:12:14.000 --> 01:12:21.000
rules for matrix differentiation which I give you here without proof. You can actually verify
01:12:21.000 --> 01:12:26.000
that this is true just in your mind, but you can also learn this by heart if you like.
01:12:26.000 --> 01:12:33.000
If we take the derivative of an expression like b prime times some matrix, in this case X prime y,
01:12:34.000 --> 01:12:41.000
so b prime times some matrix, then taking the derivative with respect to b is just equal to
01:12:41.000 --> 01:12:48.000
this matrix, X prime y. So this is like usual linear derivatives, or taking the derivative
01:12:48.000 --> 01:12:54.000
of a linear expression: the b prime just vanishes, and the matrix, in this case X prime y, is the
01:12:54.000 --> 01:13:02.000
derivative. If we take the derivative of a quadratic equation, where we have a quadratic expression,
01:13:02.000 --> 01:13:09.000
excuse me, a quadratic expression where we have b prime X prime X b, with respect to b,
01:13:09.000 --> 01:13:17.000
then the derivative is two times X prime X b. So essentially it's always the b prime which
01:13:17.000 --> 01:13:25.000
vanishes here, right? It vanishes here, and in the case of a quadratic equation you then also
01:13:25.000 --> 01:13:35.000
have this usual factor of two, by which the remaining matrix product X prime X b is multiplied.
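Both differentiation rules can be sanity-checked numerically. A small Python/NumPy sketch; the finite-difference helper `num_grad` is an illustrative assumption, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
b = rng.normal(size=k)

def num_grad(f, b, h=1e-6):
    """Central finite-difference gradient of a scalar function f at b."""
    g = np.zeros_like(b)
    for j in range(len(b)):
        e = np.zeros_like(b)
        e[j] = h
        g[j] = (f(b + e) - f(b - e)) / (2 * h)
    return g

# d/db (b' X'y) = X'y
g1 = num_grad(lambda v: v @ (X.T @ y), b)
assert np.allclose(g1, X.T @ y)

# d/db (b' X'X b) = 2 X'X b
g2 = num_grad(lambda v: v @ (X.T @ X) @ v, b)
assert np.allclose(g2, 2 * X.T @ X @ b)
```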
01:13:36.000 --> 01:13:44.000
So if we take the derivative of this expression here with respect to b, right,
01:13:45.000 --> 01:13:52.000
we take this derivative here, then the first order condition is: minus two times X prime y
01:13:52.000 --> 01:14:01.000
plus two times X prime X b shall be equal to zero. Obviously we can divide by two,
01:14:02.000 --> 01:14:08.000
and we can move the negative term to the right hand side of the equation. Then we are left with:
01:14:08.000 --> 01:14:16.000
X prime X b shall be equal to X prime y. This is a whole set of equations actually; these are k
01:14:16.000 --> 01:14:23.000
equations for k unknowns, namely the k components of the k-dimensional vector b. These equations,
01:14:23.000 --> 01:14:30.000
this set of equations, is called the normal equations. And it's very obvious that when we want to
01:14:30.000 --> 01:14:37.000
derive what b is, we have to solve the normal equations for b, which is easily done
01:14:37.000 --> 01:14:45.000
if it is true that the matrix product X prime X is non-singular, if we can compute the inverse
01:14:45.000 --> 01:14:54.000
of X prime X. So if the inverse of X prime X exists, then we know that b is equal to the inverse of X prime X
01:14:54.000 --> 01:15:01.000
times X prime y, and this is the least squares estimator, right? This is the least squares
01:15:01.000 --> 01:15:11.000
estimator which we have derived here in matrix notation. Is it trivial that the inverse of X prime X
01:15:11.000 --> 01:15:18.000
exists? Do we have any reason to believe that the matrix X prime X is not singular? Yes indeed we do:
01:15:18.000 --> 01:15:26.000
this is typically the case. I just asked you to verify this in a very simple MATLAB or any other
01:15:26.000 --> 01:15:34.000
kind of software exercise where you compute b manually. Oh no, sorry, this is not yet that exercise,
01:15:35.000 --> 01:15:41.000
but just use MATLAB to compute the b manually, as the inverse of X prime X times X prime y,
01:15:42.000 --> 01:15:47.000
for the sample data set which I provided to you, where you have here the log of
01:15:48.000 --> 01:15:55.000
private consumption, and you regress this on a constant and on the log of GDP, and then again
01:15:55.000 --> 01:15:59.000
compare with our earlier EViews output. If you have done this exercise correctly, then you should
01:15:59.000 --> 01:16:09.000
get the same results as those which we had in our EViews output. Now here comes the question:
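A sketch of that manual computation in Python with NumPy (the lecturer's consumption/GDP data set is not reproduced here, so simulated numbers stand in for it, and the comparison is against NumPy's least squares routine rather than the EViews output):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
log_gdp = rng.normal(loc=10.0, scale=0.5, size=n)                 # placeholder for log GDP
log_cons = 0.3 + 0.9 * log_gdp + rng.normal(scale=0.05, size=n)   # placeholder for log consumption

X = np.column_stack([np.ones(n), log_gdp])  # constant plus log GDP
y = log_cons

# b computed "manually" as (X'X)^(-1) X'y
b_manual = np.linalg.inv(X.T @ X) @ (X.T @ y)

# compare with a canned least squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b_manual, b_lstsq)
```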
01:16:09.000 --> 01:16:16.000
Is it true that the inverse of X prime X exists? Suppose that the X matrix has n rows, as we
01:16:16.000 --> 01:16:22.000
have always said, and k columns, and that n is greater than k. So we have more observations which we
01:16:22.000 --> 01:16:28.000
study than we have explanatory variables; that should usually be the case in regression analysis.
01:16:28.000 --> 01:16:35.000
Well, if n is greater than k, then X is a rectangular matrix, and therefore it is certainly singular.
01:16:37.000 --> 01:16:44.000
However, this doesn't matter for the question of existence of the inverse of X prime X. What does
01:16:44.000 --> 01:16:51.000
matter for the existence of the inverse of X prime X is the question of whether X has full column rank,
01:16:52.000 --> 01:16:55.000
so whether it is true that the rank of X is k,
01:16:56.000 --> 01:17:06.000
k being smaller than n, but at least the rank must be k. Because if the rank of X is k, so if the columns
01:17:06.000 --> 01:17:13.000
of the matrix X are all linearly independent, then it is true that the rank of X prime X is equal to k,
01:17:14.000 --> 01:17:20.000
and if the rank of X prime X is equal to k, then the inverse of X prime X exists, because
01:17:21.000 --> 01:17:31.000
X prime X is a k by k matrix, right? And so the fact that the rank of X prime X is equal to k
01:17:31.000 --> 01:17:40.000
would then mean that X prime X has full rank k, and then its inverse exists, right? So if X has full
01:17:40.000 --> 01:17:48.000
column rank, if the rank of X is equal to k, then the inverse of X prime X exists. And that is almost
01:17:48.000 --> 01:17:56.000
always the case, unless you have an exact linear dependency among your regressors, which means that
01:17:56.000 --> 01:18:02.000
you have a superfluous regressor in your regressor matrix, which you can easily
01:18:02.000 --> 01:18:06.000
kick out, because you know all the information in this regressor is contained in some of the other
01:18:07.000 --> 01:18:15.000
regressors in just the same way. To convince you of this fact, perhaps the best thing is that
01:18:15.000 --> 01:18:24.000
in your software, be it MATLAB or whatever, you just generate random matrices of size n by k, with n
01:18:24.000 --> 01:18:30.000
greater than k. Call these matrices X, right? The contents of these matrices are just random
01:18:30.000 --> 01:18:34.000
numbers; all these programs have random number generators, so it's not difficult,
01:18:35.000 --> 01:18:44.000
very easy actually, to generate random matrices X of size n by k. And with these random numbers,
01:18:44.000 --> 01:18:54.000
verify that X prime X always has rank k, so that the inverse of X prime X always exists.
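That check can be sketched as follows (Python/NumPy standing in for MATLAB; with continuous random numbers an exact linear dependency essentially never occurs):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 4
for trial in range(100):
    X = rng.normal(size=(n, k))                  # random n-by-k matrix, n > k
    assert np.linalg.matrix_rank(X.T @ X) == k   # X'X has full rank k, so its inverse exists
```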
01:18:55.000 --> 01:19:03.000
Right, I mean, I offer you a bet that you won't be able to find any X prime X matrix, drawn from
01:19:03.000 --> 01:19:11.000
random numbers, right, which would have rank lower than k. So it's always the case, if they are random
01:19:11.000 --> 01:19:23.000
numbers, that you will have rank k for X prime X. Now, one important thing is that the inverse of
01:19:23.000 --> 01:19:33.000
X of course does not exist, because, as I said, X is singular. There is, however, a generalization of
01:19:33.000 --> 01:19:41.000
the concept of an inverse matrix. This generalization of the concept of an inverse matrix
01:19:41.000 --> 01:19:51.000
is the so-called pseudoinverse, which is denoted by A plus for some given matrix A. It is actually
01:19:51.000 --> 01:19:59.000
the case that every matrix, also the non-singular matrices, really every matrix A,
01:20:00.000 --> 01:20:08.000
has a so-called pseudoinverse, which is denoted by A plus. And this pseudoinverse is also called
01:20:08.000 --> 01:20:18.000
the Moore-Penrose inverse, after its two discoverers, Moore and Penrose. Moore discovered this
01:20:19.000 --> 01:20:26.000
inverse already in the 1920s, but nobody understood his publication and it was forgotten, and Penrose
01:20:26.000 --> 01:20:33.000
then discovered it again in the 1950s, unaware of the work of Moore, and developed its properties.
01:20:33.000 --> 01:20:39.000
And only after Penrose had published his papers did other people come up and say, hey, this looks like
01:20:39.000 --> 01:20:43.000
what Moore published years ago. So that's why it's called the Moore-Penrose inverse.
01:20:45.000 --> 01:20:52.000
What exactly is a Moore-Penrose or pseudoinverse of a matrix X, of a matrix A?
01:20:53.000 --> 01:21:01.000
So given any matrix A, there is just one matrix A plus; this is why I write that it is a unique matrix,
01:21:01.000 --> 01:21:09.000
there's just one matrix A plus, and there always exists such a matrix A plus, which satisfies the four
01:21:09.000 --> 01:21:17.000
conditions which I have listed here. One condition is that A is the same thing as A times A plus
01:21:18.000 --> 01:21:25.000
times A, right? And if you just want to compare this with the regular inverse: if A plus were the
01:21:25.000 --> 01:21:32.000
regular inverse, then clearly A times A inverse is just the identity matrix, so this we can forget,
01:21:33.000 --> 01:21:39.000
and then A is equal to A, all right. The regular inverse also satisfies all these four properties,
01:21:39.000 --> 01:21:43.000
which you actually shall show in the exercise which is down here, as the first task in this
01:21:43.000 --> 01:21:50.000
exercise. All right, so four properties. First property: A is equal to A times A plus times A.
01:21:51.000 --> 01:21:56.000
Second property: A plus is equal to A plus times A times A plus.
01:21:57.000 --> 01:22:06.000
Third property: A plus times A is symmetric, so A plus times A is the same thing as A plus times A
01:22:06.000 --> 01:22:16.000
transposed. And fourth property: A times A plus is also symmetric, so it's the same thing as A times A
01:22:16.000 --> 01:22:25.000
plus transposed. Note that A plus A is not the same as A A plus, right? The order of multiplication there
01:22:25.000 --> 01:22:31.000
plays a role. In the case of the regular inverse, of course, this would be the same thing,
01:22:31.000 --> 01:22:38.000
but for the generalized inverse, the pseudoinverse, this is not the same thing. So for any matrix
01:22:38.000 --> 01:22:48.000
A, there exists one and only one matrix A plus which satisfies all these equations here. And
01:22:48.000 --> 01:22:57.000
actually you can solve systems of linear equations using the pseudoinverse, which in many cases then
01:22:57.000 --> 01:23:04.000
involves some type of solution with A plus. So the A plus matrix really functions
01:23:04.000 --> 01:23:10.000
in a very similar sense, in a very similar way, as the regular inverse, but I won't go into
01:23:10.000 --> 01:23:17.000
the general theory of solving linear equations. The Moore-Penrose inverse is important because
01:23:17.000 --> 01:23:26.000
it comes up very naturally in the derivation of the least squares estimator. So to better understand
01:23:27.000 --> 01:23:33.000
the Moore-Penrose inverse, please go through this exercise here. The first part of it I have already
01:23:33.000 --> 01:23:43.000
mentioned. Then show that A plus is equal to the inverse of A if A is regular, so if A has full
01:23:43.000 --> 01:23:54.000
rank and therefore is non-singular. And then second, show that the pseudoinverse of some matrix X
01:23:54.000 --> 01:24:03.000
is equal to the inverse of X prime X times X prime, if the matrix X has full column rank, so basically
01:24:03.000 --> 01:24:11.000
if the inverse of X prime X exists, right? In this case X plus is the same thing as the inverse of X prime X
01:24:11.000 --> 01:24:19.000
times X prime. And obviously the inverse of X prime X times X prime is exactly the expression we had here in our
01:24:19.000 --> 01:24:25.000
derivation of the least squares estimator: there is the inverse of X prime X times X prime. So this is the
01:24:25.000 --> 01:24:33.000
Moore-Penrose inverse of X, and this is being multiplied then by the vector of observations
01:24:33.000 --> 01:24:45.000
y, to give the optimal estimates for the b vector. All right, another MATLAB-based exercise:
01:24:45.000 --> 01:24:58.000
generate again random matrices of size n by k, and verify numerically that the inverse of X prime X times
01:24:58.000 --> 01:25:06.000
X prime satisfies all the properties one to four which define a Moore-Penrose inverse. So verify
01:25:06.000 --> 01:25:10.000
that the inverse of X prime X times X prime is actually X plus.
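A sketch of this exercise in Python with NumPy; `np.linalg.pinv` provides the library pseudoinverse for comparison:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 25, 4
X = rng.normal(size=(n, k))          # random n-by-k matrix with full column rank

Xp = np.linalg.inv(X.T @ X) @ X.T    # candidate pseudoinverse (X'X)^(-1) X'

# the four Moore-Penrose conditions
assert np.allclose(X @ Xp @ X, X)        # 1: X X+ X = X
assert np.allclose(Xp @ X @ Xp, Xp)      # 2: X+ X X+ = X+
assert np.allclose(Xp @ X, (Xp @ X).T)   # 3: X+ X is symmetric
assert np.allclose(X @ Xp, (X @ Xp).T)   # 4: X X+ is symmetric

# and it agrees with the library pseudoinverse
assert np.allclose(Xp, np.linalg.pinv(X))
```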
01:25:10.000 --> 01:25:23.000
As I already said, the least squares estimator is simply b written as X plus times y. There's
01:25:23.000 --> 01:25:30.000
a very simple intuition for that. If it were the case that n is equal to k, so we had as many
01:25:30.000 --> 01:25:37.000
observations as we have explanatory variables, and if the X matrix were regular, and regular is the
01:25:37.000 --> 01:25:44.000
same thing as non-singular, so X regular, then we could just set b equal to X inverse times y,
01:25:45.000 --> 01:25:52.000
right? So all the errors e would in this case be zero. But in the usual case, in our usual applications,
01:25:52.000 --> 01:25:57.000
we have many more observations, hopefully, than we have regressors. So if n is greater than k,
01:25:57.000 --> 01:26:03.000
we simply replace this X inverse here, in this thought experiment
01:26:04.000 --> 01:26:11.000
of the least squares estimator; we simply replace the X inverse here by the Moore-Penrose
01:26:11.000 --> 01:26:17.000
inverse, and therefore we have a generalization of the concept of an inverse matrix here.
01:26:20.000 --> 01:26:27.000
Now how would we write the residuals? After we have derived the least squares estimator, we know the
01:26:27.000 --> 01:26:35.000
e whose sum of squares we have minimized: the e is just the difference between y and X b.
01:26:36.000 --> 01:26:43.000
Initially we did not know what b would be, but now we have an estimate for b, which is the least
01:26:43.000 --> 01:26:52.000
squares estimate. So we can compute y minus X b; b is X plus times y, so this is y minus X X plus
01:26:52.000 --> 01:27:00.000
times y, right? X plus y is the least squares estimate, okay, or in this case it's
01:27:00.000 --> 01:27:09.000
still the estimator. y minus X X plus y is equal to e. So obviously we can factor out the y vector
01:27:09.000 --> 01:27:17.000
here and write this as: identity matrix I minus X X plus, in parentheses, times y.
01:27:17.000 --> 01:27:26.000
This matrix here, which appears often in econometrics, the identity matrix I minus X X plus,
01:27:26.000 --> 01:27:36.000
I call matrix M. So I can write: e is equal to M times y. Now, important note: three important
01:27:36.000 --> 01:27:45.000
properties of this matrix M here. First thing is that X prime M is equal to zero, so this M matrix
01:27:45.000 --> 01:27:52.000
and the regressor matrix are orthogonal to each other. Why is X prime M equal to zero?
01:27:52.000 --> 01:28:01.000
Well, very easy: if you multiply X prime from the left side onto this matrix in parentheses here,
01:28:01.000 --> 01:28:07.000
onto this M matrix here, you would have X prime times the identity matrix is X prime, so X prime
01:28:08.000 --> 01:28:19.000
minus X prime X times X plus. But X plus is the inverse of X prime X times X prime,
01:28:20.000 --> 01:28:26.000
so we would have X prime X times the inverse of X prime X, which cancels, times X prime,
01:28:27.000 --> 01:28:32.000
so we would have X prime minus X prime, which is zero, right? So X prime M is equal to zero.
01:28:32.000 --> 01:28:38.000
And the same thing holds when you multiply X from the other side, from the right hand side,
01:28:38.000 --> 01:28:47.000
onto matrix M: so M times X is also equal to zero, as you can very clearly convince yourself.
01:28:48.000 --> 01:28:56.000
And the third property is that M is a so-called idempotent matrix, namely it is the case that M
01:28:56.000 --> 01:29:04.000
times M is equal to M. You can also easily verify that by just multiplying out I minus X X plus times
01:29:04.000 --> 01:29:09.000
I minus X X plus; you'll see that I minus X X plus times I minus X X plus is equal to I
01:29:09.000 --> 01:29:18.000
minus X X plus, so M times M is equal to M. This is the property we call idempotent: matrix
01:29:18.000 --> 01:29:25.000
M is idempotent. Regardless of how many times we multiply M by itself, the result is always M.
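All three properties of M can be verified numerically in the same spirit (a Python/NumPy sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.pinv(X)   # M = I - X X+

assert np.allclose(X.T @ M, 0)          # X'M = 0
assert np.allclose(M @ X, 0)            # M X = 0
assert np.allclose(M @ M, M)            # M is idempotent
```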
01:29:26.000 --> 01:29:35.000
As exercises, before we conclude today's lecture: show that these properties which I just
01:29:35.000 --> 01:29:43.000
have introduced you to imply that X prime e is equal to zero, so the regressor matrix
01:29:43.000 --> 01:29:50.000
is orthogonal to the errors which we have estimated, to the regression residuals.
01:29:50.000 --> 01:29:59.000
Further, show that the sum of the e i's, of the regression residuals, is equal to zero
01:30:00.000 --> 01:30:04.000
if the regression contains a constant term. That's an important issue: if the regression
01:30:04.000 --> 01:30:11.000
contains a constant term, so if one of the columns of X is constant, just a vector of ones,
01:30:12.000 --> 01:30:17.000
then the sum of all the errors will be equal to zero.
01:30:20.000 --> 01:30:28.000
Well, and again, in MATLAB or R or whatever matrix language you like to use, verify the
01:30:28.000 --> 01:30:33.000
properties a and b for our sample regression of the log of private consumption on the log of
01:30:34.000 --> 01:30:41.000
GDP and a constant. So verify that a and b will not hold if you do not include a constant in
01:30:41.000 --> 01:30:47.000
the regression; this is part of problem c in this exercise.
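This exercise might look as follows in Python with NumPy (again with simulated numbers standing in for the lecturer's consumption/GDP sample):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
log_gdp = rng.normal(loc=2.0, scale=1.0, size=n)                # placeholder for log GDP
log_cons = 0.3 + 0.9 * log_gdp + rng.normal(scale=0.1, size=n)  # placeholder for log consumption

# (a), (b): with a constant, residuals are orthogonal to X and sum to zero
X = np.column_stack([np.ones(n), log_gdp])
b = np.linalg.inv(X.T @ X) @ (X.T @ log_cons)
e = log_cons - X @ b
assert np.allclose(X.T @ e, 0)   # (a) X'e = 0
assert np.isclose(e.sum(), 0)    # (b) residuals sum to zero

# (c): without a constant, the residuals generally do not sum to zero
X0 = log_gdp.reshape(-1, 1)
b0 = np.linalg.inv(X0.T @ X0) @ (X0.T @ log_cons)
e0 = log_cons - X0 @ b0
assert abs(e0.sum()) > 1e-6
```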
01:30:51.000 --> 01:30:54.000
That's it for today. Are there any questions?
01:30:58.000 --> 01:31:04.000
Raise your hand if there is a question; otherwise you have the opportunity of
01:31:05.000 --> 01:31:08.000
raising a question tomorrow when I start the lecture in the afternoon.
01:31:10.000 --> 01:31:15.000
I assume that most of what I did in the first part of this lecture was well known to you, and
01:31:15.000 --> 01:31:20.000
perhaps the second part was not quite as well known, in particular if you haven't had
01:31:21.000 --> 01:31:28.000
matrix notation for econometrics so far. In this case, please work well through what I have presented
01:31:28.000 --> 01:31:35.000
to you today, because I will use matrix notation for a long time now, and you should be well familiar
01:31:35.000 --> 01:31:42.000
with it. I do not see any questions, so thank you very much for your attention, and until tomorrow.