WEBVTT - autoGenerated
00:00:00.000 --> 00:00:08.000
Okay, and here is the definition of the mean squared error once again as the expectation
00:00:08.000 --> 00:00:18.000
of the difference between W and theta squared. So basically it is the error made in estimating
00:00:18.000 --> 00:00:26.000
by producing an estimate W or small w and comparing the difference between the estimate
00:00:26.000 --> 00:00:33.000
and the true value of the parameter. We square this difference in order to make sure that
00:00:33.000 --> 00:00:39.000
positive and negative errors don't cancel and take the expectation of the squared error.
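As a numerical illustration of this definition, here is a minimal Python sketch, not from the lecture: the normal population, the true value theta = 5, and the sample size n = 20 are assumed purely for illustration, with the sample mean playing the role of the estimator W.

```python
import random

random.seed(0)

theta = 5.0           # assumed true parameter value (illustration only)
n, reps = 20, 10_000  # sample size and number of simulated samples

# Estimate theta by the sample mean of n draws from N(theta, 1), then
# average the squared errors (w - theta)^2 over many simulated samples.
sq_errors = []
for _ in range(reps):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    w = sum(sample) / n                 # one realization of the estimator W
    sq_errors.append((w - theta) ** 2)  # squaring stops +/- errors cancelling

mse = sum(sq_errors) / reps             # Monte Carlo estimate of E[(W - theta)^2]
print(round(mse, 3))
```

Because this W is unbiased, the printed value should land close to Var(W) = 1/n = 0.05.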
00:00:39.000 --> 00:00:48.000
That's where the name comes from, mean squared error. Okay, and then let's move on. One thing
00:00:48.000 --> 00:00:55.000
to note about this is that in computing this mean squared error, we subtract theta from W,
00:00:55.000 --> 00:01:03.000
but we do not subtract the expected value of W from W, right, unless theta is equal to the
00:01:03.000 --> 00:01:12.000
expected value of W. So unless the estimator W is unbiased. Now, if we did that, obviously,
00:01:12.000 --> 00:01:24.000
if we subtracted the expected value of W from W, so if we had W minus its expected value here,
00:01:24.000 --> 00:01:30.000
then obviously we would just have the variance of the estimator, not the mean squared
00:01:31.000 --> 00:01:38.000
error. So it makes sense to subtract
00:01:38.000 --> 00:01:44.000
the parameter from the estimator rather than the expectation of the estimator from the estimator W.
00:01:45.000 --> 00:01:52.000
And I also mentioned already last time that the MSE does not allow for an unambiguous ranking of
00:01:52.000 --> 00:02:01.000
estimators across all possible values of theta. So across all the possible true values of theta,
00:02:01.000 --> 00:02:08.000
but rather we may again have the property that the mean squared error of some estimator W1
00:02:08.000 --> 00:02:15.000
is smaller than the MSE of an alternative estimator W2 for a particular value of theta,
00:02:15.000 --> 00:02:24.000
say for theta a, but that the converse is true for some other value of theta, say theta b,
00:02:24.000 --> 00:02:33.000
which is different from theta a. It is clear that something like this may happen because the
00:02:34.000 --> 00:02:40.000
definition of the mean squared error not only depends on the estimator, it also depends on theta.
00:02:42.000 --> 00:02:47.000
The definition of the mean squared error also depends on theta, so actually it's slightly
00:02:47.000 --> 00:02:57.000
incorrect to write here MSE of W, we should write the MSE of W and theta, so it depends obviously
00:02:57.000 --> 00:03:04.000
on theta. Therefore, such a notation MSE of W theta is actually more appropriate than just writing
00:03:04.000 --> 00:03:15.000
MSE of W. Now note that we can decompose the MSE, and this is one of the slides by the way where
00:03:15.000 --> 00:03:20.000
I've made slight changes. I notified you yesterday that there was a new set of slides which you may
00:03:21.000 --> 00:03:28.000
download from Steena, and there are marginal differences relative to the former version,
00:03:28.000 --> 00:03:32.000
but in case you still have the earlier version then you may note that I have changed this slide a
00:03:32.000 --> 00:03:39.000
little bit. What I want to show you here is that we can decompose the MSE such that the mean
00:03:39.000 --> 00:03:46.000
squared error is actually the variance of the estimator plus the squared bias of the estimator,
00:03:47.000 --> 00:03:53.000
and that shows you very clearly that the MSE combines the two concepts of unbiasedness
00:03:53.000 --> 00:04:01.000
expressed here by the squared bias and relative efficiency expressed here by the variance of W.
00:04:02.000 --> 00:04:08.000
So how do we see that the MSE is actually the sum of variance and the squared bias of an estimator?
00:04:08.000 --> 00:04:13.000
Well all we have to do is that we look at the definition of the MSE which is the expectation
00:04:14.000 --> 00:04:21.000
of W minus theta squared, and then what we do is that we extend this expression here
00:04:23.000 --> 00:04:32.000
by adding and subtracting the same magnitude. So here we again have W minus theta,
00:04:32.000 --> 00:04:41.000
and that is squared, and inside the square I have
00:04:41.000 --> 00:04:49.000
added the expectation of W and I have subtracted the expectation of W.
00:04:49.000 --> 00:04:56.000
So the second and the third term in the sum here actually cancel. The square here refers
00:04:56.000 --> 00:05:04.000
to the parentheses which are around the whole sum here, so indeed minus E of W and plus E of W
00:05:04.000 --> 00:05:11.000
they just cancel. We have the same expression as we had here. I've just written it in such a way that
00:05:11.000 --> 00:05:18.000
we can now split this up in two components which are to become the variance and the squared bias
00:05:18.000 --> 00:05:27.000
because obviously if we now take the expectation of this squared expression here, then we can
00:05:27.000 --> 00:05:35.000
multiply out the term which is in the first pair of parentheses here which is squared.
00:05:35.000 --> 00:05:42.000
So we get then the expectation of W minus E of W squared and that is of course the variance of W
00:05:43.000 --> 00:05:53.000
plus the expectation of this term here squared E of W minus theta squared.
00:05:54.000 --> 00:06:02.000
Well if we take the expectation of two things, both of which are just real numbers, so
00:06:02.000 --> 00:06:06.000
the expectation of W is not a random variable anymore, it's a real number and theta is a real
00:06:06.000 --> 00:06:12.000
number, then we can just forget this first expectation operator here. We can write this
00:06:12.000 --> 00:06:19.000
as the expectation of W minus theta and the expectation of W minus theta is of course then
00:06:20.000 --> 00:06:34.000
the bias of the estimator plus two times the covariance, so the cross product of this term
00:06:34.000 --> 00:06:43.000
here and this term here. Therefore we have to compute what the covariance is and it turns out
00:06:43.000 --> 00:06:54.000
that it is zero. As you can see here we have to multiply W minus E of W and E of W minus theta,
00:06:54.000 --> 00:07:00.000
so these two product terms we have to multiply. If you multiply this out then you get the expression
00:07:00.000 --> 00:07:11.000
which is down here, W times E of W minus theta W, so this W minus this theta here, minus E of W
00:07:11.000 --> 00:07:18.000
squared, these two products, these two terms multiplied by each other here and then plus theta
00:07:18.000 --> 00:07:26.000
times E of W which is the minus E of W here and the minus theta here and taking the expectation
00:07:26.000 --> 00:07:38.000
of this expression here gives us E of W here, so we have E of W squared which is here minus theta
00:07:38.000 --> 00:07:45.000
times the expectation of W which is this thing here minus E of W squared unchanged from above
00:07:46.000 --> 00:07:51.000
plus theta times the expectation of W and you see that the first and the third term
00:07:51.000 --> 00:07:57.000
cancel and the second and the fourth term cancel, so the whole thing is zero and we are left with
00:07:57.000 --> 00:08:05.000
the variance of W and the bias of W squared and that's what I wanted to show you. The mean
00:08:05.000 --> 00:08:12.000
squared error is just the sum of the variance of an estimator plus the squared bias of an estimator.
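The decomposition just derived can also be checked numerically. Below is a small sketch with an assumed population and a deliberately biased estimator (0.8 times the sample mean, chosen only so that the bias term is visibly non-zero):

```python
import random
import statistics

random.seed(1)

theta, n, reps = 4.0, 10, 20_000   # assumed true value, sample size, repetitions

estimates = []
for _ in range(reps):
    sample = [random.gauss(theta, 2.0) for _ in range(n)]
    estimates.append(0.8 * sum(sample) / n)   # deliberately biased estimator W

mse = sum((w - theta) ** 2 for w in estimates) / reps   # E[(W - theta)^2]
var = statistics.pvariance(estimates)                   # Var(W)
bias = sum(estimates) / reps - theta                    # E[W] - theta

# MSE = Var(W) + Bias(W)^2 holds exactly for the simulated moments as well.
print(abs(mse - (var + bias ** 2)) < 1e-6)
```

The identity holds exactly for the empirical moments, not just in expectation, which is why the check uses a tight tolerance.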
00:08:15.000 --> 00:08:23.000
Okay here I have repeated that as a property, I want to just say MSE is equal to variance of
00:08:23.000 --> 00:08:31.000
the estimator plus the squared bias. This property or this expression also makes clear that we often
00:08:31.000 --> 00:08:38.000
face a trade-off in estimation between either having an estimator with larger
00:08:38.000 --> 00:08:46.000
variance or an estimator with a larger bias. These things occur rather often so
00:08:46.000 --> 00:08:54.000
that we have to make a decision on what is more important to us, reduce the bias of an estimator
00:08:54.000 --> 00:09:00.000
or reduce the variance of an estimator. These two components, the variance of the estimator
00:09:00.000 --> 00:09:07.000
and the bias of the estimator are not independent and they may actually be negatively
00:09:07.000 --> 00:09:13.000
correlated in the sense that when we decrease the bias then the variance increases or vice versa
00:09:13.000 --> 00:09:18.000
and then of course the trade-off is difficult. If they were positively correlated obviously it would
00:09:18.000 --> 00:09:23.000
be very easy to say, well, we decrease both the variance and the bias; that would be desirable,
00:09:23.000 --> 00:09:25.000
but that is often not possible.
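A standard textbook illustration of this trade-off, not one given in the lecture, is the shrunken sample mean c times y bar: choosing c below one lowers the variance but introduces bias, and for the assumed values below some c below one actually beats the unbiased choice c = 1 in MSE terms.

```python
# Analytic bias-variance trade-off for the shrunken mean c * ybar.
theta, sigma2, n = 2.0, 4.0, 10    # assumed true mean, variance, sample size

def mse(c):
    var = c * c * sigma2 / n        # Var(c * ybar) = c^2 * sigma^2 / n
    bias = (c - 1.0) * theta        # E[c * ybar] - theta
    return var + bias * bias        # MSE = variance + squared bias

# Search a grid of shrinkage factors between 0.5 and 1.0.
best_c = min((c / 100 for c in range(50, 101)), key=mse)
print(best_c < 1.0 and mse(best_c) < mse(1.0))
```

Here accepting a little bias buys a larger reduction in variance, so the MSE-minimizing c lies strictly below one.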
00:09:28.000 --> 00:09:32.000
Any questions up to here for the finite-sample properties of estimators?
00:09:36.000 --> 00:09:45.000
Don't see anything? Then let's continue with the asymptotic properties of estimators. So bias and
00:09:45.000 --> 00:09:51.000
relative efficiency and mean squared errors were properties which can be applied to any sample size.
00:09:52.000 --> 00:09:58.000
Now we will ask the question how does the estimator behave when the sample size increases
00:09:58.000 --> 00:10:07.000
to infinity and here the main property that we will deal with is the property of consistency
00:10:07.000 --> 00:10:18.000
which I define here in this green box. So let wn be an estimator of unknown parameter theta
00:10:19.000 --> 00:10:25.000
based on a sample of n observations. This is why we have the index n now for the estimator.
00:10:25.000 --> 00:10:34.000
So given n observations, here still written as the random variables, the n drawings which
00:10:34.000 --> 00:10:39.000
produced the n observations. So observe that these are still the capital letters, so these are
00:10:39.000 --> 00:10:45.000
still the random variables for the observations. Given n such random variables we can construct
00:10:45.000 --> 00:10:56.000
an estimator which depends on the n variables, so that is an estimator based on a sample of size n. We then say
00:10:56.000 --> 00:11:07.000
that wn or to be more precise that the sequence of estimators wn with n increasing and going towards
00:11:07.000 --> 00:11:14.000
infinity. We then say that this sequence or loosely speaking that the estimator wn is a
00:11:14.000 --> 00:11:21.000
consistent estimator of the unknown parameter theta if and only if, and this is why I write
00:11:14.000 --> 00:11:21.000
iff with two f's, if and only if for every small number epsilon greater than zero it is true that
00:11:21.000 --> 00:11:28.000
the probability that the estimator deviates in absolute value from the
00:11:28.000 --> 00:11:41.000
parameter theta by more than epsilon converges to zero as n goes to infinity.
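The probability in this definition can be traced by simulation. In this sketch the values are assumed for illustration: a standard normal population with mu = 0, a fixed epsilon = 0.2, and the sample mean as the estimator.

```python
import random

random.seed(2)

mu, eps, reps = 0.0, 0.2, 4_000   # assumed true mean and a fixed small epsilon

def tail_prob(n):
    """Monte Carlo estimate of P(|ybar_n - mu| > eps) at sample size n."""
    hits = 0
    for _ in range(reps):
        ybar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
        hits += abs(ybar - mu) > eps
    return hits / reps

probs = [tail_prob(n) for n in (10, 40, 160)]
print(probs)   # the probability shrinks towards zero as n grows
```

For a consistent estimator this tail probability must die out for every epsilon, however small; the printed sequence falls steadily towards zero.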
00:11:50.000 --> 00:11:54.000
So this here is always a little complicated to understand when you see it
00:11:54.000 --> 00:12:00.000
for the first time so let me explain this expression again. What do we look at here? We
00:12:00.000 --> 00:12:09.000
look at the difference between the estimator and the true value of the parameter and to make things
00:12:09.000 --> 00:12:16.000
easy we just look at the absolute value of this difference so it is the absolute value of the
00:12:16.000 --> 00:12:26.000
deviation between the estimator of size n and the true parameter. And now we ask what is the
00:12:26.000 --> 00:12:33.000
probability that this difference here is still rather large. Rather large means in this case
00:12:33.000 --> 00:12:41.000
for some epsilon which we think of as typically a small number we would like to know what is the
00:12:41.000 --> 00:12:49.000
probability that the deviation of the estimator from the true value is in absolute terms
00:12:49.000 --> 00:12:57.000
still larger than this epsilon, which would mean that we say, well, the estimator is still
00:12:59.000 --> 00:13:07.000
distant from the true value by some sizable amount greater
00:13:07.000 --> 00:13:13.000
than this epsilon. We look at the probability of this event occurring. Obviously when this event
00:13:13.000 --> 00:13:20.000
never occurs then the probability is zero, so it is desirable to have as low a probability for
00:13:20.000 --> 00:13:29.000
this event as is possible. What we would like to have is that the probability converges to zero
00:13:29.000 --> 00:13:36.000
as the sample size n increases, as we have more and more observations available. We would like
00:13:37.000 --> 00:13:44.000
this probability to go to zero regardless of how small we choose this epsilon. This is why we
00:13:44.000 --> 00:13:51.000
here write for every epsilon greater than zero: even if we choose a very very small epsilon we want that
00:13:44.000 --> 00:13:51.000
the probability of the deviation being still larger than the small epsilon becomes very very small,
00:13:51.000 --> 00:13:59.000
goes to zero as n goes to infinity. If this property is satisfied for a sequence of estimators
00:14:07.000 --> 00:14:15.000
wn with n increasing then we have a shorthand notation for this which is the plim of wn
00:14:16.000 --> 00:14:25.000
is equal to theta. So we also say that the probability limit for wn is theta and the
00:14:25.000 --> 00:14:33.000
probability limit means that this probability converges to zero in the limit.
00:14:35.000 --> 00:14:43.000
We also call this convergence in probability either the probability limit of wn is theta or
00:14:43.000 --> 00:14:50.000
we have convergence of wn in probability to the true value theta. This is a particular
00:14:51.000 --> 00:14:58.000
concept for the convergence of random variables. We had already at least implicitly
00:14:59.000 --> 00:15:07.000
encountered a different concept for the convergence of random variables when I taught you the
00:15:10.000 --> 00:15:18.000
central limit theorem. No sorry I didn't teach you that yet so I will teach it to you today.
00:15:19.000 --> 00:15:27.000
So convergence in distribution that a particular sequence of random variables converges in
00:15:27.000 --> 00:15:34.000
distribution to the distribution of a given limiting random variable is a different concept
00:15:34.000 --> 00:15:42.000
of convergence of random variables, which is actually implied by the plim convergence which
00:15:42.000 --> 00:15:48.000
we have here. So the shorthand notation is the typical lim but with a p
00:15:51.000 --> 00:16:01.000
at the beginning of this expression so plim or simply the plim of wn is equal to theta if this
00:16:01.000 --> 00:16:10.000
condition here holds for all epsilons greater than zero. We call an estimator a consistent estimator
00:16:10.000 --> 00:16:17.000
if the plim of wn is equal to theta. Therefore it is reasonable to say the estimator is
00:16:17.000 --> 00:16:27.000
consistent for theta. Again the property also depends on theta. If an estimator wn is not
00:16:27.000 --> 00:16:36.000
consistent for theta then we say that it is inconsistent. Now here as an example are the
00:16:36.000 --> 00:16:43.000
sampling distributions of a sequence of estimators where the number of observations
00:16:43.000 --> 00:16:50.000
in the sample increases. That's a more or less fictitious estimator and it's just for
00:16:50.000 --> 00:16:55.000
illustrative purposes that I give you those distributions here. So suppose we have an
00:16:55.000 --> 00:17:02.000
estimator which is intended to estimate the true parameter value theta which we have here.
00:17:02.000 --> 00:17:14.000
Then the realizations of the estimator are small w. So these are what we get as a result
00:17:14.000 --> 00:17:22.000
by applying the formula for the estimator to the sample observations that we have and it may be
00:17:22.000 --> 00:17:29.000
the case that for a small sample of say four observations we have a distribution of the
00:17:29.000 --> 00:17:35.000
estimates which is given by this probability density function here by this red function here.
00:17:38.000 --> 00:17:48.000
So you see this is a non-symmetrical density function. You see that it is actually skewed
00:17:48.000 --> 00:17:55.000
to the right, steep on the left-hand side of the distribution,
00:17:48.000 --> 00:17:55.000
and it may well be, actually I think it is constructed that way, that the expectation
00:18:00.000 --> 00:18:06.000
of the estimator is equal to theta. But you see there is a great probability that we are quite
00:18:06.000 --> 00:18:13.000
far off in our estimates from the true value of theta so we can be rather closer to the origin
00:18:13.000 --> 00:18:21.000
or we can be way off here with relatively high probability. Now increasing the number of
00:18:21.000 --> 00:18:29.000
observations, so the example here goes to, say, n equal to 16. We have a probability density
00:18:29.000 --> 00:18:38.000
function of the estimator which is given by this brown line here which is still skewed to the right
00:18:39.000 --> 00:18:48.000
steep on the left-hand side of the pdf which possibly still has an expected value of theta
00:18:48.000 --> 00:18:54.000
even though I would say it doesn't really look like that. So perhaps it is actually biased but
00:18:54.000 --> 00:19:00.000
you see that the variance is smaller. The variance has decreased relative to the variance we had
00:19:00.000 --> 00:19:09.000
for n equal to 4. So the likelihood of very extreme values for our estimator has decreased
00:19:09.000 --> 00:19:17.000
considerably. And then increasing the number of observations even more here to n equal to 40
00:19:17.000 --> 00:19:23.000
we get this rather symmetrical looking pdf the blue one here which has even smaller
00:19:24.000 --> 00:19:32.000
variance and which may have smaller bias at least than the brown curve here. So when we
00:19:32.000 --> 00:19:38.000
continue with this exercise and increase the number of observations even further it may well
00:19:38.000 --> 00:19:45.000
be that eventually we get a probability density function which is very narrowly focused about
00:19:45.000 --> 00:19:53.000
the value of theta so that the variance becomes smaller and smaller and possibly the bias also
00:19:53.000 --> 00:20:00.000
becomes smaller and smaller and asymptotically we actually estimate the true value of theta
00:20:00.000 --> 00:20:05.000
with probability one. That would be the idea of a consistent estimator.
00:20:05.000 --> 00:20:17.000
So this said let me give you a brief comparison between unbiasedness the finite sample property
00:20:17.000 --> 00:20:26.000
and consistency, the infinite-sample, the asymptotic property. Unbiasedness as I just said is a finite
00:20:26.000 --> 00:20:35.000
sample property. It can be evaluated for any finite sample size. If I write true for any
00:20:35.000 --> 00:20:42.000
finite sample size this does not mean that every finite sample estimator is unbiased but it means
00:20:42.000 --> 00:20:49.000
that we can check whether the property of unbiasedness is true for any given sample size.
00:20:49.000 --> 00:20:56.000
Consistency on the other hand is just an infinite sample property, so actually a property
00:20:56.000 --> 00:21:01.000
which we will never have because we never have infinitely many observations, but we may come close
00:21:02.000 --> 00:21:10.000
to the properties of an estimator with an infinitely large sample, so we can think of
00:21:10.000 --> 00:21:16.000
consistency as an approximation to the properties of the estimator which we actually deal with
00:21:17.000 --> 00:21:23.000
if our sample is sufficiently large, and I will show you that sometimes, if we are lucky,
00:21:23.000 --> 00:21:31.000
asymptotic properties are already a good approximation for the sample size we work with,
00:21:31.000 --> 00:21:39.000
for relatively small sample sizes like 30 or 40 observations, which are very easy to have even in
00:21:39.000 --> 00:21:47.000
economics. And so the assumption here with consistency is that the sample size approaches infinity.
00:21:49.000 --> 00:21:58.000
The intuition for unbiasedness is the question: if we could observe many different samples of the
00:21:58.000 --> 00:22:06.000
same finite size would then the resulting distribution of the estimates have a mean
00:22:06.000 --> 00:22:13.000
equal to the true value. So going back to this thing here this is exactly the type of question
00:22:13.000 --> 00:22:24.000
I have just posed. Suppose we have many many many different samples of size 4 all of them
00:22:24.000 --> 00:22:31.000
independently drawn from the same population, so many many times we are in a position to draw
00:22:24.000 --> 00:22:31.000
samples of size n, in this case of size four, from the same population, then we would compute the estimator
00:22:38.000 --> 00:22:47.000
and would tabulate how the estimates spread out over a certain range of numbers or how they spread
00:22:47.000 --> 00:22:54.000
out in the range of real numbers and then we would get a certain histogram which may approach
00:22:54.000 --> 00:22:59.000
this probability density function here actually the histogram may be an estimator of the PDF.
00:23:00.000 --> 00:23:08.000
So given many different samples of the same fixed size, in this case n equal to 4, the question
00:23:08.000 --> 00:23:16.000
is if on average the estimates which we obtain for the many different samples that we have drawn
00:23:16.000 --> 00:23:25.000
if on average these estimates are equal to theta that's the concept of unbiasedness.
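This thought experiment is easy to mimic on a computer. A sketch with assumed values (theta = 1.5, a normal population, and the fixed small sample size n = 4 from the slide): many independent samples of the same size are drawn and the resulting estimates are averaged.

```python
import random

random.seed(3)

theta, n, n_samples = 1.5, 4, 50_000   # assumed true value, fixed sample size

# Draw many independent samples of the SAME size n; for each, compute the
# sample mean as the estimate; then average all of these estimates.
estimates = [sum(random.gauss(theta, 1.0) for _ in range(n)) / n
             for _ in range(n_samples)]
mean_of_estimates = sum(estimates) / n_samples

# For an unbiased estimator this grand mean should sit very close to theta.
print(round(mean_of_estimates, 3))
```

The sample size stays at 4 throughout; only the number of samples grows, which is exactly the unbiasedness question rather than the consistency question.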
00:23:26.000 --> 00:23:31.000
I repeat the question here again if we could observe many different samples of the same
00:23:31.000 --> 00:23:37.000
finite size, would the resulting distribution of estimates have a mean, so an expected value,
00:23:37.000 --> 00:23:46.000
equal to the true value that's the concept of unbiasedness. So here we basically let the number
00:23:46.000 --> 00:23:54.000
of samples go to infinity but the size of the samples is fixed. By contrast in consistency
00:23:54.000 --> 00:24:01.000
the intuition is if we could make one sample size as large as we want so we just draw one sample
00:24:02.000 --> 00:24:08.000
but the size is as large as we want and actually it is almost infinitely large
00:24:09.000 --> 00:24:13.000
does in this case our estimate, or perhaps it is better to write estimator here,
00:24:14.000 --> 00:24:23.000
get arbitrarily close to the true value, so does the estimator converge to the true value even in
00:24:23.000 --> 00:24:31.000
theory. So these are the two very different questions which you have to connect with the
00:24:31.000 --> 00:24:40.000
concepts of unbiasedness and consistency. Now note this is the new slide which I have introduced
00:24:40.000 --> 00:24:46.000
yesterday expanding a little bit on the remark which I had in the previous set of slides.
00:24:47.000 --> 00:24:57.000
Note that an estimator can be biased but consistent, as well as an estimator can be
00:24:47.000 --> 00:24:57.000
unbiased but inconsistent, so we may either have the undesirable property
00:25:07.000 --> 00:25:15.000
of biasedness despite the fact that asymptotically the estimator converges
00:25:15.000 --> 00:25:20.000
in probability to the true value it may still be biased in all finite samples
00:25:21.000 --> 00:25:29.000
or we may have the undesirable property of inconsistency even though the estimator is
00:25:29.000 --> 00:25:36.000
unbiased in all finite samples and I give you two examples which should make this very clear.
00:25:36.000 --> 00:25:41.000
The first example you should actually know well from your statistics lectures in undergraduate
00:25:42.000 --> 00:25:52.000
economics. Suppose you estimate the variance of a sample by taking the mean of the squared
00:25:52.000 --> 00:26:00.000
differences between the observations and the sample mean. So first you compute the sample mean yn bar
00:26:01.000 --> 00:26:10.000
and then you subtract the sample mean from each of the n observations, yi minus yn bar; this n
00:26:10.000 --> 00:26:16.000
here just tells you that we use n observations to compute the sample mean. So for each y i you
00:26:16.000 --> 00:26:23.000
subtract the sample mean and square this difference and then we add up all the square differences
00:26:23.000 --> 00:26:31.000
and, well, average over the sum. So that is one possible estimator of the
00:26:31.000 --> 00:26:40.000
variance of the sample; we call this typically sigma y hat squared, and we know, and you should
00:26:40.000 --> 00:26:47.000
probably know that this estimator is biased in finite samples because we should not divide by n
00:26:47.000 --> 00:26:52.000
here; in order to have an unbiased estimator in finite samples we should divide by n minus one.
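The bias is easy to see in a simulation. This sketch assumes a standard normal population (sigma squared = 1) and a deliberately small n = 5 so the bias is visible, and compares division by n with division by n minus one:

```python
import random

random.seed(4)

mu, n, reps = 0.0, 5, 40_000   # assumed population mean, small sample size

biased, unbiased = [], []
for _ in range(reps):
    y = [random.gauss(mu, 1.0) for _ in range(n)]   # sigma^2 = 1 here
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    biased.append(ss / n)          # divides by n:     E = (n-1)/n * sigma^2
    unbiased.append(ss / (n - 1))  # divides by n - 1: E = sigma^2

mean_biased = sum(biased) / reps       # should be near 0.8 for n = 5
mean_unbiased = sum(unbiased) / reps   # should be near 1.0
print(round(mean_biased, 2), round(mean_unbiased, 2))
```

With n = 5 the n-divided version undershoots sigma squared by the factor (n-1)/n = 0.8, while the gap vanishes as n grows, which is the consistency point made above.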
00:26:53.000 --> 00:26:59.000
The degrees of freedom associated with this estimator here are actually n minus one
00:26:59.000 --> 00:27:07.000
because we lose one degree of freedom already by estimating the sample mean. I think you have gone
00:27:07.000 --> 00:27:14.000
through this with your statistics professor in your undergraduate education, so you know that
00:27:14.000 --> 00:27:27.000
the estimator is biased and the correction factor for the bias is actually n over n minus one. So the bias goes
00:27:27.000 --> 00:27:32.000
to zero as the sample size increases asymptotically it doesn't really make a difference whether you
00:27:34.000 --> 00:27:43.000
divide by n or by n minus one; if n goes to infinity both of these divisors become so large that
00:27:45.000 --> 00:27:52.000
the difference is actually negligible so we have the consistency for very very large sample sizes
00:27:52.000 --> 00:27:59.000
but the estimator applied to each finite sample size
00:27:59.000 --> 00:28:06.000
has the property of being biased, so that's an example where we have bias in all finite samples
00:28:06.000 --> 00:28:12.000
but we have consistency. Perhaps less well known is the second example where we have an estimator
00:28:12.000 --> 00:28:21.000
which is unbiased in all finite samples but still it isn't consistent and this is a little bit of a
00:28:21.000 --> 00:28:27.000
weird estimator which I give you here as an example. Suppose we want to estimate the population mean
00:28:29.000 --> 00:28:36.000
by this estimator w tilde of y, so for some sample y we have the estimator w tilde
00:28:36.000 --> 00:28:43.000
and this estimator is just constructed as the mean of the first observation and the last observation.
00:28:43.000 --> 00:28:52.000
This estimator is not a good estimator not at all quite a weird idea to use it but it is an unbiased
00:28:52.000 --> 00:29:01.000
estimator, because the expected value of y1 is mu
00:29:02.000 --> 00:29:12.000
and the expected value of yn is also mu, so the
00:29:13.000 --> 00:29:22.000
expected value of the sum
00:29:22.000 --> 00:29:30.000
is 2 mu, and this multiplied by 0.5 is of course then just mu, so we see that this estimator here
00:29:30.000 --> 00:29:40.000
is unbiased in each and every finite sample, but it is not a consistent estimator
00:29:41.000 --> 00:29:46.000
and the reason for this estimator not being consistent is that we don't make use of the
00:29:46.000 --> 00:29:57.000
full sample in estimation. While we always use the last observation, and this last observation keeps
00:29:57.000 --> 00:30:04.000
changing when n grows, we always throw away all the information in all the other observations
00:30:04.000 --> 00:30:12.000
except the first and the last, so the variance of this estimator does not decrease
00:30:12.000 --> 00:30:17.000
with the sample size, and when the variance of the estimator does not decrease then obviously the
00:30:17.000 --> 00:30:25.000
probability of extreme estimates, so the probability of the deviation of the estimator
00:30:26.000 --> 00:30:36.000
from the true parameter, does not go to zero, so this probability is basically a constant since
00:30:36.000 --> 00:30:40.000
the variance is staying constant and therefore the estimator is inconsistent
00:30:43.000 --> 00:30:53.000
which tells you that
00:30:53.000 --> 00:30:58.000
unbiasedness does not imply consistency and consistency does not imply unbiasedness.
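The second example can be simulated as well. In this sketch (assumed: a normal population with mean mu = 3 and variance 1) the sampling variance of the estimator (y1 + yn) / 2 is estimated at a small and at a much larger sample size; it stays at sigma squared over 2 = 0.5 instead of shrinking.

```python
import random
import statistics

random.seed(5)

mu, reps = 3.0, 10_000   # assumed population mean, simulated samples per size

def endpoints_var(n):
    """Sampling variance of the 'weird' estimator (y1 + yn) / 2 at size n."""
    ests = []
    for _ in range(reps):
        y = [random.gauss(mu, 1.0) for _ in range(n)]
        ests.append(0.5 * (y[0] + y[-1]))   # uses only the first and last draw
    return statistics.pvariance(ests)

# Var((y1 + yn) / 2) = sigma^2 / 2 = 0.5 regardless of n: it never shrinks.
v_small, v_large = endpoints_var(4), endpoints_var(200)
print(round(v_small, 2), round(v_large, 2))
```

A constant variance means the probability of a sizable deviation from mu cannot go to zero, which is exactly the inconsistency argued above.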
00:31:00.000 --> 00:31:12.000
there is no superiority of one of these two concepts over the other however there is an
00:31:12.000 --> 00:31:21.000
important special case which establishes such a superiority, namely an unbiased
00:31:21.000 --> 00:31:30.000
estimator which converges in probability is always consistent. So provided that the unbiased
00:31:30.000 --> 00:31:38.000
estimator actually has a plim as a limit in probability then the unbiased estimator is
00:31:38.000 --> 00:31:46.000
always consistent and therefore in that case the unbiasedness is a property which is superior which
00:31:46.000 --> 00:31:54.000
is stronger than the property of consistency this example here is constructed in such a way
00:31:54.000 --> 00:31:59.000
that the unbiased estimator does not converge in probability therefore it is inconsistent
00:32:00.000 --> 00:32:07.000
but if we have the property that an estimator converges in
00:32:07.000 --> 00:32:12.000
probability and this estimator is unbiased then consistency is implied
00:32:12.000 --> 00:32:22.000
so in that case the consistency is the weaker criterion and often we
00:32:24.000 --> 00:32:32.000
focus on estimators which do have a probability limit therefore then consistency is in some sense
00:32:32.000 --> 00:32:40.000
the minimum quality requirement that we impose on our choice of estimators if the estimator
00:32:40.000 --> 00:32:47.000
is consistent and unbiased that's even better but that's often impossible to prove so the
00:32:48.000 --> 00:32:52.000
minimum requirement that we would like to have for estimators is often consistency,
00:32:52.000 --> 00:32:58.000
not so much unbiasedness, because unbiasedness is, in these cases where a probability limit exists,
00:32:59.000 --> 00:33:02.000
the stronger criterion therefore it is harder to prove
00:33:02.000 --> 00:33:14.000
Now let us come to two important theorems which are used over and over again in statistics and
00:33:15.000 --> 00:33:24.000
econometrics, and for these, which you probably already know, I will consider again the estimator
00:33:24.000 --> 00:33:31.000
of a population mean mu. So suppose we have any random variable y;
00:33:32.000 --> 00:33:38.000
we just consider the sample average as the estimator for the expectation of y
00:33:38.000 --> 00:33:46.000
for the population mean mu. So the sample average, as I already introduced on the previous slide,
00:33:46.000 --> 00:33:53.000
is yn bar so that's just the sum over all the observations yi that we have at our
00:33:53.000 --> 00:34:02.000
disposal so there are n drawings from the same population we have n random variables yi and the
00:34:02.000 --> 00:34:08.000
estimator is the sum over the random variables yi which are random drawings from the population
00:34:08.000 --> 00:34:16.000
divided by n this is the sample mean and we just call this thing estimator w or sometimes wn if you
00:34:16.000 --> 00:34:22.000
want to indicate that this is an estimator depending on n random variables on n independent
00:34:22.000 --> 00:34:32.000
drawings from the population. Now there is an important theorem, the law of large numbers, LLN,
00:34:33.000 --> 00:34:41.000
which says: suppose that y1, y2, ..., yn are independent and identically distributed, so iid,
00:34:42.000 --> 00:34:50.000
random variables with mean mu so each of them has of course the same mean because they have
00:34:50.000 --> 00:34:58.000
identical distribution and we call them mean mu then it is true that the plim of yn bar
00:34:58.000 --> 00:35:08.000
is equal to mu right so we have the very simple sample mean yn bar here
00:35:09.000 --> 00:35:16.000
this sample mean converges in probability to the true value when the number of observations
00:35:16.000 --> 00:35:23.000
becomes large so that's an asymptotic property which we have here and it's a very very strong
00:35:23.000 --> 00:35:29.000
property actually because we have made no assumption on the distribution of the yi's
00:35:30.000 --> 00:35:38.000
regardless of how weird the underlying distribution of the random variables yi is we always have
00:35:38.000 --> 00:35:44.000
the property that the sample mean converges in probability to the true parameter
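The convergence just described can be illustrated numerically; this is my own sketch, not part of the lecture, and the lognormal distribution and the seed are arbitrary choices made only to get something deliberately skewed and non-normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately skewed, heavy-tailed distribution: lognormal with
# underlying N(0, 1).  Its true mean is exp(1/2), roughly 1.6487.
true_mean = np.exp(0.5)

# The sample mean approaches the true mean as n grows (LLN in action).
for n in (10, 1_000, 100_000):
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    print(n, sample.mean())
```

For small n the sample mean bounces around; for large n it settles near exp(1/2), even though nothing normal-like was assumed about the draws.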
00:35:46.000 --> 00:35:55.000
this is much more general than we typically may think because this property also extends
00:35:55.000 --> 00:36:02.000
to higher moments you may perhaps think this is just a property for the first moment just for
00:36:02.000 --> 00:36:11.000
the expected value of these variables but suppose the variables are actually the squares of some
00:36:11.000 --> 00:36:18.000
underlying variable then obviously when we take the mean of the squares we get an estimate of
00:36:18.000 --> 00:36:25.000
the second moment which as you know is related to the variance or if we take the mean of random
00:36:25.000 --> 00:36:34.000
variables raised to the power of three then we get an estimate of the third moment third
00:36:34.000 --> 00:36:42.000
non-central moment and this again will have the property that the sample mean of those
00:36:43.000 --> 00:36:50.000
random variables raised to the power of three converges to the true expected value of
00:36:51.000 --> 00:36:57.000
the random variables raised to the power of three so we know that also for these higher moments
00:36:57.000 --> 00:37:03.000
we can construct estimators which have the property that the plim of the estimator is equal
00:37:04.000 --> 00:37:10.000
to the true parameter and therefore the law of large numbers is a very very strong
00:37:11.000 --> 00:37:19.000
law and often used in statistics and econometrics to prove desired properties of an estimator
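The extension to higher moments can be sketched the same way; again this is my own illustration, and the exponential distribution is chosen only because its non-central moments are known in closed form, E[y^k] = k!:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=1.0, size=200_000)  # exponential(1): E[y^k] = k!

# Sample means of powers of y estimate the corresponding non-central moments.
print(y.mean())         # estimates E[y]   = 1
print((y ** 2).mean())  # estimates E[y^2] = 2
print((y ** 3).mean())  # estimates E[y^3] = 6
```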
00:37:20.000 --> 00:37:26.000
there are actually weaker forms of the law of large numbers so it is not really necessary for
00:37:26.000 --> 00:37:35.000
the random variables y1 up to yn to be iid so independent drawings of identically
00:37:35.000 --> 00:37:41.000
distributed random variables that can be weakened to some extent but this is more technical and
00:37:41.000 --> 00:37:47.000
for our purposes it is completely sufficient that you understand the basic law of large numbers
00:37:47.000 --> 00:37:51.000
as it is typically taught in undergraduate statistics courses
00:37:54.000 --> 00:38:01.000
so what the law of large numbers tells us is that when we are interested in estimating the
00:38:01.000 --> 00:38:09.000
population mean we may get arbitrarily close to the true parameter value mu using just the
00:38:09.000 --> 00:38:18.000
sample average provided that the sample is sufficiently large so provided we can generate
00:38:18.000 --> 00:38:27.000
sufficiently many observations we can with our simple sample mean get an estimate which is
00:38:27.000 --> 00:38:35.000
arbitrarily close to the true unknown parameter so we can really infer what the true parameter
00:38:35.000 --> 00:38:40.000
is provided that we have sufficiently many drawings from the underlying population
00:38:43.000 --> 00:38:48.000
very important as i already said is that the law of large numbers does not require any particular
00:38:48.000 --> 00:38:54.000
distribution from which the sample is drawn so we do not require that the yi all come from
00:38:54.000 --> 00:39:03.000
say a normal distribution or a t distribution or a Poisson distribution it is a completely
00:39:03.000 --> 00:39:09.000
non-parametric theorem we do not make any assumptions about distributions in it
00:39:11.000 --> 00:39:14.000
so that is one big building block of
00:39:17.000 --> 00:39:19.000
estimation theory in econometrics
00:39:20.000 --> 00:39:27.000
another big building block is also a non-parametric theorem and that is the well-known central limit
00:39:27.000 --> 00:39:34.000
theorem which i abbreviate clt because in the central limit theorem we also do not assume any
00:39:34.000 --> 00:39:43.000
particular distribution for the input random variables y1 up to yn however unlike the law of
00:39:43.000 --> 00:39:50.000
large numbers the central limit theorem yields a particular distribution for the output variable
00:39:50.000 --> 00:39:57.000
which is the standardized sample mean so again we look at the sample mean however this time we
00:39:57.000 --> 00:40:03.000
standardize it by dividing by the standard deviation otherwise it is just the sample mean
00:40:03.000 --> 00:40:14.000
which we look at and the surprising result of the central limit theorem is that even though we do
00:40:14.000 --> 00:40:21.000
not make any assumption about the particular distributions of the input random variables y1
00:40:21.000 --> 00:40:27.000
up to yn of which we take the standardized sample mean the central limit theorem tells us
00:40:27.000 --> 00:40:33.000
that the standardized sample mean is normally distributed so regardless of what the initial
00:40:33.000 --> 00:40:38.000
distributions are we end up with a normal distribution and here's the central limit
00:40:39.000 --> 00:40:48.000
theorem so suppose y1 up to yn is a random sample again random sample means independently
00:40:48.000 --> 00:40:56.000
and identically distributed so iid with population mean mu and finite variance sigma squared
00:40:57.000 --> 00:41:09.000
in this case we have that the standardized sample mean yn bar minus mu divided by sigma
00:41:09.000 --> 00:41:16.000
and multiplied by the square root of n has an asymptotic standard normal distribution
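Written out as a formula, my transcription of the statement just given is:

```latex
Z_n \;=\; \sqrt{n}\,\frac{\bar{y}_n - \mu}{\sigma}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0,1)
\qquad \text{as } n \to \infty .
```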
00:41:18.000 --> 00:41:23.000
so that's perhaps a little hard to digest if you see this theorem for the first time
00:41:23.000 --> 00:41:28.000
which i suppose is not the case probably most of you have seen it in undergraduate statistics
00:41:28.000 --> 00:41:34.000
but it may even be hard to digest the second time so let's look at it again and discuss what
00:41:34.000 --> 00:41:44.000
it says so first of all we are looking here at the standardized variable which has the random
00:41:44.000 --> 00:41:54.000
variable yn bar so the sample mean at its core so it's actually the standardized sample mean
00:41:54.000 --> 00:42:02.000
which we look at because we take the sample mean yn bar and subtract its expected value mu
00:42:03.000 --> 00:42:10.000
now we know that the
00:42:10.000 --> 00:42:17.000
expectation of the sample average is always the population mean mu so we just subtract the
00:42:17.000 --> 00:42:26.000
expectation from the estimator to give the numerator up here
00:42:27.000 --> 00:42:35.000
an expectation of zero and then we divide by sigma and multiply by the square root of n
00:42:36.000 --> 00:42:42.000
well actually what we do here is that we just divide by the standard deviation of the yn bar
00:42:42.000 --> 00:42:48.000
because the standard deviation of the yn bar is sigma divided by the square root of n as i will
00:42:48.000 --> 00:42:54.000
show you in a minute so essentially we do the same type of standardization that we have
00:42:54.000 --> 00:43:01.000
already had where we looked at a normal distribution and at a standard normal distribution where we
00:43:01.000 --> 00:43:08.000
also said okay we have a normally distributed variable and when we take this normally
00:43:08.000 --> 00:43:13.000
distributed variable and subtract its mean and divide by its standard deviation then we get this
00:43:13.000 --> 00:43:22.000
a standard normal variable here we do almost the same thing we take a random variable yn bar
00:43:22.000 --> 00:43:30.000
however not knowing that this random variable is normally distributed so we just know it is a
00:43:30.000 --> 00:43:36.000
sample mean it may have a distribution which is quite different from a normal distribution so we
00:43:36.000 --> 00:43:43.000
make no assumption on the distribution of this variable we take the sample mean subtract its
00:43:43.000 --> 00:43:48.000
expectation and divide by its standard deviation which as i will show you is sigma divided by the
00:43:48.000 --> 00:43:57.000
square root of n so this standardized variable has an asymptotic standard normal distribution
00:43:58.000 --> 00:44:07.000
regardless of what the distribution of the yn bar or the distribution of the y1 up to yn is so
00:44:07.000 --> 00:44:12.000
completely regardless of which distribution these random variables come from they may be
00:44:12.000 --> 00:44:19.000
completely non-normal the standardized variable has a standard normal distribution and that is
00:44:19.000 --> 00:44:25.000
of course also very strong result which is very useful in econometrics and sampling theory as
00:44:26.000 --> 00:44:32.000
you will see throughout this lecture it can also be stated in weaker forms and more technical
00:44:33.000 --> 00:44:40.000
forms than the one i have given here so again we do not exactly need the
00:44:41.000 --> 00:44:48.000
properties of y and y1 up to yn being iid independently identically distributed
00:44:48.000 --> 00:44:53.000
but that's the easiest way to state it and it can be weakened to some extent okay
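A quick simulation shows this standardization at work for a very non-normal input distribution; this is my own sketch, with the exponential distribution, sample size, and seed chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)

# Input: exponential(1) observations -- heavily skewed, with mu = 1, sigma = 1.
mu, sigma, n, reps = 1.0, 1.0, 500, 20_000

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma  # standardized sample means

# If the CLT holds, z behaves like a standard normal: mean near 0, standard
# deviation near 1, and about 95% of the z values inside (-1.96, 1.96).
print(z.mean(), z.std(), np.mean(np.abs(z) < 1.96))
```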
00:44:56.000 --> 00:45:03.000
so this variable zn here which we define this way has asymptotically a normal distribution with
00:45:03.000 --> 00:45:09.000
mean zero and variance equal to one and therefore of course also standard deviation equal
00:45:10.000 --> 00:45:16.000
to one and this is true even though we have no idea which distribution the yi's
00:45:16.000 --> 00:45:27.000
come from so what is often seen as confusing for students is this factor square root of n
00:45:27.000 --> 00:45:33.000
where does it come from probably you would be more familiar with this formula here
00:45:33.000 --> 00:45:38.000
if there were not this factor square root of n because then we would just have
00:45:38.000 --> 00:45:43.000
something which looks very familiar but which is actually not correct we would have
00:45:43.000 --> 00:45:48.000
something which looks quite conventional let's say a variable minus its expectation divided by
00:45:48.000 --> 00:45:54.000
sigma and you may think well sigma is the standard deviation of this variable here of this estimator
00:45:55.000 --> 00:46:01.000
but this is exactly where the error would be and the sigma here is not the standard deviation of
00:46:01.000 --> 00:46:09.000
the yn bar the sigma is the standard deviation of the single observation of y1 or y2 or yn
00:46:10.000 --> 00:46:16.000
but it is not the standard deviation of the yn bar the square root of n actually comes from the
00:46:16.000 --> 00:46:22.000
fact that we need to divide here by the standard deviation of yn bar not by the standard deviation
00:46:22.000 --> 00:46:33.000
of the single observations y1 y2 up to yn so to understand this recall that a normally
00:46:33.000 --> 00:46:39.000
distributed variable can be standardized to a standard normal distribution by subtracting
00:46:39.000 --> 00:46:48.000
its mean and dividing by its standard deviation and the standard deviation of the yn
00:46:48.000 --> 00:46:58.000
bar happens to be sigma over square root of n so dividing y bar n minus mu by this term here is
00:46:58.000 --> 00:47:07.000
the same as multiplying by square root of n over sigma so multiplying
00:47:07.000 --> 00:47:14.000
by this factor here is the same thing as dividing through by that factor there so the only thing that
00:47:14.000 --> 00:47:21.000
we need to make clear to us is that this here is indeed the standard deviation of y bar n
00:47:23.000 --> 00:47:30.000
which is easy to do so let's start out with the variance of y bar n the variance of y bar n is
00:47:30.000 --> 00:47:35.000
of course the square of the standard deviation of y bar n the variance of y bar n is the variance of
00:47:35.000 --> 00:47:44.000
the sample mean which is 1 over n times the sum over all the individual yi's we know that a constant
00:47:44.000 --> 00:47:50.000
factor like 1 over n we can just pull out of the variance and we have to square then so it is 1 over
00:47:50.000 --> 00:47:59.000
n squared times the variance of the sum of all the yi's so all the yi's are independent of each other
00:47:59.000 --> 00:48:06.000
so the covariances are zero since the covariances are zero we know that the variance of the sum of
00:48:06.000 --> 00:48:15.000
the yi's is the same thing as the sum of the variances of the yi's the variance of the yi
00:48:15.000 --> 00:48:24.000
is sigma squared as we know so basically we have here n terms each equal to sigma
00:48:24.000 --> 00:48:33.000
squared so the value of this sum here is n times sigma squared
00:48:33.000 --> 00:48:41.000
and we have to divide by n square so we have 1 over n squared times n times sigma square
00:48:41.000 --> 00:48:49.000
so this is sigma square divided by n the variance of yn bar in order to get the standard deviation
00:48:49.000 --> 00:48:55.000
of yn bar obviously we just have to take the square root of this term here so sigma yn bar
00:48:55.000 --> 00:49:01.000
is equal to the square root of the variance of yn bar and this is then equal to sigma divided by
00:49:01.000 --> 00:49:08.000
the square root of n so that's the standard deviation of yn bar and this makes
00:49:08.000 --> 00:49:15.000
clear why in this expression here we have to multiply by square root of n divided by sigma
00:49:15.000 --> 00:49:25.000
because this is the inverse of the standard deviation of yn bar or if perhaps it's easier
00:49:25.000 --> 00:49:31.000
to understand let's say you just divide by sigma over the square root of n and then you can write
00:49:31.000 --> 00:49:38.000
the square root of n as a constant factor which is multiplied to yn bar minus mu divided by this
00:49:38.000 --> 00:49:43.000
standard deviation of the equal of the single observations y1 up to yn
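The derivation just given, collected into one line:

```latex
\operatorname{Var}(\bar{y}_n)
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(y_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n},
\qquad
\sigma_{\bar{y}_n} = \frac{\sigma}{\sqrt{n}} .
```

The middle step uses that all covariances vanish because the yi are independent.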
00:49:46.000 --> 00:49:56.000
okay in this case we say that yn bar converges in
00:49:56.000 --> 00:50:03.000
distribution to a normal variable so that's as i already set out to say when i introduced
00:50:03.000 --> 00:50:10.000
the plim that's a different convergence concept for random variables
00:50:10.000 --> 00:50:19.000
that a variable converges in distribution to a limiting distribution which in this case is a
00:50:20.000 --> 00:50:29.000
normal distribution with expected value mu and variance sigma squared over n okay so yn bar converges
00:50:29.000 --> 00:50:36.000
in distribution to a normally distributed variable with expected value mu
00:50:36.000 --> 00:50:40.000
and variance sigma square over n so the square of the standard deviation
00:50:43.000 --> 00:50:48.000
either we say that yn bar converges in distribution to this type of normal distribution here or we
00:50:48.000 --> 00:50:53.000
say that yn bar is asymptotically normally distributed and then we write it this way
00:50:54.000 --> 00:51:02.000
yn bar is asymptotically distributed as n of mu and sigma squared over n these two things are
00:51:02.000 --> 00:51:06.000
completely the same right that's equivalent either you can write it this way convergence
00:51:06.000 --> 00:51:13.000
in distribution with a d above this arrow here or you use this sign here with an a to indicate that it
00:51:13.000 --> 00:51:21.000
is an asymptotic distribution to which the distribution of the yn bar here converges both say
00:51:21.000 --> 00:51:28.000
exactly the same thing obviously this asymptotic distribution of yn bar is normal
00:51:29.000 --> 00:51:36.000
but it is not standard normal right only the standardized variable yn bar minus mu
00:51:36.000 --> 00:51:41.000
divided by sigma and multiplied by the square root of n has a standard normal distribution so
00:51:42.000 --> 00:51:48.000
yn bar itself you can describe as having asymptotically the normal distribution with this expected value and
00:51:48.000 --> 00:51:48.000
this variance
00:51:53.000 --> 00:51:59.000
the central limit theorem now shows you why the normal distribution plays a key role in statistics
00:52:00.000 --> 00:52:07.000
and it is often the asymptotic distribution of many estimators right the normal distribution
00:52:07.000 --> 00:52:14.000
is very often the asymptotic distribution of many estimators regardless of which distribution the
00:52:15.000 --> 00:52:21.000
observations we have made in our sample come from so provided the yi's are iid
00:52:23.000 --> 00:52:32.000
we know that many estimators are just averages of some magnitudes which are iid distributed
00:52:32.000 --> 00:52:41.000
and we know that averages of iid random variables converge in distribution to a normal distribution
00:52:41.000 --> 00:52:47.000
and if it is appropriately standardized then it converges to a standard normal distribution
00:52:48.000 --> 00:52:54.000
and that's why the normal distribution is so important it is the limiting distribution of
00:52:54.000 --> 00:53:02.000
many estimators not of all as we can see but other estimators are actually related in their
00:53:02.000 --> 00:53:12.000
properties to that because some limiting distributions like for instance the f distribution
00:53:12.000 --> 00:53:18.000
or the chi-square distributions are actually derived from the normal distribution and therefore
00:53:18.000 --> 00:53:25.000
can also be easily computed and may be the limiting distribution of other estimators which involve
00:53:25.000 --> 00:53:30.000
the sum of squared random variables for instance in the case of chi-squared or which involves the
00:53:30.000 --> 00:53:37.000
ratio of two such sums in the case of the f distribution so we know these distributions also
00:53:37.000 --> 00:53:44.000
and by means of or by virtue of the central limit theorem we can then derive that certain other
00:53:44.000 --> 00:53:52.000
estimators converge towards the chi-square or towards the f distribution which are just
00:53:52.000 --> 00:53:57.000
transformations of a normal distribution
00:53:57.000 --> 00:54:02.000
any questions up to here
00:54:07.000 --> 00:54:13.000
I don't see anybody waving or sending questions
00:54:15.000 --> 00:54:22.000
then we have 30 minutes to go and then let me just go on with confidence intervals and then
00:54:22.000 --> 00:54:26.000
questions can be posed in the last minutes of the lecture if there is need
00:54:28.000 --> 00:54:34.000
what we have talked about so far are point estimates so we have looked at estimators
00:54:34.000 --> 00:54:38.000
and we have asked what kind of result does this estimator give if there's a
00:54:38.000 --> 00:54:44.000
concrete sample to work with so we get just one number which is a point estimate
00:54:45.000 --> 00:54:55.000
the point estimate by itself does not tell us how close the estimate probably is to the true
00:54:55.000 --> 00:55:01.000
population parameter so to the true and unknown underlying parameter in the population we just come
00:55:01.000 --> 00:55:07.000
up with some estimate that's a single number and we have no idea actually how close this estimate
00:55:07.000 --> 00:55:17.000
is to the parameter we're actually looking for so what we want to do now is that we want to get
00:55:17.000 --> 00:55:25.000
some sense of this closeness and in order to do so we use the standard deviation of the estimator
00:55:25.000 --> 00:55:29.000
because the standard deviation of the estimator provides some information on the accuracy
00:55:29.000 --> 00:55:40.000
of the estimator as I have pointed out still the accuracy of the estimate is not yet a statement
00:55:40.000 --> 00:55:49.000
about where the population value is likely to lie in relation to the estimate and we can
00:55:49.000 --> 00:55:53.000
now overcome this difficulty by constructing a confidence interval
00:55:53.000 --> 00:56:00.000
so consider the easiest case the case of a normal distribution and
00:56:01.000 --> 00:56:07.000
the sort of unrealistic case that we know the variance of the variable which is normally distributed
00:56:08.000 --> 00:56:15.000
so suppose that the population has a normal distribution with
00:56:15.000 --> 00:56:25.000
unknown mean mu but known variance one okay so here's an exercise for you show that in this
00:56:25.000 --> 00:56:33.000
case the sample average yn bar for a given sample of n observations drawn from this n mu 1 distribution
00:56:34.000 --> 00:56:43.000
has even in finite samples a normal distribution with expected value mu and variance one
00:56:43.000 --> 00:56:51.000
over n so this is not an application of the central limit theorem which you have to do here
00:56:51.000 --> 00:56:58.000
it's much easier than that show that this sample mean has a normal distribution even in finite
00:56:59.000 --> 00:57:04.000
samples with these two parameters expected value and variance as given in this expression here
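This is not a substitute for the proof the exercise asks for, but a quick numerical sanity check of the claim; it is my own sketch, and mu, n, and the seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, n, reps = 2.0, 25, 50_000

# Sample means of many size-n samples from N(mu, 1); their spread should
# match the claimed variance 1/n, i.e. standard deviation 1/sqrt(25) = 0.2.
ybars = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
print(ybars.mean(), ybars.std())
```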
00:57:04.000 --> 00:57:14.000
now if we have this setting where we know the standard deviation we know the variance of the
00:57:14.000 --> 00:57:22.000
population then we can of course consider an estimator which is yn bar that's not the estimate
00:57:22.000 --> 00:57:29.000
the estimator is yn bar but we now consider a standardized form yn bar minus the unknown
00:57:30.000 --> 00:57:41.000
mean mu divided by one over the square root of n now it should be clear to you that this thing here
00:57:41.000 --> 00:57:47.000
we cannot compute because we do not know what mu is equal to we can compute
00:57:47.000 --> 00:57:55.000
the yn bar we can compute one over square root of n but we do not know the value of mu so this thing we
00:57:55.000 --> 00:58:03.000
can write down but we cannot compute it why is it useful to write it down well very easy we know
00:58:03.000 --> 00:58:12.000
that this has a distribution which is standard normal so expected value of zero and variance of one
00:58:13.000 --> 00:58:20.000
because we subtract the mu from the yn bar here and mu is the expectation of the yn bar
00:58:20.000 --> 00:58:26.000
right and it is standardized by the standard deviation of the yn bar
00:58:26.000 --> 00:58:31.000
so obviously the distribution is standard normal with expected value of zero and variance of one
00:58:33.000 --> 00:58:38.000
now read this thing here almost like you would usually read an equation right we can write it
00:58:38.000 --> 00:58:45.000
down and now we see in this type of equation if you want to see it that way we have one unknown
00:58:45.000 --> 00:58:54.000
which is this mu so what we can do is that we can in some sense rearrange this equation here
00:58:54.000 --> 00:59:02.000
and solve for mu okay and that's what we do now the first thing is that we make a real equation
00:59:02.000 --> 00:59:07.000
out of this thing because obviously this is not an equation this is just something which tells us how
00:59:07.000 --> 00:59:15.000
the distribution of this ratio is namely n zero one but it's not yet an equation we make
00:59:15.000 --> 00:59:22.000
an equation out of that by looking at probabilities right from the distribution from the standard
00:59:22.000 --> 00:59:31.000
normal distribution we know the probability of this ratio here lying between minus 1.96 and
00:59:31.000 --> 00:59:40.000
plus 1.96 is 0.95 sorry i always say minus 1.96 because that is the German usage i mean negative
00:59:40.000 --> 00:59:49.000
1.96 and positive 1.96 the probability of this unknown ratio here lying between
00:59:49.000 --> 00:59:59.000
negative 1.96 and positive 1.96 is 0.95 we derive this equation here from the fact
00:59:59.000 --> 01:00:06.000
that we have a standard normal distribution and 1.96 is the 2.5 percent critical value
01:00:06.000 --> 01:00:10.000
in each tail of the standard normal distribution as you probably know
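The 1.96 can be recovered from the inverse cdf of the standard normal; Python's standard library happens to provide this, so here is a minimal check (my own aside, not part of the lecture):

```python
from statistics import NormalDist

# 2.5% in each tail means we need the 97.5% quantile of N(0, 1).
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96
```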
01:00:12.000 --> 01:00:17.000
so this is now our equation that's a true equation and we can just rearrange this equation in such a
01:00:17.000 --> 01:00:25.000
sense that in some way we can solve for mu which we do by first multiplying the inequalities in
01:00:25.000 --> 01:00:36.000
these parentheses here by one over the square root of n so we multiply negative 1.96
01:00:38.000 --> 01:00:44.000
by one over the square root of n right
01:00:44.000 --> 01:00:50.000
multiply by one over square root of n then we get negative 1.96 divided by square root of n
01:00:50.000 --> 01:00:57.000
here the factor square root of n cancels because we multiply
01:00:57.000 --> 01:01:04.000
the whole inequality through by one over square root of n and here of course we have also 1.96
01:01:04.000 --> 01:01:10.000
divided by the square root of n so that's exactly the same event if this here is true
01:01:11.000 --> 01:01:17.000
then obviously this here is also true with the same probability so we just leave the p of
01:01:17.000 --> 01:01:24.000
something is equal to 0.95 because the two events here are completely equivalent and have the
01:01:24.000 --> 01:01:32.000
same probability then the next step is that we multiply the inequality here by minus one which
01:01:32.000 --> 01:01:42.000
makes the inequality signs turn around right so the negative here becomes positive this becomes
01:01:42.000 --> 01:01:49.000
a greater rather than a smaller here we have the negative of this thing here mu minus y bar and
01:01:49.000 --> 01:01:56.000
then again a greater instead of a smaller sign and negative 1.96 instead of positive 1.96 divided
01:01:56.000 --> 01:02:02.000
by square root of n so again multiplying this equation here by minus one does not change in
01:02:02.000 --> 01:02:10.000
any way the probability if the probability of this event here was 95 percent then the probability
01:02:10.000 --> 01:02:21.000
of this event here is also 95 percent and then the last step is that we add y bar n in
01:02:22.000 --> 01:02:32.000
all three parts of this inequality here so we get y n bar minus 1.96 divided by square root of n smaller
01:02:32.000 --> 01:02:42.000
than mu smaller than y n bar plus 1.96 divided by square root of n where clearly i have
01:02:43.000 --> 01:02:51.000
rearranged the inequality here so by adding y n bar on both sides of this inequality here
01:02:51.000 --> 01:02:59.000
i get of course y n bar minus 1.96 over square root of n on the right hand side which is then smaller
01:03:00.000 --> 01:03:07.000
than mu and i then move it to the left hand side so y n bar minus 1.96
01:03:07.000 --> 01:03:14.000
divided by square root of n is smaller than mu smaller than y n bar plus 1.96 divided by square root of n
01:03:15.000 --> 01:03:20.000
and the probability of this event is still the same as the probability of this event
01:03:20.000 --> 01:03:28.000
or that event that's all the same it's 95 percent and so suddenly we know that
01:03:29.000 --> 01:03:41.000
mu lies between y n bar minus 1.96 divided by square root of n and plus 1.96 divided by square root of n
01:03:41.000 --> 01:03:51.000
with a probability of 95 percent so we have an idea of where the mu lies in relation to an
01:03:51.000 --> 01:03:57.000
estimator what is important to note is that this here is a theoretical relationship which
01:03:59.000 --> 01:04:03.000
refers to the estimator that is to say to the random variable which constitutes
01:04:04.000 --> 01:04:11.000
our estimation formula it is not as i will emphasize on the next slide saying that
01:04:11.000 --> 01:04:21.000
mu lies between the estimate minus 1.96 over square root of n and the estimate plus 1.96 over square root of n
01:04:22.000 --> 01:04:27.000
that is an incorrect interpretation of a confidence interval even though it is very widespread
01:04:29.000 --> 01:04:33.000
but the property that we have established relates to the estimator and not to the estimate
01:04:38.000 --> 01:04:46.000
so before going into that actually it's just the next slide let me just sum up here there is
01:04:46.000 --> 01:04:53.000
not much new on this slide as compared to the previous one as you know we do not know mu
01:04:53.000 --> 01:05:01.000
but we do know what y n bar is that we can compute from our sample and we know what n is
01:05:02.000 --> 01:05:08.000
so from this formula which we had derived we infer that with a probability
01:05:08.000 --> 01:05:17.000
of 95 percent the population mean lies in this interval y n bar minus 1.96 over square root of n
01:05:17.000 --> 01:05:26.000
y n bar plus 1.96 divided by square root of n where this interval however is a random variable
01:05:26.000 --> 01:05:31.000
because y n bar is the estimator this is just the random variable
01:05:33.000 --> 01:05:40.000
so therefore we can construct a 95 percent confidence interval for mu by plugging in now the estimate
01:05:40.000 --> 01:05:51.000
for the estimator y n bar so small y n bar minus 1.96 over square root of n and small y n bar
01:05:51.000 --> 01:06:01.000
plus 1.96 over square root of n is what we call the confidence interval it uses this one particular
01:06:01.000 --> 01:06:07.000
sample which we have at our disposal to compute the sample mean the resulting estimate of the sample
01:06:07.000 --> 01:06:17.000
mean is then small y n bar for some realization of the sample
01:06:17.000 --> 01:06:19.000
of observations drawn from the population
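Putting the formula to work on one concrete sample can be sketched as follows; this is my own illustration, the true mu = 5 and the seed are arbitrary, and sigma = 1 is known by the assumption of this setting:

```python
import math
import numpy as np

rng = np.random.default_rng(4)

# One concrete sample of n observations from N(mu, 1); mu = 5 is unknown
# to the analyst but known here so we can see what the interval does.
n = 100
sample = rng.normal(loc=5.0, scale=1.0, size=n)

ybar = sample.mean()
half_width = 1.96 / math.sqrt(n)  # sigma = 1 is known in this setting
print(ybar - half_width, ybar + half_width)
```

The interval endpoints are the realized small y n bar plus or minus 1.96 over the square root of n; whether this one interval actually contains mu is exactly the interpretation question discussed next.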
01:06:22.000 --> 01:06:28.000
as a shorthand notation for this 95 percent confidence interval
01:06:28.000 --> 01:06:37.000
we also often write y n bar plus minus 1.96 over square root of n right where this 1.96 is
01:06:38.000 --> 01:06:43.000
the 95 percent level but now and this is why i tried to
01:06:46.000 --> 01:06:55.000
emphasize this in my previous remarks note that it is incorrect to say that a specific
01:06:55.000 --> 01:07:02.000
confidence interval of this type here contains mu with a probability of 95 percent
01:07:04.000 --> 01:07:11.000
right this is incorrect to say that if you have a specific confidence interval y n bar
01:07:11.000 --> 01:07:19.000
plus minus 1.96 over square root of n that this interval contains mu with a probability of 95 percent
01:07:19.000 --> 01:07:28.000
rather it is correct to say that for 95 percent of all random samples
01:07:29.000 --> 01:07:37.000
the constructed confidence interval of this type here contains mu so 95 percent of the
01:07:38.000 --> 01:07:46.000
confidence intervals that we may obtain if we were in a position to draw many many samples
01:07:46.000 --> 01:07:53.000
of the same size from the same underlying population then 95 percent of these intervals
01:07:53.000 --> 01:08:02.000
would contain the mu but very often you will hear or read that the interval which we have
01:08:03.000 --> 01:08:11.000
constructed from a single draw of the population from a single sample contains mu with a probability
01:08:11.000 --> 01:08:17.000
of 95 percent and that is incorrect it is incorrect because it does not acknowledge
01:08:17.000 --> 01:08:25.000
the distinction between the sample mean y n bar and the random estimator that the random variable
01:08:25.000 --> 01:08:35.000
the estimator capital y n bar perhaps the easiest way to think of that is the following
01:08:36.000 --> 01:08:43.000
suppose that we are in a position where we have really had bad luck with drawing our sample
01:08:44.000 --> 01:08:54.000
and the sample gives us an estimate of the mean which is far off the true mean far off
01:08:54.000 --> 01:09:01.000
the true mu this may happen with some probability that we just have bad luck in drawing
01:09:02.000 --> 01:09:11.000
the sample and the sample mean which we compute is far far off from the true mean
01:09:12.000 --> 01:09:22.000
then using this particular far off estimate of the sample mean and constructing the confidence
01:09:22.000 --> 01:09:29.000
interval so putting an interval around this estimate with 1.96 over square root of n does
01:09:29.000 --> 01:09:35.000
not at all contain mu with a probability of 95 percent the probability of this interval
01:09:36.000 --> 01:09:43.000
containing mu may be much much smaller because the center of this interval the y n bar is so far off
01:09:43.000 --> 01:09:52.000
the true value of mu so this is why it cannot be true that the confidence interval the specific
01:09:52.000 --> 01:09:57.000
confidence interval that we have computed contains mu with a probability of 95 percent
01:09:59.000 --> 01:10:06.000
what is true is that if we construct 100 of these intervals from 100 independent drawings
01:10:07.000 --> 01:10:13.000
so if we have 100 independently drawn samples and we construct 100 such intervals then about 95 of these
01:10:13.000 --> 01:10:20.000
intervals will contain mu but it may happen in our one sample which we have drawn that we
01:10:21.000 --> 01:10:29.000
are among the five drawings whose intervals do not contain mu so therefore it is incorrect to say
01:10:29.000 --> 01:10:36.000
that a specific interval contains mu with a probability of 95 percent what people mean to
01:10:36.000 --> 01:10:42.000
say when they say that is that for 95 percent of all random samples the interval should contain mu
01:10:45.000 --> 01:10:52.000
and that explains why i currently distinguish quite carefully between the sample that we
01:10:52.000 --> 01:10:59.000
have drawn and the estimate the sample mean which is constructed from the sample we have drawn
01:11:00.000 --> 01:11:04.000
and the estimator that is to say from the random variables which have not yet been
01:11:04.000 --> 01:11:07.000
drawn but are just random variables of observations that we may draw
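The coverage interpretation described above can be checked with a small simulation (my own sketch, not part of the lecture): draw many samples, build the interval y n bar plus minus 1.96 over square root of n for each, and count how often the realized interval contains mu. This assumes normal observations with known standard deviation sigma equal to one, as in the lecture's first case.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_samples = 25, 10_000
mu, sigma = 2.0, 1.0                    # true mean and known standard deviation

half_width = 1.96 * sigma / np.sqrt(n)  # half-width of the 95 percent interval
covered = 0
for _ in range(num_samples):
    y_bar = rng.normal(mu, sigma, size=n).mean()   # realized sample mean
    if y_bar - half_width <= mu <= y_bar + half_width:
        covered += 1                    # this realized interval contains mu

print(covered / num_samples)            # long-run fraction close to 0.95
```

Each individual realized interval either contains mu or it does not; the 95 percent refers only to the long-run fraction across repeated samples.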
01:11:09.000 --> 01:11:17.000
okay now what we have done here is that we have assumed that the standard deviation is one
01:11:17.000 --> 01:11:25.000
the variance of a random observation is one the same actually holds true if the standard
01:11:25.000 --> 01:11:36.000
deviation is some value sigma and we know this sigma so suppose we have sigma as the standard
01:11:36.000 --> 01:11:42.000
deviation of the observations which we have in our sample sigma is not necessarily equal to one but
01:11:42.000 --> 01:11:49.000
we do know the value of sigma then the same thing holds that we have already derived for sigma equal
01:11:49.000 --> 01:12:00.000
to one the 95 percent confidence interval is y n bar minus 1.96 sigma
01:12:00.000 --> 01:12:06.000
divided by square root of n and the same thing with plus here so that's actually very easy and
01:12:06.000 --> 01:12:11.000
perhaps as a small exercise you might derive the whole thing following closely the steps that i
01:12:11.000 --> 01:12:23.000
have already laid out before for the case sigma equal to one case two is the case which is more realistic
01:12:23.000 --> 01:12:27.000
namely the case where we do not know what the underlying variance of our observations is
01:12:28.000 --> 01:12:34.000
so suppose for some reason we still know there is a normal distribution which generates our
01:12:34.000 --> 01:12:43.000
observations but we know neither the expected value mu nor the variance of this distribution
01:12:44.000 --> 01:12:50.000
so what we have to do in this case since sigma is unknown is that we estimate it
01:12:52.000 --> 01:13:00.000
the first step would be that we estimate sigma a standard estimator of sigma is the sample
01:13:00.000 --> 01:13:07.000
standard deviation s n capital s n if i write it as a random variable which then yields an
01:13:07.000 --> 01:13:15.000
estimate small s n and this s n is one over n minus one times the sum of the y i's minus
01:13:16.000 --> 01:13:23.000
sample mean y n bar squared and taking the square root of that in order to get an estimate of the
01:13:23.000 --> 01:13:29.000
standard deviation rather than the variance so we talked about this estimator already
01:13:29.000 --> 01:13:35.000
when i introduced to you the example of an estimator biased in finite samples but asymptotically
01:13:36.000 --> 01:13:44.000
consistent where i had the other estimator where i divided by n this here is the unbiased
01:13:44.000 --> 01:13:50.000
estimator where i divide by n minus one so you have seen in undergraduate statistics that
01:13:50.000 --> 01:13:56.000
this is a reasonable estimator whose square is unbiased for the variance so we may use this estimator here to
01:13:57.000 --> 01:14:05.000
estimate the unknown standard deviation sigma one can then show that the standardized variable
01:14:05.000 --> 01:14:12.000
follows a student t distribution so if i then use the same type of variable that i had used before
01:14:12.000 --> 01:14:21.000
for the central limit theorem y n bar minus the expected value mu now divided not by the true
01:14:21.000 --> 01:14:27.000
standard deviation of y n bar but by the estimated standard deviation of y n bar so
01:14:28.000 --> 01:14:39.000
s n divided by square root of n this thing here is not distributed as a standard normal variable
01:14:39.000 --> 01:14:45.000
anymore but it is distributed as a student t variable with n minus one degrees of freedom
01:14:45.000 --> 01:14:52.000
so if the standard deviation is unknown then we have to use the t distribution rather than
01:14:52.000 --> 01:15:00.000
the normal distribution but everything else is the same using the t distribution is necessary
01:15:00.000 --> 01:15:05.000
only in small samples because the t distribution converges to the standard normal distribution
01:15:05.000 --> 01:15:11.000
as the number of observations goes to infinity so if the number
01:15:11.000 --> 01:15:17.000
of observations n goes to infinity then obviously the degrees of freedom n minus one also converge
01:15:17.000 --> 01:15:25.000
to infinity and this then implies that the t distribution converges in distribution to
01:15:25.000 --> 01:15:28.000
the normal distribution which i will show you in a minute
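The unknown-sigma interval described above can be sketched in code (my own illustration, not the lecture's): estimate sigma by s n with the n minus one divisor and replace 1.96 by the 97.5 percent quantile of the t distribution with n minus one degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 12
sample = rng.normal(loc=5.0, scale=2.0, size=n)  # sigma treated as unknown

y_bar = sample.mean()
s_n = sample.std(ddof=1)               # divisor n - 1, as in the lecture
t_crit = stats.t.ppf(0.975, df=n - 1)  # replaces 1.96; larger in small samples

half_width = t_crit * s_n / np.sqrt(n)
print((y_bar - half_width, y_bar + half_width))  # 95 percent t interval
```

Because the t critical value exceeds 1.96, the interval is somewhat wider than the known-sigma interval would be, reflecting the extra uncertainty from estimating sigma.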
01:15:31.000 --> 01:15:36.000
the t distribution is symmetrical and centered around zero exactly like the standard normal
01:15:36.000 --> 01:15:42.000
distribution but it has fatter tails than the normal distribution and therefore it has higher
01:15:42.000 --> 01:15:50.000
kurtosis this again you will see on the next slide the t distribution then approaches the standard
01:15:50.000 --> 01:15:56.000
normal distribution when n grows and it is extremely similar to the normal distribution
01:15:56.000 --> 01:16:01.000
already for moderate sample size of approximately n equal to 30
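The convergence can be made concrete by comparing 97.5 percent quantiles (a small sketch of my own using scipy):

```python
from scipy import stats

# 97.5 percent quantile of t with df degrees of freedom vs the normal's 1.96
for df in (1, 4, 19, 29, 49, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))
```

With one degree of freedom the quantile is above 12, while by roughly 30 degrees of freedom it is already very close to 1.96.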
01:16:01.000 --> 01:16:09.000
so here are some graphs of the t distribution the blue distribution is always the standard
01:16:09.000 --> 01:16:16.000
normal distribution right the bell shaped PDF which you see in all the diagrams
01:16:16.000 --> 01:16:22.000
here for one degree of freedom here for four degrees of freedom for 19 degrees of freedom
01:16:22.000 --> 01:16:30.000
so 20 observations and here for 49 degrees of freedom so 50 observations you see a
01:16:30.000 --> 01:16:37.000
t distribution with one degree of freedom is extremely different from a standard
01:16:37.000 --> 01:16:46.000
normal distribution it has much fatter tails as you can see here much higher kurtosis therefore
01:16:46.000 --> 01:16:52.000
right it is actually a very dangerous distribution because it doesn't even have a well-defined
01:16:52.000 --> 01:16:59.000
expected value that's a kind of technical thing but zero is not the expected value of a t distribution
01:16:59.000 --> 01:17:05.000
with just one degree of freedom t distribution with just one degree of freedom is so special
01:17:05.000 --> 01:17:12.000
that it actually bears a particular name namely a Cauchy distribution that's a t distribution with
01:17:12.000 --> 01:17:19.000
just one degree of freedom but that very rarely occurs in econometrics or statistics and beware
01:17:19.000 --> 01:17:24.000
of it never have an estimator which has just one degree of freedom use more degrees of freedom
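The failure of the expected value at one degree of freedom can be seen numerically (again my own sketch): the Cauchy puts far more mass in its tails than the normal, and running means of Cauchy draws never settle down because the law of large numbers requires a finite expected value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# tail mass P(|X| > 3): cauchy (t with 1 df) vs the standard normal
cauchy_tail = 2 * stats.cauchy.sf(3)   # roughly 0.20
normal_tail = 2 * stats.norm.sf(3)     # roughly 0.003
print(cauchy_tail, normal_tail)

# running means of cauchy draws do not converge to any fixed value
draws = stats.cauchy.rvs(size=100_000, random_state=rng)
running_means = np.cumsum(draws) / np.arange(1, draws.size + 1)
print(running_means[[99, 9_999, 99_999]])  # keeps jumping around
```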
01:17:24.000 --> 01:17:31.000
four degrees of freedom is also very little but you see this looks already much nicer the t
01:17:31.000 --> 01:17:38.000
distribution still has fatter tails than the normal distribution so extreme events are more likely
01:17:38.000 --> 01:17:43.000
under the t distribution than under the normal distribution but they are really much slimmer
01:17:43.000 --> 01:17:49.000
tails than in the case of the Cauchy distribution over here and the difference between the normal
01:17:49.000 --> 01:17:54.000
distribution and the t distribution so standard normal and t distribution is already much smaller
01:17:56.000 --> 01:18:02.000
with 20 observations it is already quite hard to see a difference between the t distribution and
01:18:02.000 --> 01:18:07.000
the standard normal and still you see a tiny little difference here there's still more probability
01:18:07.000 --> 01:18:13.000
mass in the tails of the t distribution than in the standard normal distribution but very very
01:18:13.000 --> 01:18:18.000
little so that the normal distribution is already a pretty good approximation to the t distribution
01:18:18.000 --> 01:18:25.000
when we have 19 degrees of freedom and then of course this becomes closer and closer here it
01:18:25.000 --> 01:18:31.000
is for 50 observations and 49 degrees of freedom and you can hardly see any difference anymore
01:18:31.000 --> 01:18:42.000
between the t distribution and standard normal distribution. Okay let me stop here are there
01:18:42.000 --> 01:18:52.000
any questions or comments which we should discuss in the meeting if so then let's please
01:18:52.000 --> 01:18:59.000
use the wave function and indicate that we should move to the meeting I will stop the recording by
01:18:59.000 --> 01:19:00.000
the way