\documentclass[11pt]{book}
\usepackage{fullpage}
\title{Econometrics book}
\author{Ben Lambert}
\usepackage{natbib}
\usepackage{appendix}
\usepackage{url,times}
\usepackage{graphicx}
\usepackage{epstopdf}
\usepackage{amsmath}
\usepackage[all]{xy}
\usepackage{pxfonts}
\usepackage{colortbl}
\usepackage{color}
\usepackage{subfigure}
\usepackage{gensymb}
\usepackage{ctable}
\usepackage[justification=centering]{caption}[2007/12/23]
\usepackage{longtable}
\usepackage{pstricks-add}
\usepackage{pstricks}
\usepackage{pst-func}
\usepackage{pst-math}
\usepackage{epigraph}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{0.1in}
\begin{document}
\maketitle
\tableofcontents
\part{The basics}
\chapter{How to best use this book}
\chapter{What is econometrics?}
\chapter{Estimators and their purpose}
\section{Chapter mission statement}
After reading this chapter the student will understand the concept of an estimator, and its use in statistics.
\section{The goal of this chapter}
Statistical inference is the process of drawing conclusions about a population from a sample of observations. Estimators are the tools which statisticians apply to samples of data, resulting in estimates of population-wide quantities that can be used to test hypotheses about the wider world. In this chapter the reader will be introduced to the concept of a \textit{sampling distribution}, and to how sampling distributions can be examined to gauge the quality of an estimator.
\section{What is an estimator, and why should we care?}
We rarely in life have all relevant data available to us before we make a decision. When we decide where to go on holiday, we don't travel to a country, speak first-hand with locals, and taste the local food prior to deciding on a final destination for the family. We don't know perfectly what the weather will be like this afternoon when we get dressed for work. Instead, we extrapolate based on samples of data, and use these as predictive windows into the unknown. Statistical inference is the logical framework that allows us to make reasonable decisions based on our limited access to the facts.
These limited facts are what is known as a \textit{sample}: by definition, a subset of data from a \textit{population} of interest. A population of interest might be females aged 18--30 in the UK. However, the concept needn't apply only to people, and could just as easily be defined as the countries within the EU. All that matters is that the population, as a concept, is a wider entity about which we would like to make well-informed statements.
\textit{Estimators} are tools that we use on samples that enable us to make \textit{estimates} of quantities within the wider population being investigated. We then use these estimates to test hypotheses about the population, which allow us to better understand the functioning of the world around us.
\begin{figure}
\centering
\scalebox{0.6}
{\includegraphics{Estimators_estimatorEstimate.pdf}}
\caption{The estimation process.}\label{fig:Estimators_estimatorEstimate}
\end{figure}
Before we choose an estimator to use, we need to specify a \textit{model} of the population which is proposed as a simplified and tractable representation of real life. Estimators can then be used to estimate components of these models, which, when complete, can be deployed to help us explain some key features of a process we observe. Fortunately, statistical inference allows us to test the assumptions on which models are built, as well as interrogate their implications.
\section{Models}
\epigraph{Essentially, all models are wrong, but some are useful.}{George Box}
Real life is complex. It contains so many seemingly independently moving parts that it can seem difficult, if not impossible, to explain or predict anything effectively. To make life simpler for ourselves, we frequently build \textit{models}. When we are planning how much to budget for electricity this month, we might think that electricity spend in the corresponding month last year is a reasonable guide. When we are determining how long to spend cramming for an exam, we implicitly have an internal model of the trade-off between test score and effort. When we set our alarm for the next morning, we believe that it will take roughly 10 minutes for us to snooze, and another 20 minutes to get showered and ready before leaving the house. These are all examples of models.
All models are abstractions from reality, which allow us to isolate, and concentrate on, what we believe are the important parts of a system of interest. They are necessarily simplifications, and hence are not exactly representative of reality. However, as George Box's quote suggests, they can \textit{sometimes} be useful.
The models above are ones which we most likely generate internally, but which are nonetheless \textit{implicitly} used to help us make decisions. Often, however, whether for self-betterment or to allow our reasoning to be interrogated, we want to describe our models \textit{explicitly}. The language of mathematics provides us with a logical framework with which to adequately describe these abstractions.
Before we talk through examples, it is worth reflecting on the various purposes for building models in the first place. Joshua Epstein, in his article `Why Model?', lists, amongst others, the following motivations for writing down \textit{explicit} models:
\begin{itemize}
\item Prediction
\item Explanation
\item Guide data collection
\item Discover new questions
\item Bound outcomes to plausible ranges
\item Illuminate uncertainties
\item Challenge the robustness of prevailing theory through perturbations
\item Reveal the apparently simple (complex) to be complex (simple)
\end{itemize}
There are no doubt other reasons, but I believe that the above covers the majority of rationales.
Imagine we are interested in determining the average score on a particular standardised IQ test within the US. However, we don't have access to test scores for all individuals who take the exam in a specific year. Instead, we might suppose that test scores for individuals in the population are normally distributed about this theoretical mean, $\mu$, with some theoretical variance $\sigma^2$:
\begin{equation}\label{eq:Estimators_normalModelExample}
IQ_i \sim \mathcal{N}\left(\mu,\sigma^2\right)
\end{equation}
In (\ref{eq:Estimators_normalModelExample}), the $i$ subscript on $IQ$ indexes the individuals in our population. So if we have a population size of 10,000, $i$ runs from 1 to 10,000. It is important to stress that, whilst in this example we can suppose that $\mu$ exists as a tangible quantity, this is not generally the case for quantities of interest in statistics/econometrics. The assumption inherent in the above model is that test scores are normally distributed around this quantity with some theoretical variance. We do not actually believe that test scores are exactly normally distributed\footnote{However, the central limit theorem provides some justification for making this approximation. See section \ref{sec:Estimators_CLT}.}; however, to make the analysis more tractable, we make this approximation.
Another model which we might choose to state is that, on average, there is a linear relationship between the number of years of experience and the wage which an individual commands:
\begin{equation}\label{eq:Estimators_modelWageExperienceExample}
wage_i = \alpha + \beta experience_i + \epsilon_i
\end{equation}
In (\ref{eq:Estimators_modelWageExperienceExample}), we have chosen a straight-line\footnote{The model stated in (\ref{eq:Estimators_modelWageExperienceExample}) is, with the exception of the $\epsilon_i$ term, of the form $y = mx + c$, which is taught in high school.} relationship between experience and wages, plus a catch-all term which encapsulates the various other idiosyncratic factors\footnote{Individual attributes.} which might impact the wage an individual obtains: for example, the type of job undertaken, the number of hours worked, or their level of education. We assume here that, conditional on an individual's level of experience, the average effect of these other factors is zero.
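Written out, this zero-conditional-mean assumption implies that the straight line traces the \textit{average} wage at each level of experience:
\begin{equation*}
\mathrm{E}\left[wage_i \mid experience_i\right] = \alpha + \beta \, experience_i,
\end{equation*}
so that $\epsilon_i$ simply measures how far a given individual's wage deviates from that average.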
A model is only as good as the assumptions on which it rests. It might be that IQ test scores are more variable in the US population than a normal distribution allows. It could be that there are diminishing returns to experience, meaning that the increment to the wage from an extra year of work shrinks depending on the stage of the particular individual's career, invalidating the assumption of \textit{linearity}.
Much like models, assumptions are necessarily simplifying, and therefore \textit{wrong}. However, there is a spectrum of \textit{wrong}. An assumption is good so long as it captures the essence of reality sufficiently to allow the model to be used as is required. If the assumption is too \textit{wrong}, then the model will cease to be useful, and we are forced to go back and examine its foundations.
\begin{figure}
\centering
\scalebox{0.4}
{\includegraphics{Estimators_normalIQLinearWage.pdf}}
\caption{Left: the normal model for IQ. Right: the linear model between experience and wages, with the error terms $\epsilon_i$ indicated as vertical deviations from the straight line.}\label{fig:Estimators_normalIQLinearWage}
\end{figure}
The good thing about statistical models is that we can frequently test their foundations (see chapter \ref{chap:ModelTesting}). One way to do this is by asking what the probability is that we would have obtained a sample of data as extreme as the one we happened upon. For example, suppose we estimated a model for individuals' test scores, resulting in a normal distribution with a mean of 100 and a variance of 100. If we then obtained a further observation for an individual who scored 120, we could ask: how likely is it that our model would have produced a score more extreme than this? To answer this, we would need to work out the corresponding area under such a normal distribution\footnote{Do not worry if you don't understand this, as we will cover hypothesis testing in detail in chapter \ref{chap:ModelTesting}.} (see figure \ref{fig:Estimators_normalIQArea}), which yields a probability of about 2\%. Hence, if our model were correct, in a sample of 100 we would expect a result as extreme as this for only about 1 person in 50. If this pattern were upheld as we collected more sample data, it would make us doubt that our model, and hence the assumptions on which it was formed, are in fact reasonable, prompting a re-examination of the latter.
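To make the roughly 2\% figure concrete, the calculation standardises the score of 120 against the assumed distribution (mean 100, variance 100, and hence standard deviation 10):
\begin{equation*}
P\left(IQ_i > 120\right) = P\left(Z > \frac{120 - 100}{\sqrt{100}}\right) = P\left(Z > 2\right) \approx 0.023,
\end{equation*}
where $Z$ denotes a standard normal random variable.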
\begin{figure}
\centering
\scalebox{0.6}
{\includegraphics{Estimators_normalIQArea.pdf}}
\caption{A normal distribution for IQ, with mean 100 and variance 100. The blue area represents the probability of obtaining an observation more extreme than 120.}\label{fig:Estimators_normalIQArea}
\end{figure}
\section{Sampling distributions}
When we carry out statistical inference we frequently wish to draw conclusions about populations of interest. The only snag is that we usually only have a sample from the wider world. In statistics, we apply estimators to our sample to generate estimates of parameters within a model defined for the population as a whole. We then use these estimates to test hypotheses about the wider population, in a process known as \textit{statistical inference} (see figure \ref{fig:Estimators_estimatorEstimate}).
For example, imagine that we want to investigate whether parental education positively influences the wage of an individual in the population, on average. We propose that there `exists' some population process\footnote{The parameters of such a process are sometimes called \textit{estimands}.} which generates the weekly \$-wages of an individual $i$, of the linear form:
\begin{equation}\label{eq:Estimators_wageParentalEducationEstimand}
wage_i = \alpha + \beta peduc_i + \epsilon_i
\end{equation}
In (\ref{eq:Estimators_wageParentalEducationEstimand}), $peduc_i$ refers to the parental education of person $i$, perhaps measured as the number of full-time years of schooling completed. The term $\epsilon_i$ captures the myriad other individual factors which likely impact an individual's wage. Furthermore, we hypothesise that $\beta>0$, since we expect parental education to raise the prospects of children, either directly through knowledge passed on, or indirectly through better schooling and other opportunities.
Not wanting to exceed our research budget, we are forced to use a sample\footnote{We suppose that each sample can be thought of as a \textit{random sample}. We will discuss this in chapter \ref{chap:OLS}.} of 100 individuals' data as a window onto whether parental education is an important determinant of the wage an individual goes on to earn. We can then use our sample, along with a particular estimator, for example least squares\footnote{We will cover this in chapter \ref{chap:OLS}.}, to estimate the parameter $\beta$ in the population model given in (\ref{eq:Estimators_wageParentalEducationEstimand}). Notwithstanding the many reasons which likely make this estimation methodology flawed, which we will discuss in detail in later chapters, we obtain an estimate of $\hat{\beta}=50$, meaning that for each extra year of full-time education completed by their parents, an individual earns, on average, \$50 more per week.
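As a brief preview of chapter \ref{chap:OLS}, the least squares estimate used here is built entirely from the sample: it is the slope of the best-fitting straight line through the 100 observed $(peduc_i, wage_i)$ pairs,
\begin{equation*}
\hat{\beta} = \frac{\sum_{i=1}^{100}\left(peduc_i - \overline{peduc}\right)\left(wage_i - \overline{wage}\right)}{\sum_{i=1}^{100}\left(peduc_i - \overline{peduc}\right)^{2}},
\end{equation*}
where bars denote sample means. Different samples feed different numbers into this formula, which is why the estimate changes from sample to sample in what follows.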
Not content with the results obtained from the first sample, we then go on to obtain a second sample of size 100, on which to estimate $\beta$. Confusingly, we now obtain an estimate of $\hat{\beta}=5$, which represents an effect only $\frac{1}{10}$ the size of our previous result!
Undeterred, we go back to the population of interest, each time generating a new sample, and again and again we obtain different estimates of the effect of parental education on individual wages. One time we even obtain a value that is negative!
You have probably now guessed what has been happening. The relation (\ref{eq:Estimators_wageParentalEducationEstimand}) is only meant to represent the \textit{average} population model, meaning some of the population gained significantly from their parents' education, whereas for others the effect was less marked. When we choose a sample of individuals from our population, we are implicitly picking individuals who exhibit heterogeneity in the personal impact of parental education. Furthermore, this process of sample selection means that each batch of people differs slightly in its \textit{average} impact of parental education, which is reflected in the array of estimates we obtained for $\beta$ across the different samples.
This variation in estimates, arising purely because different samples contain different individuals, is known as \textit{sampling error}.
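If we imagine repeating the sampling exercise a great many times, the resulting collection of estimates traces out the estimator's \textit{sampling distribution}. As a rough sketch of what is to come, under assumptions discussed in section \ref{sec:Estimators_CLT} and chapter \ref{chap:OLS}, this distribution is approximately normal and centred on the population value:
\begin{equation*}
\hat{\beta} \overset{a}{\sim} \mathcal{N}\left(\beta, \mathrm{Var}(\hat{\beta})\right),
\end{equation*}
where $\overset{a}{\sim}$ denotes `approximately distributed as'. Examining such sampling distributions is how we will gauge the quality of an estimator in the remainder of this chapter.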
\section{Good properties of an estimator}
\section{The central limit theorem}\label{sec:Estimators_CLT}
\section{Econometrics: GM conditions}
\part{Cross sectional data: useful and important}
\chapter{Ordinary Least Squares: what is it, and when to use it?}\label{chap:OLS}
\chapter{How to draw conclusions - an introduction to hypothesis testing}
\chapter{How to interpret regression results}
\chapter{Testing a model - does it work?}\label{chap:ModelTesting}
\section{Hypothesis tests}
\section{Replicate data generation}
\chapter{Testing the Gauss-Markov assumptions, and what to do if they are violated}
\chapter{Instrumental variables: allowing inference in difficult circumstances}
\chapter{Monte Carlo: How to test the quality of an estimator}
\part{Time series: harder to master, but necessary}
\chapter{Why and how do we need to think about time series differently from cross-sectional data?}
\chapter{The basic building blocks of time series models: autoregressive and moving average processes}
\chapter{Testing for stationarity and what to do with non-stationary data}
\chapter{Cointegration: allowing for realism in time series models}
\chapter{An introduction to models for real processes: partial adjustment and error-correction models}
\part{Panel data: the best of both worlds}
\chapter{The benefits of panel data}
\chapter{Why do we need more estimators? An introduction to First Differences and Fixed Effects}
\chapter{The poor relation: Random Effects}
\part{A simple new paradigm in estimation: Maximum Likelihood}
\chapter{The flaws in the Linear Probability Model}
\chapter{Beautifully simple: An introduction to Maximum Likelihood}
\chapter{Drawing conclusions by likelihood: the Wald, Likelihood Ratio and Score (LM) tests}
\bibliographystyle{plain}
\bibliography{Bayes}
\end{document}