APPLIED ECONOMETRICS
The Linear Regression Model
Critical Evaluation of the LRM
Models of Limited Dependent Variables
Basic models with Panel data
Endogeneity
Model: Yi = beta0 + beta1*Xi + ui
Economic data structure
Time series data
Cross sectional data
Panel data
Assumptions of the Classical LRM
Hypothesis testing
Testing individual coefficient: t test
Testing multiple coefficients: F test
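As a reminder of what the two tests compute, the statistics can be written as below. The notation is an assumption, not taken from the outline: n is the number of observations, k the number of estimated parameters including the intercept, q the number of restrictions, and RSS_R and RSS_U the residual sums of squares of the restricted and unrestricted models. The stated distributions hold under the classical assumptions with normally distributed errors.

```latex
t = \frac{\hat{\beta}_j - \beta_{j,0}}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k},
\qquad
F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n-k)} \sim F_{q,\,n-k}
```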
Functional forms
Linear model
Log-linear
Lin-Log
Reciprocal
Polynomial
"Beta" change in Y when X increases by 1 unit.
The slope coefficients can be interpreted as elasticities
If X increases by 1%, predicted Y changes by about "beta"/100 units
The slope is negative when "beta" is positive, and it flattens as X grows
The slope is not constant: it varies with the level of X
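The interpretations above can be checked with a minimal sketch in plain Python. The helper `ols_fit` and the data are hypothetical, chosen so that Y has a constant elasticity of 0.5 with respect to X; they do not come from the course material.

```python
import math

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data generated exactly from Y = 3 * X^0.5
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Linear model: the slope is the change in Y per one-unit change in X
_, slope_linear = ols_fit(x, y)

# Log-linear (double-log) model: the slope is the elasticity of Y with respect to X
_, elasticity = ols_fit([math.log(v) for v in x], [math.log(v) for v in y])

# Lin-log model: slope / 100 approximates the change in Y for a 1% rise in X
_, slope_linlog = ols_fit([math.log(v) for v in x], y)

print(slope_linear, elasticity, slope_linlog)
```

Because the data follow a power law exactly, the double-log regression recovers the elasticity, while the linear and lin-log slopes answer different questions about the same relationship.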
Multicollinearity
Heteroskedasticity
Specification errors
Definition
Perfect or Imperfect multicollinearity
Sources
Data collection method used
Model specification
Economic function
Variables sharing a common time trend
The CLRM assumes no exact linear relationship among the regressors; multicollinearity is an exact (perfect) or near (imperfect) violation of it
Consequences
The R square value may be very high
OLS estimators are still BLUE
Standard errors are inflated, making the t ratios small
Detection
VIF
High pair-wise correlations
Significant F test for auxiliary regressions
Wrong expected sign but high R square
Solution
Do nothing
Restructuring of the model
Dropping one independent variable
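To make the VIF detection step concrete, here is a small sketch for the two-regressor case, where the auxiliary R-squared is simply the squared correlation between the regressors. The function names and the data are hypothetical. The threshold of 10 mentioned in the comment is a common rule of thumb, not something stated in the outline.

```python
def aux_r_squared(x, y):
    """R-squared from regressing y on x with an intercept (squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif(r2):
    """Variance inflation factor: 1 / (1 - R^2 of the auxiliary regression)."""
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # hypothetical: almost exactly 2 * x1
x3 = [3.0, 1.0, 2.0, 1.0, 3.0, 2.0]    # hypothetical: uncorrelated with x1

vif_high = vif(aux_r_squared(x1, x2))  # far above the rule-of-thumb value of 10
vif_low = vif(aux_r_squared(x1, x3))   # no inflation when regressors are uncorrelated

print(vif_high, vif_low)
```

With more than two regressors, the auxiliary regression would use all the other regressors, which needs a multiple-regression routine rather than this simple formula.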
Definition
The variance of the error term is not constant (the homoskedasticity assumption fails)
Reasons
The presence of outliers
Incorrect functional form
Mixing observations with different measures of scale
Consequences
The OLS estimators remain unbiased but are no longer efficient
The usual standard errors are biased, making statistical inference less reliable
Detection
Graph squared residuals (or residuals) against predicted Y
Breusch-Pagan (BP) test
White’s test
Solutions
Weighted Least Squares (WLS)
Robust standard errors
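The detection and solution steps above can be sketched for the simple regression case. This is an illustration under stated assumptions: the data are made up so that the error spread grows with X, the robust standard error is White's HC0 form, and the test statistic is the n*R-squared (Koenker, studentized) variant of the Breusch-Pagan test with one degree of freedom. All function and variable names are hypothetical.

```python
import math

def simple_ols(x, y):
    """OLS of y on x with intercept; returns b0, b1, residuals, x deviations, Sxx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid, dx, sxx

def r_squared(x, y):
    """R-squared of the regression of y on x."""
    _, _, resid, _, _ = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sum(e * e for e in resid) / tss

# Hypothetical data: the error magnitude is proportional to x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.5, -1.0, 1.5, -2.0, 2.5, -3.0, 3.5, -4.0]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]

b0, b1, resid, dx, sxx = simple_ols(x, y)
n = len(x)

# Conventional standard error of the slope (valid only under homoskedasticity)
se_conventional = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# White (HC0) heteroskedasticity-robust standard error of the slope
se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx

# Breusch-Pagan test, n*R^2 form: regress squared residuals on x
lm_stat = n * r_squared(x, [e * e for e in resid])
# Upper-tail probability of a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(lm_stat / 2.0))

print(b1, se_conventional, se_robust, lm_stat, p_value)
```

The slope estimate is the same in both cases; only the standard error used for inference changes. In a sample this small the robust standard error is not guaranteed to be the larger of the two.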
Sources
Incorrect choice of variables
Incorrect functional forms
Omission of Relevant Variables
Regression coefficients will be biased
Inclusion of Irrelevant Variables
Unbiased and Consistent
Tests of hypotheses are invalid
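The bias from omitting a relevant variable can be illustrated with a short sketch. It relies on the standard result that the slope of the short regression equals the true coefficient plus the omitted coefficient times the slope from regressing the omitted variable on the included one. The data, coefficients, and names are hypothetical, and Y is generated without noise so the identity holds exactly.

```python
def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

# Hypothetical true model: Y = 1 + 2*X1 + 3*X2, with X2 correlated with X1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
beta1, beta2 = 2.0, 3.0
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

# Short regression that omits X2
short_slope = slope(x1, y)

# Slope from regressing the omitted X2 on the included X1
delta = slope(x1, x2)

# Omitted-variable bias: beta2 * delta
implied_bias = beta2 * delta

print(short_slope, beta1 + implied_bias)
```

If X2 were uncorrelated with X1, `delta` would be zero and the short regression would still recover the coefficient on X1.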
Panel data
Pooled OLS regression
Fixed effect model
Combine time series and cross sectional data
The Fixed effect Least-Squares Dummy Variable (LSDV) model
The Fixed effect Within-Group estimator
The Fixed effect First difference estimator
Random effects model
Individual effects are assumed to be uncorrelated with the regressors
No distinction between subjects and times
RE vs Pooled OLS: Breusch-Pagan LM test
FE vs Pooled OLS: F test
FE vs RE: Hausman test
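A small sketch can show why the fixed effect estimators differ from pooled OLS. The panel below is hypothetical: three subjects observed over three periods, with Y equal to a subject-specific effect plus 2*X and no noise, and with the subject effects negatively related to X. All names are made up for illustration.

```python
def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def demean(values):
    """Subtract the subject's own time average (the within transformation)."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def diffs(values):
    """First differences within a subject."""
    return [values[t] - values[t - 1] for t in range(1, len(values))]

# subject -> (x over time, y over time); y = effect + 2*x, effects 10, 0, -10
panel = {
    "A": ([1.0, 2.0, 3.0], [12.0, 14.0, 16.0]),
    "B": ([4.0, 5.0, 6.0], [8.0, 10.0, 12.0]),
    "C": ([7.0, 8.0, 9.0], [4.0, 6.0, 8.0]),
}

# Pooled OLS: ignore the panel structure entirely
pooled_x = [v for xs, _ in panel.values() for v in xs]
pooled_y = [v for _, ys in panel.values() for v in ys]
pooled_slope = slope(pooled_x, pooled_y)

# Within-group estimator: OLS on subject-demeaned data
within_x = [v for xs, _ in panel.values() for v in demean(xs)]
within_y = [v for _, ys in panel.values() for v in demean(ys)]
within_slope = slope(within_x, within_y)

# First difference estimator: regression through the origin on differenced data
fd_x = [v for xs, _ in panel.values() for v in diffs(xs)]
fd_y = [v for _, ys in panel.values() for v in diffs(ys)]
fd_slope = sum(a * b for a, b in zip(fd_x, fd_y)) / sum(a * a for a in fd_x)

print(pooled_slope, within_slope, fd_slope)
```

Here the within-group and first difference estimators both remove the subject effects and recover the slope of 2, while pooled OLS even gets the sign wrong because the effects are correlated with X.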
The fourth assumption of OLS is violated: a regressor is correlated with the error term
Sources
Omitted variables
Simultaneity or reverse causality
Measurement error
Consequences
Biased and Inconsistent
Solutions
IV estimation
Panel data: fixed effects, random effects
GMM
Regression discontinuity
Natural experiments
Difference-in-differences (DID) or propensity score matching (PSM)
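Of the solutions listed, IV estimation is the easiest to sketch. In the just-identified case with one regressor and one instrument, the IV slope is cov(Z, Y) / cov(Z, X). The toy data below are hypothetical and constructed so that the instrument Z is exactly uncorrelated with the error U, while X depends on both.

```python
def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in the ratios below)."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

# Hypothetical instrument and error term, uncorrelated by construction
z = [-1.0, 1.0, -1.0, 1.0]
u = [-1.0, -1.0, 1.0, 1.0]

# X is endogenous: it depends on the error term U as well as on Z
x = [zi + ui for zi, ui in zip(z, u)]

# True model: Y = 1 + 2*X + U
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# OLS slope is biased because cov(X, U) != 0
b_ols = cov(x, y) / cov(x, x)

# IV slope uses only the variation in X that is driven by Z
b_iv = cov(z, y) / cov(z, x)

print(b_ols, b_iv)
```

In this constructed example OLS overstates the slope, while the IV estimator recovers the true value of 2. In real data the instrument's validity cannot be verified this directly.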
Logit and Probit model
Binary dependent variable
OLS with binary dep var
The Linear Probability Model
Disadvantages
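Assumes the probability is a linear function of Xi
Predicted probabilities may fall outside [0,1]
Errors are not normally distributed
Errors have unequal variance (heteroskedasticity)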
Logit model
Logistic distribution with the Logit model
Assumes the logit (log-odds) is a linear function of Xi
"beta" is the change in the log-odds when Xj increases by 1 unit
The marginal effect of Xi is not constant: it depends on the values of the regressors
Normal distribution with the Probit model
Estimation method: Maximum likelihood
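The interpretation of the logit and probit coefficients can be illustrated with a short sketch. The coefficient values are hypothetical and are treated as given rather than estimated by maximum likelihood; the function names are also made up.

```python
import math

# Hypothetical coefficients of an index b0 + b1*x
b0, b1 = -2.0, 0.8

def logit_prob(x):
    """P(Y = 1 | x) under the logit model (logistic CDF of the index)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_odds(x):
    """Log of the odds P / (1 - P); linear in x under the logit model."""
    p = logit_prob(x)
    return math.log(p / (1.0 - p))

def logit_marginal_effect(x):
    """dP/dx for the logit model: b1 * P * (1 - P)."""
    p = logit_prob(x)
    return b1 * p * (1.0 - p)

def probit_prob(x):
    """P(Y = 1 | x) under the probit model (standard normal CDF of the index)."""
    return 0.5 * (1.0 + math.erf((b0 + b1 * x) / math.sqrt(2.0)))

def probit_marginal_effect(x):
    """dP/dx for the probit model: b1 * standard normal density at the index."""
    z = b0 + b1 * x
    return b1 * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# A one-unit increase in x shifts the log-odds by exactly b1
change_in_log_odds = log_odds(3.0) - log_odds(2.0)

# The marginal effect on the probability varies with x
effects = [logit_marginal_effect(v) for v in (0.0, 2.5, 5.0)]

print(change_in_log_odds, effects, probit_marginal_effect(2.5))
```

The marginal effect is largest where the index is zero (probability 0.5) and shrinks toward the tails, which is why it is usually reported at the mean or averaged over the sample.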
Multinomial logit model
Nominal dependent variable
Logistic distribution
Estimation method: Maximum likelihood
Ordered Probit model
Ordinal dependent variable
Normal distribution
Estimation method: Maximum likelihood
Tobit model
Count model
Y is censored or truncated
Applying OLS => biased and inconsistent estimates
Estimation method: Maximum likelihood
Three types of marginal effects
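The three marginal effects can be written out under the standard Tobit setup, which is assumed here rather than stated in the outline: a latent variable with normal, homoskedastic errors, censored from below at zero. The symbols (sigma, c, lambda) are notation introduced for this sketch.

```latex
y_i^* = x_i'\beta + u_i,\quad u_i \sim N(0,\sigma^2),\quad y_i = \max(0,\, y_i^*),
\qquad c = \frac{x'\beta}{\sigma},\quad \lambda(c) = \frac{\phi(c)}{\Phi(c)}

\text{(1) latent variable:}\quad
\frac{\partial E[y^* \mid x]}{\partial x_j} = \beta_j

\text{(2) observed outcome:}\quad
\frac{\partial E[y \mid x]}{\partial x_j} = \beta_j\,\Phi(c)

\text{(3) uncensored observations:}\quad
\frac{\partial E[y \mid y > 0,\, x]}{\partial x_j}
  = \beta_j \left\{ 1 - \lambda(c)\,[\,c + \lambda(c)\,] \right\}
```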
The dep var is a non-negative integer
OLS may result in negative values
Poisson distribution
Assumption: mean = variance
Estimation method: Maximum likelihood
mean > variance: UNDERDISPERSION
mean < variance: OVERDISPERSION
If the assumption is violated by overdispersion => negative binomial model
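A quick way to check the Poisson assumption is to compare the sample mean and variance of the count variable. The sketch below does this for made-up counts; the function name is hypothetical, and a raw comparison like this ignores the regressors, so it is only a first look rather than a formal test.

```python
from statistics import mean, variance

def dispersion_check(counts):
    """Compare sample mean and sample variance of a count variable."""
    m = mean(counts)
    v = variance(counts)  # sample variance (divides by n - 1)
    if v > m:
        label = "overdispersion"
    elif v < m:
        label = "underdispersion"
    else:
        label = "equidispersion"
    return m, v, label

# Hypothetical count data with one large value in the right tail
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 12]

m, v, label = dispersion_check(counts)
ratio = v / m  # a ratio well above 1 suggests a negative binomial model

print(m, v, ratio, label)
```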
Do Huu Luat