APPLIED ECONOMETRICS

The Linear Regression Model

Critical Evaluation of the LRM

Models of Limited Dependent Variables

Basic models with Panel data

Endogeneity

Model: Yi = beta0 + beta1*Xi + ui
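As a minimal sketch of how beta0 and beta1 are estimated by OLS in the one-regressor case, the closed form is beta1 = Sxy/Sxx and beta0 = Ybar - beta1*Xbar. The function name `ols_simple` and the data are made up for illustration.

```python
def ols_simple(x, y):
    """OLS estimates (beta0, beta1) for y_i = beta0 + beta1*x_i + u_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy and Sxx: cross-products and squares of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Illustrative (made-up) data, roughly y = 1 + 2x plus noise
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = ols_simple(x, y)  # b0 is about 1.09, b1 is about 1.97
```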

Economic data structure

Time series data

Cross sectional data

Panel data

Assumptions of the Classical LRM

Hypothesis testing

Testing an individual coefficient: t test

Testing multiple coefficients: F test
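A short sketch of both tests in the one-regressor model, assuming H0: beta1 = 0. Here t = b1/se(b1) with se(b1) = sqrt(sigma^2/Sxx), and F = (R^2/k)/((1 - R^2)/(n - k - 1)) with k = 1 slope. With a single slope, F equals t^2, which the usage check below confirms. The helper name `slope_t_and_f` is hypothetical.

```python
import math

def slope_t_and_f(x, y):
    """t statistic for H0: beta1 = 0 and overall F statistic (one regressor)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = rss / (n - 2)               # unbiased error-variance estimate
    t_stat = b1 / math.sqrt(sigma2 / s_xx)
    r2 = 1 - rss / tss
    f_stat = (r2 / 1) / ((1 - r2) / (n - 2))
    return t_stat, f_stat

t_stat, f_stat = slope_t_and_f([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0])
```

The resulting t and F values are then compared with t(n-2) and F(1, n-2) critical values, respectively.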

Functional forms

Linear model

Log-linear

Lin-Log

Reciprocal

Polynomial

"Beta" is the change in Y when X increases by 1 unit

The slope coefficients can be interpreted as elasticities

If X increases by 1%, predicted Y changes by about "beta"/100 units

The slope is -beta1/X^2: negative when beta1 > 0, and Y approaches beta0 as X grows large

The slope varies with X: for Y = beta0 + beta1*X + beta2*X^2 it equals beta1 + 2*beta2*X
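The interpretations above follow from the slope dY/dX of each functional form. The sketch below writes out those derivatives. The function names are illustrative, and the log-log case converts the elasticity b1 back into a slope using b1*Y/X.

```python
def slope_linear(b1, x):
    return b1                        # Y = b0 + b1*X: constant slope

def slope_log_log(b1, x, y):
    return b1 * y / x                # ln Y = b0 + b1*ln X: b1 is the elasticity

def slope_lin_log(b1, x):
    return b1 / x                    # Y = b0 + b1*ln X: 1% rise in X -> about b1/100

def slope_reciprocal(b1, x):
    return -b1 / x ** 2              # Y = b0 + b1*(1/X)

def slope_quadratic(b1, b2, x):
    return b1 + 2 * b2 * x           # Y = b0 + b1*X + b2*X^2

print(slope_reciprocal(4, 2))        # -1.0
print(slope_quadratic(1, 0.5, 2))    # 3.0
```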

Multicollinearity

Heteroskedasticity

Specification errors

Definition

Perfect or Imperfect multicollinearity

Sources

Data collection method used

Model specification

Economic function

Variables sharing a common time trend

There is no exact linear relationship among the regressors

Consequences

The R square value may be very high

OLS estimators are still BLUE

Large standard errors, making the t ratios small

Detection

VIF (variance inflation factor): values above 10 are a common warning sign

High pair-wise correlations among the regressors

Significant F test for auxiliary regressions

Unexpected coefficient signs, or few significant t ratios despite a high R square
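A minimal sketch of the VIF in the two-regressor case. Here the auxiliary regression of X1 on X2 has R^2 equal to the squared pairwise correlation r^2, so VIF = 1/(1 - r^2). With more regressors, the auxiliary R^2 comes from a multiple regression instead. The data are made up.

```python
import math

def pearson_r(a, b):
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_regressors(x1, x2):
    # Auxiliary R^2 = r^2 when there are only two regressors
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

vif_low = vif_two_regressors([1, 2, 3, 4], [1, -1, -1, 1])                    # uncorrelated
vif_high = vif_two_regressors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1])     # nearly collinear
```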

Solution

Do nothing

Restructuring of the model

Dropping one of the collinear variables (at the risk of specification bias)

Definition

The variance of the error term is not constant across observations (the errors are not homoskedastic)

Reasons

The presence of outliers

Incorrect functional form

Mixing observations with different measures of scale

Consequences

OLS estimators remain unbiased but are no longer efficient (not BLUE)

The usual standard errors are biased, making t and F tests unreliable

Detection

Graph squared residuals (or residuals) against predicted Y

Breusch-Pagan (BP) test

White’s test
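A sketch of the Breusch-Pagan idea with one regressor, using the n*R^2 form (the Koenker variant, which may differ from the version taught in class). First estimate the model by OLS. Then regress the squared residuals on X and compare LM = n*R^2 with the chi-square(1) 5% critical value of 3.841. The two data sets are constructed for illustration: one has errors whose size grows with X, and one has errors of constant size.

```python
def ols_fit(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1

def r_squared(x, y):
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing squared OLS residuals on x (1 df here)."""
    b0, b1 = ols_fit(x, y)
    u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, u2)

CHI2_1_CRIT_5PCT = 3.841

x = list(range(1, 21))
y_het = [2 + 0.5 * xi + 0.5 * xi * (-1) ** xi for xi in x]  # error size grows with x
y_hom = [2 + 0.5 * xi + 0.5 * (-1) ** xi for xi in x]       # constant error size
lm_het = breusch_pagan_lm(x, y_het)
lm_hom = breusch_pagan_lm(x, y_hom)
```

White's test follows the same n*R^2 logic. The difference is that its auxiliary regression also includes the squares and cross-products of the regressors.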

Solutions

Weighted Least Squares (WLS)

Robust standard errors
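A minimal sketch comparing the conventional OLS standard error of the slope with White's heteroskedasticity-robust (HC0) standard error. HC0 uses sqrt(sum((xi - xbar)^2 * ui^2)) / Sxx. Other variants such as HC1 add degrees-of-freedom corrections, and WLS is not shown. The tiny data set is chosen so that both values can be checked by hand.

```python
import math

def slope_standard_errors(x, y):
    """Return (conventional SE, White HC0 robust SE) of the OLS slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    s_xx = sum(d ** 2 for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in u) / (n - 2)
    se_conventional = math.sqrt(sigma2 / s_xx)
    # HC0: each squared residual weighted by its own squared x-deviation
    se_hc0 = math.sqrt(sum(d ** 2 * e ** 2 for d, e in zip(dx, u)) / s_xx ** 2)
    return se_conventional, se_hc0

se_conv, se_robust = slope_standard_errors([0, 0, 1, 1], [0, 2, 0, 2])
# se_conv = sqrt(2), se_robust = 1.0
```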

Sources

Incorrect choice of variables

Incorrect functional forms

Omission of Relevant Variables

Regression coefficients will be biased (and inconsistent) if the omitted variable is correlated with the included regressors

Inclusion of Irrelevant Variables

Estimators remain unbiased and consistent, but less efficient

With omitted relevant variables, the error variance is estimated incorrectly, so the usual tests of hypotheses are invalid
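A constructed example of omitted-variable bias, assuming a hypothetical true model y = 1 + 2*x1 + 3*x2 with no error term. Omitting x2 makes the short-regression slope equal to beta1 + beta2*delta, where delta is the slope from regressing x2 on x1. The data are built so that delta = 0.5, giving 2 + 3*0.5 = 3.5 instead of 2.

```python
def ols_slope(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

x1 = [1, 2, 3, 4, 5]
v = [1, -1, 0, -1, 1]                        # part of x2 unrelated to x1
x2 = [0.5 * a + b for a, b in zip(x1, v)]    # x2 = 0.5*x1 + v
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

delta = ols_slope(x1, x2)        # 0.5
short_slope = ols_slope(x1, y)   # 3.5 = 2 + 3 * 0.5, biased away from the true 2
```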

Panel data

Pooled OLS regression

Fixed effect model

Panel data combine time series and cross-sectional data

The Fixed effect Least-Squares Dummy Variable (LSDV) model

The Fixed effect Within-Group estimator

The Fixed effect First difference estimator

Random effects model

Individual effects are assumed to be uncorrelated with the regressors

Pooled OLS makes no distinction between subjects and time periods

RE vs Pooled OLS: Breusch-Pagan LM test

FE vs Pooled OLS: F test

FE vs RE: Hausman test
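A sketch of the fixed-effect within-group estimator compared with pooled OLS. The within estimator demeans y and x by entity and then runs OLS on the demeaned data. The two-entity panel is made up: y = alpha_i + 2*x, and the entity with the larger alpha has smaller x. Pooled OLS therefore gets the sign wrong, while the within estimator recovers 2.

```python
from collections import defaultdict

def ols_slope(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return s_xy / sum((a - x_bar) ** 2 for a in x)

def within_slope(ids, x, y):
    """FE within-group slope: subtract entity means, then OLS without intercept."""
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))
    means = {i: (sum(a for a, _ in obs) / len(obs), sum(b for _, b in obs) / len(obs))
             for i, obs in groups.items()}
    x_dm = [xi - means[i][0] for i, xi in zip(ids, x)]
    y_dm = [yi - means[i][1] for i, yi in zip(ids, y)]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a ** 2 for a in x_dm)

ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 8, 10, 12]          # alpha_A = 10, alpha_B = 0, slope = 2

fe = within_slope(ids, x, y)         # 2.0
pooled = ols_slope(x, y)             # about -0.571
```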

The regressors are correlated with the error term, violating the fourth OLS assumption (regressors uncorrelated with the error term)

Sources

Omitted variables

Simultaneity or reverse causality

Measurement error

Consequences

Biased and Inconsistent

Solutions

IV estimation

Panel data: fixed effects, random effects

GMM

Regression discontinuity

Natural experiments

DID (difference-in-differences) or PSM (propensity score matching)
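A minimal sketch of IV estimation with one instrument z for one endogenous regressor x. The estimator is b_IV = cov(z, y) / cov(z, x), which coincides with 2SLS in this just-identified case. The data are constructed so that x contains the error u while z is uncorrelated with u. OLS is therefore biased upward, and IV recovers the true slope of 2.

```python
def cov(a, b):
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def ols_slope(x, y):
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    return cov(z, y) / cov(z, x)

z = [1, 2, 3, 4, 5]                              # instrument
u = [1, -1, 0, -1, 1]                            # error, uncorrelated with z
x = [zi + ui for zi, ui in zip(z, u)]            # endogenous: x depends on u
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]    # true slope = 2

b_ols = ols_slope(x, y)    # 32/14, about 2.286 (biased)
b_iv = iv_slope(z, x, y)   # 2.0
```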

Logit and Probit model

Binary dependent variable

OLS with a binary dependent variable

The Linear Probability Model

Disadvantages

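Assumes the probability is linear in Xi

Predicted probabilities may fall outside [0,1]

The error term is not normally distributed

The error variance is not constant (heteroskedastic)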
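A small made-up example of the out-of-range problem: fitting the LPM by OLS to a binary y yields fitted "probabilities" below 0 and above 1 at the ends of the X range.

```python
def ols_simple(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]                   # binary dependent variable
b0, b1 = ols_simple(x, y)                # b0 = -0.4, b1 is about 0.257
p_hat = [b0 + b1 * xi for xi in x]       # first is about -0.14, last about 1.14
```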

Logit model

Logistic distribution with the Logit model

Assumes the logit (log-odds) is linear in Xi

"beta" is the change in the log-odds when Xj increases by 1 unit

The marginal effect of Xi on the probability is not constant; it depends on the values of X
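A sketch of these three points for a logit with one regressor. P = 1/(1 + exp(-(b0 + b1*X))), the log-odds rise by exactly b1 per unit of X, and the marginal effect on the probability is b1*P*(1 - P), which changes with X. The coefficient values are arbitrary.

```python
import math

def logit_prob(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_odds(p):
    return math.log(p / (1.0 - p))

def logit_marginal_effect(b0, b1, x):
    p = logit_prob(b0, b1, x)
    return b1 * p * (1.0 - p)    # dP/dX, largest where P = 0.5

# Log-odds change by b1 = 0.8 when x goes from 2 to 3
change = log_odds(logit_prob(-1.0, 0.8, 3)) - log_odds(logit_prob(-1.0, 0.8, 2))
me_at_0 = logit_marginal_effect(0.0, 1.0, 0.0)   # 0.25
me_at_3 = logit_marginal_effect(0.0, 1.0, 3.0)   # smaller than 0.25
```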

Normal distribution with the Probit model

Estimation method: Maximum likelihood
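A minimal sketch of maximum likelihood for the logit model, using Newton-Raphson on the log-likelihood with one regressor and an intercept. Statistical packages use refined versions of this algorithm, and a probit differs only in replacing the logistic CDF with the normal CDF. At the MLE the score equations sum(y - p) = 0 and sum(x*(y - p)) = 0 hold. The data are made up and not perfectly separable, so the MLE is finite.

```python
import math

def logit_score(x, y, b0, b1):
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
    return g0, g1, p

def fit_logit(x, y, iters=25):
    """MLE of (b0, b1) in P(y=1) = 1/(1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0, g1, p = logit_score(x, y, b0, b1)
        # Information matrix X'WX with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 ** 2
        # Newton step: beta += (X'WX)^(-1) * score, 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logit(x, y)
```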

Multinomial logit model

Nominal dependent variable

Logistic distribution

Estimation method: Maximum likelihood

Ordered Probit model

Ordinal dependent variable

Normal distribution

Estimation method: Maximum likelihood

Tobit model

Count model

Y is censored or truncated

Applying OLS => biased and inconsistent estimates

Estimation method: Maximum likelihood

Three types of marginal effects: on the latent variable y*, on E[y], and on E[y | y > 0]
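A sketch of the three Tobit marginal effects for a model left-censored at zero, with c = X*beta/sigma and inverse Mills ratio lambda = phi(c)/Phi(c). The effect on the latent variable is beta_j. The effect on E[y] is beta_j*Phi(c). The effect on E[y | y > 0] is beta_j*(1 - lambda*(c + lambda)). These standard formulas assume censoring at zero, and the inputs are arbitrary.

```python
import math

def std_normal_pdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def tobit_marginal_effects(beta_j, xb, sigma):
    """Effects of x_j on (latent y*, E[y], E[y | y > 0]); censoring at 0."""
    c = xb / sigma
    lam = std_normal_pdf(c) / std_normal_cdf(c)    # inverse Mills ratio
    latent = beta_j
    unconditional = beta_j * std_normal_cdf(c)
    conditional = beta_j * (1.0 - lam * (c + lam))
    return latent, unconditional, conditional

me_latent, me_uncond, me_cond = tobit_marginal_effects(1.0, 0.0, 1.0)
# (1.0, 0.5, 1 - 2/pi), about (1.0, 0.5, 0.363)
```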

The dependent variable is a non-negative integer (a count)

OLS may predict negative values

Poisson distribution

Assumption: mean = variance

Estimation method: Maximum likelihood

mean > variance: UNDERDISPERSION

mean < variance: OVERDISPERSION

If the assumption is violated by overdispersion => negative binomial model
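A rough descriptive check of the mean = variance assumption on raw counts. It computes the ratio of sample variance to sample mean, which should be near 1 under the Poisson assumption. The 10% tolerance in `classify` is an arbitrary, hypothetical threshold, not a formal test. A proper check would condition on X, for example through an overdispersion test after fitting the Poisson regression.

```python
def dispersion_ratio(counts):
    """Sample variance / sample mean; close to 1 if the Poisson assumption holds."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

def classify(counts, tol=0.1):           # tol: hypothetical cutoff
    r = dispersion_ratio(counts)
    if r > 1 + tol:
        return "overdispersion"          # variance > mean: consider negative binomial
    if r < 1 - tol:
        return "underdispersion"         # variance < mean
    return "roughly equidispersed"

print(classify([0, 1, 1, 2, 2, 2, 3, 3, 4]))   # underdispersion (ratio 0.75)
print(classify([0, 0, 0, 0, 10]))              # overdispersion (ratio 10)
```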

Do Huu Luat