Dummy variables, functional forms, big data
Dummy variables
categorical variables
generalisation of binary variables
e.g. how strongly you agree with a statement (1-5). How to deal with this?
- convert them into series of dummy variables
e.g. for education -
primary: = 1 if highest degree is primary or lower, = 0 otherwise
etc...
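A minimal sketch of this conversion in Python, using pandas.get_dummies; the "education" column and its categories here are illustrative assumptions, not from the notes:

```python
# Sketch: converting a categorical variable into a series of dummy variables.
# The "education" column and its categories are illustrative.
import pandas as pd

df = pd.DataFrame({"education": ["primary", "secondary", "tertiary", "secondary", "primary"]})

# One dummy per category: 1 if the observation is in that category, 0 otherwise
dummies = pd.get_dummies(df["education"], prefix="edu")
print(dummies)
```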
the dummy variable trap
- set of multiple binary dummy variables
- mutually exclusive and exhaustive: multiple categories, and every observation falls into only one category
- if you include all of these dummy variables and a constant, you get perfect multicollinearity
- the dummy variable trap
why is there perfect multicollinearity? the dummies sum to 1 for every observation, which duplicates the constant regressor
- Omit one of the groups
- This is taking a benchmark category
- perfect multicollinearity usually reflects a mistake/oddity in the data
- when benchmarking, think about which category makes interpretation easiest (often the most frequent category)
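A short sketch of the trap and the benchmark-category fix, assuming statsmodels and pandas are available; the data are simulated purely for illustration:

```python
# Sketch: the dummy variable trap and the benchmark-category fix (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
edu = pd.Series(rng.choice(["primary", "secondary", "tertiary"], size=100))
y = rng.normal(size=100)

# All dummies + constant: the dummy columns sum to 1 (the constant) -> perfect multicollinearity
X_trap = sm.add_constant(pd.get_dummies(edu).astype(float))
print(np.linalg.matrix_rank(X_trap.values), "rank vs", X_trap.shape[1], "columns")  # rank-deficient

# Fix: omit one category (the benchmark), e.g. with drop_first=True
X_ok = sm.add_constant(pd.get_dummies(edu, drop_first=True).astype(float))
print(sm.OLS(y, X_ok).fit().params)  # coefficients are relative to the omitted group
```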
functional forms
log functions
1. Linear-log
Yi = β0 + β1ln(Xi) + ui
- compute Y before and after changing X, subtract: after - before
- shows the change in the level of Y associated with a % change in X (a 1% increase in X is associated with a 0.01β1 change in Y)
2. Log-linear
ln(Yi) = β0 + β1Xi + ui
- a one-unit change in X is associated with a 100β1% change in Y
3. Log-log
ln(Yi) = β0 + β1ln(Xi) + ui
- a 1% change in X is associated with a β1% change in Y
- interpretation as an elasticity
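A sketch of one of these forms (log-log) fitted with statsmodels on simulated data, to show the elasticity reading of β1:

```python
# Sketch: log-log regression; the slope is read as an elasticity (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.uniform(1, 100, size=500)
Y = np.exp(0.5 + 0.8 * np.log(X) + rng.normal(scale=0.1, size=500))

# ln(Y) = b0 + b1*ln(X) + u
fit = sm.OLS(np.log(Y), sm.add_constant(np.log(X))).fit()
print(fit.params)  # slope ~0.8: a 1% rise in X is associated with a ~0.8% rise in Y
```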
Heteroskedasticity, homoskedasticity & SE
heteroskedasticity
non-constant variance
- Robust SE, robust to heteroskedasticity
What does this mean
- if variance is NOT constant, u is said to be heteroskedastic
homoskedasticity
constant variance
- non-robust SE: valid under homoskedasticity only
What does this mean?
- if variance is constant, u is said to be homoskedastic
- if homoskedastic, another formula for SE becomes available
- estimates are more precise if there is less variance
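A sketch contrasting the two formulas on simulated heteroskedastic data, using statsmodels' default (homoskedasticity-only) standard errors and HC1 heteroskedasticity-robust standard errors:

```python
# Sketch: homoskedasticity-only vs heteroskedasticity-robust SEs (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=400)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.5 * x)  # error variance grows with x
X = sm.add_constant(x)

plain = sm.OLS(y, X).fit()                 # default SEs: valid only under homoskedasticity
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust SEs
print(plain.bse, robust.bse)               # same coefficients, different standard errors
```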
Determining whether something is heteroskedastic/homoskedastic
analysis of residuals
- estimate model, obtain residuals, construct test statistic
- can't analyse the variance of u directly, but we can analyse the residuals
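A sketch of that residual analysis, using the Breusch-Pagan test from statsmodels as one concrete implementation (simulated data):

```python
# Sketch: estimate the model, take residuals, run a Breusch-Pagan test (simulated data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=400)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.5 * x)  # heteroskedastic errors
X = sm.add_constant(x)

resid = sm.OLS(y, X).fit().resid                               # 1. estimate model, obtain residuals
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(resid, X)  # 2. construct test statistic
print(lm_pval)  # a small p-value is evidence against homoskedasticity
```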
Big data
The lasso
- shrinks the estimates towards 0 by penalising large coefficients (the sum of their absolute values)
- with many predictors, overfitting is very likely, so the lasso 'drops' many predictors by setting their coefficients to exactly 0
- the lasso requires an additional tuning parameter, lambda (the penalty weight)
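A sketch of the lasso with scikit-learn on simulated data with many predictors; LassoCV picks the penalty (called alpha there, lambda in these notes) by cross-validation:

```python
# Sketch: lasso with many predictors; most coefficients are shrunk to exactly 0 (simulated data).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 200, 100                      # many predictors relative to the sample size
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                       # only 5 predictors truly matter
y = X @ beta + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)      # penalty chosen by cross-validation
print(lasso.alpha_, (lasso.coef_ != 0).sum())  # chosen penalty, number of kept predictors
```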