Crash Testing Data
1st Issue: Multicollinearity
High Correlation between Explanatory Variables
2nd Issue: Heteroskedasticity
NFL Quarterback Salary Example
Data can crash if assumptions are violated
Brief Regression Review of the b's
Do the coefficient signs meet intuition?
Are variables significant? (Rule of Thumb: |t| > 2)
Explore Potential Multicollinearity
Correlation Matrix (High if greater than 0.7)
Scatterplot
Decision Tree for Checking
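The correlation-matrix check above can be sketched as follows. This is a minimal illustration on synthetic data, not the quarterback data from the slides; the helper name high_correlation_pairs and the variables x1, x2, x3 are hypothetical. Only the 0.7 cutoff comes from the slide.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.7):
    """Return (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic explanatory variables (assumption: made up for illustration only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1, so highly correlated
x3 = rng.normal(size=n)             # unrelated to the other two
X = np.column_stack([x1, x2, x3])

flagged = high_correlation_pairs(X, ["x1", "x2", "x3"])
for a, b, r in flagged:
    print(f"{a} and {b}: r = {r:.2f}")
```

A flagged pair is only a prompt to look further (for example with a scatterplot); a pairwise correlation cannot reveal collinearity that involves three or more variables at once.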
High Collinearity Leads to High Variance
Test for High Variance with VIF
Consequences of High Variance in an Explanatory Variable
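One way to compute the VIF mentioned above is to regress each explanatory variable on all the others and take 1 / (1 - R^2). The sketch below does this with NumPy on synthetic data; the function name vif is hypothetical, and the cutoff of 10 in the comments is a common rule of thumb rather than something stated in the slides.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column in X)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # auxiliary regression with intercept
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic explanatory variables (assumption: made up for illustration only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # collinear with x1
x3 = rng.normal(size=n)             # independent
vifs = vif(np.column_stack([x1, x2, x3]))

# Rule of thumb (assumption): VIF above roughly 10 signals problematic collinearity.
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```

The VIF reads directly as a variance multiplier: a value of 25 means the coefficient's sampling variance is about 25 times what it would be if that variable were uncorrelated with the others.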
Remedy for Multicollinearity: Ridge Regression
Avoid Dropping Variable
Ridge does not Use OLS (OLS is B.L.U.E.: Best Linear Unbiased Estimator)
Ridge cares more about reducing Variance
What High Variance of Coefficients looks like
(unbiased but not-precise/"wild")
Low Variance of Coefficients with Ridge (more precise, but some bias)
Quarterback Data Example (Better Coefficients, Lower Variance)
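The variance trade-off described above can be seen in a small simulation. This is a sketch under stated assumptions: the data are synthetic (not the quarterback data), the variables are generated with mean zero so no intercept is fitted, and the penalty lam=5.0 is an arbitrary illustrative choice rather than a tuned value. Ridge is computed from its closed form, (X'X + lam*I)^(-1) X'y.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients from the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_fit(X, y, lam):
    """Ridge coefficients: adds lam to the diagonal of X'X before solving."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(1)
true_beta = np.array([1.0, 1.0])
ols_draws, ridge_draws = [], []

# Refit both estimators on many fresh samples with two collinear regressors.
for _ in range(500):
    x1 = rng.normal(size=50)
    x2 = x1 + 0.1 * rng.normal(size=50)
    X = np.column_stack([x1, x2])
    y = X @ true_beta + rng.normal(size=50)
    ols_draws.append(ols_fit(X, y))
    ridge_draws.append(ridge_fit(X, y, lam=5.0))

ols_sd = np.array(ols_draws).std(axis=0)      # spread of OLS estimates ("wild")
ridge_sd = np.array(ridge_draws).std(axis=0)  # spread of ridge estimates (tighter)
ridge_mean = np.array(ridge_draws).mean(axis=0)

print("OLS   std. dev. of coefficients:", ols_sd)
print("Ridge std. dev. of coefficients:", ridge_sd)
print("Ridge average coefficients (true value is 1.0):", ridge_mean)
```

The ridge estimates cluster much more tightly across samples, while their average sits slightly away from the true value: lower variance bought with some bias.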
All Rights Reserved by Brent Marinan & University of Arizona
Tableau Video
What is a Residual?
Homoskedasticity (Constant Variance of the Residuals)
vs.
Heteroskedasticity (Non-Constant Variance of the Residuals)
Common Example: Spending & Income
Hypothesis Test for Heteroskedasticity: Breusch-Pagan
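A minimal sketch of the test for a single explanatory variable follows. It uses the studentized (Koenker) form of Breusch-Pagan, where the statistic is n times the R^2 from regressing the squared residuals on the regressor; that choice of variant is an assumption, since the slides do not say which form is used. With one regressor the statistic is compared to a chi-square with 1 degree of freedom, whose upper-tail probability can be written with math.erfc. The income and spending values are synthetic.

```python
import math
import numpy as np

def breusch_pagan_one_regressor(x, y):
    """Studentized Breusch-Pagan LM statistic and p-value for y regressed on one x."""
    n = len(x)
    A = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e2 = (y - A @ beta) ** 2                      # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
    fitted = A @ gamma
    r2 = 1.0 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    lm = float(n * r2)
    p_value = math.erfc(math.sqrt(lm / 2.0))      # P(chi-square with 1 df > lm)
    return lm, p_value

# Synthetic data (assumption): the error spread grows in proportion to income.
rng = np.random.default_rng(2)
income = rng.uniform(20, 100, size=300)
spending = 5 + 0.6 * income + 0.1 * income * rng.normal(size=300)

lm, p_value = breusch_pagan_one_regressor(income, spending)
print(f"LM = {lm:.2f}, p-value = {p_value:.4g}")
```

The null hypothesis is homoskedasticity, so a small p-value is evidence of heteroskedasticity.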
Remedy for Heteroskedasticity:
- Transform Response Variable OR
- Weight the Observations (Weighted Least Squares)
Check if Residuals are Normal
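The slides do not name a specific normality check; one common option is the Jarque-Bera statistic, which combines the skewness and excess kurtosis of the residuals. The sketch below is an illustration of that swapped-in technique, not necessarily the check used in the lecture. The statistic is compared to a chi-square with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). The two residual samples are synthetic.

```python
import math
import numpy as np

def jarque_bera(resid):
    """Jarque-Bera statistic and asymptotic p-value for a sample of residuals."""
    r = np.asarray(resid, dtype=float)
    n = r.size
    c = r - r.mean()
    s2 = (c ** 2).mean()
    skew = (c ** 3).mean() / s2 ** 1.5
    kurt = (c ** 4).mean() / s2 ** 2
    jb = float(n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0))
    return jb, math.exp(-jb / 2.0)  # P(chi-square with 2 df > jb)

# Synthetic residuals (assumption): one roughly normal sample, one strongly skewed.
rng = np.random.default_rng(4)
normal_resid = rng.normal(size=500)
skewed_resid = rng.exponential(size=500) - 1.0

jb_normal, p_normal = jarque_bera(normal_resid)
jb_skewed, p_skewed = jarque_bera(skewed_resid)
print(f"normal sample: JB = {jb_normal:.2f}, p = {p_normal:.3f}")
print(f"skewed sample: JB = {jb_skewed:.2f}, p = {p_skewed:.3g}")
```

A histogram or Q-Q plot of the residuals is a useful visual companion to any such test.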
Example & Remedy of Heteroskedasticity (Spending & Income)
Visual of Residuals on Income
Regress Spending on Income (does the variance of the residuals increase with income?)
Breusch-Pagan Test for Heteroskedasticity
Attempted Remedies:
- Transform Spending to Log Form
- Weighted Least Squares
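Both attempted remedies can be sketched on synthetic spending and income data. Several things here are assumptions made for illustration: the data are generated so that the error standard deviation is proportional to income, the WLS weights are therefore taken as 1 / income^2, and spread_ratio is a hypothetical helper that compares the residual spread in the top-income half with the bottom half (a value near 1 suggests roughly constant variance).

```python
import numpy as np

def fit(A, y):
    """Least-squares coefficients for design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def spread_ratio(resid, income):
    """Residual std. dev. in the top-income half divided by the bottom half."""
    top = income > np.median(income)
    return float(resid[top].std() / resid[~top].std())

# Synthetic data (assumption): error spread proportional to income.
rng = np.random.default_rng(3)
n = 300
income = rng.uniform(20, 100, size=n)
spending = 5 + 0.6 * income + 0.1 * income * rng.normal(size=n)
A = np.column_stack([np.ones(n), income])

# Baseline: plain OLS of spending on income.
b_ols = fit(A, spending)
raw_ratio = spread_ratio(spending - A @ b_ols, income)

# Remedy 1: regress log(spending) on income.
log_spending = np.log(spending)
b_log = fit(A, log_spending)
log_ratio = spread_ratio(log_spending - A @ b_log, income)

# Remedy 2: weighted least squares with weights 1 / income^2,
# implemented by scaling every row (and the response) by 1 / income.
w_sqrt = 1.0 / income
b_wls = fit(A * w_sqrt[:, None], spending * w_sqrt)
wls_ratio = spread_ratio((spending - A @ b_wls) * w_sqrt, income)

print(f"OLS spread ratio:           {raw_ratio:.2f}")
print(f"Log-response spread ratio:  {log_ratio:.2f}")
print(f"WLS (weighted) spread ratio: {wls_ratio:.2f}")
print("WLS coefficients:", b_wls)
```

Whether the log transform helps depends on how the variance actually grows in the data at hand, and the WLS weights are only as good as the assumed variance pattern, so it is worth rerunning the heteroskedasticity test after either remedy.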