Linear Regression
Type of Plot:
Scatterplot
Simple Linear Regression:
library(ggplot2)  # qplot() is provided by ggplot2
qplot(x = explanatory,
      y = response,
      data = dataframe,
      geom = "point")
Parameters: \( \beta_i \) for \( i \in \{0, \ldots, k - 1\} \)
Point Estimates: \( B_i \) for \( i \in \{0, \ldots, k - 1\} \)
Hypothesis Test
Hypotheses:
\( H_0: \beta_i = 0 \)
1) \( H_a: \beta_i < 0 \)
2) \( H_a: \beta_i \ne 0 \)
3) \( H_a: \beta_i > 0 \)
Test Statistic Random Variable (Assuming \(H_0\) is true):
\( T = \dfrac{B_i - 0}{{SE}_i} \sim t(df = n - k) \) where \( SE_i \) is defined here
Observed Test Statistic:
\( t_{obs} = \dfrac{b_{i, obs} - 0}{{SE}_{i, obs}}\) where \( {SE}_{i, obs} \) is defined here
\( \mathbf{\textit{P}}\)-value:
1) \( \mathbb{P}(B_i \le b_{i, obs}) = \mathbb{P}(T \le t_{obs}) \)
2) \( \mathbb{P}\left(\big| B_i \big| \ge \big| b_{i, obs} \big| \right) = \mathbb{P}\left(\big| T \big| \ge \big| t_{obs} \big|\right) \)
3) \( \mathbb{P}(B_i \ge b_{i, obs}) = \mathbb{P}(T \ge t_{obs}) \)
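The test statistic and the three P-values can be sketched in R as follows. This is a minimal illustration, not a prescribed workflow: the built-in `mtcars` data, the choice of `mpg` as response and `wt` as explanatory variable, and all variable names are assumptions made for the example.

```r
# Illustrative fit on the built-in mtcars data (an assumption for this sketch):
# response = mpg, explanatory = wt, so k = 2 coefficients (intercept and slope).
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients    # columns: Estimate, Std. Error, t value, Pr(>|t|)

b_obs  <- coefs["wt", "Estimate"]     # observed point estimate b_{1, obs}
se_obs <- coefs["wt", "Std. Error"]   # observed standard error SE_{1, obs}
df     <- df.residual(fit)            # degrees of freedom, n - k

t_obs <- (b_obs - 0) / se_obs         # observed test statistic under H_0: beta_1 = 0

# P-values for the three alternative hypotheses
p_less    <- pt(t_obs, df = df)                                # 1) H_a: beta_1 < 0
p_two     <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)   # 2) H_a: beta_1 != 0
p_greater <- pt(t_obs, df = df, lower.tail = FALSE)            # 3) H_a: beta_1 > 0
```

The two-sided value `p_two` should agree with the `Pr(>|t|)` column that `summary(fit)` reports, which is a quick way to check the hand computation.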
Conditions for Distributional Approximation (To \( T\)):
- Linear relationship between response and predictors (Check residual plot for randomly distributed errors)
- Independent observations, errors, and predictor variables (Check residual plot for no time series-like patterns and plot the predictors pairwise)
- Nearly normal residuals (Check qqplot of standardized residuals)
- Equal variance of the residuals across values of the explanatory variables (Check residual plot for fan-shaped patterns)
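The diagnostic plots named in these conditions can be produced with base R graphics. The sketch below is one possible way to do it, assuming an illustrative two-predictor model on the built-in `mtcars` data; the model and the object names are not from the original text.

```r
# Illustrative model with two predictors (an assumption for this sketch)
fit <- lm(mpg ~ wt + hp, data = mtcars)

res     <- resid(fit)       # raw residuals
res_std <- rstandard(fit)   # standardized residuals

old_par <- par(mfrow = c(1, 2))   # two plots side by side

# Residuals vs fitted values: look for curvature (non-linearity),
# time-series-like runs (dependence), or fan shapes (unequal variance)
plot(fitted(fit), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residual plot")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the standardized residuals: points should lie near the line
qqnorm(res_std)
qqline(res_std)

par(old_par)                      # restore the previous plot layout

# Pairwise plot of the predictors: look for strong relationships between them
pairs(mtcars[, c("wt", "hp")])
```

These plots are judged by eye, so they support the conditions rather than prove them.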
Confidence Interval
Formula for CI:
\( b_{i, obs} \pm t^* \cdot {SE}_{i, obs} \) where \( t^* \) is the critical value of \( t(df = n - k) \) for the chosen confidence level and \( {SE}_{i, obs} \) is defined here
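As a rough sketch, the interval can be computed by hand in R and compared with `confint()`. The `mtcars` data, the 95% confidence level, and the variable names are assumptions chosen for illustration.

```r
# Illustrative fit on the built-in mtcars data (an assumption for this sketch)
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients

b_obs  <- coefs["wt", "Estimate"]     # observed point estimate
se_obs <- coefs["wt", "Std. Error"]   # observed standard error

conf_level <- 0.95                    # assumed confidence level
# Critical value t* from t(df = n - k), leaving (1 - conf_level) / 2 in each tail
t_star <- qt(1 - (1 - conf_level) / 2, df = df.residual(fit))

# Point estimate plus or minus critical value times standard error
ci_manual <- c(lower = b_obs - t_star * se_obs,
               upper = b_obs + t_star * se_obs)

# Built-in equivalent for comparison
ci_builtin <- confint(fit, "wt", level = conf_level)
```

Note that \( t^* \) comes from `qt()` (a quantile of the t distribution), not from the observed test statistic.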
Conditions for Distributional Approximation (To \( T\)):
- Linear relationship between response and predictors (Check residual plot for randomly distributed errors)
- Independent observations, errors, and predictor variables (Check residual plot for no time series-like patterns and plot the predictors pairwise)
- Nearly normal residuals (Check qqplot of standardized residuals)
- Equal variance of the residuals across values of the explanatory variables (Check residual plot for fan-shaped patterns)
\( Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_{k - 1} X_{k - 1} + \varepsilon \)
\( k = 2 \) in Simple Linear Regression
Plots for Multiple Linear Regression commonly require \( \ge 3 \) dimensions
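To make the role of \( k \) concrete, the sketch below simulates data from the model with two predictors, so that \( k = 3 \) coefficients are estimated. The sample size, the coefficient values, and the standard normal errors are all hypothetical choices for illustration.

```r
set.seed(1)                       # make the simulation reproducible

n    <- 100                       # hypothetical sample size
beta <- c(2, 0.5, -1)             # hypothetical beta_0, beta_1, beta_2

x1  <- rnorm(n)                   # first predictor
x2  <- rnorm(n)                   # second predictor
eps <- rnorm(n)                   # error term epsilon (assumed standard normal)

# Y = beta_0 + beta_1 * X_1 + beta_2 * X_2 + epsilon
y <- beta[1] + beta[2] * x1 + beta[3] * x2 + eps

fit <- lm(y ~ x1 + x2)

k  <- length(coef(fit))           # number of estimated coefficients, including the intercept
df <- df.residual(fit)            # degrees of freedom used by the t distribution, n - k
```

Dropping `x2` from the formula gives Simple Linear Regression, where `k` becomes 2 and `df` becomes `n - 2`.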