Linear Regression, Ordinary Least Squares, Maths misc, Linear models w. t…
Linear Regression
Algorithm
- Use Least Squares to fit the data
- Calculate R-squared
- Calculate p-value for R-squared and determine if it is statistically significant
What
- Models Y as a linear function of the independent variables: Y = β₀ + β₁X₁ + … + βₖXₖ + ϵ
ϵ = error term => actual error = difference between the observations and the true regression line (which we will never know and can only estimate with the fitted βs)
Assumptions
- Linearity: the relationship between X and Y is linear
- Independence: observations (and their errors) are independent
- Normality: residuals are normally distributed
Homoscedasticity (= residuals vary similarly across the values of the independent variables; i.e. no identifiable pattern in the residuals)
Transformations on the dependent variable: log, square-root
- Heteroscedasticity => log(Y) => Homoscedasticity
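A quick numeric sketch of this fix (toy data with assumed multiplicative noise, invented for illustration — not from the source): the spread of Y grows with x, but the spread of log(Y) does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)

# Multiplicative (lognormal) noise: the spread of Y grows with x,
# i.e. the raw data are heteroscedastic.
y = np.exp(0.1 * x) * rng.lognormal(0.0, 0.3, n)

def spread_ratio(values, x):
    """Spread of the values in the top half of x divided by the
    spread in the bottom half; ~1 means homoscedastic-looking."""
    lo = values[x < np.median(x)]
    hi = values[x >= np.median(x)]
    return np.std(hi) / np.std(lo)

print(spread_ratio(y, x))          # noticeably > 1: spread grows with x
print(spread_ratio(np.log(y), x))  # close to 1: log(Y) stabilizes the spread
```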

Fitness of model
R-squared
- R² = (Var(mean) − Var(fit)) / Var(mean) = 1 − RSS/TSS
- Proportion of the variance in Y explained by the model
- Ranges from 0 to 1; higher = better fit
Interpretation
Depends on the problem, data, etc.
P-value for R^2
- Generate a random dataset & calculate its F-value
- Repeat #1 hundreds/thousands of times
- Plot the distribution of F-values
- Calculate the F-value for the dataset of interest
- Get the p-value for that F-value (= prob of getting an F-value >= that F-value)
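The simulation above can be sketched in Python. The dataset is invented for illustration, and the F-value uses the standard formula F = ((SS(mean) − SS(fit)) / (p_fit − p_mean)) / (SS(fit) / (n − p_fit)) for simple linear regression (p_mean = 1, p_fit = 2):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_value(x, y):
    """F = ((SS(mean) - SS(fit)) / (p_fit - p_mean)) / (SS(fit) / (n - p_fit))
    for simple linear regression (p_mean = 1, p_fit = 2)."""
    n = len(y)
    ss_mean = np.sum((y - y.mean()) ** 2)
    slope, intercept = np.polyfit(x, y, 1)
    ss_fit = np.sum((y - (slope * x + intercept)) ** 2)
    return ((ss_mean - ss_fit) / (2 - 1)) / (ss_fit / (n - 2))

# Dataset of interest (toy: y really does depend on x).
x = np.linspace(0, 1, 30)
y = 2.0 * x + rng.normal(0, 0.5, 30)
f_obs = f_value(x, y)

# Simulate the null: random datasets with no real x-y relationship,
# computing an F-value for each one.
f_null = np.array([f_value(x, rng.normal(0, 0.5, 30)) for _ in range(2000)])

# p-value = probability of an F-value >= the observed one under the null.
p = np.mean(f_null >= f_obs)
print(f_obs, p)
```

In practice the null distribution of F is known analytically (the F distribution), so the simulation is only for intuition.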
Variable Selection
Forward
- Start w. Null Model
- Add var resulting in minimum RSS
- Repeat #2 until stopping rule is satisfied
Backward
- Start w. all vars
- Remove var w. largest p-value
- Stop when all remaining variables have p-values below the threshold
Mixed
- Steps #1 & #2 of Forward selection
- If p-value > threshold, remove
- Repeat
Until all variables in the model have sufficiently low p-values, and all variables outside the model would have large p-values if added to the model
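A minimal sketch of Forward selection as described above. The data, the use of numpy's `lstsq` for the fits, and the fixed-size stopping rule are assumptions for illustration (the source leaves the stopping rule open):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_select(X, y, n_vars):
    """Greedy forward selection: start w. the null model (intercept only),
    repeatedly add the variable giving minimum RSS.
    Stopping rule here is simply a fixed number of variables (hypothetical)."""
    n, k = X.shape
    chosen = []
    design = np.ones((n, 1))        # null model: intercept column only
    while len(chosen) < n_vars:
        best = min((j for j in range(k) if j not in chosen),
                   key=lambda j: rss(np.column_stack([design, X[:, j]]), y))
        chosen.append(best)
        design = np.column_stack([design, X[:, best]])
    return chosen

# Toy data: only columns 2 and 4 actually matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] + 0.5 * X[:, 4] + rng.normal(0, 0.1, 100)
print(forward_select(X, y, 2))   # picks the truly informative columns
```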
Outliers
|Studentized residual| > 3 => outlier (common rule of thumb)

t-test + regression
e.g. Goal: Predict Mouse size;
Independent vars available: Mouse weight & Mouse type (Control=red; Mutant=green)
- Use Least Squares to fit data from each category
- Find the y equation and design matrix for all the data combined
- Calculate residuals for each fitted line (RSS)
Design matrix columns:
- 1st = intercept (both lines cross the y-axis at some point)
- 2nd = mutant offset switch (on/off: 0 or 1)
- 3rd = mouse weight data
p_fit = 3
- Compare the full fit to the mean-only model (p_mean = 1) with an F-test:
F = ((SS(mean) − SS(fit)) / (p_fit − p_mean)) / (SS(fit) / (n − p_fit))
- The p-value tells whether the fit w. mutant offset & weight predicts size significantly better than the mean alone
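The three-column design matrix described above can be sketched directly. The mouse data below are invented for illustration (the source only names the variables):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy mouse data: predict size from type (control/mutant) and weight.
n = 20
weight = rng.uniform(15, 30, n)
mutant = np.repeat([0, 1], n // 2)   # 0 = control (red), 1 = mutant (green)
size = 1.0 + 0.8 * mutant + 0.2 * weight + rng.normal(0, 0.3, n)

# Design matrix with the three columns described above:
#   1st = intercept, 2nd = mutant offset switch, 3rd = mouse weight
X = np.column_stack([np.ones(n), mutant, weight])   # p_fit = 3

beta, *_ = np.linalg.lstsq(X, size, rcond=None)
resid = size - X @ beta
rss_fit = float(resid @ resid)
print(beta)      # ≈ [intercept, mutant offset, weight slope]
print(rss_fit)
```

Testing whether the mutant-offset coefficient differs from zero in this design is equivalent to a t-test that adjusts for weight.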
Ordinary Least Squares
Matrix Form
Y = Xβ + e; Y = n × 1; e = n × 1;
X = n × (k+1); β = (k+1) × 1;
k = num of independent variables;
n = num of observations
RSS = e'e
Minimize RSS
RSS = e'e = (Y − Xβ)'(Y − Xβ)
∂RSS/∂β = −2X'Y + 2X'Xβ = 0
=> X'Xβ = X'Y (the normal equations)
=> β̂ = (X'X)⁻¹ X'Y
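The normal-equations solution β̂ = (X'X)⁻¹X'Y can be checked numerically (toy data, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n × (k+1)
beta_true = np.array([1.0, 2.0, -3.0])                      # (k+1) × 1
Y = X @ beta_true + rng.normal(0, 0.1, n)                   # n × 1

# Minimizing RSS = e'e gives the normal equations X'X β = X'Y,
# solved here directly (np.linalg.solve avoids forming the inverse).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # close to beta_true
```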
Maths misc
Summations

sum(y) = N * (sum(y)/N) = N * ȳ (a sum of N values equals N times their mean)
Design matrix
What?
- Matrix of the values of the independent variables, one row per observation (plus a column of 1s for the intercept)
e.g. Linear regression
- Each row is [1, x]; predicted y = y-intercept × 1 + slope × x
e.g. y-intercept = 0.01, slope = 0.8, x = 1.6;
y = 0.01 × 1 + 0.8 × 1.6 = 1.29
:!?: Basically it's just the X matrix w. a different name, isn't it?
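The example above is just a matrix–vector product; a one-line check (numbers taken from the example):

```python
import numpy as np

# Design matrix row for one observation: [1 (intercept), x = 1.6].
X = np.array([[1.0, 1.6]])
beta = np.array([0.01, 0.8])   # [y-intercept, slope] from the example

y = X @ beta
print(y)   # [1.29]
```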