Week 6: Multiple Linear Regression Model
Simple Linear Regression Models
Not very accurate as Y tends to depend on more variables than just a single x
For example, income depends not only on gender but also on education, age, etc.
The solution to this problem is to estimate a multiple linear regression model
With two x variables, the multiple regression model is given by the equation Y = β0 + β1 x1 + β2 x2 + error
It is estimated the same way in Excel. Instead of selecting a single column as the x input, we select several.
Must be in columns that are next to each other
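Outside Excel, the same estimation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fit_ols` is a hypothetical helper, the data are made up, and the coefficients are found by solving the normal equations (X'X)b = X'y with Gaussian elimination, which is one of several ways to compute a least-squares fit.

```python
def fit_ols(rows, y):
    """Estimate [b0, b1, ..., bk] by least squares.

    rows: list of observations, each a list of x values (one per column).
    y:    list of observed Y values, same length as rows.
    """
    # Add a leading 1 to each observation so b0 acts as the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])

    # Build the normal equations A b = c, where A = X'X and c = X'y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    c = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [A[i] + [c[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= factor * M[col][k]

    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(M[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (M[i][p] - s) / M[i][i]
    return b


# Made-up data generated exactly from Y = 2 + 3*x1 + 0.5*x2 (no noise),
# so the fit should recover those three coefficients.
x_rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_vals = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in x_rows]
coefs = fit_ols(x_rows, y_vals)
print([round(b, 6) for b in coefs])
```

Each inner list in `x_rows` plays the role of one spreadsheet row across the adjacent x columns.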
Obtaining predictions from the model
We might want to know the average Y for a person with certain characteristics (given values of x1 and x2)
Consider this model
Income = β0 + β1 Education + β2 Age
= −117120 + 4541 × Education + 3369 × Age
To predict, we simply plug in the values of the x variables, here Education = 12 and Age = 30
Income = −117120 + 4541 × 12 + 3369 × 30
= 38442
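The same plug-in calculation can be written as a small Python function. This is a minimal sketch: the coefficients are the ones from the example model, the function name `predict_income` is hypothetical, and reading 12 as years of education and 30 as age is an assumption based on the order of the terms in the estimated equation.

```python
# Estimated coefficients copied from the example income model.
INTERCEPT = -117120
B_EDUCATION = 4541
B_AGE = 3369


def predict_income(education_years, age):
    """Return the predicted income for the given characteristics."""
    return INTERCEPT + B_EDUCATION * education_years + B_AGE * age


predicted = predict_income(education_years=12, age=30)
print(predicted)  # 38442
```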
Testing hypotheses in the MLRM
Testing hypotheses about the slope
Formulate the null and alternative hypotheses
Decide on a significance level
Calculate the t-statistic and the critical value
Make a decision and draw conclusion
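The four steps can be sketched in Python for a two-sided test of H0: β = 0. This is a sketch under stated assumptions: `slope_t_test` is a hypothetical helper, the standard error is a made-up number, and the critical value comes from the standard normal distribution as a large-sample approximation. With a small sample the critical value should come from the t distribution with n − k − 1 degrees of freedom instead.

```python
from statistics import NormalDist


def slope_t_test(estimate, std_error, alpha=0.05, null_value=0.0):
    """Two-sided test of H0: beta = null_value against H1: beta != null_value."""
    t_stat = (estimate - null_value) / std_error
    # Large-sample approximation: normal critical value in place of the
    # t critical value with n - k - 1 degrees of freedom.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(t_stat) > critical
    return t_stat, critical, reject


# 4541 is the education slope from the example model; the standard error
# of 1200 is a made-up number used only for illustration.
t_stat, critical, reject = slope_t_test(estimate=4541, std_error=1200)
print(round(t_stat, 2), round(critical, 2), reject)
```

If `reject` is true, the conclusion is that the slope differs from zero at the chosen significance level.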
Evaluating Regression models
R-Squared
Closely related to correlation coefficient
It is the square of the correlation between the actual Y values and those predicted by the model (Ŷ)
It is called R Square in the Excel output and it is a number between 0 and 1. Good models have a value close to 1; a value close to 0 indicates a bad model
R-squared measures the proportion of the variation in Y that is explained by the model
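The definition of R-squared as a squared correlation can be checked with a short Python sketch. The numbers are made up: `y_hat` holds the fitted values from a straight-line least-squares fit of `y_actual` on x = 1, ..., 5, and `r_squared` is a hypothetical helper name.

```python
from math import sqrt


def r_squared(actual, predicted):
    """Square of the correlation between actual and predicted Y values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return (cov / sqrt(var_a * var_p)) ** 2


# Made-up data: y_hat = 5.5 + 3.5 * x for x = 1..5, the least-squares line.
y_actual = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [9.0, 12.5, 16.0, 19.5, 23.0]
r2 = r_squared(y_actual, y_hat)
print(round(r2, 3))  # 0.972
```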
Standard Error
(The standard deviation of the error term in the model)
It indicates how large the error in predicting Y typically is, whether above or below the actual value
Is it a big or small number?
It is useful to compare it to the sample mean of Y and/or the sorts of values Y takes
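A minimal Python sketch of this comparison, assuming the usual formula for the standard error of a regression, sqrt(SSE / (n − k − 1)), where k is the number of slopes. The data are made up and `regression_standard_error` is a hypothetical helper name.

```python
from math import sqrt


def regression_standard_error(actual, predicted, n_slopes):
    """Standard error of the regression: sqrt(SSE / (n - k - 1))."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    dof = len(actual) - n_slopes - 1
    return sqrt(sse / dof)


# Made-up data: y_hat are fitted values from a one-slope model.
y_actual = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [9.0, 12.5, 16.0, 19.5, 23.0]
se = regression_standard_error(y_actual, y_hat, n_slopes=1)
mean_y = sum(y_actual) / len(y_actual)

# Express the standard error as a share of the sample mean of Y.
print(round(se, 3), round(se / mean_y, 3))
```

Here the standard error is roughly 7% of the mean of Y, which would usually count as small.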
Error / Residual Plots
The aim of a regression model is to explain patterns in Y
What we would like to see is all patterns gone from the errors
If a pattern can be detected in the errors then it probably means we don't have a very good model
Non Linear Relationships Between x and Y
Categorical Variables with Two Categories
We can extend our regression model to also include a dummy variable
Now imagine we have a categorical variable with more than 2 categories
In this case we need to create dummy variables for all but one of the categories (for example, 2 dummies for 3 categories) and include them in the model
The omitted category acts as the baseline, so each dummy coefficient measures the difference in Y relative to the omitted category
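Building the dummy columns can be sketched in Python as follows. The category names and the helper `make_dummies` are hypothetical and used only for illustration.

```python
def make_dummies(values, baseline):
    """Return one 0/1 column per category except the omitted baseline."""
    categories = sorted(set(values) - {baseline})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical three-category variable; "school" is the omitted category.
education_level = ["school", "degree", "trade", "degree", "school"]
dummies = make_dummies(education_level, baseline="school")
print(dummies)
```

Rows where every dummy is 0 belong to the omitted category, which is why the coefficients are read relative to it.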
Legend
Beta 0
Intercept
The estimated value of Y when x1 = 0 and x2 = 0
Beta 1 and 2
The slopes of Y with respect to x1 and x2
They estimate the change in Y for a 1 unit change in the respective variable, holding the other variable constant
Take two people with the same age, one of whom has one more year of education than the other. The person with one more year of education can expect to earn beta 1 more than the other
Error
Actual value of Y - the predicted value of Y
P Value
The mass of the distribution that lies in the tails, beyond the observed test statistic