Regression Model (Week 4)
Simple regression = a single predictor variable x used to estimate the mean of an outcome y as a function of x
Straight line = linear regression: E(Y|X) = b0 + b1X
Correlation r measures the strength of the linear association; -1 <= r <= 1.
0 = no linear association; 1 = perfect positive association; -1 = perfect negative association
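A minimal sketch of computing r, using made-up toy data in which y rises with x (so r should be near +1):

```python
import numpy as np

# Hypothetical toy data: y increases with x, so r should be close to +1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```

A negatively sloped cloud of points would give r near -1 instead.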
predictive analytics; prediction interval; slope
Gives a numeric value for the variability in the outcome
Method of least squares: find the line that minimizes the sum of the squares of the vertical distances from the points to the line
Decompose the data into 1) the fitted values (the predictions / forecasts) and 2) the residuals (the vertical distances from the points to the line, which measure the quality of the fit)
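The least-squares fit and the fitted-values/residuals decomposition can be sketched as follows (the data are made up, roughly following y = 1 + 2x):

```python
import numpy as np

# Hypothetical toy data roughly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 5.1, 6.8, 9.2])

# np.polyfit with degree 1 returns the least-squares slope and intercept.
b1, b0 = np.polyfit(x, y, 1)

fitted = b0 + b1 * x        # predictions / forecasts
residuals = y - fitted      # vertical distances from the points to the line

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero.
print(round(float(residuals.sum()), 10))
```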
Look at the outliers (big residuals) to understand why they don't fit the model
R-squared = proportion of the variability in y explained by the regression model = r^2; higher is better; a comparative benchmark; no units
RMSE (root mean squared error) measures the spread around the line (the noise) = the standard deviation of the residuals; lower is better, meaning the points sit closer to the line
Normal distributions centered on the regression line; they all have the same standard deviation (the RMSE)
Only use the model within the range of the data; an approximate 95% prediction interval for a new observation is
Forecast +/- 2 * RMSE
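A sketch putting R-squared, RMSE, and the approximate 95% prediction interval together, again on made-up noisy-line data:

```python
import numpy as np

# Hypothetical toy data: a noisy straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 3.2, 4.8, 7.1, 9.0, 11.2])

b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
residuals = y - fitted

# R^2: proportion of the variability in y explained by the model.
r2 = 1 - residuals.var() / y.var()

# RMSE: standard deviation of the residuals (spread around the line).
rmse = np.sqrt(np.mean(residuals**2))

# Approximate 95% prediction interval for a new observation at x_new,
# valid only within the range of the data.
x_new = 2.5
forecast = b0 + b1 * x_new
lower, upper = forecast - 2 * rmse, forecast + 2 * rmse
print(round(r2, 3), round(float(rmse), 3), (round(lower, 2), round(upper, 2)))
```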
Relies on the normal distribution, so
check that the residuals are normally distributed
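One quick numeric check (a sketch on simulated data, not a formal test): if the residuals are roughly normal, about 95% of them should fall within +/- 2 RMSE of zero.

```python
import numpy as np

# Simulated data with normal noise, so the check should pass.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 1.5, 200)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)
rmse = np.sqrt(np.mean(residuals**2))

# Fraction of residuals within +/- 2 RMSE; expect roughly 0.95 for
# normally distributed residuals.
within = float(np.mean(np.abs(residuals) < 2 * rmse))
print(round(within, 2))
```

In practice a histogram or normal quantile plot of the residuals is the usual diagnostic.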
Diagnostics
Fitting curves to data
Take the log transform, rescaling to log x and log y, so that the data fit a straight line (linear regression)
E(log y | x) = b0 + b1 log(x)
b1 = elasticity = % change in y for a 1% change in x
Log-log regression model
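A sketch of the log-log fit on made-up power-law data y = 3 * x^0.5, where the true elasticity is 0.5:

```python
import numpy as np

# Hypothetical data following the power law y = 3 * x**0.5 (elasticity 0.5).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3 * x**0.5

# Fit a straight line on the log-log scale: E(log y | x) = b0 + b1 log(x).
b1, b0 = np.polyfit(np.log(x), np.log(y), 1)

# b1 is the elasticity: the % change in y for a 1% change in x.
print(round(b1, 3))  # → 0.5
```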
Multiple Regression
2 predictors: E(Y | x1, x2) = b0 + b1X1 + b2X2
E(GP1000M | weight, horsepower) = 11.68 + 0.0089 weight + 0.0884 horsepower
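A sketch of fitting a two-predictor regression by least squares; the fuel-use data are simulated here from the coefficients above (not the original dataset), so the fit should recover them approximately:

```python
import numpy as np

# Simulated fuel-use data: GP1000M (gallons per 1000 miles) generated
# from weight and horsepower with the known coefficients plus noise.
rng = np.random.default_rng(1)
weight = rng.uniform(2000, 4000, 50)
hp = rng.uniform(80, 200, 50)
gp1000m = 11.68 + 0.0089 * weight + 0.0884 * hp + rng.normal(0, 0.5, 50)

# Design matrix with an intercept column; solve by least squares.
X = np.column_stack([np.ones_like(weight), weight, hp])
b0, b1, b2 = np.linalg.lstsq(X, gp1000m, rcond=None)[0]
print(round(b1, 4), round(b2, 4))  # near 0.0089 and 0.0884
```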
Logistic regression
Discrete outcome variable (yes/no; live/die)
Used to estimate the probability of success, modelling the outcome as a Bernoulli random variable
The probability must lie between 0 and 1, so a straight line won't do; logistic regression (an S-shaped curve) always predicts a probability between 0 and 1
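A minimal sketch of why the S-shaped logistic curve keeps predictions in (0, 1); the coefficients b0 and b1 here are made up for illustration:

```python
import math

# Logistic (sigmoid) curve: maps any real number into (0, 1),
# so the predicted probability can never leave that range.
def predict_prob(x, b0=-3.0, b1=1.5):  # b0, b1 are hypothetical coefficients
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

for x in (-10, 0, 2, 10):
    p = predict_prob(x)
    assert 0 < p < 1   # always a valid probability, unlike a straight line
print(round(predict_prob(2.0), 3))  # → 0.5 (here b0 + b1*x = 0 at x = 2)
```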