Regression
Intro
Measure the relationship
How much the independent variable affects the dependent variable
A causal relationship between the two variables must be established beforehand
Purpose: prediction
If x changes by 1, how much does y change (the slope)
Regression line - the line that minimizes the sum of squared residuals (least squares)
Intercept - value of Y when X = 0
The difference between actual y and predicted y is called the 'error' or 'residual'
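A minimal sketch of the slope, intercept and residuals described above, using plain Python. The data points and the function name fit_line are made up for illustration; this is one way to do the least-squares fit for a single predictor, not a reference implementation.

```python
# Simple linear regression by ordinary least squares (illustrative data only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept = predicted y when x = 0.
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
# Residual = actual y minus predicted y.
residuals = [y - p for y, p in zip(ys, predicted)]

print(f"intercept={intercept:.2f} slope={slope:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check on the fit.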
Assumptions
Normal distribution of the dependent variable (at each value of X); X does not need to be normally distributed
Homoscedasticity - variance of Y is the same at all values of X
Residuals should be normally distributed - use Q-Q plot to test it
Linear relationship between X and Y
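A rough sketch of the numbers behind a Q-Q plot for the residual normality check, using only the standard library. The residual values are made up, qq_points is a hypothetical helper, and the plotting positions (i - 0.5) / n are one common convention among several. If the residuals are roughly normal, the pairs fall close to a straight line.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals from a fitted regression line (made-up numbers).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

def qq_points(values):
    """Pair each theoretical normal quantile with the matching sample quantile."""
    n = len(values)
    centre, spread = mean(values), stdev(values)
    # Standardize and sort the sample so it lines up with the ordered quantiles.
    sample = sorted((v - centre) / spread for v in values)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))

for theo, samp in qq_points(residuals):
    print(f"theoretical {theo:+.2f}   sample {samp:+.2f}")
```

Plotting the pairs (theoretical on the x-axis, sample on the y-axis) gives the usual Q-Q plot; strong curvature suggests non-normal residuals.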
Logistic Regression
Dependent variable - categorical, 2 or more categories
Independent variable - categorical or continuous
Binary logistic regression - dependent variable has 2 categories
Multinomial logistic regression - dependent variable has more than 2 categories
Methods
Forced entry - all predictors entered in 1 block
Stepwise - only predictors with high predictive value are selected; not recommended
Assumptions
Sample size
Required sample size depends on the number of predictors
predictors * 8 + 5
Multicollinearity
Request collinearity diagnostics under Statistics
In the Coefficients table, check the Collinearity Statistics columns
Tolerance < 0.1 indicates high intercorrelation among predictors
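A small sketch of what the tolerance value measures, assuming only two predictors. Tolerance is 1 - R^2 from regressing one predictor on the others; with a single other predictor, R^2 is just the squared Pearson correlation. The data and the function names are invented for illustration.

```python
# Made-up predictor values; x2 is nearly a multiple of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8]

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def tolerance_two_predictors(a, b):
    """Tolerance = 1 - R^2; with one other predictor, R^2 equals r^2."""
    return 1.0 - pearson_r(a, b) ** 2

tol = tolerance_two_predictors(x1, x2)
flagged = tol < 0.1  # cutoff taken from the notes above
print(f"tolerance={tol:.4f} flagged={flagged}")
```

With more than two predictors, each predictor would need its own regression on all the others to get R^2; the two-predictor shortcut here does not generalize.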
Outliers
Check the residuals
Outliers cause problems with goodness of fit
Data preparation
Dichotomous variable
Lack of something - code as 0
More of something - code as 1
Continuous variable
Higher number - higher value of what is measured
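The coding rules above could be applied like this. The field names (smoker, age), the answer labels and the prepare helper are all hypothetical, chosen only to show the 0/1 recoding of a dichotomous variable next to a continuous one.

```python
# Hypothetical raw answers; field names and values are invented.
raw = [
    {"smoker": "no", "age": 34},
    {"smoker": "yes", "age": 51},
    {"smoker": "yes", "age": 29},
]

# Lack of the characteristic -> 0, presence of it -> 1.
SMOKER_CODES = {"no": 0, "yes": 1}

def prepare(rows):
    """Recode the dichotomous field as 0/1 and keep the continuous field numeric."""
    return [
        {"smoker": SMOKER_CODES[row["smoker"]], "age": float(row["age"])}
        for row in rows
    ]

prepared = prepare(raw)
for row in prepared:
    print(row)
```

An unexpected answer label raises a KeyError here, which is usually preferable to silently coding it as 0.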