Regression

Intro

Measure the relationship

How much the independent variable affects the dependent variable

A causal relationship between the two variables must be established beforehand

Purpose: prediction

If x changes by 1, how much does y change?

Regression line - the line that minimizes the sum of squared residuals (least squares)

Intercept - value of Y when X = 0

The difference between actual y and predicted y is called the 'error' or 'residual'
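A minimal sketch of the ideas above (least-squares line, intercept, residuals), using only plain Python. The data values and the helper name fit_line are made up for illustration.

```python
# Simple linear regression by ordinary least squares (illustrative data only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up values, roughly y = 2x


def fit_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # change in y per 1-unit change in x
    intercept = mean_y - slope * mean_x    # predicted y when x = 0
    return intercept, slope


def sum_squared_error(intercept, slope):
    """Sum of squared differences between actual y and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]  # actual y minus predicted y

print("intercept:", round(intercept, 3))
print("slope:", round(slope, 3))
print("sum of squared residuals:", round(sum_squared_error(intercept, slope), 4))
```

Any other line through the same data, for example one with a slightly different slope, gives a larger sum of squared residuals.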

Assumptions

Normal distribution of the dependent variable; X does not need to be normally distributed

Homoskedasticity - variance of Y is the same at all values of X

Residuals should be normally distributed - use Q-Q plot to test it

Linear relationship between X and Y
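A rough sketch of what a Q-Q plot of the residuals compares: each sorted residual is paired with the quantile a standard normal distribution would give at the same position. The residual values, the helper names qq_points and pearson_r, and the plotting positions (i - 0.5) / n are assumptions chosen for illustration. A correlation near 1 between the two columns suggests the points would fall close to a straight line.

```python
from statistics import NormalDist

# Made-up residuals from some fitted regression (illustration only).
residuals = [-1.8, -0.9, -0.5, -0.2, 0.0, 0.1, 0.4, 0.7, 1.1, 1.9]


def qq_points(values):
    """Pair each theoretical standard-normal quantile with the matching sorted value."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common choice of plotting position.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Pearson correlation between the two columns of a list of pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5


points = qq_points(residuals)
qq_correlation = pearson_r(points)

for theoretical_q, observed in points:
    print(f"{theoretical_q:7.3f}  {observed:5.2f}")
print("Q-Q correlation:", round(qq_correlation, 3))
```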

Logistic Regression

Dependent variable - categorical, 2 or more categories

Independent variable - categorical or continuous

Methods

Binary logistic regression - 2 categories

Multinomial logistic regression - more than 2 categories

Forced entry - all predictors entered in 1 block

Stepwise - predictors with high predictive value are selected; not recommended
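A small sketch of binary logistic regression with all predictors entered together (the forced-entry idea), fitted by plain gradient descent. This is only an illustration: statistical packages typically use other fitting routines, and the data, learning rate, epoch count, and function names here are assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, row):
    """Predicted probability of category 1; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit an intercept plus one weight per predictor by gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = predict(weights, row) - y
            grads[0] += err
            for j, v in enumerate(row):
                grads[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


# Made-up data: one continuous predictor and one dichotomous (0/1) predictor.
rows = [[1.0, 0], [2.0, 0], [3.0, 1], [4.0, 0],
        [5.0, 1], [6.0, 1], [2.5, 1], [4.5, 0]]
labels = [0, 0, 0, 1, 1, 1, 1, 0]  # dependent variable with 2 categories

weights = fit_logistic(rows, labels)
p_low = predict(weights, [1.0, 0])
p_high = predict(weights, [6.0, 1])
print("P(y=1) for low predictor values:", round(p_low, 3))
print("P(y=1) for high predictor values:", round(p_high, 3))
```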

Assumptions

Sample size

Depends on the number of predictors

Minimum sample size = number of predictors * 8 + 5
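The rule of thumb as written in these notes, worked for a few predictor counts. The function name and its parameters are hypothetical. Some textbooks quote the constant as 50 rather than 5, so it is kept as a parameter here and is worth checking against your own reference.

```python
def min_sample_size(n_predictors, per_predictor=8, constant=5):
    """Rule of thumb from the notes: number of predictors * 8 + 5."""
    return n_predictors * per_predictor + constant


for k in (1, 3, 5):
    print(k, "predictors ->", min_sample_size(k), "cases")
```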

Multicollinearity

Collinearity diagnostics under Statistics

Coefficient table

Collinearity statistics

Tolerance < 0.1 - high intercorrelation among predictors
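A sketch of where the tolerance value comes from. Tolerance for a predictor is 1 minus the R-squared obtained when that predictor is regressed on the other predictors. With only two predictors this reduces to 1 minus their squared correlation, which is the simplified case shown here. The data and function names are made up for illustration.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def tolerance_two_predictors(x1, x2):
    """Tolerance when the model has exactly two predictors: 1 - r^2."""
    return 1.0 - pearson_r(x1, x2) ** 2


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]  # almost a multiple of x1
x3 = [5, 1, 4, 2, 6, 3]               # only weakly related to x1

tol_collinear = tolerance_two_predictors(x1, x2)
tol_independent = tolerance_two_predictors(x1, x3)

print("tolerance, x1 with x2:", round(tol_collinear, 4))    # below the 0.1 cutoff
print("tolerance, x1 with x3:", round(tol_independent, 4))  # well above the cutoff
```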

Outliers

Residuals

Problem with goodness of fit

Data preparation

Dichotomous

Lack of something - 0

More of something - 1

Continuous variable

Higher number - higher value of what is measured
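A short sketch of this coding scheme. The variable names, the "yes"/"no" labels, and the 1 to 5 scale are hypothetical examples, not taken from any particular dataset.

```python
def code_dichotomous(value, present_label):
    """Code a two-category variable: 1 if the trait is present, 0 if it is lacking."""
    return 1 if value == present_label else 0


def reverse_score(value, scale_min, scale_max):
    """Flip a scale so that a higher number means a higher value of the trait."""
    return scale_min + scale_max - value


# Hypothetical raw answers: "yes" means the trait is present.
smoker_raw = ["yes", "no", "no", "yes"]
smoker = [code_dichotomous(v, "yes") for v in smoker_raw]

# Hypothetical 1-5 item where 1 originally meant "most satisfied".
satisfaction_raw = [1, 2, 5, 3]
satisfaction = [reverse_score(v, 1, 5) for v in satisfaction_raw]

print(smoker)
print(satisfaction)
```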