Regression
Simple Linear Regression
Equation: outcome = model + error
Yi = b0 + b1Xi + ei
Yi
: outcome
Model
b0/b1:
regression coefficients (parameters)
b1:
regression coefficient of IV
slope
direction/str. of relationship
b0
: y-intercept.
DV when IV = 0
Xi
: ith ppt's score on IV
ei
: residual term (not always included)
vertical dif btwn i's predicted score and actual score
represents that model will not perfectly fit data
To use:
Find line of best fit
Get estimates of slope/intercept
Plug in IV values to estimate value of DV
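A minimal sketch of these steps in Python (made-up data; variable names are hypothetical, not from the source):

```python
# Fit a simple linear regression by least squares and use it to predict the DV.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=100)                # hypothetical IV scores
y = 2.0 + 0.5 * x + rng.normal(0, 5, size=100)  # hypothetical DV = b0 + b1*X + error

# 1. Find the line of best fit -> estimates of intercept (b0) and slope (b1)
fit = stats.linregress(x, y)
b0, b1 = fit.intercept, fit.slope

# 2. Plug an IV value into the fitted equation to estimate the DV
x_new = 60
y_hat = b0 + b1 * x_new
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, predicted Y for X = {x_new}: {y_hat:.2f}")
```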
Assessing model
Finding line of best fit: Method of Least Squares
minimizes the sum of squared residuals
* look up
Goodness of Fit
F-statistic
tests ability of linear regression model to predict outcome
AKA: is model able to significantly predict outcome?
R2
% of variance in DV explained by model
effect size
Indiv. Predictors (b)
test whether IV significantly predicts DV
b
If statistically significant, IV makes signif. contribution to predicting DV
Slope; str. of relationship
good: signif. dif from 0
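A hedged sketch of assessing such a model with statsmodels (simulated data; one way to obtain the F-statistic, R², and tests of the b's):

```python
# Fit an OLS model and read off the goodness-of-fit and coefficient tests.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.4 * x + rng.normal(size=200)

X = sm.add_constant(x)               # adds the intercept (b0) column
model = sm.OLS(y, X).fit()           # least-squares fit

print(model.rsquared)                # R^2: % of variance in DV explained by the model
print(model.fvalue, model.f_pvalue)  # F-statistic: does the model significantly predict the outcome?
print(model.params)                  # b0 and b1
print(model.pvalues)                 # is each b significantly different from 0?
```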
DV must be continuous
Purpose: predict DV from 1+ IV's
simple: 1 IV
Multiple: 2+ IVs
Multiple Regression
Equation
y = b0 + b1X1 + b2X2 + ... + bnXn + ei
predicted from combination of all variables x respective coefficients + error
b0
: DV when all X = 0
bn
: regression coefficient of nth variable
If more than 1 IV, then plane instead of line
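A short sketch of the multiple-regression case with two hypothetical predictors (simulated data):

```python
# With two IVs the model fits a plane: y-hat = b0 + b1*X1 + b2*X2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1 = rng.normal(size=150)
x2 = rng.normal(size=150)
y = 0.5 + 1.2 * x1 - 0.7 * x2 + rng.normal(size=150)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)   # b0, b1, b2 -- bn is the coefficient of the nth predictor
```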
Methods of Entry
selection of predictors
which, and in what order
must be based on past-research/theory
Goal: parsimonious model
accomplishes a desired level of explanation or prediction with as few predictor variables as possible
Choosing a method
Forced Entry
used when no precedents for research question
Stepwise
Concerns:
statistical significance may not match theoretical importance
overfitting
: too many predictors that don’t add much
underfitting
: missing important predictors
backward preferable to forward bc less chance of Type II error
Bottom line: limit use of stepwise methods to exploratory analysis
Hierarchical
(Best)
Pro
based on theory/research
can see unique effect of new variable on DV
Con:
takes skill
how predictors are entered into model
Stepwise
predictors selected by computer based on semi-partial correlation w/ outcome
3 methods
Forward
computer adds 1 signif. predictor at a time
Step-wise
same as forward, but removes any that become non-signif. at each step
Backward
puts all in, then removes 1 non-signif. predictor at a time
Forced Entry
experimenter enters all predictors simultaneously
controls for effects of all other variables
Hierarchical
(blockwise)
experimenter decides order
variables of most interest entered last
Purpose:
ID if "new" variables predict outcome
control for covariates
past research on some variables, but others exploratory
steps vs blocks
steps: predictors entered one by one (stepwise methods: forward, backward, stepwise)
blocks: predictors entered in groups, with forced entry within each block (hierarchical)
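A hedged sketch of hierarchical (blockwise) entry with statsmodels; the column names (age, motivation, new_iv, outcome) are made up for illustration:

```python
# Block 1: known covariates; Block 2: add the variable of most interest last,
# then compare R^2 to see its unique contribution.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "motivation": rng.normal(0, 1, n),
    "new_iv": rng.normal(0, 1, n),
})
df["outcome"] = 0.1 * df["age"] + 0.5 * df["motivation"] + 0.8 * df["new_iv"] + rng.normal(0, 1, n)

block1 = smf.ols("outcome ~ age + motivation", data=df).fit()            # covariates only
block2 = smf.ols("outcome ~ age + motivation + new_iv", data=df).fit()   # new predictor entered last

print("R^2 change:", block2.rsquared - block1.rsquared)   # unique effect of the new variable
print(anova_lm(block1, block2))                            # F test of the improvement
```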
Determining quality of model
(Bias?)
Does the model represent all observed data
VS
is it influenced by a small number of cases?
Detecting Outliers
Cue: Model won't predict score very well
Look for any case with a large residual
Determining Large residual
Standardized residuals
z-scores
assess size against universal cut-offs (e.g., cases with |z| > 3 are cause for concern)
Residuals analysis
test amount of error in a model
ID extreme cases a/o outliers
ID Influential cases
Tests of Influence
(of single case)
Cook's Distance
meas overall influence of case on model
Value >1 ~ influential case
Leverage
(aka hat values)
influence of observed value of DV on predicted values
Avg. leverage value = (k + 1)/n (k = number of predictors)
Mahalanobis Distances
distance btwn cases & means of predictors
DFBeta
dif btwn parameter estimates using all cases VS when one case is excluded
standardized DFBeta
absolute values >1 -> influential case
Does a data pt consistently influence the model?
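A sketch of how these influence statistics can be pulled from a fitted statsmodels OLS model (simulated data):

```python
# Outlier/influence diagnostics via the model's get_influence() results.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)

influence = sm.OLS(y, X).fit().get_influence()

std_resid = influence.resid_studentized_internal  # standardized residuals (z-scores)
cooks_d, _ = influence.cooks_distance             # Cook's distance (> 1 ~ influential case)
leverage = influence.hat_matrix_diag              # leverage (hat values)
dfbetas = influence.dfbetas                       # standardized DFBetas (|value| > 1 ~ influential)

print(np.where(cooks_d > 1)[0])                   # cases flagged by Cook's distance
```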
Generalizability
to other samples
Underlying Assumptions
Independent Errors
For any 2 obs., residual terms should be uncorrelated
Durbin-Watson test
<1 or >3 = violation of assumption
<2 = (+) correlation
>2 = (-) correlation
2 = uncorrelated
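A minimal sketch of the Durbin-Watson test on the residuals of a fitted model (simulated data):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 2 + 0.5 * x + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(x)).fit()

print(durbin_watson(model.resid))  # ~2 = uncorrelated errors; <1 or >3 suggests a violation
```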
Homoscedasticity
at each level of IV, variance in residual terms should be constant
test visually
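One common visual check is a residuals-vs-fitted plot; a sketch with made-up data (the spread of residuals should stay roughly constant, with no funnel shape):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 1 + 0.8 * x + rng.normal(size=200)
model = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(model.fittedvalues, model.resid, s=10)  # residuals vs. predicted values
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```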
Multicollinearity
Identifying Multicollinearity
Tolerance
reciprocal of VIF (1/VIF)
values < 0.1 = problem
Variance Inflation Factor (VIF)
values > 10 = problem
whether predictor has strong linear relationship w/ other predictors
correlations >.80, >.90 btwn 2 IV
only bivariate; misses subtle forms of collinearity
strong correlation btwn 2+ predictors
Assumption: there should be NO perfect multicollinearity
no perfect linear relationship btwn 2+ predictors (strong but imperfect collinearity is still a concern)
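A sketch of computing VIF and tolerance for each predictor with statsmodels (simulated predictors, one deliberately collinear):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=200)   # strongly related to x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):                     # skip the constant column
    vif = variance_inflation_factor(X, i)
    print(f"predictor {i}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```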
Independence
All values of outcome variable are independent (come from separate entity)
Predictors are uncorrelated w/ "external variables"
influential variables not included in regression model
Non-zero variance
variance =/= 0
predictors should have some variation in value
Variable types
DV continuous & unbounded
ex. if variable can range from 1-10, but all values are 3-7, then variability is constrained
unbounded: no constraints on variability
all predictors quantitative OR dichotomous
Normally distributed errors
residuals in model are normally distributed
observed data doesn't need to be normally distributed; only the residuals do
test by examining histogram, p-p plots
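A sketch of those checks in Python: a histogram of the residuals plus a Q-Q plot (a close relative of the P-P plot), using simulated data:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
x = rng.normal(size=200)
y = 1 + 0.8 * x + rng.normal(size=200)
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

fig, axes = plt.subplots(1, 2)
axes[0].hist(resid, bins=20)                        # should look roughly bell-shaped
sm.qqplot(resid, line="45", fit=True, ax=axes[1])   # points should fall along the line
plt.show()
```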
Linearity
relationship we are modeling is linear
test by examining scatterplots
Cross-validation
assess accuracy of model across dif samples
if predictive power drops when applied to dif sample, then model does not generalize
Methods of Cross-Validation
Adjusted R2
indicates loss of predictive power from sample to population
shrinkage
estimates how much variance in Y would be accounted for if the model had been derived from the population
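A small sketch of adjusted R² (Wherry's formula) computed by hand and compared with the value statsmodels reports; the gap between R² and adjusted R² is one estimate of shrinkage:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n, k = 100, 3                                    # n cases, k predictors
X = sm.add_constant(rng.normal(size=(n, k)))
y = X @ np.array([1.0, 0.5, -0.2, 0.3]) + rng.normal(size=n)
model = sm.OLS(y, X).fit()

r2 = model.rsquared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # Wherry's adjusted R^2
print(r2, adj_r2, model.rsquared_adj)            # shrinkage ~ r2 - adj_r2
```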
Data Splitting
Randomly split data set, compare regression equations
Compare R2 & b in the 2 samples to see how well the model generalizes
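A sketch of data splitting with a random 50/50 split (sklearn used only for the split; data are simulated):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
X = rng.normal(size=(300, 2))
y = 0.5 + 1.0 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(size=300)

X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=0)

model_a = sm.OLS(y_a, sm.add_constant(X_a)).fit()
model_b = sm.OLS(y_b, sm.add_constant(X_b)).fit()

print(model_a.rsquared, model_b.rsquared)  # similar R^2 across halves -> model generalizes
print(model_a.params, model_b.params)      # similar b's -> stable regression coefficients
```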