Subset Selection, Regularization/Shrinkage Methods, Dimension Reduction, …
Subset Selection
Best Subset Selection
Algorithm
- M0 = the null model with no predictors; it predicts the sample mean for each observation.
- For k = 1, 2, ..., p:
  (a) Fit all (p choose k) models that contain exactly k predictors.
  (b) Pick the best among these (p choose k) models, and call it Mk.
      best = smallest RSS, or equivalently largest R^2.
- Select the single best model from M0, ..., Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R^2.
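The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the helper names (`rss`, `best_subset`) and the synthetic data are my own, and "best" is scored by smallest RSS within each size k, as in the steps above.

```python
from itertools import combinations
import numpy as np

def rss(X, y):
    # Residual sum of squares of an OLS fit with an intercept.
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return float(resid @ resid)

def best_subset(X, y):
    p = X.shape[1]
    # M0: the null model, whose RSS is the total sum of squares.
    best = {0: ((), float(np.sum((y - y.mean()) ** 2)))}
    for k in range(1, p + 1):
        # (a) fit all (p choose k) models that contain exactly k predictors
        fits = [(cols, rss(X[:, list(cols)], y)) for cols in combinations(range(p), k)]
        # (b) keep the one with the smallest RSS as Mk
        best[k] = min(fits, key=lambda t: t[1])
    return best

# Synthetic example: only X0 and X2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
models = best_subset(X, y)
```

Here `models[k]` holds the chosen Mk and its RSS; the final pick among M0, ..., Mp would then use Cp, BIC, adjusted R^2, or cross-validation rather than RSS, since RSS always decreases with k.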
Deviance Metric
- Deviance = −2 × the maximized log-likelihood; it plays the role of RSS for models fit by maximum likelihood (e.g. logistic regression).
- The smaller the deviance, the better the fit.
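As a small illustration of the deviance as −2 × the maximized log-likelihood, here is a sketch for a binary (logistic-style) model; the function name and toy data are illustrative, not from the source.

```python
import numpy as np

def binomial_deviance(y, p_hat):
    # Deviance for a binary outcome model: -2 * log-likelihood.
    eps = 1e-12
    p_hat = np.clip(p_hat, eps, 1 - eps)  # avoid log(0)
    ll = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return -2.0 * ll

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
# A null model predicts the overall event rate for every observation.
null_dev = binomial_deviance(y, np.full_like(y, y.mean(), dtype=float))
# A perfect fit predicts each outcome exactly, so its deviance is ~0.
perfect_dev = binomial_deviance(y, y.astype(float))
```

A better-fitting model has smaller deviance, so here `perfect_dev` is (essentially) zero while the null model's deviance is strictly larger.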
Stepwise Selection
Forward
Algorithm
- "" ~ Best subset selection;
- For k = 0,...,p − 1:
(a) Consider all p − k models that augment predictors in Mk with one additional predictor.
(b) Choose the best among p − k models, and call it M(k+1)
best = having smallest RSS or highest R^2.
- "" ~ Best subset selection;
Not guaranteed to find the best possible model out of all 2^p models containing subsets of the p predictors.
e.g. if the best possible 1-variable model contains X1 but the best possible 2-variable model contains X2 & X3, forward stepwise selection fails to select that 2-variable model because its 2-variable model must contain X1 plus one additional variable.
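The greedy loop above can be sketched as follows. This is a minimal illustration under my own naming (`rss`, `forward_stepwise`) and synthetic data; it scores candidates by RSS, per step (b).

```python
import numpy as np

def rss(X, y):
    # RSS of an OLS fit with an intercept.
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    r = y - Xi @ beta
    return float(r @ r)

def forward_stepwise(X, y):
    p = X.shape[1]
    selected, path = [], []
    for _ in range(p):
        remaining = [j for j in range(p) if j not in selected]
        # (a) fit the p - k models that add one remaining predictor to Mk
        scores = [(rss(X[:, selected + [j]], y), j) for j in remaining]
        # (b) the addition with the smallest RSS gives M(k+1)
        _, best_j = min(scores)
        selected.append(best_j)
        path.append(list(selected))
    return path

# Synthetic example: only X1 and X3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = 4 * X[:, 1] + 2 * X[:, 3] + rng.normal(scale=0.1, size=80)
path = forward_stepwise(X, y)
```

Note the greediness: each Mk+1 must contain everything in Mk, which is exactly why the X1-vs-{X2, X3} failure case above can occur.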
Hybrid approach
After adding each new variable, remove any variables that no longer provide an improvement in the model fit.
Backward
Algorithm
- Mp = the full model containing all p predictors (requires n > p so the full model can be fit).
- For k = p, p − 1, ..., 1:
  (a) Consider all k models that contain all but one of the predictors in Mk, for a total of k − 1 predictors each.
  (b) Choose the best among these k models, and call it M(k−1).
      best = smallest RSS, or equivalently largest R^2.
- Select the single best model from M0, ..., Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R^2 (as in best subset selection).
Model Selection Metrics
Cp estimate of test MSE
- Cp = (1/n) (RSS + 2 d σ̂^2)
- σ̂^2 = estimate of the variance of the model error, computed from the full model with all predictors
- d = number of predictors; n = number of observations
- Adds an increasing penalty of 2 d σ̂^2 to the decreasing training RSS as more predictors are added, so that the test error is not underestimated.
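The Cp formula above, together with BIC = (1/n)(RSS + log(n) d σ̂^2) and adjusted R^2 = 1 − (RSS/(n − d − 1)) / (TSS/(n − 1)), can be computed directly; a sketch, with my own helper name and synthetic data:

```python
import numpy as np

def selection_metrics(X, y, cols, sigma2):
    # Cp, BIC, and adjusted R^2 for the OLS model on columns `cols`,
    # where sigma2 is the error-variance estimate from the full model.
    n, d = len(y), len(cols)
    Xi = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    rss = float(np.sum((y - Xi @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    cp = (rss + 2 * d * sigma2) / n
    bic = (rss + np.log(n) * d * sigma2) / n
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))
    return cp, bic, adj_r2

rng = np.random.default_rng(4)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + rng.normal(size=n)
# sigma^2 comes from the full model with all p predictors.
Xf = np.column_stack([np.ones(n), X])
bf, *_ = np.linalg.lstsq(Xf, y, rcond=None)
sigma2 = float(np.sum((y - Xf @ bf) ** 2)) / (n - p - 1)
cp, bic, adj_r2 = selection_metrics(X, y, [0], sigma2)
```

Since log(n) > 2 whenever n > 7, BIC penalizes each added predictor more heavily than Cp, so it tends to select smaller models.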
Method 1: Indirectly estimate the test error by making an adjustment to the training error to account for the bias due to overfitting.
Method 2: Directly estimate the test error, using either the validation set approach or the cross-validation approach.
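A minimal sketch of the cross-validation route for comparing candidate models (the helper name `cv_mse` and the data are illustrative): the model size whose cross-validated MSE is smallest would be selected.

```python
import numpy as np

def cv_mse(X, y, cols, k=5, seed=0):
    # k-fold cross-validated MSE for an OLS model on columns `cols`.
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)  # everything outside the held-out fold
        Xtr = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        Xte = np.column_stack([np.ones(len(f)), X[np.ix_(f, cols)]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        errs.append(np.mean((y[f] - Xte @ beta) ** 2))
    return float(np.mean(errs))

# Synthetic example: only X0 carries signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = 3 * X[:, 0] + rng.normal(size=150)
```

Unlike the adjusted training-error metrics of Method 1, this estimates test error directly and needs no estimate of σ^2, at the cost of refitting the model k times.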