SageMaker
SageMaker Data Wrangler
- Data Flow – Create a data flow to define a series of ML data prep steps. You can use a flow to combine datasets from different data sources, identify the number and types of transformations you want to apply to datasets, and define a data prep workflow that can be integrated into an ML pipeline
- Edit data types
- Add transform
- Get data insights
- Join: Combine data to join two datasets and add the resulting dataset to the data flow
- Concatenate: Combine data to concatenate two datasets and add the resulting dataset to the data flow
- Transform – Clean and transform your dataset using standard transforms like string, vector, and numeric data formatting tools.
- Featurize your data using transforms like text and date/time embedding and categorical encoding
- Resize, enhance, corrupt images
- Balance data: oversampling, undersampling, SMOTE
- Generate Data Insights – Automatically verify data quality and detect abnormalities in your data with the Data Wrangler Data Quality and Insights Report
- Analyze – Analyze features in your dataset at any point in your flow. Data Wrangler includes built-in data visualization tools like scatter plots and histograms, as well as data analysis tools like target leakage analysis and quick modeling to understand feature correlation
- Create a Model – Create a SageMaker Canvas model directly from your prepared data
- Export the data – Export your data preparation workflow to a different location:
- S3
- If dataset size > 5 GB, then Canvas initiates a remote job: EMR Serverless (default) or SageMaker Processing Job
- SageMaker Canvas Dataset (Canvas only)
- Automate data preparation – Create machine learning workflows from your data flow:
- SageMaker Pipelines (via Jupyter Notebook) – Build workflows that manage your SageMaker data preparation, model training, and model deployment jobs
- SageMaker Serial inference pipeline (from Jupyter Notebook) – Create a serial inference pipeline from your data flow. Use it to make predictions on new data
- Python script (Jupyter notebook) – Store the data and their transformations in a Python script for your custom workflows
- SageMaker Feature Store
- SageMaker Processing Job (via Jupyter Notebook or from the UI)
- Schedule SageMaker Data Wrangler jobs to run periodically as SageMaker processing jobs directly from the UI (see the sketch below)
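A minimal Python sketch of running an exported Data Wrangler flow as a SageMaker Processing job with the generic Processor from the SageMaker Python SDK. The container image URI, S3 paths, and role below are placeholders; the notebook that Data Wrangler exports configures these (and the input/output names) for you.

```python
# Sketch: run an exported Data Wrangler .flow file as a SageMaker Processing job.
# Image URI, S3 paths, and role are placeholders -- substitute your own.
import sagemaker
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"        # assumption: your execution role
dw_image_uri = "<data-wrangler-container-uri-for-your-region>"  # assumption: region-specific image

processor = Processor(
    role=role,
    image_uri=dw_image_uri,
    instance_count=1,
    instance_type="ml.m5.4xlarge",
    sagemaker_session=session,
)

# The flow file defines the data sources and transform steps built in the UI.
processor.run(
    inputs=[ProcessingInput(source="s3://my-bucket/flows/my.flow",
                            destination="/opt/ml/processing/flow")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/dw-output/")],
)
```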
Analyze and Visualize
Multicollinearity
Measures of multicollinearity in your data:
- Variance Inflation Factor (VIF) is a measure of collinearity among variable pairs. Data Wrangler returns a VIF score as a measure of how closely the variables are related to each other.
- A VIF score is a positive number that is greater than or equal to 1
- VIF <= 5: the variable is moderately correlated with the other variables
- VIF >= 5: the variable is highly correlated with the other variables (see the VIF sketch after this section)
- PCA measures the variance of the data along different directions in the feature space. Also referred to as Singular Value Decomposition (SVD)
- PCA generates an ordered list of variances (also known as singular values), each greater than or equal to 0
- When the numbers are roughly uniform, the data has very few instances of multicollinearity. When there is a lot of variability among the values, we have many instances of multicollinearity.
- Lasso feature selection uses the L1 regularization technique to only include the most predictive features in your dataset.
Multicollinearity is a circumstance where two or more predictor variables (features) are related to each other. When you have multicollinearity, the predictor variables are not only predictive of the target variable but also predictive of each other.
Multicollinearity (also called collinearity) is a special case of correlation: a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.
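A minimal VIF sketch with statsmodels (independent of Data Wrangler) on a toy DataFrame, showing how a nearly collinear pair of features produces large VIF scores. The data and column names are made up for illustration.

```python
# Minimal VIF sketch: a VIF >= ~5 for a feature suggests it is highly
# correlated with the other features.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.DataFrame({                      # toy numeric feature matrix
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 12.1],        # nearly collinear with x1
    "x3": [5, 3, 6, 2, 7, 1],
})

X = add_constant(df)                     # VIF is computed against an intercept
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const"))                 # x1 and x2 show very large VIFs
```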
- Data Quality And Insights Report
- Histogram (with Color By, Facet By)
- Scatter Plot - only for numerical data (with Color By, Facet By)
- Table Summary
- Quick Model - evaluate your data and produce importance scores for each feature
- Model score: Classification -> F1 score, Regression -> MSE score
- Uses the Gini importance method to calculate feature importance for each feature
- Target Leakage - Target leakage occurs when there is data in the training dataset that is strongly correlated with the target label but is not available in real-world data
- Specify Target - aka label
- Problem type: Classification or Regression
- Uses AUC-ROC (classification) or R2 (regression)
- Multicollinearity (Studio) --> Feature Correlation (Canvas); see the correlation sketch after this list
- Linear -> Pearson
- Numeric to categorical correlation is calculated by encoding the categorical features as floating-point numbers
- Linear categorical to categorical correlation is not supported
- Non-linear -> Spearman's rank correlation
- Numeric to categorical correlation is calculated by encoding the categorical features as floating-point numbers
- Categorical to categorical correlation is based on the normalized Cramer's V test
- Detect Anomalies in Time Series - see outliers in your time series data.
- Seasonal Trend Decomposition in Time Series - whether there's seasonality in your time series data. The seasonal component is a signal that recurs in a time period
- Bias Report (specify Label and Facet)
- Custom visualization
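A small pandas sketch of the two correlation flavours used for numeric features (Pearson for linear, Spearman for rank/non-linear); categorical handling such as Cramér's V is not shown. The toy DataFrame is made up for illustration.

```python
# Minimal sketch of the two correlation methods the report uses for numeric features.
import pandas as pd

df = pd.DataFrame({
    "price": [10, 20, 30, 40, 55],
    "size":  [1.0, 2.1, 2.9, 4.2, 5.0],
    "rank":  [5, 4, 3, 2, 1],
})

print(df.corr(method="pearson"))    # linear correlation
print(df.corr(method="spearman"))   # non-linear (rank) correlation
```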
Data Splitting
- Split data into train, validation, and test sets (see the sketch after this list)
- Split types:
- Random split – Splits data randomly into train, test, and optionally validation datasets using the percentage specified for each dataset. Use this option if you do not need to preserve the order of your data
- Ordered split – Splits data in order, using the percentage specified for each dataset. An ordered split ensures that the data in each split is non-overlapping while preserving the order of the data
- Stratified split – Splits the dataset so that each split is similar with respect to a column specifying different categories for your data, for example, size or country. This split ensures that the train, test, and validation datasets have the same proportions for each category as the input dataset
- Split by key – Takes one or more columns as input (the key) and ensures that no combination of values across the input columns occurs in more than one of the splits (split by key). This is useful to avoid data leakage for unordered data. Choose this option if your data for key columns needs to be in the same split. For example, consider customer transactions split by customer ID; the split ensures that customer IDs don’t overlap across split datasets.
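Rough scikit-learn analogues of the four split types, on a toy DataFrame (illustrative only; Data Wrangler performs these splits inside the flow).

```python
# Sketch: scikit-learn analogues of the Data Wrangler split types.
import pandas as pd
from sklearn.model_selection import train_test_split, GroupShuffleSplit

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "country":     ["US", "US", "DE", "DE", "US", "US", "DE", "DE"],
    "amount":      [10, 15, 20, 25, 30, 35, 40, 45],
})

# Random split (order not preserved)
train, test = train_test_split(df, test_size=0.25, random_state=42)

# Stratified split: keep the same proportion of each country in both splits
train_s, test_s = train_test_split(df, test_size=0.25, stratify=df["country"], random_state=42)

# Ordered split: simply slice, preserving row order
cut = int(len(df) * 0.75)
train_o, test_o = df.iloc[:cut], df.iloc[cut:]

# Split by key: all rows of a given customer_id end up in the same split
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(gss.split(df, groups=df["customer_id"]))
train_k, test_k = df.iloc[train_idx], df.iloc[test_idx]
```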
Transform Time Series
- Group by a Time Series
- Resample Time Series Data: establish regular intervals for the observations in your dataset (see the sketch after this list) using:
- Upsampling reduces the interval between observations, i.e. increases their frequency (for example, use interpolation to infer hourly observations from 2-hour samples)
- Downsampling increases the interval between observations, i.e. decreases their frequency (for example, aggregate hourly observations into daily values)
- Handle Missing Time Series Data: Constant value, Most common, Forward fill, Backward fill, Interpolate
- Validate the Timestamp of Your Time Series Data
- Standardize the Length of Your Time Series Data
- Extract Features from Your Time Series Data
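A pandas sketch of resampling and the missing-value strategies listed above, on a made-up 2-hour series (the Data Wrangler transforms do the equivalent inside the flow).

```python
# Sketch: resampling and missing-value handling for a time series with pandas.
import pandas as pd

ts = pd.Series(
    [1.0, None, 3.0, 4.0],
    index=pd.date_range("2024-01-01", periods=4, freq="2h"),
)

upsampled   = ts.resample("1h").interpolate()   # upsample: 2-hour samples -> hourly via interpolation
downsampled = ts.resample("4h").mean()          # downsample: aggregate into 4-hour means

filled_ffill  = ts.ffill()                      # forward fill
filled_bfill  = ts.bfill()                      # backward fill
filled_const  = ts.fillna(0.0)                  # constant value
filled_interp = ts.interpolate()                # interpolate
```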
SageMaker Model Monitor
Types of monitoring
Data quality
- Monitor drift in data quality from a baseline based on data you provide (see the baseline sketch after this list)
- Prebuilt containers compute KLL sketch (compact quantiles sketch)
- Emits metrics for each feature/column in the dataset: Min, Max, Sum, SampleCount, Average, Completeness, BaselineDrift
- Violations: data_type_check, completeness_check, baseline_drift_check (distribution distance between the current and the baseline > threshold), missing_column_check, extra_column_check, categorical_values_check
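A minimal sketch of suggesting such a data-quality baseline with the SageMaker Python SDK's DefaultModelMonitor; the role and S3 paths are placeholders.

```python
# Sketch: suggest a data-quality baseline (statistics.json + constraints.json)
# from the training dataset. Paths and role are placeholders.
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # assumption: your execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",      # dataset used for training
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/model-monitor/baseline",
    wait=True,
)
# The job writes statistics.json and constraints.json to output_s3_uri;
# the constraints are later compared against captured traffic.
```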
Model quality
- Compares model predictions with actual Ground Truth labels that you provide in an S3 bucket
- Need to periodically label data captured by endpoint or batch transform job and upload it to Amazon S3
- Monitor drift in model quality metrics, such as accuracy
- Quality metrics depend on the type of ML problem: regression (e.g. rmse, mse, ...), binary classification (e.g. recall, precision, accuracy, ...), and multi-class (e.g. accuracy, weighted recall, weighted precision, ...)
Bias Drift for Models in Production
- Bias can be introduced or exacerbated in deployed models if the training data differs from the live data; this can be temporary or permanent
- Monitors bias metrics in your model's predictions continuously and alerts you if the metrics exceed a threshold
- You specify the allowed range for each bias metric (e.g. DPPL; see the sketch after this list); to ensure statistically significant data are used, SageMaker Clarify uses confidence intervals
- Baseline defined by data inputs, sensitive groups, captured predictions, post-training bias metrics
- Drift violations: facet, facet_value, metric_name, constraint_check_type = bias_drift_check
- Bias metrics measure the level of equality in a distribution (a value close to 0 means the distribution is more balanced)
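As a concrete illustration of a bias metric that is close to 0 when the distribution is balanced, here is a hand computation of DPPL (Difference in Positive Proportions in Predicted Labels); the toy predictions and facet values are made up, and Clarify computes this (plus confidence intervals) for you.

```python
# Hand-computed DPPL for two facets; a value near 0 means predictions are
# balanced across facets. Data is made up for illustration.
import numpy as np

predicted = np.array([1, 1, 1, 1, 0, 1, 0, 0, 1, 0])  # model predictions (1 = positive outcome)
facet     = np.array(["a", "a", "a", "a", "a",
                      "d", "d", "d", "d", "d"])        # sensitive attribute

q_a = predicted[facet == "a"].mean()   # positive rate for facet a -> 0.8
q_d = predicted[facet == "d"].mean()   # positive rate for facet d -> 0.4
dppl = q_a - q_d                       # DPPL = 0.4
print(q_a, q_d, dppl)
```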
Feature Attribution Drift for Models in Production
- Drift in the live data distribution can result in drift in the feature attributions, just like a drift in bias metrics
- Leverages Clarify for feature attribution by comparing individual features' rankings and raw attribution values between training data and live data
- Clarify uses Shapley values and Shapley Additive Explanations (SHAP) to compute explanations
- Clarify computes SHAP values per instance and aggregated across instances (global); you choose the aggregation method: mean_abs, median, mean_sq
- Uses the Normalized Discounted Cumulative Gain (NDCG) score to compare the feature-attribution (global SHAP) rankings of training and live data (see the sketch after this list)
- If NDCG < 0.90, SageMaker Clarify automatically raises an alert (this is not feature-specific, as NDCG combines all SHAP values)
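An illustrative sketch of the NDCG comparison described above, using scikit-learn's ndcg_score on made-up global SHAP values; Clarify's exact computation may differ.

```python
# Illustrative only: compare global SHAP attributions from training vs. live data
# with NDCG; a score below 0.90 would indicate feature-attribution drift.
import numpy as np
from sklearn.metrics import ndcg_score

features      = ["age", "income", "tenure", "clicks"]
shap_training = np.array([[0.40, 0.30, 0.20, 0.10]])   # global (mean_abs) SHAP on training data
shap_live     = np.array([[0.15, 0.35, 0.30, 0.20]])   # global SHAP on captured live traffic

score = ndcg_score(shap_training, shap_live)            # training ranking acts as "relevance"
print(f"NDCG = {score:.3f}",                            # ~0.885 here -> below the 0.90 threshold
      "drift" if score < 0.90 else "ok")
```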
How it works
- Real-time endpoint - enable the endpoint to capture data from incoming requests and the resulting model predictions
- Batch transform job - enable data capture of the batch transform inputs and outputs
- Create baseline from training dataset (computes metrics and suggests constraints for the metrics)
- Compares model predictions to the baseline constraints and reports violations
- Create a monitoring schedule: which data to collect, how often to collect it, how to analyze it, and which reports to produce (see the sketch after this list)
- Computes model metrics and statistics on tabular data only. For example, an image classification model that takes images as input and outputs a label can still be monitored: Model Monitor calculates metrics and statistics for the output, not the input
- Supports only endpoints that host a single model and does not support monitoring multi-model endpoints
- Supports monitoring inference pipelines, but it monitors the whole pipeline and not individual containers in the pipeline
- If SageMaker Studio runs in your VPC, you need VPC endpoints to let Model Monitor communicate with S3 and CloudWatch
- CloudWatch Logs collects log files of monitoring the model status and alerts when thresholds are breached
- CloudWatch stores the log files to an Amazon S3 bucket you specify
- If the emit_metrics option is Enabled, metrics are available in CloudWatch
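A sketch tying the endpoint pieces together: enable data capture at deploy time and attach an hourly data-quality schedule using the baseline monitor from the sketch above. Here `model` is assumed to be an existing SageMaker Model object, and all names and paths are placeholders.

```python
# Sketch: enable data capture on a real-time endpoint and attach an hourly
# data-quality monitoring schedule (names, paths and `model` are placeholders).
from sagemaker.model_monitor import DataCaptureConfig, CronExpressionGenerator

# 1) Capture requests/predictions when deploying the model
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://my-bucket/model-monitor/data-capture",
)
predictor = model.deploy(                       # `model` is an existing sagemaker.model.Model
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-endpoint",
    data_capture_config=data_capture_config,
)

# 2) Schedule hourly analysis against the suggested baseline (see the baseline sketch above)
monitor.create_monitoring_schedule(
    monitor_schedule_name="my-endpoint-data-quality",
    endpoint_input="my-endpoint",
    output_s3_uri="s3://my-bucket/model-monitor/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,             # emit metrics to CloudWatch
)
```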
Basics
- Monitors the quality of Amazon SageMaker machine learning models in production
- Continuous monitoring with a real-time endpoint (or a batch transform job that runs regularly)
- On-schedule monitoring for asynchronous batch transform jobs
- Alerts via CloudWatch when there are deviations in the model quality
- Visualize data drift in SageMaker Studio; integrates with TensorBoard, QuickSight, and Tableau
HyperPod
Basics
- Purpose-built infrastructure for distributed training at scale
- Reduces training time by up to 40%
- HyperPod is pre-configured with SageMaker’s distributed training libraries enabling split training across thousands of accelerators
- Ensures uninterrupted training/tuning by periodically saving checkpoints (see the sketch after this list):
- In case of a hardware failure during training, HyperPod automatically detects the failure, repairs or replaces the faulty instance, and resumes training from the last checkpoint
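The checkpoint/resume pattern itself is framework-level; a plain-PyTorch sketch follows, where the checkpoint path, interval, and field names are assumptions. HyperPod only needs such checkpoints to exist so a resumed job can pick up where it left off.

```python
# Plain-PyTorch illustration of the checkpointing pattern HyperPod resumes from
# after it repairs or replaces a faulty node (path and fields are assumptions).
import os
import torch

CKPT_PATH = "/fsx/checkpoints/latest.pt"        # e.g. on a shared FSx for Lustre volume

def save_checkpoint(model, optimizer, epoch):
    torch.save(
        {"epoch": epoch,
         "model_state": model.state_dict(),
         "optimizer_state": optimizer.state_dict()},
        CKPT_PATH,
    )

def load_checkpoint(model, optimizer):
    """Return the epoch to resume from (0 if no checkpoint exists yet)."""
    if not os.path.exists(CKPT_PATH):
        return 0
    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1

# In the training loop: start_epoch = load_checkpoint(model, optimizer),
# then call save_checkpoint(model, optimizer, epoch) every few epochs.
```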
Key Features
- Optimized distributed training libraries
- Preconfigured with SageMaker distributed libraries
- Automatically splits your models and training datasets across AWS GPU instances
- Also use other distributed frameworks/packages: PyTorch DistributedDataParallel (DDP), torchrun, MPI (mpirun), and parameter server.
- Automatic cluster health check and repair
- Regularly runs an array of health checks for GPU and network integrity
- Debug and improve model performance
- Purpose-built HyperPod ML tools to improve training performance
- Integrates with TensorBoard to visualize the model architecture and identify convergence issues, such as validation loss not converging or vanishing gradients
- Workload scheduling and orchestration
- Based on OSS Slurm
- Install any needed frameworks or tools
- Clusters provisioned with the instance type and count you choose
Spec
- No specific instance types
- Deploy in VPC and access FSx for Lustre
- Give different IAM roles to cluster instance groups
- Use SageMaker HyperPod DLAMI (Deep Learning AMI)
- Customize the DLAMI by providing lifecycle scripts
- Instance groups:
- Multiple instance groups per cluster
- Each instance group can be configured differently
- Cluster nodes:
- head or controller node
- login node
- worker node
- Steps:
- Set up your HyperPod cluster (see the sketch after this list)
- Schedule jobs
- Monitor with Prometheus and Grafana
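A hedged boto3 sketch of creating a HyperPod cluster with one controller and one worker instance group; every value below (names, instance types, roles, lifecycle-script location) is a placeholder, and the field set should be checked against the CreateCluster API reference.

```python
# Sketch: create a HyperPod cluster with boto3 (all values are placeholders).
import boto3

sm = boto3.client("sagemaker")

response = sm.create_cluster(
    ClusterName="my-hyperpod-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "controller-group",
            "InstanceType": "ml.m5.4xlarge",
            "InstanceCount": 1,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",  # your lifecycle scripts
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodControllerRole",
            "ThreadsPerCore": 1,
        },
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.p4d.24xlarge",
            "InstanceCount": 4,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodWorkerRole",
            "ThreadsPerCore": 1,
        },
    ],
)
print(response["ClusterArn"])
```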
SageMaker Clarify
How it works
- A Clarify processing job uses the Clarify processing container to interact with an S3 bucket containing your input datasets and the analysis configuration (see the sketch after this list)
- Interaction depends on the specific type of analysis:
- The Clarify processing container accesses the input dataset and analysis configuration from the S3 bucket
- For certain analysis types, including feature analysis, the Clarify processing container sends requests to the model container
- It retrieves model predictions from the model container's responses
- The Clarify processing container computes and saves the analysis results to the S3 bucket
- Clarify needs model predictions to compute post-training bias metrics and feature attributions. Provide an inference endpoint, or Clarify creates an ephemeral shadow endpoint
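A minimal sketch of a Clarify explainability (SHAP) processing job with the SageMaker Python SDK; the S3 paths, headers, model name, and SHAP baseline are placeholders.

```python
# Sketch: run a Clarify explainability (SHAP) job against a trained model.
# S3 paths, headers, model name and SHAP baseline are placeholders.
from sagemaker import clarify

clarify_processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/validation/validation.csv",
    s3_output_path="s3://my-bucket/clarify/explainability",
    label="target",
    headers=["target", "age", "income", "tenure"],
    dataset_type="text/csv",
)

model_config = clarify.ModelConfig(
    model_name="my-model",              # Clarify spins up an ephemeral shadow endpoint for it
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="text/csv",
)

shap_config = clarify.SHAPConfig(
    baseline=[[35, 50000, 12]],         # baseline record(s) for the features
    num_samples=100,
    agg_method="mean_abs",              # global aggregation method
)

clarify_processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
```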
Basics
- Evaluate LLMs
- Explain models with feature attributions for tabular, natural language processing (NLP), and computer vision models
- Using Shapley values
- Partial dependence plots (PDPs) to understand how much the predicted target variable would change if you varied the value of one feature
- Detect bias (see the bias sketch at the end of this section)
- Identify types of bias in pre-training data
- Identify types of bias in post-training data that can emerge during training or when your model is in production
- Areas of applicability:
- Regulatory – Policymakers and other regulators can have concerns about discriminatory impacts of decisions that use output from ML models
- Business – Regulated domains may need reliable explanations for how ML models make predictions
- Data Science – Data scientists can debug and improve ML models by determining whether a model is relying on noisy or irrelevant features
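A companion sketch for bias detection, reusing clarify_processor, data_config, and model_config from the explainability sketch above; the facet column, facet values, and label threshold are placeholders.

```python
# Sketch: detect bias before and after training, reusing the configs from the sketch above.
# Facet name/values and label threshold are placeholders for your own dataset.
from sagemaker import clarify

bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],      # which label value counts as the positive outcome
    facet_name="gender",                # sensitive attribute column
    facet_values_or_threshold=[0],      # the facet value to check for disadvantage
)

# Pre-training bias: looks only at the labels in the dataset
clarify_processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    methods="all",
)

# Post-training bias: also needs model predictions (DPPL, DI, ...)
clarify_processor.run_post_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    model_config=model_config,
    model_predicted_label_config=clarify.ModelPredictedLabelConfig(probability_threshold=0.5),
    methods="all",
)
```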