Descriptive + Predictive ML
Association rules
Unsupervised, Descriptive
INPUT: transactions of items
itemset (e.g., X = {coffee, bagel} is a 2-item itemset)
OUTPUT: association rules
Evaluation equations
1) Support: not directional
Support count: #(X) = number of transactions containing itemset X
Support %: #(X) / total # of transactions
2) Confidence: strength of direction
confidence(X → Y) = #(X,Y) / #(X)
3) Lift: corrects for chance (not directional)
lift(X → Y) = confidence(X → Y) / support%(Y)
Lift > 1 = a customer who bought X is more likely than average to buy Y
4) Apriori algorithm: finding frequent itemsets
Pruning: if itemset X is not frequent, then no itemset containing X can be frequent either
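A minimal Python sketch of the support / confidence / lift calculations above, on a made-up toy transaction list (the item names and the rule {coffee} → {bagel} are illustrative, not from the diagram):

```python
# Toy transactions; each transaction is a set of items.
transactions = [
    {"coffee", "bagel"},
    {"coffee", "bagel", "milk"},
    {"coffee"},
    {"bagel", "milk"},
    {"coffee", "bagel"},
]

def support(itemset):
    # support % = fraction of transactions containing every item in itemset
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

X, Y = {"coffee"}, {"bagel"}
sup_xy = support(X | Y)              # support of X and Y together
confidence = sup_xy / support(X)     # #(X,Y) / #(X), as a fraction
lift = confidence / support(Y)       # > 1: X buyers buy Y more than average

print(f"support={sup_xy:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```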
Classification
Supervised Predictive
Step 1: Split + train
Randomly split the data into 3 sets: training, validation, test
Choose a classification model + train it on the training data
Step 2: Model validation
Refine the trained model using the validation data
Step 3: Model testing
Measure accuracy on the test data
If accuracy is good, use the model to classify new data
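A sketch of the 3-way split with scikit-learn's train_test_split; the toy data and the 60/20/20 proportions are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=42)  # toy stand-in data

X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42)            # 60% training
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)  # 20% validation + 20% test
```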
KNN
For a given new observation, identify the nearest training observations
Distance metric: often Euclidean
Decide K (# of neighbors)
Determine normalization (AFTER the split: fit the scaling on training data only, to avoid leakage)
Compute the distances
Pick the K nearest neighbors
Choosing K
K too small: overfitting - sensitive to outliers and noise
K too large: underfitting - may miss the local structure of the data
Choose the K with the lowest validation error rate
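A sketch of KNN classification with normalization fitted after the split; the toy data and K = 5 are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)  # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = MinMaxScaler().fit(X_train)       # fit scaling on training data only
knn = KNeighborsClassifier(n_neighbors=5)  # K = 5, Euclidean by default
knn.fit(scaler.transform(X_train), y_train)
print(knn.score(scaler.transform(X_test), y_test))  # test accuracy
```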
Decision Tree
Handles numeric + binary (categorical) data
Start at the top: ROOT node
Work down: internal (decision) nodes
Bottom: LEAF node (contains the class label)
Unbalanced data: use a confusion matrix
Error rate: (FN + FP) / (TP + TN + FP + FN)
Precision: TP / (TP + FP), read horizontally along the predicted-positive row
Accuracy: (TP + TN) / (TP + TN + FP + FN)
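A sketch of these metrics from raw confusion-matrix counts; the counts below are made up to mimic an unbalanced problem:

```python
TP, FP, FN, TN = 20, 5, 10, 965   # illustrative unbalanced counts

accuracy   = (TP + TN) / (TP + TN + FP + FN)
error_rate = (FN + FP) / (TP + TN + FP + FN)
precision  = TP / (TP + FP)       # of predicted positives, how many are right

print(f"accuracy={accuracy:.3f} error={error_rate:.3f} precision={precision:.3f}")
```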
Clustering algorithms
Hierarchical
Agglomerative: form large clusters by merging small ones
Start by merging the closest individual points
Dendrogram
Length of the linking lines is proportional to the distance at which clusters merge
Cut-off: the max distance we want between 2 clusters
Partition clustering (K-means)
Partition the data into K groups
1) Algorithm chooses K points at random as initial centroids
2) Assign every other data point to the centroid it is closest to
3) Recompute each cluster's centroid, then repeat step 2 to reassign points
Algorithm stops when assignments stabilize, i.e., within-cluster variance is minimized
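A sketch of the K-means loop above via scikit-learn; toy blob data and K = 3 are illustrative choices:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)  # toy data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.labels_[:10])   # cluster assignment per point
print(km.inertia_)       # WSS: within-cluster sum of squared errors
```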
Preprocessing
deal with missing values
remove outliers
normalize
Evaluating clusters
BSS
Sum the squared distances between each cluster's centroid and the global centroid
High BSS = low intersimilarity (well-separated clusters)
WSS
Add the squared errors (squared distances to the cluster centroid) of all data points in a cluster
Repeat for all clusters + add the results
Low WSS = high intrasimilarity (tight clusters)
Choosing K
Eyeballing: look at the dendrogram for a natural cut-off
Elbow point: plot WSS against K and pick the K where the curve bends
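A sketch of the elbow method, reusing KMeans inertia_ (scikit-learn's name for WSS); the toy data and the K range are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
ks = range(1, 10)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in ks]

plt.plot(ks, wss, marker="o")     # WSS drops fast, then flattens
plt.xlabel("K"); plt.ylabel("WSS")
plt.show()                        # the bend (elbow) suggests K
```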
Clustering
Unsupervised, Descriptive
Input
Observations: Rows
Dimensions: Columns
Output: Groups
Goal: high intrasimilarity + low intersimilarity
Evaluating output
Distance metrics
Numerical
Euclidean - straight-line (shortest) distance between two points
Manhattan - sum of the absolute differences between the 2 points' coordinates
Max coordinate - the largest absolute difference across all coordinates
Binary
Matching distance - # of mismatching attributes / # of total attributes
Jaccard - like matching distance, but excludes 0-0 matches (attributes absent from both)
Categorical
Matching approach - (K - M) / K, where K = # of attributes and M = # of matches
Taxonomy approach - use an industry-standard product hierarchy
Translation approach - e.g., distances between city names converted to numbers
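A sketch of the numeric and binary distance metrics above in plain numpy; the vectors are made up:

```python
import numpy as np

a, b = np.array([1.0, 4.0, 0.0]), np.array([3.0, 1.0, 2.0])
euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line distance
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences
max_coord = np.max(np.abs(a - b))           # largest single-coordinate gap

u, v = np.array([1, 0, 1, 0]), np.array([1, 1, 0, 0])  # binary attributes
matching = np.mean(u != v)                  # mismatches / total attributes
both_zero = (u == 0) & (v == 0)
jaccard = np.sum(u != v) / np.sum(~both_zero)  # ignore 0-0 matches

print(euclidean, manhattan, max_coord, matching, jaccard)
```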
Data normalization
Min-max: (x - min) / (max - min)
Standardization (z-score): (x - mean) / stdev
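A sketch of both normalizations on a toy numpy column:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 10.0])             # made-up values
min_max = (x - x.min()) / (x.max() - x.min())   # rescales to [0, 1]
z_score = (x - x.mean()) / x.std()              # mean 0, stdev 1

print(min_max, z_score)
```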
Regression
Supervised Predictive
Input
Observations
X - predictors
Y - outcome
Steps
training, validation, test
1) model construction (training data)
2) model testing (test data): measure accuracy
3) new data prediction
Regression Trees
A decision tree that predicts a numeric outcome from decision rules
Split criterion: SSD (sum of squared deviations) at a given node, NOT entropy
Final (leaf) node prediction = average of the outcome values in that node
Split rule: pick the split point with the lowest possible total SSD
SSD = 0 means the node is pure
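A sketch of how one candidate split is scored by total SSD; the predictor, outcomes, and split points are made up:

```python
import numpy as np

def ssd(y):
    # sum of squared deviations from the node's mean; 0 for an empty side
    return np.sum((y - y.mean()) ** 2) if len(y) else 0.0

x = np.array([1, 2, 3, 4, 5, 6])                  # one predictor
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.0])   # numeric outcome

for threshold in (2.5, 3.5, 4.5):                 # candidate split points
    left, right = y[x <= threshold], y[x > threshold]
    print(threshold, ssd(left) + ssd(right))      # pick the lowest total SSD
```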
KNN for regression
Identify the K nearest neighbors
Decide K by minimizing validation RMSE
New prediction = average of the nearest neighbors' outcome values
Can use a weighted average (e.g., weighting closer neighbors more heavily)
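A sketch of KNN regression with scikit-learn; the toy data, K = 5, and distance weighting are illustrative choices:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=3, noise=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsRegressor(n_neighbors=5, weights="distance")  # weighted avg
knn.fit(X_train, y_train)
print(knn.predict(X_test[:3]))  # avg of each point's 5 nearest outcome values
```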
Evaluating performance
Prediction error = predicted value - actual value
Predicted > actual = overprediction
Predicted < actual = underprediction
Mean error (ME)
Positive = overprediction on average
Negative = underprediction on average
(opposite-signed errors can cancel out, so ME can look small)
Mean absolute error (MAE)
Average magnitude of error in either direction
RMSE
Square root of the average squared error: the typical magnitude of error
Total sum of squared errors (SSE)
Sum of the squared prediction errors
MAPE: mean absolute % error
Average of |error| / actual, as a %: deviation of predictions relative to actual values
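A sketch of all five error metrics in numpy; the predicted/actual values are made up:

```python
import numpy as np

actual    = np.array([100.0, 150.0, 200.0, 120.0])
predicted = np.array([110.0, 140.0, 210.0, 115.0])
errors = predicted - actual

me   = errors.mean()                      # sign shows over/underprediction
mae  = np.abs(errors).mean()              # magnitude in either direction
rmse = np.sqrt((errors ** 2).mean())      # typical error magnitude
sse  = (errors ** 2).sum()                # total sum of squared errors
mape = (np.abs(errors) / actual).mean() * 100  # % deviation

print(me, mae, rmse, sse, mape)
```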