Lecture 5: Differences between microbial communities

- - - - Other dissimilarity measures: Bray-Curtis, Euclidean, Chi-square, etc.
- - - - Random Forests
        
        Bootstrap data: randomly draw a subset of j samples (with replacement)
        
        generate decision trees using bootstrapped data and at each step, use a random subset of i features
        
        evaluate random forest using samples not in bootstrap data ("out-of-bag samples") -> count whether each classification is correct -> confusion matrix
        
        ROC: sensitivity vs. specificity trade-off across classification thresholds
        AUC: 0.5 = not better than random, 1 = perfect classifier
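
The bookkeeping around a random forest (bootstrap draw, out-of-bag samples, confusion matrix, AUC) can be sketched in plain Python. This is a minimal sketch, not a full forest: tree training is left out, and all function names (`bootstrap_indices`, `out_of_bag`, `random_feature_subset`, `confusion_matrix`, `auc`) are hypothetical. It assumes binary labels coded as 0/1 and computes AUC as the fraction of positive/negative pairs in which the positive sample gets the higher score.

```python
import random


def bootstrap_indices(n_samples, j, rng):
    """Draw j sample indices with replacement (the bootstrap data)."""
    return [rng.randrange(n_samples) for _ in range(j)]


def out_of_bag(n_samples, drawn):
    """Indices that were never drawn: the out-of-bag samples."""
    drawn_set = set(drawn)
    return [k for k in range(n_samples) if k not in drawn_set]


def random_feature_subset(n_features, i, rng):
    """Pick i distinct feature indices to consider at one split."""
    return rng.sample(range(n_features), i)


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs for labels 0/1."""
    counts = {(t, p): 0 for t in (0, 1) for p in (0, 1)}
    for t, p in zip(y_true, y_pred):
        counts[(t, p)] += 1
    return counts


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage: one bootstrap draw over 10 samples and its out-of-bag set.
rng = random.Random(0)
drawn = bootstrap_indices(10, 10, rng)
oob = out_of_bag(10, drawn)
```

In a full forest, each tree would be trained on its own `drawn` indices and evaluated only on its own `oob` indices; a classifier that gives every sample the same score ends up at an AUC of 0.5.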
      - Feature x sample matrix (ni x mj)
        
        1. For all features xi, apply a binary label
        
        2. Calculate the purity of each subset
        
        3. The feature that maximizes the purity of the subsets is the new node (test)
        
        4. Repeat 1 - 3 for each node
        
        5. If a subset is pure, or less pure than the previous subset, the node becomes a leaf (classification); else repeat 4 - 5
      - Gini impurity: 1 - P(h)^2 - P(d)^2
        -> then take the mean over the subsets, weighted by subset size
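
The Gini calculation and the choice of the splitting feature can be illustrated with a short Python sketch. It assumes that `h` and `d` stand for two class labels (the notes do not define them), that every feature is already binary (0/1), and that the feature matrix is stored as one row per feature. The function names `gini`, `weighted_gini` and `best_feature` are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - P(h)^2 - P(d)^2 for a list of 'h' / 'd' labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty subset contributes no impurity
    p_h = labels.count("h") / n
    p_d = labels.count("d") / n
    return 1.0 - p_h ** 2 - p_d ** 2


def weighted_gini(left, right):
    """Mean impurity of the two subsets, weighted by subset size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_feature(matrix, labels):
    """Index of the feature whose 0/1 split gives the purest subsets.

    matrix[i][j] is the binary value of feature i in sample j.
    """
    def split_impurity(row):
        left = [y for x, y in zip(row, labels) if x == 0]
        right = [y for x, y in zip(row, labels) if x == 1]
        return weighted_gini(left, right)

    return min(range(len(matrix)), key=lambda i: split_impurity(matrix[i]))


# Usage: feature 1 separates the two classes perfectly, feature 0 does not.
labels = ["h", "h", "d", "d"]
matrix = [
    [0, 1, 0, 1],  # feature 0: each subset is a 50/50 mix
    [0, 0, 1, 1],  # feature 1: each subset is pure
]
chosen = best_feature(matrix, labels)
```

Minimizing the weighted Gini impurity is the same as maximizing the purity of the subsets, so `chosen` is the feature that would become the new node.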
      - Single decision trees do not generalize well
        -> generate random forests