SNLP
Information Theory
Joint Entropy:
\( H(A, B) = -\sum_{x \in A, y \in B} P(x, y) \log P(x, y) \).
Conditional Entropy:
\( H(Q|P) = H(P, Q) - H(P) = -\sum_{x \in P, y \in Q} p(x, y) \log \frac{p(x, y)}{p(x)} \)
Mutual Information:
\( I(A, B) = H(A) + H(B) - H(A, B) = \sum_{x \in A, y \in B} p(x, y) \log \left( \frac{p(x, y)}{p(x)p(y)} \right) \)
Probability Basics
Perplexity:
- Measure the quality of a language model.
- \( PP = \exp\left( - \sum_{w, h} f(w, h) \ln P(w|h) \right) \).
- PP is simply the exponential of cross-entropy.
- PP can be read as the effective vocabulary size, or branching factor: the smaller the PP, the easier it is to guess the next word (see the sketch after this list).
- Differentiable.
- Alternative: Mean-rank (not differentiable).
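A minimal sketch of the perplexity computation, assuming a toy corpus and a unigram MLE model (both hypothetical); the sum over \((w, h)\) weighted by relative frequencies \(f(w, h)\) is equivalent to averaging log-probabilities over the test tokens:

```python
import math
from collections import Counter

def perplexity(test_tokens, prob):
    """PP = exp(-(1/N) * sum_i ln P(w_i | h_i)); here prob(w) is a unigram model."""
    log_sum = sum(math.log(prob(w)) for w in test_tokens)
    return math.exp(-log_sum / len(test_tokens))

# Hypothetical training data and unigram MLE model.
train = "the cat sat on the mat the cat".split()
counts = Counter(train)
total = sum(counts.values())

print(perplexity("the cat sat".split(), lambda w: counts[w] / total))
```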
Zipf's law:
- The frequency of a word is inversely proportional to its rank.
- This gives a straight line on a log-log plot.
- \( f(r) \propto \frac{1}{r} \).
Mandelbrot distribution:
- \( f(r) = \frac{\mu}{(c+r)^B} \).
where \( \mu \) is the normalizing constant that makes the frequencies sum to 1.
Marginal distribution:
\( \sum_{w_1 \in V} P(w_1, w_2) = P(w_2) \).
Joint probability as a product of conditional probabilities:
\( P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^n P(w_i|w_1, w_2, \ldots, w_{i-1}) \).
Markov assumption: we can truncate the history (e.g. unigram, bigram, etc.)
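- For example, under a bigram (first-order Markov) assumption the chain rule truncates to \( P(w_1, \ldots, w_n) \approx \prod_{i=1}^n P(w_i|w_{i-1}) \), e.g. \( P(\text{the cat sat}) \approx P(\text{the}) \, P(\text{cat}|\text{the}) \, P(\text{sat}|\text{cat}) \).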
Classification
Implicit label: e.g. in Google search, a user clicks a URL and never comes back -> the suggestion was successful.
Hierarchical classification:
- Fewer classes at each layer of the hierarchy.
- We obtain predictions at multiple levels (from abstract to concrete).
Feature selection:
- Eliminate features,
- Weight features,
- Normalize features,
- Transform features (e.g. SVD)
Feature Extraction:
Class independence
Term Frequency:
\( TF(t, d) = \frac{f_{t, d}}{\sum_{t' \in d} f_{t', d}} \)
Inverse Document Frequency:
\( IDF(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} \)
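A minimal sketch of both formulas on a toy corpus (hypothetical documents; the IDF denominator is left unsmoothed, matching the formula above):

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    # TF(t, d) = count of t in d divided by the total token count of d
    counts = Counter(doc_tokens)
    return counts[term] / sum(counts.values())

def idf(term, docs):
    # IDF(t, D) = log(N / |{d in D : t in d}|)
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)

docs = ["the cat sat".split(), "the dog ran".split(), "cat and dog".split()]
print(tf("cat", docs[0]) * idf("cat", docs))   # TF-IDF weight of "cat" in the first document
```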
Class dependence
Information Gain:
\(\begin{align*} G(t) &= H(C) - H(C|t) \\ &= H(C) + p(t) \sum_i p(c_i|t) \log p(c_i|t) + p(\neg t) \sum_i p(c_i|\neg t) \log p(c_i|\neg t) \end{align*}\)
Pointwise Mutual Information:
\( pmi(t, c) = \log \left( \frac{p(t, c)}{p(t)p(c)} \right) \)
Chi-squared statistic:
\( \chi^2 = \sum_{\text{cells}} \frac{(O-E)^2}{E} \)
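A minimal sketch of the chi-squared computation for a single (term, class) 2x2 contingency table; the observed counts below are made up for illustration:

```python
def chi2(observed):
    """observed[i][j]: doc counts for (term present/absent) x (in class / not in class)."""
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total   # expected count under independence
            stat += (o - e) ** 2 / e
    return stat

# Hypothetical counts: the term appears in 30 of 40 in-class docs and 10 of 60 out-of-class docs.
print(chi2([[30, 10], [10, 50]]))
```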
Algorithms
SVM
Good:
- Robust against over-fitting.
- Robust against high-dimensional data.
- Works well for small datasets.
Bad:
- Usually, data is not linearly separable.
- One outlier may largely affect the boundary.
Decision tree
Good:
- Needs less pre-processing.
- Can embed domain knowledge.
- Interpretable.
Bad:
- Hard decision boundaries.
- Boundaries are step-like (axis-aligned), so smooth or diagonal boundaries are approximated poorly.
- Prone to overfitting.
- Weak at capturing joint dependencies between features.
Sequence Labeling
Hidden Markov Model
\( P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) = \prod_{i=1}^N P(x_i|\pi_i) P(\pi_i|\pi_{i-1}) \)
- The observation \( x_i \) depends only on the label \( \pi_i \), and the label \( \pi_i \) depends only on \( \pi_{i-1} \).
- These are the emission and transition probabilities, respectively.
- Train by EM (Baum-Welch) or gradient descent.
- Inference by Viterbi.
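A minimal Viterbi sketch for an HMM given as dictionaries of start, transition, and emission probabilities; all state names, words, and numbers below are hypothetical:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for obs under an HMM (probabilities, not logs)."""
    # V[t][s] = (probability of the best path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]], prev)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy tagger: two tags, two words.
states = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}
print(viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p))
```

With m states and n observations this is the \( O(nm^2) \) dynamic program also mentioned for CRFs below; in practice log-probabilities are used to avoid underflow.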
Conditional Random Field
Linear CRF \( Score(y|x) = \sum_j \sum_i \lambda_j f_j(i, y_{i-1}, y_i, x) \).
- Apply softmax function (i.e. exp and normalize) to obtain probability.
- Train with gradient descent.
- Inference with Viterbi in \( O(nm^2) \).
Bayes Network:
- A causal DAG with the Markov assumption: each node depends only on its parents.
Markov Random Field:
- The joint distribution factorizes over the maximal cliques:
\( p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}_M} \Psi_C(x_C) \)
where \( Z = \sum_x \prod_{C \in \mathcal{C}_M} \Psi_C(x_C) \) is the normalizing constant (the numerator summed over all \(x\)).
Language Modelling
Definitions
OOV: out-of-vocabulary rate \( \propto \frac{1}{(\text{vocab size})^\alpha} \).
OOC: Out of context
- Backing-off.
- Use a synonym.
- Word pieces (subword units).
- Unknown character.
Interpolation vs Backing-off:
- Both involve a lower-order LM.
- Backing-off only falls back to the lower-order LM when the higher-order LM cannot be trusted (e.g. it returns 0), while interpolation always mixes in the lower-order LM.
Count-tree:
- Index the history backwards to make backing off easier.
Prune the count-tree:
- Reduce memory.
- Smooths the LM by adding some counts to grams with zero or small counts.
Redundancy:
- Happens when we use more symbols to represent a word than needed.
- Disadvantage: requires longer encoding.
- Advantage: easier correction, or guess the word.
MAP:
- A way to encode domain knowledge into LM.
- We add count \( \alpha_i \) to word \(w_i\).
- Additive smoothing is a simple MAP estimate with a uniform prior (all \(\alpha_i\) equal).
Smoothing
Additive smoothing:
\(P(w|h) = \frac{N(w, h) + \epsilon}{N(h) + \epsilon |V|} \)
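A minimal sketch of additive smoothing for a bigram LM (toy corpus, \(\epsilon = 1\), i.e. Laplace smoothing; all data is made up):

```python
from collections import Counter

tokens = "the cat sat on the mat".split()      # hypothetical training data
vocab = set(tokens)
bigram = Counter(zip(tokens, tokens[1:]))      # N(w, h) with h = previous word
hist = Counter(tokens[:-1])                    # N(h)

def p_add(w, h, eps=1.0):
    # P(w|h) = (N(w, h) + eps) / (N(h) + eps * |V|)
    return (bigram[(h, w)] + eps) / (hist[h] + eps * len(vocab))

print(p_add("cat", "the"))   # seen bigram
print(p_add("mat", "cat"))   # unseen bigram still gets non-zero probability
```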
Absolute discounting:
\( P(w|h) = \max\left(0, \frac{N(w, h) - d}{N(h)}\right) + \lambda(h) \beta(w|\hat{h}) \)
- \( \lambda = \frac{d}{N(h)} N_+(h \bullet) \)
- d is often in [0.7, 0.9]
- \( d \approx \frac{n_1}{n_1 + 2n_2} \), where \( n_i \) is the number of grams that appear exactly \(i\) times.
Linear Interpolation:
\(P(w|h) = (1-\epsilon) \frac{N(w, h)}{N(h)} + \epsilon \beta(w|h) \)
Good-Turing:
\( r^* = (r+1)\frac{n_{r+1}}{n_r} \)
where \(r\) is the current count, \(r^*\) is the adjusted count, and \(n_r\) is the number of grams that occur exactly \(r\) times. Note that the original counts are used for normalization.
\(P(w|h) = \frac{N^*(w, h)}{N(h)} + \lambda (h) \beta(w|h) \)
- To fix the spiky, sparse \(n_r\) at high counts, replace \(n_r\) with \( Z_r = \frac{n_r}{0.5 (t - q)} \), where \(q\) and \(t\) are the nearest lower and higher counts with non-zero \(n\).
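A minimal sketch of the Good-Turing adjusted counts \( r^* \) from count-of-counts \( n_r \) (the gram counts are hypothetical, and no \(Z_r\) smoothing of \(n_r\) is applied):

```python
from collections import Counter

# Hypothetical bigram counts; n[r] = number of distinct grams seen exactly r times.
gram_counts = Counter({"a b": 3, "b c": 1, "c d": 1, "d e": 2, "e f": 1})
n = Counter(gram_counts.values())

def good_turing(r):
    # r* = (r + 1) * n_{r+1} / n_r  (breaks down when n_r or n_{r+1} is 0 -> the spike problem)
    return (r + 1) * n[r + 1] / n[r]

print(good_turing(1))   # adjusted count for grams seen once
print(good_turing(2))   # adjusted count for grams seen twice
```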
Kneser-Ney:
\( \beta(w|\hat{h}) = \frac{N_+(\bullet \hat{h}w)}{N_+(\bullet \hat{h} \bullet)} \)
- An improvement over absolute discounting obtained by a better choice of the backing-off LM (the continuation probability).
- The same as absolute discounting when used for unigram LMs.
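A minimal sketch of a bigram LM with absolute discounting and a Kneser-Ney continuation distribution as the backing-off term (toy corpus, fixed \(d = 0.75\); only meant to illustrate the continuation counts \(N_+\)):

```python
from collections import Counter

tokens = "the cat sat on the mat the cat ran".split()   # hypothetical training data
bigram = Counter(zip(tokens, tokens[1:]))                # N(h, w)
hist = Counter(tokens[:-1])                              # N(h)
n_types = len(bigram)                                    # number of distinct bigram types

def continuation(w):
    # N_+(. w): in how many distinct contexts h does w appear?
    return sum(1 for (h, w2) in bigram if w2 == w)

def p_kn(w, h, d=0.75):
    discounted = max(bigram[(h, w)] - d, 0) / hist[h]            # absolute discounting
    lam = d * sum(1 for (h2, _) in bigram if h2 == h) / hist[h]  # lambda(h) = d/N(h) * N_+(h .)
    beta = continuation(w) / n_types                             # Kneser-Ney continuation prob.
    return discounted + lam * beta

print(p_kn("cat", "the"))
print(p_kn("ran", "the"))   # unseen bigram: probability comes from the continuation term
```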
Other types of LMs
Class-based LM:
- Words are grouped into clusters; the LM first predicts the cluster of the next word, then the word within that cluster.
\( P(w|h) = P(c|h) \, P(w|c) \)
Neural LM:
Advantages:
- Uses a continuous representation space, which alleviates the curse of dimensionality.
- Model semantic relationships of words as linear combinations.
Disadvantages:
- Computational bottleneck at the output layer (the softmax requires computing exp over the whole vocabulary).
Text compression
Kraft's inequality:
Suppose \(m\) symbols are encoded with code-word lengths \( l_1, l_2, \ldots, l_m \) over an alphabet of size \(D\).
If this encoding is uniquely decodable then \( \sum_{i=1}^m D^{-l_i} \le 1 \).
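A minimal check of the inequality for a binary alphabet (D = 2); the code-word lengths below are made up:

```python
def kraft_sum(lengths, D=2):
    # sum_i D^(-l_i); must be <= 1 for a uniquely decodable code over D symbols
    return sum(D ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # e.g. {0, 10, 110, 111}: sum = 1.0, a valid prefix code exists
print(kraft_sum([1, 1, 2]))      # sum = 1.25 > 1: no uniquely decodable code with these lengths
```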
Long-range dependency
Correlation:
- \( c_d(w) = \frac{P_d(w, w)}{P(w)^2} \).
- where \( P_d(w, w) \) is the probability that two occurrences of \(w\) appear at distance \(d\).
- Range \( [0, \infty) \); 1 means independence.
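A minimal sketch of estimating \( c_d(w) \) from a token sequence (toy data; \( P_d(w, w) \) is estimated as the fraction of positions \(i\) with \(w\) at both \(i\) and \(i+d\)):

```python
def correlation(tokens, w, d):
    # P(w): unigram probability; P_d(w, w): probability that positions i and i+d both hold w
    p_w = tokens.count(w) / len(tokens)
    pairs = sum(1 for i in range(len(tokens) - d) if tokens[i] == w and tokens[i + d] == w)
    p_d = pairs / (len(tokens) - d)
    return p_d / (p_w ** 2)

tokens = "a b a b a c a b".split()      # hypothetical token sequence
print(correlation(tokens, "a", 2))      # > 1 indicates positive correlation at distance 2
```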
Conditional Entropy:
- \( h_n = - \sum_{w_1, w_2, \ldots, w_{n+1}} P(w_1 \ldots w_{n+1}) \log P(w_{n+1}|w_1 \ldots w_n) \).
- Range \( [0, H(W)] \); 0 means full dependence and \( H(W) \) means independence.