Representing And Mining Text
Many legacy applications still produce or record text. Medical records, consumer complaint logs, product inquiries, and repair records are still mostly intended as communication between people, not computers, so they're still "coded" as text. The web also contains a vast amount of text in the form of personal web pages, Twitter feeds, email, and Facebook status updates.
Why Text Is Difficult
Text is often referred to as "unstructured" data: it does not have the sort of structure that we normally expect for data, namely tables of records with fields having fixed meanings (essentially, collections of feature vectors).
Representation
A document is one piece of text, no matter how large or small. A document is composed of individual tokens or terms. A collection of documents is called a corpus.
Bag of Words
The approach is to treat every document as just a collection of individual words, ignoring grammar, word order, and sentence structure. It treats every word in a document as a potentially important keyword of the document. Every word is a possible feature, and each document is represented by a one (if the token is present in the document) or a zero (if the token is not present in the document).
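The binary bag-of-words representation above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the corpus, the naive lowercase-split tokenizer, and the function names are assumptions for the example.

```python
# Bag-of-words sketch: each document becomes a binary feature vector
# over the corpus vocabulary (1 = token present, 0 = token absent).
# Tokenization here is a naive lowercase whitespace split.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

def tokenize(document):
    return document.lower().split()

# Vocabulary: every distinct token seen anywhere in the corpus.
vocabulary = sorted({token for doc in corpus for token in tokenize(doc)})

def bag_of_words(document):
    tokens = set(tokenize(document))
    return [1 if term in tokens else 0 for term in vocabulary]

vectors = [bag_of_words(doc) for doc in corpus]
```

Each document is now a fixed-length feature vector, so it can be fed to any standard learning algorithm.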
Term Frequency
Instead of just a zero or one, use the word count (frequency) in the document. This allows us to differentiate between how many times a word is used; in some applications, the importance of a term in a document should increase with the number of times that term occurs. This is the term frequency representation.
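A term-frequency sketch of the same idea, replacing the binary indicator with a raw count per document (the tokenizer and example sentence are assumptions for illustration):

```python
# Term-frequency sketch: count how many times each token occurs in a
# document. Counter from the standard library does the tallying.
from collections import Counter

def term_frequencies(document):
    return Counter(document.lower().split())

tf = term_frequencies("the cat sat on the mat")
# tf["the"] is 2, tf["cat"] is 1; absent tokens count as 0.
```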
Measuring Sparseness: Inverse Document Frequency
A term should not be too rare. Say the unusual word "prehensile" occurs in only one document in your corpus. Is it an important term? That depends on the application. For clustering, there is no point keeping a term that occurs only once; it will never be the basis of a meaningful cluster. A term should also not be too common. A term occurring in every document isn't useful for classification. Overly common terms are typically eliminated by imposing an arbitrary upper limit on the number (or fraction) of documents in which a word may occur.
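The sparseness of a term across the corpus is captured by its inverse document frequency. As a sketch, one common formulation is IDF(t) = 1 + log(N / df(t)), where N is the number of documents and df(t) is how many contain term t; the corpus and formulation here are illustrative assumptions:

```python
# IDF sketch using the common 1 + log(N / df) formulation.
# Rare terms get a high IDF; ubiquitous terms get an IDF near 1.
import math

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a prehensile tail",
]

def idf(term, corpus):
    n_docs = len(corpus)
    # Number of documents containing the term (assumes df > 0).
    df = sum(1 for doc in corpus if term in doc.lower().split())
    return 1.0 + math.log(n_docs / df)

# "the" appears in 2 of 3 documents -> low IDF;
# "prehensile" appears in only 1 -> higher IDF.
```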
Combining Them: TFIDF
TFIDF combines Term Frequency (TF) and Inverse Document Frequency (IDF). Term counts within the documents form the TF values for each term, and the document counts across the corpus form the IDF values. The TFIDF score of a term t in a document d is the product TF(t, d) × IDF(t).
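Combining the TF and IDF ideas gives a per-term, per-document score. A minimal sketch, assuming a naive lowercase-split tokenizer and the common 1 + log(N / df) variant of IDF:

```python
# TFIDF sketch: score of term t in document d is TF(t, d) * IDF(t).
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a prehensile tail",
]

def tfidf(term, document, corpus):
    tf = Counter(document.lower().split())[term]          # count in d
    df = sum(1 for doc in corpus if term in doc.lower().split())
    idf = 1.0 + math.log(len(corpus) / df)                # rarity weight
    return tf * idf

score = tfidf("cat", corpus[0], corpus)
```

A term scores highly when it is frequent within a document but rare across the corpus, which is exactly the balance the two preceding sections motivate.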
Example: Mining News Stories to Predict Stock Price Movement
The Task
We want to predict stock price changes based on financial news stories.
The Data
The data to be mined are historical data from 1999 for stocks listed on the New York Stock Exchange and NASDAQ.
Data Preprocessing
Each stock has an opening and a closing price for the day, measured at 9:30 am EST and 4:00 pm EST, respectively. From these values we can easily compute a percentage change: (close − open) / open × 100.
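That preprocessing step can be sketched as follows; the function name and example prices are assumptions for illustration:

```python
# Daily percentage price change from opening and closing prices.
def percent_change(open_price, close_price):
    return (close_price - open_price) / open_price * 100.0

# e.g. a stock opening at 40.00 and closing at 41.00 moved +2.5%.
change = percent_change(40.0, 41.0)
```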
Results