Provost Ch 10: Representing and Mining Text
Important step of data mining process: data preparation
first engineer data to match existing tools
OR engineer new tools to match data
text requires pre-processing: conversion to a meaningful form
many types of files are intended as communication between people, not computers, so text is still everywhere
search engines apply data science to massive amounts of text
text is "unstructured" (linguistic) data
Representation
use the simplest (least expensive) representation that works for the text mining task
document = one piece of text, collection of documents = corpus, token/term = words
bag of words = treat a document as an unordered collection of words, with every word a potentially important feature
term frequency = count of a word's occurrences in a document, as a measure of its importance within that document
measuring sparseness = how rare a term is across the entire corpus; inverse document frequency (IDF) gives rarer terms higher weight
TFIDF = term frequency x inverse document frequency
IDF and entropy are similar
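A minimal sketch of the bag-of-words TFIDF weighting described above; the toy corpus and the particular IDF variant (1 + log) are illustrative assumptions, not from the chapter.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the dog barks at the fox",
    "the corpus of the web is mostly raw text",
]
docs = [d.split() for d in corpus]   # naive whitespace tokenization into terms

def tfidf(term, doc_index):
    tf = Counter(docs[doc_index])[term]            # term frequency in this document
    n_containing = sum(term in d for d in docs)    # sparseness across the corpus
    idf = 1 + math.log(len(docs) / n_containing)   # rarer terms get a higher weight
    return tf * idf

print(tfidf("fox", 0))   # appears in 2 of 3 documents
print(tfidf("the", 0))   # appears in every document, so lower IDF
```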
N-gram sequences: include sequences of adjacent words as terms
adjacent pair = bi-gram, tri-gram = 3 adjacent words
greatly increases size of feature set
easy to generate
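A short sketch of how easy n-gram generation is; the sample sentence is made up for illustration.

```python
def ngrams(tokens, n):
    """Return all sequences of n adjacent tokens, joined into single terms."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps".split()
print(ngrams(tokens, 2))   # bi-grams: adjacent word pairs
print(ngrams(tokens, 3))   # tri-grams: three adjacent words
```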
Named entity extraction = recognize common named entities in documents (New York Mets)
trained on a large corpus or hand-coded from knowledge of specific names (Oakland Raiders)
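A hedged sketch of the hand-coded approach to named entity extraction via a lookup of known names; the entity list and sample text are assumptions for illustration (trained extractors from NLP libraries would replace the lookup in practice).

```python
KNOWN_ENTITIES = ["New York Mets", "Oakland Raiders", "New York"]

def extract_entities(text, entities=KNOWN_ENTITIES):
    # Check longer names first so "New York Mets" is preferred over "New York".
    found = []
    for name in sorted(entities, key=len, reverse=True):
        if name in text and not any(name in f for f in found):
            found.append(name)
    return found

print(extract_entities("The New York Mets hosted the Oakland Raiders"))
```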
Topic Models = provide an extra layer between the terms (words or entities) in a document and the model
words or sequences being used are mapped to topics instead of directly to the final classifier
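A sketch of a topic model as that intermediate layer, assuming scikit-learn's LatentDirichletAllocation on a made-up corpus; the number of topics and other parameters are illustrative choices.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus: two rough themes (sports vs. gadgets).
corpus = [
    "the pitcher threw a perfect game for the home team",
    "the striker scored twice in the final match",
    "the new phone ships with a faster processor",
    "the laptop update improves battery life and performance",
]

counts = CountVectorizer().fit_transform(corpus)      # bag-of-words term counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)            # documents as topic mixtures

# topic_features (documents x topics) would feed the final classifier
# instead of the raw term counts.
print(topic_features.round(2))
```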