Chapter 10: Representing and Mining Text
Text is just another form of data
Text processing is just a special case of representation engineering
Importance
Text is everywhere
Legacy applications
Social media
Medical records
Old media
Business customer data
Difficult
Unstructured data
Not the sort of structure we normally expect from data
Has structure, but it is linguistic structure
Intended for human consumption, not for computers
Dirty
Not always flawless (spelling and grammatical errors)
Even when flawless, it may still pose problems (synonyms, homographs, inconsistent terminology)
Context is important
Positive? Negative?
Representation
Text mining
Use the simplest (least expensive) technique that works
Information retrieval
A document is one piece of text, no matter how large or small
Tokens or terms
The individual words a document is composed of
Corpus
Collection of documents
Purpose
Take a set of documents and turn each one into our familiar feature-vector form
Approaches
Bag of words
Treats each document as a collection of individual words
Ignores grammar, word order, sentence structure, and punctuation
Treats every word in a document as a potentially important keyword
Set = only one instance of each item
Bag = multiset
Good and simple, but sometimes not good enough
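A minimal sketch of the bag-of-words idea in Python; the toy documents and the crude regex tokenizer are illustrative assumptions, not the book's code:

```python
import re

def tokenize(text):
    # Crude tokenizer: lowercase, keep only runs of letters.
    return re.findall(r"[a-z]+", text.lower())

docs = ["Jazz is music.", "Text is just another form of data, and data is everywhere."]

# Vocabulary: every distinct term seen anywhere in the corpus.
vocab = sorted({t for d in docs for t in tokenize(d)})

# Set-style representation: 1 if the term occurs in the document, else 0.
for d in docs:
    present = set(tokenize(d))
    print([1 if t in present else 0 for t in vocab])
```

The bag (multiset) version replaces these 0/1 values with counts, which is exactly the term-frequency step that follows.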
Term frequency
Next step: term frequency (the word count in the document)
Instead of just zero or one (absent or present)
A word's importance should increase with the number of times it is used
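A sketch of the same representation with counts instead of 0/1 values; collections.Counter does the tallying, and the sample sentence is made up:

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Term frequency: how many times each word occurs in one document.
tf = Counter(tokenize("Data is data, and more data means more evidence."))
print(tf.most_common(3))  # e.g. [('data', 3), ('more', 2), ('is', 1)]
```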
Measuring Sparseness: Inverse Document Frequency
Weight of a term
Based on how common the term is in the entire corpus
A term should not be too rare
A term should not be too common
Inverse document frequency of a term t: IDF(t) = 1 + log( Total number of documents / Number of documents containing t )
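Worked example (assuming the natural log, a common but unstated choice): in a 100-document corpus, a term found in 10 documents gets IDF = 1 + log(100/10) ≈ 3.30, while a term found in all 100 gets IDF = 1 + log(1) = 1, the minimum.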
Combining them: TFIDF
Combine term frequency and inverse document frequency
Multiply them together: TFIDF(t, d) = TF(t, d) × IDF(t)
Each document becomes a feature vector
A valuable general-purpose representation, but not necessarily optimal for any particular task
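A self-contained sketch putting the pieces together with the IDF definition above; the three-document corpus and the natural-log base are assumptions:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

docs = ["jazz is music", "rock is music", "jazz jazz everywhere"]
tokenized = [tokenize(d) for d in docs]

# IDF(t) = 1 + log(total documents / documents containing t)
n = len(docs)
doc_freq = Counter(t for toks in tokenized for t in set(toks))
idf = {t: 1 + math.log(n / df) for t, df in doc_freq.items()}

# Each document becomes a feature vector of TF * IDF values.
vocab = sorted(idf)
for toks in tokenized:
    tf = Counter(toks)
    print([round(tf[t] * idf[t], 2) for t in vocab])
```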
N-gram sequences
Use sequences of adjacent words as terms
Adjacent word pairs are called bi-grams
Useful when particular phrases are significant but their component words are not
Easy to generate
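Bi-grams really are easy to generate, as this short sketch shows (the sample phrase is illustrative):

```python
def ngrams(tokens, n):
    # Slide a window of width n across the token list.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("exceeded our wildest expectations".split(), 2))
# ['exceeded_our', 'our_wildest', 'wildest_expectations']
```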
Special considerations
Named entity extraction
Knowledge intensive
Extractors have to be trained on a large corpus
or hand-coded
Quality of entity recognition can vary
Some extractors have particular areas of expertise
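As an illustration of an off-the-shelf extractor, this sketch uses spaCy's small English model (an assumed third-party dependency, installed separately; not something the chapter prescribes):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Game of Thrones premiered on HBO in New York in 2011.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. HBO ORG, New York GPE, 2011 DATE
```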
Topic models
Additional layer between the document and the model
Topic layer
Matrix factorization
Latent semantic indexing
Probabilistic topic models
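A minimal latent semantic indexing sketch using scikit-learn (an assumed dependency): factor the TFIDF document-term matrix so each document is re-expressed over a small number of latent topics; the corpus and topic count are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "jazz music concert tonight",
    "rock music guitar solo",
    "stock market trading update",
    "market prices rise again",
]

# TFIDF document-term matrix, then a rank-2 factorization (2 latent topics).
tfidf = TfidfVectorizer().fit_transform(docs)
topics = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print(topics)  # one row per document: its weights on the 2 topics
```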
Most forms of data require some representation engineering
Engineer the data to match existing tools
Text, images, sound, and video require special processing
Turning text into feature vectors
Common approach: break the text into bags of words
and assign values with TFIDF
This approach is simple and inexpensive to apply