Representing and mining text (Text lingo (Frequency= word count (To make a…

- - - - So has to be converted
  - - - Linguistic structure is meant for humans, not computers
    - - Grammar is often incorrect and words are misspelled
      - Contains synonyms and homographs
    - - Must go through a good amount of preprocessing
  - - - Ignores grammar, word order and sentence structure (usually punctuation as well)
      - Treats each word as a potential keyword of the document
- - - - Case normalized: every term is in lowercase
      - Stemmed: suffixes removed
      - Stop words removed, such as "and", "on", "of"
        
        Not always a good idea
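
The steps above (lowercasing, stemming, stop-word removal, then counting words while ignoring order and punctuation) can be sketched roughly as follows. This is only an illustration under stated assumptions: `STOP_WORDS`, `SUFFIXES`, `crude_stem`, `preprocess`, and `bag_of_words` are hypothetical names, the stop-word list is a tiny sample, and the suffix stripper is a deliberately crude stand-in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"and", "on", "of", "the", "a", "is", "in", "to"}

# Crude suffix list (an assumption; this is NOT the Porter stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, remove_stop_words=True):
    """Lowercase, drop punctuation, optionally drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


def bag_of_words(text, remove_stop_words=True):
    """Count term frequencies, ignoring grammar and word order."""
    return Counter(preprocess(text, remove_stop_words))


counts = bag_of_words("The dogs jumped and the dog jumps.")
print(counts)
```

Stop-word removal is kept behind the `remove_stop_words` flag because, as noted above, it is not always a good idea; passing `remove_stop_words=False` keeps terms such as "the" and "and" in the counts.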