Provost Chapter 10 - Representing and Mining Text (Why It's Important,…

- - - - Treat every word within a document individually
      - Ignore grammar, sentence structure, etc.
    - - Count the words in a document after normalizing them: strip prefixes and suffixes (stemming), convert to lowercase, and remove stop words
  - - - The consideration of sequences of adjacent words as terms
      - Easy to generate; requires no linguistic knowledge and no complex parsing algorithm
    - - The recognition of common named entities, e.g., Silicon Valley, New York City, etc.
      - Must be trained on a large corpus or hand-coded with extensive knowledge
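
The word-counting and adjacent-word ideas in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's own code: the `STOP_WORDS` set is a tiny made-up sample, `crude_stem` is a hypothetical stand-in that strips a few suffixes only (a real stemmer such as Porter's does much more), and all function names are invented for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Count stemmed, lowercased tokens, ignoring stop words and word order."""
    tokens = [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens)


def ngrams(text, n=2):
    """Count sequences of n adjacent tokens, each joined into a single term."""
    tokens = tokenize(text)
    return Counter(
        " ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    print(bag_of_words("The cats chased the cat"))
    print(ngrams("New York City is in New York", n=2))
```

In this sketch "cats" and "cat" collapse into one term after stemming, while the bigram counter keeps "new york" together as a single term, which individual word counts cannot do. Named entity recognition is deliberately left out because, as noted above, it needs a trained model or a hand-built list rather than a few lines of counting.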