Chapter 10: Representing/Mining Text
Representation:
Bag of Words
Term Frequency (TF)
Corpus
Document
Token
3 Steps:
Stemmed
Stop words
Normalized
Inverse Document Frequency (IDF)
IDF equation
Upper (Rare) and lower limits (Common)
TFIDF
Combining the two (the TFIDF value is specific to a single document, not the whole corpus)
*Relationship of IDF and Entropy
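A minimal sketch of the representation steps listed above (normalize, remove stop words, stem, then TF, IDF, and TFIDF), using only the Python standard library. The stop-word list, the `crude_stem` helper, and all function names are hypothetical stand-ins for illustration, and the IDF form used here, 1 + log(total documents / documents containing the term), is assumed to be the equation these notes refer to.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "is", "of", "and", "on", "are"}

def crude_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def tokenize(document):
    """Normalize case, strip basic punctuation, drop stop words, then stem."""
    tokens = [t.strip(".,!?") for t in document.lower().split()]
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]

def term_frequency(tokens):
    """TF: raw count of each term within one document."""
    return Counter(tokens)

def inverse_document_frequency(corpus_tokens):
    """IDF(t) = 1 + log(N / n_t): high for rare terms, close to 1 for common ones."""
    n_docs = len(corpus_tokens)
    doc_counts = Counter()
    for tokens in corpus_tokens:
        doc_counts.update(set(tokens))  # count each term once per document
    return {t: 1 + math.log(n_docs / c) for t, c in doc_counts.items()}

def tfidf(tokens, idf):
    """TFIDF for one document: TF from that document, IDF from the whole corpus."""
    return {t: count * idf[t] for t, count in term_frequency(tokens).items()}

corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Dogs and cats are pets.",
]
corpus_tokens = [tokenize(doc) for doc in corpus]
idf = inverse_document_frequency(corpus_tokens)
first_doc_vector = tfidf(corpus_tokens[0], idf)
```

In this toy corpus "cat" appears in every document, so its IDF is exactly 1, while "mat" appears in only one and is weighted more heavily.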
Beyond Bag of Words
N-gram sequences
Word order matters (bigrams/trigrams)
Topic Models
Topic "Layers"
Named Entity Extraction
Sequences that capture proper names (e.g., Game of Thrones)
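As a rough illustration of the n-gram idea in this branch, the sketch below extends a bag of words with adjacent word pairs and triples so that some word order is preserved. The function names and the underscore-joined term format are assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(tokens, max_n=3):
    """Count single words plus every n-gram up to max_n (3 = trigrams)."""
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag

tokens = ["the", "quick", "brown", "fox"]
bigrams = ngrams(tokens, 2)
bag = bag_of_ngrams(tokens)
```

The trade-off is vocabulary size: every additional n-gram length adds many new terms, most of which occur only rarely.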
Summary
Stock price example
Conclusion