text mining
Steps
Collect data
Which criteria we use to define "popular" songs?
English or German?
Where do we find the lyrics and how do we collect them?
Convert data into readable text
packages
R: twitteR, tm, stringr
Python: nltk library, Tweepy package
Remove special characters from the text
Before removing anything, explore the data to get familiar with it
Document term matrix
term-frequency matrix
correlation between words
draw a word cloud using term-frequency
predict patterns using modelling techniques
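The document-term matrix and term-frequency items above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the helper names `build_dtm` and `term_frequencies` and the three sample "lyrics" are assumptions made for the example, and tokenization is a naive whitespace split.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocab, matrix); matrix[i][j] counts term vocab[j] in document i."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def term_frequencies(vocab, matrix):
    """Sum each column of the matrix to get corpus-wide term frequencies."""
    return {term: sum(row[j] for row in matrix) for j, term in enumerate(vocab)}


# Hypothetical sample documents, one string per song.
docs = ["love me love you", "you and me", "love is all"]
vocab, dtm = build_dtm(docs)
freqs = term_frequencies(vocab, dtm)
```

The `freqs` dictionary is the kind of input a word cloud is usually drawn from, and the columns of `dtm` are what a correlation between words would be computed on.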
Remove numbers from the text data
Convert all the text to upper/lower case
Remove stop words: articles, conjunctions etc
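The cleaning steps above (special characters, numbers, case, stop words) could look roughly like this in Python. It is only a sketch under stated assumptions: the function name `clean_text`, the tiny stop-word set, and the sample line are made up for illustration, and the character filter keeps ASCII letters only, so it would also strip German umlauts and would need adjusting for German lyrics.

```python
import re

# Illustrative stop-word set only; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in"}


def clean_text(text, stop_words=STOP_WORDS):
    """Lowercase, drop numbers and special characters, remove stop words."""
    text = text.lower()                    # convert to lower case
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # remove special characters (ASCII only)
    return [tok for tok in text.split() if tok not in stop_words]


# Hypothetical sample line.
tokens = clean_text("The 99 Luftballons & the Summer of '69!")
```

Lowercasing first keeps the stop-word comparison simple, since the stop-word set only needs lowercase entries.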
Final analysis: process the stemmed words and visualize the results
Find patterns
option: fit a simple classifier
option: associations
Visualize results
forms
word clouds
sentiment studies
figures
packages
R: ggplot2, igraph, text2vec, networkD3, plotly
topic modeling
normalization
stemming
lemmatization
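To make the difference between stemming and lemmatization concrete, here is a deliberately crude Python sketch. It is not the Porter algorithm or any library's implementation: `crude_stem`, `lemmatize`, the suffix list, and the small lemma table are all hypothetical stand-ins. In practice a library such as nltk would likely be used instead.

```python
# Suffixes tried in order; a toy list chosen for this example.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lookup table standing in for a real dictionary.
LEMMAS = {"sang": "sing", "sung": "sing", "songs": "song"}


def crude_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Map a word to its dictionary form when the table knows it."""
    return LEMMAS.get(word, word)


stems = [crude_stem(w) for w in ["singing", "loved", "songs", "sang"]]
lemmas = [lemmatize(w) for w in ["sang", "songs", "radio"]]
```

Stemming chops suffixes by rule and can leave non-words such as "lov", while lemmatization relies on a vocabulary and can map irregular forms such as "sang" to "sing".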
Language
R
More handy functions
Python
More intuitive
Beyond text mining
deep learning
statistical topic detection modeling