Natural Language Processing
topic identification
text classification
regular expressions
regex
import re
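A minimal sketch of what the `re` module can do for text processing, here pulling word-like substrings out of a sentence. The pattern, the helper name `find_words`, and the sample sentence are illustrative assumptions, not part of the original diagram.

```python
import re

# Illustrative pattern (an assumption): runs of letters, digits, or apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z0-9']+")

def find_words(text):
    """Return all substrings of `text` that match WORD_PATTERN."""
    return WORD_PATTERN.findall(text)

sample = "Regex isn't magic: it's pattern matching!"
words = find_words(sample)
print(words)
```

Compiling the pattern once and reusing it keeps the helper cheap to call repeatedly; a different pattern would be needed for text that is not plain ASCII.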
translation
sentiment analysis
tokenization
nltk
spaCy
NLP library similar to gensim, with different implementations
gensim
NLP library
polyglot
word vectors for more than 130 languages
lemmatization
convert word into its base form
am, is, are -> be
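The mapping above can be sketched as a toy dictionary lookup. This is only an illustration under a strong assumption: the hypothetical `LEMMA_TABLE` covers a handful of forms of "be", whereas real lemmatizers rely on full dictionaries and part-of-speech information.

```python
# Toy lookup table (hypothetical); it only knows a few forms of "be".
LEMMA_TABLE = {
    "am": "be",
    "is": "be",
    "are": "be",
    "was": "be",
    "were": "be",
}

def lemmatize(token):
    """Return the base form of `token`, or the lowercased token if unknown."""
    lowered = token.lower()
    return LEMMA_TABLE.get(lowered, lowered)

lemmas = [lemmatize(t) for t in ["I", "am", "here", "and", "they", "are", "too"]]
print(lemmas)
```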
text cleaning
unnecessary whitespace and escape sequences
punctuation
special characters (numbers, emojis, etc.)
stopwords
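The cleaning steps listed above can be sketched with the standard library alone. The function name `clean_text` and the tiny `STOPWORDS` set are assumptions made for illustration; libraries such as nltk ship much larger stopword lists, and the ASCII-only filter would need adjusting for non-English text.

```python
import re
import string

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}

def clean_text(text):
    """Lowercase, strip punctuation and special characters, drop stopwords."""
    text = text.lower()
    # Collapse whitespace, including escape sequences such as \n and \t.
    text = re.sub(r"\s+", " ", text)
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove anything that is not a lowercase ASCII letter or a space
    # (digits, emojis, and other special characters).
    text = re.sub(r"[^a-z ]", "", text)
    # Split on whitespace and filter out stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]

print(clean_text("The  price\tis $42!!\nGreat deal 😀 in 2024."))
```

Note that removing punctuation before tokenizing merges contractions (for example "don't" becomes "dont"), which may or may not be acceptable depending on the task.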
N-gram modeling
is a fundamental concept in natural language processing (NLP) used to predict the next item in a sequence based on the previous n-1 items.
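A minimal sketch of this idea: count which token follows each context of n-1 tokens, then predict the most frequent follower. The helper names `build_ngram_counts` and `predict_next` and the toy sentence are assumptions for illustration; a practical model would also need smoothing for unseen contexts.

```python
from collections import Counter, defaultdict

def build_ngram_counts(tokens, n=2):
    """Count which token follows each context of n-1 tokens."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_token = tokens[i + n - 1]
        counts[context][next_token] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
bigram_counts = build_ngram_counts(tokens, n=2)
print(predict_next(bigram_counts, ["the"]))
```

With `n=2` the context is a single word (a bigram model); passing `n=3` uses the two preceding words instead.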
TF-IDF (Term Frequency-Inverse Document Frequency)
is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). It is widely used in information retrieval and text mining.
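A minimal sketch of one common TF-IDF formulation, assuming term frequency is the term's count divided by the document length and inverse document frequency is the natural log of (number of documents / number of documents containing the term). Libraries often use smoothed or normalized variants, so their scores may differ; the function name `tf_idf` and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document in `docs`."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
scores = tf_idf(docs)
print(scores[1])
```

In this toy corpus "the" appears in every document, so its score is zero, while "dog" appears in only one document and scores higher than "cat", which appears in two.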