text similarity measurement
Lexical similarity
Character-based Methods
Levenshtein distance
Damerau–Levenshtein distance
Jaro similarity
Jaro-Winkler similarity
Hamming distance
Word-based Methods
Sørensen-Dice index
Overlap similarity
Tversky index
Jaccard index
Text representation and similarity (i.e., distance) measures
A. Text representation
Corpus-based
Shallow Window-Based methods
Word-based representation
Word2Vec
FastText
GloVe
BERT
RoBERTa
XLNet
Sentence-based representation
SBERT
USE
T5
Statistics or frequency-based methods
Bag of Words (BoW)
TF-IDF
BM25
Graph-based
Knowledge graph
Graph neural network (GNN)
Semantic-based
DSSM (Deep Structured Semantic Model)
CDSSM (Convolutional Deep Structured Semantic Model)
MatchPyramid
MV-LSTM (Multi-View Bi-LSTM)
B. Similarity or distance measures
Word mover's distance
Cosine distance
Euclidean distance
Hybrid similarity
Definition: Combines lexical similarity methods and text-representation methods with distance measures to calculate text similarity.
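One possible reading of the hybrid definition above, as a minimal sketch: a word-based lexical score (Jaccard index) is mixed with a distance measure applied to a text representation (cosine similarity over Bag of Words counts). The diagram does not say how the two parts are combined, so the weighted average, the `alpha` weight, the whitespace tokenizer, and all function names here are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, kept deliberately simple.
    return text.lower().split()


def jaccard_index(a_tokens, b_tokens):
    # Lexical, word-based score: |A intersect B| / |A union B| over token sets.
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cosine_similarity(a_tokens, b_tokens):
    # Bag of Words representation (raw term counts) compared by cosine.
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(text_a, text_b, alpha=0.5):
    # alpha is a hypothetical mixing weight:
    # 1.0 uses only the lexical score, 0.0 only the vector-based score.
    a, b = tokenize(text_a), tokenize(text_b)
    return alpha * jaccard_index(a, b) + (1 - alpha) * cosine_similarity(a, b)
```

Calling `hybrid_similarity("a b c", "b c d")` mixes a Jaccard index of 2/4 with a cosine similarity of 2/3. Any other lexical method or representation from the diagram (for example Levenshtein distance or TF-IDF vectors) could be swapped in for either component.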