Recommendations
Basic genre
dimensionality reduction
Content-based
Collaborative filtering
Similarity
Properties of items
Similarity
Relationship between users and items
movie
actors
director
production year
genre
document
TF.IDF scores
\(TF_{ij}=\frac{f_{ij}}{\max_k f_{kj}}\)
\(IDF_i=\log_2(N/n_i)\)
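A minimal sketch of the TF.IDF scoring and Top-N selection noted here, assuming each document is already tokenized into a list of terms. The function names `tf_idf` and `top_n` are illustrative, not from the notes.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one dict per document mapping term -> TF.IDF score.

    docs is a list of token lists. TF is the term count divided by the
    count of the most frequent term in the same document; IDF is
    log2(N / n_i), where n_i is the number of documents containing the term.
    """
    n_docs = len(docs)
    # n_i: number of documents in which each term appears at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        scores.append({
            term: (count / max_count) * math.log2(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_n(score, n):
    """Return the n terms with the highest scores in one document."""
    return sorted(score, key=score.get, reverse=True)[:n]


docs = [["a", "a", "b"], ["a", "c"]]
scores = tf_idf(docs)
print(top_n(scores[0], 1))  # ['b']
```

A term that occurs in every document gets IDF 0, so it never reaches the Top-N list, which is the intended effect.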
Top-N scores
similarity
Jaccard
\(\cos\)
Different scales
random hyperplanes
minhashing
image
tags
if users are willing to provide them
boolean value
numerical value
movie rating
screen size
disk capacity
\(SIM(S,T)=\frac{{\mid}S{\cap}T{\mid}}{{\mid}S{\cup}T{\mid}}\)
\(d(S,T)=1-SIM(S,T)\)
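The Jaccard similarity and distance can be sketched directly with Python sets. Treating two empty sets as identical is an assumption made here to avoid division by zero.

```python
def jaccard_similarity(s, t):
    """|S intersect T| / |S union T| for two collections of hashable items."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(s & t) / len(union)


def jaccard_distance(s, t):
    """Jaccard distance, defined as 1 - SIM(S, T)."""
    return 1.0 - jaccard_similarity(s, t)


print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```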
Classification
decision tree
\( \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\displaystyle\sum_{i=1}^{n} A_i B_i}{\sqrt{\displaystyle\sum_{i=1}^{n} A_i^2} \, \sqrt{\displaystyle\sum_{i=1}^{n} B_i^2}} \)
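A small sketch of the cosine formula for two equal-length numeric vectors. Returning 0.0 when either vector is all zeros is an assumption, since the formula is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: undefined case mapped to 0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

When feature components live on different scales (for example a boolean genre flag next to a numerical rating), one component can dominate the dot product, so some rescaling before this step is usually worth considering.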
k-shingle
The quick brown fox jumps over the lazy dog
- The quick brown fox jumps
- quick brown fox jumps over
- brown fox jumps over the
- fox jumps over the lazy
- jumps over the lazy dog
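The list above corresponds to word-level shingles with k = 5. A minimal sketch, assuming whitespace tokenization (the name `word_shingles` is illustrative):

```python
def word_shingles(text, k):
    """Return every run of k consecutive words in text, in order."""
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


for shingle in word_shingles("The quick brown fox jumps over the lazy dog", 5):
    print(shingle)
```

For similarity comparison the shingles are typically collected into a set, so that two documents can be compared with Jaccard similarity.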
signature
characteristic matrix
minhash
Probability that \(h(S_1)=h(S_2)\) = \(SIM(S_1,S_2)\)
\(n\) random permutations of rows
→ \(n\) randomly chosen hash functions
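A rough sketch of minhash signatures built from randomly chosen hash functions in place of row permutations. The linear form `(a*x + b) mod p`, the prime, and all function names are assumptions for illustration. Set elements are assumed to be non-negative integers smaller than the prime (for example row indices of the characteristic matrix), and sets must be non-empty.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1, assumed larger than any row index


def make_hash_functions(n, seed=0):
    """Pick n random (a, b) pairs defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, hash_functions):
    """Signature = the minimum hash value over the set, per hash function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_functions]


def estimate_similarity(sig1, sig2):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)


funcs = make_hash_functions(100)
sig_a = minhash_signature({1, 2, 3, 4}, funcs)
sig_b = minhash_signature({3, 4, 5, 6}, funcs)
print(estimate_similarity(sig_a, sig_b))  # an estimate of SIM = 2/6
```

The estimate is noisy for small n; increasing the number of hash functions tightens it at the cost of longer signatures.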
locality-sensitive hashing
Subtract average
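One way to read "subtract average" is to center each user's ratings on that user's own mean, so that generous and harsh raters become comparable. A sketch under that assumption, with ratings held in a nested dict where missing ratings are simply absent:

```python
def subtract_user_average(ratings):
    """Center each user's ratings on that user's mean rating.

    ratings maps user -> {item: rating}; unrated items are not present.
    """
    normalized = {}
    for user, items in ratings.items():
        if not items:
            normalized[user] = {}
            continue
        mean = sum(items.values()) / len(items)
        normalized[user] = {item: r - mean for item, r in items.items()}
    return normalized


print(subtract_user_average({"alice": {"m1": 5, "m2": 3}}))
# {'alice': {'m1': 1.0, 'm2': -1.0}}
```

After centering, low ratings become negative and high ratings positive, so cosine similarity between users can distinguish opposite tastes.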
duality
clustering
UV-decomposition
root-mean-square error
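A sketch of how root-mean-square error could be computed for a UV-decomposition, assuming the utility matrix is a list of rows with `None` for blank entries and that only the known entries are compared against the product UV. The function name and data layout are illustrative.

```python
import math


def rmse(m, u, v):
    """RMSE between the known entries of m and the matching entries of U x V.

    m: n x k list of rows, with None marking a missing rating
    u: n x d list of rows
    v: d x k list of rows
    """
    total = 0.0
    count = 0
    for i, row in enumerate(m):
        for j, actual in enumerate(row):
            if actual is None:
                continue  # blanks do not contribute to the error
            predicted = sum(u[i][r] * v[r][j] for r in range(len(v)))
            total += (actual - predicted) ** 2
            count += 1
    return math.sqrt(total / count) if count else 0.0


m = [[5, None], [3, 4]]
u = [[1], [1]]
v = [[1, 1]]
print(rmse(m, u, v))  # sqrt((16 + 4 + 9) / 3)
```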
KNN
non-numeric
categorical value
missing value
Accuracy
precision
recall
hit ratio
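A sketch of the evaluation measures listed above for a Top-N recommender. Precision and recall follow their standard definitions. The hit ratio is assumed here to mean the fraction of users whose single held-out item appears in their recommendation list, which is one common definition among several.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list."""
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def hit_ratio(recommendations, held_out):
    """Fraction of users whose held-out item is in their recommendations.

    recommendations: user -> list of recommended items
    held_out: user -> the one item withheld from training (assumed setup)
    """
    if not held_out:
        return 0.0
    hits = sum(
        1 for user, item in held_out.items()
        if item in recommendations.get(user, [])
    )
    return hits / len(held_out)


print(precision_recall(["a", "b", "c", "d"], {"a", "c", "e"}))
print(hit_ratio({"u1": ["a", "b"], "u2": ["c"]}, {"u1": "b", "u2": "x"}))  # 0.5
```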
Job recommendation
problem
Jobs are not categorized with rigor
What kind of data do I have?