Recommendations

Basic genre

dimensionality reduction

Content-based

Collaborative filtering

Similarity

Properties of items

Similarity

Relationship between users and items

movie

actors

director

production year

genre

document

TF.IDF scores

\(TF_{ij}=\frac{f_{ij}}{\max_k f_{kj}}\)

\({IDF_i}={\log_2({N/{n_i}})}\)
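A minimal sketch of the TF.IDF score, assuming each document is already a list of tokens. Here \(f_{ij}\) is the count of term \(i\) in document \(j\), \(N\) is the number of documents, and \(n_i\) is the number of documents containing term \(i\). The function name `tf_idf` and the toy input are illustrative, not from the notes.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: TF.IDF score} dict per document.

    docs is a list of token lists (an assumption about the input format).
    """
    n_docs = len(docs)
    # n_i: number of documents in which term i appears at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        # max_k f_kj: count of the most frequent term in this document
        max_count = max(counts.values()) if counts else 1
        scores.append({
            term: (count / max_count) * math.log2(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus of two documents (hypothetical data)
example_scores = tf_idf([["a", "b", "a"], ["b", "c"]])
```

A term that appears in every document gets an IDF of 0, so it never reaches the Top-N scores.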

Top-N scores

similarity

Jaccard

\(\cos\)

Different scales

random hyperplanes

minhashing

image

tags

if users are willing to

boolean value

numerical value

movie rating

screen size

disk capacity

\(SIM(S,T)=\frac{{\mid}S{\cap}T{\mid}}{{\mid}S{\cup}T{\mid}}\)

\(d(S,T)=1-SIM(S,T)\)
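The Jaccard similarity and distance above can be sketched as follows. Treating two empty sets as identical (similarity 1) is a convention assumed here, not something the notes state.

```python
def jaccard_similarity(s, t):
    """|S ∩ T| / |S ∪ T| for two collections treated as sets."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        # Assumed convention: two empty sets count as identical
        return 1.0
    return len(s & t) / len(union)


def jaccard_distance(s, t):
    """Jaccard distance: 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(s, t)
```

For example, `jaccard_similarity({1, 2, 3}, {2, 3, 4})` shares 2 of 4 distinct elements, giving 0.5.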

Classification

decision tree

\( \frac {A \cdot B} {\lVert A \rVert \, \lVert B \rVert} = \frac {\displaystyle\sum_{i=1}^{n}{A_i}{B_i}} {\sqrt{\displaystyle\sum_{i=1}^{n}{A_i}^2} \, \sqrt{\displaystyle\sum_{i=1}^{n}{B_i}^2}} \)
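A small sketch of the cosine formula above for two equal-length numeric vectors. Returning 0 when either vector is all zeros is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a zero vector is similar to nothing
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the result depends only on direction, vectors on different scales (for example `[1, 2, 3]` and `[2, 4, 6]`) still come out with similarity 1.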

k-shingle

The quick brown fox jumps over the lazy dog

  1. The quick brown fox jumps
  2. quick brown fox jumps over
  3. brown fox jumps over the
  4. fox jumps over the lazy
  5. jumps over the lazy dog
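The five shingles listed above are word-level shingles with \(k=5\). A minimal sketch that reproduces them, assuming whitespace tokenization (the function name `word_shingles` is illustrative):

```python
def word_shingles(text, k):
    """Return every run of k consecutive words in text, in order."""
    words = text.split()
    # A text with fewer than k words yields no shingles
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


shingles = word_shingles("The quick brown fox jumps over the lazy dog", 5)
```

A document with \(n\) words yields \(n-k+1\) shingles, here \(9-5+1=5\).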

signature

characteristic matrix

minhash

Probability that \(h(S_1)=h(S_2)\) = \(SIM({S_1},{S_2})\)

\(n\) random permutations of rows
→ \(n\) randomly chosen hash functions
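A rough sketch of replacing row permutations with hash functions. It assumes set elements are non-negative integer row numbers smaller than `PRIME`, that sets are non-empty, and that hash functions have the form \((ax+b) \bmod p\); that family and all names below are illustrative choices, not something the notes prescribe.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, assumed larger than any row number


def make_hash_functions(n, seed=0):
    """Draw n random (a, b) pairs defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(rows, hash_funcs):
    """Signature: for each hash function, the minimum hash over the set's rows."""
    return [min((a * x + b) % PRIME for x in rows) for a, b in hash_funcs]


def estimate_similarity(sig1, sig2):
    """Fraction of signature positions that agree, an estimate of SIM(S1, S2)."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)
```

With more hash functions the fraction of agreeing positions tends toward the Jaccard similarity, at the cost of longer signatures.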

locality-sensitive hashing

Subtract average

duality

clustering

UV-decomposition

root-mean-square error
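A sketch of the root-mean-square error between a ratings matrix and a predicted matrix (for instance the product \(UV\)), computed only over the entries that are actually rated. Representing blank entries as `None` is an assumption made for this example.

```python
import math


def rmse(ratings, predicted):
    """RMSE over the non-blank entries of ratings; blanks are None."""
    squared_errors = [
        (r - p) ** 2
        for rating_row, predicted_row in zip(ratings, predicted)
        for r, p in zip(rating_row, predicted_row)
        if r is not None  # skip missing ratings
    ]
    if not squared_errors:
        raise ValueError("ratings matrix has no known entries")
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For ratings `[[5, None], [3, 1]]` and predictions `[[4, 2], [3, 3]]`, the squared errors are 1, 0 and 4, so the RMSE is \(\sqrt{5/3}\).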

KNN

non-numeric

categorical value

missing value

Accuracy

precision

recall
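One common way to compute precision and recall for a Top-N recommendation list, sketched under the assumption that the set of relevant items for the user is known. The function name and arguments are hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the first n recommended items.

    recommended is a ranked list; relevant is the set of items the user
    actually liked (assumed to be known for evaluation).
    """
    top_n = recommended[:n]
    relevant = set(relevant)
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Precision asks how many of the recommended items were relevant; recall asks how many of the relevant items were recommended.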

hit ratio

Job recommendation

problem

Jobs are not categorized with rigor

What kind of data do I have?