Collaborative recommendation
Main idea :bulb:
exploit
the opinions of an existing user community
information about the past behavior
predicting which items the current user of the system will most probably like or be interested in :checkered_flag:
pure CF
input
a matrix of given user–item ratings
output
prediction indicating to what degree
the current user will like or dislike a certain item
a list of n recommended items
(Such a top-N list should, of course,
not contain
items that
the current user has already bought)
approaches
user-based nearest neighbor recommendation
:house_buildings:
Overview
main idea :bulb:
input
a ratings database
and the ID of the current (active) user
approach
identify other users (sometimes referred to as
peer users
or
nearest neighbors
) that had similar preferences to those of the active user in the past
assumptions
:lock:
user preferences remain stable
and consistent over time
:star:
if users had similar tastes in the past
they will have similar tastes in the future :star:
problem in real-world
:frowning_face:
computational complexity
In real-world applications, rating databases are much larger and
can comprise thousands or even millions of users and items
rating matrix is
typically very sparse,
every user will rate only a very small subset
of the available items
unclear how to deal with
new users
new items
Better similarity and weighting metrics
metrics to measure the similarity among users
Pearson’s correlation coefficient :star: (best performing in empirical analyses)
adjusted cosine similarity
Spearman’s rank correlation coefficient
mean squared difference
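As a concrete illustration of the first (and, per the empirical analyses, best-performing) metric, here is a minimal sketch of Pearson's correlation between two users, restricted to their co-rated items; the dict-based data layout and function name are illustrative assumptions, not from the source.

```python
from math import sqrt

def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation between two users, computed only over the
    items both have rated. Ratings are dicts {item_id: rating}."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0  # too few co-rated items for a meaningful correlation
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b)
              for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # a user rated all common items identically
    return num / (den_a * den_b)
```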
Consider
a more controversial item
vs
a generally liked item
:checkered_flag:
apply a transformation function
to the item ratings, which
reduces the relative importance
of the agreement on universally liked items
increases the influence of items that have a high variance in the
ratings
information retrieval field
(inverse user frequency)
variance weighting factor
neighbors who have rated only a very
few items in common with the active user
are a bad choice
and lead to poor predictions
significance weighting
case amplification
adjusting the original weights (e.g., raising them to a power ρ)
to give more emphasis to values close to +1 and −1
Neighborhood selection (reducing the size of the neighborhood)
define a specific
minimum threshold of user similarity
too high
no predictions can be made
the size of the neighborhood will be very small
too low
the neighborhood sizes are not
significantly reduced
limit the size to a fixed number and to take only the k nearest neighbors into account
too high
too many
neighbors with limited similarity bring additional “noise” into the predictions
too small
the quality of the predictions may be negatively affected
An analysis of the MovieLens dataset indicates that “in most real-world situations, a neighborhood of 20 to 50 neighbors seems reasonable” :checkered_flag:
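The neighborhood-selection discussion above can be sketched as a small prediction routine that keeps only the k most similar neighbors (default k=20, echoing the MovieLens observation). The mean-centered weighted-sum formula is the standard user-based scheme; the data layout and names are assumptions.

```python
def predict_rating(active, item, ratings, sim, k=20):
    """User-based CF prediction: the active user's mean rating plus the
    similarity-weighted rating deviations of the k nearest neighbors who
    rated the item. `ratings` maps user -> {item: rating}; `sim(u, v)`
    returns a user-user similarity such as Pearson's coefficient."""
    mean_active = sum(ratings[active].values()) / len(ratings[active])
    # candidate neighbors: everyone else who has rated the target item
    candidates = [(sim(active, u), u) for u in ratings
                  if u != active and item in ratings[u]]
    neighbors = sorted(candidates, reverse=True)[:k]  # keep k most similar
    num = sum(s * (ratings[u][item]
                   - sum(ratings[u].values()) / len(ratings[u]))
              for s, u in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean_active if den == 0 else mean_active + num / den
```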
Item-based nearest neighbor recommendation
The cosine similarity measure
the similarity between two n-dimensional vectors based on
the angle between them
The possible similarity values are between 0 and 1
where values near to 1 indicate a strong similarity
The basic cosine measure does not take
the differences in the average rating behavior
of the users into account
adjusted cosine measure
as in the user-based approach, the size of the considered neighborhood is
typically also
limited to a specific size
not all neighbors are taken into account for the prediction
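A hedged sketch of the adjusted cosine measure just described, which centers each rating on its user's mean before applying the cosine formula (the dict-based data layout is an assumption):

```python
from math import sqrt

def adjusted_cosine(item_a, item_b, ratings):
    """Adjusted cosine similarity between two items: each rating is
    centered on its *user's* mean before the cosine measure is applied,
    compensating for users' different average rating behavior.
    `ratings` maps user -> {item: rating}."""
    num = den_a = den_b = 0.0
    for urat in ratings.values():
        if item_a in urat and item_b in urat:
            mean_u = sum(urat.values()) / len(urat)
            da, db = urat[item_a] - mean_u, urat[item_b] - mean_u
            num += da * db
            den_a += da * da
            den_b += db * db
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (sqrt(den_a) * sqrt(den_b))
```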
Preprocessing data for item-based filtering
for large scale e-commerce sites without sacrificing recommendation accuracy
construct the item similarity matrix in advance
At run time, a prediction for a product p and user u
is made by determining the items that are most similar to p
and by building the
weighted sum of u’s ratings for these items in the neighborhood
For realistic scenarios
complexity is much lower because most of the customers have rated or bought
only a very small number of items
when the number of customers M is around several million, the calculation of predictions in real time is still infeasible, given the short response times that must be obeyed in the online environment.
Overview
Why
scanning a vast number of potential neighbors makes it
impossible to compute predictions in real time
large e-commerce sites
main idea :bulb:
using the
similarity between items
example
the ratings for Item5 (3, 5, 4, 1) are
similar to the ratings of Item1 (3, 4, 3, 1)
partial similarity with Item4 (3, 3, 5, 2)
Alice gave a “5” to Item1 and a
“4” to Item4
predict a rating for Item5 somewhere between 4 and 5 :star:
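The Alice/Item5 example can be checked numerically with the usual item-based weighted sum; the similarity values below are assumed for illustration, not computed from the rating table.

```python
# Item-based weighted sum: Alice's predicted rating for Item5 is the
# similarity-weighted average of her ratings for the most similar items.

def weighted_sum_prediction(user_ratings, sims_to_target):
    num = sum(sims_to_target[i] * user_ratings[i] for i in sims_to_target)
    den = sum(abs(s) for s in sims_to_target.values())
    return num / den

alice = {"Item1": 5, "Item4": 4}                 # Alice's known ratings
sims_to_item5 = {"Item1": 0.85, "Item4": 0.75}   # assumed similarities
pred = weighted_sum_prediction(alice, sims_to_item5)
```

As the outline predicts, the result lies between Alice's ratings of 4 and 5.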
About ratings
Implicit and explicit ratings
Explicit Rating
Some aspects of the
usage of different rating scales
how the quality of recommendation
changes when the granularity is increased
how the users’ rating behavior changes
when different scales must be used
a
five-point
rating scale may be
too narrow
for users to
express their opinions
a
ten-point
scale was
better accepted
in the movie domain
The main problems with explicit ratings
ratings require additional efforts from the users
users might not
be willing to provide such ratings as long as the value cannot be easily seen
the number of available ratings could be too small
which in turn results
in poor recommendation quality
addressing these problems
WEB 2.0 - the role of online communities has
changed and users are more willing to contribute actively
the development of techniques and measures that can be used to
persuade the online user to provide more ratings is required
Implicit ratings
typically collected by the web shop or application in
which the recommender system is embedded
how to
interpret
a certain behavior as a positive rating
the user retrieves a page with detailed item information
remains at this page for a longer period of time
customer buys an item
The system could also monitor the user’s browsing behavior
whether the user
behavior is correctly interpreted
A user might not like all the books he or she
has bought
the user also might have bought a book for someone else
if a sufficient number of ratings is available
these particular cases will be factored out by the high number of cases
in which the interpretation of the behavior was right
in some domains (personalized online radio stations)
collecting the implicit feedback can even
result in more accurate user models than can be done with explicit ratings
Data sparsity and the cold-start problem
compute good predictions when few ratings available
Solution
exploit additional information about the users
classify the user
such as gender, age, education, interests, or other
Questions
how to combine the different classifiers
how to acquire
the additional information
technical :computer:
spreading activation
idea :bulb:
3 more items...
disadvantage :frowning_face:
2 more items...
Default voting
Applied
3 more items...
When this number is very
small
idea :bulb:
4 more items...
two different similarity types to improve the prediction accuracy
(“similar item ratings made by similar
users”)
the prediction accuracy increases
New User
ask the user for a minimum number of ratings before the service can be used
A similar strategy of asking the user for a gauge set of ratings
Further model-based and preprocessing-based
approaches
Matrix factorization/latent factor models
technical
latent semantic analysis (LSA)
as
latent semantic indexing (LSI)
makes it possible to retrieve relevant documents even if they do not contain (many) words of the user’s query
dimensionality reduction
singular value decomposition (SVD)
idea :bulb:
input:
a large matrix of document vectors
output
: a smaller-rank approximation in which highly correlated and co-occurring terms are captured in a single factor
example
a given
matrix M can be decomposed into a product of three matrices, M = U × Σ × V^T
the columns of U and V are called the left and right singular vectors
the diagonal entries of Σ are called the singular values
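A minimal numerical sketch of the SVD idea, using an illustrative 4×4 rating matrix (values assumed) and keeping the k=2 largest singular values to obtain a smaller-rank approximation:

```python
import numpy as np

# Illustrative user-item rating matrix (values assumed)
M = np.array([[5.0, 3.0, 4.0, 4.0],
              [3.0, 1.0, 2.0, 3.0],
              [4.0, 3.0, 4.0, 3.0],
              [1.0, 5.0, 5.0, 2.0]])

# Decompose M = U Σ V^T and keep only the k largest singular values:
# highly correlated, co-occurring rating patterns collapse into shared factors.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # smaller-rank approximation
```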
Principal component analysis – Eigentaste
using
principal component analysis (PCA)
(offline)
input: the original rating data
output
users are grouped into clusters of neighbors
the mean rating for the items is calculated
filter out the “most
important” aspects of the data that account for most of the variance
At run
time
new users are asked to rate a set of jokes (gauge set) on a numerical scale
the correct cluster is determined
a look-up in the preprocessed data
the computational
complexity at run time is independent of the number of users
derive a set of latent (hidden) factors from the rating patterns and characterize both users and items by such vectors of factors
movie domain
automatically identified factors can correspond to obvious
aspects of a movie such as the genre or the type (drama or action)
plain keyword matching does not work well when there are synonyms such as “car” and “automobile” and
polysemous words such as “chip” or “model” in the documents or the query
Discussion
the dimensionality reduction
technique
filtered out some “noise” in the data
detecting nontrivial correlations in the data
depend on the number of singular values to keep in an
SVD approach
based on experiments in a certain domain
fold in newly arriving ratings without recomputing the decomposition
an approach based on coclustering for building scalable CF recommenders
support the dynamic
update of the rating database
more elaborate and specialized methods
probabilistic LSA (pLSA)
discover the (otherwise hidden)
user communities and interest patterns in the ratings database
showed that
good accuracy levels can be achieved based on that method
based not on linear algebra but rather on statistics
and represents a “more principled approach which has a solid foundation in statistics”
recent and advanced topics
how additional information, such as demographic data, can be incorporated;
how temporal aspects, such as changing user preferences, can be dealt with; or
how existing rating bias can be taken into account
Association rule mining
Basic
meaning
example rule
“If a customer purchases
baby food
then he or she also buys diapers in 70 percent of the cases”
used to identify rule-like relationship patterns in large-scale sales transactions
the detection of pairs or groups of products in a supermarket that
are often purchased together.
general problem
A (sales) transaction T is a set of products that have been purchased together
T = {p1,...,pm}
. Association rules are often written in the form
X ⇒ Y , with X and Y being both subsets of P and X ∩ Y = ∅
The standard measures
support
confidence
example
the support value for this rule (without taking Alice’s ratings into
account) is 2/4; confidence is 2/2
At run time
Determine the set of X ⇒ Y association rules that are relevant for Alice :one:
Alice has bought Item1
Compute the union of items :two:
appearing in the consequent Y
Sort the products :three:
according to the confidence of the rule
Return the first N elements of this ordered list as a recommendation :four:
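The four run-time steps rest on the support and confidence measures, which can be sketched against a toy transaction database (the transactions, and hence the resulting support of 2/4 and confidence of 2/3, are illustrative, not the example from the text):

```python
# support(X ∪ Y) and confidence = support(X ∪ Y) / support(X)

def support(itemset, transactions):
    # fraction of transactions containing all items of `itemset`
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    # share of transactions containing X that also contain Y
    return support(x | y, transactions) / support(x, transactions)

transactions = [
    {"baby food", "diapers", "milk"},
    {"baby food", "diapers"},
    {"baby food", "beer"},
    {"bread", "milk"},
]
x, y = frozenset({"baby food"}), frozenset({"diapers"})
sup = support(x | y, transactions)     # 2 of 4 transactions
conf = confidence(x, y, transactions)  # 2 of the 3 baby-food transactions
```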
Advance
Movie domain
the same mechanism
can be used to detect like and dislike relationships between users
“90 percent of the articles liked by user A and user B are also liked by user C.”
the task of detecting the recommendation rules
takes the particularities of the domain into account
searches only for rules that have a certain target item (user or article)
improves the
algorithm’s efficiency
allows for the detection of rules for infrequently bought items,
which could be filtered out in a global search because of their
limited support value
can be parameterized with
lower and upper bounds on the number of rules it should try to identify
If this overall item score surpasses a defined threshold value, the item will
be recommended to the target user
“If User1 likes an item, and User2 dislikes
the item, Alice (the target user) will like the item.”
item (article) associations
an additional cutoff parameter
reduces the computational complexity
detection of rules for articles that have only a very few ratings
a mixed strategy
switches to article associations whenever the support values of the
user association rules are below a defined threshold
the recommendation of interesting pages for web
users
technical
aim to automatically store the navigation behavior of many users in
a central repository and then to learn which users are similar with respect to
their interests
rely on
web usage data
association rule mining as core mechanisms
to predict the relevance of web pages in the context of adaptive user interfaces and web page recommendations
Probabilistic recommendation approaches
a classification problem
a function has
to be developed that predicts – based on the given rating data – the class (e.g., “like” or “dislike”) of an item for a user
classification model
Bayes classifiers
Expression
P(X|Item5=1) =
P(Item1=1|Item5=1) × P(Item2=3|Item5=1)
× P(Item3=3|Item5=1) × P(Item4=2|Item5=1)
= 2/2 × 1/2 × 1/2 × 1/2
= 0.125
P(Item5=1|X) ∝ P(X|Item5=1) × P(Item5=1) = 0.125 × 2/4 = 0.0625
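Spelled out in code, the computation multiplies the class prior by the conditional probabilities (values taken directly from the example above):

```python
# Naive Bayes posterior score: the posterior for Item5 = 1 is
# proportional to the class prior P(Item5 = 1) times the product of the
# per-item conditional probabilities (naive conditional independence).

prior = 2 / 4                          # P(Item5 = 1)
conditionals = [2/2, 1/2, 1/2, 1/2]    # P(Item_k = x_k | Item5 = 1)

likelihood = 1.0
for p in conditionals:
    likelihood *= p                    # P(X | Item5 = 1) = 0.125

posterior_score = prior * likelihood   # ∝ P(Item5 = 1 | X) = 0.0625
```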
sparse training sets :disappointed:
use a preprocessed
rating database
use “like” and “dislike” ratings
assign default ratings
only to missing values
More advanced probabilistic techniques are thus required
the idea of grouping similar users (or items) into clusters
At run time
base on
2 more items...
a prediction for a certain user u and an item i can be made
idea
take a small number of
discrete values that correspond to the clusters of the user base
2 more items...
The problem
determine the parameters for a model
1 more item...
estimate a good number of clusters to use
1 more item...
The performance
2 more items...
model-based and memory-based recommendations in a probabilistic framework
active learning approach (ask user for some rating)
reduce the computational complexity
how the process of generating and updating the profile space
comparison
slightly outperforms the other algorithms in some test
domains
exhibits the best overall performance
in the popular
movie domain
performs significantly worse than a
user-based approach
default voting
inverse user frequency
case amplification
individual noise points
no strong trend of overfitting the model
can also be used when the data
are incomplete
Overview
memory-based or model-based
the original rating database is held in memory and
used directly for generating the recommendations
the raw data are first processed offline
as described for item-based filtering or some dimensionality reduction techniques
At run time, only the precomputed or “learned” model is required to make predictions
Problem
theoretically more precise because full data
are available for generating recommendations
such systems face problems of scalability
Recent practical approaches and systems
Slope One predictors
advantage
efficient run-time queries
support on-the-fly data updates
enhancing “like and dislike” patterns in the rating database
idea
two groups
group
containing the items that were liked by both users
one group containing
items both users disliked
problem
sparse ratings databases
predicts nothing from the fact that user A
likes item K and user B dislikes the same item K
real world
70 percent of the ratings are above the theoretical average of 3
indicate
users either (a) tend to provide ratings for items that they like or (b)
simply have a tendency to give rather high ratings and interpret a value of 3 to be a rather poor rating value
set to the average rating value of a user instead of
using an overall threshold
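A minimal sketch of a basic (unweighted) Slope One predictor as characterized above: precompute average item-to-item rating differences and add them to the user's known ratings (data layout and function name are assumptions):

```python
def slope_one_predict(user_ratings, target, all_ratings):
    """Basic Slope One: for each item i the user has rated, compute the
    average difference dev(target, i) over all users who rated both
    items, then average the resulting per-item predictions.
    all_ratings: user -> {item: rating}; user_ratings: {item: rating}."""
    predictions = []
    for i, r in user_ratings.items():
        diffs = [ur[target] - ur[i] for ur in all_ratings.values()
                 if target in ur and i in ur]
        if diffs:
            # user's rating of i, shifted by the average difference
            predictions.append(r + sum(diffs) / len(diffs))
    return sum(predictions) / len(predictions) if predictions else None
```

Because the per-pair differences can be maintained incrementally, this structure supports the efficient run-time queries and on-the-fly data updates mentioned above.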
The Google News personalization engine
personalized approach
advantage
sheds light on the practical aspects of
implementing a large-scale recommender system
Idea
based on (as a positive rating)
the click history of the active
user
the history of the larger community
challenge
generate the list in real time
frequent changes in the “item
catalog”
quickly become out of date
technical
probabilistic latent semantic indexing (PLSI)
identify
clusters of like-minded users and related items
unlike simple clustering, where every user belongs to exactly one cluster
users
may have interests in various topics in parallel
new users or
items appear
not sufficient to retrain the network
happen far too often in this domain
enhance
A recommendation score
based on cluster-membership probabilities and per-cluster statistics of the number of clicks for
each story
in the interval [0 ... 1]
“co-visits”
two articles have
been visited by the same user within a defined period of time
MinHash
putting two users in the same cluster
based on the overlap of items that both users have viewed
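The MinHash step can be sketched as follows: each user's viewed-item set is mapped to a signature of minimum hash values, and the fraction of matching positions estimates the overlap (Jaccard similarity) of two histories. The linear hash family, the prime, and all parameters below are illustrative assumptions:

```python
import random

P = 2_147_483_647  # a large prime for the hash family (illustrative)

def minhash_signature(items, hashes):
    # items: set of integer item ids; hashes: list of (a, b) pairs
    # defining h(x) = (a * x + b) % P; keep the minimum per hash
    return tuple(min((a * x + b) % P for x in items) for a, b in hashes)

def jaccard_estimate(sig_a, sig_b):
    # fraction of matching minhash positions approximates the Jaccard
    # similarity of the two users' viewed-item sets
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

rng = random.Random(42)
hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(256)]
user_a = minhash_signature(set(range(100)), hashes)
user_b = minhash_signature(set(range(50, 150)), hashes)
```

Users whose signatures collide on (buckets of) these values end up in the same cluster, which is what makes the scheme scalable.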
Compare
recommendation lists were generated that interleaved items from one algorithm with the other
the
personalized approach did significantly better (around 38%)
except for the not-so-frequent situations in which there were extraordinarily popular stories
nonpersonalized approach
articles were ranked according to their
recent popularity
Discussion and summary
CF is the best-researched technique
the first
recommender systems
most of today’s most successful online recommenders
Development history
Early systems were built using memory-based neighborhood and correlation-based algorithms
Later, more complex
and model-based approaches
machine learning
information retrieval
data mining
algebraic methods such as SVD
more complex probabilistic models
lead to very accurate predictions
real-world benchmark
problems
the data to be analyzed
more knowledge is available than just the simple rating matrix
domain knowledge
test databases
movies and books
limitations
cannot be applied in every domain
e.g., domains in which the
system needs a more detailed picture of the users’ preferences
a domain in which no buying history exists
there are not enough users or ratings available
Enhance
additional knowledge
companies exchange item information
increasingly adopt exchange standards including defined product classification schemes
online
users share more and more information about themselves
hybrid recommendation
approaches