Item-based recommendation
Last month, I introduced some basic concepts of recommendation based on user ratings and showed a simple way to evaluate the data and recommend items. In this post, I'll show you how to move from user-based to item-based recommendation. Many of the terms and Ruby methods used here were presented in the previous post, so I recommend reading it first before taking on the challenge below.
1. Transform to item based data
From the last post, we know that the source data has this format:
RATINGS = {
  'John': {
    'Kong': 7.0,
    'John Wick': 7.5,
    'Logan': 6.0,
    'Split': 5.5,
    'Moana': 6.5,
    'La La La Land': 8.0
  },
  'Lee': {
    'Kong': 6.5,
    'John Wick': 5.0,
    'Logan': 4.5,
    'Split': 4,
    'La La La Land': 6.0,
    'Moana': 7.0
  },
  ...
This data was accumulated from individual users' reviews, so it is keyed by user. If we want to switch to item-based recommendations, however, the data needs to be reformatted so that it is keyed by movie:
{
  'movie_1': {
    user_1: 1.0,
    user_2: 2.0,
    ...
  },
  'movie_2': {
    user_1: 1.0,
    user_2: 2.0,
    ...
  },
  ...
}
So the first thing we need to do is transform the data into the new format:
def convert_to_items_based(ratings)
  {}.tap do |items_ratings|
    # Collect every movie name once, then iterate over the movies
    ratings.values.flat_map(&:keys).uniq.each do |movie|
      items_ratings[movie] = {}
      ratings.each do |user, rate|
        user_rate = rate[movie]
        # Skip users who have not rated this movie
        items_ratings[movie][user] = user_rate unless user_rate.nil?
      end
    end
  end
end
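As a quick sanity check of the transformation, here is a minimal, self-contained sketch that inverts a tiny data set in a single pass. The SAMPLE_RATINGS hash and the invert_ratings name are made up for illustration; this is just another way to write the same idea, not a replacement for convert_to_items_based.

```ruby
# Hypothetical two-user sample, not the full RATINGS hash from the post.
SAMPLE_RATINGS = {
  'John': { 'Kong': 7.0, 'Logan': 6.0 },
  'Lee': { 'Kong': 6.5, 'Moana': 7.0 }
}

# Invert user => { movie => rating } into movie => { user => rating }.
def invert_ratings(ratings)
  ratings.each_with_object({}) do |(user, reviews), by_item|
    reviews.each do |movie, rating|
      # Create the inner hash the first time a movie is seen
      (by_item[movie] ||= {})[user] = rating
    end
  end
end

p invert_ratings(SAMPLE_RATINGS)
```

Kong ends up with ratings from both John and Lee, while Logan and Moana each keep only the single user who rated them.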
Now we have item-based data that we can use to provide recommendations based on items.
2. Get top similar items
As I presented in the last post, to rank the ratings we have to employ a method that measures the similarity between items: Euclidean distance or Pearson correlation. Although the two methods differ in approach, theory and implementation, both give us a score for each pair of items, and from those scores we can find out how similar the items are. From that, we can build a list of the top items most similar to a given item:
def top_matches(data, target_item, n = 5)
  scores = data.map do |item, _|
    # Do not compare the item with itself
    next if target_item == item
    # In this case, I use the Pearson score
    {}.tap do |item_rating|
      item_rating[item] = pearson_correlation(data, target_item, item)
    end
  end.compact
  # Sort the list to get the highest scores first
  scores.sort_by { |item_rating| item_rating.values.first }.reverse.take(n)
end
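The pearson_correlation method called above comes from the previous post and is not reproduced here. As a rough sketch of what such a method could look like on item-based data, assuming the same (data, item_a, item_b) arguments, the version below compares only the users who rated both items. The original implementation may differ, and TOY_DATA is invented for the example.

```ruby
# Sketch only: the real pearson_correlation lives in the previous post and may differ.
def pearson_correlation(data, item_a, item_b)
  # Only users who rated both items contribute to the score
  shared = data[item_a].keys & data[item_b].keys
  return 0.0 if shared.empty?

  xs = shared.map { |user| data[item_a][user].to_f }
  ys = shared.map { |user| data[item_b][user].to_f }
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size

  # Deviations from the mean for each shared user
  dx = xs.map { |x| x - mean_x }
  dy = ys.map { |y| y - mean_y }

  covariance = dx.zip(dy).sum { |a, b| a * b }
  denominator = Math.sqrt(dx.sum { |a| a * a } * dy.sum { |b| b * b })
  # No variation in one of the items means the correlation is undefined; return 0
  denominator.zero? ? 0.0 : covariance / denominator
end

# Invented toy data: item b rises with a, item c falls as a rises.
TOY_DATA = {
  a: { u1: 1.0, u2: 2.0, u3: 3.0 },
  b: { u1: 2.0, u2: 4.0, u3: 6.0 },
  c: { u1: 3.0, u2: 2.0, u3: 1.0 }
}

puts pearson_correlation(TOY_DATA, :a, :b)
puts pearson_correlation(TOY_DATA, :a, :c)
```

Items that move together score close to 1 and items that move in opposite directions score close to -1, which is the range the rest of this post relies on.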
The top_matches method returns the top n items with the highest scores, computed with either Euclidean distance or Pearson correlation. Next, we need to calculate the similar items for each movie:
def calculate_similar_items(ratings, n = 10)
  {}.tap do |similar_items|
    item_ratings = convert_to_items_based(ratings)
    item_ratings.each do |movie, movie_ratings|
      # Log progress, since large data sets take a while
      puts "[INFO]: #{movie} - #{movie_ratings.values.length} ratings"
      similar_items[movie] = top_matches(item_ratings, movie, n)
    end
  end
end
Now let's try it:
2.4.1 :005 > calculate_similar_items RATINGS
[INFO]: Kong - 5 ratings
[INFO]: John Wick - 7 ratings
[INFO]: Logan - 4 ratings
[INFO]: Split - 7 ratings
[INFO]: Moana - 6 ratings
[INFO]: La La La Land - 6 ratings
=> {:Kong=>[{:Moana=>0.8058229640253802}, {:Logan=>0.6546536707079758}, {:"John Wick"=>0.39929785312496224}, {:Split=>0.2795084971874737}, {:"La La La Land"=>0.0}], :"John Wick"=>[{:Logan=>0.5703518254720301}, {:Split=>0.5111815065740504}, {:Kong=>0.39929785312496224}, {:"La La La Land"=>0.16297339597886237}, {:Moana=>-0.5213601623400473}], :Logan=>[{:"La La La Land"=>0.9116377679037143}, {:Split=>0.7230210236376229}, {:Kong=>0.6546536707079758}, {:"John Wick"=>0.5703518254720301}, {:Moana=>-0.3503292361635921}], :Split=>[{:Logan=>0.7230210236376229}, {:"John Wick"=>0.5111815065740504}, {:"La La La Land"=>0.38822469593451137}, {:Kong=>0.2795084971874737}, {:Moana=>-0.09059377806311973}], :Moana=>[{:Kong=>0.8058229640253802}, {:Split=>-0.09059377806311973}, {:Logan=>-0.3503292361635921}, {:"John Wick"=>-0.5213601623400473}, {:"La La La Land"=>-0.574620465390228}], :"La La La Land"=>[{:Logan=>0.9116377679037143}, {:Split=>0.38822469593451137}, {:"John Wick"=>0.16297339597886237}, {:Kong=>0.0}, {:Moana=>-0.574620465390228}]}
As you can see from the code and the output, I log each item as it is processed because, with a large data set, the calculation can take much longer than expected. As explained in the last post, the similarity score of each item ranges from -1 to 1 because I use Pearson correlation to calculate it. The closer the score is to 1, the more similar the item is to the target item; the closer it is to -1, the more different it is. Because we ask for the 10 most similar items and the data set contains only six movies, every other movie is returned, including some with negative scores. We can prevent this by keeping only positive values. In real systems, if we can maintain a large data set, the similarity scores between items will be more stable.
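Keeping only positive values can be sketched as a small filter over the list returned for one movie. The positive_matches helper is a hypothetical name, and the sample list is the Moana entry from the output above with the scores rounded to four decimals.

```ruby
# Sample taken from the Moana entry of the output above (scores rounded).
MOANA_MATCHES = [
  { 'Kong': 0.8058 },
  { 'Split': -0.0906 },
  { 'Logan': -0.3503 },
  { 'John Wick': -0.5214 },
  { 'La La La Land': -0.5746 }
]

# Hypothetical helper: keep only items with a positive similarity score.
def positive_matches(matches)
  matches.select { |match| match.values.first > 0 }
end

p positive_matches(MOANA_MATCHES)
```

For Moana, only Kong survives the filter, since every other movie is negatively correlated with it in this small data set.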
3. Building item based recommendations
Now we're ready to give recommendations based on similarity scores. However, each person has their own taste, so we need to personalize the recommendations. The similarity scores calculated above are the same for every user, so we need to combine them with the user's own ratings to recommend items tailored to that user. In addition, when users visit an item on a web page, they expect suggestions for items they have never seen before, so the recommended items should differ from the items in their browsing history.
The easiest way to do this is to multiply each similarity score by the user's previous rating. The table below shows how this works for user Jack. For each movie Jack has not rated (Kong, Logan and La La La Land), the first column holds its similarity to a movie he has rated, and the "x." column holds that similarity multiplied by his rating. The normalized score is the total of the "x." column divided by the total of the similarity column.
Movie | Rating | Kong | x.Kong | Logan | x.Logan | La La La Land | x.La La La Land |
---|---|---|---|---|---|---|---|
John Wick | 9.0 | 0.3993 | 3.5937 | 0.5704 | 5.1336 | 0.16297 | 1.46673 |
Moana | 4.0 | 0.8058 | 3.2232 | -0.3053 | -1.2212 | -0.57462 | -2.29848 |
Split | 8.0 | 0.2795 | 2.236 | 0.7230 | 5.784 | 0.38822 | 3.10576 |
Total | | 1.4846 | 9.0529 | 0.9881 | 9.6964 | -0.02343 | 2.27401 |
Normalized | | | 6.09787 | | 9.8132 | | -97.05548 |
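To make the arithmetic in the table concrete, the sketch below recomputes the normalized score for Kong from Jack's ratings and the Kong similarity column. The predicted_rating helper and the two constants are hypothetical names introduced only for this example; the numbers are copied from the table.

```ruby
# Jack's ratings and the Kong similarity column, copied from the table above.
JACK_RATINGS = { 'John Wick': 9.0, 'Moana': 4.0, 'Split': 8.0 }
KONG_SIMILARITY = { 'John Wick': 0.3993, 'Moana': 0.8058, 'Split': 0.2795 }

# Hypothetical helper: average of the user's ratings, weighted by similarity.
def predicted_rating(user_ratings, similarities)
  # Total of the "x." column: rating multiplied by similarity
  weighted_total = user_ratings.sum { |movie, rating| rating * similarities[movie] }
  # Total of the similarity column
  similarity_total = user_ratings.keys.sum { |movie| similarities[movie] }
  weighted_total / similarity_total
end

puts predicted_rating(JACK_RATINGS, KONG_SIMILARITY).round(4)
```

The result is about 6.1, matching the Kong column. The La La La Land column shows the weakness of this formula: its similarity total is -0.02343, so dividing by a value that close to zero produces the meaningless score of -97.05548. This is one more reason to keep only positive similarities.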
Based on that, we get a method that provides item-based recommendations:
def item_based_recommendation(ratings, user)
  user_ratings = ratings[user]
  scores = {}
  total_sim = {}
  similar_items = calculate_similar_items(ratings)
  user_ratings.each do |movie, rating|
    similar_items[movie].each do |sim_movie|
      sim_movie_name = sim_movie.keys.first
      sim_movie_score = sim_movie.values.first
      # Ignore items the user has already reviewed
      next unless user_ratings[sim_movie_name].nil?
      # Sum of similarity multiplied by rating
      scores[sim_movie_name] ||= 0
      scores[sim_movie_name] += sim_movie_score * rating
      # Sum of all similarities, used for normalization
      total_sim[sim_movie_name] ||= 0
      total_sim[sim_movie_name] += sim_movie_score
    end
  end
  # Divide each total score by the total similarity to get the normalized rating
  rankings = scores.map do |item, score|
    {}.tap { |rec| rec[item] = score / total_sim[item] }
  end
  # Highest predicted rating first
  rankings.sort_by { |rank| rank.values.first }.reverse
end
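The method above rebuilds the whole similarity table on every call and also lets negative similarities into the totals. As one possible variation, not the method from this post, the sketch below takes a precomputed similarity table as an argument and skips non-positive similarities. The recommend_from name and both constants are assumptions made for the example; the scores are rounded values from the calculate_similar_items output, and the ratings are Jack's.

```ruby
# Similarity lists for the movies Jack rated, using rounded scores from the
# calculate_similar_items output shown earlier.
SIMILAR_ITEMS = {
  'John Wick': [{ 'Logan': 0.5704 }, { 'Split': 0.5112 }, { 'Kong': 0.3993 },
                { 'La La La Land': 0.1630 }, { 'Moana': -0.5214 }],
  'Moana': [{ 'Kong': 0.8058 }, { 'Split': -0.0906 }, { 'Logan': -0.3503 },
            { 'John Wick': -0.5214 }, { 'La La La Land': -0.5746 }],
  'Split': [{ 'Logan': 0.7230 }, { 'John Wick': 0.5112 }, { 'La La La Land': 0.3882 },
            { 'Kong': 0.2795 }, { 'Moana': -0.0906 }]
}
JACK = { 'John Wick': 9.0, 'Moana': 4.0, 'Split': 8.0 }

# Hypothetical variant: reuse a precomputed table and ignore non-positive scores.
def recommend_from(similar_items, user_ratings)
  scores = Hash.new(0.0)
  total_sim = Hash.new(0.0)
  user_ratings.each do |movie, rating|
    similar_items.fetch(movie, []).each do |match|
      name, similarity = match.first
      # Skip movies the user already rated and movies that are not similar
      next if user_ratings.key?(name) || similarity <= 0
      scores[name] += similarity * rating
      total_sim[name] += similarity
    end
  end
  # Return [movie, predicted rating] pairs, highest prediction first
  scores.map { |name, score| [name, score / total_sim[name]] }
        .sort_by { |_, predicted| -predicted }
end

recommend_from(SIMILAR_ITEMS, JACK).each do |name, predicted|
  puts "#{name}: #{predicted.round(2)}"
end
```

Because every weight is positive, each prediction is a weighted average of Jack's own ratings and stays between his lowest and highest rating. In a real system the similarity table would be computed once and refreshed periodically rather than on every request.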
4. User based vs item based recommendations
In comparison, item-based recommendation is significantly faster than user-based recommendation when getting a list of recommendations from a large data set, because the item similarities can be calculated in advance. However, that similarity data needs to be maintained regularly. There is also a difference in accuracy that depends on how "sparse" the data set is. For example, if every user rates every movie, the data set is dense (not "sparse"); if each user provides just a few ratings, the data set is sparse. Item-based filtering usually outperforms user-based filtering on sparse data sets.
Having said that, user-based filtering is simpler to implement and has no extra steps, so it is suitable for systems with smaller data sets. Also, showing users other people who share their interests would be rather strange on a shopping website, but it may be a good choice for link-sharing or music sites.
Finally, everything I have shared here is just a reference. If you find that your recommendation system works much better than my approach, that's no problem, because we build recommendation systems to reduce the gap between users and our systems. So feel free to contact me and share your ideas!