user-user or item-item similarity scores #907
In a previous recommendation model (implicit ALS) I used the following code to compute user-user similarity scores (the normalized dot product of user factor vectors):

```python
factor = self.model.user_factors[user_idx]
factors = self.model.user_factors
norms = self.model.user_norms
norm = norms[user_idx]
scores = factors.dot(factor) / (norm * norms)
similar_users = {}
for selected_user in selected_users:
    # convert np.float32 to a Python number
    similar_users[selected_user] = scores[self.inv_user_map[selected_user]].item()
```

How would I go about creating a similar (fast) user-user comparison score from, say, a matrix factorization model (i.e. using `self.model.regressor.steps['FMRegressor'].weights`)? I was considering gathering the specified user's predictions for items they have explicitly scored, and then computing a normalized distance between those and other users' predictions for the same items. I didn't see anything built in to support this use case, and wanted to make sure I didn't miss something obvious.
Replies: 1 comment 4 replies
Thanks for the feedback @victusfate, this is an interesting use case! Indeed, we don't support it right now. Latent factors are stored as dictionaries of numpy arrays in `facto.FMRegressor`, so latent vectors are accessed explicitly: `model.regressor.steps['FMRegressor'].latents['Bob']`.

We can use the internal math module to perform a dot product between two users: river/river/utils/math.py, line 252 in 082d5eb.

But this would be slow for many comparisons compared to vectorization. Maybe you could convert the dictionary of numpy arrays (or a subset of it) to one numpy array and use vectorization if speed is important to you.
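As a rough illustration of that last suggestion, here is a minimal sketch that stacks a subset of the latent vectors into one numpy array and computes normalized dot products in a single vectorized pass. It is not part of river: `cosine_similarities` is a hypothetical helper, and it assumes `latents` maps each user key to a 1-D numpy array of the same length (the exact keys depend on how your features are named). The toy dictionary below stands in for `model.regressor.steps['FMRegressor'].latents`.

```python
import numpy as np


def cosine_similarities(latents, user, candidates=None):
    """Cosine similarity between `user` and each candidate user.

    Hypothetical helper: `latents` is assumed to be a dict-like object
    mapping a user key to a 1-D numpy array of latent factors.
    """
    if candidates is None:
        candidates = [key for key in latents if key != user]
    candidates = list(candidates)
    if not candidates:
        return {}

    # Stack the selected vectors into an (n_candidates, n_factors) matrix.
    matrix = np.stack([latents[key] for key in candidates])
    target = latents[user]

    # Vectorized dot products and the product of the two norms per row.
    dots = matrix @ target
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)

    # Leave the score at 0.0 wherever one of the vectors has zero norm.
    scores = np.divide(dots, norms, out=np.zeros_like(dots, dtype=float), where=norms > 0)

    # Convert numpy scalars to plain Python floats.
    return {key: score.item() for key, score in zip(candidates, scores)}


# Toy data standing in for the model's latent dictionary (assumption).
latents = {
    "Alice": np.array([1.0, 0.0]),
    "Bob": np.array([2.0, 0.0]),
    "Carol": np.array([0.0, 3.0]),
}
scores = cosine_similarities(latents, "Bob")
```

Passing an explicit `candidates` list keeps the stacked matrix small when only a few users need to be compared. Since the model keeps learning online, the matrix would need to be rebuilt whenever fresh scores are required.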