possible issue with predict_one for recommendations yielding all the same predicted score #826
This could definitely be an error on my part in following the docs incorrectly, but I thought it was worth reaching out in case others have run into a similar issue.

Model: Matrix Factorization Recommender

```python
constants.MIN_PLACE_SCORE = -1
constants.MAX_PLACE_SCORE = 3

# https://riverml.xyz/latest/examples/matrix-factorization-for-recommender-systems-part-2/
# yhat(x) = w0 + sum_{j=1..p} w_j x_j
#              + sum_{j=1..p} sum_{j'=j+1..p} <v_j, v_j'> x_j x_j'
# where <v_j, v_j'> = sum_{f=1..k} v_{j,f} * v_{j',f}  (dot product of the latent factor vectors)
def fm_model():
    fm_params = {
        'n_factors': 10,
        'weight_optimizer': optim.SGD(0.025),
        'latent_optimizer': optim.SGD(0.05),
        'sample_normalization': False,
        'l1_weight': 0.,
        'l2_weight': 0.,
        'l1_latent': 0.,
        'l2_latent': 0.,
        'intercept': 1,  # mean of scoring
        'intercept_lr': .01,
        'weight_initializer': optim.initializers.Zeros(),
        'latent_initializer': optim.initializers.Normal(mu=0., sigma=0.1, seed=73),
    }
    regressor = compose.Select('user', 'item')
    regressor |= facto.FMRegressor(**fm_params)
    model = meta.PredClipper(
        regressor=regressor,
        y_min=constants.MIN_PLACE_SCORE,
        y_max=constants.MAX_PLACE_SCORE
    )
    return model
```

Train on the data set:

```python
def learn(self, dataset):
    for x, y in dataset:
        y_pred = self.model.predict_one(x)            # make a prediction
        self.metric = self.metric.update(y, y_pred)   # update the metric
        self.model = self.model.learn_one(x, y)       # make the model learn
```

Iterative updates:

```python
for update in message_data:
    user_id = update['user_id']
    item_id = update['item_id']
    rating = update['rating']
    dataset.append(({'user': user_id, 'item': item_id}, rating))

for x, y in dataset:
    y_pred = self.model.predict_one(x)            # make a prediction
    self.metric = self.metric.update(y, y_pred)   # update the metric
    self.model = self.model.learn_one(x, y)       # make the model learn
```

Predicted scores (randomly pick N items) for a given user:

```python
weights = self.model.regressor.steps['FMRegressor'].weights
keys = []
for (k, v) in weights.items():
    akey = k.split('_')
    wtype = akey[0]
    item = akey[1]
    if wtype == 'item':
        keys.append(item)

random_items = random.sample(keys, n)
scores = {}
for item_id in random_items:
    scores[item_id] = self.model.predict_one({'user': user_id, 'item': item_id})
```

The scores sometimes converge on a given value, i.e. all 2.0:

```
{ 'item0': 2.0, 'item1': 2.0, ... }
```

I'm working on trying to reproduce this now. After full training the predictions look OK. But I have tested it later on, after receiving iterative updated values, and all my personal scores for a random selection of items are 2.0 (note that none of my training input scores are 2.0, which makes this particularly interesting), perhaps related to the max and min scores?
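For anyone who wants to poke at this, here's a minimal, self-contained sketch of the same pipeline on synthetic ratings (the user/item ids and the rating distribution are made up, and it assumes the river API used above, i.e. `meta.PredClipper` and `learn_one` returning the model):

```python
import random

from river import compose, facto, meta, optim


def fm_model():
    fm_params = {
        'n_factors': 10,
        'weight_optimizer': optim.SGD(0.025),
        'latent_optimizer': optim.SGD(0.05),
        'intercept': 1,
        'intercept_lr': .01,
        'weight_initializer': optim.initializers.Zeros(),
        'latent_initializer': optim.initializers.Normal(mu=0., sigma=0.1, seed=73),
    }
    regressor = compose.Select('user', 'item') | facto.FMRegressor(**fm_params)
    return meta.PredClipper(regressor=regressor, y_min=-1, y_max=3)


model = fm_model()
rng = random.Random(42)
users = [f'u{i}' for i in range(20)]
items = [f'i{i}' for i in range(50)]

# Stream synthetic ratings drawn from {-1, 0, 1, 3} (deliberately excluding 2.0).
for _ in range(5000):
    x = {'user': rng.choice(users), 'item': rng.choice(items)}
    y = rng.choice([-1, 0, 1, 3])
    model = model.learn_one(x, y)

# Do the predictions for one user collapse onto a single value?
preds = {it: round(model.predict_one({'user': 'u0', 'item': it}), 3) for it in items[:10]}
print(preds)
```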
Ping @gbolmier, can I let you take a look? 🙏
`PredClipper` just floors and ceils the predicted value, so I don't think this relates to it, as the predicted value lies in between the upper and lower limits you set. It won't harm to check that the `y_min` and `y_max` attributes of the model's `PredClipper` object are still set to `-1` and `3`.

If you manage to reproduce the issue, or if you persisted your model, could you check the weights of the concerned users/items to see if the problem comes from the learning part? Checking the data that is passed to the model would help too. I guess you might not be able to publicly share the data, but it would be nice if you could at least reproduce the results on your side.

Hope it helps, cheers!
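Concretely, something along these lines (a sketch assuming `model` is the `meta.PredClipper` returned by the `fm_model()` above; `USER_ID` and `ITEM_ID` are placeholders for the concerned ids, and `weights`/`intercept` are the `FMRegressor` attributes):

```python
# Check the clipping bounds and the weights of a specific user/item pair.
# USER_ID and ITEM_ID are placeholders for the ids being investigated.
print(model.y_min, model.y_max)            # should still be -1 and 3

fm = model.regressor.steps['FMRegressor']
print('intercept:', fm.intercept)          # global bias term w0
print('user weight:', fm.weights.get(f'user_{USER_ID}'))
print('item weight:', fm.weights.get(f'item_{ITEM_ID}'))
```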
Quick update: I have some scoring logged from continuous updates overnight yesterday, and I can see the predicted scores rising vs time.
And these do not correspond to repeated 3.0 scores for this event, so it's as if the model is not mean normalizing over time (issues with the underlying metrics model?). Here are the test weights associated with that event/item (the weight for this event appears stable):
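For reference, this is roughly how I'm dumping the parameters for a single event/item (a sketch based on the pipeline above; `ITEM_ID` is a placeholder, and `weights`/`latents`/`intercept` are the documented `FMRegressor` attributes):

```python
# Snapshot one item's first-order weight, its latent factors, and the global
# intercept, so they can be compared over time. ITEM_ID is a placeholder.
fm = self.model.regressor.steps['FMRegressor']
key = f'item_{ITEM_ID}'

snapshot = {
    'weight': fm.weights.get(key),
    'latents': list(fm.latents[key]) if key in fm.latents else None,
    'intercept': fm.intercept,
}
print(snapshot)
```

If the item's weight really is stable while the predictions keep drifting, the movement should show up in the intercept or in the latent factors instead.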
When I first load the model I'm seeing an average prediction of 1.44 per item for my user_id. I'll try and explore this vs time early next week.
If the variance of the ratings is low then it is normal to have low-variance predictions. One way to check whether the model is learning correctly from the received data is to monitor the metric.
In our case the FM model is composed of different sets of weights (cf. the FMRegressor doc): the global intercept w0, the first-order weights w_j, and the latent factor vectors v_j.
So if the prediction for a specific user/item pair changes drastically over time, that change has to come from updates to one or more of these weight sets.
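For example, monitoring something like `metrics.MAE` during the incremental updates (a minimal sketch reusing the `model`/`dataset` names from the snippets above; the choice of metric and print frequency are arbitrary):

```python
from river import metrics

metric = metrics.MAE()

for i, (x, y) in enumerate(dataset, start=1):
    y_pred = model.predict_one(x)       # predict before learning (prequential evaluation)
    metric = metric.update(y, y_pred)
    model = model.learn_one(x, y)
    if i % 1000 == 0:
        print(i, metric)                # the MAE should stay roughly stable if learning is healthy
```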
Thanks @gbolmier, I'll keep that in mind (and reread that doc ref). The bug on my side is not being able to recreate it precisely on a fresh start + delta training from recent redis ratings since the last training snapshot: I should get the same predictions from the live model (snapshot load + continuous updates via redis pubsub) as from a new instance (most recent snapshot/hourly training + delta ratings training from the redis ordered set). Once I nail that down, I'm hoping that will reveal the issue on the data side. Some good news: it looks like the scores "converged" over the weekend, no more drift upwards for this event.
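The parity check I have in mind is roughly this (a sketch only; `live_model`, `rebuilt_model`, `user_id` and `item_ids` are placeholders for my own objects, and it assumes both instances have seen exactly the same ratings in the same order):

```python
# Compare predictions from the live model (snapshot + pubsub updates) against
# a freshly rebuilt one (latest snapshot + delta ratings from the ordered set).
mismatches = []
for item_id in item_ids:
    x = {'user': user_id, 'item': item_id}
    live, rebuilt = live_model.predict_one(x), rebuilt_model.predict_one(x)
    if abs(live - rebuilt) > 1e-6:
        mismatches.append((item_id, live, rebuilt))

print(f'{len(mismatches)} mismatching items out of {len(item_ids)}')
for item_id, live, rebuilt in mismatches[:10]:
    print(f'{item_id}: live={live:.4f} rebuilt={rebuilt:.4f}')
```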