
Option to add user and/or item features #159

Open
wants to merge 54 commits into base: user_item_features

Conversation


@martincousi commented Mar 30, 2018

I started modifying the Dataset and Trainset classes to include the option of having user and/or item features since I later want to work on an algorithm that accepts these. I think I made good progress but I still need to figure out how to create a testset with these features.

PS: It appears that this branch also includes other modifications I made with respect to asymetric_measures and to suppressing the printing during the computation of baselines and similarities. Edit: I have reverted the changes to accuracy.py and AlgoBase.

@NicolasHug
Owner

Thanks for the PR!

Coming up with a way to handle content-based features is a lot of work. I gave it a quick try before and found it not to be worth the hassle. I'm not saying it's not doable; it may even be quite easy if you're going for a very specific solution. But integrating the features into the whole data pipeline (cross-validation, etc.) in a generic fashion can be tricky, and it also requires making choices about dataset loading, etc.

So I think the best way is probably for you to work on it on your own fork for now, and submit a complete PR once you think it's done (if you still want to). But even then, I can't promise we will be able to merge it: it will depend on how useful this addition is and how well it integrates with the current codebase.

Would that work for you?

Thanks!
Nicolas

@martincousi
Author

@NicolasHug What do you think of my implementation? It appears that some tests need to be modified.

@NicolasHug
Owner

NicolasHug commented Apr 7, 2018

Thanks,

I really appreciate all the effort on the clean code and the good documentation :)!

I have only skimmed the code so far, but I'm wondering why you're passing the user/item features to the predict and estimate methods, and storing them in the Prediction object. I may be wrong, but it seems to me that the features should simply be stored in the Trainset object?

Could you brief me a bit on what those user/item features actually are? That is, what kinds of datasets need such features? What are some examples of algorithms that use them? Any reference for the Lasso algorithm that you implemented? Are there publicly available datasets that we could natively support?

For the tests: you'll need to add scikit-learn as a dependency in all the requirements.txt files (including the one for Travis). That being said, I'm not sure yet whether adding scikit-learn as a dependency would be a wise move -- if the Lasso algorithm you implemented is simply a regularized linear regression model, maybe it would be best to code it ourselves. I love scikit-learn, but I'd like to minimize the dependencies as much as possible.
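For reference, Lasso (in scikit-learn's formulation) is just least squares with an L1 penalty on the coefficients, so a small dedicated implementation would essentially be:

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```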

Thanks!
Nicolas

@martincousi
Author

I am passing user/item features to the predict and estimate methods since it makes more sense (in my opinion) that the features are stored in the testset object. In this way, it is also possible to predict for a new user and/or item for which the features were not added to the Dataset and/or Trainset objects. For the Prediction object, it is not necessary to put the features in it. I just made it this way to simplify post-analysis of the predictions.

These features can be a variety of things. For example, user features could be demographic information (e.g., age, gender) or other elicited information (e.g., preferences for certain actors or movie genres). Item features can be attributes associated with the item (e.g., movie genre or studio) or expert information (e.g., an expert rating). Most recommender systems do not have such information, but for some applications it is possible to ask users for it or to scrape it off the web (e.g., expert information).
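As a made-up illustration, such features could be stored as simple per-user and per-item tables (the column names here are invented):

```python
# Made-up example of user and item feature tables; the column names are
# invented for illustration only.
import pandas as pd

user_features = pd.DataFrame({
    'userID': [1, 2, 3],
    'age': [25, 41, 33],          # demographic information
    'likes_scifi': [1, 0, 1],     # elicited preference
})

item_features = pd.DataFrame({
    'itemID': [10, 20],
    'genre_drama': [1, 0],        # item attribute
    'expert_rating': [4.5, 3.0],  # scraped expert information
})
```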

Algorithms can be designed to use only user features, only item features, or both. These can be hybrid algorithms [1] (i.e., a mix of algorithms) or a single specific algorithm. The lasso algorithm I implemented is only a naïve/simplistic implementation of [2]; I did it just to test my implementation of the features add-on. I am now working towards an implementation of factorization machines [3], which is (in my opinion) a much better approach. I plan to implement factorization machines by importing the tffm library, which appears more complete than polylearn, another factorization machine library.
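For reference, the second-order FM model of [3] scores a feature vector x (the one-hot user and item indicators plus any side features) as:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

so SVD-style models are recovered as the special case where x contains only the user and item indicators.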

I am pretty sure there must exist publicly available datasets that include such features, but I am unaware of which ones and don't have time to look into it at the moment. Maybe one of the Yahoo datasets?

As I said, I don't think the lasso algorithm necessarily needs to be included. I had removed it, but it came back when I updated my master branch; I have now removed it again. I am currently working on an implementation of factorization machines on another branch and will open a PR when it's ready.

[1] R. Burke, “Hybrid Recommender Systems: Survey and Experiments,” User Model. User-adapt. Interact., vol. 12, no. 4, pp. 331–370, 2002.
[2] A. Ansari, S. Essegaier, and R. Kohli, “Internet Recommendation Systems,” J. Mark. Res., vol. 37, no. 3, pp. 363–375, 2000.
[3] S. Rendle, “Factorization machines,” in Proceedings - IEEE International Conference on Data Mining, ICDM, 2010, pp. 995–1000.

@NicolasHug
Owner

Thanks a lot for the update!

> I am passing user/item features to the predict and estimate methods since it makes more sense (in my opinion) that the features are stored in the testset object. In this way, it is also possible to predict for a new user and/or item for which the features were not added to the Dataset and/or Trainset objects.

Yeah, you're absolutely right; I was looking at it the wrong way.

> For the Prediction object, it is not necessary to put the features in it. I just made it this way to simplify post-analysis of the predictions.

Probably best to remove it from there, then.

I've been thinking about adding additional dependencies (scikit-learn, tffm, or whatever), and I think it's OK as long as we keep them optional. E.g., if you implement FM with tffm, only those who want to use the FM model would need to install tffm, but it's still not a core dependency (concretely, we don't add tffm to requirements.txt).
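Concretely, I'm thinking of the usual optional-import pattern. This is only a sketch: FMAlgo is a hypothetical class name, and TFFMRegressor is assumed to be tffm's regressor entry point.

```python
# Sketch of the optional-dependency pattern: tffm is only needed by users
# of this particular algorithm, not by Surprise as a whole.
from surprise import AlgoBase

try:
    from tffm import TFFMRegressor  # optional dependency
except ImportError:
    TFFMRegressor = None


class FMAlgo(AlgoBase):  # hypothetical FM algorithm built on tffm
    def __init__(self, **kwargs):
        if TFFMRegressor is None:
            raise ImportError('FMAlgo requires the optional tffm package; '
                              'install it with `pip install tffm`.')
        AlgoBase.__init__(self, **kwargs)
```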

@NicolasHug changed the base branch from master to user_item_features on April 13, 2018
@martincousi
Author

martincousi commented Apr 13, 2018

OK, so I removed the features from the Prediction object. Do we want to merge this branch now, or do we wait for at least one algorithm that supports features (i.e., should the additional algorithms be part of this PR or a separate one)?

Also, do we need to fix the tests?

@NicolasHug
Owner

NicolasHug commented Apr 13, 2018

Thanks,

I can merge this PR into a new feature branch if you want, and you can then send more PRs to that branch.

For the tests: I suspect you'll have other failing tests when you implement the algorithm, so it's up to you. If you prefer to solve the test issues all at once, I'm OK with that.

BTW, have you thought of a way to integrate the new changes with the cross validation iterators?

@martincousi
Author

The features option already works with the cross validation iterators (to my knowledge).

@martincousi
Author

@NicolasHug How can we enable tests on this base branch so that I can see which tests fail?

@NicolasHug
Owner

We'd need to modify the .travis.yml file, but you should definitely run the tests locally before any commit anyway.

@martincousi
Author

What is the best way to run the tests locally without having to do python test_name.py for each test?

@NicolasHug
Owner

Just run pytest at the root directory.

If you haven't already, check out the contributing guidelines.

@martincousi
Author

I have corrected the tests so that they now pass on my computer when running pytest. Unfortunately, I don't have time to write new tests for the features option.

@NicolasHug
Owner

NicolasHug commented Apr 13, 2018

No worries, it can wait.
But I'm sure you understand I cannot merge anything that is not thoroughly tested, especially when it's such a big feature / improvement.

EDIT: I mean merging into the master branch for a future release. I don't mind merging untested code into a feature branch.

@igorsantana

Hey, any updates on this? I've been following the conversation and would like to know if you guys have any plans to merge this. I am working with context-aware recommender systems and I'm rewriting my code from Java to Python (in which I'm kind of a newbie).

Is there a way to populate the Dataset with more info than just user item rating [timestamp]? I've searched through the docs and haven't found one.

Keep up the nice work! 😊

@martincousi
Author

martincousi commented Jun 6, 2018

I have been working on other projects in the meantime, but this branch should work without issues. However, I would recommend using my factorization-machines branch, as it should contain the latest updates.

Note that this branch takes into account user and item features, but not context. Also, from looking at the code, I don't think the timestamp option is working. To add context with many variables, it would be easy to extend my code to attach features to user-item pairs; you would then need to extend the algorithms to take these features into account.

@NicolasHug
Owner

@martincousi is 100% correct

No plans to merge this (or the other branch), unfortunately, because I don't have enough visibility on how well it would integrate with the current codebase.

@Paola123456

Hi Martin and Nicolas,
Has the matrix factorisation algorithm (SVD or SVD++) with user-item features been implemented yet? If it has, could you point me in the right direction? Many thanks.

@martincousi
Author

If you want to add user/item features to a factorization algorithm, you should take a look at factorization machines.

I have a working implementation in factorization_machines.py. Note that this is on the sample_weight branch of my fork; it is the most up-to-date branch and requires PyTorch in order to use the FM class.

To use this class, you first add your features using Dataset.load_features_df(). Then, after building your Trainset, you can use the FM class with the option rating_lst=('userID', 'itemID') (similar to SVD) or rating_lst=('userID', 'itemID', 'imp_u_rating') (similar to SVD++). The labels of the features are provided through user_lst and item_lst; see the sketch below.
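A minimal sketch of that workflow. The exact signatures of load_features_df and FM live in my fork, so treat them as approximate, and the data here is made up:

```python
# Sketch only: Dataset.load_features_df and FM come from the fork
# described above, so their exact signatures are assumptions here.
import pandas as pd
from surprise import Dataset, Reader
from factorization_machines import FM  # fork-only module; requires PyTorch

ratings = pd.DataFrame({'userID': [1, 1, 2],
                        'itemID': [10, 20, 10],
                        'rating': [3.0, 4.0, 5.0]})
user_features = pd.DataFrame({'userID': [1, 2], 'age': [25, 41]})
item_features = pd.DataFrame({'itemID': [10, 20], 'genre_drama': [1, 0]})

data = Dataset.load_from_df(ratings, Reader(rating_scale=(1, 5)))
data.load_features_df(user_features, item_features)  # fork-only API
trainset = data.build_full_trainset()

algo = FM(rating_lst=('userID', 'itemID'),  # SVD-like rating terms
          user_lst=['age'],                 # user feature labels
          item_lst=['genre_drama'])         # item feature labels
algo.fit(trainset)
```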
