ONCE: Boosting Content-based Recommendation with Open- and Closed-source LLMs

A. Ibrahimi, L. Becker, R. van Emmerik

This project aims to reproduce and add an extension to the paper "ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models" by Liu et al. (2024). Please request our report for detailed information on the implementation of our reproduction and extension of the ONCE architecture.

Installations

Requirements

Linux or Windows with Python = 3.11

Environment Setup

For DIRE and ONCE:

conda create --name once python=3.11 
conda activate once

pip install -r requirements_once.txt

Or for the GENRE implementation:

conda create --name once python=3.8 
conda activate genre

pip install -r requirements_genre.txt

Model and Data Download

Navigate to the Ekstra Bladet website to download the small and/or large EB-NeRD dataset. Move the data to your data to src/data/.

Run the following scripts to download the LLaMA 7b model and its embeddings to src/data/:

python src/scripts/scripts_dire/1_download_llama_vocab_model.py

python src/scripts/scripts_dire/2_download_llama_embeddings.py

Reproduction

GENRE Implementation: Prompting Closed-source LLMs for Content-based Recommendation

conda activate genre

Move to the GENRE repository:

cd src/lib/GENRE/

Move or duplicate the EB-NeRD data to this folder for GENRE implementation src/lib/GENRE/data/eb-nerd/eb-nerd-data/ebnerd_small. To convert the EB-NeRD articles to match the ONCE datatype, run:

python src/lib/GENRE/scripts_genre/articles_process.py

Overview

We use the GPT-3.5-turbo API provided by OpenAI as a closed-source LLM. All the request codes are available in this repository.

Codes and Corresponding Generated Data

Dataset	Schemes	Request Code	Generated Data
EB-NeRD	Content Summarizer	`news_summarizer.py`	`data/eb-nerd/eb-nerd-outputs/news_summarizer.log`
EB-NeRD	User Profiler Train	`user_profiler.py`	`data/eb-nerd/eb-nerd-outputs/user_profiler.log`
EB-NeRD	User Profiler Val	`user_profiler.py`	`data/eb-nerd/eb-nerd-outputs/user_profiler_val.log`

To update the datasets with the generated data, run these scripts:

python src/lib/GENRE/scripts_genre/update_articles.py
python src/lib/GENRE/scripts_genre/update_history.py

To obtain statistics of the augmented data, run the following script:

python src/lib/GENRE/scripts_genre/statistics_eb_nerd.py

DIRE Implementation: Finetuning Open-source LLMs for Content-based Recommendation

conda activate once

Tokenize the EB-NeRD data located in src/data/ with the basic tokenizer:

python src/scripts/scripts_dire/3_processor_dire.py
python src/scripts/scripts_dire/4_processor_dire_llama.py
python src/scripts/scripts_dire/5_ebnerd_fusion_dire.py

Move to the Legommenders repository and pre-train. Replace {yourModel} with the recommender model you would like to use out of 'fastformer', 'naml', or 'nrms'. You will need to update the config files to point to src/data/:

cd src/lib/Legommenders/

python worker.py --embed config/embed/llama-token.yaml --model config/model/llm/llama-naml.yaml \
    --exp config/exp/llama-split.yaml --data config/data/eb-nerd.yaml --version small --llm_ver 13b \
    --hidden_size 64 --layer 0 --lora 0  --fast_eval 0 --embed_hidden_size 5120 --page_size 32 --cuda -1

Training and testing with default parameters:

cd src/lib/Legommenders/

python  worker.py --data config/data/eb-nerd.yaml --embed config/embed/llama-token.yaml \
     --model config/model/llm/llama-fastformer.yaml --exp config/exp/tt-llm.yaml --embed_hidden_size 4096 \
     --llm_ver 7b --layer 31 --version small --lr 0.0001   --item_lr 0.00001 --batch_size 64 --acc_batch 1 --epoch_batch -4

ONCE

conda activate once

Tokenize the EB-NeRD data located in src/data/ with the tokenizer for the basic data and topics and region:

python src/scripts/scripts_dire/6_processor_once.py
python src/scripts/scripts_dire/7_processor_llama_once.py
python src/scripts/scripts_dire/8_ebnerd_fusion_once.py

Move to the Legommenders repository and pre-train. Replace {yourModel} with the recommender model you would like to use out of 'fastformer', 'naml', or 'nrms'. You will need to update the config files to point to src/data/:

cd src/lib/Legommenders/

python worker.py --embed config/embed/llama-token.yaml --model config/model/llm/llama-{yourModel}-once.yaml --exp config/exp/llama-split-once.yaml --data config/data/eb-nerd-once.yaml --version small --llm_ver 7b --hidden_size 64 --layer 0 --lora 0 --fast_eval 0 --embed_hidden_size 4096 --page_size 8

Training and testing with default parameters:

python worker.py  --data config/data/eb-nerd-once.yaml --embed config/embed/llama-token.yaml  --model config/model/llm/llama-{yourModel}-once.yaml --exp config/exp/tt-llm.yaml --embed_hidden_size 4096 --llm_ver 7b --layer 31 --version small --lr 0.0001 --item_lr 0.00001 --batch_size 32 --acc_batch 2 --epoch_batch -4

Extension

DIRE with Sentiment

conda activate once

Tokenize the EB-NeRD data located in src/data/ with the tokenizer for the basic data and sentiment:

python src/scripts/scripts_dire/9_processor_sentiment.py
python src/scripts/scripts_dire/10_processor_llama_sentiment.py
python src/scripts/scripts_dire/11_ebnerd_fusion-sentiment.py

Prepare script: open src/lib/Legommenders/model/inputer/llm_concat_inputer.py and uncomment line 38. If you are curious what is behind the list check /src/scripts/scripts_dire/0_get_col_prompts_additions.py.

Move to the Legommenders repository and pre-train. Replace {yourModel} with the recommender model you would like to use out of 'fastformer', 'naml', or 'nrms'. You will need to update the config files to point to src/data/:

cd src/lib/Legommenders/

python worker.py --embed config/embed/llama-token.yaml --model config/model/llm/llama-{yourModel}-sentiment.yaml --exp config/exp/llama-split-sentiment.yaml --data config/data/eb-nerd-sentiment.yaml --version small --llm_ver 7b --hidden_size 64 --layer 0 --lora 0 --fast_eval 0 --embed_hidden_size 4096 --page_size 8 --cuda -1

Training and testing with default parameters:

python worker.py  --data config/data/eb-nerd-sentiment.yaml --embed config/embed/llama-token.yaml  --model config/model/llm/llama-{yourModel}-sentiment.yaml --exp config/exp/tt-llm.yaml --embed_hidden_size 4096 --llm_ver 7b --layer 31 --version small --lr 0.0001 --item_lr 0.00001 --batch_size 32 --acc_batch 2 --epoch_batch -4

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
src		src
README.md		README.md
requirements_genre.txt		requirements_genre.txt
requirements_once.txt		requirements_once.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ONCE: Boosting Content-based Recommendation with Open- and Closed-source LLMs

A. Ibrahimi, L. Becker, R. van Emmerik

Installations

Requirements

Environment Setup

Model and Data Download

Reproduction

GENRE Implementation: Prompting Closed-source LLMs for Content-based Recommendation

Overview

Codes and Corresponding Generated Data

DIRE Implementation: Finetuning Open-source LLMs for Content-based Recommendation

ONCE

Extension

DIRE with Sentiment

About

Releases

Packages

Contributors 3

Languages

lisannbecker/ONCE-extension

Folders and files

Latest commit

History

Repository files navigation

ONCE: Boosting Content-based Recommendation with Open- and Closed-source LLMs

A. Ibrahimi, L. Becker, R. van Emmerik

Installations

Requirements

Environment Setup

Model and Data Download

Reproduction

GENRE Implementation: Prompting Closed-source LLMs for Content-based Recommendation

Overview

Codes and Corresponding Generated Data

DIRE Implementation: Finetuning Open-source LLMs for Content-based Recommendation

ONCE

Extension

DIRE with Sentiment

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages