Clone this repository first, then download the data from here and uncompress `data.tar` into the `data/` folder.
To gain access to the data, please contact me at [email protected] for the password.
The file structure should look like this:
```
├── data
│   ├── old_data
│   │   ├── MIND_small
│   │   ├── MIND_large
│   │   ├── hm
│   │   └── bilibili
│   ├── setup_scripts
│   ├── ...
├── ...
```
Then run the following commands to set up the environment and the data:

```bash
conda create -n plmrs python=3.8
conda activate plmrs
wget https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp38-cp38-linux_x86_64.whl
pip install torch-1.12.1+cu113-cp38-cp38-linux_x86_64.whl
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
cd data/setup_scripts
python bilibili.py
python hm.py
python MIND_large.py
python MIND_small.py
cd ../..
```

Then you can run the following command to train the model:
```bash
python run.py --input_type "text" --plm_name "facebook/opt-125m" --dataset "MIND_large"
```

If you encounter the following error:

```
RuntimeError: torch_shm_manager at "/opt/anaconda3/envs/plmrs/lib/python3.8/site-packages/torch/bin/torch_shm_manager": could not generate a random directory for manager socket
```

the reason is probably that you are using a shared server with a limit on hard drive space. You can try to delete the cache files and try again:

```bash
cd ~/.cache/huggingface/hub
rm tmp*
```

Alternatively, set `--num_workers` to 0, and `--pre_inference_num_workers` to 0 if using pre-inference.
- For super large models like OPT-13B or larger, we split the model into layers and infer the embeddings layer by layer. This saves GPU memory when only a few layers on top need to be fine-tuned (a minimal sketch follows this list).
- Although we could store the pre-inferenced embeddings as a non-trainable embedding layer inside the recommender model, they would still take GPU memory. So we store them in a .pt file and load them as a tensor in the dataloader when needed. This slows down training compared with storing them as a non-trainable embedding layer, but saves more GPU memory. Take MIND_small as an example: it has 52771 items, and if we pad or truncate each item description to a fixed length of 30 tokens, the item description matrix in float32 takes 52771 * 30 * 768 * 4 Bytes ≈ 4.9 GB, which is too large to load into GPU memory (see the dataloader sketch after this list).
- When using BCE loss, validation and test should access all the items; see Accessing DataLoaders within LightningModule (and the sketch after this list).
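For reference, here is a minimal sketch of the layer-by-layer pre-inference idea. It assumes the frozen PLM blocks are plain modules that map hidden states to hidden states of the same shape; the repo's actual logic lives in `datamodules/preinference.py` and may differ.

```python
# Hedged sketch of layer-by-layer pre-inference; `frozen_layers` is assumed to be a list
# of frozen transformer blocks that map hidden states to hidden states of the same shape.
import torch

@torch.no_grad()
def layerwise_preinference(frozen_layers, item_hidden, device="cuda", chunk_size=1024):
    """Push ALL item representations through one frozen block at a time,
    so only a single block occupies GPU memory at any moment."""
    hidden = item_hidden                    # (num_items, tokenized_len, hidden_size) on CPU
    for layer in frozen_layers:
        layer.to(device)                    # load this single block onto the GPU
        outs = []
        for chunk in hidden.split(chunk_size):
            outs.append(layer(chunk.to(device)).cpu())
        hidden = torch.cat(outs)            # intermediate activations stay on CPU
        layer.to("cpu")                     # free GPU memory before the next block
    return hidden                           # cached embeddings used by the recommender
```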
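And a minimal sketch of serving the cached embeddings from a .pt file inside the dataloader instead of a frozen embedding layer on the GPU; the file path and attribute names are illustrative, not the repo's actual ones.

```python
# Hedged sketch: pre-inferenced item embeddings are kept on disk/CPU and looked up
# per batch in the Dataset, so they never occupy GPU memory as a whole matrix.
import torch
from torch.utils.data import Dataset

class CachedEmbSeqDataset(Dataset):
    def __init__(self, item_seqs, emb_path="data/MIND_small_opt125m_embs.pt"):
        self.item_seqs = item_seqs                                  # list of item-id sequences
        self.item_embs = torch.load(emb_path, map_location="cpu")   # (num_items, tokenized_len, hidden)

    def __len__(self):
        return len(self.item_seqs)

    def __getitem__(self, idx):
        seq = torch.as_tensor(self.item_seqs[idx])
        # only the embeddings of this sequence's items are returned and moved to GPU later
        return self.item_embs[seq], seq
```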
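Finally, a hedged sketch of why the BCE note matters: validation and test need scores over all items, which a LightningModule can obtain through the datamodule attached to the Trainer. The `forward`, `num_items`, and `item_embedding` names are illustrative, not necessarily the repo's.

```python
# Hedged sketch: rank against the full item catalog during validation/test with BCE loss.
import pytorch_lightning as pl
import torch

class SASRecModule(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        user_repr = self(batch)                               # (batch, hidden), from forward()
        num_items = self.trainer.datamodule.num_items         # catalog size from the datamodule
        all_items = torch.arange(num_items, device=self.device)
        scores = user_repr @ self.item_embedding(all_items).T # score every item, not a sample
        return scores
```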
- `--input_type` can be `text` or `id`
- `--dataset` can be `MIND_large` or `MIND_small`
- `--max_epochs` is the maximum number of epochs
- `--early_stop_patience` is the number of epochs to wait before early stopping
- `--batch_size` is the batch size
- `--num_workers` is the number of workers for data loading
- `--devices` is the accelerators to use; it should be specified as a list of integers, e.g. "0 1 2 3" when using multiple accelerators
- `--accelerator` is the accelerator to use, default is `gpu`
- `--precision` is the precision to use, default is `32`
- `--min_item_seq_len` is the minimum length of item sequences after preprocessing
- `--max_item_seq_len` is the maximum length of item sequences after preprocessing
- `--strategy` is the distributed training strategy, which can be `none`, `deepspeed_stage_2`, `deepspeed_stage_3`, `deepspeed_stage_2_offload`, `deepspeed_stage_3_offload` or `fsdp_offload`; if it is `none`, single-GPU training or multi-GPU training with `ddp` is used
- `--sasrec_seq_len` is the length of the item sequence for SASRec
- `--weight_decay` is the weight decay for the whole model
- `--lr` is the learning rate for SASRec
- `--sasrec_hidden_size` is the hidden size of SASRec
- `--sasrec_inner_size` is the inner feed-forward size of SASRec
- `--sasrec_n_layers` is the number of encoder layers of SASRec
- `--sasrec_n_heads` is the number of attention heads in SASRec
- `--sasrec_layer_norm_eps` is the epsilon of layer normalization in SASRec
- `--sasrec_hidden_dropout` is the dropout rate of hidden states in SASRec
- `--sasrec_attention_dropout` is the dropout rate of attention weights in SASRec
- `--sasrec_initializer_range` is the initializer range of linear layers in SASRec
- `--topk_list` is the list of top-k values for evaluation metrics
- `--tokenized_len` is the length of the tokenized sequence for the PLM
- `--plm_name` can be `facebook/opt-125m` to `facebook/opt-66b`, or `bert-base-uncased` to `bert-large-uncased`
- `--plm_last_n_unfreeze` is the number of layers to be unfrozen; the default is 0, which means all layers are frozen. If you want to fine-tune all layers of the pretrained model, set it to -1 rather than the pretrained model's `num_hidden_layers`, which only fine-tunes all decoders or encoders but still freezes the embedding. In the pre-inference stage, the unfrozen layers are not used.
- `--plm_lr` is the learning rate for the PLM when fine-tuning
- `--plm_lr_layer_decay` is the learning rate decay for each layer of the PLM when fine-tuning
- `--projection_n_layers` is the number of projection layers connecting the PLM and SASRec
- `--projection_inner_sizes` is the inner sizes of the projection layers connecting the PLM and SASRec; it should be a list of integers whose length equals `projection_n_layers` - 2, because the first and last sizes are set to the PLM's hidden size and SASRec's hidden size respectively (see the sketch after this list)
- `--pooling_method` can be `mean`, `last` or `mean_last` (fusion of mean and last) for OPT models, or `mean` or `cls` for BERT models (also sketched after this list)
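A minimal sketch of how such a projection might be assembled; the activation choice and the function name are assumptions, not the repo's exact implementation.

```python
# Hedged sketch of the PLM -> SASRec projection; the first and last sizes are pinned
# to the PLM and SASRec hidden sizes, and --projection_inner_sizes fills in the middle.
import torch.nn as nn

def build_projection(plm_hidden=768, sasrec_hidden=64, inner_sizes=(256, 128)):
    sizes = [plm_hidden, *inner_sizes, sasrec_hidden]
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:               # no activation after the final projection
            layers.append(nn.GELU())
    return nn.Sequential(*layers)

projection = build_projection()              # e.g. OPT-125m hidden 768 -> SASRec hidden 64
```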
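And a sketch of the pooling options over the PLM token states; the simple averaging used here for `mean_last` is an assumption about how the fusion works.

```python
# Hedged sketch of --pooling_method over PLM outputs (mask handling simplified,
# right padding assumed for the "last" option).
import torch

def pool(hidden, attention_mask, method="mean"):
    # hidden: (batch, seq_len, dim); attention_mask: (batch, seq_len), 1 for real tokens
    mask = attention_mask.unsqueeze(-1).float()
    mean = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1.0)
    last_idx = attention_mask.sum(1).long() - 1                    # last non-padding position
    last = hidden[torch.arange(hidden.size(0)), last_idx]
    if method == "mean":
        return mean
    if method == "last":
        return last
    if method == "cls":
        return hidden[:, 0]                                        # BERT-style [CLS] token
    return (mean + last) / 2                                       # "mean_last" fusion (assumed)
```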
- `--use_prompt` can be `True` or `False`
- `--prompt_projection` can be `True` or `False`
- `--prompt_hidden_size` is the hidden size of the prompt
- `--pre_seq_len` is the length of the deep prefix prompt
- `--post_seq_len` is the length of the deep suffix prompt, only used when the model is OPT
- `--last_query_len` is the length of the last shallow prompt, only used when the model is OPT (see the sketch after this list)
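A hedged sketch of the shallow prompt idea behind `--last_query_len`: learnable vectors appended to the PLM input embeddings. The deep prefix/suffix prompts instead inject vectors at every layer, which is not shown here.

```python
# Hedged sketch of a shallow prompt: learnable vectors concatenated to the token embeddings.
import torch
import torch.nn as nn

class ShallowPrompt(nn.Module):
    def __init__(self, last_query_len=4, hidden_size=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(last_query_len, hidden_size) * 0.02)

    def forward(self, token_embs):
        # token_embs: (batch, tokenized_len, hidden) -> append the prompt vectors at the end
        prompt = self.prompt.unsqueeze(0).expand(token_embs.size(0), -1, -1)
        return torch.cat([token_embs, prompt], dim=1)
```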
- `--pre_inference` can be `True` or `False`; if `True`, then `pre_inference_batch_size`, `pre_inference_devices` and `pre_inference_precision` are used to run inference with the frozen part of the PLM before training
- `--pre_inference_batch_size` is the batch size of inference
- `--pre_inference_devices` is the devices for inference
- `--pre_inference_precision` is the precision of inference
- `--pre_inference_num_workers` is the number of workers for data loading during inference
- `--pre_inference_layer_wise` can be `True` or `False`; if `True`, inference is done layer by layer, otherwise for the whole model at once
Use a command like the following:

```bash
python datamodules/preinference.py \
    --dataset "MIND_small" \
    --plm_name "facebook/opt-125m" \
    --sasrec_seq_len 20 \
    --tokenized_len 30 \
    --min_item_seq_len 5 \
    --max_item_seq_len None \
    --pre_inference_devices "0 1 2 3 4 5 6 7" \
    --pre_inference_precision 32 \
    --pre_inference_batch_size 1 \
    --pre_inference_num_workers 4 \
    --pre_inference_layer_wise True
```

Args of preinference.py:
- `--dataset` can be `MIND_large`, `MIND_small`, `hm` or `bilibili`
- `--plm_name` can be `facebook/opt-125m` to `facebook/opt-66b`, or `bert-base-uncased` to `bert-large-uncased`
- `--sasrec_seq_len` is the length of the item sequence for SASRec
- `--tokenized_len` is the length of the tokenized sequence for the PLM
- `--min_item_seq_len` is the minimum length of item sequences after preprocessing
- `--max_item_seq_len` is the maximum length of item sequences after preprocessing
- `--pre_inference_devices` is the devices for inference
- `--pre_inference_precision` is the precision of inference
- `--pre_inference_batch_size` is the batch size of inference
- `--pre_inference_num_workers` is the number of workers for data loading during inference
- `--pre_inference_layer_wise` can be `True` or `False`; if `True`, inference is done layer by layer, otherwise for the whole model at once
- `--plm_last_n_unfreeze` is the number of layers to be unfrozen; the default is 0, which means all layers are frozen. If you want to fine-tune all layers of the pretrained model, set it to -1 rather than the pretrained model's `num_hidden_layers`, which only fine-tunes all decoders or encoders but still freezes the embedding. In the pre-inference stage, the unfrozen layers are not used (see the sketch below).
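A minimal sketch of how `--plm_last_n_unfreeze` could be interpreted for an OPT backbone, assuming HuggingFace attribute paths (`model.decoder.layers`); the repo's actual splitting logic may differ.

```python
# Hedged sketch: split an OPT backbone into a frozen bottom part (used for pre-inference)
# and a trainable top part, following the --plm_last_n_unfreeze semantics described above.
from transformers import AutoModel

def split_plm(plm_name="facebook/opt-125m", last_n_unfreeze=2):
    model = AutoModel.from_pretrained(plm_name)
    if last_n_unfreeze == -1:
        return model, [], list(model.decoder.layers)   # -1: everything trainable, embeddings too
    for p in model.parameters():
        p.requires_grad = False                        # default: freeze the whole PLM
    blocks = model.decoder.layers                      # OPT decoder blocks
    split = len(blocks) - last_n_unfreeze
    frozen, unfrozen = blocks[:split], blocks[split:]
    for block in unfrozen:
        for p in block.parameters():
            p.requires_grad = True                     # only the top-n blocks are fine-tuned
    return model, list(frozen), list(unfrozen)
```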