Project for the seminar "Event Processing" (M.A. Computational Linguistics at Heidelberg University), examining temporal reasoning of large language models (LLMs). This study experiments with the TimeLlama model on the TRAM benchmark dataset.
Set up a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
Download the dataset:
bash download-data.sh
If you encounter access or login issues with Hugging Face, you may need to be logged in (huggingface-cli login) and to request Llama access with the same email address as your Hugging Face account.
To reproduce the experiments, run python src/run.py
or, on a GPU cluster, submit sbatch run.sh
(insert your email address in run.sh to receive job notifications).
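The experiment script prompts TimeLlama with TRAM's multiple-choice items. A minimal sketch of how such a prompt could be built (the function name and option labeling are assumptions for illustration, not the repository's actual code):

```python
def build_prompt(question: str, options: list[str]) -> str:
    """Format a TRAM-style multiple-choice item as a single prompt string.

    Hypothetical helper: labels the options (A), (B), ... and asks the
    model to answer with a single letter.
    """
    labeled = "\n".join(
        f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)
    )
    return (
        f"Question: {question}\n"
        f"Options:\n{labeled}\n"
        "Answer with the letter of the correct option."
    )

# Example item (invented for illustration)
prompt = build_prompt(
    "Which event typically happens first?",
    ["eating breakfast", "waking up", "going to work"],
)
print(prompt)
```

The resulting string would then be passed to the model, e.g. via the Hugging Face transformers generation API used by src/run.py.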
To reproduce the analyses in analysis.ipynb, run the following to unzip the output file:
cd output
unzip outputs-nc.zip
rm outputs-nc.zip
- model: TimeLlama (Paper, GitHub, Hugging Face)
- dataset: TRAM (Paper, GitHub)
Lydia Körber