# LooGLE v2: A novel real-world benchmark for long-dependency understanding
First, create a conda environment and install the required dependencies:

```bash
conda create -n loogle python=3.10
conda activate loogle
pip install vllm
```
Then, clone the benchmark repository:

```bash
git clone https://github.com/GraphPKU/LooGLE-v2.git
cd LooGLE-v2
```
You can download the benchmark dataset into the `./datasets` directory with the following command:

```bash
git clone https://huggingface.co/datasets/GraphPKU/LooGLE-v2 ./datasets/LooGLE-v2
```
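Once cloned, you can load the samples directly. Below is a minimal sketch, assuming the data ships as JSONL files; adjust the glob to the actual file layout inside `./datasets/LooGLE-v2`:

```python
import json
from pathlib import Path

# Assumption: the dataset is stored as JSONL files; adapt the glob
# if the cloned repository uses a different layout.
samples = []
for path in sorted(Path("./datasets/LooGLE-v2").glob("**/*.jsonl")):
    with open(path, encoding="utf-8") as f:
        samples.extend(json.loads(line) for line in f if line.strip())
print(f"Loaded {len(samples)} samples")
```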
We take Llama-3.1-8B-Instruct as an example for inference. First, launch the model server using `vllm serve`:
```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --api-key GraphPKU \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95 \
    --max-model-len 131072 \
    --trust-remote-code
```
Note: `--tensor-parallel-size` should be set to the number of available GPUs.
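Once the server is up, it exposes an OpenAI-compatible API (by default on port 8000). Here is a quick sanity check with the `openai` client, using the API key from the command above:

```python
from openai import OpenAI

# vllm serve speaks the OpenAI API; the key must match --api-key above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="GraphPKU")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```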
To run predictions on the benchmark using your model:

```bash
python predict.py \
    --model Llama-3.1-8B-Instruct \
    --data_dir ./datasets/LooGLE-v2
```
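Judging from the evaluation command below, the predictions land in `./results/Llama-3.1-8B-Instruct.jsonl`. A quick way to inspect the first record (the field names are whatever `predict.py` emits):

```python
import json

# Path inferred from the evaluation command below; the record schema
# is defined by predict.py.
with open("./results/Llama-3.1-8B-Instruct.jsonl", encoding="utf-8") as f:
    record = json.loads(f.readline())
print(sorted(record))
```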
After inference is complete, run the evaluation script:

```bash
python eval/eval.py \
    --input_path ./results/Llama-3.1-8B-Instruct.jsonl
```
This will compute accuracy and other metrics for the model's performance on LooGLE-v2.
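If you want to recompute a headline number yourself, here is a minimal exact-match sketch; the field names `prediction` and `answer` are assumptions, and the real schema and metric definitions live in `predict.py` and `eval/eval.py`:

```python
import json

# Hypothetical field names ("prediction", "answer"); check eval/eval.py
# for the actual schema and metric definitions.
correct = total = 0
with open("./results/Llama-3.1-8B-Instruct.jsonl", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        correct += rec["prediction"].strip() == rec["answer"].strip()
        total += 1
print(f"Exact-match accuracy: {correct / total:.3f}")
```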