LooGLE-v2

LooGLE v2: A novel real-world benchmark for long-dependency understanding

Evaluation

First, create a conda environment and install the required dependencies:

conda create -n loogle python=3.10
conda activate loogle
pip install vllm

Then, clone the benchmark repository:

git clone https://github.com/GraphPKU/LooGLE-v2.git
cd LooGLE-v2

Download the Dataset

You can download the benchmark dataset into the ./datasets directory with the following command:

git clone https://huggingface.co/datasets/GraphPKU/LooGLE-v2 ./datasets/LooGLE-v2
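
To sanity-check the download, you can load the dataset with the Hugging Face datasets library (a minimal sketch; the split layout and field names are not documented here, so treat them as assumptions):

from datasets import load_dataset

# Load from the Hub (or pass "./datasets/LooGLE-v2" to use the local clone).
ds = load_dataset("GraphPKU/LooGLE-v2")
print(ds)  # lists the available splits and their sizes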

Example: Evaluation with Llama-3.1-8B-Instruct

We use Llama-3.1-8B-Instruct as an example model for inference.
First, launch the model server using vllm serve:

vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --api-key GraphPKU \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 131072 \
  --trust-remote-code

Note: --tensor-parallel-size should be set to the number of available GPUs.
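
To confirm the server is running, you can send a single request through the OpenAI-compatible API that vllm serve exposes (a minimal sketch; the default port 8000 is an assumption, and the api_key must match the --api-key passed above):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="GraphPKU")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=8,
)
print(resp.choices[0].message.content)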

Prediction

To run predictions on the benchmark using your model:

python predict.py \
  --model Llama-3.1-8B-Instruct \
  --data_dir ./datasets/LooGLE-v2
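
Once prediction finishes, the results file used by the evaluation step below (./results/Llama-3.1-8B-Instruct.jsonl in this example) can be inspected record by record; a minimal sketch (the record fields are whatever predict.py writes, which is not documented here):

import json

# Print the keys of the first prediction record to see its structure.
with open("./results/Llama-3.1-8B-Instruct.jsonl") as f:
    first = json.loads(f.readline())
print(first.keys())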

Evaluation

After inference is complete, run the evaluation script:

python eval/eval.py \
  --input_path ./results/Llama-3.1-8B-Instruct.jsonl

This will compute accuracy and other metrics for the model's performance on LooGLE-v2.
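
If you want to reproduce a simple accuracy number yourself, exact-match accuracy over the prediction file looks roughly like the sketch below (the "prediction" and "answer" field names are hypothetical placeholders; use whichever keys eval/eval.py actually expects):

import json

correct = total = 0
with open("./results/Llama-3.1-8B-Instruct.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # "prediction" and "answer" are hypothetical field names for illustration.
        correct += int(record["prediction"].strip() == record["answer"].strip())
        total += 1
print(f"Exact-match accuracy: {correct / total:.4f}")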
