Prediction-Powered Ranking

This repository contains the code for the paper Prediction-Powered Ranking of Large Language Models.

Dependencies

All the code is written in Python 3.11.2. To create a virtual environment and install the project dependencies, run the following commands:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Usage

python3 scripts/llm-ranking.py <file_config>

where

  • file_config: JSON file of configuration parameters.
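
For example, using the configuration file provided in the repository:

python3 scripts/llm-ranking.py config.json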

Configuration parameters (an example sketch follows the list):

  • seed: seed used for random sampling.
  • iterations: number of times each experiment is run.
  • human_file: dataset containing pairwise comparisons by humans.
  • llm_files: list of datasets containing pairwise comparisons by strong LLMs (one file per LLM).
  • experiments_base_dir: folder where the output will be stored.
  • judges: list of names of the strong LLMs (in the same order as their corresponding files in llm_files).
  • n: number of comparisons to subsample from human_file.
  • alpha: error probability parameter.
  • ignore_ties: default 0. If 1, comparisons whose verdict is a tie are ignored.
  • methods: list of methods used to construct rank-sets, among baseline, human only, llm and ppr.
  • models: list of models to be ranked. If [], all models in human_file are ranked.
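
For illustration, here is a minimal configuration sketch. The file names and values are hypothetical placeholders; in particular, we assume n and alpha accept lists of values, since the output section below describes one folder per combination of n and alpha.

{
    "seed": 42,
    "iterations": 100,
    "human_file": "data/human_comparisons.json",
    "llm_files": ["data/gpt4_comparisons.json"],
    "experiments_base_dir": "experiments/",
    "judges": ["gpt4"],
    "n": [1000],
    "alpha": [0.05],
    "ignore_ties": 0,
    "methods": ["baseline", "human only", "llm", "ppr"],
    "models": []
}

Refer to config.json in the repository for the exact parameter values used in the paper's experiments.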

Structure

The file config.json contains the configuration parameters we used for our experimentation.

The folder data contains the datasets used for our experimentation.

The folder scripts contains the code to construct rank-sets and run experiments.

The folder plots contains the code to create the plots.

Output

The results are stored in the directory experiments_base_dir. For every combination of values in n and alpha, a child folder is created inside experiments_base_dir. For example, for n=1000 and alpha=0.05, the folder experiments_base_dir/n1000_a05 is created.
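
As a hypothetical sketch (not the repository's code), the naming convention can be reproduced by dropping the "0." prefix of alpha:

import os

# Hypothetical sketch, not the repository's code: compose a child folder
# name such as "n1000_a05" from n=1000 and alpha=0.05.
def output_dir(base_dir, n, alpha):
    alpha_suffix = str(alpha).split(".")[1]  # "0.05" -> "05"
    return os.path.join(base_dir, f"n{n}_a{alpha_suffix}")

print(output_dir("experiments_base_dir", 1000, 0.05))  # experiments_base_dir/n1000_a05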

Inside each child folder, multiple JSON files are created (one per iteration). Each JSON file is named x.json, where x is the iteration number. These files contain the rank-sets of their respective iteration, in the following format:

{
    "method 1":   { "model 1": [low rank, up rank],
                    ...
                    "model k": [low rank, up rank]
                  },
    ...
    "method m":   { "model 1": [low rank, up rank],
                    ...
                    "model k": [low rank, up rank]
                  }
}
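
As a minimal sketch, one iteration's rank-sets can be read back as follows; the folder and file names are placeholders for your own configuration.

import json

# Load the rank-sets of iteration 0 for n=1000, alpha=0.05 (placeholder path).
with open("experiments_base_dir/n1000_a05/0.json") as f:
    ranksets = json.load(f)

for method, models in ranksets.items():
    for model, (low, up) in models.items():
        print(f"{method}: {model} is ranked between {low} and {up}")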

Plots

First, run the experiments via llm-ranking.py using config.json.

Then, install the plot code requirements:

pip install -r plots/plot_requirements.txt

Then, run:

python3 plots/create_plots.py

Figures 3, 4, 9 and 10 are stored in folder plots/ranksets.
Figures 1, 2, 6, 7 and 8 are stored in folder plots/intersect_size.

Citation

If you use parts of the code in this repository for your own research purposes, please consider citing:

@article{chatzi2024predictionpowered,
  title={Prediction-Powered Ranking of Large Language Models},
  author={Ivi Chatzi and Eleni Straitouri and Suhas Thejaswi and Manuel Gomez Rodriguez},
  year={2024},
  journal={arXiv preprint arXiv:2402.17826}
}
