Prediction-Powered Ranking

This repository contains the code for the paper Prediction-Powered Ranking of Large Language Models.

Dependencies

All the code is written in Python 3.11.2. To create a virtual environment and install the project dependencies, run the following commands:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Usage

python3 scripts/llm-ranking.py <file_config>

where

  • file_config: JSON file of configuration parameters.
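
For example, using the configuration file provided in the repository:

python3 scripts/llm-ranking.py config.json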

Configuration parameters (an example sketch follows the list):

  • seed: seed used for random sampling.
  • iterations: number of times each experiment is run.
  • human_file: dataset containing pairwise comparisons by humans.
  • llm_files: list of datasets containing pairwise comparisons by strong LLMs (one file per LLM).
  • experiments_base_dir: folder where the output will be stored.
  • judges: list of names of the strong LLMs (in the same order as their corresponding files in llm_files).
  • n: number of comparisons to subsample from human_file.
  • alpha: error probability parameter.
  • ignore_ties: default 0. If 1, comparisons whose verdict is a tie are ignored.
  • methods: list of methods used to construct rank-sets, among baseline, human only, llm and ppr.
  • models: list of models to be ranked. If [], all models in human_file are ranked.
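
For illustration, here is a minimal configuration sketch. The file names and values are hypothetical placeholders; in particular, we assume n and alpha accept lists of values, since the output section below describes one folder per combination of n and alpha.

{
    "seed": 42,
    "iterations": 100,
    "human_file": "data/human_comparisons.json",
    "llm_files": ["data/gpt4_comparisons.json"],
    "experiments_base_dir": "experiments/",
    "judges": ["gpt4"],
    "n": [1000],
    "alpha": [0.05],
    "ignore_ties": 0,
    "methods": ["baseline", "human only", "llm", "ppr"],
    "models": []
}

Refer to config.json in the repository for the exact parameter values used in the paper's experiments.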

Structure

The file config.json contains the configuration parameters we used for our experimentation.

The folder data contains the datasets used for our experimentation.

The folder scripts contains the code to construct rank-sets and run experiments.

The folder plots contains the code to create the plots.

Output

The results are stored in the directory experiments_base_dir. For every combination of values in n and alpha, a child folder is created inside experiments_base_dir. For example, for n=1000 and alpha=0.05, the folder experiments_base_dir/n1000_a05 is created.
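
As a hypothetical sketch (not the repository's code), the naming convention can be reproduced by dropping the "0." prefix of alpha:

import os

# Hypothetical sketch, not the repository's code: compose a child folder
# name such as "n1000_a05" from n=1000 and alpha=0.05.
def output_dir(base_dir, n, alpha):
    alpha_suffix = str(alpha).split(".")[1]  # "0.05" -> "05"
    return os.path.join(base_dir, f"n{n}_a{alpha_suffix}")

print(output_dir("experiments_base_dir", 1000, 0.05))  # experiments_base_dir/n1000_a05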

Inside each child folder, multiple JSON files are created (one per iteration). Each JSON file is named x.json, where x is the iteration number. These files contain the rank-sets of their respective iteration, in the following format:

{
    "method 1":   { "model 1": [low rank, up rank],
                    ...
                    "model k": [low rank, up rank]
                  },
    ...
    "method m":   { "model 1": [low rank, up rank],
                    ...
                    "model k": [low rank, up rank]
                  }
}
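
As a minimal sketch, one iteration's rank-sets can be read back as follows; the folder and file names are placeholders for your own configuration.

import json

# Load the rank-sets of iteration 0 for n=1000, alpha=0.05 (placeholder path).
with open("experiments_base_dir/n1000_a05/0.json") as f:
    ranksets = json.load(f)

for method, models in ranksets.items():
    for model, (low, up) in models.items():
        print(f"{method}: {model} is ranked between {low} and {up}")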

Plots

First, run the experiments via llm-ranking.py using config.json.

Then, install the plot code requirements:

pip install -r plots/plot_requirements.txt

Then, run:

python3 plots/create_plots.py

Figures 3, 4, 9 and 10 are stored in folder plots/ranksets.
Figures 1, 2, 6, 7 and 8 are stored in folder plots/intersect_size.

Citation

If you use parts of the code in this repository for your own research purposes, please consider citing:

@article{chatzi2024predictionpowered,
  title={Prediction-Powered Ranking of Large Language Models},
  author={Ivi Chatzi and Eleni Straitouri and Suhas Thejaswi and Manuel Gomez Rodriguez},
  year={2024},
  journal={arXiv preprint arXiv:2402.17826}
}
