Add hallucination multicalibrator, with example benchmark #152

Merged on Nov 20, 2023 (74 commits)
Changes from 70 commits
52e96ea
edit installation instructions in readme
gianlucadetommaso May 15, 2023
5e0076d
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso May 15, 2023
4c7fd28
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso May 15, 2023
6cb6581
bump up version
gianlucadetommaso May 15, 2023
1b39780
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso May 16, 2023
cb2b49a
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso May 16, 2023
14e3ca4
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso May 25, 2023
580067d
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso May 27, 2023
048ef09
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 2, 2023
ad542a4
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 12, 2023
41417c1
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 12, 2023
64be374
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 14, 2023
a2d0f34
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 14, 2023
66bba06
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 15, 2023
911aa82
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 15, 2023
01f959b
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 15, 2023
79f8dca
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 15, 2023
4dea50f
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jun 21, 2023
1ced008
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 18, 2023
6992692
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 18, 2023
b2540c1
make small change in readme because of publish to pypi error
gianlucadetommaso Jul 18, 2023
2362998
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 18, 2023
6e030f2
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 25, 2023
9bd6f67
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 25, 2023
c5bc94f
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 25, 2023
d3ab46b
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 26, 2023
0e2aca5
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 26, 2023
9520273
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 30, 2023
e9c4108
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 30, 2023
bc64a01
bump up version
gianlucadetommaso Jul 30, 2023
25072da
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 30, 2023
e27b378
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Jul 30, 2023
a175e16
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 1, 2023
6e202f1
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 1, 2023
635e7c9
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 9, 2023
8e23b32
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 16, 2023
f5efef8
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 24, 2023
958b245
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 24, 2023
577d169
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 28, 2023
69a454e
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 30, 2023
6e880ba
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Aug 30, 2023
f606545
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 11, 2023
63e09bb
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 11, 2023
b2402b5
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 12, 2023
591d842
refactor tabular analysis of benchmarks
gianlucadetommaso Sep 13, 2023
3dcf217
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 13, 2023
d1b5b4a
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 18, 2023
b4c161e
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 21, 2023
744dff1
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 21, 2023
a22f97f
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 24, 2023
fffdd76
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 26, 2023
c23d16d
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 26, 2023
1cb2917
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 27, 2023
9c1d07a
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Sep 29, 2023
4b83638
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 10, 2023
610fc37
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 10, 2023
e5b67ba
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 10, 2023
1f03d4e
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 10, 2023
d49ed29
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 11, 2023
8200e42
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 19, 2023
882733b
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 19, 2023
c8ca7e6
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 27, 2023
b1e67fc
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 30, 2023
e6b8c85
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Oct 30, 2023
2197430
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Nov 7, 2023
8a5dfdd
copy embeddings during normalization
gianlucadetommaso Nov 8, 2023
742954d
add hallucination multicalibrator
gianlucadetommaso Nov 16, 2023
078e275
Merge branch 'main' of https://github.com/awslabs/fortuna
gianlucadetommaso Nov 16, 2023
abe2eec
Merge branch 'main' into grouping2
gianlucadetommaso Nov 16, 2023
86d6ec5
improve type hinting
gianlucadetommaso Nov 16, 2023
75d4f7c
small refactoring of hallucination multicalibrator
gianlucadetommaso Nov 16, 2023
ea14d25
batchify processing of multiple answers for speedup
gianlucadetommaso Nov 17, 2023
dbe8ecd
fix embedding dimension
gianlucadetommaso Nov 17, 2023
493b020
change max number of clusters
gianlucadetommaso Nov 20, 2023
114 changes: 114 additions & 0 deletions benchmarks/hallucination/mmlu/run.py
@@ -0,0 +1,114 @@
import pickle
from string import ascii_uppercase as auc

from datasets import (
    get_dataset_config_names,
    load_dataset,
)
import numpy as np
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
)

from fortuna.hallucination import HallucinationMulticalibrator
from fortuna.hallucination.utils import string_cleaner
from fortuna.metric.classification import accuracy

SEED = 0
CALIB_FRAC = 0.8

if __name__ == "__main__":
    device = "cuda"
    model_id = "gpt2-large"
    model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
    tokenizer = GPT2TokenizerFast.from_pretrained(model_id)

    # download and prepare data
    task_list = get_dataset_config_names("lukaemon/mmlu")
    dataset_list = [
        (
            load_dataset(
                "lukaemon/mmlu",
                task,
            ),
            task,
        )
        for task in task_list
    ]

    answer_map = {a: i for i, a in enumerate(auc)}
Collaborator:
The choices seem to be in list("ABCD"). Why does the answer map have more options?

Contributor Author:
It tries to be more generic, but you're right, it can be restricted to "ABCD".
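
The restriction discussed in this thread could look like the following. This is a minimal sketch, not the change actually committed: it assumes the targets are always one of "A" through "D" (as the `choices` construction in this script implies), and `CHOICE_LETTERS` is a hypothetical name that does not appear in the PR.

```python
# Hypothetical constant (not in the PR): the four option letters used by the benchmark.
CHOICE_LETTERS = "ABCD"

# Map each option letter to its positional index, e.g. "A" -> 0, "D" -> 3.
answer_map = {letter: index for index, letter in enumerate(CHOICE_LETTERS)}
```

With this mapping, a target outside "A" to "D" raises a `KeyError` instead of silently mapping to an index that has no corresponding choice.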

    samples = []
    for datasets, task in dataset_list:
        for dataset_key, dataset in datasets.items():
            for sample in dataset:
                samples.append(
                    dict(
                        question=string_cleaner(sample["input"]),
                        choices=[sample[letter] for letter in ["A", "B", "C", "D"]],
                        targets=answer_map[sample["target"]],
                    )
                )

    # shuffle and split
    rng = np.random.default_rng(seed=SEED)
    tot_size = len(samples)
    perm = rng.choice(tot_size, tot_size, replace=False)
    samples = [samples[i] for i in perm]

    calib_size = int(np.ceil(CALIB_FRAC * tot_size))
    calib_choices, calib_questions, calib_targets = [], [], []
    test_choices, test_questions, test_targets = [], [], []
    for i, sample in enumerate(samples):
        if i < calib_size:
            calib_questions.append(sample["question"])
            calib_choices.append(sample["choices"])
            calib_targets.append(sample["targets"])
        else:
            test_questions.append(sample["question"])
            test_choices.append(sample["choices"])
            test_targets.append(sample["targets"])

    # calibrate
    calibrator = HallucinationMulticalibrator(
        generative_model=model, tokenizer=tokenizer
    )

    status = calibrator.fit(
        texts=calib_choices,
        contexts=calib_questions,
        targets=calib_targets,
    )

    with open("fitted_calibrator.pth", "wb") as filehandler:
        pickle.dump(calibrator, filehandler, -1)

    # test
    test_probs = calibrator.predict_proba(
        texts=test_choices, contexts=test_questions, calibrate=False
    )
    test_preds = calibrator.predict(
        texts=test_choices, contexts=test_questions, probs=test_probs
    )

    calib_test_probs = calibrator.predict_proba(
        texts=test_choices, contexts=test_questions
    )
    calib_test_preds = calibrator.predict(
        texts=test_choices, contexts=test_questions, probs=calib_test_probs
    )

    # measure
    mse_before = calibrator.multicalibrator.mean_squared_error(
        probs=test_probs, targets=np.array(test_targets)
    )
    acc_before = accuracy(test_preds, np.array(test_targets))
    mse_after = calibrator.multicalibrator.mean_squared_error(
        probs=calib_test_probs, targets=np.array(test_targets)
    )
    acc_after = accuracy(calib_test_preds, np.array(test_targets))

    print(f"MSE before calibration: {round(float(mse_before), 4)}.")
    print(f"Accuracy before calibration: {round(float(acc_before), 4)}.")
    print(f"MSE after calibration: {round(float(mse_after), 4)}.")
    print(f"Accuracy after calibration: {round(float(acc_after), 4)}.")
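
The shuffle-and-split step in `run.py` can be sketched as a standalone helper, which makes the split easier to check in isolation. This is an illustration under assumptions rather than code from the PR: `shuffle_and_split` is a hypothetical name, and it uses `rng.permutation`, which is not guaranteed to produce the same ordering as the `rng.choice(..., replace=False)` call in the script for the same seed.

```python
import numpy as np


def shuffle_and_split(samples, calib_frac, seed):
    """Shuffle `samples` with a seeded generator and split them in two.

    Hypothetical helper (not part of the PR). The first part holds
    ceil(calib_frac * len(samples)) items for calibration; the rest is for testing.
    """
    rng = np.random.default_rng(seed=seed)
    perm = rng.permutation(len(samples))
    shuffled = [samples[i] for i in perm]
    calib_size = int(np.ceil(calib_frac * len(samples)))
    return shuffled[:calib_size], shuffled[calib_size:]


# Toy usage: ten integers stand in for the benchmark samples.
calib, test = shuffle_and_split(list(range(10)), calib_frac=0.8, seed=0)
```

Because the generator is seeded, repeated calls with the same arguments return the same partition, which keeps the calibration and test sets reproducible across runs.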
3 changes: 2 additions & 1 deletion fortuna/hallucination/__init__.py
@@ -1,2 +1,3 @@
from fortuna.hallucination.embedding import EmbeddingManager
from fortuna.hallucination.base import HallucinationMulticalibrator
from fortuna.hallucination.grouping.clustering.base import GroupingModel
from fortuna.hallucination.scoring.inv_perplexity import inv_perplexity