New Alternative to LLM-as-a-judge! #24

milangritta · 2024-10-21T19:09:25Z

Hello Clementine and the Evaluation Community,

We would like to introduce our new metric, HumanRankEval, an alternative to the popular LLM-as-a-judge approach. Instead of using an LLM to judge machine-generated text, we use human-generated text to 'judge' the LLM! :) Please take a look and let us know what you think. Thank you very much!

NAACL '24 PAPER LINK: https://aclanthology.org/2024.naacl-long.456/
CODE: https://github.com/huawei-noah/noah-research/tree/master/NLP/HumanRankEval
DATA: https://huggingface.co/datasets/huawei-noah/human_rank_eval
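
To give a rough feel for the idea, here is a minimal sketch, not the code from the repository above. It assumes the model assigns a score to each human-written answer to a question and that agreement with human preference (e.g. vote counts) is measured with a Pearson correlation. Both choices, and the names `agreement_with_humans`, `score_fn`, and `toy_score`, are assumptions made for illustration; please see the paper for the actual definition.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def agreement_with_humans(question, answers, human_votes, score_fn):
    """Correlate model scores for human-written answers with human votes.

    score_fn(question, answer) -> float is a hypothetical hook; in practice
    it could be an LLM's log-likelihood of the answer given the question.
    """
    model_scores = [score_fn(question, answer) for answer in answers]
    return pearson(model_scores, human_votes)


def toy_score(question, answer):
    """Stand-in scorer (assumption): counts answer words that appear in the question."""
    question_words = set(question.lower().split())
    return float(sum(word in question_words for word in answer.lower().split()))


if __name__ == "__main__":
    question = "is a list or a tuple mutable"
    answers = ["a list is mutable a tuple is not", "no idea"]
    human_votes = [10, 0]
    print(agreement_with_humans(question, answers, human_votes, toy_score))
```

A higher correlation means the model's preferences over human-written answers line up more closely with how people ranked them; swapping `toy_score` for a real model score is the only change needed to try this on actual data.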
