```python
# Copyright (c) Alibaba, Inc. and its affiliates.
"""
1. Installation
   EvalScope: pip install evalscope[opencompass]
2. Download dataset to data/ folder
   wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
   unzip OpenCompassData-core-20240207.zip
3. Deploy model serving
   swift deploy --model_type qwen2-1_5b-instruct
4. Run eval task
"""
from evalscope.backend.opencompass import OpenCompassBackendManager
from evalscope.run import run_task
from evalscope.summarizer import Summarizer


def run_swift_eval():
    # List all datasets
    # e.g. ['mmlu', 'WSC', 'DRCD', 'chid', 'gsm8k', 'AX_g', 'BoolQ', 'cmnli', 'ARC_e', 'ocnli_fc',
    #       'summedits', 'MultiRC', 'GaokaoBench', 'obqa', 'math', 'agieval', 'hellaswag', 'RTE',
    #       'race', 'ocnli', 'strategyqa', 'triviaqa', 'WiC', 'COPA', 'piqa', 'nq', 'mbpp', 'csl',
    #       'Xsum', 'CB', 'tnews', 'ARC_c', 'afqmc', 'eprstmt', 'ReCoRD', 'bbh', 'CMRC', 'AX_b',
    #       'siqa', 'storycloze', 'humaneval', 'cluewsc', 'winogrande', 'lambada', 'ceval', 'bustm',
    #       'C3', 'lcsts']
    print(f"** All datasets from OpenCompass backend: {OpenCompassBackendManager.list_datasets()}")

    # Prepare the config
    """
    Attributes:
        `eval_backend`: Defaults to 'OpenCompass'
        `datasets`: list, refer to `OpenCompassBackendManager.list_datasets()`
        `models`: list of dict, each dict must contain `path` and `openai_api_base`
            `path`: reuse the value of '--model_type' in the command line `swift deploy`
            `openai_api_base`: the base URL of swift model serving
        `work_dir`: str, the directory to save the evaluation results, logs and summaries.
            Defaults to 'outputs/default'

        Refer to `opencompass.cli.arguments.ApiModelConfig` for other optional attributes.
    """
    # Option 1: Use dict format
    # Args:
    #   path: The path of the model, i.e. the swift `model_type`, e.g. 'llama3-8b-instruct'
    #   is_chat: True for chat model, False for base model
    #   key: The OpenAI api-key of the model api, defaults to 'EMPTY'
    #   openai_api_base: The base URL of the OpenAI API, i.e. the swift model serving URL.
    task_cfg = dict(
        eval_backend="OpenCompass",
        eval_config={
            "datasets": ["winogrande"],
            "models": [
                {
                    "path": "qwen2-7b-instruct",  # Please make sure the model is deployed
                    "openai_api_base": "http://127.0.0.1:8000/v1/chat/completions",
                    "is_chat": True,
                    "batch_size": 16,
                },
            ],
            "work_dir": "outputs/qwen2_eval_result",
            "limit": 10,
        },
    )

    # Option 2: Use yaml file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.yaml'

    # Option 3: Use json file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.json'

    # Run task
    run_task(task_cfg=task_cfg)

    # [Optional] Get the final report with summarizer
    print(">> Start to get the report with summarizer ...")
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f"\n>> The report list: {report_list}")


if __name__ == "__main__":
    run_swift_eval()
```
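A minimal sketch for the ten-dataset case, assuming the dict-format config above accepts any list of names returned by `OpenCompassBackendManager.list_datasets()`. The helper `build_task_cfg` is hypothetical (not part of EvalScope): it checks the requested names against that list before anything slow starts, then assembles `eval_config` with the same keys the example uses (`datasets`, `models`, `work_dir`, `limit`).

```python
from typing import Any, Dict, Iterable, List, Optional


def build_task_cfg(
    datasets: List[str],
    available: Iterable[str],
    model_path: str,
    openai_api_base: str,
    work_dir: str,
    limit: Optional[int] = None,
    batch_size: int = 16,
) -> Dict[str, Any]:
    """Hypothetical helper: build the dict-format task config for several datasets.

    `available` is meant to be OpenCompassBackendManager.list_datasets();
    unknown names are rejected up front instead of failing mid-run.
    """
    known = set(available)
    unknown = [name for name in datasets if name not in known]
    if unknown:
        raise ValueError(f"unknown dataset names: {unknown}")

    eval_config: Dict[str, Any] = {
        # dict.fromkeys drops duplicate names while keeping the original order
        "datasets": list(dict.fromkeys(datasets)),
        "models": [
            {
                "path": model_path,  # same value as --model_type in `swift deploy`
                "openai_api_base": openai_api_base,
                "is_chat": True,
                "batch_size": batch_size,
            }
        ],
        "work_dir": work_dir,
    }
    if limit is not None:
        # same `limit` key as in the example config
        eval_config["limit"] = limit
    return dict(eval_backend="OpenCompass", eval_config=eval_config)


if __name__ == "__main__":
    ten_datasets = ["mmlu", "ceval", "gsm8k", "ARC_c", "ARC_e",
                    "hellaswag", "winogrande", "piqa", "BoolQ", "race"]
    # In a real run, pass OpenCompassBackendManager.list_datasets() as `available`.
    cfg = build_task_cfg(
        ten_datasets,
        available=ten_datasets,
        model_path="qwen2-7b-instruct",
        openai_api_base="http://127.0.0.1:8000/v1/chat/completions",
        work_dir="outputs/qwen2_eval_result",
        limit=10,
    )
    print(len(cfg["eval_config"]["datasets"]))  # 10
```

The returned dict has the same shape as the hand-written `task_cfg`, so it can be passed to `run_task(task_cfg=...)`; validating names first means a typo is caught before a long evaluation starts rather than partway through it.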
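For the code above, I want to evaluate 10 datasets and keep all of the downloaded data in a single, shared directory. How do I specify that?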
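On the shared data directory: the example's docstring only says to unzip the datasets into a `data/` folder and does not show a config key for the dataset location. The sketch below assumes, without confirmation from the source, that the OpenCompass backend reads datasets from `./data` relative to the directory the evaluation is launched from, and works around that by linking one shared copy into each launch directory. `link_shared_data_dir` is a hypothetical helper, not an EvalScope API.

```python
from pathlib import Path


def link_shared_data_dir(shared_data_dir: str, run_dir: str = ".") -> Path:
    """Hypothetical helper: make `<run_dir>/data` a symlink to a shared dataset dir.

    ASSUMPTION (not confirmed by the source): the OpenCompass backend resolves
    datasets from `data/` relative to the directory the eval is launched from.
    """
    shared = Path(shared_data_dir).resolve()
    if not shared.is_dir():
        raise FileNotFoundError(f"shared data dir not found: {shared}")

    link = Path(run_dir) / "data"
    if link.is_symlink():
        # Already linked: accept it only if it points at the same shared dir.
        if link.resolve() != shared:
            raise FileExistsError(f"{link} already links to {link.resolve()}")
        return link
    if link.exists():
        # A real ./data folder is left untouched rather than overwritten.
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(shared, target_is_directory=True)
    return link
```

For example, call `link_shared_data_dir("/path/to/shared/data")` (placeholder path) once in the launch directory before `run_task(...)`. If the `./data` assumption does not hold for your EvalScope version, look in its documentation for a dataset-path option instead. On Windows, creating symlinks may require Developer Mode or administrator rights.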