Testing with an OpenAI server produces inference results, but the eval output is empty #208

Open
9 tasks
HaoWuSR opened this issue Nov 22, 2024 · 2 comments

@HaoWuSR

HaoWuSR commented Nov 22, 2024

Issue Description

Please briefly describe the issue you encountered.

Tools Used

  • Native framework
  • [✅] OpenCompass backend
  • VLMEvalKit backend
  • RAGEval backend
  • Perf (model inference stress-testing tool)
  • Arena mode

Code or Commands Executed

Please provide the main code or commands you executed. For example:

from evalscope.run import run_task
from evalscope.summarizer import Summarizer

task_cfg_dict = dict(
    eval_backend='OpenCompass',
    eval_config={
        'datasets': ['gsm8k'],
        'models': [
            {'path': '/workspace/models/Llama-2-13b-chat-hf', 
            'openai_api_base': 'http://127.0.0.1:8008/v1/chat/completions', 
            'is_chat': True,
            'batch_size': 16},
        ],
        'work_dir': 'outputs/llama-2-13b-chat-hf',
        'limit': None,
        },
    )

def run_eval():
    # Option 1: Python dict
    task_cfg = task_cfg_dict

    # Option 2: YAML config file
    # task_cfg = 'eval_openai_api.yaml'

    # Option 3: JSON config file
    # task_cfg = 'eval_openai_api.json'
    # print(task_cfg)

    run_task(task_cfg=task_cfg)

    print('>> Start to get the report with summarizer ...')
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f'\n>> The report list: {report_list}')

run_eval()
# Terminal command executed
python script.py

Error Log

Please paste the full error log or console output. For example:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset                                 version    metric    mode    /workspace/models/Llama-2-13b-chat-hf
--------------------------------------  ---------  --------  ------  ---------------------------------------
--------- 考试 Exam ---------           -          -         -       -
ceval                                   -          -         -       -
agieval                                 -          -         -       -
mmlu                                    -          -         -       -
GaokaoBench                             -          -         -       -
ARC-c                                   -          -         -       -
--------- 语言 Language ---------       -          -         -       -
WiC                                     -          -         -       -
summedits                               -          -         -       -
chid-dev                                -          -         -       -
afqmc-dev                               -          -         -       -
bustm-dev                               -          -         -       -
cluewsc-dev                             -          -         -       -
WSC                                     -          -         -       -
winogrande                              -          -         -       -
flores_100                              -          -         -       -
--------- 知识 Knowledge ---------      -          -         -       -
BoolQ                                   -          -         -       -
commonsense_qa                          -          -         -       -
nq                                      -          -         -       -
triviaqa                                -          -         -       -
--------- 推理 Reasoning ---------      -          -         -       -
cmnli                                   -          -         -       -
ocnli                                   -          -         -       -
ocnli_fc-dev                            -          -         -       -
AX_b                                    -          -         -       -
AX_g                                    -          -         -       -
CB                                      -          -         -       -
RTE                                     -          -         -       -
story_cloze                             -          -         -       -
COPA                                    -          -         -       -
ReCoRD                                  -          -         -       -
hellaswag                               -          -         -       -
piqa                                    -          -         -       -
siqa                                    -          -         -       -
strategyqa                              -          -         -       -
math                                    -          -         -       -
gsm8k                                   -          -         -       -
TheoremQA                               -          -         -       -
openai_humaneval                        -          -         -       -
mbpp                                    -          -         -       -
bbh                                     -          -         -       -
--------- 理解 Understanding ---------  -          -         -       -
C3                                      -          -         -       -
CMRC_dev                                -          -         -       -
DRCD_dev                                -          -         -       -
MultiRC                                 -          -         -       -
race-middle                             -          -         -       -
race-high                               -          -         -       -
openbookqa_fact                         -          -         -       -
csl_dev                                 -          -         -       -
lcsts                                   -          -         -       -
Xsum                                    -          -         -       -
eprstmt-dev                             -          -         -       -
lambada                                 -          -         -       -
tnews-dev                               -          -         -       -

Runtime Environment

  • Operating System:

    • Windows
    • macOS
    • [✅] Ubuntu
  • Python Version:

    • 3.11
    • [✅] 3.10
    • 3.9

Additional Information

If there is any other relevant information, please provide it here.

@HaoWuSR
Author

HaoWuSR commented Nov 22, 2024

Below is the .out file found under the model path:
`11/21 15:20:16 - OpenCompass - INFO - Task [/workspace/models/Llama-2-13b-chat-hf/gsm8k]
11/21 15:20:24 - OpenCompass - WARNING - Max Completion tokens for /workspace/models/Llama-2-13b-chat-hf is :16384
11/21 15:20:26 - OpenCompass - INFO - Start inferencing [/workspace/models/Llama-2-13b-chat-hf/gsm8k]
[2024-11-21 15:20:27,222] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-11-21 15:20:27,222] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...

0%| | 0/83 [00:00<?, ?it/s]
1%| | 1/83 [00:12<17:28, 12.78s/it]
2%|▏ | 2/83 [00:24<16:26, 12.18s/it]
4%|▎ | 3/83 [00:36<16:00, 12.01s/it]
5%|▍ | 4/83 [00:48<15:58, 12.14s/it]
6%|▌ | 5/83 [01:01<16:10, 12.44s/it]
7%|▋ | 6/83 [01:14<16:03, 12.51s/it]
8%|▊ | 7/83 [01:26<15:44, 12.43s/it]
10%|▉ | 8/83 [01:38<15:17, 12.24s/it]
11%|█ | 9/83 [01:50<15:09, 12.29s/it]
12%|█▏ | 10/83 [02:03<15:08, 12.45s/it]
13%|█▎ | 11/83 [02:16<15:01, 12.53s/it]
14%|█▍ | 12/83 [02:28<14:42, 12.43s/it]
16%|█▌ | 13/83 [02:41<14:47, 12.68s/it]
17%|█▋ | 14/83 [02:53<14:09, 12.31s/it]
18%|█▊ | 15/83 [03:05<13:47, 12.17s/it]
19%|█▉ | 16/83 [03:17<13:36, 12.19s/it]
20%|██ | 17/83 [03:29<13:24, 12.19s/it]
22%|██▏ | 18/83 [03:43<13:41, 12.64s/it]
23%|██▎ | 19/83 [03:56<13:32, 12.70s/it]
24%|██▍ | 20/83 [04:08<13:07, 12.50s/it]
25%|██▌ | 21/83 [04:20<12:49, 12.41s/it]
27%|██▋ | 22/83 [04:32<12:31, 12.32s/it]
28%|██▊ | 23/83 [04:44<12:10, 12.18s/it]
29%|██▉ | 24/83 [04:56<12:05, 12.29s/it]
30%|███ | 25/83 [05:10<12:09, 12.59s/it]
31%|███▏ | 26/83 [05:22<12:00, 12.65s/it]
33%|███▎ | 27/83 [05:36<12:00, 12.86s/it]
34%|███▎ | 28/83 [05:49<11:48, 12.87s/it]
35%|███▍ | 29/83 [06:02<11:40, 12.97s/it]
36%|███▌ | 30/83 [06:15<11:24, 12.91s/it]
37%|███▋ | 31/83 [06:27<11:05, 12.79s/it]
39%|███▊ | 32/83 [06:39<10:38, 12.52s/it]
40%|███▉ | 33/83 [06:51<10:21, 12.44s/it]
41%|████ | 34/83 [07:04<10:15, 12.56s/it]
42%|████▏ | 35/83 [07:17<10:06, 12.64s/it]
43%|████▎ | 36/83 [07:29<09:42, 12.38s/it]
45%|████▍ | 37/83 [07:40<09:18, 12.15s/it]
46%|████▌ | 38/83 [07:53<09:16, 12.37s/it]
47%|████▋ | 39/83 [08:06<09:11, 12.52s/it]
48%|████▊ | 40/83 [08:18<08:53, 12.41s/it]
49%|████▉ | 41/83 [08:31<08:42, 12.43s/it]
51%|█████ | 42/83 [08:44<08:38, 12.64s/it]
52%|█████▏ | 43/83 [08:55<08:14, 12.35s/it]
53%|█████▎ | 44/83 [09:08<08:06, 12.47s/it]
54%|█████▍ | 45/83 [09:20<07:47, 12.30s/it]
55%|█████▌ | 46/83 [09:33<07:43, 12.51s/it]
57%|█████▋ | 47/83 [09:46<07:31, 12.54s/it]
58%|█████▊ | 48/83 [09:59<07:24, 12.71s/it]
59%|█████▉ | 49/83 [10:11<07:03, 12.45s/it]
60%|██████ | 50/83 [10:23<06:53, 12.54s/it]
61%|██████▏ | 51/83 [10:37<06:46, 12.71s/it]
63%|██████▎ | 52/83 [10:49<06:28, 12.52s/it]
64%|██████▍ | 53/83 [11:02<06:22, 12.73s/it]
65%|██████▌ | 54/83 [11:14<06:04, 12.57s/it]
66%|██████▋ | 55/83 [11:26<05:44, 12.31s/it]
67%|██████▋ | 56/83 [11:38<05:35, 12.42s/it]
69%|██████▊ | 57/83 [11:50<05:18, 12.27s/it]
70%|██████▉ | 58/83 [12:03<05:08, 12.33s/it]
71%|███████ | 59/83 [12:16<05:01, 12.56s/it]
72%|███████▏ | 60/83 [12:28<04:44, 12.36s/it]
73%|███████▎ | 61/83 [12:40<04:30, 12.30s/it]
75%|███████▍ | 62/83 [12:52<04:16, 12.21s/it]
76%|███████▌ | 63/83 [13:05<04:07, 12.39s/it]
77%|███████▋ | 64/83 [13:17<03:56, 12.47s/it]
78%|███████▊ | 65/83 [13:30<03:46, 12.59s/it]
80%|███████▉ | 66/83 [13:42<03:28, 12.29s/it]
81%|████████ | 67/83 [13:54<03:16, 12.26s/it]
82%|████████▏ | 68/83 [14:07<03:06, 12.46s/it]
83%|████████▎ | 69/83 [14:20<02:55, 12.51s/it]
84%|████████▍ | 70/83 [14:32<02:42, 12.53s/it]
86%|████████▌ | 71/83 [14:44<02:29, 12.43s/it]
87%|████████▋ | 72/83 [14:57<02:17, 12.53s/it]
88%|████████▊ | 73/83 [15:09<02:03, 12.39s/it]
89%|████████▉ | 74/83 [15:21<01:51, 12.34s/it]
90%|█████████ | 75/83 [15:35<01:40, 12.59s/it]
92%|█████████▏| 76/83 [15:47<01:28, 12.65s/it]
93%|█████████▎| 77/83 [16:01<01:16, 12.78s/it]
94%|█████████▍| 78/83 [16:14<01:04, 12.99s/it]
95%|█████████▌| 79/83 [16:27<00:52, 13.05s/it]
96%|█████████▋| 80/83 [16:40<00:39, 13.01s/it]
98%|█████████▊| 81/83 [16:52<00:25, 12.69s/it]
99%|█████████▉| 82/83 [17:04<00:12, 12.59s/it]
100%|██████████| 83/83 [17:12<00:00, 11.18s/it]
100%|██████████| 83/83 [17:12<00:00, 12.44s/it]
11/21 15:37:40 - OpenCompass - INFO - time elapsed: 1043.18s
`
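The log above covers only the inference task, so one way to narrow the problem down is to check whether the work directory holds prediction files but no evaluation result files. The following is a minimal sketch, not part of evalscope or OpenCompass: `summarize_work_dir` is a hypothetical helper, and the assumption that outputs land in folders named `predictions` and `results` somewhere under `work_dir` should be verified against your own run.

```python
from pathlib import Path
from typing import Dict, List


def summarize_work_dir(work_dir: str) -> Dict[str, List[str]]:
    """List JSON files found under any `predictions` or `results` folder.

    Assumption: the backend writes inference output under a folder named
    `predictions` and evaluation output under a folder named `results`.
    """
    root = Path(work_dir)
    found: Dict[str, List[str]] = {"predictions": [], "results": []}
    if not root.is_dir():
        return found
    for kind in found:
        # Search recursively because runs may sit in timestamped subfolders.
        for folder in root.rglob(kind):
            if folder.is_dir():
                files = sorted(str(p.relative_to(root)) for p in folder.rglob("*.json"))
                found[kind].extend(files)
    return found


if __name__ == "__main__":
    # Same work_dir as in the task config from the issue.
    summary = summarize_work_dir("outputs/llama-2-13b-chat-hf")
    for kind, files in summary.items():
        print(f"{kind}: {len(files)} file(s)")
        for name in files:
            print(f"  {name}")
```

If predictions are listed but results are not, that would suggest the evaluation step did not run or failed, which would be consistent with the empty summary table.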

@Yunnglin
Collaborator

Please run pip list | grep ms-opencompass and check which OpenCompass version is installed.
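The same version check can be done from Python, which helps on systems where grep is unavailable. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and including `evalscope` alongside `ms-opencompass` is an assumption about which package versions are useful to report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they would appear in `pip list`.
    for name in ("ms-opencompass", "evalscope"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```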
