[Bug] Results for commonsense_qa and strategyqa are empty #1941

Open
2 tasks done
gxlover0625 opened this issue Mar 13, 2025 · 2 comments

@gxlover0625
Prerequisites

Issue type

I am evaluating with an officially supported task/model/dataset.

Environment

{'CUDA available': True,
 'CUDA_HOME': '/usr/local/cuda',
 'GCC': 'gcc (GCC) 10.2.1 20200825 (Alibaba 10.2.1-3 2.17)',
 'GPU 0,1,2,3': 'NVIDIA H20',
 'MMEngine': '0.10.7',
 'MUSA available': False,
 'NVCC': 'Cuda compilation tools, release 12.4, V12.4.99',
 'OpenCV': '4.11.0',
 'PyTorch': '2.5.1+cu124',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2024.2-Product Build 20240605 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.5.3 (Git Hash '
                              '66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX512\n'
                              '  - CUDA Runtime 12.4\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
                              '  - CuDNN 90.1\n'
                              '  - Magma 2.6.1\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
                              'CUDNN_VERSION=9.1.0, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
                              '-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wsuggest-override '
                              '-Wno-psabi -Wno-error=old-style-cast '
                              '-Wno-missing-braces -fdiagnostics-color=always '
                              '-faligned-new -Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
                              'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
                              'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
                              'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
                              'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
                              'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
 'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]',
 'TorchVision': '0.20.1+cu124',
 'lmdeploy': "not installed:No module named 'lmdeploy'",
 'numpy_random_seed': 2147483648,
 'opencompass': '0.4.1+709bc4a',
 'sys.platform': 'linux',
 'transformers': '4.49.0'}

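Reproduce the problem - code/config sample

None. I ran the evaluation from the command line; see the next section.

Reproduce the problem - command or script

Following the Quick Start section of the documentation, I wrote an evaluation script for commonsenseqa and strategyqa.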

export HF_ENDPOINT=https://hf-mirror.com
export DATASET_SOURCE=ModelScope
# For strategyqa, replace the --datasets value with strategyqa_gen
CUDA_VISIBLE_DEVICES=2 python3 run.py \
    --datasets commonsenseqa_gen \
    --hf-type chat \
    --hf-path /home/admin/workspace/llm/Qwen/Qwen2.5-3B-Instruct \
    --debug

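Reproduce the problem - error message

The program finished normally without any errors, and the output folder is complete.

The commonsense_qa.json file under predictions looks normal, but the csv and md files under summary are wrong: the evaluation results are empty.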

dataset version metric mode Qwen2.5-3B-Instruct_hf
commonsense_qa - - - -
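
A row consisting only of dashes, as above, can also be detected programmatically. The following is a minimal sketch, assuming the summary csv is comma-separated and uses the column names shown in the table; `find_empty_rows` and the sample text are illustrative and not part of OpenCompass.

```python
import csv
import io


def find_empty_rows(csv_text: str) -> list[str]:
    """Return dataset names whose remaining columns are all '-' (no metric recorded)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    empty = []
    for row in reader:
        # Collect every value except the dataset name; tolerate short rows (None values).
        values = [(v or "") for k, v in row.items() if k not in (None, "dataset")]
        if values and all(v.strip() == "-" for v in values):
            empty.append(row["dataset"])
    return empty


# Sample text mirroring the summary above (assumed to be comma-separated on disk).
sample = (
    "dataset,version,metric,mode,Qwen2.5-3B-Instruct_hf\n"
    "commonsense_qa,-,-,-,-\n"
)
print(find_empty_rows(sample))
```

To check a real run, read the csv file from the summary folder and pass its text to `find_empty_rows`.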

Additional information

No response

@tonysy
Collaborator

tonysy commented Mar 13, 2025

Please check the log/eval and the results folder

@gxlover0625
Author

There is one file named commonsense_qa.out in the log/eval folder. The following is the content of the file.

03/13 14:12:16 - OpenCompass - INFO - Try to load the data from /home/adc/.cache/opencompass/./data/commonsenseqa
commonsense_qa train 9741
commonsense_qa validation 1221
03/13 14:12:17 - OpenCompass - INFO - Task [Qwen2.5-3B-Instruct_hf/commonsense_qa]: {}
03/13 14:12:17 - OpenCompass - INFO - time elapsed: 27.98s

I also noticed that the file named commonsense_qa.json in the results/model folder is empty.

Could you please tell me how to solve this problem? It prevents me from getting the evaluation results for commonsense_qa and strategyqa.
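
For anyone who wants to check every result file of a run at once, here is a minimal sketch. It assumes only that the results are stored as .json files somewhere under a results directory; `find_empty_results` is a hypothetical helper, not an OpenCompass API.

```python
import json
from pathlib import Path


def find_empty_results(results_dir: str) -> list[str]:
    """Return paths of .json files under results_dir that are blank or parse to an empty value."""
    empty = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        text = path.read_text(encoding="utf-8").strip()
        try:
            data = json.loads(text) if text else None
        except json.JSONDecodeError:
            # Malformed JSON is a different problem; skip it here.
            continue
        if not data:  # covers a blank file, {}, [] and null
            empty.append(str(path))
    return empty
```

Calling `find_empty_results` on the results folder of a run returns the files that need a closer look, such as the commonsense_qa.json described above.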
