Evaluation results not saved incrementally; restarts from first task on resume #852

@pspdada

Description

Hi, I'm running an evaluation using a script that specifies multiple tasks, but I noticed the following behavior:

After the first task completes successfully, no results are saved to the output_path, and if I interrupt the run (e.g., during the second task), restarting the job re-runs the first task instead of resuming from where it left off.

Is this expected behavior? It seems that results are only written after all tasks finish, rather than being saved incrementally per task.

It would be very helpful if:

  • Results were saved immediately after each task was completed.
  • The evaluator could skip already-completed tasks on restart (e.g., via a --resume flag or based on existing output files).
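Until something like a `--resume` flag exists, a wrapper that invokes the evaluator once per task and records a completion marker can approximate this. A minimal sketch; the marker-file convention and the `run_one_task` stub are my own conventions, not lmms-eval features:

```shell
#!/bin/bash
# Per-task resume sketch: run tasks one at a time and skip any task whose
# completion marker already exists. Replace run_one_task with the real
# `python -m lmms_eval --tasks "$task" ...` invocation.
set -u

TASKS="mmbench_en_dev,pope"
OUTPUT_PATH="$(mktemp -d)"   # stand-in for the real output directory

run_one_task() {
    # Placeholder for the actual evaluation command (assumption, not the
    # real lmms-eval call signature).
    echo "evaluating $1"
}

# Split the comma-separated task list and loop over it.
for task in ${TASKS//,/ }; do
    marker="${OUTPUT_PATH}/${task}.done"
    if [ -f "$marker" ]; then
        echo "skipping ${task} (already completed)"
        continue
    fi
    # Only write the marker if the task succeeded.
    run_one_task "$task" && touch "$marker"
done
```

On restart, any task with an existing marker file is skipped, so an interrupted run resumes at the first unfinished task.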

The scripts I use:

#!/bin/bash

MOUNT_DIR="/root"

CONDA_PATH=/root/miniconda3
CONDA_ENV_NAME=psp-lmms-eval

MODEL_DIR="${MOUNT_DIR}/save_models/opensource/VisionThink/VisionThink-Efficient"
MODEL_NAME="VisionThink-Efficient"
MODEL_CLASS="visionthink_vllm_tool"

BATCH_SIZE=1024
GPU_LIST="4,5,6,7"        # the model has 28 attention heads, so only 4 GPUs work, not 8
LOG_SAMPLES_SUFFIX="vllm" # Specify a suffix for the log_samples file name

TASKS="mmbench_en_dev,pope,realworldqa,mme,mathvista_testmini,mathverse_testmini_vision_only,mmvet"

# Count the number of GPUs in GPU_LIST
TENSOR_PARALLEL_SIZE=$(echo $GPU_LIST | awk -F',' '{print NF}')

# shellcheck disable=SC1091
if ! { source "${CONDA_PATH}/bin/activate" && eval "$(conda shell.bash hook)" && conda activate $CONDA_ENV_NAME; }; then
    exit 1
fi

echo "Successfully activated Conda environment: ${CONDA_ENV_NAME}"

LMMS_EVAL_DATASET_CACHE="${MOUNT_DIR}/dataset/opensource/lmms_eval"
VLLM_CACHE_ROOT="${MOUNT_DIR}/save_models/vllm_cache"
PROJECT_DIR="${MOUNT_DIR}/opensource/lmms-eval"
OUTPUT_PATH="${PROJECT_DIR}/eval_outputs/${MODEL_NAME}"

EVAL_MODEL_NAME="Qwen2.5-VL-72B-Instruct"
# EVAL_MODEL_NAME="api_doubao_Doubao-Seed-1.6-250615_nothink"

API_TYPE="utools_api"

CUR_TIME=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${OUTPUT_PATH}/logs/${CUR_TIME}.log"
mkdir -p "$(dirname "$LOG_FILE")"
echo "Log file: $LOG_FILE"

# Environment variable setup
export http_proxy=""
export https_proxy=""
export HF_DATASETS_OFFLINE=1
export HF_HUB_OFFLINE=1
# self._cache_dir = os.path.join(LMMS_EVAL_HOME, "eval_cache", cache_hash)
export LMMS_EVAL_HOME="${PROJECT_DIR}"
export HF_HOME="$LMMS_EVAL_DATASET_CACHE"
export VLLM_CACHE_ROOT="$VLLM_CACHE_ROOT"
export VLLM_WORKER_MULTIPROC_METHOD="spawn"
export HF_TOKEN="$HF_TOKEN"
export LMMS_EVAL_USE_CACHE=True

# Environment variables for the evaluator
export EVAL_MODEL_NAME="$EVAL_MODEL_NAME"
export API_TYPE="$API_TYPE"

CUDA_VISIBLE_DEVICES=$GPU_LIST python -m lmms_eval \
    --model "$MODEL_CLASS" \
    --model_args "model_version=${MODEL_DIR},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},trust_remote_code=True,max_images=2,prompt=tool_call,enable_tool_call=True,downsample_image=True,max_token=40960" \
    --tasks "${TASKS}" \
    --batch_size "${BATCH_SIZE}" \
    --log_samples \
    --log_samples_suffix "${LOG_SAMPLES_SUFFIX}" \
    --cache_requests "true" \
    --output_path "${OUTPUT_PATH}" \
    --verbosity DEBUG \
    --seed 42 | tee "${LOG_FILE}"

Thanks!
