Evaluation results not saved incrementally; restarts from first task on resume #852

@pspdada

Description

Hi, I'm running an evaluation using a script that specifies multiple tasks, but I noticed the following behavior:

After the first task completes successfully, no results are saved to the output_path, and if I interrupt the run (e.g., during the second task), restarting the job re-runs the first task instead of resuming from where it left off.

Is this expected behavior? It seems that results are only written after all tasks finish, rather than being saved incrementally per task.

It would be very helpful if:

  • Results were saved immediately after each task was completed.
  • The evaluator could skip already-completed tasks on restart (e.g., via a --resume flag or based on existing output files).
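Until something like a `--resume` flag exists, a wrapper that invokes the evaluator once per task and records a completion marker can approximate this. A minimal sketch; the marker-file convention and the `run_one_task` stub are my own conventions, not lmms-eval features:

```shell
#!/bin/bash
# Per-task resume sketch: run tasks one at a time and skip any task whose
# completion marker already exists. Replace run_one_task with the real
# `python -m lmms_eval --tasks "$task" ...` invocation.
set -u

TASKS="mmbench_en_dev,pope"
OUTPUT_PATH="$(mktemp -d)"   # stand-in for the real output directory

run_one_task() {
    # Placeholder for the actual evaluation command (assumption, not the
    # real lmms-eval call signature).
    echo "evaluating $1"
}

# Split the comma-separated task list and loop over it.
for task in ${TASKS//,/ }; do
    marker="${OUTPUT_PATH}/${task}.done"
    if [ -f "$marker" ]; then
        echo "skipping ${task} (already completed)"
        continue
    fi
    # Only write the marker if the task succeeded.
    run_one_task "$task" && touch "$marker"
done
```

On restart, any task with an existing marker file is skipped, so an interrupted run resumes at the first unfinished task.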

The scripts I use:

#!/bin/bash

MOUNT_DIR="/root"

CONDA_PATH=/root/miniconda3
CONDA_ENV_NAME=psp-lmms-eval

MODEL_DIR="${MOUNT_DIR}/save_models/opensource/VisionThink/VisionThink-Efficient"
MODEL_NAME="VisionThink-Efficient"
MODEL_CLASS="visionthink_vllm_tool"

BATCH_SIZE=1024
GPU_LIST="4,5,6,7"        # the model has 28 attention heads, so only 4 GPUs work, not 8
LOG_SAMPLES_SUFFIX="vllm" # Specify a suffix for the log_samples file name

TASKS="mmbench_en_dev,pope,realworldqa,mme,mathvista_testmini,mathverse_testmini_vision_only,mmvet"

# Count the number of GPUs in GPU_LIST
TENSOR_PARALLEL_SIZE=$(echo $GPU_LIST | awk -F',' '{print NF}')

# shellcheck disable=SC1091
if ! { source "${CONDA_PATH}/bin/activate" && eval "$(conda shell.bash hook)" && conda activate $CONDA_ENV_NAME; }; then
    exit 1
fi

echo "Successfully activated Conda environment: ${CONDA_ENV_NAME}"

LMMS_EVAL_DATASET_CACHE="${MOUNT_DIR}/dataset/opensource/lmms_eval"
VLLM_CACHE_ROOT="${MOUNT_DIR}/save_models/vllm_cache"
PROJECT_DIR="${MOUNT_DIR}/opensource/lmms-eval"
OUTPUT_PATH="${PROJECT_DIR}/eval_outputs/${MODEL_NAME}"

EVAL_MODEL_NAME="Qwen2.5-VL-72B-Instruct"
# EVAL_MODEL_NAME="api_doubao_Doubao-Seed-1.6-250615_nothink"

API_TYPE="utools_api"

CUR_TIME=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${OUTPUT_PATH}/logs/${CUR_TIME}.log"
mkdir -p "$(dirname "$LOG_FILE")"
echo "Log file: $LOG_FILE"

# Environment variable setup
export http_proxy=""
export https_proxy=""
export HF_DATASETS_OFFLINE=1
export HF_HUB_OFFLINE=1
# self._cache_dir = os.path.join(LMMS_EVAL_HOME, "eval_cache", cache_hash)
export LMMS_EVAL_HOME="${PROJECT_DIR}"
export HF_HOME="$LMMS_EVAL_DATASET_CACHE"
export VLLM_CACHE_ROOT="$VLLM_CACHE_ROOT"
export VLLM_WORKER_MULTIPROC_METHOD="spawn"
export HF_TOKEN="$HF_TOKEN"
export LMMS_EVAL_USE_CACHE=True

# Environment variables for the evaluator
export EVAL_MODEL_NAME="$EVAL_MODEL_NAME"
export API_TYPE="$API_TYPE"

CUDA_VISIBLE_DEVICES=$GPU_LIST python -m lmms_eval \
    --model "$MODEL_CLASS" \
    --model_args "model_version=${MODEL_DIR},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},trust_remote_code=True,max_images=2,prompt=tool_call,enable_tool_call=True,downsample_image=True,max_token=40960" \
    --tasks "${TASKS}" \
    --batch_size "${BATCH_SIZE}" \
    --log_samples \
    --log_samples_suffix "${LOG_SAMPLES_SUFFIX}" \
    --cache_requests "true" \
    --output_path "${OUTPUT_PATH}" \
    --verbosity DEBUG \
    --seed 42 | tee "${LOG_FILE}"

Thanks!
