Description
Your current environment
(abbreviated)
🐛 Describe the bug
According to vLLM's source code (vllm/entrypoints/llm.py:1746), there is tqdm (progress-bar library) related code that computes a throughput value (see lines 1743–1749 in the traceback below).
That code divides total_in_toks by pbar.format_dict["elapsed"] to compute in_spd (the input-token throughput). However, according to the tqdm documentation, the elapsed field is the number of seconds elapsed since the bar was started. If that value is zero (or effectively zero) at the time of the division, the division raises a ZeroDivisionError.
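For illustration, here is a minimal sketch of the failure mode (not vLLM's actual code), showing how dividing by the elapsed field can blow up:

```python
from tqdm import tqdm

# Minimal sketch of the failure mode, not vLLM's actual code: if the
# bar's "elapsed" value is still 0 when the throughput is computed,
# the division raises ZeroDivisionError.
pbar = tqdm(total=1)
elapsed = pbar.format_dict["elapsed"]  # seconds since the bar started; can be 0
total_in_toks = 1024
in_spd = total_in_toks / elapsed  # ZeroDivisionError whenever elapsed == 0
pbar.close()
```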
In my situation (using the DeepSeek-OCR model via vLLM), all vLLM output is suppressed (via quiet_stdio()), and when I call llm.generate(..., use_tqdm=True) (use_tqdm currently defaults to True) I consistently get a ZeroDivisionError: integer division or modulo by zero with the following traceback:
╭────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────╮
│ /home/grlee/ghrepo/pdfscribe2ds/ocr_pipeline/pipeline.py:72 in run_pdf_pipeline │
│ │
│ 69 │ for img_path in image_paths: │
│ 70 │ │ try: │
│ 71 │ │ │ with quiet.quiet_stdio(): │
│ ❱ 72 │ │ │ │ raw_md = ocr.image_to_markdown(img_path) │
│ 73 │ │ │ │
│ 74 │ │ │ page_stem = img_path.stem # e.g. "page_001" │
│ 75 │ │ │ assets_dir = md_out_dir / f"{page_stem}_assets" │
│ │
│ /home/grlee/ghrepo/pdfscribe2ds/ocr_pipeline/ocr_engine.py:67 in image_to_markdown │
│ │
│ 64 │ │ │ skip_special_tokens=False, │
│ 65 │ │ ) │
│ 66 │ │ │
│ ❱ 67 │ │ outputs = self.llm.generate(model_input, sampling_param) # type: ignore │
│ 68 │ │ text_output = outputs[0].outputs[0].text │
│ 69 │ │ return text_output │
│ 70 │
│ │
│ /home/grlee/ghrepo/pdfscribe2ds/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py:446 in │
│ generate │
│ │
│ 443 │ │ │ priority=priority, │
│ 444 │ │ ) │
│ 445 │ │ │
│ ❱ 446 │ │ outputs = self._run_engine(use_tqdm=use_tqdm) │
│ 447 │ │ return self.engine_class.validate_outputs(outputs, RequestOutput) │
│ 448 │ │
│ 449 │ def _get_modality_specific_lora_reqs( │
│ │
│ /home/grlee/ghrepo/pdfscribe2ds/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py:1746 in │
│ _run_engine │
│ │
│ 1743 │ │ │ │ │ │ │ n = len(output.outputs) │
│ 1744 │ │ │ │ │ │ │ assert output.prompt_token_ids is not None │
│ 1745 │ │ │ │ │ │ │ total_in_toks += len(output.prompt_token_ids) * n │
│ ❱ 1746 │ │ │ │ │ │ │ in_spd = total_in_toks / pbar.format_dict["elapsed"] │
│ 1747 │ │ │ │ │ │ │ total_out_toks += sum( │
│ 1748 │ │ │ │ │ │ │ │ len(stp.token_ids) for stp in output.outputs │
│ 1749 │ │ │ │ │ │ │ ) │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The dump above was generated from the code in my own project. For now I work around the error by setting use_tqdm=False.
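For reference, the workaround in my ocr_engine.py looks roughly like this (variable names as in the traceback above):

```python
# Disabling the progress bar skips the throughput computation that
# divides by pbar.format_dict["elapsed"], so the error no longer occurs.
outputs = self.llm.generate(model_input, sampling_param, use_tqdm=False)
text_output = outputs[0].outputs[0].text
```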
I recommend guarding that division in llm.py by using max(pbar.format_dict["elapsed"], 1e-6) instead of pbar.format_dict["elapsed"] as the divisor, to safely protect against zero.
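A minimal sketch of the proposed guard in _run_engine (variable names taken from the traceback above; the exact placement may differ on the current main branch):

```python
# Clamp elapsed to a small epsilon so the throughput computation can
# never divide by zero; 1e-6 seconds is an arbitrary lower bound.
elapsed = max(pbar.format_dict["elapsed"], 1e-6)
in_spd = total_in_toks / elapsed
```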
Note: I'm currently using vllm==0.11.1rc6.dev107+g878fd5a16.cu129 because there is an ongoing version issue with using the DeepSeek-OCR model through the vLLM framework (#28030). So I'm not sure whether this bug has already been addressed in current development, but at least I couldn't find related information in the issue tab, so I'm filing this one.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.