
An occasionally triggered KeyError exception can cause the entire service to crash. #1496

Open
yinghaodang opened this issue May 14, 2024 · 0 comments

Describe the bug

I use docker-compose to deploy Xinference. Most of the time it works fine, but at some random moment a KeyError is triggered, causing the entire service to fail. Here are my steps.

To Reproduce

version: '3.8'

services:
  xinference-local:
    image: xprobe/xinference:v0.11.0
    container_name: xinference-local
    ports:
      - 9999:9997
    environment:
      - XINFERENCE_MODEL_SRC=modelscope
      - XINFERENCE_HOME=/root/MODEL_PATH
    volumes:
      - /home/ecidi/MODEL_PATH:/root/MODEL_PATH
    restart: always
    shm_size: '512g'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: xinference-local -H 0.0.0.0 --log-level debug
    networks:
      - xinference-local
networks:
  xinference-local:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: "172.30.2.0/24"

This file is saved as xinference-local.yml, and the service is brought up with docker-compose -f xinference-local.yml up -d.
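
For completeness, a small sanity check I use to confirm the service is reachable on the mapped port (my own snippet, not part of the compose file; host port 9999 from the mapping above, adjust if yours differs). It queries the OpenAI-compatible model list endpoint:

# Sanity check: once the container is up, the OpenAI-compatible API on the
# mapped host port should respond and list the launched models.
import requests

resp = requests.get("http://localhost:9999/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())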

After a prolonged period of usage (calls issued from code), the following error appears in the log:

xinference-local  | , generate config: {'temperature': 0.1, 'stream': True, 'stop': ['<|endoftext|>', '<|im_start|>', '<|im_end|>'], 'stop_token_ids': [151643, 151644, 151645]}
xinference-local  | 2024-05-14 08:08:04,096 xinference.core.model 113 DEBUG    After request chat, current serve request count: 0 for the model qwen1.5-chat
xinference-local  | 2024-05-14 08:08:04,097 xinference.core.model 113 DEBUG    Leave wrapped_func, elapsed time: 0 s
xinference-local  | 2024-05-14 08:08:04,099 xinference.api.restful_api 1 ERROR    Chat completion stream got an error: b'\xc5a\xae \t\x1b\xf4\x8a\xa0\x95\x9e\xe4\xc3\xd5\xb1\xc5BG\x11?]:\xb1\x0c\xb8\x83\xb3\x9d\xb7\xa2}0'
xinference-local  | Traceback (most recent call last):
xinference-local  |   File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1365, in stream_results
xinference-local  |     async for item in iterator:
xinference-local  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 335, in __anext__
xinference-local  |     self._actor_ref = await actor_ref(
xinference-local  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 125, in actor_ref
xinference-local  |     return await ctx.actor_ref(*args, **kwargs)
xinference-local  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 197, in actor_ref
xinference-local  |     result = await self._wait(future, actor_ref.address, message)
xinference-local  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
xinference-local  |     return await future
xinference-local  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/core.py", line 87, in _listen
xinference-local  |     future = self._client_to_message_futures[client].pop(message.message_id)
xinference-local  | KeyError: b'\xc5a\xae \t\x1b\xf4\x8a\xa0\x95\x9e\xe4\xc3\xd5\xb1\xc5BG\x11?]:\xb1\x0c\xb8\x83\xb3\x9d\xb7\xa2}0'
xinference-local  | INFO 05-14 08:08:04 metrics.py:229] Avg prompt throughput: 1085.6 tokens/s, Avg generation throughput: 42.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 43.6%, CPU KV cache usage: 0.0%
xinference-local  | INFO 05-14 08:08:04 async_llm_engine.py:120] Finished request 15c42c7e-11c9-11ef-a06b-0242ac1e0202.
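
The failing line in the traceback pops a pending future keyed by the reply's message_id; a minimal sketch of why that raises (plain dict semantics only, not xoscar internals):

# Illustration only: dict.pop(key) without a default raises KeyError when the
# key is absent. In the traceback above, the reply's message_id apparently has
# no pending future registered for that client connection anymore.
pending_futures = {}  # stands in for self._client_to_message_futures[client]
message_id = b"\xc5a\xae ..."  # truncated raw id from the log

try:
    future = pending_futures.pop(message_id)
except KeyError:
    print("no pending future for this message_id -> the KeyError seen above")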

I want to know how this error is triggered. Is it caused by excessive memory usage, or by dirty data in a request?
I deployed the qwen1.5-14b-chat model in its entirety on a single A100. The exception is triggered after roughly 6 hours of continuous calls to the model. Each prompt is different, and after redeployment the same prompt does not trigger the exception.
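
For reference, a rough sketch of my client-side usage pattern (my own test harness, not Xinference code): varied prompts are streamed continuously against the deployed model through the OpenAI-compatible endpoint.

# Rough reproduction sketch: continuously stream chat completions with varied
# prompts. Host port and model uid match the compose file and debug log above.
import itertools
import requests

URL = "http://localhost:9999/v1/chat/completions"

for i in itertools.count():
    payload = {
        "model": "qwen1.5-chat",  # model uid from the debug log
        "messages": [{"role": "user", "content": f"prompt #{i}: ..."}],
        "temperature": 0.1,       # matches the generate config in the log
        "stream": True,
    }
    with requests.post(URL, json=payload, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for _line in resp.iter_lines():
            pass  # consume the SSE stream; the server-side KeyError shows up after hours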

Expected behavior

The model should not get stuck, and even if one model does get stuck, it should not affect the others.

Additional context

GPU memory usage is 38770MiB / 40960MiB, and GPU utilization is roughly 80%-90% based on visual observation.

@XprobeBot XprobeBot added the gpu label May 14, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.1, v0.11.2 May 14, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.3, v0.11.4 May 31, 2024