Multibyte UTF-8 Characters Broken in Streaming Mode (� Substitution) (A5000, H100) #4370

Nurgl opened this issue Feb 27, 2025

Description
When using Triton Inference Server in streaming mode (the generate_stream endpoint) over the REST API or gRPC, multibyte UTF-8 characters (e.g., emojis or non-ASCII characters such as Cyrillic) are split incorrectly and show up as the "�" replacement symbol in the response. The issue is observed across different Triton versions and is visible both in the raw server output (e.g., via curl) and in client-side processing. Notably, the problem does not occur in non-streaming mode, where the response is returned as a single, correctly encoded UTF-8 string.

The problem appears to occur before the decoding stage on the server side, where multibyte UTF-8 sequences (e.g., emojis or Cyrillic characters) are not preserved as complete units during the streaming process. As a result, the text_output field in the response contains "�" for affected characters, and client-side buffering cannot recover the original data since the substitution happens prior to transmission.
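A minimal sketch of the suspected mechanism (illustrative only, not the backend's actual code path; the byte split is a hypothetical example): if each streamed chunk is decoded to text on its own, a character whose UTF-8 encoding straddles two chunks degrades to U+FFFD, while an incremental decoder that buffers the trailing partial sequence reconstructs it.

import codecs

# UTF-8 bytes of "🌟" (U+1F31F) split across two chunks, as can happen when a
# multibyte character straddles a token boundary during streaming.
chunks = [b"\xf0\x9f", b"\x8c\x9f"]

# Decoding each chunk independently loses the character: both halves are
# invalid UTF-8 in isolation and come out as U+FFFD replacement characters.
print([c.decode("utf-8", errors="replace") for c in chunks])

# An incremental decoder buffers the incomplete tail until the remaining
# bytes arrive, so the character survives intact.
decoder = codecs.getincrementaldecoder("utf-8")()
print([decoder.decode(c) for c in chunks])  # ['', '🌟']

The raw server output below (via curl) shows the substitution already present in the text_output field: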

curl -X POST localhost:8000/v2/models/ensemble/generate_stream -d '{
    "text_input": "<|im_start|>system <|im_end|> <|im_start|>user Write a poem using the words click Nutcracker and yoga. Format it with emojis.<|im_end|><|im_start|>assistant ",
    "parameters": {
        "max_tokens": 512,
        "temperature": 0.7,
        "stream": true,
        "top_p": 0.8,
        "top_k": 1,
        "length_penalty": 0.8,
        "beam_width": 1,
        "stop_words": ["<|end|>"]
    }
}'
data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"✨"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"🌟"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" **"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"Click"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":","}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" Nut"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"cr"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"acker"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":","}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" and"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" Yoga"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"**"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" �"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"�"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"�"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"✨"}

data: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\n\n"}

Environment

TensorRT Version: 10.8
NVIDIA GPU: A5000, H100
NVIDIA Driver Version: 570
CUDA Version: 12.8
Operating System: Ubuntu 22.04
Python Version (if applicable): 3.12.3
Model link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct

Steps To Reproduce

Run the Docker container

cd tensorrtllm_backend/
sudo docker run --rm -it -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    --shm-size=4g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
    -v /home/dfraltcov/digital_mind/tensorrtllm_backend:/tensorrtllm_backend \
    triton_trt_llm bash

Create the checkpoint

cd /tensorrtllm_backend/tensorrt_llm/examples/qwen/
python convert_checkpoint.py --model_dir ./Qwen2.5-72B-Instruct \
    --output_dir ./qwen_checkpoint_4gpu_tp4 \
    --dtype bfloat16 --workers 4 --tp_size 4

Build the engines

trtllm-build --checkpoint_dir ./qwen_checkpoint_4gpu_tp4/ \
    --output_dir ./tmp/qwen/72B/16_batch \
    --gpt_attention_plugin bfloat16 --gemm_plugin bfloat16 \
    --remove_input_padding enable --context_fmha enable \
    --kv_cache_type paged --use_paged_context_fmha enable \
    --max_num_tokens 131072 --max_input_len 4096 --max_batch_size 16 \
    --log_level info --monitor_memory --workers 4

Launch the Triton server

cd /tensorrtllm_backend
python3 scripts/launch_triton_server.py --world_size=4 --model_repo=/tensorrtllm_backend/triton_qwen_repo

With the ensemble model deployed on Triton Inference Server, send a streaming request whose output contains multibyte UTF-8 characters (e.g., emojis or Cyrillic):

curl -X POST 192.168.108.68:8000/v2/models/ensemble/generate_stream -d '{
    "text_input": "<|im_start|>system <|im_end|> <|im_start|>user Write a poem using the words click Nutcracker and yoga. Format it with emojis.<|im_end|><|im_start|>assistant ",
    "parameters": {
        "max_tokens": 512,
        "temperature": 0.7,
        "stream": true,
        "top_p": 0.8,
        "top_k": 1,
        "length_penalty": 0.8,
        "beam_width": 1,
        "stop_words": ["<|im_end|>"]
    }
}'