grpc channel stub cannot end request when timeout is exceeded, e.g. timeout=30ms, request ended after 200ms #36712

Closed

johnsondeng opened this issue May 24, 2024 · 7 comments

@johnsondeng commented May 24, 2024

What version of gRPC and what language are you using?

python
grpcio 1.54.0
grpcio-tools 1.54.0

What operating system (Linux, Windows,...) and version?

linux

What runtime / compiler are you using (e.g. python version or version of gcc)

Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux

What did you do?

I wrote a multi-threaded gRPC client demo that sets a 30 ms timeout per request, while the gRPC server at "10.155.1.127:1234" always sleeps for 60 ms. So the expectation is that every request ends with a deadline-exceeded error after about 30 ms. But in practice, after sending lots of requests and grepping the elapsed times for statistics, many requests cost much more than 30 ms.

grpc_server/demo_client.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-


from api import test_pb2, test_pb2_grpc
import os
import sys
import argparse
import logging
import time
import toml
from threading import Thread
from multiprocessing import Process

import grpc


grpc_channel_options = [
    ('grpc.max_send_message_length', -1),
    ('grpc.max_receive_message_length', -1),
    # ('grpc.client_idle_timeout_ms', 10 * 1000)
]
# A single channel shared by all worker threads below.
channel = grpc.insecure_channel("10.155.1.127:1234",
                                grpc_channel_options)


def start():

    request_cnt = 0
    while request_cnt < total:
        try:
            start_time = time.perf_counter()

            # Stubs are cheap to create; every thread reuses the single shared channel.
            stub = test_pb2_grpc.TestStub(channel)
            req = test_pb2.Req(msg="hello")

            if args.api == "echo":
                rsp = stub.Echo(req, timeout=0.03)
            elif args.api == "proxyecho":
                rsp = stub.ProxyEcho(req, timeout=1)
            elif args.api == "redisecho":
                rsp = stub.RedisEcho(req, timeout=1)

            if request_cnt % 1000 == 0:
                logging.info("request cnt %d, rsp %s", request_cnt, rsp.msg)
        except Exception as e:
            # Expected: deadline exceeded after ~30 ms for the "echo" API.
            elapsed = (time.perf_counter() - start_time) * 1000
            print("exception %s, elapsed %f" % (e, elapsed))

        request_cnt += 1


parser = argparse.ArgumentParser()
parser.add_argument("--api",
                    type=str,
                    default="echo",
                    help="echo/redisecho/proxyecho")

if __name__ == "__main__":
    args = parser.parse_args()
    # time.sleep(10000)

    total = 10000000
    worker_num = 32
    threads = [Thread(target=start) for _ in range(worker_num)]

    for th in threads:
        th.start()
    for th in threads:
        th.join()
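
The server side at "10.155.1.127:1234" just sleeps 60 ms and echoes the message back. A rough sketch of such a server (hypothetical: the service name Test and a response type Rsp with a msg field are inferred from the client code above; the real server script is only in the issue.zip attached later in this thread):

#!/usr/bin/env python
# Hypothetical server sketch: every Echo call sleeps 60 ms, so a client
# calling with timeout=0.03 should always hit DEADLINE_EXCEEDED.
import time
from concurrent import futures

import grpc

from api import test_pb2, test_pb2_grpc


class TestServicer(test_pb2_grpc.TestServicer):

    def Echo(self, request, context):
        time.sleep(0.06)  # 60 ms, twice the client's 30 ms deadline
        return test_pb2.Rsp(msg=request.msg)


if __name__ == "__main__":
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=32))
    test_pb2_grpc.add_TestServicer_to_server(TestServicer(), server)
    server.add_insecure_port("[::]:1234")
    server.start()
    server.wait_for_termination()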

Run the program with nohup python grpc_server/demo_client.py for nearly 5 minutes, then print the time cost of the top 10 longest requests:

(screenshot: elapsed times of the top 10 longest requests)

What did you expect to see?

All requests end within 30 ms (with a deadline-exceeded error, since the server always sleeps 60 ms).

What did you see instead?

Some requests only ended after more than 100 ms.

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

See TROUBLESHOOTING.md for how to diagnose problems better.

Anything else we should know about your project / environment?

@johnsondeng (Author) commented:

Any help? @gnossen @XuanWang-Amos @hassox @beccasaurus

@sourabhsinghs commented:

@johnsondeng I wasn't able to repro the issue after setting up a simple server against your client.

Please provide the server-side script as well so we can reproduce the issue. If possible, also a Docker image with your server/client setup.

@johnsondeng (Author) commented Jul 12, 2024

(attachment: issue.zip)

@sourabhsinghs The code and Dockerfile are in the attached zip file. I built a simple image for this issue and can still reproduce the problem after running the client for about 5 minutes. Please help; this problem has been impacting our production servers for a long time.

Dockerfile:

FROM python:3.10

WORKDIR /workspace

RUN pip install grpcio==1.65.0 grpcio-tools==1.65.0

# Note: the PATH export below only persists within this single RUN layer;
# protoc --version just verifies the install.
RUN mkdir /opt/protoc && cd /opt/protoc \
    && wget https://github.com/protocolbuffers/protobuf/releases/download/v27.2/protoc-27.2-linux-x86_64.zip \
    && unzip protoc-27.2-linux-x86_64.zip \
    && export PATH=$PATH:/opt/protoc/bin/ \
    && protoc --version

@johnsondeng (Author) commented:

Can anyone help? @gnossen @sourabhsinghs @XuanWang-Amos

@sourabhsinghs commented:

@johnsondeng, I have tried again to repro this issue with the suggested setup. These are my findings:

Elapsed Time Range (ms)    Number of Calls
(10, 20]                   0
(20, 30]                   0
(30, 40]                   0
(40, 50]                   0
(50, 60]                   3200119
(60, 70]                   64
(70, 80]                   2
(80, 90]                   0
(90, 100]                  0

Do you have any suggestion as to why we are unable to see similar delays?

I also did another test to check for a constant overhead on top of the configured timeout; sharing the graph for that as well.

(graph: response delay vs. timeout)
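
For reference, a bucketed summary like the table above can be produced with a small helper along these lines (a sketch, not the exact script used for these measurements):

# Sketch: group per-call elapsed times (in ms) into 10 ms-wide buckets.
import collections
import math


def bucket_elapsed(elapsed_ms_samples, width=10):
    counts = collections.Counter()
    for ms in elapsed_ms_samples:
        hi = math.ceil(ms / width) * width  # e.g. 57.3 ms falls in (50, 60]
        counts[(hi - width, hi)] += 1
    for (lo, hi), n in sorted(counts.items()):
        print(f"({lo}, {hi}]: {n}")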

@johnsondeng (Author) commented Jul 30, 2024

@sourabhsinghs I produced a histogram too, and it seems I get more requests costing > 60 ms than you do. Maybe a hardware difference? My container is running on a Mac M1.

But my question is: why can't the requests that take > 60 ms be ended within 50 ms (the timeout parameter set on the RPC call)? Your histogram above also shows 64 requests in (60, 70]. The timeout is very inaccurate.

0.00 - 10.00: 0
10.00 - 20.00: 0
20.00 - 30.00: 0
30.00 - 40.00: 0
40.00 - 50.00: 0
50.00 - 60.00: 125234
60.00 - 70.00: 197
70.00 - 80.00: 33
80.00 - 90.00: 19
90.00 - 100.00: 8
100.00 - 500.00: 1

@sourabhsinghs commented:

@johnsondeng In the data you've shared, it looks like 99.794% of the RPCs (125234/125492) end within 50.00 - 60.00 ms. The latency of the remaining RPCs can be attributed to hardware / the GIL / network, etc.
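
One way to separate the two effects on the client side (a sketch, not part of the original repro) is to record both the status code and the wall-clock elapsed time: a deadline-exceeded status confirms gRPC cancelled the call at the deadline, and with 32 threads contending for the GIL any extra elapsed time is mostly how long the thread waited to be scheduled again, not a missed deadline:

# Sketch: confirm the deadline was enforced and measure how late the thread noticed.
import time

import grpc

from api import test_pb2, test_pb2_grpc  # same generated stubs as demo_client.py

channel = grpc.insecure_channel("10.155.1.127:1234")
stub = test_pb2_grpc.TestStub(channel)

start = time.perf_counter()
try:
    stub.Echo(test_pb2.Req(msg="hello"), timeout=0.03)
except grpc.RpcError as e:
    observed_ms = (time.perf_counter() - start) * 1000
    # DEADLINE_EXCEEDED means gRPC gave up at ~30 ms; any excess in
    # observed_ms is time this thread spent waiting to run again.
    print(e.code() == grpc.StatusCode.DEADLINE_EXCEEDED, observed_ms)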
