Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThreadQuota=1 not respected in TheadManager (leads to termination) #36360

Open
hofst opened this issue Apr 15, 2024 · 4 comments
Open

ThreadQuota=1 not respected in TheadManager (leads to termination) #36360

hofst opened this issue Apr 15, 2024 · 4 comments

Comments

@hofst
Copy link

hofst commented Apr 15, 2024

What version of gRPC and what language are you using?

1.62.1

What operating system (Linux, Windows,...) and version?

Linux

What runtime / compiler are you using (e.g. python version or version of gcc)

clang 17

What did you do?

Start a grpc callback server with SetResourceQuota(ResourceQuota().SetMaxThreads(1));
Put some load on the server so we hit the following code-path:

resource_exhausted = true;

This code path does not make sense to me: It first reduces the poller count by one and then checks } else if (num_pollers_ > 0) {. This way it will always ignore the own thread.
As far as I understand, the resource exhausted case should not occur when I have configured a thread limit of 1 and I have 1 thread worker (as expected). I'd rather expect the condition to be } else if ((num_pollers_+1) > 0) { since we already decremented the num_pollers and should not ignore our own thread.

What did you expect to see?

Start a grpc callback server with SetResourceQuota(ResourceQuota().SetMaxThreads(1)); should work even under load.

What did you see instead?

grpc server terminates.

@drfloob
Copy link
Member

drfloob commented Apr 30, 2024

Please provide a minimum repro in the form of a runnable example in a github repo, or a gist. I'm not sure what you mean by termination here, and the ThreadManager is used for sync servers.

@hofst
Copy link
Author

hofst commented May 2, 2024

@drfloob

I'm not sure what you mean by termination here

Termination as in: The grpc server will shut down because it tries to spawn an additional thread but detects that this would violate the thread resource quota of one thread.

Did you have a look at the code that I referenced in thread_manager.cc ?
The MainWorkLoop is quite a simple function and I believe I already described coherently what's going wrong in that function.

@drfloob
Copy link
Member

drfloob commented May 28, 2024

Termination as in: The grpc server will shut down because it tries to spawn an additional thread but detects that this would violate the thread resource quota of one thread.

That sounds like something we could evaluate in much more detail if logs were available, or better: a minimal runnable example that reproduces the problem. Something simple enough to ensure there's no illegal or improper use of the application APIs.

Did you have a look at the code that I referenced in thread_manager.cc ?

Yes, I'm familiar. It's not used for the callback API. grep -E '#include.*timer_manager.h and you'll see what I mean. If you're exclusively using the callback API, and that TimerManager code is involved, then something is super fishy ... and that's difficult to believe. Which is again why a minimal runnable example is about the only way I'll continue to invest time in debugging your issue. This needs more evidence & data.

@hofst
Copy link
Author

hofst commented May 29, 2024

I'm currently on vacation with no access to code, I can provide something when I get back. It may have been sync mode but definitely not async mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants