fix: Avoid thread starvation on many concurrent requests by making use of asyncio to lock llama_proxy context#1798

Merged
abetlen merged 7 commits into abetlen:main from gjpower:fix/server_llama_call_thread_starvation on Dec 6, 2024

gjpower commented Oct 15, 2024

Supersedes previous MR #1795

The previous implementation creates threads that block while acquiring llama_proxy, which can cause thread starvation under many parallel requests.
It also prevents the call to `await run_in_threadpool(llama.create_chat_completion, **kwargs)` from proceeding: all worker threads are stuck waiting for the lock, so no progress can be made.
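
The starvation described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not code from this repository: `proxy_lock`, `wait_for_proxy`, `run_completion`, and the pool size are hypothetical stand-ins, and the waiters use a timeout so the demo terminates instead of deadlocking.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2        # hypothetical, deliberately tiny worker pool
WAIT_TIMEOUT = 0.3   # waiters give up after this, so the demo cannot hang

# Hypothetical stand-in for the blocking lock that guards llama_proxy.
proxy_lock = threading.Lock()


def wait_for_proxy() -> bool:
    """Occupy a worker thread while blocking on the lock."""
    acquired = proxy_lock.acquire(timeout=WAIT_TIMEOUT)
    if acquired:
        proxy_lock.release()
    return acquired


def run_completion(submitted_at: float) -> float:
    """Stand-in for the model call; returns how long it sat in the queue."""
    return time.monotonic() - submitted_at


def measure_queue_delay() -> float:
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        # An in-flight request currently owns the proxy.
        with proxy_lock:
            # Further requests fill every worker, all blocked on the lock.
            waiters = [pool.submit(wait_for_proxy) for _ in range(POOL_SIZE)]
            # The lock holder's own work cannot start: no worker is free.
            job = pool.submit(run_completion, time.monotonic())
            delay = job.result()
            for waiter in waiters:
                waiter.result()
        return delay


if __name__ == "__main__":
    print(f"completion waited {measure_queue_delay():.2f}s for a free worker")
```

Without the timeout, the waiters would never release their workers and the holder's job would never run, which is the stall this MR addresses.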

This MR makes acquiring llama_proxy asynchronous, using asyncio locking so that waiting requests suspend on the event loop instead of blocking worker threads. ExitStack is replaced with AsyncExitStack, and the improper closing of the ExitStack is fixed.
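
A minimal sketch of the async pattern, assuming nothing about the actual diff: `FakeLlamaProxy`, `acquire_proxy`, `handle_request`, and `run_many` are hypothetical names, and `asyncio.to_thread` stands in for `run_in_threadpool`. Requests wait on an `asyncio.Lock` without holding a thread, and `AsyncExitStack` releases the lock when the request finishes.

```python
import asyncio
import contextlib
import threading
import time
from typing import AsyncIterator, List


class FakeLlamaProxy:
    """Hypothetical stand-in for llama_proxy; records overlapping calls."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0

    def create_chat_completion(self, prompt: str) -> str:
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # pretend to do blocking model work
        with self._guard:
            self.active -= 1
        return f"echo: {prompt}"


@contextlib.asynccontextmanager
async def acquire_proxy(
    lock: asyncio.Lock, proxy: FakeLlamaProxy
) -> AsyncIterator[FakeLlamaProxy]:
    # Waiting here suspends the coroutine; no worker thread is consumed.
    async with lock:
        yield proxy


async def handle_request(
    lock: asyncio.Lock, proxy: FakeLlamaProxy, prompt: str
) -> str:
    # AsyncExitStack guarantees the lock is released on exit, even on error.
    async with contextlib.AsyncExitStack() as stack:
        llama = await stack.enter_async_context(acquire_proxy(lock, proxy))
        # Only the lock holder uses a worker thread for the blocking call.
        return await asyncio.to_thread(llama.create_chat_completion, prompt)


async def run_many(proxy: FakeLlamaProxy, n: int) -> List[str]:
    lock = asyncio.Lock()  # created inside the running event loop
    prompts = [f"request {i}" for i in range(n)]
    return await asyncio.gather(
        *(handle_request(lock, proxy, p) for p in prompts)
    )


if __name__ == "__main__":
    demo_proxy = FakeLlamaProxy()
    replies = asyncio.run(run_many(demo_proxy, 50))
    print(len(replies), "replies; max concurrent calls:", demo_proxy.max_active)
```

Because at most one coroutine holds the lock at a time, at most one thread is busy with the proxy, however many requests are queued.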

gjpower force-pushed the fix/server_llama_call_thread_starvation branch from de01a63 to 9ec5460 on November 5, 2024 10:57
2 participants