@gjpower commented Oct 15, 2024

Supersedes previous MR #1795

The previous implementation creates threads and blocks them on a lock when acquiring llama_proxy, which can cause thread starvation under many parallel requests.
It also prevents the call to `await run_in_threadpool(llama.create_chat_completion, **kwargs)` from proceeding: all worker threads are stuck waiting on the lock, so no progress can be made.

This MR adapts the acquisition of llama_proxy to an async pattern that takes advantage of asyncio mechanisms. ExitStack is replaced with AsyncExitStack, and the improper closing of the ExitStack is addressed.
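For illustration, here is a minimal sketch of the general pattern, not the code in this MR. It assumes a hypothetical `FakeProxy` in place of the real llama_proxy, and uses the standard library's `asyncio.to_thread` in place of `run_in_threadpool` so that it runs without extra dependencies. Requests wait on an `asyncio.Lock` as suspended coroutines, so a worker thread is only occupied by the request that currently holds the proxy.

```python
import asyncio
import contextlib
import time


class FakeProxy:
    """Hypothetical stand-in for the shared model proxy (not the real llama_proxy)."""

    def complete(self, prompt: str) -> str:
        time.sleep(0.01)  # simulate a blocking inference call
        return prompt.upper()


def make_acquirer(proxy):
    """Return an async context manager factory guarding `proxy` with an asyncio.Lock.

    Call this from inside a running event loop so the lock belongs to that loop.
    """
    lock = asyncio.Lock()

    @contextlib.asynccontextmanager
    async def acquire():
        # Waiting here suspends the coroutine; no worker thread is held.
        async with lock:
            yield proxy

    return acquire


async def handle_request(acquire, prompt: str) -> str:
    # AsyncExitStack closes everything it entered, even if the call below raises.
    async with contextlib.AsyncExitStack() as stack:
        proxy = await stack.enter_async_context(acquire())
        # Only the lock holder occupies a worker thread for the blocking call.
        return await asyncio.to_thread(proxy.complete, prompt)


async def main(n: int = 20) -> list:
    acquire = make_acquirer(FakeProxy())
    tasks = (handle_request(acquire, f"req-{i}") for i in range(n))
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the `async with` block owns the stack, the lock is released as soon as the request finishes or fails, which is the property the switch to AsyncExitStack is meant to guarantee.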

@gjpower force-pushed the fix/server_llama_call_thread_starvation branch from de01a63 to 9ec5460 on November 5, 2024 10:57