Support disaggregated prefill? #708
Demo start args:

pd master:

```
python -m lightllm.server.api_server --model_dir /dev/shm/llama2-7b
```

prefill node:

```
nvidia-cuda-mps-control -d
```

decode node:

```
nvidia-cuda-mps-control -d
```

Not all models and run modes support PD.
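For context, a minimal sketch of what a full three-process PD launch might look like, assuming the `--run_mode`, `--pd_master_ip`, and `--pd_master_port` flags found in recent lightllm versions; the IP address and port here are placeholders, so check `python -m lightllm.server.api_server --help` for the exact flags in your build:

```bash
# pd master: accepts client requests and routes them to prefill/decode nodes.
# All flags except --model_dir are assumptions; verify with --help.
python -m lightllm.server.api_server \
    --model_dir /dev/shm/llama2-7b \
    --run_mode pd_master \
    --host 10.0.0.1 --port 60011

# prefill node: start the CUDA MPS daemon first, then the server in prefill mode.
nvidia-cuda-mps-control -d
python -m lightllm.server.api_server \
    --model_dir /dev/shm/llama2-7b \
    --run_mode prefill --tp 1 \
    --pd_master_ip 10.0.0.1 --pd_master_port 60011

# decode node: same pattern in decode mode; --tp must match the prefill node.
nvidia-cuda-mps-control -d
python -m lightllm.server.api_server \
    --model_dir /dev/shm/llama2-7b \
    --run_mode decode --tp 1 \
    --pd_master_ip 10.0.0.1 --pd_master_port 60011
```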
How is the performance of disaggregated prefill? Does it support multiple P nodes and multiple D nodes (xPyD)?
When running PD disaggregation with the above commands, an error is reported:

master log
prefill node log
decode node log

At the same time, my startup commands are:

prefill
decode

Have you encountered these problems? Is there a solution?
@Dimensionzw The prefill node and the decode node need to use the same --tp param; currently this constraint is required.
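To make the constraint concrete, a hedged example (flag names as in the sketch above): if the prefill node runs with `--tp 2`, the decode node must also run with `--tp 2`, presumably because the transferred KV cache has to map onto an identically sharded decode engine.

```bash
# prefill node sharded across 2 GPUs
python -m lightllm.server.api_server --model_dir /dev/shm/llama2-7b \
    --run_mode prefill --tp 2 \
    --pd_master_ip 10.0.0.1 --pd_master_port 60011

# decode node: --tp must also be 2; a mismatch (e.g. --tp 1) fails at startup
python -m lightllm.server.api_server --model_dir /dev/shm/llama2-7b \
    --run_mode decode --tp 2 \
    --pd_master_ip 10.0.0.1 --pd_master_port 60011
```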
Is multi-node PD disaggregation currently supported?
So does it support multiple P nodes and multiple D nodes?
@sitabulaixizawaluduo @DayDayupupupup Supported now: just start the P and D nodes, and the pd master will manage all the details.
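A sketch of what that xPyD layout could look like in practice, with 2 prefill nodes and 3 decode nodes; the hostnames and flags are assumptions carried over from the sketches above. Every node simply registers with the same pd master, which then handles pairing and request routing.

```bash
# on the master host (10.0.0.1):
python -m lightllm.server.api_server --model_dir /dev/shm/llama2-7b \
    --run_mode pd_master --host 10.0.0.1 --port 60011

# on each of the 2 prefill hosts:
python -m lightllm.server.api_server --model_dir /dev/shm/llama2-7b \
    --run_mode prefill --tp 1 \
    --pd_master_ip 10.0.0.1 --pd_master_port 60011

# on each of the 3 decode hosts:
python -m lightllm.server.api_server --model_dir /dev/shm/llama2-7b \
    --run_mode decode --tp 1 \
    --pd_master_ip 10.0.0.1 --pd_master_port 60011
```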
I saw your code referring to PD disaggregation. Please tell me how to use it.