You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In a multi-threaded situation, if the GPU server resources are insufficient, will cache kv preemption occur? For example, there are two conversations at the same time, both of which are long. If the two conversations are halfway through and conversation a cuts into conversation b, the cache kv in conversation b should be lost, that is, the cache kv of conversation a is used. Due to the involvement of gpu computing and insufficient resources, verification cannot be carried out
The text was updated successfully, but these errors were encountered:
In a multi-threaded situation, if the GPU server resources are insufficient, will cache kv preemption occur? For example, there are two conversations at the same time, both of which are long. If the two conversations are halfway through and conversation a cuts into conversation b, the cache kv in conversation b should be lost, that is, the cache kv of conversation a is used. Due to the involvement of gpu computing and insufficient resources, verification cannot be carried out
The text was updated successfully, but these errors were encountered: