Seamless Session Switching #48
PotatoSpudowski announced in Announcements
Replies: 2 comments
Link to demo on twitter!
PR #52 by @amitsingh19975 shaved off 100 MB 👀🔥
Hello fastLLaMa community! 🎉
We are excited to announce the release of a new feature in our project - Quick Context Switching between sessions. This feature is aimed at improving the efficiency of instances serving multiple sessions while ensuring that each session's state is maintained with minimal latency.
When a single running instance serves several sessions, it has to keep each session's state intact without adding much latency on every switch. A mechanism that saves and loads session states quickly lets one instance handle that workload, which reduces resource usage and cost.
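The save-and-load mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the fastLLaMa implementation: `ToyModel`, `SessionSwitcher`, `switch_to`, `ingest`, and `reset` are hypothetical names, and the `save_state(path)` / `load_state(path)` signatures are assumptions that only mirror the function names mentioned in this post. A plain token list stands in for the real per-session state.

```python
import os
import pickle
import tempfile


class ToyModel:
    """Hypothetical stand-in for a model instance (not the fastLLaMa API)."""

    def __init__(self):
        # A token list stands in for the real per-session state.
        self.tokens = []

    def ingest(self, text):
        self.tokens.extend(text.split())

    def reset(self):
        self.tokens = []

    def save_state(self, path):
        # Assumed signature: write the current state to a file.
        with open(path, "wb") as f:
            pickle.dump(self.tokens, f)

    def load_state(self, path):
        # Assumed signature: restore state previously written by save_state.
        with open(path, "rb") as f:
            self.tokens = pickle.load(f)


class SessionSwitcher:
    """Keeps one model instance and swaps session state in and out."""

    def __init__(self, model, state_dir):
        self.model = model
        self.state_dir = state_dir
        self.active = None

    def _path(self, session_id):
        return os.path.join(self.state_dir, f"{session_id}.state")

    def switch_to(self, session_id):
        if session_id == self.active:
            return
        # Persist the outgoing session before touching the model.
        if self.active is not None:
            self.model.save_state(self._path(self.active))
        # Restore the incoming session, or start fresh if it is new.
        path = self._path(session_id)
        if os.path.exists(path):
            self.model.load_state(path)
        else:
            self.model.reset()
        self.active = session_id


def demo():
    with tempfile.TemporaryDirectory() as state_dir:
        model = ToyModel()
        switcher = SessionSwitcher(model, state_dir)
        switcher.switch_to("alice")
        model.ingest("hello from alice")
        switcher.switch_to("bob")
        model.ingest("bob here")
        switcher.switch_to("alice")  # alice's state is restored from disk
        return list(model.tokens)
```

In this sketch the cost of a switch is one file write plus one file read, which is why the size of the saved state matters for latency.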
Here's what we are saving and loading to enable quick context switching:
These states are saved and loaded using the save_state and load_state functions, respectively.
This feature brings several benefits, including:
Room for improvement:
We have noticed that saved states are close to 600 MB each, largely because of the KV cache buffer, so there is likely room to shrink them. We encourage fellow contributors to dive into the code, investigate, and raise PRs with potential optimizations.
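For a rough sense of why the KV cache dominates, here is a back-of-the-envelope estimate. The layer count and embedding size are those of LLaMA-7B; the context length of 512 and the element sizes are assumptions, since the post does not say which model or settings produced the 600 MB figure. The helper `kv_cache_bytes` is a hypothetical name used only for this calculation.

```python
def kv_cache_bytes(n_layer, n_ctx, n_embd, bytes_per_element):
    # Keys and values: two tensors per layer, each n_ctx x n_embd elements.
    return 2 * n_layer * n_ctx * n_embd * bytes_per_element


# LLaMA-7B shape (32 layers, 4096-wide embeddings); n_ctx=512 is an assumption.
f32_bytes = kv_cache_bytes(n_layer=32, n_ctx=512, n_embd=4096, bytes_per_element=4)
f16_bytes = kv_cache_bytes(n_layer=32, n_ctx=512, n_embd=4096, bytes_per_element=2)

print(f32_bytes / 2**20)  # 512.0 (MiB, 32-bit floats)
print(f16_bytes / 2**20)  # 256.0 (MiB, 16-bit floats)
```

Under these assumptions a 32-bit cache alone accounts for about 512 MiB, so storing the cache at lower precision, or saving only the portion that is actually filled, are plausible directions to investigate.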
We hope that this new feature will make your experience with our project more seamless and efficient. We’d love to hear your thoughts, feedback, and suggestions on this feature! Feel free to leave a comment below, open an issue, or even submit a pull request. Your input helps us continue to improve and grow our project. 🚀
Thank you for your support, and happy coding! 🎉