⚠️ Notice
This release includes significant changes; please review them and adjust your configuration to fit your specific use case.
- The default parallelism has been increased from 1 to 4, which may increase VRAM usage (see the configuration sketch after this list). (#3832)
- Introduce a new embedding kind, `llama.cpp/before_b4356_embedding`, for llamafile and other embedding services that use the legacy llama.cpp embedding API (a configuration sketch follows this list). (#3828)
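If the new default strains your GPU, you can likely pin parallelism back to its previous value in Tabby's `config.toml`. The sketch below is illustrative only: both the section and the `parallelism` key are assumptions, not confirmed setting names; check the documentation accompanying #3832 for the actual knob.

```toml
# ~/.tabby/config.toml -- illustrative sketch, NOT confirmed syntax.
# The section and key below are assumptions; consult the docs for #3832.
[model.embedding.local]
# Hypothetical knob: restore the previous default of 1 if the new
# default of 4 exhausts VRAM on your GPU.
parallelism = 1
```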
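The new kind slots into the existing HTTP embedding configuration. A minimal sketch, assuming the documented `[model.embedding.http]` layout and a llamafile server listening on its default port:

```toml
# ~/.tabby/config.toml
[model.embedding.http]
# Select the legacy (pre-b4356) llama.cpp embedding API, as exposed by
# llamafile and older llama.cpp servers.
kind = "llama.cpp/before_b4356_embedding"
# Adjust to wherever your llamafile / llama.cpp server is listening.
api_endpoint = "http://localhost:8080"
```

Servers running a llama.cpp build at or after b4356 can keep using the existing `llama.cpp/embedding` kind.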
🚀 Features
- Expose the Answer Engine's thinking process in thread message answers. (#3785) (#3672)
- Enable the Answer Engine to access the repository's directory file list as needed. (#3796)
- Enable the use of `@` to mention a symbol in the Chat Sidebar. (#3778)
- Provide repository-aware default question recommendations in the Answer Engine. (#3815)
🧰 Fixes and Improvements
- Provide a configuration option to truncate text content before dispatching it to the embedding service (see the sketch after this list). (#3816)
- Bump llama.cpp version to b4651. (#3798)
- Automatically retry embedding requests when the service fails intermittently due to llama.cpp issues. (#3805)
- Enhance the user interface experience for Answer Engine. (#3845) (#3794)
- Resolve the deserialization issue related to `finish_reason` in chat responses from the LiteLLM Proxy Server. (#3882)
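Regarding the truncation option above: a minimal sketch of how such a limit might look in `config.toml`. The key name `max_input_length` and its placement are assumptions for illustration, not confirmed names from #3816.

```toml
# Illustrative sketch only -- section and key names are assumptions.
[model.embedding.local]
# Hypothetical limit: truncate each chunk to this many characters
# before it is dispatched to the embedding service.
max_input_length = 4096
```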
💫 New Contributors
@zhanba made their first contribution in #3675
@faceCutWall made their first contribution in #3812
Full Changelog: v0.24.0...v0.25.0