n_tokens <= n_batch, beta, conversation history #176
-
todo: explore, or integrate with: if your application is GPL 3.0 compliant, feel free to take inspiration from how it's done here: https://github.com/nathanlesage/local-chat
-
@giladgd how does node-llama-cpp manage a long conversation history if it is longer than the model's context (with v3/beta)?
-
Thanks @giladgd for your example. Calling session.dispose() threw DisposedError: Object is disposed, so I had to create a new context instead. Here is what works for me with gpu: false.
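Roughly this (a sketch of my setup; the model path is just an example, and I'm going from the v3 beta's getLlama/LlamaChatSession API as I understand it):

```js
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// gpu: false forces the CPU backend, so the "radv/amdgpu" crash is gone
const llama = await getLlama({gpu: false});
const model = await llama.loadModel({
    modelPath: "models/mistral-7b-instruct.Q4_K_M.gguf" // example path, adapt to yours
});

// Instead of reusing a disposed session (which throws DisposedError),
// I create a fresh context + session for each new conversation
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const answer = await session.prompt("Hello!");
console.log(answer);
```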
I think it would be a good idea to be OpenAI API compatible, using role: system, role: user, or role: assistant plus content for each message.
Translation from one template to another was done with TemplateChatWrapper in v2, but I don't know if that's possible in v3?
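For reference, the OpenAI-style shape, plus a hypothetical helper (toChatHistory is my own name; the {type, text} / {type: "model", response: [...]} item shape is what I understand the v3 history uses):

```js
// OpenAI-style chat messages
const messages = [
    {role: "system", content: "You are a helpful assistant."},
    {role: "user", content: "Hello"},
    {role: "assistant", content: "Hi! How can I help?"}
];

// Hypothetical helper: map OpenAI-style messages to v3-style history items
function toChatHistory(messages) {
    return messages.map((message) =>
        message.role === "assistant"
            ? {type: "model", response: [message.content]}
            : {type: message.role, text: message.content} // "system" | "user"
    );
}
```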
-
And why is response an array containing only one text?
-
Transferred from #105 (comment), for the issue reported by @scenaristeur:
I have tried to migrate to "node-llama-cpp": "^3.0.0-beta.13", but now I get a crash on my IdeaPad laptop (https://www.google.com/search?client=firefox-b-lm&q=ideapad+3+15alc6) (no GPU, AMD Ryzen 5000, 16-core CPU, 16 GB RAM).
It worked like a charm with "node-llama-cpp": "^2.8.8" (I had no memory issues apart from n_tokens <= n_batch with a long conversationHistory), but now it crashes even with a small conversationHistory, with "radv/amdgpu: Not enough memory for command submission."
This is my usage: https://github.com/scenaristeur/igora/blob/node_llama_cpp_v3_beta/src/mcConnector/index.js
With v2.8.8 I had https://github.com/scenaristeur/igora/blob/3342a1a48172eae1d31489e33a64fe025e1cb522/src/mcConnector/index.js
and it worked until token.length reached about 300 (328 was OK, 536 failed).
With more tokens, I got n_tokens <= n_batch.
It's a 16-core CPU-only machine, no GPU, so I'll try getLlama with gpu: false. Perhaps I installed some Vulkan tools while trying out some LLMs, but this machine is CPU-only. Thanks.
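Something like this should force the CPU backend (a sketch; assuming the beta's getLlama options, and that llama.gpu reports the selected backend):

```js
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({gpu: false}); // disable GPU entirely
console.log(llama.gpu); // expect false; "vulkan" would mean the Vulkan tools got picked up
```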
It works with gpu: false, but I've lost the conversationHistory. How should conversation history be handled in the beta version? I'm working on a server where there can be multiple sessions, each with its own history. In what format should the history be injected into a session, and into which class?
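For now, I imagine something like this (a sketch, assuming setChatHistory()/getChatHistory() exist on LlamaChatSession in the beta; histories, promptFor, and the model path are my own placeholders):

```js
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama({gpu: false});
const model = await llama.loadModel({modelPath: "models/model.gguf"}); // example path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// conversationId -> chat history items
const histories = new Map();

async function promptFor(conversationId, text) {
    // Restore this conversation's history before prompting
    session.setChatHistory(histories.get(conversationId) ?? [
        {type: "system", text: "You are a helpful assistant."}
    ]);

    const answer = await session.prompt(text);

    // Save the updated history; note that a "model" item's response is an array
    histories.set(conversationId, session.getChatHistory());
    return answer;
}
```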