
Support for privateGPT, including support for RAG functions like respecting context and parsing of sources #312

Open
wants to merge 3 commits into master

Conversation

Aquan1412

Hi, as suggested in our recent discussion #305, here's a PR that adds support for interacting with privateGPT. It's based on the OpenAI backend, with some additions. The focus was to enable the use of two specific keywords, as described in the API reference:

  1. use_context: directs the LLM to use the context of documents that have been "ingested"
  2. include_sources: directs the LLM to also return information about the sources of the context (so far: file name and page number, if appropriate).

I changed the code of gptel-openai.el to send the additional keywords to the LLM server, and to correctly parse the answer if sources are provided.

I tested the code with different queries, and so far it works for my use case. However, there is at least one missing feature: currently, the two new keywords use_context and include_sources are hardcoded to t. It would probably be better to make them configurable when creating the backend. Unfortunately, I'm not sure how to do that correctly.
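For reference, here is a minimal sketch of what the JSON-encoded request body looks like with the two extra keywords. The key names are taken from the privateGPT API reference quoted above; the model name and message are just placeholders:

```elisp
;; Illustrative sketch only -- key names per the privateGPT API reference,
;; the model name and message are placeholders.
(require 'json)
(json-encode
 '(:model "private-gpt"
   :messages [(:role "user" :content "What are the Navier-Stokes equations?")]
   :stream t
   :use_context t        ; answer using the ingested documents
   :include_sources t))  ; return file name and page number for each source
```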

@karthink
Owner

karthink commented May 18, 2024

@Aquan1412 Thanks for the PR.

I had a look at the code. It uses free/global variables and some cl-lib functions that aren't declared at compile time. I can clean up the code, but I need some sample privateGPT output to be sure I don't break anything.

Could you run (setq gptel-log-level 'info), use privateGPT so it generates some sources (more than one), and paste the contents of the *gptel-log* buffer here?

@Aquan1412
Author

Thanks for cleaning up the code; I'm unfortunately not very experienced at writing Elisp code.

I ran the command you asked for. However, the output is quite large, which is why I've attached it to this comment as a file.

The user query and the expected response are as follows:

*** What are the Navier-Stokes equations?

The Navier-Stokes equations are a set of partial differential equations that describe the conservation of mass, momentum, and energy for a viscous fluid. They are given by equation (2.1) for mass conservation, equation (2.2) for momentum conservation, and equation (2.3) for energy conservation in compressible form. The pressure p, density ρ, velocity components ui, vi, wi, specific internal energy e, specific enthalpy h, temperature T, effective stress tensor τij, and convective heat flux qj are involved in these equations. For gases, the ideal gas law (2.4) relates pressure p to density ρ and specific volume RT/γ, where R is the gas constant and γ is the ratio of specific heats. The specific internal energy e and enthalpy h can be expressed as functions of temperature T using equations (2.5).

Sources:

  • Beneddine-2017-Characterization_of_unsteady_flow_behavior_by_linear_stability_analysis.pdf (page 62)
  • Iorio-2015-Global_Stability_Analysis_of_Turbulent_Transonic_Flows_on_Airfoil_geometries.pdf (page 33)

gptel-privategpt-output.txt

@karthink
Owner

karthink commented May 18, 2024

From what I can see, you're not actually buffering the sources from each response. persistent-sources in your code is reset with every chunk, so when it's finally processed it's set to the sources included in the last-but-one response chunk. Is this intentional?

  • Can each chunk have different sources,
  • or is the list of sources the same across all chunks,
  • or is the list of sources provided with each chunk an accumulation of the sources used so far?
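If chunks can indeed carry different sources, the stream parser would have to merge them into a running list rather than overwrite a variable. Roughly something like the sketch below; the names are made up for illustration and are not the code in this PR:

```elisp
;; Rough sketch, not the PR's code: merge each chunk's sources into a
;; running list and de-duplicate, instead of overwriting the variable.
(require 'cl-lib)
(defvar my--privategpt-sources nil
  "Sources collected across all response chunks (hypothetical).")

(defun my--privategpt-collect-sources (chunk-sources)
  "Accumulate CHUNK-SOURCES into `my--privategpt-sources', keeping each once."
  (setq my--privategpt-sources
        (cl-delete-duplicates
         (append my--privategpt-sources chunk-sources)
         :test #'equal)))
```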

@Aquan1412
Author

As far as I'm aware, the list of sources is the same across all chunks. I didn't find explicit confirmation in the privateGPT API description, but that's how it was in all my tests.
That's why I simply reset it with each chunk, rather than accumulating and re-parsing the full list of sources for every chunk.

@karthink
Owner

karthink commented May 19, 2024 via email

* gptel-privategpt.el (gptel--privategpt-parse-sources,
gptel-curl--parse-stream, gptel--parse-response): Parse sources
when parsing the response.  Sources are appended to the response
at the end.
* gptel-privategpt.el (gptel-make-privategpt,
gptel--request-data): Add fields `context` and `sources` to
gptel-privategpt backends.  When set to true (the default), gptel
will ask Privategpt to use available context and provide sources
with the response.
@karthink
Owner

I cleaned up the code and pushed a couple of commits to your branch. I also made "use_context" and "include_sources" configurable. Please check if this introduced any bugs, as I can't test it.
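If it helps with testing, here is roughly how I expect a backend definition to look with the new fields. Host, port and model name are placeholders for whatever your local privateGPT instance uses; both new fields default to true when omitted, per the commit message above:

```elisp
;; Rough usage sketch -- host, port and model name are placeholders for a
;; local privateGPT instance.
(gptel-make-privategpt "privateGPT"
  :protocol "http"
  :host "localhost:8001"
  :models '("private-gpt")
  :stream t
  :context t    ; send use_context with requests (true by default)
  :sources t)   ; send include_sources and append sources to the reply
```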

@Aquan1412
Author

Great, thanks! Unfortunately, I'm currently travelling and won't have access to my computer for the next two weeks. But afterwards I'll test it as soon as possible.
