Memory Usage During Ingestion - Can Be Cut In Half #386
johnbrisbin started this conversation in Ideas
I have been doing some extended testing of the ingestion phase of privateGPT (where you load up your documents). I have been using a collection of ~1500 epub books which are on average about 1MB each, about 1.57GB in total as Windows measures it.
In the standard configuration, toward the end of the process, memory usage with the DB loaded and ingestion running is a bit over 20GB. When the data is written to disk, it jumps to over 30GB. 64GB seemed like a lot when I built this machine; less so now. The final size of the DB is about 11GB.
In the alternate configuration, memory usage with the DB loaded and during ingestion is right at 10GB, which jumps ~50% to 15GB when persisting to disk. The final size of the DB is about 5.5GB.
On this machine, books are ingested at about 100/hr in either configuration.
The difference between the two configurations is the chunking: in the standard configuration, documents are chunked into pieces of at most 500 bytes with 50 bytes of overlap, while in the alternate configuration, documents are chunked into pieces of at most 1000 bytes with 100 bytes of overlap.
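For reference, this is roughly the only change involved. A minimal sketch, assuming the ingest script splits documents with LangChain's RecursiveCharacterTextSplitter and that the 500/50 and 1000/100 figures map directly to its chunk_size/chunk_overlap parameters (the file name and wiring here are my assumptions, not privateGPT's exact code; the splitter counts characters rather than bytes, which for mostly-ASCII text amounts to the same thing):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Hypothetical input file, just for illustration.
text = open("some_book.txt", encoding="utf-8").read()

# Standard configuration: chunks of at most 500 characters with 50 overlapping.
standard = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Alternate configuration: chunks of at most 1000 characters with 100 overlapping,
# which produces roughly half as many chunks (and therefore embeddings).
alternate = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

print(len(standard.split_text(text)), "chunks vs", len(alternate.split_text(text)), "chunks")
```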
This reduces the number of embeddings by a bit more than half, and the vectors of numbers stored for each embedded chunk make up the bulk of the space used. The 'a bit more' is because larger chunks pack slightly more efficiently than smaller ones: nominal 500-byte chunks average a little under 400 bytes, while nominal 1000-byte chunks run a bit over 800 bytes on average.
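As a rough sanity check on why the vectors dominate, here is a back-of-envelope sketch. The 384-dimensional float32 embedding is an assumption on my part (the post doesn't say which embedding model was used); only the corpus size and average chunk sizes come from the test above:

```python
# Estimate raw vector storage for the ~1.57GB corpus under both chunk sizes.
CORPUS_BYTES = 1.57e9
EMBED_DIM = 384          # assumed embedding dimension (not measured)
BYTES_PER_FLOAT = 4      # float32

for label, avg_chunk_bytes in [("standard, ~400B chunks", 400),
                               ("alternate, ~800B chunks", 800)]:
    n_chunks = CORPUS_BYTES / avg_chunk_bytes
    vector_gb = n_chunks * EMBED_DIM * BYTES_PER_FLOAT / 1e9
    print(f"{label}: ~{n_chunks / 1e6:.1f}M chunks, ~{vector_gb:.1f} GB of raw vectors")
```

Under those assumptions the raw vectors alone come to roughly 6GB vs 3GB, which is in the same ballpark as the 11GB vs 5.5GB on-disk figures once chunk text, metadata, and index overhead are added.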
The size of the embeddings affects more than just the size of the database in memory or on disk. It has effects on query operations as well (and hopefully you spend more time querying than ingesting in the long term).
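On the query side, the most direct effect is how much text a fixed top-k retrieval pulls into the prompt. A sketch assuming the store is a Chroma DB queried through LangChain (an assumption on my part; the persist directory, query string, and k value are made up):

```python
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

# Open the persisted vector store (directory name assumed).
db = Chroma(persist_directory="db", embedding_function=HuggingFaceEmbeddings())

# With 1000-byte chunks, the same k returns roughly twice as much context text,
# so k may need to be lowered to stay within the LLM's context window.
docs = db.similarity_search("a question about one of the books", k=4)
print(sum(len(d.page_content) for d in docs), "characters of retrieved context")
```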
Theoretical pros and cons:
Pros -
Cons -
On balance, cutting the memory requirement in half by doubling the chunk size looks like a win to me.
What do you think? Are there other downsides to increasing the chunk size?