collection bindings OOM for large data sets #214

huahaiy · 2023-05-27T18:09:53Z

From the Clojurians #datalevin channel:

andersmurphy 4:56 AM
So I’m finding that when I use collection bindings, i.e. pass a collection and :in $ [?x ...], I’m much more likely to run out of memory, even if the collection contains a single value. Inlining the values and performing an or doesn’t result in running out of memory. Is there something that makes collection bindings inherently expensive?
This implementation does run out of memory (with large datasets):

(d/q '[:find (pull ?a [:artist/name])
       :in $ [?c ...]
       :where [?a :artist/country ?country]
              [?country :country/name ?c]]
     db ["Canada" "Japan"])

The implementations below don’t run out of memory (with large datasets):

(d/q '[:find (pull ?a [:artist/name])
       :where [?a :artist/country ?country]
              (or [?country :country/name "Canada"]
                  [?country :country/name "Japan"])]
     db)

or

(d/q '[:find (pull ?a [:artist/name])
       :in $ ?c1 ?c2
       :where [?a :artist/country ?country]
              (or [?country :country/name ?c1]
                  [?country :country/name ?c2])]
     db
     "Canada"
     "Japan")

huahaiy · 2024-02-12T00:12:24Z

The first approach is very expensive because it produces a cross product of the two countries with all artists (?c shares no variable with [?a :artist/country ?country]), whereas the latter two options perform a natural join. A natural join is much cheaper than a Cartesian product.
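To see why the intermediate result blows up, here is a toy model of the two join orders using plain Clojure sets of maps and clojure.set/join, which performs a natural join on shared keys and degenerates into a Cartesian product when the two relations share none. The data is made up and this only illustrates the relational algebra; it is a sketch, not how Datalevin's query engine is actually implemented.

```clojure
(require '[clojure.set :as set])

;; Made-up toy relations: each map is one tuple, keys play the role of variables.
(def binding-rel #{{:c "Canada"} {:c "Japan"}})            ; :in $ [?c ...]
(def artist-rel                                             ; [?a :artist/country ?country]
  (set (for [i (range 1000)] {:a i :country (mod i 3)})))
(def country-rel #{{:country 0 :c "Canada"}                 ; [?country :country/name ?c]
                   {:country 1 :c "Japan"}
                   {:country 2 :c "France"}})

;; binding-rel and artist-rel share no keys, so join returns every pairing:
;; 2 x 1000 tuples are materialized before ?c is ever checked.
(def cross (set/join binding-rel artist-rel))
(count cross) ;; => 2000

;; Joining on shared keys keeps each intermediate result small:
;; first the 2 matching country tuples, then only artists in those countries.
(def natural (set/join artist-rel (set/join country-rel binding-rel)))
(count natural) ;; => 667
```

With a real database the artist relation is far larger than 1000 tuples, so the product in the first join order is what grows out of memory.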

A possible optimization is to automatically translate the first form into the latter ones. This is a more advanced optimization that is further down the line: the next release of the optimizer probably won't include it, as we are focusing on optimizing where clauses and simple bindings. Optimizing collection bindings will have to wait until that is done.
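Until the optimizer does this translation itself, a caller could perform it by hand: generate the or form of the query from the collection instead of passing the collection as a binding. A minimal sketch, assuming the attribute names from the examples above; country-or-query is a hypothetical helper, not part of Datalevin. It only builds query data, so it runs without a database.

```clojure
(defn country-or-query
  "Hypothetical helper: builds the inlined `or` version of the artist query
  for a non-empty collection of country names."
  [names]
  {:pre [(seq names)]}                  ; an (or) with no branches is not a valid clause
  (let [branches (for [n names]
                   ['?country :country/name n])]
    ;; Append the (or ...) list as the last :where clause.
    (conj '[:find (pull ?a [:artist/name])
            :where [?a :artist/country ?country]]
          (apply list 'or branches))))

(country-or-query ["Canada" "Japan"])
;; => [:find (pull ?a [:artist/name])
;;     :where [?a :artist/country ?country]
;;     (or [?country :country/name "Canada"]
;;         [?country :country/name "Japan"])]
```

The generated vector is the same query as the second example above, so it would be run as (d/q (country-or-query ["Canada" "Japan"]) db).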

huahaiy added the "enhancement" label (New feature or request) on May 27, 2023
