Callback-based filter #2076

Open
1plaintext opened this issue Mar 31, 2024 · 1 comment

Comments

@1plaintext

I would have thought this would be a popular ask, but I couldn't find any discussion or solution.
I am trying to minimize memory usage as much as possible because the document contains a huge array. The doc looks like this:
{<unpredictable stuff>, "array":[{"name":<long strings>},...]}
Using a filter like this:
filter["array"][0]["name"] = true;
I still run out of memory because, with this filter, the result still contains all the "name"s (again, this is a very long array).

I wonder if there are other ways to filter more selectively... perhaps something like this (it doesn't seem to work):
filter["array"][3-14]["name"] = true;
so that I get only the fourth to fifteenth elements? In my case, 16 elements fit comfortably in memory, whereas all the elements do not.

An alternative idea may be a callback:
filter["array"][]["name"] = callback;
so that I can examine each "name" as it is found. Perhaps the callback tells me the index and the "name", so I can decide to keep it, throw it away, or even stop the parsing (for speed).
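
To make the keep / throw away / stop idea concrete, here is a rough sketch of what such a callback contract might look like. Nothing in it is ArduinoJson API: FilterAction, NameCallback, filterNames and keepFourthToFifteenth are hypothetical names, and the parser is replaced by a plain loop over strings that are already in memory.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical decision a filter callback could return; not an ArduinoJson type.
enum class FilterAction { Keep, Skip, Stop };

// Hypothetical callback signature: element index plus the "name" just parsed.
using NameCallback = std::function<FilterAction(std::size_t, const std::string&)>;

// Stand-in for the parser: walks the names in order and keeps only what the
// callback accepts. A real parser would produce these names one at a time
// from the input stream instead of reading them from a vector.
std::vector<std::string> filterNames(const std::vector<std::string>& names,
                                     const NameCallback& callback) {
  std::vector<std::string> kept;
  for (std::size_t i = 0; i < names.size(); ++i) {
    FilterAction action = callback(i, names[i]);
    if (action == FilterAction::Stop) break;  // abandon the rest of the input
    if (action == FilterAction::Keep) kept.push_back(names[i]);
    // FilterAction::Skip: drop this value and move on to the next element
  }
  return kept;
}

// Example policy: keep the fourth to fifteenth elements (indexes 3 to 14),
// then stop so the remainder of the array is never visited.
FilterAction keepFourthToFifteenth(std::size_t index, const std::string&) {
  if (index < 3) return FilterAction::Skip;
  if (index > 14) return FilterAction::Stop;
  return FilterAction::Keep;
}
```

Calling filterNames(names, keepFourthToFifteenth) on a list of 20 names would return the 12 names at indexes 3 to 14 and never look at the last five.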

I also thought about "deserialization in chunks" using findUntil, but the preceding "<unpredictable stuff>" makes it unreliable: there may be similarly named elements at different nesting levels, etc. (unless I write my own JSON parsing code, which defeats the purpose of using this library). After all, the point of an annotated JSON doc is that it can be out of order, contain additional things you don't care about, etc.
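
A small sketch of why the text search is fragile, using a made-up document and a plain std::string search as a stand-in for the stream search (findArrayStart and kSample are illustrative names, not part of any library):

```cpp
#include <cstddef>
#include <string>

// Naive "jump to the array" step: returns the offset just past the first
// occurrence of "array":[ or std::string::npos if there is none.
// It has no notion of nesting depth, which is the weakness described above.
std::size_t findArrayStart(const std::string& json) {
  const std::string marker = "\"array\":[";
  std::size_t pos = json.find(marker);
  return pos == std::string::npos ? pos : pos + marker.size();
}

// Made-up document: the unpredictable prefix happens to contain its own
// nested "array" member before the top-level one we actually want.
const std::string kSample =
    "{\"meta\":{\"array\":[1,2]},\"array\":[{\"name\":\"a\"},{\"name\":\"b\"}]}";
```

With this sample, findArrayStart(kSample) lands inside "meta" (on the 1,2 values) rather than on the first {"name":...} object, so any chunked parsing that starts from that offset would read the wrong array.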

Thanks for any idea.

@bblanchon
Owner

Hi @1plaintext,

This is indeed a popular ask, see #2072, #1723, #1486, #1316, #1708.
I like the idea of a callback-based filter, but it was impossible to do with v6, so I never added it to the backlog.

Best regards,
Benoit

@bblanchon changed the title from "filtering with very long array in a doc: only want a small number of the elements" to "Callback-base filter" on Apr 2, 2024
@bblanchon changed the title from "Callback-base filter" to "Callback-based filter" on Apr 2, 2024