feat(pull): filter by scanning all rows #351
Open
This patch introduces a `--scann` flag that modifies the behavior of a pull. This is particularly useful when you need to extract a significant portion of your database. Instead of filtering data during the extraction process, this mode allows you to first pull all the data and then apply the filter as a post-processing step.

This method can be faster than querying the database for each individual row with filters applied, especially when dealing with large datasets. It minimizes the need for multiple database queries and speeds up the extraction process by retrieving all the data at once, then excluding unwanted rows afterward.
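The scan-then-filter idea described above can be sketched in a few lines of Go. This is a minimal illustration only, not the code from this patch: `Row`, `filterInMemory`, and the string-keyed `included` set are hypothetical names chosen for the example, and the scanned rows are hard-coded in place of a real full-table read.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one extracted database row.
type Row map[string]any

// filterInMemory keeps only the rows whose keyColumn value appears in the
// included set. Rows are returned in the order they were scanned.
func filterInMemory(rows []Row, keyColumn string, included map[string]struct{}) []Row {
	kept := make([]Row, 0, len(rows))
	for _, row := range rows {
		key := fmt.Sprint(row[keyColumn])
		if _, ok := included[key]; ok {
			kept = append(kept, row)
		}
	}
	return kept
}

func main() {
	// Pretend this slice is the result of a single full-table scan.
	scanned := []Row{
		{"id": 1, "name": "alice"},
		{"id": 2, "name": "bob"},
		{"id": 3, "name": "carol"},
	}
	// Assumed filter: keep the rows with id 1 and id 3.
	included := map[string]struct{}{"1": {}, "3": {}}

	for _, row := range filterInMemory(scanned, "id", included) {
		fmt.Println(row["id"], row["name"])
	}
}
```

The trade-off is one large read plus a set lookup per row, instead of one filtered query per wanted row, which tends to pay off when the filter selects a large share of the table.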
Changes Overview:
- CLI Update: A `scann` flag (`--scann`) is added to the pull command to filter in memory.
- Handler Changes: The handler takes the `scann` option, which influences how the filters are applied.
- Driver Interface Update: The `Pull` method's signature is updated to accept an additional parameter, `included KeyStore`, to support filtering in memory.
- Test Additions: Tests cover the `--scann` functionality, including filtering with files, applying filters, handling no matches, and ensuring order consistency.

Summary of Key Modifications:
- Adds the `--scann` flag and uses it to load data into memory for filtering instead of using a database filter.
- `puller` and `pullerParallel` now handle the new `included KeyStore` to filter data in memory when `scann` is enabled.
- Adds tests for the `--scann` flag, including cases where there are no matches or multiple rows with specific filters.

Example of the `--scann` Flag Use:
`lino pull source --scann --filter-from-file customer_filter.jsonl`
The flag ensures that the entire dataset is pulled and then filtered in memory based on the provided `customer_filter.jsonl` file.