Description
What’s the problem you're trying to solve
When users send an API request to our data endpoint, the query runs against the data source every time. To enhance query performance, we need to provide a caching (pre-loading) layer backed by DuckDB.
Describe the solution you’d like
Define the cache layer in YAML with a `cache` field, and define each `cacheTableName` and its config. For example, we define an SQL query `select * from order`, query the result from the data source `pg`, and keep it in DuckDB under the table name `order`. In addition, `refreshTime` and `refreshExpression` define when we should refresh the query and keep the new result in the cache table in DuckDB. The `indexes` field uses DuckDB's index feature to index columns like the original database does, making queries efficient.
```yaml
# API schema YAML
cache:
  # "order" table kept in duckDB
  - cacheTableName: "order"
    sql: "select * from order"
    # optional
    profile: pg
    # optional
    refreshTime:
      every: "5m"
    # optional
    refreshExpression:
      expression: "MAX(created_at)"
      every: "5m"
    # optional
    indexes:
      'idx_a': 'col_a'
      'idx_b': 'col_b'
  # "product" table kept in duckDB
  - cacheTableName: "product"
    sql: "select * from product"
    # used data source
    profile: pg
```
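As a rough sketch of how a loader might consume this config (the helper names and dict layout below are assumptions for illustration, not the real implementation), each entry can be turned into a refresh interval in seconds and a set of DuckDB `CREATE INDEX` statements:

```python
import re

def parse_interval(every: str) -> int:
    """Convert an interval string such as "5m" or "30s" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", every)
    if not match:
        raise ValueError(f"unsupported interval: {every}")
    value, unit = int(match.group(1)), match.group(2)
    return value * {"s": 1, "m": 60, "h": 3600}[unit]

def index_ddl(cache_table: str, indexes: dict) -> list:
    """Build CREATE INDEX statements for one cache table."""
    return [
        f'CREATE INDEX {name} ON "{cache_table}" ({column})'
        for name, column in indexes.items()
    ]

# One entry of the `cache` config above, expressed as a plain dict.
entry = {
    "cacheTableName": "order",
    "sql": "select * from order",
    "profile": "pg",
    "refreshTime": {"every": "5m"},
    "indexes": {"idx_a": "col_a", "idx_b": "col_b"},
}

print(parse_interval(entry["refreshTime"]["every"]))  # → 300
print(index_ddl(entry["cacheTableName"], entry["indexes"]))
```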
Here is our SQL file, which sends the query to DuckDB to get results. Within the scope of `{% cache %} ... {% endcache %}`, when we send the query to DuckDB, we check whether each referenced table is kept in DuckDB; if your YAML does not define a matching cache table name, the query will fail.
```sql
-- The "cache" scope means the SQL statement will send the query to the duckDB cache data source
{% cache %}
select * from order
where type = {{ context.params.type }}
and exists (
  select * from product where price >= {{ context.params.price }}
  and order.product_id = product.id
)
{% endcache %}
```
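A minimal sketch of that check (the table-extraction regex and function names are assumptions, not the real implementation): collect the table names referenced inside the cache scope and reject the query if any of them has no matching `cacheTableName` in the YAML.

```python
import re

def referenced_tables(sql: str) -> set:
    """Naively extract table names following FROM/JOIN keywords."""
    return set(re.findall(r"\b(?:from|join)\s+([a-zA-Z_]\w*)", sql, re.IGNORECASE))

def check_cache_scope(sql: str, cache_tables: set) -> None:
    """Fail if the cache-scoped SQL references a table not kept in DuckDB."""
    missing = referenced_tables(sql) - cache_tables
    if missing:
        raise ValueError(f"tables not defined in cache config: {sorted(missing)}")

cached_sql = """
select * from order
where exists (select * from product where order.product_id = product.id)
"""
check_cache_scope(cached_sql, {"order", "product"})   # passes
try:
    check_cache_scope(cached_sql, {"order"})          # "product" is not cached
except ValueError as err:
    print(err)
```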
Breakdown
- Parse `cache` config from API Schema YAML #151
- Support SQL syntax of the `{% cache %} ... {% endcache %}` tag #152
- Support the `cache` tag keeping the query result in a variable like `req` #153
- Define a cache layer loader to load Parquet files into in-memory DuckDB #154
- Support BigQuery exporting results to Parquet directly #155
- Support Snowflake exporting results to Parquet directly #156
- Support indexing columns and refreshing data by time in the cache layer #164
Issue #157 is not included in this epic; we will arrange to develop it in the future.
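On the refresh item (#164): `refreshExpression` lets the loader avoid needless reloads by polling a cheap aggregate (such as `MAX(created_at)`) on the source and refreshing only when its value changes. A sketch of that staleness check, with the query runner injected as a stand-in (an assumption for the sketch, not a real client):

```python
def needs_refresh(run_query, probe_sql: str, last_seen):
    """Run the refreshExpression probe against the source and report
    whether the cached table is stale, plus the new marker value."""
    current = run_query(probe_sql)
    return current != last_seen, current

# Stand-in runner simulating a data source whose MAX(created_at) advanced.
def fake_runner(sql):
    return "2023-01-02T00:00:00"

changed, marker = needs_refresh(
    fake_runner, "SELECT MAX(created_at) FROM orders", "2023-01-01T00:00:00"
)
print(changed)  # → True
```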