Closed as not planned
Description
While querying by joining two tables with multiple join conditions(more than 2 in my case), I observed that one of the conditions was being ignored hence giving incorrect results
To Reproduce
- Setup an AWS aggregator connection
- For example When joining the tables
aws_dax_parameter_group
andaws_dax_cluster
onparameter_group_name
,account_id
andregion
. Thep.account_id = c.account_id
condition is being ignored. Thus returning results from multiple accounts
> select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p
join aws_dax_cluster as c
on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
and p.account_id = c.account_id
and p.region = c.region
where
c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id | title | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 123456789012 | 111122223333 | us-east-1 | us-east-1 |
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-east-1 | us-east-1 |
+----------------+----------------+---------------+----------------+-----------+------------+
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=12.747..12.866 rows=2 loops=1) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.364..7.380 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000.00 rows=100 width=400) (actual time=4.604..4.707 rows=2 loops=1) |
| Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name)) |
| Rows Removed by Filter: 3 |
| Planning Time: 8.682 ms |
| Execution Time: 14.922 ms |
+------------------------------------------------------------------------------------------------------------------------------------------+
- The query plan above shows that the filter for
p.account_id = c.account_id
is ignored. - On removing one of the join conditions from the same query, for eg.
p.region = c.region
below, the result displayed are correct
> select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p
join aws_dax_cluster as c
on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
and p.account_id = c.account_id
where
c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+----------------+------------+
| id | title | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+----------------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | ap-south-1 | us-east-1 |
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-west-2 | us-east-1 |
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-east-1 | us-east-1 |
+----------------+----------------+---------------+----------------+----------------+------------+
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=25000000 width=192) (actual time=12.474..12.939 rows=13 loops=1) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=4.827..4.840 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000.00 rows=100 width=400) (actual time=7.004..7.454 rows=13 loops=1) |
| Filter: ((account_id = c.account_id) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name)) |
| Rows Removed by Filter: 16 |
| Planning Time: 10.968 ms |
| Execution Time: 16.336 ms |
+------------------------------------------------------------------------------------------------------------------------------------------+
- So does the plan show that all the join conditions were considered.
- Now on removing a different condition from the same query, for eg.
c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
below, the result displayed are correct again
> select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p
join aws_dax_cluster as c
on p.account_id = c.account_id
and p.region = c.region
where
c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id | title | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-east-1 | us-east-1 |
+----------------+----------------+---------------+----------------+-----------+------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..70057540002500.00 rows=25000000 width=192) (actual time=17.646..17.691 rows=1 loops=1) |
| Join Filter: ((p.account_id = c.account_id) AND (p.region = c.region)) |
| Rows Removed by Join Filter: 28 |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=3.382..4.065 rows=29 loops=1) |
| -> Materialize (cost=0.00..30000000045040.00 rows=1000000 width=300) (actual time=0.468..0.469 rows=1 loops=29) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..30000000000000.00 rows=1000000 width=300) (actual time=13.549..13.587 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| Planning Time: 11.774 ms |
| Execution Time: 21.489 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
- So does the plan show that all the join conditions were considered.
- Same query with cross-join and the join conditions in the
where
clause, generates a similar plan
> explain analyse select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p,
aws_dax_cluster as c
where
c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
and p.account_id = c.account_id
and p.region = c.region
and c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=11.617..11.679 rows=2 loops=1) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=6.915..6.929 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000.00 rows=100 width=400) (actual time=4.235..4.283 rows=2 loops=1) |
| Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name)) |
| Rows Removed by Filter: 3 |
| Planning Time: 13.727 ms |
| Execution Time: 13.760 ms |
+------------------------------------------------------------------------------------------------------------------------------------------+
- Similar example
> explain analyse select
b.arn as eventbridge_bus_arn
from
aws_eventbridge_rule r
join aws_eventbridge_bus b on r.event_bus_name = b.name
and r.region = b.region
and r.account_id = b.account_id
where
r.arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'
+------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=125000 width=32) (actual time=15.128..25.383 rows=2 loops=1) |
| -> Foreign Scan on aws_eventbridge_rule r (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.076..17.280 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'::text) |
| Rows Removed by Filter: 437 |
| -> Foreign Scan on aws_eventbridge_bus b (cost=0.00..40000.00 rows=100 width=400) (actual time=7.371..7.419 rows=2 loops=1) |
| Filter: ((r.region = region) AND (r.event_bus_name = name)) |
| Planning Time: 11.155 ms |
| Execution Time: 28.488 ms |
+------------------------------------------------------------------------------------------------------------------------------------------------+
Steampipe version
Tested with both 0.18.0
and 0.17.4