Skip to content

Incorrect results when joining two tables with multiple join conditions #271

Closed as not planned
@karanpopat

Description

@karanpopat

While querying by joining two tables with multiple join conditions(more than 2 in my case), I observed that one of the conditions was being ignored hence giving incorrect results

To Reproduce

  1. Setup an AWS aggregator connection
  2. For example When joining the tables aws_dax_parameter_group and aws_dax_cluster on parameter_group_name, account_id and region. The p.account_id = c.account_id condition is being ignored. Thus returning results from multiple accounts
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
    and p.account_id = c.account_id
    and p.region = c.region
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 123456789012  | 111122223333   | us-east-1 | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1 | us-east-1  |
+----------------+----------------+---------------+----------------+-----------+------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=12.747..12.866 rows=2 loops=1)                            |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.364..7.380 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=4.604..4.707 rows=2 loops=1)      |
|         Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))                    |
|         Rows Removed by Filter: 3                                                                                                        |
| Planning Time: 8.682 ms                                                                                                                  |
| Execution Time: 14.922 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. The query plan above shows that the filter for p.account_id = c.account_id is ignored.
  2. On removing one of the join conditions from the same query, for eg. p.region = c.region below, the result displayed are correct
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
    and p.account_id = c.account_id
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+----------------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region      | dax_region |
+----------------+----------------+---------------+----------------+----------------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | ap-south-1     | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-west-2      | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1      | us-east-1  |
+----------------+----------------+---------------+----------------+----------------+------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=25000000 width=192) (actual time=12.474..12.939 rows=13 loops=1)                         |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=4.827..4.840 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=7.004..7.454 rows=13 loops=1)     |
|         Filter: ((account_id = c.account_id) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))            |
|         Rows Removed by Filter: 16                                                                                                       |
| Planning Time: 10.968 ms                                                                                                                 |
| Execution Time: 16.336 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. So does the plan show that all the join conditions were considered.
  2. Now on removing a different condition from the same query, for eg. c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name below, the result displayed are correct again
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on p.account_id = c.account_id
    and p.region = c.region
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1 | us-east-1  |
+----------------+----------------+---------------+----------------+-----------+------------+

+---------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..70057540002500.00 rows=25000000 width=192) (actual time=17.646..17.691 rows=1 loops=1)                                   |
|   Join Filter: ((p.account_id = c.account_id) AND (p.region = c.region))                                                                          |
|   Rows Removed by Join Filter: 28                                                                                                                 |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=3.382..4.065 rows=29 loops=1) |
|   ->  Materialize  (cost=0.00..30000000045040.00 rows=1000000 width=300) (actual time=0.468..0.469 rows=1 loops=29)                               |
|         ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..30000000000000.00 rows=1000000 width=300) (actual time=13.549..13.587 rows=1 loops=1)  |
|               Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                     |
| Planning Time: 11.774 ms                                                                                                                          |
| Execution Time: 21.489 ms                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
  1. So does the plan show that all the join conditions were considered.
  2. Same query with cross-join and the join conditions in the where clause, generates a similar plan
> explain analyse select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p,
  aws_dax_cluster as c
where
  c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
  and p.account_id = c.account_id
  and p.region = c.region
  and c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=11.617..11.679 rows=2 loops=1)                            |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=6.915..6.929 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=4.235..4.283 rows=2 loops=1)      |
|         Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))                    |
|         Rows Removed by Filter: 3                                                                                                        |
| Planning Time: 13.727 ms                                                                                                                 |
| Execution Time: 13.760 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. Similar example
> explain analyse select
  b.arn as eventbridge_bus_arn
from
  aws_eventbridge_rule r
  join aws_eventbridge_bus b on r.event_bus_name = b.name
    and r.region = b.region
    and r.account_id = b.account_id
where
  r.arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'
+------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                     |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=32) (actual time=15.128..25.383 rows=2 loops=1)                                   |
|   ->  Foreign Scan on aws_eventbridge_rule r  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.076..17.280 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'::text)                                                               |
|         Rows Removed by Filter: 437                                                                                                            |
|   ->  Foreign Scan on aws_eventbridge_bus b  (cost=0.00..40000.00 rows=100 width=400) (actual time=7.371..7.419 rows=2 loops=1)                |
|         Filter: ((r.region = region) AND (r.event_bus_name = name))                                                                            |
| Planning Time: 11.155 ms                                                                                                                       |
| Execution Time: 28.488 ms                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------+

Steampipe version
Tested with both 0.18.0 and 0.17.4

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingstale

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions