Cherrypick FIXME enable 64bit bitmapset and update visimap etc #843

Merged
merged 20 commits into apache:main
Jan 6, 2025

Conversation


@yjhjstz yjhjstz commented Jan 5, 2025

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


Chibin and others added 11 commits January 5, 2025 08:30
Remove the unnecessary extra call as we are already calling this
function a couple of lines above.
Currently, a partition definition element can contain a mix of partition
definitions and COLUMN ENCODING clauses.

For example:
CREATE TABLE foo (a1 int, a2 char(5)) using ao_row
partition by range (a1) (
start(1) end(5) every(1),  --> partition definition
COLUMN a1 ENCODING (compresstype=zstd) --> column encoding
);

Because of this, GpPartitionDefinition has a single list of partDefElems
containing both partition definitions and column encodings. The FIXME
was to evaluate whether the current grammar could be optimized to
generate separate lists and eliminate the need to separate them later in
transformGpPartitionDefinition(). So there is essentially nothing to
fix, only a potential optimization.

After looking into the details, this does not seem straightforward
without modifying the grammar, which would be a breaking change. There
is also little performance benefit to be gained: the current code
already loops over partDefElems to transform them, so keeping both kinds
of elements in one list adds little extra cost. We also don't expect
this list to hold a huge number of elements, so the separation cost is
dwarfed by the cost of actually creating that many partition tables.

Reviewed-by: Alexandra Wang <[email protected]>
Reviewed-by: Huansong Fu <[email protected]>
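
For illustration only, a minimal sketch of what the separation inside transformGpPartitionDefinition() amounts to, splitting the mixed list by node type; the variable names here are assumptions, not the actual code:

```
/* Hypothetical sketch: split the mixed partDefElems list by node type.
 * ColumnReferenceStorageDirective is the COLUMN ... ENCODING (...) node;
 * everything else is a partition definition element. */
List     *partElems  = NIL;
List     *encClauses = NIL;
ListCell *lc;

foreach(lc, gpPartDef->partDefElems)
{
    Node *elem = (Node *) lfirst(lc);

    if (IsA(elem, ColumnReferenceStorageDirective))
        encClauses = lappend(encClauses, elem);
    else
        partElems = lappend(partElems, elem);
}
```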
The FIXME is for the pg_basebackup --no-verify-checksums flag. This
feature doesn't work and hence is currently not supported in GPDB, as
there is no way to tell from the physical files whether a relation is AO
or heap. Some mechanism needs to be devised to enable this feature;
until that happens, there is no point keeping a FIXME about it in the
test file.

Add a breadcrumb with a GPDB_12_MERGE_FEATURE_NOT_SUPPORTED comment so
the isolation2 setup file also gets fixed once the feature is enabled,
and remove the FIXME from the setup.sql and setup.out files.

Reviewed-by: Huansong Fu <[email protected]>
Currently, Greenplum doesn't support foreign key constraint validation.
However, we only emit a warning saying that Greenplum doesn't support
it, without actually skipping the validation process. I noticed this
issue when running the following query:

CREATE TABLE fk_test_reference (col2 text unique not null);
INSERT INTO fk_test_reference VALUES ('stuff');
CREATE UNLOGGED TABLE id_taptest_table (
    col1 bigint,
    col2 text not null default 'stuff',
    col3 timestamptz DEFAULT now(),
    col4 text,
    FOREIGN KEY (col2) REFERENCES fk_test_reference(col2))
  PARTITION BY RANGE (col1);
CREATE TABLE id_taptest_table_p3000000000 (
    LIKE id_taptest_table
    INCLUDING DEFAULTS
    INCLUDING CONSTRAINTS
    INCLUDING STORAGE
    INCLUDING COMMENTS
    INCLUDING GENERATED
    INCLUDING INDEXES);
ALTER TABLE id_taptest_table ATTACH PARTITION
    id_taptest_table_p3000000000 FOR VALUES FROM (3000000000) TO (3000000010);

There are two possible failures when attaching the newly created table
as a partition.

1. Greenplum emits an error message:
   ERROR:  XX000: no pre-assigned OID for pg_constraint tuple "id_taptest_table_col2_fkey" (namespace:0 keyOid1:16472 keyOid2:102) (oid_dispatch.c:374)  (seg2 127.0.0.1:7004 pid=195321) (oid_dispatch.c:374)
   LOCATION:  GetNewOrPreassignedOid, oid_dispatch.c:374

2. Greenplum emits an error message:
   ERROR:  0A000: function cannot execute on a QE slice because it accesses relation "public.id_taptest_table_p3000000000"  (seg2 127.0.0.1:7004 pid=108395)
   CONTEXT:  SQL statement "SELECT fk."col2" FROM ONLY "public"."id_taptest_table_p3000000000" fk LEFT OUTER JOIN ONLY "public"."fk_test_reference" pk ON ( pk."col2" OPERATOR(pg_catalog.=) fk."col2") WHERE pk."col2" IS NULL AND (fk."col2" IS NOT NULL)"
   LOCATION:  querytree_safe_for_qe_walker, functions.c:238

The root cause analysis:

When attaching a partition to a partitioned table that has foreign key
references, Greenplum checks the constraint by running the query below,
both on the coordinator and on the segments:

SELECT fk."col2" FROM ONLY "public"."id_taptest_table_p3000000000" fk
    LEFT OUTER JOIN ONLY "public"."fk_test_reference" pk
    ON ( pk."col2" OPERATOR(pg_catalog.=) fk."col2")
  WHERE pk."col2" IS NULL AND (fk."col2" IS NOT NULL)

If the check passes, the coordinator dispatches the real command for
'ALTER TABLE ... ATTACH PARTITION ...' to the segments. When performing
the 'ALTER TABLE' command, the coordinator needs to generate an OID for
the pg_constraint tuple 'id_taptest_table_col2_fkey' to keep the catalog
table consistent across the cluster. The dispatched OID is cleaned up
when that transaction aborts or commits.

The first possible failure occurs if we execute the 'ALTER TABLE'
command directly: the OID dispatched for the ALTER TABLE has already
been cleaned up once the constraint-checking query finishes (the
constraint-checking query is executed in the function
'RI_Initial_Check').

The second possible failure occurs if we execute the 'ALTER TABLE'
command in a transaction block. The constraint-checking query on the
coordinator succeeds and the coordinator dispatches the real 'ALTER
TABLE' command to the segments. Each segment then also runs the
constraint-checking query, but in QE mode (Gp_role == GP_ROLE_EXECUTE),
so the query cannot be executed because it accesses a relation.

Since Greenplum doesn't support foreign key constraint validation
anyway, the simplest fix is to skip the validation.

Fix #14279
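
A minimal sketch of the idea, assuming the skip happens where the constraint is set up for validation; the surrounding function and field names are illustrative, not the exact diff:

```
/* Illustrative only: instead of merely warning, actually skip the FK
 * validation so RI_Initial_Check() is never run. */
if (constraint->contype == CONSTR_FOREIGN)
{
    ereport(WARNING,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("referential integrity (FOREIGN KEY) constraints are not supported in Greenplum Database, will not be enforced")));
    constraint->skip_validation = true;
    constraint->initially_valid = true;
}
```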
Fixup commit d28d587847f0baf267fb1211be8a8028d49b2919
Note that when SharedSnapshotDump is called, the caller must already
hold SharedSnapshotLock, so this commit fixes a call site that didn't
satisfy this condition.

Co-authored-by: wuchengwen <[email protected]>
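
A sketch of the corrected call pattern, assuming the lock is taken in shared mode around the dump; the actual call site fixed by this commit may differ:

```
/* The caller must hold SharedSnapshotLock before calling
 * SharedSnapshotDump(); acquire and release it around the call. */
LWLockAcquire(SharedSnapshotLock, LW_SHARED);
SharedSnapshotDump();       /* dump contents are only stable while the lock is held */
LWLockRelease(SharedSnapshotLock);
```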
In an extreme case, two gpfdist processes can occupy the same port (one
over IPv4, the other over IPv6). There are two reasons for this problem.
The first is that 'bind' is not mutually exclusive across address
families. The second is that when listening on a port fails, gpfdist
retries the same port with the other protocol (IPv4 or IPv6).

So the PR fixes the problem by changing how a failed listen is handled.
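
A rough sketch of the changed behavior, assuming the fix is to fail outright instead of retrying the same port on the other address family; the real gpfdist diff may be structured differently:

```
/* Sketch only: if listening on the requested port fails, report the error
 * and exit rather than falling back to the same port on the other
 * protocol, which is what let two gpfdist instances end up sharing a port. */
if (bind(sockfd, addr->ai_addr, addr->ai_addrlen) < 0 ||
    listen(sockfd, SOMAXCONN) < 0)
{
    fprintf(stderr, "gpfdist: cannot listen on port %d: %s\n",
            port, strerror(errno));
    close(sockfd);
    exit(1);
}
```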
Concurrent index builds are not supported in GPDB. The code currently
implemented in this function is all heap-centric and not really
applicable to AO. Hence, delete the function implementation and raise an
ERROR if it is called. When we get around to implementing CONCURRENT
index builds, the interface can be implemented properly.
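
A minimal sketch of the replacement body, assuming a plain ereport is all that remains; the exact table AM callback and error wording may differ:

```
/* Concurrent index builds are not implemented for AO tables; the old
 * heap-centric body is removed and the callback now just errors out. */
ereport(ERROR,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         errmsg("CREATE INDEX CONCURRENTLY is not supported on append-optimized tables")));
```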
Ideally these macros shouldn't be required: the table AM should serve
and hide all the AM-level details. That is far from reality, though, and
not simple or easy to achieve given the current usage of these macros.
We can iteratively move in that direction and eventually reach the goal.
Until then, just convert the FIXME into a code comment.
The implementation being questioned is not new and has been there for
GPDB 4, 5, 6, and 7. The longer-term goal is to improve it, but there is
nothing to fix here in the short term.

Also, the FIXME in SyncRepGetSyncStandbys() can't be addressed, as the
standby config is not created or populated for the coordinator standby.
Hence, the code needs to live in SyncRepGetSyncStandbys(). Once a proper
implementation exists for the coordinator and standby, this part can be
tackled as well.
There is nothing special about bgwriter in GPDB compared to upstream, so
remove the FIXME and treat the bgwriter section the same as the
checkpointer or other parts of the sample config.
@yjhjstz yjhjstz added the cherry-pick cherry-pick upstream commits label Jan 5, 2025
zxuejing and others added 9 commits January 5, 2025 17:05
… belong to opfamily of distributed key(#14977)

If the opno of a clause does not belong to the opfamily of the distribution key and we use direct dispatch, we will get wrong results:
create table bar(a varchar);
insert into bar values('a   ');
explain select * from bar where a = 'a'::bpchar;
```
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=8)
   ->  Seq Scan on bar  (cost=0.00..431.00 rows=1 width=8)
         Filter: ((a)::bpchar = 'a'::bpchar)
 Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)

select * from bar where a = 'a'::bpchar;
 a
---
(0 rows)
```

If the opno of the clause does not belong to the opfamily of the
distribution key, do not use direct dispatch, to avoid wrong results.

There are some cases where we could still safely use direct dispatch but no longer do after this commit:
create table t(a citext);
explain select * from t where a = 'a'::text;
texteq does not belong to the opfamily of the table's distribution key type (citext).
From the implementation one can deduce the implication texteq ==> citext_eq, which would allow direct dispatch,
but Postgres has no such implication rule.

For more details, please refer to issue https://github.com/greenplum-db/gpdb/issues/14887.

fix direct_dispatch
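
A minimal sketch of the safety check, assuming the clause operator and the distribution key's operator family are at hand; the helper name is hypothetical:

```
#include "utils/lsyscache.h"

/* Hypothetical helper: only allow direct dispatch when the clause's
 * operator is a member of the distribution key's operator family, so the
 * constant hashes to the same segment the planner would target. */
static bool
clause_safe_for_direct_dispatch(Oid clause_opno, Oid distkey_opfamily)
{
    return op_in_opfamily(clause_opno, distkey_opfamily);
}
```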
This code seems useless, so just remove it to keep the codebase neat and
clean.

Co-authored-by: Junwang Zhao <[email protected]>
It was removed while merging PostgreSQL 12 because it hung back then; it
has no issue now, so bring it back.

The earlier, inconclusive analysis:
https://greenplum.slack.com/archives/C74P7FZFG/p1586440484210500
The encoding clause is necessary to maintain catalog consistency
for dropped columns in pg_attribute_encoding across major version
upgrades. Without this, the column encoding will be set to the
hardcoded server default in the new cluster. If the encoding was
modified from the default in the old cluster, an error would occur
if the column needed to be scanned for any reason.

Resolves the issue `ERROR:  decompression information missing` after
a column is dropped from an AO/CO table and the table is run through pg_upgrade

Authored-by: Brent Doil <[email protected]>
A QE will hang in `ExecTupleSplit()` while executing a query of the following form.
```
select DQA_1(expr1) filter (subquery1), DQA_2(expr2) ...
```
If a tuple is filtered out by DQA_1's filter, `filter_out` is set to true.
Because DQA_2 has no filter, `filter_out` is never reset to false in the
next loop iteration, so the loop never terminates.

The solution is simple: if the current Expr has no filter, set `filter_out` to false.
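
An approximate sketch of the fix inside the `ExecTupleSplit()` loop; the surrounding structure and field names are assumptions:

```
/* Approximate sketch: an Expr without a FILTER must reset filter_out,
 * otherwise a stale true value from the previous DQA loops forever. */
if (splitExpr->agg_filter != NULL)
    filter_out = !ExecQual(splitExpr->agg_filter_state, econtext);
else
    filter_out = false;
```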
The log directory can often be symlinked in the same way the pg_wal
directory is. Today, even if BOTH the source and target servers have
their log directories symlinked, pg_rewind will run into the error:

"log" is of different type in source and target

This is because when we use the libpq query to fetch the filemap from
the source server, we consider the log directory as a directory, even if
it is a symlink. This is because pg_stat_file() is used in that query in
libpqProcessFileList() and pg_stat_file() returns isdir=t for symlinks
to directories.

This shortcoming is somewhat called out:

 * XXX: There is no backend function to get a symbolic link's target in
 * general, so if the admin has put any custom symbolic links in the data
 * directory, they won't be copied correctly.

We could fix the query and/or pg_stat_file(). However, we would also
like to support deployments where only one of the primaries and/or
standbys has the symlink. That is not hard to conceive, given that
primaries and standbys can have drastically different log volumes and/or
log collection requirements.

So, we decided to extend to the log directory the same courtesy that was
extended to pg_wal (pg_xlog) in 0e42397.
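
A sketch of the idea in pg_rewind's filemap handling, modeled on the existing pg_wal special case; the exact function touched by this commit may differ:

```
/* Sketch: directories that are commonly symlinked and safe to treat as
 * plain directories on both source and target.  pg_wal got this treatment
 * in 0e42397; extend it to the log directory. */
static bool
is_allowed_symlink_dir(const char *path)
{
    return strcmp(path, "pg_wal") == 0 ||
           strcmp(path, "pg_xlog") == 0 ||
           strcmp(path, "log") == 0;
}
```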
Prevent potential array out of bounds access by validating the expression ID
is within the valid range of distinct qualified aggregates before accessing
the aggregate filter array.
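
A minimal sketch of the added guard, with illustrative names for the counter and the filter array:

```
/* Sketch: validate the expression id before indexing the per-DQA filter
 * array to prevent out-of-bounds access. */
if (exprId < 0 || exprId >= node->numDisDQAs)
    elog(ERROR, "tuple split expression id %d out of range (0..%d)",
         exprId, node->numDisDQAs - 1);

filter = node->aggFilterArray[exprId];
```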
It includes the portion enabling the 64-bit bitmapset and other
necessary changes related to the visimap, plus updates to the cmockery
unit tests.
@yjhjstz yjhjstz requested a review from avamingli January 6, 2025 08:25
@yjhjstz yjhjstz merged commit a03d2b8 into apache:main Jan 6, 2025
22 checks passed
Labels
cherry-pick cherry-pick upstream commits