-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
MDEV-36316/MDEV-36327/MDEV-36328 Msan debug innodb #4061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
+146
−31
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
dr-m
approved these changes
May 23, 2025
Clang ~16+ on MSAN became quite strict with uninitalized data being passed and returned from functions. Non-debug builds have a basic optimization that hides these from those builds Two innodb cases violate the assumptions, however once inlined with a basic optimization those that existed for uninitialized values are removed. (MDEV-36316) rec_set_bit_field_2 calling mach_read_from_2 hits a read of bits it wasn't actually changing. (MDEV-36327) The function dict_process_sys_columns_rec left nth_v_col uninitialized unless it was a virtual column. This was ok as the function i_s_sys_columns_fill_table also didn't read this value unless it was a virtual column.
The stacksize was still too large. Back port from the 10.11 merge commit 1c7209e
In original fix, commit 82d7419, a 16k stack frame limit was imposed. Under the stack usage is doubled due to MSAN. Debug builds without optimization can use more as well. ASAN Debug builds also exceeded the 16k stack frame limit. To keep some safety limit, a 64k limit is imposed to the compiler under MSAN or ASAN with CMAKE_BUILD_TYPE=Debug.
Despite being included in the HAVE_valgrind define. As such it's best differenciated from valgrind in the server identifier as they have for the purposes a distinct and different set of behaviours. MSAN has its own set of test inclusions that that are different from valgrind and such including "valgrind" in a server string that gets tested for valgrind will incorrectly exclude some tests that are suitable for MSAN but not valgrind. There's a have_sanitizer system variable for exposing the sanitizer being used so there's no need for version verboseness. Correct have_sanitizer system variable description to include MSAN has been possible for a while.
Without this increase the mtr test case pre/post conditions will fail as the stack usage has increased under MSAN with clang-20.1. ASAN takes a 11M stack, however there was no obvious gain in MSAN test success after 2M. The resulting behaviour observed on smaller stack size was a SEGV normally. Hide the default stack size from the sysvar tests that expose thread-stack as a variable with its default value.
…ructure THD::reset_sub_statement_state and THD::restore_sub_staement_state swap auto_inc_intervals_forced(Discrete_intervals_list) of a THD class with a local variable temporary to execute other things before restoring at the end of Table_triggers_list::process_triggers under a rpl_master_erroneous_autoinc(true) condition as exposed by the rpl.rpl_trigger test. The uninitialized data isn't used and the only required action is to copy the data in one direction. As the intent is for the auto_inc_intervals_forced value to be overwritten or unused, MEM_UNDEFINED is used on it to ensure the previous state is considered invalid. The other uses of reset_sub_statement_state in Item_sp::execute_impl also follow the same pattern of taking a copy to restore within the same function.
Calling SetArrayOptions with Nodes[i - 1].Key as its nm argument exposed a MemorySanitizer error when i=0 for the tests: * connect.bson_udf * connect.json_udf * connect.json_udf_bin Its assumed that a basic optimization would have eliminated these invalid expressions. As the nm argument was unused, it has been removed.
With MSAN the following test behavious where observed: * funcs_1.myisam_views-big - normal big test for non-debug * innodb_gis.rtree_purge - normal big test with MSAN * main.alter_table_lock - very quick - unclear why disabled * main.cte_recursive - slow on Debug only * main.join_cache_notasan - special MSAN handing for returning OOM added * main.sum_distinct-big - 90 seconds on non-debug - still big however * maria.max_length - normal big test with MSAN * perfschema.statement_digest_long_query - overflows stack on debug Timingsi (on old memory constrained hardware): non-debug: funcs_1.myisam_views-big w2 [ pass ] 78564 innodb_gis.rtree_purge '16k' w2 [ pass ] 5784 innodb_gis.rtree_purge '32k' w2 [ pass ] 5242 innodb_gis.rtree_purge '4k' w1 [ pass ] 8303 innodb_gis.rtree_purge '64k' w1 [ pass ] 6348 innodb_gis.rtree_purge '8k' w2 [ pass ] 5870 main.alter_table_lock w1 [ pass ] 41 main.cte_recursive w1 [ pass ] 15485 main.join_cache_notasan w1 [ pass ] 39 main.sum_distinct-big w2 [ pass ] 96256 maria.max_length w1 [ pass ] 92990 perfschema.statement_digest_long_query w2 [ pass ] 8 debug: funcs_1.myisam_views-big w1 [ skipped ] Can't be run WITH_MSAN and CMAKE_BUILD_TYPE=Debug innodb_gis.rtree_purge '16k' w2 [ pass ] 109788 innodb_gis.rtree_purge '32k' w2 [ pass ] 62361 innodb_gis.rtree_purge '4k' w1 [ pass ] 89423 innodb_gis.rtree_purge '64k' w1 [ pass ] 72082 innodb_gis.rtree_purge '8k' w1 [ pass ] 98452 main.alter_table_lock w2 [ pass ] 38 main.cte_recursive w2 [ pass ] 180047 main.join_cache_notasan w1 [ pass ] 166 main.sum_distinct-big w1 [ skipped ] Can't be run WITH_MSAN and CMAKE_BUILD_TYPE=Debug maria.max_length w1 [ skipped ] Can't be run WITH_MSAN and CMAKE_BUILD_TYPE=Debug perfschema.statement_digest_long_query w1 [ skipped ] Can't be run WITH_MSAN and CMAKE_BUILD_TYPE=Debug
5f4b0fa
to
676aea8
Compare
dr-m
approved these changes
May 28, 2025
The following mroonga functions had approaching 64k stack frames, so exclude these: * chunk_merge - ~60k * buffer_merge - ~78k * grn_ii_update_one - ~60k
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
MSAN + CMAKE_BUILD_TYPE=Debug has stack overrun issues
Just a POC at the moment. Some tests still failing:
Release Notes
non-user facing
How can this PR be tested?
CMAKE_BUILD_TYPE=Debug under MSAN like https://mariadb.com/kb/en/compile-and-using-mariadb-with-sanitizers-asan-ubsan-tsan-msan/ but without CMAKE_C,CXX_FLAGS=-O2
Basing the PR against the correct MariaDB version
main
branch.PR quality check