Segmentation fault when vm.overcommit_memory is set to 2 #138

Open
villepeh opened this issue Apr 8, 2024 · 4 comments

@villepeh

villepeh commented Apr 8, 2024

I used the Synapse Admin API to purge some rooms that no longer had local users. After that, I started seeing panic messages like the ones in #79.

I tried deleting the compressor's entries from the database, but now I'm getting segfaults.

I cloned the repository again and rebuilt the auto compressor with cargo build, but the result is the same. Running the command as sudo -u postgres or as root makes no difference.
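
(For reference, the rebuild was roughly the following; a sketch assuming the upstream matrix-org repository and a default debug build, which matches the target/debug paths in the traces below.)

$ git clone https://github.com/matrix-org/rust-synapse-compress-state.git
$ cd rust-synapse-compress-state
$ cargo build    # debug build; the binary lands in target/debug/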

$ ./synapse_auto_compressor -p "user=postgres dbname=matrix host=/run/postgresql" -c 500 -n 100 
[2024-04-08T18:34:06Z INFO  synapse_auto_compressor] synapse_auto_compressor started
[2024-04-08T18:34:06Z INFO  synapse_auto_compressor::manager] Running compressor on room !room:myhomeserver.tld with chunk size 500
Segmentation fault (core dumped)

I tried to get some debug info with GDB:

$ gdb --args synapse_auto_compressor -p "user=postgres dbname=matrix host=/run/postgresql" -c 500 -n 100
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-11.1.el9_3
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from synapse_auto_compressor...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /opt/rust-synapse-compress-state/target/debug/synapse_auto_compressor.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) run
Starting program: /opt/rust-synapse-compress-state/target/debug/synapse_auto_compressor -p user=postgres\ dbname=matrix\ host=/run/postgresql -c 500 -n 100
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[2024-04-08T18:38:54Z INFO  synapse_auto_compressor] synapse_auto_compressor started
[2024-04-08T18:38:54Z INFO  synapse_auto_compressor::manager] Running compressor on room !room:myhomeserver.tld with chunk size 500
[New Thread 0x7ffff7bff640 (LWP 1744908)]
[Thread 0x7ffff7bff640 (LWP 1744908) exited]
[New Thread 0x7ffff7bff640 (LWP 1744910)]
[Thread 0x7ffff7bff640 (LWP 1744910) exited]
[New Thread 0x7ffff7bff640 (LWP 1744911)]
[New Thread 0x7ffff79fe640 (LWP 1744912)]
[New Thread 0x7ffff77fd640 (LWP 1744913)]
[New Thread 0x7ffff75fc640 (LWP 1744914)]
[New Thread 0x7ffff73fb640 (LWP 1744915)]
[New Thread 0x7ffff71fa640 (LWP 1744916)]
[New Thread 0x7ffff6ff9640 (LWP 1744917)]
[New Thread 0x7ffff6df8640 (LWP 1744918)]
[New Thread 0x7ffff6bf7640 (LWP 1744919)]
[New Thread 0x7ffff69f6640 (LWP 1744920)]
[New Thread 0x7ffff67f5640 (LWP 1744921)]
[New Thread 0x7ffff65f4640 (LWP 1744922)]

Thread 13 "synapse_auto_co" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff69f6640 (LWP 1744920)]
___pthread_mutex_trylock (mutex=0x28e8) at pthread_mutex_trylock.c:34
34	  switch (__builtin_expect (PTHREAD_MUTEX_TYPE_ELISION (mutex),
(gdb) bt
#0  ___pthread_mutex_trylock (mutex=0x28e8) at pthread_mutex_trylock.c:34
#1  0x0000555555b946ac in malloc_mutex_trylock_final (mutex=0x28a8) at include/jemalloc/internal/mutex.h:157
#2  0x0000555555b948e1 in malloc_mutex_lock (tsdn=0x7ffff69f59d0, mutex=0x28a8) at include/jemalloc/internal/mutex.h:216
#3  0x0000555555ba0845 in _rjem_je_tcache_arena_associate (tsdn=0x7ffff69f59d0, tcache_slow=0x7ffff69f5ad0, tcache=0x7ffff69f5d28, arena=0x0) at src/tcache.c:588
#4  0x0000555555b98180 in arena_choose_impl (tsd=0x7ffff69f59d0, arena=0x0, internal=false) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:60
#5  0x0000555555b98641 in arena_choose (tsd=0x7ffff69f59d0, arena=0x0) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:88
#6  0x0000555555ba37ed in _rjem_je_tsd_tcache_data_init (tsd=0x7ffff69f59d0) at src/tcache.c:740
#7  0x0000555555ba0ee2 in _rjem_je_tsd_tcache_enabled_data_init (tsd=0x7ffff69f59d0) at src/tcache.c:644
#8  0x0000555555bb3e68 in tsd_data_init (tsd=0x7ffff69f59d0) at src/tsd.c:244
#9  0x0000555555bb47a9 in _rjem_je_tsd_fetch_slow (tsd=0x7ffff69f59d0, minimal=false) at src/tsd.c:297
#10 0x00005555558ecefa in tsd_fetch_impl (minimal=false, init=true) at include/jemalloc/internal/tsd.h:422
#11 tsd_fetch () at include/jemalloc/internal/tsd.h:448
#12 imalloc (dopts=0x7ffff69eba60, sopts=0x7ffff69ebaa0) at src/jemalloc.c:2681
#13 _rjem_je_malloc_default (size=1520) at src/jemalloc.c:2722
#14 0x00005555559237c1 in imalloc_fastpath (fallback_alloc=0x5555558ecbec <_rjem_je_malloc_default>, size=1520) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:310
#15 _rjem_malloc (size=1520) at src/jemalloc.c:2746
#16 0x00005555556651b7 in tikv_jemallocator::{impl#0}::alloc (self=0x555556379a05, layout=...) at /home/matrix/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tikv-jemallocator-0.5.4/src/lib.rs:105
#17 0x0000555555664c52 in synapse_auto_compressor::_::__rust_alloc (arg0=1520, arg1=8) at synapse_auto_compressor/src/main.rs:21
#18 0x000055555615a3fe in alloc::alloc::alloc (layout=...) at /builddir/build/BUILD/rustc-1.71.1-src/library/alloc/src/alloc.rs:102
#19 alloc::alloc::Global::alloc_impl (self=0x555556436f41, layout=..., zeroed=false) at /builddir/build/BUILD/rustc-1.71.1-src/library/alloc/src/alloc.rs:185
#20 0x000055555615a1a7 in alloc::alloc::{impl#1}::allocate (layout=...) at /builddir/build/BUILD/rustc-1.71.1-src/library/alloc/src/alloc.rs:245
#21 alloc::alloc::exchange_malloc (size=1520, align=8) at /builddir/build/BUILD/rustc-1.71.1-src/library/alloc/src/alloc.rs:334
#22 0x000055555615655c in alloc::boxed::{impl#0}::new<crossbeam_deque::deque::Block<rayon_core::job::JobRef>> (x=...) at /builddir/build/BUILD/rustc-1.71.1-src/library/alloc/src/boxed.rs:217
#23 crossbeam_deque::deque::{impl#17}::default<rayon_core::job::JobRef> () at /home/matrix/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-deque-0.8.3/src/deque.rs:1312
#24 0x000055555615420e in crossbeam_deque::deque::Injector<rayon_core::job::JobRef>::new<rayon_core::job::JobRef> () at /home/matrix/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-deque-0.8.3/src/deque.rs:1338
#25 0x0000555556151d57 in rayon_core::job::JobFifo::new () at src/job.rs:245
#26 0x000055555616a1dd in rayon_core::registry::{impl#8}::from (thread=<error reading variable: Cannot access memory at address 0x28f0>) at src/registry.rs:698
#27 0x000055555616b300 in rayon_core::registry::main_loop (thread=<error reading variable: Cannot access memory at address 0x28f0>) at src/registry.rs:925
#28 0x00005555561669b6 in rayon_core::registry::ThreadBuilder::run (self=<error reading variable: Cannot access memory at address 0x2930>) at src/registry.rs:54
#29 0x0000555556166e4d in rayon_core::registry::{impl#2}::spawn::{closure#0} () at src/registry.rs:99
#30 0x0000555556174759 in std::sys_common::backtrace::__rust_begin_short_backtrace<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()> (f=<error reading variable: Cannot access memory at address 0x2930>)
    at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/sys_common/backtrace.rs:135
--Type <RET> for more, q to quit, c to continue without paging--
#31 0x00005555561972fd in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()> () at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/thread/mod.rs:529
#32 0x0000555556185671 in core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>> (self=...)
    at /builddir/build/BUILD/rustc-1.71.1-src/library/core/src/panic/unwind_safe.rs:271
#33 0x0000555556150782 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>>, ()> (
    data=0x7ffff69f4db0) at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/panicking.rs:500
#34 0x0000555556151d9b in __rust_try ()
#35 0x0000555556150552 in std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>>> (f=...)
    at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/panicking.rs:464
#36 0x0000555556196cc1 in std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>>, ()> (f=...)
    at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/panic.rs:142
#37 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()> () at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/thread/mod.rs:528
#38 0x00005555561637cf in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>, ()> ()
    at /builddir/build/BUILD/rustc-1.71.1-src/library/core/src/ops/function.rs:250
#39 0x000055555634ed35 in std::sys::unix::thread::Thread::new::thread_start ()
#40 0x00007ffff7c9f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#41 0x00007ffff7c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
@villepeh
Author

Similar results with the "manual" compressor, trying to compress the Matrix HQ room.

# RUST_BACKTRACE=1 RUST_LOG=debug LD_PRELOAD=/usr/lib64/libjemalloc.so.2 ./synapse_compress_state -p "user=postgres dbname=matrix host=/run/postgresql" -r '!OGEhHVWSdvArJzumhm:matrix.org' -o out.sql -t -n 500 -b 170833
[2024-04-09T23:43:16Z INFO  synapse_compress_state] Fetching state from DB for room '!OGEhHVWSdvArJzumhm:matrix.org'... 
[2024-04-09T23:43:16Z DEBUG tokio_postgres::prepare] preparing query s0: SELECT id FROM (SELECT id FROM state_groups WHERE room_id = $1 AND id > $2 ORDER BY id ASC LIMIT $3) AS ids ORDER BY ids.id DESC LIMIT 1  
[2024-04-09T23:43:16Z DEBUG tokio_postgres::query] executing statement s0 with parameters: ["!OGEhHVWSdvArJzumhm:matrix.org", Some(170833), Some(500)] 
[2024-04-09T23:43:16Z DEBUG tokio_postgres::prepare] preparing query s1:
SELECT m.id, prev_state_group, type, state_key, s.event_id
FROM state_groups AS m  
LEFT JOIN state_groups_state AS s ON (m.id = s.state_group)  
LEFT JOIN state_group_edges AS e ON (m.id = e.state_group)
WHERE m.room_id = $1 AND m.id <= $2 
AND m.id > $3  
[2024-04-09T23:43:16Z DEBUG tokio_postgres::query] executing statement s1 with parameters: ["!OGEhHVWSdvArJzumhm:matrix.org", 173804, 170833] 
  [2m] 14444321 rows retrieved
[2024-04-09T23:45:33Z DEBUG synapse_compress_state::database] Got initial state from database. Checking for any missing state groups...
[2024-04-09T23:45:33Z INFO  synapse_compress_state] Fetched state groups up to 173804  
[2024-04-09T23:45:33Z INFO  synapse_compress_state] Number of state groups: 500  
[2024-04-09T23:45:33Z INFO  synapse_compress_state] Number of rows in current table: 14444035 
[2024-04-09T23:45:33Z INFO  synapse_compress_state] Compressing state... 
[00:01:32] ████████████████████ 500/500 state groups  
[2024-04-09T23:47:06Z INFO  synapse_compress_state] Number of rows after compression: 2943697 (20.38%) 
[2024-04-09T23:47:06Z INFO  synapse_compress_state] Compression Statistics: 
[2024-04-09T23:47:06Z INFO  synapse_compress_state] Number of forced resets due to lacking prev: 29
[2024-04-09T23:47:06Z INFO  synapse_compress_state] Number of compressed rows caused by the above: 2680484
[2024-04-09T23:47:06Z INFO  synapse_compress_state] Number of state groups changed: 161
[2024-04-09T23:47:06Z INFO  synapse_compress_state] Checking that state maps match...
[00:00:00] ░░░░░░░░░░░░░░░░░░░░ 0/500 state groups 
Segmentation fault (core dumped)  

Coredump

# gdb /opt/rust-synapse-compress-state/target/debug/synapse_compress_state --core /root/core.synapse_compres.0.90ce0546c6a44f6ea07a0538e09d8004.1952675.1712706710000000
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-11.1.el9_3
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
 <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/rust-synapse-compress-state/target/debug/synapse_compress_state...
[New LWP 1952986]
[New LWP 1952675]
[New LWP 1952977]
[New LWP 1952976]
[New LWP 1952978]
[New LWP 1952979]
[New LWP 1952980]
[New LWP 1952981]
[New LWP 1952985]
[New LWP 1952975]
[New LWP 1952982]
[New LWP 1952983]
[New LWP 1952984]
[New LWP 1952987]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
Core was generated by `./synapse_compress_state -p user=postgres dbname=matrix host=/run/postgresql -r'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ___pthread_mutex_trylock (mutex=mutex@entry=0x28e8) at pthread_mutex_trylock.c:34
34	  switch (__builtin_expect (PTHREAD_MUTEX_TYPE_ELISION (mutex),
[Current thread is 1 (Thread 0x7f8317ff4640 (LWP 1952986))]
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /opt/rust-synapse-compress-state/target/debug/synapse_compress_state.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) bt
#0  ___pthread_mutex_trylock (mutex=mutex@entry=0x28e8) at pthread_mutex_trylock.c:34
#1  0x00007f831a080a08 in malloc_mutex_trylock_final (mutex=0x28a8) at include/jemalloc/internal/mutex.h:157
#2  malloc_mutex_lock (mutex=0x28a8, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/mutex.h:216
#3  je_tcache_arena_associate (tsdn=tsdn@entry=0x7f8317ff2f88, tcache_slow=tcache_slow@entry=0x7f8317ff3088, tcache=tcache@entry=0x7f8317ff32e0, arena=arena@entry=0x0) at src/tcache.c:588
#4  0x00007f831a08442b in arena_choose_impl.constprop.1 (tsd=0x7f8317ff2f88, arena=<optimized out>, internal=false) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:60
#5  0x00007f831a01d397 in arena_choose (arena=0x0, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:88
#6  tcache_alloc_small (slow_path=<optimized out>, zero=true, binind=2, size=32, tcache=0x7f8317ff32e0, arena=0x0, tsd=0x7f8317ff2f88) at include/jemalloc/internal/tcache_inlines.h:56
#7  arena_malloc (slow_path=<optimized out>, tcache=0x7f8317ff32e0, zero=true, ind=2, size=32, arena=0x0, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/arena_inlines_b.h:151
#8  iallocztm (slow_path=<optimized out>, arena=0x0, is_internal=false, tcache=0x7f8317ff32e0, zero=true, ind=2, size=32, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:55
#9  imalloc_no_sample (ind=2, usize=32, size=32, tsd=0x7f8317ff2f88, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2398
#10 imalloc_body (tsd=0x7f8317ff2f88, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2573
#11 imalloc (dopts=<optimized out>, sopts=<optimized out>) at src/jemalloc.c:2687
#12 calloc (num=num@entry=1, size=size@entry=32) at src/jemalloc.c:2852
#13 0x00007f8319657b43 in __cxa_thread_atexit_impl (func=0xff8c83398676680f, obj=0x7f8317ff4460, dso_symbol=0x5620ea26d730 <_rust_extern_with_linkage___dso_handle>) at cxa_thread_atexit_impl.c:107
#14 0x00005620ea009899 in std::sys::unix::stack_overflow::imp::signal_handler ()
#15 <signal handler called>
#16 ___pthread_mutex_trylock (mutex=mutex@entry=0x28e8) at pthread_mutex_trylock.c:34
#17 0x00007f831a080a08 in malloc_mutex_trylock_final (mutex=0x28a8) at include/jemalloc/internal/mutex.h:157
#18 malloc_mutex_lock (mutex=0x28a8, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/mutex.h:216
#19 je_tcache_arena_associate (tsdn=tsdn@entry=0x7f8317ff2f88, tcache_slow=tcache_slow@entry=0x7f8317ff3088, tcache=tcache@entry=0x7f8317ff32e0, arena=arena@entry=0x0) at src/tcache.c:588
#20 0x00007f831a0859a8 in arena_choose_impl (arena=<optimized out>, internal=false, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:60
#21 arena_choose_impl (arena=0x0, internal=false, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:32
#22 arena_choose (arena=0x0, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:88
#23 je_tsd_tcache_data_init.isra.0 (tsd=0x7f8317ff2f88) at src/tcache.c:740
#24 0x00007f831a085df9 in je_tsd_tcache_enabled_data_init (tsd=<optimized out>) at src/tcache.c:644
#25 0x00007f831a085e8c in je_tsd_fetch_slow.constprop.0 (minimal=minimal@entry=false, tsd=<optimized out>) at src/tsd.c:311
#26 0x00007f831a024445 in tsd_fetch_impl (minimal=false, init=true) at include/jemalloc/internal/tsd.h:422
#27 tsd_fetch () at include/jemalloc/internal/tsd.h:448
#28 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2681
#29 realloc (ptr=ptr@entry=0x0, size=size@entry=32) at src/jemalloc.c:3653
#30 0x00007f83196a0bea in __pthread_getattr_np (thread_id=140201020048960, attr=0x7f8317ff23d0) at pthread_getattr_np.c:181
#31 0x00005620ea00a2a1 in std::sys::unix::thread::guard::current ()
--Type <RET> for more, q to quit, c to continue without paging--
#32 0x00005620e9e56523 in std::thread::{impl#0}::spawn_unchecked_::{closure#1}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()> () at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/thread/mod.rs:527
#33 0x00005620e9e2316f in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>, ()> ()
 at /builddir/build/BUILD/rustc-1.71.1-src/library/core/src/ops/function.rs:250
#34 0x00005620ea00a095 in std::sys::unix::thread::Thread::new::thread_start ()
#35 0x00007f831969f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#36 0x00007f831963f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@villepeh
Author

Interesting. After I

  • installed jemalloc from EPEL (a slightly older version); the mock-compiled one hadn't caused problems before, but I wanted to be sure
  • removed some PostgreSQL-related sysctl settings
  • tuned the number of hugepages down (inspection commands are sketched below)
  • rebooted the server

...the segfault no longer happens. The behavior seems weird regardless, but I've got nothing more to add, except that the older version of jemalloc alone didn't fix the segfaults before I did the other steps.
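
The hugepage and sysctl state can be inspected with standard tooling, roughly like this (a sketch; exact values are environment-specific):

$ grep -i huge /proc/meminfo                   # current hugepage pool and usage
$ sysctl vm.nr_hugepages                       # persistent hugepage count
$ sysctl -a | grep -E 'overcommit|hugepage'    # related vm.* knobs in one view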

If it's impossible to investigate and no one else is able to reproduce this, I think this issue can be closed.

@reivilibre
Contributor

From the stack trace, it sounds like it could be a problem with jemallocator, so using a different version is probably a reasonable workaround.
If it happens on the latest jemallocator version, we should probably look into this; otherwise I'm not sure there's much for us to do here.

What do you mean by the 'mock-compiled' one?

@villepeh
Author

villepeh commented May 9, 2024

From the stack trace, it sounds like it could be a problem with jemallocator, so using a different version is probably a reasonable workaround. If it happens on the latest jemallocator version, we should probably look into this; otherwise I'm not sure there's much for us to do here.

I can't believe I forgot to post the actual reason this kept happening. It was very likely this sysctl parameter: vm.overcommit_memory=2. It was suggested here, so I just went with it (bad idea). When I set it back to the default of 0, the issue was gone. And the compressor now works with the latest jemalloc as well.

I suppose the compressor crashing with the option enabled isn't intended behavior, but I don't know if it's worth the trouble of fixing either.

0	-	Heuristic overcommit handling. Obvious overcommits of
		address space are refused. Used for a typical system. It
		ensures a seriously wild allocation fails while allowing
		overcommit to reduce swap usage.  root is allowed to 
		allocate slightly more memory in this mode. This is the 
		default.

1	-	Always overcommit. Appropriate for some scientific
		applications. Classic example is code using sparse arrays
		and just relying on the virtual memory consisting almost
		entirely of zero pages.

2	-	Don't overcommit. The total address space commit
		for the system is not permitted to exceed swap + a
		configurable amount (default is 50%) of physical RAM.
		Depending on the amount you use, in most situations
		this means a process will not be killed while accessing
		pages but will receive errors on memory allocation as
		appropriate.

		Useful for applications that want to guarantee their
		memory allocations will be available in the future
		without having to initialize every page.
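
For anyone hitting the same thing, checking and reverting the mode looks like this (a sketch; sysctl -w needs root, shown with the # prompt):

$ sysctl vm.overcommit_memory
vm.overcommit_memory = 2
# sysctl -w vm.overcommit_memory=0
vm.overcommit_memory = 0

To make the revert stick across reboots, the matching line in /etc/sysctl.d/*.conf (or wherever it was added) needs to be removed or changed as well.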

What do you mean by the 'mock-compiled' one?

mock is a handy tool for building RPMs.
In short, some software might not be available for RHEL (and its clones like Oracle Linux and Rocky Linux), or might be quite old. For example, RHEL offers HAProxy 2.4, but I'd much rather run the latest 2.8 LTS.

Instead of dealing with the ./configure && make && make install hassle, you can just grab a source RPM. Then run mock haproxy-2.8.5-1.fc39.src.rpm and it handles the compilation voodoo itself and creates an installable .rpm for your distro. I did the same for jemalloc because EPEL offers version 5.2.1 rather than the latest 5.3.0, which is supposed to have several improvements and optimizations.
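
The whole flow is roughly this (a sketch; the jemalloc .src.rpm filename is illustrative, and dnf download needs the dnf-plugins-core package):

$ sudo dnf install mock
$ sudo usermod -a -G mock "$USER"    # then log out and back in
$ dnf download --source jemalloc     # or grab a newer .src.rpm from Fedora
$ mock --rebuild jemalloc-5.3.0-1.fc39.src.rpm
$ ls /var/lib/mock/*/result/*.rpm    # installable RPMs end up here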

@villepeh changed the title from "Segmentation fault after deleting rooms" to "Segmentation fault when vm.overcommit_memory is set to 2" on May 11, 2024