Bug(duplication):some nodes coredump after start duplication for a long time #2014

Open
ninsmiracle opened this issue May 22, 2024 · 0 comments
Labels
type/bug This issue reports a bug.

Comments

@ninsmiracle
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    - Deploy the duplication master and backup clusters.
    - Start duplication.
    - Let it run for about 2~3 days.
    - Some nodes coredump.

  2. What did you expect to see?
    Nodes run normally.

  3. What did you see instead?
    The memory monitoring table (attached as an image).

coredump detail:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/work/app/pegasus/c3srv-browser/replica/package/bin/pegasus_server config.'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f01575401d7 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f01575401d7 in raise () from /lib64/libc.so.6
#1  0x00007f01575418c8 in abort () from /lib64/libc.so.6
#2  0x00007f015c628f9e in dsn_coredump () at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/service_api_c.cpp:93

#3  0x00007f015c422c83 in dsn::replication::log_file::log_file (this=0x73aa4a630, path=0x740561c98 "/home/work/ssd2/pegasus/c3srv-browser/replica/reps/72.173.pegasus/plog/log.92534.3105061495163", 
    handle=<optimized out>, index=<optimized out>, start_offset=3105061495163, is_read=<optimized out>) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/log_file.cpp:166

#4  0x00007f015c4247ce in dsn::replication::log_file::open_read (path=0x740561c98 "/home/work/ssd2/pegasus/c3srv-browser/replica/reps/72.173.pegasus/plog/log.92534.3105061495163", err=...)
    at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/log_file.cpp:92
#5  0x00007f015c43ccfa in dsn::replication::log_utils::open_read (path=..., file=...) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/mutation_log_utils.cpp:43
#6  0x00007f015c4ff7fa in dsn::replication::load_from_private_log::find_log_file_to_start (this=this@entry=0x384c74640)
    at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/duplication/load_from_private_log.cpp:123
#7  0x00007f015c500360 in dsn::replication::load_from_private_log::run (this=0x384c74640) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/duplication/load_from_private_log.cpp:100
#8  0x00007f015c665f91 in dsn::task::exec_internal (this=this@entry=0x2b9bce1e0) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/task/task.cpp:176
#9  0x00007f015c67b642 in dsn::task_worker::loop (this=0x2a67c30) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/task/task_worker.cpp:224
#10 0x00007f015c67b7c0 in dsn::task_worker::run_internal (this=0x2a67c30) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/task/task_worker.cpp:204
#11 0x00007f015b2f8a3f in execute_native_thread_routine () from /home/work/app/pegasus/c3srv-browser/replica/package/bin/libdsn_utils.so
#12 0x00007f0159103dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f015760273d in clone () from /lib64/libc.so.6
(gdb)

stdout file (error log):

E2024-05-15 05:48:17.721 (1715723297721553663 62544) replica.rep_long9.040400031452989e: native_linux_aio_provider.cpp:49:open(): create file failed, err = No such file or directory

E2024-05-15 05:48:17.721 (1715723297721596680 62544) replica.rep_long9.040400031452989e: load_from_private_log.cpp:125:find_log_file_to_start(): [[email protected]:34801] ERR_FILE_OPERATION_FAILED: failed to open the log file (/home/work/ssd7/pegasus/c3srv-xxxxxx/replica/reps/72.171.pegasus/plog/log.91190.3060048707709)

F2024-05-15 06:03:20.656 (1715724200656901498 62545) replica.rep_long10.04040005181bcdaf: log_file.cpp:166:log_file(): assertion expression: false
F2024-05-15 06:03:20.656 (1715724200656954168 62545) replica.rep_long10.04040005181bcdaf: log_file.cpp:166:log_file(): fail to get file size of /home/work/ssd2/pegasus/c3srv-xxxxx/replica/reps/72.173.pegasus/plog/log.92534.3105061495163

  4. What version of Pegasus are you using?
    Pegasus v2.4
@ninsmiracle ninsmiracle added the type/bug This issue reports a bug. label May 22, 2024