Restrict getting the restore process start time to CRaC API #20262

Merged: 1 commit into eclipse-openj9:master, Oct 1, 2024

ThanHenderson
Contributor

There is a race condition between the existence of the criu restore process and the retrieval of its start time when --restore-detached is passed to criu restore. This patch restricts retrieving the process start time to restores performed via CRaC, which are not affected.

Issues: #20214
Signed-off-by: Nathan Henderson [email protected]
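
To illustrate the kind of race described above, here is a minimal Python sketch. It is not OpenJ9's implementation. It assumes a Linux /proc filesystem and reads the `starttime` value (field 22 of `/proc/<pid>/stat`, in clock ticks since boot). The helper names `parse_starttime_ticks` and `read_start_time_ticks` are hypothetical. If the target process has already exited, as a detached criu process may have by the time it is queried, the lookup can only report that no value is available.

```python
from typing import Optional


def parse_starttime_ticks(stat_line: str) -> int:
    """Return field 22 (starttime, clock ticks since boot) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split on the LAST ')' before tokenising the remaining fields.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def read_start_time_ticks(pid: int) -> Optional[int]:
    """Return the start time of `pid`, or None if the process no longer exists."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_starttime_ticks(f.read())
    except (FileNotFoundError, ProcessLookupError):
        # The process exited before or while we read its stat file:
        # this is the window in which the race shows up.
        return None
```

A caller has to treat `None` as "the process is already gone". As far as the description goes, the patch sidesteps that case by querying the start time only on the CRaC restore path.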

@ThanHenderson
Contributor Author

fyi @tajila

@ThanHenderson
Contributor Author

OMR issue: eclipse/omr#7469

@ThanHenderson added the criu (Used to track CRIU snapshot related work) and comp:vm labels Sep 30, 2024
@tajila
Contributor

tajila commented Sep 30, 2024

jenkins test sanity zlinux jdk21

@tajila
Contributor

tajila commented Sep 30, 2024

jenkins test sanity xlinux jdk17

@tajila
Contributor

tajila commented Oct 1, 2024

@ThanHenderson

17:07:07  FAILED test targets:
17:07:07  	cmdLineTester_criu_nonPortableRestore_1
17:07:07  	cmdLineTester_criu_nonPortableRestore_4
17:07:07  	cmdLineTester_criu_nonPortableRestore_5
17:07:07  	cmdLineTester_criu_nonPortableRestore_8
17:07:07  	cmdLineTester_criu_nonPortableRestore_10
13:45:47  Testing: Create CRIU checkpoint image and restore once - testGetProcessRestoreStartTime
13:45:47  Test start time: 2024/09/30 13:45:46 Eastern Standard Time
13:45:47  Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Personal_testList_1/jdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9vm.684-696,j9vm.699,j9vm.717-747} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED" org.openj9.criu.TimeChangeTest testGetProcessRestoreStartTime 1 false false
13:45:47  Time spent starting: 2 milliseconds
13:50:50  ***[TEST INFO 2024/09/30 13:50:46] ProcessKiller detected a timeout after 300000 milliseconds!***
13:50:50  ***[TEST INFO 2024/09/30 13:50:46] executing /usr/bin/gdb -batch -x /tmp/debugger11436814874758910703.txt bash 2753770***
13:50:50  GDB OUT 0x000003ff8f1c964a in waitpid () from /lib64/libc.so.6
13:50:50  GDB OUT From                To                  Syms Read   Shared Object Library
13:50:50  GDB OUT 0x000003ff8f38e788  0x000003ff8f39d140  Yes (*)     /lib64/libtinfo.so.6
13:50:50  GDB OUT 0x000003ff8f300da8  0x000003ff8f301af0  Yes (*)     /lib64/libdl.so.2
13:50:50  GDB OUT 0x000003ff8f123120  0x000003ff8f248008  Yes (*)     /lib64/libc.so.6
13:50:50  GDB OUT 0x000003ff8f481120  0x000003ff8f49ecb0  Yes         /lib/ld64.so.1
13:50:50  GDB OUT (*): Shared library is missing debugging information.
13:50:50  GDB OUT pswm           0x705000180000000   505810539591499776
13:50:50  GDB OUT pswa           0x3ff8f1c964a       4396152559178
13:50:50  GDB OUT r0             0x1                 1
13:50:50  GDB OUT r1             0x3ff00000000       4393751543808
13:50:50  GDB OUT r2             0xfffffffffffffe00  18446744073709551104
13:50:50  GDB OUT r3             0x3ffecef7bb8       4397726661560
13:50:50  GDB OUT r4             0x0                 0
13:50:50  GDB OUT r5             0x0                 0
13:50:50  GDB OUT r6             0x2aa0014f818       2929169070104
13:50:50  GDB OUT r7             0x2aa212cea00       2929724287488
13:50:50  GDB OUT r8             0x2aa00148dd8       2929169042904
13:50:50  GDB OUT r9             0x1                 1
13:50:50  GDB OUT r10            0x2aa00148dd8       2929169042904
13:50:50  GDB OUT r11            0x3ff00000000       4393751543808
13:50:50  GDB OUT r12            0x3ff8f4aaf70       4396155580272

@ThanHenderson
Contributor Author

ThanHenderson commented Oct 1, 2024

This test has been removed, but I hadn't removed it from the .xml file.

Edit: I've removed it now.

There is a race condition between the existence of the criu restore
process and the retrieval of its start time when --restore-detached
is passed to criu restore. This patch restricts retrieving the
process start time to restores performed via CRaC, which are not affected.

Issues: eclipse-openj9#20214
Signed-off-by: Nathan Henderson <[email protected]>
@ThanHenderson
Contributor Author

We only need to restart the sanity.functional tests; the others were passing.

@tajila
Contributor

tajila commented Oct 1, 2024

jenkins test sanity.functional xlinux jdk17

@tajila
Contributor

tajila commented Oct 1, 2024

jenkins test sanity.functional zlinux jdk21

@tajila merged commit 047e551 into eclipse-openj9:master Oct 1, 2024
7 checks passed