Segfault when using pthread_cancel #800

Open
SeanTAllen opened this issue Aug 17, 2020 · 4 comments
Labels
area: sgx-lkl Core SGX-LKL functionality bug p0 Blocking priority

Comments

@SeanTAllen
Contributor

When running the following test:

/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define RUNS 10000

void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}

int main(void)
{
    int i;
    pthread_t thread1;
    int ret;

    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);

        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }

        sleep(1);
        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);

    }

    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_join) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }

    return 0;
}

Both @vtikoo and I get a segfault fairly early on.

Backtrace follows:

Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe03dadfb10 (LWP 17645)]
0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
895     {
(gdb) bt
#0  0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
#1  0x00007fe00009a2eb in __send_signal (force=<optimized out>, type=<optimized out>, t=<optimized out>, info=<optimized out>,
    sig=<optimized out>) at kernel/signal.c:1076
#2  send_signal (sig=33, info=0x7fe03dabf0d0, t=0x0, type=PIDTYPE_PID) at kernel/signal.c:1236
#3  0x00007fe00009b6cd in do_send_sig_info (sig=1072106240, info=0x21, p=0x7fe03dabf0d0, type=PIDTYPE_PID) at kernel/signal.c:1285
#4  0x00007fe00009b763 in do_send_specific (tgid=33, pid=<optimized out>, sig=1072106240, info=0x0) at kernel/signal.c:3772
#5  0x00007fe00009b816 in do_tkill (tgid=33, pid=0, sig=1034678480) at kernel/signal.c:3798
#6  0x00007fe00009c504 in __do_sys_tkill (sig=<optimized out>, pid=<optimized out>) at kernel/signal.c:3833
#7  __se_sys_tkill (pid=<optimized out>, sig=<optimized out>) at kernel/signal.c:3827
#8  0x00007fe03dabf180 in ?? ()
#9  0x00007fe00008b6cf in run_syscall (params=<optimized out>, no=<optimized out>) at arch/lkl/kernel/syscalls.c:44
#10 lkl_syscall (no=0, params=0x7fe03dabf0d0) at arch/lkl/kernel/syscalls.c:192

Given the backtrace, this might be connected to our various signal problems. That isn't confirmed yet, so I'm opening this issue to keep track of the problem.


A slight variation: with the following version, the application eventually hangs (for me, always after 255 threads have been created):

/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define RUNS 10000

void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}

int main(void)
{
    int i;
    pthread_t thread1;
    int ret;

    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);

        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }

        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);

    }

    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_join) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }

    return 0;
}

Note that the only difference from the first version is the missing sleep(1) call in main before pthread_cancel.
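Neither test ever joins (or detaches) the cancelled workers, so each exited worker keeps its stack and thread descriptor until it is joined. Whether that has anything to do with the hang at 255 threads is an untested assumption. The following is a hypothetical variant (`run_cancel_join_test` is an illustrative name, not part of the existing tests) that joins every cancelled worker and checks that it exited with `PTHREAD_CANCELED`, which may help separate a resource leak from the signal-delivery problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *thread_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* sleep() is a cancellation point: a pending deferred cancel
         * request is acted on here. */
        sleep(1);
    }
    return NULL; /* not reached */
}

/* Hypothetical helper: returns the number of runs completed,
 * which equals `runs` on success. */
int run_cancel_join_test(int runs)
{
    int i;

    for (i = 0; i < runs; i++) {
        pthread_t t;
        void *res = NULL;

        if (pthread_create(&t, NULL, thread_worker, NULL) != 0) {
            printf("Failed to create thread (run=%d)\n", i);
            break;
        }
        if (pthread_cancel(t) != 0) {
            printf("Failed to cancel thread (run=%d)\n", i);
            break;
        }
        /* Blocks until the worker has terminated and releases its
         * resources; a cancelled thread reports PTHREAD_CANCELED. */
        if (pthread_join(t, &res) != 0 || res != PTHREAD_CANCELED) {
            printf("Worker did not exit via cancellation (run=%d)\n", i);
            break;
        }
    }
    printf("%s (pthread_join) runs=%d\n",
           i == runs ? "TEST PASSED" : "TEST FAILED", i);
    return i;
}
```

Calling `run_cancel_join_test(RUNS)` from `main` runs the same create/cancel loop as the original test, but with every worker reclaimed before the next one is created.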

@SeanTAllen SeanTAllen added bug area: sgx-lkl Core SGX-LKL functionality labels Aug 17, 2020
@github-actions github-actions bot added the needs-triage Bug does not yet have a priority assigned label Aug 17, 2020
@prp
Member

prp commented Aug 17, 2020

This will be broken because we are not delivering signals to the correct thread (see #644).
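Because pthread_cancel depends on a signal reaching one specific thread, thread-directed delivery can be checked on its own. This is a hypothetical sketch (`signal_reaches_target_thread` is an illustrative name, and SIGUSR1 stands in for the cancel signal): it sends a signal to one thread with `pthread_kill` and records, through a thread-local flag, whether the handler ran on that thread or on some other one. It assumes `atomic_int` is lock-free, as it is on common platforms, so it may be touched from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static _Thread_local atomic_int tls_is_target; /* 1 only in the target thread */
static atomic_int ran_on_target;               /* handler ran on the target */
static atomic_int ran_elsewhere;               /* handler ran on another thread */
static atomic_int target_ready;
static atomic_int stop_target;

static void on_signal(int sig)
{
    (void)sig;
    if (atomic_load(&tls_is_target))
        atomic_store(&ran_on_target, 1);
    else
        atomic_store(&ran_elsewhere, 1);
}

static void nap_1ms(void)
{
    struct timespec ts = {0, 1000000};
    nanosleep(&ts, NULL); /* may return early with EINTR; callers loop */
}

static void *target_thread(void *arg)
{
    (void)arg;
    atomic_store(&tls_is_target, 1);
    atomic_store(&target_ready, 1);
    while (!atomic_load(&stop_target))
        nap_1ms();
    return NULL;
}

/* Hypothetical helper: returns 1 if `sig`, sent with pthread_kill(), ran
 * its handler on the target thread and nowhere else; 0 otherwise. */
int signal_reaches_target_thread(int sig)
{
    struct sigaction sa;
    pthread_t t;
    int waited;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return 0;
    if (pthread_create(&t, NULL, target_thread, NULL) != 0)
        return 0;
    while (!atomic_load(&target_ready))
        nap_1ms();

    if (pthread_kill(t, sig) == 0) {
        /* Give the handler up to about one second to run somewhere. */
        for (waited = 0; waited < 1000; waited++) {
            if (atomic_load(&ran_on_target) || atomic_load(&ran_elsewhere))
                break;
            nap_1ms();
        }
    }

    atomic_store(&stop_target, 1);
    pthread_join(t, NULL);
    return atomic_load(&ran_on_target) && !atomic_load(&ran_elsewhere);
}
```

On a correct implementation this returns 1; a result of 0 with `ran_elsewhere` set would point at the misdirected delivery described in #644.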

@davidchisnall
Contributor

@vtikoo, I think the syscall_cp assembly has not yet been ported to LKL; could that be related here?

@bodzhang bodzhang added this to Needs triage in Issue triage via automation Aug 18, 2020
@vtikoo
Contributor

vtikoo commented Aug 18, 2020

@davidchisnall I added a breakpoint at the entry of __syscall_cp.s, and it doesn't look like syscall_cp is called during this test.
The tkill syscall in the stack trace most probably comes from cancel_handler: https://github.com/lsds/sgx-lkl-musl/blob/oe_port/src/thread/pthread_cancel.c#L67
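As far as can be told from musl's source, pthread_cancel sets the target's cancel flag and then sends musl's internal SIGCANCEL (defined as 33 in musl, which matches sig=33 in the backtrace) to that thread; cancel_handler either acts on it or, when the thread is not at a cancellation point, re-sends the signal to itself with tkill. The following is a simplified, hypothetical user-space model of only the flag-plus-signal part (`request_cancel` and `run_cancel_model` are made-up names, and SIGUSR2 stands in for SIGCANCEL). It keeps the signal blocked outside `sigsuspend` so a signal sent between the flag check and the wait stays pending rather than being consumed too early.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Stand-in for musl's internal SIGCANCEL (an assumption for this model). */
#define CANCEL_SIG SIGUSR2

static atomic_int cancel_requested; /* models the target's cancel flag */
static atomic_int worker_exited;

static void cancel_sig_handler(int sig)
{
    (void)sig; /* only needs to run so that sigsuspend() returns */
}

static void *worker(void *arg)
{
    sigset_t blocked, wait_mask;
    (void)arg;

    /* Block CANCEL_SIG except inside sigsuspend(): a signal sent after the
     * flag check but before the wait stays pending instead of being
     * handled before the wait starts. */
    sigemptyset(&blocked);
    sigaddset(&blocked, CANCEL_SIG);
    pthread_sigmask(SIG_BLOCK, &blocked, &wait_mask);
    sigdelset(&wait_mask, CANCEL_SIG);

    while (!atomic_load(&cancel_requested))
        sigsuspend(&wait_mask); /* the model's only "cancellation point" */

    atomic_store(&worker_exited, 1);
    return NULL;
}

/* Hypothetical: set the flag first, then signal the specific thread. */
static int request_cancel(pthread_t t)
{
    atomic_store(&cancel_requested, 1);
    return pthread_kill(t, CANCEL_SIG);
}

/* Returns 1 once the worker has observed the request and exited. */
int run_cancel_model(void)
{
    struct sigaction sa;
    pthread_t t;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cancel_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(CANCEL_SIG, &sa, NULL) != 0)
        return 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;
    if (request_cancel(t) != 0)
        return 0;
    pthread_join(t, NULL);
    return atomic_load(&worker_exited);
}
```

If the signal were delivered to a thread other than the worker, the worker would stay in `sigsuspend` and `pthread_join` would never return, which is one way misdirected delivery (#644) could turn into a hang.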

@davidchisnall
Contributor

@KenGordon is working on fixing signal delivery to the correct thread.

@paulcallen paulcallen moved this from Needs triage to Proposed p0 in Issue triage Aug 19, 2020
@bodzhang bodzhang added p0 Blocking priority and removed needs-triage Bug does not yet have a priority assigned labels Aug 21, 2020
@bodzhang bodzhang removed this from Proposed p0 in Issue triage Aug 21, 2020

5 participants