
The abnormal-exit flow of a C++ thread

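Questions:

  1. When a std::thread object is destroyed while its thread has not exited properly (the thread function has not returned), the process dumps core. Why?
  2. Even if SIGABRT is caught in user space, why does the abnormal thread exit still produce a coredump?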

First, let's look at how a coredump is produced.

kernel space

When the kernel handles a signal for a process, it calls do_signal (arch/arm/kernel/signal.c):

```c
static int do_signal(struct pt_regs *regs, int syscall)
```

do_signal then calls get_signal to fetch the signal:

```c
if (get_signal(&ksig)) {
    handle_signal(&ksig, regs);
}
```

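get_signal dequeues a pending signal. If the user has installed a handler for it, get_signal returns a non-zero value and handle_signal arranges for the user's handler to run. The kernel's default action is skipped: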

```c
bool get_signal(struct ksignal *ksig)
...
    signr = dequeue_signal(current, &current->blocked, &ksig->info);
...
    if (ka->sa.sa_handler == SIG_IGN) /* Do nothing. */
        continue;
    if (ka->sa.sa_handler != SIG_DFL) {
        /* Run the handler. */
        ksig->ka = *ka;

        if (ka->sa.sa_flags & SA_ONESHOT)
            ka->sa.sa_handler = SIG_DFL;

        break; /* will return non-zero "signr" value */
    }
```
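The handler path above can be observed from user space. The following is a minimal sketch that assumes Linux/POSIX. It installs a SIGABRT handler with `sigaction` and then calls `raise(SIGABRT)`. Because the disposition is no longer `SIG_DFL`, the handler runs instead of the default action, and the process keeps going. `raise_with_user_handler` and `on_sigabrt` are hypothetical names made up for this illustration.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

// Written only from the handler; volatile sig_atomic_t is the type that is
// guaranteed safe to store to from a signal handler.
static volatile std::sig_atomic_t g_abrt_seen = 0;

extern "C" void on_sigabrt(int) { g_abrt_seen = 1; }

// Hypothetical helper: installs a user handler for SIGABRT, then raises it.
// Since the disposition is not SIG_DFL, the kernel runs the handler instead
// of the default action, so raise() returns and the process survives.
bool raise_with_user_handler() {
    struct sigaction act;
    std::memset(&act, 0, sizeof act);
    act.sa_handler = on_sigabrt;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGABRT, &act, nullptr) != 0) return false;

    g_abrt_seen = 0;
    std::raise(SIGABRT);       // the handler runs before raise() returns
    return g_abrt_seen == 1;   // no default action was taken, so no coredump
}
```

`act.sa_flags` is 0 here, so the handler stays installed. If it contained `SA_RESETHAND` (the POSIX name for the `SA_ONESHOT` flag in the excerpt), the kernel would reset the disposition to `SIG_DFL` after the first delivery.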

If no handler is installed, the kernel falls back to the default action.

If the signal is in the coredump signal set, a coredump is generated:

```c
if (sig_kernel_coredump(signr)) {
    if (print_fatal_signals)
        print_fatal_signal(ksig->info.si_signo);
    proc_coredump_connector(current);
    /*
     * If it was able to dump core, this kills all
     * other threads in the group and synchronizes with
     * their demise. If we lost the race with another
     * thread getting here, it set group_exit_code
     * first and our do_group_exit call below will use
     * that value and ignore the one we pass it.
     */
    do_coredump(&ksig->info);
}
```

Note that SIGABRT is in that set:

```c
#define SIG_KERNEL_COREDUMP_MASK (\
        rt_sigmask(SIGQUIT)   |  rt_sigmask(SIGILL)    | \
        rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGABRT)   | \
        rt_sigmask(SIGFPE)    |  rt_sigmask(SIGSEGV)   | \
        rt_sigmask(SIGBUS)    |  rt_sigmask(SIGSYS)    | \
        rt_sigmask(SIGXCPU)   |  rt_sigmask(SIGXFSZ)   | \
        SIGEMT_MASK                                    )
```
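The mask can be mirrored as a plain predicate, which makes it easy to check which signals dump core when left at their default disposition. This is only a sketch: `dumps_core_by_default` is a hypothetical name, and the list is copied from the macro above. `SIGEMT` exists only on some architectures, so it is guarded with `#ifdef`.

```cpp
#include <signal.h>

// Hypothetical mirror of SIG_KERNEL_COREDUMP_MASK: true if the kernel's
// default action for `sig` is to terminate the process and dump core.
constexpr bool dumps_core_by_default(int sig) {
    switch (sig) {
        case SIGQUIT: case SIGILL:  case SIGTRAP: case SIGABRT:
        case SIGFPE:  case SIGSEGV: case SIGBUS:  case SIGSYS:
        case SIGXCPU: case SIGXFSZ:
#ifdef SIGEMT
        case SIGEMT:  // only defined on some architectures
#endif
            return true;
        default:
            return false;
    }
}

static_assert(dumps_core_by_default(SIGABRT), "SIGABRT dumps core by default");
static_assert(!dumps_core_by_default(SIGTERM), "SIGTERM only terminates");
```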

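So when a thread exits abnormally, a coredump is produced only if the process receives a signal whose default action is to dump core, and no user handler intercepts it.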
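To see the default action in isolation, the sketch below forks a child and raises SIGABRT there with the default disposition. Forking keeps the calling process alive. It assumes Linux/POSIX, and `status_of_default_sigabrt` is a hypothetical helper. The child should be reported as killed by SIGABRT. Whether a core file is actually written also depends on the core-size limit (`ulimit -c`) and `/proc/sys/kernel/core_pattern`, which is what `WCOREDUMP` (a Linux/BSD extension) reports.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that raises SIGABRT with the default
// disposition and returns the child's wait status (or -1 if fork fails).
int status_of_default_sigabrt() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: make sure SIGABRT is neither caught nor blocked.
        signal(SIGABRT, SIG_DFL);
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGABRT);
        sigprocmask(SIG_UNBLOCK, &set, nullptr);
        raise(SIGABRT);   // default action: terminate (and dump core)
        _exit(0);         // not reached
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```

Usage: `int s = status_of_default_sigabrt();` then `WIFSIGNALED(s) && WTERMSIG(s) == SIGABRT` holds, and `WCOREDUMP(s)` tells whether a core was actually written.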

Next, let's look at the user-space side: which signal is raised when the thread object is destroyed?

user space

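The destructor of a local object runs when the object goes out of scope, and std::thread is no exception. The standard specifies that if `joinable()` is true, `~thread()` calls `std::terminate()`. In libstdc++ the destructor body is just `if (joinable()) std::terminate();`.

`joinable()` is true whenever the std::thread object still owns a thread of execution, i.e. it was neither joined nor detached. This is the typical situation when the thread has not exited for some reason, such as busy waiting. Note, however, that `joinable()` stays true even after the thread function has returned, as long as `join()` or `detach()` was never called.

`std::terminate()` invokes the current terminate handler, and the default handler calls `std::abort()`.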
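The path described above, ~thread() to std::terminate() to std::abort(), can be reproduced with a short program. This is a minimal sketch that assumes Linux/POSIX with the default terminate handler (older toolchains may need `-pthread` to link). The scenario runs in a forked child so the caller survives. `status_of_unjoined_thread` is a hypothetical helper, and the busy-waiting worker stands in for "a thread that has not exited".

```cpp
#include <atomic>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child in which a std::thread object goes out
// of scope while still joinable, and returns the child's wait status.
int status_of_unjoined_thread() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        static std::atomic<bool> stop{false};  // never set: the worker spins
        {
            std::thread worker([] {
                while (!stop.load()) {
                    // busy waiting: the thread never returns
                }
            });
            // No join() or detach(): worker.joinable() is still true here.
        }   // ~thread() -> std::terminate() -> std::abort() -> SIGABRT
        _exit(0);  // not reached
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```

The child typically prints "terminate called without an active exception" (libstdc++'s default handler) and is then killed by SIGABRT. Calling `worker.join()` or `worker.detach()` before the closing brace avoids the terminate call entirely.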

Now look at glibc's implementation of abort:

https://code.woboq.org/userspace/glibc/stdlib/abort.c.html

```c
  /* Send signal which possibly calls a user handler.  */
  if (stage == 1)
    {
      /* This stage is special: we must allow repeated calls of
         `abort' when a user defined handler for SIGABRT is installed.
         This is risky since the `raise' implementation might also
         fail but I don't see another possibility.  */
      int save_stage = stage;

      stage = 0;
      __libc_lock_unlock_recursive (lock);

      raise (SIGABRT);

      __libc_lock_lock_recursive (lock);
      stage = save_stage + 1;
    }

  /* There was a handler installed.  Now remove it.  */
  if (stage == 2)
    {
      ++stage;
      memset (&act, '\0', sizeof (struct sigaction));
      act.sa_handler = SIG_DFL;
      __sigfillset (&act.sa_mask);
      act.sa_flags = 0;
      __sigaction (SIGABRT, &act, NULL);
    }

  /* Try again.  */
  if (stage == 3)
    {
      ++stage;
      raise (SIGABRT);
    }
```

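As the code shows, in this version of glibc abort():

  1. raises SIGABRT once (stage 1), which runs the user's handler if one is installed;
  2. if that handler returns, resets the SIGABRT disposition to SIG_DFL, removing the user's handler (stage 2);
  3. raises SIGABRT a second time (stage 3).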
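The three stages can be checked with a handler that simply returns. In the sketch below, a forked child installs such a handler and calls `std::abort()`. The handler writes one byte to a pipe, so the parent can confirm that the first SIGABRT reached it. The child should nevertheless be killed by SIGABRT, because the second raise happens under `SIG_DFL`. This assumes Linux/POSIX, and `status_of_abort_with_handler` and `returning_handler` are hypothetical names.

```cpp
#include <cstdlib>
#include <cstring>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_pipe_wr = -1;  // write end of the pipe, used by the handler

// User handler: reports that it ran (write() is async-signal-safe), then
// simply returns to abort().
extern "C" void returning_handler(int) {
    char c = 'h';
    ssize_t n = write(g_pipe_wr, &c, 1);
    (void)n;
}

// Hypothetical helper: the child installs returning_handler for SIGABRT and
// calls std::abort(). Returns the child's wait status (or -1 on error);
// *handler_ran tells whether the user handler executed first.
int status_of_abort_with_handler(bool* handler_ran) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);
        g_pipe_wr = fds[1];
        struct sigaction act;
        std::memset(&act, 0, sizeof act);
        act.sa_handler = returning_handler;
        sigemptyset(&act.sa_mask);
        sigaction(SIGABRT, &act, nullptr);
        std::abort();  // 1st SIGABRT -> handler returns -> 2nd SIGABRT -> killed
    }
    close(fds[1]);
    char c = 0;
    *handler_ran = (read(fds[0], &c, 1) == 1);  // read() returns 0 (EOF) if no byte
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```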

conclusion

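When a thread object's destructor triggers std::terminate, abort() raises SIGABRT for the first time.

If the user has registered a handler, the kernel runs it, and no coredump is produced at this point.

If the handler returns, abort() then removes the user's handler and raises SIGABRT a second time.

This time the kernel finds no user handler and takes the default action, and the default action for SIGABRT is to dump core.

So the coredump on an abnormal thread exit comes from the SIGABRT signal raised by abort().

And because of how abort() works, a coredump is still produced even if SIGABRT is caught in user space.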
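The second SIGABRT is only raised after the first handler returns (stage 1 resumes after `raise` comes back). A handler that ends the process itself, for example with the async-signal-safe `_exit()`, never gives abort() that chance. The sketch below shows this consequence in a forked child, assuming Linux/POSIX. The exit code 42 and the names `exiting_handler` and `status_of_abort_with_exiting_handler` are arbitrary and hypothetical. The real fix for the thread case is still to `join()` or `detach()` the std::thread before it is destroyed.

```cpp
#include <cstdlib>
#include <cstring>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Never returns to abort(), so stages 2 and 3 are never reached.
extern "C" void exiting_handler(int) { _exit(42); }

// Hypothetical helper: the child installs exiting_handler and calls
// std::abort(); returns the child's wait status (or -1 if fork fails).
int status_of_abort_with_exiting_handler() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sigaction act;
        std::memset(&act, 0, sizeof act);
        act.sa_handler = exiting_handler;
        sigemptyset(&act.sa_mask);
        sigaction(SIGABRT, &act, nullptr);
        std::abort();  // 1st SIGABRT -> handler calls _exit(42)
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```

Here the child exits normally with status 42 instead of being killed by SIGABRT, so no coredump is produced.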