thr3ads.net - Virtualization - [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec

If this information is useful, please help other people find it:
Share via:

Mike Christie

2023-May-18 00:09 UTC

[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

This has us deqeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is
set when we are dealing with PF_USER_WORKER tasks.

When a vhost_task gets a SIGKILL, we could have outstanding IO in flight.
We can easily stop new work/IO from being queued to the vhost_task, but
for IO that's already been sent to something like the block layer we
need to wait for the response then process it. These type of IO
completions use the vhost_task to process the completion so we can't
exit immediately.

We need to handle wait for then handle those completions from the
vhost_task, but when we have a SIGKLL pending, functions like
schedule() return immediately so we can't wait like normal. Functions
like vhost_worker() degrade to just a while(1); loop.

This patch has get_signal drop down to the normal code path when
SIGNAL_GROUP_EXIT/group_exec_task is set so the caller can still detect
there is a SIGKILL but still perform some blocking cleanup.

Note that in that chunk I'm now bypassing that does:

sigdelset(&current->pending.signal, SIGKILL);

we look to be ok, because in the places we set SIGNAL_GROUP_EXIT/
group_exec_task we are already doing that on the threads in the
group.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 kernel/signal.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8f6330f0e9ca..ae4972eea5db 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2705,9 +2705,18 @@ bool get_signal(struct ksignal *ksig)
 		struct k_sigaction *ka;
 		enum pid_type type;
 
-		/* Has this task already been marked for death? */
-		if ((signal->flags & SIGNAL_GROUP_EXIT) ||
-		     signal->group_exec_task) {
+		/*
+		 * Has this task already been marked for death?
+		 *
+		 * If this is a PF_USER_WORKER then the task may need to do
+		 * extra work that requires waiting on running work, so we want
+		 * to dequeue the signal below and tell the caller its time to
+		 * start its exit procedure. When the work has completed then
+		 * the task will exit.
+		 */
+		if (!(current->flags & PF_USER_WORKER) &&
+		    ((signal->flags & SIGNAL_GROUP_EXIT) ||
+		     signal->group_exec_task)) {
 			clear_siginfo(&ksig->info);
 			ksig->info.si_signo = signr = SIGKILL;
 			sigdelset(&current->pending.signal, SIGKILL);
@@ -2861,11 +2870,11 @@ bool get_signal(struct ksignal *ksig)
 		}
 
 		/*
-		 * PF_IO_WORKER threads will catch and exit on fatal signals
+		 * PF_USER_WORKER threads will catch and exit on fatal signals
 		 * themselves. They have cleanup that must be performed, so
 		 * we cannot call do_exit() on their behalf.
 		 */
-		if (current->flags & PF_IO_WORKER)
+		if (current->flags & PF_USER_WORKER)
 			goto out;
 
 		/*
-- 
2.25.1

Eric W. Biederman

2023-May-18 02:34 UTC

head link

[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

Mike Christie <michael.christie at oracle.com> writes:
> This has us deqeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is
> set when we are dealing with PF_USER_WORKER tasks.
> When a vhost_task gets a SIGKILL, we could have outstanding IO in flight.
> We can easily stop new work/IO from being queued to the vhost_task, but
> for IO that's already been sent to something like the block layer we
> need to wait for the response then process it. These type of IO
> completions use the vhost_task to process the completion so we can't
> exit immediately.

I understand the concern.
> We need to handle wait for then handle those completions from the
> vhost_task, but when we have a SIGKLL pending, functions like
> schedule() return immediately so we can't wait like normal. Functions
> like vhost_worker() degrade to just a while(1); loop.
>
> This patch has get_signal drop down to the normal code path when
> SIGNAL_GROUP_EXIT/group_exec_task is set so the caller can still detect
> there is a SIGKILL but still perform some blocking cleanup.
>
> Note that in that chunk I'm now bypassing that does:
>
> sigdelset(&current->pending.signal, SIGKILL);
>
> we look to be ok, because in the places we set SIGNAL_GROUP_EXIT/
> group_exec_task we are already doing that on the threads in the
> group.
What you are doing does not make any sense to me.

First there is the semantic non-sense, of queuing something that
is not a signal.   The per task SIGKILL bit is used as a flag with
essentially the same meaning as SIGNAL_GROUP_EXIT, reporting that
the task has been scheduled for exit.

More so is what happens afterwards.

As I read your patch it is roughly equivalent to doing:

	if ((current->flags & PF_USER_WORKER) &&
       	    fatal_signal_pending(current)) {
		sigdelset(&current->pending.signal, SIGKILL);
	        clear_siginfo(&ksig->info);
                ksig->info.si_signo = SIGKILL;
                ksig->info.si_code = SI_USER;
                recalc_sigpending();
		trace_signal_deliver(SIGKILL, &ksig->info,
			&sighand->action[SIGKILL - 1]);
                goto fatal;
	}

Before the "(SIGNAL_GROUP_EXIT || signal->group_exec_task)" test.

To get that code I stripped the active statements out of the
dequeue_signal path the code executes after your change below.

I don't get why you are making it though because the code you
are opting out of does:

		/* Has this task already been marked for death? */
		if ((signal->flags & SIGNAL_GROUP_EXIT) ||
		     signal->group_exec_task) {
			clear_siginfo(&ksig->info);
			ksig->info.si_signo = signr = SIGKILL;
			sigdelset(&current->pending.signal, SIGKILL);
			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
				&sighand->action[SIGKILL - 1]);
			recalc_sigpending();
			goto fatal;
		}

I don't see what in practice changes, other than the fact that by going
through the ordinary dequeue_signal path that other signals can be
processed after a SIGKILL has arrived.  Of course those signal all
should be blocked.




The trailing bit that expands the PF_IO_WORKER test to be PF_USER_WORKER
appears reasonable, and possibly needed.

Eric

> Signed-off-by: Mike Christie <michael.christie at oracle.com>
> ---
>  kernel/signal.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 8f6330f0e9ca..ae4972eea5db 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2705,9 +2705,18 @@ bool get_signal(struct ksignal *ksig)
>  		struct k_sigaction *ka;
>  		enum pid_type type;
>  
> -		/* Has this task already been marked for death? */
> -		if ((signal->flags & SIGNAL_GROUP_EXIT) ||
> -		     signal->group_exec_task) {
> +		/*
> +		 * Has this task already been marked for death?
> +		 *
> +		 * If this is a PF_USER_WORKER then the task may need to do
> +		 * extra work that requires waiting on running work, so we want
> +		 * to dequeue the signal below and tell the caller its time to
> +		 * start its exit procedure. When the work has completed then
> +		 * the task will exit.
> +		 */
> +		if (!(current->flags & PF_USER_WORKER) &&
> +		    ((signal->flags & SIGNAL_GROUP_EXIT) ||
> +		     signal->group_exec_task)) {
>  			clear_siginfo(&ksig->info);
>  			ksig->info.si_signo = signr = SIGKILL;
>  			sigdelset(&current->pending.signal, SIGKILL);
> @@ -2861,11 +2870,11 @@ bool get_signal(struct ksignal *ksig)
>  		}
>  
>  		/*
> -		 * PF_IO_WORKER threads will catch and exit on fatal signals
> +		 * PF_USER_WORKER threads will catch and exit on fatal signals
>  		 * themselves. They have cleanup that must be performed, so
>  		 * we cannot call do_exit() on their behalf.
>  		 */
> -		if (current->flags & PF_IO_WORKER)
> +		if (current->flags & PF_USER_WORKER)
>  			goto out;
>  
>  		/*

Eric W. Biederman

2023-May-18 03:49 UTC

head link

[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

Long story short.

In the patch below the first hunk is a noop.

The code you are bypassing was added to ensure that process termination
(aka SIGKILL) is processed before any other signals.  Other than signal
processing order there are not any substantive differences in the two
code paths.  With all signals except SIGSTOP == 19 and SIGKILL == 9
blocked SIGKILL should always be processed before SIGSTOP.

Can you try patch with just the last hunk that does
s/PF_IO_WORKER/PF_USER_WORKER/ and see if that is enough?

I have no objections to the final hunk.

Mike Christie <michael.christie at oracle.com> writes:
> This has us deqeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is
> set when we are dealing with PF_USER_WORKER tasks.
>
> When a vhost_task gets a SIGKILL, we could have outstanding IO in flight.
> We can easily stop new work/IO from being queued to the vhost_task, but
> for IO that's already been sent to something like the block layer we
> need to wait for the response then process it. These type of IO
> completions use the vhost_task to process the completion so we can't
> exit immediately.
>
> We need to handle wait for then handle those completions from the
> vhost_task, but when we have a SIGKLL pending, functions like
> schedule() return immediately so we can't wait like normal. Functions
> like vhost_worker() degrade to just a while(1); loop.
>
> This patch has get_signal drop down to the normal code path when
> SIGNAL_GROUP_EXIT/group_exec_task is set so the caller can still detect
> there is a SIGKILL but still perform some blocking cleanup.
>
> Note that in that chunk I'm now bypassing that does:
>
> sigdelset(&current->pending.signal, SIGKILL);
>
> we look to be ok, because in the places we set SIGNAL_GROUP_EXIT/
> group_exec_task we are already doing that on the threads in the
> group.
>
> Signed-off-by: Mike Christie <michael.christie at oracle.com>
> ---
>  kernel/signal.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 8f6330f0e9ca..ae4972eea5db 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2705,9 +2705,18 @@ bool get_signal(struct ksignal *ksig)
>  		struct k_sigaction *ka;
>  		enum pid_type type;
>  
> -		/* Has this task already been marked for death? */
> -		if ((signal->flags & SIGNAL_GROUP_EXIT) ||
> -		     signal->group_exec_task) {
> +		/*
> +		 * Has this task already been marked for death?
> +		 *
> +		 * If this is a PF_USER_WORKER then the task may need to do
> +		 * extra work that requires waiting on running work, so we want
> +		 * to dequeue the signal below and tell the caller its time to
> +		 * start its exit procedure. When the work has completed then
> +		 * the task will exit.
> +		 */
> +		if (!(current->flags & PF_USER_WORKER) &&
> +		    ((signal->flags & SIGNAL_GROUP_EXIT) ||
> +		     signal->group_exec_task)) {
>  			clear_siginfo(&ksig->info);
>  			ksig->info.si_signo = signr = SIGKILL;
>  			sigdelset(&current->pending.signal, SIGKILL);
This hunk is a confusing no-op.
> @@ -2861,11 +2870,11 @@ bool get_signal(struct ksignal *ksig)
>  		}
>  
>  		/*
> -		 * PF_IO_WORKER threads will catch and exit on fatal signals
> +		 * PF_USER_WORKER threads will catch and exit on fatal signals
>  		 * themselves. They have cleanup that must be performed, so
>  		 * we cannot call do_exit() on their behalf.
>  		 */
> -		if (current->flags & PF_IO_WORKER)
> +		if (current->flags & PF_USER_WORKER)
>  			goto out;
>  
>  		/*
This hunk is good and makes sense.

Eric

Virtualization - May 2023 - [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set