Mike Christie
2023-May-18 15:27 UTC
[RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set
On 5/18/23 3:08 AM, Christian Brauner wrote:> On Wed, May 17, 2023 at 07:09:13PM -0500, Mike Christie wrote: >> This has us deqeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is >> set when we are dealing with PF_USER_WORKER tasks. >> >> When a vhost_task gets a SIGKILL, we could have outstanding IO in flight. >> We can easily stop new work/IO from being queued to the vhost_task, but >> for IO that's already been sent to something like the block layer we >> need to wait for the response then process it. These type of IO >> completions use the vhost_task to process the completion so we can't >> exit immediately. >> >> We need to handle wait for then handle those completions from the >> vhost_task, but when we have a SIGKLL pending, functions like >> schedule() return immediately so we can't wait like normal. Functions >> like vhost_worker() degrade to just a while(1); loop. >> >> This patch has get_signal drop down to the normal code path when >> SIGNAL_GROUP_EXIT/group_exec_task is set so the caller can still detect >> there is a SIGKILL but still perform some blocking cleanup. >> >> Note that in that chunk I'm now bypassing that does: >> >> sigdelset(¤t->pending.signal, SIGKILL); >> >> we look to be ok, because in the places we set SIGNAL_GROUP_EXIT/ >> group_exec_task we are already doing that on the threads in the >> group. >> >> Signed-off-by: Mike Christie <michael.christie at oracle.com> >> --- > > I think you just got confused by the original discussion that was split > into two separate threads: > > (1) The discussion based on your original proposal to adjust the signal > handling logic to accommodate vhost workers as they are right now. > That's where Oleg jumped in. > (2) My request - which you did in this series - of rewriting vhost > workers to behave more like io_uring workers. > > Both problems are orthogonal. The gist of my proposal is to avoid (1) by > doing (2). So the only change that's needed is > s/PF_IO_WORKER/PF_USER_WORKER/ which is pretty obvious as io_uring > workers and vhost workers no almost fully collapse into the same > concept. > > So forget (1). If additional signal patches are needed as discussed in > (1) then it must be because of a bug that would affect io_uring workers > today.I maybe didn't exactly misunderstand you. I did patch 1/8 to show issues I hit when I'm doing 2-8. See my reply to Eric's question about what I'm hitting and why the last part of the patch only did not work for me: https://lore.kernel.org/lkml/20230518000920.191583-2-michael.christie at oracle.com/T/#mc6286d1a42c79761248ba55f1dd7a433379be6d1