Oleg Nesterov
2023-Jun-06 12:16 UTC
[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 06/05, Mike Christie wrote:
>
> On 6/5/23 10:10 AM, Oleg Nesterov wrote:
> > On 06/03, michael.christie at oracle.com wrote:
> >>
> >> On 6/2/23 11:15 PM, Eric W. Biederman wrote:
> >> The problem is that as part of the flush the drivers/vhost/scsi.c code
> >> will wait for outstanding commands, because we can't free the device and
> >> its resources before the commands complete or we will hit the accessing
> >> freed memory bug.
> >
> > ignoring send-fd/clone issues, can we assume that the final fput/release
> > should always come from vhost_worker's sub-thread (which shares mm/etc) ?
>
> I think I'm misunderstanding the sub-thread term.
>
> - Is it the task_struct's context that we did the
> kernel/vhost_task.c:vhost_task_create() from? Below it would be the
> thread we did VHOST_SET_OWNER from.

Yes,

> So it works like if we were using a kthread still:
>
> 1. Userspace thread0 opens /dev/vhost-$something.
> 2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
> create the task_struct which runs the vhost_worker() function which handles
> the work->fns.
> 3. If userspace now does a SIGKILL or just exits without doing a close() on
> /dev/vhost-$something, then when thread0 does exit_files() that will do the
> fput that does vhost-$something's file_operations->release.

So, at least in this simple case vhost_worker() can just exit after SIGKILL,
and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
rather than wait for vhost_worker().

Right?

not that I think this can help in the general case ...

Oleg.
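
For readers following along, the userspace side of steps 1 and 2 quoted above can be sketched roughly as follows. This is only a minimal sketch, assuming a Linux system with the vhost UAPI header (<linux/vhost.h>) installed. The helper name vhost_become_owner() and the device path passed to it are illustrative and do not come from the thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>   /* VHOST_SET_OWNER */

/*
 * Hypothetical helper: open a vhost character device and claim ownership.
 * Per the thread, the VHOST_SET_OWNER ioctl is the point at which the
 * kernel creates the task that runs vhost_worker().
 *
 * Returns 0 and stores the descriptor in *fd_out on success.
 * Returns -1 and leaves *fd_out untouched if open() or ioctl() fails.
 */
int vhost_become_owner(const char *path, int *fd_out)
{
    assert(path != NULL && fd_out != NULL);

    /* Step 1: thread0 opens /dev/vhost-$something. */
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Step 2: thread0 issues the VHOST_SET_OWNER ioctl. */
    if (ioctl(fd, VHOST_SET_OWNER) < 0) {
        close(fd);
        return -1;
    }

    *fd_out = fd;
    return 0;
}
```

Step 3 is then simply the caller exiting (or being killed) without ever calling close() on the returned descriptor, which leaves the final fput and the ->release call to exit_files() as described in the quoted text.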
Mike Christie
2023-Jun-06 15:57 UTC
[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 6/6/23 7:16 AM, Oleg Nesterov wrote:
> On 06/05, Mike Christie wrote:
>>
>> On 6/5/23 10:10 AM, Oleg Nesterov wrote:
>>> On 06/03, michael.christie at oracle.com wrote:
>>>>
>>>> On 6/2/23 11:15 PM, Eric W. Biederman wrote:
>>>> The problem is that as part of the flush the drivers/vhost/scsi.c code
>>>> will wait for outstanding commands, because we can't free the device and
>>>> its resources before the commands complete or we will hit the accessing
>>>> freed memory bug.
>>>
>>> ignoring send-fd/clone issues, can we assume that the final fput/release
>>> should always come from vhost_worker's sub-thread (which shares mm/etc) ?
>>
>> I think I'm misunderstanding the sub-thread term.
>>
>> - Is it the task_struct's context that we did the
>> kernel/vhost_task.c:vhost_task_create() from? Below it would be the
>> thread we did VHOST_SET_OWNER from.
>
> Yes,
>
>> So it works like if we were using a kthread still:
>>
>> 1. Userspace thread0 opens /dev/vhost-$something.
>> 2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
>> create the task_struct which runs the vhost_worker() function which handles
>> the work->fns.
>> 3. If userspace now does a SIGKILL or just exits without doing a close() on
>> /dev/vhost-$something, then when thread0 does exit_files() that will do the
>> fput that does vhost-$something's file_operations->release.
>
> So, at least in this simple case vhost_worker() can just exit after SIGKILL,
> and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
> rather than wait for vhost_worker().
>
> Right?

With the current code, the answer is no. We would hang like I mentioned here:

https://lore.kernel.org/lkml/ae250076-7d55-c407-1066-86b37014c69c@oracle.com/

We need to add code like I mentioned in that reply because we don't have a
way to call into the layers below us to flush those commands. We need
something more like an "abort and don't call back into us" type of operation.

Or, I'm just trying to add a check where we detect what happened and then,
instead of trying to use the vhost_task, we try to complete in the context
the lower level completes us in.
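
The "detect what happened and complete in the caller's context" idea can be illustrated with a small single-threaded userspace model. This is only a sketch under stated assumptions and is not the vhost code or the actual patch: struct worker, struct work, complete_cmd(), worker_run() and mark_done() are all hypothetical names, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unit of completion work. */
struct work {
    void (*fn)(struct work *w);
    struct work *next;
    int done;
};

/* Hypothetical stand-in for the worker task. */
struct worker {
    bool exited;        /* set once the worker has been killed or has exited */
    struct work *head;  /* pending work; order is not significant here */
};

/* Example completion callback: just records that it ran. */
void mark_done(struct work *w)
{
    w->done = 1;
}

/*
 * Called from the "lower level" when a command completes.
 * If the worker is still alive, hand the work to it as usual.
 * If the worker is gone, run the completion right here, in the
 * caller's context, instead of queueing work nobody will process.
 * Returns true when the work was completed inline.
 */
bool complete_cmd(struct worker *wk, struct work *w)
{
    assert(w->fn != NULL);

    if (wk->exited) {
        w->fn(w);
        return true;
    }
    w->next = wk->head;
    wk->head = w;
    return false;
}

/* Worker context: drain all pending work; returns how many items ran. */
int worker_run(struct worker *wk)
{
    int n = 0;

    while (wk->head != NULL) {
        struct work *w = wk->head;
        wk->head = w->next;
        w->fn(w);
        n++;
    }
    return n;
}
```

In this model, a completion that arrives while the worker is alive is only queued and runs later in worker_run(). A completion that arrives after exited is set runs immediately in complete_cmd(), so a flush waiting on it would not have to wait for a worker that no longer exists.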