Mike Christie
2023-Jun-05 15:46 UTC
[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 6/5/23 10:10 AM, Oleg Nesterov wrote:
> On 06/03, michael.christie at oracle.com wrote:
>>
>> On 6/2/23 11:15 PM, Eric W. Biederman wrote:
>> The problem is that as part of the flush the drivers/vhost/scsi.c code
>> will wait for outstanding commands, because we can't free the device and
>> its resources before the commands complete or we will hit the accessing
>> freed memory bug.
>
> ignoring send-fd/clone issues, can we assume that the final fput/release
> should always come from vhost_worker's sub-thread (which shares mm/etc) ?

I think I'm misunderstanding the sub-thread term.

- Is it the task_struct's context that we did the
kernel/vhost_task.c:vhost_task_create() from? Below it would be the
thread we did VHOST_SET_OWNER from.

If so, then yes.

- Is it the task_struct that gets created by
kernel/vhost_task.c:vhost_task_create()?

If so, then the answer is no. vhost_task_create has set the no_files arg on
kernel_clone_args, so copy_files() sets task_struct->files to NULL and we
don't clone or dup the files.

So it works like if we were using a kthread still:

1. Userspace thread0 opens /dev/vhost-$something.
2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
create the task_struct which runs the vhost_worker() function which handles
the work->fns.
3. If userspace now does a SIGKILL or just exits without doing a close() on
/dev/vhost-$something, then when thread0 does exit_files() that will do the
fput that does vhost-$something's file_operations->release.
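
The three steps above can be sketched as a toy userspace model. This is not
kernel code: every toy_* name below is hypothetical, and the refcount is a
plain int standing in for the real struct file reference counting. It only
illustrates the point that a worker created with no files holds no reference,
so the final fput (and therefore release) runs in thread0's context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct file: a refcount plus a record of who released it. */
struct toy_file {
    int refcount;
    int released;     /* number of times the release hook ran */
    int released_by;  /* id of the task that ran release, -1 if none */
};

/* Toy stand-in for a task: an id and a one-slot file table. */
struct toy_task {
    int id;
    struct toy_file *files;  /* NULL models the no_files case */
};

/* Drop one reference; the last drop runs "release" in the caller's context. */
static void toy_fput(struct toy_file *f, int task_id)
{
    if (--f->refcount == 0) {
        f->released++;
        f->released_by = task_id;
    }
}

/* Models exit_files(): drop whatever the exiting task still holds. */
static void toy_exit_files(struct toy_task *t)
{
    if (t->files) {
        toy_fput(t->files, t->id);
        t->files = NULL;
    }
}

/* Returns the id of the task whose exit triggered release. */
int toy_exit_without_close(void)
{
    struct toy_file vhost_file = { .refcount = 0, .released = 0, .released_by = -1 };
    struct toy_task thread0 = { .id = 0, .files = NULL };
    struct toy_task worker  = { .id = 1, .files = NULL };

    /* Step 1: thread0 opens the device and holds the only reference. */
    thread0.files = &vhost_file;
    vhost_file.refcount = 1;

    /* Step 2: the worker is created with no files, so the refcount is unchanged. */
    worker.files = NULL;

    /* Step 3: both tasks exit without close(); the worker has nothing to drop. */
    toy_exit_files(&worker);
    assert(vhost_file.released == 0);
    toy_exit_files(&thread0);

    assert(vhost_file.released == 1);
    return vhost_file.released_by;
}
```

In this model toy_exit_without_close() returns thread0's id, matching the
description that thread0's exit_files() is what ends up calling release.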
Oleg Nesterov
2023-Jun-06 12:16 UTC
[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 06/05, Mike Christie wrote:
>
> On 6/5/23 10:10 AM, Oleg Nesterov wrote:
> > On 06/03, michael.christie at oracle.com wrote:
> >>
> >> On 6/2/23 11:15 PM, Eric W. Biederman wrote:
> >> The problem is that as part of the flush the drivers/vhost/scsi.c code
> >> will wait for outstanding commands, because we can't free the device and
> >> its resources before the commands complete or we will hit the accessing
> >> freed memory bug.
> >
> > ignoring send-fd/clone issues, can we assume that the final fput/release
> > should always come from vhost_worker's sub-thread (which shares mm/etc) ?
>
> I think I'm misunderstanding the sub-thread term.
>
> - Is it the task_struct's context that we did the
> kernel/vhost_task.c:vhost_task_create() from? Below it would be the
> thread we did VHOST_SET_OWNER from.

Yes,

> So it works like if we were using a kthread still:
>
> 1. Userspace thread0 opens /dev/vhost-$something.
> 2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
> create the task_struct which runs the vhost_worker() function which handles
> the work->fns.
> 3. If userspace now does a SIGKILL or just exits without doing a close() on
> /dev/vhost-$something, then when thread0 does exit_files() that will do the
> fput that does vhost-$something's file_operations->release.

So, at least in this simple case vhost_worker() can just exit after SIGKILL,
and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
rather than wait for vhost_worker().

Right?

not that I think this can help in the general case ...

Oleg.