Mike Christie
2023-Jun-06 15:57 UTC
[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 6/6/23 7:16 AM, Oleg Nesterov wrote:
> On 06/05, Mike Christie wrote:
>>
>> On 6/5/23 10:10 AM, Oleg Nesterov wrote:
>>> On 06/03, michael.christie at oracle.com wrote:
>>>>
>>>> On 6/2/23 11:15 PM, Eric W. Biederman wrote:
>>>> The problem is that as part of the flush the drivers/vhost/scsi.c code
>>>> will wait for outstanding commands, because we can't free the device and
>>>> its resources before the commands complete or we will hit the
>>>> accessing-freed-memory bug.
>>>
>>> ignoring send-fd/clone issues, can we assume that the final fput/release
>>> should always come from vhost_worker's sub-thread (which shares mm/etc)?
>>
>> I think I'm misunderstanding the sub-thread term.
>>
>> - Is it the task_struct's context that we did the
>> kernel/vhost_task.c:vhost_task_create() from? Below it would be the
>> thread we did VHOST_SET_OWNER from.
>
> Yes,
>
>> So it works like if we were using a kthread still:
>>
>> 1. Userspace thread0 opens /dev/vhost-$something.
>> 2. thread0 does the VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
>> create the task_struct which runs the vhost_worker() function, which handles
>> the work->fns.
>> 3. If userspace now does a SIGKILL or just exits without doing a close() on
>> /dev/vhost-$something, then when thread0 does exit_files() that will do the
>> fput that does vhost-$something's file_operations->release.
>
> So, at least in this simple case, vhost_worker() can just exit after SIGKILL,
> and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
> rather than wait for vhost_worker().
>
> Right?

With the current code, the answer is no. We would hang like I mentioned here:

https://lore.kernel.org/lkml/ae250076-7d55-c407-1066-86b37014c69c at oracle.com/

We need to add code like I mentioned in that reply, because we don't have a
way to call into the layers below us to flush those commands. We need
something more like an abort-and-don't-call-back-into-us type of operation.
Alternatively, I'm trying to add a check where we detect what happened; then, instead of trying to use the vhost_task, we complete in the context the lower level completes us in.
Oleg Nesterov
2023-Jun-06 19:39 UTC
[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 06/06, Mike Christie wrote:
>
> On 6/6/23 7:16 AM, Oleg Nesterov wrote:
> > On 06/05, Mike Christie wrote:
> >
> >> So it works like if we were using a kthread still:
> >>
> >> 1. Userspace thread0 opens /dev/vhost-$something.
> >> 2. thread0 does the VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
> >> create the task_struct which runs the vhost_worker() function which handles
> >> the work->fns.
> >> 3. If userspace now does a SIGKILL or just exits without doing a close() on
> >> /dev/vhost-$something, then when thread0 does exit_files() that will do the
> >> fput that does vhost-$something's file_operations->release.
> >
> > So, at least in this simple case, vhost_worker() can just exit after SIGKILL,
> > and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
> > rather than wait for vhost_worker().
> >
> > Right?
>
> With the current code, the answer is no. We would hang like I mentioned here:
>
> https://lore.kernel.org/lkml/ae250076-7d55-c407-1066-86b37014c69c at oracle.com/

If only I could fully understand this email ;)

Could you spell out why this can't work (again, in this simple case)?

My current (and, I know, very poor) understanding is that .release() should
roughly do the following:

	1. Ensure that vhost_work_queue() can't add new callbacks

	2. Call vhost_dev_flush() to ensure that worker->work_list is empty

	3. Call vhost_task_stop()

So why can't this sequence work if we turn vhost_dev_flush() into something
like

	void vhost_dev_flush(struct vhost_dev *dev)
	{
		struct vhost_flush_struct flush;

		if (dev->worker) {
			// this assumes that vhost_task_create() uses CLONE_THREAD
			if (same_thread_group(current, dev->worker->vtsk->task)) {
				... run the pending callbacks ...
				return;
			}

			// this is what we currently have
			init_completion(&flush.wait_event);
			vhost_work_init(&flush.work, vhost_flush_work);
			vhost_work_queue(dev, &flush.work);
			wait_for_completion(&flush.wait_event);
		}
	}

?
Mike, I am just trying to understand what exactly vhost_worker() should do.

> We need to add code like I mentioned in that reply because we don't have a
> way to call into the layers below us to flush those commands.

This tells me nothing, but this is my fault, not yours. Again, again, I know
nothing about drivers/vhost.

Oleg.