thr3ads.net - Virtualization - [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression [Jun 2023]

Mike Christie

2023-Jun-06 15:57 UTC

[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

On 6/6/23 7:16 AM, Oleg Nesterov wrote:
> On 06/05, Mike Christie wrote:
>>
>> On 6/5/23 10:10 AM, Oleg Nesterov wrote:
>>> On 06/03, michael.christie at oracle.com wrote:
>>>>
>>>> On 6/2/23 11:15 PM, Eric W. Biederman wrote:
>>>> The problem is that as part of the flush the drivers/vhost/scsi.c code
>>>> will wait for outstanding commands, because we can't free the device and
>>>> its resources before the commands complete or we will hit the accessing
>>>> freed memory bug.
>>>
>>> ignoring send-fd/clone issues, can we assume that the final fput/release
>>> should always come from vhost_worker's sub-thread (which shares mm/etc) ?
>>
>> I think I'm misunderstanding the sub-thread term.
>>
>> - Is it the task_struct's context that we did the
>> kernel/vhost_task.c:vhost_task_create() from? Below it would be the
>> thread we did VHOST_SET_OWNER from.
> 
> Yes,
> 
>> So it works like if we were using a kthread still:
>>
>> 1. Userspace thread0 opens /dev/vhost-$something.
>> 2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
>> create the task_struct which runs the vhost_worker() function which handles
>> the work->fns.
>> 3. If userspace now does a SIGKILL or just exits without doing a close() on
>> /dev/vhost-$something, then when thread0 does exit_files() that will do the
>> fput that does vhost-$something's file_operations->release.
> 
> So, at least in this simple case vhost_worker() can just exit after SIGKILL,
> and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
> rather than wait for vhost_worker().
>
> Right?

With the current code, the answer is no. We would hang like I mentioned here:

https://lore.kernel.org/lkml/ae250076-7d55-c407-1066-86b37014c69c at oracle.com/

We need to add code like I mentioned in that reply because we don't have a
way to call into the layers below us to flush those commands. We need
something more like an "abort and don't call back into us" type of operation.
Or, I'm just trying to add a check where we detect what happened and then,
instead of trying to use the vhost_task, we try to complete in the context
the lower level completes us in.
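
The check described in the last sentence above can be pictured with a small userspace model. This is only a sketch under stated assumptions: every name in it (toy_cmd, toy_worker, toy_complete_cmd, toy_worker_run) is hypothetical, the "killed" flag merely stands in for "the vhost_task has exited after SIGKILL", and none of it is actual drivers/vhost code or the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are assumptions, not kernel APIs. */
struct toy_cmd {
	bool completed;
	struct toy_cmd *next;
};

struct toy_worker {
	bool killed;		/* stands in for "worker task is gone" */
	struct toy_cmd *queue;	/* completions handed to the worker */
	size_t queued;
};

/*
 * Called from the "lower layer" when a command finishes.
 * Normal case: hand the completion to the worker.
 * Worker gone: finish the command right here, in the caller's context.
 */
static void toy_complete_cmd(struct toy_worker *w, struct toy_cmd *cmd)
{
	if (w->killed) {
		cmd->completed = true;
		return;
	}
	cmd->next = w->queue;
	w->queue = cmd;
	w->queued++;
}

/* One pass of the worker: complete everything that was queued to it. */
static size_t toy_worker_run(struct toy_worker *w)
{
	size_t done = 0;

	while (w->queue) {
		struct toy_cmd *cmd = w->queue;

		w->queue = cmd->next;
		cmd->next = NULL;
		cmd->completed = true;
		w->queued--;
		done++;
	}
	return done;
}
```

While the worker is alive, a command only becomes completed once toy_worker_run() has processed it. Once the flag is set, toy_complete_cmd() completes the command immediately and nothing is left waiting on a worker that will never run again.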

Oleg Nesterov

2023-Jun-06 19:39 UTC

[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

On 06/06, Mike Christie wrote:
>
> On 6/6/23 7:16 AM, Oleg Nesterov wrote:
> > On 06/05, Mike Christie wrote:
> >
> >> So it works like if we were using a kthread still:
> >>
> >> 1. Userspace thread0 opens /dev/vhost-$something.
> >> 2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
> >> create the task_struct which runs the vhost_worker() function which handles
> >> the work->fns.
> >> 3. If userspace now does a SIGKILL or just exits without doing a close() on
> >> /dev/vhost-$something, then when thread0 does exit_files() that will do the
> >> fput that does vhost-$something's file_operations->release.
> >
> > So, at least in this simple case vhost_worker() can just exit after SIGKILL,
> > and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
> > rather than wait for vhost_worker().
> >
> > Right?
>
> With the current code, the answer is no. We would hang like I mentioned here:
>
> https://lore.kernel.org/lkml/ae250076-7d55-c407-1066-86b37014c69c at oracle.com/

If only I could fully understand this email ;)

Could you spell out why this can't work (again, in this simple case)?

My current (and I know, very poor) understanding is that .release() should
roughly do the following:

	1. Ensure that vhost_work_queue() can't add the new callbacks

	2. Call vhost_dev_flush() to ensure that worker->work_list is empty

	3. Call vhost_task_stop()

So why can't this sequence work if we turn vhost_dev_flush() into something
like

	void vhost_dev_flush(struct vhost_dev *dev)
	{
		struct vhost_flush_struct flush;

		if (dev->worker) {
			// this assumes that vhost_task_create() uses CLONE_THREAD
			if (same_thread_group(current, dev->worker->vtsk->task)) {
				... run the pending callbacks ...
				return;
			}


			// this is what we currently have

			init_completion(&flush.wait_event);
			vhost_work_init(&flush.work, vhost_flush_work);

			vhost_work_queue(dev, &flush.work);
			wait_for_completion(&flush.wait_event);
		}
	}

?
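
To make the two branches of the proposed vhost_dev_flush() easier to follow, here is a single-threaded userspace model. It is a sketch under assumptions, not kernel code: toy_dev, toy_work, toy_work_queue, toy_run_pending, toy_dev_flush and toy_count_cb are all hypothetical names, an integer thread-group id stands in for same_thread_group(), and the blocking wait_for_completion() is replaced by a return value that tells the caller it would have had to wait for the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are assumptions, not kernel APIs. */
typedef void (*toy_work_fn)(void *arg);

struct toy_work {
	toy_work_fn fn;
	void *arg;
	struct toy_work *next;
};

struct toy_dev {
	int worker_tgid;	/* thread group the worker belongs to */
	struct toy_work *head;	/* FIFO of pending callbacks */
	struct toy_work *tail;
};

/* Example callback: bump the counter passed as arg. */
static void toy_count_cb(void *arg)
{
	(*(int *)arg)++;
}

/* Append a work item to the device's pending list. */
static void toy_work_queue(struct toy_dev *d, struct toy_work *w)
{
	w->next = NULL;
	if (d->tail)
		d->tail->next = w;
	else
		d->head = w;
	d->tail = w;
}

/* Run every pending callback in FIFO order; returns how many ran. */
static size_t toy_run_pending(struct toy_dev *d)
{
	size_t ran = 0;

	while (d->head) {
		struct toy_work *w = d->head;

		d->head = w->next;
		if (!d->head)
			d->tail = NULL;
		w->next = NULL;
		w->fn(w->arg);
		ran++;
	}
	return ran;
}

/*
 * Returns true if the caller shares the worker's thread group and the
 * pending callbacks were run inline. Returns false if a flush item was
 * queued instead, which is the point where the real code would block
 * until the worker has processed it.
 */
static bool toy_dev_flush(struct toy_dev *d, int caller_tgid,
			  struct toy_work *flush)
{
	if (caller_tgid == d->worker_tgid) {
		toy_run_pending(d);
		return true;
	}
	toy_work_queue(d, flush);
	return false;
}
```

The model only covers callbacks that are already on the pending list. It does not model commands that are still outstanding in lower layers and have not queued their completion yet, which is the case the quoted reply is concerned with.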

Mike, I am just trying to understand what exactly vhost_worker() should do.

> We need to add code like I mentioned in that reply because we don't have a
> way to call into the layers below us to flush those commands.

This tells me nothing, but this is my fault, not yours. Again, again, I know
nothing about drivers/vhost.

Oleg.
