thr3ads.net - Virtualization - [PATCH v11 8/8] vhost: use vhost

If this information is useful, please help other people find it:
Share via:

Mike Christie

2023-May-05 22:37 UTC

[PATCH v11 8/8] vhost: use vhost_tasks for worker threads

On 5/5/23 1:22 PM, Linus Torvalds wrote:> On Fri, May 5, 2023 at 6:40?AM Nicolas Dichtel
> <nicolas.dichtel at 6wind.com> wrote:
>>
>> Is this an intended behavior?
>> This breaks some of our scripts.
> 
> It doesn't just break your scripts (which counts as a regression), I
> think it's really wrong.
> 
> The worker threads should show up as threads of the thing that started
> them, not as processes.
> 
> So they should show up in 'ps' only when one of the "show
threads" flag is set.
> 
> But I suspect the fix is trivial:  the virtio code should likely use
> CLONE_THREAD for the copy_process() it does.
> 
> It should look more like "create_io_thread()" than
"copy_process()", I think.
> 
> For example, do virtio worker threads really want their own signals
> and files? That sounds wrong. create_io_thread() uses all of
> 
>  CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_IO
> 
> to share much more of the context with the process it is actually run
within.
> 
For the vhost tasks and the CLONE flags:

1. I didn't use CLONE_FILES in the vhost task patches because you are right
and we didn't need our own. We needed it to work like kthreads where there
are no files, so I set the kernel_clone_args.no_files bit to have copy_files
not do a dup or clone (task->files is NULL).

2. vhost tasks didn't use CLONE_SIGHAND, because userspace apps like qemu
use
signals for management operations. But, the vhost thread's worker functions
assume signals are ignored like they were with kthreads. So if they were doing
IO and got a signal like a SIGHUP they might return early and fail from whatever
network/block function they were calling. And currently the parent like qemu
handles something like a SIGSTOP by shutting everything down by calling into
the vhost interface to remove the device.

So similar to files I used the kernel_clone_args.ignore_signals bit so
copy_process has the vhost thread have it's own signal handle that just
ignores
signals.

3. I didn't use CLONE_THREAD because before my patches you could do
"ps -u root" and see all the vhost threads. If we use CLONE_THREAD,
then we
can only see it when we do something like "ps -T -p $parent" like you
mentioned
above. I guess I messed up and did the reverse and thought it would be a
regression if "ps -u root" no longer showed the vhost threads.

If it's ok to change the behavior of "ps -u root", then we can do
this patch:
(Nicolas, I confirmed it fixes the 'ps a' case, but couldn't
replicate the 'ps'
case. If you could test the ps only case or give me info on what
/usr/bin/example
was doing I can replicate and test here):

diff --git a/kernel/fork.c b/kernel/fork.c
index ed4e01daccaa..eb9ffc58e211 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2269,8 +2269,14 @@ __latent_entropy struct task_struct *copy_process(
 	/*
 	 * Thread groups must share signals as well, and detached threads
 	 * can only be started up within the thread group.
+	 *
+	 * A userworker's parent thread will normally have a signal handler
+	 * that performs management operations, but the worker will not
+	 * because the parent will handle the signal then user a worker
+	 * specific interface to manage the thread and related resources.
 	 */
-	if ((clone_flags & CLONE_THREAD) && !(clone_flags &
CLONE_SIGHAND))
+	if ((clone_flags & CLONE_THREAD) && !(clone_flags &
CLONE_SIGHAND) &&
+	    !args->user_worker && !args->ignore_signals)
 		return ERR_PTR(-EINVAL);

 	/*
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index b7cbd66f889e..3700c21ea39d 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -75,7 +78,8 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), void
*arg,
 				     const char *name)
 {
 	struct kernel_clone_args args = {
-		.flags		= CLONE_FS | CLONE_UNTRACED | CLONE_VM,
+		.flags		= CLONE_FS | CLONE_THREAD | CLONE_VM |
+				  CLONE_UNTRACED,
 		.exit_signal	= 0,
 		.fn		= vhost_task_fn,
 		.name		= name,

Linus Torvalds

2023-May-06 01:53 UTC

head link

[PATCH v11 8/8] vhost: use vhost_tasks for worker threads

On Fri, May 5, 2023 at 3:38?PM Mike Christie
<michael.christie at oracle.com> wrote:>
> If it's ok to change the behavior of "ps -u root", then we
can do this patch:
I think this is the right thing to do.

Making the user worker threads show up as threads with the vhost
process as the parent really seems like a much better model, and more
accurate.

Yes, they used to show up as random kernel threads, and you'd see them
as such (not just for "ps -u root", but simply also with just a normal
"ps ax" kind of thing). But that isn't all that helpful, and
it's
really just annoying to see our kernel threads in "ps ax" output, and
I've often wished we didn't do that (it think of all the random
"kworker/xyz-kcryptd" etc things that show up).

So I think showing them as the threaded children of the vhost process
is much nicer, and probably the best option.

Because I don't thin kanything is going to get the *old* behavior of
showing them as the '[vhost-xyz]' system threads (or whatever the old
output ended up being in 'ps ax'), but hopefully nothing wants that
horror anyway.

At a minimum, the parenting is fundamentally going to look different
in the new model.

                   Linus

Thorsten Leemhuis

2023-May-13 12:39 UTC

head link

[PATCH v11 8/8] vhost: use vhost_tasks for worker threads

[CCing the regression list]

On 06.05.23 00:37, Mike Christie wrote:> On 5/5/23 1:22 PM, Linus Torvalds wrote:
>> On Fri, May 5, 2023 at 6:40?AM Nicolas Dichtel
>> <nicolas.dichtel at 6wind.com> wrote:
>>>
>>> Is this an intended behavior?
>>> This breaks some of our scripts.
Jumping in here, as I found another problem with that patch: it broke
s2idle on my laptop when a qemu-kvm VM is running, as freezing user
space processes now fails for me:

```
 [  195.442949] PM: suspend entry (s2idle)
 [  195.641271] Filesystems sync: 0.198 seconds
 [  195.833828] Freezing user space processes
 [  215.841084] Freezing user space processes failed after 20.007
seconds (1 tasks refusing to freeze, wq_busy=0):
 [  215.841255] task:vhost-3221      state:R stack:0     pid:3250
ppid:3221   flags:0x00004006
 [  215.841264] Call Trace:
 [  215.841266]  <TASK>
 [  215.841270]  ? update_rq_clock+0x39/0x270
 [  215.841283]  ? _raw_spin_unlock+0x19/0x40
 [  215.841290]  ? __schedule+0x3f/0x1510
 [  215.841296]  ? sysvec_apic_timer_interrupt+0xaf/0xd0
 [  215.841306]  ? schedule+0x61/0xe0
 [  215.841313]  ? vhost_worker+0x87/0xb0 [vhost]
 [  215.841329]  ? vhost_task_fn+0x1a/0x30
 [  215.841336]  ? __pfx_vhost_task_fn+0x10/0x10
 [  215.841341]  ? ret_from_fork+0x2c/0x50
 [  215.841352]  </TASK>
 [  215.841936] OOM killer enabled.
 [  215.841938] Restarting tasks ... done.
 [  215.844204] random: crng reseeded on system resumption
 [  215.957095] PM: suspend exit
 [  215.957185] PM: suspend entry (s2idle)
 [  215.967646] Filesystems sync: 0.010 seconds
 [  215.971326] Freezing user space processes
 [  235.974400] Freezing user space processes failed after 20.003
seconds (1 tasks refusing to freeze, wq_busy=0):
 [  235.974574] task:vhost-3221      state:R stack:0     pid:3250
ppid:3221   flags:0x00004806
 [  235.974583] Call Trace:
 [  235.974586]  <TASK>
 [  235.974593]  ? __schedule+0x184/0x1510
 [  235.974605]  ? sysvec_apic_timer_interrupt+0xaf/0xd0
 [  235.974616]  ? schedule+0x61/0xe0
 [  235.974624]  ? vhost_worker+0x87/0xb0 [vhost]
 [  235.974648]  ? vhost_task_fn+0x1a/0x30
 [  235.974656]  ? __pfx_vhost_task_fn+0x10/0x10
 [  235.974662]  ? ret_from_fork+0x2c/0x50
 [  235.974673]  </TASK>
 [  235.975190] OOM killer enabled.
 [  235.975192] Restarting tasks ... done.
 [  235.978131] random: crng reseeded on system resumption
 [  236.091219] PM: suspend exit
```

After running into the problem I booted 6.3.1-rc1 again and there s2idle
still worked. Didn't do a bisection, just looked at the vhost commits
during the latest merge window; 6e890c5d502 ("vhost: use vhost_tasks for
worker threads") looked suspicious, so I reverted it on top of latest
mainline and then things work again. Through a search on lore I arrived
in this thread and found below patch from Mike. Gave it a try on top of
latest mainline, but it didn't help.

Ciao, Thorsten
> [...]
> If it's ok to change the behavior of "ps -u root", then we
can do this patch:
> (Nicolas, I confirmed it fixes the 'ps a' case, but couldn't
replicate the 'ps'
> case. If you could test the ps only case or give me info on what
/usr/bin/example
> was doing I can replicate and test here):
> 
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index ed4e01daccaa..eb9ffc58e211 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2269,8 +2269,14 @@ __latent_entropy struct task_struct *copy_process(
>  	/*
>  	 * Thread groups must share signals as well, and detached threads
>  	 * can only be started up within the thread group.
> +	 *
> +	 * A userworker's parent thread will normally have a signal handler
> +	 * that performs management operations, but the worker will not
> +	 * because the parent will handle the signal then user a worker
> +	 * specific interface to manage the thread and related resources.
>  	 */
> -	if ((clone_flags & CLONE_THREAD) && !(clone_flags &
CLONE_SIGHAND))
> +	if ((clone_flags & CLONE_THREAD) && !(clone_flags &
CLONE_SIGHAND) &&
> +	    !args->user_worker && !args->ignore_signals)
>  		return ERR_PTR(-EINVAL);
>  
>  	/*
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index b7cbd66f889e..3700c21ea39d 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -75,7 +78,8 @@ struct vhost_task *vhost_task_create(int (*fn)(void *),
void *arg,
>  				     const char *name)
>  {
>  	struct kernel_clone_args args = {
> -		.flags		= CLONE_FS | CLONE_UNTRACED | CLONE_VM,
> +		.flags		= CLONE_FS | CLONE_THREAD | CLONE_VM |
> +				  CLONE_UNTRACED,
>  		.exit_signal	= 0,
>  		.fn		= vhost_task_fn,
>  		.name		= name

Virtualization - May 2023 - [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

[PATCH v11 8/8] vhost: use vhost_tasks for worker threads

[PATCH v11 8/8] vhost: use vhost_tasks for worker threads

[PATCH v11 8/8] vhost: use vhost_tasks for worker threads