Eric W. Biederman
2023-May-25 15:30 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
Oleg Nesterov <oleg at redhat.com> writes:> On 05/24, Eric W. Biederman wrote: >> >> Oleg Nesterov <oleg at redhat.com> writes: >> >> > Yes, but probably SIGABRT/exit doesn't really differ from SIGKILL wrt >> > vhost_worker(). >> >> Actually I think it reveals that exiting with SIGABRT will cause >> a deadlock. >> >> coredump_wait will wait for all of the threads to reach >> coredump_task_exit. Meanwhile vhost_worker is waiting for >> all of the other threads to reach exit_files to close their >> file descriptors. > > Indeed, I didn't think about this. > > > So why do we actually need CLONE_THREAD ? Can't vhost_worker() be a kernel thread? > > kthread_create() won't be convenient, but how about kernel_thread() ? it inherits > mm/cgroups/rlimits/etc, kthread_stop() should work just fine.Basically with no patches that is where Linus's kernel is. User complained about the new thread showing up in ps. So the exploration of how we could use CLONE_THREAD instead of what is currently merged began. Eric
Linus Torvalds
2023-May-25 16:20 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On Thu, May 25, 2023 at 8:30?AM Eric W. Biederman <ebiederm at xmission.com> wrote:> > Basically with no patches that is where Linus's kernel is. > > User complained about the new thread showing up in ps.Well, not only that, but it actively broke existing workflows for managing things. Showing up in 'ps' wasn't just some purely cosmetic issue, but had semantic meaning. And honestly, I think the core issue is that we should just make this work. Kernel threads randomly switching to user memory threads was wrong, so CLONE_VM is absolutely the right thing to do. But while "CLONE_VM without real threading" is a very traditional thing in Linux - it was the original model for clone(), after all - I don't believe it is the *correct* model. There was a very real reason clone() has grown CLONE_THREAD and friends. So honestly, I really think we want to complete the vhost move to CLONE_THREAD (and thus CLONE_SIGNAL). Not because the old kthread model didn't _work_, but because it's really really wrong to try to randomly take on user-space attributes at run-time. And once you do the "user threads in kernel space" model, at that point you really do want to act like a proper thread. Both because of that 'ps' issue (which is really just "show the world what your relationship is), but simply because that is the modern threading model that we use for everything else, and special cases are bad. So I'd really like to finish this. Even if we end up with a hack or two in signal handling that we can hopefully fix up later by having vhost fix up some of its current assumptions. It has worked wonderfully well for io_uring - but we *did* have quite a bit of conversion patches over some time as people found issues. Which is why I don't expect the vhost conevrsion to be entirely pain-free either, and I don't think we necessarily have to get to a "perfect and clean" state immediately, just a "working and conceptually in the right direction" state. Linus