Oleg Nesterov
2023-May-29 11:19 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 05/27, Eric W. Biederman wrote:> > Looking forward I don't see not asking the worker threads to stop > for the coredump right now causing any problems in the future. > So I think we can use this to resolve the coredump issue I spotted.But we have almost the same problem with exec. Execing thread will wait for vhost_worker() while vhost_worker will wait for .release -> vhost_task_stop(). And even O_CLOEXEC won't help, do_close_on_exec() is called after de_thread(). Or suppose that vhost_worker's sub-thread forks a child with CLONE_FILES... If we want CLONE_THREAD, I think vhost_worker() should exit after get_signal() returns SIGKILL. Perhaps it should "disable" vhost_work_queue() somehow and flush the pending works on ->work_list before exit, I dunno. But imo it should not wait for the final fput(). Oleg.
michael.christie at oracle.com
2023-May-29 16:09 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 5/29/23 6:19 AM, Oleg Nesterov wrote:> On 05/27, Eric W. Biederman wrote: >> >> Looking forward I don't see not asking the worker threads to stop >> for the coredump right now causing any problems in the future. >> So I think we can use this to resolve the coredump issue I spotted. > > But we have almost the same problem with exec. > > Execing thread will wait for vhost_worker() while vhost_worker will wait for > .release -> vhost_task_stop().For this type of case, what is the goal or correct behavior in the end? When get_signal returns true we can code things like you mention below and clean up the task_struct. However, we now have a non-functioning vhost device open and just sitting around taking up memory and it can't do any IO. For this type of case, do we expect just not to crash/hang, or was this new exec'd thread suppose to be able to use the vhost device? I would normally say it probably wants to use the vhost device still. However, I don't think this comes up so just not hanging might be ok. Before 6.4-rc1, we ignored signals so it would have worked if we are concerned about a possible regression if this was a common thing.> > And even O_CLOEXEC won't help, do_close_on_exec() is called after de_thread(). > > Or suppose that vhost_worker's sub-thread forks a child with CLONE_FILES...You mean the vhost_task's task/thread doing a function that does a copy_process right? That type of thing is not needed. I can add a check in vhost_task_create for this so new code doesn't try to do it. I don't think it will come up that some code vhost is using will call kernel_thread/copy_process directly since those calls are so rare and the functions are not exported to modules.> > If we want CLONE_THREAD, I think vhost_worker() should exit after get_signal() > returns SIGKILL. Perhaps it should "disable" vhost_work_queue() somehow and > flush the pending works on ->work_list before exit, I dunno. But imo it should > not wait for the final fput(). > > Oleg. >
michael.christie at oracle.com
2023-May-29 16:11 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 5/29/23 6:19 AM, Oleg Nesterov wrote:> If we want CLONE_THREAD, I think vhost_worker() should exit after get_signal() > returns SIGKILL. Perhaps it should "disable" vhost_work_queue() somehow and > flush the pending works on ->work_list before exit, I dunno. But imo it should > not wait for the final fput().Just a FYI, I coded this. It's pre-RFC quality. It's invasive. If we want to go this route then we have to do a temp hack for 6.4 or revert then do this route for 6.5 or 6.6 (vhost devs will need time to review and we need time to full test).