Oleg Nesterov
2023-May-29 17:46 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
Mike, sorry, I don't understand your email. Just in case, let me remind I know nothing about drivers/vhost/ On 05/29, michael.christie at oracle.com wrote:> > On 5/29/23 6:19 AM, Oleg Nesterov wrote: > > On 05/27, Eric W. Biederman wrote: > >> > >> Looking forward I don't see not asking the worker threads to stop > >> for the coredump right now causing any problems in the future. > >> So I think we can use this to resolve the coredump issue I spotted. > > > > But we have almost the same problem with exec. > > > > Execing thread will wait for vhost_worker() while vhost_worker will wait for > > .release -> vhost_task_stop(). > > For this type of case, what is the goal or correct behavior in the end? > > When get_signal returns true we can code things like you mention belowand you have mentioned in the next email that you have already coded something like this, so perhaps we can delay the further discussions until you send the new code?> and > clean up the task_struct.Hmm... If we you CLONE_THREAD the exiting vhost_worker() will auto-reap itself,> However, we now have a non-functioning vhost device > open and just sitting around taking up memory and it can't do any IO.can't comment, see above.> For this type of case, do we expect just not to crash/hang, or was this new > exec'd thread suppose to be able to use the vhost device?I just tried to point out that (unless I missed something) there are more corner cases, not just coredump.> > Or suppose that vhost_worker's sub-thread forks a child with CLONE_FILES... > > You mean the vhost_task's task/thread doing a function that does a copy_process > right?I meant that the vhost_task's sub-thread can do sys_clone(CLONE_FILES) from userspace. Yes, this implies copy_process() but I still can't understand you.> That type of thing is not needed.Do you mean that userspace should never do this? But this doesn't matter, the kernel should handle this case anyway. Or what? In short let me repeat that I don't understand you and - of course! - quite possibly I missed something. Oleg.
Oleg Nesterov
2023-May-29 17:54 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 05/29, Oleg Nesterov wrote:> > Mike, sorry, I don't understand your email. > > Just in case, let me remind I know nothing about drivers/vhost/And... this is slightly off-topic but you didn't answer my previous question and I am just curious. Do you agree that, even if we forget about CLONE_THREAD/signals, vhost_worker() needs fixes anyway because it can race with vhost_work_queue() ? Oleg.
Mike Christie
2023-May-29 19:35 UTC
[PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On 5/29/23 12:46 PM, Oleg Nesterov wrote:> Mike, sorry, I don't understand your email. > > Just in case, let me remind I know nothing about drivers/vhost/ >No problem. I get it. I don't know the signal/thread code so it's one of those things where I'm also learning as I go.> On 05/29, michael.christie at oracle.com wrote: >> >> On 5/29/23 6:19 AM, Oleg Nesterov wrote: >>> On 05/27, Eric W. Biederman wrote: >>>> >>>> Looking forward I don't see not asking the worker threads to stop >>>> for the coredump right now causing any problems in the future. >>>> So I think we can use this to resolve the coredump issue I spotted. >>> >>> But we have almost the same problem with exec. >>> >>> Execing thread will wait for vhost_worker() while vhost_worker will wait for >>> .release -> vhost_task_stop(). >> >> For this type of case, what is the goal or correct behavior in the end? >> >> When get_signal returns true we can code things like you mention below > > and you have mentioned in the next email that you have already coded something > like this, so perhaps we can delay the further discussions until you send the > new code? >Ok. Let me post that. You guys and the vhost devs can argue about if it's too ugly to merge :)>> and >> clean up the task_struct. > > Hmm... If we you CLONE_THREAD the exiting vhost_worker() will auto-reap itself,Oh wait, are you saying that when we get auto-reaped then we would do the last fput and call the file_operations->release function right? We actually set task_struct->files = NULL for the vhost_task task_struct, so I think we call release a little sooner than you think. vhost_task_create() sets kernel_clone_args->no_files, so the vhost_task task_struc that gets created works like kthreads where it doesn't do a CLONE_FILES and it doesn't do a dup_fd. So when we do de_thread() -> zap_other_threads(), that will kill all the threads in the group right? So when they exit, it will call our release function since we don't have refcount on ourself.> >> However, we now have a non-functioning vhost device >> open and just sitting around taking up memory and it can't do any IO. > > can't comment, see above. > >> For this type of case, do we expect just not to crash/hang, or was this new >> exec'd thread suppose to be able to use the vhost device? > > I just tried to point out that (unless I missed something) there are more corner > cases, not just coredump.Ok. I see.> >>> Or suppose that vhost_worker's sub-thread forks a child with CLONE_FILES... >> >> You mean the vhost_task's task/thread doing a function that does a copy_process >> right? > > I meant that the vhost_task's sub-thread can do sys_clone(CLONE_FILES) from > userspace.I think we were talking about different things. I was saying that when the vhost layer does vhost_task_create() -> copy_process(), the kernel_clone_args.fn is vhost_task_fn() -> vhost_worker(). I thought you were concerned about vhost_worker() or some function it calls, calling copy_process() with CLONE_FILES.