Stefan Hajnoczi
2021-Jun-10 08:06 UTC
[PATCH 7/9] vhost: allow userspace to create workers
On Wed, Jun 09, 2021 at 04:03:55PM -0500, Mike Christie wrote:> On 6/7/21 10:19 AM, Stefan Hajnoczi wrote: > > My concern is that threads should probably accounted against > > RLIMIT_NPROC and max_threads rather than something indirect like 128 * > > RLIMIT_NOFILE (a userspace process can only have RLIMIT_NOFILE > > vhost-user file descriptors open). > > > > Ah ok, I see what you want I think. > > Ok, I think the options are: > > 0. Nothing. Just use existing indirect/RLIMIT_NOFILE. > > 1. Do something like io_uring's create_io_thread/copy_process. If we call > copy_process from the vhost ioctl context, then the userspace process that > did the ioctl will have it's processes count incremented and checked against > its rlimit. > > The drawbacks: > - This gets a little more complicated than just calling copy_process though. > We end up duplicating a lot of the kthread API. > - We have to deal with new error cases like the parent exiting early. > - I think all devs sharing a worker have to have the same owner. kthread_use_mm > and kthread_unuse_mm to switch between mm's for differrent owner's devs seem to > be causing lots of errors. I'm still looking into this one though. > > 2. It's not really what you want, but for unbound work io_uring has a check for > RLIMIT_NPROC in the io_uring code. It does: > > wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers > task_rlimit(current, RLIMIT_NPROC); > > then does: > > if (!ret && acct->nr_workers < acct->max_workers) { > > Drawbacks: > In vhost.c, we could do something similar. It would make sure that vhost.c does > not create more worker threads than the rlimit value, but we wouldn't be > incrementing the userspace process's process count. The userspace process could > then create RLIMIT_NPROC threads and vhost.c could also create RLIMIT_NPROC > threads, so we end up with 2 * RLIMIT_NPROC threads.Yes, in that case we might as well go with Option 0, so I think this option can be eliminated.> 3. Change the kthread and copy_process code so we can pass in the thread > (or it's creds or some struct that has the values that need to be check) that > needs to be checked and updated. > > Drawback: > This might be considered too ugly for how special case vhost is. For example, we > need checks/code like the io_thread/PF_IO_WORKER code in copy_process for io_uring. > I can see how added that for io_uring because it affects so many users, but I can > see how vhost is not special enough.This seems like the most general solution. If you try it and get negative feedback then maybe the maintainers can help suggest how to solve this problem :). Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210610/33a5e6b2/attachment.sig>
On 6/10/21 3:06 AM, Stefan Hajnoczi wrote:> On Wed, Jun 09, 2021 at 04:03:55PM -0500, Mike Christie wrote: >> On 6/7/21 10:19 AM, Stefan Hajnoczi wrote: >>> My concern is that threads should probably accounted against >>> RLIMIT_NPROC and max_threads rather than something indirect like 128 * >>> RLIMIT_NOFILE (a userspace process can only have RLIMIT_NOFILE >>> vhost-user file descriptors open). >>> >> >> Ah ok, I see what you want I think. >> >> Ok, I think the options are: >> >> 0. Nothing. Just use existing indirect/RLIMIT_NOFILE. >> >> 1. Do something like io_uring's create_io_thread/copy_process. If we call >> copy_process from the vhost ioctl context, then the userspace process that >> did the ioctl will have it's processes count incremented and checked against >> its rlimit. >> >> The drawbacks: >> - This gets a little more complicated than just calling copy_process though. >> We end up duplicating a lot of the kthread API. >> - We have to deal with new error cases like the parent exiting early. >> - I think all devs sharing a worker have to have the same owner. kthread_use_mm >> and kthread_unuse_mm to switch between mm's for differrent owner's devs seem to >> be causing lots of errors. I'm still looking into this one though. >> >> 2. It's not really what you want, but for unbound work io_uring has a check for >> RLIMIT_NPROC in the io_uring code. It does: >> >> wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers >> task_rlimit(current, RLIMIT_NPROC); >> >> then does: >> >> if (!ret && acct->nr_workers < acct->max_workers) { >> >> Drawbacks: >> In vhost.c, we could do something similar. It would make sure that vhost.c does >> not create more worker threads than the rlimit value, but we wouldn't be >> incrementing the userspace process's process count. The userspace process could >> then create RLIMIT_NPROC threads and vhost.c could also create RLIMIT_NPROC >> threads, so we end up with 2 * RLIMIT_NPROC threads. > > Yes, in that case we might as well go with Option 0, so I think this > option can be eliminated. > >> 3. Change the kthread and copy_process code so we can pass in the thread >> (or it's creds or some struct that has the values that need to be check) that >> needs to be checked and updated. >> >> Drawback: >> This might be considered too ugly for how special case vhost is. For example, we >> need checks/code like the io_thread/PF_IO_WORKER code in copy_process for io_uring. >> I can see how added that for io_uring because it affects so many users, but I can >> see how vhost is not special enough. > > This seems like the most general solution. If you try it and get > negative feedback then maybe the maintainers can help suggest how to > solve this problem :).Hey, I implemented #3 here: https://github.com/mikechristie/linux/commit/76f7a555a85147420a22d0163c15259e01e02193 in this patchset: https://github.com/mikechristie/linux/commits/kthread-node-user but before I post I wanted to bring up an option 4 someone mentioned to me offlist. Again it's io_uring. Check out fs/io_uring.c:__io_account_mem(). For RLIMIT_MEMLOCK it just does the check and increments the user's counter itself. It's simple like option 2, and it handles the issue where the process doing the ioctl wasn't having its RLIMIT_NPROC checked/updated.