Stefan Hajnoczi
2021-Jun-07 15:19 UTC
[PATCH 7/9] vhost: allow userspace to create workers
On Sat, Jun 05, 2021 at 06:53:58PM -0500, michael.christie at oracle.com wrote:
> On 6/3/21 9:30 AM, Stefan Hajnoczi wrote:
> >> +	if (info->pid == VHOST_VRING_NEW_WORKER) {
> >> +		worker = vhost_worker_create(dev);
> >
> > The maximum number of kthreads created is limited by
> > vhost_dev_init(nvqs)? For example VHOST_SCSI_MAX_VQ 128.
> >
> > IIUC kthread_create is not limited by or accounted against the current
> > task, so I'm a little worried that a process can create a lot of
> > kthreads.
> >
> > I haven't investigated other kthread_create() users reachable from
> > userspace applications to see how they bound the number of threads
> > effectively.
>
> Do we want something like io_uring's copy_process use? It's what fork uses,
> so we get checks like RLIMIT_NPROC and max_threads.
>
> I know I didn't look at everything, but it looks like for some software
> drivers we just allow the user to run wild. For example for nbd, when we
> create the device we do alloc_workqueue and use the default max_active
> value (256). We then don't have a limit on connections, so we could end
> up with 256 workqueue threads per device. And then there is no limit on
> devices a user can make.
>
> > Any thoughts?
>
> Is the concern that a bad VM could create N devs each with 128 vqs/threads
> and it would slow down other VMs? How do we handle the case where some VM
> makes M * N devs and that is equal to N * 128, so we would end up with the
> same number of threads either way? Is there a limit to the number of vhost
> devices a VM can make, and can I just stick in a similar check for workers?
>
> For vhost-scsi specifically, the 128 limit does not make a lot of sense.
> I think we want the max to be the number of vCPUs the VM has, so we can
> add checks for that. Then we would assume someone making a VM with lots of
> CPUs is going to have the resources to support them.
>
> Note: It does make sense from the point of view that we don't know the
> number of vCPUs when vhost-scsi calls vhost_dev_init, so I get why we had
> to select an initial limit.

My concern is that threads should probably be accounted against
RLIMIT_NPROC and max_threads rather than something indirect like 128 *
RLIMIT_NOFILE (a userspace process can only have RLIMIT_NOFILE
vhost-user file descriptors open).

> >> +		if (!dev->workers) {
> >> +			vhost_worker_put(worker);
> >> +			return -ENOMEM;
> >> +		}
> >> +	}
> >> +
> >> +	vq->worker = worker;
> >> +
> >> +	dev->workers[dev->num_workers] = worker;
> >> +	dev->num_workers++;
> >
> > Hmm...should we really append to workers[] in the vhost_worker_find()
> > case?
>
> As it's coded now, yes. Every successful vhost_worker_find call does a
> get on the worker's refcount. Later, when we delete the device, we loop
> over the workers array and for every entry we do a put.
>
> I can add in some code to first check if the worker is already in the
> dev's worker list. If so, then skip the refcount and skip adding to the
> workers array. If it is not in the dev's worker list, then do a
> vhost_worker_find.
>
> I thought it might be nicer how it is now with the single path. It's less
> code at least. Later, if we add support to change a vq's worker, then we
> also don't have to worry about refcounts as much. We just always drop the
> count taken from when it was added.

Thanks for explaining.

Stefan
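To make the single-path refcounting described in the exchange above easier
to follow, here is a rough sketch of the attach and teardown paths. The
names vhost_worker_create, vhost_worker_find, vhost_worker_put,
dev->workers, dev->num_workers and vq->worker come from the patch under
review; the helper names vhost_vq_set_worker and vhost_workers_free and
their bodies are only an illustration of the flow being discussed, not the
actual patch code.

	/*
	 * Illustrative sketch only: every path that hands a worker to a
	 * vq also appends it to dev->workers[], and each array slot holds
	 * one reference, so teardown can simply put every entry.
	 */
	static int vhost_vq_set_worker(struct vhost_dev *dev,
				       struct vhost_virtqueue *vq, pid_t pid)
	{
		struct vhost_worker *worker;

		if (pid == VHOST_VRING_NEW_WORKER)
			worker = vhost_worker_create(dev);
		else
			/* A successful find takes a reference on the worker. */
			worker = vhost_worker_find(dev, pid);
		if (!worker)
			return -ENOMEM;

		vq->worker = worker;
		/* Append even if this worker already serves another vq. */
		dev->workers[dev->num_workers++] = worker;
		return 0;
	}

	/* Device teardown then pairs one put with every array entry. */
	static void vhost_workers_free(struct vhost_dev *dev)
	{
		int i;

		for (i = 0; i < dev->num_workers; i++)
			vhost_worker_put(dev->workers[i]);
		dev->num_workers = 0;
	}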
On 6/7/21 10:19 AM, Stefan Hajnoczi wrote:
> My concern is that threads should probably be accounted against
> RLIMIT_NPROC and max_threads rather than something indirect like 128 *
> RLIMIT_NOFILE (a userspace process can only have RLIMIT_NOFILE
> vhost-user file descriptors open).

Ah ok, I see what you want, I think. The options are:

0. Do nothing. Just rely on the existing indirect limit via RLIMIT_NOFILE.

1. Do something like io_uring's create_io_thread/copy_process. If we call
copy_process from the vhost ioctl context, then the userspace process that
did the ioctl will have its process count incremented and checked against
its rlimit.

The drawbacks:
- This gets a little more complicated than just calling copy_process.
  We end up duplicating a lot of the kthread API.
- We have to deal with new error cases like the parent exiting early.
- I think all devs sharing a worker have to have the same owner.
  kthread_use_mm and kthread_unuse_mm to switch between mm's for different
  owners' devs seem to be causing lots of errors. I'm still looking into
  this one though.

2. It's not really what you want, but for unbound work io_uring has a check
for RLIMIT_NPROC in the io_uring code (see the sketch after this list). It
does:

	wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers =
			task_rlimit(current, RLIMIT_NPROC);

then does:

	if (!ret && acct->nr_workers < acct->max_workers) {

In vhost.c, we could do something similar. Drawback: it would make sure
that vhost.c does not create more worker threads than the rlimit value,
but we wouldn't be incrementing the userspace process's process count. The
userspace process could then create RLIMIT_NPROC threads and vhost.c could
also create RLIMIT_NPROC threads, so we end up with 2 * RLIMIT_NPROC
threads.

3. Change the kthread and copy_process code so we can pass in the thread
(or its creds, or some struct that has the values that need to be checked)
that needs to be checked and updated.

Drawback: This might be considered too ugly for how special-case vhost is.
For example, we need checks/code like the io_thread/PF_IO_WORKER code in
copy_process for io_uring. I can see why that was added for io_uring
because it affects so many users, but I can see how vhost is not special
enough.
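For option 2, a minimal sketch of what the check could look like on the
vhost side, assuming the bound lives in vhost_worker_create and the
per-device counter is the dev->num_workers field from the patch. The
function body, the worker->task field and the thread naming are otherwise
made up for illustration, not the actual patch or vhost.c code.

	static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
	{
		struct vhost_worker *worker;

		/*
		 * Mirror io-wq: never create more workers for this device
		 * than the caller's RLIMIT_NPROC.  This only caps what
		 * vhost creates; it does not charge the threads to the
		 * owner's process count, so the owner can still end up
		 * with RLIMIT_NPROC userspace threads plus RLIMIT_NPROC
		 * vhost workers (the 2 * RLIMIT_NPROC drawback above).
		 */
		if (dev->num_workers >= task_rlimit(current, RLIMIT_NPROC))
			return NULL;

		worker = kzalloc(sizeof(*worker), GFP_KERNEL);
		if (!worker)
			return NULL;

		worker->task = kthread_create(vhost_worker, worker,
					      "vhost-%d", task_pid_nr(current));
		if (IS_ERR(worker->task)) {
			kfree(worker);
			return NULL;
		}
		wake_up_process(worker->task);
		return worker;
	}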