michael.christie at oracle.com
2021-Oct-22 18:17 UTC
[PATCH V3 11/11] vhost: allow userspace to create workers
On 10/22/21 11:12 AM, michael.christie at oracle.com wrote:> On 10/22/21 5:47 AM, Michael S. Tsirkin wrote: >>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h >>> index c998860d7bbc..e5c0669430e5 100644 >>> --- a/include/uapi/linux/vhost.h >>> +++ b/include/uapi/linux/vhost.h >>> @@ -70,6 +70,17 @@ >>> #define VHOST_VRING_BIG_ENDIAN 1 >>> #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state) >>> #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state) >>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER >>> + * that its virtqueues share. This allows userspace to create a vhost_worker >>> + * and map a virtqueue to it or map a virtqueue to an existing worker. >>> + * >>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound >>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be >>> + * created and bound to the vq. >>> + * >>> + * This must be called after VHOST_SET_OWNER and before the vq is active. >>> + */ >> >> A couple of things here: >> it's probably a good idea not to make it match pid exactly, >> if for no other reason than I'm not sure we want to >> commit this being a pid. Let's just call it an id? > > Ok. > >> And maybe byteswap it or xor with some value >> just to make sure userspace does not begin abusing it anyway. >> >> Also, interaction with pid namespace is unclear to me. >> Can you document what happens here? > > This current patchset only allows the vhost_dev owner to > create/bind workers for devices it owns, so namespace don't comeI made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER to the same owner like I wrote. However, it looks like we could have 2 threads with the same mm pointer so vhost_dev_check_owner returns true, but they could be in different namespaces. Even though we are not going to pass the pid_t between user/kernel space, should I add a pid namespace check when I repost the patches?> into play. If a thread from another namespace tried to create/bind > a worker we would hit the owner checks in vhost_dev_ioctl which is > done before vhost_vring_ioctl normally (for vdpa we hit the use_worker > check and fail there). > > However, with the kernel worker API changes the worker threads will > now be in the vhost dev owner's namespace and not the kthreadd/default > one, so in the future we are covered if we want to do something more > advanced. For example, I've seen people working on an API to export the > worker pids: > > https://lore.kernel.org/netdev/20210507154332.hiblsd6ot5wzwkdj at steredhat/T/ > > and in the future for interfaces that export that info we could restrict > access to root or users from the same namespace or I guess add interfaces > to allow different namespaces to see the workers and share them. > > >> No need to fix funky things like moving the fd between >> pid namespaces while also creating/destroying workers, but let's >> document it's not supported. > > Ok. I'll add a comment. >
Michael S. Tsirkin
2021-Oct-23 20:11 UTC
[PATCH V3 11/11] vhost: allow userspace to create workers
On Fri, Oct 22, 2021 at 01:17:26PM -0500, michael.christie at oracle.com wrote:> On 10/22/21 11:12 AM, michael.christie at oracle.com wrote: > > On 10/22/21 5:47 AM, Michael S. Tsirkin wrote: > >>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h > >>> index c998860d7bbc..e5c0669430e5 100644 > >>> --- a/include/uapi/linux/vhost.h > >>> +++ b/include/uapi/linux/vhost.h > >>> @@ -70,6 +70,17 @@ > >>> #define VHOST_VRING_BIG_ENDIAN 1 > >>> #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state) > >>> #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state) > >>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER > >>> + * that its virtqueues share. This allows userspace to create a vhost_worker > >>> + * and map a virtqueue to it or map a virtqueue to an existing worker. > >>> + * > >>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound > >>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be > >>> + * created and bound to the vq. > >>> + * > >>> + * This must be called after VHOST_SET_OWNER and before the vq is active. > >>> + */ > >> > >> A couple of things here: > >> it's probably a good idea not to make it match pid exactly, > >> if for no other reason than I'm not sure we want to > >> commit this being a pid. Let's just call it an id? > > > > Ok. > > > >> And maybe byteswap it or xor with some value > >> just to make sure userspace does not begin abusing it anyway. > >> > >> Also, interaction with pid namespace is unclear to me. > >> Can you document what happens here? > > > > This current patchset only allows the vhost_dev owner to > > create/bind workers for devices it owns, so namespace don't come > > I made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER > to the same owner like I wrote. However, it looks like we could have 2 > threads with the same mm pointer so vhost_dev_check_owner returns true, > but they could be in different namespaces. > > Even though we are not going to pass the pid_t between user/kernel > space, should I add a pid namespace check when I repost the patches?Um it's part of the ioctl. How you are not going to pass it around? So if we do worry about this, I would just make it a 64 bit integer, rename it "id" and increment each time a thread is created.> > > into play. If a thread from another namespace tried to create/bind > > a worker we would hit the owner checks in vhost_dev_ioctl which is > > done before vhost_vring_ioctl normally (for vdpa we hit the use_worker > > check and fail there). > > > > However, with the kernel worker API changes the worker threads will > > now be in the vhost dev owner's namespace and not the kthreadd/default > > one, so in the future we are covered if we want to do something more > > advanced. For example, I've seen people working on an API to export the > > worker pids: > > > > https://lore.kernel.org/netdev/20210507154332.hiblsd6ot5wzwkdj at steredhat/T/ > > > > and in the future for interfaces that export that info we could restrict > > access to root or users from the same namespace or I guess add interfaces > > to allow different namespaces to see the workers and share them. > > > > > >> No need to fix funky things like moving the fd between > >> pid namespaces while also creating/destroying workers, but let's > >> document it's not supported. > > > > Ok. I'll add a comment. > >