thr3ads.net - Linux Virtualization - [PATCH v2] vhost: introduce mdev based hardware backend [Oct 2019]

Jason Wang

2019-Oct-24 10:42 UTC

[PATCH v2] vhost: introduce mdev based hardware backend

On 2019/10/24 5:18, Tiwei Bie wrote:
> On Thu, Oct 24, 2019 at 04:32:42PM +0800, Jason Wang wrote:
>> On 2019/10/24 4:03, Jason Wang wrote:
>>> On 2019/10/24 12:21, Tiwei Bie wrote:
>>>> On Wed, Oct 23, 2019 at 06:29:21PM +0800, Jason Wang wrote:
>>>>> On 2019/10/23 6:11, Tiwei Bie wrote:
>>>>>> On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote:
>>>>>>> On 2019/10/23 3:07, Tiwei Bie wrote:
>>>>>>>> On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote:
>>>>>>>>> On 2019/10/23 11:02, Tiwei Bie wrote:
>>>>>>>>>> On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote:
>>>>>>>>>>> On 2019/10/22 5:52, Tiwei Bie wrote:
>>>>>>>>>>>> This patch introduces a mdev based hardware vhost backend.
>>>>>>>>>>>> This backend is built on top of the same abstraction used
>>>>>>>>>>>> in virtio-mdev and provides a generic vhost interface for
>>>>>>>>>>>> userspace to accelerate the virtio devices in guest.
>>>>>>>>>>>>
>>>>>>>>>>>> This backend is implemented as a mdev device driver on top
>>>>>>>>>>>> of the same mdev device ops used in virtio-mdev but using
>>>>>>>>>>>> a different mdev class id, and it will register the device
>>>>>>>>>>>> as a VFIO device for userspace to use. Userspace can setup
>>>>>>>>>>>> the IOMMU with the existing VFIO container/group APIs and
>>>>>>>>>>>> then get the device fd with the device name. After getting
>>>>>>>>>>>> the device fd of this device, userspace can use vhost ioctls
>>>>>>>>>>>> to setup the backend.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie at intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> This patch depends on below series:
>>>>>>>>>>>> https://lkml.org/lkml/2019/10/17/286
>>>>>>>>>>>>
>>>>>>>>>>>> v1 -> v2:
>>>>>>>>>>>> - Replace _SET_STATE with _SET_STATUS (MST);
>>>>>>>>>>>> - Check status bits at each step (MST);
>>>>>>>>>>>> - Report the max ring size and max number of queues (MST);
>>>>>>>>>>>> - Add missing MODULE_DEVICE_TABLE (Jason);
>>>>>>>>>>>> - Only support the network backend w/o multiqueue for now;
>>>>>>>>>>> Any idea on how to extend it to support devices other than
>>>>>>>>>>> net? I think we want a generic API or an API that could be
>>>>>>>>>>> made generic in the future.
>>>>>>>>>>>
>>>>>>>>>>> Do we want to e.g having a generic vhost mdev for all kinds
>>>>>>>>>>> of devices or introducing e.g vhost-net-mdev and
>>>>>>>>>>> vhost-scsi-mdev?
>>>>>>>>>> One possible way is to do what vhost-user does. I.e. Apart from
>>>>>>>>>> the generic ring, features, ... related ioctls, we also introduce
>>>>>>>>>> device specific ioctls when we need them. As vhost-mdev just needs
>>>>>>>>>> to forward configs between parent and userspace and even won't
>>>>>>>>>> cache any info when possible,
>>>>>>>>> So it looks to me this is only possible if we expose e.g
>>>>>>>>> set_config and get_config to userspace.
>>>>>>>> The set_config and get_config interface isn't really everything
>>>>>>>> of device specific settings. We also have ctrlq in virtio-net.
>>>>>>> Yes, but it could be processed by the exist API. Isn't it? Just
>>>>>>> set ctrl vq address and let parent to deal with that.
>>>>>> I mean how to expose ctrlq related settings to userspace?
>>>>> I think it works like:
>>>>>
>>>>> 1) userspace find ctrl_vq is supported
>>>>>
>>>>> 2) then it can allocate memory for ctrl vq and set its address
>>>>> through vhost-mdev
>>>>>
>>>>> 3) userspace can populate ctrl vq itself
>>>> I see. That is to say, userspace e.g. QEMU will program the
>>>> ctrl vq with the existing VHOST_*_VRING_* ioctls, and parent
>>>> drivers should know that the addresses used in ctrl vq are
>>>> host virtual addresses in vhost-mdev's case.
>>>
>>> That's really good point. And that means parent needs to differ
>>> vhost from virtio. It should work.
>>
>> HVA may only work when we have something similar to VHOST_SET_OWNER
>> which can reuse MM of its owner.
> We already have VHOST_SET_OWNER in vhost now, parent can handle
> the commands in its .kick_vq() which is called by vq's .handle_kick
> callback. Virtio-user did something similar:
>
> https://github.com/DPDK/dpdk/blob/0da7f445df445630c794897347ee360d6fe6348b/drivers/net/virtio/virtio_user_ethdev.c#L313-L322

This probably means a process context is required, something like the
kthread used by vhost, which seems a burden for the parent. Or we can
extend the ioctl to process the kick in system call context.

>
>>
>>> But is there any chance to use DMA address? I'm asking since the
>>> API then tends to be device specific.
>>
>> I wonder whether we can introduce MAP IOMMU notifier and get DMA
>> mappings from that.
> I think this will complicate things unnecessarily and may
> bring pains. Because, in vhost-mdev, mdev's ctrl vq is
> supposed to be managed by host.

Yes.

>   And we should try to avoid
> putting ctrl vq and Rx/Tx vqs in the same DMA space to prevent
> guests having the chance to bypass the host (e.g. QEMU) to
> setup the backend accelerator directly.

That's a really good point. So when the "vhost" type is created, the
parent should assume the address of ctrl_vq is an HVA.
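
To make the HVA case concrete, here is a minimal userspace sketch of programming a control vq with the existing vring ioctls. It assumes the fd is an already opened vhost device on which VHOST_SET_OWNER has been issued, and that the ring memory was allocated in the calling process. The helper names ctrl_vq_hva() and program_ctrl_vq() are hypothetical; only the structs and ioctl numbers come from <linux/vhost.h>. Whether the mdev backend accepts exactly this sequence is an assumption, not something the patch confirms.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Hypothetical helper: describe a ring whose three areas live in this
 * process's address space, so every address is a host virtual address. */
struct vhost_vring_addr ctrl_vq_hva(unsigned int index, void *desc,
                                    void *avail, void *used)
{
    struct vhost_vring_addr addr;

    memset(&addr, 0, sizeof(addr));
    addr.index = index;
    addr.desc_user_addr = (uint64_t)(uintptr_t)desc;
    addr.avail_user_addr = (uint64_t)(uintptr_t)avail;
    addr.used_user_addr = (uint64_t)(uintptr_t)used;
    return addr;
}

/* Hypothetical helper: program ring size and addresses on an already
 * opened vhost fd. Returns 0 on success, -1 with errno set on failure. */
int program_ctrl_vq(int vhost_fd, unsigned int index, unsigned int num,
                    void *desc, void *avail, void *used)
{
    struct vhost_vring_state state = { .index = index, .num = num };
    struct vhost_vring_addr addr = ctrl_vq_hva(index, desc, avail, used);

    if (ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state) < 0)
        return -1;
    if (ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr) < 0)
        return -1;
    return 0;
}
```

For a virtio-net device without multiqueue the control vq would be index 2 (after rx 0 and tx 1), so a caller would pass index 2 here. The parent then has to treat these addresses as HVAs rather than DMA addresses.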

Thanks

>
>> Thanks
>>

Jason Wang

2019-Oct-25 09:54 UTC

[PATCH v2] vhost: introduce mdev based hardware backend

On 2019/10/24 6:42, Jason Wang wrote:
>
> Yes.
>
>
>> And we should try to avoid
>> putting ctrl vq and Rx/Tx vqs in the same DMA space to prevent
>> guests having the chance to bypass the host (e.g. QEMU) to
>> setup the backend accelerator directly.
>
>
> That's really good point. So when "vhost" type is created, parent
> should assume addr of ctrl_vq is hva.
>
> Thanks

This works for vhost but not for virtio, since there's no way for the
virtio kernel driver to distinguish ctrl_vq from the rest when doing the
DMA map. One possible solution is to provide DMA domain isolation between
virtqueues. Then the ctrl vq can use its own dedicated DMA domain.

Anyway, this could be done in the future. We can have a first version
that doesn't support ctrl_vq.

Thanks

Michael S. Tsirkin

2019-Oct-25 12:16 UTC

[PATCH v2] vhost: introduce mdev based hardware backend

On Fri, Oct 25, 2019 at 05:54:55PM +0800, Jason Wang wrote:
>
> On 2019/10/24 6:42, Jason Wang wrote:
> >
> > Yes.
> >
> >
> > > And we should try to avoid
> > > putting ctrl vq and Rx/Tx vqs in the same DMA space to prevent
> > > guests having the chance to bypass the host (e.g. QEMU) to
> > > setup the backend accelerator directly.
> >
> >
> > That's really good point. So when "vhost" type is created, parent
> > should assume addr of ctrl_vq is hva.
> >
> > Thanks
>
>
> This works for vhost but not virtio since there's no way for virtio kernel
> driver to differ ctrl_vq with the rest when doing DMA map. One possible
> solution is to provide DMA domain isolation between virtqueues. Then ctrl vq
> can use its dedicated DMA domain for the work.
>
> Anyway, this could be done in the future. We can have a version first that
> doesn't support ctrl_vq.
>
> Thanks
Well no ctrl_vq implies either no offloads, or no XDP (since XDP needs
to disable offloads dynamically).

        if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
            && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
                NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO/CSUM, disable LRO/CSUM first");
                return -EOPNOTSUPP;
        }

neither is very attractive.
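
To spell out the trade-off, here is a small standalone sketch of the same check applied to a plain feature bitmask. It is not the driver code. The bit numbers are the virtio-net feature bits; xdp_allowed() and without_ctrl_vq() are hypothetical helpers. It assumes that a device with no control vq also cannot offer VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, since that feature is driven through the control vq.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined for virtio-net. */
#define VIRTIO_NET_F_GUEST_CSUM           1
#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS  2
#define VIRTIO_NET_F_GUEST_TSO4           7
#define VIRTIO_NET_F_GUEST_TSO6           8
#define VIRTIO_NET_F_GUEST_ECN            9
#define VIRTIO_NET_F_GUEST_UFO           10
#define VIRTIO_NET_F_CTRL_VQ             17

static bool has_feature(uint64_t features, unsigned int bit)
{
    return (features >> bit) & 1;
}

/* Same condition as the quoted check, inverted: XDP is refused only when
 * a guest offload is negotiated and it cannot be switched off at runtime. */
bool xdp_allowed(uint64_t features)
{
    bool guest_offloads =
        has_feature(features, VIRTIO_NET_F_GUEST_TSO4) ||
        has_feature(features, VIRTIO_NET_F_GUEST_TSO6) ||
        has_feature(features, VIRTIO_NET_F_GUEST_ECN) ||
        has_feature(features, VIRTIO_NET_F_GUEST_UFO) ||
        has_feature(features, VIRTIO_NET_F_GUEST_CSUM);

    return has_feature(features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) ||
           !guest_offloads;
}

/* Hypothetical helper: features left once the control vq, and the
 * feature that depends on it, are removed from the offer. */
uint64_t without_ctrl_vq(uint64_t features)
{
    features &= ~(1ULL << VIRTIO_NET_F_CTRL_VQ);
    features &= ~(1ULL << VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
    return features;
}
```

With guest TSO4 and CSUM offered, xdp_allowed() holds only while the control vq features are present; after without_ctrl_vq() it holds only if the guest offload bits are dropped as well.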

So yes ok just for development but we do need to figure out how it will
work down the road in production.

So really this specific virtio net device does not support control vq,
instead it supports a different transport specific way to send commands
to device.

Some kind of extension to the transport? Ideas?


-- 
MST
