On Wed, Oct 23, 2019 at 06:29:21PM +0800, Jason Wang wrote:> On 2019/10/23 ??6:11, Tiwei Bie wrote: > > On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote: > > > On 2019/10/23 ??3:07, Tiwei Bie wrote: > > > > On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote: > > > > > On 2019/10/23 ??11:02, Tiwei Bie wrote: > > > > > > On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote: > > > > > > > On 2019/10/22 ??5:52, Tiwei Bie wrote: > > > > > > > > This patch introduces a mdev based hardware vhost backend. > > > > > > > > This backend is built on top of the same abstraction used > > > > > > > > in virtio-mdev and provides a generic vhost interface for > > > > > > > > userspace to accelerate the virtio devices in guest. > > > > > > > > > > > > > > > > This backend is implemented as a mdev device driver on top > > > > > > > > of the same mdev device ops used in virtio-mdev but using > > > > > > > > a different mdev class id, and it will register the device > > > > > > > > as a VFIO device for userspace to use. Userspace can setup > > > > > > > > the IOMMU with the existing VFIO container/group APIs and > > > > > > > > then get the device fd with the device name. After getting > > > > > > > > the device fd of this device, userspace can use vhost ioctls > > > > > > > > to setup the backend. > > > > > > > > > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie at intel.com> > > > > > > > > --- > > > > > > > > This patch depends on below series: > > > > > > > > https://lkml.org/lkml/2019/10/17/286 > > > > > > > > > > > > > > > > v1 -> v2: > > > > > > > > - Replace _SET_STATE with _SET_STATUS (MST); > > > > > > > > - Check status bits at each step (MST); > > > > > > > > - Report the max ring size and max number of queues (MST); > > > > > > > > - Add missing MODULE_DEVICE_TABLE (Jason); > > > > > > > > - Only support the network backend w/o multiqueue for now; > > > > > > > Any idea on how to extend it to support devices other than net? I think we > > > > > > > want a generic API or an API that could be made generic in the future. > > > > > > > > > > > > > > Do we want to e.g having a generic vhost mdev for all kinds of devices or > > > > > > > introducing e.g vhost-net-mdev and vhost-scsi-mdev? > > > > > > One possible way is to do what vhost-user does. I.e. Apart from > > > > > > the generic ring, features, ... related ioctls, we also introduce > > > > > > device specific ioctls when we need them. As vhost-mdev just needs > > > > > > to forward configs between parent and userspace and even won't > > > > > > cache any info when possible, > > > > > So it looks to me this is only possible if we expose e.g set_config and > > > > > get_config to userspace. > > > > The set_config and get_config interface isn't really everything > > > > of device specific settings. We also have ctrlq in virtio-net. > > > > > > Yes, but it could be processed by the exist API. Isn't it? Just set ctrl vq > > > address and let parent to deal with that. > > I mean how to expose ctrlq related settings to userspace? > > > I think it works like: > > 1) userspace find ctrl_vq is supported > > 2) then it can allocate memory for ctrl vq and set its address through > vhost-mdev > > 3) userspace can populate ctrl vq itselfI see. That is to say, userspace e.g. QEMU will program the ctrl vq with the existing VHOST_*_VRING_* ioctls, and parent drivers should know that the addresses used in ctrl vq are host virtual addresses in vhost-mdev's case.> > > > > > > > > > > > > I think it might be better to do > > > > > > this in one generic vhost-mdev module. > > > > > Looking at definitions of VhostUserRequest in qemu, it mixed generic API > > > > > with device specific API. If we want go this ways (a generic vhost-mdev), > > > > > more questions needs to be answered: > > > > > > > > > > 1) How could userspace know which type of vhost it would use? Do we need to > > > > > expose virtio subsystem device in for userspace this case? > > > > > > > > > > 2) That generic vhost-mdev module still need to filter out unsupported > > > > > ioctls for a specific type. E.g if it probes a net device, it should refuse > > > > > API for other type. This in fact a vhost-mdev-net but just not modularize it > > > > > on top of vhost-mdev. > > > > > > > > > > > > > > > > > > - Some minor fixes and improvements; > > > > > > > > - Rebase on top of virtio-mdev series v4; > > > > [...] > > > > > > > > + > > > > > > > > +static long vhost_mdev_get_features(struct vhost_mdev *m, u64 __user *featurep) > > > > > > > > +{ > > > > > > > > + if (copy_to_user(featurep, &m->features, sizeof(m->features))) > > > > > > > > + return -EFAULT; > > > > > > > As discussed in previous version do we need to filter out MQ feature here? > > > > > > I think it's more straightforward to let the parent drivers to > > > > > > filter out the unsupported features. Otherwise it would be tricky > > > > > > when we want to add more features in vhost-mdev module, > > > > > It's as simple as remove the feature from blacklist? > > > > It's not really that easy. It may break the old drivers. > > > > > > I'm not sure I understand here, we do feature negotiation anyhow. For old > > > drivers do you mean the guest drivers without MQ? > > For old drivers I mean old parent drivers. It's possible > > to compile old drivers on new kernels. > > > Yes, but if old parent driver itself can not support MQ it should just not > advertise that feature. > > > > > > I'm not quite sure how will we implement MQ support in > > vhost-mdev. > > > Yes, that's why I ask here. I think we want the vhost-mdev to be generic > which means it's better not let vhost-mdev to know anything which is device > specific. So this is a question that should be considered.+1> > > > If we need to introduce new virtio_mdev_device_ops > > callbacks and an old driver exposed the MQ feature, > > then the new vhost-mdev will see this old driver expose > > MQ feature but not provide corresponding callbacks.ean > > > That's exact the issue which current API can not handle, so that's why I > suggest to filter MQ out for vhost-mdev. > > And in the future, we can: > > 1) invent new ioctls and convert them to config access or > > 2) just exposing config for userspace to access (then vhost-mdev work much > more similar to virtio-mdev). > > > > > > > > > > > > > i.e. if > > > > > > the parent drivers may expose unsupported features and relay on > > > > > > vhost-mdev to filter them out, these features will be exposed > > > > > > to userspace automatically when they are enabled in vhost-mdev > > > > > > in the future. > > > > > The issue is, it's only that vhost-mdev knows its own limitation. E.g in > > > > > this patch, vhost-mdev only implements a subset of transport API, but parent > > > > > doesn't know about that. > > > > > > > > > > Still MQ as an example, there's no way (or no need) for parent to know that > > > > > vhost-mdev does not support MQ. > > > > The mdev is a MDEV_CLASS_ID_VHOST mdev device. When the parent > > > > is being developed, it should know the currently supported features > > > > of vhost-mdev. > > > > > > How can parent know MQ is not supported by vhost-mdev? > > Good point. I agree vhost-mdev should filter out the unsupported > > features. But in the meantime, I think drivers also shouldn't > > expose unsupported features. > > > Exactly. But there's a case in the middle, e.g parent drivers support MQ and > virtio-mdev can do that but not vhost-mdev.As we have different mdev class IDs between virtio-mdev and vhost-mdev, maybe parent can leverage it to return different sets of supported features for virtio-mdev and vhost-mdev respectively.> > > > > > > > > > > > And this allows old kenrel to work with new > > > > > parent drivers. > > > > The new drivers should provide things like VIRTIO_MDEV_F_VERSION_1 > > > > to be compatible with the old kernels. When VIRTIO_MDEV_F_VERSION_1 > > > > is provided/negotiated, the behaviours should be consistent. > > > > > > To be clear, I didn't mean a change in virtio-mdev API, I meant: > > > > > > 1) old vhost-mdev kernel driver that filters out MQ > > > > > > 2) new parent driver that support MQ > > > > > > > > > > > So basically we have three choices here: > > > > > > > > > > 1) Implement what vhost-user did and implement a generic vhost-mdev (but may > > > > > still have lots of device specific code). To support advanced feature which > > > > > requires the access to config, still lots of API that needs to be added. > > > > > > > > > > 2) Implement what vhost-kernel did, have a generic vhost-mdev driver and a > > > > > vhost bus on top for match a device specific API e.g vhost-mdev-net. We > > > > > still have device specific API but limit them only to device specific > > > > > module. Still require new ioctls for advanced feature like MQ. > > > > > > > > > > 3) Simply expose all virtio-mdev transport to userspace. > > > > Currently, virtio-mdev transport is a set of function callbacks > > > > defined in kernel. How to simply expose virtio-mdev transport to > > > > userspace? > > > > > > The most straightforward way is to have an 1:1 mapping between ioctl and > > > virito_mdev_device_ops. > > Seems we are already trying to do 1:1 mapping between ioctl > > and virtio_mdev_device_ops in vhost-mdev now (the major piece > > missing is get_device_id/get_config/set_config). > > > Yes, with this we can have a device independent API. Do you think this is > better?Yeah, I agree. Thanks, Tiwei> > Thanks > > > > > > > > > Thanks > > > > > > > > > > > > > > > A generic module > > > > > without any type specific code (like virtio-mdev). No need dedicated API for > > > > > e.g MQ. But then the API will look much different than current vhost did. > > > > > > > > > > Consider the limitation of 1) I tend to choose 2 or 3. What's you opinion? > > > > > > > > > > >
On 2019/10/24 ??12:21, Tiwei Bie wrote:> On Wed, Oct 23, 2019 at 06:29:21PM +0800, Jason Wang wrote: >> On 2019/10/23 ??6:11, Tiwei Bie wrote: >>> On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote: >>>> On 2019/10/23 ??3:07, Tiwei Bie wrote: >>>>> On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote: >>>>>> On 2019/10/23 ??11:02, Tiwei Bie wrote: >>>>>>> On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote: >>>>>>>> On 2019/10/22 ??5:52, Tiwei Bie wrote: >>>>>>>>> This patch introduces a mdev based hardware vhost backend. >>>>>>>>> This backend is built on top of the same abstraction used >>>>>>>>> in virtio-mdev and provides a generic vhost interface for >>>>>>>>> userspace to accelerate the virtio devices in guest. >>>>>>>>> >>>>>>>>> This backend is implemented as a mdev device driver on top >>>>>>>>> of the same mdev device ops used in virtio-mdev but using >>>>>>>>> a different mdev class id, and it will register the device >>>>>>>>> as a VFIO device for userspace to use. Userspace can setup >>>>>>>>> the IOMMU with the existing VFIO container/group APIs and >>>>>>>>> then get the device fd with the device name. After getting >>>>>>>>> the device fd of this device, userspace can use vhost ioctls >>>>>>>>> to setup the backend. >>>>>>>>> >>>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie at intel.com> >>>>>>>>> --- >>>>>>>>> This patch depends on below series: >>>>>>>>> https://lkml.org/lkml/2019/10/17/286 >>>>>>>>> >>>>>>>>> v1 -> v2: >>>>>>>>> - Replace _SET_STATE with _SET_STATUS (MST); >>>>>>>>> - Check status bits at each step (MST); >>>>>>>>> - Report the max ring size and max number of queues (MST); >>>>>>>>> - Add missing MODULE_DEVICE_TABLE (Jason); >>>>>>>>> - Only support the network backend w/o multiqueue for now; >>>>>>>> Any idea on how to extend it to support devices other than net? I think we >>>>>>>> want a generic API or an API that could be made generic in the future. >>>>>>>> >>>>>>>> Do we want to e.g having a generic vhost mdev for all kinds of devices or >>>>>>>> introducing e.g vhost-net-mdev and vhost-scsi-mdev? >>>>>>> One possible way is to do what vhost-user does. I.e. Apart from >>>>>>> the generic ring, features, ... related ioctls, we also introduce >>>>>>> device specific ioctls when we need them. As vhost-mdev just needs >>>>>>> to forward configs between parent and userspace and even won't >>>>>>> cache any info when possible, >>>>>> So it looks to me this is only possible if we expose e.g set_config and >>>>>> get_config to userspace. >>>>> The set_config and get_config interface isn't really everything >>>>> of device specific settings. We also have ctrlq in virtio-net. >>>> Yes, but it could be processed by the exist API. Isn't it? Just set ctrl vq >>>> address and let parent to deal with that. >>> I mean how to expose ctrlq related settings to userspace? >> >> I think it works like: >> >> 1) userspace find ctrl_vq is supported >> >> 2) then it can allocate memory for ctrl vq and set its address through >> vhost-mdev >> >> 3) userspace can populate ctrl vq itself > I see. That is to say, userspace e.g. QEMU will program the > ctrl vq with the existing VHOST_*_VRING_* ioctls, and parent > drivers should know that the addresses used in ctrl vq are > host virtual addresses in vhost-mdev's case.That's really good point. And that means parent needs to differ vhost from virtio. It should work. But is there any chance to use DMA address? I'm asking since the API then tends to be device specific.> >> >>>>>>> I think it might be better to do >>>>>>> this in one generic vhost-mdev module. >>>>>> Looking at definitions of VhostUserRequest in qemu, it mixed generic API >>>>>> with device specific API. If we want go this ways (a generic vhost-mdev), >>>>>> more questions needs to be answered: >>>>>> >>>>>> 1) How could userspace know which type of vhost it would use? Do we need to >>>>>> expose virtio subsystem device in for userspace this case? >>>>>> >>>>>> 2) That generic vhost-mdev module still need to filter out unsupported >>>>>> ioctls for a specific type. E.g if it probes a net device, it should refuse >>>>>> API for other type. This in fact a vhost-mdev-net but just not modularize it >>>>>> on top of vhost-mdev. >>>>>> >>>>>> >>>>>>>>> - Some minor fixes and improvements; >>>>>>>>> - Rebase on top of virtio-mdev series v4; >>>>> [...] >>>>>>>>> + >>>>>>>>> +static long vhost_mdev_get_features(struct vhost_mdev *m, u64 __user *featurep) >>>>>>>>> +{ >>>>>>>>> + if (copy_to_user(featurep, &m->features, sizeof(m->features))) >>>>>>>>> + return -EFAULT; >>>>>>>> As discussed in previous version do we need to filter out MQ feature here? >>>>>>> I think it's more straightforward to let the parent drivers to >>>>>>> filter out the unsupported features. Otherwise it would be tricky >>>>>>> when we want to add more features in vhost-mdev module, >>>>>> It's as simple as remove the feature from blacklist? >>>>> It's not really that easy. It may break the old drivers. >>>> I'm not sure I understand here, we do feature negotiation anyhow. For old >>>> drivers do you mean the guest drivers without MQ? >>> For old drivers I mean old parent drivers. It's possible >>> to compile old drivers on new kernels. >> >> Yes, but if old parent driver itself can not support MQ it should just not >> advertise that feature. >> >> >>> I'm not quite sure how will we implement MQ support in >>> vhost-mdev. >> >> Yes, that's why I ask here. I think we want the vhost-mdev to be generic >> which means it's better not let vhost-mdev to know anything which is device >> specific. So this is a question that should be considered. > +1 > >> >>> If we need to introduce new virtio_mdev_device_ops >>> callbacks and an old driver exposed the MQ feature, >>> then the new vhost-mdev will see this old driver expose >>> MQ feature but not provide corresponding callbacks.ean >> >> That's exact the issue which current API can not handle, so that's why I >> suggest to filter MQ out for vhost-mdev. >> >> And in the future, we can: >> >> 1) invent new ioctls and convert them to config access or >> >> 2) just exposing config for userspace to access (then vhost-mdev work much >> more similar to virtio-mdev). >> >> >>>>>>> i.e. if >>>>>>> the parent drivers may expose unsupported features and relay on >>>>>>> vhost-mdev to filter them out, these features will be exposed >>>>>>> to userspace automatically when they are enabled in vhost-mdev >>>>>>> in the future. >>>>>> The issue is, it's only that vhost-mdev knows its own limitation. E.g in >>>>>> this patch, vhost-mdev only implements a subset of transport API, but parent >>>>>> doesn't know about that. >>>>>> >>>>>> Still MQ as an example, there's no way (or no need) for parent to know that >>>>>> vhost-mdev does not support MQ. >>>>> The mdev is a MDEV_CLASS_ID_VHOST mdev device. When the parent >>>>> is being developed, it should know the currently supported features >>>>> of vhost-mdev. >>>> How can parent know MQ is not supported by vhost-mdev? >>> Good point. I agree vhost-mdev should filter out the unsupported >>> features. But in the meantime, I think drivers also shouldn't >>> expose unsupported features. >> >> Exactly. But there's a case in the middle, e.g parent drivers support MQ and >> virtio-mdev can do that but not vhost-mdev. > As we have different mdev class IDs between virtio-mdev and > vhost-mdev, maybe parent can leverage it to return different > sets of supported features for virtio-mdev and vhost-mdev > respectively.Yes, that should work. Thanks> >> >>>>>> And this allows old kenrel to work with new >>>>>> parent drivers. >>>>> The new drivers should provide things like VIRTIO_MDEV_F_VERSION_1 >>>>> to be compatible with the old kernels. When VIRTIO_MDEV_F_VERSION_1 >>>>> is provided/negotiated, the behaviours should be consistent. >>>> To be clear, I didn't mean a change in virtio-mdev API, I meant: >>>> >>>> 1) old vhost-mdev kernel driver that filters out MQ >>>> >>>> 2) new parent driver that support MQ >>>> >>>> >>>>>> So basically we have three choices here: >>>>>> >>>>>> 1) Implement what vhost-user did and implement a generic vhost-mdev (but may >>>>>> still have lots of device specific code). To support advanced feature which >>>>>> requires the access to config, still lots of API that needs to be added. >>>>>> >>>>>> 2) Implement what vhost-kernel did, have a generic vhost-mdev driver and a >>>>>> vhost bus on top for match a device specific API e.g vhost-mdev-net. We >>>>>> still have device specific API but limit them only to device specific >>>>>> module. Still require new ioctls for advanced feature like MQ. >>>>>> >>>>>> 3) Simply expose all virtio-mdev transport to userspace. >>>>> Currently, virtio-mdev transport is a set of function callbacks >>>>> defined in kernel. How to simply expose virtio-mdev transport to >>>>> userspace? >>>> The most straightforward way is to have an 1:1 mapping between ioctl and >>>> virito_mdev_device_ops. >>> Seems we are already trying to do 1:1 mapping between ioctl >>> and virtio_mdev_device_ops in vhost-mdev now (the major piece >>> missing is get_device_id/get_config/set_config). >> >> Yes, with this we can have a device independent API. Do you think this is >> better? > Yeah, I agree. > > Thanks, > Tiwei > >> Thanks >> >> >>> >>>> Thanks >>>> >>>> >>>>>> A generic module >>>>>> without any type specific code (like virtio-mdev). No need dedicated API for >>>>>> e.g MQ. But then the API will look much different than current vhost did. >>>>>> >>>>>> Consider the limitation of 1) I tend to choose 2 or 3. What's you opinion? >>>>>> >>>>>>
On 2019/10/24 ??4:03, Jason Wang wrote:> > On 2019/10/24 ??12:21, Tiwei Bie wrote: >> On Wed, Oct 23, 2019 at 06:29:21PM +0800, Jason Wang wrote: >>> On 2019/10/23 ??6:11, Tiwei Bie wrote: >>>> On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote: >>>>> On 2019/10/23 ??3:07, Tiwei Bie wrote: >>>>>> On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote: >>>>>>> On 2019/10/23 ??11:02, Tiwei Bie wrote: >>>>>>>> On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote: >>>>>>>>> On 2019/10/22 ??5:52, Tiwei Bie wrote: >>>>>>>>>> This patch introduces a mdev based hardware vhost backend. >>>>>>>>>> This backend is built on top of the same abstraction used >>>>>>>>>> in virtio-mdev and provides a generic vhost interface for >>>>>>>>>> userspace to accelerate the virtio devices in guest. >>>>>>>>>> >>>>>>>>>> This backend is implemented as a mdev device driver on top >>>>>>>>>> of the same mdev device ops used in virtio-mdev but using >>>>>>>>>> a different mdev class id, and it will register the device >>>>>>>>>> as a VFIO device for userspace to use. Userspace can setup >>>>>>>>>> the IOMMU with the existing VFIO container/group APIs and >>>>>>>>>> then get the device fd with the device name. After getting >>>>>>>>>> the device fd of this device, userspace can use vhost ioctls >>>>>>>>>> to setup the backend. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie at intel.com> >>>>>>>>>> --- >>>>>>>>>> This patch depends on below series: >>>>>>>>>> https://lkml.org/lkml/2019/10/17/286 >>>>>>>>>> >>>>>>>>>> v1 -> v2: >>>>>>>>>> - Replace _SET_STATE with _SET_STATUS (MST); >>>>>>>>>> - Check status bits at each step (MST); >>>>>>>>>> - Report the max ring size and max number of queues (MST); >>>>>>>>>> - Add missing MODULE_DEVICE_TABLE (Jason); >>>>>>>>>> - Only support the network backend w/o multiqueue for now; >>>>>>>>> Any idea on how to extend it to support devices other than >>>>>>>>> net? I think we >>>>>>>>> want a generic API or an API that could be made generic in the >>>>>>>>> future. >>>>>>>>> >>>>>>>>> Do we want to e.g having a generic vhost mdev for all kinds of >>>>>>>>> devices or >>>>>>>>> introducing e.g vhost-net-mdev and vhost-scsi-mdev? >>>>>>>> One possible way is to do what vhost-user does. I.e. Apart from >>>>>>>> the generic ring, features, ... related ioctls, we also introduce >>>>>>>> device specific ioctls when we need them. As vhost-mdev just needs >>>>>>>> to forward configs between parent and userspace and even won't >>>>>>>> cache any info when possible, >>>>>>> So it looks to me this is only possible if we expose e.g >>>>>>> set_config and >>>>>>> get_config to userspace. >>>>>> The set_config and get_config interface isn't really everything >>>>>> of device specific settings. We also have ctrlq in virtio-net. >>>>> Yes, but it could be processed by the exist API. Isn't it? Just >>>>> set ctrl vq >>>>> address and let parent to deal with that. >>>> I mean how to expose ctrlq related settings to userspace? >>> >>> I think it works like: >>> >>> 1) userspace find ctrl_vq is supported >>> >>> 2) then it can allocate memory for ctrl vq and set its address through >>> vhost-mdev >>> >>> 3) userspace can populate ctrl vq itself >> I see. That is to say, userspace e.g. QEMU will program the >> ctrl vq with the existing VHOST_*_VRING_* ioctls, and parent >> drivers should know that the addresses used in ctrl vq are >> host virtual addresses in vhost-mdev's case. > > > That's really good point. And that means parent needs to differ vhost > from virtio. It should work.HVA may only work when we have something similar to VHOST_SET_OWNER which can reuse MM of its owner.> But is there any chance to use DMA address? I'm asking since the API > then tends to be device specific.I wonder whether we can introduce MAP IOMMU notifier and get DMA mappings from that. Thanks
Possibly Parallel Threads
- [PATCH v2] vhost: introduce mdev based hardware backend
- [PATCH v2] vhost: introduce mdev based hardware backend
- [PATCH v2] vhost: introduce mdev based hardware backend
- [PATCH v2] vhost: introduce mdev based hardware backend
- [PATCH v2] vhost: introduce mdev based hardware backend