thr3ads.net - Virtualization - vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero) [Dec 2021]

If this information is useful, please help other people find it:
Share via:

Jason Wang

2021-Dec-17 01:57 UTC

vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 17, 2021 at 6:32 AM Si-Wei Liu <si-wei.liu at oracle.com>
wrote:>
>
>
> On 12/15/2021 6:53 PM, Jason Wang wrote:
> > On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <si-wei.liu at
oracle.com> wrote:
> >>
> >>
> >> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
> >>>> On 12/14/2021 6:06 PM, Jason Wang wrote:
> >>>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu
<si-wei.liu at oracle.com> wrote:
> >>>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800,
Si-Wei Liu wrote:
> >>>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin
wrote:
> >>>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM
-0800, Si-Wei Liu wrote:
> >>>>>>>>>> Sorry for reviving this ancient
thread. I was kinda lost for the conclusion
> >>>>>>>>>> it ended up with. I have the
following questions,
> >>>>>>>>>>
> >>>>>>>>>> 1. legacy guest support: from the
past conversations it doesn't seem the
> >>>>>>>>>> support will be completely dropped
from the table, is my understanding
> >>>>>>>>>> correct? Actually we're
interested in supporting virtio v0.95 guest for x86,
> >>>>>>>>>> which is backed by the spec at
> >>>>>>>>>>
https://urldefense.com/v3/__https://ozlabs.org/*rusty/virtio-spec/virtio-0.9.5.pdf__;fg!!ACWV5N9M2RV99hQ!dTKmzJwwRsFM7BtSuTDu1cNly5n4XCotH0WYmidzGqHSXt40i7ZU43UcNg7GYxZg$
. Though I'm not sure
> >>>>>>>>>> if there's request/need to
support wilder legacy virtio versions earlier
> >>>>>>>>>> beyond.
> >>>>>>>>> I personally feel it's less work
to add in kernel than try to
> >>>>>>>>> work around it in userspace. Jason
feels differently.
> >>>>>>>>> Maybe post the patches and this will
prove to Jason it's not
> >>>>>>>>> too terrible?
> >>>>>>>> I suppose if the vdpa vendor does support
0.95 in the datapath and ring
> >>>>>>>> layout level and is limited to x86 only,
there should be easy way out.
> >>>>>>> Note a subtle difference: what matters is that
guest, not host is x86.
> >>>>>>> Matters for emulators which might reorder
memory accesses.
> >>>>>>> I guess this enforcement belongs in QEMU then?
> >>>>>> Right, I mean to get started, the initial guest
driver support and the
> >>>>>> corresponding QEMU support for transitional vdpa
backend can be limited
> >>>>>> to x86 guest/host only. Since the config space is
emulated in QEMU, I
> >>>>>> suppose it's not hard to enforce in QEMU.
> >>>>> It's more than just config space, most devices
have headers before the buffer.
> >>>> The ordering in datapath (data VQs) would have to rely on
vendor's support.
> >>>> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa
h/w vendor nowadays
> >>>> can/should well support the case when ORDER_PLATFORM is
not acked by the
> >>>> driver (actually this feature is filtered out by the QEMU
vhost-vdpa driver
> >>>> today), even with v1.0 spec conforming and modern only
vDPA device. The
> >>>> control VQ is implemented in software in the kernel, which
can be easily
> >>>> accommodated/fixed when needed.
> >>>>
> >>>>>> QEMU can drive GET_LEGACY,
> >>>>>> GET_ENDIAN et al ioctls in advance to get the
capability from the
> >>>>>> individual vendor driver. For that, we need
another negotiation protocol
> >>>>>> similar to vhost_user's protocol_features
between the vdpa kernel and
> >>>>>> QEMU, way before the guest driver is ever probed
and its feature
> >>>>>> negotiation kicks in. Not sure we need a
GET_MEMORY_ORDER ioctl call
> >>>>>> from the device, but we can assume weak ordering
for legacy at this
> >>>>>> point (x86 only)?
> >>>>> I'm lost here, we have get_features() so:
> >>>> I assume here you refer to get_device_features() that Eli
just changed the
> >>>> name.
> >>>>> 1) VERSION_1 means the device uses LE if provided,
otherwise natvie
> >>>>> 2) ORDER_PLATFORM means device requires platform
ordering
> >>>>>
> >>>>> Any reason for having a new API for this?
> >>>> Are you going to enforce all vDPA hardware vendors to
support the
> >>>> transitional model for legacy guest?
> > Do we really have other choices?
> >
> > I suspect the legacy device is never implemented by any vendor:
> >
> > 1) no virtio way to detect host endian
> This is even true for transitional device that is conforming to the
> spec, right?
For hardware, yes.
> The transport specific way to detect host endian is still
> being discussed and the spec revision is not finalized yet so far as I
> see. Why this suddenly becomes a requirement/blocker for h/w vendors to
> implement the transitional model?
It's not a sudden blocker, the problem has existed since day 0 if I
was not wrong. That's why the problem looks a little bit complicated
and why it would be much simpler if we stick to modern devices.
> Even if the spec is out, this is
> pretty new and I suspect not all vendor would follow right away. I hope
> the software framework can be tolerant with h/w vendors not supporting
> host endianess (BE specifically) or not detecting it if they would like
> to support a transitional device for legacy.
Well, if we know we don't want to support the BE host it would be fine.
>
> > 2) bypass IOMMU with translated requests
> > 3) PIO port
> >
> > Yes we have enp_vdpa, but it's more like a "transitional
device" for
> > legacy only guests.
> >
> >> meaning guest not acknowledging
> >>>> VERSION_1 would use the legacy interfaces captured in the
spec section 7.4
> >>>> (regarding ring layout, native endianness, message
framing, vq alignment of
> >>>> 4096, 32bit feature, no features_ok bit in status, IO port
interface i.e.
> >>>> all the things) instead?
> > Note that we only care about the datapath, control path is mediated
anyhow.
> >
> > So feature_ok and IO port isn't an issue. The rest looks like a
must
> > for the hardware.
> H/W vendors can opt out not implementing transitional interfaces at all
> which limits itself a modern only device. Set endianess detection (via
> transport specific means) aside, for vendors that wishes to support
> transitional device with legacy interface, is it a hard stop to drop
> supporting BE host if everything else is there? The spec today doesn't
> define virtio specific means to detect host memory ordering or device
> memory coherency,
Any reason that we need to care about memory coherency at the virtio
level. I'd expect it's the task of transport.
> will it yet become a stopper another day for h/w
> vendor to support more platforms?
Let's differentiate virtio from vdpa here. For virtio, there's no way
to add any feature for legacy devices. We can only add memory features
detecting for modern devices.

But for vDPA, we can introduce any API that can help vendors to
present a transitional device. But we can force those APIs since it's
too late to do that. So transitional devices support is optional for
sure.
>
> >
> >> Noted we don't yet have a set_device_features()
> >>>> that allows the vdpa device to tell whether it is
operating in transitional
> >>>> or modern-only mode.
> > So the device feature should be provisioned via the netlink protocol.
> Such netlink interface will only be used to limit feature exposure,
> right? i.e. you can limit a transitional supporting vendor driver to
> offering modern-only interface,
There's no way for the management to force a feature, like VERSION_1
via the current protocol.
> but you never want to make a modern-only
> vendor driver to support transitional (I'm not sure if it's a good
idea
> to support all the translation in software, esp. for datapath).
You may hit this problem for sure, you can't force all vendors to
support transitional devices especially considering spec said legacy
is optional. We don't want to end up with a userspace code that can
only work for some specific vendors.
> > And what we want is not "set_device_feature()" but
> > "set_device_mandatory_feautre()", then the parent can choose
to fail
> > the negotiation when VERSION_1 is not negotiated.
> This assumes the transport specific detection of BE host is in place,
> right?
Again, the point is, we can not assume such detection works for all of
the vendors. And assume BE detection is ready, we still need this for
modern devices, isn't it?
> I am not clear who initiates the set_device_mandatory_feautre()
> call, QEMU during guest feature negotiation, or admin user setting it
> ahead via netlink?
Netlink, actually, the spec needs to be extended as well, we saw
similar requests in the past. E.g there could be a device that works
in a packed layout only.

Thanks
>
> Thanks,
> -Siwei
>
> >   Qemu then knows for
> > sure it talks to a transitional device or modern only device.
> >
> > Thanks
> >
> >> For software virtio, all support for the legacy part in
> >>>> a transitional model has been built up there already,
however, it's not easy
> >>>> for vDPA vendors to implement all the requirements for an
all-or-nothing
> >>>> legacy guest support (big endian guest for example). To
these vendors, the
> >>>> legacy support within a transitional model is more of
feature to them and
> >>>> it's best to leave some flexibility for them to
implement partial support
> >>>> for legacy. That in turn calls out the need for a
vhost-user protocol
> >>>> feature like negotiation API that can prohibit those
unsupported guest
> >>>> setups to as early as backend_init before launching the
VM.
> >>> Right. Of note is the fact that it's a spec bug which I
> >>> hope yet to fix, though due to existing guest code the
> >>> fix won't be complete.
> >> I thought at one point you pointed out to me that the spec does
allow
> >> config space read before claiming features_ok, and only config
write
> >> before features_ok is prohibited. I haven't read up the full
thread of
> >> Halil's VERSION_1 for transitional big endian device yet, but
what is
> >> the spec bug you hope to fix?
> >>
> >>> WRT ioctls, One thing we can do though is abuse set_features
> >>> where it's called by QEMU early on with just the VERSION_1
> >>> bit set, to distinguish between legacy and modern
> >>> interface. This before config space accesses and FEATURES_OK.
> >>>
> >>> Halil has been working on this, pls take a look and maybe help
him out.
> >> Interesting thread, am reading now and see how I may leverage or
help there.
> >>
> >>>>>>>> I
> >>>>>>>> checked with Eli and other Mellanox/NVDIA
folks for hardware/firmware level
> >>>>>>>> 0.95 support, it seems all the ingredient
had been there already dated back
> >>>>>>>> to the DPDK days. The only major thing
limiting is in the vDPA software that
> >>>>>>>> the current vdpa core has the assumption
around VIRTIO_F_ACCESS_PLATFORM for
> >>>>>>>> a few DMA setup ops, which is virtio 1.0
only.
> >>>>>>>>
> >>>>>>>>>> 2. suppose some form of legacy
guest support needs to be there, how do we
> >>>>>>>>>> deal with the bogus assumption
below in vdpa_get_config() in the short term?
> >>>>>>>>>> It looks one of the intuitive fix
is to move the vdpa_set_features call out
> >>>>>>>>>> of vdpa_get_config() to
vdpa_set_config().
> >>>>>>>>>>
> >>>>>>>>>>              /*
> >>>>>>>>>>               * Config accesses
aren't supposed to trigger before features are
> >>>>>>>>>> set.
> >>>>>>>>>>               * If it does happen
we assume a legacy guest.
> >>>>>>>>>>               */
> >>>>>>>>>>              if
(!vdev->features_valid)
> >>>>>>>>>>                     
vdpa_set_features(vdev, 0);
> >>>>>>>>>>             
ops->get_config(vdev, offset, buf, len);
> >>>>>>>>>>
> >>>>>>>>>> I can post a patch to fix 2) if
there's consensus already reached.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> -Siwei
> >>>>>>>>> I'm not sure how important it is
to change that.
> >>>>>>>>> In any case it only affects
transitional devices, right?
> >>>>>>>>> Legacy only should not care ...
> >>>>>>>> Yes I'd like to distinguish legacy
driver (suppose it is 0.95) against the
> >>>>>>>> modern one in a transitional device model
rather than being legacy only.
> >>>>>>>> That way a v0.95 and v1.0 supporting vdpa
parent can support both types of
> >>>>>>>> guests without having to reconfigure. Or
are you suggesting limit to legacy
> >>>>>>>> only at the time of vdpa creation would
simplify the implementation a lot?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -Siwei
> >>>>>>> I don't know for sure. Take a look at the
work Halil was doing
> >>>>>>> to try and support transitional devices with
BE guests.
> >>>>>> Hmmm, we can have those endianness ioctls defined
but the initial QEMU
> >>>>>> implementation can be started to support x86
guest/host with little
> >>>>>> endian and weak memory ordering first. The real
trick is to detect
> >>>>>> legacy guest - I am not sure if it's feasible
to shift all the legacy
> >>>>>> detection work to QEMU, or the kernel has to be
part of the detection
> >>>>>> (e.g. the kick before DRIVER_OK thing we have to
duplicate the tracking
> >>>>>> effort in QEMU) as well. Let me take a further
look and get back.
> >>>>> Michael may think differently but I think doing this
in Qemu is much easier.
> >>>> I think the key is whether we position emulating legacy
interfaces in QEMU
> >>>> doing translation on top of a v1.0 modern-only device in
the kernel, or we
> >>>> allow vdpa core (or you can say vhost-vdpa) and vendor
driver to support a
> >>>> transitional model in the kernel that is able to work for
both v0.95 and
> >>>> v1.0 drivers, with some slight aid from QEMU for
> >>>> detecting/emulation/shadowing (for e.g CVQ, I/O port
relay). I guess for the
> >>>> former we still rely on vendor for a performant data vqs
implementation,
> >>>> leaving the question to what it may end up eventually in
the kernel is
> >>>> effectively the latter).
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>> My suggestion is post the kernel patches, and we can evaluate
> >>> how much work they are.
> >> Thanks for the feedback. I will take some read then get back,
probably
> >> after the winter break. Stay tuned.
> >>
> >> Thanks,
> >> -Siwei
> >>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Meanwhile, I'll check internally to see if a
legacy only model would
> >>>>>> work. Thanks.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>>>
> >>>>>>
> >>>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang
wrote:
> >>>>>>>>>>> On 2021/3/2 5:47 ??, Michael
S. Tsirkin wrote:
> >>>>>>>>>>>> On Mon, Mar 01, 2021 at
11:56:50AM +0800, Jason Wang wrote:
> >>>>>>>>>>>>> On 2021/3/1 5:34 ??,
Michael S. Tsirkin wrote:
> >>>>>>>>>>>>>> On Wed, Feb 24,
2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>>>>>>>> Detecting
it isn't enough though, we will need a new ioctl to notify
> >>>>>>>>>>>>>>>> the kernel
that it's a legacy guest. Ugh :(
> >>>>>>>>>>>>>>> Well, although
I think adding an ioctl is doable, may I
> >>>>>>>>>>>>>>> know what the
use
> >>>>>>>>>>>>>>> case there
will be for kernel to leverage such info
> >>>>>>>>>>>>>>> directly? Is
there a
> >>>>>>>>>>>>>>> case QEMU
can't do with dedicate ioctls later if there's indeed
> >>>>>>>>>>>>>>>
differentiation (legacy v.s. modern) needed?
> >>>>>>>>>>>>>> BTW a good API
could be
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #define
VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>>> #define
VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> we did it per
vring but maybe that was a mistake ...
> >>>>>>>>>>>>> Actually, I wonder
whether it's good time to just not support
> >>>>>>>>>>>>> legacy driver
> >>>>>>>>>>>>> for vDPA. Consider:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) It's definition
is no-normative
> >>>>>>>>>>>>> 2) A lot of budren of
codes
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So qemu can still
present the legacy device since the config
> >>>>>>>>>>>>> space or other
> >>>>>>>>>>>>> stuffs that is
presented by vhost-vDPA is not expected to be
> >>>>>>>>>>>>> accessed by
> >>>>>>>>>>>>> guest directly. Qemu
can do the endian conversion when necessary
> >>>>>>>>>>>>> in this
> >>>>>>>>>>>>> case?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Overall I would be fine
with this approach but we need to avoid breaking
> >>>>>>>>>>>> working userspace, qemu
releases with vdpa support are out there and
> >>>>>>>>>>>> seem to work for people.
Any changes need to take that into account
> >>>>>>>>>>>> and document compatibility
concerns.
> >>>>>>>>>>> Agree, let me check.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>        I note that any
hardware
> >>>>>>>>>>>> implementation is already
broken for legacy except on platforms with
> >>>>>>>>>>>> strong ordering which
might be helpful in reducing the scope.
> >>>>>>>>>>> Yes.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>>>
>

Michael S. Tsirkin

2021-Dec-17 02:00 UTC

head link

vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 17, 2021 at 09:57:38AM +0800, Jason Wang
wrote:> On Fri, Dec 17, 2021 at 6:32 AM Si-Wei Liu <si-wei.liu at oracle.com>
wrote:
> >
> >
> >
> > On 12/15/2021 6:53 PM, Jason Wang wrote:
> > > On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <si-wei.liu at
oracle.com> wrote:
> > >>
> > >>
> > >> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> > >>> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu
wrote:
> > >>>> On 12/14/2021 6:06 PM, Jason Wang wrote:
> > >>>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu
<si-wei.liu at oracle.com> wrote:
> > >>>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin
wrote:
> > >>>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800,
Si-Wei Liu wrote:
> > >>>>>>>> On 12/12/2021 1:26 AM, Michael S.
Tsirkin wrote:
> > >>>>>>>>> On Fri, Dec 10, 2021 at
05:44:15PM -0800, Si-Wei Liu wrote:
> > >>>>>>>>>> Sorry for reviving this
ancient thread. I was kinda lost for the conclusion
> > >>>>>>>>>> it ended up with. I have the
following questions,
> > >>>>>>>>>>
> > >>>>>>>>>> 1. legacy guest support: from
the past conversations it doesn't seem the
> > >>>>>>>>>> support will be completely
dropped from the table, is my understanding
> > >>>>>>>>>> correct? Actually we're
interested in supporting virtio v0.95 guest for x86,
> > >>>>>>>>>> which is backed by the spec
at
> > >>>>>>>>>>
https://urldefense.com/v3/__https://ozlabs.org/*rusty/virtio-spec/virtio-0.9.5.pdf__;fg!!ACWV5N9M2RV99hQ!dTKmzJwwRsFM7BtSuTDu1cNly5n4XCotH0WYmidzGqHSXt40i7ZU43UcNg7GYxZg$
. Though I'm not sure
> > >>>>>>>>>> if there's request/need
to support wilder legacy virtio versions earlier
> > >>>>>>>>>> beyond.
> > >>>>>>>>> I personally feel it's less
work to add in kernel than try to
> > >>>>>>>>> work around it in userspace.
Jason feels differently.
> > >>>>>>>>> Maybe post the patches and this
will prove to Jason it's not
> > >>>>>>>>> too terrible?
> > >>>>>>>> I suppose if the vdpa vendor does
support 0.95 in the datapath and ring
> > >>>>>>>> layout level and is limited to x86
only, there should be easy way out.
> > >>>>>>> Note a subtle difference: what matters is
that guest, not host is x86.
> > >>>>>>> Matters for emulators which might reorder
memory accesses.
> > >>>>>>> I guess this enforcement belongs in QEMU
then?
> > >>>>>> Right, I mean to get started, the initial
guest driver support and the
> > >>>>>> corresponding QEMU support for transitional
vdpa backend can be limited
> > >>>>>> to x86 guest/host only. Since the config
space is emulated in QEMU, I
> > >>>>>> suppose it's not hard to enforce in QEMU.
> > >>>>> It's more than just config space, most
devices have headers before the buffer.
> > >>>> The ordering in datapath (data VQs) would have to
rely on vendor's support.
> > >>>> Since ORDER_PLATFORM is pretty new (v1.1), I guess
vdpa h/w vendor nowadays
> > >>>> can/should well support the case when ORDER_PLATFORM
is not acked by the
> > >>>> driver (actually this feature is filtered out by the
QEMU vhost-vdpa driver
> > >>>> today), even with v1.0 spec conforming and modern
only vDPA device. The
> > >>>> control VQ is implemented in software in the kernel,
which can be easily
> > >>>> accommodated/fixed when needed.
> > >>>>
> > >>>>>> QEMU can drive GET_LEGACY,
> > >>>>>> GET_ENDIAN et al ioctls in advance to get the
capability from the
> > >>>>>> individual vendor driver. For that, we need
another negotiation protocol
> > >>>>>> similar to vhost_user's protocol_features
between the vdpa kernel and
> > >>>>>> QEMU, way before the guest driver is ever
probed and its feature
> > >>>>>> negotiation kicks in. Not sure we need a
GET_MEMORY_ORDER ioctl call
> > >>>>>> from the device, but we can assume weak
ordering for legacy at this
> > >>>>>> point (x86 only)?
> > >>>>> I'm lost here, we have get_features() so:
> > >>>> I assume here you refer to get_device_features() that
Eli just changed the
> > >>>> name.
> > >>>>> 1) VERSION_1 means the device uses LE if
provided, otherwise natvie
> > >>>>> 2) ORDER_PLATFORM means device requires platform
ordering
> > >>>>>
> > >>>>> Any reason for having a new API for this?
> > >>>> Are you going to enforce all vDPA hardware vendors to
support the
> > >>>> transitional model for legacy guest?
> > > Do we really have other choices?
> > >
> > > I suspect the legacy device is never implemented by any vendor:
> > >
> > > 1) no virtio way to detect host endian
> > This is even true for transitional device that is conforming to the
> > spec, right?
> 
> For hardware, yes.
> 
> > The transport specific way to detect host endian is still
> > being discussed and the spec revision is not finalized yet so far as I
> > see. Why this suddenly becomes a requirement/blocker for h/w vendors
to
> > implement the transitional model?
> 
> It's not a sudden blocker, the problem has existed since day 0 if I
> was not wrong. That's why the problem looks a little bit complicated
> and why it would be much simpler if we stick to modern devices.
> 
> > Even if the spec is out, this is
> > pretty new and I suspect not all vendor would follow right away. I
hope
> > the software framework can be tolerant with h/w vendors not supporting
> > host endianess (BE specifically) or not detecting it if they would
like
> > to support a transitional device for legacy.
> 
> Well, if we know we don't want to support the BE host it would be fine.
I think you guys mean guest not host here. Same for memory ordering etc.
What matters is whether guest has barriers etc.
> >
> > > 2) bypass IOMMU with translated requests
> > > 3) PIO port
> > >
> > > Yes we have enp_vdpa, but it's more like a "transitional
device" for
> > > legacy only guests.
> > >
> > >> meaning guest not acknowledging
> > >>>> VERSION_1 would use the legacy interfaces captured in
the spec section 7.4
> > >>>> (regarding ring layout, native endianness, message
framing, vq alignment of
> > >>>> 4096, 32bit feature, no features_ok bit in status, IO
port interface i.e.
> > >>>> all the things) instead?
> > > Note that we only care about the datapath, control path is
mediated anyhow.
> > >
> > > So feature_ok and IO port isn't an issue. The rest looks like
a must
> > > for the hardware.
> > H/W vendors can opt out not implementing transitional interfaces at
all
> > which limits itself a modern only device. Set endianess detection (via
> > transport specific means) aside, for vendors that wishes to support
> > transitional device with legacy interface, is it a hard stop to drop
> > supporting BE host if everything else is there? The spec today
doesn't
> > define virtio specific means to detect host memory ordering or device
> > memory coherency,
> 
> Any reason that we need to care about memory coherency at the virtio
> level. I'd expect it's the task of transport.
> 
> > will it yet become a stopper another day for h/w
> > vendor to support more platforms?
> 
> Let's differentiate virtio from vdpa here. For virtio, there's no
way
> to add any feature for legacy devices. We can only add memory features
> detecting for modern devices.
> 
> But for vDPA, we can introduce any API that can help vendors to
> present a transitional device. But we can force those APIs since it's
> too late to do that. So transitional devices support is optional for
> sure.
> 
> >
> > >
> > >> Noted we don't yet have a set_device_features()
> > >>>> that allows the vdpa device to tell whether it is
operating in transitional
> > >>>> or modern-only mode.
> > > So the device feature should be provisioned via the netlink
protocol.
> > Such netlink interface will only be used to limit feature exposure,
> > right? i.e. you can limit a transitional supporting vendor driver to
> > offering modern-only interface,
> 
> There's no way for the management to force a feature, like VERSION_1
> via the current protocol.
> 
> > but you never want to make a modern-only
> > vendor driver to support transitional (I'm not sure if it's a
good idea
> > to support all the translation in software, esp. for datapath).
> 
> You may hit this problem for sure, you can't force all vendors to
> support transitional devices especially considering spec said legacy
> is optional. We don't want to end up with a userspace code that can
> only work for some specific vendors.
> 
> > > And what we want is not "set_device_feature()" but
> > > "set_device_mandatory_feautre()", then the parent can
choose to fail
> > > the negotiation when VERSION_1 is not negotiated.
> > This assumes the transport specific detection of BE host is in place,
> > right?
> 
> Again, the point is, we can not assume such detection works for all of
> the vendors. And assume BE detection is ready, we still need this for
> modern devices, isn't it?
> 
> > I am not clear who initiates the set_device_mandatory_feautre()
> > call, QEMU during guest feature negotiation, or admin user setting it
> > ahead via netlink?
> 
> Netlink, actually, the spec needs to be extended as well, we saw
> similar requests in the past. E.g there could be a device that works
> in a packed layout only.
> 
> Thanks
> 
> >
> > Thanks,
> > -Siwei
> >
> > >   Qemu then knows for
> > > sure it talks to a transitional device or modern only device.
> > >
> > > Thanks
> > >
> > >> For software virtio, all support for the legacy part in
> > >>>> a transitional model has been built up there already,
however, it's not easy
> > >>>> for vDPA vendors to implement all the requirements
for an all-or-nothing
> > >>>> legacy guest support (big endian guest for example).
To these vendors, the
> > >>>> legacy support within a transitional model is more of
feature to them and
> > >>>> it's best to leave some flexibility for them to
implement partial support
> > >>>> for legacy. That in turn calls out the need for a
vhost-user protocol
> > >>>> feature like negotiation API that can prohibit those
unsupported guest
> > >>>> setups to as early as backend_init before launching
the VM.
> > >>> Right. Of note is the fact that it's a spec bug which
I
> > >>> hope yet to fix, though due to existing guest code the
> > >>> fix won't be complete.
> > >> I thought at one point you pointed out to me that the spec
does allow
> > >> config space read before claiming features_ok, and only
config write
> > >> before features_ok is prohibited. I haven't read up the
full thread of
> > >> Halil's VERSION_1 for transitional big endian device yet,
but what is
> > >> the spec bug you hope to fix?
> > >>
> > >>> WRT ioctls, One thing we can do though is abuse
set_features
> > >>> where it's called by QEMU early on with just the
VERSION_1
> > >>> bit set, to distinguish between legacy and modern
> > >>> interface. This before config space accesses and
FEATURES_OK.
> > >>>
> > >>> Halil has been working on this, pls take a look and maybe
help him out.
> > >> Interesting thread, am reading now and see how I may leverage
or help there.
> > >>
> > >>>>>>>> I
> > >>>>>>>> checked with Eli and other
Mellanox/NVDIA folks for hardware/firmware level
> > >>>>>>>> 0.95 support, it seems all the
ingredient had been there already dated back
> > >>>>>>>> to the DPDK days. The only major
thing limiting is in the vDPA software that
> > >>>>>>>> the current vdpa core has the
assumption around VIRTIO_F_ACCESS_PLATFORM for
> > >>>>>>>> a few DMA setup ops, which is virtio
1.0 only.
> > >>>>>>>>
> > >>>>>>>>>> 2. suppose some form of
legacy guest support needs to be there, how do we
> > >>>>>>>>>> deal with the bogus
assumption below in vdpa_get_config() in the short term?
> > >>>>>>>>>> It looks one of the intuitive
fix is to move the vdpa_set_features call out
> > >>>>>>>>>> of vdpa_get_config() to
vdpa_set_config().
> > >>>>>>>>>>
> > >>>>>>>>>>              /*
> > >>>>>>>>>>               * Config
accesses aren't supposed to trigger before features are
> > >>>>>>>>>> set.
> > >>>>>>>>>>               * If it does
happen we assume a legacy guest.
> > >>>>>>>>>>               */
> > >>>>>>>>>>              if
(!vdev->features_valid)
> > >>>>>>>>>>                     
vdpa_set_features(vdev, 0);
> > >>>>>>>>>>             
ops->get_config(vdev, offset, buf, len);
> > >>>>>>>>>>
> > >>>>>>>>>> I can post a patch to fix 2)
if there's consensus already reached.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> -Siwei
> > >>>>>>>>> I'm not sure how important it
is to change that.
> > >>>>>>>>> In any case it only affects
transitional devices, right?
> > >>>>>>>>> Legacy only should not care ...
> > >>>>>>>> Yes I'd like to distinguish
legacy driver (suppose it is 0.95) against the
> > >>>>>>>> modern one in a transitional device
model rather than being legacy only.
> > >>>>>>>> That way a v0.95 and v1.0 supporting
vdpa parent can support both types of
> > >>>>>>>> guests without having to reconfigure.
Or are you suggesting limit to legacy
> > >>>>>>>> only at the time of vdpa creation
would simplify the implementation a lot?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> -Siwei
> > >>>>>>> I don't know for sure. Take a look at
the work Halil was doing
> > >>>>>>> to try and support transitional devices
with BE guests.
> > >>>>>> Hmmm, we can have those endianness ioctls
defined but the initial QEMU
> > >>>>>> implementation can be started to support x86
guest/host with little
> > >>>>>> endian and weak memory ordering first. The
real trick is to detect
> > >>>>>> legacy guest - I am not sure if it's
feasible to shift all the legacy
> > >>>>>> detection work to QEMU, or the kernel has to
be part of the detection
> > >>>>>> (e.g. the kick before DRIVER_OK thing we have
to duplicate the tracking
> > >>>>>> effort in QEMU) as well. Let me take a
further look and get back.
> > >>>>> Michael may think differently but I think doing
this in Qemu is much easier.
> > >>>> I think the key is whether we position emulating
legacy interfaces in QEMU
> > >>>> doing translation on top of a v1.0 modern-only device
in the kernel, or we
> > >>>> allow vdpa core (or you can say vhost-vdpa) and
vendor driver to support a
> > >>>> transitional model in the kernel that is able to work
for both v0.95 and
> > >>>> v1.0 drivers, with some slight aid from QEMU for
> > >>>> detecting/emulation/shadowing (for e.g CVQ, I/O port
relay). I guess for the
> > >>>> former we still rely on vendor for a performant data
vqs implementation,
> > >>>> leaving the question to what it may end up eventually
in the kernel is
> > >>>> effectively the latter).
> > >>>>
> > >>>> Thanks,
> > >>>> -Siwei
> > >>> My suggestion is post the kernel patches, and we can
evaluate
> > >>> how much work they are.
> > >> Thanks for the feedback. I will take some read then get back,
probably
> > >> after the winter break. Stay tuned.
> > >>
> > >> Thanks,
> > >> -Siwei
> > >>
> > >>>>> Thanks
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> Meanwhile, I'll check internally to see
if a legacy only model would
> > >>>>>> work. Thanks.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> -Siwei
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>> On 3/2/2021 2:53 AM, Jason
Wang wrote:
> > >>>>>>>>>>> On 2021/3/2 5:47 ??,
Michael S. Tsirkin wrote:
> > >>>>>>>>>>>> On Mon, Mar 01, 2021
at 11:56:50AM +0800, Jason Wang wrote:
> > >>>>>>>>>>>>> On 2021/3/1 5:34
??, Michael S. Tsirkin wrote:
> > >>>>>>>>>>>>>> On Wed, Feb
24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > >>>>>>>>>>>>>>>>
Detecting it isn't enough though, we will need a new ioctl to notify
> > >>>>>>>>>>>>>>>> the
kernel that it's a legacy guest. Ugh :(
> > >>>>>>>>>>>>>>> Well,
although I think adding an ioctl is doable, may I
> > >>>>>>>>>>>>>>> know what
the use
> > >>>>>>>>>>>>>>> case
there will be for kernel to leverage such info
> > >>>>>>>>>>>>>>> directly?
Is there a
> > >>>>>>>>>>>>>>> case QEMU
can't do with dedicate ioctls later if there's indeed
> > >>>>>>>>>>>>>>>
differentiation (legacy v.s. modern) needed?
> > >>>>>>>>>>>>>> BTW a good
API could be
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> #define
VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > >>>>>>>>>>>>>> #define
VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> we did it per
vring but maybe that was a mistake ...
> > >>>>>>>>>>>>> Actually, I
wonder whether it's good time to just not support
> > >>>>>>>>>>>>> legacy driver
> > >>>>>>>>>>>>> for vDPA.
Consider:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1) It's
definition is no-normative
> > >>>>>>>>>>>>> 2) A lot of
budren of codes
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> So qemu can still
present the legacy device since the config
> > >>>>>>>>>>>>> space or other
> > >>>>>>>>>>>>> stuffs that is
presented by vhost-vDPA is not expected to be
> > >>>>>>>>>>>>> accessed by
> > >>>>>>>>>>>>> guest directly.
Qemu can do the endian conversion when necessary
> > >>>>>>>>>>>>> in this
> > >>>>>>>>>>>>> case?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> Overall I would be
fine with this approach but we need to avoid breaking
> > >>>>>>>>>>>> working userspace,
qemu releases with vdpa support are out there and
> > >>>>>>>>>>>> seem to work for
people. Any changes need to take that into account
> > >>>>>>>>>>>> and document
compatibility concerns.
> > >>>>>>>>>>> Agree, let me check.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>>        I note that
any hardware
> > >>>>>>>>>>>> implementation is
already broken for legacy except on platforms with
> > >>>>>>>>>>>> strong ordering which
might be helpful in reducing the scope.
> > >>>>>>>>>>> Yes.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> >

Virtualization - Dec 2021 - vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)