Halil Pasic
2021-Oct-05 10:43 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Mon, 4 Oct 2021 05:07:13 -0400 "Michael S. Tsirkin" <mst at redhat.com> wrote:> On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote: > > On Sat, 2 Oct 2021 14:13:37 -0400 > > "Michael S. Tsirkin" <mst at redhat.com> wrote: > > > > > > Anyone else have an idea? This is a nasty regression; we could revert the > > > > patch, which would remove the symptoms and give us some time, but that > > > > doesn't really feel right, I'd do that only as a last resort. > > > > > > Well we have Halil's hack (except I would limit it > > > to only apply to BE, only do devices with validate, > > > and only in modern mode), and we will fix QEMU to be spec compliant. > > > Between these why do we need any conditional compiles? > > > > We don't. As I stated before, this hack is flawed because it > > effectively breaks fencing features by the driver with QEMU. Some > > features can not be unset after once set, because we tend to try to > > enable the corresponding functionality whenever we see a write > > features operation with the feature bit set, and we don't disable, if a > > subsequent features write operation stores the feature bit as not set. > > Something to fix in QEMU too, I think.Possibly. But it is the same situation: it probably has a long history. And it may even make some sense. The obvious trigger for doing the conditional initialization for modern is the setting of FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would need a different trigger.> > > But it looks like VIRTIO_1 is fine to get cleared afterwards. > > We'd never clear it though - why would we? >Right.> > So my hack > > should actually look like posted below, modulo conditions. > > > Looking at it some more, I see that vhost-user actually > does not send features to the backend until FEATURES_OK.I.e. the hack does not work for transitional vhost-user devices, but it doesn't break them either. Furthermore, I believe there is not much we can do to support transitional devices with vhost-user and similar, without extending the protocol. The transport specific detection idea would need a new vhost-user thingy to tell the device what has been figured out, right? In theory modern only could work, if the backends were paying extra attention to endianness, instead of just assuming that the code is running little-endian.> However, the code in contrib for vhost-user-blk at least seems > broken wrt endian-ness ATM.Agree. For example config is native endian ATM AFAICT.> What about other backends though?I think whenever the config is owned and managed by the vhost-backend we have a problem with transitional. And we don't have everything in the protocol to deal with this problem. I didn't check modern for the different vhost-user backends. I don't think we recommend our users on s390 to use those. My understanding of the use-cases is far form complete.> Hard to be sure right?I agree.> Cc Raphael and Stefan so they can take a look. > And I guess it's time we CC'd qemu-devel too. > > For now I am beginning to think we should either revert or just limit > validation to LE and think about all this some more. And I am inclining > to do a revert.I'm fine with either of these as a quick fix, but we will eventually have to find a solution. AFAICT this solution works for the s390 setups we care about the most, but so would a revert.> These are all hypervisors that shipped for a long time. > Do we need a flag for early config space access then?You mean a feature bit? I think it is a good idea even if it weren't strictly necessary. We will have a behavior change for some devices, and I think the ability to detect those is valuable. Your spec change proposal, makes it IMHO pretty clear, that we are changing our understanding of how transitional should work. Strictly, transitional is not a normative part of the spec AFAIU, but still...> > > > > > > Regarding the conditions I guess checking that driver_features has > > F_VERSION_1 already satisfies "only modern mode", or? > > Right. > > > For now > > I've deliberately omitted the has verify and the is big endian > > conditions so we have a better chance to see if something breaks > > (i.e. the approach does not work). I can add in those extra conditions > > later. > > Or maybe if we will go down that road just the verify check (for > performance). I'm a bit unhappy we have the extra exit but consistency > seems more important. >I'm fine either way. The extra exit is only for the initialization and one per 1 device, I have no feeling if this has a measurable performance impact.> > > > --------------------------8<--------------------- > > > > From: Halil Pasic <pasic at linux.ibm.com> > > Date: Thu, 30 Sep 2021 02:38:47 +0200 > > Subject: [PATCH] virtio: write back feature VERSION_1 before verify > > > > This patch fixes a regression introduced by commit 82e89ea077b9 > > ("virtio-blk: Add validation for block size in config space") and > > enables similar checks in verify() on big endian platforms. > > > > The problem with checking multi-byte config fields in the verify > > callback, on big endian platforms, and with a possibly transitional > > device is the following. The verify() callback is called between > > config->get_features() and virtio_finalize_features(). That we have a > > device that offered F_VERSION_1 then we have the following options > > either the device is transitional, and then it has to present the legacy > > interface, i.e. a big endian config space until F_VERSION_1 is > > negotiated, or we have a non-transitional device, which makes > > F_VERSION_1 mandatory, and only implements the non-legacy interface and > > thus presents a little endian config space. Because at this point we > > can't know if the device is transitional or non-transitional, we can't > > know do we need to byte swap or not. > > Well we established that we can know. Here's an alternative explanation:I thin we established how this should be in the future, where a transport specific mechanism is used to decide are we operating in legacy mode or in modern mode. But with the current QEMU reality, I don't think so. Namely currently the switch native-endian config -> little endian config happens when the VERSION_1 is negotiated, which may happen whenever the VERSION_1 bit is changed, or only when FEATURES_OK is set (vhost-user). This is consistent with device should detect a legacy driver by checking for VERSION_1, which is what the spec currently says. So for transitional we start out with native-endian config. For modern only the config is always LE. The guest can distinguish between a legacy only device and a modern capable device after the revision negotiation. A legacy device would reject the CCW. But both a transitional device and a modern only device would accept a revision > 0. So the guest does not know for ccw.> > The virtio specification virtio-v1.1-cs01 states: > > Transitional devices MUST detect Legacy drivers by detecting that > VIRTIO_F_VERSION_1 has not been acknowledged by the driver. > This is exactly what QEMU as of 6.1 has done relying solely > on VIRTIO_F_VERSION_1 for detecting that. > > However, the specification also says: > driver MAY read (but MUST NOT write) the device-specific > configuration fields to check that it can support the device before > accepting it.s/ accepting it/setting FEATURES_OK> > In that case, any device relying solely on VIRTIO_F_VERSION_1s/any device/any transitional device/> for detecting legacy drivers will return data in legacy format.E.g. virtio-crypto does not support legacy, and thus it is always providing an LE config space.> In particular, this implies that it is in big endian format > for big endian guests. This naturally confuses the driver > which expects little endian in the modern mode. > > It is probably a good idea to amend the spec to clarify that > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation > is complete. However, we already have regression so let's > try to address it. > >I can take the new description without any changes if you like. I care more about getting a decent fix, than a perfect patch description. Should I send out a non-RFC with that implements the proposed changes?> > > > The virtio spec explicitly states that the driver MAY read config > > between reading and writing the features so saying that first accessing > > the config before feature negotiation is done is not an option. The > > specification ain't clear about setting the features multiple times > > before FEATURES_OK, so I guess that should be fine to set F_VERSION_1 > > since at this point we already know that we are about to negotiate > > F_VERSION_1. > > > > I don't consider this patch super clean, but frankly I don't think we > > have a ton of options. Another option that may or man not be cleaner, > > but is also IMHO much uglier is to figure out whether the device is > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding > > according tho what we have figured out, hoping that the characteristics > > of the device didn't change. > > An empty line before tags. >Sure!> > Signed-off-by: Halil Pasic <pasic at linux.ibm.com> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space") > > Reported-by: markver at us.ibm.com > > Let's add more commits that are affected. E.g. virtio-net with MTU > feature bit set is affected too. > > So let's add Fixes tag for: > commit 14de9d114a82a564b94388c95af79a701dc93134 > Author: Aaron Conole <aconole at redhat.com> > Date: Fri Jun 3 16:57:12 2016 -0400 > > virtio-net: Add initial MTU advice feature >I believe drv->probe(dev) is called after the real finalize, so that access should be fine or? Don't we just have to look out for verify? Isn't the problematic commit fe36cbe0671e ("virtio_net: clear MTU when out of range")? The problem with commit 14de9d114a82a is that the device won't know, the driver didn't take the advice (for the MTU because it deemed its value invalid). But that doesn't really hurt us. On the other hand with fe36cbe0671e we may deem a valid MTU in the config space invalid because of the endiannes mess-up. I that case we would discard a perfectly good MTU advice.> I think that's all, but pls double check me.Looks good! $ git grep -e '\.validate' -- '*virtio*' drivers/block/virtio_blk.c: .validate = virtblk_validate, drivers/firmware/arm_scmi/virtio.c: .validate = scmi_vio_validate, drivers/net/virtio_net.c: .validate = virtnet_validate, drivers/virtio/virtio_balloon.c: .validate = virtballoon_validate, sound/virtio/virtio_card.c: .validate = virtsnd_validate, But only blk and net access config space from validate.> > > > --- > > drivers/virtio/virtio.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > > index 0a5b54034d4b..2b9358f2e22a 100644 > > --- a/drivers/virtio/virtio.c > > +++ b/drivers/virtio/virtio.c > > @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d) > > driver_features_legacy = driver_features; > > } > > > > + /* Write F_VERSION_1 feature to pin down endianness */ > > + if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) { > > + dev->features = (1ULL << VIRTIO_F_VERSION_1); > > + dev->config->finalize_features(dev); > > + } > > + > > if (device_features & (1ULL << VIRTIO_F_VERSION_1)) > > dev->features = driver_features & device_features; > > else > > -- > > 2.31.1 > > > > > > > > > > > > > > _______________________________________________ > Virtualization mailing list > Virtualization at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Michael S. Tsirkin
2021-Oct-05 11:11 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05, 2021 at 12:43:03PM +0200, Halil Pasic wrote:> On Mon, 4 Oct 2021 05:07:13 -0400 > "Michael S. Tsirkin" <mst at redhat.com> wrote: > > > On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote: > > > On Sat, 2 Oct 2021 14:13:37 -0400 > > > "Michael S. Tsirkin" <mst at redhat.com> wrote: > > > > > > > > Anyone else have an idea? This is a nasty regression; we could revert the > > > > > patch, which would remove the symptoms and give us some time, but that > > > > > doesn't really feel right, I'd do that only as a last resort. > > > > > > > > Well we have Halil's hack (except I would limit it > > > > to only apply to BE, only do devices with validate, > > > > and only in modern mode), and we will fix QEMU to be spec compliant. > > > > Between these why do we need any conditional compiles? > > > > > > We don't. As I stated before, this hack is flawed because it > > > effectively breaks fencing features by the driver with QEMU. Some > > > features can not be unset after once set, because we tend to try to > > > enable the corresponding functionality whenever we see a write > > > features operation with the feature bit set, and we don't disable, if a > > > subsequent features write operation stores the feature bit as not set. > > > > Something to fix in QEMU too, I think. > > Possibly. But it is the same situation: it probably has a long > history. And it may even make some sense. The obvious trigger for > doing the conditional initialization for modern is the setting of > FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would > need a different trigger. > > > > > > But it looks like VIRTIO_1 is fine to get cleared afterwards. > > > > We'd never clear it though - why would we? > > > > Right. > > > > So my hack > > > should actually look like posted below, modulo conditions. > > > > > > Looking at it some more, I see that vhost-user actually > > does not send features to the backend until FEATURES_OK. > > I.e. the hack does not work for transitional vhost-user devices, > but it doesn't break them either. > > Furthermore, I believe there is not much we can do to support > transitional devices with vhost-user and similar, without extending > the protocol. The transport specific detection idea would need a new > vhost-user thingy to tell the device what has been figured > out, right? > > In theory modern only could work, if the backends were paying extra > attention to endianness, instead of just assuming that the code is > running little-endian.I think a reasonable thing is to send SET_FEATURES before each GET_CONFIG, to tell backend which format is expected.> > However, the code in contrib for vhost-user-blk at least seems > > broken wrt endian-ness ATM. > > Agree. For example config is native endian ATM AFAICT. > > > What about other backends though? > > I think whenever the config is owned and managed by the vhost-backend > we have a problem with transitional. And we don't have everything in > the protocol to deal with this problem. > > I didn't check modern for the different vhost-user backends. I don't > think we recommend our users on s390 to use those. My understanding > of the use-cases is far form complete. > > > Hard to be sure right? > > I agree. > > > Cc Raphael and Stefan so they can take a look. > > And I guess it's time we CC'd qemu-devel too. > > > > For now I am beginning to think we should either revert or just limit > > validation to LE and think about all this some more. And I am inclining > > to do a revert. > > I'm fine with either of these as a quick fix, but we will eventually have > to find a solution. AFAICT this solution works for the s390 setups we > care about the most, but so would a revert.The reason I like this one is that it also fixes MTU for virtio net, and that one we can't really revert.> > > > These are all hypervisors that shipped for a long time. > > Do we need a flag for early config space access then? > > You mean a feature bit? I think it is a good idea even if > it weren't strictly necessary. We will have a behavior change > for some devices, and I think the ability to detect those > is valuable. > > Your spec change proposal, makes it IMHO pretty clear, that > we are changing our understanding of how transitional should work. > Strictly, transitional is not a normative part of the spec AFAIU, > but still... > > > > > > > > > > > > > > Regarding the conditions I guess checking that driver_features has > > > F_VERSION_1 already satisfies "only modern mode", or? > > > > Right. > > > > > For now > > > I've deliberately omitted the has verify and the is big endian > > > conditions so we have a better chance to see if something breaks > > > (i.e. the approach does not work). I can add in those extra conditions > > > later. > > > > Or maybe if we will go down that road just the verify check (for > > performance). I'm a bit unhappy we have the extra exit but consistency > > seems more important. > > > > I'm fine either way. The extra exit is only for the initialization and > one per 1 device, I have no feeling if this has a measurable performance > impact. > > > > > > > > --------------------------8<--------------------- > > > > > > From: Halil Pasic <pasic at linux.ibm.com> > > > Date: Thu, 30 Sep 2021 02:38:47 +0200 > > > Subject: [PATCH] virtio: write back feature VERSION_1 before verify > > > > > > This patch fixes a regression introduced by commit 82e89ea077b9 > > > ("virtio-blk: Add validation for block size in config space") and > > > enables similar checks in verify() on big endian platforms. > > > > > > The problem with checking multi-byte config fields in the verify > > > callback, on big endian platforms, and with a possibly transitional > > > device is the following. The verify() callback is called between > > > config->get_features() and virtio_finalize_features(). That we have a > > > device that offered F_VERSION_1 then we have the following options > > > either the device is transitional, and then it has to present the legacy > > > interface, i.e. a big endian config space until F_VERSION_1 is > > > negotiated, or we have a non-transitional device, which makes > > > F_VERSION_1 mandatory, and only implements the non-legacy interface and > > > thus presents a little endian config space. Because at this point we > > > can't know if the device is transitional or non-transitional, we can't > > > know do we need to byte swap or not. > > > > Well we established that we can know. Here's an alternative explanation: > > > I thin we established how this should be in the future, where a transport > specific mechanism is used to decide are we operating in legacy mode or > in modern mode. But with the current QEMU reality, I don't think so. > Namely currently the switch native-endian config -> little endian config > happens when the VERSION_1 is negotiated, which may happen whenever > the VERSION_1 bit is changed, or only when FEATURES_OK is set > (vhost-user). > > This is consistent with device should detect a legacy driver by checking > for VERSION_1, which is what the spec currently says. > > So for transitional we start out with native-endian config. For modern > only the config is always LE. > > The guest can distinguish between a legacy only device and a modern > capable device after the revision negotiation. A legacy device would > reject the CCW. > > But both a transitional device and a modern only device would accept > a revision > 0. So the guest does not know for ccw. >Sorry I was talking about the host not the guest. when host sees revision > 0 it knows it's a modern guest and so config should be LE.> > > > > The virtio specification virtio-v1.1-cs01 states: > > > > Transitional devices MUST detect Legacy drivers by detecting that > > VIRTIO_F_VERSION_1 has not been acknowledged by the driver. > > This is exactly what QEMU as of 6.1 has done relying solely > > on VIRTIO_F_VERSION_1 for detecting that. > > > > However, the specification also says: > > driver MAY read (but MUST NOT write) the device-specific > > configuration fields to check that it can support the device before > > accepting it. > > s/ accepting it/setting FEATURES_OK > > > > In that case, any device relying solely on VIRTIO_F_VERSION_1 > > s/any device/any transitional device/ > > > for detecting legacy drivers will return data in legacy format. > > E.g. virtio-crypto does not support legacy, and thus it is always > providing an LE config space. > > > In particular, this implies that it is in big endian format > > for big endian guests. This naturally confuses the driver > > which expects little endian in the modern mode. > > > > It is probably a good idea to amend the spec to clarify that > > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation > > is complete. However, we already have regression so let's > > try to address it. > > > > > > I can take the new description without any changes if you like. > I care > more about getting a decent fix, than a perfect patch description. Should > I send out a non-RFC with that implements the proposed changes?Also add a shortened version to the code comment pls.> > > > > > > The virtio spec explicitly states that the driver MAY read config > > > between reading and writing the features so saying that first accessing > > > the config before feature negotiation is done is not an option. The > > > specification ain't clear about setting the features multiple times > > > before FEATURES_OK, so I guess that should be fine to set F_VERSION_1 > > > since at this point we already know that we are about to negotiate > > > F_VERSION_1. > > > > > > I don't consider this patch super clean, but frankly I don't think we > > > have a ton of options. Another option that may or man not be cleaner, > > > but is also IMHO much uglier is to figure out whether the device is > > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding > > > according tho what we have figured out, hoping that the characteristics > > > of the device didn't change. > > > > An empty line before tags. > > > > Sure! > > > > Signed-off-by: Halil Pasic <pasic at linux.ibm.com> > > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space") > > > Reported-by: markver at us.ibm.com > > > > Let's add more commits that are affected. E.g. virtio-net with MTU > > feature bit set is affected too. > > > > So let's add Fixes tag for: > > commit 14de9d114a82a564b94388c95af79a701dc93134 > > Author: Aaron Conole <aconole at redhat.com> > > Date: Fri Jun 3 16:57:12 2016 -0400 > > > > virtio-net: Add initial MTU advice feature > > > > I believe drv->probe(dev) is called after the real finalize, so > that access should be fine or? > > Don't we just have to look out for verify?you mean validate.> Isn't the problematic commit fe36cbe0671e ("virtio_net: clear MTU when > out of range")?exactly.> The problem with commit 14de9d114a82a is that the device won't know, > the driver didn't take the advice (for the MTU because it deemed its > value invalid). But that doesn't really hurt us. > On the other hand with fe36cbe0671e we may deem a valid MTU in the > config space invalid because of the endiannes mess-up. I that case > we would discard a perfectly good MTU advice.right.> > > I think that's all, but pls double check me. > > > Looks good! > $ git grep -e '\.validate' -- '*virtio*' > drivers/block/virtio_blk.c: .validate = virtblk_validate, > drivers/firmware/arm_scmi/virtio.c: .validate = scmi_vio_validate, > drivers/net/virtio_net.c: .validate = virtnet_validate, > drivers/virtio/virtio_balloon.c: .validate = virtballoon_validate, > sound/virtio/virtio_card.c: .validate = virtsnd_validate, > > But only blk and net access config space from validate. > > > > > > > > --- > > > drivers/virtio/virtio.c | 6 ++++++ > > > 1 file changed, 6 insertions(+) > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > > > index 0a5b54034d4b..2b9358f2e22a 100644 > > > --- a/drivers/virtio/virtio.c > > > +++ b/drivers/virtio/virtio.c > > > @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d) > > > driver_features_legacy = driver_features; > > > } > > > > > > + /* Write F_VERSION_1 feature to pin down endianness */ > > > + if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) { > > > + dev->features = (1ULL << VIRTIO_F_VERSION_1); > > > + dev->config->finalize_features(dev); > > > + } > > > + > > > if (device_features & (1ULL << VIRTIO_F_VERSION_1)) > > > dev->features = driver_features & device_features; > > > else > > > -- > > > 2.31.1 > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > Virtualization mailing list > > Virtualization at lists.linux-foundation.org > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Cornelia Huck
2021-Oct-05 11:13 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05 2021, Halil Pasic <pasic at linux.ibm.com> wrote:> On Mon, 4 Oct 2021 05:07:13 -0400 > "Michael S. Tsirkin" <mst at redhat.com> wrote: >> Well we established that we can know. Here's an alternative explanation: > > > I thin we established how this should be in the future, where a transport > specific mechanism is used to decide are we operating in legacy mode or > in modern mode. But with the current QEMU reality, I don't think so. > Namely currently the switch native-endian config -> little endian config > happens when the VERSION_1 is negotiated, which may happen whenever > the VERSION_1 bit is changed, or only when FEATURES_OK is set > (vhost-user). > > This is consistent with device should detect a legacy driver by checking > for VERSION_1, which is what the spec currently says. > > So for transitional we start out with native-endian config. For modern > only the config is always LE. > > The guest can distinguish between a legacy only device and a modern > capable device after the revision negotiation. A legacy device would > reject the CCW. > > But both a transitional device and a modern only device would accept > a revision > 0. So the guest does not know for ccw.Well, for pci I think the driver knows that it is using either legacy or modern, no? And for ccw, the driver knows at that point in time which revision it negotiated, so it should know that a revision > 0 will use LE (and the device will obviously know that as well.) Or am I misunderstanding what you're getting at?