Halil Pasic
2021-Oct-01 14:22 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Thu, 30 Sep 2021 13:31:04 +0200
Cornelia Huck <cohuck at redhat.com> wrote:

> On Thu, Sep 30 2021, Halil Pasic <pasic at linux.ibm.com> wrote:
>
> > On Thu, 30 Sep 2021 11:28:23 +0200
> > Cornelia Huck <cohuck at redhat.com> wrote:
> >
> >> On Thu, Sep 30 2021, Halil Pasic <pasic at linux.ibm.com> wrote:
> >>
> >> > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > ("virtio-blk: Add validation for block size in config space") and
> >> > enables similar checks in verify() on big endian platforms.
> >> >
> >> > The problem with checking multi-byte config fields in the verify
> >> > callback, on big endian platforms, and with a possibly transitional
> >> > device is the following. The verify() callback is called between
> >> > config->get_features() and virtio_finalize_features(). Given that we
> >> > have a device that offered F_VERSION_1, we have the following options:
> >> > either the device is transitional, and then it has to present the legacy
> >> > interface, i.e. a big endian config space until F_VERSION_1 is
> >> > negotiated, or we have a non-transitional device, which makes
> >> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > thus presents a little endian config space. Because at this point we
> >> > can't know if the device is transitional or non-transitional, we can't
> >> > know whether we need to byte swap or not.
> >> >
> >> > The virtio spec explicitly states that the driver MAY read config
> >> > between reading and writing the features, so saying that accessing
> >> > the config before feature negotiation is done is not allowed is not an
> >> > option. The specification isn't clear about setting the features
> >> > multiple times before FEATURES_OK, so I guess that should be fine.
> >> >
> >> > I don't consider this patch super clean, but frankly I don't think we
> >> > have a ton of options.
> >> > Another option that may or may not be cleaner,
> >> > but is also IMHO much uglier is to figure out whether the device is
> >> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > according to what we have figured out, hoping that the characteristics
> >> > of the device didn't change.
> >> >
> >> > Signed-off-by: Halil Pasic <pasic at linux.ibm.com>
> >> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> > Reported-by: markver at us.ibm.com
> >> > ---
> >> >  drivers/virtio/virtio.c | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > --- a/drivers/virtio/virtio.c
> >> > +++ b/drivers/virtio/virtio.c
> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> >                  if (device_features & (1ULL << i))
> >> >                          __virtio_set_bit(dev, i);
> >> >
> >> > +        /* Write back features before validate to know endianness */
> >> > +        if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > +                dev->config->finalize_features(dev);
> >>
> >> This really looks like a mess :(
> >>
> >> We end up calling ->finalize_features twice: once before ->validate, and
> >> once after, that time with the complete song and dance. The first time,
> >> we operate on one feature set; after validation, we operate on another,
> >> and there might be interdependencies between the two (like that a bit
> >> is cleared because of another bit, which would not happen if validate
> >> had a chance to clear that bit before).
> >
> > Basically the second set is a subset of the first set.
>
> I don't think that's clear.

Validate can only remove features, right? So I guess the feature set after
validate is a subset of the one before validate.

> >
> >>
> >> I'm not sure whether that is even a problem in the spec: while the
> >> driver may read the config before finally accepting features
> >
> > I'm not sure I'm following you.
> > Let me please quote the specification:
> > """
> > 4. Read device feature bits, and write the subset of feature bits
> > understood by the OS and driver to the device. During this step the
> > driver MAY read (but MUST NOT write) the device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > feature bits after this step.
> > """
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
>
> Yes, exactly, it MAY read before accepting features. How does the device
> know whether the config space is little-endian or not?
>

Well, that is what we are talking about. One can try to infer things from
the spec. This reset dance I called ugly is probably the cleanest,
because the spec says that re-negotiation should work.

> >
> >> , it does
> >> not really make sense to do so before a feature bit as basic as
> >> VERSION_1 which determines the endianness has been negotiated.
> >
> > Are you suggesting that ->verify() should be after
> > virtio_finalize_features()?
>
> No, that would defeat the entire purpose of verify. After
> virtio_finalize_features(), we are done with feature negotiation.
>

Exactly!

> > Wouldn't
> > that mean that verify() can't reject feature bits? But that is the whole
> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
> > in config space"). Do you think that the commit in question is
> > conceptually flawed? My understanding of the verify is, that it is supposed
> > to fence features and feature bits we can't support, e.g. because of
> > config space things, but I may be wrong.
>
> No, that commit is not really flawed on its own, I think the whole
> procedure may be problematic.
>

I agree! But that regression really hurts us. Maybe the best band-aid is
to conditional-compile it (not compile the check if s390).

> >
> > The trouble is, feature bits are not negotiated one by one, but basically all
> > at once.
> > I suppose, I did the next best thing to first negotiating
> > VERSION_1.
>
> We probably need to special-case VERSION_1 to move at least forward;
> i.e. proceed as if we accepted it when reading the config space.
>
> The problem is that we do not know what the device assumes when we read
> the config space prior to setting FEATURES_OK. It may assume
> little-endian if it offered VERSION_1, or it may not. The spec does not
> really say what happens before feature negotiation has finished.
>

No it does not, but I hope the implementations we care the most about do
little endian if VERSION_1 is set but FEATURES_OK is not yet done. A
transitional device would have to act upon a feature that is set,
because for legacy there is no FEATURES_OK. Where we can run into
trouble is minimum required feature set, e.g. mandatory features.

I will do some testing.

> >
> >
> >> For
> >> VERSION_1, we can probably go ahead and just assume that we will accept
> >> it if offered, but what about other (future) bits?
> >
> > I don't quite understand.
>
> There might be other bits in the future that change how the config space
> works. We cannot assume that any of those bits will be accepted if
> offered; i.e. we need a special hack for VERSION_1.

I tend to agree. What I didn't consider in this patch is that setting
bits does not only set bits, but may also change the device in a way
that clearing the bit would not change it back.

> >
> > Anyway, how do you think we should solve this problem?
>
> This is a mess. For starters, we need to think about if we should do
> something in the spec, and if yes, what.. Then, we can probably think
> about how to implement that properly.
>

I agree.

> As we have an error right now that is basically a regression, we
> probably need a band-aid to keep going. Not sure if your patch is the
> right approach, maybe we really need to special-case VERSION_1 (the
> "assume we accepted it" hack mentioned above.)
> This will likely fix the
> reported problem (I assume that is s390x on QEMU); do we know about
> other VMMs? Any other big-endian architectures?

I didn't quite get it. Would this hack take place in QEMU or in the guest
kernel?

>
> Anyone have any better suggestions?
>

There is the conditional compile as an option, but I would not say it is
better.

Regards,
Halil
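
To make the ambiguity discussed in this message concrete, here is a minimal sketch in plain user-space C (not kernel code). It shows how the same four config-space bytes yield different block sizes depending on whether the reader treats them as little endian (non-legacy interface) or big endian (legacy interface as seen by a big endian guest). The helper names `cfg_read_le32`, `cfg_read_be32` and `blk_size_plausible`, as well as the 512..4096 bounds, are assumptions made up for illustration; they are not the actual check from commit 82e89ea077b9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret four config-space bytes as a little endian 32-bit value,
 * which is what a non-legacy (VERSION_1) interface presents. */
static uint32_t cfg_read_le32(const uint8_t *cfg)
{
    assert(cfg != NULL);
    return (uint32_t)cfg[0] | ((uint32_t)cfg[1] << 8) |
           ((uint32_t)cfg[2] << 16) | ((uint32_t)cfg[3] << 24);
}

/* Interpret the same bytes as big endian, which is what a big endian
 * guest would do if it believed it was talking to a legacy interface. */
static uint32_t cfg_read_be32(const uint8_t *cfg)
{
    assert(cfg != NULL);
    return ((uint32_t)cfg[0] << 24) | ((uint32_t)cfg[1] << 16) |
           ((uint32_t)cfg[2] << 8) | (uint32_t)cfg[3];
}

/* Hypothetical stand-in for a block size sanity check: accept only
 * values between 512 and 4096 bytes (bounds are an assumption). */
static int blk_size_plausible(uint32_t blk_size)
{
    return blk_size >= 512 && blk_size <= 4096;
}
```

With a device that stores a 4096-byte block size in little endian order (bytes `00 10 00 00`), `cfg_read_le32` returns 4096, while `cfg_read_be32` returns 1048576, which a range check like `blk_size_plausible` would reject. That is the shape of the regression described above: the check itself is fine, but the byte order assumed while validating is not yet settled.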
Cornelia Huck
2021-Oct-01 15:18 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Fri, Oct 01 2021, Halil Pasic <pasic at linux.ibm.com> wrote:

> On Thu, 30 Sep 2021 13:31:04 +0200
> Cornelia Huck <cohuck at redhat.com> wrote:
>
>> On Thu, Sep 30 2021, Halil Pasic <pasic at linux.ibm.com> wrote:
>>
>> > On Thu, 30 Sep 2021 11:28:23 +0200
>> > Cornelia Huck <cohuck at redhat.com> wrote:
>> >
>> >> On Thu, Sep 30 2021, Halil Pasic <pasic at linux.ibm.com> wrote:

>> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>> >> >                  if (device_features & (1ULL << i))
>> >> >                          __virtio_set_bit(dev, i);
>> >> >
>> >> > +        /* Write back features before validate to know endianness */
>> >> > +        if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> >> > +                dev->config->finalize_features(dev);
>> >>
>> >> This really looks like a mess :(
>> >>
>> >> We end up calling ->finalize_features twice: once before ->validate, and
>> >> once after, that time with the complete song and dance. The first time,
>> >> we operate on one feature set; after validation, we operate on another,
>> >> and there might be interdependencies between the two (like that a bit
>> >> is cleared because of another bit, which would not happen if validate
>> >> had a chance to clear that bit before).
>> >
>> > Basically the second set is a subset of the first set.
>>
>> I don't think that's clear.
>
> Validate can only remove features, right? So I guess the feature set after
> validate is a subset of the one before validate.

I was thinking about (more-or-less hypothetical) interdependencies (see
above). But that's not terribly important.

>
>> >
>> >>
>> >> I'm not sure whether that is even a problem in the spec: while the
>> >> driver may read the config before finally accepting features
>> >
>> > I'm not sure I'm following you. Let me please quote the specification:
>> > """
>> > 4. Read device feature bits, and write the subset of feature bits
>> > understood by the OS and driver to the device.
>> > During this step the
>> > driver MAY read (but MUST NOT write) the device-specific configuration
>> > fields to check that it can support the device before accepting it.
>> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>> > feature bits after this step.
>> > """
>> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
>>
>> Yes, exactly, it MAY read before accepting features. How does the device
>> know whether the config space is little-endian or not?
>>
>
> Well that is what we are talking about. One can try to infer things from
> the spec. This reset dance I called ugly is probably the cleanest,
> because the spec says that re-nego should work.
>
>> >
>> >> , it does
>> >> not really make sense to do so before a feature bit as basic as
>> >> VERSION_1 which determines the endianness has been negotiated.
>> >
>> > Are you suggesting that ->verify() should be after
>> > virtio_finalize_features()?
>>
>> No, that would defeat the entire purpose of verify. After
>> virtio_finalize_features(), we are done with feature negotiation.
>>
>
> Exactly!

It seems we are in violent agreement :)

>
>> > Wouldn't
>> > that mean that verify() can't reject feature bits? But that is the whole
>> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
>> > in config space"). Do you think that the commit in question is
>> > conceptually flawed? My understanding of the verify is, that it is supposed
>> > to fence features and feature bits we can't support, e.g. because of
>> > config space things, but I may be wrong.
>>
>> No, that commit is not really flawed on its own, I think the whole
>> procedure may be problematic.
>>
>
> I agree! But that regression really hurts us.
> Maybe the best band-aid is
> to conditional-compile it (not compile the check if s390).

It's probably most likely to hit on s390 (big-endian, and devices with a
blocksize != 512 in common use); but I'd like to make that band-aid more
generic than "exclude for s390". A hack for honouring VERSION_1 before
negotiation has finished is probably better as a stop-gap before we
manage to figure out how to deal with this properly.

>
>> >
>> > The trouble is, feature bits are not negotiated one by one, but basically all
>> > at once. I suppose, I did the next best thing to first negotiating
>> > VERSION_1.
>>
>> We probably need to special-case VERSION_1 to move at least forward;
>> i.e. proceed as if we accepted it when reading the config space.
>>
>> The problem is that we do not know what the device assumes when we read
>> the config space prior to setting FEATURES_OK. It may assume
>> little-endian if it offered VERSION_1, or it may not. The spec does not
>> really say what happens before feature negotiation has finished.
>>
> No it does not, but I hope, the implementations we care the most about do
> little endian if VERSION_1 is set but FEATURES_OK is not yet done. A
> transitional device would have to act upon a feature that is set,
> because for legacy there is no FEATURES_OK. Where we can run into
> trouble is minimum required feature set, e.g. mandatory features.

All ugly :(

>
> I will do some testing.
>
>> >
>> >
>> >> For
>> >> VERSION_1, we can probably go ahead and just assume that we will accept
>> >> it if offered, but what about other (future) bits?
>> >
>> > I don't quite understand.
>>
>> There might be other bits in the future that change how the config space
>> works. We cannot assume that any of those bits will be accepted if
>> offered; i.e. we need a special hack for VERSION_1.
>
> I tend to agree.
> What I didn't consider in this patch is that, setting
> bits does not only set bits, but may also change the device in a way,
> that clearing the bit would not change it back.
>
>>
>> >
>> > Anyway, how do you think we should solve this problem?
>>
>> This is a mess. For starters, we need to think about if we should do
>> something in the spec, and if yes, what.. Then, we can probably think
>> about how to implement that properly.
>>
>
> I agree.
>
>
>> As we have an error right now that is basically a regression, we
>> probably need a band-aid to keep going. Not sure if your patch is the
>> right approach, maybe we really need to special-case VERSION_1 (the
>> "assume we accepted it" hack mentioned above.) This will likely fix the
>> reported problem (I assume that is s390x on QEMU); do we know about
>> other VMMs? Any other big-endian architectures?
>
> I didn't quite get it. Would this hack take place in QEMU or in the guest
> kernel?

I'd say we need a hack here so that we assume little-endian config space
if VERSION_1 has been offered; if your patch here works, I assume QEMU
does what we expect (assuming little-endian as well.) I'm mostly
wondering what happens if you use a different VMM; can we expect it to
work similarly to QEMU? Even if it helps for s390, we should double-check
what happens for other architectures.

>
>>
>> Anyone have any better suggestions?
>>
>
> There is the conditional compile, as an option but I would not say it is
> better.

Yes, I agree. Anyone else have an idea? This is a nasty regression; we
could revert the patch, which would remove the symptoms and give us some
time, but that doesn't really feel right; I'd do that only as a last
resort.
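
The "assume we accepted it" stop-gap discussed in this thread could be sketched roughly as follows, again as a self-contained user-space model and not as the actual kernel change. `struct toy_vdev`, `toy_config_is_le` and `toy_config_read32` are hypothetical names invented for this illustration; only the bit number of `VIRTIO_F_VERSION_1` (32) comes from the virtio specification. The sketch bakes in the very assumption the thread flags as unverified: that a device offering VERSION_1 already presents a little-endian config space before FEATURES_OK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number from the virtio spec */

/* Hypothetical toy model of a device during probing; this is NOT the
 * kernel's struct virtio_device. */
struct toy_vdev {
    uint64_t offered_features;  /* feature bits offered by the device */
    bool guest_is_big_endian;   /* endianness of the guest architecture */
    uint8_t config[4];          /* raw bytes of one 32-bit config field */
};

/* Stop-gap rule: treat VERSION_1 as accepted as soon as it is offered,
 * so the config space is read as little endian. Without VERSION_1 the
 * legacy rule applies and the config space is guest-native. */
static bool toy_config_is_le(const struct toy_vdev *vdev)
{
    assert(vdev != NULL);
    if (vdev->offered_features & (1ULL << VIRTIO_F_VERSION_1))
        return true;
    return !vdev->guest_is_big_endian;
}

/* Read the 32-bit config field using the byte order chosen above. */
static uint32_t toy_config_read32(const struct toy_vdev *vdev)
{
    const uint8_t *b = vdev->config;

    if (toy_config_is_le(vdev))
        return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

In this model a big-endian guest reads a block size of 4096 correctly both from a device that offers VERSION_1 (bytes `00 10 00 00`) and from a legacy device (bytes `00 00 10 00`). Whether a given VMM actually behaves like the first case before negotiation has finished is exactly the open question raised above, so this is only a sketch of the idea, not a statement about QEMU or any other implementation.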