Michael S. Tsirkin
2021-Oct-04 13:11 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
>
> [cc:qemu-devel]
>
> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst at redhat.com> wrote:
>
> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
> >> On Thu, 30 Sep 2021 07:12:21 -0400
> >> "Michael S. Tsirkin" <mst at redhat.com> wrote:
> >>
> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > > ("virtio-blk: Add validation for block size in config space") and
> >> > > enables similar checks in verify() on big endian platforms.
> >> > >
> >> > > The problem with checking multi-byte config fields in the verify
> >> > > callback, on big endian platforms, and with a possibly transitional
> >> > > device is the following. The verify() callback is called between
> >> > > config->get_features() and virtio_finalize_features(). If we have a
> >> > > device that offered F_VERSION_1, then we have the following options:
> >> > > either the device is transitional, and then it has to present the legacy
> >> > > interface, i.e. a big endian config space until F_VERSION_1 is
> >> > > negotiated, or we have a non-transitional device, which makes
> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > > thus presents a little endian config space. Because at this point we
> >> > > can't know if the device is transitional or non-transitional, we can't
> >> > > know whether we need to byte swap or not.
> >> >
> >> > Hmm which transport does this refer to?
> >>
> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
> >> yet (have to figure out how to do that with libvirt) for pci I used
> >> virtio-blk-pci.
> >>
> >> > Distinguishing between legacy and modern drivers is transport
> >> > specific. PCI presents
> >> > legacy and modern at separate addresses so distinguishing
> >> > between these two should be no trouble.
> >>
> >> You mean the device id? Yes that is bolted down in the spec, but
> >> currently we don't exploit that information. Furthermore there
> >> is a fat chance that with QEMU even the allegedly non-transitional
> >> devices only present a little endian config space after VERSION_1
> >> was negotiated. Namely get_config for virtio-blk is implemented in
> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
> >> &blkcfg.blk_size, blk_size) and in there we don't care
> >> about transitional or not:
> >>
> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> >> {
> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> >>     return virtio_is_big_endian(vdev);
> >> #elif defined(TARGET_WORDS_BIGENDIAN)
> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> >>         return false;
> >>     }
> >>     return true;
> >> #else
> >>     return false;
> >> #endif
> >> }
> >>
> >
> > ok so that's a QEMU bug. Any virtio 1.0 and up
> > compatible device must use LE.
> > It can also present a legacy config space where the
> > endian depends on the guest.
>
> So, how is the virtio core supposed to determine this? A
> transport-specific callback?

I'd say a field in VirtIODevice is easiest.

>
> >
> >> > Channel i/o has versioning so same thing?
> >> >
> >>
> >> Don't think so. Both a transitional and a non-transitional device
> >> would have to accept revisions higher than 0 if the driver tried to
> >> negotiate those (and we do in our case).
> >
> > Yes, the modern driver does. And that one is known to be LE.
> > legacy driver doesn't.
> >
> >> > > The virtio spec explicitly states that the driver MAY read config
> >> > > between reading and writing the features so saying that first accessing
> >> > > the config before feature negotiation is done is not an option. The
> >> > > specification ain't clear about setting the features multiple times
> >> > > before FEATURES_OK, so I guess that should be fine.
> >> > >
> >> > > I don't consider this patch super clean, but frankly I don't think we
> >> > > have a ton of options. Another option that may or may not be cleaner,
> >> > > but is also IMHO much uglier is to figure out whether the device is
> >> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > > according to what we have figured out, hoping that the characteristics
> >> > > of the device didn't change.
> >> >
> >> > I am confused here. So is the problem at the device or at the driver level?
> >>
> >> We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add
> >> validation for block size in config space") virtio-blk is broken on
> >> s390.
> >
> > Because of a qemu bug. I agree. It's worth working around in the driver
> > since the qemu bug has been around for a very long time.
>
> Yes, since we introduced virtio 1 support, I guess...
>
> >
> >
> >> The deeper problem is in the spec. We stated that the driver may read
> >> config space before the feature negotiation is finalized, but we didn't
> >> think enough about what happens when native endianness is not little
> >> endian in the different cases.
> >
> > Because the spec is very clear that endian-ness is LE.
> > I don't see a spec issue yet here, just an implementation issue.
>
> Maybe not really a bug in the spec, but probably an issue, as this seems
> to have been unclear to most people so far.
>
> >
> >> I believe, for non-transitional devices we have a problem in the host as
> >> well (i.e. in QEMU).
> >
> > Because QEMU ignores the spec and instead relies on the feature
> > negotiation.
> >
> >>
> >> > I suspect it's actually the host that has the issue, not
> >> > the guest?
> >>
> >> I tend to say we have a problem both in the host and in the guest. I'm
> >> more concerned about the problem in the guest, because that is a really
> >> nasty regression.
> >
> > The problem is in the guest. The bug is in the host ;)
> >
> >> For the host. I think for legacy we don't have a
> >> problem, because both sides would operate on the assumption no
> >> _F_VERSION_1, IMHO the implementation for the transitional devices is
> >> correct.
> >
> > Well no, the point of transitional is really to be 1.0 compliant
> > *and* also expose a legacy interface.
>
> Worth noting that PCI and CCW are a tad different here: PCI exposes an
> additional interface, while CCW uses a revision negotiation mechanism
> (for CCW, legacy and standard-compliant are much closer on the transport
> side as for PCI.) MMIO does not do transitional, if I'm not wrong.

Right. It probably still uses VIRTIO_F_VERSION_1 and we need to fix that.

>
> >
> >> For non-transitional flavor, it depends on the device. For
> >> example virtio-net and virtio-blk is broken, because we use primitives
> >> like virtio_stl_p() and those don't do the right thing before feature
> >> negotiation is completed. On the other hand virtio-crypto.c as a truly
> >> non-transitional device uses stl_le_p() and IMHO does the right thing.
> >>
> >> Thanks for your comments! I hope I managed to answer your questions. I
> >> need some guidance on how do we want to move forward on this.
> >>
> >> Regards,
> >> Halil
> >
> > OK so. I don't have a problem with the patch itself,
> > assuming it's enough to work around all buggy hosts.
> > I am especially worried about things like vhost/vhost-user,
> > I suspect they might have a bug like this too, and
> > I am not sure whether your work around is enough for these.
> > Can you check please?
> >
> > If not we'll have to move all validate code to after FEATURES_OK
> > is set.
>
> What is supposed to happen for validate after FEATURES_OK? The driver
> cannot change any features at that point in time, it can only fail to
> use the device.

Fail to use the device. Need to tread carefully here of course,
we don't want to break working setups.

>
> >
> > We do however want to document that this API can be called
> > multiple times since that was not the case
> > previously.
> >
> > Also, I would limit this to when
> > - the validate callback exists
> > - the guest endian-ness is not LE
> >
> > We also want to document the QEMU bug in a comment here,
> > e.g.
> >
> > /*
> >  * QEMU before version 6.2 incorrectly uses driver features with guest
> >  * endian-ness to set endian-ness for config space instead of just using
> >  * LE for the modern interface as per spec.
> >  * This breaks reading config in the validate callback.
> >  * To work around that, when device is 1.0 (so supposed to be LE)
> >  * but guest is not LE, then send the features to device one extra
> >  * time before validation.
> >  */
>
> Do we need to consider migration, or do we not need to be bug-compatible
> in this case?

I suspect we don't need to be bug compatible, any driver accessing config
before FEATURES_OK is already broken ...

>
> >
> > Finally I'd like to see the QEMU bug fix before I merge this one,
> > since it will be harder to test with a fix.
> >
> >
> >> > >
> >> > > Signed-off-by: Halil Pasic <pasic at linux.ibm.com>
> >> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> > > Reported-by: markver at us.ibm.com
> >> > > ---
> >> > >  drivers/virtio/virtio.c | 4 ++++
> >> > >  1 file changed, 4 insertions(+)
> >> > >
> >> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > > --- a/drivers/virtio/virtio.c
> >> > > +++ b/drivers/virtio/virtio.c
> >> > > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> > >          if (device_features & (1ULL << i))
> >> > >              __virtio_set_bit(dev, i);
> >> > >
> >> > > +    /* Write back features before validate to know endianness */
> >> > > +    if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > > +        dev->config->finalize_features(dev);
> >> > > +
> >> > >      if (drv->validate) {
> >> > >          err = drv->validate(dev);
> >> > >          if (err)
> >> > >
> >> > > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> >> > > --
> >> > > 2.25.1
> >> >
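
The limits suggested in the message above (only do the extra write-back when a validate callback exists and the guest is not little endian) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct sketch_dev`, `device_store32()`, `driver_load_le32()`, `needs_early_writeback()` and `probe_blk_size()` are hypothetical names invented here, not kernel or QEMU APIs, and the "device" merely imitates the host behaviour described in the thread (config becomes little endian only once VERSION_1 has been written back).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32

/* Hypothetical model of a device; not a kernel or QEMU structure. */
struct sketch_dev {
    uint64_t offered;   /* features offered by the device */
    uint64_t acked;     /* features written back by the driver so far */
    bool guest_is_be;   /* simulated guest endianness */
};

/* Buggy host model: config is LE only after VERSION_1 was written back. */
static void device_store32(const struct sketch_dev *d, uint32_t v,
                           uint8_t out[4])
{
    bool le = !d->guest_is_be ||
              (d->acked & (1ULL << VIRTIO_F_VERSION_1)) != 0;

    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (le ? i : 3 - i)));
}

/* A modern driver decodes config fields as little endian. */
static uint32_t driver_load_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* The two limits from the thread, plus "device offers VERSION_1". */
static bool needs_early_writeback(const struct sketch_dev *d,
                                  bool has_validate)
{
    return has_validate && d->guest_is_be &&
           (d->offered & (1ULL << VIRTIO_F_VERSION_1)) != 0;
}

/* Returns the block size as a validate step would see it. */
static uint32_t probe_blk_size(struct sketch_dev *d, bool has_validate,
                               uint32_t blk_size)
{
    uint8_t cfg[4];

    assert(d != NULL);
    if (needs_early_writeback(d, has_validate))
        d->acked = d->offered;  /* stands in for finalize_features() */
    device_store32(d, blk_size, cfg);
    return driver_load_le32(cfg);
}
```

In this model a simulated big endian guest that skips the write-back reads a byte-swapped block size, while a little endian guest never takes the extra step at all, which is the point of restricting the workaround.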
Cornelia Huck
2021-Oct-04 14:33 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst at redhat.com> wrote:

> On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
>> > ok so that's a QEMU bug. Any virtio 1.0 and up
>> > compatible device must use LE.
>> > It can also present a legacy config space where the
>> > endian depends on the guest.
>>
>> So, how is the virtio core supposed to determine this? A
>> transport-specific callback?
>
> I'd say a field in VirtIODevice is easiest.

The transport needs to set this as soon as it has figured out whether
we're using legacy or not. I guess we also need to fence off any
accesses, or error out the device, if the driver tries any read/write
operations that would depend on that knowledge? And using a field in
VirtIODevice would probably need some care when migrating. Hm...
Halil Pasic
2021-Oct-05 07:25 UTC
[RFC PATCH 1/1] virtio: write back features before verify
On Mon, 4 Oct 2021 09:11:04 -0400
"Michael S. Tsirkin" <mst at redhat.com> wrote:

> > >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> > >> {
> > >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> > >>     return virtio_is_big_endian(vdev);
> > >> #elif defined(TARGET_WORDS_BIGENDIAN)
> > >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> > >>         return false;
> > >>     }
> > >>     return true;
> > >> #else
> > >>     return false;
> > >> #endif
> > >> }
> > >>
> > >
> > > ok so that's a QEMU bug. Any virtio 1.0 and up
> > > compatible device must use LE.
> > > It can also present a legacy config space where the
> > > endian depends on the guest.
> >
> > So, how is the virtio core supposed to determine this? A
> > transport-specific callback?
>
> I'd say a field in VirtIODevice is easiest.

Wouldn't a call from transport code into virtio core be more handy? What
I have in mind is stuff like vhost-user and vdpa. My understanding is
that for vhost setups where the config is outside qemu, we probably need
a new command that tells the vhost backend what endianness to use for
config. I don't think we can use VHOST_USER_SET_VRING_ENDIAN because
that one is on a virtqueue basis according to the doc. So for vhost-user
and similar we would fire that command and probably also set the field,
while for devices for which control plane is handled by QEMU we would
just set the field. Does that sound about right?
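
The shape of the proposal above (the transport calls into the core, which sets the field and, where config lives outside QEMU, also notifies the backend) can be sketched as follows. Everything here is an assumption made for illustration: `struct backend_ops`, `set_config_endian`, `struct core_vdev` and `core_set_config_endian()` are hypothetical, and the backend hook stands in for the new command that the message says would probably be needed; no such vhost-user message is being described as existing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook standing in for a not-yet-existing backend command. */
struct backend_ops {
    int (*set_config_endian)(void *opaque, bool little_endian);
};

/* Hypothetical core-side device state; not QEMU's VirtIODevice. */
struct core_vdev {
    bool cfg_endian_known;
    bool cfg_is_le;
    const struct backend_ops *backend;  /* NULL: config handled locally */
    void *backend_opaque;
};

/* Entry point the transport would call into the core. */
static int core_set_config_endian(struct core_vdev *v, bool little_endian)
{
    assert(v != NULL);
    if (v->backend && v->backend->set_config_endian) {
        int rc = v->backend->set_config_endian(v->backend_opaque,
                                               little_endian);
        if (rc < 0)
            return rc;  /* leave the field untouched on failure */
    }
    v->cfg_is_le = little_endian;
    v->cfg_endian_known = true;
    return 0;
}

/* Example backend that only records what it was told. */
struct recording_backend {
    int calls;
    bool last_le;
};

static int recording_set_config_endian(void *opaque, bool little_endian)
{
    struct recording_backend *rb = opaque;

    rb->calls++;
    rb->last_le = little_endian;
    return 0;
}
```

Compared with having each transport write the field directly, routing the decision through one function gives a single place to forward it to an external backend, which seems to be the motivation for preferring a call over a bare field.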