Rusty Russell
2013-May-29 04:31 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori <aliguori at us.ibm.com> writes:> "Michael S. Tsirkin" <mst at redhat.com> writes: >> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): >> + return proxy->device_feature_select; > > Oh dear no... Please use defines like the rest of QEMU.It is pretty ugly. Yet the structure definitions are descriptive, capturing layout, size and endianness in natural a format readable by any C programmer. So AFAICT the question is, do we put the required #define VIRTIO_PCI_CFG_FEATURE_SEL \ (offsetof(struct virtio_pci_common_cfg, device_feature_select)) etc. in the kernel headers or qemu?> Haven't looked at the proposed new ring layout yet.No change, but there's an open question on whether we should nail it to little endian (or define the endian by the transport). Of course, I can't rule out that the 1.0 standard *may* decide to frob the ring layout somehow, but I'd think it would require a compelling reason. I suggest that's 2.0 material... Cheers, Rusty.
Michael S. Tsirkin
2013-May-29 08:24 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote:> Anthony Liguori <aliguori at us.ibm.com> writes: > > "Michael S. Tsirkin" <mst at redhat.com> writes: > >> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): > >> + return proxy->device_feature_select; > > > > Oh dear no... Please use defines like the rest of QEMU. > > It is pretty ugly. > > Yet the structure definitions are descriptive, capturing layout, size > and endianness in natural a format readable by any C programmer. > > So AFAICT the question is, do we put the required > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > etc. in the kernel headers or qemu?I forgot that we need to validate size (different fields have different sizes so memory core does not validate for us). And that is much cleaner if we use offsetof directly: You can see that you are checking the correct size matching the offset, at a glance. ---> virtio: new layout: add offset validation Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index f4db224..fd09ea7 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, { VirtIOPCIProxy *proxy = opaque; VirtIODevice *vdev = proxy->vdev; + struct virtio_pci_common_cfg cfg; uint64_t low = 0xffffffffull; switch (addr) { case offsetof(struct virtio_pci_common_cfg, device_feature_select): + assert(size == sizeof cfg.device_feature_select); return proxy->device_feature_select; case offsetof(struct virtio_pci_common_cfg, device_feature): + assert(size == sizeof cfg.device_feature); /* TODO: 64-bit features */ return proxy->device_feature_select ? 0 : proxy->host_features; case offsetof(struct virtio_pci_common_cfg, guest_feature_select): + assert(size == sizeof cfg.guest_feature_select); return proxy->guest_feature_select; case offsetof(struct virtio_pci_common_cfg, guest_feature): + assert(size == sizeof cfg.guest_feature); /* TODO: 64-bit features */ return proxy->guest_feature_select ? 0 : vdev->guest_features; case offsetof(struct virtio_pci_common_cfg, msix_config): + assert(size == sizeof cfg.msix_config); return vdev->config_vector; case offsetof(struct virtio_pci_common_cfg, num_queues): + assert(size == sizeof cfg.num_queues); /* TODO: more exact limit? */ return VIRTIO_PCI_QUEUE_MAX; case offsetof(struct virtio_pci_common_cfg, device_status): + assert(size == sizeof cfg.device_status); return vdev->status; /* About a specific virtqueue. */ case offsetof(struct virtio_pci_common_cfg, queue_select): + assert(size == sizeof cfg.queue_select); return vdev->queue_sel; case offsetof(struct virtio_pci_common_cfg, queue_size): + assert(size == sizeof cfg.queue_size); return virtio_queue_get_num(vdev, vdev->queue_sel); case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): + assert(size == sizeof cfg.queue_msix_vector); return virtio_queue_vector(vdev, vdev->queue_sel); case offsetof(struct virtio_pci_common_cfg, queue_enable): + assert(size == sizeof cfg.queue_enable); /* TODO */ return 0; case offsetof(struct virtio_pci_common_cfg, queue_notify_off): + assert(size == sizeof cfg.queue_notify_off); return vdev->queue_sel; case offsetof(struct virtio_pci_common_cfg, queue_desc): + assert(size == 4); return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: + assert(size == 4); return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32; case offsetof(struct virtio_pci_common_cfg, queue_avail): + assert(size == 4); return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: + assert(size == 4); return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32; case offsetof(struct virtio_pci_common_cfg, queue_used): + assert(size == 4); return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; case offsetof(struct virtio_pci_common_cfg, queue_used) + 4: + assert(size == 4); return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32; default: return 0; @@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void *opaque, hwaddr addr, { VirtIOPCIProxy *proxy = opaque; VirtIODevice *vdev = proxy->vdev; + struct virtio_pci_common_cfg cfg; uint64_t low = 0xffffffffull; uint64_t high = ~low; switch (addr) { case offsetof(struct virtio_pci_common_cfg, device_feature_select): + assert(size == sizeof cfg.device_feature_select); proxy->device_feature_select = val; break; case offsetof(struct virtio_pci_common_cfg, device_feature): + assert(size == sizeof cfg.device_feature); break; case offsetof(struct virtio_pci_common_cfg, guest_feature_select): + assert(size == sizeof cfg.guest_feature_select); proxy->guest_feature_select = val; break; case offsetof(struct virtio_pci_common_cfg, guest_feature): + assert(size == sizeof cfg.guest_feature); /* TODO: 64-bit features */ if (!proxy->guest_feature_select) { virtio_set_features(vdev, val); } break; case offsetof(struct virtio_pci_common_cfg, msix_config): + assert(size == sizeof cfg.msix_config); vdev->config_vector = val; break; case offsetof(struct virtio_pci_common_cfg, num_queues): + assert(size == sizeof cfg.num_queues); break; case offsetof(struct virtio_pci_common_cfg, device_status): + assert(size == sizeof cfg.device_status); virtio_pci_set_status(proxy, val); break; /* About a specific virtqueue. */ case offsetof(struct virtio_pci_common_cfg, queue_select): + assert(size == sizeof cfg.queue_select); assert(val < VIRTIO_PCI_QUEUE_MAX); vdev->queue_sel = val; break; case offsetof(struct virtio_pci_common_cfg, queue_size): + assert(size == sizeof cfg.queue_size); assert(val && val < 0x8ffff && !(val & (val - 1))); virtio_queue_set_num(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): + assert(size == sizeof cfg.queue_msix_vector); virtio_queue_set_vector(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_enable): + assert(size == sizeof cfg.queue_enable); /* TODO */ break; case offsetof(struct virtio_pci_common_cfg, queue_notify_off): + assert(size == sizeof cfg.queue_notify_off); break; case offsetof(struct virtio_pci_common_cfg, queue_desc): + assert(size == 4); val &= low; val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & high; virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: + assert(size == 4); val = val << 32; val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_avail): + assert(size == 4); val &= low; val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & high; virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: + assert(size == 4); val = val << 32; val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_used): + assert(size == 4); val &= low; val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) & high; virtio_queue_set_used_addr(vdev, vdev->queue_sel, val); break; case offsetof(struct virtio_pci_common_cfg, queue_used) + 4: + assert(size == 4); val = val << 32; val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; virtio_queue_set_used_addr(vdev, vdev->queue_sel, val);
Paolo Bonzini
2013-May-29 08:52 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 29/05/2013 10:24, Michael S. Tsirkin ha scritto:> On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote: >> Anthony Liguori <aliguori at us.ibm.com> writes: >>> "Michael S. Tsirkin" <mst at redhat.com> writes: >>>> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>>> + return proxy->device_feature_select; >>> >>> Oh dear no... Please use defines like the rest of QEMU. >> >> It is pretty ugly. >> >> Yet the structure definitions are descriptive, capturing layout, size >> and endianness in natural a format readable by any C programmer. >> >> So AFAICT the question is, do we put the required >> >> #define VIRTIO_PCI_CFG_FEATURE_SEL \ >> (offsetof(struct virtio_pci_common_cfg, device_feature_select)) >> >> etc. in the kernel headers or qemu? > > > I forgot that we need to validate size (different fields > have different sizes so memory core does not validate > for us). > > And that is much cleaner if we use offsetof directly: > You can see that you are checking the correct > size matching the offset, at a glance.I wonder if we can use and possibly extend Peter Crosthwaite's "register API" to support this better. Paolo> ---> > > virtio: new layout: add offset validation > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > > --- > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index f4db224..fd09ea7 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, > { > VirtIOPCIProxy *proxy = opaque; > VirtIODevice *vdev = proxy->vdev; > + struct virtio_pci_common_cfg cfg; > > uint64_t low = 0xffffffffull; > > switch (addr) { > case offsetof(struct virtio_pci_common_cfg, device_feature_select): > + assert(size == sizeof cfg.device_feature_select); > return proxy->device_feature_select; > case offsetof(struct virtio_pci_common_cfg, device_feature): > + assert(size == sizeof cfg.device_feature); > /* TODO: 64-bit features */ > return proxy->device_feature_select ? 0 : proxy->host_features; > case offsetof(struct virtio_pci_common_cfg, guest_feature_select): > + assert(size == sizeof cfg.guest_feature_select); > return proxy->guest_feature_select; > case offsetof(struct virtio_pci_common_cfg, guest_feature): > + assert(size == sizeof cfg.guest_feature); > /* TODO: 64-bit features */ > return proxy->guest_feature_select ? 0 : vdev->guest_features; > case offsetof(struct virtio_pci_common_cfg, msix_config): > + assert(size == sizeof cfg.msix_config); > return vdev->config_vector; > case offsetof(struct virtio_pci_common_cfg, num_queues): > + assert(size == sizeof cfg.num_queues); > /* TODO: more exact limit? */ > return VIRTIO_PCI_QUEUE_MAX; > case offsetof(struct virtio_pci_common_cfg, device_status): > + assert(size == sizeof cfg.device_status); > return vdev->status; > > /* About a specific virtqueue. */ > case offsetof(struct virtio_pci_common_cfg, queue_select): > + assert(size == sizeof cfg.queue_select); > return vdev->queue_sel; > case offsetof(struct virtio_pci_common_cfg, queue_size): > + assert(size == sizeof cfg.queue_size); > return virtio_queue_get_num(vdev, vdev->queue_sel); > case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): > + assert(size == sizeof cfg.queue_msix_vector); > return virtio_queue_vector(vdev, vdev->queue_sel); > case offsetof(struct virtio_pci_common_cfg, queue_enable): > + assert(size == sizeof cfg.queue_enable); > /* TODO */ > return 0; > case offsetof(struct virtio_pci_common_cfg, queue_notify_off): > + assert(size == sizeof cfg.queue_notify_off); > return vdev->queue_sel; > case offsetof(struct virtio_pci_common_cfg, queue_desc): > + assert(size == 4); > return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; > case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: > + assert(size == 4); > return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32; > case offsetof(struct virtio_pci_common_cfg, queue_avail): > + assert(size == 4); > return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; > case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: > + assert(size == 4); > return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32; > case offsetof(struct virtio_pci_common_cfg, queue_used): > + assert(size == 4); > return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; > case offsetof(struct virtio_pci_common_cfg, queue_used) + 4: > + assert(size == 4); > return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32; > default: > return 0; > @@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void *opaque, hwaddr addr, > { > VirtIOPCIProxy *proxy = opaque; > VirtIODevice *vdev = proxy->vdev; > + struct virtio_pci_common_cfg cfg; > > uint64_t low = 0xffffffffull; > uint64_t high = ~low; > > switch (addr) { > case offsetof(struct virtio_pci_common_cfg, device_feature_select): > + assert(size == sizeof cfg.device_feature_select); > proxy->device_feature_select = val; > break; > case offsetof(struct virtio_pci_common_cfg, device_feature): > + assert(size == sizeof cfg.device_feature); > break; > case offsetof(struct virtio_pci_common_cfg, guest_feature_select): > + assert(size == sizeof cfg.guest_feature_select); > proxy->guest_feature_select = val; > break; > case offsetof(struct virtio_pci_common_cfg, guest_feature): > + assert(size == sizeof cfg.guest_feature); > /* TODO: 64-bit features */ > if (!proxy->guest_feature_select) { > virtio_set_features(vdev, val); > } > break; > case offsetof(struct virtio_pci_common_cfg, msix_config): > + assert(size == sizeof cfg.msix_config); > vdev->config_vector = val; > break; > case offsetof(struct virtio_pci_common_cfg, num_queues): > + assert(size == sizeof cfg.num_queues); > break; > case offsetof(struct virtio_pci_common_cfg, device_status): > + assert(size == sizeof cfg.device_status); > virtio_pci_set_status(proxy, val); > break; > /* About a specific virtqueue. */ > case offsetof(struct virtio_pci_common_cfg, queue_select): > + assert(size == sizeof cfg.queue_select); > assert(val < VIRTIO_PCI_QUEUE_MAX); > vdev->queue_sel = val; > break; > case offsetof(struct virtio_pci_common_cfg, queue_size): > + assert(size == sizeof cfg.queue_size); > assert(val && val < 0x8ffff && !(val & (val - 1))); > virtio_queue_set_num(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): > + assert(size == sizeof cfg.queue_msix_vector); > virtio_queue_set_vector(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_enable): > + assert(size == sizeof cfg.queue_enable); > /* TODO */ > break; > case offsetof(struct virtio_pci_common_cfg, queue_notify_off): > + assert(size == sizeof cfg.queue_notify_off); > break; > case offsetof(struct virtio_pci_common_cfg, queue_desc): > + assert(size == 4); > val &= low; > val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & high; > virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: > + assert(size == 4); > val = val << 32; > val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; > virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_avail): > + assert(size == 4); > val &= low; > val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & high; > virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: > + assert(size == 4); > val = val << 32; > val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; > virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_used): > + assert(size == 4); > val &= low; > val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) & high; > virtio_queue_set_used_addr(vdev, vdev->queue_sel, val); > break; > case offsetof(struct virtio_pci_common_cfg, queue_used) + 4: > + assert(size == 4); > val = val << 32; > val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; > virtio_queue_set_used_addr(vdev, vdev->queue_sel, val); >
Peter Maydell
2013-May-29 09:00 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On 29 May 2013 09:24, Michael S. Tsirkin <mst at redhat.com> wrote:> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index f4db224..fd09ea7 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, > { > VirtIOPCIProxy *proxy = opaque; > VirtIODevice *vdev = proxy->vdev; > + struct virtio_pci_common_cfg cfg; > > uint64_t low = 0xffffffffull; > > switch (addr) { > case offsetof(struct virtio_pci_common_cfg, device_feature_select): > + assert(size == sizeof cfg.device_feature_select); > return proxy->device_feature_select;Asserting is definitely the wrong thing here, since the guest can trigger it. If you really want to use offsetof like this you're going to need to decorate the structs with QEMU_PACKED. thanks -- PMM
Anthony Liguori
2013-May-29 12:52 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell <rusty at rustcorp.com.au> writes:> Anthony Liguori <aliguori at us.ibm.com> writes: >> "Michael S. Tsirkin" <mst at redhat.com> writes: >>> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>> + return proxy->device_feature_select; >> >> Oh dear no... Please use defines like the rest of QEMU. > > It is pretty ugly.I think beauty is in the eye of the beholder here... Pretty much every device we have has a switch statement like this. Consistency wins when it comes to qualitative arguments like this.> Yet the structure definitions are descriptive, capturing layout, size > and endianness in natural a format readable by any C programmer.>From an API design point of view, here are the problems I see:1) C makes no guarantees about structure layout beyond the first member. Yes, if it's naturally aligned or has a packed attribute, GCC does the right thing. But this isn't kernel land anymore, portability matters and there are more compilers than GCC. 2) If we every introduce anything like latching, this doesn't work out so well anymore because it's hard to express in a single C structure the register layout at that point. Perhaps a union could be used but padding may make it a bit challenging. 3) It suspect it's harder to review because a subtle change could more easily have broad impact. If someone changed the type of a field from u32 to u16, it changes the offset of every other field. That's not terribly obvious in the patch itself unless you understand how the structure is used elsewhere. This may not be a problem for virtio because we all understand that the structures are part of an ABI, but if we used this pattern more in QEMU, it would be a lot less obvious.> So AFAICT the question is, do we put the required > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > etc. in the kernel headers or qemu?I'm pretty sure we would end up just having our own integer defines. We carry our own virtio headers today because we can't easily import the kernel headers.>> Haven't looked at the proposed new ring layout yet. > > No change, but there's an open question on whether we should nail it to > little endian (or define the endian by the transport). > > Of course, I can't rule out that the 1.0 standard *may* decide to frob > the ring layout somehow,Well, given that virtio is widely deployed today, I would think the 1.0 standard should strictly reflect what's deployed today, no? Any new config layout would be 2.0 material, right? Re: the new config layout, I don't think we would want to use it for anything but new devices. Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. Regards, Anthony Liguori> but I'd think it would require a compelling > reason. I suggest that's 2.0 material... > > Cheers, > Rusty. > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin
2013-May-29 13:24 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:> Rusty Russell <rusty at rustcorp.com.au> writes: > > > Anthony Liguori <aliguori at us.ibm.com> writes: > >> "Michael S. Tsirkin" <mst at redhat.com> writes: > >>> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): > >>> + return proxy->device_feature_select; > >> > >> Oh dear no... Please use defines like the rest of QEMU. > > > > It is pretty ugly. > > I think beauty is in the eye of the beholder here... > > Pretty much every device we have has a switch statement like this. > Consistency wins when it comes to qualitative arguments like this. > > > Yet the structure definitions are descriptive, capturing layout, size > > and endianness in natural a format readable by any C programmer. > > >From an API design point of view, here are the problems I see: > > 1) C makes no guarantees about structure layout beyond the first > member. Yes, if it's naturally aligned or has a packed attribute, > GCC does the right thing. But this isn't kernel land anymore, > portability matters and there are more compilers than GCC.You expect a compiler to pad this structure: struct foo { uint8_t a; uint8_t b; uint16_t c; uint32_t d; }; I'm guessing any compiler that decides to waste memory in this way will quickly get dropped by users and then we won't worry about building QEMU with it.> 2) If we every introduce anything like latching, this doesn't work out > so well anymore because it's hard to express in a single C structure > the register layout at that point. Perhaps a union could be used but > padding may make it a bit challenging.Then linux won't use it either.> 3) It suspect it's harder to review because a subtle change could more > easily have broad impact. If someone changed the type of a field > from u32 to u16, it changes the offset of every other field. That's > not terribly obvious in the patch itself unless you understand how > the structure is used elsewhere. > > This may not be a problem for virtio because we all understand that > the structures are part of an ABI, but if we used this pattern more > in QEMU, it would be a lot less obvious.So let's not use it more in QEMU.> > So AFAICT the question is, do we put the required > > > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > > > etc. in the kernel headers or qemu? > > I'm pretty sure we would end up just having our own integer defines. We > carry our own virtio headers today because we can't easily import the > kernel headers.Yes we can easily import them. And at least we copy headers verbatim.> >> Haven't looked at the proposed new ring layout yet. > > > > No change, but there's an open question on whether we should nail it to > > little endian (or define the endian by the transport). > > > > Of course, I can't rule out that the 1.0 standard *may* decide to frob > > the ring layout somehow, > > Well, given that virtio is widely deployed today, I would think the 1.0 > standard should strictly reflect what's deployed today, no? > Any new config layout would be 2.0 material, right?Not as it's currently planned. Devices can choose to support a legacy layout in addition to the new one, and if you look at the patch you will see that that is exactly what it does.> Re: the new config layout, I don't think we would want to use it for > anything but new devices. Forcing a guest driver changeThere's no forcing. If you look at the patches closely, you will see that we still support the old layout on BAR0.> is a really big > deal and I see no reason to do that unless there's a compelling reason > to.There are many a compelling reasons, and they are well known limitations of virtio PCI: - PCI spec compliance (madates device operation with IO memory disabled). - support 64 bit addressing - add more than 32 feature bits. - individually disable queues. - sanely support cross-endian systems. - support very small (<1 PAGE) for virtio rings. - support a separate page for each vq kick. - make each device place config at flexible offset. Addressing any one of these would cause us to add a substantially new way to operate virtio devices. And since it's a guest change anyway, it seemed like a good time to do the new layout and fix everything in one go. And they are needed like yesterday.> So we're stuck with the 1.0 config layout for a very long time. > > Regards, > > Anthony LiguoriAbsolutely. This patch let us support both which will allow for a gradual transition over the next 10 years or so.> > reason. I suggest that's 2.0 material... > > > > Cheers, > > Rusty. > > > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majordomo at vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html
Rusty Russell
2013-May-30 03:58 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori <aliguori at us.ibm.com> writes:> Rusty Russell <rusty at rustcorp.com.au> writes: > >> Anthony Liguori <aliguori at us.ibm.com> writes: >>> "Michael S. Tsirkin" <mst at redhat.com> writes: >>>> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>>> + return proxy->device_feature_select; >>> >>> Oh dear no... Please use defines like the rest of QEMU. >> >> It is pretty ugly. > > I think beauty is in the eye of the beholder here... > > Pretty much every device we have has a switch statement like this. > Consistency wins when it comes to qualitative arguments like this.I was agreeing with you here, actually.>> Yet the structure definitions are descriptive, capturing layout, size >> and endianness in natural a format readable by any C programmer. > >>From an API design point of view, here are the problems I see: > > 1) C makes no guarantees about structure layout beyond the first > member. Yes, if it's naturally aligned or has a packed attribute, > GCC does the right thing. But this isn't kernel land anymore, > portability matters and there are more compilers than GCC.[ I argued in detail here, then deleted. Damn bikeshedding... ] I think the best thing is to add the decoded integer versions as macros, and have a heap of BUILD_BUG_ON() in the kernel source to make sure they match.> 3) It suspect it's harder to review because a subtle change could more > easily have broad impact. If someone changed the type of a field > from u32 to u16, it changes the offset of every other field. That's > not terribly obvious in the patch itself unless you understand how > the structure is used elsewhere.MST's patch just did this, so point taken. (MST: I'm going to combine the cfg_type and bar bytes to fix this, patch coming).>> No change, but there's an open question on whether we should nail it to >> little endian (or define the endian by the transport). >> >> Of course, I can't rule out that the 1.0 standard *may* decide to frob >> the ring layout somehow, > > Well, given that virtio is widely deployed today, I would think the 1.0 > standard should strictly reflect what's deployed today, no?That will be up to the committee. I think we want to fix some obvious pain points, though qemu will not benefit from them in the next 5 years.> Any new config layout would be 2.0 material, right? > > Re: the new config layout, I don't think we would want to use it for > anything but new devices.There are enough concrete reasons that I think we want it for existing devices: 1) Negotiated ring size/alignment. Coreboot wants smaller, others want larger. 2) Remove assertion that it has to be an I/O bar. PowerPC wants this. 3) Notification location flexibility. MST wanted this for performance. 4) More feature bits.> Forcing a guest driver change is a really big > deal and I see no reason to do that unless there's a compelling reason > to. > > So we're stuck with the 1.0 config layout for a very long time.We definitely must not force a guest change. The explicit aim of the standard is that "legacy" and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). It's a delicate balancing act. My plan is to accompany any changes in the standard with a qemu implementation, so we can see how painful those changes are. And if there are performance implications, measure them. Cheers, Rusty.