thr3ads.net - Linux Virtualization - [PATCH RFC] virtio-pci: new config layout: using memory BAR [May 2013]

If this information is useful, please help other people find it:
Share via:

Rusty Russell

2013-May-29 04:31 UTC

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Anthony Liguori <aliguori at us.ibm.com> writes:> "Michael S. Tsirkin" <mst at redhat.com> writes:
>> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
>> +        return proxy->device_feature_select;
>
> Oh dear no...  Please use defines like the rest of QEMU.
It is pretty ugly.

Yet the structure definitions are descriptive, capturing layout, size
and endianness in natural a format readable by any C programmer.

So AFAICT the question is, do we put the required

#define VIRTIO_PCI_CFG_FEATURE_SEL \
         (offsetof(struct virtio_pci_common_cfg, device_feature_select))

etc. in the kernel headers or qemu?
> Haven't looked at the proposed new ring layout yet.
No change, but there's an open question on whether we should nail it to
little endian (or define the endian by the transport).

Of course, I can't rule out that the 1.0 standard *may* decide to frob
the ring layout somehow, but I'd think it would require a compelling
reason.  I suggest that's 2.0 material...

Cheers,
Rusty.

Michael S. Tsirkin

2013-May-29 08:24 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell
wrote:> Anthony Liguori <aliguori at us.ibm.com> writes:
> > "Michael S. Tsirkin" <mst at redhat.com> writes:
> >> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
> >> +        return proxy->device_feature_select;
> >
> > Oh dear no...  Please use defines like the rest of QEMU.
> 
> It is pretty ugly.
> 
> Yet the structure definitions are descriptive, capturing layout, size
> and endianness in natural a format readable by any C programmer.
> 
> So AFAICT the question is, do we put the required
> 
> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>          (offsetof(struct virtio_pci_common_cfg, device_feature_select))
> 
> etc. in the kernel headers or qemu?

I forgot that we need to validate size (different fields
have different sizes so memory core does not validate
for us).

And that is much cleaner if we use offsetof directly:
You can see that you are checking the correct
size matching the offset, at a glance.

--->

virtio: new layout: add offset validation

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>

---

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index f4db224..fd09ea7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void
*opaque, hwaddr addr,
 {
     VirtIOPCIProxy *proxy = opaque;
     VirtIODevice *vdev = proxy->vdev;
+    struct virtio_pci_common_cfg cfg;
 
     uint64_t low = 0xffffffffull;
 
     switch (addr) {
     case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+        assert(size == sizeof cfg.device_feature_select);
         return proxy->device_feature_select;
     case offsetof(struct virtio_pci_common_cfg, device_feature):
+        assert(size == sizeof cfg.device_feature);
         /* TODO: 64-bit features */
 	return proxy->device_feature_select ? 0 : proxy->host_features;
     case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+        assert(size == sizeof cfg.guest_feature_select);
         return proxy->guest_feature_select;
     case offsetof(struct virtio_pci_common_cfg, guest_feature):
+        assert(size == sizeof cfg.guest_feature);
         /* TODO: 64-bit features */
 	return proxy->guest_feature_select ? 0 : vdev->guest_features;
     case offsetof(struct virtio_pci_common_cfg, msix_config):
+        assert(size == sizeof cfg.msix_config);
 	return vdev->config_vector;
     case offsetof(struct virtio_pci_common_cfg, num_queues):
+        assert(size == sizeof cfg.num_queues);
         /* TODO: more exact limit? */
 	return VIRTIO_PCI_QUEUE_MAX;
     case offsetof(struct virtio_pci_common_cfg, device_status):
+        assert(size == sizeof cfg.device_status);
         return vdev->status;
 
 	/* About a specific virtqueue. */
     case offsetof(struct virtio_pci_common_cfg, queue_select):
+        assert(size == sizeof cfg.queue_select);
         return  vdev->queue_sel;
     case offsetof(struct virtio_pci_common_cfg, queue_size):
+        assert(size == sizeof cfg.queue_size);
         return virtio_queue_get_num(vdev, vdev->queue_sel);
     case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+        assert(size == sizeof cfg.queue_msix_vector);
 	return virtio_queue_vector(vdev, vdev->queue_sel);
     case offsetof(struct virtio_pci_common_cfg, queue_enable):
+        assert(size == sizeof cfg.queue_enable);
         /* TODO */
 	return 0;
     case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+        assert(size == sizeof cfg.queue_notify_off);
         return vdev->queue_sel;
     case offsetof(struct virtio_pci_common_cfg, queue_desc):
+        assert(size == 4);
         return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
     case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
+        assert(size == 4);
         return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >>
32;
     case offsetof(struct virtio_pci_common_cfg, queue_avail):
+        assert(size == 4);
         return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
     case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
+        assert(size == 4);
         return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >>
32;
     case offsetof(struct virtio_pci_common_cfg, queue_used):
+        assert(size == 4);
         return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
     case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
+        assert(size == 4);
         return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >>
32;
     default:
         return 0;
@@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void *opaque,
hwaddr addr,
 {
     VirtIOPCIProxy *proxy = opaque;
     VirtIODevice *vdev = proxy->vdev;
+    struct virtio_pci_common_cfg cfg;
 
     uint64_t low = 0xffffffffull;
     uint64_t high = ~low;
 
     switch (addr) {
     case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+        assert(size == sizeof cfg.device_feature_select);
         proxy->device_feature_select = val;
         break;
     case offsetof(struct virtio_pci_common_cfg, device_feature):
+        assert(size == sizeof cfg.device_feature);
         break;
     case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+        assert(size == sizeof cfg.guest_feature_select);
         proxy->guest_feature_select = val;
         break;
     case offsetof(struct virtio_pci_common_cfg, guest_feature):
+        assert(size == sizeof cfg.guest_feature);
         /* TODO: 64-bit features */
 	if (!proxy->guest_feature_select) {
             virtio_set_features(vdev, val);
         }
         break;
     case offsetof(struct virtio_pci_common_cfg, msix_config):
+        assert(size == sizeof cfg.msix_config);
 	vdev->config_vector = val;
         break;
     case offsetof(struct virtio_pci_common_cfg, num_queues):
+        assert(size == sizeof cfg.num_queues);
         break;
     case offsetof(struct virtio_pci_common_cfg, device_status):
+        assert(size == sizeof cfg.device_status);
         virtio_pci_set_status(proxy, val);
         break;
 	/* About a specific virtqueue. */
     case offsetof(struct virtio_pci_common_cfg, queue_select):
+        assert(size == sizeof cfg.queue_select);
         assert(val < VIRTIO_PCI_QUEUE_MAX);
         vdev->queue_sel = val;
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_size):
+        assert(size == sizeof cfg.queue_size);
         assert(val && val < 0x8ffff && !(val & (val -
1)));
         virtio_queue_set_num(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+        assert(size == sizeof cfg.queue_msix_vector);
 	virtio_queue_set_vector(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_enable):
+        assert(size == sizeof cfg.queue_enable);
         /* TODO */
 	break;
     case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+        assert(size == sizeof cfg.queue_notify_off);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_desc):
+        assert(size == 4);
         val &= low;
         val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & high;
         virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
+        assert(size == 4);
         val = val << 32;
         val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
         virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_avail):
+        assert(size == 4);
         val &= low;
         val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) &
high;
         virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
+        assert(size == 4);
         val = val << 32;
         val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
         virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_used):
+        assert(size == 4);
         val &= low;
         val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) & high;
         virtio_queue_set_used_addr(vdev, vdev->queue_sel, val);
         break;
     case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
+        assert(size == 4);
         val = val << 32;
         val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
         virtio_queue_set_used_addr(vdev, vdev->queue_sel, val);

Paolo Bonzini

2013-May-29 08:52 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Il 29/05/2013 10:24, Michael S. Tsirkin ha scritto:> On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote:
>> Anthony Liguori <aliguori at us.ibm.com> writes:
>>> "Michael S. Tsirkin" <mst at redhat.com> writes:
>>>> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
>>>> +        return proxy->device_feature_select;
>>>
>>> Oh dear no...  Please use defines like the rest of QEMU.
>>
>> It is pretty ugly.
>>
>> Yet the structure definitions are descriptive, capturing layout, size
>> and endianness in natural a format readable by any C programmer.
>>
>> So AFAICT the question is, do we put the required
>>
>> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>>          (offsetof(struct virtio_pci_common_cfg,
device_feature_select))
>>
>> etc. in the kernel headers or qemu?
> 
> 
> I forgot that we need to validate size (different fields
> have different sizes so memory core does not validate
> for us).
> 
> And that is much cleaner if we use offsetof directly:
> You can see that you are checking the correct
> size matching the offset, at a glance.
I wonder if we can use and possibly extend Peter Crosthwaite's
"register
API" to support this better.

Paolo
> --->
> 
> virtio: new layout: add offset validation
> 
> Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
> 
> ---
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index f4db224..fd09ea7 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void
*opaque, hwaddr addr,
>  {
>      VirtIOPCIProxy *proxy = opaque;
>      VirtIODevice *vdev = proxy->vdev;
> +    struct virtio_pci_common_cfg cfg;
>  
>      uint64_t low = 0xffffffffull;
>  
>      switch (addr) {
>      case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +        assert(size == sizeof cfg.device_feature_select);
>          return proxy->device_feature_select;
>      case offsetof(struct virtio_pci_common_cfg, device_feature):
> +        assert(size == sizeof cfg.device_feature);
>          /* TODO: 64-bit features */
>  	return proxy->device_feature_select ? 0 : proxy->host_features;
>      case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
> +        assert(size == sizeof cfg.guest_feature_select);
>          return proxy->guest_feature_select;
>      case offsetof(struct virtio_pci_common_cfg, guest_feature):
> +        assert(size == sizeof cfg.guest_feature);
>          /* TODO: 64-bit features */
>  	return proxy->guest_feature_select ? 0 : vdev->guest_features;
>      case offsetof(struct virtio_pci_common_cfg, msix_config):
> +        assert(size == sizeof cfg.msix_config);
>  	return vdev->config_vector;
>      case offsetof(struct virtio_pci_common_cfg, num_queues):
> +        assert(size == sizeof cfg.num_queues);
>          /* TODO: more exact limit? */
>  	return VIRTIO_PCI_QUEUE_MAX;
>      case offsetof(struct virtio_pci_common_cfg, device_status):
> +        assert(size == sizeof cfg.device_status);
>          return vdev->status;
>  
>  	/* About a specific virtqueue. */
>      case offsetof(struct virtio_pci_common_cfg, queue_select):
> +        assert(size == sizeof cfg.queue_select);
>          return  vdev->queue_sel;
>      case offsetof(struct virtio_pci_common_cfg, queue_size):
> +        assert(size == sizeof cfg.queue_size);
>          return virtio_queue_get_num(vdev, vdev->queue_sel);
>      case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
> +        assert(size == sizeof cfg.queue_msix_vector);
>  	return virtio_queue_vector(vdev, vdev->queue_sel);
>      case offsetof(struct virtio_pci_common_cfg, queue_enable):
> +        assert(size == sizeof cfg.queue_enable);
>          /* TODO */
>  	return 0;
>      case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
> +        assert(size == sizeof cfg.queue_notify_off);
>          return vdev->queue_sel;
>      case offsetof(struct virtio_pci_common_cfg, queue_desc):
> +        assert(size == 4);
>          return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) &
low;
>      case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
> +        assert(size == 4);
>          return virtio_queue_get_desc_addr(vdev, vdev->queue_sel)
>> 32;
>      case offsetof(struct virtio_pci_common_cfg, queue_avail):
> +        assert(size == 4);
>          return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) &
low;
>      case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
> +        assert(size == 4);
>          return virtio_queue_get_avail_addr(vdev, vdev->queue_sel)
>> 32;
>      case offsetof(struct virtio_pci_common_cfg, queue_used):
> +        assert(size == 4);
>          return virtio_queue_get_used_addr(vdev, vdev->queue_sel) &
low;
>      case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
> +        assert(size == 4);
>          return virtio_queue_get_used_addr(vdev, vdev->queue_sel)
>> 32;
>      default:
>          return 0;
> @@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void
*opaque, hwaddr addr,
>  {
>      VirtIOPCIProxy *proxy = opaque;
>      VirtIODevice *vdev = proxy->vdev;
> +    struct virtio_pci_common_cfg cfg;
>  
>      uint64_t low = 0xffffffffull;
>      uint64_t high = ~low;
>  
>      switch (addr) {
>      case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +        assert(size == sizeof cfg.device_feature_select);
>          proxy->device_feature_select = val;
>          break;
>      case offsetof(struct virtio_pci_common_cfg, device_feature):
> +        assert(size == sizeof cfg.device_feature);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
> +        assert(size == sizeof cfg.guest_feature_select);
>          proxy->guest_feature_select = val;
>          break;
>      case offsetof(struct virtio_pci_common_cfg, guest_feature):
> +        assert(size == sizeof cfg.guest_feature);
>          /* TODO: 64-bit features */
>  	if (!proxy->guest_feature_select) {
>              virtio_set_features(vdev, val);
>          }
>          break;
>      case offsetof(struct virtio_pci_common_cfg, msix_config):
> +        assert(size == sizeof cfg.msix_config);
>  	vdev->config_vector = val;
>          break;
>      case offsetof(struct virtio_pci_common_cfg, num_queues):
> +        assert(size == sizeof cfg.num_queues);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, device_status):
> +        assert(size == sizeof cfg.device_status);
>          virtio_pci_set_status(proxy, val);
>          break;
>  	/* About a specific virtqueue. */
>      case offsetof(struct virtio_pci_common_cfg, queue_select):
> +        assert(size == sizeof cfg.queue_select);
>          assert(val < VIRTIO_PCI_QUEUE_MAX);
>          vdev->queue_sel = val;
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_size):
> +        assert(size == sizeof cfg.queue_size);
>          assert(val && val < 0x8ffff && !(val & (val
- 1)));
>          virtio_queue_set_num(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
> +        assert(size == sizeof cfg.queue_msix_vector);
>  	virtio_queue_set_vector(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_enable):
> +        assert(size == sizeof cfg.queue_enable);
>          /* TODO */
>  	break;
>      case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
> +        assert(size == sizeof cfg.queue_notify_off);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_desc):
> +        assert(size == 4);
>          val &= low;
>          val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) &
high;
>          virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
> +        assert(size == 4);
>          val = val << 32;
>          val |= virtio_queue_get_desc_addr(vdev, vdev->queue_sel) &
low;
>          virtio_queue_set_desc_addr(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_avail):
> +        assert(size == 4);
>          val &= low;
>          val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) &
high;
>          virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
> +        assert(size == 4);
>          val = val << 32;
>          val |= virtio_queue_get_avail_addr(vdev, vdev->queue_sel) &
low;
>          virtio_queue_set_avail_addr(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_used):
> +        assert(size == 4);
>          val &= low;
>          val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) &
high;
>          virtio_queue_set_used_addr(vdev, vdev->queue_sel, val);
>          break;
>      case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
> +        assert(size == 4);
>          val = val << 32;
>          val |= virtio_queue_get_used_addr(vdev, vdev->queue_sel) &
low;
>          virtio_queue_set_used_addr(vdev, vdev->queue_sel, val);
>

Peter Maydell

2013-May-29 09:00 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On 29 May 2013 09:24, Michael S. Tsirkin <mst at redhat.com>
wrote:> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index f4db224..fd09ea7 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void
*opaque, hwaddr addr,
>  {
>      VirtIOPCIProxy *proxy = opaque;
>      VirtIODevice *vdev = proxy->vdev;
> +    struct virtio_pci_common_cfg cfg;
>
>      uint64_t low = 0xffffffffull;
>
>      switch (addr) {
>      case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +        assert(size == sizeof cfg.device_feature_select);
>          return proxy->device_feature_select;
Asserting is definitely the wrong thing here, since the
guest can trigger it.

If you really want to use offsetof like this you're
going to need to decorate the structs with QEMU_PACKED.

thanks
-- PMM

Anthony Liguori

2013-May-29 12:52 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Rusty Russell <rusty at rustcorp.com.au> writes:
> Anthony Liguori <aliguori at us.ibm.com> writes:
>> "Michael S. Tsirkin" <mst at redhat.com> writes:
>>> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
>>> +        return proxy->device_feature_select;
>>
>> Oh dear no...  Please use defines like the rest of QEMU.
>
> It is pretty ugly.
I think beauty is in the eye of the beholder here...

Pretty much every device we have has a switch statement like this.
Consistency wins when it comes to qualitative arguments like this.
> Yet the structure definitions are descriptive, capturing layout, size
> and endianness in natural a format readable by any C programmer.
>From an API design point of view, here are the problems I see:
1) C makes no guarantees about structure layout beyond the first
   member.  Yes, if it's naturally aligned or has a packed attribute,
   GCC does the right thing.  But this isn't kernel land anymore,
   portability matters and there are more compilers than GCC.

2) If we every introduce anything like latching, this doesn't work out
   so well anymore because it's hard to express in a single C structure
   the register layout at that point.  Perhaps a union could be used but
   padding may make it a bit challenging.

3) It suspect it's harder to review because a subtle change could more
   easily have broad impact.  If someone changed the type of a field
   from u32 to u16, it changes the offset of every other field.  That's
   not terribly obvious in the patch itself unless you understand how
   the structure is used elsewhere.

   This may not be a problem for virtio because we all understand that
   the structures are part of an ABI, but if we used this pattern more
   in QEMU, it would be a lot less obvious.
> So AFAICT the question is, do we put the required
>
> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>          (offsetof(struct virtio_pci_common_cfg, device_feature_select))
>
> etc. in the kernel headers or qemu?
I'm pretty sure we would end up just having our own integer defines.  We
carry our own virtio headers today because we can't easily import the
kernel headers.
>> Haven't looked at the proposed new ring layout yet.
>
> No change, but there's an open question on whether we should nail it to
> little endian (or define the endian by the transport).
>
> Of course, I can't rule out that the 1.0 standard *may* decide to frob
> the ring layout somehow,
Well, given that virtio is widely deployed today, I would think the 1.0
standard should strictly reflect what's deployed today, no?

Any new config layout would be 2.0 material, right?

Re: the new config layout, I don't think we would want to use it for
anything but new devices.  Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling reason
to.

So we're stuck with the 1.0 config layout for a very long time.

Regards,

Anthony Liguori
> but I'd think it would require a compelling
> reason.  I suggest that's 2.0 material...
>
> Cheers,
> Rusty.
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin

2013-May-29 13:24 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori
wrote:> Rusty Russell <rusty at rustcorp.com.au> writes:
> 
> > Anthony Liguori <aliguori at us.ibm.com> writes:
> >> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >>> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
> >>> +        return proxy->device_feature_select;
> >>
> >> Oh dear no...  Please use defines like the rest of QEMU.
> >
> > It is pretty ugly.
> 
> I think beauty is in the eye of the beholder here...
> 
> Pretty much every device we have has a switch statement like this.
> Consistency wins when it comes to qualitative arguments like this.
> 
> > Yet the structure definitions are descriptive, capturing layout, size
> > and endianness in natural a format readable by any C programmer.
> 
> >From an API design point of view, here are the problems I see:
> 
> 1) C makes no guarantees about structure layout beyond the first
>    member.  Yes, if it's naturally aligned or has a packed attribute,
>    GCC does the right thing.  But this isn't kernel land anymore,
>    portability matters and there are more compilers than GCC.
You expect a compiler to pad this structure:

struct foo {
	uint8_t a;
	uint8_t b;
	uint16_t c;
	uint32_t d;
};

I'm guessing any compiler that decides to waste memory in this way
will quickly get dropped by users and then we won't worry
about building QEMU with it.
> 2) If we every introduce anything like latching, this doesn't work out
>    so well anymore because it's hard to express in a single C structure
>    the register layout at that point.  Perhaps a union could be used but
>    padding may make it a bit challenging.
Then linux won't use it either.
> 3) It suspect it's harder to review because a subtle change could more
>    easily have broad impact.  If someone changed the type of a field
>    from u32 to u16, it changes the offset of every other field.  That's
>    not terribly obvious in the patch itself unless you understand how
>    the structure is used elsewhere.
> 
>    This may not be a problem for virtio because we all understand that
>    the structures are part of an ABI, but if we used this pattern more
>    in QEMU, it would be a lot less obvious.
So let's not use it more in QEMU.
> > So AFAICT the question is, do we put the required
> >
> > #define VIRTIO_PCI_CFG_FEATURE_SEL \
> >          (offsetof(struct virtio_pci_common_cfg,
device_feature_select))
> >
> > etc. in the kernel headers or qemu?
> 
> I'm pretty sure we would end up just having our own integer defines. 
We
> carry our own virtio headers today because we can't easily import the
> kernel headers.
Yes we can easily import them.
And at least we copy headers verbatim.

> >> Haven't looked at the proposed new ring layout yet.
> >
> > No change, but there's an open question on whether we should nail
it to
> > little endian (or define the endian by the transport).
> >
> > Of course, I can't rule out that the 1.0 standard *may* decide to
frob
> > the ring layout somehow,
> 
> Well, given that virtio is widely deployed today, I would think the 1.0
> standard should strictly reflect what's deployed today, no?
> Any new config layout would be 2.0 material, right?
Not as it's currently planned. Devices can choose
to support a legacy layout in addition to the new one,
and if you look at the patch you will see that that
is exactly what it does.
> Re: the new config layout, I don't think we would want to use it for
> anything but new devices.  Forcing a guest driver change
There's no forcing.
If you look at the patches closely, you will see that
we still support the old layout on BAR0.

> is a really big
> deal and I see no reason to do that unless there's a compelling reason
> to.
There are many a compelling reasons, and they are well known
limitations of virtio PCI:

- PCI spec compliance (madates device operation with IO memory disabled).
- support 64 bit addressing
- add more than 32 feature bits.
- individually disable queues.
- sanely support cross-endian systems.
- support very small (<1 PAGE) for virtio rings.
- support a separate page for each vq kick.
- make each device place config at flexible offset.

Addressing any one of these would cause us to add a substantially new
way to operate virtio devices.

And since it's a guest change anyway, it seemed like a
good time to do the new layout and fix everything in one go.

And they are needed like yesterday.

> So we're stuck with the 1.0 config layout for a very long time.
> 
> Regards,
> 
> Anthony Liguori
Absolutely. This patch let us support both which will allow for
a gradual transition over the next 10 years or so.
> > reason.  I suggest that's 2.0 material...
> >
> > Cheers,
> > Rusty.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
kvm" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

Rusty Russell

2013-May-30 03:58 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Anthony Liguori <aliguori at us.ibm.com> writes:> Rusty Russell <rusty at rustcorp.com.au> writes:
>
>> Anthony Liguori <aliguori at us.ibm.com> writes:
>>> "Michael S. Tsirkin" <mst at redhat.com> writes:
>>>> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
>>>> +        return proxy->device_feature_select;
>>>
>>> Oh dear no...  Please use defines like the rest of QEMU.
>>
>> It is pretty ugly.
>
> I think beauty is in the eye of the beholder here...
>
> Pretty much every device we have has a switch statement like this.
> Consistency wins when it comes to qualitative arguments like this.
I was agreeing with you here, actually.
>> Yet the structure definitions are descriptive, capturing layout, size
>> and endianness in natural a format readable by any C programmer.
>
>>From an API design point of view, here are the problems I see:
>
> 1) C makes no guarantees about structure layout beyond the first
>    member.  Yes, if it's naturally aligned or has a packed attribute,
>    GCC does the right thing.  But this isn't kernel land anymore,
>    portability matters and there are more compilers than GCC.
[ I argued in detail here, then deleted.  Damn bikeshedding... ]

I think the best thing is to add the decoded integer versions as macros,
and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
match.
> 3) It suspect it's harder to review because a subtle change could more
>    easily have broad impact.  If someone changed the type of a field
>    from u32 to u16, it changes the offset of every other field.  That's
>    not terribly obvious in the patch itself unless you understand how
>    the structure is used elsewhere.
MST's patch just did this, so point taken.  (MST: I'm going to combine
the cfg_type and bar bytes to fix this, patch coming).
>> No change, but there's an open question on whether we should nail
it to
>> little endian (or define the endian by the transport).
>>
>> Of course, I can't rule out that the 1.0 standard *may* decide to
frob
>> the ring layout somehow,
>
> Well, given that virtio is widely deployed today, I would think the 1.0
> standard should strictly reflect what's deployed today, no?
That will be up to the committee.  I think we want to fix some obvious
pain points, though qemu will not benefit from them in the next 5 years.
> Any new config layout would be 2.0 material, right?
>
> Re: the new config layout, I don't think we would want to use it for
> anything but new devices.
There are enough concrete reasons that I think we want it for existing
devices:

1) Negotiated ring size/alignment.  Coreboot wants smaller, others want
   larger.
2) Remove assertion that it has to be an I/O bar.  PowerPC wants this.
3) Notification location flexibility.  MST wanted this for performance.
4) More feature bits.
> Forcing a guest driver change is a really big
> deal and I see no reason to do that unless there's a compelling reason
> to.
>
> So we're stuck with the 1.0 config layout for a very long time.
We definitely must not force a guest change.  The explicit aim of the
standard is that "legacy" and 1.0 be backward compatible.  One
deliverable is a document detailing how this is done (effectively a
summary of changes between what we have and 1.0).

It's a delicate balancing act.  My plan is to accompany any changes in
the standard with a qemu implementation, so we can see how painful those
changes are.  And if there are performance implications, measure them.

Cheers,
Rusty.

Reasonably Related Threads

Search for more seemingly similar threads

Linux Virtualization - May 2013 - [PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Reasonably Related Threads