Rusty Russell
2013-May-30 03:58 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori <aliguori at us.ibm.com> writes:> Rusty Russell <rusty at rustcorp.com.au> writes: > >> Anthony Liguori <aliguori at us.ibm.com> writes: >>> "Michael S. Tsirkin" <mst at redhat.com> writes: >>>> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>>> + return proxy->device_feature_select; >>> >>> Oh dear no... Please use defines like the rest of QEMU. >> >> It is pretty ugly. > > I think beauty is in the eye of the beholder here... > > Pretty much every device we have has a switch statement like this. > Consistency wins when it comes to qualitative arguments like this.I was agreeing with you here, actually.>> Yet the structure definitions are descriptive, capturing layout, size >> and endianness in natural a format readable by any C programmer. > >>From an API design point of view, here are the problems I see: > > 1) C makes no guarantees about structure layout beyond the first > member. Yes, if it's naturally aligned or has a packed attribute, > GCC does the right thing. But this isn't kernel land anymore, > portability matters and there are more compilers than GCC.[ I argued in detail here, then deleted. Damn bikeshedding... ] I think the best thing is to add the decoded integer versions as macros, and have a heap of BUILD_BUG_ON() in the kernel source to make sure they match.> 3) It suspect it's harder to review because a subtle change could more > easily have broad impact. If someone changed the type of a field > from u32 to u16, it changes the offset of every other field. That's > not terribly obvious in the patch itself unless you understand how > the structure is used elsewhere.MST's patch just did this, so point taken. (MST: I'm going to combine the cfg_type and bar bytes to fix this, patch coming).>> No change, but there's an open question on whether we should nail it to >> little endian (or define the endian by the transport). >> >> Of course, I can't rule out that the 1.0 standard *may* decide to frob >> the ring layout somehow, > > Well, given that virtio is widely deployed today, I would think the 1.0 > standard should strictly reflect what's deployed today, no?That will be up to the committee. I think we want to fix some obvious pain points, though qemu will not benefit from them in the next 5 years.> Any new config layout would be 2.0 material, right? > > Re: the new config layout, I don't think we would want to use it for > anything but new devices.There are enough concrete reasons that I think we want it for existing devices: 1) Negotiated ring size/alignment. Coreboot wants smaller, others want larger. 2) Remove assertion that it has to be an I/O bar. PowerPC wants this. 3) Notification location flexibility. MST wanted this for performance. 4) More feature bits.> Forcing a guest driver change is a really big > deal and I see no reason to do that unless there's a compelling reason > to. > > So we're stuck with the 1.0 config layout for a very long time.We definitely must not force a guest change. The explicit aim of the standard is that "legacy" and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). It's a delicate balancing act. My plan is to accompany any changes in the standard with a qemu implementation, so we can see how painful those changes are. And if there are performance implications, measure them. Cheers, Rusty.
Michael S. Tsirkin
2013-May-30 05:55 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:> Anthony Liguori <aliguori at us.ibm.com> writes: > > Rusty Russell <rusty at rustcorp.com.au> writes: > > > >> Anthony Liguori <aliguori at us.ibm.com> writes: > >>> "Michael S. Tsirkin" <mst at redhat.com> writes: > >>>> + case offsetof(struct virtio_pci_common_cfg, device_feature_select): > >>>> + return proxy->device_feature_select; > >>> > >>> Oh dear no... Please use defines like the rest of QEMU. > >> > >> It is pretty ugly. > > > > I think beauty is in the eye of the beholder here... > > > > Pretty much every device we have has a switch statement like this. > > Consistency wins when it comes to qualitative arguments like this. > > I was agreeing with you here, actually. > > >> Yet the structure definitions are descriptive, capturing layout, size > >> and endianness in natural a format readable by any C programmer. > > > >>From an API design point of view, here are the problems I see: > > > > 1) C makes no guarantees about structure layout beyond the first > > member. Yes, if it's naturally aligned or has a packed attribute, > > GCC does the right thing. But this isn't kernel land anymore, > > portability matters and there are more compilers than GCC. > > [ I argued in detail here, then deleted. Damn bikeshedding... ] > > I think the best thing is to add the decoded integer versions as macros, > and have a heap of BUILD_BUG_ON() in the kernel source to make sure they > match.On the qemu side, fine with me, all I want is ability to stay close to kernel headers. Let's have XXX_SIZE macros as well, so access size can be easily validated? On the kernel side, for 'struct virtio_pci_cap', since we only ever do offsetof on this struct, it might be easier to just use the numeric constants directly. -- MST
Michael S. Tsirkin
2013-May-30 07:55 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:> >> Yet the structure definitions are descriptive, capturing layout, size > >> and endianness in natural a format readable by any C programmer. > > > >>From an API design point of view, here are the problems I see: > > > > 1) C makes no guarantees about structure layout beyond the first > > member. Yes, if it's naturally aligned or has a packed attribute, > > GCC does the right thing. But this isn't kernel land anymore, > > portability matters and there are more compilers than GCC. > > [ I argued in detail here, then deleted. Damn bikeshedding... ] > > I think the best thing is to add the decoded integer versions as macros, > and have a heap of BUILD_BUG_ON() in the kernel source to make sure they > match.Hmm you want to have both in the header? That would confuse users for sure :) I guess we could put in a big comment explaning why do we have both, and which version should userspace use. I tried to write it: /* * On all known C compilers, you can use the * offsetof macro to find the correct offset of each field - * if your compiler doesn't implement this correctly, use the * integer versions below, instead. * In that case please don't use the structures above, to avoid confusion. */ -- MST
Anthony Liguori
2013-May-30 13:53 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell <rusty at rustcorp.com.au> writes:> Anthony Liguori <aliguori at us.ibm.com> writes: >> Forcing a guest driver change is a really big >> deal and I see no reason to do that unless there's a compelling reason >> to. >> >> So we're stuck with the 1.0 config layout for a very long time. > > We definitely must not force a guest change. The explicit aim of the > standard is that "legacy" and 1.0 be backward compatible. One > deliverable is a document detailing how this is done (effectively a > summary of changes between what we have and 1.0).If 2.0 is fully backwards compatible, great. It seems like such a difference that that would be impossible but I need to investigate further. Regards, Anthony Liguori> > It's a delicate balancing act. My plan is to accompany any changes in > the standard with a qemu implementation, so we can see how painful those > changes are. And if there are performance implications, measure them. > > Cheers, > Rusty. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin
2013-May-30 14:01 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:> Rusty Russell <rusty at rustcorp.com.au> writes: > > > Anthony Liguori <aliguori at us.ibm.com> writes: > >> Forcing a guest driver change is a really big > >> deal and I see no reason to do that unless there's a compelling reason > >> to. > >> > >> So we're stuck with the 1.0 config layout for a very long time. > > > > We definitely must not force a guest change. The explicit aim of the > > standard is that "legacy" and 1.0 be backward compatible. One > > deliverable is a document detailing how this is done (effectively a > > summary of changes between what we have and 1.0). > > If 2.0 is fully backwards compatible, great. It seems like such a > difference that that would be impossible but I need to investigate > further. > > Regards, > > Anthony LiguoriIf you look at my patches you'll see how it works. Basically old guests use BAR0 new ones don't, so it's easy: BAR0 access means legacy guest. Only started testing but things seem to work fine with old guests so far. I think we need a spec, not just driver code. Rusty what's the plan? Want me to write it?> > > > It's a delicate balancing act. My plan is to accompany any changes in > > the standard with a qemu implementation, so we can see how painful those > > changes are. And if there are performance implications, measure them. > > > > Cheers, > > Rusty. > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majordomo at vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html
Rusty Russell
2013-Jun-03 00:17 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" <mst at redhat.com> writes:> On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote: >> >> Yet the structure definitions are descriptive, capturing layout, size >> >> and endianness in natural a format readable by any C programmer. >> > >> >>From an API design point of view, here are the problems I see: >> > >> > 1) C makes no guarantees about structure layout beyond the first >> > member. Yes, if it's naturally aligned or has a packed attribute, >> > GCC does the right thing. But this isn't kernel land anymore, >> > portability matters and there are more compilers than GCC. >> >> [ I argued in detail here, then deleted. Damn bikeshedding... ] >> >> I think the best thing is to add the decoded integer versions as macros, >> and have a heap of BUILD_BUG_ON() in the kernel source to make sure they >> match. > > Hmm you want to have both in the header? > > That would confuse users for sure :) > > I guess we could put in a big comment explaning > why do we have both, and which version should > userspace use. I tried to write it: > > /* > * On all known C compilers, you can use the > * offsetof macro to find the correct offset of each field - > * if your compiler doesn't implement this correctly, use the > * integer versions below, instead. > * In that case please don't use the structures above, to avoid confusion. > */My version was a little less verbose :) Subject: virtio_pci: macros for PCI layout offsets. QEMU wants it, so why not? Trust, but verify. Signed-off-by: Rusty Russell <rusty at rustcorp.com.au> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 24bcd9b..6510009 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -453,6 +453,64 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen, return p; } +/* This is part of the ABI. Don't screw with it. */ +static inline void check_offsets(void) +{ + /* Note: disk space was harmed in compilation of this function. */ + BUILD_BUG_ON(VIRTIO_PCI_CAP_VNDR !+ offsetof(struct virtio_pci_cap, cap_vndr)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_NEXT !+ offsetof(struct virtio_pci_cap, cap_next)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_LEN !+ offsetof(struct virtio_pci_cap, cap_len)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_TYPE_AND_BAR !+ offsetof(struct virtio_pci_cap, type_and_bar)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_OFFSET !+ offsetof(struct virtio_pci_cap, offset)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_LENGTH !+ offsetof(struct virtio_pci_cap, length)); + BUILD_BUG_ON(VIRTIO_PCI_NOTIFY_CAP_MULT !+ offsetof(struct virtio_pci_notify_cap, + notify_off_multiplier)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_DFSELECT != + offsetof(struct virtio_pci_common_cfg, + device_feature_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_DF != + offsetof(struct virtio_pci_common_cfg, device_feature)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_GFSELECT != + offsetof(struct virtio_pci_common_cfg, + guest_feature_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_GF != + offsetof(struct virtio_pci_common_cfg, guest_feature)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_MSIX != + offsetof(struct virtio_pci_common_cfg, msix_config)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_NUMQ != + offsetof(struct virtio_pci_common_cfg, num_queues)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_STATUS != + offsetof(struct virtio_pci_common_cfg, device_status)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SELECT != + offsetof(struct virtio_pci_common_cfg, queue_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SIZE != + offsetof(struct virtio_pci_common_cfg, queue_size)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_MSIX != + offsetof(struct virtio_pci_common_cfg, queue_msix_vector)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_ENABLE != + offsetof(struct virtio_pci_common_cfg, queue_enable)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NOFF != + offsetof(struct virtio_pci_common_cfg, queue_notify_off)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCLO != + offsetof(struct virtio_pci_common_cfg, queue_desc_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCHI != + offsetof(struct virtio_pci_common_cfg, queue_desc_hi)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILLO != + offsetof(struct virtio_pci_common_cfg, queue_avail_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILHI != + offsetof(struct virtio_pci_common_cfg, queue_avail_hi)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDLO != + offsetof(struct virtio_pci_common_cfg, queue_used_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDHI != + offsetof(struct virtio_pci_common_cfg, queue_used_hi)); +} /* the PCI probing function */ static int virtio_pci_probe(struct pci_dev *pci_dev, @@ -461,6 +519,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, struct virtio_pci_device *vp_dev; int err, common, isr, notify, device; + check_offsets(); + /* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */ if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f) return -ENODEV; diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index e1a25ad..9fb7ea4 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -171,4 +171,34 @@ struct virtio_pci_common_cfg { __le32 queue_used_lo; /* read-write */ __le32 queue_used_hi; /* read-write */ }; + +/* Macro versions of offsets for the Old Timers! */ +#define VIRTIO_PCI_CAP_VNDR 0 +#define VIRTIO_PCI_CAP_NEXT 1 +#define VIRTIO_PCI_CAP_LEN 2 +#define VIRTIO_PCI_CAP_TYPE_AND_BAR 3 +#define VIRTIO_PCI_CAP_OFFSET 4 +#define VIRTIO_PCI_CAP_LENGTH 8 + +#define VIRTIO_PCI_NOTIFY_CAP_MULT 12 + +#define VIRTIO_PCI_COMMON_DFSELECT 0 +#define VIRTIO_PCI_COMMON_DF 4 +#define VIRTIO_PCI_COMMON_GFSELECT 8 +#define VIRTIO_PCI_COMMON_GF 12 +#define VIRTIO_PCI_COMMON_MSIX 16 +#define VIRTIO_PCI_COMMON_NUMQ 18 +#define VIRTIO_PCI_COMMON_STATUS 20 +#define VIRTIO_PCI_COMMON_Q_SELECT 22 +#define VIRTIO_PCI_COMMON_Q_SIZE 24 +#define VIRTIO_PCI_COMMON_Q_MSIX 26 +#define VIRTIO_PCI_COMMON_Q_ENABLE 28 +#define VIRTIO_PCI_COMMON_Q_NOFF 30 +#define VIRTIO_PCI_COMMON_Q_DESCLO 32 +#define VIRTIO_PCI_COMMON_Q_DESCHI 36 +#define VIRTIO_PCI_COMMON_Q_AVAILLO 40 +#define VIRTIO_PCI_COMMON_Q_AVAILHI 44 +#define VIRTIO_PCI_COMMON_Q_USEDLO 48 +#define VIRTIO_PCI_COMMON_Q_USEDHI 52 + #endif /* _UAPI_LINUX_VIRTIO_PCI_H */
Reasonably Related Threads
- [PATCH RFC] virtio-pci: new config layout: using memory BAR
- [PATCH RFC] virtio-pci: new config layout: using memory BAR
- [PATCH RFC] virtio-pci: new config layout: using memory BAR
- [PATCH RFC] virtio-pci: new config layout: using memory BAR
- [PATCH RFC] virtio-pci: new config layout: using memory BAR