thr3ads.net - Virtualization - [PATCH RFC] virtio-pci: new config layout: using memory BAR [May 2013]

If this information is useful, please help other people find it:
Share via:

Michael S. Tsirkin

2013-May-29 13:24 UTC

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori
wrote:> Rusty Russell <rusty at rustcorp.com.au> writes:
> 
> > Anthony Liguori <aliguori at us.ibm.com> writes:
> >> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >>> +    case offsetof(struct virtio_pci_common_cfg,
device_feature_select):
> >>> +        return proxy->device_feature_select;
> >>
> >> Oh dear no...  Please use defines like the rest of QEMU.
> >
> > It is pretty ugly.
> 
> I think beauty is in the eye of the beholder here...
> 
> Pretty much every device we have has a switch statement like this.
> Consistency wins when it comes to qualitative arguments like this.
> 
> > Yet the structure definitions are descriptive, capturing layout, size
> > and endianness in natural a format readable by any C programmer.
> 
> >From an API design point of view, here are the problems I see:
> 
> 1) C makes no guarantees about structure layout beyond the first
>    member.  Yes, if it's naturally aligned or has a packed attribute,
>    GCC does the right thing.  But this isn't kernel land anymore,
>    portability matters and there are more compilers than GCC.
You expect a compiler to pad this structure:

struct foo {
	uint8_t a;
	uint8_t b;
	uint16_t c;
	uint32_t d;
};

I'm guessing any compiler that decides to waste memory in this way
will quickly get dropped by users and then we won't worry
about building QEMU with it.
> 2) If we every introduce anything like latching, this doesn't work out
>    so well anymore because it's hard to express in a single C structure
>    the register layout at that point.  Perhaps a union could be used but
>    padding may make it a bit challenging.
Then linux won't use it either.
> 3) It suspect it's harder to review because a subtle change could more
>    easily have broad impact.  If someone changed the type of a field
>    from u32 to u16, it changes the offset of every other field.  That's
>    not terribly obvious in the patch itself unless you understand how
>    the structure is used elsewhere.
> 
>    This may not be a problem for virtio because we all understand that
>    the structures are part of an ABI, but if we used this pattern more
>    in QEMU, it would be a lot less obvious.
So let's not use it more in QEMU.
> > So AFAICT the question is, do we put the required
> >
> > #define VIRTIO_PCI_CFG_FEATURE_SEL \
> >          (offsetof(struct virtio_pci_common_cfg,
device_feature_select))
> >
> > etc. in the kernel headers or qemu?
> 
> I'm pretty sure we would end up just having our own integer defines. 
We
> carry our own virtio headers today because we can't easily import the
> kernel headers.
Yes we can easily import them.
And at least we copy headers verbatim.

> >> Haven't looked at the proposed new ring layout yet.
> >
> > No change, but there's an open question on whether we should nail
it to
> > little endian (or define the endian by the transport).
> >
> > Of course, I can't rule out that the 1.0 standard *may* decide to
frob
> > the ring layout somehow,
> 
> Well, given that virtio is widely deployed today, I would think the 1.0
> standard should strictly reflect what's deployed today, no?
> Any new config layout would be 2.0 material, right?
Not as it's currently planned. Devices can choose
to support a legacy layout in addition to the new one,
and if you look at the patch you will see that that
is exactly what it does.
> Re: the new config layout, I don't think we would want to use it for
> anything but new devices.  Forcing a guest driver change
There's no forcing.
If you look at the patches closely, you will see that
we still support the old layout on BAR0.

> is a really big
> deal and I see no reason to do that unless there's a compelling reason
> to.
There are many a compelling reasons, and they are well known
limitations of virtio PCI:

- PCI spec compliance (madates device operation with IO memory disabled).
- support 64 bit addressing
- add more than 32 feature bits.
- individually disable queues.
- sanely support cross-endian systems.
- support very small (<1 PAGE) for virtio rings.
- support a separate page for each vq kick.
- make each device place config at flexible offset.

Addressing any one of these would cause us to add a substantially new
way to operate virtio devices.

And since it's a guest change anyway, it seemed like a
good time to do the new layout and fix everything in one go.

And they are needed like yesterday.

> So we're stuck with the 1.0 config layout for a very long time.
> 
> Regards,
> 
> Anthony Liguori
Absolutely. This patch let us support both which will allow for
a gradual transition over the next 10 years or so.
> > reason.  I suggest that's 2.0 material...
> >
> > Cheers,
> > Rusty.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
kvm" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peter Maydell

2013-May-29 13:35 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On 29 May 2013 14:24, Michael S. Tsirkin <mst at redhat.com>
wrote:> On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
>> 1) C makes no guarantees about structure layout beyond the first
>>    member.  Yes, if it's naturally aligned or has a packed
attribute,
>>    GCC does the right thing.  But this isn't kernel land anymore,
>>    portability matters and there are more compilers than GCC.
>
> You expect a compiler to pad this structure:
>
> struct foo {
>         uint8_t a;
>         uint8_t b;
>         uint16_t c;
>         uint32_t d;
> };
>
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.
Structure alignment is a platform ABI choice. Oddly enough people
choose operating systems on more criteria than the amount of
padding their ABI mandates in structures.

In any case it's really tedious to read a structure and wade
through working out whether it's going to have all its members
naturally aligned, especially once it's more than a handful
of members in size. And getting it wrong is "silent portability
problem".

thanks
-- PMM

Paolo Bonzini

2013-May-29 13:41 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:> You expect a compiler to pad this structure:
> 
> struct foo {
> 	uint8_t a;
> 	uint8_t b;
> 	uint16_t c;
> 	uint32_t d;
> };
> 
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.
You know the virtio-pci config structures are padded, but not all of
them are.  For example, virtio_balloon_stat is not padded and indeed has
an __attribute__((__packed__)) in the spec.

For this reason I prefer to have the attribute everywhere.  So people
don't have to wonder why it's here and not there.

Paolo

Michael S. Tsirkin

2013-May-29 14:02 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini
wrote:> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
> > You expect a compiler to pad this structure:
> > 
> > struct foo {
> > 	uint8_t a;
> > 	uint8_t b;
> > 	uint16_t c;
> > 	uint32_t d;
> > };
> > 
> > I'm guessing any compiler that decides to waste memory in this way
> > will quickly get dropped by users and then we won't worry
> > about building QEMU with it.
> 
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.
> 
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
> 
> Paolo
FWIW I think it was a mistake to lay the balloon structure out like that.
We should have padded it manually.
__attribute__((__packed__)) is really easy to misuse.
If you get into such a situation, just use offset enums ...

-- 
MST

Anthony Liguori

2013-May-29 14:16 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

"Michael S. Tsirkin" <mst at redhat.com> writes:
> On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
>> 1) C makes no guarantees about structure layout beyond the first
>>    member.  Yes, if it's naturally aligned or has a packed
attribute,
>>    GCC does the right thing.  But this isn't kernel land anymore,
>>    portability matters and there are more compilers than GCC.
>
> You expect a compiler to pad this structure:
>
> struct foo {
> 	uint8_t a;
> 	uint8_t b;
> 	uint16_t c;
> 	uint32_t d;
> };
>
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.
There are other responses in the thread here and I don't really care to
bikeshed on this issue.
>> Well, given that virtio is widely deployed today, I would think the 1.0
>> standard should strictly reflect what's deployed today, no?
>> Any new config layout would be 2.0 material, right?
>
> Not as it's currently planned. Devices can choose
> to support a legacy layout in addition to the new one,
> and if you look at the patch you will see that that
> is exactly what it does.
Adding a new BAR most certainly requires bumping the revision ID or
changing the device ID, no?

Didn't we run into this problem with the virtio-win drivers with just
the BAR size changing? 
>> Re: the new config layout, I don't think we would want to use it
for
>> anything but new devices.  Forcing a guest driver change
>
> There's no forcing.
> If you look at the patches closely, you will see that
> we still support the old layout on BAR0.
>
>
>> is a really big
>> deal and I see no reason to do that unless there's a compelling
reason
>> to.
>
> There are many a compelling reasons, and they are well known
> limitations of virtio PCI:
>
> - PCI spec compliance (madates device operation with IO memory
> disabled).
PCI express spec.  We are fully compliant with the PCI spec.  And what's
the user visible advantage of pointing an emulated virtio device behind
a PCI-e bus verses a legacy PCI bus?

This is a very good example because if we have to disable BAR0, then
it's an ABI breaker plan and simple.
> - support 64 bit addressing
We currently support 44-bit addressing for the ring.  While I agree we
need to bump it, there's no immediate problem with 44-bit addressing.
> - add more than 32 feature bits.
> - individually disable queues.
> - sanely support cross-endian systems.
> - support very small (<1 PAGE) for virtio rings.
> - support a separate page for each vq kick.
> - make each device place config at flexible offset.
None of these things are holding us back today.

I'm not saying we shouldn't introduce a new device.  But adoption of
that device will be slow and realistically will be limited to new
devices only.

We'll be supporting both devices for a very, very long time.

Compatibility is the fundamental value that we provide.  We need to go
out of our way to make sure that existing guests work and work as well
as possible.

Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
a compelling reason to break existing guests.

Regards,

Anthony Liguori

> Addressing any one of these would cause us to add a substantially new
> way to operate virtio devices.
>
> And since it's a guest change anyway, it seemed like a
> good time to do the new layout and fix everything in one go.
>
> And they are needed like yesterday.
>
>
>> So we're stuck with the 1.0 config layout for a very long time.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Absolutely. This patch let us support both which will allow for
> a gradual transition over the next 10 years or so.
>
>> > reason.  I suggest that's 2.0 material...
>> >
>> > Cheers,
>> > Rusty.
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe
kvm" in
>> > the body of a message to majordomo at vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anthony Liguori

2013-May-29 14:18 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

Paolo Bonzini <pbonzini at redhat.com> writes:
> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
>> You expect a compiler to pad this structure:
>> 
>> struct foo {
>> 	uint8_t a;
>> 	uint8_t b;
>> 	uint16_t c;
>> 	uint32_t d;
>> };
>> 
>> I'm guessing any compiler that decides to waste memory in this way
>> will quickly get dropped by users and then we won't worry
>> about building QEMU with it.
>
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.
Not that these structures are actually used for something.

We store the config in these structures so they are actually used for
something.

The proposed structures only serve as a way to express offsets.  You
would never actually have a variable of this type.

Regards,

Anthony Liguori
>
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
>
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin

2013-May-29 14:30 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori
wrote:> "Michael S. Tsirkin" <mst at redhat.com> writes:
> 
> > On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
> >> 1) C makes no guarantees about structure layout beyond the first
> >>    member.  Yes, if it's naturally aligned or has a packed
attribute,
> >>    GCC does the right thing.  But this isn't kernel land
anymore,
> >>    portability matters and there are more compilers than GCC.
> >
> > You expect a compiler to pad this structure:
> >
> > struct foo {
> > 	uint8_t a;
> > 	uint8_t b;
> > 	uint16_t c;
> > 	uint32_t d;
> > };
> >
> > I'm guessing any compiler that decides to waste memory in this way
> > will quickly get dropped by users and then we won't worry
> > about building QEMU with it.
> 
> There are other responses in the thread here and I don't really care to
> bikeshed on this issue.
Great. Let's make the bikeshed blue then?
> >> Well, given that virtio is widely deployed today, I would think
the 1.0
> >> standard should strictly reflect what's deployed today, no?
> >> Any new config layout would be 2.0 material, right?
> >
> > Not as it's currently planned. Devices can choose
> > to support a legacy layout in addition to the new one,
> > and if you look at the patch you will see that that
> > is exactly what it does.
> 
> Adding a new BAR most certainly requires bumping the revision ID or
> changing the device ID, no?
No, why would it?

If a device dropped BAR0, that would be a good reason
to bump revision ID.
We don't do this yet.
> Didn't we run into this problem with the virtio-win drivers with just
> the BAR size changing? 
Because they had a bug: they validated BAR0 size. AFAIK they don't care
what happens with other bars.
> >> Re: the new config layout, I don't think we would want to use
it for
> >> anything but new devices.  Forcing a guest driver change
> >
> > There's no forcing.
> > If you look at the patches closely, you will see that
> > we still support the old layout on BAR0.
> >
> >
> >> is a really big
> >> deal and I see no reason to do that unless there's a
compelling reason
> >> to.
> >
> > There are many a compelling reasons, and they are well known
> > limitations of virtio PCI:
> >
> > - PCI spec compliance (madates device operation with IO memory
> > disabled).
> 
> PCI express spec.  We are fully compliant with the PCI spec.  And
what's
> the user visible advantage of pointing an emulated virtio device behind
> a PCI-e bus verses a legacy PCI bus?
Native hotplug support.
> This is a very good example because if we have to disable BAR0, then
> it's an ABI breaker plan and simple.
Not we. The BIOS can disable IO BAR: it can do this already
but the device won't be functional.
> > - support 64 bit addressing
> 
> We currently support 44-bit addressing for the ring.  While I agree we
> need to bump it, there's no immediate problem with 44-bit addressing.
I heard developers (though not users) complaining.
> > - add more than 32 feature bits.
> > - individually disable queues.
> > - sanely support cross-endian systems.
> > - support very small (<1 PAGE) for virtio rings.
> > - support a separate page for each vq kick.
> > - make each device place config at flexible offset.
> 
> None of these things are holding us back today.
All of them do, to bigger or lesser degree.

> I'm not saying we shouldn't introduce a new device.  But adoption
of
> that device will be slow and realistically will be limited to new
> devices only.
> 
> We'll be supporting both devices for a very, very long time.
This is true for any new feature. What are you trying to say here?
We won't add new features to old config: for once, we have
run out of feature bits.
> Compatibility is the fundamental value that we provide.  We need to go
> out of our way to make sure that existing guests work and work as well
> as possible.
What are you trying to say?
There's nothing here that breaks compatibility.
Have you looked at the patch?

I'm wasting my time arguing on the mailing list, but
once I tear myself away from this occupation,
I intend to verify that I can run an old guest on
qemu with this patch without issues.
> Sticking virtio devices behind a PCI-e bus just for the hell of it
isn't
> a compelling reason to break existing guests.
> 
> Regards,
> 
> Anthony Liguori
That's why my patch does not break existing guests.

> 
> > Addressing any one of these would cause us to add a substantially new
> > way to operate virtio devices.
> >
> > And since it's a guest change anyway, it seemed like a
> > good time to do the new layout and fix everything in one go.
> >
> > And they are needed like yesterday.
> >
> >
> >> So we're stuck with the 1.0 config layout for a very long
time.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Absolutely. This patch let us support both which will allow for
> > a gradual transition over the next 10 years or so.
> >
> >> > reason.  I suggest that's 2.0 material...
> >> >
> >> > Cheers,
> >> > Rusty.
> >> >
> >> > --
> >> > To unsubscribe from this list: send the line
"unsubscribe kvm" in
> >> > the body of a message to majordomo at vger.kernel.org
> >> > More majordomo info at 
http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe
kvm" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin

2013-May-30 07:43 UTC

head link

[PATCH RFC] virtio-pci: new config layout: using memory BAR

On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini
wrote:> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
> > You expect a compiler to pad this structure:
> > 
> > struct foo {
> > 	uint8_t a;
> > 	uint8_t b;
> > 	uint16_t c;
> > 	uint32_t d;
> > };
> > 
> > I'm guessing any compiler that decides to waste memory in this way
> > will quickly get dropped by users and then we won't worry
> > about building QEMU with it.
> 
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.
> 
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
> 
> Paolo
BTW we don't even do this consistently everywhere in QEMU.

It would have been better to have a rule to avoid packed
as much as possible, then we'd have found the misaligned field bug
in balloon before it's too late.

That would be a good rule to adopt I think:
any pack if something is misaligned, and document
the reason.

-- 
MST

Virtualization - May 2013 - [PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR

[PATCH RFC] virtio-pci: new config layout: using memory BAR