Michael S. Tsirkin
2013-Jun-05 16:20 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
>
> > On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
> >> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >>
> >> > On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
> >> >> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >> >>
> >> >> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> >> >> > You mean make BAR0 an MMIO BAR?
> >> >> > Yes, it would break current windows guests.
> >> >> > Further, as long as we use the same address to notify all queues,
> >> >> > we would also need to decode the instruction on x86, and that's
> >> >> > measurably slower than PIO.
> >> >> > We could go back to discussing hypercall use for notifications,
> >> >> > but that has its own set of issues...
> >> >>
> >> >> So... does violating the PCI-e spec really matter?  Is it preventing
> >> >> any guest from working properly?
> >> >
> >> > Yes, absolutely, this wording in the spec is not there without reason.
> >> >
> >> > Existing guests allocate IO space for PCI express ports in
> >> > multiples of 4K.
> >> >
> >> > Since each express device is behind such a port, this means
> >> > at most 15 such devices can use IO ports in a system.
> >> >
> >> > That's why to make a PCI express virtio device,
> >> > we must allow MMIO and/or some other communication
> >> > mechanism, as the spec requires.
> >>
> >> This is precisely why this is an ABI breaker.
> >>
> >> If you disable IO bars in the BIOS, then the interface that the OS sees
> >> will *not have an IO bar*.
> >>
> >> This *breaks existing guests*.
> >>
> >> Any time the programming interface changes on a PCI device, the
> >> revision ID and/or device ID must change.  The spec is very clear about
> >> this.
> >>
> >> We cannot disable the IO BAR without changing revision ID/device ID.
> >
> > But it's a bios/PC issue.  It's not a device issue.
> >
> > Anyway, let's put express aside.
> >
> > It's easy to create non-working setups with pci, today:
> >
> > - create 16 pci bridges
> > - put one virtio device behind each
> >
> > boom
> >
> > Try it.
> >
> > I want to fix that.
> >
> >> > That's on x86.
> >> >
> >> > Besides x86, there are architectures where IO is unavailable or very slow.
> >> >
> >> >> I don't think we should rush an ABI breakage if the only benefit is
> >> >> claiming spec compliance.
> >> >>
> >> >> Regards,
> >> >>
> >> >> Anthony Liguori
> >> >
> >> > Why do you bring this up?  No one advocates any ABI breakage,
> >> > I only suggest extensions.
> >>
> >> It's an ABI breakage.  You're claiming that the guests you tested
> >> handle the breakage reasonably, but it is unquestionably an ABI breakage.
> >>
> >> Regards,
> >>
> >> Anthony Liguori
> >
> > Adding a BAR is not an ABI breakage, do we agree on that?
> >
> > Disabling IO would be, but I am not proposing disabling IO.
> >
> > Guests might disable IO.
>
> Look, it's very simple.
>
> If the failure in the guest is that the BAR0 mapping fails because the
> device is enabled but the BAR is disabled,

There's no such thing as "the device is enabled but the BAR is disabled"
in the PCI spec.  The spec says IO and memory can be enabled/disabled,
separately.  The PCI Express spec says devices should work without IO.

So modern guests will assume it's OK to work without IO,
and it will become more common in the future.

> then you've broken the ABI.

No.  It means that the ABI is broken.
Guests can disable IO *today* and when they do, things don't work.

> And what's worse is that this isn't for an obscure scenario (like having
> 15 PCI bridges) but for something that would become the standard
> scenario (using a PCI-e bus).
>
> We need to either bump the revision ID or the device ID if we do this.
>
> Regards,
>
> Anthony Liguori

We only need to do it if we make a change that breaks guests.

Please find a guest that is broken by the patches.  You won't find any.

-- 
MST
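To make the 4K sizing argument above concrete, here is a minimal C sketch of
the arithmetic; the constants are editorial assumptions spelled out in the
comments, not taken from any header:

/* Why express ports exhaust x86 I/O port space: x86 exposes 64K of
 * I/O ports, and a PCI(-e) bridge's I/O window is allocated in
 * 4K-aligned 4K multiples, so each express port consumes at least 4K. */
#include <stdio.h>

int main(void)
{
    const unsigned io_ports   = 64 * 1024;  /* ports 0x0000..0xffff */
    const unsigned per_bridge =  4 * 1024;  /* minimum bridge I/O window */

    /* 16 windows fit arithmetically, but the low 4K (0x0000..0x0fff)
     * is effectively owned by legacy/chipset devices, leaving about 15
     * usable windows, hence "at most 15 such devices". */
    printf("max 4K windows: %u\n", io_ports / per_bridge);
    return 0;
}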
Anthony Liguori
2013-Jun-05 18:57 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" <mst at redhat.com> writes:> On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: >> Look, it's very simple. > We only need to do it if we do a change that breaks guests. > > Please find a guest that is broken by the patches. You won't find any.I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: "Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large)." In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. I think 1 == 2 == 3 and I view 2 as an ABI breaker. libvirt does like policy so they're going to make a simple decision and always use the same bus by default. I suspect if we made PCI the default, they might just always set the PCI-e flag just because. There are hundreds of thousands if not millions of guests with existing virtio-pci drivers. Forcing them to upgrade better have an extremely good justification. 
I think 4 is the best path forward.  It's better for users (guests
continue to work as they always have).  There's less confusion about
enabling PCI-e support--you must ask for the virtio-pcie variant and you
must have a virtio-pcie driver.  It's easy to explain.

It also maps to what regular hardware does.  I highly doubt that there
are any real PCI cards that made the shift from PCI to PCI-e without
bumping at least a revision ID.

It also means we don't need to play games about sometimes enabling IO
bars and sometimes not.

Regards,

Anthony Liguori
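The spec passage quoted above, about moving large tables out of config
space, is exactly what virtio-net's control virtqueue does for receive
filtering.  A minimal sketch of building such a filter command follows;
the struct and constant names mirror Linux's <linux/virtio_net.h>, while
build_mac_filter_cmd() and the exact buffer layout are illustrative
assumptions, not a complete driver:

#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Mirrors struct virtio_net_ctrl_hdr in <linux/virtio_net.h>. */
struct virtio_net_ctrl_hdr {
    uint8_t class;   /* VIRTIO_NET_CTRL_MAC */
    uint8_t cmd;     /* VIRTIO_NET_CTRL_MAC_TABLE_SET */
};

#define VIRTIO_NET_CTRL_MAC            1
#define VIRTIO_NET_CTRL_MAC_TABLE_SET  0

/* Build one command buffer: header, then a counted table of unicast
 * MACs.  (The real command carries a second table for multicast, and
 * the count is little-endian on the wire; both simplified here.) */
static size_t build_mac_filter_cmd(uint8_t *buf,
                                   const uint8_t (*macs)[ETH_ALEN],
                                   uint32_t n)
{
    struct virtio_net_ctrl_hdr hdr = {
        .class = VIRTIO_NET_CTRL_MAC,
        .cmd   = VIRTIO_NET_CTRL_MAC_TABLE_SET,
    };
    size_t off = 0;

    memcpy(buf + off, &hdr, sizeof(hdr));
    off += sizeof(hdr);
    memcpy(buf + off, &n, sizeof(n));
    off += sizeof(n);
    memcpy(buf + off, macs, (size_t)n * ETH_ALEN);
    off += (size_t)n * ETH_ALEN;

    /* The driver would now post 'buf' on the control virtqueue and
     * read back a one-byte VIRTIO_NET_OK/ERR status. */
    return off;
}

The point of the pattern: the filter table can grow arbitrarily without
consuming any of the scarce BAR/config-space footprint being argued over
in this thread.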
Michael S. Tsirkin
2013-Jun-05 19:43 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
>
> > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
> >> Look, it's very simple.
> >
> > We only need to do it if we make a change that breaks guests.
> >
> > Please find a guest that is broken by the patches.  You won't find any.
>
> I think the problem in this whole discussion is that we're talking past
> each other.
>
> Here is my understanding:
>
> 1) PCI-e says that you must be able to disable IO bars and still have a
> functioning device.
>
> 2) It says (1) because you must size IO bars to 4096, which means that
> practically speaking, once you enable a dozen or so PIO bars, you run
> out of PIO space (16 * 4k == 64k and not all that space can be used).

Let me add 3 other issues which I mentioned and you seem to miss:

3) Architectures which don't have fast access to IO ports exist;
virtio does not work there ATM.

4) Setups with many PCI bridges exist and have the same issue as PCI
express; virtio does not work there ATM.

5) On x86, even with nested page tables, the hardware only decodes the
page address on an invalid PTE, not the data.  You need to emulate the
guest to get at the data.  Without nested page tables, we have to do a
page table walk and emulate to get both address and data.  Since this
is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of >2; did not test without).
> [...]
>
> So we have a few options:
>
> 1) Punt all of this complexity to libvirt et al and watch people make
> the wrong decisions about when to use PCI-e.  This will become yet
> another example of KVM being too hard to configure.
>
> 2) Enable PCI-e by default and just force people to upgrade their
> drivers.
>
> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
>
> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),

We can't do this - it will hurt performance.

> give
> it a new device/vendor ID.  Continue to use virtio-pci for existing
> devices, potentially adding virtio-{net,blk,...}-pcie variants for
> people that care to use them.
>
> I think 1 == 2 == 3 and I view 2 as an ABI breaker.

Why do you think 2 == 3?  2 changes default behaviour.  3 does not.

> libvirt does like
> policy so they're going to make a simple decision and always use the
> same bus by default.  I suspect if we made PCI the default, they might
> just always set the PCI-e flag just because.

This sounds very strange.  But let's assume you are right for the sake
of the argument ...

> There are hundreds of thousands if not millions of guests with existing
> virtio-pci drivers.  Forcing them to upgrade better have an extremely
> good justification.
>
> I think 4 is the best path forward.  It's better for users (guests
> continue to work as they always have).  There's less confusion about
> enabling PCI-e support--you must ask for the virtio-pcie variant and you
> must have a virtio-pcie driver.  It's easy to explain.

I don't see how this changes the situation.  libvirt still needs to set
policy and decide which device to use.

> It also maps to what regular hardware does.  I highly doubt that there
> are any real PCI cards that made the shift from PCI to PCI-e without
> bumping at least a revision ID.

Only because the chance that it's 100% compatible on the software level
is 0.  It always has some hardware specific quirks.  No such excuse here.

> It also means we don't need to play games about sometimes enabling IO
> bars and sometimes not.

This last paragraph is wrong; it ignores the issues 3) to 5) I added
above.  If you do take them into account:

- there are reasons to add an MMIO BAR to PCI, even without PCI express
- we won't be able to drop the IO BAR from virtio

> Regards,
>
> Anthony Liguori
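The asymmetry behind point 5 above is visible from a userspace VMM's
vcpu run loop.  Here is a C sketch; the struct kvm_run fields and exit
constants come from <linux/kvm.h>, while handle_port_io() and
handle_mmio() are hypothetical helpers standing in for device emulation:

#include <linux/kvm.h>
#include <stdint.h>

/* Hypothetical device-emulation hooks, for illustration only. */
void handle_port_io(uint16_t port, int is_write, void *data,
                    int size, int count);
void handle_mmio(uint64_t gpa, int is_write, uint8_t *data, int len);

void vcpu_handle_exit(struct kvm_run *run)
{
    switch (run->exit_reason) {
    case KVM_EXIT_IO:
        /* On a PIO exit the VM exit itself supplies port, direction,
         * size and data (in the run page at io.data_offset); no
         * instruction decoding was needed to produce any of this. */
        handle_port_io(run->io.port,
                       run->io.direction == KVM_EXIT_IO_OUT,
                       (uint8_t *)run + run->io.data_offset,
                       run->io.size, run->io.count);
        break;
    case KVM_EXIT_MMIO:
        /* On an MMIO exit under EPT/NPT, the fault gave KVM only the
         * guest physical address; data/len/is_write exist here only
         * because the kernel already fetched and x86-decoded the
         * faulting instruction.  That decode is the extra cost MST
         * measures as MMIO being >2x slower than PIO. */
        handle_mmio(run->mmio.phys_addr, run->mmio.is_write,
                    run->mmio.data, run->mmio.len);
        break;
    }
}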
Michael S. Tsirkin
2013-Jun-05 19:54 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
>
> [...]
>
> So we have a few options:
>
> 1) Punt all of this complexity to libvirt et al and watch people make
> the wrong decisions about when to use PCI-e.  This will become yet
> another example of KVM being too hard to configure.
>
> 2) Enable PCI-e by default and just force people to upgrade their
> drivers.
>
> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
>
> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
> give it a new device/vendor ID.  Continue to use virtio-pci for existing
> devices, potentially adding virtio-{net,blk,...}-pcie variants for
> people that care to use them.

For the record, with respect to the PCI-e discussion, I have no problem
with the idea of changing the device ID or revision ID and asking guests
to upgrade if they want to use a pcie device.
That's not exactly 4, however.  I see no reason to couple the PCI-e
discussion with the MMIO discussion; PCI-e is just one of the reasons
to support MMIO.

> [...]
>
> Regards,
>
> Anthony Liguori
H. Peter Anvin
2013-Jun-05 21:10 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
>
> Spec says IO and memory can be enabled/disabled, separately.
> PCI Express spec says devices should work without IO.
>

For "native endpoints".  Currently virtio would be a "legacy endpoint",
which is quite correct -- it is compatible with a legacy interface.

	-hpa
Michael S. Tsirkin
2013-Jun-05 21:17 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 02:10:03PM -0700, H. Peter Anvin wrote:
> On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
> >
> > Spec says IO and memory can be enabled/disabled, separately.
> > PCI Express spec says devices should work without IO.
> >
>
> For "native endpoints".  Currently virtio would be a "legacy endpoint",
> which is quite correct -- it is compatible with a legacy interface.
>
> 	-hpa

Yes.  But it's not the spec wording that we care for most.
It's supporting setups that have trouble using IO.

-- 
MST
Anthony Liguori
2013-Jun-05 21:50 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"H. Peter Anvin" <hpa at zytor.com> writes:> On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: >> >> Spec says IO and memory can be enabled/disabled, separately. >> PCI Express spec says devices should work without IO. >> > > For "native endpoints". Currently virtio would be a "legacy endpoint" > which is quite correct -- it is compatible with a legacy interface.Do legacy endpoints also use 4k for BARs? If not, can't we use a new device id for native endpoints and call it a day? Legacy endpoints would continue using the existing BAR layout. Regards, Anthony Liguori> > -hpa > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Rusty Russell
2013-Jun-06 03:42 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori <aliguori at us.ibm.com> writes:

> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
> give it a new device/vendor ID.  Continue to use virtio-pci for existing
> devices, potentially adding virtio-{net,blk,...}-pcie variants for
> people that care to use them.

Now you have a different compatibility problem: how do you know the
guest supports the new virtio-pcie net?

If you put a virtio-pci card behind a PCI-e bridge today, it's not
compliant, but AFAICT it will Just Work.  (Modulo the 16-device limit.)

I've been assuming we'd avoid a "flag day" change; that devices would
look like existing virtio-pci, with capabilities indicating the new
config layout.

> I think 4 is the best path forward.  It's better for users (guests
> continue to work as they always have).  There's less confusion about
> enabling PCI-e support--you must ask for the virtio-pcie variant and you
> must have a virtio-pcie driver.  It's easy to explain.

Removing both forward and backward compatibility is easy to explain,
but I think it'll be harder to deploy.  This is your area though, so
perhaps I'm wrong.

> It also maps to what regular hardware does.  I highly doubt that there
> are any real PCI cards that made the shift from PCI to PCI-e without
> bumping at least a revision ID.

No one expected the new cards to Just Work with old OSes: a new machine
meant a new OS and new drivers.  Hardware vendors like that.

Since virtualization often involves legacy, our priorities might be
different.

Cheers,
Rusty.
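One possible shape for the capability-based discovery Rusty describes is
a vendor-specific PCI capability that tells the driver which BAR and
offset hold each part of the new config layout, so legacy drivers keep
using BAR0 while new drivers follow the capability chain.  The C sketch
below uses the field names that eventually became virtio 1.0's
struct virtio_pci_cap; treat it as an illustration of the idea, not the
layout under discussion in this thread:

#include <stdint.h>

struct virtio_pci_cap {
    uint8_t  cap_vndr;    /* PCI_CAP_ID_VNDR (0x09) */
    uint8_t  cap_next;    /* link to next capability in the chain */
    uint8_t  cap_len;     /* length of this structure */
    uint8_t  cfg_type;    /* which config block this describes */
    uint8_t  bar;         /* BAR holding the block (may be MMIO) */
    uint8_t  padding[3];
    uint32_t offset;      /* offset of the block within the BAR */
    uint32_t length;      /* length of the block */
};

/* cfg_type values: one capability per block.  A device without any of
 * these still looks like plain legacy virtio-pci, which is what makes
 * the scheme backward compatible rather than a flag-day change. */
enum {
    VIRTIO_PCI_CAP_COMMON_CFG = 1,  /* common device/queue registers */
    VIRTIO_PCI_CAP_NOTIFY_CFG = 2,  /* queue notification area */
    VIRTIO_PCI_CAP_ISR_CFG    = 3,  /* interrupt status */
    VIRTIO_PCI_CAP_DEVICE_CFG = 4,  /* device-specific config */
};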