Michael S. Tsirkin
2013-Jun-04 06:42 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
> > On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
> >> >> Rusty Russell <rusty at rustcorp.com.au> writes:
> >> >>
> >> >> > Anthony Liguori <aliguori at us.ibm.com> writes:
> >> >> >> Forcing a guest driver change is a really big
> >> >> >> deal and I see no reason to do that unless there's a compelling reason
> >> >> >> to.
> >> >> >>
> >> >> >> So we're stuck with the 1.0 config layout for a very long time.
> >> >> >
> >> >> > We definitely must not force a guest change.  The explicit aim of the
> >> >> > standard is that "legacy" and 1.0 be backward compatible.  One
> >> >> > deliverable is a document detailing how this is done (effectively a
> >> >> > summary of changes between what we have and 1.0).
> >> >>
> >> >> If 2.0 is fully backwards compatible, great.  It seems like such a
> >> >> difference that that would be impossible, but I need to investigate
> >> >> further.
> >> >>
> >> >> Regards,
> >> >>
> >> >> Anthony Liguori
> >> >
> >> > If you look at my patches you'll see how it works.
> >> > Basically old guests use BAR0, new ones don't, so
> >> > it's easy: BAR0 access means legacy guest.
> >> > Only started testing, but things seem to work
> >> > fine with old guests so far.
> >> >
> >> > I think we need a spec, not just driver code.
> >> >
> >> > Rusty, what's the plan? Want me to write it?
> >>
> >> We need both, of course, but the spec work will happen in the OASIS WG.
> >> A draft is good, but let's not commit anything to upstream QEMU until we
> >> get the spec finalized.  And that is proposed to be late this year.
> >
> > Well, that would be quite sad, really.
> >
> > This means we can't make virtio a spec-compliant PCI express device,
> > and we can't add any more feature bits, so no
> > flexible buffer optimizations for virtio net.
> >
> > There are probably more projects that will be blocked.
> >
> > So how about we keep extending the legacy layout for a bit longer:
> > - add a way to access the device with MMIO
> > - use feature bit 31 to signal 64-bit features
> >   (and shift the device config accordingly)
>
> By my count, net still has 7 feature bits left, so I don't think the
> feature bits are likely to be a limitation in the next 6 months?

Yes, but you wanted a generic transport feature bit for flexible SG
layout.  Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

> MMIO is a bigger problem.  Linux guests are happy with it: does it break
> the Windows drivers?
>
> Thanks,
> Rusty.

You mean make BAR0 an MMIO BAR?
Yes, it would break current windows guests.
Further, as long as we use the same address to notify all queues,
we would also need to decode the instruction on x86, and that's
measurably slower than PIO.
We could go back to discussing hypercall use for notifications,
but that has its own set of issues...

-- 
MST
Rusty Russell
2013-Jun-05 07:19 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" <mst at redhat.com> writes:
>> By my count, net still has 7 feature bits left, so I don't think the
>> feature bits are likely to be a limitation in the next 6 months?
>
> Yes, but you wanted a generic transport feature bit
> for flexible SG layout.
> Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

If we need it within 6 months, I'd rather do that: this feature would
then be assumed for 1.0, and reserved.  We may do that for other
features, too (the committee will have to review).

>> MMIO is a bigger problem.  Linux guests are happy with it: does it break
>> the Windows drivers?
>>
>> Thanks,
>> Rusty.
>
> You mean make BAR0 an MMIO BAR?
> Yes, it would break current windows guests.
> Further, as long as we use the same address to notify all queues,
> we would also need to decode the instruction on x86, and that's
> measurably slower than PIO.
> We could go back to discussing hypercall use for notifications,
> but that has its own set of issues...

We might have something ready to deploy in 3 months, but realistically,
I'd rather have it ready and tested outside the main git tree(s) and
push it once the standard is committed.

Cheers,
Rusty.
Michael S. Tsirkin
2013-Jun-05 10:22 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 04:49:22PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >> By my count, net still has 7 feature bits left, so I don't think the
> >> feature bits are likely to be a limitation in the next 6 months?
> >
> > Yes, but you wanted a generic transport feature bit
> > for flexible SG layout.
> > Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?
>
> If we need it within 6 months, I'd rather do that: this feature would
> then be assumed for 1.0, and reserved.  We may do that for other
> features, too (the committee will have to review).
>
> >> MMIO is a bigger problem.  Linux guests are happy with it: does it break
> >> the Windows drivers?
> >>
> >> Thanks,
> >> Rusty.
> >
> > You mean make BAR0 an MMIO BAR?
> > Yes, it would break current windows guests.
> > Further, as long as we use the same address to notify all queues,
> > we would also need to decode the instruction on x86, and that's
> > measurably slower than PIO.
> > We could go back to discussing hypercall use for notifications,
> > but that has its own set of issues...
>
> We might have something ready to deploy in 3 months, but realistically,
> I'd rather have it ready and tested outside the main git tree(s) and
> push it once the standard is committed.
>
> Cheers,
> Rusty.

We have working code already.  We don't need 3 months out of tree;
that gets us no progress.  To make it ready to deploy we need it
upstream, with people testing it.

I think the issue is the layout change: you don't want the new config
layout before we get standardization rolling, right?

Okay, how about this small patch to the linux guest (completely
untested, just to give you the idea):

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a7ce730..0e34862 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -675,6 +675,33 @@ static void virtio_pci_release_dev(struct device *_d)
 	 */
 }
 
+/* Map a BAR.  But carefully: make sure we don't overlap the MSI-X table. */
+static void __iomem *virtio_pci_iomap(struct pci_dev *pci_dev, int bar)
+{
+	int msix_cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+	if (msix_cap) {
+		u32 offset;
+		u8 bir;
+
+		pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_TABLE,
+				      &offset);
+		bir = (u8)(offset & PCI_MSIX_TABLE_BIR);
+		offset &= PCI_MSIX_TABLE_OFFSET;
+		/* Spec says the table offset is in a 4K page all by itself. */
+		if (bir == bar && offset < 4096)
+			return NULL;
+
+		pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_PBA,
+				      &offset);
+		bir = (u8)(offset & PCI_MSIX_PBA_BIR);
+		offset &= PCI_MSIX_PBA_OFFSET;
+		/* Ditto for the PBA offset. */
+		if (bir == bar && offset < 4096)
+			return NULL;
+	}
+	/* 4K is enough for all devices at the moment. */
+	return pci_iomap(pci_dev, bar, 4096);
+}
+
 /* the PCI probing function */
 static int virtio_pci_probe(struct pci_dev *pci_dev,
 			    const struct pci_device_id *id)
@@ -716,7 +743,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto out_enable_device;
 
-	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
+	vp_dev->ioaddr = virtio_pci_iomap(pci_dev, 0);
+	/* Failed to map BAR0? Try with BAR1. */
+	if (vp_dev->ioaddr == NULL)
+		vp_dev->ioaddr = virtio_pci_iomap(pci_dev, 1);
 	if (vp_dev->ioaddr == NULL) {
 		err = -ENOMEM;
 		goto out_req_regions;

In other words: put a copy of the IO config at the start of MMIO BAR1,
making sure we don't overlap the MSI-X table that's there.  This does
not break windows guests, and makes us compliant with the PCI express
spec.  Is this small enough to start going immediately, without waiting
3+ months?

I really think it's a good idea to put something like this in the
field: we might discover more issues around MMIO, and we'll address
them in the new config layout.  This way we don't get two changes, new
layout and switch to MMIO, all at once.

If you like this, I can make the appropriate spec and qemu changes in
a couple of days.

-- 
MST
Anthony Liguori
2013-Jun-05 12:59 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" <mst at redhat.com> writes:
> On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> You mean make BAR0 an MMIO BAR?
> Yes, it would break current windows guests.
> Further, as long as we use the same address to notify all queues,
> we would also need to decode the instruction on x86, and that's
> measurably slower than PIO.
> We could go back to discussing hypercall use for notifications,
> but that has its own set of issues...

So... does violating the PCI-e spec really matter?  Is it preventing
any guest from working properly?

I don't think we should rush an ABI breakage if the only benefit is
claiming spec compliance.

Regards,

Anthony Liguori

> -- 
> MST
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin
2013-Jun-05 14:09 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
>
> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> > You mean make BAR0 an MMIO BAR?
> > Yes, it would break current windows guests.
> > Further, as long as we use the same address to notify all queues,
> > we would also need to decode the instruction on x86, and that's
> > measurably slower than PIO.
> > We could go back to discussing hypercall use for notifications,
> > but that has its own set of issues...
>
> So... does violating the PCI-e spec really matter?  Is it preventing
> any guest from working properly?

Yes, absolutely: this wording is not in the spec without reason.
Existing guests allocate IO space for PCI express ports in multiples
of 4K.  Since each express device is behind such a port, this means
at most 15 such devices can use IO ports in a system.

That's why, to make a PCI express virtio device, we must allow MMIO
and/or some other communication mechanism, as the spec requires.

That's on x86.

Besides x86, there are architectures where IO is unavailable or very
slow.

> I don't think we should rush an ABI breakage if the only benefit is
> claiming spec compliance.
>
> Regards,
>
> Anthony Liguori

Why do you bring this up?  No one advocates any ABI breakage,
I only suggest extensions.