Michael S. Tsirkin
2013-Jun-04 06:42 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
> > On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
> >> >> Rusty Russell <rusty at rustcorp.com.au> writes:
> >> >>
> >> >> > Anthony Liguori <aliguori at us.ibm.com> writes:
> >> >> >> Forcing a guest driver change is a really big
> >> >> >> deal and I see no reason to do that unless there's a compelling reason
> >> >> >> to.
> >> >> >>
> >> >> >> So we're stuck with the 1.0 config layout for a very long time.
> >> >> >
> >> >> > We definitely must not force a guest change.  The explicit aim of the
> >> >> > standard is that "legacy" and 1.0 be backward compatible.  One
> >> >> > deliverable is a document detailing how this is done (effectively a
> >> >> > summary of changes between what we have and 1.0).
> >> >>
> >> >> If 2.0 is fully backwards compatible, great.  It seems like such a
> >> >> difference that that would be impossible but I need to investigate
> >> >> further.
> >> >>
> >> >> Regards,
> >> >>
> >> >> Anthony Liguori
> >> >
> >> > If you look at my patches you'll see how it works.
> >> > Basically old guests use BAR0 new ones don't, so
> >> > it's easy: BAR0 access means legacy guest.
> >> > Only started testing but things seem to work
> >> > fine with old guests so far.
> >> >
> >> > I think we need a spec, not just driver code.
> >> >
> >> > Rusty what's the plan? Want me to write it?
> >>
> >> We need both, of course, but the spec work will happen in the OASIS WG.
> >> A draft is good, but let's not commit anything to upstream QEMU until we
> >> get the spec finalized.  And that is proposed to be late this year.
> >
> > Well that would be quite sad really.
> >
> > This means we can't make virtio a spec compliant pci express device,
> > and we can't add any more feature bits, so no
> > flexible buffer optimizations for virtio net.
> >
> > There are probably more projects that will be blocked.
> >
> > So how about we keep extending legacy layout for a bit longer:
> > - add a way to access device with MMIO
> > - use feature bit 31 to signal 64 bit features
> >   (and shift device config accordingly)
>
> By my count, net still has 7 feature bits left, so I don't think the
> feature bits are likely to be a limitation in the next 6 months?

Yes but you wanted a generic transport feature bit
for flexible SG layout.
Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

> MMIO is a bigger problem.  Linux guests are happy with it: does it break
> the Windows drivers?
>
> Thanks,
> Rusty.

You mean make BAR0 an MMIO BAR?
Yes, it would break current windows guests.
Further, as long as we use same address to notify all queues,
we would also need to decode the instruction on x86 and that's
measurably slower than PIO.
We could go back to discussing hypercall use for notifications,
but that has its own set of issues...

--
MST
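To make the notification point concrete: in the legacy layout the guest kicks a virtqueue with a single port write in which the queue number is the data, not the address. A rough sketch of that path, paraphrased from drivers/virtio/virtio_pci.c of roughly this period (helper and field names are close to, but not guaranteed to match, any particular kernel version):

/*
 * Sketch of the legacy notify path (approximate; lives inside
 * drivers/virtio/virtio_pci.c, which defines struct virtio_pci_device
 * and to_vp_device()).
 */
static void vp_notify(struct virtqueue *vq)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);

	/* Write the queue's selector into the one notification register. */
	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}

On a PIO exit the hypervisor is handed the port, size, direction and value directly; with one shared notify address implemented as MMIO it would instead have to decode the faulting guest instruction to recover the written queue index, which is the slowdown being referred to here.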
Rusty Russell
2013-Jun-05 07:19 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" <mst at redhat.com> writes:>> By my count, net still has 7 feature bits left, so I don't think the >> feature bits are likely to be a limitation in the next 6 months? > > Yes but you wanted a generic transport feature bit > for flexible SG layout. > Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?If we need it within 6 months, I'd rather do that: this feature would then be assumed for 1.0, and reserved. We may do that for other features, too (the committee will have to review).>> MMIO is a bigger problem. Linux guests are happy with it: does it break >> the Windows drivers? >> >> Thanks, >> Rusty. > > You mean make BAR0 an MMIO BAR? > Yes, it would break current windows guests. > Further, as long as we use same address to notify all queues, > we would also need to decode the instruction on x86 and that's > measureably slower than PIO. > We could go back to discussing hypercall use for notifications, > but that has its own set of issues...We might have something ready to deploy in 3 months, but realistically, I'd rather have it ready and tested outside the main git tree(s) and push it once the standard is committed. Cheers, Rusty.
Michael S. Tsirkin
2013-Jun-05 10:22 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 04:49:22PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
> >> By my count, net still has 7 feature bits left, so I don't think the
> >> feature bits are likely to be a limitation in the next 6 months?
> >
> > Yes but you wanted a generic transport feature bit
> > for flexible SG layout.
> > Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?
>
> If we need it within 6 months, I'd rather do that: this feature would
> then be assumed for 1.0, and reserved.  We may do that for other
> features, too (the committee will have to review).
>
> >> MMIO is a bigger problem.  Linux guests are happy with it: does it break
> >> the Windows drivers?
> >>
> >> Thanks,
> >> Rusty.
> >
> > You mean make BAR0 an MMIO BAR?
> > Yes, it would break current windows guests.
> > Further, as long as we use same address to notify all queues,
> > we would also need to decode the instruction on x86 and that's
> > measurably slower than PIO.
> > We could go back to discussing hypercall use for notifications,
> > but that has its own set of issues...
>
> We might have something ready to deploy in 3 months, but realistically,
> I'd rather have it ready and tested outside the main git tree(s) and
> push it once the standard is committed.
>
> Cheers,
> Rusty.

We have working code already.  We don't need 3 months out of tree,
this gets us no progress.  To make it ready to deploy we need it
upstream and people testing it.

I think the issue is the layout change: you don't want the new config
layout before we get standardization rolling, right?

Okay, how about this small patch to the linux guest (completely
untested, just to give you the idea):

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a7ce730..0e34862 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -675,6 +675,33 @@ static void virtio_pci_release_dev(struct device *_d)
 	 */
 }
 
+/* Map a BAR.  But carefully: make sure we don't overlap the MSI-X table */
+static void __iomem *virtio_pci_iomap(struct pci_dev *pci_dev, int bar)
+{
+	int msix_cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+	if (msix_cap) {
+		u32 offset;
+		u8 bir;
+		pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_TABLE,
+				      &offset);
+		bir = (u8)(offset & PCI_MSIX_TABLE_BIR);
+		offset &= PCI_MSIX_TABLE_OFFSET;
+		/* Spec says table offset is in a 4K page all by itself */
+		if (bir == bar && offset < 4096)
+			return NULL;
+
+		pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_PBA,
+				      &offset);
+		bir = (u8)(offset & PCI_MSIX_PBA_BIR);
+		offset &= PCI_MSIX_PBA_OFFSET;
+		/* Spec says PBA offset is in a 4K page all by itself. */
+		if (bir == bar && offset < 4096)
+			return NULL;
+	}
+	/* 4K is enough for all devices at the moment. */
+	return pci_iomap(pci_dev, bar, 4096);
+}
+
 /* the PCI probing function */
 static int virtio_pci_probe(struct pci_dev *pci_dev,
 			    const struct pci_device_id *id)
@@ -716,7 +743,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto out_enable_device;
 
-	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
+	vp_dev->ioaddr = virtio_pci_iomap(pci_dev, 0);
+	/* Failed to map BAR0?  Try with BAR1. */
+	if (vp_dev->ioaddr == NULL)
+		vp_dev->ioaddr = virtio_pci_iomap(pci_dev, 1);
 	if (vp_dev->ioaddr == NULL) {
 		err = -ENOMEM;
 		goto out_req_regions;

In other words: put a copy of the IO config at the start of MMIO BAR1,
making sure we don't overlap the MSI-X table that's there.
Is this small enough to start going immediately, without waiting 3+
months?

I really think it's a good idea to put something like this in the
field: we might discover more issues around MMIO and we'll address them
in the new config layout.  This way we don't get two changes, new
layout and switch to MMIO, all at once.

If you like this, I can make the appropriate spec and qemu changes in a
couple of days.

--
MST
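For concreteness only, here is a hypothetical sketch of what the matching device-side change in QEMU might look like, assuming 1.5-era memory API signatures; the helper name and the mmio_bar/config_alias fields are invented for the example, and this is not the QEMU patch being offered above. The idea is simply to keep the legacy config as PIO BAR0 and mirror the same registers at offset 0 of a 4K memory BAR1, with the MSI-X table and PBA kept above offset 4K in that BAR:

/*
 * Hypothetical QEMU-side sketch (not the actual proposed patch).
 * Assumes QEMU 1.5-era memory API and invented VirtIOPCIProxy fields
 * (mmio_bar, config_alias); intended to live in hw/virtio/virtio-pci.c.
 */
static void virtio_pci_map_legacy_config(VirtIOPCIProxy *proxy,
                                         uint64_t config_size)
{
    /* Existing behaviour: legacy config and queue notify via I/O ports. */
    memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                          "virtio-pci", config_size);
    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
                     &proxy->bar);

    /* New: mirror the same registers at offset 0 of a 4K memory BAR1,
     * so a guest that cannot get an I/O window can still drive the
     * device.  The MSI-X table/PBA would have to live above 4K here. */
    memory_region_init(&proxy->mmio_bar, "virtio-pci-mmio", 4096);
    memory_region_init_alias(&proxy->config_alias, "virtio-pci-cfg",
                             &proxy->bar, 0, config_size);
    memory_region_add_subregion(&proxy->mmio_bar, 0, &proxy->config_alias);
    pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
                     &proxy->mmio_bar);
}

Whether an alias of the PIO region or a second MemoryRegionOps instance is used is an implementation detail; the guest-visible register layout is the same either way.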
Anthony Liguori
2013-Jun-05 12:59 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" <mst at redhat.com> writes:> On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: > You mean make BAR0 an MMIO BAR? > Yes, it would break current windows guests. > Further, as long as we use same address to notify all queues, > we would also need to decode the instruction on x86 and that's > measureably slower than PIO. > We could go back to discussing hypercall use for notifications, > but that has its own set of issues...So... does "violating the PCI-e" spec really matter? Is it preventing any guest from working properly? I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori> > -- > MST > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin
2013-Jun-05 14:09 UTC
[PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst at redhat.com> writes:
>
> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> > You mean make BAR0 an MMIO BAR?
> > Yes, it would break current windows guests.
> > Further, as long as we use same address to notify all queues,
> > we would also need to decode the instruction on x86 and that's
> > measurably slower than PIO.
> > We could go back to discussing hypercall use for notifications,
> > but that has its own set of issues...
>
> So... does "violating the PCI-e spec" really matter?  Is it preventing
> any guest from working properly?

Yes, absolutely, this wording in the spec is not there without reason.

Existing guests allocate IO space for PCI express ports in multiples
of 4K.  Since each express device is behind such a port, this means at
most 15 such devices can use IO ports in a system.  That's why to make
a PCI express virtio device, we must allow MMIO and/or some other
communication mechanism as the spec requires.

That's on x86.  Besides x86, there are architectures where IO is
unavailable or very slow.

> I don't think we should rush an ABI breakage if the only benefit is
> claiming spec compliance.
>
> Regards,
>
> Anthony Liguori

Why do you bring this up?  No one advocates any ABI breakage, I only
suggest extensions.

> >
> > --
> > MST
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
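A rough back-of-the-envelope for the limit described above, assuming standard x86 PCI behaviour: the x86 port I/O space is 64 KiB and bridge I/O windows are assigned with 4 KiB granularity, so there can be at most 64 KiB / 4 KiB = 16 non-overlapping windows; with part of that space already claimed by legacy and chipset devices, only around 15 express ports, and hence the 15 devices behind them, can be given an I/O window at all.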