Joby Poriyath
2013-Aug-30 16:12 UTC
[PATCH v7] interrupts: allow guest to set/clear MSI-X mask bit
Guest needs the ability to enable and disable MSI-X interrupts
by setting the MSI-X control bit, for a passed-through device.
Guest is allowed to write MSI-X mask bit only if Xen *thinks*
that mask is clear (interrupts enabled). If the mask is set by
Xen (interrupts disabled), writes to mask bit by the guest are
ignored.

Currently, a write to MSI-X mask bit by the guest is silently
ignored.

A likely scenario is where we have an 82599 SR-IOV NIC passed
through to a guest. From the guest if you do

  ifconfig <ETH_DEV> down
  ifconfig <ETH_DEV> up

the interrupts remain masked. On VF reset, the mask bit is set
by the controller. At this point, Xen is not aware that mask is set.
However, interrupts are enabled by VF driver by clearing the mask
bit by writing directly to BAR3 region containing the MSI-X table.

From dom0, we can verify that interrupts are being masked
using 'xl debug-keys M'.

Initially, guest was allowed to modify MSI-X bit.
Later this behaviour was changed.
See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
patch revision history
----------------------
v1: Initial patch to allow guest writes to MSI-X control bit
v2: retained the reserved bits while updating MSI-X control vector
    (only 1 bit is defined)
v3: Allow guest writes only when Xen view of MSI-X control bit is 0
v4: Added a warning if Xen thinks MSI-X control bit is masked,
    where in reality, it's not
v5 & v6: Added const-correctness
v7: Get msi_desc from the guest write 'address'

Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
---
 xen/arch/x86/hvm/vmsi.c |   73 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 0d5ef1b..6830cf2 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -187,6 +187,19 @@ static struct msixtbl_entry *msixtbl_find_entry(
     return NULL;
 }
 
+static struct msi_desc *virt_to_msi_desc(struct pci_dev *dev, void *virt)
+{
+    struct msi_desc *desc;
+
+    list_for_each_entry( desc, &dev->msi_list, list )
+        if ( desc->msi_attrib.type == PCI_CAP_ID_MSIX &&
+             desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET
+             == virt )
+            return desc;
+
+    return NULL;
+}
+
 static void __iomem *msixtbl_addr_to_virt(
     struct msixtbl_entry *entry, unsigned long addr)
 {
@@ -247,13 +260,16 @@ out:
 }
 
 static int msixtbl_write(struct vcpu *v, unsigned long address,
-                        unsigned long len, unsigned long val)
+                         unsigned long len, unsigned long val)
 {
     unsigned long offset;
     struct msixtbl_entry *entry;
+    struct msi_desc *m_desc;
     void *virt;
     unsigned int nr_entry, index;
     int r = X86EMUL_UNHANDLEABLE;
+    unsigned long flags, orig;
+    struct irq_desc *desc;
 
     if ( len != 4 || (address & 3) )
         return r;
@@ -283,22 +299,55 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     if ( !virt )
         goto out;
 
-    /* Do not allow the mask bit to be changed. */
-#if 0 /* XXX
-       * As the mask bit is the only defined bit in the word, and as the
-       * host MSI-X code doesn't preserve the other bits anyway, doing
-       * this is pointless. So for now just discard the write (also
-       * saving us from having to determine the matching irq_desc).
-       */
+    m_desc = virt_to_msi_desc(entry->pdev, virt);
+    if ( !m_desc || m_desc->irq < 0 )
+        goto out;
+
+    desc = irq_to_desc(m_desc->irq);
+    if ( !desc )
+        goto out;
+
     spin_lock_irqsave(&desc->lock, flags);
+
+    if ( !desc->msi_desc )
+        goto unlock;
+
     orig = readl(virt);
-    val &= ~PCI_MSIX_VECTOR_BITMASK;
-    val |= orig & PCI_MSIX_VECTOR_BITMASK;
+
+    /*
+     * Do not allow guest to modify MSI-X control bit if it is masked
+     * by Xen. We'll only handle the case where Xen thinks that
+     * bit is unmasked, but hardware has silently masked the bit
+     * (in case of SR-IOV VF reset, etc). On the other hand, if Xen
+     * thinks that the bit is masked, but it's really not,
+     * we log a warning.
+     */
+    if ( desc->msi_desc->msi_attrib.masked )
+    {
+        if ( !(orig & PCI_MSIX_VECTOR_BITMASK) )
+            printk(XENLOG_WARNING "MSI-X control bit is unmasked when"
+                   " it is expected to be masked [%04x:%02x:%02x.%01x]\n",
+                   entry->pdev->seg, entry->pdev->bus,
+                   PCI_SLOT(entry->pdev->devfn),
+                   PCI_FUNC(entry->pdev->devfn));
+
+        goto unlock;
+    }
+
+    /*
+     * The mask bit is the only defined bit in the word. But we
+     * ought to preserve the reserved bits. Clearing the reserved
+     * bits can result in undefined behaviour (see PCI Local Bus
+     * Specification revision 2.3).
+     */
+    val &= PCI_MSIX_VECTOR_BITMASK;
+    val |= (orig & ~PCI_MSIX_VECTOR_BITMASK);
     writel(val, virt);
-    spin_unlock_irqrestore(&desc->lock, flags);
-#endif
 
+unlock:
+    spin_unlock_irqrestore(&desc->lock, flags);
     r = X86EMUL_OKAY;
+
 out:
     rcu_read_unlock(&msixtbl_rcu_lock);
     return r;
-- 
1.7.10.4
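The write policy in the patch can be looked at in isolation from the locking and lookup code. What follows is only a minimal sketch, not the Xen code: VECTOR_CTRL_MASKBIT stands in for PCI_MSIX_VECTOR_BITMASK (assumed here to be bit 0 of the Vector Control word), and filter_guest_ctrl_write() and its parameters are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PCI_MSIX_VECTOR_BITMASK; assumed to be bit 0. */
#define VECTOR_CTRL_MASKBIT 0x1u

/*
 * Hypothetical helper mirroring the decision made in msixtbl_write():
 *   xen_masked - Xen's own view of the mask (msi_attrib.masked in the patch)
 *   orig       - current hardware contents of the Vector Control word
 *   guest_val  - value the guest tried to write
 * Returns true and stores the merged value in *out if the write should
 * reach the hardware, false if it has to be dropped.
 */
static bool filter_guest_ctrl_write(bool xen_masked, uint32_t orig,
                                    uint32_t guest_val, uint32_t *out)
{
    /* Xen masked the vector itself: the guest must not override that. */
    if ( xen_masked )
        return false;

    /* Take only the mask bit from the guest, keep the reserved bits. */
    *out = (guest_val & VECTOR_CTRL_MASKBIT) | (orig & ~VECTOR_CTRL_MASKBIT);
    return true;
}
```

With orig = 0xabcd0001 and a guest write of 0xffff0000, the sketch yields 0xabcd0000: the mask bit follows the guest, the upper bits follow the hardware.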
Jan Beulich
2013-Sep-04 08:36 UTC
Re: [PATCH v7] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 30.08.13 at 18:12, Joby Poriyath <joby.poriyath@citrix.com> wrote:
> +static struct msi_desc *virt_to_msi_desc(struct pci_dev *dev, void *virt)
> +{
> +    struct msi_desc *desc;
> +
> +    list_for_each_entry( desc, &dev->msi_list, list )
> +        if ( desc->msi_attrib.type == PCI_CAP_ID_MSIX &&
> +             desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET
> +             == virt )

To match the function's name I'd prefer this to hit on the full 16-byte
range rather than just the control word.

>  static int msixtbl_write(struct vcpu *v, unsigned long address,
> -                        unsigned long len, unsigned long val)
> +                         unsigned long len, unsigned long val)
>  {
>      unsigned long offset;
>      struct msixtbl_entry *entry;
> +    struct msi_desc *m_desc;

Please name this msi_desc, consistent with other variables of this
type in this file. And afaict this could (once again) be const.

> +    m_desc = virt_to_msi_desc(entry->pdev, virt);
> +    if ( !m_desc || m_desc->irq < 0 )
> +        goto out;
> +
> +    desc = irq_to_desc(m_desc->irq);
> +    if ( !desc )
> +        goto out;
> +
>      spin_lock_irqsave(&desc->lock, flags);
> +
> +    if ( !desc->msi_desc )
> +        goto unlock;

I'd again strongly recommend adding an ASSERT() here, checking
desc->msi_desc against (as it's currently named) m_desc.

But overall this looks much better than the earlier, now reverted
variant.

Jan
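The recommended ASSERT() guards against the two lookups disagreeing: the descriptor found through the device's MSI list and the one attached to the irq_desc should be the same object. A minimal sketch of that check, using simplified stand-in types; the struct layouts and the function name are assumptions made for illustration, and the standard assert() takes the place of Xen's ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for Xen's struct msi_desc and struct irq_desc. */
struct msi_desc_sketch { int irq; };
struct irq_desc_sketch { const struct msi_desc_sketch *msi_desc; };

/*
 * Hypothetical check, meant to run with the irq_desc lock held.
 * Returns false if the IRQ no longer has an MSI descriptor (the
 * "goto unlock" case in the patch); otherwise asserts that both
 * lookups agree and returns true.
 */
static bool irq_matches_msi_desc(const struct irq_desc_sketch *desc,
                                 const struct msi_desc_sketch *msi_desc)
{
    if ( !desc->msi_desc )
        return false;

    /* Both paths must have found the same descriptor. */
    assert(desc->msi_desc == msi_desc);
    return true;
}
```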
Joby Poriyath
2013-Sep-04 12:01 UTC
Re: [PATCH v7] interrupts: allow guest to set/clear MSI-X mask bit
On Wed, Sep 04, 2013 at 09:36:09AM +0100, Jan Beulich wrote:
> >>> On 30.08.13 at 18:12, Joby Poriyath <joby.poriyath@citrix.com> wrote:
> > +static struct msi_desc *virt_to_msi_desc(struct pci_dev *dev, void *virt)
> > +{
> > +    struct msi_desc *desc;
> > +
> > +    list_for_each_entry( desc, &dev->msi_list, list )
> > +        if ( desc->msi_attrib.type == PCI_CAP_ID_MSIX &&
> > +             desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET
> > +             == virt )
>
> To match the function's name I'd prefer this to hit on the full 16-byte
> range rather than just the control word.
>

Sorry Jan, I didn't quite understand this. virt points to the control word
in this function. So how can we do a match on (low | high) address and data
words?

> >  static int msixtbl_write(struct vcpu *v, unsigned long address,
> > -                        unsigned long len, unsigned long val)
> > +                         unsigned long len, unsigned long val)
> >  {
> >      unsigned long offset;
> >      struct msixtbl_entry *entry;
> > +    struct msi_desc *m_desc;
>
> Please name this msi_desc, consistent with other variables of this
> type in this file. And afaict this could (once again) be const.

I'll add the const and also rename m_desc to msi_desc.

>
> > +    m_desc = virt_to_msi_desc(entry->pdev, virt);
> > +    if ( !m_desc || m_desc->irq < 0 )
> > +        goto out;
> > +
> > +    desc = irq_to_desc(m_desc->irq);
> > +    if ( !desc )
> > +        goto out;
> > +
> >      spin_lock_irqsave(&desc->lock, flags);
> > +
> > +    if ( !desc->msi_desc )
> > +        goto unlock;
>
> I'd again strongly recommend adding an ASSERT() here, checking
> desc->msi_desc against (as it's currently named) m_desc.

ASSERT makes sense. I'll add it.

>
> But overall this looks much better than the earlier, now reverted
> variant.
>
> Jan
>

Thanks,
Joby
Jan Beulich
2013-Sep-04 12:10 UTC
Re: [PATCH v7] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 04.09.13 at 14:01, Joby Poriyath <joby.poriyath@citrix.com> wrote:
> On Wed, Sep 04, 2013 at 09:36:09AM +0100, Jan Beulich wrote:
>> >>> On 30.08.13 at 18:12, Joby Poriyath <joby.poriyath@citrix.com> wrote:
>> > +static struct msi_desc *virt_to_msi_desc(struct pci_dev *dev, void *virt)
>> > +{
>> > +    struct msi_desc *desc;
>> > +
>> > +    list_for_each_entry( desc, &dev->msi_list, list )
>> > +        if ( desc->msi_attrib.type == PCI_CAP_ID_MSIX &&
>> > +             desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET
>> > +             == virt )
>>
>> To match the function's name I'd prefer this to hit on the full 16-byte
>> range rather than just the control word.
>>
>
> Sorry Jan, I didn't quite understand this. virt points to the control word
> in this function. So how can we do a match on (low | high) address and data
> words?

Each MSI-X table entry occupies 16 bytes. What I'm asking for is that
rather than just matching virt == base + ctrl-offset, you should use
a range check (virt >= base && virt < base + 16) here, so that the
addresses of the three other dwords in the MSI-X table entry could
also be passed in. Otherwise the function doesn't really do what its
name says.

Jan
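The range check described here can be sketched outside of Xen as follows. This is an illustration under stated assumptions, not the eventual v8 code: struct msi_desc_sketch is a simplified stand-in for Xen's struct msi_desc, a plain singly linked list replaces list_for_each_entry(), mask_base is assumed to point at the start of the 16-byte table entry, and the PCI_CAP_ID_MSIX type test is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16 /* bytes per MSI-X table entry (four dwords) */

/* Simplified stand-in for Xen's struct msi_desc. */
struct msi_desc_sketch {
    uint8_t *mask_base;           /* assumed: start of this table entry */
    struct msi_desc_sketch *next; /* plain list link for the sketch */
};

/*
 * Hypothetical lookup: return the descriptor whose 16-byte table entry
 * contains virt, or NULL if no entry covers that address.
 */
static struct msi_desc_sketch *virt_to_msi_desc_sketch(
    struct msi_desc_sketch *head, const void *virt)
{
    uintptr_t addr = (uintptr_t)virt;
    struct msi_desc_sketch *desc;

    for ( desc = head; desc != NULL; desc = desc->next )
    {
        uintptr_t base = (uintptr_t)desc->mask_base;

        /* Equivalent to: virt >= base && virt < base + 16 */
        if ( addr >= base && addr - base < MSIX_ENTRY_SIZE )
            return desc;
    }

    return NULL;
}
```

Any of the four dwords of an entry (address low, address high, data, vector control) then resolves to the same descriptor, which is what the function name promises.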