Joby Poriyath
2013-Aug-14 16:18 UTC
[PATCH v3] interrupts: allow guest to set/clear MSI-X mask bit
A guest needs to be able to enable and disable the MSI-X interrupts of a passed-through device by setting or clearing the MSI-X mask bit. The guest is allowed to write the mask bit only if Xen *thinks* the mask is clear (interrupts enabled). If the mask has been set by Xen (interrupts disabled), writes to the mask bit by the guest are ignored.

Currently, any write to the MSI-X mask bit by the guest is silently ignored.

A likely scenario is an 82599 SR-IOV NIC passed through to a guest. If, from the guest, you run

    ifconfig <ETH_DEV> down
    ifconfig <ETH_DEV> up

the interrupts remain masked. On VF reset, the controller sets the mask bit, and Xen is not aware of this. The VF driver then tries to re-enable interrupts by clearing the mask bit, writing directly to the BAR3 region that contains the MSI-X table, but that write is discarded.

From dom0, you can verify that the interrupts are masked using 'xl debug-keys M'.

Initially, the guest was allowed to modify the MSI-X mask bit. This behaviour was later changed; see changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
---
 xen/arch/x86/hvm/vmsi.c |   47 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 36de312..21421cc 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -169,6 +169,7 @@ struct msixtbl_entry
         uint32_t msi_ad[3];    /* Shadow of address low, high and data */
     } gentries[MAX_MSIX_ACC_ENTRIES];
     struct rcu_head rcu;
+    struct pirq *pirq;
 };
 
 static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
@@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     void *virt;
     unsigned int nr_entry, index;
     int r = X86EMUL_UNHANDLEABLE;
+    unsigned long flags;
+    struct irq_desc *desc;
+    unsigned long orig;
 
     if ( len != 4 || (address & 3) )
         return r;
@@ -283,22 +287,35 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     if ( !virt )
         goto out;
 
-    /* Do not allow the mask bit to be changed. */
-#if 0 /* XXX
-       * As the mask bit is the only defined bit in the word, and as the
-       * host MSI-X code doesn't preserve the other bits anyway, doing
-       * this is pointless. So for now just discard the write (also
-       * saving us from having to determine the matching irq_desc).
-       */
-    spin_lock_irqsave(&desc->lock, flags);
+    desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
+    if ( !desc )
+        goto out;
+
+    if ( !desc->msi_desc )
+        goto unlock;
+
+    /* Do not allow guest to modify MSIX control bit if it is masked
+     * by Xen. We'll only handle the case where Xen thinks that
+     * bit is unmasked, but hardware has silently masked the bit
+     * (in case of SR-IOV VF reset, etc).
+     */
+    if ( desc->msi_desc->msi_attrib.masked )
+        goto unlock;
+
+    /* The mask bit is the only defined bit in the word. But we
+     * ought to preserve the reserved bits. Clearing the reserved
+     * bits can result in undefined behaviour (see PCI Local Bus
+     * Specification revision 2.3).
+     */
     orig = readl(virt);
-    val &= ~PCI_MSIX_VECTOR_BITMASK;
-    val |= orig & PCI_MSIX_VECTOR_BITMASK;
+    val &= PCI_MSIX_VECTOR_BITMASK;
+    val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
     writel(val, virt);
-    spin_unlock_irqrestore(&desc->lock, flags);
-#endif
 
+unlock:
+    spin_unlock_irqrestore(&desc->lock, flags);
     r = X86EMUL_OKAY;
+
 out:
     rcu_read_unlock(&msixtbl_rcu_lock);
     return r;
@@ -328,7 +345,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
 static void add_msixtbl_entry(struct domain *d,
                               struct pci_dev *pdev,
                               uint64_t gtable,
-                              struct msixtbl_entry *entry)
+                              struct msixtbl_entry *entry,
+                              struct pirq *pirq)
 {
     u32 len;
 
@@ -342,6 +360,7 @@ static void add_msixtbl_entry(struct domain *d,
     entry->table_len = len;
     entry->pdev = pdev;
     entry->gtable = (unsigned long) gtable;
+    entry->pirq = pirq;
 
     list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
 }
@@ -404,7 +423,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
 
     entry = new_entry;
     new_entry = NULL;
-    add_msixtbl_entry(d, pdev, gtable, entry);
+    add_msixtbl_entry(d, pdev, gtable, entry, pirq);
 
 found:
     atomic_inc(&entry->refcnt);
-- 
1.7.10.4
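The two `val` lines in the patch invert the old `#if 0` logic. Under the new code the guest supplies only the mask bit, and every other bit of the Vector Control word is taken from what the hardware currently holds. The following is a minimal standalone sketch of that merge. It assumes the mask is bit 0 of the Vector Control dword, as the MSI-X spec defines it. `VECTOR_BITMASK`, `merge_new` and `merge_old` are hypothetical names chosen for illustration and are not Xen code.

```c
#include <stdint.h>

/* Assumption: mirrors PCI_MSIX_VECTOR_BITMASK, i.e. bit 0 of Vector Control. */
#define VECTOR_BITMASK 0x1u

/* Patched behaviour: take only the mask bit from the guest's value and
 * preserve all reserved bits from the current hardware contents. */
static uint32_t merge_new(uint32_t guest_val, uint32_t hw_orig)
{
    return (guest_val & VECTOR_BITMASK) | (hw_orig & ~VECTOR_BITMASK);
}

/* Old (#if 0) behaviour: keep the hardware mask bit and take every
 * other bit from the guest, so the guest could never change the mask. */
static uint32_t merge_old(uint32_t guest_val, uint32_t hw_orig)
{
    return (guest_val & ~VECTOR_BITMASK) | (hw_orig & VECTOR_BITMASK);
}
```

Suppose the hardware word reads 0x1 (masked by the VF reset) and the guest writes 0x0. `merge_new` then yields 0x0, but `merge_old` yields 0x1, which reproduces the stuck-masked symptom described in the commit message.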
Andrew Cooper
2013-Aug-14 16:30 UTC
Re: [PATCH v3] interrupts: allow guest to set/clear MSI-X mask bit
On 14/08/13 17:18, Joby Poriyath wrote:
> @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>      void *virt;
>      unsigned int nr_entry, index;
>      int r = X86EMUL_UNHANDLEABLE;
> +    unsigned long flags;
> +    struct irq_desc *desc;
> +    unsigned long orig;

unsigned long flags, orig;

To be more compact.

> +    /* Do not allow guest to modify MSIX control bit if it is masked
> +     * by Xen. We'll only handle the case where Xen thinks that
> +     * bit is unmasked, but hardware has silently masked the bit
> +     * (in case of SR-IOV VF reset, etc).
> +     */
> +    if ( desc->msi_desc->msi_attrib.masked )
> +        goto unlock;

If Xen wants the msi masked, or the guest wants the msi masked then you
must set the masked bit, else must clear it.

The root cause of this whole issue is that Xen doesn't actually know
what state the mask bit is in; it only knows its intention.

Therefore, goto unlock is incorrect here. By this point, we must write
the bit one way or another.

> @@ -328,7 +345,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
>  static void add_msixtbl_entry(struct domain *d,
>                                struct pci_dev *pdev,
>                                uint64_t gtable,
> -                              struct msixtbl_entry *entry)
> +                              struct msixtbl_entry *entry,
> +                              struct pirq *pirq)

I would advocate const-correctness here, so "const struct pirq *pirq".

~Andrew
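Andrew's argument can be written as a pure function. Xen knows only its own intention, so the bit written to hardware should be the OR of Xen's intent and the guest's request, and the write should always happen. The sketch below follows that reading of his comment and is not code from the thread. `andrew_policy_value` and its parameters are hypothetical names, and the bit-0 mask position is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define VECTOR_BITMASK 0x1u /* assumption: mask is bit 0 of Vector Control */

/* Compute the Vector Control word to write back. The mask bit is set if
 * either Xen or the guest wants the vector masked, and cleared otherwise;
 * reserved bits are preserved from the current hardware value. */
static uint32_t andrew_policy_value(bool xen_wants_masked, uint32_t guest_val,
                                    uint32_t hw_orig)
{
    bool guest_wants_masked = (guest_val & VECTOR_BITMASK) != 0;
    uint32_t v = hw_orig & ~VECTOR_BITMASK;

    if ( xen_wants_masked || guest_wants_masked )
        v |= VECTOR_BITMASK;
    return v;
}
```

The patch skips the write with `goto unlock`, but this function always produces a value to write. A mask bit that hardware set behind Xen's back is therefore always overwritten with the state both parties intend.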
Andrew Cooper
2013-Aug-14 16:36 UTC
Re: [PATCH v3] interrupts: allow guest to set/clear MSI-X mask bit
On 14/08/13 17:30, Andrew Cooper wrote:
> On 14/08/13 17:18, Joby Poriyath wrote:
>> @@ -328,7 +345,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
>>  static void add_msixtbl_entry(struct domain *d,
>>                                struct pci_dev *pdev,
>>                                uint64_t gtable,
>> -                              struct msixtbl_entry *entry)
>> +                              struct msixtbl_entry *entry,
>> +                              struct pirq *pirq)
> I would advocate const-correctness here, so "const struct pirq *pirq".

Sorry - please ignore this. I was being an idiot.

The other points still stand.

~Andrew
Jan Beulich
2013-Aug-15 07:44 UTC
Re: [PATCH v3] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 14.08.13 at 18:36, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 14/08/13 17:30, Andrew Cooper wrote:
>> On 14/08/13 17:18, Joby Poriyath wrote:
>>> @@ -328,7 +345,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
>>>  static void add_msixtbl_entry(struct domain *d,
>>>                                struct pci_dev *pdev,
>>>                                uint64_t gtable,
>>> -                              struct msixtbl_entry *entry)
>>> +                              struct msixtbl_entry *entry,
>>> +                              struct pirq *pirq)
>> I would advocate const-correctness here, so "const struct pirq *pirq".
>
> Sorry - please ignore this. I was being an idiot.

In fact, I don't see why const couldn't be added here. Certainly
pirq_spin_lock_irq_desc() - the only place where it is being consumed -
doesn't need it non-const.

Jan
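Jan observes that the stored pointer is only ever passed to `pirq_spin_lock_irq_desc()`, which in his reading does not need it non-const. If so, both the new parameter and the `msixtbl_entry` field could be `const struct pirq *`. The sketch below shows this at compile level with stub types. `struct pirq`, `struct entry`, `lookup_desc` and `add_entry` are hypothetical stand-ins, not Xen's definitions.

```c
#include <stddef.h>

/* Hypothetical stubs standing in for the Xen types; fields are illustrative. */
struct pirq { int pirq; };
struct entry { const struct pirq *pirq; };

/* Stand-in for a consumer such as pirq_spin_lock_irq_desc(): it only
 * reads through the pointer, so it can accept a const pointer. */
static int lookup_desc(const struct pirq *pirq)
{
    return pirq ? pirq->pirq : -1;
}

/* With a const parameter, callers may pass const or non-const pointers,
 * and nothing reached through the entry can modify the pirq. */
static void add_entry(struct entry *e, const struct pirq *pirq)
{
    e->pirq = pirq;
}
```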
Jan Beulich
2013-Aug-15 07:48 UTC
Re: [PATCH v3] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 14.08.13 at 18:30, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 14/08/13 17:18, Joby Poriyath wrote:
>> +    /* Do not allow guest to modify MSIX control bit if it is masked
>> +     * by Xen. We'll only handle the case where Xen thinks that
>> +     * bit is unmasked, but hardware has silently masked the bit
>> +     * (in case of SR-IOV VF reset, etc).
>> +     */
>> +    if ( desc->msi_desc->msi_attrib.masked )
>> +        goto unlock;
>
> If Xen wants the msi masked, or the guest wants the msi masked then you
> must set the masked bit, else must clear it.
>
> The root cause of this whole issue is that Xen doesn't actually know
> what state the mask bit is in; it only knows its intention.
>
> Therefore, goto unlock is incorrect here. By this point, we must write
> the bit one way or another.

I disagree - the flag getting cleared under our feet would be a severe
problem. Adding a corresponding WARN_ON() or ASSERT() might be a good
idea. The flag getting set without our knowledge, on the other hand, is
not a problem, and can be dealt with by the code as is.

Jan
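Jan's position keeps the patch's early exit when Xen has masked the vector. It adds a sanity check for the one dangerous case, where hardware shows the bit clear while Xen believes the vector is masked. The sketch below follows that control flow, with standard `assert()` standing in for Xen's `ASSERT()`/`WARN_ON()`. `jan_policy_write` and its parameters are hypothetical, and the bit-0 mask position is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTOR_BITMASK 0x1u /* assumption: mask is bit 0 of Vector Control */

/* Returns true and stores the new Vector Control word in *out if the
 * guest's write should reach hardware; returns false if it is dropped. */
static bool jan_policy_write(bool xen_masked, uint32_t guest_val,
                             uint32_t hw_orig, uint32_t *out)
{
    if ( xen_masked )
    {
        /* Xen masked the vector, so hardware must still show it masked;
         * a clear bit here would be the "severe problem" case. */
        assert(hw_orig & VECTOR_BITMASK);
        return false;
    }

    /* Xen thinks the vector is unmasked. Hardware may have set the bit
     * behind our back (e.g. VF reset); honouring the guest's write fixes it. */
    *out = (guest_val & VECTOR_BITMASK) | (hw_orig & ~VECTOR_BITMASK);
    return true;
}
```

Unlike Andrew's always-write approach, this version leaves a Xen-masked vector untouched. It also turns an unexpected hardware unmask into a loud failure rather than silently repairing it.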
Jan Beulich
2013-Aug-15 08:00 UTC
Re: [PATCH v3] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 14.08.13 at 18:18, Joby Poriyath <joby.poriyath@citrix.com> wrote:

A few coding style issues:

> +    /* Do not allow guest to modify MSIX control bit if it is masked
> +     * by Xen. We'll only handle the case where Xen thinks that
> +     * bit is unmasked, but hardware has silently masked the bit
> +     * (in case of SR-IOV VF reset, etc).
> +     */

The /* goes on its own line.

> +    if ( desc->msi_desc->msi_attrib.masked )
> +        goto unlock;
> +
> +    /* The mask bit is the only defined bit in the word. But we
> +     * ought to preserve the reserved bits. Clearing the reserved
> +     * bits can result in undefined behaviour (see PCI Local Bus
> +     * Specification revision 2.3).
> +     */

Same here.

>      orig = readl(virt);
> -    val &= ~PCI_MSIX_VECTOR_BITMASK;
> -    val |= orig & PCI_MSIX_VECTOR_BITMASK;
> +    val &= PCI_MSIX_VECTOR_BITMASK;
> +    val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );

The parentheses are bogus here anyway, but if you insist on having
them, there at least shouldn't be blanks immediately inside them.

Jan