Joby Poriyath
2013-Jul-19 15:07 UTC
[PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
Guest needs the ability to enable and disable MSI-X interrupts by setting the MSI-X control bit. Currently, a write to MSI-X mask bit by the guest is silently ignored. A likely scenario is where we have a 82599 SR-IOV nic passed through to a guest. From the guest if you do ifconfig <ETH_DEV> down ifconfig <ETH_DEV> up the interrupts remain masked. The the mask bit for the VF is being set by the PF performing a reset (at the request of the VF). However, interrupts are enabled by VF driver by clearing the mask bit by writing directly to BAR3 region containing the MSI-X table. From dom0, we can verify that interrupts are being masked using ''xl debug-keys M''. Initially, guest was allowed to modify MSI-X bit. Later this behaviour was changed. See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> --- xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c index 36de312..97d9f93 100644 --- a/xen/arch/x86/hvm/vmsi.c +++ b/xen/arch/x86/hvm/vmsi.c @@ -169,6 +169,7 @@ struct msixtbl_entry uint32_t msi_ad[3]; /* Shadow of address low, high and data */ } gentries[MAX_MSIX_ACC_ENTRIES]; struct rcu_head rcu; + struct pirq *pirq; }; static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, void *virt; unsigned int nr_entry, index; int r = X86EMUL_UNHANDLEABLE; + unsigned long flags; + struct irq_desc *desc; + unsigned long orig; if ( len != 4 || (address & 3) ) return r; @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, if ( !virt ) goto out; - /* Do not allow the mask bit to be changed. */ -#if 0 /* XXX - * As the mask bit is the only defined bit in the word, and as the - * host MSI-X code doesn''t preserve the other bits anyway, doing - * this is pointless. So for now just discard the write (also - * saving us from having to determine the matching irq_desc). - */ - spin_lock_irqsave(&desc->lock, flags); + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); + if ( !desc ) + goto out; + + /* The mask bit is the only defined bit in the word. But we + * ought to preserve the reserved bits. Clearing the reserved + * bits can result in undefined behaviour (see PCI Local Bus + * Specification revision 2.3). + */ orig = readl(virt); - val &= ~PCI_MSIX_VECTOR_BITMASK; - val |= orig & PCI_MSIX_VECTOR_BITMASK; + val &= PCI_MSIX_VECTOR_BITMASK; + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); writel(val, virt); spin_unlock_irqrestore(&desc->lock, flags); -#endif r = X86EMUL_OKAY; out: @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { static void add_msixtbl_entry(struct domain *d, struct pci_dev *pdev, uint64_t gtable, - struct msixtbl_entry *entry) + struct msixtbl_entry *entry, + struct pirq *pirq) { u32 len; @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, entry->table_len = len; entry->pdev = pdev; entry->gtable = (unsigned long) gtable; + entry->pirq = pirq; list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); } @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) entry = new_entry; new_entry = NULL; - add_msixtbl_entry(d, pdev, gtable, entry); + add_msixtbl_entry(d, pdev, gtable, entry, pirq); found: atomic_inc(&entry->refcnt); -- 1.7.10.4
Konrad Rzeszutek Wilk
2013-Jul-19 15:23 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:> Guest needs the ability to enable and disable MSI-X interrupts > by setting the MSI-X control bit. Currently, a write to MSI-X > mask bit by the guest is silently ignored. > > A likely scenario is where we have a 82599 SR-IOV nic passed > through to a guest. From the guest if you do > > ifconfig <ETH_DEV> down > ifconfig <ETH_DEV> up > > the interrupts remain masked. The the mask bit for the VF is > being set by the PF performing a reset (at the request of the VF). > However, interrupts are enabled by VF driver by clearing the mask > bit by writing directly to BAR3 region containing the MSI-X table. > > From dom0, we can verify that > interrupts are being masked using ''xl debug-keys M''. > > Initially, guest was allowed to modify MSI-X bit. > Later this behaviour was changed. > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> > --- > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- > 1 file changed, 19 insertions(+), 13 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index 36de312..97d9f93 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -169,6 +169,7 @@ struct msixtbl_entry > uint32_t msi_ad[3]; /* Shadow of address low, high and data */ > } gentries[MAX_MSIX_ACC_ENTRIES]; > struct rcu_head rcu; > + struct pirq *pirq; > }; > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > void *virt; > unsigned int nr_entry, index; > int r = X86EMUL_UNHANDLEABLE; > + unsigned long flags; > + struct irq_desc *desc; > + unsigned long orig; > > if ( len != 4 || (address & 3) ) > return r; > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > if ( !virt ) > goto out; > > - /* Do not allow the mask bit to be changed. */ > -#if 0 /* XXX > - * As the mask bit is the only defined bit in the word, and as the > - * host MSI-X code doesn''t preserve the other bits anyway, doing > - * this is pointless. So for now just discard the write (also > - * saving us from having to determine the matching irq_desc). > - */ > - spin_lock_irqsave(&desc->lock, flags); > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); > + if ( !desc ) > + goto out; > + > + /* The mask bit is the only defined bit in the word. But we > + * ought to preserve the reserved bits. Clearing the reserved > + * bits can result in undefined behaviour (see PCI Local Bus > + * Specification revision 2.3).So.. if we do it won''t that be potentially dangerous?> + */ > orig = readl(virt); > - val &= ~PCI_MSIX_VECTOR_BITMASK; > - val |= orig & PCI_MSIX_VECTOR_BITMASK; > + val &= PCI_MSIX_VECTOR_BITMASK; > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); > writel(val, virt); > spin_unlock_irqrestore(&desc->lock, flags); > -#endif > > r = X86EMUL_OKAY; > out: > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { > static void add_msixtbl_entry(struct domain *d, > struct pci_dev *pdev, > uint64_t gtable, > - struct msixtbl_entry *entry) > + struct msixtbl_entry *entry, > + struct pirq *pirq) > { > u32 len; > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, > entry->table_len = len; > entry->pdev = pdev; > entry->gtable = (unsigned long) gtable; > + entry->pirq = pirq; > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); > } > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) > > entry = new_entry; > new_entry = NULL; > - add_msixtbl_entry(d, pdev, gtable, entry); > + add_msixtbl_entry(d, pdev, gtable, entry, pirq); > > found: > atomic_inc(&entry->refcnt); > -- > 1.7.10.4 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Joby Poriyath
2013-Jul-19 16:07 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On Fri, Jul 19, 2013 at 11:23:26AM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote: > > Guest needs the ability to enable and disable MSI-X interrupts > > by setting the MSI-X control bit. Currently, a write to MSI-X > > mask bit by the guest is silently ignored. > > > > A likely scenario is where we have a 82599 SR-IOV nic passed > > through to a guest. From the guest if you do > > > > ifconfig <ETH_DEV> down > > ifconfig <ETH_DEV> up > > > > the interrupts remain masked. The the mask bit for the VF is > > being set by the PF performing a reset (at the request of the VF). > > However, interrupts are enabled by VF driver by clearing the mask > > bit by writing directly to BAR3 region containing the MSI-X table. > > > > From dom0, we can verify that > > interrupts are being masked using ''xl debug-keys M''. > > > > Initially, guest was allowed to modify MSI-X bit. > > Later this behaviour was changed. > > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. > > > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> > > --- > > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- > > 1 file changed, 19 insertions(+), 13 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > > index 36de312..97d9f93 100644 > > --- a/xen/arch/x86/hvm/vmsi.c > > +++ b/xen/arch/x86/hvm/vmsi.c > > @@ -169,6 +169,7 @@ struct msixtbl_entry > > uint32_t msi_ad[3]; /* Shadow of address low, high and data */ > > } gentries[MAX_MSIX_ACC_ENTRIES]; > > struct rcu_head rcu; > > + struct pirq *pirq; > > }; > > > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > > void *virt; > > unsigned int nr_entry, index; > > int r = X86EMUL_UNHANDLEABLE; > > + unsigned long flags; > > + struct irq_desc *desc; > > + unsigned long orig; > > > > if ( len != 4 || (address & 3) ) > > return r; > > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > > if ( !virt ) > > goto out; > > > > - /* Do not allow the mask bit to be changed. */ > > -#if 0 /* XXX > > - * As the mask bit is the only defined bit in the word, and as the > > - * host MSI-X code doesn''t preserve the other bits anyway, doing > > - * this is pointless. So for now just discard the write (also > > - * saving us from having to determine the matching irq_desc). > > - */ > > - spin_lock_irqsave(&desc->lock, flags); > > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); > > + if ( !desc ) > > + goto out; > > + > > + /* The mask bit is the only defined bit in the word. But we > > + * ought to preserve the reserved bits. Clearing the reserved > > + * bits can result in undefined behaviour (see PCI Local Bus > > + * Specification revision 2.3). > > So.. if we do it won''t that be potentially dangerous?Yes it can be. That''s why we are reading the value from MSI-X vector control register and only changing 0th bit and retaining the rest.> > + */ > > orig = readl(virt); > > - val &= ~PCI_MSIX_VECTOR_BITMASK; > > - val |= orig & PCI_MSIX_VECTOR_BITMASK; > > + val &= PCI_MSIX_VECTOR_BITMASK; > > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); > > writel(val, virt); > > spin_unlock_irqrestore(&desc->lock, flags); > > -#endif > > > > r = X86EMUL_OKAY; > > out: > > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { > > static void add_msixtbl_entry(struct domain *d, > > struct pci_dev *pdev, > > uint64_t gtable, > > - struct msixtbl_entry *entry) > > + struct msixtbl_entry *entry, > > + struct pirq *pirq) > > { > > u32 len; > > > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, > > entry->table_len = len; > > entry->pdev = pdev; > > entry->gtable = (unsigned long) gtable; > > + entry->pirq = pirq; > > > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); > > } > > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) > > > > entry = new_entry; > > new_entry = NULL; > > - add_msixtbl_entry(d, pdev, gtable, entry); > > + add_msixtbl_entry(d, pdev, gtable, entry, pirq); > > > > found: > > atomic_inc(&entry->refcnt); > > -- > > 1.7.10.4 > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel
Malcolm Crossley
2013-Jul-19 16:38 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On 19/07/13 16:07, Joby Poriyath wrote:> Guest needs the ability to enable and disable MSI-X interrupts > by setting the MSI-X control bit. Currently, a write to MSI-X > mask bit by the guest is silently ignored. > > A likely scenario is where we have a 82599 SR-IOV nic passed > through to a guest. From the guest if you do > > ifconfig <ETH_DEV> down > ifconfig <ETH_DEV> up > > the interrupts remain masked. The the mask bit for the VF is > being set by the PF performing a reset (at the request of the VF). > However, interrupts are enabled by VF driver by clearing the mask > bit by writing directly to BAR3 region containing the MSI-X table. > > From dom0, we can verify that > interrupts are being masked using ''xl debug-keys M''. > > Initially, guest was allowed to modify MSI-X bit. > Later this behaviour was changed. > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>Reviewed-by: Malcolm Crossley <malcolm.crossley@citrix.com>> --- > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- > 1 file changed, 19 insertions(+), 13 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index 36de312..97d9f93 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -169,6 +169,7 @@ struct msixtbl_entry > uint32_t msi_ad[3]; /* Shadow of address low, high and data */ > } gentries[MAX_MSIX_ACC_ENTRIES]; > struct rcu_head rcu; > + struct pirq *pirq; > }; > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > void *virt; > unsigned int nr_entry, index; > int r = X86EMUL_UNHANDLEABLE; > + unsigned long flags; > + struct irq_desc *desc; > + unsigned long orig; > > if ( len != 4 || (address & 3) ) > return r; > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > if ( !virt ) > goto out; > > - /* Do not allow the mask bit to be changed. */ > -#if 0 /* XXX > - * As the mask bit is the only defined bit in the word, and as the > - * host MSI-X code doesn''t preserve the other bits anyway, doing > - * this is pointless. So for now just discard the write (also > - * saving us from having to determine the matching irq_desc). > - */ > - spin_lock_irqsave(&desc->lock, flags); > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); > + if ( !desc ) > + goto out; > + > + /* The mask bit is the only defined bit in the word. But we > + * ought to preserve the reserved bits. Clearing the reserved > + * bits can result in undefined behaviour (see PCI Local Bus > + * Specification revision 2.3). > + */ > orig = readl(virt); > - val &= ~PCI_MSIX_VECTOR_BITMASK; > - val |= orig & PCI_MSIX_VECTOR_BITMASK; > + val &= PCI_MSIX_VECTOR_BITMASK; > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); > writel(val, virt); > spin_unlock_irqrestore(&desc->lock, flags); > -#endif > > r = X86EMUL_OKAY; > out: > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { > static void add_msixtbl_entry(struct domain *d, > struct pci_dev *pdev, > uint64_t gtable, > - struct msixtbl_entry *entry) > + struct msixtbl_entry *entry, > + struct pirq *pirq) > { > u32 len; > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, > entry->table_len = len; > entry->pdev = pdev; > entry->gtable = (unsigned long) gtable; > + entry->pirq = pirq; > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); > } > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) > > entry = new_entry; > new_entry = NULL; > - add_msixtbl_entry(d, pdev, gtable, entry); > + add_msixtbl_entry(d, pdev, gtable, entry, pirq); > > found: > atomic_inc(&entry->refcnt);
Joby Poriyath
2013-Jul-23 10:54 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
Ping... Could you kindly review this? Many thanks, Joby On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:> Guest needs the ability to enable and disable MSI-X interrupts > by setting the MSI-X control bit. Currently, a write to MSI-X > mask bit by the guest is silently ignored. > > A likely scenario is where we have a 82599 SR-IOV nic passed > through to a guest. From the guest if you do > > ifconfig <ETH_DEV> down > ifconfig <ETH_DEV> up > > the interrupts remain masked. The the mask bit for the VF is > being set by the PF performing a reset (at the request of the VF). > However, interrupts are enabled by VF driver by clearing the mask > bit by writing directly to BAR3 region containing the MSI-X table. > > From dom0, we can verify that > interrupts are being masked using ''xl debug-keys M''. > > Initially, guest was allowed to modify MSI-X bit. > Later this behaviour was changed. > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> > --- > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- > 1 file changed, 19 insertions(+), 13 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index 36de312..97d9f93 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -169,6 +169,7 @@ struct msixtbl_entry > uint32_t msi_ad[3]; /* Shadow of address low, high and data */ > } gentries[MAX_MSIX_ACC_ENTRIES]; > struct rcu_head rcu; > + struct pirq *pirq; > }; > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > void *virt; > unsigned int nr_entry, index; > int r = X86EMUL_UNHANDLEABLE; > + unsigned long flags; > + struct irq_desc *desc; > + unsigned long orig; > > if ( len != 4 || (address & 3) ) > return r; > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > if ( !virt ) > goto out; > > - /* Do not allow the mask bit to be changed. */ > -#if 0 /* XXX > - * As the mask bit is the only defined bit in the word, and as the > - * host MSI-X code doesn''t preserve the other bits anyway, doing > - * this is pointless. So for now just discard the write (also > - * saving us from having to determine the matching irq_desc). > - */ > - spin_lock_irqsave(&desc->lock, flags); > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); > + if ( !desc ) > + goto out; > + > + /* The mask bit is the only defined bit in the word. But we > + * ought to preserve the reserved bits. Clearing the reserved > + * bits can result in undefined behaviour (see PCI Local Bus > + * Specification revision 2.3). > + */ > orig = readl(virt); > - val &= ~PCI_MSIX_VECTOR_BITMASK; > - val |= orig & PCI_MSIX_VECTOR_BITMASK; > + val &= PCI_MSIX_VECTOR_BITMASK; > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); > writel(val, virt); > spin_unlock_irqrestore(&desc->lock, flags); > -#endif > > r = X86EMUL_OKAY; > out: > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { > static void add_msixtbl_entry(struct domain *d, > struct pci_dev *pdev, > uint64_t gtable, > - struct msixtbl_entry *entry) > + struct msixtbl_entry *entry, > + struct pirq *pirq) > { > u32 len; > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, > entry->table_len = len; > entry->pdev = pdev; > entry->gtable = (unsigned long) gtable; > + entry->pirq = pirq; > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); > } > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) > > entry = new_entry; > new_entry = NULL; > - add_msixtbl_entry(d, pdev, gtable, entry); > + add_msixtbl_entry(d, pdev, gtable, entry, pirq); > > found: > atomic_inc(&entry->refcnt); > -- > 1.7.10.4 >
Andrew Cooper
2013-Jul-23 13:21 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On 23/07/13 11:54, Joby Poriyath wrote:> Ping... > > Could you kindly review this? > > Many thanks, > Joby > > On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote: >> Guest needs the ability to enable and disable MSI-X interrupts >> by setting the MSI-X control bit. Currently, a write to MSI-X >> mask bit by the guest is silently ignored. >> >> A likely scenario is where we have a 82599 SR-IOV nic passed >> through to a guest. From the guest if you do >> >> ifconfig <ETH_DEV> down >> ifconfig <ETH_DEV> up >> >> the interrupts remain masked. The the mask bit for the VF is >> being set by the PF performing a reset (at the request of the VF). >> However, interrupts are enabled by VF driver by clearing the mask >> bit by writing directly to BAR3 region containing the MSI-X table. >> >> From dom0, we can verify that >> interrupts are being masked using ''xl debug-keys M''. >> >> Initially, guest was allowed to modify MSI-X bit. >> Later this behaviour was changed. >> See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. >> >> Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> This should be backported to all stable versions (perhaps including 4.1) as the buggy changeset 74c213c50 was backported in the name of security/general fixes. ~Andrew>> --- >> xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- >> 1 file changed, 19 insertions(+), 13 deletions(-) >> >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c >> index 36de312..97d9f93 100644 >> --- a/xen/arch/x86/hvm/vmsi.c >> +++ b/xen/arch/x86/hvm/vmsi.c >> @@ -169,6 +169,7 @@ struct msixtbl_entry >> uint32_t msi_ad[3]; /* Shadow of address low, high and data */ >> } gentries[MAX_MSIX_ACC_ENTRIES]; >> struct rcu_head rcu; >> + struct pirq *pirq; >> }; >> >> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); >> @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, >> void *virt; >> unsigned int nr_entry, index; >> int r = X86EMUL_UNHANDLEABLE; >> + unsigned long flags; >> + struct irq_desc *desc; >> + unsigned long orig; >> >> if ( len != 4 || (address & 3) ) >> return r; >> @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, >> if ( !virt ) >> goto out; >> >> - /* Do not allow the mask bit to be changed. */ >> -#if 0 /* XXX >> - * As the mask bit is the only defined bit in the word, and as the >> - * host MSI-X code doesn''t preserve the other bits anyway, doing >> - * this is pointless. So for now just discard the write (also >> - * saving us from having to determine the matching irq_desc). >> - */ >> - spin_lock_irqsave(&desc->lock, flags); >> + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); >> + if ( !desc ) >> + goto out; >> + >> + /* The mask bit is the only defined bit in the word. But we >> + * ought to preserve the reserved bits. Clearing the reserved >> + * bits can result in undefined behaviour (see PCI Local Bus >> + * Specification revision 2.3). >> + */ >> orig = readl(virt); >> - val &= ~PCI_MSIX_VECTOR_BITMASK; >> - val |= orig & PCI_MSIX_VECTOR_BITMASK; >> + val &= PCI_MSIX_VECTOR_BITMASK; >> + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); >> writel(val, virt); >> spin_unlock_irqrestore(&desc->lock, flags); >> -#endif >> >> r = X86EMUL_OKAY; >> out: >> @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { >> static void add_msixtbl_entry(struct domain *d, >> struct pci_dev *pdev, >> uint64_t gtable, >> - struct msixtbl_entry *entry) >> + struct msixtbl_entry *entry, >> + struct pirq *pirq) >> { >> u32 len; >> >> @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, >> entry->table_len = len; >> entry->pdev = pdev; >> entry->gtable = (unsigned long) gtable; >> + entry->pirq = pirq; >> >> list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); >> } >> @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) >> >> entry = new_entry; >> new_entry = NULL; >> - add_msixtbl_entry(d, pdev, gtable, entry); >> + add_msixtbl_entry(d, pdev, gtable, entry, pirq); >> >> found: >> atomic_inc(&entry->refcnt); >> -- >> 1.7.10.4 >>
Konrad Rzeszutek Wilk
2013-Jul-23 13:28 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:> Ping... > > Could you kindly review this?I believe Malcom reviewed and sent you an email with his response. Are you waiting for Andrew to look at it as well? Or Jan? Jan is out on vacation for a couple of weeks. Anyhow if it is Andrew that you are asking to look over it I would suggest you put his name on the ''To'' in the email in case he is using filters to stick non-To emails in some ''lower priority'' mailbox.> > Many thanks, > Joby > > On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote: > > Guest needs the ability to enable and disable MSI-X interrupts > > by setting the MSI-X control bit. Currently, a write to MSI-X > > mask bit by the guest is silently ignored. > > > > A likely scenario is where we have a 82599 SR-IOV nic passed > > through to a guest. From the guest if you do > > > > ifconfig <ETH_DEV> down > > ifconfig <ETH_DEV> up > > > > the interrupts remain masked. The the mask bit for the VF is > > being set by the PF performing a reset (at the request of the VF). > > However, interrupts are enabled by VF driver by clearing the mask > > bit by writing directly to BAR3 region containing the MSI-X table. > > > > From dom0, we can verify that > > interrupts are being masked using ''xl debug-keys M''. > > > > Initially, guest was allowed to modify MSI-X bit. > > Later this behaviour was changed. > > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. > > > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> > > --- > > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- > > 1 file changed, 19 insertions(+), 13 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > > index 36de312..97d9f93 100644 > > --- a/xen/arch/x86/hvm/vmsi.c > > +++ b/xen/arch/x86/hvm/vmsi.c > > @@ -169,6 +169,7 @@ struct msixtbl_entry > > uint32_t msi_ad[3]; /* Shadow of address low, high and data */ > > } gentries[MAX_MSIX_ACC_ENTRIES]; > > struct rcu_head rcu; > > + struct pirq *pirq; > > }; > > > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > > void *virt; > > unsigned int nr_entry, index; > > int r = X86EMUL_UNHANDLEABLE; > > + unsigned long flags; > > + struct irq_desc *desc; > > + unsigned long orig; > > > > if ( len != 4 || (address & 3) ) > > return r; > > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > > if ( !virt ) > > goto out; > > > > - /* Do not allow the mask bit to be changed. */ > > -#if 0 /* XXX > > - * As the mask bit is the only defined bit in the word, and as the > > - * host MSI-X code doesn''t preserve the other bits anyway, doing > > - * this is pointless. So for now just discard the write (also > > - * saving us from having to determine the matching irq_desc). > > - */ > > - spin_lock_irqsave(&desc->lock, flags); > > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); > > + if ( !desc ) > > + goto out; > > + > > + /* The mask bit is the only defined bit in the word. But we > > + * ought to preserve the reserved bits. Clearing the reserved > > + * bits can result in undefined behaviour (see PCI Local Bus > > + * Specification revision 2.3). > > + */ > > orig = readl(virt); > > - val &= ~PCI_MSIX_VECTOR_BITMASK; > > - val |= orig & PCI_MSIX_VECTOR_BITMASK; > > + val &= PCI_MSIX_VECTOR_BITMASK; > > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); > > writel(val, virt); > > spin_unlock_irqrestore(&desc->lock, flags); > > -#endif > > > > r = X86EMUL_OKAY; > > out: > > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { > > static void add_msixtbl_entry(struct domain *d, > > struct pci_dev *pdev, > > uint64_t gtable, > > - struct msixtbl_entry *entry) > > + struct msixtbl_entry *entry, > > + struct pirq *pirq) > > { > > u32 len; > > > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, > > entry->table_len = len; > > entry->pdev = pdev; > > entry->gtable = (unsigned long) gtable; > > + entry->pirq = pirq; > > > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); > > } > > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) > > > > entry = new_entry; > > new_entry = NULL; > > - add_msixtbl_entry(d, pdev, gtable, entry); > > + add_msixtbl_entry(d, pdev, gtable, entry, pirq); > > > > found: > > atomic_inc(&entry->refcnt); > > -- > > 1.7.10.4 > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Joby Poriyath
2013-Jul-23 17:59 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
Hi Konrad, On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote: > > Ping... > > > > Could you kindly review this? > > I believe Malcom reviewed and sent you an email with his response.Andrew and Malcolm has reviewed the code.> > Are you waiting for Andrew to look at it as well? Or Jan? Jan is > out on vacation for a couple of weeks.I was hoping that Jan would also review this, since this patch partly reverts his code. This patch makes an assumption that, for a passed through device, xen needs to mask the MSI-X control bit only when it does vcpu migration. Otherwise xen leaves it in the state guest has set it.> > Anyhow if it is Andrew that you are asking to look over it I would > suggest you put his name on the ''To'' in the email in case he is > using filters to stick non-To emails in some ''lower priority'' mailbox. > > > > > Many thanks, > > Joby > >Thanks, Joby> > On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote: > > > Guest needs the ability to enable and disable MSI-X interrupts > > > by setting the MSI-X control bit. Currently, a write to MSI-X > > > mask bit by the guest is silently ignored. > > > > > > A likely scenario is where we have a 82599 SR-IOV nic passed > > > through to a guest. From the guest if you do > > > > > > ifconfig <ETH_DEV> down > > > ifconfig <ETH_DEV> up > > > > > > the interrupts remain masked. The the mask bit for the VF is > > > being set by the PF performing a reset (at the request of the VF). > > > However, interrupts are enabled by VF driver by clearing the mask > > > bit by writing directly to BAR3 region containing the MSI-X table. > > > > > > From dom0, we can verify that > > > interrupts are being masked using ''xl debug-keys M''. > > > > > > Initially, guest was allowed to modify MSI-X bit. > > > Later this behaviour was changed. > > > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. > > > > > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> > > > --- > > > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++------------- > > > 1 file changed, 19 insertions(+), 13 deletions(-) > > > > > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > > > index 36de312..97d9f93 100644 > > > --- a/xen/arch/x86/hvm/vmsi.c > > > +++ b/xen/arch/x86/hvm/vmsi.c > > > @@ -169,6 +169,7 @@ struct msixtbl_entry > > > uint32_t msi_ad[3]; /* Shadow of address low, high and data */ > > > } gentries[MAX_MSIX_ACC_ENTRIES]; > > > struct rcu_head rcu; > > > + struct pirq *pirq; > > > }; > > > > > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > > > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > > > void *virt; > > > unsigned int nr_entry, index; > > > int r = X86EMUL_UNHANDLEABLE; > > > + unsigned long flags; > > > + struct irq_desc *desc; > > > + unsigned long orig; > > > > > > if ( len != 4 || (address & 3) ) > > > return r; > > > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address, > > > if ( !virt ) > > > goto out; > > > > > > - /* Do not allow the mask bit to be changed. */ > > > -#if 0 /* XXX > > > - * As the mask bit is the only defined bit in the word, and as the > > > - * host MSI-X code doesn''t preserve the other bits anyway, doing > > > - * this is pointless. So for now just discard the write (also > > > - * saving us from having to determine the matching irq_desc). > > > - */ > > > - spin_lock_irqsave(&desc->lock, flags); > > > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags); > > > + if ( !desc ) > > > + goto out; > > > + > > > + /* The mask bit is the only defined bit in the word. But we > > > + * ought to preserve the reserved bits. Clearing the reserved > > > + * bits can result in undefined behaviour (see PCI Local Bus > > > + * Specification revision 2.3). > > > + */ > > > orig = readl(virt); > > > - val &= ~PCI_MSIX_VECTOR_BITMASK; > > > - val |= orig & PCI_MSIX_VECTOR_BITMASK; > > > + val &= PCI_MSIX_VECTOR_BITMASK; > > > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK ); > > > writel(val, virt); > > > spin_unlock_irqrestore(&desc->lock, flags); > > > -#endif > > > > > > r = X86EMUL_OKAY; > > > out: > > > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = { > > > static void add_msixtbl_entry(struct domain *d, > > > struct pci_dev *pdev, > > > uint64_t gtable, > > > - struct msixtbl_entry *entry) > > > + struct msixtbl_entry *entry, > > > + struct pirq *pirq) > > > { > > > u32 len; > > > > > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d, > > > entry->table_len = len; > > > entry->pdev = pdev; > > > entry->gtable = (unsigned long) gtable; > > > + entry->pirq = pirq; > > > > > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list); > > > } > > > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable) > > > > > > entry = new_entry; > > > new_entry = NULL; > > > - add_msixtbl_entry(d, pdev, gtable, entry); > > > + add_msixtbl_entry(d, pdev, gtable, entry, pirq); > > > > > > found: > > > atomic_inc(&entry->refcnt); > > > -- > > > 1.7.10.4 > > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel
Jan Beulich
2013-Aug-05 10:44 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote: > On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote: >> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote: >> > Ping... >> > >> > Could you kindly review this? >> >> I believe Malcom reviewed and sent you an email with his response. > > Andrew and Malcolm has reviewed the code. > >> >> Are you waiting for Andrew to look at it as well? Or Jan? Jan is >> out on vacation for a couple of weeks. > > I was hoping that Jan would also review this, since this patch > partly reverts his code. > > This patch makes an assumption that, for a passed through device, > xen needs to mask the MSI-X control bit only when it does vcpu > migration. Otherwise xen leaves it in the state guest has set it.Exactly. And hence the patch is wrong, re-introducing a security issue. If you need to give the guest control over the _virtual_ mask bit, you need to add proper emulation / masking of the delivery of the virtual interrupt (i.e. the physical mask bit should always be under the control of Xen, but interrupts should not be delivered to the guest when the virtual mask bit is set). In any event - if the solution was as simple as what you propose, I would have implemented it right away rather than leaving the code known broken (which in fact was partly based on the observation that KVM/QEMU - at least at that time - got away without properly emulating the guest attempts to control this bit as well). Jan
Andrew Cooper
2013-Aug-05 11:01 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On 05/08/13 11:44, Jan Beulich wrote:>>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote: >> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote: >>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote: >>>> Ping... >>>> >>>> Could you kindly review this? >>> I believe Malcom reviewed and sent you an email with his response. >> Andrew and Malcolm has reviewed the code. >> >>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is >>> out on vacation for a couple of weeks. >> I was hoping that Jan would also review this, since this patch >> partly reverts his code. >> >> This patch makes an assumption that, for a passed through device, >> xen needs to mask the MSI-X control bit only when it does vcpu >> migration. Otherwise xen leaves it in the state guest has set it. > Exactly. And hence the patch is wrong,Unfortunately it is not. The whole problem is that the VF can (and does) request that the PF resets itself (using an out-of-band mailbox system in the hardware itself), causing the mask bit to change (i.e. get set) without Xen''s knowledge. At this point, it is only the guest which can rectify the situation and clear the mask bit. I would agree that the solution is not perfect, but SRIOV virtual functions rely on the ability to clear the mask bit, whether or not it is a security issue. This means that your patch to disable writing to the mask bit has caused a functional regression in all SRIOV operations in Xen. I would say that the less bad alternative is to reintroduce the ability to change the mask bit, and fix the regression, then work out how to fix the security aspect. ~Andrew> re-introducing a security > issue. If you need to give the guest control over the _virtual_ > mask bit, you need to add proper emulation / masking of the > delivery of the virtual interrupt (i.e. the physical mask bit should > always be under the control of Xen, but interrupts should not be > delivered to the guest when the virtual mask bit is set). > > In any event - if the solution was as simple as what you propose, > I would have implemented it right away rather than leaving the > code known broken (which in fact was partly based on the > observation that KVM/QEMU - at least at that time - got away > without properly emulating the guest attempts to control this bit > as well). > > Jan >
Jan Beulich
2013-Aug-05 11:49 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
>>> On 05.08.13 at 13:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 05/08/13 11:44, Jan Beulich wrote: >>>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote: >>> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote: >>>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote: >>>>> Ping... >>>>> >>>>> Could you kindly review this? >>>> I believe Malcom reviewed and sent you an email with his response. >>> Andrew and Malcolm has reviewed the code. >>> >>>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is >>>> out on vacation for a couple of weeks. >>> I was hoping that Jan would also review this, since this patch >>> partly reverts his code. >>> >>> This patch makes an assumption that, for a passed through device, >>> xen needs to mask the MSI-X control bit only when it does vcpu >>> migration. Otherwise xen leaves it in the state guest has set it. >> Exactly. And hence the patch is wrong, > > Unfortunately it is not. > > The whole problem is that the VF can (and does) request that the PF > resets itself (using an out-of-band mailbox system in the hardware > itself), causing the mask bit to change (i.e. get set) without Xen''s > knowledge.The at the very least such a code change should be done for VFs only. But I''d object to that too.> At this point, it is only the guest which can rectify the situation and > clear the mask bit. > > I would agree that the solution is not perfect, but SRIOV virtual > functions rely on the ability to clear the mask bit, whether or not it > is a security issue. This means that your patch to disable writing to > the mask bit has caused a functional regression in all SRIOV operations > in Xen.Only in the special case that was described in the patch header afaict (or else I''m sure we would have known much earlier).> I would say that the less bad alternative is to reintroduce the ability > to change the mask bit, and fix the regression, then work out how to fix > the security aspect.No, my position is pretty clear here: No re-introduction of security issues when fixing secondary problems. The more that the immediate (but only partial) solution would be quite clear: If the guest wants to clear the flag and Xen thinks the flag is clear, it could certainly allow the write. The security aspect here is when the guest wants to clear the flag while Xen expects it to be set. But a proper solution would of course be preferred. That may involve notifying Xen of the reset from the PF driver. Jan
Andrew Cooper
2013-Aug-05 16:03 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On 05/08/13 12:49, Jan Beulich wrote:>>>> On 05.08.13 at 13:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 05/08/13 11:44, Jan Beulich wrote: >>>>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote: >>>> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote: >>>>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote: >>>>>> Ping... >>>>>> >>>>>> Could you kindly review this? >>>>> I believe Malcom reviewed and sent you an email with his response. >>>> Andrew and Malcolm has reviewed the code. >>>> >>>>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is >>>>> out on vacation for a couple of weeks. >>>> I was hoping that Jan would also review this, since this patch >>>> partly reverts his code. >>>> >>>> This patch makes an assumption that, for a passed through device, >>>> xen needs to mask the MSI-X control bit only when it does vcpu >>>> migration. Otherwise xen leaves it in the state guest has set it. >>> Exactly. And hence the patch is wrong, >> Unfortunately it is not. >> >> The whole problem is that the VF can (and does) request that the PF >> resets itself (using an out-of-band mailbox system in the hardware >> itself), causing the mask bit to change (i.e. get set) without Xen''s >> knowledge. > The at the very least such a code change should be done for VFs > only. But I''d object to that too. > >> At this point, it is only the guest which can rectify the situation and >> clear the mask bit. >> >> I would agree that the solution is not perfect, but SRIOV virtual >> functions rely on the ability to clear the mask bit, whether or not it >> is a security issue. This means that your patch to disable writing to >> the mask bit has caused a functional regression in all SRIOV operations >> in Xen. > Only in the special case that was described in the patch header > afaict (or else I''m sure we would have known much earlier).No - this unconditionally breaks any attempt to use an SRIOV virtual function by a VF-aware driver. This includes FreeBSD and SLES VMs, and results in a complete loss of interrupts, because the mask bit has been set without Xen''s knowledge, and nothing ever clears it, as the guests attempt to clear it gets intercepted and discarded. This was discovered as a regression from XenServer 6.1 to 6.2, and was identified specifically this change which was backported into Xen 4.1.5> >> I would say that the less bad alternative is to reintroduce the ability >> to change the mask bit, and fix the regression, then work out how to fix >> the security aspect. > No, my position is pretty clear here: No re-introduction of security > issues when fixing secondary problems.Then we have no option but to state that SRIOV is completely broken under Xen (including all stable trees) until we have a fix.> > The more that the immediate (but only partial) solution would be quite > clear: If the guest wants to clear the flag and Xen thinks the flag is > clear, it could certainly allow the write. The security aspect here is > when the guest wants to clear the flag while Xen expects it to be set. > > But a proper solution would of course be preferred. That may involve > notifying Xen of the reset from the PF driver. > > Jan >That is making the assumption that the PF driver is even aware that the reset has occurred. If the PF driver can get at the reset information, I still cant see Xen specific hooks being accepted upstream. Fundamentally, given that only the VM is in a position to actually know about the state of the mask bit (As we cant possibly have Xen polling for the state), then the guest has to have access to the mask bit. ~Andrew
Jan Beulich
2013-Aug-06 08:29 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 05/08/13 12:49, Jan Beulich wrote: >> But a proper solution would of course be preferred. That may involve >> notifying Xen of the reset from the PF driver. > > That is making the assumption that the PF driver is even aware that the > reset has occurred. If the PF driver can get at the reset information, > I still cant see Xen specific hooks being accepted upstream.You sort of contradict your earlier statement here that "the VF requests the PF to reset itself" (where you probably meant "it", as the VF is hardly allowed to cause a reset of the PF): Such an operation should imo be visible to the PF driver. And then, even if the reset is being done, shouldn''t the interrupt setup occur _after_ the reset? Which would make Xen clear the flag...> Fundamentally, given that only the VM is in a position to actually know > about the state of the mask bit (As we cant possibly have Xen polling > for the state), then the guest has to have access to the mask bit.No, that continues to not be an option as long as Xen uses the mask bit itself. Once again, the obvious immediate solution would appear to be to carry out the write with the OR of both the guest intended and the hypervisor required mask bits, thus allowing the bit to get cleared immediately if Xen doesn''t need it to be set (and the clearing would happen subsequently if Xen needs the flag to be set at the point of the guest write). This requires associating back the table entry to the IRQ in question (if any), in order to look at irq_desc->msi_desc->msi_attrib.masked. Jan
Andrew Cooper
2013-Aug-06 09:52 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On 06/08/13 09:29, Jan Beulich wrote:>>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 05/08/13 12:49, Jan Beulich wrote: >>> But a proper solution would of course be preferred. That may involve >>> notifying Xen of the reset from the PF driver. >> That is making the assumption that the PF driver is even aware that the >> reset has occurred. If the PF driver can get at the reset information, >> I still cant see Xen specific hooks being accepted upstream. > You sort of contradict your earlier statement here that "the VF > requests the PF to reset itself" (where you probably meant "it", > as the VF is hardly allowed to cause a reset of the PF): Such an > operation should imo be visible to the PF driver. > > And then, even if the reset is being done, shouldn''t the interrupt > setup occur _after_ the reset? Which would make Xen clear the > flag...The observed behaviour is this: On startup, the VF driver issues this backchannel reset of itself (the VF) which, amongst other things, sets the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but on further consideration, relying on this behaviour might not be a good idea. The VF is then expected to re-enable the interrupt at its convenience. There are no obvious hooks in the PF driver to receive notification of these backchannel resets. ~Andrew> >> Fundamentally, given that only the VM is in a position to actually know >> about the state of the mask bit (As we cant possibly have Xen polling >> for the state), then the guest has to have access to the mask bit. > No, that continues to not be an option as long as Xen uses the mask > bit itself. Once again, the obvious immediate solution would appear > to be to carry out the write with the OR of both the guest intended > and the hypervisor required mask bits, thus allowing the bit to get > cleared immediately if Xen doesn''t need it to be set (and the clearing > would happen subsequently if Xen needs the flag to be set at the > point of the guest write). This requires associating back the table > entry to the IRQ in question (if any), in order to look at > irq_desc->msi_desc->msi_attrib.masked. > > Jan >
Jan Beulich
2013-Aug-06 10:17 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
>>> On 06.08.13 at 11:52, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 06/08/13 09:29, Jan Beulich wrote: >>>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 05/08/13 12:49, Jan Beulich wrote: >>>> But a proper solution would of course be preferred. That may involve >>>> notifying Xen of the reset from the PF driver. >>> That is making the assumption that the PF driver is even aware that the >>> reset has occurred. If the PF driver can get at the reset information, >>> I still cant see Xen specific hooks being accepted upstream. >> You sort of contradict your earlier statement here that "the VF >> requests the PF to reset itself" (where you probably meant "it", >> as the VF is hardly allowed to cause a reset of the PF): Such an >> operation should imo be visible to the PF driver. >> >> And then, even if the reset is being done, shouldn''t the interrupt >> setup occur _after_ the reset? Which would make Xen clear the >> flag... > > The observed behaviour is this: On startup, the VF driver issues this > backchannel reset of itself (the VF) which, amongst other things, sets > the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but > on further consideration, relying on this behaviour might not be a good > idea. The VF is then expected to re-enable the interrupt at its > convenience.Re-enable? Not set up? That would at once deal with your valid concern regarding MSI addr/data. This still looks to me like a device/driver specific bug rather than a general SR-IOV issue... Jan
Pasi Kärkkäinen
2013-Aug-06 10:17 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On Tue, Aug 06, 2013 at 10:52:58AM +0100, Andrew Cooper wrote:> On 06/08/13 09:29, Jan Beulich wrote: > >>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > >> On 05/08/13 12:49, Jan Beulich wrote: > >>> But a proper solution would of course be preferred. That may involve > >>> notifying Xen of the reset from the PF driver. > >> That is making the assumption that the PF driver is even aware that the > >> reset has occurred. If the PF driver can get at the reset information, > >> I still cant see Xen specific hooks being accepted upstream. > > You sort of contradict your earlier statement here that "the VF > > requests the PF to reset itself" (where you probably meant "it", > > as the VF is hardly allowed to cause a reset of the PF): Such an > > operation should imo be visible to the PF driver. > > > > And then, even if the reset is being done, shouldn''t the interrupt > > setup occur _after_ the reset? Which would make Xen clear the > > flag... > > The observed behaviour is this: On startup, the VF driver issues this > backchannel reset of itself (the VF) which, amongst other things, sets > the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but > on further consideration, relying on this behaviour might not be a good > idea. The VF is then expected to re-enable the interrupt at its > convenience. > > There are no obvious hooks in the PF driver to receive notification of > these backchannel resets. >Doesn''t the PF driver log some entries to dom0 kernel dmesg when the VF in domU resets itself? -- Pasi
Pasi Kärkkäinen
2013-Aug-06 13:11 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On Tue, Aug 06, 2013 at 01:17:35PM +0300, Pasi Kärkkäinen wrote:> On Tue, Aug 06, 2013 at 10:52:58AM +0100, Andrew Cooper wrote: > > On 06/08/13 09:29, Jan Beulich wrote: > > >>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > >> On 05/08/13 12:49, Jan Beulich wrote: > > >>> But a proper solution would of course be preferred. That may involve > > >>> notifying Xen of the reset from the PF driver. > > >> That is making the assumption that the PF driver is even aware that the > > >> reset has occurred. If the PF driver can get at the reset information, > > >> I still cant see Xen specific hooks being accepted upstream. > > > You sort of contradict your earlier statement here that "the VF > > > requests the PF to reset itself" (where you probably meant "it", > > > as the VF is hardly allowed to cause a reset of the PF): Such an > > > operation should imo be visible to the PF driver. > > > > > > And then, even if the reset is being done, shouldn''t the interrupt > > > setup occur _after_ the reset? Which would make Xen clear the > > > flag... > > > > The observed behaviour is this: On startup, the VF driver issues this > > backchannel reset of itself (the VF) which, amongst other things, sets > > the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but > > on further consideration, relying on this behaviour might not be a good > > idea. The VF is then expected to re-enable the interrupt at its > > convenience. > > > > There are no obvious hooks in the PF driver to receive notification of > > these backchannel resets. > > > > Doesn''t the PF driver log some entries to dom0 kernel dmesg when the VF in domU resets itself? >At least I''ve seen such messages from the Intel igxbe PF driver.. or did I misunderstand something? -- Pasi
Joby Poriyath
2013-Aug-13 17:08 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
Hi Jan, On Tue, Aug 06, 2013 at 09:29:22AM +0100, Jan Beulich wrote:> >>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > On 05/08/13 12:49, Jan Beulich wrote: > >> But a proper solution would of course be preferred. That may involve > >> notifying Xen of the reset from the PF driver. > > > > That is making the assumption that the PF driver is even aware that the > > reset has occurred. If the PF driver can get at the reset information, > > I still cant see Xen specific hooks being accepted upstream. > > You sort of contradict your earlier statement here that "the VF > requests the PF to reset itself" (where you probably meant "it", > as the VF is hardly allowed to cause a reset of the PF): Such an > operation should imo be visible to the PF driver. > > And then, even if the reset is being done, shouldn''t the interrupt > setup occur _after_ the reset? Which would make Xen clear the > flag... > > > Fundamentally, given that only the VM is in a position to actually know > > about the state of the mask bit (As we cant possibly have Xen polling > > for the state), then the guest has to have access to the mask bit. > > No, that continues to not be an option as long as Xen uses the mask > bit itself. Once again, the obvious immediate solution would appear > to be to carry out the write with the OR of both the guest intended > and the hypervisor required mask bits, thus allowing the bit to get > cleared immediately if Xen doesn''t need it to be set (and the clearing > would happen subsequently if Xen needs the flag to be set at the > point of the guest write). This requires associating back the table > entry to the IRQ in question (if any), in order to look at > irq_desc->msi_desc->msi_attrib.masked.Thanks for the suggestion. The reason why guest was allowed to modify MSIX control bit, was to reduce the load on qemu-dm (see http://zhenxiao.com/read/SR-IOV.pdf). I''ll post the updated patch.> > Jan >Thanks, Joby
Joby Poriyath
2013-Aug-13 17:37 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
On Tue, Aug 06, 2013 at 04:11:56PM +0300, Pasi Kärkkäinen wrote:> On Tue, Aug 06, 2013 at 01:17:35PM +0300, Pasi Kärkkäinen wrote: > > On Tue, Aug 06, 2013 at 10:52:58AM +0100, Andrew Cooper wrote: > > > On 06/08/13 09:29, Jan Beulich wrote: > > > >>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > > >> On 05/08/13 12:49, Jan Beulich wrote: > > > >>> But a proper solution would of course be preferred. That may involve > > > >>> notifying Xen of the reset from the PF driver. > > > >> That is making the assumption that the PF driver is even aware that the > > > >> reset has occurred. If the PF driver can get at the reset information, > > > >> I still cant see Xen specific hooks being accepted upstream. > > > > You sort of contradict your earlier statement here that "the VF > > > > requests the PF to reset itself" (where you probably meant "it", > > > > as the VF is hardly allowed to cause a reset of the PF): Such an > > > > operation should imo be visible to the PF driver. > > > > > > > > And then, even if the reset is being done, shouldn't the interrupt > > > > setup occur _after_ the reset? Which would make Xen clear the > > > > flag... > > > > > > The observed behaviour is this: On startup, the VF driver issues this > > > backchannel reset of itself (the VF) which, amongst other things, sets > > > the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but > > > on further consideration, relying on this behaviour might not be a good > > > idea. The VF is then expected to re-enable the interrupt at its > > > convenience. > > > > > > There are no obvious hooks in the PF driver to receive notification of > > > these backchannel resets. > > > > > > > Doesn't the PF driver log some entries to dom0 kernel dmesg when the VF in domU resets itself? > > > > At least I've seen such messages from the Intel igxbe PF driver.. or did I misunderstand something? >Looking at the ixgbevf code in kernel (tip), reset function is ixgbevf_reset_hw_vf. The reset happens with IXGBE_WRITE_REG(hw, IXGBE_VFCTRL, IXGBE_CTRL_RST); This will in turn mask the MSI-X vector. The controller does this automagically. <snip> from intel 82599 controller spec MSI–X Vector Control — MSIXTVCTRL (BAR3: 0xC + 0x10*n, n=0...255; RW) Bits Default Type 0 1b RW Description ----------- Mask Bit. When this bit is set, the function is prohibited from sending a message using this MSI-X table entry. However, any other MSI-X table entries programmed with the same vector are still capable of sending an equivalent message unless they are also masked. This bit’s state after reset is 1b (entry is masked). This bit is read/write. <snip> After the reset, VF sends a message to PF informing PF about the reset. Guest was allowed to modify the MSI-X control bit to reduce the load on dom0. Otherwise this would have to be emulated by qemu-dm. I found a paper which explains this in detail (http://zhenxiao.com/read/SR-IOV.pdf).> -- Pasi >Joby _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Aug-14 10:11 UTC
Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
>>> On 13.08.13 at 19:37, Joby Poriyath <joby.poriyath@citrix.com> wrote: > Guest was allowed to modify the MSI-X control bit to reduce the load on > dom0. > Otherwise this would have to be emulated by qemu-dm. I found a paper > which explains this in detail (http://zhenxiao.com/read/SR-IOV.pdf).Without going through it in full, I wasn''t able to find any mention of the guest being granted to the MSI-X mask bit (searching for all of "MSI-X", "mask", and "bit" individually). Which would anyway be more consistent with MSI, where the mask bit in PCI config space, and hence the guest can''t be allowed direct access in any case. Jan