Joby Poriyath
2013-Sep-04 18:07 UTC
[PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
Guest needs the ability to enable and disable MSI-X interrupts
by setting the MSI-X control bit, for a passed-through device.
Guest is allowed to write MSI-X mask bit only if Xen *thinks*
that mask is clear (interrupts enabled). If the mask is set by
Xen (interrupts disabled), writes to mask bit by the guest is
ignored.

Currently, a write to MSI-X mask bit by the guest is silently
ignored.

A likely scenario is where we have a 82599 SR-IOV nic passed
through to a guest. From the guest if you do

  ifconfig <ETH_DEV> down
  ifconfig <ETH_DEV> up

the interrupts remain masked. On VF reset, the mask bit is set
by the controller. At this point, Xen is not aware that mask is set.
However, interrupts are enabled by VF driver by clearing the mask
bit by writing directly to BAR3 region containing the MSI-X table.

From dom0, we can verify that
interrupts are being masked using 'xl debug-keys M'.

Initially, guest was allowed to modify MSI-X bit.
Later this behaviour was changed.
See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.

patch revision history
----------------------
v1: Initial patch to allow guest writes to MSI-X control bit
v2: retained the reserved bits while updating MSI-X control vector
    (only 1 bit is defined)
v3: Allow guest writes only when Xen view of MSI-X control bit is 0
v4: Added a warning if Xen thinks MSI-X control bit is masked,
    where in reality, it's not
v5 & v6: Added const-correctness
v7: Get msi_desc from the guest write 'address'
v8: Added ASSERT and renamed m_desc to msi_desc

Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
---
 xen/arch/x86/hvm/vmsi.c |   75 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 0d5ef1b..1f43f6b 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -187,6 +187,19 @@ static struct msixtbl_entry *msixtbl_find_entry(
     return NULL;
 }
 
+static struct msi_desc *virt_to_msi_desc(struct pci_dev *dev, void *virt)
+{
+    struct msi_desc *desc;
+
+    list_for_each_entry( desc, &dev->msi_list, list )
+        if ( desc->msi_attrib.type == PCI_CAP_ID_MSIX &&
+             virt >= desc->mask_base &&
+             virt < desc->mask_base + PCI_MSIX_ENTRY_SIZE )
+            return desc;
+
+    return NULL;
+}
+
 static void __iomem *msixtbl_addr_to_virt(
     struct msixtbl_entry *entry, unsigned long addr)
 {
@@ -247,13 +260,16 @@ out:
 }
 
 static int msixtbl_write(struct vcpu *v, unsigned long address,
-                        unsigned long len, unsigned long val)
+                         unsigned long len, unsigned long val)
 {
     unsigned long offset;
     struct msixtbl_entry *entry;
+    const struct msi_desc *msi_desc;
     void *virt;
     unsigned int nr_entry, index;
     int r = X86EMUL_UNHANDLEABLE;
+    unsigned long flags, orig;
+    struct irq_desc *desc;
 
     if ( len != 4 || (address & 3) )
         return r;
@@ -283,22 +299,57 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     if ( !virt )
         goto out;
 
-    /* Do not allow the mask bit to be changed. */
-#if 0 /* XXX
-       * As the mask bit is the only defined bit in the word, and as the
-       * host MSI-X code doesn't preserve the other bits anyway, doing
-       * this is pointless. So for now just discard the write (also
-       * saving us from having to determine the matching irq_desc).
-       */
+    msi_desc = virt_to_msi_desc(entry->pdev, virt);
+    if ( !msi_desc || msi_desc->irq < 0 )
+        goto out;
+
+    desc = irq_to_desc(msi_desc->irq);
+    if ( !desc )
+        goto out;
+
     spin_lock_irqsave(&desc->lock, flags);
+
+    if ( !desc->msi_desc )
+        goto unlock;
+
+    ASSERT(msi_desc == desc->msi_desc);
+
     orig = readl(virt);
-    val &= ~PCI_MSIX_VECTOR_BITMASK;
-    val |= orig & PCI_MSIX_VECTOR_BITMASK;
+
+    /*
+     * Do not allow guest to modify MSI-X control bit if it is masked
+     * by Xen. We'll only handle the case where Xen thinks that
+     * bit is unmasked, but hardware has silently masked the bit
+     * (in case of SR-IOV VF reset, etc). On the other hand, if Xen
+     * thinks that the bit is masked, but it's really not,
+     * we log a warning.
+     */
+    if ( msi_desc->msi_attrib.masked )
+    {
+        if ( !(orig & PCI_MSIX_VECTOR_BITMASK) )
+            printk(XENLOG_WARNING "MSI-X control bit is unmasked when"
+                   " it is expected to be masked [%04x:%02x:%02x.%01x]\n",
+                   entry->pdev->seg, entry->pdev->bus,
+                   PCI_SLOT(entry->pdev->devfn),
+                   PCI_FUNC(entry->pdev->devfn));
+
+        goto unlock;
+    }
+
+    /*
+     * The mask bit is the only defined bit in the word. But we
+     * ought to preserve the reserved bits. Clearing the reserved
+     * bits can result in undefined behaviour (see PCI Local Bus
+     * Specification revision 2.3).
+     */
+    val &= PCI_MSIX_VECTOR_BITMASK;
+    val |= (orig & ~PCI_MSIX_VECTOR_BITMASK);
     writel(val, virt);
-    spin_unlock_irqrestore(&desc->lock, flags);
-#endif
 
+unlock:
+    spin_unlock_irqrestore(&desc->lock, flags);
     r = X86EMUL_OKAY;
+
 out:
     rcu_read_unlock(&msixtbl_rcu_lock);
     return r;
-- 
1.7.10.4
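[Editor's sketch] The write policy in the last hunk can be tried out in isolation. The following is a minimal stand-alone model, not Xen code: merge_vector_ctrl() and VECTOR_CTRL_MASKBIT are hypothetical names chosen for illustration, and the only hardware fact assumed is that bit 0 of the MSI-X vector control dword is the mask bit while the remaining bits are reserved. It shows the two rules of the patch: a guest write is dropped when Xen itself masked the vector, and otherwise only the mask bit is taken from the guest while the reserved bits are kept as read from the table.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit 0 of the MSI-X vector control dword is the per-vector mask bit
 * (the role PCI_MSIX_VECTOR_BITMASK plays in the patch); the other
 * bits are reserved. The name below is hypothetical.
 */
#define VECTOR_CTRL_MASKBIT 0x1u

/*
 * Hypothetical model of the guest write policy.
 *   xen_masked - Xen's own record that it masked this vector
 *   orig       - value currently in the vector control dword
 *   guest_val  - value the guest tried to write
 * Returns the value the vector control dword holds afterwards.
 */
static uint32_t merge_vector_ctrl(bool xen_masked, uint32_t orig,
                                  uint32_t guest_val)
{
    /* Xen masked the vector itself: the guest write is ignored. */
    if ( xen_masked )
        return orig;

    /* Take only the mask bit from the guest, keep the reserved bits. */
    return (guest_val & VECTOR_CTRL_MASKBIT) |
           (orig & ~VECTOR_CTRL_MASKBIT);
}
```

In this model the VF reset case from the description is merge_vector_ctrl(false, 0x1, 0x0): the hardware set the mask bit behind Xen's back, Xen still believes the vector is unmasked, and the guest's unmask write goes through.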
Jan Beulich
2013-Sep-05 08:02 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 04.09.13 at 20:07, Joby Poriyath <joby.poriyath@citrix.com> wrote:
> Guest needs the ability to enable and disable MSI-X interrupts
> by setting the MSI-X control bit, for a passed-through device.
> Guest is allowed to write MSI-X mask bit only if Xen *thinks*
> that mask is clear (interrupts enabled). If the mask is set by
> Xen (interrupts disabled), writes to mask bit by the guest is
> ignored.
>
> Currently, a write to MSI-X mask bit by the guest is silently
> ignored.
>
> A likely scenario is where we have a 82599 SR-IOV nic passed
> through to a guest. From the guest if you do
>
> ifconfig <ETH_DEV> down
> ifconfig <ETH_DEV> up
>
> the interrupts remain masked. On VF reset, the mask bit is set
> by the controller. At this point, Xen is not aware that mask is set.
> However, interrupts are enabled by VF driver by clearing the mask
> bit by writing directly to BAR3 region containing the MSI-X table.
>
> From dom0, we can verify that
> interrupts are being masked using 'xl debug-keys M'.
>
> Initially, guest was allowed to modify MSI-X bit.
> Later this behaviour was changed.
> See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
>
> patch revision history
> ----------------------
> v1: Initial patch to allow guest writes to MSI-X control bit
> v2: retained the reserved bits while updating MSI-X control vector
> (only 1 bit is defined)
> v3: Allow guest writes only when Xen view of MSI-X control bit is 0
> v4: Added a warning if Xen thinks MSI-X control bit is masked,
> where in reality, it's not
> v5 & v6: Added const-correctness
> v7: Get msi_desc from the guest write 'address'
> v8: Added ASSERT and renamed m_desc to msi_desc

Looks good to me now - unless I hear otherwise from anyone I
would go and apply this as soon as the current backlog in the
staging tree cleared.

Jan
Joby Poriyath
2013-Sep-05 09:40 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
On Thu, Sep 05, 2013 at 09:02:33AM +0100, Jan Beulich wrote:
> Looks good to me now - unless I hear otherwise from anyone I
> would go and apply this as soon as the current backlog in the
> staging tree cleared.
>
> Jan
>

Thanks Jan.

Sander, would it be possible for you to give this patch a try
on your hardware?

Many thanks,
Joby
Sander Eikelenboom
2013-Sep-10 15:08 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
Thursday, September 5, 2013, 11:40:55 AM, you wrote:

> On Thu, Sep 05, 2013 at 09:02:33AM +0100, Jan Beulich wrote:
>> Looks good to me now - unless I hear otherwise from anyone I
>> would go and apply this as soon as the current backlog in the
>> staging tree cleared.
>>
>> Jan
>>

> Thanks Jan.

> Sander, would it be possible for you to give this patch a try
> on your hardware?

Hi Joby,

Just tested it, seems to work OK now !
(Just for the record, as it seems a bit late since the patch is already applied, but i was on holiday)

--
Sander

> Many thanks,
> Joby
Joby Poriyath
2013-Sep-11 10:40 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
On Tue, Sep 10, 2013 at 05:08:52PM +0200, Sander Eikelenboom wrote:
> Hi Joby,
>
> Just tested it, seems to work OK now !
> (Just for the record, as it seems a bit late since the patch is already applied, but i was on holiday)

An extra pair of eyes is always helpful.
Thank you very much Sander.

Joby
Xu, YongweiX
2013-Sep-16 08:33 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
Hi Joby,

I found your patch (Xen C/S:27480 7843bc3502ae) introduced a new issue, when I boot up a rhel6.4 guest with assigned a e1000e/igb/ixgbe PF or VF and more than 1 vcpu, the guest's network will be broken in a short time and cannot be recovered.
The test machine was SandyBridge-EP and IvyTown-EP.

Yongwei(Terrence)

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Joby Poriyath
> Sent: Wednesday, September 11, 2013 6:41 PM
> To: Sander Eikelenboom
> Cc: xen-devel; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> mask bit
>
> On Tue, Sep 10, 2013 at 05:08:52PM +0200, Sander Eikelenboom wrote:
> > Hi Joby,
> >
> > Just tested it, seems to work OK now !
> > (Just for the record, as it seems a bit late since the patch is already applied,
> but i was on holiday)
>
> An extra pair of eyes is always helpful.
> Thank you very much Sander.
>
> Joby
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Jan Beulich
2013-Sep-16 09:36 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 16.09.13 at 10:33, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> I found your patch (Xen C/S:27480 7843bc3502ae) introduced a new issue, when
> I boot up a rhel6.4 guest with assigned a e1000e/igb/ixgbe PF or VF and more
> than 1 vcpu, the guest's network will be broken in a short time and cannot be
> recovered.
> The test machine was SandyBridge-EP and IvyTown-EP.

I would be very helpful if you could give some more detail: What
specifically doesn't work, logs, ...

That's particularly important since, if indeed broken, the patch
would need to be reverted from at least the stable trees.

Jan
Xu, YongweiX
2013-Sep-16 11:23 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, September 16, 2013 5:37 PM
> To: Joby Poriyath; Xu, YongweiX
> Cc: Sander Eikelenboom; xen-devel
> Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> mask bit
>
> >>> On 16.09.13 at 10:33, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> > I found your patch (Xen C/S:27480 7843bc3502ae) introduced a new
> > issue, when I boot up a rhel6.4 guest with assigned a e1000e/igb/ixgbe
> > PF or VF and more than 1 vcpu, the guest's network will be broken in a
> > short time and cannot be recovered.
> > The test machine was SandyBridge-EP and IvyTown-EP.
>
> I would be very helpful if you could give some more detail: What specifically
> doesn't work, logs, ...
>
> That's particularly important since, if indeed broken, the patch would need to
> be reverted from at least the stable trees.
>
> Jan

I've made further test about this issue, the config file as the attachment:rhel6u4.hvm(with qemu-xen), the result as below:
1. Boot up rhel6.4 guest with igbvf/ixgbevf/igbpf/ixgbepf/e1000e nic and 1 vcpu, the guest network works fine.
2. Boot up rhel6.4 guest with igbvf/ixgbevf/igbpf/ixgbepf/e1000e nic and 2(or more) vcpus, the guest can get IP first, but after about 10~20 seconds, the network will be broken.
We can see only boot guest with more than 1 vcpu would cause this issue.

Only when boot guest with e1000e nic and 2(2 or more)vcpus it would print call trace log, but I think it's enough to explain that the network broken caused by MSI-X, as the attachment:guest_with_e1000e.log.

Yongwei(Terrence)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Andrew Cooper
2013-Sep-16 11:31 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
On 16/09/2013 12:23, Xu, YongweiX wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, September 16, 2013 5:37 PM
>> To: Joby Poriyath; Xu, YongweiX
>> Cc: Sander Eikelenboom; xen-devel
>> Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
>> mask bit
>>
>>>>> On 16.09.13 at 10:33, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
>>> I found your patch (Xen C/S:27480 7843bc3502ae) introduced a new
>>> issue, when I boot up a rhel6.4 guest with assigned a e1000e/igb/ixgbe
>>> PF or VF and more than 1 vcpu, the guest's network will be broken in a
>>> short time and cannot be recovered.
>>> The test machine was SandyBridge-EP and IvyTown-EP.
>> I would be very helpful if you could give some more detail: What specifically
>> doesn't work, logs, ...
>>
>> That's particularly important since, if indeed broken, the patch would need to
>> be reverted from at least the stable trees.
>>
>> Jan
> I've made further test about this issue, the config file as the attachment:rhel6u4.hvm(with qemu-xen), the result as below:
> 1. Boot up rhel6.4 guest with igbvf/ixgbevf/igbpf/ixgbepf/e1000e nic and 1 vcpu, the guest network works fine.
> 2. Boot up rhel6.4 guest with igbvf/ixgbevf/igbpf/ixgbepf/e1000e nic and 2(or more) vcpus, the guest can get IP first, but after about 10~20 seconds, the network will be broken.
> We can see only boot guest with more than 1 vcpu would cause this issue.
>
> Only when boot guest with e1000e nic and 2(2 or more)vcpus it would print call trace log, but I think it's enough to explain that the network broken caused by MSI-X, as the attachment:guest_with_e1000e.log.

And does reverting that specific changeset fix the issue?

I ask, because that change set specifically fixes SRIOV passthrough for
HVM guests using the ixgbevf driver, which was broken by an earlier
security enhancement.

~Andrew
Jan Beulich
2013-Sep-16 11:51 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 16.09.13 at 13:23, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> Only when boot guest with e1000e nic and 2(2 or more)vcpus it would print
> call trace log, but I think it's enough to explain that the network broken
> caused by MSI-X, as the attachment:guest_with_e1000e.log.

Now you would also need to tell us what IRQ 50 relates to (i.e.
where in the group of presumably multiple MSI-X vectors that
device uses this sits). Plus, for an edge type interrupt to be
raised repeatedly and wrongly, the device doing the respective
bus master writes must be doing something odd (perhaps because
having been programmed wrongly). For understanding what's going
on here, providing 'i' and 'M' debug key output might be pretty
helpful.

And finally - 2.6.32 is pretty old. Do you also see the problem
with a more modern kernel (trying to exclude possible driver
issues)?

Jan
Xu, YongweiX
2013-Sep-17 03:06 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, September 16, 2013 7:51 PM
> To: Xu, YongweiX
> Cc: Joby Poriyath; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; xen-devel
> Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> mask bit
>
> >>> On 16.09.13 at 13:23, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> > Only when boot guest with e1000e nic and 2(2 or more)vcpus it would
> > print call trace log, but I think it's enough to explain that the
> > network broken caused by MSI-X, as the attachment:guest_with_e1000e.log.
>
> Now you would also need to tell us what IRQ 50 relates to (i.e.
> where in the group of presumably multiple MSI-X vectors that device uses this
> sits). Plus, for an edge type interrupt to be raised repeatedly and wrongly, the
> device doing the respective bus master writes must be doing something odd
> (perhaps because having been programmed wrongly). For understanding
> what's going on here, providing 'i' and 'M' debug key output might be pretty
> helpful.
>
> And finally - 2.6.32 is pretty old. Do you also see the problem with a more
> modern kernel (trying to exclude possible driver issues)?

I've provided 'i' an 'M' debug key to output the IRQ and MSI-X information, the 'xl dmesg'log of dom0 as the attachment:xl_dmesg.log.
Retested by changing the guest kernel to 3.11.1, the issue not exist. But rhel6.4 is a mainstream distribution, so I think we must take care of it.

Yongwei(Terrence)
Xu, YongweiX
2013-Sep-17 04:50 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Monday, September 16, 2013 7:31 PM
> To: Xu, YongweiX
> Cc: Jan Beulich; xen-devel; Liu, SongtaoX; Zhou, Chao; Joby Poriyath; Sander
> Eikelenboom
> Subject: Re: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> mask bit
>
> On 16/09/2013 12:23, Xu, YongweiX wrote:
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, September 16, 2013 5:37 PM
> >> To: Joby Poriyath; Xu, YongweiX
> >> Cc: Sander Eikelenboom; xen-devel
> >> Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to
> >> set/clear MSI-X mask bit
> >>
> >>>>> On 16.09.13 at 10:33, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> >>> I found your patch (Xen C/S:27480 7843bc3502ae) introduced a new
> >>> issue, when I boot up a rhel6.4 guest with assigned a
> >>> e1000e/igb/ixgbe PF or VF and more than 1 vcpu, the guest's network
> >>> will be broken in a short time and cannot be recovered.
> >>> The test machine was SandyBridge-EP and IvyTown-EP.
> >> I would be very helpful if you could give some more detail: What
> >> specifically doesn't work, logs, ...
> >>
> >> That's particularly important since, if indeed broken, the patch
> >> would need to be reverted from at least the stable trees.
> >>
> >> Jan
> > I've made further test about this issue, the config file as the
> attachment:rhel6u4.hvm(with qemu-xen), the result as below:
> > 1. Boot up rhel6.4 guest with igbvf/ixgbevf/igbpf/ixgbepf/e1000e nic and 1
> vcpu, the guest network works fine.
> > 2. Boot up rhel6.4 guest with igbvf/ixgbevf/igbpf/ixgbepf/e1000e nic and 2(or
> more) vcpus, the guest can get IP first, but after about 10~20 seconds, the
> network will be broken.
> > We can see only boot guest with more than 1 vcpu would cause this issue.
> >
> > Only when boot guest with e1000e nic and 2(2 or more)vcpus it would print
> call trace log, but I think it's enough to explain that the network broken caused
> by MSI-X, as the attachment:guest_with_e1000e.log.
>
> And does reverting that specific changeset fix the issue?
>
> I ask, because that change set specifically fixes SRIOV passthrough for HVM
> guests using the ixgbevf driver, which was broken by an earlier security
> enhancement.

I've tried to revert the Xen C/S to 27479 and I couldn't reproduce the issue , but on Xen C/S:27480, I can reproduce the issue all the time.

Yongwei(Terrence)
Jan Beulich
2013-Sep-17 06:38 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 17.09.13 at 05:06, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> I've provided 'i' an 'M' debug key to output the IRQ and MSI-X information,
> the 'xl dmesg'log of dom0 as the attachment:xl_dmesg.log.

Nothing odd there, but then again you also still didn't tell us
which IRQ it is that gets switched off by the guest kernel. Even
without me saying so explicitly, it should be pretty clear that
sending complete information (matching up hypervisor, host, and
guest logs) ideally limited to just one guest instance (the log
here has 6 guests) would provide the maximum information.
Remember that unless you're going to debug the problem yourself,
we depend on the information coming from you being complete and
consistent.

In the case at hand, telling us whether the log was taken with a
VF or PF assigned would also be relevant information (which would
presumably be deducible from the guest kernel log if you had sent
it).

> Retested by changing the guest kernel to 3.11.1, the issue not exist. But
> rhel6.4 is a mainstream distribution, so I think we must take care of it.

Sure, but the direction of where to look for the problem may be
different. It might e.g. be the case that the old driver handles
the VF reset differently, which might direct us to revisit the
unmasking done by the patch.

In any event - any debugging _you_ can do would likely get us to
make faster progress on this...

Jan
Xu, YongweiX
2013-Sep-18 03:19 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, September 17, 2013 2:39 PM
> To: Xu, YongweiX
> Cc: Joby Poriyath; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; xen-devel
> Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> mask bit
>
> >>> On 17.09.13 at 05:06, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> > I've provided 'i' an 'M' debug key to output the IRQ and MSI-X
> > information, the 'xl dmesg'log of dom0 as the attachment:xl_dmesg.log.
>
> Nothing odd there, but then again you also still didn't tell us which IRQ it is that
> gets switched off by the guest kernel. Even without me saying so explicitly, it
> should be pretty clear that sending complete information (matching up
> hypervisor, host, and guest
> logs) ideally limited to just one guest instance (the log here has 6
> guests) would provide the maximum information. Remember that unless you're
> going to debug the problem yourself, we depend on the information coming
> from you being complete and consistent.
> In the case at hand, telling us whether the log was taken with a VF or PF
> assigned would also be relevant information (which would presumably be
> deducible from the guest kernel log if you had sent it).

I've retested this issue for many times, but can only get these log as attachment, it can only be found IRQ #50 issue on guest but cannot found on Dom0 "xl dmesg" log, if you think it's still lack of persuasion, do you have any method or patch to capture more IRQ information? thanks!

> > Retested by changing the guest kernel to 3.11.1, the issue not exist.
> > But
> > rhel6.4 is a mainstream distribution, so I think we must take care of it.
>
> Sure, but the direction of where to look for the problem may be different. It
> might e.g. be the case that the old driver handles the VF reset differently,
> which might direct us to revisit the unmasking done by the patch.
>
> In any event - any debugging _you_ can do would likely get us to make faster
> progress on this...
Jan Beulich
2013-Sep-18 09:32 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 18.09.13 at 05:19, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> I've retested this issue for many times, but can only get these log as
> attachment, it can only be found IRQ #50 issue on guest but cannot found on
> Dom0 "xl dmesg" log, if you think it's still lack of persuasion, do you have
> any method or patch to capture more IRQ information? thanks!

The paired up logs at least allows us to guess that it's only the
third of the three interrupts the driver sets up that causes the
problem.

Looking at the plain 2.6.32 driver sources likely will make little
sense, as the RHEL kernel is presumably heavily patched, and as
I don't know that driver anyway. So getting someone of your LAD
folks involved might be a good idea, even if just to explain what
the driver and hardware behavior here are, and what the individual
IRQs are actually used for (judging from 3.12-rc1, the IRQ use is
for RX, TX, and "other", whatever "other" means, and looking at
the 2.6.32 and 3.12-rc1 variants of e1000_msix_other() they look
rather similar, so the problem must be associated with something
elsewhere in the driver).

Interestingly enough the other two IRQ handlers don't have a
check resulting in them returning IRQ_NONE. Could you double
check what the counts of all three interrupts are in the guest's
/proc/interrupts?

Jan
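[Editor's sketch] Comparing the counts asked for above means adding up the per-CPU columns of one /proc/interrupts row. The helper below is a hedged illustration, not something posted in the thread: irq_line_total() is a hypothetical name, and it assumes the usual row layout of an IRQ number followed by a colon, one whitespace-separated decimal counter per CPU, then the chip and device names. Reading the file and picking the rows for the device's three vectors is left to the caller.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts row.
 * Returns -1 if the row does not start with "<irq>:" for the given
 * IRQ number (this also rejects named rows such as NMI or LOC).
 */
static long long irq_line_total(const char *line, int irq)
{
    char *end;
    long n = strtol(line, &end, 10);     /* skips leading whitespace */
    const char *p;
    long long total = 0;

    if ( end == line || *end != ':' || n != irq )
        return -1;

    p = end + 1;
    for ( ; ; )
    {
        long long count;

        while ( *p == ' ' || *p == '\t' )
            p++;
        if ( !isdigit((unsigned char)*p) )
            break;                       /* reached the chip name */

        count = strtoll(p, &end, 10);
        /* A counter is a whole token; "524288-edge" style names are not. */
        if ( *end != ' ' && *end != '\t' && *end != '\n' && *end != '\0' )
            break;

        total += count;
        p = end;
    }

    return total;
}
```

A vector whose total stops growing while the others keep counting is the one to look at more closely.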
Joby Poriyath
2013-Sep-18 13:41 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
On Wed, Sep 18, 2013 at 03:19:51AM +0000, Xu, YongweiX wrote:
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Tuesday, September 17, 2013 2:39 PM
> > To: Xu, YongweiX
> > Cc: Joby Poriyath; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; xen-devel
> > Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> > mask bit
> >
> > >>> On 17.09.13 at 05:06, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> > > I've provided 'i' an 'M' debug key to output the IRQ and MSI-X
> > > information, the 'xl dmesg'log of dom0 as the attachment:xl_dmesg.log.
> >
> > Nothing odd there, but then again you also still didn't tell us which IRQ it is that
> > gets switched off by the guest kernel. Even without me saying so explicitly, it
> > should be pretty clear that sending complete information (matching up
> > hypervisor, host, and guest
> > logs) ideally limited to just one guest instance (the log here has 6
> > guests) would provide the maximum information. Remember that unless you're
> > going to debug the problem yourself, we depend on the information coming
> > from you being complete and consistent.
> > In the case at hand, telling us whether the log was taken with a VF or PF
> > assigned would also be relevant information (which would presumably be
> > deducible from the guest kernel log if you had sent it).
>
> I've retested this issue for many times, but can only get these log as attachment, it can only be found IRQ #50 issue on guest but cannot found on Dom0 "xl dmesg" log, if you think it's still lack of persuasion, do you have any method or patch to capture more IRQ information? thanks!
>
>

I did some testing with Xen 4.3, RHEL 6.4, qemu-traditional.
I used Intel 82599 VF for pass through.
I noticed that guest is losing network soon after it boots
(may be a minute or so). I could recover the network by
1. stopping irqbalance
2. reconfiguring network
And then it stayed up.

Here is what's happening....

The irqbalance, triggers the irq migration. Guest kernel will mask
the MSI-X interrupt. Xen will allow this and MSI-X interrupt is masked.

Then guest kernel will update the vector. These writes are trapped by Xen.
Xen will make a note that MSI-X vector has been updated, and then it'll
exit to Qemu.

Qemu makes a note of the updated vector, but it won't inform Xen yet.
This is because, the exit to Qemu happens for every 32-bit writes and
MSI-X vector is 128-bit. So it'll call Xen only when guest writes the
MSI-X control word.

Guest kernel, after having updated the MSI-X vector will unmask the
MSI-X interrupt. This will trap into Xen.

Xen notices that the vector has been updated, so it'll exit to Qemu
without unmasking the MSI-X vector.

Qemu will check that MSI-X is indeed masked. If this is not the case
guest attempt to update MSI-X vector is ignored.

If the MSI-X vector is masked, Qemu will call Xen to update the MSI-X
vector (xc_domain_update_msi_irq). But xc_domain_update_msi_irq doesn't
unmask the MSI-X, and it remains masked. And guest loses network.

With slightly older Qemu (before git commit
56d7747a3cf811910c4cf865e1ebcb8b82502005)
Qemu had write access to MSI-X table, so it would go ahead and unmask
the MSI-X vector. I've tested this on Xen 4.1 (XenServer 6.1).

Without the patch, guest kernel's attempt to migrate the IRQ remains
unsatisfied.
Xen will silently ignore guest's attempt to mask MSI-X vector. So it
remains unmasked. Although Xen will exit to Qemu when guest kernel tries
to update the MSI-X vector, Qemu doesn't call Xen since it notices that
MSI-X is in unmasked state.

And finally, without the patch, SR-IOV pass through is broken (which
the patch attempted to fix).

I can't explain the behaviour that you are seeing.

Could you please test with IRQ balance turned off?

Thanks,
Joby
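[Editor's sketch] The sequence Joby describes can be followed step by step in a toy state machine. Everything below is an illustrative model built only from the prose of this message: struct msix_model and the guest_*() and migrate_irq() helpers are hypothetical names, not Xen or Qemu functions, and the real code paths carry far more state. The model tracks one vector through the three guest actions of an IRQ migration, with and without the patch.

```c
#include <stdbool.h>

/* Toy state for one MSI-X vector. All names here are hypothetical. */
struct msix_model {
    bool patched;          /* Xen honours guest mask-bit writes (this patch) */
    bool hw_masked;        /* mask bit in the physical MSI-X table */
    bool vector_updated;   /* Xen noted an address/data write */
    bool binding_updated;  /* Qemu asked Xen to rebind the vector */
};

/* Step 1: the guest kernel masks the vector before changing it. */
static void guest_mask(struct msix_model *m)
{
    if ( m->patched )
        m->hw_masked = true;   /* without the patch the write is dropped */
}

/* Step 2: the guest rewrites address/data; Xen only notes the update. */
static void guest_write_vector(struct msix_model *m)
{
    m->vector_updated = true;
}

/* Step 3: the guest unmasks the vector again. */
static void guest_unmask(struct msix_model *m)
{
    if ( m->vector_updated )
    {
        /* Xen exits to Qemu without touching the mask bit. */
        m->vector_updated = false;

        /* Qemu rebinds only if the vector is masked; nobody unmasks it. */
        if ( m->hw_masked )
            m->binding_updated = true;
        return;
    }

    /* No pending vector update: a plain unmask write. */
    if ( m->patched )
        m->hw_masked = false;
}

/* The full mask, update, unmask sequence triggered by irqbalance. */
static void migrate_irq(struct msix_model *m)
{
    guest_mask(m);
    guest_write_vector(m);
    guest_unmask(m);
}
```

Running migrate_irq() on a model with patched set leaves hw_masked true and binding_updated true, which is the lost-network case; with patched clear, both stay false, matching the description that the migration is never carried out but interrupts keep flowing.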
Xu, YongweiX
2013-Sep-22 02:07 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
> -----Original Message----- > From: Joby Poriyath [mailto:joby.poriyath@citrix.com] > Sent: Wednesday, September 18, 2013 9:42 PM > To: Xu, YongweiX > Cc: Jan Beulich; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; xen-devel > Subject: Re: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X > mask bit > > On Wed, Sep 18, 2013 at 03:19:51AM +0000, Xu, YongweiX wrote: > > > -----Original Message----- > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > > Sent: Tuesday, September 17, 2013 2:39 PM > > > To: Xu, YongweiX > > > Cc: Joby Poriyath; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; > > > xen-devel > > > Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to > > > set/clear MSI-X mask bit > > > > > > >>> On 17.09.13 at 05:06, "Xu, YongweiX" <yongweix.xu@intel.com> > wrote: > > > > I've provided 'i' an 'M' debug key to output the IRQ and MSI-X > > > > information, the 'xl dmesg'log of dom0 as the attachment:xl_dmesg.log. > > > > > > Nothing odd there, but then again you also still didn't tell us > > > which IRQ it is that gets switched off by the guest kernel. Even > > > without me saying so explicitly, it should be pretty clear that > > > sending complete information (matching up hypervisor, host, and > > > guest > > > logs) ideally limited to just one guest instance (the log here has 6 > > > guests) would provide the maximum information. Remember that unless > > > you're going to debug the problem yourself, we depend on the > > > information coming from you being complete and consistent. > > > In the case at hand, telling us whether the log was taken with a VF > > > or PF assigned would also be relevant information (which would > > > presumably be deducible from the guest kernel log if you had sent it). 
> > > > I've retested this issue for many times, but can only get these log as > > attachment, it can only be found IRQ #50 issue on guest but cannot > > found on Dom0 "xl dmesg" log, if you think it's still lack of > > persuasion, do you have any method or patch to capture more IRQ > > information? thanks! > > > > > > I did some testing with Xen 4.3, RHEL 6.4, qemu-traditional. > I used Intel 82599 VF for pass through. > I noticed that guest is losing network soon after it boots (may be a minute or > so). I could recover the network by > 1. stopping irqbalance > 2. reconfiguring network > And then it stayed up. > > Here is what's happening.... > > The irqbalance, triggers the irq migration. Guest kernel will mask the MSI-X > interrupt. Xen will allow this and MSI-X interrupt is masked. > > Then guest kernel will update the vector. These writes are trapped by Xen. > Xen will make a note that MSI-X vector has been updated, and then it'll exit to > Qemu. > > Qemu makes a note of the updated vector, but it won't inform Xen yet. > This is because, the exit to Qemu happens for every 32-bit writes and MSI-X > vector is 128-bit. So it'll call Xen only when guest writes the MSI-X control word. > > Guest kernel, after having updated the MSI-X vector will unmask the MSI-X > interrupt. This will trap into Xen. > > Xen notices that the vector has been updated, so it'll exit to Qemu without > unmasking the MSI-X vector. > > Qemu will check that MSI-X is indeed masked. If this is not the case guest > attempt to update MSI-X vector is ignored. > > If the MSI-X vector is masked, Qemu will call Xen to update the MSI-X vector > (xc_domain_update_msi_irq). But xc_domain_update_msi_irq doesn't unmask > the MSI-X, and it remains masked. And guest loses network. > > With slightly older Qemu (before git commit > 56d7747a3cf811910c4cf865e1ebcb8b82502005) > Qemu had write access to MSI-X table, so it would go ahead and unmask the > MSI-X vector. I've tested this on Xen 4.1 (XenServer 6.1). 
>
> Without the patch, guest kernel's attempt to migrate the IRQ remains
> unsatisfied.
> Xen will silently ignore guest's attempt to mask MSI-X vector. So it remains
> unmasked. Although Xen will exit to Qemu when guest kernel tries to update
> the MSI-X vector, Qemu doesn't call Xen since it notices that MSI-X is in
> unmasked state.
>
> And finally, without the patch, SR-IOV pass through is broken (which the patch
> attempted to fix).
>
> I can't explain the behaviour that you are seeing.
>
> Could you please test with IRQ balance turned off?

Thank you for your great description. I've tried to boot up a rhel6u4 guest with guest IRQ balance turned off, and the guest's SR-IOV pass through network works fine!

Yongwei(Terrence)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
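The IRQ migration sequence described in the quoted message can be sketched as a toy state model. This is only an illustration of the steps as written in the mail, not code from Xen or Qemu: the struct, its fields, and the functions model_init(), guest_write() and migrate_irq() are hypothetical names, and the only facts taken from the thread are the 128-bit entry written as 32-bit words, the mask bit in the control word, and the ordering of the mask, vector update and unmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One MSI-X table entry is 128 bits, written by the guest as four 32-bit words. */
enum { MSIX_ADDR_LO, MSIX_ADDR_HI, MSIX_DATA, MSIX_CTRL, MSIX_WORDS };
#define MSIX_CTRL_MASKBIT 0x1u

/* Hypothetical toy state; none of these names exist in Xen or Qemu. */
struct msix_model {
    bool patch_applied;               /* Xen honours guest mask/unmask writes */
    bool hw_masked;                   /* mask bit in the real MSI-X table */
    bool vector_dirty;                /* Xen noted an address/data write */
    uint32_t qemu_shadow[MSIX_WORDS]; /* Qemu's note of the updated vector */
    uint32_t bound_data;              /* message data currently bound via Xen */
};

struct msix_model model_init(bool patch_applied)
{
    struct msix_model m = { .patch_applied = patch_applied };
    return m;
}

void guest_write(struct msix_model *m, unsigned int word, uint32_t val)
{
    assert(word < MSIX_WORDS);

    if (word != MSIX_CTRL) {
        /* Address/data write: Xen notes it, Qemu records it, nobody acts yet. */
        m->vector_dirty = true;
        m->qemu_shadow[word] = val;
        return;
    }

    if (val & MSIX_CTRL_MASKBIT) {
        /* Mask request: allowed with the patch, silently ignored without it. */
        if (m->patch_applied)
            m->hw_masked = true;
        return;
    }

    /* Unmask request. */
    if (m->vector_dirty) {
        /* Xen exits to Qemu without unmasking. Qemu applies the new vector
         * only if the entry is really masked; the update call does not unmask. */
        m->vector_dirty = false;
        if (m->hw_masked)
            m->bound_data = m->qemu_shadow[MSIX_DATA];
        return;
    }
    if (m->patch_applied)
        m->hw_masked = false;
}

/* What the guest kernel does when irqbalance moves an interrupt. */
void migrate_irq(struct msix_model *m, uint32_t new_data)
{
    guest_write(m, MSIX_CTRL, MSIX_CTRL_MASKBIT);
    guest_write(m, MSIX_ADDR_LO, 0xfee00000u);
    guest_write(m, MSIX_ADDR_HI, 0);
    guest_write(m, MSIX_DATA, new_data);
    guest_write(m, MSIX_CTRL, 0);
}
```

Running migrate_irq() on model_init(true) leaves the entry masked with the new data bound, which is the "guest loses network" case; on model_init(false) the entry stays unmasked and the vector update is dropped, which is the "migration remains unsatisfied" case.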
Xu, YongweiX
2013-Sep-22 03:22 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, September 18, 2013 5:33 PM
> To: Xu, YongweiX
> Cc: Joby Poriyath; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; xen-devel
> Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X
> mask bit
>
> >>> On 18.09.13 at 05:19, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> > I've retested this issue for many times, but can only get these log as
> > attachment, it can only be found IRQ #50 issue on guest but cannot
> > found on Dom0 "xl dmesg" log, if you think it's still lack of persuasion, do
> > you have any method or patch to capture more IRQ information? thanks!
>
> The paired up logs at least allows us to guess that it's only the third of the
> three interrupts the driver sets up that causes the problem.
>
> Looking at the plain 2.6.32 driver sources likely will make little sense, as the
> RHEL kernel is presumably heavily patched, and as I don't know that driver
> anyway. So getting someone of your LAD folks involved might be a good idea,
> even if just to explain what the driver and hardware behavior here are, and
> what the individual IRQs are actually used for (judging from 3.12-rc1, the IRQ
> use is for RX, TX, and "other", whatever "other" means, and looking at the
> 2.6.32 and 3.12-rc1 variants of e1000_msix_other() they look rather similar, so
> the problem must be associated with something elsewhere in the driver).
>
> Interestingly enough the other two IRQ handlers don't have a check resulting in
> them returning IRQ_NONE. Could you double check what the counts of all three
> interrupts are in the guest's /proc/interrupts?

Here is the guest's /proc/interrupts log of the guest with pass through e1000e/igb/ixgbe, we can find with e1000e/igb the IRQ #50 was indeed the eth network device, ixgbe was a little different, because it's RX and TX on the same IRQ, so the IRQ #49 was the ixgbe network device.
Yongwei(Terrence)
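For anyone repeating the /proc/interrupts check requested above, a minimal sketch of pulling the counts out of that file follows. It assumes the usual layout of a label, a colon, one counter per CPU, then the chip and device names, and it assumes the first field after the counters does not start with a digit. The function names irq_line_total() and report_irqs() are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line.
 * Copies the text before the colon (e.g. "50") into label.
 * Returns 0 on success, -1 if the line has no "label:" prefix. */
int irq_line_total(const char *line, char *label, size_t label_len,
                   unsigned long long *total)
{
    const char *colon = strchr(line, ':');
    const char *p = line;
    size_t n;

    assert(label_len > 0);
    if (colon == NULL)
        return -1;              /* e.g. the "CPU0 CPU1 ..." header line */

    while (p < colon && isspace((unsigned char)*p))
        p++;
    n = (size_t)(colon - p);
    if (n == 0)
        return -1;
    if (n >= label_len)
        n = label_len - 1;      /* truncate over-long labels */
    memcpy(label, p, n);
    label[n] = '\0';

    *total = 0;
    p = colon + 1;
    for (;;) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;              /* reached the chip/device name columns */
        *total += strtoull(p, &end, 10);
        p = end;
    }
    return 0;
}

/* Print label and summed count for every line of `in` mentioning `needle`.
 * Lines longer than the buffer (very many CPUs) are not handled here. */
void report_irqs(FILE *in, const char *needle, FILE *out)
{
    char line[1024], label[32];
    unsigned long long total;

    while (fgets(line, sizeof line, in) != NULL)
        if (strstr(line, needle) != NULL &&
            irq_line_total(line, label, sizeof label, &total) == 0)
            fprintf(out, "IRQ %s: %llu\n", label, total);
}
```

Opening /proc/interrupts with fopen() and calling report_irqs() with the interface name as needle, once before and once after the problem shows up, gives the per-IRQ counts to compare.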
Jan Beulich
2013-Sep-23 06:56 UTC
Re: [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit
>>> On 22.09.13 at 05:22, "Xu, YongweiX" <yongweix.xu@intel.com> wrote:
> Here is the guest's /proc/interrupts log of the guest with pass through
> e1000e/igb/ixgbe, we can find with e1000e/igb the IRQ #50 was indeed the eth
> network device, ixgbe was a little different, because it's RX and TX on the
> same IRQ, so the IRQ #49 was the ixgbe network device.

Okay, so the _only_ bad case is the auxiliary IRQ in the e1000e case. That surely smells more like a driver/hardware issue than a bug in the Xen change (even more so that you say a recent driver doesn't exhibit the same behavior). Once again, getting your LAD folks involved here would be greatly appreciated.

Jan