Cinco, Dante
2009-Oct-08 00:08 UTC
[Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
I need help tracking down an IRQ SMP affinity problem.

Xen version: 3.4 unstable
dom0: Linux 2.6.30.3 (Debian)
domU: Linux 2.6.30.1 (Debian)
Hardware platform: HP ProLiant G6, dual-socket Xeon 5540, hyperthreading enabled in BIOS and kernel (total of 16 CPUs: 2 sockets * 4 cores per socket * 2 threads per core)

With vcpus < 5, I can change /proc/irq/<irq#>/smp_affinity and see the interrupts get routed to the proper CPU(s) by checking /proc/interrupts. With vcpus > 4, any change to /proc/irq/<irq#>/smp_affinity results in a complete loss of interrupts for <irq#>.

I noticed in the domU /var/log/kern.log that the APIC routing changes from "flat" for vcpus=4 to "physical flat" for vcpus=5. Looking at the source code in linux-2.6.30.1/arch/x86/kernel/apic/probe_64.c, this switch occurs when "max_physical_apicid >= 8". In the domU /var/log/kern.log and /proc/cpuinfo, only even-numbered APIC IDs (starting from 0) are used, so by the 5th CPU it is already at APIC ID 8, which triggers physical flat APIC routing.

dom0 has all 16 CPUs available to it. The mapping between CPU numbers and APIC IDs is 1-to-1 (CPU0:APIC ID 0 ... CPU15:APIC ID 15). domU is configured with either vcpus=4 or vcpus=5. In both cases, the mapping uses only even numbers for the APIC IDs (CPU0:APIC ID 0 ... CPU4:APIC ID 8).

I'm using an ATTO/PMC Tachyon-based Fibre Channel PCIe card on this platform. It uses PCI-MSI-edge for its interrupt. I use pciback.hide in my dom0 Xen 3.5 kernel stanza to pass the device directly to domU, and I'm also using "iommu=1,no-intremap,passthrough" in the stanza. I'm able to see the device in dom0 via "lspci -vv", including the MSI message address and data that have been programmed into the Tachyon registers, and it is using IRQ 32. Regardless of changes to IRQ 32's SMP affinity in domU, the MSI message address and data as seen from dom0 do not change. I can only conclude that domU is running some sort of IRQ emulation.

# lspci -vv in dom0
07:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 32
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
                Address: 00000000fee00000  Data: 40ba   (dest ID=0, RH=DM=0, fixed interrupt, vector=0xba)
        Kernel driver in use: pciback

In domU, the device has been remapped (intentionally, in the dom0 config file) to bus 0, device 8 and can also be seen via "lspci -vv", with the same MSI message address but different data, using IRQ 48.

# lspci -vv in domU with vcpus=5
00:08.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 48
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee00000  Data: 4059   (dest ID=0, RH=DM=0, fixed interrupt, vector=0x59)
        Kernel driver in use: hwdrv
        Kernel modules: hbas-hw

At this point, the kernel driver for the device has been loaded and the number of interrupts can be seen in /proc/interrupts. The default IRQ SMP affinity has not been changed, yet the interrupts are all being routed to CPU0. This is for vcpus=5 (physical flat APIC routing). Changing IRQ 48's SMP affinity to any value results in a complete loss of all interrupts; domU and dom0 need to be rebooted to restore normal operation.
# cat /proc/irq/48/smp_affinity
1f
# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4
 48:     60920          0          0          0          0   PCI-MSI-edge   HW_TACHYON

With vcpus=4 (flat APIC routing), IRQ 48's SMP affinity behaves as expected (each of the 4 bits in /proc/irq/48/smp_affinity corresponds to the CPU or CPUs where the interrupts will be routed). The MSI message address and data have different attributes compared to vcpus=5: the address has dest ID=f (matching the default /proc/irq/48/smp_affinity), RH=DM=1, and it uses lowest-priority instead of fixed interrupt delivery.

# lspci -vv in domU with vcpus=4
00:08.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 48
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0f00c  Data: 4159   (dest ID=f, RH=DM=1, lowest priority interrupt, vector=0x59)
        Kernel driver in use: hwdrv
        Kernel modules: hbas-hw

# cat /proc/irq/48/smp_affinity
f
# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 48:     14082      19052      15337      14645   PCI-MSI-edge   HW_TACHYON

Changing IRQ 48's SMP affinity to 8 shows that all the interrupts are being routed to CPU3 as expected, and the MSI message address has changed to reflect the new dest ID while the vector stays the same.

# echo 8 > /proc/irq/48/smp_affinity
# cat /proc/interrupts
 48:     14082      19052      15338     351361   PCI-MSI-edge   HW_TACHYON

# lspci -vv in domU with vcpus=4
00:08.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
        Subsystem: Atto Technology Device 003c
        Interrupt: pin A routed to IRQ 48
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0800c  Data: 4159   (dest ID=8, RH=DM=1, lowest priority interrupt, vector=0x59)
        Kernel driver in use: hwdrv
        Kernel modules: hbas-hw

My hunch is that there is something wrong with physical flat APIC routing in domU. If I boot this same platform into straight Linux 2.6.30.1 (no Xen), /var/log/kern.log shows that it too uses physical flat APIC routing, which is expected since it has a total of 16 CPUs. Unlike domU, though, changing the IRQ SMP affinity to any one-hot value (only one bit out of 16 set to 1) behaves as expected. A non-one-hot value results in all interrupts being routed to CPU0, but at least the interrupts are not lost.

One of my questions is: why does domU use only even-numbered APIC IDs? If it also used odd numbers, physical flat APIC routing would only trigger when vcpus > 7.

I welcome any suggestions on how to pursue this problem, or hopefully someone will say that a patch for this already exists.

Thanks.

Dante Cinco
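As an aside on the decoding used above: the parenthetical annotations (dest ID, RH, DM, delivery mode, vector) follow the standard x86 MSI address/data layout. A minimal stand-alone decoder in that spirit - a sketch added for reference, not part of the original report - is shown below; the default values are the vcpus=4 domU numbers quoted above.

/* msidecode.c - decode an x86 MSI address/data pair into the fields
 * discussed in this thread (dest ID, RH, DM, delivery mode, vector).
 * Build: gcc -o msidecode msidecode.c
 * Run:   ./msidecode fee0f00c 4159
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Defaults are the vcpus=4 domU values from the lspci output above. */
    unsigned long addr = (argc > 1) ? strtoul(argv[1], NULL, 16) : 0xfee0f00cUL;
    unsigned long data = (argc > 2) ? strtoul(argv[2], NULL, 16) : 0x4159UL;

    unsigned int dest_id = (addr >> 12) & 0xff;  /* address bits 19:12 */
    unsigned int rh      = (addr >> 3) & 0x1;    /* redirection hint   */
    unsigned int dm      = (addr >> 2) & 0x1;    /* destination mode   */
    unsigned int vector  = data & 0xff;          /* data bits 7:0      */
    unsigned int dlv     = (data >> 8) & 0x7;    /* delivery mode      */

    printf("dest ID=%x, RH=%u, DM=%u, %s, vector=0x%x\n",
           dest_id, rh, dm,
           dlv == 0 ? "fixed interrupt" :
           dlv == 1 ? "lowest priority interrupt" : "other delivery mode",
           vector);
    return 0;
}

Running it on the pairs quoted above reproduces the annotations in the report, e.g. fee0f00c/4159 decodes to dest ID=f, RH=1, DM=1, lowest priority, vector=0x59.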
Bruce Edge
2009-Oct-08 16:07 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
More info on the version... It's actually the 3.4.1 release. Also, the dom0 is 2.6.30.3 with Andrew Lyon's patch set, built from Boris's HOWTO: http://bderzhavets.wordpress.com/2009/08/14/attempt-of-prevu-xen-3-4-1-hypervisor-on-ubuntu-jaunty-server-64-bit/

-Bruce
Keir Fraser
2009-Oct-08 18:05 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 08/10/2009 01:08, "Cinco, Dante" <Dante.Cinco@lsi.com> wrote:

> One of my questions is "Why does domU use only even numbered APIC IDs?" If it
> used odd numbers, then physical flat APIC routing will only trigger when
> vcpus > 7.

It's just the mapping we use. Local APICs get even numbers, the IOAPIC gets ID 1.

> I welcome any suggestions on how to pursue this problem or hopefully, someone
> will say that a patch for this already exists.

Is this true for all interrupts, or just the passthrough one using MSI?

What Xen version are you using? You say '3.4 unstable' - do you mean the tip of xen-3.4-testing.hg? Have you tried xen-unstable.hg (the current development tree)?

 -- Keir
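Keir's mapping explains the threshold Dante hit: with guest local APIC IDs assigned as twice the vCPU number, the fifth vCPU already carries APIC ID 8, which is where the probe_64.c check quoted earlier (max_physical_apicid >= 8) abandons flat mode. The following stand-alone snippet - an illustrative sketch, not Xen or kernel code - just walks that arithmetic.

/* apicid_mode.c - show why a guest using even APIC IDs (apicid = 2 * vcpu)
 * flips into physical flat mode at vcpus=5, given the probe_64.c rule that
 * flat mode is only kept while max_physical_apicid < 8.
 */
#include <stdio.h>

int main(void)
{
    for (int vcpus = 1; vcpus <= 8; vcpus++) {
        /* even IDs: CPU0->0, CPU1->2, ..., so the highest ID is 2*(vcpus-1) */
        int max_physical_apicid = 2 * (vcpus - 1);
        printf("vcpus=%d  max_physical_apicid=%d  -> %s\n",
               vcpus, max_physical_apicid,
               max_physical_apicid >= 8 ? "physical flat" : "flat");
    }
    return 0;
}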
Cinco, Dante
2009-Oct-08 18:11 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
The IRQ SMP affinity problem happens on just the passthrough one using MSI.

I've only used Xen 3.4.1. Are you aware of recent code changes that may address this issue?

Dante
Keir Fraser
2009-Oct-08 21:35 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 08/10/2009 19:11, "Cinco, Dante" <Dante.Cinco@lsi.com> wrote:

> The IRQ SMP affinity problem happens on just the passthrough one using MSI.
>
> I've only used Xen 3.4.1. Are you aware of recent code changes that may
> address this issue?

No, but it might be worth a try. Unfortunately I'm not so familiar with the MSI passthru code as I am with the rest of the irq emulation layer. Qing He (cc'ed) may be able to assist, as I think he did much of the development of MSI support for passthru devices.

 -- Keir
Qing He
2009-Oct-09 09:07 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-09 at 05:35 +0800, Keir Fraser wrote:

> No, but it might be worth a try. Unfortunately I'm not so familiar with the
> MSI passthru code as I am with the rest of the irq emulation layer. Qing He
> (cc'ed) may be able to assist, as I think he did much of the development of
> MSI support for passthru devices.

MSI passthru uses emulation; there is nothing connecting the guest affinity and the physical affinity. When an MSI is received, the vmsi logic calculates the destination and sets the virtual local APIC of that VCPU. After checking the code, the part handling DM=0 is there and I haven't found big problems at first glance; maybe there is some glitch that causes the MSI failure in physical mode.

Some debug logging could help track down the problem. Can you add 'hvm_debug=0x200' to the Xen command line and post the xm dmesg result? This will print HVM debug level DBG_LEVEL_IOAPIC, which includes the vmsi delivery logic.

There are two patches between 3.4.1 and unstable (20084, 20140); these are mainly cleanup patches, but the related code does change. I don't know if they fix this issue.

Thanks,
Qing
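For anyone reproducing this, here is a sketch of where Qing's 'hvm_debug=0x200' option would go in a GRUB-legacy Xen boot stanza of the kind Dante describes later in the thread; the paths, kernel version and root UUID below are placeholders, not values taken from the actual system.

title Xen (HVM IOAPIC/vMSI debugging)
  root (hd0,0)
  kernel /boot/xen.gz com1=115200,8n1 console=com1 loglvl=all hvm_debug=0x200
  module /boot/vmlinuz-2.6.30.3 root=UUID=<root-uuid> ro console=ttyS0
  module /boot/initrd.img-2.6.30.3

After rebooting and reproducing the problem, "xm dmesg" captures the DBG_LEVEL_IOAPIC output Qing asked for.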
Cinco, Dante
2009-Oct-09 15:59 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Thanks for the suggestions, Qing. I will send you the log with "hvm_debug=0x200" and try Xen 3.5 unstable.

Dante
Cinco, Dante
2009-Oct-09 23:39 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Qing,

I'm attaching a tar'd directory that contains the various log files I gathered from my system. When I tried adding "hvm_debug=0x200" to the Xen command line, domU became inaccessible on boot-up, with the Xen console constantly printing this message: "(XEN) [HVM:1.0] <vioapic_irq_positive_edge> irq 2." So I backed out hvm_debug, but hopefully there is still enough logging to provide some clues.

Here's a summary of the events leading to the lost interrupts:

Boot Xen 3.5-unstable with 2.6.30.3
- command line: /xen-3.5-unstable.gz com1=115200,8n1 console=com1 acpi=force apic=on iommu=1,no-intremap,passthrough loglvl=all loglvl_guest=all
- command line: module /vmlinuz-2.6.31.1 root=UUID=xxx ro pciback.hide=(07:00.0)(07:00.1)(07:00.2)(07:00.3) acpi=force console=ttyS0
- dom0: lspci -vv shows device at IRQ 32 with MSI message address, data = 0x0, 0x0

Bring up domU with vcpus=5, hap=0, pci=['07:00.0@8','07:00.1@9','07:00.2@a','07:00.3@b'] (device driver not yet loaded)
- dom0: lspci -vv shows device at IRQ 32 (07:00.0) with MSI message address, data = 0xfee01000, 0x407b

Load the kernel module that contains the device driver
- dom0: no change in lspci -vv
- domU: lspci -vv shows device at IRQ 48 (00:08.0) with MSI message address, data = 0xfee00000, 0x4059
- domU: /proc/interrupts shows interrupts for IRQ 48 going to CPU0

Change /proc/irq/48/smp_affinity from 1f to 1
- dom0: no change to lspci -vv
- domU: no change to lspci -vv
- domU: /proc/interrupts shows interrupts for IRQ 48 going to CPU0

Change /proc/irq/48/smp_affinity from 1 to 2
- dom0: lspci -vv shows MSI message data changed from 0x407b to 0x40d3, address the same
- domU: lspci -vv shows new MSI message address, data = 0xfee02000, 0x4079
- domU: no more interrupts from IRQ 48
- Xen console: (XEN) do_IRQ: 8.211 No irq handler for vector (irq -1)

Dante
Qing He
2009-Oct-10 09:43 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Sat, 2009-10-10 at 07:39 +0800, Cinco, Dante wrote:

> When I tried adding "hvm_debug=0x200" to the Xen command line, domU became
> inaccessible on boot-up, with the Xen console constantly printing this
> message: "(XEN) [HVM:1.0] <vioapic_irq_positive_edge> irq 2."

So this is useless; maybe one-time setups should be split out from those that fire every time, or a separate debug level should be used for MSI operations.

> Change /proc/irq/48/smp_affinity from 1 to 2
> - Xen console: (XEN) do_IRQ: 8.211 No irq handler for vector (irq -1)

This is weird. Although there is no other confirmation, I guess this vector 211 (0xd3) is the MSI vector, which would explain why the MSI doesn't fire any more. However, this error message is not expected: a physical MSI at the Xen level always goes to vcpu 0 when it is first bound, and the affinity doesn't change after that.

Furthermore, logical flat mode works fine. Do you observe this error message when vcpus=4?

I'll continue to investigate and try to reproduce the problem on my side.

Thanks,
Qing
Keir Fraser
2009-Oct-10 10:10 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 10/10/2009 10:43, "Qing He" <qing.he@intel.com> wrote:

> So this is useless; maybe one-time setups should be split out from those
> that fire every time, or a separate debug level should be used for MSI
> operations.

Well, indeed. Messages that print on every interrupt are typically not useful! I tend to kill them when I find them, but they keep creeping in.

 -- Keir
Cinco, Dante
2009-Oct-12 05:25 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
With vcpus < 5, logical flat mode works fine (no error message). I can change smp_affinity to any value > 0 and < 16, and the interrupts go to the proper CPU(s).

Could you point me to the code that handles MSI so that I can better understand the MSI implementation?

Thanks.

Dante
Qing He
2009-Oct-12 05:54 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Mon, 2009-10-12 at 13:25 +0800, Cinco, Dante wrote:

> Could you point me to the code that handles MSI so that I can better
> understand the MSI implementation?

There are two parts:

1) Initializing or changing the data and address of the MSI:
   a) qemu-xen: hw/passthrough.c: pt_msg.*_write, where MSI accesses are trapped first. pt_update_msi in hw/pt-msi.c is then called to update the MSI binding.
   b) xen: drivers/passthrough/io.c: pt_irq_create_bind_vtd, where the MSI is actually bound to the guest.

2) On MSI reception:
   In drivers/passthrough/io.c, hvm_do_IRQ_dpci and hvm_dirq_assist are the routines responsible for handling all assigned irqs (including MSI). If an MSI is received, vmsi_deliver in arch/x86/vmsi.c gets called to deliver the MSI to the corresponding vlapic.

And I just learned from Xiantao Zhang that the guest Linux kernel enables per-CPU vectors if it is in physical mode, and that looks more likely to be relevant to this problem. Older Xen had a problem handling this, and changeset 20253 is supposed to fix it, although I noticed your Xen version is 20270.

Thanks,
Qing
Cinco, Dante
2009-Oct-14 19:54 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
I switched over to Xen 3.5-unstable (changeset 20303) and pv_ops dom0 2.6.31.1, hoping that this would resolve the IRQ SMP affinity problem. I had to use pci-stub to hide the PCI devices since pciback wasn't working. With vcpus=16 (APIC routing is physical flat), the interrupts were working in domU and being routed to CPU0 with the default smp_affinity (ffff), but as soon as I changed it to any 16-bit one-hot value, or even set it to the same default value, the result was a complete loss of interrupts (even on the devices whose smp_affinity was not changed). With vcpus=4 (APIC routing is logical flat), I can see the interrupts being load-balanced across all CPUs, but as soon as I changed smp_affinity to any value, the interrupts stopped. This used to work reliably with the non-pv_ops kernel. I attached the logs in case anyone wants to take a look.

I did see the MSI message address/data change in both domU and dom0 (using "lspci -vv"):

vcpus=16:

domU MSI message address/data with default smp_affinity:  Address: 00000000fee00000  Data: 40a9
domU MSI message address/data after smp_affinity=0010:     Address: 00000000fee08000  Data: 40b1   (8 is the APIC ID of CPU4)

dom0 MSI message address/data with default smp_affinity:  Address: 00000000fee00000  Data: 4094
dom0 MSI message address/data after smp_affinity=0010:     Address: 00000000fee00000  Data: 409c

Aside from "lspci -vv", what other means are there to track down this problem? Is there some way to print the interrupt vector table? I'm considering adding printk's to the code that Qing mentioned in his previous email. Any suggestions on where in the code to add the printk's?

Thanks.

Dante
Konrad Rzeszutek Wilk
2009-Oct-16 00:09 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Wed, Oct 14, 2009 at 01:54:33PM -0600, Cinco, Dante wrote:

> I did see the MSI message address/data change in both domU and dom0 (using "lspci -vv"):
>
> vcpus=16:
>
> domU MSI message address/data with default smp_affinity:  Address: 00000000fee00000  Data: 40a9
> domU MSI message address/data after smp_affinity=0010:     Address: 00000000fee08000  Data: 40b1   (8 is the APIC ID of CPU4)

What does Xen tell you (hit Ctrl-A three times and then 'z')? Specifically, look for vector 169 (a9) and 177 (b1). Do those values match what you see in domU and dom0 - mainly, that 177 has a dest_id of 8? Oh, and also check the guest interrupt information, to see if those values match.

> Aside from "lspci -vv", what other means are there to track down this
> problem? Is there some way to print the interrupt vector table? I'm
> considering adding printk's to the code that Qing mentioned in his previous
> email. Any suggestions on where in the code to add the printk's?

Hit Ctrl-A three times and you can get a wealth of information. Of interest might also be the IO APIC area - you can see if the vector in question is masked.
Cinco, Dante
2009-Oct-16 01:38 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
I'm still trying to track down the problem of lost interrupts when I change /proc/irq/<irq#>/smp_affinity in domU. I'm now at Xen 3.5-unstable changeset 20320 and using pvops dom0 2.6.31.1.

In domU, my PCI devices are at virtual slots 5, 6, 7 and 8, so I use "lspci -vv" to get their respective IRQs and MSI message address/data, and I can also see their IRQs in /proc/interrupts (I'm not showing all 16 CPUs):

lspci -vv -s 00:05.0 | grep IRQ; lspci -vv -s 00:06.0 | grep IRQ; lspci -vv -s 00:07.0 | grep IRQ; lspci -vv -s 00:08.0 | grep IRQ
        Interrupt: pin A routed to IRQ 48
        Interrupt: pin B routed to IRQ 49
        Interrupt: pin C routed to IRQ 50
        Interrupt: pin D routed to IRQ 51

lspci -vv -s 00:05.0 | grep Address; lspci -vv -s 00:06.0 | grep Address; lspci -vv -s 00:07.0 | grep Address; lspci -vv -s 00:08.0 | grep Address
        Address: 00000000fee00000  Data: 4071   (vector=113)
        Address: 00000000fee00000  Data: 4089   (vector=137)
        Address: 00000000fee00000  Data: 4099   (vector=153)
        Address: 00000000fee00000  Data: 40a9   (vector=169)

egrep '(HW_TACHYON|CPU0)' /proc/interrupts
            CPU0       CPU1
 48:     1571765          0   PCI-MSI-edge   HW_TACHYON
 49:     3204403          0   PCI-MSI-edge   HW_TACHYON
 50:     2643008          0   PCI-MSI-edge   HW_TACHYON
 51:     3270322          0   PCI-MSI-edge   HW_TACHYON

In dom0, my PCI devices show up as a 4-function device (0:07:0.0, 0:07:0.1, 0:07:0.2, 0:07:0.3) and I also use "lspci -vv" to get the IRQs and MSI info:

lspci -vv -s 0:07:0.0 | grep IRQ; lspci -vv -s 0:07:0.1 | grep IRQ; lspci -vv -s 0:07:0.2 | grep IRQ; lspci -vv -s 0:07:0.3 | grep IRQ
        Interrupt: pin A routed to IRQ 11
        Interrupt: pin B routed to IRQ 10
        Interrupt: pin C routed to IRQ 7
        Interrupt: pin D routed to IRQ 5

lspci -vv -s 0:07:0.0 | grep Address; lspci -vv -s 0:07:0.1 | grep Address; lspci -vv -s 0:07:0.2 | grep Address; lspci -vv -s 0:07:0.3 | grep Address
        Address: 00000000fee00000  Data: 403c   (vector=60)
        Address: 00000000fee00000  Data: 4044   (vector=68)
        Address: 00000000fee00000  Data: 404c   (vector=76)
        Address: 00000000fee00000  Data: 4054   (vector=84)

I used the "Ctrl-a" "Ctrl-a" "Ctrl-a" "i" key sequence from the Xen console to print the guest interrupt information and the PCI devices. The vectors shown here are actually the vectors as seen from dom0, so I don't understand the label "Guest interrupt information." Meanwhile, the IRQs (74 - 77) do not match those from dom0 (11, 10, 7, 5) or domU (48, 49, 50, 51) as seen by "lspci -vv", but they do match those reported by the "Ctrl-a" key sequence followed by "Q" for PCI devices.

(XEN) Guest interrupt information:
(XEN) IRQ: 74, IRQ affinity:0x00000001, Vec: 60 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----),
(XEN) IRQ: 75, IRQ affinity:0x00000001, Vec: 68 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 78(----),
(XEN) IRQ: 76, IRQ affinity:0x00000001, Vec: 76 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 77(----),
(XEN) IRQ: 77, IRQ affinity:0x00000001, Vec: 84 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 76(----),

(XEN) ==== PCI devices ====
(XEN) 07:00.3 - dom 1 - MSIs < 77 >
(XEN) 07:00.2 - dom 1 - MSIs < 76 >
(XEN) 07:00.1 - dom 1 - MSIs < 75 >
(XEN) 07:00.0 - dom 1 - MSIs < 74 >

If I look at /var/log/xen/qemu-dm-dpm.log, I see these 4 lines that show the pirqs, which match those in the last column of the guest interrupt information:

pt_msi_setup: msi mapped with pirq 4f   (79)
pt_msi_setup: msi mapped with pirq 4e   (78)
pt_msi_setup: msi mapped with pirq 4d   (77)
pt_msi_setup: msi mapped with pirq 4c   (76)

The gvecs (71, 89, 99, a9) match the vectors as seen by lspci in domU:

pt_msgctrl_reg_write: guest enabling MSI, disable MSI-INTx translation
pt_msi_update: Update msi with pirq 4f gvec 71 gflags 0
pt_msgctrl_reg_write: guest enabling MSI, disable MSI-INTx translation
pt_msi_update: Update msi with pirq 4e gvec 89 gflags 0
pt_msgctrl_reg_write: guest enabling MSI, disable MSI-INTx translation
pt_msi_update: Update msi with pirq 4d gvec 99 gflags 0
pt_msgctrl_reg_write: guest enabling MSI, disable MSI-INTx translation
pt_msi_update: Update msi with pirq 4c gvec a9 gflags 0

I see these same pirqs in the output of "xm dmesg":

(XEN) [VT-D]iommu.c:1289:d0 domain_context_unmap:PCIe: bdf = 7:0.0
(XEN) [VT-D]iommu.c:1175:d0 domain_context_mapping:PCIe: bdf = 7:0.0
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = 4f device = 5 intx = 0
(XEN) [VT-D]iommu.c:1289:d0 domain_context_unmap:PCIe: bdf = 7:0.1
(XEN) [VT-D]iommu.c:1175:d0 domain_context_mapping:PCIe: bdf = 7:0.1
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = 4e device = 6 intx = 0
(XEN) [VT-D]iommu.c:1289:d0 domain_context_unmap:PCIe: bdf = 7:0.2
(XEN) [VT-D]iommu.c:1175:d0 domain_context_mapping:PCIe: bdf = 7:0.2
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = 4d device = 7 intx = 0
(XEN) [VT-D]iommu.c:1289:d0 domain_context_unmap:PCIe: bdf = 7:0.3
(XEN) [VT-D]iommu.c:1175:d0 domain_context_mapping:PCIe: bdf = 7:0.3
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = 4c device = 8 intx = 0

The machine_gsi's match the pirqs, while the m_irqs match the IRQs from lspci in dom0. What are the guest_gsi's?

(XEN) io.c:316:d0 pt_irq_destroy_bind_vtd: machine_gsi=79 guest_gsi=36, device=5, intx=0.
(XEN) io.c:371:d0 XEN_DOMCTL_irq_unmapping: m_irq = 0x4f device = 0x5 intx = 0x0
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = b device = 5 intx = 0
(XEN) io.c:316:d0 pt_irq_destroy_bind_vtd: machine_gsi=78 guest_gsi=40, device=6, intx=0.
(XEN) io.c:371:d0 XEN_DOMCTL_irq_unmapping: m_irq = 0x4e device = 0x6 intx = 0x0
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = a device = 6 intx = 0
(XEN) io.c:316:d0 pt_irq_destroy_bind_vtd: machine_gsi=77 guest_gsi=44, device=7, intx=0.
(XEN) io.c:371:d0 XEN_DOMCTL_irq_unmapping: m_irq = 0x4d device = 0x7 intx = 0x0
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = 7 device = 7 intx = 0
(XEN) io.c:316:d0 pt_irq_destroy_bind_vtd: machine_gsi=76 guest_gsi=17, device=8, intx=0.
(XEN) io.c:371:d0 XEN_DOMCTL_irq_unmapping: m_irq = 0x4c device = 0x8 intx = 0x0
(XEN) [VT-D]io.c:291:d0 VT-d irq bind: m_irq = 5 device = 8 intx = 0

So now, when I finally get to the part where I change the smp_affinity, I see a corresponding change in the guest interrupt information, qemu-dm-dpm.log and lspci on both dom0 and domU:

cat /proc/irq/48/smp_affinity
ffff
echo 2 > /proc/irq/48/smp_affinity
cat /proc/irq/48/smp_affinity
0002

(XEN) Guest interrupt information:   (IRQ affinity changed from 1 to 2, while the vector changed from 60 to 92)
(XEN) IRQ: 74, IRQ affinity:0x00000002, Vec: 92 type=PCI-MSI status=00000010 in-flight=1 domain-list=1: 79(---M),

pt_msi_update: Update msi with pirq 4f gvec 71 gflags 2   (What is the significance of gflags 2?)
pt_msi_update: Update msi with pirq 4f gvec b1 gflags 2

domU: lspci -vv -s 00:05.0 | grep Address
        Address: 00000000fee02000  Data: 40b1   (dest ID changed from 0 to 2 and vector changed from 0x71 to 0xb1)

dom0: lspci -vv -s 0:07:0.0 | grep Address
        Address: 00000000fee00000  Data: 405c   (vector changed from 0x3c (60 decimal) to 0x5c (92 decimal))

I'm confused about why there are 4 sets of IRQs: dom0 lspci [11,10,7,5], domU lspci and /proc/interrupts [48,49,50,51], pirq [76,77,78,79], guest interrupt information [74,75,76,77]. Are the changes resulting from changing the IRQ smp_affinity consistent with what is expected? Any recommendation on where to go from here?

Thanks in advance.

Dante
Konrad Rzeszutek Wilk
2009-Oct-16 01:40 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Thu, Oct 15, 2009 at 08:09:42PM -0400, Konrad Rzeszutek Wilk wrote:

> What does Xen tell you (hit Ctrl-A three times and then 'z')? Specifically,
> look for vector 169 (a9) and 177 (b1). Do those values match what you see in
> domU and dom0 - mainly, that 177 has a dest_id of 8? Oh, and also check the
> guest interrupt information, to see if those values match.

N/m. I was thinking that maybe your IOAPIC has those vectors programmed in it, but that would not make any sense.
Qing He
2009-Oct-16 02:34 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-16 at 09:38 +0800, Cinco, Dante wrote:

> I'm confused about why there are 4 sets of IRQs: dom0 lspci [11,10,7,5], domU
> lspci and /proc/interrupts [48,49,50,51], pirq [76,77,78,79], guest interrupt
> information [74,75,76,77].

This is indeed a little confusing at first. I'll try to differentiate them here:

1. dom0 IRQ [11,10,7,5]: decided by the dom0 kernel, through information from the host ACPI.
2. domU IRQ [48,49,50,51]: decided by the domU kernel, through the virtual ACPI presented to the guest. If there are multiple domUs, this space overlaps.
3. pirq [76,77,78,79]: a per-domain concept. It has nothing to do with the physical or virtual irq number; its sole purpose is to provide an interface between domains (mainly PV) and the hypervisor. The GSI part happens to be identically mapped, though.
4. irq [74,75,76,77]: a global hypervisor concept, used to track all irqs for all domains. It was originally named `vector'; the name changed when hypervisor per-CPU vectoring was introduced.

> pt_msi_update: Update msi with pirq 4f gvec 71 gflags 2   (What is the significance of gflags 2?)
> pt_msi_update: Update msi with pirq 4f gvec b1 gflags 2

gflags is a custom interface that incorporates the address and data fields: DM, dest, etc. gflags=2 means DM=0, dest=2. The first line is an intermediate result, printed when the guest updates the MSI address; the second line indicates an update to the MSI data.

> (XEN) Guest interrupt information:
> (XEN) IRQ: 74, IRQ affinity:0x00000001, Vec: 60 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----),
>
> echo 2 > /proc/irq/48/smp_affinity
>
> (XEN) Guest interrupt information:   (IRQ affinity changed from 1 to 2, while the vector changed from 60 to 92)
> (XEN) IRQ: 74, IRQ affinity:0x00000002, Vec: 92 type=PCI-MSI status=00000010 in-flight=1 domain-list=1: 79(---M),

`(---M)' means masked; that may be why the irq is not received.

Thanks,
Qing
Keir Fraser
2009-Oct-16 06:37 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 16/10/2009 03:34, "Qing He" <qing.he@intel.com> wrote:

>> (XEN) Guest interrupt information:   (IRQ affinity changed from 1 to 2, while the vector changed from 60 to 92)
>> (XEN) IRQ: 74, IRQ affinity:0x00000002, Vec: 92 type=PCI-MSI status=00000010 in-flight=1 domain-list=1: 79(---M),
>
> `(---M)' means masked; that may be why the irq is not received.

Glad you managed to pick that out of the information overload. :-) It does look like the next obvious lead to chase down.

 -- Keir
Zhang, Xiantao
2009-Oct-16 07:32 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Keir Fraser wrote:

> On 16/10/2009 03:34, "Qing He" <qing.he@intel.com> wrote:
>
>> `(---M)' means masked; that may be why the irq is not received.
>
> Glad you managed to pick that out of the information overload. :-) It does
> look like the next obvious lead to chase down.

According to the description, the issue should be caused by a lost EOI write for the MSI interrupt, which leads to a permanently masked interrupt. There appears to be a race between the guest setting a new vector and EOIing the old vector for the interrupt. Once the guest sets the new vector before it EOIs the old vector, the hypervisor can't find the pirq that corresponds to the old vector (it has been changed to the new vector), so it can also never EOI the old vector at the hardware level. Since the corresponding vector in the real processor can't be EOIed, the system may lose all interrupts and ultimately produce the reported issues.

But I remembered there should be a timer to handle this case through a forcible EOI write to the real processor after a timeout; it seems it doesn't function in the expected way here.

Xiantao
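To make the suspected sequence concrete, here is a small toy model in plain C - an editorial sketch, not Xen code - of the race Xiantao describes: the vector-to-pirq lookup is rewritten when the guest reprograms the MSI, so an EOI that arrives afterwards with the old vector no longer resolves to any pirq and the physical vector is never acknowledged. The vector and pirq values are taken from the logs earlier in the thread.

/* eoi_race_toy.c - a toy illustration (not Xen code) of the race described
 * above: the guest changes the MSI vector before EOIing the old one, the
 * vector->pirq lookup is rewritten, and the late EOI finds nothing to ack.
 */
#include <stdio.h>

#define NR_VECTORS 256
static int vector_to_pirq[NR_VECTORS];   /* -1 = unbound */

static void bind_vector(int vec, int pirq) { vector_to_pirq[vec] = pirq; }
static void unbind_vector(int vec)         { vector_to_pirq[vec] = -1; }

static void guest_eoi(int vec)
{
    int pirq = vector_to_pirq[vec];
    if (pirq < 0) {
        printf("EOI for vector 0x%x: no pirq found -> physical vector never EOIed\n", vec);
        return;
    }
    printf("EOI for vector 0x%x: ack pirq 0x%x at hardware level\n", vec, pirq);
}

int main(void)
{
    for (int i = 0; i < NR_VECTORS; i++) vector_to_pirq[i] = -1;

    bind_vector(0x71, 0x4f);   /* MSI bound: gvec 0x71 <-> pirq 0x4f            */
    /* interrupt fires with vector 0x71; before the guest EOIs it ...           */
    unbind_vector(0x71);       /* ... the affinity change rebinds the MSI       */
    bind_vector(0xb1, 0x4f);   /* to a new vector, as in the logs above         */
    guest_eoi(0x71);           /* late EOI of the old vector: the lookup fails  */
    return 0;
}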
Zhang, Xiantao
2009-Oct-16 08:22 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
He, Qing wrote:

> The EOI timer is supposed to deal with the irq sharing problem; since MSI
> doesn't share, this timer will not be started in the case of MSI.

That may be a problem, if so. If a malicious/buggy guest won't EOI the MSI vector, could the host hang due to the lack of a timeout mechanism?

Xiantao
Qing He
2009-Oct-16 08:24 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-16 at 15:32 +0800, Zhang, Xiantao wrote:

> But I remembered there should be a timer to handle this case through a
> forcible EOI write to the real processor after a timeout; it seems it
> doesn't function in the expected way here.

The EOI timer is supposed to deal with the irq sharing problem; since MSI doesn't share, this timer will not be started in the case of MSI.
Qing He
2009-Oct-16 08:34 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-16 at 16:22 +0800, Zhang, Xiantao wrote:

> That may be a problem, if so. If a malicious/buggy guest won't EOI the MSI
> vector, could the host hang due to the lack of a timeout mechanism?

Why would the host hang? Only the assigned interrupt will block, and that's exactly what the guest wants :-)
Zhang, Xiantao
2009-Oct-16 08:35 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
He, Qing wrote:
> On Fri, 2009-10-16 at 16:22 +0800, Zhang, Xiantao wrote:
>> [...]
>> That may be a problem if so. If a malicious/buggy guest won't EOI the MSI vector, the host may hang due to the lack of a timeout mechanism?
>
> Why does host hang? Only the assigned interrupt will block, and that's exactly what the guest wants :-)

The hypervisor shouldn't EOI the real vector until the guest EOIs the corresponding virtual vector, right? Not sure. :-)

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Qing He
2009-Oct-16 09:01 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-16 at 16:35 +0800, Zhang, Xiantao wrote:
> He, Qing wrote:
>> [...]
>> Why does host hang? Only the assigned interrupt will block, and that's exactly what the guest wants :-)
>
> The hypervisor shouldn't EOI the real vector until the guest EOIs the corresponding virtual vector, right? Not sure. :-)

Yes, it is the algorithm used today.

After reviewing the code, if the guest really does something like changing the affinity within the window between an irq firing and its EOI, there is indeed a problem; the patch is attached. Although I kind of doubt the guest actually does this: shouldn't desc->lock in the guest protect these two operations and make them mutually exclusive?

Dante,
Can you see if this patch helps?

Thanks,
Qing

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Oct-16 09:41 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 16/10/2009 09:35, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>>> That may be a problem if so. If a malicious/buggy guest won't EOI the MSI vector, the host may hang due to the lack of a timeout mechanism?
>>
>> Why does host hang? Only the assigned interrupt will block, and that's exactly what the guest wants :-)
>
> The hypervisor shouldn't EOI the real vector until the guest EOIs the corresponding virtual vector, right? Not sure. :-)

If the EOI is via the local APIC, which I suppose it must be, then a timeout fallback probably is required. This is because priorities are assigned arbitrarily to guest interrupts, and a non-EOIed interrupt blocks any lower-priority interrupts. In particular, some of those could be owned by dom0, for example, and be quite critical to forward progress of the entire system.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
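For readers unfamiliar with the local APIC behaviour Keir refers to, here is a rough toy model (not Xen or Linux code; all names are invented) of why a single un-EOIed vector can starve unrelated vectors that happen to sit in the same or a lower priority class, for example ones owned by dom0:

/*
 * Toy model of local APIC prioritisation -- NOT Xen or Linux code.  The
 * LAPIC groups vectors into priority classes (vector >> 4); while a vector
 * is in service and not yet EOIed, nothing in the same or a lower class is
 * delivered on that CPU.
 */
#include <stdbool.h>
#include <stdio.h>

static int in_service_vector = -1;          /* vector awaiting EOI, -1 = none */

static bool can_deliver(int vector)
{
    if (in_service_vector < 0)
        return true;
    /* blocked unless it belongs to a strictly higher priority class */
    return (vector >> 4) > (in_service_vector >> 4);
}

int main(void)
{
    in_service_vector = 187;                /* guest-owned vector, EOI never written */

    printf("vector 219 deliverable: %s\n", can_deliver(219) ? "yes" : "no");
    printf("vector 120 (say, a dom0 device) deliverable: %s\n",
           can_deliver(120) ? "yes" : "no");
    return 0;
}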
Qing He
2009-Oct-16 09:42 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-16 at 17:01 +0800, Qing He wrote:
> Yes, it is the algorithm used today.
>
> After reviewing the code, if the guest really does something like changing the affinity within the window between an irq firing and its EOI, there is indeed a problem; the patch is attached. Although I kind of doubt the guest actually does this: shouldn't desc->lock in the guest protect these two operations and make them mutually exclusive?
>
> Dante,
> Can you see if this patch helps?

Please ignore this patch. I intended to use it to see whether it confirms the analysis (at the cost of lost interrupts), but it may actually bring more severe problems.

Thanks,
Qing

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-16 09:49 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
He, Qing wrote:
> On Fri, 2009-10-16 at 16:35 +0800, Zhang, Xiantao wrote:
>> [...]
>> The hypervisor shouldn't EOI the real vector until the guest EOIs the corresponding virtual vector, right? Not sure. :-)
>
> Yes, it is the algorithm used today.

So it should still be a problem. If the guest won't do the EOI, the host can't do the EOI either, and that leads to a system hang without a timeout mechanism. So we may need to introduce a timer for each MSI interrupt source to avoid hanging the host, Keir?

> After reviewing the code, if the guest really does something like changing the affinity within the window between an irq firing and its EOI, there is indeed a problem; the patch is attached. Although I kind of doubt the guest actually does this: shouldn't desc->lock in the guest protect these two operations and make them mutually exclusive?

We shouldn't let the hypervisor do the real EOI before the guest does the corresponding virtual EOI, so this patch may have a correctness issue. :-)

Attached is the fix according to my previous guess; it should fix the issue.

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Qing He
2009-Oct-16 09:57 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Fri, 2009-10-16 at 17:41 +0800, Keir Fraser wrote:
> [...]
> If the EOI is via the local APIC, which I suppose it must be, then a timeout fallback probably is required. This is because priorities are assigned arbitrarily to guest interrupts, and a non-EOIed interrupt blocks any lower-priority interrupts. In particular, some of those could be owned by dom0, for example, and be quite critical to forward progress of the entire system.

Yeah, I just came to realize it.

Thanks,
Qing

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-16 09:58 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Keir Fraser wrote:
> On 16/10/2009 09:35, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> [...]
>
> If the EOI is via the local APIC, which I suppose it must be, then a timeout fallback probably is required. This is because priorities are assigned arbitrarily to guest interrupts, and a non-EOIed interrupt blocks any lower-priority interrupts. In particular, some of those could be owned by dom0, for example, and be quite critical to forward progress of the entire system.

Yeah, exactly my concern. We may need to add a timeout mechanism for each interrupt source so that a buggy/malicious guest can't hang the host by not writing the EOI.

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2009-Oct-16 10:21 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 16.10.09 11:58 >>>
> Keir Fraser wrote:
>> If the EOI is via the local APIC, which I suppose it must be, then a timeout fallback probably is required. This is because priorities are assigned arbitrarily to guest interrupts, and a non-EOIed interrupt blocks any lower-priority interrupts. In particular, some of those could be owned by dom0, for example, and be quite critical to forward progress of the entire system.
>
> Yeah, exactly my concern. We may need to add a timeout mechanism for each interrupt source so that a buggy/malicious guest can't hang the host by not writing the EOI.

But that's (supposed to be) happening already: if an MSI interrupt is maskable, the interrupt gets masked and the EOI is sent immediately. If it's not maskable, a timer gets started to issue the EOI if the guest doesn't.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
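A self-contained sketch of the policy Jan describes follows; the helper names are invented stand-ins (printf stubs) for the real MSI and local APIC accessors, so this is illustrative only and not the actual Xen ack/EOI path:

/*
 * Illustrative sketch of the policy described above -- not the real Xen
 * code.  Maskable MSI: mask the source, EOI the LAPIC immediately, unmask
 * again on the guest's virtual EOI.  Non-maskable MSI: defer the LAPIC EOI
 * to the guest's virtual EOI, but arm a timeout that forces it if the
 * guest never gets around to it.
 */
#include <stdio.h>

struct msi_source { int vector; int maskable; int eoi_pending; };

static void mask_msi(struct msi_source *s)   { printf("mask vector %d\n", s->vector); }
static void unmask_msi(struct msi_source *s) { printf("unmask vector %d\n", s->vector); }
static void lapic_eoi(void)                  { printf("LAPIC EOI\n"); }
static void start_eoi_timeout(struct msi_source *s)
{ printf("arm EOI timeout for vector %d\n", s->vector); }
static void cancel_eoi_timeout(struct msi_source *s)
{ printf("cancel EOI timeout for vector %d\n", s->vector); }

/* Host-side ack when the physical MSI fires and is forwarded to the guest. */
static void ack_guest_msi(struct msi_source *s)
{
    s->eoi_pending = 1;
    if (s->maskable) {
        mask_msi(s);                /* source silenced ...                         */
        lapic_eoi();                /* ... so the LAPIC can be EOIed at once       */
    } else {
        start_eoi_timeout(s);       /* defer the EOI to the guest, with a fallback */
    }
}

/* The guest performed its virtual EOI. */
static void guest_virtual_eoi(struct msi_source *s)
{
    if (!s->eoi_pending)
        return;
    s->eoi_pending = 0;
    if (s->maskable) {
        unmask_msi(s);
    } else {
        cancel_eoi_timeout(s);
        lapic_eoi();
    }
}

int main(void)
{
    struct msi_source maskable_msi    = { 187, 1, 0 };
    struct msi_source nonmaskable_msi = { 219, 0, 0 };

    ack_guest_msi(&maskable_msi);    guest_virtual_eoi(&maskable_msi);
    ack_guest_msi(&nonmaskable_msi); guest_virtual_eoi(&nonmaskable_msi);
    return 0;
}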
Zhang, Xiantao
2009-Oct-16 14:54 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Attached is a new one which should eliminate the race completely.

Xiantao

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Zhang, Xiantao
Sent: Friday, October 16, 2009 5:50 PM
To: He, Qing
Cc: Cinco, Dante; xen-devel@lists.xensource.com; Keir Fraser
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

[...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Cinco, Dante
2009-Oct-16 18:24 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Xiantao,
I'm still losing the interrupts with your patch, but I see some differences. To simplify the data, I'm only going to focus on the first function of my 4-function PCI device.

After changing the IRQ affinity, the IRQ is not masked anymore (unlike before the patch). What stands out for me is that the new vector (219), as reported by the "guest interrupt information", does not match the vector (187) in dom0 lspci. Before the patch, the new vector in "guest interrupt information" matched the new vector in dom0 lspci (the dest ID in dom0 lspci was unchanged). I also saw this message pop up on the Xen console when I changed smp_affinity:

(XEN) do_IRQ: 1.187 No irq handler for vector (irq -1).

187 is the vector from dom0 lspci before and after the smp_affinity change, but "guest interrupt information" reports the new vector is 219. To me, this looks like the new MSI message data (with vector=219) did not get written into the PCI device, right?

Here's a comparison before and after changing smp_affinity from ffff to 2 (dom0 is pvops 2.6.31.1, domU is 2.6.30.1):

------------------------------------------------------------------------

/proc/irq/48/smp_affinity=ffff (default):

dom0 lspci: Address: 00000000fee00000  Data: 40bb (vector=187)

domU lspci: Address: 00000000fee00000  Data: 4071 (vector=113)

qemu-dm-dpm.log: pt_msi_setup: msi mapped with pirq 4f (79)
                 pt_msi_update: Update msi with pirq 4f gvec 71 gflags 0

Guest interrupt information: (XEN) IRQ: 74, IRQ affinity:0x00000001, Vec:187 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)

Xen console: (XEN) [VT-D]iommu.c:1289:d0 domain_context_unmap:PCIe: bdf = 7:0.0
             (XEN) [VT-D]iommu.c:1175:d0 domain_context_mapping:PCIe: bdf = 7:0.0
             (XEN) [VT-D]io.c:301:d0 VT-d irq bind: m_irq = 4f device = 5 intx = 0
             (XEN) io.c:326:d0 pt_irq_destroy_bind_vtd: machine_gsi=79 guest_gsi=36, device=5, intx=0
             (XEN) io.c:381:d0 XEN_DOMCTL_irq_unmapping: m_irq = 0x4f device = 0x5 intx = 0x0

------------------------------------------------------------------------

/proc/irq/48/smp_affinity=2:

dom0 lspci: Address: 00000000fee10000  Data: 40bb (dest ID changed from 0 (APIC ID of CPU0) to 16 (APIC ID of CPU1), vector unchanged)

domU lspci: Address: 00000000fee02000  Data: 40b1 (dest ID changed from 0 (APIC ID of CPU0) to 2 (APIC ID of CPU1), new vector=177)

Guest interrupt information: (XEN) IRQ: 74, IRQ affinity:0x00000002, Vec:219 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)

qemu-dm-dpm.log: pt_msi_update: Update msi with pirq 4f gvec 71 gflags 2
                 pt_msi_update: Update msi with pirq 4f gvec b1 gflags 2

------------------------------------------------------------------------

-----Original Message-----
From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com]
Sent: Friday, October 16, 2009 7:55 AM
To: Zhang, Xiantao; He, Qing
Cc: Cinco, Dante; xen-devel@lists.xensource.com; Keir Fraser
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

Attached is a new one which should eliminate the race completely.
[...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
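As an aside for anyone re-checking the lspci decodes quoted in this thread, here is a small standalone helper (not part of any patch discussed here) that extracts the destination ID, RH/DM bits, delivery mode and vector from an MSI address/data pair, per the x86 MSI message format:

/*
 * Decoder for the MSI message address/data values quoted above
 * (illustrative only).  x86 MSI layout: destination ID in address bits
 * 19:12, RH in bit 3, DM in bit 2; delivery mode in data bits 10:8
 * (0 = fixed, 1 = lowest priority), vector in data bits 7:0.
 */
#include <stdint.h>
#include <stdio.h>

static void decode_msi(uint64_t addr, unsigned data)
{
    unsigned dest   = (unsigned)(addr >> 12) & 0xff;
    unsigned rh     = (unsigned)(addr >> 3) & 1;
    unsigned dm     = (unsigned)(addr >> 2) & 1;
    unsigned dlv    = (data >> 8) & 0x7;
    unsigned vector = data & 0xff;

    printf("addr=%#010llx data=%#06x -> dest ID=%u RH=%u DM=%u %s, vector=%u (0x%x)\n",
           (unsigned long long)addr, data, dest, rh, dm,
           dlv == 0 ? "fixed" : dlv == 1 ? "lowest priority" : "other delivery mode",
           vector, vector);
}

int main(void)
{
    decode_msi(0xfee00000ULL, 0x40bb);   /* dom0 view, default affinity: dest 0,  vector 187 */
    decode_msi(0xfee10000ULL, 0x40bb);   /* dom0 view, affinity=2:       dest 16, vector 187 */
    decode_msi(0xfee02000ULL, 0x40b1);   /* domU view, affinity=2:       dest 2,  vector 177 */
    return 0;
}

Run against the values above, it reproduces the decodes Dante reports (dest IDs 0, 16 and 2; vectors 187, 187 and 177).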
Zhang, Xiantao
2009-Oct-17 00:59 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Dante,
From your description, it should be another issue. Can you try the following code to see whether it works for you? Just a try.

Xiantao

diff -r 0705efd9c69e xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Fri Oct 16 09:04:53 2009 +0100
+++ b/xen/arch/x86/hvm/hvm.c	Sat Oct 17 08:48:23 2009 +0800
@@ -243,7 +243,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
             continue;
         irq = desc - irq_desc;
         ASSERT(MSI_IRQ(irq));
-        desc->handler->set_affinity(irq, *cpumask_of(v->processor));
+        //desc->handler->set_affinity(irq, *cpumask_of(v->processor));
         spin_unlock_irq(&desc->lock);
     }
     spin_unlock(&d->event_lock);

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Cinco, Dante
Sent: Saturday, October 17, 2009 2:24 AM
To: Zhang, Xiantao; He, Qing
Cc: Keir; xen-devel@lists.xensource.com; Fraser
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

Xiantao,
I'm still losing the interrupts with your patch, but I see some differences. [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Cinco, Dante
2009-Oct-20 00:19 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Xiantao,
With vcpus=16 (all CPUs) in domU, I'm able to change the IRQ smp_affinity to any one-hot value and see the interrupts routed to the specified CPU. Every now and then though, both domU and dom0 will permanently lock up (cold reboot required) after changing the smp_affinity. If I change it manually via the command line, it seems to be okay, but if I change it within a script (such as shifting left a walking "1" to test all 16 CPUs), it will lock up partway through the script.

Other observations:

The MSI message address/data in dom0 "lspci -vv" stays the same, as does the "guest interrupt information" from the Xen console, even though I see the destination ID and vector change in domU "lspci -vv". You're probably expecting this behavior since you removed the set_affinity call in the last patch.

With vcpus=5, I can only change smp_affinity to 1. Any other value aside from 1 or 1f (default) results in an instant, permanent lockup of both domU and dom0 (the Xen console is still accessible).

I also observed that when I tried changing the smp_affinity of the first function of the 4-function PCI device to 2, the 3rd and 4th functions got masked:

(XEN) IRQ: 66, IRQ affinity:0x00000001, Vec:186 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)
(XEN) IRQ: 67, IRQ affinity:0x00000001, Vec:194 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 78(----)
(XEN) IRQ: 68, IRQ affinity:0x00000001, Vec:202 type=PCI-MSI status=00000010 in-flight=1 domain-list=1: 77(---M)
(XEN) IRQ: 69, IRQ affinity:0x00000001, Vec:210 type=PCI-MSI status=00000010 in-flight=1 domain-list=1: 76(---M)

In the above log, I had changed the smp_affinity for IRQ 66, but IRQ 68 and 69 got masked.

Dante

-----Original Message-----
From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com]
Sent: Friday, October 16, 2009 5:59 PM
To: Cinco, Dante; He, Qing
Cc: xen-devel@lists.xensource.com; Fraser; Fraser
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

[...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
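For reference, here is a minimal sketch of the kind of walking-one affinity test Dante mentions; it is illustrative only, must run as root inside domU, and the IRQ number, CPU count and delay between steps are assumptions to adjust for the actual setup:

/*
 * Walking-one smp_affinity test (illustrative sketch).  Writes a single-bit
 * hex mask to /proc/irq/<irq>/smp_affinity for each CPU in turn; irq,
 * ncpus and the delay are assumed values.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const int irq = 48;              /* the domU IRQ used in this thread */
    const int ncpus = 16;            /* vcpus=16 case                    */
    char path[64];

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);

    for (int cpu = 0; cpu < ncpus; cpu++) {
        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            return 1;
        }
        fprintf(f, "%x\n", 1u << cpu);      /* one-hot mask for this CPU */
        fclose(f);
        printf("affinity -> CPU%d (mask %x)\n", cpu, 1u << cpu);
        sleep(2);                    /* give the device time to raise interrupts */
    }
    return 0;
}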
Zhang, Xiantao
2009-Oct-20 05:46 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Cinco, Dante wrote:
> Xiantao,
> With vcpus=16 (all CPUs) in domU, I'm able to change the IRQ smp_affinity to any one-hot value and see the interrupts routed to the specified CPU. Every now and then though, both domU and dom0 will permanently lock up (cold reboot required) after changing the smp_affinity. If I change it manually via the command line, it seems to be okay, but if I change it within a script (such as shifting left a walking "1" to test all 16 CPUs), it will lock up partway through the script.

I can't reproduce the failure at my side after applying the patches, even with a similar script which changes the irq's affinity. Could you share your script with me?

> Other observations:
>
> In the above log, I had changed the smp_affinity for IRQ 66, but IRQ 68 and 69 got masked.

We can see the warning "No irq handler for vector", but it shouldn't hang the host; it may be related to another potential issue and may need further investigation.

Xiantao

> -----Original Message-----
> From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com]
> Sent: Friday, October 16, 2009 5:59 PM
> To: Cinco, Dante; He, Qing
> Cc: xen-devel@lists.xensource.com; Fraser; Fraser
> Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
>
> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-20 07:51 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
The two attached patches should fix the issues. For the issue which complains "(XEN) do_IRQ: 1.187 No irq handler for vector (irq -1)", I root-caused it. Currently, when programming the MSI address & data, Xen doesn't perform the mask/unmask logic needed to avoid inconsistent interrupt generation. In this case, according to the spec, the interrupt generation behavior is undefined, and the device may generate MSI interrupts with the expected vector but an incorrect destination ID, which leads to the issue. The two attached patches should address it.

Fix-irq-affinity-msi3.patch: same as the previous post.
Mask_msi_irq_when_programe_it.patch: disable the irq while programming the MSI.

Xiantao

Zhang, Xiantao wrote:
> Cinco, Dante wrote:
>> [...]
>
> I can't reproduce the failure at my side after applying the patches, even with a similar script which changes the irq's affinity. Could you share your script with me?
>
> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
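The fix described above amounts to quiescing the interrupt while the multi-word MSI message is rewritten, so the device can never send a half-updated message (for example the new vector combined with the old destination ID, which is exactly the "No irq handler for vector" symptom). Below is a rough, self-contained sketch of that ordering; it is not the attached patch (the attachments are not included in this archive), and the helpers are invented printf stand-ins for the real MSI capability accessors:

/*
 * Sketch of "mask while reprogramming" -- not the attached Xen patch.
 * The ordering is the point: mask, write address low/high and data as
 * separate writes, then unmask, so no MSI can be sent from a half-updated
 * message.
 */
#include <stdint.h>
#include <stdio.h>

struct msi_msg { uint32_t addr_lo, addr_hi; uint16_t data; };

static void msi_set_mask(int irq, int masked)      { printf("irq %d: mask=%d\n", irq, masked); }
static void msi_write_addr_lo(int irq, uint32_t v) { printf("irq %d: addr_lo=%#x\n", irq, (unsigned)v); }
static void msi_write_addr_hi(int irq, uint32_t v) { printf("irq %d: addr_hi=%#x\n", irq, (unsigned)v); }
static void msi_write_data(int irq, uint16_t v)    { printf("irq %d: data=%#x\n", irq, (unsigned)v); }

static void write_msi_msg_safely(int irq, const struct msi_msg *msg)
{
    msi_set_mask(irq, 1);                    /* quiesce the source first            */
    msi_write_addr_lo(irq, msg->addr_lo);
    msi_write_addr_hi(irq, msg->addr_hi);
    msi_write_data(irq, msg->data);
    msi_set_mask(irq, 0);                    /* re-enable with a consistent message */
}

int main(void)
{
    /* values taken from the lspci output earlier in the thread */
    struct msi_msg msg = { 0xfee10000u, 0x0u, 0x40bb };
    write_msi_msg_safely(74, &msg);
    return 0;
}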
Cinco, Dante
2009-Oct-20 17:26 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Xiantao,
With the latest patches (Fix-irq-affinity-msi3.patch, Mask_msi_irq_when_programe_it.patch), should I still apply the previous patch which removes "desc->handler->set_affinity(irq, *cpumask_of(v->processor))", or was that just a one-time experiment that should now be discarded?

Dante

-----Original Message-----
From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com]
Sent: Tuesday, October 20, 2009 12:51 AM
To: Zhang, Xiantao; Cinco, Dante; He, Qing
Cc: xen-devel@lists.xensource.com; Fraser
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

The two attached patches should fix the issues. [...]

Fix-irq-affinity-msi3.patch: same as the previous post.
Mask_msi_irq_when_programe_it.patch: disable the irq while programming the MSI.

Xiantao
[...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-21 01:10 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Only need to apply the two patches and the previous one should be discarded. Xiantao -----Original Message----- From: Cinco, Dante [mailto:Dante.Cinco@lsi.com] Sent: Wednesday, October 21, 2009 1:27 AM To: Zhang, Xiantao; He, Qing Cc: xen-devel@lists.xensource.com; Keir Fraser Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem) Xintao, With the latest patch (Fix-irq-affinity-msi3.patch, Mask_msi_irq_when_programe_it.patch), should I still apply the previous patch with removes "desc->handler->set_affinity(irq, *cpumask_of(v->processor))" or was that just a one-time experiment that should now be discarded? Dante -----Original Message----- From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com] Sent: Tuesday, October 20, 2009 12:51 AM To: Zhang, Xiantao; Cinco, Dante; He, Qing Cc: xen-devel@lists.xensource.com; Fraser Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem) Attached two patches should fix the issues. For the issue which complains "(XEN) do_IRQ: 1.187 No irq handler for vector (irq -1),", I root-caused it. Currenlty, when programs MSI address & data, Xen doesn''t perform the mask/unmask logic to avoid inconsistent interrupt genernation. In this case, according to spec, the interrupt generation behavior is undfined, and device may generate MSI interrupts with the expected vector and incorrect destination ID, so leads to the issue. The attached two patches should address it. Fix-irq-affinity-msi3.patch: same with the previous post. Mask_msi_irq_when_programe_it.patch : disable irq when program msi. Xiantao Zhang, Xiantao wrote:> Cinco, Dante wrote: >> Xiantao, >> With vcpus=16 (all CPUs) in domU, I''m able to change the IRQ >> smp_affinity to any one-hot value and see the interrupts routed to >> the specified CPU. Every now and then though, both domU and dom0 will >> permanently lockup (cold reboot required) after changing the >> smp_affinity. If I change it manually via command-line, it seems to >> be okay but if I change it within a script (such as shifting-left a >> walking "1" to test all 16 CPUs), it will lockup part way through the >> script. > > I can''t reproduce the failure at my side after applying the patches > even with a similar script which changes irq''s affinity. Could you > share your script with me ? > > > >> Other observations: >> >> In the above log, I had changed the smp_affinity for IRQ 66 but IRQ >> 68 and 69 got masked. > > We can see the warning as "No irq handler for vector" but it shouldn''t > hang host, and it maybe related to another potential issue, and maybe > need further investigation. > > Xiantao > >> -----Original Message----- >> From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com] >> Sent: Friday, October 16, 2009 5:59 PM >> To: Cinco, Dante; He, Qing >> Cc: xen-devel@lists.xensource.com; Fraser; Fraser >> Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus >>> 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem) >> >> Dante >> It should be another issue as you described. Can you try the >> following code to see whether it works for you ? Just a try. 
>> Xiantao
>>
>> diff -r 0705efd9c69e xen/arch/x86/hvm/hvm.c
>> --- a/xen/arch/x86/hvm/hvm.c Fri Oct 16 09:04:53 2009 +0100
>> +++ b/xen/arch/x86/hvm/hvm.c Sat Oct 17 08:48:23 2009 +0800
>> @@ -243,7 +243,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
>>              continue;
>>          irq = desc - irq_desc;
>>          ASSERT(MSI_IRQ(irq));
>> -        desc->handler->set_affinity(irq, *cpumask_of(v->processor));
>> +        //desc->handler->set_affinity(irq, *cpumask_of(v->processor));
>>          spin_unlock_irq(&desc->lock);
>>      }
>>      spin_unlock(&d->event_lock);
>>
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com
>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Cinco, Dante
>> Sent: Saturday, October 17, 2009 2:24 AM
>> To: Zhang, Xiantao; He, Qing
>> Cc: Keir; xen-devel@lists.xensource.com; Fraser
>> Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
>>
>> Xiantao,
>> I'm still losing the interrupts with your patch but I see some
>> differences. To simplify the data, I'm only going to focus on the
>> first function of my 4-function PCI device.
>>
>> After changing the IRQ affinity, the IRQ is not masked anymore
>> (unlike before the patch). What stands out for me is that the new vector
>> (219) as reported by "guest interrupt information" does not match the
>> vector (187) in dom0 lspci. Before the patch, the new vector in
>> "guest interrupt information" matched the new vector in dom0 lspci
>> (dest ID in dom0 lspci was unchanged). I also saw this message pop up on
>> the Xen console when I changed smp_affinity:
>>
>> (XEN) do_IRQ: 1.187 No irq handler for vector (irq -1).
>>
>> 187 is the vector from dom0 lspci before and after the smp_affinity
>> change, but "guest interrupt information" reports the new vector as
>> 219. To me, this looks like the new MSI message data (with
>> vector=219) did not get written into the PCI device, right?
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
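As a hedged illustration of what Mask_msi_irq_when_programe_it.patch is described as doing (the patch itself was an attachment and is not reproduced in this archive), the following self-contained mock shows the mask / reprogram / unmask ordering so the device can never latch a half-updated (address, data) pair. All types and helpers here are invented for the example; this is not Xen code.

#include <stdio.h>

/* Mock "device MSI registers": address low holds the destination APIC ID
 * in bits 19:12, data holds the vector in bits 7:0. */
struct mock_msi {
    int          masked;
    unsigned int addr_lo;
    unsigned int data;
};

static void program_msi_masked(struct mock_msi *msi,
                               unsigned int addr_lo, unsigned int data)
{
    msi->masked = 1;          /* device may not signal while masked          */
    msi->addr_lo = addr_lo;   /* write address first (new destination ID)    */
    msi->data = data;         /* then data (new vector)                      */
    msi->masked = 0;          /* unmask: device now sees a consistent pair   */
}

int main(void)
{
    struct mock_msi msi = { 0, 0xfee00000, 0x40ba };   /* dest 0,  vector 186 */
    program_msi_masked(&msi, 0xfee10000, 0x40da);      /* dest 16, vector 218 */
    printf("addr_lo=%#x data=%#x masked=%d\n", msi.addr_lo, msi.data, msi.masked);
    return 0;
}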
Cinco, Dante
2009-Oct-22 01:00 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
After adding a lot of dprintk's in the code (xen/arch/x86/msi.c, irq.c, pci.c, traps.c), I found out why I'm getting the message "do_IRQ: 1.186 No irq handler for vector (irq -1)." Some time after the new MSI message address (dest ID) and data (vector) were written to the PCI device, something or somebody called guest_io_write(), which overwrote the new vector (218) with the old vector (186). I added an extra read_msi_msg() after write_msi_msg() just to make sure that the new MSI message address and data were actually written to the PCI device. I also added some code in pci_conf_write() and pci_conf_read() to print the "cf8" and data if a write/read is targeted at the bus/dev/func/reg of the PCI device.

One of my questions is: where did the old vector (186) come from? What data structure did guest_io_write() get the 186 from? I hope this data will help get to the bottom of this IRQ SMP affinity problem.

Dante

-------------------------------------------- BEGIN DATA

cat /proc/irq/48/smp_affinity
ffff

(XEN) Guest interrupt information:
(XEN)   IRQ:  66, IRQ affinity:0x00000001, Vec:186 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)

dom0 lspci -vv -s 0:07:0.0 | grep Address
        Address: 00000000fee00000  Data: 40ba (dest ID=0 APIC ID of CPU0, vector=186)

domU lspci -vv -s 00:05.0 | grep IRQ
        Interrupt: pin A routed to IRQ 48

domU lspci -vv -s 00:05.0 | grep Address
        Address: 00000000fee00000  Data: 4071

--------------------------------------------

domU: echo 2 > /proc/irq/48/smp_affinity

(XEN) irq.c:415: __assign_irq_vector::irq=66,old_vector=186,cfg->vector=218,cpu=1
(XEN) hvm.c:248: hvm_migrate_pirqs::irq=66, v->processor=1
(XEN) io_apic.c:339: set_desc_affinity::irq=66,apicid=16,vector=218
(XEN) msi.c:270: write_msi_msg::msg->address_lo=0xfee10000,msg->data=0x40da
(XEN) pci.c:53: pci_conf_write::cf8=0x80070064,offset=0,bytes=4,data=0xfee10000 (MSI message address low, dest ID)
(XEN) pci.c:53: pci_conf_write::cf8=0x80070068,offset=0,bytes=4,data=0x0 (MSI message address high, 64-bit)
(XEN) pci.c:53: pci_conf_write::cf8=0x8007006c,offset=0,bytes=2,data=0x40da (MSI message data, vector)
(XEN) pci.c:42: pci_conf_read::cf8=0x80070064,offset=0,bytes=4,value=0xfee10000
(XEN) pci.c:42: pci_conf_read::cf8=0x80070068,offset=0,bytes=4,value=0x0
(XEN) pci.c:42: pci_conf_read::cf8=0x8007006c,offset=0,bytes=2,value=0x40da
(XEN) msi.c:204: read_msi_msg::msg->address_lo=0xfee10000,msg->data=0x40da
(XEN) traps.c:1626: guest_io_write::pci_conf_write data=0x40ba   <<<<<<<<<< culprit
(XEN) pci.c:53: pci_conf_write::cf8=0x8007006c,offset=0,bytes=2,data=0x40ba   <<<<<<<<<< vector reverted back to 186
(XEN) do_IRQ: 1.186 No irq handler for vector (irq -1)   <<<<<<<<<< can't find handler because vector should have been 218

(XEN) Guest interrupt information:
(XEN)   IRQ:  66, IRQ affinity:0x00000002, Vec:218 type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)

dom0 lspci -vv -s 0:07:0.0 | grep Address
        Address: 00000000fee10000  Data: 40ba (dest ID=16 APIC ID of CPU1, vector=186)

domU lspci -vv -s 00:05.0 | grep Address
        Address: 00000000fee02000  Data: 40b1

I followed the call hierarchy for guest_io_write() as far as I can:

do_page_fault
fixup_page_fault
handle_gdt_ldt_mapping_fault
do_general_protection
emulate_privileged_op
guest_io_write

-------------------------------------------- END DATA
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
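For anyone wanting to follow the trace above, here is a small self-contained sketch (not the dprintk code actually used) that decodes the CF8 values from the log and filters on the passed-through device at 07:00.0. The field layout is the standard CF8 encoding: bus in bits 23:16, device in 15:11, function in 10:8, dword-aligned register in 7:2.

#include <stdio.h>

/* Illustration only: the kind of filter one would add to pci_conf_write()
 * to print just the accesses aimed at the Tachyon function at 07:00.0. */
static void trace_cf8(unsigned int cf8, unsigned int data)
{
    unsigned int bus  = (cf8 >> 16) & 0xff;
    unsigned int dev  = (cf8 >> 11) & 0x1f;
    unsigned int func = (cf8 >> 8)  & 0x07;
    unsigned int reg  = cf8 & 0xfc;

    if (bus == 0x07 && dev == 0 && func == 0)
        printf("pci_conf_write %02x:%02x.%x reg 0x%02x data 0x%x\n",
               bus, dev, func, reg, data);
}

int main(void)
{
    trace_cf8(0x80070064, 0xfee10000);  /* MSI address low (dest ID 16)          */
    trace_cf8(0x8007006c, 0x40da);      /* MSI data (vector 218)                 */
    trace_cf8(0x8007006c, 0x40ba);      /* the later write that reverts to 186   */
    return 0;
}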
Zhang, Xiantao
2009-Oct-22 01:58 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Dante,
Have you applied the two patches when you did the testing? Without them we can reproduce the issue you reported, but with them the issue is gone. The root cause is that when programming MSI we have to mask the MSI interrupt source first; otherwise the device may generate inconsistent interrupts with an incorrect destination and the right vector, or an incorrect vector and the right destination.

For example, if the old MSI interrupt info is 0.186, meaning the destination ID is 0 and the vector is 186, then when the IRQ migrates to another cpu (e.g. CPU 1) the MSI info should be changed to 1.194. When you program the MSI info into the pci device without masking it first, it may generate the interrupt as 1.186 or 0.194. Obviously, interrupts with the info 1.186 or 0.194 don't exist, and according to the spec any combination is possible. Since Xen writes the addr field first, it is likely to generate 1.186 rather than 0.194, so your pci device may generate an interrupt with the new destination and the old vector (1.186). Of my two patches, one fixes the guest interrupt affinity issue (a race exists between the guest EOI'ing the old vector and the guest setting the new vector), and the other safely programs the MSI info into the pci device to avoid inconsistent interrupt generation.

> (XEN) traps.c:1626: guest_io_write::pci_conf_write data=0x40ba

This should be written by dom0 (likely to be Qemu). And if it does exist, we may have to prohibit such unsafe writings about MSI in Qemu.

Xiantao
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
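To make the 0.186 / 1.194 example above concrete, here is a small self-contained C illustration (not Xen code) of what the device can observe if it samples its address/data pair between the two unmasked config-space writes; the decoding follows the standard MSI layout used earlier in the thread.

#include <stdio.h>

struct mock_msi { unsigned int addr_lo; unsigned int data; };

static void sample(const struct mock_msi *m, const char *when)
{
    printf("%-22s dest=%u vector=%u\n", when,
           (m->addr_lo >> 12) & 0xff, m->data & 0xff);
}

int main(void)
{
    struct mock_msi m = { 0xfee00000, 0x40ba };   /* old info: 0.186              */
    sample(&m, "before update:");
    m.addr_lo = 0xfee01000;                       /* Xen writes the address first */
    sample(&m, "between the writes:");            /* 1.186, the bogus combination */
    m.data = 0x40c2;                              /* then the data (vector 194)   */
    sample(&m, "after update:");                  /* new info: 1.194              */
    return 0;
}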
Zhang, Xiantao
2009-Oct-22 02:42 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Zhang, Xiantao wrote:
>> (XEN) traps.c:1626: guest_io_write::pci_conf_write data=0x40ba
>
> This should be written by dom0 (likely to be Qemu). And if it does
> exist, we may have to prohibit such unsafe writings about MSI in
> Qemu.

Another issue may exist which leads to this. Currently, both Qemu and the hypervisor can program MSI, but Xen lacks a synchronization mechanism between them to avoid the race. As said in the last mail, Qemu shouldn't be allowed to do unsafe writes of the MSI info; instead, it should resort to the hypervisor through a hypercall for MSI programming. Otherwise, Qemu may write stale MSI info to PCI devices, which leads to these strange issues.

Keir/Ian,
What's your opinion about the potential issue? Maybe we need to add a lock between them, or just allow the hypervisor to do the writing?

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Qing He
2009-Oct-22 05:10 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On Thu, 2009-10-22 at 09:58 +0800, Zhang, Xiantao wrote:
> > (XEN) traps.c:1626: guest_io_write::pci_conf_write data=0x40ba
>
> This should be written by dom0 (likely to be Qemu). And if it does
> exist, we may have to prohibit such unsafe writings about MSI in
> Qemu.

Yes, that is the case; the problem happens in Qemu. The algorithm looks like this:

pt_pci_write_config(new_value)
{
    dev_value = pci_read_block();
    value = msi_write_handler(dev_value, new_value);
    pci_write_block(value);
}

msi_write_handler(dev_value, new_value)
{
    HYPERVISOR_bind_pt_irq();   // updates MSI binding
    return dev_value;           // it decides not to change it
}

The problem lies here: when bind_pt_irq is called, the real physical data/address is updated by the hypervisor. No problem was exposed before because at that time the hypervisor used a universal vector, so the MSI data/address remained unchanged. But this isn't the case now that per-CPU vectors are there; the pci_write_block is undesirable in QEmu now, as it writes the stale value back into the register and invalidates any modifications.

Clearly, if QEmu decides to hand the management of these registers to the hypervisor, it shouldn't touch them again. Here is a patch to fix this by introducing a no_wb flag. Can you have a try?

Thanks,
Qing

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
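A self-contained mock of the control flow described above, showing where a no_wb flag would short-circuit the write-back. Only the names pt_pci_write_config, msi_write_handler and no_wb come from the mail; everything else is invented for illustration, and this is not the actual qemu-dm patch.

#include <stdio.h>

struct mock_reg {
    unsigned int dev_value;   /* emulated config-space register (MSI data)      */
    int          no_wb;       /* "don't write back" flag introduced by the fix  */
};

static unsigned int msi_write_handler(struct mock_reg *r, unsigned int dev_value,
                                      unsigned int new_value)
{
    /* stand-in for HYPERVISOR_bind_pt_irq(): the hypervisor reprograms the
     * physical MSI address/data itself */
    (void)new_value;
    r->no_wb = 1;             /* tell the caller to skip the register write-back */
    return dev_value;         /* handler decides not to change the emulated value */
}

static void pt_pci_write_config(struct mock_reg *r, unsigned int new_value)
{
    unsigned int dev_value = r->dev_value;               /* pci_read_block()  */
    unsigned int value = msi_write_handler(r, dev_value, new_value);
    if (!r->no_wb)            /* without this guard the stale value would     */
        r->dev_value = value; /* clobber what the hypervisor just programmed  */
}

int main(void)
{
    struct mock_reg reg = { 0x40ba, 0 };   /* old vector 186          */
    pt_pci_write_config(&reg, 0x40da);     /* guest asks for vector 218 */
    printf("write-back suppressed: %s\n", reg.no_wb ? "yes" : "no");
    return 0;
}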
Keir Fraser
2009-Oct-22 06:25 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 22/10/2009 03:42, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

>> This should be written by dom0 (likely to be Qemu). And if it does
>> exist, we may have to prohibit such unsafe writings about MSI in
>> Qemu.
>
> Another issue may exist which leads to this. Currently, both Qemu and the
> hypervisor can program MSI, but Xen lacks a synchronization mechanism
> between them to avoid the race. As said in the last mail, Qemu shouldn't be
> allowed to do unsafe writes of the MSI info; instead, it should resort to
> the hypervisor through a hypercall for MSI programming. Otherwise, Qemu may
> write stale MSI info to PCI devices, which leads to these strange issues.
> Keir/Ian,
> What's your opinion about the potential issue? Maybe we need to add a lock
> between them, or just allow the hypervisor to do the writing?

In general, having qemu make pci updates via the cf8/cfc method is clearly unsafe, and cannot be made safe. I would certainly be happy to see some of the low-level PCI management pushed into pciback (and/or pci-stub, depending on whether pciback is to be ported to pv_ops).

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2009-Oct-22 06:46 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 20.10.09 09:51 >>> >Attached two patches should fix the issues. For the issue which complains >"(XEN) do_IRQ: 1.187 No irq handler for vector (irq -1),", I root-caused it. >Currenlty, when programs MSI address & data, Xen doesn''t perform the >mask/unmask logic to avoid inconsistent interrupt genernation. In this >case, according to spec, the interrupt generation behavior is undfined, >and device may generate MSI interrupts with the expected vector and >incorrect destination ID, so leads to the issue. The attached two patches >should address it.What about the case of MSI not having a mask bit? Shouldn''t movement (i.e. vector or affinity changes) be disallowed for non-maskable ones? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-22 07:11 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Jan Beulich wrote:
> What about the case of MSI not having a mask bit? Shouldn't movement
> (i.e. vector or affinity changes) be disallowed for non-maskable ones?

IRQ migration shouldn't depend on the interrupt status (masked/unmasked), and the hypervisor can handle a non-masked irq during the migration.

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2009-Oct-22 07:31 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 22.10.09 09:11 >>> >Jan Beulich wrote: >>>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 20.10.09 09:51 >>> >>> Attached two patches should fix the issues. For the issue which >>> complains "(XEN) do_IRQ: 1.187 No irq handler for vector (irq -1),", >>> I root-caused it. Currenlty, when programs MSI address & data, Xen >>> doesn''t perform the mask/unmask logic to avoid inconsistent >>> interrupt genernation. In this case, according to spec, the >>> interrupt generation behavior is undfined, >>> and device may generate MSI interrupts with the expected vector and >>> incorrect destination ID, so leads to the issue. The attached two >>> patches should address it. >> >> What about the case of MSI not having a mask bit? Shouldn''t movement >> (i.e. vector or affinity changes) be disallowed for non-maskable ones? > >IRQ migration shouldn''t depend on the interrupt status(mask/unmask), >and hyperviosr can handle non-masked irq during the migration.Hmm, then I don''t understand which case your patch was a fix for: I understood that it addresses an issue when the affinity of an interrupt gets changed (requiring a re-write of the address/data pair). If the hypervisor can deal with it without masking, then why did you add it? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-22 08:41 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Jan Beulich wrote:
> Hmm, then I don't understand which case your patch was a fix for: I
> understood that it addresses an issue when the affinity of an
> interrupt gets changed (requiring a re-write of the address/data
> pair). If the hypervisor can deal with it without masking, then why
> did you add it?

Hmm, sorry, it seems I misunderstood your question. If the MSI doesn't support a mask bit (clearing the MSI enable bit doesn't help in this case), the issue may still exist. I just checked the Linux side; it seems it doesn't perform the mask operation when programming MSI, but I don't know why Linux doesn't have such issues. Actually, we do see inconsistent interrupt messages from the device without this patch, and after applying the patch the issue is gone. It may need further investigation why Linux doesn't need the mask operation.

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Oct-22 09:42 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 22/10/2009 09:41, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> Hmm, sorry, it seems I misunderstood your question. If the MSI doesn't
> support a mask bit (clearing the MSI enable bit doesn't help in this case),
> the issue may still exist. I just checked the Linux side; it seems it
> doesn't perform the mask operation when programming MSI, but I don't know
> why Linux doesn't have such issues. Actually, we do see inconsistent
> interrupt messages from the device without this patch, and after applying
> the patch the issue is gone. It may need further investigation why Linux
> doesn't need the mask operation.

Linux is quite careful about when it will reprogram vector/affinity info, isn't it? Doesn't it mark such an update pending and only flush it through during the next interrupt delivery, or something like that? Do we need some of the upstream Linux patches for this?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-22 16:32 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Keir Fraser wrote:
> Linux is quite careful about when it will reprogram vector/affinity
> info, isn't it? Doesn't it mark such an update pending and only flush
> it through during the next interrupt delivery, or something like that?
> Do we need some of the upstream Linux patches for this?

Yeah, after checking the related logic in Linux, I think we need to port more of the IRQ migration logic to avoid the races reported in this thread. When setting the affinity for a specific irq, the first step is to mark it pending, and then do the real setting before acking the irq on the next interrupt delivery; at that point there shouldn't be new interrupts generated by normal devices before the ack. I will post the backport patch later.

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
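A hedged, self-contained mock of the "mark pending, apply at ack time" scheme described above. It is not the backport patch itself, just the shape of the idea; all names are invented.

#include <stdio.h>

struct mock_irq {
    unsigned int affinity;          /* currently programmed destination mask  */
    unsigned int pending_affinity;  /* requested by the user, not yet applied */
    int          move_pending;
};

/* called from the /proc/irq/<n>/smp_affinity path: only record the request */
static void set_affinity(struct mock_irq *irq, unsigned int mask)
{
    irq->pending_affinity = mask;
    irq->move_pending = 1;
}

/* called on the ack path of the next interrupt: no further interrupt from
 * this source is expected before the ack, so apply the move now */
static void ack_irq(struct mock_irq *irq)
{
    if (irq->move_pending) {
        irq->affinity = irq->pending_affinity;   /* reprogram the MSI here */
        irq->move_pending = 0;
    }
}

int main(void)
{
    struct mock_irq irq = { 0x1, 0, 0 };
    set_affinity(&irq, 0x2);   /* echo 2 > /proc/irq/48/smp_affinity        */
    ack_irq(&irq);             /* next interrupt delivery flushes the change */
    printf("affinity now %#x\n", irq.affinity);
    return 0;
}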
Cinco, Dante
2009-Oct-22 16:33 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Xiantao,

I'm sorry, I forgot to mention that I did apply your two patches, but they didn't have any effect (interrupts are still lost after changing smp_affinity, along with the "No handler for irq vector" message). I added a dprintk in msi_set_mask_bit() and realized that MSI does not have a mask bit (MSI-X does). My PCI device uses MSI, not MSI-X. I placed my dprintk inside the condition below and it never triggered.

    switch (entry->msi_attrib.type) {
    case PCI_CAP_ID_MSI:
        if (entry->msi_attrib.maskbit) {

While debugging this problem, I thought about the potential problem of an interrupt firing between the writes for the MSI message address and MSI message data. I noticed that pci_conf_write() uses spin_lock_irqsave() to disable interrupts before issuing the "out" instruction, but the writes for the address and data are two separate pci_conf_write() calls. To me, it would be safer to write the address and data in a single call preceded by spin_lock_irqsave(). This way, when interrupts are enabled again, the address and data have both been updated.

Dante

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
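A self-contained sketch of the suggestion above: perform both MSI config-space writes inside a single critical section rather than as two separately locked pci_conf_write() calls. A pthread mutex stands in for spin_lock_irqsave(), and plain variables stand in for the device registers; this only illustrates the proposed ordering, not Xen's actual code.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t conf_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int msi_addr_lo = 0xfee00000;   /* dest 0              */
static unsigned int msi_data    = 0x40ba;       /* vector 186          */

static void write_msi_addr_and_data(unsigned int addr_lo, unsigned int data)
{
    pthread_mutex_lock(&conf_lock);    /* spin_lock_irqsave() in the real code   */
    msi_addr_lo = addr_lo;             /* both halves are updated before the     */
    msi_data    = data;                /* critical section is left               */
    pthread_mutex_unlock(&conf_lock);  /* spin_unlock_irqrestore()               */
}

int main(void)
{
    write_msi_addr_and_data(0xfee10000, 0x40da);   /* dest 16, vector 218 */
    printf("addr=%#x data=%#x\n", msi_addr_lo, msi_data);
    return 0;
}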
Jeremy Fitzhardinge
2009-Oct-22 21:11 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
On 10/21/09 23:25, Keir Fraser wrote:
> In general, having qemu make pci updates via the cf8/cfc method is clearly
> unsafe, and cannot be made safe. I would certainly be happy to see some of
> the low-level PCI management pushed into pciback (and/or pci-stub, depending
> on whether pciback is to be ported to pv_ops).

I've got Konrad's forward-port of pciback in xen/master at the moment.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Cinco, Dante
2009-Oct-23 00:10 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Qing,

Your patch worked. It suppressed the extra write that previously overwrote the MSI message data with the old vector. No more "no handler for irq" message, and the interrupts were successfully migrated to the new CPU.

I still experienced a hang on both domU and dom0 when I changed the smp_affinity of all 4 PCI devices (I have a 4-function PCI device) simultaneously (the "echo <new_smp_affinity> > /proc/irq/<irq#>/smp_affinity" commands are in a shell script), but I didn't get a chance to pursue this today.

Dante

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-23 01:06 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Dante,

If the device doesn't support the MSI mask bit, the second patch should have no effect for it. I am working on backporting more of the IRQ migration logic from Linux, which should ensure addr/vector are both written to the device before new interrupts fire. But as I mentioned before, if you want to solve the guest affinity setting issue, you have to apply the first patch I sent out (fix-irq-affinity-msi3.patch). :-)

Xiantao

Cinco, Dante wrote:
> Xiantao,
>
> I'm sorry, I forgot to mention that I did apply your two patches, but
> they didn't have any effect (interrupts are still lost after changing
> smp_affinity, along with the "No handler for irq vector" message). I
> added a dprintk in msi_set_mask_bit() and realized that MSI does not
> have a mask bit (MSI-X does). My PCI device uses MSI, not MSI-X.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Zhang, Xiantao
2009-Oct-26 13:02 UTC
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Keir,

The attached patch (irq-migration-enhancement.patch) aims to enhance the irq migration logic; most of the logic is ported from Linux and tailored for Xen. Please apply. It should eliminate the race between writing the MSI vector and addr. In addition, to fix the guest's interrupt affinity issue, we also need to apply the patch fix-irq-affinity-msi3.patch.

Xiantao

Keir Fraser wrote:
> Linux is quite careful about when it will reprogram vector/affinity
> info, isn't it? Doesn't it mark such an update pending and only flush
> it through during the next interrupt delivery, or something like that?
> Do we need some of the upstream Linux patches for this?
>
> -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Oct-26 13:34 UTC
Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
Thanks, applied as c/s 20370. I think fix-irq-affinity-msi3.patch is already applied as c/s 20334.

 -- Keir

On 26/10/2009 13:02, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> Keir,
> The attached patch (irq-migration-enhancement.patch) aims to enhance the
> irq migration logic; most of the logic is ported from Linux and tailored
> for Xen. Please apply. It should eliminate the race between writing the
> MSI vector and addr. In addition, to fix the guest's interrupt affinity
> issue, we also need to apply the patch fix-irq-affinity-msi3.patch.
> Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel