Hi, I was playing around with an MSI-capable virtual device (so far submitted as patches only) in the upstream QEMU tree, but am having trouble getting it to work on a Xen HVM guest. The device happens to be a QEMU implementation of VMware's pvscsi controller. The device works fine in a Xen guest when I switch the device's code to force usage of legacy interrupts with upstream QEMU. With MSI-based interrupts, the device works fine on a KVM guest but, as stated before, not on a Xen guest. After digging a bit, it appears the reason for the failure in Xen guests is that the MSI data register in the Xen guest ends up with a value of 4300, where the Delivery Mode value of 3 happens to be reserved (per spec) and therefore illegal. The vmsi_deliver routine in Xen rejects MSI interrupts with such data as illegal (per expectation), causing all commands issued by the guest OS on the device to time out.

Given the above scenario, I was wondering if anyone can shed some light on how to debug this further for Xen. Something I would specifically like to know is where the MSI data register configuration actually happens. Is it done by some code specific to Xen and within the Xen codebase, or is it all done within QEMU?

Thanks,
Deep

P.S. some details on the device's PCI configuration.

lspci output for a working instance in KVM:

00:07.0 SCSI storage controller: VMware PVSCSI SCSI Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 255
        Interrupt: pin A routed to IRQ 45
        Region 0: Memory at febf0000 (32-bit, non-prefetchable) [size=32K]
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 4161
        Kernel driver in use: vmw_pvscsi
        Kernel modules: vmw_pvscsi

Here is the lspci output for the scenario where it's failing to work in Xen:

00:04.0 SCSI storage controller: VMware PVSCSI SCSI Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 255
        Interrupt: pin A routed to IRQ 80
        Region 0: Memory at f3020000 (32-bit, non-prefetchable) [size=32K]
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee36000  Data: 4300
        Kernel driver in use: vmw_pvscsi
        Kernel modules: vmw_pvscsi
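For reference, here is how those two Data values decode. This is an illustrative C snippet following the MSI message data layout from the PCI spec (bits 0-7 vector, bits 8-10 delivery mode, bit 14 level, bit 15 trigger mode); it is not code from Xen or QEMU:

/* Illustrative decode of the MSI data register per the PCI spec layout. */
#include <stdint.h>
#include <stdio.h>

static void decode_msi_data(uint16_t data)
{
    static const char *modes[8] = {
        "Fixed", "Lowest Priority", "SMI", "Reserved(3)",
        "NMI", "INIT", "Reserved(6)", "ExtINT"
    };
    printf("data=0x%04x vector=0x%02x delivery=%s level=%u trigger=%u\n",
           data, data & 0xff, modes[(data >> 8) & 0x7],
           (data >> 14) & 1, (data >> 15) & 1);
}

int main(void)
{
    decode_msi_data(0x4161); /* KVM case: vector 0x61, Lowest Priority */
    decode_msi_data(0x4300); /* Xen case: vector 0, delivery mode 3 (reserved) */
    return 0;
}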
On Tue, Jun 26, 2012 at 4:38 AM, Deep Debroy <ddebroy@gmail.com> wrote:
> Hi, I was playing around with an MSI-capable virtual device (so far
> submitted as patches only) in the upstream QEMU tree, but am having
> trouble getting it to work on a Xen HVM guest. [...]
>
> Given the above scenario, I was wondering if anyone can shed some
> light on how to debug this further for Xen. Something I would
> specifically like to know is where the MSI data register configuration
> actually happens. Is it done by some code specific to Xen and within
> the Xen codebase, or is it all done within QEMU?

This seems like the same issue I ran into, though in my case it is with passed-through physical devices. See http://lists.xen.org/archives/html/xen-devel/2012-06/msg01423.html and the older messages in that thread for more info on what's going on. No fix yet but help debugging is very welcome.
Deep Debroy
2012-Jun-27 23:18 UTC
Re: MSI message data register configuration in Xen guests
On Mon, Jun 25, 2012 at 7:51 PM, Rolu <rolu@roce.org> wrote:
>
> On Tue, Jun 26, 2012 at 4:38 AM, Deep Debroy <ddebroy@gmail.com> wrote:
> > Hi, I was playing around with an MSI-capable virtual device (so far
> > submitted as patches only) in the upstream QEMU tree [...]
>
> This seems like the same issue I ran into, though in my case it is
> with passed-through physical devices. See
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01423.html and
> the older messages in that thread for more info on what's going on. No
> fix yet but help debugging is very welcome.

Thanks Rolu for pointing out the other thread - it was very useful. Some of the symptoms appear to be identical in my case. However, I am not using a pass-through device. Instead, in my case it's a fully virtualized device, pretty much identical to a raw-file-backed disk image, where the controller is pvscsi rather than lsi. Therefore I guess some of the later discussion in the other thread around pass-through-specific areas of code in QEMU is not relevant? Please correct me if I am wrong. Also note that I am using upstream QEMU, where neither the #define for PT_PCI_MSITRANSLATE_DEFAULT nor xenstore.c exists (which is where Stefano's suggested change appeared to be).

So far, here's what I am observing in the HVM Linux guest:

On the guest side, as discussed in the other thread, xen_hvm_setup_msi_irqs is invoked for the device, and a value of 0x4300 is composed by xen_msi_compose_msg and written to the data register.
On the QEMU (upstream) side, when the virtualized controller is trying to complete a request, it invokes the following chain of calls: stl_le_phys -> xen_apic_mem_write -> xen_hvm_inject_msi.
On the Xen side, this ends up in: hvmop_inject_msi -> hvm_inject_msi -> vmsi_deliver. vmsi_deliver, as previously discussed, rejects the delivery mode of 0x3.

Is the above sequence of interactions the expected path for an HVM guest trying to use a fully virtualized device/controller that uses MSI in upstream QEMU? If so, and if a standard Linux guest always populates the value of 0x4300 in the MSI data register through xen_hvm_setup_msi_irqs, how are MSI notifications from a device in QEMU supposed to work and bypass the vmsi_deliver check, given that the delivery type of 0x3 is indeed reserved?

Thanks,
Deep
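P.S. For anyone landing on this thread later: the check that fires is, in essence, a switch on the decoded delivery mode. A minimal sketch of why 0x4300 gets dropped (my paraphrase of the behavior, not the actual Xen vmsi_deliver source):

/* Sketch of a vmsi_deliver-style gate (paraphrased, not Xen source).
 * Delivery modes 0 (Fixed) and 1 (Lowest Priority) are deliverable;
 * mode 3 is reserved by the spec, so 0x4300 is rejected. */
#include <stdint.h>
#include <stdio.h>

static int deliver_virtual_msi(uint32_t data)
{
    uint8_t vector        = data & 0xff;
    uint8_t delivery_mode = (data >> 8) & 0x7;

    switch (delivery_mode) {
    case 0: /* Fixed */
    case 1: /* Lowest Priority */
        printf("would inject vector 0x%02x into the guest\n", vector);
        return 0;
    default:
        printf("unsupported delivery mode %u, dropping MSI\n", delivery_mode);
        return -1; /* the path 0x4300 takes */
    }
}

int main(void)
{
    deliver_virtual_msi(0x4161); /* accepted: Lowest Priority, vector 0x61 */
    deliver_virtual_msi(0x4300); /* rejected: reserved mode 3 */
    return 0;
}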
Konrad Rzeszutek Wilk
2012-Jun-28 20:52 UTC
Re: MSI message data register configuration in Xen guests
On Tue, Jun 26, 2012 at 04:51:29AM +0200, Rolu wrote:
> On Tue, Jun 26, 2012 at 4:38 AM, Deep Debroy <ddebroy@gmail.com> wrote:
> > Hi, I was playing around with an MSI-capable virtual device (so far
> > submitted as patches only) in the upstream QEMU tree [...]
>
> This seems like the same issue I ran into, though in my case it is
> with passed-through physical devices. See
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01423.html and
> the older messages in that thread for more info on what's going on. No
> fix yet but help debugging is very welcome.

Huh? You said in http://lists.xen.org/archives/html/xen-devel/2012-06/msg01653.html
"This worked!"
Deep Debroy
2012-Jun-28 20:52 UTC
Re: MSI message data register configuration in Xen guests
On Wed, Jun 27, 2012 at 4:18 PM, Deep Debroy <ddebroy@gmail.com> wrote:
> [...]
> On the guest side, as discussed in the other thread,
> xen_hvm_setup_msi_irqs is invoked for the device, and a value of 0x4300
> is composed by xen_msi_compose_msg and written to the data register.
> [...]
> Is the above sequence of interactions the expected path for an HVM
> guest trying to use a fully virtualized device/controller that uses
> MSI in upstream QEMU? If so, and if a standard Linux guest always
> populates the value of 0x4300 in the MSI data register through
> xen_hvm_setup_msi_irqs, how are MSI notifications from a device in
> QEMU supposed to work and bypass the vmsi_deliver check, given that
> the delivery type of 0x3 is indeed reserved?

I wanted to see whether the HVM guest can interact with the MSI virtualized controller properly without any of the Xen-specific code in the Linux kernel kicking in (i.e. allowing the regular PCI/MSI code in Linux to fire). So I rebuilt the kernel with CONFIG_XEN disabled, such that pci_xen_hvm_init no longer sets x86_msi.*msi_irqs to Xen-specific routines like xen_hvm_setup_msi_irqs, which is where the 0x4300 is getting populated. This works properly: the MSI data register for the controller ends up with a valid value like 0x4049, vmsi_deliver no longer complains, all MSI notifications are delivered in the expected way to the guest, and the raw, file-backed disks attached to the controller show up in fdisk -l.

My conclusion: the Linux kernel's Xen-specific code, specifically routines like xen_hvm_setup_msi_irqs, needs to be tweaked to work with fully virtualized QEMU devices that use MSI. I will follow up regarding that on LKML.

Thanks,
Deep
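P.S. For context on where the 0x4300 comes from: the PV-on-HVM MSI setup path does not compose a real vector/delivery-mode pair at all; it encodes a pirq into the MSI address and writes a fixed data value with vector 0 and 3 in the delivery-mode bits. Roughly like this - my reconstruction from the era's arch/x86/pci/xen.c, so treat the field layout as approximate rather than a verbatim copy:

/* Reconstruction of the guest-side encoding (approximate, not a verbatim
 * copy of arch/x86/pci/xen.c). The pirq rides in the MSI address, and the
 * data register gets a fixed magic value. */
#include <stdint.h>

#define MSI_ADDR_BASE_LO   0xfee00000u
/* vector 0, delivery mode 3, level assert => 0x4300 */
#define XEN_PIRQ_MSI_DATA  ((3u << 8) | (1u << 14))

struct msi_msg {
    uint32_t address_hi;
    uint32_t address_lo;
    uint32_t data;
};

static void xen_msi_compose_msg_sketch(unsigned int pirq, struct msi_msg *msg)
{
    msg->address_hi = pirq & 0xffffff00u;      /* upper pirq bits */
    msg->address_lo = MSI_ADDR_BASE_LO |
                      ((pirq & 0xffu) << 12);  /* low 8 pirq bits in the
                                                  dest-ID field */
    msg->data = XEN_PIRQ_MSI_DATA;             /* always 0x4300 */
}

This matches the failing lspci output at the top of the thread: Address 00000000fee36000 carries pirq 0x36 in the dest-ID field, and Data is the fixed 0x4300.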
Deep Debroy
2012-Jun-28 21:26 UTC
Re: MSI message data register configuration in Xen guests
On Thu, Jun 28, 2012 at 1:52 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Jun 26, 2012 at 04:51:29AM +0200, Rolu wrote:
> > This seems like the same issue I ran into, though in my case it is
> > with passed-through physical devices. [...]
>
> Huh? You said in http://lists.xen.org/archives/html/xen-devel/2012-06/msg01653.html
> "This worked!"

Hi Konrad, I believe Rolu's response in the thread you pointed to was with respect to pass-through devices. This current thread is not about pass-through devices but about a fully virtualized QEMU device - specifically, a disk controller that exposes raw-image-backed files from the host to the guest as disks, very similar to the default LSI SCSI controller in QEMU.
On Thu, Jun 28, 2012 at 10:52 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Jun 26, 2012 at 04:51:29AM +0200, Rolu wrote:
> > [...] No fix yet but help debugging is very welcome.
>
> Huh? You said in http://lists.xen.org/archives/html/xen-devel/2012-06/msg01653.html
> "This worked!"

That's a day and a half later.
Stefano Stabellini
2012-Jun-29 11:10 UTC
Re: MSI message data register configuration in Xen guests
On Thu, 28 Jun 2012, Deep Debroy wrote:
> [...]
> I wanted to see whether the HVM guest can interact with the MSI
> virtualized controller properly without any of the Xen-specific code
> in the Linux kernel kicking in (i.e. allowing the regular PCI/MSI code
> in Linux to fire). So I rebuilt the kernel with CONFIG_XEN disabled,
> such that pci_xen_hvm_init no longer sets x86_msi.*msi_irqs to
> Xen-specific routines like xen_hvm_setup_msi_irqs, which is where the
> 0x4300 is getting populated. This works properly. [...]
>
> My conclusion: the Linux kernel's Xen-specific code, specifically
> routines like xen_hvm_setup_msi_irqs, needs to be tweaked to work with
> fully virtualized QEMU devices that use MSI. I will follow up
> regarding that on LKML.

Thanks for your analysis of the problem, I think it is correct: Linux PV on HVM is trying to set up event channel delivery for the MSI, as it always does (therefore choosing 0x3 as the delivery mode). However, emulated devices in QEMU don't support that. To be honest, emulated devices in QEMU didn't support MSIs at all until very recently, which is why we are seeing this issue only now.

Could you please try this Xen patch and let me know if it makes things better?

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index a90927a..f44f3b9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -281,6 +281,31 @@ void hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
                 >> MSI_DATA_TRIGGER_SHIFT;
     uint8_t vector = data & MSI_DATA_VECTOR_MASK;
 
+    if ( !vector )
+    {
+        int pirq = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
+        if ( pirq > 0 )
+        {
+            struct pirq *info = pirq_info(d, pirq);
+
+            /* if it is the first time, allocate the pirq */
+            if (info->arch.hvm.emuirq == IRQ_UNBOUND)
+            {
+                spin_lock(&d->event_lock);
+                map_domain_emuirq_pirq(d, pirq, IRQ_MSI_EMU);
+                spin_unlock(&d->event_lock);
+            } else if (info->arch.hvm.emuirq != IRQ_MSI_EMU)
+            {
+                printk("%s: pirq %d does not correspond to an emulated MSI\n", __func__, pirq);
+                return;
+            }
+            send_guest_pirq(d, info);
+            return;
+        } else {
+            printk("%s: error getting pirq from MSI: pirq = %d\n", __func__, pirq);
+        }
+    }
+
     vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
 }
 
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 40e2245..066f64d 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -188,6 +188,7 @@ void cleanup_domain_irq_mapping(struct domain *);
 })
 #define IRQ_UNBOUND -1
 #define IRQ_PT -2
+#define IRQ_MSI_EMU -3
 
 bool_t cpu_has_pending_apic_eoi(void);
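For the curious: the pirq decode in the hunk above is just the inverse of the guest-side encoding sketched earlier in the thread. A quick stand-alone check of the round trip (illustrative only, not hypervisor code; the encode side follows the earlier sketch's assumed layout):

/* Round-trip check: encode a pirq into a 64-bit MSI address the way the
 * guest does, then decode it the way the patch above does. */
#include <assert.h>
#include <stdint.h>

static uint64_t encode_pirq(unsigned int pirq)
{
    uint32_t hi = pirq & 0xffffff00u;                 /* address_hi */
    uint32_t lo = 0xfee00000u | ((pirq & 0xffu) << 12); /* address_lo */
    return ((uint64_t)hi << 32) | lo;
}

static int decode_pirq(uint64_t addr)
{
    /* same expression as in hvm_inject_msi() above */
    return ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
}

int main(void)
{
    for (unsigned int pirq = 1; pirq < 0x10000; pirq++)
        assert(decode_pirq(encode_pirq(pirq)) == (int)pirq);
    return 0;
}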
Deep Debroy
2012-Jul-03 05:54 UTC
Re: MSI message data register configuration in Xen guests
On Fri, Jun 29, 2012 at 4:10 AM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> [...]
> Thanks for your analysis of the problem, I think it is correct: Linux PV
> on HVM is trying to set up event channel delivery for the MSI, as it
> always does (therefore choosing 0x3 as the delivery mode). However,
> emulated devices in QEMU don't support that. To be honest, emulated
> devices in QEMU didn't support MSIs at all until very recently, which
> is why we are seeing this issue only now.
>
> Could you please try this Xen patch and let me know if it makes things
> better?

Thanks Stefano. I have tested the below patch with the MSI device and it's now working (without any additional changes to the Linux guest kernel).

> [patch snipped; identical to the one posted above]
Stefano Stabellini
2012-Jul-03 10:20 UTC
Re: MSI message data register configuration in Xen guests
On Tue, 3 Jul 2012, Deep Debroy wrote:
> [...]
> Thanks Stefano. I have tested the below patch with the MSI device and
> it's now working (without any additional changes to the Linux guest
> kernel).

Thanks! I'll submit the patch and add your Tested-by.
Stefano Stabellini
2012-Jul-03 10:28 UTC
[PATCH] xen: event channel remapping for emulated MSIs
Linux PV on HVM guests remap all the MSIs onto event channels, including MSIs corresponding to QEMU's emulated devices. This patch makes sure that we handle correctly the case of emulated MSIs that have been remapped, sending a pirq to the guest instead.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Deep Debroy <ddebroy@gmail.com>

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index a90927a..f44f3b9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -281,6 +281,31 @@ void hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
                 >> MSI_DATA_TRIGGER_SHIFT;
     uint8_t vector = data & MSI_DATA_VECTOR_MASK;
 
+    if ( !vector )
+    {
+        int pirq = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
+        if ( pirq > 0 )
+        {
+            struct pirq *info = pirq_info(d, pirq);
+
+            /* if it is the first time, allocate the pirq */
+            if (info->arch.hvm.emuirq == IRQ_UNBOUND)
+            {
+                spin_lock(&d->event_lock);
+                map_domain_emuirq_pirq(d, pirq, IRQ_MSI_EMU);
+                spin_unlock(&d->event_lock);
+            } else if (info->arch.hvm.emuirq != IRQ_MSI_EMU)
+            {
+                printk("%s: pirq %d does not correspond to an emulated MSI\n", __func__, pirq);
+                return;
+            }
+            send_guest_pirq(d, info);
+            return;
+        } else {
+            printk("%s: error getting pirq from MSI: pirq = %d\n", __func__, pirq);
+        }
+    }
+
     vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
 }
 
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 40e2245..066f64d 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -188,6 +188,7 @@ void cleanup_domain_irq_mapping(struct domain *);
 })
 #define IRQ_UNBOUND -1
 #define IRQ_PT -2
+#define IRQ_MSI_EMU -3
 
 bool_t cpu_has_pending_apic_eoi(void);