Andrew J Younge
2013-Jun-10 16:21 UTC
Mellanox SR-IOV IB PCI passthrough in Xen - MSI-X pciback issue
Greetings Xen user community,

I am interested in using Mellanox ConnectX cards with SR-IOV capabilities to pass PCIe Virtual Functions (VFs) through to Xen guests. The hope is to allow InfiniBand to be used directly within virtual machines and thereby enable a plethora of high performance computing applications that already leverage InfiniBand interconnects. However, I have run into some issues with the xen-pciback driver and its initialization of MSI-X, which the VFs require.

The hardware is Mellanox ConnectX-3 MT27500 VPI PCI Express cards, set up in InfiniBand mode, in HP blades with Intel Xeon E5-2670 CPUs and 42GB of memory. SR-IOV is enabled in the system BIOS along with VT-x and, of course, VT-d.

The system is a RHEL/CentOS 6.4 x86_64 Dom0 running a 3.9.3-1 kernel with Xen 4.1.2 installed and intel_iommu=on on the kernel command line. The advantage of this kernel is the in-tree mlx4_core/en/ib kernel modules, which gained SR-IOV support in 3.5 and later. The stock OFED drivers provided by Mellanox do not compile against a custom Dom0 kernel (not even the 2.0-beta OFED drivers), so a 3.5 or newer Linux kernel is necessary. I also updated the ConnectX-3 firmware to the version provided by Mellanox (2.11.500) so that SR-IOV is enabled in the firmware. With this setup I am able to enable up to 64 VFs in InfiniBand mode (here via modprobe mlx4_core num_vfs=8 port_type_array=1,1 msi_x=1) within the Xen Dom0 kernel:

21:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Hewlett-Packard Company Device 18d6
    Physical Slot: 4
    Flags: bus master, fast devsel, latency 0, IRQ 50
    Memory at fbf00000 (64-bit, non-prefetchable) [size=1M]
    Memory at fb000000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [148] Device Serial Number 00-02-c9-03-00-f6-ef-f0
    Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
    Capabilities: [154] Advanced Error Reporting
    Capabilities: [18c] #19
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core

21:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
    Subsystem: Hewlett-Packard Company Device 61b0
    Physical Slot: 4
    Flags: fast devsel
    [virtual] Memory at db000000 (64-bit, prefetchable) [size=8M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=4 Masked-
    Kernel modules: mlx4_core

… and so on, up to as many VFs as are enabled (eight in my case).

I am able to load the xen-pciback kernel module and hide one of the VFs, then start a CentOS 6.3 HVM guest with PCI passthrough of that VF (pci = [ '21:00.5' ] in the .hvm config file; the Dom0-side steps are sketched further below). The VM sees the VF, and the xen-pciback module translates it to 00:05.0 in the guest as expected:

00:05.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
    Subsystem: Hewlett-Packard Company Device 61b0
    Physical Slot: 5
    Flags: fast devsel
    Memory at f3000000 (64-bit, prefetchable) [size=8M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=4 Masked-
    Kernel modules: mlx4_core

With the VM running a generic 2.6.32 CentOS 6.3 kernel, I installed the MLNX 2.0-beta drivers (they do compile against the standard RHEL kernel).
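For completeness, the Dom0 side of the above boils down to something like this (a minimal sketch: the pciback sysfs route is one way to hide the VF, the xen-pciback.hide= parameter is an alternative, and the slot address is specific to my setup):

# Enable the VFs on the ConnectX-3 PF (both ports in IB mode, MSI-X on)
modprobe mlx4_core num_vfs=8 port_type_array=1,1 msi_x=1

# Hand one VF to xen-pciback so it can be passed through
modprobe xen-pciback
echo 0000:21:00.5 > /sys/bus/pci/devices/0000:21:00.5/driver/unbind   # only if a driver already claimed it
echo 0000:21:00.5 > /sys/bus/pci/drivers/pciback/new_slot
echo 0000:21:00.5 > /sys/bus/pci/drivers/pciback/bind

# The guest .hvm config then just needs:
#   pci = [ '21:00.5' ]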
The problem is that when I modprobe mlx4_core in the VM, I get the following error:

mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing 0000:00:05.0
mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode
mlx4_core 0000:00:05.0: Sending reset
mlx4_core 0000:00:05.0: Sending vhcr0
mlx4_core 0000:00:05.0: HCA minimum page size:512
mlx4_core 0000:00:05.0: irq 48 for MSI/MSI-X
mlx4_core 0000:00:05.0: irq 49 for MSI/MSI-X
mlx4_core 0000:00:05.0: failed execution of VHCR_POST commandopcode 0x31
mlx4_core 0000:00:05.0: NOP command failed to generate MSI-X interrupt IRQ 49).
mlx4_core 0000:00:05.0: Trying again without MSI-X.
mlx4_core: probe of 0000:00:05.0 failed with error -16

Clearly, the kernel module is not happy with MSI-X. If I instead specify modprobe mlx4_core msi_x=0 (turning MSI-X off for the VF in the VM), I get an error saying VFs aren't supported without MSI-X:

mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing 0000:00:05.0
mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode
mlx4_core 0000:00:05.0: Sending reset
mlx4_core 0000:00:05.0: Sending vhcr0
mlx4_core 0000:00:05.0: HCA minimum page size:512
mlx4_core 0000:00:05.0: INTx is not supported in multi-function mode. aborting.

So MSI-X apparently has to work in order to use the VFs of the Mellanox ConnectX-3 card (not surprising). Looking back at the Dom0 dmesg, the missing MSI-X support appears to come from an error in the xen-pciback module:

pciback 0000:21:00.5: seizing device
pciback 0000:21:00.5: enabling device (0000 -> 0002)
pciback 0000:21:00.5: MSI-X preparation failed (-38)
xen-pciback: backend is vpci

I've explicitly made sure the mlx4_core module in Dom0 has MSI-X enabled on the PF (via the modprobe option above) to rule out that potential problem. The main issue seems to be that xen-pciback does not know how to properly set up MSI-X for the Mellanox ConnectX-3 InfiniBand card. To be explicit, I'm running a fairly recent Xen installation (4.1.2) on new Sandy Bridge hardware with a very recent Linux kernel (3.9):

[root@hp6 xen_tests]# uname -a
Linux hp6 3.9.3-1.el6xen.x86_64 #1 SMP Tue May 21 11:55:32 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@hp6 xen_tests]# xm info
host                   : hp6
release                : 3.9.3-1.el6xen.x86_64
version                : #1 SMP Tue May 21 11:55:32 EST 2013
machine                : x86_64
nr_cpus                : 32
nr_nodes               : 2
cores_per_socket       : 8
threads_per_core       : 2
cpu_mhz                : 2593
hw_caps                : bfebfbff:2c000800:00000000:00003f40:13bee3ff:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 49117
free_memory            : 8306
free_cpus              : 0
xen_major              : 4
xen_minor              : 1
xen_extra              : .2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        :
cc_compiler            : gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
cc_compile_by          : mockbuild
cc_compile_domain      :
cc_compile_date        : Fri Jun 15 17:40:35 EDT 2012
xend_config_format     : 4

[root@hp6 xen_tests]# dmesg | grep "Command line"
Command line: ro root=/dev/mapper/vg_hp6-lv_root nomodeset rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_hp6/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_LVM_LV=vg_hp6/lv_root rd_NO_DM rdblacklist=nouveau nouveau.modeset=0 intel_iommu=on

At this point I am at an impasse in getting SR-IOV InfiniBand working within Xen.
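For anyone who wants to poke at this, the only sanity checks I can offer are minimal (they confirm the symptoms above rather than fix anything, and the reading of the errno is just my interpretation):

# Dom0: the VF still advertises an MSI-X capability after pciback seizes it
lspci -vv -s 21:00.5 | grep 'MSI-X'

# Guest: MSI-X never ends up enabled on the passed-through VF (Enable- stays clear)
lspci -vv -s 00:05.0 | grep 'MSI-X'

# The pciback failure code is a plain negative errno; 38 is ENOSYS
# ("Function not implemented"), which looks more like an unimplemented
# operation in the MSI-X preparation path than a hardware fault.
python -c 'import os; print(os.strerror(38))'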
Does anyone here in the Xen community have a possible solution to this problem? Is there a patch or a custom version of Xen that I haven't found but need to try? I've done a whole lot of searching but have turned up nothing that helps so far. Is this an instance where the PCI quirks mechanism comes into play (and if so, how), or is that only for PV guests? Does anyone else have a working solution for enabling PCI passthrough of Mellanox IB SR-IOV VFs to Xen VMs? I know this is possible in KVM, but I would obviously like to avoid that route at all costs. I hope I am close to getting InfiniBand working with Xen. Any help would be greatly appreciated, as success here could enable a whole new set of use cases for Xen in high performance computing.

Regards,

Andrew

--
Andrew J. Younge
Information Sciences Institute
University of Southern California
Andrew J Younge
2013-Jun-19 18:21 UTC
Re: Mellanox SR-IOV IB PCI passthrough in Xen - MSI-X pciback issue
Does anybody in the Xen community have any experience with the xen-pciback driver and MSI-X?

Thanks,
Andrew
--
Andrew J. Younge
Information Sciences Institute
University of Southern California