Yoann Juet
2014-Feb-04 15:10 UTC
[libvirt-users] SR-IOV: no traffic isolation between VFs with Broadcom 10Gbps cards
Hi all,

I'm testing the SR-IOV feature on Debian unstable with Broadcom BCM57810 cards and the KVM hypervisor:

  Compiled against library: libvirt 1.2.1
  Using library: libvirt 1.2.1
  Using API: QEMU 1.2.1
  Running hypervisor: QEMU 1.7.0

  bnx2x
  -> firmware 7.8.17
  -> driver from kernel 3.12.7

8 VFs are created on the first PF. For each VF, a specific MAC address is set manually using the "ip link set eth0 vf x mac xx:xx:xx:xx:xx:xx" command.

I run several KVM guests with PCI passthrough (same kernel, bnx2x driver and firmware as the host); performance is close to bare metal.

Well, that sounds good, until I start capturing the traffic inside each VM: host traffic is visible, as well as traffic destined to other VMs. It is as if the card's internal switching were inoperable. I ran several tests with different kernels and different PCIe passthrough assignment methods for libvirt. All failed.

Has anyone successfully experimented with SR-IOV on Broadcom cards on Linux?

-----

Some details:

01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

01:09.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.2 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.3 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.4 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.5 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.6 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
01:09.7 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function

# virsh nodedev-dumpxml pci_0000_01_09_0
<device>
  <name>pci_0000_01_09_0</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:01:09.0</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>1</bus>
    <slot>9</slot>
    <function>0</function>
    <product id='0x16af'>NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='35'>
      <address domain='0x0000' bus='0x01' slot='0x09' function='0x0'/>
    </iommuGroup>
  </capability>
</device>

# virsh nodedev-dumpxml pci_0000_01_09_1
<device>
  <name>pci_0000_01_09_1</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:01:09.1</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>1</bus>
    <slot>9</slot>
    <function>1</function>
    <product id='0x16af'>NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='36'>
      <address domain='0x0000' bus='0x01' slot='0x09' function='0x1'/>
    </iommuGroup>
  </capability>
</device>

Guest A XML:
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x09' function='0x0'/>
  </source>
</hostdev>
...

Guest B XML:
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x09' function='0x1'/>
  </source>
</hostdev>
...

--
Université de Nantes - Direction des Systèmes d'Information
Laine Stump
2014-Feb-05 10:58 UTC
Re: [libvirt-users] SR-IOV: no traffic isolation between VFs with Broadcom 10Gbps cards
On 02/04/2014 05:10 PM, Yoann Juet wrote:
> Hi all,
>
> I'm testing the SR-IOV feature on Debian unstable with Broadcom BCM57810
> cards and the KVM hypervisor:
>
> Compiled against library: libvirt 1.2.1
> Using library: libvirt 1.2.1
> Using API: QEMU 1.2.1
> Running hypervisor: QEMU 1.7.0
>
> bnx2x
> -> firmware 7.8.17
> -> driver from kernel 3.12.7
>
> 8 VFs are created on the first PF. For each VF, a specific MAC address
> is set manually using the "ip link set eth0 vf x mac xx:xx:xx:xx:xx:xx"
> command.

Instead of using <hostdev>, you should try using <interface type='hostdev'>, which will allow you to specify the MAC address for the interface directly in the guest's XML config (rather than needing to do it separately). Here's a link to documentation on this feature:

http://wiki.libvirt.org/page/Networking#PCI_Passthrough_of_host_network_devices

(look down to the section titled "Assignment with <interface type='hostdev'>")

Or even better, use <interface type='network'> in your guest config (still put the <mac address='xx:xx:xx:xx:xx:xx'/> element in each one), and define a libvirt network which is a pool of SR-IOV VFs - this is described further down the same page.

This will not make a difference to the issue you describe below, but it should make managing your guest config and lifecycle much simpler.

> I run several KVM guests with PCI passthrough (same kernel, bnx2x
> driver and firmware as the host); performance is close to bare metal.
>
> Well, that sounds good, until I start capturing the traffic inside
> each VM: host traffic is visible, as well as traffic destined to other
> VMs. It is as if the card's internal switching were inoperable. I ran
> several tests with different kernels and different PCIe passthrough
> assignment methods for libvirt. All failed.

Define "failed". Do you mean that the cards communicated, but the guests can see each other's traffic? Or do you mean that they see traffic from each other, but can't seem to communicate normally?

If the problem is the latter, then make sure the PF (eth0 for you, I guess) has status UP and RUNNING before you start the guests.

For the former, I'm not clear on the internal rules of switching of an SRIOV card. I think in most cases, the SRIOV card's internal switch may need to make everything from each VF visible to all other VFs, because the physical switch it's connected to may not mirror back traffic that really does need to go from one guest to the other. 802.1Qbh (which libvirt supports via the <virtualport type='802.1Qbh'> element) does this differently, requiring all traffic to travel out to the switch, with the switch making the decision about what gets mirrored back, but you need an 802.1Qbh-capable switch for that.
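As an illustration of the <interface type='hostdev'> suggestion in this reply, Guest A's device could be declared roughly as follows. This is a sketch under stated assumptions, not a tested config: the PCI address is that of VF 01:09.0 from the original post, and the MAC address is a made-up placeholder.

```xml
<!-- Sketch only: VF 0000:01:09.0 assigned to Guest A, with the MAC set from the guest config. -->
<!-- The MAC address below is a hypothetical placeholder, not a value from this thread. -->
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:6d:90:02'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x09' function='0x0'/>
  </source>
</interface>
```

With this form the MAC address lives in the guest's XML, so the separate "ip link set eth0 vf x mac ..." step should no longer be needed. As stated in the reply, it is not expected to change the isolation behaviour.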
Yoann Juet
2014-Feb-07 10:58 UTC
Re: [libvirt-users] SR-IOV: no traffic isolation between VFs with Broadcom 10Gbps cards
> Instead of using <hostdev>, you should try using <interface
> type='hostdev'>, which will allow you to specify the MAC address for the
> interface directly in the guest's XML config (rather than needing to do
> it separately). Here's a link to documentation on this feature:
>
> http://wiki.libvirt.org/page/Networking#PCI_Passthrough_of_host_network_devices
>
> (look down to the section titled "Assignment with <interface
> type='hostdev'>")
>
> Or even better, use <interface type='network'> in your guest config
> (still put the <mac address='xx:xx:xx:xx:xx:xx'/> element in each one),
> and define a libvirt network which is a pool of SR-IOV VFs - this is
> described further down the same page.
>
> This will not make a difference to the issue you describe below, but it
> should make managing your guest config and lifecycle much simpler.

I also ran experiments with these XML configs and, as you said, they made no difference to the issue observed. I agree that, for most production use, my sample XML block is clearly not convenient.

> Define "failed". Do you mean that the cards communicated, but the guests
> can see each other's traffic? Or do you mean that they see traffic from
> each other, but can't seem to communicate normally?
>
> If the problem is the latter, then make sure the PF (eth0 for you, I
> guess) has status UP and RUNNING before you start the guests.
>
> For the former, I'm not clear on the internal rules of switching of an
> SRIOV card. I think in most cases, the SRIOV card's internal switch may
> need to make everything from each VF visible to all other VFs, because
> the physical switch it's connected to may not mirror back traffic that
> really does need to go from one guest to the other. 802.1Qbh (which
> libvirt supports via the <virtualport type='802.1Qbh'> element) does
> this differently, requiring all traffic to travel out to the switch,
> with the switch making the decision about what gets mirrored back, but
> you need an 802.1Qbh-capable switch for that.

What I meant by "failed" was a lack of traffic switching. I thought that the PF driver was responsible for configuring L2 switching on the NIC. Some other manufacturers, such as Emulex, have an internal switch. This is an extract from an Emulex whitepaper:

"I/O’s between VFs on the same PF can be processed by the adapter using an internal Layer 2 switch, eliminating routing through a physical switch"

I doubt that Broadcom has a specific and insecure behavior for PF/VF communications. Unfortunately, I do not have an Intel or Emulex NIC with the SR-IOV feature to check this. Perhaps I missed a configuration parameter somewhere, or the driver/firmware used is broken... It looks really weird.

--
Université de Nantes - Direction des Systèmes d'Information
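For reference, the pooled variant discussed in this thread (<interface type='network'> backed by a libvirt network of VFs) might be sketched like this. The network name sriov-pool and the MAC address are hypothetical; eth0 is the PF name used earlier in the thread. The two fragments belong in separate places: the first is a network definition, the second goes inside the guest's <devices> section.

```xml
<!-- Network definition (hypothetical name): a pool made of the VFs of PF eth0. -->
<network>
  <name>sriov-pool</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth0'/>
  </forward>
</network>

<!-- Guest side: take a VF from the pool; the MAC address is a placeholder. -->
<interface type='network'>
  <mac address='52:54:00:6d:90:03'/>
  <source network='sriov-pool'/>
</interface>
```

The guest config then no longer hard-codes a VF's PCI address. As already established above, this only simplifies management and does not affect the isolation issue.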