Jonathan Wheeler
2009-Jan-12 15:27 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Hi Folks,

I have a snv_105 SXCE host that I just can't get to work as expected with Crossbow + zones. My test host, persephone, is a virtual machine running under VMware ESXi 3.5 with 2 virtual network cards (e1000), all on the same flat network/subnet. It started life just 2 days ago with a clean install of snv_95, and I LUed to 105 yesterday.

To rule out any sharing issue, the first NIC (e1000g0) is used only for the global zone. The second NIC is used only by Crossbow, for the vnic "zonevnic0", which is bound to e1000g1. sparse-template is the zone that I've been trying to get to work using a dedicated IP instance on the vnic zonevnic0.

Using snoop in the zone (or in the global zone, with "-d zonevnic0") I can see broadcast/unicast traffic going out, but only broadcast and ARP replies are coming back in?! So my ARP table is full and working as expected, yet I don't get any ping replies, and needless to say other hosts can't talk to the zone. I just can't seem to get any unicast to return to the non-global zone.

I left sparse-template pinging my desktop, and with snoop running on my desktop I can see both the ICMP request and the ICMP reply that I'm sending back; the reply just never makes it. (I also confirmed that TCP SYNs come through too.)

I'm stumped. What could be the issue? I haven't done any firewalling or custom flows/queues or anything fancy at all!

Zone config:

zonename: sparse-template
zonepath: /zones/sparse-template
brand: native
autoboot: true
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address not specified
        physical: zonevnic0
        defrouter not specified

Vnic config:

# dladm show-phys
LINK         MEDIA         STATE      SPEED  DUPLEX    DEVICE
e1000g0      Ethernet      up         1000   full      e1000g0
e1000g1      Ethernet      up         1000   full      e1000g1

# dladm show-link
LINK         CLASS    MTU    STATE    OVER
e1000g0      phys     1500   up       --
e1000g1      phys     1500   up       --
zonevnic0    vnic     1500   up       e1000g1

# dladm show-vnic
LINK         OVER      SPEED  MACADDRESS        MACADDRTYPE    VID
zonevnic0    e1000g1   1000   2:8:20:e1:ac:39   random         0

Ifconfig:

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.60 netmask ffffff00 broadcast 192.168.1.255
        ether 0:c:29:60:4e:c2
e1000g1: flags=201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000
        ether 0:50:56:ac:51:6
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
e1000g0: flags=202004841<UP,RUNNING,MULTICAST,DHCP,IPv6,CoS> mtu 1500 index 2
        inet6 fe80::20c:29ff:fe60:4ec2/10
        ether 0:c:29:60:4e:c2

ifconfig from the zone itself via zlogin -C :/

bash-3.2# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
zonevnic0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.61 netmask ffffff00 broadcast 192.168.1.255
        ether 2:8:20:e1:ac:39
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
zonevnic0: flags=202004841<UP,RUNNING,MULTICAST,DHCP,IPv6,CoS> mtu 1500 index 2
        inet6 fe80::8:20ff:fee1:ac39/10
        ether 2:8:20:e1:ac:39

bash-3.2# arp -an
Net to Media Table: IPv4
Device     IP Address           Mask             Flags    Phys Addr
------     ------------------   ---------------  -------  ---------------
zonevnic0  192.168.1.72         255.255.255.255  o        00:14:5e:45:b9:60
zonevnic0  192.168.1.68         255.255.255.255  o        00:14:5e:45:b9:60
zonevnic0  192.168.1.61         255.255.255.255  SPLA     02:08:20:e1:ac:39
zonevnic0  192.168.1.133        255.255.255.255  o        00:15:f2:1d:48:c2
zonevnic0  224.0.0.0            240.0.0.0        SM       01:00:5e:00:00:00

bash-3.2# snoop -r
Using device zonevnic0 (promiscuous mode)
192.168.1.133 -> (broadcast)   ARP C Who is 192.168.1.133, 192.168.1.133 ?
192.168.1.68 -> 224.0.1.1      NTP broadcast [st=3] (2009-01-13 04:21:45.35306)
192.168.1.68 -> 192.168.1.254  ARP R 192.168.1.68, 192.168.1.68 is 0:14:5e:45:b9:60
192.168.1.68 -> (broadcast)    ARP C Who is 192.168.1.68, 192.168.1.68 ?
fe80::214:5eff:fe45:b960 -> ff02::1:2  DHCPv6 Solicit xid=6a3a7 IAs=1
fe80::8:20ff:fee1:ac39 -> ff02::1:2    DHCPv6 Solicit xid=58244d IAs=1
192.168.1.68 -> 192.168.1.254  ARP R 192.168.1.68, 192.168.1.68 is 0:14:5e:45:b9:60
192.168.1.60 -> (broadcast)    ARP C Who is 192.168.1.60, 192.168.1.60 ?
192.168.1.133 -> 192.168.1.254 ARP R 192.168.1.133, 192.168.1.133 is 0:15:f2:1d:48:c2
192.168.1.254 -> (broadcast)   ARP C Who is 192.168.1.72, 192.168.1.72 ?
192.168.1.72 -> 192.168.1.254  ARP R 192.168.1.72, 192.168.1.72 is 0:14:5e:45:b9:60

I've stayed up until 4:30am pulling hair. What am I doing wrong?

- Jonathan
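For reference, the configuration above boils down to a few commands (a sketch assuming the snv_105 dladm/zonecfg syntax; the names are the ones from this post):

# dladm create-vnic -l e1000g1 zonevnic0     # MAC address type defaults to random
# zonecfg -z sparse-template
zonecfg:sparse-template> set ip-type=exclusive
zonecfg:sparse-template> add net
zonecfg:sparse-template:net> set physical=zonevnic0
zonecfg:sparse-template:net> end
zonecfg:sparse-template> commit

The IP address (192.168.1.61) is then configured inside the zone itself, since an exclusive-IP zone owns its own full IP stack.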
Jonathan Wheeler
2009-Jan-12 15:53 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Quick follow-up with some further tests.

Shared-IP zone over the vnic instead: same results.
Shared-IP zone over the pnic directly (e1000g1): no problems - works as expected.

I hope that rules out anything funky happening on my network or in VMware itself, as the only changes were internal to Solaris rather than to the physical (virtual VM) host itself.

- Jonathan
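For reference, the shared-IP variants tested above differ from the exclusive-IP config only in zonecfg (a sketch; address and link names are the ones from this thread):

# zonecfg -z sparse-template
zonecfg:sparse-template> set ip-type=shared
zonecfg:sparse-template> add net
zonecfg:sparse-template:net> set address=192.168.1.61/24
zonecfg:sparse-template:net> set physical=zonevnic0
zonecfg:sparse-template:net> end
zonecfg:sparse-template> commit

Substituting physical=e1000g1 gives the direct-pnic test. With shared IP the address is specified in the zone configuration and plumbed by the global zone.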
Steffen Weiberle
2009-Jan-12 18:50 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
On 01/12/09 10:53, Jonathan Wheeler wrote:
> Quick follow-up with some further tests.
>
> Shared-IP zone over vnic instead, same results.
> Shared-IP zone over the pnic directly (e1000g1), no problems - works as expected.
>
> I hope that rules out anything funky happening on my network or VMware itself, as the only changes are internal to Solaris, rather than the physical (virtual VM) host itself.
>
> - Jonathan

Since shared IP may have an effect in that e1000g0 could get involved somehow, I would stay with exclusive.

I am wondering whether the VMware e1000g driver is getting confused with VNICs. What does the arp table look like on a remote system?

What happens if you set the MAC address to be close to the hardware one (once you verify it is not in use elsewhere)? Such as 0:50:56:ac:51:16.

Also, your snoop does not show any traffic to or from .61, which is the IP address for your zone. You may also want to snoop on e1000g1, since all traffic to the VNIC should come in on that interface. Snooping on the VNIC will only show what has been classified for that VNIC.

Steffen
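A sketch of the suggested MAC test, assuming the snv_105 dladm syntax and the names used in this thread (the zone must be halted first, since the datalink is assigned to it):

# dladm delete-vnic zonevnic0
# dladm create-vnic -l e1000g1 -m 0:50:56:ac:51:16 zonevnic0   # fixed MAC near the hardware one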
Jonathan Wheeler
2009-Jan-13 12:42 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Steffen Weiberle wrote:
> On 01/12/09 10:53, Jonathan Wheeler wrote:
>> Quick follow-up with some further tests.
>>
>> Shared-IP zone over vnic instead, same results.
>> Shared-IP zone over the pnic directly (e1000g1), no problems - works as expected.
>>
>> I hope that rules out anything funky happening on my network or VMware itself, as the only changes are internal to Solaris, rather than the physical (virtual VM) host itself.
>>
>> - Jonathan

Hi Steffen, thanks for the follow-up; I've had some weird and hard-to-repeat results today while continuing to dig into this.

> I am wondering whether the VMware e1000g driver is getting confused with
> VNICs. What does the arp table look like on a remote system?

The arp tables on the remote systems correctly register the MAC address of the vnic in both modes, and obviously the shared pnic worked too.

> What happens if you set the MAC address to be close to the hardware one
> (once you verify it is not in use elsewhere)? Such as 0:50:56:ac:51:16.

I hadn't thought to try that. Testing a dedicated IP-instance, using a vnic with different MAC addresses:

Fixed:  0:50:56:ac:51:16 - worked. Woohoo!
Fixed:  2:50:56:ac:51:16 - also worked.
Random: 2:8:20:c9:bb:54  - also worked... waitaminute!!!

Feeling like a bit of an idiot I persisted with trying different random MACs, and they all kept working perfectly. Then I remembered that I had been using zonevnic0 last night, but today I've been using zonevnic1 for my testing. What's in a zero... after all this is unix, we start at zero... right?!

I had tried about 4 further random MAC addresses on zonevnic1 without issue each time. I switched back to a random MAC on zonevnic0 and immediately hit my problem again. For the record, the zonevnic0 MAC was "Random: 02:08:20:f1:40:9a", which is pretty close to what was in use earlier.

Next, I deleted zonevnic0 and created zonevnic1 with that same 02:08:20:f1:40:9a as a fixed MAC. It worked correctly - not a driver/switch issue then.

Here is where it gets really weird. To prove I wasn't going mad, I once more used that same static MAC address (02:08:20:f1:40:9a) back on a zonevnic0 and repeated the test... and it worked. ???

So it's almost as though there is an issue with using a 0 in the name AND changing MAC addresses at the same time? Since then, I've tried every possible combination and I haven't been able to recreate the problem. I have rebooted the hosts a number of times; maybe something just needed to be flushed out somewhere. I'll keep trying to find a scenario that replicates the problem reliably, and I'm also building a machine using VirtualBox to rule out VMware.

Final point of interest: last night when I was having the issues, I did try snooping the pnic, and not only was the return traffic not hitting the vnic, it wasn't hitting the pnic either. Based on what you've said here, I should have been able to see the traffic on the pnic even if there was a crossbow classification issue. I would normally have diagnosed this as a switch problem if it weren't for the fact that ARP requests were getting through OK.

Jonathan
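The delete/recreate sequence described above, sketched with the same assumed dladm syntax:

# dladm delete-vnic zonevnic0
# dladm create-vnic -l e1000g1 -m 2:8:20:f1:40:9a zonevnic1   # same MAC, new name: worked
# dladm delete-vnic zonevnic1
# dladm create-vnic -l e1000g1 -m 2:8:20:f1:40:9a zonevnic0   # back on zonevnic0: also worked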
Jonathan Wheeler
2009-Jan-14 16:54 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Hi Folks,

I've been contacted off-list with a request for further updates and information, and tonight I discovered something really weird that's well worth sharing. Apologies for another long post!

First, an update on the changes to my test environment: I created a new vnic on e1000g1 called dnsvnic0, and created/cloned my sparse-template zone into a new sparse-root zone named dns, which uses dnsvnic0 with the IP address 192.168.1.62. The zone booted, and I was straight into this problem again. I hadn't been able to get my sparse-template zone to fault again, but immediately after creating a new vnic/zone I was back to having this elusive yet frustrating issue.

Just as a refresher, my Solaris server here is a VM running under VMware ESXi 3.5u3 (with all current patches). An extra layer of virtualisation does add extra questions, so I tried a ping test that would be entirely internal to the ESX host: pinging the global zone from the non-global [dns] zone.

Traffic test #1

From within the dns zone:

bash-3.2# ping 192.168.1.60
no answer from 192.168.1.60

bash-3.2# arp -an
Net to Media Table: IPv4
Device    IP Address           Mask             Flags    Phys Addr
------    ------------------   ---------------  -------  ---------------
dnsvnic0  192.168.1.61         255.255.255.255  o        02:08:20:be:66:8e
dnsvnic0  192.168.1.60         255.255.255.255           00:0c:29:60:4e:c2
dnsvnic0  192.168.1.62         255.255.255.255  SPLA     02:08:20:ff:77:4f
dnsvnic0  192.168.1.133        255.255.255.255  o        00:15:f2:1d:48:c2
dnsvnic0  224.0.0.0            240.0.0.0        SM       01:00:5e:00:00:00

ARP packets *are* returning. ICMP packets, however, are *not*.

snoop from the global zone on the e1000g1 interface (which the vnic is running on):

# snoop -d e1000g1 arp or icmp
Using device e1000g1 (promiscuous mode)
192.168.1.62 -> (broadcast)  ARP C Who is 192.168.1.60, persephone ?
persephone -> 192.168.1.62   ARP R 192.168.1.60, persephone is 0:c:29:60:4e:c2
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 0)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 1)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 2)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 3)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 4)
(and so on...)

# snoop -d e1000g0 arp or icmp    (only the global zone is using e1000g0)
Using device e1000g0 (promiscuous mode)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 0)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 0)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 1)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 1)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 2)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 2)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 3)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 3)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 4)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 4)

So the global zone is replying to the non-global zone; 'dns' just isn't seeing the replies. This is sounding a lot like a weird vswitch bug.

Next I decided to try zone-to-zone traffic:

Zone             vnic                     IP
sparse-template  zonevnic0 (via e1000g1)  192.168.1.61
dns              dnsvnic0 (via e1000g1)   192.168.1.62

This worked... dns could ping sparse-template.

What really surprised me was that my snoop on e1000g1 was showing the traffic. It was my understanding that vnic-to-vnic traffic attached to the same pnic never actually went across the wire, so why is snoop on a physical interface showing vnic <> vnic traffic?

A) Something in crossbow isn't working properly.
B) I'm misunderstanding how vnics talk to each other. I understand etherstubs, but it just makes sense that inter-zone traffic shouldn't be sent down a bottleneck like a pNIC when it's all *internal* anyway.
C) The traffic isn't actually going out the physical interface across the wire, but it is going via the logical concept of the e1000g1 interface, which snoop is reporting on - which is rather confusing to an end user like me trying to diagnose this using snoop :(

Can anyone clarify this one for me?

The WTF moment of the night was this. vSwitch security in ESX is configured like this by default:

Promiscuous Mode:    Disabled
MAC Address Changes: Accept
Forged Transmits:    Accept

These sound like reasonable defaults to me; toggling the Promiscuous flag would, to my understanding, pretty much turn the vSwitch into a "vHub"!

I left a [non-returning] ping running between dns and the global zone, and decided to try enabling Promiscuous mode anyway. No change.

I started a snoop up on e1000g1, and suddenly the sparse-template <> dns ping that I had started in another terminal moments ago began working. I stopped the snoop, and it stopped working again. !!!?

Enabling the promiscuous flag on the e1000g1 driver is suddenly "fixing" my traffic problem.

My best interpretation of this data is that one of three things isn't working, and I'm starting to get out of my depth here fast.

A) Crossbow itself is doing something 'funny' with the way traffic is passed on to the vswitch, which is causing it not to send traffic for this MAC address down the correct virtual port on the switch. ARP spoofing is common enough, and both of those options are already enabled, so it would seem to be something else that is causing it to get confused. Sadly there isn't any interface to the vSwitch that I'm aware of to pull stats/logs from. Funny promiscuous ARPs? Sending traffic down both pnics? Something else to confuse the vswitch? I'm out of skills to troubleshoot this option any further.

B) The vSwitch in ESXi has a bug. If so, why is it only affecting crossbow? ESX is very widely used, so if there were a glaring bug in the vSwitch ethernet implementation it would be very common and public knowledge. Crossbow is new enough; is it possible that I'm the first to have tried this configuration under ESX and thus the first to notice this issue? There aren't any other options within ESX that I'm aware of that would get me further data on the vSwitch itself, so I'm at a loss as to how to troubleshoot this one further. I'm also just using the free ESXi, so I can't contact VMware for support on this, and at this point it would be a pretty vague bug report anyway :/

C) The Intel PRO/1000 vNIC that ESX exposes to the VM has a bug in it, or the Solaris e1000g driver has a bug when sending crossbow traffic across it (or a combination of the two). The Intel PRO/1000 is a very common server NIC, and I'd be gobsmacked if there were a bug with a real (non-virtual) e1000g adapter that the Sun folk hadn't picked up in their prerelease testing.

The only option for vNICs within ESX, for a 64-bit Solaris host, is the e1000 NIC. I'm trying to set up a 32-bit host to see what NIC that ends up with. If this provides a different result, that at least gives us some better information on where to start looking!

Any further directions or feedback would be most welcome. If I'm heading in the wrong direction, please do tell me :)

Jonathan
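One way to localise where the replies vanish is to watch both datalinks at once while the ping runs (a sketch using standard snoop address filters; the interface and address names are the ones from this thread):

# snoop -r -d e1000g1 host 192.168.1.62 &    # what reaches the physical link
# snoop -r -d dnsvnic0 host 192.168.1.62     # what gets classified to the vnic

If a reply appears on e1000g1 but never on dnsvnic0, the drop is in classification inside the guest; if it never appears on e1000g1 at all, it is being filtered upstream (NIC emulation or vSwitch). Note the caveat, though: snoop itself puts the link into promiscuous mode, which - as it turns out later in this thread - changes the very behaviour being debugged.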
Nicolas Droux
2009-Jan-16 01:38 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Hi Jonathan,

On Jan 14, 2009, at 9:54 AM, Jonathan Wheeler wrote:
> First an update on the changes to my test environment:
> I created a new vnic on e1000g1 called dnsvnic0, and created/cloned
> my sparse-template zone into a new sparseroot zone named dns, which
> uses dnsvnic0, with the IP address 192.168.1.62.
> [...]
>
> Traffic test #1
> From within the dns zone:
> bash-3.2# ping 192.168.1.60
> no answer from 192.168.1.60

So what is 192.168.1.60? I guess it's the global zone, but e1000g0 or e1000g1? If it's e1000g0 but dnsvnic0 is created on e1000g1, there will be no virtual switching between these data-links.

> [arp table and e1000g1/e1000g0 snoop output ...]
>
> So the global zone is replying to the non-global zone, 'dns' just
> isn't seeing the replies.
> This is sounding a lot like a weird vswitch bug.

Not necessarily. It depends on how you wired your NICs. If e1000g0 and e1000g1 are connected to the same switch, then the packet can go from dnsvnic0 -> e1000g1 -> switch -> e1000g0 -> global zone. You may not see the reply come back to dnsvnic0 via global_zone -> e1000g0 -> switch -> e1000g1 due to the same problem you described initially, with unicast packets not making it to the VNIC in the VMware VM.

> Next I decided to try zone-to-zone traffic:
> [...]
> This worked... DNS could ping Zone-template.

Because in this case you are going through the virtual switch.

> What really surprised me was that my snoop on e1000g1 was showing the
> traffic. It was my understanding that vnic-to-vnic traffic that's
> attached to the same pnic never actually went across the wire, so why
> is snoop on a physical interface showing vnic <> vnic traffic ?

That's done by design, to allow the global zone/dom0 to see all traffic exchanged between the VMs/Zones. It's similar to a monitoring port on a physical switch.

> [theories A/B/C and the promiscuous-mode observations ...]

I have a theory.

When you create a VNIC, Crossbow will try to associate the unicast MAC address with the NIC. Most NICs have hardware unicast filters which allow traffic for multiple unicast addresses to be received without turning the NIC into promiscuous mode. e1000g provides multiple such slots for unicast addresses.

What could be happening is that e1000g running in the VM happily allows Crossbow to program the unicast address for the VNIC, but the VMware back-end driver or virtual switch doesn't know about that address. So all broadcast and multicast packets are going in and out as expected, and all traffic from the VNIC is going out without a problem, but when unicast packets come back for the unicast address of the VNIC, they never make it to the VM.

If you simply enable promiscuous mode on the VMware virtual switch, it will take these packets, but the back-end driver instance associated with e1000g might still filter them out by default and drop them. In order to see the packets you have to turn on promiscuous mode on e1000g1 itself, which probably causes the VMware back-end to send all packets up.

If this theory is correct, what would help is allowing the VMware back-end to send up all packets received from the VMware virtual switch without filtering. But I don't know if VMware provides that option.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
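One way to sanity-check this theory from inside the guest without perturbing anything is to watch receive counters instead of capturing (a sketch; I'm assuming dladm's show-link -s statistics output behaves here as on bare metal, with the link names from this thread):

# dladm show-link -s -i 5 e1000g1     # ipackets arriving on the physical link
# dladm show-link -s -i 5 dnsvnic0    # ipackets classified to the vnic

Unlike snoop, reading kstat-backed counters does not flip the link into promiscuous mode. While an external host pings 192.168.1.62, ipackets should climb on both links; if the counters only move with broadcast/ARP traffic, the unicast filtering is happening before the packets ever reach the guest's e1000g.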
Jonathan Wheeler
2009-Jan-16 17:15 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Nicolas Droux wrote:
> Hi Jonathan,

Hi Nicolas, thanks so much for your input. I'm a LOT closer to understanding what's going on here now. What follows is another very long email, I'm sorry. This is a full day's research condensed into the shortest email I could manage without fear of leaving out anything important!

>> Traffic test #1
>> From within the dns zone:
>> bash-3.2# ping 192.168.1.60
>> no answer from 192.168.1.60
>
> So what is 192.168.1.60? I guess it's the global zone, but e1000g0 or
> e1000g1?

Yes, it's the global zone, which is running on e1000g0. The zone was running on e1000g1.

> If it's e1000g0 but dnsvnic0 is created on e1000g1 there will be no
> virtual switching between these data-links.

Ok, thanks for clearing that up for me. I'm still getting my head around the difference of a shared kernel, but non-shared network stacks.

The point I was trying to make with this test was that traffic wasn't going over any physical links. Unfortunately we have 2 levels of virtualisation going on here (ESX & Crossbow), which makes the terminology that little bit harder to visualise. In this case the traffic was leaving the zone and going over the "wire" to talk to the global zone. That "wire" is a VMware vSwitch, so the network traffic in this case was entirely self-contained within the ESX server. The actual physical NIC in the physical server wasn't used, which allowed me to rule it out as a cause of this issue, along with any physical network switches :)

>> So the global zone is replying to the non-global zone, 'dns' just
>> isn't seeing the replies.
>> This is sounding a lot like a weird vswitch bug.
>
> Not necessarily. It depends on how you wired your NICs. If e1000g0 and
> e1000g1 are connected to the same switch,

Yeah, they are.

> then the packet can go from
> dnsvnic0->e1000g1->switch->e1000g0->global zone.

That's right. A "vSwitch" in this case, though.

> You may not see the
> reply come back to dnsvnic0 via global_zone->e1000g0->switch->e1000g1
> due to the same problem you described initially with unicast packets
> not making it to the VNIC in the VMware VM.

Well it _should_ be working this way; it's frustrating that it isn't. Where else would it go?

>> Next I decided to try zone-to-zone traffic:
>> [...]
>> This worked... DNS could ping Zone-template.
>
> Because in this case you are going through the virtual switch.

I expected that it would, but it's always encouraging to actually see a successful test for a change! Now, when you say "virtual switch", this time we're talking about the crossbow internal switch and not the VMware vSwitch. I just wanted to point that out for the sake of clarity as we keep digging deeper into this.

>> What really surprised me was that my snoop on e1000g1 was showing the
>> traffic. [...]
>
> That's done by design to allow the global zone/dom0 see all traffic
> exchanged between the VMs/Zones. It's similar to a monitoring port on
> a physical switch.

Ah, thanks for clearing that one up :)

>> A) Something in crossbow isn't working properly.
>> B) I'm misunderstanding how vnics talk to each other. [...]
>> C) The traffic isn't actually going out the physical interface across
>> the wire, but it is going via the logical concept of the e1000g1
>> interface, which snoop is reporting on [...]
>>
>> Can anyone clarify this one for me?

Based on your previous comment above, you're saying that the answer is C)? So just to confirm that point, as it's pretty crucial that I understand this distinction correctly: "snoop -d e1000g1" is showing traffic that _isn't_ actually going across the 'wire' on that 'physical' interface, but rather traffic that is passing "internally, *behind* the physical interface" - to make observability easier for administrators from the global zone. If I were able to watch the switch port that e1000g1 is plugged into, I'd see no packets doing a return loop?

> I have a theory.
>
> When you create a VNIC, Crossbow will try to associate the unicast MAC
> address with the NIC. Most NICs have hardware unicast filters which
> allow traffic for multiple unicast addresses to be received without
> turning the NIC into promiscuous mode. e1000g provides multiple such
> slots for unicast addresses.

I didn't realise that; I must have fallen behind a bit on modern network card technology. I take it there is a performance penalty when running in promiscuous mode to handle multiple MAC addresses, as the filtering is no longer done in hardware by the NIC itself?

> What could be happening is that e1000g running in the VM happily allows
> Crossbow to program the unicast address for the VNIC, but the VMware
> back-end driver or virtual switch doesn't know about that address. [...]

That makes a lot of sense, and I think you're quite correct about that. It's either that, or ESX is getting upset with promiscuous mode being enabled on the NIC and, as a security precaution, is not allowing the traffic to be delivered to the virtual NIC in the VM. (Explored further down this email.)

I've only experienced these weird issues while using crossbow, but if the above is true then this is not a crossbow problem per se at all; it's simply that crossbow is adding MAC addresses to the [VMware] e1000g card (or enabling promiscuous mode), which is causing a problem at some layer within ESX, and there haven't been any other networking scenarios in which this would have happened prior to crossbow. (Maybe network teaming, though that is not generally done *within* a VM - there is little-to-no point!)

If this is the heart of the issue, then I should be able to replicate it without needing to use a zone at all, provided I can set up crossbow in the global zone in such a way that it uses different MAC addresses depending on the destination... Now that I think about this, I think I did hit this when I started off with just the 1 NIC in the VM. I moved to a second e1000, separating the global/zone traffic as a sanity check quite early on... hrm.

> If you simply enable promiscuous mode on the VMware virtual switch,
> then it will take these packets, but the back-end driver instance
> associated with e1000g might still filter out these packets by default
> and drop them. In order to see the packets you have to turn on
> promiscuous mode on e1000g1 itself, which probably causes the VMware
> back-end to send all packets up.

Agreed. VMware ESX provides some granularity when it comes to setting promiscuous options. It can be set globally on the whole switch, or at a "port group" level, though I don't see anywhere to toggle it on a vNIC or per-VM basis. Port groups are an administrative abstraction of a group of ports on a specific vSwitch, a bit like VLANs but without network-level tagging (though they can be used to enable/set up VLANs too).

I have ALL virtual machines running off 1 vSwitch, so enabling promiscuous mode on the vSwitch (for all VMs) just to get my zone server working with crossbow isn't an attractive option. Making a dedicated *promiscuous-on* port group that contains only this one Solaris server may work better, though.

> If this theory is correct, what would help is allowing the VMware
> back-end to send up all packets received from the VMware virtual
> switch without filtering. But I don't know if VMware provides that
> option.

I think that is what a port group will allow me to do. However, remember that by itself this didn't fix the problem: I had to have the VM's NIC in promiscuous mode too for traffic to flow correctly. I was doing this (accidentally at the time) by running snoop. Is there a better way to enable promiscuous mode on an interface within Solaris permanently? All I could dig up with google was this:
http://www.kernelfaq.com/2008/04/enabling-and-disabling-promiscuous-mode.html

MAC filtering: going back to what you said earlier about the e1000g driver handling multiple unicast MACs concurrently in hardware, in my googling I've discovered that not all e1000 NICs support this feature.

* Is there a way to tell if the VMware emulated e1000 is advertising this feature in the 'hardware' to the guest?
* Is there a way to tell if crossbow is making use of it, rather than falling back to the "less fancy" promiscuous mode instead?

This would be most valuable for better understanding what we're seeing here! dladm show-linkprop isn't showing me anything; I guess we're not quite there yet?
http://markmail.org/message/qiqygyqxt5t6qp5b

My current working theory is this:

*vSwitch layer*
VMware ESX knows exactly which vSwitch ports are connected to a physical NIC uplinking the vSwitch to the physical world, and which ports are connected to NICs within VMs. The vSwitch "host" ports should only ever have a single MAC address on them at any given time, as they're directly connected to a single NIC, and ESX enforces this limit as a security measure. This would prevent MAC spoofing attacks, for example. Recall that by default within a vSwitch "MAC Address Changes" are allowed, as are "Forged Transmits", which strongly hints at the behaviour I'm theorising.

*NIC layer*
I'm expecting that the VMware-provided emulated e1000 NIC has no concept of MAC address slots on the vSwitch end - given the behaviour of 1 MAC address per port at the vSwitch level, why would it ever need to support multiple MACs? Within the VM, however, crossbow is detecting an e1000 pNIC that does support multiple MACs, and it's making use of these slots for the VNICs' MACs as they get added, rather than toggling promiscuous mode on the e1000g.

**Outbound traffic**
ESX is allowing the "forged transmits" from the VNIC's additional MAC address, and broadcasts/multicasts are being passed through both the vSwitch and the e1000 correctly.

**Inbound traffic**

*vSwitch layer*
ESX knows which MAC address the e1000 has within the guest, and it will have this entered into its MAC forwarding table for the port that the VM is connected to. Exactly what it's doing with the VNIC's MAC that is being broadcast around in ARP requests... I have no idea. Enabling promiscuous mode at the vSwitch level bypasses/disables the MAC forwarding table, so now frames with the VNIC's MAC are getting to the right switch port. This alone still doesn't fix the problem, because:

*NIC layer*
The ESX end of the e1000 NIC only knows about the primary MAC address of the NIC, so it doesn't pass frames addressed to the VNIC's MAC address into the VM guest's end of the e1000 for further processing by crossbow. When snoop is started, the interface is set to promiscuous mode in the guest, and this is trapped by the ESX end of the e1000, which then enables promiscuous mode on its end too. With all frames finally passing into the guest end of the e1000, crossbow can do its job and everything starts working!

Phew! I'm having to theorise much of the ESX behaviour, as there is simply no way to get the information I need out of ESX itself, but this model all seems to fit pretty well, don't you think?

Way forward: I can focus on testing the promiscuous-mode behaviour on the vSwitch port group, which may lead to a tidy workaround at that level. At the NIC level, if my theory is correct, it would seem that I really need a way to make crossbow enable promiscuous mode on the NIC rather than adding a "hardware-based MAC filter" to the e1000, as it doesn't seem that this is going to work in a VMware ESX environment.

> Nicolas.

Jonathan
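On the "permanently promiscuous" question above: a crude way to hold an interface in promiscuous mode is to leave a capture running that discards its output (a sketch; snoop's -o option writes the raw capture to a file, here the bit bucket):

# snoop -d e1000g1 -o /dev/null &     # holds e1000g1 promiscuous until the snoop is killed

This is just the accidental "snoop test" made persistent, not a real fix; the process would have to survive reboots (e.g. wrapped in an SMF service or rc script) for it to stick.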
Hi Folks,

I've spent another couple of days banging away on this, and since there haven't been any other updates to this thread I thought I may as well share my latest findings, in the hope that this will better enable someone to help me, or simply steer others in the right direction. I've had some good success when running Solaris in 32-bit mode, which in turn uses the pcn or vmxnet0 drivers rather than e1000g.

Just quickly, on the vSwitch side, for the future's archives: VMware vSwitch port groups do work as I had hoped, working around that layer of this problem for all 3 NICs. I am able to enable promiscuous mode on a port group only, rather than the entire vSwitch. I have put the crossbow NIC of my VM into this port group all by itself, while the rest of the NICs & VMs sit on the main port group on the vSwitch.

From a security standpoint this does mean that my global zone can see all traffic (from all VMs) going across the vSwitch by snooping the second, dedicated "zones NIC"... which is definitely not ideal at all. I think it's about as good as it's going to get for me, though, and I'm the only person with root access in the global zone, so it's going to have to do. The non-global zones are still limited to their own vnic-specific traffic, as you would expect, so no security risk there.

With the e1000g I still have the issue of needing to enable promiscuous mode on either the vnic or the pnic for traffic to flow to the vnics when there is more than 1 vnic running on a physical interface... I have noticed that it doesn't even matter if the vnic is in use. The simple act of "dladm create-vnic [....]"ing a second vnic will effectively cut the network off for another zone which is using *its* own vnic...

From what I've been able to observe so far, only the most recently created vnic gets to pass traffic. Alphabetical naming and MAC address order don't seem to matter; it's simply that the newest VNIC created on an interface is the effective "active" one :(

On the ESXi host, I'm able to see the following in the VMware logs when adding a vnic:

~ # tail -f /var/log/messages | grep kernel
Jan 20 06:10:10 vmkernel: 0:23:08:32.777 cpu3:1579)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:11:28 vmkernel: 0:23:09:50.457 cpu2:8002)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:14:54 vmkernel: 0:23:13:16.790 cpu2:8002)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:15:26 vmkernel: 0:23:13:48.260 cpu0:1420)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:35:42 vmkernel: 0:23:34:04.042 cpu1:8003)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting

Maybe the act of adding a vnic is pushing the new MAC address into the VMware e1000, and since it doesn't [appear to] support multiple MAC addresses at once, it's just retaining the most recent MAC address. Seems... logical :)

I'm not quite sure why the reverse process works, however. When I remove a VNIC, the next most recently created VNIC starts working again. I'm sure the answer is in the code somewhere, but I'm not a developer that can read code :/

*PCnet/vmxnet*
With 64-bit Solaris, e1000 is the only network card that you can use (with or without VMware Tools), hence my continued efforts to find a way to get this working!

I reconfigured VMware to treat my OpenSolaris VM as 32-bit, replaced both VMware virtual NICs with the "flexible" virtual NICs, and booted OpenSolaris in 32-bit mode by changing the grub boot flags. In 32-bit mode VMware emulates PCnet NICs to Solaris, which show up under ifconfig as pcn0/1. After swapping all the relevant configuration across... it worked. And I mean everything REALLY worked. Multiple zones/vnics all running at once.

The PCnet driver is a fairly poor performer, which is why VMware provides the optimised vmxnet driver with the VMware Tools instead. I stuffed around until I managed to get the optimised 32-bit vmxnet driver to drive the "pcn" hardware instead. When using this NIC driver, the following line is logged in the ESX syslog at certain points:

Jan 20 11:23:33 vmkernel: 1:04:21:55.671 cpu1:379945)Net: 4222: unicastAddr 00:50:56:ac:14:6c;

That MAC address is the real address that VMware assigned to the emulated vmxnet driver, not the crossbow ones. The vmxnet driver also works perfectly: multiple VNICs can be active at the same time, without the need for any dodgy snooping as was the case with the emulated e1000.

So for now, I've resigned myself to running my server in 32-bit mode as the best compromise. I really want the crossbow functionality for my zones, and for the moment it would seem that the only way to get it under ESX is to run in 32-bit mode :/

Suggestions welcome.

Jonathan
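For the archives, forcing the 32-bit kernel at boot on a GRUB-booted SXCE/OpenSolaris system generally means dropping $ISADIR from the menu.lst entry (a sketch of an assumed entry; the exact lines vary by build and root filesystem):

# In /boot/grub/menu.lst, change the 64-bit-capable entry:
kernel$ /platform/i86pc/kernel/$ISADIR/unix
module$ /platform/i86pc/$ISADIR/boot_archive
# to the explicit 32-bit kernel:
kernel$ /platform/i86pc/kernel/unix
module$ /platform/i86pc/boot_archive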
I was rather hoping that there might be an update on this?

I spent considerable time researching this problem, and while I now have sufficient workarounds in place to have a usable (regrettably forced 32-bit) system, I remain concerned that this problem isn't unique to my setup.

From what I've seen to date, anyone who tries to use OpenSolaris under VMware ESX on 64-bit hardware (the majority now, in 2009!) will be unable to use crossbow.

That's quite a biggie, don't you agree?
On Feb 1, 2009, at 7:14 PM, Jonathan Wheeler wrote:
> I was rather hoping that there might be an update on this?
>
> I spent considerable time researching this problem and while I now
> have sufficient workarounds in place to have a usable (regrettably
> forced 32-bit) system, I remain concerned that this problem isn't
> unique to my setup.
>
> From what I've seen to date, anyone who tries to use OpenSolaris
> under VMware ESX on 64-bit hardware (the majority now in 2009!)
> will be unable to use crossbow.
>
> That's quite a biggie, don't you agree?

Jonathan,

This is not a Crossbow issue. There isn't much we can do from Crossbow running in a guest if the packets are not even passed up to the emulated e1000g in that guest. VMware should allow you to pass these unicast packets up to the VM and put the underlying physical NIC in promiscuous mode.

There is no easy way to work around this problem from the guest besides putting the virtual e1000g NIC in promiscuous mode, using snoop for example. For the long term we were planning to allow a user to specify from dladm(1M) that a VNIC should not be using a hardware unicast slot, which would as a side effect put the underlying NIC in promiscuous mode and help in your case.

For the short term you may want to try a different VM host. For example, I've tried this from VirtualBox in the past and it works fine with an emulated e1000g.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
> On Feb 1, 2009, at 7:14 PM, Jonathan Wheeler wrote:
>> ....
>>
>>> From what I've seen to date, anyone who tries to use OpenSolaris
>>> under VMware ESX on 64-bit hardware (the majority now in 2009!)
>>> will be unable to use crossbow.
>>
>> That's quite a biggie, don't you agree?
>
> Jonathan,

Hi Nicolas :)

> This is not a Crossbow issue. There isn't much we can do from Crossbow
> running in a guest if the packets are not even passed up to the
> emulated e1000g in that guest.

I quite agree that this point is obviously beyond crossbow's control, though at the same time it should be understood that it's not really VMware's "fault" for locking down ports using MAC-level security either. In my searching to date, the only other scenarios that typically run into this problem are IDS/IPS appliances, so it's not very common out in the wild at all.

The good news is that once an administrator *does* understand the nature of this problem, or perhaps more accurately "VMware ESX's default vSwitch behaviour", vSwitch security can quite easily be reconfigured to allow for the "crossbow-friendly" handling of promiscuous mode on the switch port for the guest.

> VMware should allow you to pass these unicast packets up to the VM

And it can, you just have to *know* to enable the promiscuous port mode with port groups.

> and put the underlying physical NIC in promiscuous mode.

When using the e1000 NIC, this one is the problematic bit. VMware ESX will allow a guest to put its NIC into promiscuous mode, as we've seen with my infamous "snoop test", which in turn behind the scenes sets the virtual port on the virtual switch to promiscuous mode too. This "opens the floodgates" so to speak, allowing crossbow to do its work.

This isn't an issue when using a 32-bit Solaris guest, which uses the pcn or vmxnet NICs. I gather the explanation here is that crossbow sets the promiscuous flag on the NIC by default because these NICs don't support hardware unicast MAC filters.

> There is no easy way to work around this problem from the guest besides
> putting the virtual e1000g NIC in promiscuous mode, using snoop for
> example. For the long term we were planning to allow a user to specify
> from dladm(1M) that a VNIC should not be using a hardware unicast
> slot, which would as a side effect put the underlying NIC in
> promiscuous mode and help in your case.

Yes, that would be one surefire way to solve this problem. Woohoo!

Even better, if there were a way to detect that the Solaris instance is running in a virtual environment, and therefore DON'T use a hardware unicast slot... problem solved dynamically!

I understand that this only helps to solve a problem with VMware ESX and not other virtualisation products such as VBox. However, are the alternate virtualisation platforms actually benefiting from using this NIC feature? They are only simulating virtual hardware unicast slots, so they can't be hardware accelerated anyway?

This logic/codepath probably isn't something for the crossbow team, is it?... Would this be something more for the ON/e1000g driver developers to look into? I don't really know how, or to whom, I should be gently encouraging this RFE :)

> For the short term you may want to try a different VM host. For
> example I've tried this from VirtualBox in the past and it works fine
> with an emulated e1000g.

I agree that this problem is so far unique to VMware ESX; however, that is what my production environment uses, so while it's great to have confirmed that this issue only affects VMware ESX... well, I, and any other future VMware ESX users, still need a workaround :)

I'm hoping that this thread will provide a helpful reference for all future users (hi google!) who hit this same problem. But better yet, would it be possible to more formally document this now-known limitation when using crossbow in a VMware ESX environment, on the official project page's Network Virtualization and Resource Control (Crossbow) FAQ?

Jonathan
Hi Jonathan,

On Feb 5, 2009, at 2:46 AM, Jonathan Wheeler wrote:
> VMware ESX will allow a guest to put its NIC into promiscuous mode,
> as we've seen with my infamous "snoop test", which in turn behind
> the scenes sets the virtual port on the virtual switch to
> promiscuous mode too. This "opens the floodgates" so to speak,
> allowing crossbow to do its work.
>
> This isn't an issue when using a 32-bit Solaris guest, which uses the
> pcn or vmxnet NICs. I gather the explanation here is that crossbow
> sets the promiscuous flag on the NIC by default because these NICs
> don't support hardware unicast MAC filters.

Right, if these NICs don't support unicast hardware slots, then we'll put the NIC in promiscuous mode.

> Even better, if there were a way to detect that the Solaris instance
> is running in a virtual environment, and therefore DON'T use a
> hardware unicast slot... problem solved dynamically!

One of the main points of having an emulated e1000g device in some environments is that you can run an OS which doesn't have a para-virtualized network driver, and not require the guest to be aware that it is running as a guest. If the environment provides the emulation of a virtual device such as e1000g, it should provide full emulation of that device. If it leaves out some features which can change the behavior of that device, it should clearly state so.

> I understand that this only helps to solve a problem with VMware ESX
> and not other virtualisation products such as VBox. However, are the
> alternate virtualisation platforms actually benefiting from using
> this NIC feature? They are only simulating virtual hardware unicast
> slots, so they can't be hardware accelerated anyway?

The issue here is not hardware acceleration. The main issue is that the emulated e1000g device behaves differently from the real thing from the point of view of the guest.

> This logic/codepath probably isn't something for the crossbow team,
> is it?... Would this be something more for the ON/e1000g driver
> developers to look into? I don't really know how, or to whom, I
> should be gently encouraging this RFE :)

We don't want to cripple the behavior of a driver because of the limitations of a particular virtualization environment. So I don't think having logic in e1000g which does some detection and starts disabling features based on whether it is running as a guest of a particular virtualization environment would be the right thing to do architecturally.

Another workaround would be to expose the number of unicast hardware slots as a read/write data-link property which could be set by the administrator. We wanted to expose the number of unicast slots through a read-only property anyway. By making this property writable, this would allow you to specify how many hardware slots should be used by the NIC. If you were to set that property to "1" (the minimum, since at least one entry is needed for the primary address), this would have the effect of putting the NIC in promiscuous mode when creating a VNIC on top of that NIC.

> I'm hoping that this thread will provide a helpful reference for all
> future users (hi google!) who hit this same problem. But better yet,
> would it be possible to more formally document this now-known
> limitation when using crossbow in a VMware ESX environment, on the
> official project page's Network Virtualization and Resource Control
> (Crossbow) FAQ?

Sure, we'll add this to the FAQ.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
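If that writable property ever materialises, usage might look something like this (the property name below is purely hypothetical - at the time of this thread it was only a plan):

# dladm set-linkprop -p unicast-slots=1 e1000g1   # hypothetical property: use only the primary slot
# dladm create-vnic -l e1000g1 zonevnic0          # creating the VNIC now forces e1000g1 promiscuous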