Jonathan Wheeler
2009-Jan-12 15:27 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Hi Folks,

I have a snv_105 SXCE host that I just can't get to work as expected with Crossbow + zones. My test host, persephone, is a virtual machine running under VMware ESXi 3.5 with 2 virtual network cards (e1000), all on the same flat network/subnet. It started life just 2 days ago with a clean install of snv_95, and I LUed to 105 yesterday.

To rule out any sharing issue, the first NIC (e1000g0) is used only for the global zone. The second NIC is used only by Crossbow, for the vnic "zonevnic0", which is bound to e1000g1. sparse-template is the zone that I've been trying to get to work using a dedicated IP instance on the vnic zonevnic0.

Using snoop in the zone (or in the global zone, with "-d zonevnic0") I can see broadcast/unicast traffic going out, but only broadcast and ARP replies are coming back in?! So my ARP table is full and working as expected, yet I don't get any ping replies, and needless to say other hosts can't talk to the zone. I just can't seem to get any unicast to return to the non-global zone.

I left sparse-template pinging my desktop, and with snoop running on my desktop I can see both the ICMP request and the ICMP reply that I'm sending back; the reply just never makes it. (I also confirmed that TCP SYNs come through too.)

I'm stumped. What could be the issue? I haven't done any firewalling or custom flows/queues or anything fancy at all!

Zone config:

zonename: sparse-template
zonepath: /zones/sparse-template
brand: native
autoboot: true
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address not specified
        physical: zonevnic0
        defrouter not specified

Vnic config:

# dladm show-phys
LINK         MEDIA         STATE      SPEED  DUPLEX    DEVICE
e1000g0      Ethernet      up         1000   full      e1000g0
e1000g1      Ethernet      up         1000   full      e1000g1

# dladm show-link
LINK         CLASS    MTU    STATE    OVER
e1000g0      phys     1500   up       --
e1000g1      phys     1500   up       --
zonevnic0    vnic     1500   up       e1000g1

# dladm show-vnic
LINK         OVER      SPEED  MACADDRESS        MACADDRTYPE    VID
zonevnic0    e1000g1   1000   2:8:20:e1:ac:39   random         0

Ifconfig:

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.60 netmask ffffff00 broadcast 192.168.1.255
        ether 0:c:29:60:4e:c2
e1000g1: flags=201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000
        ether 0:50:56:ac:51:6
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
e1000g0: flags=202004841<UP,RUNNING,MULTICAST,DHCP,IPv6,CoS> mtu 1500 index 2
        inet6 fe80::20c:29ff:fe60:4ec2/10
        ether 0:c:29:60:4e:c2

ifconfig from the zone itself via zlogin -C :/

bash-3.2# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
zonevnic0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.61 netmask ffffff00 broadcast 192.168.1.255
        ether 2:8:20:e1:ac:39
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
zonevnic0: flags=202004841<UP,RUNNING,MULTICAST,DHCP,IPv6,CoS> mtu 1500 index 2
        inet6 fe80::8:20ff:fee1:ac39/10
        ether 2:8:20:e1:ac:39

bash-3.2# arp -an
Net to Media Table: IPv4
Device     IP Address           Mask             Flags    Phys Addr
------     ------------------   ---------------  -------  ---------------
zonevnic0  192.168.1.72         255.255.255.255  o        00:14:5e:45:b9:60
zonevnic0  192.168.1.68         255.255.255.255  o        00:14:5e:45:b9:60
zonevnic0  192.168.1.61         255.255.255.255  SPLA     02:08:20:e1:ac:39
zonevnic0  192.168.1.133        255.255.255.255  o        00:15:f2:1d:48:c2
zonevnic0  224.0.0.0            240.0.0.0        SM       01:00:5e:00:00:00

bash-3.2# snoop -r
Using device zonevnic0 (promiscuous mode)
192.168.1.133 -> (broadcast)   ARP C Who is 192.168.1.133, 192.168.1.133 ?
192.168.1.68 -> 224.0.1.1      NTP broadcast [st=3] (2009-01-13 04:21:45.35306)
192.168.1.68 -> 192.168.1.254  ARP R 192.168.1.68, 192.168.1.68 is 0:14:5e:45:b9:60
192.168.1.68 -> (broadcast)    ARP C Who is 192.168.1.68, 192.168.1.68 ?
fe80::214:5eff:fe45:b960 -> ff02::1:2  DHCPv6 Solicit xid=6a3a7 IAs=1
fe80::8:20ff:fee1:ac39 -> ff02::1:2    DHCPv6 Solicit xid=58244d IAs=1
192.168.1.68 -> 192.168.1.254  ARP R 192.168.1.68, 192.168.1.68 is 0:14:5e:45:b9:60
192.168.1.60 -> (broadcast)    ARP C Who is 192.168.1.60, 192.168.1.60 ?
192.168.1.133 -> 192.168.1.254 ARP R 192.168.1.133, 192.168.1.133 is 0:15:f2:1d:48:c2
192.168.1.254 -> (broadcast)   ARP C Who is 192.168.1.72, 192.168.1.72 ?
192.168.1.72 -> 192.168.1.254  ARP R 192.168.1.72, 192.168.1.72 is 0:14:5e:45:b9:60

I've stayed up until 4:30am pulling hair. What am I doing wrong?

- Jonathan
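For reference, the configuration above boils down to a few commands (a sketch assuming the snv_105 dladm/zonecfg syntax; the names are the ones from this post):

# dladm create-vnic -l e1000g1 zonevnic0     # MAC address type defaults to random
# zonecfg -z sparse-template
zonecfg:sparse-template> set ip-type=exclusive
zonecfg:sparse-template> add net
zonecfg:sparse-template:net> set physical=zonevnic0
zonecfg:sparse-template:net> end
zonecfg:sparse-template> commit

The IP address (192.168.1.61) is then configured inside the zone itself, since an exclusive-IP zone owns its own full IP stack.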
Jonathan Wheeler
2009-Jan-12 15:53 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Quick follow-up with some further tests.

Shared-IP zone over the vnic instead: same results.
Shared-IP zone over the pnic directly (e1000g1): no problems - works as expected.

I hope that rules out anything funky happening on my network or in VMware itself, as the only changes were internal to Solaris rather than to the physical (virtual VM) host itself.

- Jonathan
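For reference, the shared-IP variants tested above differ from the exclusive-IP config only in zonecfg (a sketch; address and link names are the ones from this thread):

# zonecfg -z sparse-template
zonecfg:sparse-template> set ip-type=shared
zonecfg:sparse-template> add net
zonecfg:sparse-template:net> set address=192.168.1.61/24
zonecfg:sparse-template:net> set physical=zonevnic0
zonecfg:sparse-template:net> end
zonecfg:sparse-template> commit

Substituting physical=e1000g1 gives the direct-pnic test. With shared IP the address is specified in the zone configuration and plumbed by the global zone.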
Steffen Weiberle
2009-Jan-12 18:50 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
On 01/12/09 10:53, Jonathan Wheeler wrote:
> Quick follow-up with some further tests.
>
> Shared-IP zone over vnic instead, same results.
> Shared-IP zone over the pnic directly (e1000g1), no problems - works as expected.
>
> I hope that rules out anything funky happening on my network or VMware itself, as the only changes are internal to Solaris, rather than the physical (virtual VM) host itself.
>
> - Jonathan

Since shared IP may have an effect in that e1000g0 could get involved somehow, I would stay with exclusive.

I am wondering whether the VMware e1000g driver is getting confused with VNICs. What does the arp table look like on a remote system?

What happens if you set the MAC address to be close to the hardware one (once you verify it is not in use elsewhere)? Such as 0:50:56:ac:51:16.

Also, your snoop does not show any traffic to or from .61, which is the IP address for your zone. You may also want to snoop on e1000g1, since all traffic to the VNIC should come in on that interface. Snooping on the VNIC will only show what has been classified for that VNIC.

Steffen
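A sketch of the suggested MAC test, assuming the snv_105 dladm syntax and the names used in this thread (the zone must be halted first, since the datalink is assigned to it):

# dladm delete-vnic zonevnic0
# dladm create-vnic -l e1000g1 -m 0:50:56:ac:51:16 zonevnic0   # fixed MAC near the hardware one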
Jonathan Wheeler
2009-Jan-13 12:42 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Steffen Weiberle wrote:
> On 01/12/09 10:53, Jonathan Wheeler wrote:
>> Quick follow-up with some further tests.
>>
>> Shared-IP zone over vnic instead, same results.
>> Shared-IP zone over the pnic directly (e1000g1), no problems - works as expected.
>>
>> I hope that rules out anything funky happening on my network or VMware itself, as the only changes are internal to Solaris, rather than the physical (virtual VM) host itself.
>>
>> - Jonathan

Hi Steffen, thanks for the follow-up; I've had some weird and hard-to-repeat results today while continuing to dig into this.

> I am wondering whether the VMware e1000g driver is getting confused with
> VNICs. What does the arp table look like on a remote system?

The arp tables on the remote systems correctly register the MAC address of the vnic in both modes, and obviously the shared pnic worked too.

> What happens if you set the MAC address to be close to the hardware one
> (once you verify it is not in use elsewhere)? Such as 0:50:56:ac:51:16.

I hadn't thought to try that. Testing a dedicated IP-instance, using a vnic with different MAC addresses:

Fixed:  0:50:56:ac:51:16 - worked. Woohoo!
Fixed:  2:50:56:ac:51:16 - also worked.
Random: 2:8:20:c9:bb:54  - also worked... waitaminute!!!

Feeling like a bit of an idiot I persisted with trying different random MACs, and they all kept working perfectly. Then I remembered that I had been using zonevnic0 last night, but today I've been using zonevnic1 for my testing. What's in a zero... after all this is unix, we start at zero... right?!

I had tried about 4 further random MAC addresses on zonevnic1 without issue each time. I switched back to a random MAC on zonevnic0 and immediately hit my problem again. For the record, the zonevnic0 MAC was "Random: 02:08:20:f1:40:9a", which is pretty close to what was in use earlier.

Next, I deleted zonevnic0 and created zonevnic1 with that same 02:08:20:f1:40:9a as a fixed MAC. It worked correctly - not a driver/switch issue then.

Here is where it gets really weird. To prove I wasn't going mad, I once more used that same static MAC address (02:08:20:f1:40:9a) back on a zonevnic0 and repeated the test... and it worked. ???

So it's almost as though there is an issue with using a 0 in the name AND changing MAC addresses at the same time? Since then, I've tried every possible combination and I haven't been able to recreate the problem. I have rebooted the hosts a number of times; maybe something just needed to be flushed out somewhere. I'll keep trying to find a scenario that replicates the problem reliably, and I'm also building a machine using VirtualBox to rule out VMware.

Final point of interest: last night when I was having the issues, I did try snooping the pnic, and not only was the return traffic not hitting the vnic, it wasn't hitting the pnic either. Based on what you've said here, I should have been able to see the traffic on the pnic even if there was a crossbow classification issue. I would normally have diagnosed this as a switch problem if it weren't for the fact that ARP requests were getting through OK.

Jonathan
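The delete/recreate sequence described above, sketched with the same assumed dladm syntax:

# dladm delete-vnic zonevnic0
# dladm create-vnic -l e1000g1 -m 2:8:20:f1:40:9a zonevnic1   # same MAC, new name: worked
# dladm delete-vnic zonevnic1
# dladm create-vnic -l e1000g1 -m 2:8:20:f1:40:9a zonevnic0   # back on zonevnic0: also worked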
Jonathan Wheeler
2009-Jan-14 16:54 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Hi Folks,

I've been contacted off-list with a request for further updates and information, and tonight I discovered something really weird that's well worth sharing. Apologies for another long post!

First, an update on the changes to my test environment: I created a new vnic on e1000g1 called dnsvnic0, and created/cloned my sparse-template zone into a new sparse-root zone named dns, which uses dnsvnic0 with the IP address 192.168.1.62. The zone booted, and I was straight into this problem again. I hadn't been able to get my sparse-template zone to fault again, but immediately after creating a new vnic/zone I was back to having this elusive yet frustrating issue.

Just as a refresher, my Solaris server here is a VM running under VMware ESXi 3.5u3 (with all current patches). An extra layer of virtualisation does add extra questions, so I tried a ping test that would be entirely internal to the ESX host: pinging the global zone from the non-global [dns] zone.

Traffic test #1

From within the dns zone:

bash-3.2# ping 192.168.1.60
no answer from 192.168.1.60

bash-3.2# arp -an
Net to Media Table: IPv4
Device    IP Address           Mask             Flags    Phys Addr
------    ------------------   ---------------  -------  ---------------
dnsvnic0  192.168.1.61         255.255.255.255  o        02:08:20:be:66:8e
dnsvnic0  192.168.1.60         255.255.255.255           00:0c:29:60:4e:c2
dnsvnic0  192.168.1.62         255.255.255.255  SPLA     02:08:20:ff:77:4f
dnsvnic0  192.168.1.133        255.255.255.255  o        00:15:f2:1d:48:c2
dnsvnic0  224.0.0.0            240.0.0.0        SM       01:00:5e:00:00:00

ARP packets *are* returning. ICMP packets, however, are *not*.

snoop from the global zone on the e1000g1 interface (which the vnic is running on):

# snoop -d e1000g1 arp or icmp
Using device e1000g1 (promiscuous mode)
192.168.1.62 -> (broadcast)  ARP C Who is 192.168.1.60, persephone ?
persephone -> 192.168.1.62   ARP R 192.168.1.60, persephone is 0:c:29:60:4e:c2
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 0)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 1)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 2)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 3)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 4)
(and so on...)

# snoop -d e1000g0 arp or icmp    (only the global zone is using e1000g0)
Using device e1000g0 (promiscuous mode)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 0)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 0)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 1)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 1)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 2)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 2)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 3)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 3)
192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 4)
persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 4)

So the global zone is replying to the non-global zone; 'dns' just isn't seeing the replies. This is sounding a lot like a weird vswitch bug.

Next I decided to try zone-to-zone traffic:

Zone             vnic                     IP
sparse-template  zonevnic0 (via e1000g1)  192.168.1.61
dns              dnsvnic0 (via e1000g1)   192.168.1.62

This worked... dns could ping sparse-template.

What really surprised me was that my snoop on e1000g1 was showing the traffic. It was my understanding that vnic-to-vnic traffic attached to the same pnic never actually went across the wire, so why is snoop on a physical interface showing vnic <> vnic traffic?

A) Something in crossbow isn't working properly.
B) I'm misunderstanding how vnics talk to each other. I understand etherstubs, but it just makes sense that inter-zone traffic shouldn't be sent down a bottleneck like a pNIC when it's all *internal* anyway.
C) The traffic isn't actually going out the physical interface across the wire, but it is going via the logical concept of the e1000g1 interface, which snoop is reporting on - which is rather confusing to an end user like me trying to diagnose this using snoop :(

Can anyone clarify this one for me?

The WTF moment of the night was this. vSwitch security in ESX is configured like this by default:

Promiscuous Mode:    Disabled
MAC Address Changes: Accept
Forged Transmits:    Accept

These sound like reasonable defaults to me; toggling the Promiscuous flag would, to my understanding, pretty much turn the vSwitch into a "vHub"!

I left a [non-returning] ping running between dns and the global zone, and decided to try enabling Promiscuous mode anyway. No change.

I started a snoop up on e1000g1, and suddenly the sparse-template <> dns ping that I had started in another terminal moments ago began working. I stopped the snoop, and it stopped working again. !!!?

Enabling the promiscuous flag on the e1000g1 driver is suddenly "fixing" my traffic problem.

My best interpretation of this data is that one of three things isn't working, and I'm starting to get out of my depth here fast.

A) Crossbow itself is doing something 'funny' with the way traffic is passed on to the vswitch, which is causing it not to send traffic for this MAC address down the correct virtual port on the switch. ARP spoofing is common enough, and both of those options are already enabled, so it would seem to be something else that is causing it to get confused. Sadly there isn't any interface to the vSwitch that I'm aware of to pull stats/logs from. Funny promiscuous ARPs? Sending traffic down both pnics? Something else to confuse the vswitch? I'm out of skills to troubleshoot this option any further.

B) The vSwitch in ESXi has a bug. If so, why is it only affecting crossbow? ESX is very widely used, so if there were a glaring bug in the vSwitch ethernet implementation it would be very common and public knowledge. Crossbow is new enough; is it possible that I'm the first to have tried this configuration under ESX and thus the first to notice this issue? There aren't any other options within ESX that I'm aware of that would get me further data on the vSwitch itself, so I'm at a loss as to how to troubleshoot this one further. I'm also just using the free ESXi, so I can't contact VMware for support on this, and at this point it would be a pretty vague bug report anyway :/

C) The Intel PRO/1000 vNIC that ESX exposes to the VM has a bug in it, or the Solaris e1000g driver has a bug when sending crossbow traffic across it (or a combination of the two). The Intel PRO/1000 is a very common server NIC, and I'd be gobsmacked if there were a bug with a real (non-virtual) e1000g adapter that the Sun folk hadn't picked up in their prerelease testing.

The only option for vNICs within ESX, for a 64-bit Solaris host, is the e1000 NIC. I'm trying to set up a 32-bit host to see what NIC that ends up with. If this provides a different result, that at least gives us some better information on where to start looking!

Any further directions or feedback would be most welcome. If I'm heading in the wrong direction, please do tell me :)

Jonathan
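One way to localise where the replies vanish is to watch both datalinks at once while the ping runs (a sketch using standard snoop address filters; the interface and address names are the ones from this thread):

# snoop -r -d e1000g1 host 192.168.1.62 &    # what reaches the physical link
# snoop -r -d dnsvnic0 host 192.168.1.62     # what gets classified to the vnic

If a reply appears on e1000g1 but never on dnsvnic0, the drop is in classification inside the guest; if it never appears on e1000g1 at all, it is being filtered upstream (NIC emulation or vSwitch). Note the caveat, though: snoop itself puts the link into promiscuous mode, which - as it turns out later in this thread - changes the very behaviour being debugged.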
Nicolas Droux
2009-Jan-16 01:38 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Hi Jonathan,

On Jan 14, 2009, at 9:54 AM, Jonathan Wheeler wrote:
> First an update on the changes to my test environment:
> I created a new vnic on e1000g1 called dnsvnic0, and created/cloned
> my sparse-template zone into a new sparseroot zone named dns, which
> uses dnsvnic0, with the IP address 192.168.1.62.
> [...]
>
> Traffic test #1
> From within the dns zone:
> bash-3.2# ping 192.168.1.60
> no answer from 192.168.1.60

So what is 192.168.1.60? I guess it's the global zone, but e1000g0 or e1000g1? If it's e1000g0 but dnsvnic0 is created on e1000g1, there will be no virtual switching between these data-links.

> [arp table and e1000g1/e1000g0 snoop output ...]
>
> So the global zone is replying to the non-global zone, 'dns' just
> isn't seeing the replies.
> This is sounding a lot like a weird vswitch bug.

Not necessarily. It depends on how you wired your NICs. If e1000g0 and e1000g1 are connected to the same switch, then the packet can go from dnsvnic0 -> e1000g1 -> switch -> e1000g0 -> global zone. You may not see the reply come back to dnsvnic0 via global_zone -> e1000g0 -> switch -> e1000g1 due to the same problem you described initially, with unicast packets not making it to the VNIC in the VMware VM.

> Next I decided to try zone-to-zone traffic:
> [...]
> This worked... DNS could ping Zone-template.

Because in this case you are going through the virtual switch.

> What really surprised me was that my snoop on e1000g1 was showing the
> traffic. It was my understanding that vnic-to-vnic traffic that's
> attached to the same pnic never actually went across the wire, so why
> is snoop on a physical interface showing vnic <> vnic traffic ?

That's done by design, to allow the global zone/dom0 to see all traffic exchanged between the VMs/Zones. It's similar to a monitoring port on a physical switch.

> [theories A/B/C and the promiscuous-mode observations ...]

I have a theory.

When you create a VNIC, Crossbow will try to associate the unicast MAC address with the NIC. Most NICs have hardware unicast filters which allow traffic for multiple unicast addresses to be received without turning the NIC into promiscuous mode. e1000g provides multiple such slots for unicast addresses.

What could be happening is that e1000g running in the VM happily allows Crossbow to program the unicast address for the VNIC, but the VMware back-end driver or virtual switch doesn't know about that address. So all broadcast and multicast packets are going in and out as expected, and all traffic from the VNIC is going out without a problem, but when unicast packets come back for the unicast address of the VNIC, they never make it to the VM.

If you simply enable promiscuous mode on the VMware virtual switch, it will take these packets, but the back-end driver instance associated with e1000g might still filter them out by default and drop them. In order to see the packets you have to turn on promiscuous mode on e1000g1 itself, which probably causes the VMware back-end to send all packets up.

If this theory is correct, what would help is allowing the VMware back-end to send up all packets received from the VMware virtual switch without filtering. But I don't know if VMware provides that option.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
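One way to sanity-check this theory from inside the guest without perturbing anything is to watch receive counters instead of capturing (a sketch; I'm assuming dladm's show-link -s statistics output behaves here as on bare metal, with the link names from this thread):

# dladm show-link -s -i 5 e1000g1     # ipackets arriving on the physical link
# dladm show-link -s -i 5 dnsvnic0    # ipackets classified to the vnic

Unlike snoop, reading kstat-backed counters does not flip the link into promiscuous mode. While an external host pings 192.168.1.62, ipackets should climb on both links; if the counters only move with broadcast/ARP traffic, the unicast filtering is happening before the packets ever reach the guest's e1000g.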
Jonathan Wheeler
2009-Jan-16 17:15 UTC
[crossbow-discuss] dedicated vnic IP zone not receiving unicast traffic
Nicolas Droux wrote:
> Hi Jonathan,

Hi Nicolas, thanks so much for your input. I'm a LOT closer to understanding what's going on here now. What follows is another very long email, I'm sorry. This is a full day's research condensed into the shortest email I could manage without fear of leaving out anything important!

>> Traffic test #1
>> From within the dns zone:
>> bash-3.2# ping 192.168.1.60
>> no answer from 192.168.1.60
>
> So what is 192.168.1.60? I guess it's the global zone, but e1000g0 or
> e1000g1?

Yes, it's the global zone, which is running on e1000g0. The zone was running on e1000g1.

> If it's e1000g0 but dnsvnic0 is created on e1000g1 there will be no
> virtual switching between these data-links.

Ok, thanks for clearing that up for me. I'm still getting my head around the difference of a shared kernel, but non-shared network stacks.

The point I was trying to make with this test was that traffic wasn't going over any physical links. Unfortunately we have 2 levels of virtualisation going on here (ESX & Crossbow), which makes the terminology that little bit harder to visualise. In this case the traffic was leaving the zone and going over the "wire" to talk to the global zone. That "wire" is a VMware vSwitch, so the network traffic in this case was entirely self-contained within the ESX server. The actual physical NIC in the physical server wasn't used, which allowed me to rule it out as a cause of this issue, along with any physical network switches :)

>> So the global zone is replying to the non-global zone, 'dns' just
>> isn't seeing the replies.
>> This is sounding a lot like a weird vswitch bug.
>
> Not necessarily. It depends on how you wired your NICs. If e1000g0 and
> e1000g1 are connected to the same switch,

Yeah, they are.

> then the packet can go from
> dnsvnic0->e1000g1->switch->e1000g0->global zone.

That's right. A "vSwitch" in this case, though.

> You may not see the
> reply come back to dnsvnic0 via global_zone->e1000g0->switch->e1000g1
> due to the same problem you described initially with unicast packets
> not making it to the VNIC in the VMware VM.

Well it _should_ be working this way; it's frustrating that it isn't. Where else would it go?

>> Next I decided to try zone-to-zone traffic:
>> [...]
>> This worked... DNS could ping Zone-template.
>
> Because in this case you are going through the virtual switch.

I expected that it would, but it's always encouraging to actually see a successful test for a change! Now, when you say "virtual switch", this time we're talking about the crossbow internal switch and not the VMware vSwitch. I just wanted to point that out for the sake of clarity as we keep digging deeper into this.

>> What really surprised me was that my snoop on e1000g1 was showing the
>> traffic. [...]
>
> That's done by design to allow the global zone/dom0 see all traffic
> exchanged between the VMs/Zones. It's similar to a monitoring port on
> a physical switch.

Ah, thanks for clearing that one up :)

>> A) Something in crossbow isn't working properly.
>> B) I'm misunderstanding how vnics talk to each other. [...]
>> C) The traffic isn't actually going out the physical interface across
>> the wire, but it is going via the logical concept of the e1000g1
>> interface, which snoop is reporting on [...]
>>
>> Can anyone clarify this one for me?

Based on your previous comment above, you're saying that the answer is C)? So just to confirm that point, as it's pretty crucial that I understand this distinction correctly: "snoop -d e1000g1" is showing traffic that _isn't_ actually going across the 'wire' on that 'physical' interface, but rather traffic that is passing "internally, *behind* the physical interface" - to make observability easier for administrators from the global zone. If I were able to watch the switch port that e1000g1 is plugged into, I'd see no packets doing a return loop?

> I have a theory.
>
> When you create a VNIC, Crossbow will try to associate the unicast MAC
> address with the NIC. Most NICs have hardware unicast filters which
> allow traffic for multiple unicast addresses to be received without
> turning the NIC into promiscuous mode. e1000g provides multiple such
> slots for unicast addresses.

I didn't realise that; I must have fallen behind a bit on modern network card technology. I take it there is a performance penalty when running in promiscuous mode to handle multiple MAC addresses, as the filtering is no longer done in hardware by the NIC itself?

> What could be happening is that e1000g running in the VM happily allows
> Crossbow to program the unicast address for the VNIC, but the VMware
> back-end driver or virtual switch doesn't know about that address. [...]

That makes a lot of sense, and I think you're quite correct about that. It's either that, or ESX is getting upset with promiscuous mode being enabled on the NIC and, as a security precaution, is not allowing the traffic to be delivered to the virtual NIC in the VM. (Explored further down this email.)

I've only experienced these weird issues while using crossbow, but if the above is true then this is not a crossbow problem per se at all; it's simply that crossbow is adding MAC addresses to the [VMware] e1000g card (or enabling promiscuous mode), which is causing a problem at some layer within ESX, and there haven't been any other networking scenarios in which this would have happened prior to crossbow. (Maybe network teaming, though that is not generally done *within* a VM - there is little-to-no point!)

If this is the heart of the issue, then I should be able to replicate it without needing to use a zone at all, provided I can set up crossbow in the global zone in such a way that it uses different MAC addresses depending on the destination... Now that I think about this, I think I did hit this when I started off with just the 1 NIC in the VM. I moved to a second e1000, separating the global/zone traffic as a sanity check quite early on... hrm.

> If you simply enable promiscuous mode on the VMware virtual switch,
> then it will take these packets, but the back-end driver instance
> associated with e1000g might still filter out these packets by default
> and drop them. In order to see the packets you have to turn on
> promiscuous mode on e1000g1 itself, which probably causes the VMware
> back-end to send all packets up.

Agreed. VMware ESX provides some granularity when it comes to setting promiscuous options. It can be set globally on the whole switch, or at a "port group" level, though I don't see anywhere to toggle it on a vNIC or per-VM basis. Port groups are an administrative abstraction of a group of ports on a specific vSwitch, a bit like VLANs but without network-level tagging (though they can be used to enable/set up VLANs too).

I have ALL virtual machines running off 1 vSwitch, so enabling promiscuous mode on the vSwitch (for all VMs) just to get my zone server working with crossbow isn't an attractive option. Making a dedicated *promiscuous-on* port group that contains only this one Solaris server may work better, though.

> If this theory is correct, what would help is allowing the VMware
> back-end to send up all packets received from the VMware virtual
> switch without filtering. But I don't know if VMware provides that
> option.

I think that is what a port group will allow me to do. However, remember that by itself this didn't fix the problem: I had to have the VM's NIC in promiscuous mode too for traffic to flow correctly. I was doing this (accidentally at the time) by running snoop. Is there a better way to enable promiscuous mode on an interface within Solaris permanently? All I could dig up with google was this:
http://www.kernelfaq.com/2008/04/enabling-and-disabling-promiscuous-mode.html

MAC filtering: going back to what you said earlier about the e1000g driver handling multiple unicast MACs concurrently in hardware, in my googling I've discovered that not all e1000 NICs support this feature.

* Is there a way to tell if the VMware emulated e1000 is advertising this feature in the 'hardware' to the guest?
* Is there a way to tell if crossbow is making use of it, rather than falling back to the "less fancy" promiscuous mode instead?

This would be most valuable for better understanding what we're seeing here! dladm show-linkprop isn't showing me anything; I guess we're not quite there yet?
http://markmail.org/message/qiqygyqxt5t6qp5b

My current working theory is this:

*vSwitch layer*
VMware ESX knows exactly which vSwitch ports are connected to a physical NIC uplinking the vSwitch to the physical world, and which ports are connected to NICs within VMs. The vSwitch "host" ports should only ever have a single MAC address on them at any given time, as they're directly connected to a single NIC, and ESX enforces this limit as a security measure. This would prevent MAC spoofing attacks, for example. Recall that by default within a vSwitch "MAC Address Changes" are allowed, as are "Forged Transmits", which strongly hints at the behaviour I'm theorising.

*NIC layer*
I'm expecting that the VMware-provided emulated e1000 NIC has no concept of MAC address slots on the vSwitch end - given the behaviour of 1 MAC address per port at the vSwitch level, why would it ever need to support multiple MACs? Within the VM, however, crossbow is detecting an e1000 pNIC that does support multiple MACs, and it's making use of these slots for the VNICs' MACs as they get added, rather than toggling promiscuous mode on the e1000g.

**Outbound traffic**
ESX is allowing the "forged transmits" from the VNIC's additional MAC address, and broadcasts/multicasts are being passed through both the vSwitch and the e1000 correctly.

**Inbound traffic**

*vSwitch layer*
ESX knows which MAC address the e1000 has within the guest, and it will have this entered into its MAC forwarding table for the port that the VM is connected to. Exactly what it's doing with the VNIC's MAC that is being broadcast around in ARP requests... I have no idea. Enabling promiscuous mode at the vSwitch level bypasses/disables the MAC forwarding table, so now frames with the VNIC's MAC are getting to the right switch port. This alone still doesn't fix the problem, because:

*NIC layer*
The ESX end of the e1000 NIC only knows about the primary MAC address of the NIC, so it doesn't pass frames addressed to the VNIC's MAC address into the VM guest's end of the e1000 for further processing by crossbow. When snoop is started, the interface is set to promiscuous mode in the guest, and this is trapped by the ESX end of the e1000, which then enables promiscuous mode on its end too. With all frames finally passing into the guest end of the e1000, crossbow can do its job and everything starts working!

Phew! I'm having to theorise much of the ESX behaviour, as there is simply no way to get the information I need out of ESX itself, but this model all seems to fit pretty well, don't you think?

Way forward: I can focus on testing the promiscuous-mode behaviour on the vSwitch port group, which may lead to a tidy workaround at that level. At the NIC level, if my theory is correct, it would seem that I really need a way to make crossbow enable promiscuous mode on the NIC rather than adding a "hardware-based MAC filter" to the e1000, as it doesn't seem that this is going to work in a VMware ESX environment.

> Nicolas.

Jonathan
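On the "permanently promiscuous" question above: a crude way to hold an interface in promiscuous mode is to leave a capture running that discards its output (a sketch; snoop's -o option writes the raw capture to a file, here the bit bucket):

# snoop -d e1000g1 -o /dev/null &     # holds e1000g1 promiscuous until the snoop is killed

This is just the accidental "snoop test" made persistent, not a real fix; the process would have to survive reboots (e.g. wrapped in an SMF service or rc script) for it to stick.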
Hi Folks,

I've spent another couple of days banging away on this, and since there haven't been any other updates to this thread I thought I may as well share my latest findings, in the hope that this will better enable someone to help me, or simply steer others in the right direction. I've had some good success when running Solaris in 32-bit mode, which in turn uses the pcn or vmxnet0 drivers rather than e1000g.

Just quickly, on the vSwitch side, for the future's archives: VMware vSwitch port groups do work as I had hoped, working around that layer of this problem for all 3 NICs. I am able to enable promiscuous mode on a port group only, rather than the entire vSwitch. I have put the crossbow NIC of my VM into this port group all by itself, while the rest of the NICs & VMs sit on the main port group on the vSwitch.

From a security standpoint this does mean that my global zone can see all traffic (from all VMs) going across the vSwitch by snooping the second, dedicated "zones NIC"... which is definitely not ideal at all. I think it's about as good as it's going to get for me, though, and I'm the only person with root access in the global zone, so it's going to have to do. The non-global zones are still limited to their own vnic-specific traffic, as you would expect, so no security risk there.

With the e1000g I still have the issue of needing to enable promiscuous mode on either the vnic or the pnic for traffic to flow to the vnics when there is more than 1 vnic running on a physical interface... I have noticed that it doesn't even matter if the vnic is in use. The simple act of "dladm create-vnic [....]"ing a second vnic will effectively cut the network off for another zone which is using *its* own vnic...

From what I've been able to observe so far, only the most recently created vnic gets to pass traffic. Alphabetical naming and MAC address order don't seem to matter; it's simply that the newest VNIC created on an interface is the effective "active" one :(

On the ESXi host, I'm able to see the following in the VMware logs when adding a vnic:

~ # tail -f /var/log/messages | grep kernel
Jan 20 06:10:10 vmkernel: 0:23:08:32.777 cpu3:1579)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:11:28 vmkernel: 0:23:09:50.457 cpu2:8002)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:14:54 vmkernel: 0:23:13:16.790 cpu2:8002)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:15:26 vmkernel: 0:23:13:48.260 cpu0:1420)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting
Jan 20 06:35:42 vmkernel: 0:23:34:04.042 cpu1:8003)Net: 229: 0x400000b: peer not allowed promiscuous, revoking setting

Maybe the act of adding a vnic is pushing the new MAC address into the VMware e1000, and since it doesn't [appear to] support multiple MAC addresses at once, it's just retaining the most recent MAC address. Seems... logical :)

I'm not quite sure why the reverse process works, however. When I remove a VNIC, the next most recently created VNIC starts working again. I'm sure the answer is in the code somewhere, but I'm not a developer that can read code :/

*PCnet/vmxnet*
With 64-bit Solaris, e1000 is the only network card that you can use (with or without VMware Tools), hence my continued efforts to find a way to get this working!

I reconfigured VMware to treat my OpenSolaris VM as 32-bit, replaced both VMware virtual NICs with the "flexible" virtual NICs, and booted OpenSolaris in 32-bit mode by changing the grub boot flags. In 32-bit mode VMware emulates PCnet NICs to Solaris, which show up under ifconfig as pcn0/1. After swapping all the relevant configuration across... it worked. And I mean everything REALLY worked. Multiple zones/vnics all running at once.

The PCnet driver is a fairly poor performer, which is why VMware provides the optimised vmxnet driver with the VMware Tools instead. I stuffed around until I managed to get the optimised 32-bit vmxnet driver to drive the "pcn" hardware instead. When using this NIC driver, the following line is logged in the ESX syslog at certain points:

Jan 20 11:23:33 vmkernel: 1:04:21:55.671 cpu1:379945)Net: 4222: unicastAddr 00:50:56:ac:14:6c;

That MAC address is the real address that VMware assigned to the emulated vmxnet driver, not the crossbow ones. The vmxnet driver also works perfectly: multiple VNICs can be active at the same time, without the need for any dodgy snooping as was the case with the emulated e1000.

So for now, I've resigned myself to running my server in 32-bit mode as the best compromise. I really want the crossbow functionality for my zones, and for the moment it would seem that the only way to get it under ESX is to run in 32-bit mode :/

Suggestions welcome.

Jonathan
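For the archives, forcing the 32-bit kernel at boot on a GRUB-booted SXCE/OpenSolaris system generally means dropping $ISADIR from the menu.lst entry (a sketch of an assumed entry; the exact lines vary by build and root filesystem):

# In /boot/grub/menu.lst, change the 64-bit-capable entry:
kernel$ /platform/i86pc/kernel/$ISADIR/unix
module$ /platform/i86pc/$ISADIR/boot_archive
# to the explicit 32-bit kernel:
kernel$ /platform/i86pc/kernel/unix
module$ /platform/i86pc/boot_archive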
I was rather hoping that there might be an update on this?

I spent considerable time researching this problem, and while I now have sufficient workarounds in place to have a usable (regrettably forced 32-bit) system, I remain concerned that this problem isn't unique to my setup.

From what I've seen to date, anyone who tries to use OpenSolaris under VMware ESX on 64-bit hardware (the majority now, in 2009!) will be unable to use crossbow.

That's quite a biggie, don't you agree?
On Feb 1, 2009, at 7:14 PM, Jonathan Wheeler wrote:
> I was rather hoping that there might be an update on this?
>
> I spent considerable time researching this problem and while I now
> have sufficient workarounds in place to have a usable (regrettably
> forced 32-bit) system, I remain concerned that this problem isn't
> unique to my setup.
>
> From what I've seen to date, anyone who tries to use OpenSolaris
> under VMware ESX on 64-bit hardware (the majority now in 2009!)
> will be unable to use crossbow.
>
> That's quite a biggie, don't you agree?

Jonathan,

This is not a Crossbow issue. There isn't much we can do from Crossbow running in a guest if the packets are not even passed up to the emulated e1000g in that guest. VMware should allow you to pass these unicast packets up to the VM and put the underlying physical NIC in promiscuous mode.

There is no easy way to work around this problem from the guest besides putting the virtual e1000g NIC in promiscuous mode, using snoop for example. For the long term we were planning to allow a user to specify from dladm(1M) that a VNIC should not be using a hardware unicast slot, which would as a side effect put the underlying NIC in promiscuous mode and help in your case.

For the short term you may want to try a different VM host. For example, I've tried this from VirtualBox in the past and it works fine with an emulated e1000g.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
> On Feb 1, 2009, at 7:14 PM, Jonathan Wheeler wrote:
>> ....
>>
>>> From what I've seen to date, anyone who tries to use OpenSolaris
>>> under VMware ESX on 64-bit hardware (the majority now in 2009!)
>>> will be unable to use crossbow.
>>
>> That's quite a biggie, don't you agree?
>
> Jonathan,

Hi Nicolas :)

> This is not a Crossbow issue. There isn't much we can do from Crossbow
> running in a guest if the packets are not even passed up to the
> emulated e1000g in that guest.

I quite agree that this point is obviously beyond crossbow's control, though at the same time it should be understood that it's not really VMware's "fault" for locking down ports using MAC-level security either. In my searching to date, the only other scenarios that typically run into this problem are IDS/IPS appliances, so it's not very common out in the wild at all.

The good news is that once an administrator *does* understand the nature of this problem, or perhaps more accurately "VMware ESX's default vSwitch behaviour", vSwitch security can quite easily be reconfigured to allow for the "crossbow-friendly" handling of promiscuous mode on the switch port for the guest.

> VMware should allow you to pass these unicast packets up to the VM

And it can, you just have to *know* to enable the promiscuous port mode with port groups.

> and put the underlying physical NIC in promiscuous mode.

When using the e1000 NIC, this one is the problematic bit. VMware ESX will allow a guest to put its NIC into promiscuous mode, as we've seen with my infamous "snoop test", which in turn behind the scenes sets the virtual port on the virtual switch to promiscuous mode too. This "opens the floodgates" so to speak, allowing crossbow to do its work.

This isn't an issue when using a 32-bit Solaris guest, which uses the pcn or vmxnet NICs. I gather the explanation here is that crossbow sets the promiscuous flag on the NIC by default because these NICs don't support hardware unicast MAC filters.

> There is no easy way to work around this problem from the guest besides
> putting the virtual e1000g NIC in promiscuous mode, using snoop for
> example. For the long term we were planning to allow a user to specify
> from dladm(1M) that a VNIC should not be using a hardware unicast
> slot, which would as a side effect put the underlying NIC in
> promiscuous mode and help in your case.

Yes, that would be one surefire way to solve this problem. Woohoo!

Even better, if there were a way to detect that the Solaris instance is running in a virtual environment, and therefore DON'T use a hardware unicast slot... problem solved dynamically!

I understand that this only helps to solve a problem with VMware ESX and not other virtualisation products such as VBox. However, are the alternate virtualisation platforms actually benefiting from using this NIC feature? They are only simulating virtual hardware unicast slots, so they can't be hardware accelerated anyway?

This logic/codepath probably isn't something for the crossbow team, is it?... Would this be something more for the ON/e1000g driver developers to look into? I don't really know how, or to whom, I should be gently encouraging this RFE :)

> For the short term you may want to try a different VM host. For
> example I've tried this from VirtualBox in the past and it works fine
> with an emulated e1000g.

I agree that this problem is so far unique to VMware ESX; however, that is what my production environment uses, so while it's great to have confirmed that this issue only affects VMware ESX... well, I, and any other future VMware ESX users, still need a workaround :)

I'm hoping that this thread will provide a helpful reference for all future users (hi google!) who hit this same problem. But better yet, would it be possible to more formally document this now-known limitation when using crossbow in a VMware ESX environment, on the official project page's Network Virtualization and Resource Control (Crossbow) FAQ?

Jonathan
Hi Jonathan,

On Feb 5, 2009, at 2:46 AM, Jonathan Wheeler wrote:
> VMware ESX will allow a guest to put its NIC into promiscuous mode,
> as we've seen with my infamous "snoop test", which in turn behind
> the scenes sets the virtual port on the virtual switch to
> promiscuous mode too. This "opens the floodgates" so to speak,
> allowing crossbow to do its work.
>
> This isn't an issue when using a 32-bit Solaris guest, which uses the
> pcn or vmxnet NICs. I gather the explanation here is that crossbow
> sets the promiscuous flag on the NIC by default because these NICs
> don't support hardware unicast MAC filters.

Right, if these NICs don't support unicast hardware slots, then we'll put the NIC in promiscuous mode.

> Even better, if there were a way to detect that the Solaris instance
> is running in a virtual environment, and therefore DON'T use a
> hardware unicast slot... problem solved dynamically!

One of the main points of having an emulated e1000g device in some environments is that you can run an OS which doesn't have a para-virtualized network driver, and not require the guest to be aware that it is running as a guest. If the environment provides the emulation of a virtual device such as e1000g, it should provide full emulation of that device. If it leaves out some features which can change the behavior of that device, it should clearly state so.

> I understand that this only helps to solve a problem with VMware ESX
> and not other virtualisation products such as VBox. However, are the
> alternate virtualisation platforms actually benefiting from using
> this NIC feature? They are only simulating virtual hardware unicast
> slots, so they can't be hardware accelerated anyway?

The issue here is not hardware acceleration. The main issue is that the emulated e1000g device behaves differently from the real thing from the point of view of the guest.

> This logic/codepath probably isn't something for the crossbow team,
> is it?... Would this be something more for the ON/e1000g driver
> developers to look into? I don't really know how, or to whom, I
> should be gently encouraging this RFE :)

We don't want to cripple the behavior of a driver because of the limitations of a particular virtualization environment. So I don't think having logic in e1000g which does some detection and starts disabling features based on whether it is running as a guest of a particular virtualization environment would be the right thing to do architecturally.

Another workaround would be to expose the number of unicast hardware slots as a read/write data-link property which could be set by the administrator. We wanted to expose the number of unicast slots through a read-only property anyway. By making this property writable, this would allow you to specify how many hardware slots should be used by the NIC. If you were to set that property to "1" (the minimum, since at least one entry is needed for the primary address), this would have the effect of putting the NIC in promiscuous mode when creating a VNIC on top of that NIC.

> I'm hoping that this thread will provide a helpful reference for all
> future users (hi google!) who hit this same problem. But better yet,
> would it be possible to more formally document this now-known
> limitation when using crossbow in a VMware ESX environment, on the
> official project page's Network Virtualization and Resource Control
> (Crossbow) FAQ?

Sure, we'll add this to the FAQ.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
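If that writable property ever materialises, usage might look something like this (the property name below is purely hypothetical - at the time of this thread it was only a plan):

# dladm set-linkprop -p unicast-slots=1 e1000g1   # hypothetical property: use only the primary slot
# dladm create-vnic -l e1000g1 zonevnic0          # creating the VNIC now forces e1000g1 promiscuous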