Rand S. Huntzinger
2010-Jul-15  14:56 UTC
[crossbow-discuss] Networking problem between virtual hosts on a single machine...
I have observed a networking problem occurring largely between virtual hosts running on the same machine. I have observed the problem with both VirtualBox and xVM guests, and it appears to affect multiple protocols (at least ssh and NFS). It appears that an application network connection gets clogged by transfers of data as small as 2K or so.

For the purposes of testing, I'm using the command "ssh HOST cat /etc/passwd ... | wc" (where ... are additional copies of /etc/passwd). The /etc/passwd file is a little over 1K in size. I consider a test successful if a cat of over 6000 bytes succeeds. Typically, the test fails with two copies of /etc/passwd (about 2400 bytes). I am not an expert in networking, so I'm describing the problem at the application level.

First, let me describe my environment. I am using three machines for testing, all of which are Dell 1950s with bnx Ethernet interfaces. All are running OpenSolaris dev build 134. To make things simple, all virtual machines used for testing on these machines are running OpenSolaris dev build 134 as well. There are other virtual machines on the machines as well. The machines are as follows:

H1 - 2 zones, 4 VirtualBox guests (2 tested). The zones and VirtualBox guests all have their own Crossbow VNIC. All VNICs share one bnx network interface.
H2 - 5 xVM domUs (2 tested). It is my understanding that xVM uses Crossbow VNICs for its domUs. Again, all of these share a single bnx NIC.
H3 - configured like H2, but we're only using the dom0 on this host as an external host during this testing.

Virtual hosts:

Z1 and Z2 are ipkg zones on H1.
V1 and V2 are VirtualBox guests on H1.
X1 and X2 are xVM domUs on H2.
What follows is a matrix which shows the results of my tests:

    H1 <-> Z1   OK
    H1 <-> Z2   OK
    Z1 <-> Z2   OK
    H1 <-> V1   OK
    H1 <-> V2   OK
    V1 <-> V2   Fail
    H1 <-> H3   OK
    V1  -> H3   OK
    V1 <-  H3   Fail
    V2  -> H3   OK
    V2 <-  H3   Fail
    H2 <-> X1   OK
    H2 <-> X2   OK
    X1 <-> X2   Fail
    H3 <-> X1   OK
    H3 <-> X2   OK

It appears that communication between zones on one machine is OK, and communication between separate physical hosts is OK except for ssh sessions into VirtualBox hosts. The problem occurs between VirtualBox guests on one host and between xVM domUs on a single host.

Symptoms:

"ssh HOST cat /etc/passwd | wc" will uniformly succeed and print size info.
"ssh HOST cat /etc/passwd /etc/passwd | wc" hangs on transfers marked "Fail."

I have also observed NFS mount and transfer problems between Xen domUs.

---

The networking between xVM and VirtualBox virtual hosts using Crossbow VNICs appears to be problematic. I'm not even sure I'd be able to identify this problem from a bug report if I saw it. Does anyone recognize this problem? Is there a fix or a workaround?

-- This message posted from opensolaris.org
James Carlson
2010-Jul-15  15:42 UTC
[crossbow-discuss] Networking problem between virtual hosts on a single machine...
Rand S. Huntzinger wrote:
> I have observed a networking problem occurring largely between virtual hosts running on the same machine. I have observed the problem with both VirtualBox and xVM guests, and it appears to affect multiple protocols (at least ssh and NFS). It appears that an application network connection gets clogged by transfers of data as small as 2K or so. For the purposes of testing, I'm using the command "ssh HOST cat /etc/passwd ... | wc" (where ... are additional copies of /etc/passwd). The /etc/passwd file is a little over 1K in size. I consider a test successful if a cat of over 6000 bytes succeeds. Typically, the test fails with two copies of /etc/passwd (about 2400 bytes). I am not an expert in networking, so I'm describing the problem at the application level.

That sounds a lot like a bad MTU configuration. A fundamental requirement for communicating on a shared medium like Ethernet (whether real or virtual) is that all of the attached nodes have the same MTU configured. If they don't, then the sorts of hangs you're seeing are the result as large packets fail to get through.

(More precisely: the MTU of any node connected to a shared medium cannot be larger than the MRU of any other node on that same medium without special provision to make sure that the largest frame sent to the restricted nodes is below their MRU. But since Ethernet implementations rarely allow separate configuration of the MRU and MTU, and since special mechanisms to restrict MRU to individual hosts are both rare and hard to manage, it devolves to simply "use same MTU everywhere.")

If I'm right on that, then I would expect that the "fail" cases also fail if you do "ping -sn HOST 2400". What does "netstat -ni" say on each of these hosts?

(It's also possible, but less likely, that some sort of LSO-type MTU fudging is causing the problem. I would expect a case like that to leave 'ping' intact but cause TCP to fail.)
-- James Carlson 42.703N 71.076W <carlsonj at workingcode.com>
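The MTU/MRU rule stated above can be sketched as a small check. This is only an illustration: the node names and the (MTU, MRU) values in `segment` are hypothetical, and the function name `find_mtu_mismatches` is made up for this example.

```python
from typing import Dict, List, Tuple


def find_mtu_mismatches(nodes: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Return (sender, receiver) pairs where the sender's MTU exceeds the receiver's MRU.

    `nodes` maps a node name to a (mtu, mru) pair, both in octets.
    """
    problems = []
    for sender, (mtu, _) in nodes.items():
        for receiver, (_, mru) in nodes.items():
            if sender != receiver and mtu > mru:
                problems.append((sender, receiver))
    return problems


# Hypothetical shared segment: two guests at 1500 and one interface at 1496.
# MRU is assumed equal to MTU, as is typical for Ethernet implementations.
segment = {
    "guest-a": (1500, 1500),
    "guest-b": (1500, 1500),
    "host-nic": (1496, 1496),
}

mismatches = find_mtu_mismatches(segment)
for sender, receiver in mismatches:
    print(f"{sender} can send frames larger than {receiver} accepts")
```

An empty result corresponds to the "use same MTU everywhere" case; any pair reported is a direction in which large packets may silently fail.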
Rand S. Huntzinger
2010-Jul-15  17:55 UTC
[crossbow-discuss] Networking problem between virtual hosts on a single machine...
> That sounds a lot like a bad MTU configuration. A fundamental requirement for communicating on a shared medium like Ethernet (whether real or virtual) is that all of the attached nodes have the same MTU configured. If they don't, then the sorts of hangs you're seeing are the result as large packets fail to get through.
>
> (More precisely: the MTU of any node connected to a shared medium cannot be larger than the MRU of any other node on that same medium without special provision to make sure that the largest frame sent to the restricted nodes is below their MRU. But since Ethernet implementations rarely allow separate configuration of the MRU and MTU, and since special mechanisms to restrict MRU to individual hosts are both rare and hard to manage, it devolves to simply "use same MTU everywhere.")
>
> If I'm right on that, then I would expect that the "fail" cases also fail if you do "ping -sn HOST 2400".

Thank you so much for the help. You are certainly on the right track. On the Fail connections the above command does hang. I titrated the packet size and found that the largest ping size that would work is 1468, which I assume is the Ethernet MTU (1500) minus a few overhead bytes.
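The search for the largest working ping size can be sketched as a binary search. This is a minimal sketch, not the procedure actually used in this thread: `largest_working_payload` and `simulated_probe` are hypothetical names, and the probe here is a stand-in that models a 1496-octet limit with 8 octets of ICMP header and 20 octets of IPv4 header. A real probe would run something like "ping -sn HOST SIZE" and report whether a reply came back.

```python
from typing import Callable


def largest_working_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 2400) -> int:
    """Binary-search the largest payload size for which probe() succeeds.

    Assumes probe(low) succeeds, probe(high) fails, and that success is
    monotonic (every size below a working size also works).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low


def simulated_probe(payload: int) -> bool:
    # Stand-in for a real ping: succeeds when payload + ICMP (8) + IPv4 (20)
    # headers fit within an assumed 1496-octet limit.
    return payload + 8 + 20 <= 1496


result = largest_working_payload(simulated_probe)
print(result)  # prints 1468
```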
The netstat -ni output on the two VirtualBox hosts shows:

    indy$ netstat -ni
    Name      Mtu  Net/Dest     Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue
    lo0       8232 127.0.0.0    127.0.0.1      20160  0     20160  0     0      0
    e1000g0   1500 10.10.102.0  10.10.102.105  422269 0     113615 0     0      0

    Name      Mtu  Net/Dest                   Address                 Ipkts  Ierrs Opkts  Oerrs Collis
    lo0       8252 ::1                        ::1                     20160  0     20160  0     0
    e1000g0   1500 fe80::8:20ff:fef5:70d0/10  fe80::8:20ff:fef5:70d0  422271 0     113616 0     0
    indy$

-- and --

    nevada$ netstat -ni
    Name      Mtu  Net/Dest     Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue
    lo0       8232 127.0.0.0    127.0.0.1      20026  0     20026  0     0      0
    e1000g0   1500 10.10.102.0  10.10.102.103  417630 0     119392 0     0      0

    Name      Mtu  Net/Dest                   Address                 Ipkts  Ierrs Opkts  Oerrs Collis
    lo0       8252 ::1                        ::1                     20026  0     20026  0     0
    e1000g0   1500 fe80::8:20ff:fea3:c813/10  fe80::8:20ff:fea3:c813  417632 0     119395 0     0
    nevada$

-- Now for something you didn't request. The underlying machine has the following netstat -ni.

    hobbit$ netstat -ni
    Name      Mtu  Net/Dest      Address        Ipkts       Ierrs Opkts       Oerrs Collis Queue
    lo0       8232 127.0.0.0     127.0.0.1      556534060   0     556534060   0     0      0
    bnx0      1496 10.10.102.0   10.10.102.130  1177212916  0     3754428493  0     0      0
    vboxnet0  1500 192.168.56.0  192.168.56.1   0           0     4952        0     0      0

    Name      Mtu  Net/Dest  Address  Ipkts      Ierrs Opkts      Oerrs Collis
    lo0       8252 ::1       ::1      556534060  0     556534060  0     0
    hobbit$

--- Notice that the MTU on bnx0 is 1496 (not 1500). The same thing is true for the xVM hosts: the underlying bnx0 interface has a 4-byte smaller MTU than the xnf0 interfaces in the domUs. Therefore, I wouldn't be a bit surprised if this is what is clogging the plumbing. Let's see. Yes, if I use ifconfig to lower the MTU inside the VB guests, the ping starts working again.

This would appear to be a bug, but it isn't clear to me whose bug it is. Is this something which should be fixed by the virtualization tools or in the underlying network (Crossbow?). Should the virtual switch between the Crossbow interfaces be resolving this? Again, I'm no networking guru.
What would be the best way to persistently set a reduced MTU in the virtual hosts as a workaround? I'm assuming the MTU on bnx0 is set down 4 bytes from the normal 1500 for some reason.

-- Rand Huntzinger
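The comparison made by eye above can also be sketched in code. This is a rough sketch that assumes the whitespace-separated "netstat -ni" layout shown in this message; `parse_mtus` is a hypothetical helper, and `SAMPLE` is the IPv4 part of the hobbit output copied from above.

```python
from typing import Dict

# IPv4 section of the host's "netstat -ni" output, as posted in this thread.
SAMPLE = """\
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 127.0.0.0 127.0.0.1 556534060 0 556534060 0 0 0
bnx0 1496 10.10.102.0 10.10.102.130 1177212916 0 3754428493 0 0 0
vboxnet0 1500 192.168.56.0 192.168.56.1 0 0 4952 0 0 0
"""


def parse_mtus(netstat_output: str) -> Dict[str, int]:
    """Map interface name to MTU, taking the first row seen for each interface."""
    mtus: Dict[str, int] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, header rows, and anything without a numeric MTU column.
        if len(fields) < 2 or fields[0] == "Name" or not fields[1].isdigit():
            continue
        mtus.setdefault(fields[0], int(fields[1]))
    return mtus


host_mtus = parse_mtus(SAMPLE)
guest_mtu = 1500  # MTU of e1000g0 inside both VirtualBox guests, per the output above

# Host interfaces whose MTU is smaller than what the guests will send.
undersized = {name: mtu for name, mtu in host_mtus.items() if mtu < guest_mtu}
print(undersized)  # prints {'bnx0': 1496}
```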
James Carlson
2010-Jul-15  18:20 UTC
[crossbow-discuss] Networking problem between virtual hosts on a single machine...
Rand S. Huntzinger wrote:
> Thank you so much for the help. You are certainly on the right track. On the Fail connections the above command does hang. I titrated the packet size and found that the largest ping size that would work is 1468, which I assume is the Ethernet MTU (1500) minus a few overhead bytes.

1468 + ICMP header 8 + IP 20 = 1496 octets in the IP datagram that will fit on the wire.

> -- Now for something you didn't request. The underlying machine has the following netstat -ni.
>
> hobbit$ netstat -ni
> Name      Mtu  Net/Dest      Address        Ipkts       Ierrs Opkts       Oerrs Collis Queue
> lo0       8232 127.0.0.0     127.0.0.1      556534060   0     556534060   0     0      0
> bnx0      1496 10.10.102.0   10.10.102.130  1177212916  0     3754428493  0     0      0
> vboxnet0  1500 192.168.56.0  192.168.56.1   0           0     4952        0     0      0

That seems consistent to me.

> --- Notice that the MTU on bnx0 is 1496 (not 1500). The same thing is true for the xVM hosts: the underlying bnx0 interface has a 4-byte smaller MTU than the xnf0 interfaces in the domUs. Therefore, I wouldn't be a bit surprised if this is what is clogging the plumbing. Let's see. Yes, if I use ifconfig to lower the MTU inside the VB guests, the ping starts working again.

I'm not too familiar with "bnx," but why would it be set like that?

> This would appear to be a bug, but it isn't clear to me whose bug it is. Is this something which should be fixed by the virtualization tools or in the underlying network (Crossbow?). Should the virtual switch between the Crossbow interfaces be resolving this? Again, I'm no networking guru.

I would call it the system administrator's fault. The system is doing what it's being told to do, though it's clearly not what you want. If this were a real physical network comprising multiple independent nodes rather than virtual ones on a single box, there would be no good way (other than something exotic like LLDP) to detect this sort of misconfiguration. It would fail just like you're seeing now.
But, yeah, it'd be nice if these virtual networking devices had enhanced abilities to detect configuration errors -- at least the ones that could in theory be detected locally -- and it'd also be nice if some of those drivers were more liberal in what they accept (i.e., having your MTU forced down to 1496 doesn't mean that you need to discard a 1500 octet packet if you receive one). I'm just not sure I'd call it necessarily a bug.

> What would be the best way to persistently set a reduced MTU in the virtual hosts as a workaround? I'm assuming the MTU on bnx0 is set down 4 bytes from the normal 1500 for some reason.

I dunno what's up with bnx0 (quasi VLANish hackery?). That's the one thing here that does look like a "bug."

Assuming that interface can't just be fixed, I would expect that you should be able to set the 'mtu' property on those other interfaces using dladm ... though I'm not an expert in xVM or VirtualBox issues.

-- James Carlson 42.703N 71.076W <carlsonj at workingcode.com>
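The header arithmetic in this reply (1468 + 8 + 20 = 1496) can be written out as a short sketch. It assumes IPv4 and TCP headers without options (20 octets each) and an 8-octet ICMP echo header; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # octets, assuming no IP options
ICMP_HEADER = 8   # octets, ICMP echo header
TCP_HEADER = 20   # octets, assuming no TCP options


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload whose IP datagram still fits in `mtu` octets."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP segment payload whose IP datagram still fits in `mtu` octets."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(max_icmp_payload(1496))  # prints 1468, the largest ping size that worked
print(max_icmp_payload(1500))  # prints 1472
print(max_tcp_payload(1500))   # prints 1460
print(max_tcp_payload(1496))   # prints 1456
```

Under these assumptions, a guest configured for 1500 may send TCP segments carrying up to 1460 octets, which produce 1500-octet datagrams that do not fit through a 1496-octet interface. That would be consistent with a single copy of /etc/passwd (a little over 1K) getting through while two copies (about 2400 bytes) hang, though ssh framing and TCP options shift the exact numbers.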
Rand S. Huntzinger
2010-Jul-15  20:04 UTC
[crossbow-discuss] Networking problem between virtual hosts on a single machine...
Well, I'm the system administrator and I did pretty much what I've always done for Ethernet MTUs: use the default. This is the first time I've ever seen the default set to anything other than 1500, which is why I didn't check this.

I checked, and the bnx driver having the default MTU set to 1496 is a bug (6941451) fixed in build 143, which it doesn't appear I'm going to see for a while. There is nothing in the bug to indicate why it was set to 1496 instead of 1500. Hopefully it wasn't to avoid another problem. I might just try upping the bnx MTU to 1500 and see what happens. If all appears cool, that might be the best fix until build 143 comes around.

By the way, you can't use dladm to set the MTU less than 1500. You can set it with ifconfig and possibly via the driver's .conf file (I've not tried).

-- Rand
James Carlson
2010-Jul-15  20:11 UTC
[crossbow-discuss] Networking problem between virtual hosts on a single machine...
Rand S. Huntzinger wrote:
> Well, I'm the system administrator and I did pretty much what I've always done for Ethernet MTUs: use the default. This is the first time I've ever seen the default set to anything other than 1500, which is why I didn't check this. I checked, and the bnx driver having the default MTU set to 1496 is a bug (6941451) fixed in build 143, which it doesn't appear I'm going to see for a while. There is nothing in the bug to indicate why it was set to 1496 instead of 1500. Hopefully it wasn't to avoid another problem. I might just try upping the bnx MTU to 1500 and see what happens. If all appears cool, that might be the best fix until build 143 comes around.

OK; that makes sense.

> By the way, you can't use dladm to set the MTU less than 1500. You can set it with ifconfig and possibly via the driver's .conf file (I've not tried).

For what it's worth, setting the mtu by way of ifconfig only affects IP. If you have other protocols running on that interface, you'll need to deal with them separately.

I'd thought that the allowable MTU values were a function of the driver itself rather than a global restriction. But I'll admit that I don't know enough about that lower layer.

-- James Carlson 42.703N 71.076W <carlsonj at workingcode.com>