I've been struggling with a xen networking problem for more than a week now and I ran out of ideas, so I thought I'd ask for help here. But first a bit about my network layout.

I have xen set up with a wrapper that establishes three bridges in xen, which I named red, orange, and green. eth0 is my public interface and is a 3com card, bound to the red bridge. eth1 is my dmz interface, a forcedeth (nvidia card), and it is bound to orange. dummy0 is bound to green, and all domU's on this bridge are in the 192.168.1.0 network, including dom0.

I have a very small router/firewall domU that routes and filters all traffic between these bridges. It's a fun layout that works flawlessly in xen 2.0.7.

When I upgrade to xen3, I experience tcp and udp checksum errors when attempting to route any traffic through my firewall/router. I tried to enable "ethtool -K ethX tx off" in all of my domU's without success. (When I tried, the root user in each domU could access the rest of the network fine, but normal users could not, and dom0 remained cut off regardless of superuser status.)

This behavior makes no sense to me whatsoever, and I was hoping someone could point out some documentation or has an idea on how to proceed. At some point I'd like to upgrade :)

Oh, and I see the same behavior in the binary and custom-compiled kernels, both the 3.0.0 release and the snapshot as of 2005-12-31.

Thanks ...

-- 
Regards,
- Charles

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
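For anyone following along, it can help to confirm what state TX checksum offload is actually in before and after running the ethtool command quoted in the post. The helper below is only a sketch: it assumes the usual `ethtool -k` text output containing a `tx-checksumming:` line, and both the function name `tx_csum_state` and the interface name `eth0` are illustrative, not taken from the post.

```shell
#!/bin/sh
# Read `ethtool -k <iface>` output on stdin and print the state of
# TX checksum offload ("on" or "off"). Prints nothing if no such line exists.
tx_csum_state() {
    awk -F': *' '/^[ \t]*tx-checksumming:/ { print $2; exit }'
}

# Possible use inside a domU (eth0 is an assumed interface name):
#   ethtool -k eth0 | tx_csum_state    # inspect the current state
#   ethtool -K eth0 tx off             # the command tried in the post
#   ethtool -k eth0 | tx_csum_state    # confirm the change took effect
```

If the second check still reports "on", the setting was not applied to that interface, which would be worth knowing before looking further.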
On Jan 03, 2006 at 2041 -0800, Charles Mauch appeared and said:
> [...]
> I have xen set up with a wrapper that establishes three bridges in xen,
> which I named red, orange, and green. eth0 is my public interface and is a
> 3com card, bound to the red bridge. eth1 is my dmz interface, a forcedeth
> (nvidia card), and it is bound to orange. dummy0 is bound to green, and
> all domU's on this bridge are in the 192.168.1.0 network, including dom0.
> [...]
>
> When I upgrade to xen3, I experience tcp and udp checksum errors when
> attempting to route any traffic through my firewall/router.

I have a similar albeit simpler setup. My Dom0 is connected to the Internet. The "first" DomU is a firewall, the second DomU is a web server. The first bridge connects physical interface with the firewall's exterior virtual interface. The second bridge connects firewall and webserver to form a kind of LAN.

Firewalling and routing with NAT is all set up. However as soon as data packets flow between web server and a client I get TCP checksum errors. The TCP handshake works fine.
A.A.A.A is the firewall's IP, B.B.B.B is a client:

client:/root# tethereal -n -i eth0 host A.A.A.A
Capturing on eth0
  0.000000 B.B.B.B -> A.A.A.A TCP 41721 > 80 [SYN, ECN, CWR] Seq=0 Ack=0 Win=5840 Len=0 MSS=1460 TSV=333318731 TSER=0 WS=2
  0.009362 A.A.A.A -> B.B.B.B TCP 80 > 41721 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=2958428 TSER=333318731 WS=2
  0.009434 B.B.B.B -> A.A.A.A TCP 41721 > 80 [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=333318733 TSER=2958428
  0.035842 A.A.A.A -> B.B.B.B TCP 80 > 41721 [PSH, ACK] Seq=1 Ack=1 Win=5792 [TCP CHECKSUM INCORRECT] Len=23 TSV=2958430 TSER=333318733
  0.243444 A.A.A.A -> B.B.B.B TCP [TCP Retransmission] 80 > 41721 [PSH, ACK] Seq=1 Ack=1 Win=5792 [TCP CHECKSUM INCORRECT] Len=23 TSV=2958451 TSER=333318733
  0.663784 A.A.A.A -> B.B.B.B TCP [TCP Retransmission] 80 > 41721 [PSH, ACK] Seq=1 Ack=1 Win=5792 [TCP CHECKSUM INCORRECT] Len=23 TSV=2958493 TSER=333318733
  1.501512 A.A.A.A -> B.B.B.B TCP [TCP Retransmission] 80 > 41721 [PSH, ACK] Seq=1 Ack=1 Win=5792 [TCP CHECKSUM INCORRECT] Len=23 TSV=2958577 TSER=333318733
  3.183639 A.A.A.A -> B.B.B.B TCP [TCP Retransmission] 80 > 41721 [PSH, ACK] Seq=1 Ack=1 Win=5792 [TCP CHECKSUM INCORRECT] Len=23 TSV=2958745 TSER=333318733
  6.539707 A.A.A.A -> B.B.B.B TCP [TCP Retransmission] 80 > 41721 [PSH, ACK] Seq=1 Ack=1 Win=5792 [TCP CHECKSUM INCORRECT] Len=23 TSV=2959081 TSER=333318733
...

> I tried to enable "ethtool -K ethX tx off" in all of my domU's without
> success.

What NIC are you using? Mine is a Broadcom with the tg3 driver.

> Oh, and I see the same behavior in the binary and custom-compiled kernels,
> both the 3.0.0 release and the snapshot as of 2005-12-31.

I am using a 3.0.0 system with the changeset from "Thu Dec 15 20:57:27 2005 +0100 8259:5baa96bedc13". The machine was installed on 21 December 2005.

Best regards,
Lynx.

-- 
"From the delicate strands, between minds we weave our mesh: a blanket to warm the soul."
--- Lady Deirdre Skye (SMAC) ---
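A small aid for comparing captures before and after a configuration change: the sketch below counts how many packets in saved tethereal text output carry the "[TCP CHECKSUM INCORRECT]" flag and how many are retransmissions. The function names and the file name `capture.txt` are made up for illustration. One caveat worth keeping in mind: a capture taken on the sending host with TX checksum offload enabled can show incorrect checksums simply because the checksum is filled in after the capture point. The trace above was taken on the client, i.e. the receiving side, so that explanation would not apply to it.

```shell
#!/bin/sh
# Count packets flagged with a bad TCP checksum in tethereal text output (stdin).
count_bad_tcp_csum() {
    grep -c 'TCP CHECKSUM INCORRECT'
}

# Count packets that tethereal marked as retransmissions (stdin).
count_retransmissions() {
    grep -c 'TCP Retransmission'
}

# Possible use (capture.txt is an assumed file name):
#   tethereal -n -i eth0 host A.A.A.A > capture.txt
#   count_bad_tcp_csum < capture.txt
#   count_retransmissions < capture.txt
```

Applied to the nine packets shown above, this would report six bad checksums and five retransmissions: the first data segment and every retry of it.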
I observed similar problems (checksum errors) when I directly assigned an IP address to the bridge in dom0. I corrected the situation by adding vif0.0 to the bridge and assigning the IP address to veth0. If you're using the network-bridge script that comes with xend this all happens automatically, but I had to roll my own solution and ran afoul.

Dan.
On Jan 16, 2006 at 2136 -0600, Daniel Goertzen appeared and said:
> I observed similar problems (checksum errors) when I directly assigned
> an IP address to the bridge in dom0.

My bridges have IP addresses, but the addresses are not used. Could this be a problem?

> I corrected the situation by adding vif0.0 to the bridge and assigning
> the IP address to veth0. If you're using the network-bridge script
> that comes with xend this all happens automatically, but I had to roll
> my own solution and ran afoul.

I use xend's network-bridge script and have additional network interfaces defined in /etc/network/interfaces. The bridges look as follows:

xen0:~# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.feffffffffff       no              peth0
                                                        vif0.0
                                                        vif19.0
xenbr1          8000.dabb912575c8       no              dummy1
                                                        vif14.0
                                                        vif19.1
xen0:~#

peth0 and vif0.0 are from Dom0. vif19.0 and vif19.1 belong to the firewall in the first DomU and vif14.0 belongs to the webserver. dummy1 is one of two dummy interfaces. A /29 network is routed to the machine. Dom0 has the first address, eth0 inside the firewall's DomU has the second, and both use the gateway serving the /29. Neither peth0 nor vif0.0 has an IP address configured.

My assumption was that the first bridge xenbr0 forwards the packets to the gateway. ICMP, TCP SYN and even TCP SYN plus data work; everything else does not.

Maybe someone can explain the background of the checksum errors.

Best regards,
Lynx.
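For readers who roll their own bridge setup rather than using xend's network-bridge script, the arrangement quoted above (vif0.0 attached to the bridge, the dom0 address on veth0 rather than on the bridge) could be sketched roughly as follows. This is a guess at the steps, not a record of what was actually run: the function name is invented, the bridge name and address are placeholders supplied by the caller, and flushing the bridge's own address first is an assumption. The function only prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print (do not run) commands for the arrangement described above:
# vif0.0 joins the bridge and the dom0 address lives on veth0.
# $1 = bridge name, $2 = address/prefix (both are caller-supplied placeholders).
print_dom0_bridge_setup() {
    bridge=$1
    addr=$2
    echo "ip addr flush dev $bridge"      # assumption: remove the address from the bridge itself
    echo "brctl addif $bridge vif0.0"     # attach dom0's backend interface to the bridge
    echo "ip link set vif0.0 up"
    echo "ip addr add $addr dev veth0"    # dom0 address goes on the frontend interface
    echo "ip link set veth0 up"
}

# Possible use, as root in dom0 (names and address are examples only):
#   print_dom0_bridge_setup green 192.168.1.1/24        # review the output
#   print_dom0_bridge_setup green 192.168.1.1/24 | sh   # then apply it
```

Printing instead of executing is a deliberate choice here, since a mistake in bridge reconfiguration can cut off remote access to dom0.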
Instead of adding a dummy interface to the bridge, try adding another vif0.x.

The xen drivers try to optimize networking by skipping out on checksumming at various stages, because the assumption is that inter-domain network transport is immune to error. My theory is that if you don't properly use the vifx.y/vethx interfaces, you bypass some of these checksum optimizations and end up performing integrity checks when you shouldn't.

My dream is that this will someday be documented somewhere, or better yet degrade gracefully to checksum operation with a kernel warning or something. The existing design seems brittle.

Again, that's only my theory based on my experience. If someone really knows what's going on, I look forward to hearing about it.

Dan.
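To test the theory above against a given host, one could list which bridge ports are not Xen vif or peth devices, since those are the ones the theory would point at. The sketch below reads `brctl show` output on stdin and prints each such port with its bridge. It assumes the usual column layout, where the port name is the last field on every line, and the function name is invented for illustration.

```shell
#!/bin/sh
# Read `brctl show` output on stdin and print "<bridge> <port>" for every
# port whose name does not start with "vif" or "peth" followed by a digit.
non_xen_bridge_ports() {
    awk 'NR > 1 && NF > 0 {
        if ($0 !~ /^[ \t]/) {      # line that starts a new bridge entry
            bridge = $1
            if (NF < 4) next       # bridge with no ports attached
        }
        port = $NF                 # port name is the last field
        if (port !~ /^(vif|peth)[0-9]/) print bridge, port
    }'
}

# Possible use in dom0:
#   brctl show | non_xen_bridge_ports
```

For the bridge listing posted earlier in the thread, this would print only `xenbr1 dummy1`, the dummy interface that the suggestion proposes swapping for another vif0.x.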
On Jan 17, 2006 at 0830 -0600, Daniel Goertzen appeared and said:
> Instead of adding a dummy interface to the bridge try adding another vif0.x.
>
> The xen drivers try to optimize networking by skipping out on
> checksumming at various stages, because the assumption is that
> inter-domain network transport is immune to error. My theory is that if
> you don't properly use the vifx.y/vethx interfaces, you bypass some of
> these checksum optimizations and end up performing integrity checks when
> you shouldn't. [...]

Aha, that sounds plausible. I think I will try it with rearranged interfaces on my test server. I worked around this problem by using TCP proxies for the ports in question. Since it was for HTTP & HTTPS, a reverse proxy is better for filtering anyway.

> Again, that's only my theory based on my experience. If someone really
> knows what's going on, I look forward to hearing about it.

Me too. I'd like to have a switch to change this behaviour, because I am exploring Xen for use as an educational tool for my students. We always run out of physical space when setting up demo networks with multiple routers and clients. Teaching TCP checksum errors isn't the goal of the course. ;)

Best,
Lynx.