Rustedt, Florian
2009-Jun-03 07:26 UTC
[Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
Hello List,

Because of lost packets, we began to do some research on the networking in Xen. Our Xen 3.3.0 host has a handful of domUs with routed networking and Linux in paravirtual configuration. Now we see a lot of packet loss on the virtual interfaces in the domUs, and in the meantime we believe that this is caused by the errors on the host interface (uptime 47 days):

eth0      Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
          inet addr:212.XX.XX.XX  Bcast:212.XX.XX.255  Mask:255.255.255.0
          inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:250388860 errors:8180755 dropped:131050 overruns:0 frame:22483
          TX packets:183684213 errors:0 dropped:0 overruns:0 carrier:0
          collisions:14726679 txqueuelen:1000
          RX bytes:231736594230 (215.8 GiB)  TX bytes:43873538477 (40.8 GiB)
          Interrupt:20 Memory:fd7f0000-fd800000

Very strange are, first, the massive collisions, as the setup is routed, and second, that sending shows no errors. How can that be possible? So, how does routed networking in Xen work in detail, and how can we solve this?

Kind regards, Florian

**********************************************************************************************
IMPORTANT: The contents of this email and any attachments are confidential. They are intended for the named recipient(s) only. If you have received this email in error, please notify the system manager or the sender immediately and do not disclose the contents to anyone or make copies thereof. *** eSafe scanned this email for viruses, vandals, and malicious content. ***
**********************************************************************************************
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
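[Editorial aside: to put those counters in perspective, here is a quick back-of-the-envelope sketch. The rx_error_pct helper is hypothetical, not something from the thread; it just relates the RX error count to the total inbound frames, since ifconfig counts errored frames separately from good packets.]

```shell
# Hypothetical helper: percentage of inbound frames that errored,
# given ifconfig-style RX counters.
rx_error_pct() {
  # $1 = RX packets (good), $2 = RX errors
  awk -v p="$1" -v e="$2" 'BEGIN { printf "%.2f\n", 100 * e / (p + e) }'
}

# Florian's eth0 counters from the post above:
rx_error_pct 250388860 8180755   # prints 3.16 -- over 3% of inbound frames errored
```

Anything above a small fraction of a percent on a healthy wired link is worth investigating, which is why the replies below focus on cabling and duplex.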
Luke S Crawford
2009-Jun-03 08:16 UTC
Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
"Rustedt, Florian" <Florian.Rustedt@smartnet.de> writes:

> eth0      Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
>           inet addr:212.XX.XX.XX  Bcast:212.XX.XX.255  Mask:255.255.255.0
>           inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/64 Scope:Link
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:250388860 errors:8180755 dropped:131050 overruns:0 frame:22483
>           TX packets:183684213 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:14726679 txqueuelen:1000
>           RX bytes:231736594230 (215.8 GiB)  TX bytes:43873538477 (40.8 GiB)
>           Interrupt:20 Memory:fd7f0000-fd800000
>
> Very strange are first the massive collisions, as it is routed and second, that sending has no errors? How can that be possible?

I'd look at netstat -i. My first guess would be that you've got a hardware problem: since TX and RX run on different wire pairs, you could have bad wires on the RX pair and good wires on the TX pair. I see that all the time with non-Xen hosts. Cat6 is fairly robust, but it's not bulletproof. And sometimes you get an "A+" data center guy making your network cables who thinks it doesn't matter what order he puts the wires in, so long as it is the same on both ends and it passes the test on his $20-at-Fry's "ethernet tester".

Also, verify you are in full duplex on both the switch and the host; duplex mismatches will kill you. Auto-negotiation is pretty good these days, and it's been a while since I've seen a duplex mismatch, but especially with old equipment it isn't always perfect. You shouldn't have collisions at all on full-duplex links, even if they are bridged, so something is very wrong there. (You can check duplex with ethtool.)

Here is similar output from a host I have (this is xen-bridge, so peth0 is the physical device rather than eth0; if I'm not wrong, eth0 is the physical device when using network-route).
Now, I get a number of dropped RX packets, but I suspect that is because this is one of the boxes I set up before I believed the 'dedicate a core to the dom0' advice.

[lsc@lion ~]$ uptime
 01:08:44 up 241 days, 23:21, 2 users, load average: 0.02, 0.01, 0.00
[lsc@lion ~]$ /sbin/ifconfig peth0
peth0     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:10386801790 errors:0 dropped:42690346 overruns:0 frame:0
          TX packets:11083748615 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1158571105279 (1.0 TiB)  TX bytes:1845366972026 (1.6 TiB)
          Memory:e8180000-e81a0000

Note that the box has been up for two thirds of a year and errors and collisions are both 0; this is because the physical switch it is on really is a switch, and the connection is full duplex. Of course, the Linux bridge that Xen uses is also a full-duplex connection.

[lsc@lion ~]$ sudo /sbin/ethtool peth0
Settings for peth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes
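[Editorial aside: a useful follow-up to Luke's advice is to check whether the error counters are still climbing, or are ancient history from those 47 days of uptime. A minimal sketch — the err_rate helper is hypothetical, and the /sys/class/net path assumes a reasonably modern Linux where the kernel exposes per-interface statistics via sysfs:]

```shell
# Hypothetical helper: counter growth per second between two samples.
err_rate() {
  # $1 = earlier count, $2 = later count, $3 = seconds between samples
  echo $(( ($2 - $1) / $3 ))
}

# On a live box (Linux sysfs; interface name is an assumption):
#   old=$(cat /sys/class/net/eth0/statistics/rx_errors)
#   sleep 10
#   new=$(cat /sys/class/net/eth0/statistics/rx_errors)
#   err_rate "$old" "$new" 10    # RX errors per second over that window
err_rate 100 700 10   # prints 60
```

A rate near zero means the damage happened earlier (e.g. before a duplex renegotiation); a steady nonzero rate points at a live problem such as bad cabling.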
Rustedt, Florian
2009-Jun-03 08:28 UTC
Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
> -----Original Message-----
> From: Luke S Crawford [mailto:lsc@prgmr.com]
> Sent: Wednesday, 3 June 2009 10:17
> To: Rustedt, Florian
> Cc: Xen-users@lists.xensource.com
> Subject: Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??

> I'd look at netstat -i. My first guess would be that you've
> got a hardware problem

OK, I will do a failover to the second Xen host. Statistically it is nearly impossible that I get the same errors there too if it is hardware-based, right?

> uh, also verify you are in full-duplex [...]

We've checked this already; the link is synced at full duplex, 100Mb/s.

> [...] 'dedicate a core to the dom0' advice.

I didn't know about this. Why? I've got eight cores per dom0, so do I have to set one aside just for dom0 handling? What would be the advantage?
Luke S Crawford
2009-Jun-03 08:57 UTC
Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
"Rustedt, Florian" <Florian.Rustedt@smartnet.de> writes:

> > I'd look at netstat -i. My first guess would be that you've
> > got a hardware problem
>
> OK, I will do a failover to the second Xen host. Statistically it is nearly impossible that I get the same errors there too if it is hardware-based, right?

Well, not if you have an "A+" data center tech wiring down your patch panel. Really, I shouldn't make fun of people for not being able to wire down a 110 patch panel well; it's not easy. Oftentimes I choose a rat's nest of (carefully labeled on both ends) patch cables instead of facing the nightmare of wiring up a 110 patch panel, finding someone to loan me a real ($3000) ethernet cable tester, and then re-wiring all the connections that are just a little bit off. But my point is that, yes, it is possible to have two sets of hardware that are bad in the same way. I would just swap out the ethernet cable (use a pre-made, brand-new cable, and skip any patch panels, to be sure).

> > [...] 'dedicate a core to the dom0' advice.
>
> I didn't know about this. Why? I've got eight cores per dom0, so do I have to set one aside just for dom0 handling? What would be the advantage?

If all the CPUs are busy when you try to send or receive a packet, that's trouble: on the order of 60ms of lag, depending on what the weights are. (I could be way off on that number, but I seem to remember that guests get 60ms timeslices, and 60ms is a long time. In any case, each domU gets a timeslice that isn't immediately interrupted when the dom0 receives a packet.
In fact, the next available timeslice is handed out to whichever domain that wants it has recently used the least CPU, and that might not be the dom0. You can fix that part by setting the dom0 weight really high.)

If you dedicate a core to the dom0 (set cpus='1-7' or so in the domU configs, and dom0-cpus 1 in xend-config.sxp), then the dom0 can run and push packets around at the same time as the domUs are running. (Oh, and make sure you set vcpus in each domU to 7 or less if you do this; running a domain with more vcpus than it has physical CPUs available is seriously bad.)

Me, I run a whole lot of domUs per dom0, so I give each domU only one vcpu: seven of them can run at any one time while the dom0 is busy pushing packets and disk bits. It helps a whole lot once you start trying to scale to more guests than you have physical CPUs. The thing is, a slightly slower system that doesn't get a lot slower is usually better than a system that is much faster during the best of times and much slower during the worst. Only giving guests access to one CPU at a time slows down the top end, sure, but being able to run seven at once sure helps keep the system responsive when several users decide to benchmark you at the same time.

At the very least you should weight your dom0 higher; that's what I did on the 4-core boxes I used to have (like the one in my example):

xm sched-credit -d 0 60000

I'd put that in my /etc/rc.local. Without it, you get too many heavy domU users and disk and network grind to a halt. With it, well, I still saw the dropped packets you saw in my example, but it seemed to work OK otherwise.
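[Editorial aside: Luke's "dedicate a core plus raise the weight" recipe, gathered in one place. This is a sketch under the assumptions in his mail — Xen 3.x with xend and the credit scheduler; the config file paths are the conventional ones, not quoted from the thread:]

```shell
# 1. In /etc/xen/xend-config.sxp, pin dom0 to a single vcpu:
#      (dom0-cpus 1)
#
# 2. In each domU config file, keep guests off physical CPU 0
#    and give each guest a single vcpu:
#      cpus  = '1-7'
#      vcpus = 1
#
# 3. Weight dom0 heavily in the credit scheduler so it can always
#    run to push packets and disk I/O; e.g. from /etc/rc.local:
xm sched-credit -d 0 60000
```

With this layout, dom0 always has a free core for backend I/O, and the high weight means it wins the scheduler whenever it is runnable, so domU network latency no longer depends on guest CPU load.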