Rustedt, Florian
2009-Jun-03 07:26 UTC
[Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
Hello List,

Because of lost packets, we began to do some research on the networking in Xen. Our Xen 3.3.0 host has a handful of domUs with routed networking and Linux in paravirtual configuration. Now we see a lot of packet loss on the virtual interfaces in the domUs, and in the meantime we believe that this is caused by the errors on the host interface (uptime 47 days):

eth0      Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
          inet addr:212.XX.XX.XX  Bcast:212.XX.XX.255  Mask:255.255.255.0
          inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:250388860 errors:8180755 dropped:131050 overruns:0 frame:22483
          TX packets:183684213 errors:0 dropped:0 overruns:0 carrier:0
          collisions:14726679 txqueuelen:1000
          RX bytes:231736594230 (215.8 GiB)  TX bytes:43873538477 (40.8 GiB)
          Interrupt:20 Memory:fd7f0000-fd800000

Very strange are, first, the massive collisions, as the setup is routed, and second, that sending shows no errors. How can that be possible? So, how does routed networking in Xen work in detail, and how can we solve this?

Kind regards, Florian

**********************************************************************************************
IMPORTANT: The contents of this email and any attachments are confidential. They are intended for the named recipient(s) only. If you have received this email in error, please notify the system manager or the sender immediately and do not disclose the contents to anyone or make copies thereof. *** eSafe scanned this email for viruses, vandals, and malicious content. ***
**********************************************************************************************
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
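[Editorial aside: to put those counters in perspective, here is a quick back-of-the-envelope sketch. The rx_error_pct helper is hypothetical, not something from the thread; it just relates the RX error count to the total inbound frames, since ifconfig counts errored frames separately from good packets.]

```shell
# Hypothetical helper: percentage of inbound frames that errored,
# given ifconfig-style RX counters.
rx_error_pct() {
  # $1 = RX packets (good), $2 = RX errors
  awk -v p="$1" -v e="$2" 'BEGIN { printf "%.2f\n", 100 * e / (p + e) }'
}

# Florian's eth0 counters from the post above:
rx_error_pct 250388860 8180755   # prints 3.16 -- over 3% of inbound frames errored
```

Anything above a small fraction of a percent on a healthy wired link is worth investigating, which is why the replies below focus on cabling and duplex.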
Luke S Crawford
2009-Jun-03 08:16 UTC
Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
"Rustedt, Florian" <Florian.Rustedt@smartnet.de> writes:

> eth0      Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
>           inet addr:212.XX.XX.XX  Bcast:212.XX.XX.255  Mask:255.255.255.0
>           inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/64 Scope:Link
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:250388860 errors:8180755 dropped:131050 overruns:0 frame:22483
>           TX packets:183684213 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:14726679 txqueuelen:1000
>           RX bytes:231736594230 (215.8 GiB)  TX bytes:43873538477 (40.8 GiB)
>           Interrupt:20 Memory:fd7f0000-fd800000
>
> Very strange are first the massive collisions, as it is routed and second, that sending has no errors? How can that be possible?

I'd look at netstat -i. My first guess would be that you've got a hardware problem: since TX and RX run on different wire pairs, you could have bad wires on the RX pair and good wires on the TX pair. I see that all the time with non-Xen hosts. Cat6 is fairly robust, but it's not bulletproof. And sometimes you get an "A+" data center guy making your network cables who thinks it doesn't matter what order he puts the wires in, so long as it is the same on both ends and it passes the test on his $20-at-Fry's "ethernet tester".

Also, verify you are in full duplex on both the switch and the host; duplex mismatches will kill you. Auto-negotiation is pretty good these days, and it's been a while since I've seen a duplex mismatch, but especially with old equipment it isn't always perfect. You shouldn't have collisions at all on full-duplex links, even if they are bridged, so something is very wrong there. (You can check duplex with ethtool.)

Here is similar output from a host I have (this is xen-bridge, so peth0 is the physical device rather than eth0; if I'm not wrong, eth0 is the physical device when using network-route).
Now, I get a number of dropped RX packets, but I suspect that is because this is one of the boxes I set up before I believed the 'dedicate a core to the dom0' advice.

[lsc@lion ~]$ uptime
 01:08:44 up 241 days, 23:21, 2 users, load average: 0.02, 0.01, 0.00
[lsc@lion ~]$ /sbin/ifconfig peth0
peth0     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:10386801790 errors:0 dropped:42690346 overruns:0 frame:0
          TX packets:11083748615 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1158571105279 (1.0 TiB)  TX bytes:1845366972026 (1.6 TiB)
          Memory:e8180000-e81a0000

Note that the box has been up for two thirds of a year and errors and collisions are both 0; this is because the physical switch it is on really is a switch, and the connection is full duplex. Of course, the Linux bridge that Xen uses is also a full-duplex connection.

[lsc@lion ~]$ sudo /sbin/ethtool peth0
Settings for peth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes
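[Editorial aside: a useful follow-up to Luke's advice is to check whether the error counters are still climbing, or are ancient history from those 47 days of uptime. A minimal sketch — the err_rate helper is hypothetical, and the /sys/class/net path assumes a reasonably modern Linux where the kernel exposes per-interface statistics via sysfs:]

```shell
# Hypothetical helper: counter growth per second between two samples.
err_rate() {
  # $1 = earlier count, $2 = later count, $3 = seconds between samples
  echo $(( ($2 - $1) / $3 ))
}

# On a live box (Linux sysfs; interface name is an assumption):
#   old=$(cat /sys/class/net/eth0/statistics/rx_errors)
#   sleep 10
#   new=$(cat /sys/class/net/eth0/statistics/rx_errors)
#   err_rate "$old" "$new" 10    # RX errors per second over that window
err_rate 100 700 10   # prints 60
```

A rate near zero means the damage happened earlier (e.g. before a duplex renegotiation); a steady nonzero rate points at a live problem such as bad cabling.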
Rustedt, Florian
2009-Jun-03 08:28 UTC
Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
> -----Original Message-----
> From: Luke S Crawford [mailto:lsc@prgmr.com]
> Sent: Wednesday, 3 June 2009 10:17
> To: Rustedt, Florian
> Cc: Xen-users@lists.xensource.com
> Subject: Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??

> I'd look at netstat -i. My first guess would be that you've
> got a hardware problem

OK, I will do a failover to the second Xen host. Statistically it is nearly impossible that I get the same errors there too if it is hardware-based, right?

> uh, also verify you are in full-duplex [...]

We've checked this already; the link is synced at full duplex, 100Mb/s.

> [...] 'dedicate a core to the dom0' advice.

I didn't know about this. Why? I've got eight cores per dom0, so do I have to set one aside just for dom0 handling? What would be the advantage?
Luke S Crawford
2009-Jun-03 08:57 UTC
Re: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??
"Rustedt, Florian" <Florian.Rustedt@smartnet.de> writes:

> > I'd look at netstat -i. My first guess would be that you've
> > got a hardware problem
>
> OK, I will do a failover to the second Xen host. Statistically it is nearly impossible that I get the same errors there too if it is hardware-based, right?

Well, not if you have an "A+" data center tech wiring down your patch panel. Really, I shouldn't make fun of people for not being able to wire down a 110 patch panel well; it's not easy. Oftentimes I choose a rat's nest of (carefully labeled on both ends) patch cables instead of facing the nightmare of wiring up a 110 patch panel, finding someone to loan me a real ($3000) ethernet cable tester, and then re-wiring all the connections that are just a little bit off. But my point is that, yes, it is possible to have two sets of hardware that are bad in the same way. I would just swap out the ethernet cable (use a pre-made, brand-new cable, and skip any patch panels, to be sure).

> > [...] 'dedicate a core to the dom0' advice.
>
> I didn't know about this. Why? I've got eight cores per dom0, so do I have to set one aside just for dom0 handling? What would be the advantage?

If all the CPUs are busy when you try to send or receive a packet, that's trouble: on the order of 60ms of lag, depending on what the weights are. (I could be way off on that number, but I seem to remember that guests get 60ms timeslices, and 60ms is a long time. In any case, each domU gets a timeslice that isn't immediately interrupted when the dom0 receives a packet.
In fact, the next available timeslice is handed out to whichever domain that wants it has recently used the least CPU, and that might not be the dom0. You can fix that part by setting the dom0 weight really high.)

If you dedicate a core to the dom0 (set cpus='1-7' or so in the domU configs, and dom0-cpus 1 in xend-config.sxp), then the dom0 can run and push packets around at the same time as the domUs are running. (Oh, and make sure you set vcpus in each domU to 7 or less if you do this; running a domain with more vcpus than it has physical CPUs available is seriously bad.)

Me, I run a whole lot of domUs per dom0, so I give each domU only one vcpu: seven of them can run at any one time while the dom0 is busy pushing packets and disk bits. It helps a whole lot once you start trying to scale to more guests than you have physical CPUs. The thing is, a slightly slower system that doesn't get a lot slower is usually better than a system that is much faster during the best of times and much slower during the worst. Only giving guests access to one CPU at a time slows down the top end, sure, but being able to run seven at once sure helps keep the system responsive when several users decide to benchmark you at the same time.

At the very least you should weight your dom0 higher; that's what I did on the 4-core boxes I used to have (like the one in my example):

xm sched-credit -d 0 60000

I'd put that in my /etc/rc.local. Without it, you get too many heavy domU users and disk and network grind to a halt. With it, well, I still saw the dropped packets you saw in my example, but it seemed to work OK otherwise.
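[Editorial aside: Luke's "dedicate a core plus raise the weight" recipe, gathered in one place. This is a sketch under the assumptions in his mail — Xen 3.x with xend and the credit scheduler; the config file paths are the conventional ones, not quoted from the thread:]

```shell
# 1. In /etc/xen/xend-config.sxp, pin dom0 to a single vcpu:
#      (dom0-cpus 1)
#
# 2. In each domU config file, keep guests off physical CPU 0
#    and give each guest a single vcpu:
#      cpus  = '1-7'
#      vcpus = 1
#
# 3. Weight dom0 heavily in the credit scheduler so it can always
#    run to push packets and disk I/O; e.g. from /etc/rc.local:
xm sched-credit -d 0 60000
```

With this layout, dom0 always has a free core for backend I/O, and the high weight means it wins the scheduler whenever it is runnable, so domU network latency no longer depends on guest CPU load.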