Axel Thimm
2007-Apr-07 15:45 UTC
[Xen-users] xen bonding and network performance dropping to ~ 0.1%
Hi,

after some patching of the xen scripts (to properly `migrate' the slaves over from bond0 to pbond0), I have xen and bonding working. But only for 20-25 seconds. After that the network throughput suddenly falls from, for example, 110MB/sec to 70-120KB/sec, i.e. by about a factor of a thousand. Stopping the network bridge restores the throughput, but only for another 0.5-1 minute before it drops again.

Does that ring a bell? What could the troublemaker be, and why does it appear with such a long delay? There is no hint in the logs as to why the performance drops so dramatically.

Thanks!
--
Axel.Thimm at ATrpms.net
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
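For scale, the numbers in the report above can be checked with a few lines of Python. This is a minimal sketch that assumes decimal units (1 MB = 1000 KB); the message does not say which units are meant, and `drop_stats` is a hypothetical helper.

```python
def drop_stats(before_kb_s: float, after_kb_s: float) -> tuple[float, float]:
    """Return (remaining throughput in percent, slowdown factor)."""
    percent = 100.0 * after_kb_s / before_kb_s
    factor = before_kb_s / after_kb_s
    return percent, factor


# Assumption: decimal units, so 110 MB/s == 110,000 KB/s.
BEFORE_KB_S = 110 * 1000

for after in (70, 120):  # the reported range of 70-120 KB/s
    pct, fac = drop_stats(BEFORE_KB_S, after)
    print(f"{after} KB/s: {pct:.3f}% of the original rate, ~{fac:.0f}x slower")
# 70 KB/s: 0.064% of the original rate, ~1571x slower
# 120 KB/s: 0.109% of the original rate, ~917x slower
```

Either end of the range lands near 0.1% of the original rate, which matches the subject line and the "factor of a thousand" estimate.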
Axel Thimm
2007-Apr-07 18:27 UTC
[Xen-users] Re: xen bonding and network performance dropping to ~ 0.1%
On Sat, Apr 07, 2007 at 05:45:24PM +0200, Axel Thimm wrote:
> Does that ring a bell? What can be the troublemaker and why does it
> appear with such a great delay?

I checked where the packets get dropped by watching ICMP traffic on

 o eth0, eth1   the two slaves
 o pbond0       the physical bond of these two
 o xenbr0
 o bond0, aka veth0

While the network works well, the ICMP requests/replies can be seen on all interfaces [1]. When the network breaks down to below 1% of its bandwidth, I can see the external ICMP requests reaching as far as xenbr0, but the virtual interface bond0 no longer sees them.

So it looks like the bridge is dropping the packets, even after they have entered the bridge through the bonded device. This makes it even more mysterious: if the issue were between bonding and bridging, I would expect the packets to be dropped on the incoming side of the bridge.

[1] Incoming traffic is generally not captured on the slaves, so eth0 and eth1 did not show the ICMP requests, only the outgoing ICMP replies.
--
Axel.Thimm at ATrpms.net
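The capture-and-compare procedure above amounts to walking the receive path and finding the first interface where the request no longer shows up. A minimal sketch, assuming you have already run a per-interface capture (e.g. `tcpdump -n -i xenbr0 icmp`) and noted where the requests appeared; `find_drop_point` is a hypothetical helper, not part of any Xen tooling, and the path lists only the interfaces captured in the message.

```python
from typing import Optional


def find_drop_point(path: list[str], seen: set[str]) -> Optional[tuple[Optional[str], str]]:
    """Walk the receive path (outermost interface first) and return
    (last interface that saw the packet, first one that did not),
    or None if the packet was seen on every hop."""
    last_seen: Optional[str] = None
    for iface in path:
        if iface not in seen:
            return last_seen, iface
        last_seen = iface
    return None


# eth0/eth1 are left out: per footnote [1], incoming requests are not
# captured on the slaves, so their absence there says nothing.
RX_PATH = ["pbond0", "xenbr0", "bond0"]

print(find_drop_point(RX_PATH, set(RX_PATH)))          # None: healthy network
print(find_drop_point(RX_PATH, {"pbond0", "xenbr0"}))  # ('xenbr0', 'bond0')
```

The second call reproduces the reasoning in the message: the request survives into xenbr0 and is lost before bond0, which points at the bridge rather than at the bond.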
Igor Chubin
2007-Apr-08 06:45 UTC
Re: [Xen-users] Re: xen bonding and network performance dropping to ~ 0.1%
On Sa, Apr 07, 2007 at 08:27:06 +0200, Axel Thimm wrote:
> So it looks like the bridge is leaking the packets, even after the
> packets have passed into the bridge through the bonded device.

Are you aware of this issue?
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=753

Maybe your problem is related to this?
--
WBR, i.m.chubin
Axel Thimm
2007-Apr-08 08:24 UTC
[Xen-users] Re: xen bonding and network performance dropping to ~ 0.1%
On Sun, Apr 08, 2007 at 09:45:18AM +0300, Igor Chubin wrote:
> Are you aware of this issue?
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=753
>
> May be your problem is related to this?

Yes, I've seen this report. While the VLAN parts and the oops are not relevant to my case, the explanation in
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=753#c12
seems to match. However, the suggested solution is to apply two patches that have already been included in kernels >= 2.6.17, and I see this on kernels 2.6.18 (RHEL5) and 2.6.20 (FC6).
--
Axel.Thimm at ATrpms.net
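Axel's argument is a version comparison: if the fixes were merged in 2.6.17, any 2.6.18 or 2.6.20 kernel should already contain them. A minimal sketch of that check follows; the helper names are hypothetical and the example release strings are illustrative, not taken from the thread. It only compares mainline version numbers, so a distribution or Xen-patched kernel that was backported or patched differently could still behave otherwise.

```python
import re


def kernel_version(release: str) -> tuple[int, ...]:
    """Leading numeric part of a `uname -r` style string,
    e.g. '2.6.18-8.el5xen' -> (2, 6, 18)."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_upstream_fix(release: str, fixed_in: tuple[int, ...] = (2, 6, 17)) -> bool:
    """True if the mainline version is at least `fixed_in` (tuples compare
    element by element, so (2, 6, 18) >= (2, 6, 17))."""
    return kernel_version(release) >= fixed_in


# Illustrative release strings, not taken from the thread.
for rel in ("2.6.16.33-xen", "2.6.18-8.el5xen", "2.6.20-1.fc6xen"):
    print(rel, has_upstream_fix(rel))
```

Parsing into integer tuples avoids the classic string-comparison trap where "2.6.9" would sort after "2.6.17".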
Igor Chubin
2007-Apr-08 09:04 UTC
Re: [Xen-users] Re: xen bonding and network performance dropping to ~ 0.1%
...
> the suggested solution is to apply two patches that have
> already been applied in kernels >= 2.6.17 and I see this on kernels
> 2.6.18 (RHEL5) and 2.6.20 (FC6).

I'm experiencing a similar problem, but I don't use interface bonding at all! In my installation I use 802.1Q VLANs. (I also use 2.6.18, but on Debian, not FC; my Xen is 3.0.4, from the Debian package.)

Here is my problem. If I don't use xenbr0 (Figure 1), everything works fine.

        eth0                       .---- eth0    VLAN1 [192.168.1.1]
    ==============================<----- eth0.2  VLAN2 [192.168.2.1]
     (tagged)                      .____ eth0.3  VLAN3 [192.168.3.1]

    Figure 1.

        peth0   +------+   eth0    .---- eth0    VLAN1 [192.168.1.1]
    ============|xenbr0|==========<----- eth0.2  VLAN2 [192.168.2.1]
     (tagged)   +------+           .____ eth0.3  VLAN3 [192.168.3.1]

    Figure 2.

But when I add the bridge using the network-bridge script (Figure 2), from the external network I can reach only the untagged interface (192.168.1.1).

If I add an ARP entry with the IP and MAC of the external host I want to ping to this host's ARP table,

    arp -s 192.168.2.2 00:11:22:33:44:55

I can ping the external host. I have looked at the interfaces peth0, xenbr0, eth0 and its subinterface eth0.2 using tcpdump and noticed that ARP requests can be seen on all of them, but ARP replies appear on peth0 and xenbr0 only.

I have faced the problem on only one of my Xen installations; on the rest everything works fine.

The question to all: has anybody seen a similar problem, and what would you advise? Can anybody say whether the described problem may be related to the network card I use, in particular to 802.1Q hardware acceleration or something like that?
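If ARP replies reach peth0 and xenbr0 but never eth0.2, the dom0 entry for 192.168.2.2 would presumably stay unresolved, which would explain why a static `arp -s` entry works around it. A minimal sketch for checking that from the kernel's ARP table in `/proc/net/arp`; the helper and the sample table below are hypothetical, and the flag values are the Linux `ATF_COM` (0x02) and `ATF_PERM` (0x04) constants from `<linux/if_arp.h>`.

```python
ATF_COM = 0x02   # entry complete: a MAC address has been resolved
ATF_PERM = 0x04  # permanent entry, e.g. one created with `arp -s`


def arp_state(proc_net_arp: str, ip: str) -> str:
    """Classify the entry for `ip` in the text of /proc/net/arp."""
    for line in proc_net_arp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 6 or fields[0] != ip:
            continue
        flags = int(fields[2], 16)
        mac, dev = fields[3], fields[5]
        if flags & ATF_PERM:
            return f"static: {mac} on {dev}"
        if flags & ATF_COM:
            return f"resolved: {mac} on {dev}"
        return f"incomplete on {dev}"
    return "absent"


# Illustrative table contents (not captured from the poster's machine).
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.2.2      0x1         0x0         00:00:00:00:00:00     *        eth0.2
192.168.1.7      0x1         0x2         00:aa:bb:cc:dd:ee     *        eth0
"""

print(arp_state(SAMPLE, "192.168.2.2"))  # incomplete on eth0.2
print(arp_state(SAMPLE, "192.168.1.7"))  # resolved: 00:aa:bb:cc:dd:ee on eth0
```

On a live dom0 you would pass the contents of `/proc/net/arp` instead of the sample; an entry added with `arp -s` carries both flags (0x6) and is reported as static.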
Axel, I understand that the problem is not the same as yours, but maybe it gives you a clue.
--
WBR, i.m.chubin