I've just experienced a rather strange one ...

I was trying to create a new DomU using debootstrap as I've done before - and everything stopped, quickly followed by a voice from downstairs asking why the internet had stopped! This first happened yesterday, and assuming I'd perhaps managed some form of resource exhaustion I rebooted the system.

Today I tried again (but without the "when will it be working?" from downstairs) and found this.

If I do a ping from my laptop to a DomU, then the responses just stop a short while after I hit the network from Dom0 (it only takes a few MB). If I stop the traffic and wait, then it all comes back again after about 5 minutes:

64 bytes from 192.168.0.34: icmp_seq=465 ttl=64 time=0.290 ms
64 bytes from 192.168.0.34: icmp_seq=466 ttl=64 time=0.292 ms
64 bytes from 192.168.0.34: icmp_seq=467 ttl=64 time=0.338 ms
64 bytes from 192.168.0.34: icmp_seq=468 ttl=64 time=297024.510 ms
64 bytes from 192.168.0.34: icmp_seq=469 ttl=64 time=296024.293 ms
64 bytes from 192.168.0.34: icmp_seq=470 ttl=64 time=295024.138 ms

Note that packets don't seem to have been lost, just queued up.

"brctl showmacs eth0" shows MAC addresses aging in the bridge.

Dom0 has one bridge on the inside network, internet access is via a router running in a DomU with external ethernet interface made available to it via pciback.

Dom0 is Debian Etch, linux-image-2.6.18-6-xen-amd64 2.6.18.dfsg.1-23etch1
running on xen-hypervisor-3.2-1-amd64 3.2.1-2~bpo4+1

The guests are running Lenny.

Nothing appears to be logged when the problem happens. Any ideas?

-- 
Simon Hobson

Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed
author Gladys Hobson. Novels - poetry - short stories - ideal as
Christmas stocking fillers. Some available as e-books.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
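The round-trip times in the ping output above already hint that the replies were queued rather than lost. A small sketch, assuming ping was sending one request per second (its usual default, though the interval isn't stated in the message), estimates when each reply arrived by adding the send time implied by the sequence number to the round-trip time. The names `arrival_times` and `interval_s` are illustrative only.

```python
import re

# Ping lines copied from the message above.
PING_OUTPUT = """\
64 bytes from 192.168.0.34: icmp_seq=465 ttl=64 time=0.290 ms
64 bytes from 192.168.0.34: icmp_seq=466 ttl=64 time=0.292 ms
64 bytes from 192.168.0.34: icmp_seq=467 ttl=64 time=0.338 ms
64 bytes from 192.168.0.34: icmp_seq=468 ttl=64 time=297024.510 ms
64 bytes from 192.168.0.34: icmp_seq=469 ttl=64 time=296024.293 ms
64 bytes from 192.168.0.34: icmp_seq=470 ttl=64 time=295024.138 ms
"""

LINE_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def arrival_times(text, interval_s=1.0):
    """Return a list of (seq, estimated arrival time in seconds).

    Assumption: request number N was sent at N * interval_s seconds,
    so its reply arrived at that send time plus the round-trip time.
    """
    results = []
    for match in LINE_RE.finditer(text):
        seq = int(match.group(1))
        rtt_s = float(match.group(2)) / 1000.0  # ping reports milliseconds
        results.append((seq, seq * interval_s + rtt_s))
    return results


if __name__ == "__main__":
    for seq, arrival in arrival_times(PING_OUTPUT):
        print(f"seq={seq} reply arrived at about t={arrival:.3f} s")
```

Under that one-second assumption, replies 468 to 470 all arrive within a millisecond of each other, roughly 297 seconds after request 468 was sent. That matches the "about 5 minutes" stall and is consistent with a queue being flushed in one go.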
On 11/10/09, I wrote:
>I've just experienced a rather strange one ...
>
>I was trying to create a new DomU using debootstrap as I've done
>before - and everything stopped, quickly followed by a voice from
>downstairs asking why the internet had stopped ! This first happened
>yesterday, and assuming I'd perhaps managed some form of resource
>exhaustion I rebooted the system.
>
>Today I tried again (but without the "when will it be working ?"
>from downstairs) and found this.
>
>If I do a ping from my laptop to a DomU, then the responses just
>stop a short while after I hit the network from Dom0 (it only takes
>a few MB). If I stop the traffic and wait, then it all comes back
>again after about 5 minutes :
>
>64 bytes from 192.168.0.34: icmp_seq=465 ttl=64 time=0.290 ms
>64 bytes from 192.168.0.34: icmp_seq=466 ttl=64 time=0.292 ms
>64 bytes from 192.168.0.34: icmp_seq=467 ttl=64 time=0.338 ms
>64 bytes from 192.168.0.34: icmp_seq=468 ttl=64 time=297024.510 ms
>64 bytes from 192.168.0.34: icmp_seq=469 ttl=64 time=296024.293 ms
>64 bytes from 192.168.0.34: icmp_seq=470 ttl=64 time=295024.138 ms
>Note that packets don't seem to have been lost, just queued up.
>
>"brctl showmacs eth0" shows MAC addresses aging in the bridge.
>
>Dom0 has one bridge on the inside network, internet access is via a
>router running in a DomU with external ethernet interface made
>available to it via pciback.
>
>Dom0 is Debian Etch, linux-image-2.6.18-6-xen-amd64 2.6.18.dfsg.1-23etch1
>running on xen-hypervisor-3.2-1-amd64 3.2.1-2~bpo4+1
>
>The guests are running Lenny.
>
>Nothing appears to be logged when the problem happens. Any ideas ?

I've now hit this one at work. In this case, it's a four core Xeon - and I've tried pinning Dom-0 to core 0, and the guests to cores 1-3.

I've further found that if I kill the transfer then networking returns instantly.
So I can run a network intensive transfer (rsync) while watching the pings - the pings stop, I hit ctrl-c, the pings recover with no dropped packets.

Dom-0 is running Debian Lenny linux-image-2.6.18-6-xen-686 (2.6.18.dfsg.1-22) and xen-hypervisor-3.2-1-i386 (3.2.1-2)

Dom-U is running Lenny and the same kernel. It's also storing the data on an iSCSI volume (iSCSI handled by Dom0 and exported to DomU).

However, it's not just traffic from Dom0 to DomU that does it. I've got a big transfer going from another machine copying large amounts of data to the same DomU - and ping is showing occasional dropped packets.

So, anyone got hints on what's causing it, and more importantly, how to fix it?

-- 
Simon Hobson

WANTED: "Software CD ROM Kit" for Canon CLBP 360-PS printer (Canon part no RH6-3612, or possibly RH6-3810, or RH6-3610 might do). I've a dead HD and need this CD so I can replace the disk and re-install the printer OS on it. If anyone knows where I might get hold of one I'd be grateful - requests to Canon drew a blank, it's been out of support for years. Alternatively, if anyone has one of these and would let me image their hard disk ...
On Mon, Feb 1, 2010 at 6:37 PM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> However, it's not just traffic from Dom0 to DomU that does it. I've got a
> big transfer going from another machine copying large amounts of data to the
> same DomU - and ping is showing occasional dropped packets.
>
> So, anyone got hints on what's causing it, and more importantly, how to fix
> it ?

Do you have a traffic limit somewhere? It sounds like the switch your physical machine is connected to has port storm-control or similar. I'd suggest testing with a crossover cable from server <-> laptop and using iperf (or another packet generator) to isolate the problem.

-- 
Fajar
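iperf is the proper tool for the test suggested above. Where it isn't to hand, the underlying idea (push a known amount of data over TCP and time it) can be sketched with nothing but the Python standard library. This is a stand-in written for illustration, not iperf and not equivalent to it; the function names, the 4 MiB transfer size and the use of the loopback address are all assumptions. To be useful for isolating a link, the sink and the sender would have to run on the two machines being tested rather than on one host as here.

```python
import socket
import threading
import time


def run_sink(server_sock, result):
    """Accept one connection and count the bytes received until EOF."""
    conn, _addr = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # empty read: the sender closed the connection
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(host="127.0.0.1", total_bytes=4 * 1024 * 1024):
    """Send total_bytes over TCP to a local sink.

    Returns (bytes_received, rate in MB/s). Both ends run in this
    process, so this exercises only the loopback path.
    """
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((host, 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    result = {}
    sink = threading.Thread(target=run_sink, args=(server_sock, result))
    sink.start()

    payload = b"\0" * 65536
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    sink.join()  # wait until the sink has drained everything
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid division by zero
    server_sock.close()
    return result["bytes"], result["bytes"] / elapsed / 1e6


if __name__ == "__main__":
    received, rate = measure_throughput()
    print(f"received {received} bytes at about {rate:.1f} MB/s")
```

Running a ping alongside such a transfer, as described earlier in the thread, then shows whether latency climbs or replies stall while the data is flowing.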
Fajar A. Nugraha wrote:
>On Mon, Feb 1, 2010 at 6:37 PM, Simon Hobson <linux@thehobsons.co.uk> wrote:
>> However, it's not just traffic from Dom0 to DomU that does it. I've got a
>> big transfer going from another machine copying large amounts of data to the
>> same DomU - and ping is showing occasional dropped packets.
>>
>> So, anyone got hints on what's causing it, and more importantly, how to fix
>> it ?
>
>Do you have traffic limit somewhere? It sounds like the switch your
>physical machine connected to has port storm-control or similar. I'd
>suggest testing using crossover cable from server <-> laptop and use
>iperf (or other packet generators) to isolate the problem.

It's definitely not that - when not transferring data between Dom0 and DomU on the same machine, I can shovel data around between DomUs and external partners with no problem.

The problem ONLY happens when Dom0 is exchanging data with a DomU it hosts. It's completely repeatable - start a large transfer and wait; when the traffic stops I can hit ctrl-C and the network comes back to life.

So passing network traffic about is fine as long as Dom0 isn't involved at anything (other than a low rate) with one of its own DomUs - I had to limit my rsync transfer to 25kbyte/s and even then it could cause a problem if there was other traffic going on.

Given bits I've picked up before, I'm guessing that there's a problem in the network handling process - one thread handling all traffic? Wild stab in the dark, but it does sound a bit like some sort of mutual blocking going on - and stopping one partner then frees the other one up.

Is this something I should report as a bug? What's the right way to do that? Unfortunately, I don't have another non-production machine available at the moment so it's a bit difficult trying the later version from Debian Squeeze. I'll have a rummage around tomorrow, but I don't think anything has enough RAM.
-- 
Simon Hobson
On Tue, Feb 2, 2010 at 3:06 AM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> Fajar A. Nugraha wrote:
>> Do you have traffic limit somewhere? It sounds like the switch your
>> physical machine connected to has port storm-control or similar.
>
> It's definitely not that - when not transferring data between Dom0 and DomU
> on the same machine, I can shovel data around between DomUs and external
> partners with no problem.
>
> The problem ONLY happens when Dom0 is exchanging data with a DomU it hosts.
> It's completely repeatable - start large transfer and wait, when the traffic
> stops I can hit ctrl-C and the network comes back to life.

Then it looks like a serious bug.

> Is this something I should report as a bug ? What's the right way to do that
> ? Unfortunately, I don't have another non-production machine available at
> the moment so it's a bit difficult trying the later version from Debian
> Squeeze. I'll have a rummage around tomorrow, but I don't think anything has
> enough RAM.

The thing is you're using a (very) old version of kernel-xen, and the problem is probably fixed in current versions. I'm using RHEL5 with its built-in kernel-xen and don't have the problem you mentioned.

I'd suggest you start by testing the latest 2.6.18 kernel from xen.org, or (if you're a bit lazy) grab a copy of RHEL5's kernel-xen rpm, extract it somewhere, and copy the contents of /boot and /lib to your system.

-- 
Fajar
Fajar A. Nugraha wrote:
>The thing is you're using a (very) old version of kernel-xen, and the
>problem is probably fixed in current versions.
...
>I'd suggest you start with testing latest 2.6.18 kernel from xen.org,
>or (if you're a bit lazy) grab a copy of RHEL5's kernel-xen rpm,
>extract it somewhere, and copy the contents of /boot and /lib to your
>system.

It's taken me this long to squeeze up the disk usage and move a customer facing web server to another host.

I installed linux-image-2.6.26-2-xen-686 today (it was the easiest step for me, just apt-get the package), and so far it appears that the problem is significantly reduced. I've been doing a transfer, while also running wireshark in an SSH session on the DomU - it's taken a couple of minutes for the session to spit all the text out after the transfer finished.

So I think there is significant buffering going on, but it doesn't completely kill stuff like it was (pings just go up to a few ms, instead of stopping altogether) - so it will do for now.

-- 
Simon Hobson
On Wed, Feb 10, 2010 at 8:43 PM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> Fajar A. Nugraha wrote:
> I installed linux-image-2.6.26-2-xen-686

... which is also known to have some bugs. Search the list archive :P

> So I think there is significant
> buffering going on, but it doesn't completely kill stuff like it was (pings
> just go up to a few ms, instead of stopping altogether) - so it will do for
> now.

Oh well, ultimately that's what matters most, isn't it: whether you can live with it or not. Glad to hear you got it working.

-- 
Fajar