Hi,

I'm experiencing random domU freezes. This is similar to Debian bug #534880.

One of the numerous references to it on the web:
http://moblog.wiredwings.com/archives/20090227/Lennys-Xen-Kernel-2.6.26-Causes-DomU-Freezes.html

It is stated there that both 2.6.26 and 2.6.30 with pv_ops freeze on domU.

I've been using Xen 3.2 for a few months without any problems - until now. This seems like a critical bug that bites more and more installations and will surely become a show stopper for migration to Xen at many Linux shops.

My environment:

- Xen 3.2.1
- multiple x86_64 and i386 dom0s and domUs
- machines from different vendors; the hardware has been checked and double-checked
- dom0 kernel: Debian Lenny's xenified 2.6.26 (with OpenSUSE patches)
- domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have the same problems)
- all domUs are SMP (vcpus > 1). This problem doesn't occur with UP domUs (unfortunately, the performance hit from running the domUs with vcpus=1 is unacceptable for my installations)
- no vcpu pinning (by choice) for dom0s or domUs
- the bug seems unrelated to load profiles; some domUs that freeze are almost always idle, some are I/O intensive, pushing 30 MB/s to disks and a few hundred megabits to the network.
The symptoms:

After a few (3-24) hours of runtime, some of the domUs become completely unresponsive:

- the network stack is completely dead
- xm console is unresponsive
- xm vcpu-list always shows one vcpu in no state ("---") and all other vcpus in r state
- xm destroy works and immediately destroys the domU
- nothing useful in xm dmesg, xm log
- mpstat shows less than 10% steal
- I'm waiting for another freeze to check if there's anything useful on domU consoles

I'll try the following options (and post my results to this list):

- vcpu-pinning for dom0 only
- vcpu-pinning for dom0 and domUs
- vcpu-pinning and dedicating a core for the dom0 (however, vcpu-pinning is not a solution for me, as it wastes cores - some domUs sit idle and some wait for their turn)
- downgrading dom0 kernel to xenified 2.6.18
- upgrading the hypervisor to 3.4
- downgrading domU kernel to xenified 2.6.18

--
Leszek "Tygrys" Urbanski, SCSA, SCNA
"Unix-to-Unix Copy Program;" said PDP-1. "You will never find a more wretched hive of bugs and flamers. We must be cautious." -- DECWARS
http://cygnus.moo.pl/ -- Cygnus High Altitude Balloon

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
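The vcpu-list symptom reported above (one vcpu in "---", every other vcpu of the same domU in r state) can be checked for mechanically. What follows is only a minimal sketch, not a tool from this thread: it assumes that `xm vcpu-list` prints whitespace-separated columns in the order Name, ID, VCPU, CPU, State, Time(s), CPU Affinity, and the sample listing, the domain names, and the helper names `vcpu_states` and `suspect_domains` are all made up for illustration.

```python
from collections import defaultdict

# Illustrative sample only: made-up domain names and numbers, laid out in the
# column order assumed here (Name, ID, VCPU, CPU, State, Time(s), CPU Affinity).
SAMPLE = """\
Name                              ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                           0     0     0   r--    9120.4 any cpu
Domain-0                           0     1     1   -b-    8711.9 any cpu
web1                               3     0     2   ---     410.2 any cpu
web1                               3     1     3   r--   51233.7 any cpu
web1                               3     2     1   r--   51190.1 any cpu
db1                                4     0     0   -b-     220.5 any cpu
db1                                4     1     2   -b-     198.3 any cpu
"""


def vcpu_states(listing):
    """Map each domain name to the list of its vcpu state strings."""
    states = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 5 or fields[0] == "Name":
            continue
        # Assumption: the fifth column is the State field.
        states[fields[0]].append(fields[4])
    return dict(states)


def suspect_domains(listing):
    """Return SMP domains with exactly one vcpu in '---' and all others in 'r'."""
    suspects = []
    for name, states in vcpu_states(listing).items():
        stateless = [s for s in states if s == "---"]
        running = [s for s in states if s.startswith("r")]
        if (len(states) > 1 and len(stateless) == 1
                and len(running) == len(states) - 1):
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    print(suspect_domains(SAMPLE))
```

On a dom0, the listing could be taken from the real command instead of `SAMPLE`, for example with `subprocess.run(["xm", "vcpu-list"], capture_output=True, text=True).stdout`. Since the post says the pattern is shown *always* for a frozen domU, a single matching snapshot proves little; repeating the check over several samples would be closer to the described symptom.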
Simon Hobson
2009-Aug-31 13:16 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Leszek Urbanski wrote:
> I'm experiencing random domU freezes.

Can you elaborate?

Is it that a single DomU will 'just stop working' while others carry on? And Xentop reports it as consuming 100% CPU?

If so, then it might be this problem:
http://markmail.org/message/7hghxj6jp55gt26k
http://markmail.org/message/7hghxj6jp55gt26

The description is slightly out, but it certainly worked for me last time it happened.

--
Simon Hobson
Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books.
Leszek Urbanski
2009-Aug-31 13:44 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
In-Reply-To: <a06240808c6c17d2ea5f7@simon.thehobsons.co.uk>; from Simon Hobson on Mon, Aug 31, 2009 at 14:16:39 +0100

> Leszek Urbanski wrote:
>
> > I'm experiencing random domU freezes.
>
> Can you elaborate?
> Is it that a single DomU will 'just stop working' while others carry
> on? And Xentop reports it as consuming 100% CPU?

Simon,

The full description of my problem, including all symptoms, is in my first post in this thread.

It's always a single domU. I don't use xentop, but mpstat would show a high "steal" value on one of the cpus if a domU was using 100%, wouldn't it?

> If so, then it might be this problem:
> http://markmail.org/message/7hghxj6jp55gt26k

That's not it: the guest is not using 100% CPU when frozen and I don't have anything that would write excessively to the console - there are about 5-10 lines beyond the initial log from booting. Also, I don't get any previous output from the buffer when I run "xm console".

> http://markmail.org/message/7hghxj6jp55gt26

This URL gives a 404.

Regards,

--
Leszek "Tygrys" Urbanski, SCSA, SCNA
Simon Hobson
2009-Aug-31 14:06 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Leszek Urbanski wrote:
> > If so, then it might be this problem:
> > http://markmail.org/message/7hghxj6jp55gt26k
>
> That's not it: the guest is not using 100% CPU when frozen and I don't have
> anything that would write excessively to the console - there are about 5-10
> lines beyond the initial log from booting. Also, I don't get any previous
> output from the buffer when I run "xm console".
>
> > http://markmail.org/message/7hghxj6jp55gt26
>
> This URL gives a 404.

It's just two messages further on in the same thread.

--
Simon Hobson
Leszek Urbanski
2009-Sep-02 11:06 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
In-Reply-To: <20090831124358.GA7615@moo.pl>; from Leszek Urbanski on Mon, Aug 31, 2009 at 14:43:58 +0200

> - I'm waiting for another freeze to check if there's anything useful on
>   domU consoles

...there's nothing. The console just freezes.

--
Leszek "Tygrys" Urbanski, SCSA, SCNA
Fajar A. Nugraha
2009-Sep-02 17:02 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
2009/8/31 Leszek Urbanski <tygrys@moo.pl>:
> - domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
>   the same problems)

I had a CPU soft lockup problem with Lenny's Xen kernel. Using a 2.6.29 xenified kernel seems to fix that.

> I'll try the following options (and post my results to this list):
> - downgrading domU kernel to xenified 2.6.18

That one seems to be the most promising, assuming your domU can use 2.6.18. I mostly use RHEL's kernel-xen (2.6.18), which comes with the distro and works fine.

--
Fajar
Leszek Urbanski
2009-Sep-02 19:21 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
In-Reply-To: <7207d96f0909021002j1e99901aw5b253a7bc2df7eef@mail.gmail.com>; from Fajar A. Nugraha on Thu, Sep 03, 2009 at 00:02:12 +0700

> > - domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
> >   the same problems)
>
> I had CPU softlockup problem with Lenny's xen kernel. Using 2.6.29
> xenified kernel seems to fix that.

What bothers me is that the lockups occur with both pv_ops and xenified kernels. That's supposed to be different code. The bug may just as well be in the hypervisor or the dom0 kernel (less likely).

> > - downgrading domU kernel to xenified 2.6.18
>
> That one seems to be most promising. Assuming your domU can use
> 2.6.18. I mostly use RHEL's kernel-xen (2.6.18) which comes with the
> distro and works fine.

That's actually a last resort. We can't stick to 2.6.18 in dom0 or domU forever due to outdated iSCSI support etc. It'd be easier for us to switch to KVM.

--
Leszek "Tygrys" Urbanski, SCSA, SCNA