Hi,
I'm experiencing random domU freezes.
This is similar to Debian bug #534880. One of the numerous references to it
on the web:
http://moblog.wiredwings.com/archives/20090227/Lennys-Xen-Kernel-2.6.26-Causes-DomU-Freezes.html
It is stated there that both 2.6.26 and 2.6.30 with pv_ops freeze on domU.
I've been using Xen 3.2 for a few months without any problems - until now.
This seems like a critical bug that bites more and more installations and
will surely become a show stopper for migration to Xen at many Linux shops.
My environment:
- Xen 3.2.1
- multiple x86_64 and i386 dom0s and domUs
- machines from different vendors, the hardware has been checked and
  double-checked
- dom0 kernel: Debian Lenny's xenified 2.6.26 (with OpenSUSE patches)
- domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
  the same problems)
- all domUs are SMP (vcpus > 1). This problem doesn't occur with UP domUs
  (unfortunately the performance hit from running the domUs with vcpus=1 is
  unacceptable for my installations)
- no vcpu pinning (by choice) for dom0s or domUs
- the bug seems unrelated to load profiles; some domUs that freeze are almost
  always idle, some are I/O intensive, pushing 30 MB/s to disks and a few
  hundred megabits to the network.
The symptoms:
After a few (3-24) hours of runtime, some of the domUs become completely
unresponsive:
- the network stack is completely dead
- xm console is unresponsive
- xm vcpu-list always shows one vcpu in no state ("---") and all other vcpus
  in r state
- xm destroy works and immediately destroys the domU
- nothing useful in xm dmesg, xm log
- mpstat shows less than 10% steal
- I'm waiting for another freeze to check if there's anything useful on
  domU consoles
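The vcpu-list symptom above can be checked for mechanically. What follows is only a minimal sketch, not a tested monitoring tool: it assumes the usual `xm vcpu-list` column order (Name, ID, VCPU, CPU, State, ...), and the domain names and numbers in SAMPLE are made up for illustration. The function name find_suspect_domains is hypothetical.

```python
from collections import defaultdict


def find_suspect_domains(vcpu_list_output):
    """Return domains with exactly one vcpu in '---' and all others in r state."""
    states = defaultdict(list)
    for line in vcpu_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 5 or fields[0] == "Name":
            continue
        # Assumed layout: Name ID VCPU CPU State Time(s) CPU-Affinity
        name, state = fields[0], fields[4]
        states[name].append(state)

    suspects = []
    for name, vcpu_states in states.items():
        stateless = sum(1 for s in vcpu_states if s == "---")
        running = sum(1 for s in vcpu_states if s.startswith("r"))
        # Only SMP domains can match the reported pattern.
        if (len(vcpu_states) > 1 and stateless == 1
                and running == len(vcpu_states) - 1):
            suspects.append(name)
    return suspects


# Illustrative input only; not captured from a real host.
SAMPLE = """\
Name      ID VCPU CPU State Time(s) CPU Affinity
Domain-0   0    0   0   r--  1200.5 any cpu
Domain-0   0    1   1   -b-   900.1 any cpu
web01      3    0   2   r--   300.0 any cpu
web01      3    1   -   ---   280.4 any cpu
db01       4    0   3   -b-   150.2 any cpu
db01       4    1   1   -b-   149.9 any cpu
"""

print(find_suspect_domains(SAMPLE))
```

A single snapshot proves little, since a healthy vcpu can briefly show "---" as well. A domain is only worth flagging if it matches on every sample over a minute or so, which is what "always shows" means in the symptom above.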
I''ll try the following options (and post my results to this list):
- vcpu-pinning for dom0 only
- vcpu-pinning for dom0 and domUs
- vcpu-pinning and dedicating a core for the dom0
(however, vcpu-pinning is not a solution for me, as it wastes cores - some
domUs sit idle and some wait for their turn)
- downgrading dom0 kernel to xenified 2.6.18
- upgrading the hypervisor to 3.4
- downgrading domU kernel to xenified 2.6.18
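For the "dedicate a core to the dom0" variant, the pinning layout could be generated along these lines. This is a rough sketch under stated assumptions: core 0 is reserved for dom0, guest vcpus are spread round-robin over the remaining cores, and the domain names and vcpu counts are hypothetical. It only prints `xm vcpu-pin <domain> <vcpu> <cpus>` command lines and does not run them.

```python
def pinning_commands(total_cores, dom0_vcpus, domus):
    """Build `xm vcpu-pin` command lines that reserve core 0 for dom0.

    domus maps a (hypothetical) domain name to its vcpu count.
    """
    if total_cores < 2:
        raise ValueError("need at least two cores to dedicate one to dom0")

    # Every dom0 vcpu is pinned to core 0.
    commands = [f"xm vcpu-pin Domain-0 {vcpu} 0" for vcpu in range(dom0_vcpus)]

    # Guest vcpus are spread round-robin over cores 1..total_cores-1.
    guest_cores = list(range(1, total_cores))
    next_slot = 0
    for name, vcpus in domus.items():
        for vcpu in range(vcpus):
            core = guest_cores[next_slot % len(guest_cores)]
            commands.append(f"xm vcpu-pin {name} {vcpu} {core}")
            next_slot += 1
    return commands


for cmd in pinning_commands(4, 1, {"web01": 2, "db01": 2}):
    print(cmd)
```

With more guest vcpus than free cores, several vcpus end up sharing a core, which is exactly the waste described in the note above.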
-- 
Leszek "Tygrys" Urbanski, SCSA, SCNA
 "Unix-to-Unix Copy Program;" said PDP-1. "You will never find a more
  wretched hive of bugs and flamers. We must be cautious." -- DECWARS
     http://cygnus.moo.pl/ -- Cygnus High Altitude Balloon
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Simon Hobson
2009-Aug-31  13:16 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Leszek Urbanski wrote:
>I'm experiencing random domU freezes.

Can you elaborate? Is it that a single DomU will 'just stop working' while
others carry on? And Xentop reports it as consuming 100% CPU?

If so, then it might be this problem:
http://markmail.org/message/7hghxj6jp55gt26k
http://markmail.org/message/7hghxj6jp55gt26

The description is slightly out, but it certainly worked for me last time it
happened.

-- 
Simon Hobson

Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed
author Gladys Hobson. Novels - poetry - short stories - ideal as
Christmas stocking fillers. Some available as e-books.
Leszek Urbanski
2009-Aug-31  13:44 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Simon Hobson wrote on Mon, Aug 31, 2009 at 14:16:39 +0100:
> Leszek Urbanski wrote:
> >I'm experiencing random domU freezes.
>
> Can you elaborate?
> Is it that a single DomU will 'just stop working' while others carry
> on? And Xentop reports it as consuming 100% CPU?

Simon,

The full description of my problem, including all symptoms, is in my first
post in this thread. It's always a single domU. I don't use xentop, but
mpstat would show a high "steal" value on one of the CPUs if a domU were
using 100%, wouldn't it?

> If so, then it might be this problem:
> http://markmail.org/message/7hghxj6jp55gt26k

That's not it: the guest is not using 100% CPU when frozen, and I don't have
anything that would write excessively to the console - there are about 5-10
lines beyond the initial boot log. Also, I don't get any previous output
from the buffer when I run "xm console".

> http://markmail.org/message/7hghxj6jp55gt26

This URL gives a 404.

Regards,
Simon Hobson
2009-Aug-31  14:06 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Leszek Urbanski wrote:
> > If so, then it might be this problem:
> > http://markmail.org/message/7hghxj6jp55gt26k
>
>That's not it: the guest is not using 100% CPU when frozen and I don't have
>anything that would write excessively to the console - there are about 5-10
>lines beyond the initial log from booting. Also, I don't get any previous
>output from the buffer when I run "xm console".
>
> > http://markmail.org/message/7hghxj6jp55gt26
>
>This URL gives a 404.

It's just two messages further on in the same thread.

-- 
Simon Hobson
Leszek Urbanski
2009-Sep-02  11:06 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Leszek Urbanski wrote on Mon, Aug 31, 2009 at 14:43:58 +0200:
> - I'm waiting for another freeze to check if there's anything useful on
>   domU consoles

...there's nothing. The console just freezes.
Fajar A. Nugraha
2009-Sep-02  17:02 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
2009/8/31 Leszek Urbanski <tygrys@moo.pl>:
> - domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
>   the same problems)

I had a CPU soft lockup problem with Lenny's Xen kernel. Using a 2.6.29
xenified kernel seems to fix that.

> I'll try the following options (and post my results to this list):
> - downgrading domU kernel to xenified 2.6.18

That one seems to be the most promising, assuming your domU can use 2.6.18.
I mostly use RHEL's kernel-xen (2.6.18), which comes with the distro and
works fine.

-- 
Fajar
Leszek Urbanski
2009-Sep-02  19:21 UTC
Re: [Xen-users] PV Linux domUs freeze after a few hours
Fajar A. Nugraha wrote on Thu, Sep 03, 2009 at 00:02:12 +0700:
> > - domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
> >   the same problems)
>
> I had CPU softlockup problem with Lenny's xen kernel. Using 2.6.29
> xenified kernel seems to fix that.

What bothers me is that the lockups occur with both pv_ops and xenified
kernels. That's supposed to be different code. The bug may just as well be
in the hypervisor or the dom0 kernel (less likely).

> > - downgrading domU kernel to xenified 2.6.18
>
> That one seems to be most promising. Assuming your domU can use
> 2.6.18. I mostly use RHEL's kernel-xen (2.6.18) which comes with the
> distro and works fine.

That's actually a last resort. We can't stick to 2.6.18 in dom0 or domU
forever due to outdated iSCSI support etc. It'd be easier for us to switch
to KVM.