Hi all,

We have a number of Xen nodes used in several Ganeti clusters running on Debian Lenny. Most run 64-bit kernels with a mix of 32-bit and 64-bit userland VMs. Where we have a paravirtualised Lenny DomU, we are experiencing hangs at seemingly random times. When I inspect the hypervisor, xm list shows the DomU in the running state and xm top shows all of its CPUs maxed out. I am not able to get into the DomU either over the network or via the console. Sometimes I do get console output, but it contains nothing after the standard boot messages, which were printed a week or so earlier and so are not relevant. There is nothing in the hypervisor's Xen logs or kernel logs, nor in the DomU's kernel logs.

I have run a script in the DomU that captures the output of ps every 10 seconds and alerts on any process using more than 30% of memory or CPU. It shows nothing at the time of the hang. I am also monitoring all DomUs with Munin, which does not record any gradual creep in resource usage either.

I have also had the "time went backwards" problem and tried to fix it as described in the Xen FAQ by setting the clocksource to "jiffies". This was the most successful attempt, as it stopped the time messages, but the hang described above still occurs. Before that, with the default clocksource of "xen" and independent_wallclock=0, I was getting kernel panics. I have also tried setting "disable kernel" in ntp.conf (with clocksource=xen and independent_wallclock=0), an option I came across recently, but unfortunately that brought back the original problem of the physical host hanging and needing a hard reset.

I am considering moving these hosts to a newer version of Xen if there's a chance it will be more stable. The current versions are the Lenny defaults: Xen 3.2, kernel 2.6.26.

Any assistance or advice on this would be greatly appreciated.
Many thanks,
Matt

--
Matthew Baker, UNIX Systems Administrator
-----------------------------------------------------
Institute for Learning and Research Technology (ILRT)
A: University of Bristol, 8-10 Berkeley Square, Bristol. BS8 1HH
W: http://www.ilrt.bris.ac.uk/
E: matt.baker@bris.ac.uk
T: Berkeley Square +44 (0)117 33 14325
T: Computer Centre +44 (0)117 33 17467
F: 35BB AD51 9892 D694 7664 8BFD 2EF9 BBA4 1FDA 89C3
-----------------------------------------------------
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
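The ps watchdog mentioned in the post is not included in it. A minimal sketch of that kind of script follows. It assumes procps ps, takes the 10-second interval and 30% threshold from the description, and writes to a hypothetical log path (/var/log/ps-watch.log). The function names and the file name ps-watch.sh are illustrative only.

```shell
#!/bin/sh
# Sketch of a ps-based watchdog like the one described above (hypothetical;
# not the original script). Settings can be overridden from the environment.
THRESHOLD=${THRESHOLD:-30}          # alert above this %CPU or %MEM
INTERVAL=${INTERVAL:-10}            # seconds between snapshots
LOG=${LOG:-/var/log/ps-watch.log}   # hypothetical log location

# One line per process: PID, %CPU, %MEM, command name (no header row).
snapshot() {
    ps -eo pid=,pcpu=,pmem=,comm=
}

# Pass through only the lines whose %CPU (field 2) or %MEM (field 3)
# is strictly greater than THRESHOLD.
filter_heavy() {
    awk -v t="$THRESHOLD" '($2 + 0 > t + 0) || ($3 + 0 > t + 0)'
}

watch_loop() {
    while :; do
        heavy=$(snapshot | filter_heavy)
        if [ -n "$heavy" ]; then
            # Timestamped record in the log file, plus a one-line syslog alert.
            { date '+%Y-%m-%d %H:%M:%S'; printf '%s\n' "$heavy"; } >> "$LOG"
            logger -t ps-watch "process above ${THRESHOLD}% CPU or memory"
        fi
        sleep "$INTERVAL"
    done
}

# Loop only when asked to, so the functions can be sourced and tested.
case "${1:-}" in
    run) watch_loop ;;
esac
```

It would be started inside the DomU with "sh ps-watch.sh run &". There are two caveats. First, procps reports %CPU as CPU time divided by the process's lifetime, not usage over the last 10 seconds, so a sudden spike in a long-running process can stay under the threshold. Second, the script runs inside the DomU and so stops with the guest, which means an empty log at the moment of a full freeze is what one would expect.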
Hi all,

I have been having what seems like an ongoing issue with clock syncing in Xen for quite some time now. It could be that the clock issue is resolved and I am seeing something else, but the clock issue is throwing me off the scent.

A number of months ago I was getting "time went backwards" messages on Xen DomUs. I tested separating the clocks (independent_wallclock=1) and running ntp in both the DomU and Dom0. Synchronisation was poor, and I got the occasional Dom0 kernel panic or just a straight lock-up (no log or terminal output).

I then moved to clocksource=jiffies and independent_wallclock=0. The clocks are now reasonably well synced and there are no Dom0 hangs, but I am now seeing DomUs hang in the running state (as shown by xm list) with CPU usage maxed out (xm top). The DomU is not accessible over the network and the console is unresponsive (no output after the standard boot messages, which may be a week or so old). I see no log messages in Dom0 or the DomU.

I have run a script continuously to capture the output of ps, logging anything using more than 30% memory or CPU time. It records nothing around the time of the hang. I am also monitoring via Munin, which just shows the host going dead with no creep in resource usage. However, the machines this happens to are reasonably busy. They mostly run Apache-based web services (mixed applications), but the problem is not confined to that setup.

Yesterday I discovered the option of running clocksource=xen and independent_wallclock=0 with the ntp.conf option "disable kernel"[1]. I tried this last night, and within a couple of hours one of my Dom0 machines hung with no output, requiring a hard reset. I could not afford any more downtime on the machines that were experiencing the outages, so I have reverted to "jiffies", as that seems to be the most stable. The whole situation is slightly left of ideal and I am at a loss as to where to go next with this. I have left the ntp.conf option in place for the time being and am just waiting for the next hang.
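Spotting the hang currently means reading xm list and xm top by eye. The rough Dom0-side check sketched below assumes that xm list prints its default six columns (Name, ID, Mem, VCPUs, State, Time(s)). It compares two snapshots taken a known number of seconds apart and reports DomUs that are in state "r" and used at least 90% of the CPU time available to their VCPUs over that interval. The helper names, the 90% cut-off and the 60-second default interval are illustrative choices, not part of any Xen tool.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): flag DomUs that look like they are
# spinning, using two saved "xm list" outputs taken a known time apart.
# Assumes the default columns: Name ID Mem VCPUs State Time(s).

# Usage: spinning_domus OLD_SNAPSHOT NEW_SNAPSHOT SECONDS_BETWEEN
# Prints "name cpu_seconds_used" for each DomU that is in state r and used
# at least FRACTION (default 0.9) of VCPUs * SECONDS_BETWEEN CPU seconds.
spinning_domus() {
    awk -v dt="$3" -v frac="${FRACTION:-0.9}" '
        FNR == 1 { next }                  # skip the header line of each file
        NR == FNR { t[$1] = $6; next }     # old snapshot: remember CPU time
        $1 != "Domain-0" && ($1 in t) && $5 ~ /^r/ {
            used = $6 - t[$1]
            if (used >= frac * dt * $4) print $1, used
        }' "$1" "$2"
}

# Polling loop for Dom0 (run as root); matches are sent to syslog.
watch_domus() {
    interval=${INTERVAL:-60}
    prev=$(mktemp) cur=$(mktemp)
    xm list > "$prev"
    while sleep "$interval"; do
        xm list > "$cur"
        hits=$(spinning_domus "$prev" "$cur" "$interval")
        if [ -n "$hits" ]; then
            printf '%s\n' "$hits" | logger -t xm-spin
        fi
        cp "$cur" "$prev"
    done
}

# Loop only when asked to, so the functions can be sourced and tested.
case "${1:-}" in
    run) watch_domus ;;
esac
```

A busy but healthy DomU can also trip this check. It only timestamps when a guest started spinning. It does not explain why.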
Can anyone suggest a course of action that will let me consider these machines stable?

Many thanks in advance for any help.

Cheers,
Matt

Versions:
  OS: Debian Lenny
  Xen: 3.2
  Linux kernel: 2.6.26-2-xen-amd64
  64-bit hypervisor/kernel with a mix of 64-bit and 32-bit userland DomUs

[1] http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last
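Footnote [1] covers the ntp.conf route. The ntp "disable kernel" directive stops ntpd from using the kernel's time discipline. The sketch below is an assumption-laden illustration. It expects Debian's /etc/ntp.conf and /etc/init.d/ntp, plus a Xen-patched kernel that exposes /proc/sys/xen/independent_wallclock. It applies that one line idempotently and prints the resulting clock settings. ensure_line and show_clock_setup are hypothetical helpers. The clocksource itself (xen or jiffies) is a kernel boot parameter, such as clocksource=jiffies on the guest's kernel command line, so the sketch only reports it and does not change it.

```shell
#!/bin/sh
# Sketch only. Assumes Debian paths (/etc/ntp.conf, /etc/init.d/ntp) and a
# Xen-patched kernel exposing /proc/sys/xen/independent_wallclock.

# Append LINE to FILE unless an identical line is already present.
ensure_line() {
    file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Report the active clocksource and wallclock mode ("n/a" if not exposed).
show_clock_setup() {
    cs=/sys/devices/system/clocksource/clocksource0/current_clocksource
    wall=/proc/sys/xen/independent_wallclock
    printf 'clocksource: %s\n' "$(cat "$cs" 2>/dev/null || echo n/a)"
    printf 'independent_wallclock: %s\n' "$(cat "$wall" 2>/dev/null || echo n/a)"
}

case "${1:-}" in
    show)
        show_clock_setup
        ;;
    apply)
        # Stop ntpd from using the kernel clock discipline, then restart it.
        ensure_line /etc/ntp.conf 'disable kernel'
        /etc/init.d/ntp restart
        show_clock_setup
        ;;
esac
```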
Matthew Baker <matt.baker@bristol.ac.uk> writes:

> I am having what seems like an on going issue with clock syncing in xen
> for quite some time now. It could be that the clock issue is resolved
> and I am seeing something else but the clock issue is throwing me off
> the scent.

Are you sure it isn't the infamous xenconsoled problem? Try restarting it; maybe that fixes your problem:

    /etc/init.d/xend stop
    pkill xenconsoled    (don't touch xenstored!)
    /etc/init.d/xend start

--
Regards, Feri.
Hi Ferenc,

On 29/09/10 10:46, Ferenc Wagner wrote:
> Matthew Baker <matt.baker@bristol.ac.uk> writes:
>
>> I am having what seems like an on going issue with clock syncing in xen
>> for quite some time now. It could be that the clock issue is resolved
>> and I am seeing something else but the clock issue is throwing me off
>> the scent.
>
> Are you sure it isn't the infamous xenconsoled problem? Try restarting
> it, maybe that fixes your problem:
>
> /etc/init.d/xend stop
> pkill xenconsoled (don't touch xenstored!)
> /etc/init.d/xend start

No, I hadn't come across that. If (when!) I experience another hang I'll give it a try.

Do you know if it applies to Debian Lenny 2.6.26? All I can find from a Google search is that it affects 2.6.18 on CentOS/RHEL5.

Cheers,
Matt
Matthew Baker wrote:
>> Are you sure it isn't the infamous xenconsoled problem? Try restarting
>> it, maybe that fixes your problem:
>>
>> /etc/init.d/xend stop
>> pkill xenconsoled (don't touch xenstored!)
>> /etc/init.d/xend start
>
>No I hadn't come across that. If (when!) I experience another hang I'll
>give it a try.
>
>Do you know if it applies to Debian Lenny 2.6.26? All I can see from a
>Google is that it's affecting 2.6.18 CentOS/RHEL5.

Yes, I have Lenny DomUs with 2.6.26 that do it; my Dom0s are Lenny with 2.6.18. I didn't think 2.6.26 ran as Dom0. But then I'm running i386 and 32-bit at work; looking back, I see you're using AMD64.

--
Simon Hobson

Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books.
Dmitry Nedospasov
2010-Sep-29 11:44 UTC
Re: [Xen-users] Re: DomU hang in run state (Debian Lenny)
One option that might be worth your consideration is either compiling a kernel yourself or looking for a non-pvops Xen kernel to run. Another option is installing the backports 2.6.32 (pv-ops) kernel and compiling Xen 3.4 yourself. I know a few people doing this with good results.

Hope that helps,
D.

P.S. Xen 4.0.1 on Squeeze is pretty stable ;) might want to consider that too.

--
Dmitry Nedospasov <dmitry@nedos.net> -- Twitter: @nedos
Web: http://nedos.net -- Github: http://github.com/nedos
Dmitry Nedospasov wrote:
>P.S. xen 4.0.1 on Squeeze is pretty stable ;)

I had been wondering

>might want to consider that too.

On my list when I get some more hand-me-down kit to play with.
Matthew Baker <matt.baker@bristol.ac.uk> writes:

> On 29/09/10 10:46, Ferenc Wagner wrote:
>
>> Matthew Baker <matt.baker@bristol.ac.uk> writes:
>>
>>> I am having what seems like an on going issue with clock syncing in xen
>>> for quite some time now. It could be that the clock issue is resolved
>>> and I am seeing something else but the clock issue is throwing me off
>>> the scent.
>>
>> Are you sure it isn't the infamous xenconsoled problem? Try restarting
>> it, maybe that fixes your problem:
>>
>> /etc/init.d/xend stop
>> pkill xenconsoled (don't touch xenstored!)
>> /etc/init.d/xend start
>
> No I hadn't come across that. If (when!) I experience another hang I'll
> give it a try.

I usually get it about once every two months. It starts with a single domU, then quickly spreads to those which produce console output regularly (firewall logs, mostly).

> Do you know if it applies to Debian Lenny 2.6.26? All I can see from a
> Google is that it's affecting 2.6.18 CentOS/RHEL5.

It affects our Debian Lenny Xen hosts, though we run the Etch dom0 kernel, as the 32-bit Lenny dom0 kernel is very unstable.

--
Regards, Feri.
Dmitry Nedospasov
2010-Sep-29 12:17 UTC
Re: [Xen-users] Re: DomU hang in run state (Debian Lenny)
On Wed, Sep 29, 2010 at 01:07:49PM +0100, Simon Hobson wrote:
> >P.S. xen 4.0.1 on Squeeze is pretty stable ;)
>
> I had been wondering

Keep in mind, 3.4.x is definitely less bug-prone than 4.0.1. I've encountered a couple of bugs, though all of them have been resolvable so far.

D.
On 29/09/10 13:08, Ferenc Wagner wrote:
>> > No I hadn't come across that. If (when!) I experience another hang I'll
>> > give it a try.
> I usually get once every two months. It starts with a single domU, then
> quickly spreads to those which produce console output regularly
> (firewall logs, mostly).

Hmmnn. There's no console output past the standard boot messages and the first login prompt. The VMs in question don't really produce that much, and there is deathly silence from there on. I can actually connect to the console and see the previous output, but it's frozen and doesn't respond to input.

>> > Do you know if it applies to Debian Lenny 2.6.26? All I can see from a
>> > Google is that it's affecting 2.6.18 CentOS/RHEL5.
> It affects our Debian Lenny Xen hosts, though we run the Etch dom0
> kernel, as the 32bit Lenny dom0 kernel is very unstable.

Yeah, one of our clusters was 32-bit until we upgraded it to Lenny; we had to rebuild it as amd64. Mind you, I had this issue in Etch too.

I'm beginning to think this isn't the problem, but I shall try it out when (if) it happens again.

Thanks for your help,
Matt