Over the past month or two, one of my Xen boxes has been mysteriously locking up. I have found nothing in any of my logs. I set up netconsole on dom0 and I don't see any problem messages on my remote logger (I do see kernel messages such as modprobe output).

I cannot correlate anything with the lockups (no increased traffic or anything out of the norm). The lockups do seem to be getting more frequent. The first one was several months ago. The next came a month or so later, then three weeks, then two weeks (last Saturday), then yesterday.

Yesterday I did notice one thing in my sysstat logs on a domU: %steal went from <1 (well, normally 0) to over 5% just before the lockup. But that was in just a single VM.

Here is the snippet:

11:35:02 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
11:35:01 AM  all   8.70   0.00     3.88     0.14    0.00  87.27
11:45:01 AM  all   4.81   0.00     2.35     0.06    5.74  87.03

For some reason the reboot cleared dom0's sysstat logs for that day prior to the reboot time, so I cannot cross-reference against them.

Server specs:
Supermicro H8DM8-2
16GB memory
2 x Quad-Core AMD Opteron(tm) Processor 2350
3ware 9650LE, 8-drive RAID 6

I have 3 domUs running.
The domUs are LVM backed.
1 domU has 2 vcpus, the others have a single vcpu.

Dom0 and the domUs are all running Debian etch. I am running Xen 3.0.3 from the repository.

Any ideas?
I am considering upgrading to Xen 3.2 from backports, but I don't want to introduce another variable unless there is a high probability that upgrading will take care of the issue.

Thanks
--
Nick Anderson <nick@anders0n.net>
http://www.cmdln.org

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
On Fri, 2009-02-13 at 08:42 -0600, Nick Anderson wrote:
> So over the past month or two one of my Xen boxes has been mysteriously
> locking up. I have found nothing in any of my logs. I setup netconsole
> on dom0 and I don't see any problem msgs on my remote logger (I do see
> kernel msgs like modprobe etc ...).
>
> I cannot correlate anything with the lockups (no increased traffic or
> anything out of the norm. The lockups do seem to be getting more
> frequent. The first one was several months ago. Then it was a month or
> so later, then three weeks, then 2 weeks (last sat), then yesterday.
> Yesterday I did notice one thing in my sysstat logs on a domU. My
> steal went from <1 (well normally 0) to 5% just before the lockup. But
> that was just in a single vm.
>
> Here is the snippet.
>
> 11:35:02 PM CPU %user %nice %system %iowait %steal %idle
> 11:35:01 AM all 8.70 0.00 3.88 0.14 0.00 87.27
> 11:45:01 AM all 4.81 0.00 2.35 0.06 5.74 87.03
>
> For some reason the reboot cleared my sysstat logs for that day prior
> to the reboot time in dom0 so I cannot reference against that.
>
> Server Specs:
> Supermicro H8DM8-2
> 16GB memory
> 2 x Quad-Core AMD Opteron(tm) Processor 2350
> 3ware 9650LE 8 drive raid 6
>
> I have 3 domus running.
> The domUs are lvm backed.
> 1 domU has 2 vcpus, the others have a single vcpu.
>
> Dom0 and domUs are both running debian etch. I am running Xen 3.0.3
> from the repository.
>
> Any ideas?
> I am considering upgrading to xen 3.2 from backports, but I dont want
> to introduce another variable unless there is a high probability
> upgrading will take care of the issue.

Nick,

My setup is VERY different from yours, but I had similar unexplained outages. I'm not sure I can even call it a lockup, because I could still ping dom0, but its screen was blank and the domUs were hosed. This was when I was running Xen 3.0.x on SLES 10 SP1. I've since updated all my servers to SLES 10 SP2 running Xen 3.2.x, and for a couple of months now I have not had the problem.
Way too many differences to make a direct comparison, but...

James
On Fri, Feb 13, 2009 at 09:50:03AM -0500, James Pifer wrote:
> outages. Not sure I can even call it a lockup, because I could still
> ping dom0, but its screen was blank, and the domUs were hosed.

Yesterday's "lockup" was slightly different in that I was already shelled into dom0 when it started. I was out smoking and got notified. I ran in, and for about 10 seconds I was able to do things in dom0. I made the mistake of trying to correct the problem before assessing the situation, so I didn't get any information :(. I tried doing an xm shutdown on the VM that was unresponsive at the time. Not long after that, dom0 locked up as well, and I couldn't get any information without rebooting and attempting a post-mortem.

Thanks for sharing. With the steal going up, it makes me hope that upgrading to Xen 3.2 will help. For reference, the sar man page defines the column as:

  %steal
      Show the percentage of time spent in involuntary wait by the
      virtual CPU or CPUs while the hypervisor was servicing another
      virtual processor.

Anyone else have any thoughts or input?

--
Nick Anderson <nick@anders0n.net>
http://www.cmdln.org
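
For anyone who wants to watch steal at a finer grain than the 10-minute sysstat interval shown earlier in the thread, here is a minimal Python sketch. It assumes a Linux guest whose /proc/stat aggregate "cpu" line carries the steal counter as its eighth value, as described in proc(5). The function names and the 5-second sampling interval are illustrative choices of mine, not anything taken from the thread or from sysstat.

```python
import time

# Position of the steal counter among the values on the "cpu" line:
# user nice system idle iowait irq softirq steal
STEAL_INDEX = 7


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before_text, after_text):
    """Percentage of elapsed CPU time counted as steal between two snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    if len(deltas) <= STEAL_INDEX:
        return 0.0  # this kernel does not report a steal counter
    # Sum only the first eight counters: on newer kernels the guest
    # counters that follow are already included in user and nice.
    total = sum(deltas[: STEAL_INDEX + 1])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[STEAL_INDEX] / total


def read_stat(path="/proc/stat"):
    """Read the current contents of /proc/stat."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    interval_seconds = 5  # assumed sampling interval; adjust to taste
    previous = read_stat()
    while True:
        time.sleep(interval_seconds)
        current = read_stat()
        print("%s steal=%.2f%%" % (time.strftime("%H:%M:%S"),
                                   steal_percent(previous, current)))
        previous = current
```

Run inside a domU and redirected to a file or a remote logger, something like this might show whether steal climbs gradually or jumps right before a lockup. It only reports what the guest kernel accounts as steal, so it cannot say which other domain the hypervisor was servicing.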