dwight at supercomputer.org
2010-Jun-08 16:04 UTC
[Xen-devel] XCP - FYI - An easy way to wedge (and fix) a Cloud
This is mostly FYI. I know someone else is going to run into this.

It turns out that it's real easy to wedge an entire Cloud with the default configurations in XCP 0.1.1. We saw this recently with our Development Cloud.

The cause was that /var/log had filled up the root filesystem on the master, with 500M+ worth of messages in there. After I tracked down the problem and freed this space up, everything started working again.

When this happens, various things fail mysteriously (including a failure of the slaves and master to reboot), xsconsole wedges (on the master and slaves), OpenXenCenter cannot connect, and at best you get messages that aren't helpful.

I would recommend, at the very least, that compression of the logs in logrotate.conf be turned on. I'd also strongly recommend that this be the default in release 0.5.

Myself, I've taken this further by putting logrotate into the hourly cronjob. And we're going to change our automatic installation scripts to put /var on a separate, large disk volume, not on the root filesystem.

Having /var separate from the root filesystem is generally a wise move for servers, so that a full /var doesn't impact the root.

I'd also add that having grub available would've been helpful.

-dwight-

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
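A minimal sketch of the kind of check that could flag this condition before the root filesystem fills up. It assumes Python 3.3 or later is available on the host, which the thread does not confirm for XCP 0.1.1. The 80% threshold and the function names are illustrative assumptions, not part of XCP.

```python
import os
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(directory, count=5):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # The file may have been rotated or removed while scanning.
                continue
    return sorted(sizes, reverse=True)[:count]


if __name__ == "__main__":
    THRESHOLD = 80.0  # hypothetical alert level, pick what suits your hosts
    pct = usage_percent("/")
    print("root filesystem is %.1f%% full" % pct)
    if pct >= THRESHOLD:
        # Show the biggest offenders under /var/log, as in the incident above.
        for size, path in largest_files("/var/log"):
            print("%12d  %s" % (size, path))
```

Run from cron next to the hourly logrotate job, something like this would at least name the file that is growing, which is the question that comes up later in the thread.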
Pasi Kärkkäinen
2010-Jun-08 19:08 UTC
Re: [Xen-devel] XCP - FYI - An easy way to wedge (and fix) a Cloud
On Tue, Jun 08, 2010 at 09:04:31AM -0700, dwight at supercomputer.org wrote:
> This is mostly FYI. I know someone else is going to run into this.
>
> It turns out that it's real easy to wedge an entire Cloud with
> the default configurations in XCP 0.1.1. We saw this recently
> with our Development Cloud.
>
> It turns out that /var/log had filled up the root filesystem on
> the master. 500M+ worth of messages in there. After I tracked
> down the problem, and freed this space up, everything started
> working again.
>
> When this happens, various things either fail mysteriously
> (including a failure of the slaves and master to reboot),
> xsconsole wedging (on the master and slaves), and OpenXenCenter
> not being able to connect, and at best messages that aren't
> helpful.
>
> I would recommend, at the very least, that compression of the
> logs in logrotate.conf be turned on. I'd also strongly recommend
> that this be the default in release 0.5.
>

Thanks for the heads up.

> Myself, I've taken this further, by putting logrotate into the
> hourly cronjob. And we're going to change our automatic
> installation scripts to put /var on a separate, large disk
> volume, not on the root filesystem.
>
> Having /var separate from the root filesystem is generally
> a wise move for servers, so that /var doesn't impact the root.
>
> I'd also add that having grub available would've been helpful.
>

Yeah.. I've been wondering why XenServer/XCP are not using grub?

-- Pasi
Daniel Stodden
2010-Jun-08 20:36 UTC
Re: [Xen-devel] XCP - FYI - An easy way to wedge (and fix) a Cloud
On Tue, 2010-06-08 at 12:04 -0400, dwight at supercomputer.org wrote:
> This is mostly FYI. I know someone else is going to run into this.
>
> It turns out that it's real easy to wedge an entire Cloud with
> the default configurations in XCP 0.1.1. We saw this recently
> with our Development Cloud.
>
> It turns out that /var/log had filled up the root filesystem on
> the master. 500M+ worth of messages in there. After I tracked
> down the problem, and freed this space up, everything started
> working again.

Which files were growing too big? I recently caused potential trouble with blktap. But there may be more. Both xapi and storage management can get quite chatty, although I think this improved with xs5.x.

Daniel
dwight at supercomputer.org
2010-Jun-09 16:58 UTC
Re: [Xen-devel] XCP - FYI - An easy way to wedge (and fix) a Cloud
On Tuesday 08 June 2010 01:36:53 pm Daniel Stodden wrote:
> On Tue, 2010-06-08 at 12:04 -0400, dwight at supercomputer.org wrote:
> > It turns out that /var/log had filled up the root filesystem on
> > the master. 500M+ worth of messages in there. After I tracked
> > down the problem, and freed this space up, everything started
> > working again.
>
> Which ones were the files growing too big? I recently caused
> potential trouble with blktap. But there may be more. Both xapi
> and storage management can get quite chatty, although I think this
> improved with xs5.x.
>
> Daniel

I'm going from memory here, as the main focus was on triage, and not proper debug/fix/testing. But if memory serves, it was xensource.log. It's unlikely that any recent change was the culprit, as this was stock XCP 0.1.1.

I have to say that it's something else to reboot and debug an entire Cloud. I've dealt with wedged/crashed systems before on microcontrollers, small embedded devices, PCs, Servers, Mainframes and Supercomputers, including Virtualized Systems. This is the first time I've had to debug and reboot an entire Cloud.

The main lesson for me is that the debugging interface could be improved. This is one of the most critical aspects of any Development environment. Being able to get to a single user shell prompt easily from the "boot:" prompt would go a long way here.

-dwight-
Roger Cruz
2010-Jun-09 18:02 UTC
RE: [Xen-devel] XCP - FYI - An easy way to wedge (and fix) a Cloud
With XenServer, which uses XAPI, I have encountered a similar problem where the /var/log partition gets full. In my case, it was xensource.log that stopped being rotated. XAPI rotates these logs automatically and keeps up to 20 files of 3MB each (I can't recall the exact numbers now).

The problem occurred when I set the system time backwards (while adjusting timezones). That made the wait before the next periodic check (normally every 5 minutes, I think) much longer, and during that time the partition filled up because the files grew well past 3MB. When this happened, the only way I got the system running again was to boot with a rescue CD and remove the large files.

I reported the problem to Citrix a while back, so it is likely already fixed, and I'm not sure how your xensource.log files could have grown to 500+ MB.

Roger R. Cruz
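How XAPI actually schedules its rotation check is not stated in this thread, so the following is only a hypothetical illustration of the general failure mode described for XenServer: a periodic check keyed to wall-clock time stalls when the clock is set backwards. The function name and the example timestamps are assumptions; the 300-second interval is the "5mins I think" figure from that report.

```python
import time

CHECK_INTERVAL = 300.0  # seconds, the approximate figure quoted in the thread


def seconds_until_next_check(last_check, now, interval=CHECK_INTERVAL):
    """Wait time when the next check is scheduled at last_check + interval."""
    return max(0.0, last_check + interval - now)


# Wall-clock scheduling: a check has just run at wall time 10000, and then
# the system clock is set back by one hour (for example a timezone fix).
wall_last = 10000.0
wall_now = wall_last - 3600.0
print(seconds_until_next_check(wall_last, wall_now))  # 3900.0, not 300.0

# Monotonic scheduling: time.monotonic() is not affected by changes to the
# system clock, so the wait can never exceed one interval.
mono_last = time.monotonic()
mono_now = time.monotonic()
assert seconds_until_next_check(mono_last, mono_now) <= CHECK_INTERVAL
```

With a one-hour step backwards the log gets 65 minutes to grow instead of 5, which would be consistent with files far larger than the 3MB limit. Rotating on size at write time, or timing the check against a monotonic clock, avoids depending on the wall clock at all.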
Ian Campbell
2010-Jun-10 10:07 UTC
Re: [Xen-devel] XCP - FYI - An easy way to wedge (and fix) a Cloud
On Wed, 2010-06-09 at 17:58 +0100, dwight at supercomputer.org wrote:
> Being able to get to a single user shell prompt easily from
> the "boot:" prompt would go a long way here.

By typing "menu.c32" you will get an interactive menu where you can edit the kernel command line and add single or init=/bin/bash or whatever.

A specific single user menu item would certainly be a useful convenience though.

Ian.