Graham, Simon
2006-Nov-20 21:34 UTC
[Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
Signed-off by: Simon Graham <Simon.Graham@stratus.com> There is a somewhat trivial issue with XendCheckpoint.py right now in that it logs everything written to stderr by xc_save and xc_restore as errors whereas in fact the vast majority of this output is information/debug (and all actual errors are marked by the string ERROR: at the start of the message) -- this is confusing to folks looking at the logs and makes automated log analysis tricky. Fix is to scan for the ERROR: string and log anything without it using log.info instead. Regards, Simon _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel P. Berrange
2006-Nov-20 21:56 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Mon, Nov 20, 2006 at 04:34:25PM -0500, Graham, Simon wrote:> Signed-off by: Simon Graham <Simon.Graham@stratus.com> > > There is a somewhat trivial issue with XendCheckpoint.py right now in > that it logs everything written to stderr by xc_save and xc_restore as > errors whereas in fact the vast majority of this output is > information/debug (and all actual errors are marked by the string ERROR: > at the start of the message) -- this is confusing to folks looking at > the logs and makes automated log analysis tricky. > > Fix is to scan for the ERROR: string and log anything without it using > log.info instead.This bit of XenD looks rather dubious to me. XenD spawns a background thread, which in turn forks the xc_save or xc_restore program, reading the stdout from this child process. Aside from command line argument handling, the xc_save and xc_restore programs each only make *one* single function call to xc_linux_save or xc_linux_restore APIs in libxc. But XenD already has libxc loaded & calls it directly for all other jobs its needs to do, including actually createing the domain in the first place. So why is it going to all the trouble of fork''ing a child process rather than just calling xc_linux_save/restore APIs directly ? If it did this, we wouldn''t need this loop to read, grok & log STDOUT from the child process at all - it would just end up in regular XenD logs. And even more importantly, once we get the error reporting patches integrated into libxc, having the xc_linux_save/restore calls made by XenD directly would ensure we get reliable error handling for save/restore operations. So, is there any good reason for xc_save & xc_restore to exist as separate processes - it just seems like a huge complication to me, increasing fragility of the system & reducing the quality of error reporting we can get Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Anthony Liguori
2006-Nov-20 22:34 UTC
[Xen-devel] Re: Trivial patch to fix logging level output by XendCheckpoint.py
Daniel P. Berrange wrote:> On Mon, Nov 20, 2006 at 04:34:25PM -0500, Graham, Simon wrote: >> Signed-off by: Simon Graham <Simon.Graham@stratus.com>> And even more importantly, once we get the error reporting patches integrated > into libxc, having the xc_linux_save/restore calls made by XenD directly > would ensure we get reliable error handling for save/restore operations.It used to work that way. xc_linux_save/restore were factored out of Xend late in the 3.0 dev process. I believe the idea was that they should be executed with lower privileges (which is why they take file descriptors as command line arguments instead of just calling xc_interface_open() like you would expect). I believe it was Christian who was driving this. Perhaps he has a few more details? Regards, Anthony Liguori> So, is there any good reason for xc_save & xc_restore to exist as separate > processes - it just seems like a huge complication to me, increasing > fragility of the system & reducing the quality of error reporting we can > get > > Regards, > Dan._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ewan Mellor
2006-Nov-20 22:40 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Mon, Nov 20, 2006 at 09:56:00PM +0000, Daniel P. Berrange wrote:> On Mon, Nov 20, 2006 at 04:34:25PM -0500, Graham, Simon wrote: > > Signed-off by: Simon Graham <Simon.Graham@stratus.com> > > > > There is a somewhat trivial issue with XendCheckpoint.py right now in > > that it logs everything written to stderr by xc_save and xc_restore as > > errors whereas in fact the vast majority of this output is > > information/debug (and all actual errors are marked by the string ERROR: > > at the start of the message) -- this is confusing to folks looking at > > the logs and makes automated log analysis tricky. > > > > Fix is to scan for the ERROR: string and log anything without it using > > log.info instead. > > This bit of XenD looks rather dubious to me. XenD spawns a background > thread, which in turn forks the xc_save or xc_restore program, reading > the stdout from this child process. > > Aside from command line argument handling, the xc_save and xc_restore > programs each only make *one* single function call to xc_linux_save or > xc_linux_restore APIs in libxc. > > But XenD already has libxc loaded & calls it directly for all other jobs > its needs to do, including actually createing the domain in the first > place. So why is it going to all the trouble of fork''ing a child process > rather than just calling xc_linux_save/restore APIs directly ? If it > did this, we wouldn''t need this loop to read, grok & log STDOUT from the > child process at all - it would just end up in regular XenD logs. > > And even more importantly, once we get the error reporting patches integrated > into libxc, having the xc_linux_save/restore calls made by XenD directly > would ensure we get reliable error handling for save/restore operations. > > So, is there any good reason for xc_save & xc_restore to exist as separate > processes - it just seems like a huge complication to me, increasing > fragility of the system & reducing the quality of error reporting we can > getIt was done because, in the past, we didn''t trust the stability of the save and restore code, and didn''t want it taking down the whole of Xend. You could argue that that code is much more reliable now, and so we don''t need all this complexity. Then again, with HVM live migration work upcoming, maybe we''ll come appreciate the isolation again. Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel P. Berrange
2006-Nov-20 22:43 UTC
[Xen-devel] Re: Trivial patch to fix logging level output by XendCheckpoint.py
On Mon, Nov 20, 2006 at 04:34:22PM -0600, Anthony Liguori wrote:> Daniel P. Berrange wrote: > >On Mon, Nov 20, 2006 at 04:34:25PM -0500, Graham, Simon wrote: > >>Signed-off by: Simon Graham <Simon.Graham@stratus.com> > > >And even more importantly, once we get the error reporting patches > >integrated > >into libxc, having the xc_linux_save/restore calls made by XenD directly > >would ensure we get reliable error handling for save/restore operations. > > It used to work that way. xc_linux_save/restore were factored out of > Xend late in the 3.0 dev process. I believe the idea was that they > should be executed with lower privileges (which is why they take file > descriptors as command line arguments instead of just calling > xc_interface_open() like you would expect). > > I believe it was Christian who was driving this. Perhaps he has a few > more details?Ok, looking back at the HG logs it appears # 5099 "Execute xc_linux_restore in a seperate process so that it can''t crash xend. Also handle errors passed from xc_linux_restore and log info messages from xc_linux_restore." and # 5121: "Implement the parts of vm save which need interaction with xend as part of xend, instead of using xfrd. Execute xc_linux_save in a seperate process so that it can''t crash xend. Also handle errors passed from xc_linux_save. are where these programs came into existance. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
John Levon
2006-Nov-20 22:53 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Mon, Nov 20, 2006 at 10:40:44PM +0000, Ewan Mellor wrote:> It was done because, in the past, we didn''t trust the stability of the > save and restore code, and didn''t want it taking down the whole of Xend.One could argue that this would be a good thing! Having the separate programs makes debugging quite a bit more difficult as well... regards john _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Graham, Simon
2006-Nov-21 01:23 UTC
RE: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
Well, I didn''t expect this much discussion!!! I must admit I found it hard to grok this code initially but I think the idea of exec''ing a separate program is a good one to avoid the chance that xend can die (especially since you don''t currently seem to be able to restart xend without rebooting Dom0). Anyway - even if XendCheckpoint.py called xc_linux_save/restore directly, you''d still have to deal with getting the output to the right place since these APIs write their debug output directly to stderr so I''ve got to believe you''d still fork a child that made the call and have the parent slurp the contents of stderr and log them so the same issue would exist... Simon> -----Original Message----- > From: Daniel P. Berrange [mailto:berrange@redhat.com] > Sent: Monday, November 20, 2006 4:56 PM > To: Graham, Simon > Cc: xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] Trivial patch to fix logging level output by > XendCheckpoint.py > > On Mon, Nov 20, 2006 at 04:34:25PM -0500, Graham, Simon wrote: > > Signed-off by: Simon Graham <Simon.Graham@stratus.com> > > > > There is a somewhat trivial issue with XendCheckpoint.py right nowin> > that it logs everything written to stderr by xc_save and xc_restore > as > > errors whereas in fact the vast majority of this output is > > information/debug (and all actual errors are marked by the string > ERROR: > > at the start of the message) -- this is confusing to folks lookingat> > the logs and makes automated log analysis tricky. > > > > Fix is to scan for the ERROR: string and log anything without it > using > > log.info instead. > > This bit of XenD looks rather dubious to me. XenD spawns a background > thread, which in turn forks the xc_save or xc_restore program, reading > the stdout from this child process. > > Aside from command line argument handling, the xc_save and xc_restore > programs each only make *one* single function call to xc_linux_save or > xc_linux_restore APIs in libxc. > > But XenD already has libxc loaded & calls it directly for all other > jobs > its needs to do, including actually createing the domain in the first > place. So why is it going to all the trouble of fork''ing a child > process > rather than just calling xc_linux_save/restore APIs directly ? If it > did this, we wouldn''t need this loop to read, grok & log STDOUT from > the > child process at all - it would just end up in regular XenD logs. > > And even more importantly, once we get the error reporting patches > integrated > into libxc, having the xc_linux_save/restore calls made by XenD > directly > would ensure we get reliable error handling for save/restore > operations. > > So, is there any good reason for xc_save & xc_restore to exist as > separate > processes - it just seems like a huge complication to me, increasing > fragility of the system & reducing the quality of error reporting we > can > get > > Regards, > Dan. > -- > |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 > 2496 -=| > |=- Perl modules: http://search.cpan.org/~danberr/ > -=| > |=- Projects: http://freshmeat.net/~danielpb/ > -=| > |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B > 9505 -=|_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ewan Mellor
2006-Nov-21 09:20 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Mon, Nov 20, 2006 at 08:23:24PM -0500, Graham, Simon wrote:> Well, I didn''t expect this much discussion!!! I must admit I found it > hard to grok this code initially but I think the idea of exec''ing a > separate program is a good one to avoid the chance that xend can die > (especially since you don''t currently seem to be able to restart xend > without rebooting Dom0).That''s not true -- Xend ought to be restarting itself (it forks itself so that it has a monitor process) and even then "xend restart" ought to work. Are you having trouble with that? The one process that we can''t restart at the moment is xenstored. Everything else should be fine.> Anyway - even if XendCheckpoint.py called xc_linux_save/restore > directly, you''d still have to deal with getting the output to the right > place since these APIs write their debug output directly to stderr so > I''ve got to believe you''d still fork a child that made the call and have > the parent slurp the contents of stderr and log them so the same issue > would exist...Yes, I think Daniel was proposing fixing all of that, too. Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ewan Mellor
2006-Nov-21 10:19 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Mon, Nov 20, 2006 at 04:34:25PM -0500, Graham, Simon wrote:> Signed-off by: Simon Graham <Simon.Graham@stratus.com> > > There is a somewhat trivial issue with XendCheckpoint.py right now in > that it logs everything written to stderr by xc_save and xc_restore as > errors whereas in fact the vast majority of this output is > information/debug (and all actual errors are marked by the string ERROR: > at the start of the message) -- this is confusing to folks looking at > the logs and makes automated log analysis tricky. > > Fix is to scan for the ERROR: string and log anything without it using > log.info instead.Applied, thanks. We can continue to discuss wider improvements in this area, but there''s nothing wrong with this patch for now. Cheers, Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel P. Berrange
2006-Nov-21 12:29 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Tue, Nov 21, 2006 at 09:20:06AM +0000, Ewan Mellor wrote:> On Mon, Nov 20, 2006 at 08:23:24PM -0500, Graham, Simon wrote: > > > Well, I didn''t expect this much discussion!!! I must admit I found it > > hard to grok this code initially but I think the idea of exec''ing a > > separate program is a good one to avoid the chance that xend can die > > (especially since you don''t currently seem to be able to restart xend > > without rebooting Dom0). > > That''s not true -- Xend ought to be restarting itself (it forks itself so that > it has a monitor process) and even then "xend restart" ought to work. Are you > having trouble with that? > > The one process that we can''t restart at the moment is xenstored. Everything > else should be fine.While XenD itself can easily be restarted it does cause a few problems if you have any HVM domains, or paravirt + blktap domains running. XenD is the parent process for both qemu-dm and tapdisk helper processes. It appears that these two are incapable of detecting shutdown of the guest they are associated with themselves, and instead rely on XenD to tell them when to exit / kill them. So the problem is, that if you restart XenD these processes get re-parented to init, and then when you shutdown the guest domain, the qemu-dm/tapdisk helpers stay around causing the domain to linger in zombie state forever. If we could figure out how to sort this, then XenD would be trivially restartable without any ill effects.> > Anyway - even if XendCheckpoint.py called xc_linux_save/restore > > directly, you''d still have to deal with getting the output to the right > > place since these APIs write their debug output directly to stderr so > > I''ve got to believe you''d still fork a child that made the call and have > > the parent slurp the contents of stderr and log them so the same issue > > would exist... > > Yes, I think Daniel was proposing fixing all of that, too.Yes, the formalized libxc error handling patches would make that unncessary if the calls were made in-process. Of course I still need to write another iteration of those patches based on latest round of feedback, so we can put this issue off to a later date. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Graham, Simon
2006-Nov-21 14:58 UTC
RE: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
> > > Well, I didn''t expect this much discussion!!! I must admit I found > it > > > hard to grok this code initially but I think the idea of exec''inga> > > separate program is a good one to avoid the chance that xend can > die > > > (especially since you don''t currently seem to be able to restart > xend > > > without rebooting Dom0). > > > > That''s not true -- Xend ought to be restarting itself (it forks > itself so that > > it has a monitor process) and even then "xend restart" ought towork.> Are you > > having trouble with that? > > > > The one process that we can''t restart at the moment is xenstored. > Everything > > else should be fine. >Yes we have had problems with this (in 3.0.2) -- however, we test this with /etc/init.d/xend stop/start which will, I think, restart xenstored as well, so I guess this still does not work. I will look into modifying the test so it kills the main xend process instead to more accurately test fault insertion in xend.> While XenD itself can easily be restarted it does cause a few problems > if you have any HVM domains, or paravirt + blktap domains running.XenD> is the parent process for both qemu-dm and tapdisk helper processes.It> appears that these two are incapable of detecting shutdown of theguest> they are associated with themselves, and instead rely on XenD to tell > them when to exit / kill them. So the problem is, that if you restart > XenD these processes get re-parented to init, and then when you > shutdown > the guest domain, the qemu-dm/tapdisk helpers stay around causing the > domain to linger in zombie state forever. > > If we could figure out how to sort this, then XenD would be trivially > restartable without any ill effects. >Ah - OK, well, we aren''t testing HVM guests yet but we will be; hopefully one of us can figure out how to solve this issue. Simon _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel P. Berrange
2006-Nov-21 15:04 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Tue, Nov 21, 2006 at 09:58:33AM -0500, Graham, Simon wrote:> > > > Well, I didn''t expect this much discussion!!! I must admit I found > > it > > > > hard to grok this code initially but I think the idea of exec''ing > a > > > > separate program is a good one to avoid the chance that xend can > > die > > > > (especially since you don''t currently seem to be able to restart > > xend > > > > without rebooting Dom0). > > > > > > That''s not true -- Xend ought to be restarting itself (it forks > > itself so that > > > it has a monitor process) and even then "xend restart" ought to > work. > > Are you > > > having trouble with that? > > > > > > The one process that we can''t restart at the moment is xenstored. > > Everything > > > else should be fine. > > > > Yes we have had problems with this (in 3.0.2) -- however, we test this > with /etc/init.d/xend stop/start which will, I think, restart xenstored > as well, so I guess this still does not work. I will look into modifying > the test so it kills the main xend process instead to more accurately > test fault insertion in xend.The init script doesn''t stop or start xenstored itself. The init script merely starts up XenD. XenD then checks to see if xenstored is already running, and if not starts it. When you do ''service xend stop'' to shutdown XenD it will leave xenstored running.> > While XenD itself can easily be restarted it does cause a few problems > > if you have any HVM domains, or paravirt + blktap domains running. > XenD > > is the parent process for both qemu-dm and tapdisk helper processes. > It > > appears that these two are incapable of detecting shutdown of the > guest > > they are associated with themselves, and instead rely on XenD to tell > > them when to exit / kill them. So the problem is, that if you restart > > XenD these processes get re-parented to init, and then when you > > shutdown > > the guest domain, the qemu-dm/tapdisk helpers stay around causing the > > domain to linger in zombie state forever. > > > > If we could figure out how to sort this, then XenD would be trivially > > restartable without any ill effects. > > > > Ah - OK, well, we aren''t testing HVM guests yet but we will be; > hopefully one of us can figure out how to solve this issue.This isn''t only an issue with HVM - the tapdisk process is used by paravirt guests when blktap is used for file backed virtual disks. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ewan Mellor
2006-Nov-21 15:05 UTC
Re: [Xen-devel] Trivial patch to fix logging level output by XendCheckpoint.py
On Tue, Nov 21, 2006 at 09:58:33AM -0500, Graham, Simon wrote:> > > > Well, I didn''t expect this much discussion!!! I must admit I found > > it > > > > hard to grok this code initially but I think the idea of exec''ing > a > > > > separate program is a good one to avoid the chance that xend can > > die > > > > (especially since you don''t currently seem to be able to restart > > xend > > > > without rebooting Dom0). > > > > > > That''s not true -- Xend ought to be restarting itself (it forks > > itself so that > > > it has a monitor process) and even then "xend restart" ought to > work. > > Are you > > > having trouble with that? > > > > > > The one process that we can''t restart at the moment is xenstored. > > Everything > > > else should be fine. > > > > Yes we have had problems with this (in 3.0.2) -- however, we test this > with /etc/init.d/xend stop/start which will, I think, restart xenstored > as well, so I guess this still does not work.No, /etc/init.d/xend stop doesn''t (shouldn''t) stop xenstored. Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel