Thileepan Subramaniam
2006-Mar-19 02:14 UTC
[Xen-devel] Detecting deadlocks with hypervisor..
Hello,

I am trying to see if the hypervisor can be used to detect deadlocks in the guest VMs. My goal is to detect whether a guest OS is deadlocked and, if it is, to create a clone of the deadlocked OS without the locking condition and let the clone run. While the clone runs I am hoping to generate some hints that could tell me what caused the deadlock.

I simulated a deadlock/hang situation in a guest OS (by loading a badly written module into the kernel), and while the guest OS kernel was hanging I ran "xm save" from Dom-0. But this command waits forever.

I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These seem to be called when I run "xm save", but beyond a certain point I am not sure what the Python scripts do. I also see some libxc files such as xc_linux_save.c, but I am not sure who is using them (Dom-0, Xen, or the XenU). Can someone help me by explaining what happens behind the scenes when "xm save" is called? Is there any good documentation explaining which actions are done by which layers (e.g. Python layer, C layer)?

Also, does it seem viable to clone a copy of a deadlocked guest OS in the first place?

thanks!
- ts

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Thileepan Subramaniam wrote:

> Can someone help me by explaining me what happens behind the scene
> when "xm save" is called ? Is there any good documentation
> explaining which actions are done by which layers (eg: python
> layer, C layer etc).

This would be immensely valuable. I imagine there's a college student looking for some way to make their mark in the open source community.

> Also, does it seem viable to clone a copy of a deadlocked guest OS
> in the first place?

The idea of using clones as a way of detecting deadlocks is intriguing.

> I am trying to see if the hypervisor can be used to detect
> deadlocks in the guest VMs. My goal is to detect if a guest OS is
> deadlocked, and if it is, then create a clone of the deadlocked OS
> without the locking condition, and letting the clone run. While the
> clone runs I am hoping to generate some hints that could tell me
> what caused the deadlock.

But I suspect that some logic injected into the lock routines (and data structures) of the host O/S is an easier and possibly better bet.

-- Randy Thelen
On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:

> I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These
> seem to be called when I run "xm save". But beyond a point I am not sure what
> the python scripts do. I also see some libxc files such as xc_linux_save.c,
> but I am not sure who is using it (Dom-0 or Xen or the XenU). Can someone help
> me by explaining me what happens behind the scene when "xm save" is called ?
> Is there any good documentation explaining which actions are done by which
> layers (eg: python layer, C layer etc).

The Python layer only saves some domain info, I think. Then the xc_save program is called, which in turn calls xc_linux_save. xc_linux_save saves all the memory and the vcpu context of the guest.

--
thanks,
edwin
On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:

> I simulated a deadlock/hang situation in a guest OS (by loading a badly
> written module to the kernel) and when the guestOS kernel was hanging, I
> ran "xm save" from Dom-0. But this command waits forever.
>
> I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These
> seem to be called when I run "xm save". But beyond a point I am not sure
> what the python scripts do. I also see some libxc files such as
> xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or the
> XenU). Can someone help me by explaining me what happens behind the scene
> when "xm save" is called ? Is there any good documentation explaining which
> actions are done by which layers (eg: python layer, C layer etc).

xc_save, the executable, calls xc_linux_save, the libxc function. Depending upon whether this is a live or non-live save, some stuff is done (see xc_linux_save for details). The Python layer is then called back, requesting that the domain be suspended. This request is passed through to the guest by writing

/local/domain/<domid>/control/shutdown = suspend

in the store. This is seen by the guest (a watch fires inside reboot.c) and then the guest suspends itself. This is probably where you are falling down: if the guest kernel is completely deadlocked, it's going to struggle to suspend itself correctly.

If a suspend completes correctly, Xend will see it (another watch will fire), and xc_linux_save will be free to complete the save.

> Also, does it seem viable to clone a copy of a deadlocked guest OS in the
> first place?

If you have a byte-for-byte copy of a deadlocked guest, even if you could suspend it, surely it will be deadlocked when it is resumed. How do you intend to break the deadlock, and how is it easier to do that from outside than it is to perform deadlock detection in the guest?

Ewan.
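
[Editorial sketch] The suspend handshake described in this message can be pictured with a small toy model. This is only an illustration under stated assumptions, not Xend or xenstore code: ToyStore, ToyGuest and request_suspend are hypothetical names, the store is a plain in-memory dictionary, and the timeout exists only so the example terminates (the "xm save" reported in this thread simply kept waiting).

```python
import threading


class ToyStore:
    """Hypothetical in-memory stand-in for the store: values plus watches."""

    def __init__(self):
        self._data = {}
        self._watches = {}

    def watch(self, key, callback):
        # Register a callback that fires whenever `key` is written.
        self._watches.setdefault(key, []).append(callback)

    def write(self, key, value):
        self._data[key] = value
        for callback in self._watches.get(key, []):
            callback(key, value)


class ToyGuest:
    """Hypothetical guest that suspends itself when asked via the store."""

    def __init__(self, store, domid, deadlocked=False):
        self.deadlocked = deadlocked
        self.suspended = threading.Event()
        store.watch(f"/local/domain/{domid}/control/shutdown", self._on_shutdown)

    def _on_shutdown(self, key, value):
        # A deadlocked kernel never gets to act on the watch.
        if value == "suspend" and not self.deadlocked:
            self.suspended.set()


def request_suspend(store, guest, domid, timeout=0.1):
    """Toolstack side: write the request, then wait for the guest to comply."""
    store.write(f"/local/domain/{domid}/control/shutdown", "suspend")
    return guest.suspended.wait(timeout)
```

In this model a healthy guest makes request_suspend() return True almost immediately, while a guest created with deadlocked=True never sets the event, so the caller only gets False back once the timeout expires. That is the toy equivalent of the save that never completes.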
Anthony Liguori
2006-Mar-19 16:30 UTC
Re: [Xen-devel] Detecting deadlocks with hypervisor..
Thileepan Subramaniam wrote:

> I simulated a deadlock/hang situation in a guest OS (by loading a
> badly written module to the kernel) and when the guestOS kernel was
> hanging, I ran "xm save" from Dom-0. But this command waits forever.
>
> [...]
>
> Also, does it seem viable to clone a copy of a deadlocked guest OS in
> the first place?

As Ewan pointed out, xm save is guest-assisted, so a hung guest will not be savable. You may want to look at xc_domain_dumpcore(). You could do some post-analysis of the core dump to determine where it locked.

Determining why it deadlocked is of course impossible in the general case, but you may be able to develop some interesting heuristics with appropriate static analysis.

As for recovering the guest, a really clever approach would be to rewrite some of the locking code (maybe temporarily?) by mapping the guest's code page into dom0's memory after examining EIP in the core.

I reckon there's a rather interesting paper to be written on something like this :-)

Regards,

Anthony Liguori
>From: Ewan Mellor
>To: Thileepan Subramaniam
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Sun, 19 Mar 2006 13:17:35 +0000
>
>xc_save, the executable, calls xc_linux_save, the libxc function. Depending
>upon whether this is a live or non-live save, some stuff is done (see
>xc_linux_save for details). The Python layer is then called back, requesting
>that the domain is suspended. This request is passed through to the guest by
>writing /local/domain/<domid>/control/shutdown = suspend in the store. This
>is seen by the guest (a watch fires inside reboot.c) and then the guest
>suspends itself. This is probably where you are falling down -- if the guest
>kernel is completely deadlocked, it's going to struggle to suspend itself
>correctly.
>
>If a suspend completes correctly, Xend will see it (another watch will fire),
>and xc_linux_save will be free to complete the save.

So, I went and experimented with this. Basically, I changed XendCheckpoint.py to NOT wait for the guest to shut down; I also changed xc_linux_save() to proceed with saving without waiting (essentially, suspend_and_state() returns 0 instead of retrying repeatedly). With this I am able to save a deadlocked kernel smoothly. But when I try to restore, I get this error message:

Error: /usr/lib/xen/bin/xc_restore 10 19 5 34816 1 2 failed

And the log says:

[2006-03-24 13:48:42 xend] DEBUG (XendCheckpoint:152) [xc_restore]: /usr/lib/xen/bin/xc_restore 10 19 5 34816 1 2
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) xc_linux_restore start: max_pfn = 8800
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) Increased domain reservation by 22000 KB
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) Reloading memory pages: 0%
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Received all pages (0 races)
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Failed to pin batch of 22 page tables: 22
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Restore exit with rc=1

Any clue, so that I can overcome this and restore the kernel to its previous (i.e., deadlocked) state?

thanks,
TS
>From: Ewan Mellor <ewan@xensource.com>
>To: Thileepan Subramaniam <thileepan_@hotmail.com>
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Sun, 19 Mar 2006 13:17:35 +0000
>
>This request is passed through to the guest by writing
>/local/domain/<domid>/control/shutdown = suspend in the store. This
>is seen by the guest (a watch fires inside reboot.c) and then the guest
>suspends itself. This is probably where you are falling down -- if the guest
>kernel is completely deadlocked, it's going to struggle to suspend itself
>correctly.

This may sound like a silly question (pardon me, because I am relatively new to the Linux kernel): will it be possible to continue running reboot.c (or, for that matter, any kernel thread) when the kernel is deadlocked? In Linux, is the kernel a single process or a bunch of entities executing in parallel? If the latter, then during a kernel deadlock (e.g. caused by loading a faulty module that disables interrupts and does something silly) there can still be some other processes/threads running, right?

thanks
TS
Anthony Liguori
2006-Mar-24 19:24 UTC
Re: [Xen-devel] Detecting deadlocks with hypervisor..
T S wrote:

> This may sound a silly question (pardon me because i am relatively new
> to linux kernel) .. will it be possible to continue running reboot.c
> (or for that matter any kernel thread) when the kernel is deadlocked ?
> In Linux, is the kernel a single process or a bunch of parallelly
> executing entities? If later, then during a kernel deadlock (eg: by
> loading a faulty module that disables interrupts and do something
> silly) there can still be some other processes/threads run, right?

Sorry for not making this more clear previously. You cannot restore a deadlocked domain if a normal xm save doesn't work.

One thing that makes Xen unique is that guests actually are aware of what physical pages are assigned to them. When one does a save/restore, the guest has to canonicalize all of its internal references to physical pages. When it's restored, it then remaps its newly assigned physical pages to all the old places where it needed to know about them for some reason or another.

If the guest isn't responsive when you do a save, then it will never canonicalize itself and there is no way to restore the domain.

Regards,

Anthony Liguori
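
[Editorial sketch] The canonicalization idea described in this message can be shown with a toy model. It assumes a page table is just a list of machine frame numbers (MFNs) and that a pfn-to-mfn list maps each guest-visible frame number (PFN) to the machine frame currently backing it. The function names and all frame numbers below are made up for illustration; this is not the libxc code, and it takes no position on which side (Dom-0 or the guest) performs each part.

```python
def invert(p2m):
    """Build the MFN -> PFN lookup from a PFN-indexed list of MFNs."""
    return {mfn: pfn for pfn, mfn in enumerate(p2m)}


def canonicalize(page_table, m2p):
    """Replace host-specific MFNs with host-independent PFNs before saving."""
    return [m2p[mfn] for mfn in page_table]


def uncanonicalize(canonical_table, p2m):
    """On restore, map the saved PFNs onto the newly assigned MFNs."""
    return [p2m[pfn] for pfn in canonical_table]


# Hypothetical numbers: the guest owns four frames on the original host.
old_p2m = [700, 701, 950, 123]      # PFN 0..3 -> MFN
page_table = [950, 700, 123]        # entries hold raw MFNs

saved = canonicalize(page_table, invert(old_p2m))    # PFNs: [2, 0, 3]

# After restore the same PFNs are backed by different machine frames.
new_p2m = [40, 41, 42, 43]
restored = uncanonicalize(saved, new_p2m)            # MFNs: [42, 40, 43]
```

The saved form carries no machine addresses at all, which is why it can be restored onto whatever frames the new domain happens to receive. Any reference that is written out without being converted would still name a frame from the old host.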
>From: Anthony Liguori <aliguori@us.ibm.com>
>To: T S <thileepan_@hotmail.com>
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Fri, 24 Mar 2006 13:24:46 -0600
>
>Sorry for not making this more clear previously. You cannot restore a
>dead-locked domain if a normal xm save doesn't work. One thing that makes
>Xen unique is that guests actually are aware of what physical pages are
>assigned to them. When one does a save/restore, the guest has to
>canonicalize all of it's internal references to physical pages. When it's
>restored, it then remaps it's newly assigned physical pages to all the old
>places where it needed to know about them for some reason or another.

Thank you for the reply. Do you mean to say that the canonicalize..() functions in xc_linux_save.c are actually invoked in the guest OS's context?
>From: Anthony Liguori <aliguori@us.ibm.com>
>To: T S <thileepan_@hotmail.com>
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Fri, 24 Mar 2006 13:24:46 -0600
>
>Sorry for not making this more clear previously. You cannot restore a
>dead-locked domain if a normal xm save doesn't work. One thing that makes
>Xen unique is that guests actually are aware of what physical pages are
>assigned to them. When one does a save/restore, the guest has to
>canonicalize all of it's internal references to physical pages. When it's
>restored, it then remaps it's newly assigned physical pages to all the old
>places where it needed to know about them for some reason or another.

We took a look at the xc_linux_save() function, and what we see is that the canonicalize action is actually done by Dom-0 (and not by the Dom-U). Dom-0 is able to do this because it can access the page tables of the Dom-U as well as the pfn2mfn list of the Dom-U. Based on this, we think Dom-0 can actually save the "context" of the deadlocked Dom-U. Please correct me if this claim is wrong.

Also, given that Dom-0 can access the page tables and other structures of the deadlocked guest, could one of you tell me what changes I need to make to xc_linux_save() (and other related functions) to save the state of the deadlocked guest without doing any handshake with the guest OS?

thanks!
- T
On 7 Apr 2006, at 18:11, T S wrote:

> We took a look at the xc_linux_save() function ... and what we see is
> that the canonicalize action is actually done by the Dom-0 (and not by
> the Dom-U); Dom-0 is able to do this because it is able to access the
> page tables of Dom-U as well as the pfn2mfn list of the Dom-U. Based on
> this, we think the Dom-0 can actually save the "context" of the
> deadlocked Dom-U. Please correct me if this claim is wrong.
>
> Also, given that Dom-0 can access the page tables and other structures
> of the deadlocked guest, can one of you be able to tell me what changes
> I need to do to xc_linux_save( ) (and other related functions) to save
> the state of the deadlocked guest without doing any handshake with the
> guest OS ?

You can get at the consistent state of a guest by pausing it and then reading its state. However, the reason for the handshake is to ensure that the guest is not currently accessing pagetables or doing other critical operations. If it were, then we could not safely translate its memory page addresses, as it could have those addresses in places like its kernel stacks or register contexts, where they would not get translated and would cause a crash on restore.

-- Keir
Anthony Liguori
2006-Apr-07 17:41 UTC
Re: [Xen-devel] Detecting deadlocks with hypervisor..
T S wrote:

> We took a look at the xc_linux_save() function ... and what we see is
> that the canonicalize action is actually done by the Dom-0 (and not by
> the Dom-U);

Take a look at linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(). Canonicalization is done both in Dom-0 and in the guest itself. Dom-0 attempts to do as much of it as it can but, as I've said before, it cannot do all of it.

> Also, given that Dom-0 can access the page tables and other structures
> of the deadlocked guest, can one of you be able to tell me what changes
> I need to do to xc_linux_save( ) (and other related functions) to save
> the state of the deadlocked guest without doing any handshake with the
> guest OS ?

If you want to attempt to futz with the state of a guest while it's running without the guest cooperating, your best bet is to do as Keir suggested: pause the domain, make your changes, and then unpause.

Regards,

Anthony Liguori
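
[Editorial sketch] The pause / modify / unpause pattern suggested in this message can be wrapped so that the domain is always unpaused again, even if the inspection step fails. The sketch assumes the xm command-line tool with its pause and unpause subcommands is available in Dom-0. The paused_domain helper and its run parameter are hypothetical names introduced only for this example.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def paused_domain(domain, run=subprocess.check_call):
    """Pause `domain` for the duration of the with-block, then unpause it.

    `run` is injectable so the sequence can be exercised without Xen.
    """
    run(["xm", "pause", str(domain)])
    try:
        yield
    finally:
        # Always resume the guest, even if the inspection raised.
        run(["xm", "unpause", str(domain)])
```

A caller would write "with paused_domain(5):" followed by whatever inspection or modification it needs. Passing a recording function as run, for example a list's append method, shows the order of commands without touching a real domain.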
Anthony Liguori
2006-Apr-07 17:45 UTC
Re: [Xen-devel] Detecting deadlocks with hypervisor..
Keir Fraser wrote:

> You can get at the consistent state of a guest by pausing it and then
> reading its state. However, the reason for the handshake is to ensure
> that the guest is not currently accessing pagetables or doing other
> critical operations. If it were then we could not safely translate its
> memory page addresses as it could have those addresses in places like
> its kernel stacks or register contexts, where they would not get
> translated and would cause a crash on restore.

I should add that this is a problem specific to writable page tables, as the guest must be aware of the actual physical pages that it is using. With a VT/SVM guest, or on an architecture that doesn't use writable page tables, this isn't an issue.

Regards,

Anthony Liguori
>From: Anthony Liguori <aliguori@us.ibm.com>
>To: T S <thileepan_@hotmail.com>
>CC: ewan@xensource.com, edwin.zhai@intel.com, rthelen@netapp.com,
>Xen-devel@lists.xensource.com, Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Fri, 07 Apr 2006 12:41:20 -0500
>
>T S wrote:
>>>From: Anthony Liguori <aliguori@us.ibm.com>
>>>To: T S <thileepan_@hotmail.com>
>>>CC: xen-devel@lists.xensource.com
>>>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>>>Date: Fri, 24 Mar 2006 13:24:46 -0600
>>>
>>>T S wrote:
>>>>This may sound a silly question (pardon me because I am relatively new
>>>>to the Linux kernel) .. will it be possible to continue running
>>>>reboot.c (or for that matter any kernel thread) when the kernel is
>>>>deadlocked ? In Linux, is the kernel a single process or a bunch of
>>>>entities executing in parallel? If the latter, then during a kernel
>>>>deadlock (eg: by loading a faulty module that disables interrupts and
>>>>does something silly) some other processes/threads can still run,
>>>>right?
>>>
>>>Sorry for not making this more clear previously. You cannot restore a
>>>dead-locked domain if a normal xm save doesn't work. One thing that makes
>>>Xen unique is that guests actually are aware of what physical pages are
>>>assigned to them. When one does a save/restore, the guest has to
>>>canonicalize all of its internal references to physical pages. When it's
>>>restored, it then remaps its newly assigned physical pages to all the
>>>old places where it needed to know about them for some reason or another.
>>
>>We took a look at the xc_linux_save() function ... and what we see is that
>>the canonicalize action is actually done by the Dom-0 (and not by the
>>Dom-U);
>
>Take a look at linux-2.6-sparse/drivers/core/reboot.c:__do_suspend().
>Canonicalization is done both in Dom-0 and in the guest itself. Dom-0
>attempts to do as much of it as it can but as I've said before, it cannot
>do all of it.

Anthony,
Thank you for your reply.

In linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(), we see store_mfn and console_mfn being canonicalized before the guest-OS goes to sleep (as done in "xm save"). But before this canonicalization took place the python layer writes the store_mfn and console_mfn into the save-file (in the file's header area).

Does this mean the store_mfn and console_mfn values present in the header of the file are re-written at a later part of the file ?

Other than the store & console mfn's, are there any other parameters canonicalized BY the guest OS during "xm save" ?

thanks.

>
>>Also, given that Dom-0 can access the page tables and other structures of
>>the deadlocked guest,
>>can one of you tell me what changes I need to make to
>>xc_linux_save() (and other related functions) to save the state of the
>>deadlocked guest without doing any handshake with the guest OS ?
>
>If you want to attempt to futz with the state of a guest while it's running
>without the guest cooperating, your best bet is to do as Keir suggested and
>pause the domain, make your changes, and then unpause.
>
>Regards,
>
>Anthony Liguori
>
>>
>>thanks!
>>- T
>>
>>
>>>If the guest isn't responsive when you do a save, then it will never
>>>canonicalize itself and there is no way to restore the domain.
>>>
>>>Regards,
>>>
>>>Anthony Liguori
>>>
>>>>thanks
>>>>TS
>>>>
>>>>>
>>>>>If a suspend completes correctly, Xend will see it (another watch will
>>>>>fire), and xc_linux_save will be free to complete the save.
>>>>>
>>>>> > Also, does it seem viable to clone a copy of a deadlocked guest OS
>>>>> > in the first place?
>>>>>
>>>>>If you have a byte-for-byte copy of a deadlocked guest, even if you
>>>>>could suspend it, surely it will be deadlocked when it is resumed.
>>>>>How do you intend to break the deadlock, and how is it easier to do
>>>>>that from outside than it is to perform deadlock detection in the
>>>>>guest?
>>>>>
>>>>>Ewan.
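The pause, inspect, unpause sequence that Keir and Anthony recommend in the quoted text could be scripted from Dom-0 roughly as follows. This is a minimal sketch, assuming the legacy `xm pause` and `xm unpause` subcommands are available; the domain name `guest1` and the helper name `paused` are hypothetical, and the command runner is injectable so the example can be exercised on a machine without Xen.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def paused(domain, run=subprocess.check_call):
    """Pause a domain for the duration of the block, then unpause it.

    `run` is injectable so the sketch can be tried without Xen installed.
    """
    run(["xm", "pause", domain])
    try:
        yield domain
    finally:
        # Always unpause, even if the inspection code raises.
        run(["xm", "unpause", domain])


# Dry run with a stand-in for `run`, so no command is actually executed.
issued = []
with paused("guest1", run=issued.append):
    issued.append("inspect guest state here")

print(issued)
```

Pausing only stops the guest's virtual CPUs from being scheduled. It does not make the guest canonicalize anything, so it gives a consistent view for reading state but not, by itself, a restorable image.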
Anthony Liguori
2006-Apr-08 14:38 UTC
Re: [Xen-devel] Detecting deadlocks with hypervisor..
T S wrote:
>> Take a look at linux-2.6-sparse/drivers/core/reboot.c:__do_suspend().
>> Canonicalization is done both in Dom-0 and in the guest itself. Dom-0
>> attempts to do as much of it as it can but as I've said before, it
>> cannot do all of it.
>
> Anthony,
> Thank you for your reply.
> In linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(), we see
> store_mfn and console_mfn being canonicalized before the guest-OS goes
> to sleep (as done in "xm save"). But before this canonicalization took
> place the python layer writes the store_mfn and console_mfn into the
> save-file (in the file's header area).

Yes, although this strictly isn't necessary.

> Does this mean the store_mfn and console_mfn values present in the
> header of the file are re-written at a later part of the file ?
>
> Other than the store & console mfn's are there any other parameters
> canonicalized BY the guest OS during "xm save" ?

Not currently, although, as Keir pointed out, you still have to contend with the fact that a guest may have a cached PFN somewhere (for instance, because it's in the process of updating a page table).

Regards,

Anthony Liguori

> thanks.
>
>
>>
>>> Also, given that Dom-0 can access the page tables and other
>>> structures of the deadlocked guest,
>>> can one of you tell me what changes I need to make to
>>> xc_linux_save() (and other related functions) to save the state of
>>> the deadlocked guest without doing any handshake with the guest OS ?
>>
>> If you want to attempt to futz with the state of a guest while it's
>> running without the guest cooperating, your best bet is to do as Keir
>> suggested and pause the domain, make your changes, and then unpause.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>
>>> thanks!
>>> - T
>>>
>>>
>>>> If the guest isn't responsive when you do a save, then it will
>>>> never canonicalize itself and there is no way to restore the domain.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>>
>>>>> thanks
>>>>> TS
>>>>>
>>>>>>
>>>>>> If a suspend completes correctly, Xend will see it (another watch
>>>>>> will fire), and xc_linux_save will be free to complete the save.
>>>>>>
>>>>>> > Also, does it seem viable to clone a copy of a deadlocked guest
>>>>>> > OS in the first place?
>>>>>>
>>>>>> If you have a byte-for-byte copy of a deadlocked guest, even if
>>>>>> you could suspend it, surely it will be deadlocked when it is
>>>>>> resumed. How do you intend to break the deadlock, and how is it
>>>>>> easier to do that from outside than it is to perform deadlock
>>>>>> detection in the guest?
>>>>>>
>>>>>> Ewan.
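The guest-side step discussed in this message, where store_mfn and console_mfn are canonicalized before the guest goes to sleep, can be pictured with a small toy model. It is a sketch only: the field names come from the thread, but the dictionaries, frame numbers and function names are hypothetical, and the code does not claim which layer performs the reverse lookup on restore.

```python
# Toy model only. Field names follow the thread; everything else is
# hypothetical and much simpler than the real start_info handling.

def canonicalize_fields(fields, m2p):
    """Before the guest sleeps, replace each MFN with its PFN."""
    return {name: m2p[mfn] for name, mfn in fields.items()}


def restore_fields(fields, p2m):
    """On restore, look up the MFN that now backs each PFN."""
    return {name: p2m[pfn] for name, pfn in fields.items()}


# Frames owned by the guest before the save.
m2p_before = {700: 0, 701: 1, 702: 2}
start_info = {"store_mfn": 701, "console_mfn": 702}

canonical = canonicalize_fields(start_info, m2p_before)

# Frames assigned to the guest after the restore.
p2m_after = {0: 910, 1: 911, 2: 912}
resumed = restore_fields(canonical, p2m_after)

print(canonical)   # {'store_mfn': 1, 'console_mfn': 2}
print(resumed)     # {'store_mfn': 911, 'console_mfn': 912}
```

A guest that never runs the first step, for example because it is deadlocked, leaves these fields holding frame numbers that are meaningless after a restore.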