shawn
2008-May-21 06:33 UTC
[Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi all,

There is a problem in Xen at present: when a fatal error occurs in a VM, such as the qemu-dm process dying, xend should take care of it rather than leave it as a defunct (zombie) process. The qemu-dm process can die because of misconfiguration or for some other reason.

Currently xend does not handle this dead process, so it remains defunct, and xm list shows the status of the VM assigned to the process as:

vserv1134 5 6144 1 ------ 0.0

This patch changes xend so that when a fatal error happens (e.g. the qemu-dm process is killed), it logs an error message, then destroys the domain and cleans up the process (no zombies).

The cause lies in the xend daemon. xend forks a process and runs the qemu-dm program. When qemu-dm is killed directly, xend never gets a chance to call wait() to collect the zombie child (qemu-dm is executed by a thread), because xend has no idea whether the qemu-dm child is alive or has been killed.

For this reason, I added some code to xend that checks the status of each HVM device model (DM) every 30 seconds.

The patch is based on the open source Xen 3.2.1 source code. Please review it.

Thanks.
Xiaowei

--- xen-3.2.1/tools/python/xen/xend/server/SrvServer.py.org 2008-05-21 13:53:08.000000000 +0800
+++ xen-3.2.1/tools/python/xen/xend/server/SrvServer.py 2008-05-21 13:58:56.000000000 +0800
@@ -44,6 +44,7 @@
 import re
 import time
 import signal
+import os
 from threading import Thread
 
 from xen.web.httpserver import HttpServer, UnixHttpServer
@@ -148,14 +149,28 @@
 
         # Reaching this point means we can auto start domains
         try:
-            xenddomain().autostart_domains()
+            dom = xenddomain()
+            dom.autostart_domains()
         except Exception, e:
             log.exception("Failed while autostarting domains")
 
         # loop to keep main thread alive until it receives a SIGTERM
         self.running = True
         while self.running:
-            time.sleep(100000000)
+            # loop to destroy those hvm domain that whoes DM has dead unexpectedly.
+            for item in dom.domains.values():
+                if item.info.is_hvm():
+                    device_model_pid = item.gatherDom(('image/device-model-pid', str))
+                    dm_stat_cmd = "ps -o stat --no-headers -p"+device_model_pid
+                    dm_stat = os.popen(dm_stat_cmd).readline().rstrip()
+                    log.info("status of the command is:" + dm_stat + "end of output")
+                    if dm_stat == 'Z':
+                        log.info("status of the command is:" + dm_stat + "end of output")
+                        log.warn("Devices Model for " + str(item) + "was killed unexpectedly")
+                        item.destroy()
+                    else:
+                        continue
+            time.sleep(30)
 
         if self.reloadingConfig:
             log.info("Restarting all XML-RPC and Xen-API servers...")

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
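The zombie state described in this message can be reproduced outside xend. What follows is a minimal sketch, assuming a POSIX system and present-day Python 3 rather than the Python 2 that xend 3.2.1 used; `reap_if_exited` and `demo` are hypothetical names for illustration and are not part of xend. It shows that an exited child stays in the process table until its parent calls `waitpid`, and that `WNOHANG` lets the parent check without blocking.

```python
import os
import time


def reap_if_exited(pid):
    """Reap child `pid` if it has exited; return its exit code, else None.

    Hypothetical helper for illustration only. A negative return value
    means the child was killed by that signal number.
    """
    reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    if reaped_pid == 0:
        return None  # child is still running, nothing to reap yet
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def demo():
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit at once with a recognizable status
    # Until the parent waits on it, the exited child is a zombie ("Z" in ps).
    code = None
    while code is None:
        code = reap_if_exited(pid)
        time.sleep(0.01)
    return code
```

Calling `demo()` forks a child, lets it exit, and returns the exit code once the parent has reaped it; after that point the child no longer appears as defunct.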
Masaki Kanno
2008-May-21 07:21 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi Xiaowei,

Nonessential comment.

Your patch includes both tab-indent and space-indent. Could you change tab-indent to space-indent?

Best regards,
Kan
shawn
2008-May-21 07:34 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi Kan,

Thanks for your comment. I have corrected that, along with some other mistakes. Please review this patch again.

thanks
xiaowei

--- tools/python/xen/xend/server/SrvServer.py.org 2008-05-21 13:53:08.000000000 +0800
+++ tools/python/xen/xend/server/SrvServer.py 2008-05-21 15:28:09.000000000 +0800
@@ -44,6 +44,7 @@
 import re
 import time
 import signal
+import os
 from threading import Thread
 
 from xen.web.httpserver import HttpServer, UnixHttpServer
@@ -148,14 +149,26 @@
 
         # Reaching this point means we can auto start domains
         try:
-            xenddomain().autostart_domains()
+            dom = xenddomain()
+            dom.autostart_domains()
         except Exception, e:
             log.exception("Failed while autostarting domains")
 
         # loop to keep main thread alive until it receives a SIGTERM
         self.running = True
         while self.running:
-            time.sleep(100000000)
+            # loop to destroy those hvm domain that whoes DM has dead unexpectedly.
+            for item in dom.domains.values():
+                if item.info.is_hvm():
+                    device_model_pid = item.gatherDom(('image/device-model-pid', str))
+                    dm_stat_cmd = "ps -o stat --no-headers -p"+device_model_pid
+                    dm_stat = os.popen(dm_stat_cmd).readline().rstrip()
+                    if dm_stat == 'Z':
+                        log.warn("Devices Model for domain " + str(item.domid) + "was killed unexpectedly")
+                        item.destroy()
+                    else:
+                        continue
+            time.sleep(30)
 
         if self.reloadingConfig:
             log.info("Restarting all XML-RPC and Xen-API servers...")
Keir Fraser
2008-May-21 07:43 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
On 21/5/08 07:33, "shawn" <xiaowei.hu@oracle.com> wrote:

> This patch fix xend as when fatal error happened (e.g. qemu-dm process
> was killed)
> log error meesge then destroy that domain and clean up the process (no
> zombies)

Shouldn't you put the domain into crashed state? Then you would do whatever the configured action is on crash.

-- Keir
shawn
2008-May-21 09:10 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
On Wed, 2008-05-21 at 08:43 +0100, Keir Fraser wrote:
> On 21/5/08 07:33, "shawn" <xiaowei.hu@oracle.com> wrote:
>
> > This patch fix xend as when fatal error happened (e.g. qemu-dm process
> > was killed)
> > log error meesge then destroy that domain and clean up the process (no
> > zombies)
>
> Shouldn't you put the domain into crashed state? Then you would do whatever
> the configured action is on crash.

Yes, that may be a better solution. Do you mean just calling XendDomainInfo._stateSet(DOM_STATE_CRASHED) and letting xend take care of the rest?

However, I can't find a constant like DOM_STATE_CRASHED defined; there are only DOM_STATE_HALTED, DOM_STATE_RUNNING, etc., so I'm not sure this would work.

thanks
Xiaowei
Keir Fraser
2008-May-21 09:37 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
On 21/5/08 10:10, "shawn" <xiaowei.hu@oracle.com> wrote:

> yes,that may be a better solution.
> Do you mean just set the XendDomainInfo._stateSet(DOM_STATE_CRASHED),and
> let xend take care of the left things?

Yes. I have only a dim idea of how these things work in xend, but that's the general idea and I expect there are existing examples of this kind of approach already within xend.

> But I can't find DOM_STATE_CRASHED like constant defined,there is
> only DOM_STATE_HALTED,DOM_STATE_RUNNING,etc. So not sure this could
> work...

It's defined in XendConstants.py along with all the others!

-- Keir
shawn
2008-May-26 07:28 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi all,

I changed this patch to put the domain into the crashed state and then call refreshShutdown immediately. This works fine now: xend does whatever is specified for a crash in the domain config file.

I am resending the patch; please review it again!

Thanks, Keir, for your suggestion :)

Thanks all.

Regards,
xiaowei

--- ./tools/python/xen/xend/server/SrvServer.py.org 2008-05-21 13:53:08.000000000 +0800
+++ ./tools/python/xen/xend/server/SrvServer.py 2008-05-26 15:16:39.000000000 +0800
@@ -44,6 +44,7 @@
 import re
 import time
 import signal
+import os
 from threading import Thread
 
 from xen.web.httpserver import HttpServer, UnixHttpServer
@@ -148,14 +149,27 @@
 
         # Reaching this point means we can auto start domains
         try:
-            xenddomain().autostart_domains()
+            dom = xenddomain()
+            dom.autostart_domains()
         except Exception, e:
             log.exception("Failed while autostarting domains")
 
         # loop to keep main thread alive until it receives a SIGTERM
         self.running = True
         while self.running:
-            time.sleep(100000000)
+            # loop to destroy those hvm domain that whoes DM has dead unexpectedly.
+            for item in dom.domains.values():
+                if item.info.is_hvm():
+                    device_model_pid = item.gatherDom(('image/device-model-pid', str))
+                    dm_stat_cmd = "ps -o stat --no-headers -p"+device_model_pid
+                    dm_stat = os.popen(dm_stat_cmd).readline().rstrip()
+                    if dm_stat == 'Z':
+                        log.warn("Devices Model for domain " + str(item.domid) + "was killed unexpectedly")
+                        item.info['crashed'] = 1
+                        item.refreshShutdown(item.info)
+                    else:
+                        continue
+            time.sleep(30)
 
         if self.reloadingConfig:
             log.info("Restarting all XML-RPC and Xen-API servers...")
Keir Fraser
2008-May-26 07:35 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Please re-send with a changeset comment and a signed-off-by line.

-- Keir
shawn
2008-May-26 07:48 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi,

This patch fixes xend so that when a fatal error happens (e.g. the qemu-dm process is killed), it logs an error message, marks the domain as crashed, and then does whatever is specified for a crash in the domain config file. It adds code to xend that checks the status of each HVM device model every 30 seconds.

Signed-off-by: Xiaowei Hu <xiaowei.hu@oracle.com>

Best regards.
Xiaowei
shawn
2008-May-28 09:24 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi Keir,

I noticed this patch has been reverted. And yes, after reviewing it several times I found some improper code in it.

Could I ask whether the approach taken to solve this problem is itself mistaken, or should I keep improving this patch?

Thanks
xiaowei
Ian Jackson
2008-May-28 09:37 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
shawn writes ("Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process"):
> Could I ask if there is any methodology mistakes to solve this problem?
> or need I keep improving this patch?

I made some suggestions in a recent pair of messages in the thread `c/s 17731 portability issues'. Did you not receive those messages? From over here they appear to have been copied to you as the author of the errant patch.

Anyway, let me repeat myself:

Certainly running ps in this way is not the right way to do it.

Since qemu-dm is started by xend, it is quite possible for xend to have a better and more reliable arrangement for detecting termination of the qemu-dm process. No polling is needed (and thus failure detection can be immediate).

I suggested a design involving a named pipe. qemu-dm would be passed the writing end across exec but just keep it, and not write anything to it. xend would keep the reading end, and when it becomes readable would collect the qemu-dm exit status with waitpid (with WNOHANG). xend would then kill the domain and report the fact of termination and also qemu-dm's exit status if available.

On restart, xend would attempt to open the fifo again with O_RDONLY|O_NONBLOCK which would fail EWOULDBLOCK if qemu-dm was no longer running; if it was still running then termination can be detected as above, although the exit status won't be recoverable.

Does this all make sense? I'd be happy to expand on it if you'd like to ask questions. We'll make sure to review your next submission thoroughly.

Ian.
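The pipe-based detection described in this message can be sketched as follows. This is a simplified illustration in present-day Python 3, not the code that went into xend: it uses an anonymous pipe rather than the named pipe Ian proposes, so it does not cover the xend-restart case, and `spawn_with_death_pipe` and `wait_for_death` are hypothetical names. The child inherits the write end and never writes to it; when the child exits, the kernel closes that end and the read end becomes readable (at end-of-file) in the parent.

```python
import os
import select
import subprocess
import sys


def spawn_with_death_pipe(argv):
    """Start `argv` with the write end of a fresh pipe left open in it.

    Hypothetical helper: returns (Popen object, read_fd).
    """
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(argv, pass_fds=(write_fd,))
    # The parent must drop its own copy of the write end; otherwise the
    # read end never reaches end-of-file when the child dies.
    os.close(write_fd)
    return child, read_fd


def wait_for_death(child, read_fd, timeout):
    """Wait up to `timeout` seconds for the child to terminate.

    Returns the exit status, or None if the child is still running.
    """
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return None
    os.close(read_fd)
    # The pipe closes slightly before the exit status becomes collectable,
    # so this sketch uses a (brief) blocking wait rather than WNOHANG.
    return child.wait()


if __name__ == "__main__":
    proc, fd = spawn_with_death_pipe(
        [sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exit status:", wait_for_death(proc, fd, 10))
```

Because the parent blocks in `select` rather than sleeping and polling, termination is noticed as soon as it happens, which is the property the 30-second `ps` loop lacked.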
shawn
2008-May-29 02:23 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi Ian,

Thanks for your explanation :) I am improving this patch.

regards,
xiaowei
shawn
2008-Jul-17 08:00 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi Ian,

I have some questions now.

1. For each HVM guest a separate qemu-dm process is created, so we need to track multiple open named pipes. If I use a blocking read, does that mean I have to fork a new child in xend for each HVM guest when it is created?

2. If I have to fork children in xend, could I kill the corresponding domain directly from the child process?

thanks
xiaowei
Jean Guyader
2008-Jul-17 08:14 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
Hi,

On Thu, Jul 17, 2008 at 9:00 AM, shawn <xiaowei.hu@oracle.com> wrote:
> Hi Ian,
>
> I have some question now.
> 1.For each hvm guest there will be a separate qemu-dm process created,so
> we need to track multi opened named pipes.If use blocked read,does that
> mean I have to fork a new child in xend for each hvm guest when it was
> created?

You could use select to watch every named pipe opened in xend.

> 2.If I have to fork childs in xend, Could I kill the corresponding
> domain in this child process directly?

-- Jean Guyader
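Jean's suggestion, one `select` call covering all guests instead of one forked watcher per guest, might look like the sketch below. It is an illustration under assumptions: present-day Python 3, anonymous pipes in place of named ones, and hypothetical names (`spawn`, `watch`, the guest labels) that do not come from xend.

```python
import os
import select
import subprocess
import sys


def spawn(argv):
    """Start a child holding the write end of a new pipe; return (read_fd, proc)."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(argv, pass_fds=(write_fd,))
    os.close(write_fd)  # only the child may keep the write end open
    return read_fd, proc


def watch(watched, timeout):
    """Report children that died within `timeout` seconds.

    `watched` maps read_fd -> (guest_name, proc). Entries for dead children
    are removed, and a list of (guest_name, exit_status) is returned.
    A negative status means the child was killed by that signal.
    """
    dead = []
    readable, _, _ = select.select(list(watched), [], [], timeout)
    for fd in readable:
        name, proc = watched.pop(fd)
        os.close(fd)
        dead.append((name, proc.wait()))
    return dead


if __name__ == "__main__":
    watched = {}
    fd_a, proc_a = spawn([sys.executable, "-c", "import sys; sys.exit(1)"])
    watched[fd_a] = ("guest-a", proc_a)
    fd_b, proc_b = spawn([sys.executable, "-c", "import time; time.sleep(30)"])
    watched[fd_b] = ("guest-b", proc_b)
    print(watch(watched, 10))  # guest-a exits at once and is reported
    proc_b.kill()
    print(watch(watched, 10))  # guest-b is reported after being killed
```

A single thread can therefore track any number of device models, which answers the first question without forking extra children in xend.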
Ian Jackson
2008-Jul-18 12:35 UTC
Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process
shawn writes ("Re: [Xen-devel] [patch]Make xend to take care of dead qemu-dm process"):
> I have some question now.
> 1.For each hvm guest there will be a separate qemu-dm process created,so
> we need to track multi opened named pipes.If use blocked read,does that
> mean I have to fork a new child in xend for each hvm guest when it was
> created?
>
> 2.If I have to fork childs in xend, Could I kill the corresponding
> domain in this child process directly?

I did most of the work for detecting failures of qemu-dm in what became changeset 17843:6f189de0f73d. That appears to work reasonably well.

It doesn't automatically destroy the domain when qemu-dm dies, because in my tests this, when combined with on_crash=restart, caused very rapid looping if qemu-dm failed to start up.

Before we make qemu-dm failures automatically kill the domain, we should have some kind of detection for early domain failures with on_crash=restart. General domain boot failures will cause undesirable rapid looping too, so that would be a general improvement.

Depending on how intrusive such a change ends up being, it might be better to postpone it to 3.4, as we're currently in feature freeze.

Ian.
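One possible shape for the "detection for early domain failures" mentioned here is a minimum-uptime check: a domain that dies too soon after starting is not restarted automatically. The sketch below is purely illustrative and does not describe what xend implemented; `RestartGuard`, its methods, and the 10-second threshold are all hypothetical.

```python
import time


class RestartGuard:
    """Refuse automatic restarts of domains that die shortly after starting.

    Hypothetical illustration only; not part of xend.
    """

    def __init__(self, min_uptime=10.0, clock=time.monotonic):
        self.min_uptime = min_uptime  # seconds a domain must have survived
        self._clock = clock           # injectable so tests can fake time
        self._started = {}            # domain name -> start timestamp

    def started(self, name):
        """Record that domain `name` has just been started."""
        self._started[name] = self._clock()

    def may_restart(self, name):
        """Return True if restarting `name` now would not be a rapid loop."""
        started_at = self._started.get(name)
        if started_at is None:
            return True  # no recorded start, so nothing to rate-limit
        return self._clock() - started_at >= self.min_uptime
```

With such a guard, the crash handler would call `may_restart` before honouring on_crash=restart and fall back to leaving the domain stopped when it returns False.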