If an HVM guest is waiting for an ioemu assist, when qemu isn't running, and domain_shutdown(SHUTDOWN_crash) is called, then the domain isn't crashed properly:

    void domain_shutdown(struct domain *d, u8 reason)
    {
        ...
        for_each_vcpu ( d, v )
        {
            if ( v->defer_shutdown )
                continue;

Nothing will ever end the deferral. I added code to bust through the deferral if SHUTDOWN_crash was the reason, and it seemed to help, but I'm not sure it's the right fix.

regards
john

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
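To make the failure mode concrete, here is a toy Python model of the loop quoted above. It is only a sketch, not Xen code: the `Vcpu` and `Domain` classes, the `paused_for_shutdown` and `is_shut_down` fields, the `bust_deferral` switch, and the numeric reason codes are all assumptions made up for illustration. Only `defer_shutdown`, `domain_shutdown` and `SHUTDOWN_crash` come from the message. It shows why a vcpu stuck in a deferral keeps the domain from ever completing the shutdown, and what "busting through" the deferral for a crash might look like.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder reason codes for this model only (assumed values).
SHUTDOWN_POWEROFF = 0
SHUTDOWN_CRASH = 3


@dataclass
class Vcpu:
    # True while the vcpu waits on an emulation assist from qemu.
    defer_shutdown: bool = False
    # Hypothetical flag: the vcpu has been parked for shutdown.
    paused_for_shutdown: bool = False


@dataclass
class Domain:
    vcpus: List[Vcpu] = field(default_factory=list)
    shutdown_code: int = -1
    # Hypothetical flag: shutdown completes once every vcpu is parked.
    is_shut_down: bool = False


def domain_shutdown(d: Domain, reason: int, bust_deferral: bool = True) -> None:
    """Toy version of the loop: park every vcpu that is not deferring."""
    d.shutdown_code = reason
    for v in d.vcpus:
        if v.defer_shutdown:
            if reason == SHUTDOWN_CRASH and bust_deferral:
                # Assumed fix: a crash overrides the deferral, because
                # qemu may never come back to end it.
                v.defer_shutdown = False
            else:
                # Original behaviour: skip the vcpu and wait for the
                # deferral to end on its own.
                continue
        v.paused_for_shutdown = True
    d.is_shut_down = all(v.paused_for_shutdown for v in d.vcpus)
```

With `bust_deferral=False` the model reproduces the reported hang: one deferring vcpu is enough to leave `is_shut_down` false forever if nothing ends the deferral. Whether overriding the deferral is safe in the real hypervisor is exactly the open question in this thread.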
On 20/02/2009 21:01, "John Levon" <levon@movementarian.org> wrote:

> If an HVM guest is waiting for an ioemu assist, when qemu isn't running, and
> domain_shutdown(SHUTDOWN_crash) is called, then the domain isn't crashed
> properly:
>
> Nothing will ever end the deferral. I added code to bust through the
> deferral if SHUTDOWN_crash was the reason, and it seemed to help, but
> I'm not sure it's the right fix.

Hm. If qemu is down you're kind of screwed anyway. Even a non-crashed guest will likely hang. If you care about that eventuality (i.e., you believe qemu problems are possible/likely and need to detect them, defend against them, or whatever), would it be better to have tools try to detect it through keepalives or something, and basically tackle that class of problem head on?

If you want the hack, I think what you're doing is probably about right. I'd have to go back over that code again to be exactly sure though, since it's a bit subtle.

Personally I think a dead qemu is pretty bad, and bugs leading to such should simply be found and fixed (oh for a perfect world :-). That bad things happen to a guest, like SHUTDOWN_crash hanging, after qemu is dead... I'd just live with that -- a worse thing has *already* happened to that guest's virtualisation environment.

 -- Keir
On Fri, Feb 20, 2009 at 09:35:16PM +0000, Keir Fraser wrote:

> Hm. If qemu is down you're kind of screwed anyway.

You're totally screwed. But what happens today is this: you get some weird message about sentinels in xend.log (if you happen to read it), and a domain state that looks like this:

    domu-224    2    1024    1    ------    0.0

which is not exactly very useful. But we detect qemu failures now in xend. So we turn on this code:

    # ideally we would like to forcibly crash the domain with
    # something like
    #    xc.domain_shutdown(self.vm.getDomid(), DOMAIN_CRASH)
    # but this can easily lead to very rapid restart loops against
    # which we currently have no protection

(The comment being completely incorrect), but then the crash doesn't work because of the bug I pointed out.

All I want to do is mark a domain without a qemu process as crashed. Is that clearer?

And yes, it's pretty trivial to make qemu break. Most typically by passing bogus parameters (say, a broken kernel image, an incorrect NIC, etc.)

regards,
john
On 20/02/2009 22:03, "John Levon" <levon@movementarian.org> wrote:

> All I want to do is mark a domain without a qemu process as crashed. Is
> that clearer?
>
> And yes, it's pretty trivial to make qemu break. Most typically by
> passing bogus parameters (say, a broken kernel image, an incorrect NIC,
> etc.)

Hmmmm.... Okay, I guess that is pretty reasonable. I'll sort out a patch after the summit.

 -- Keir
John Levon writes ("Re: [Xen-devel] SHUTDOWN_crash and vcpu deferrals"):

> # ideally we would like to forcibly crash the domain with
> # something like
> #    xc.domain_shutdown(self.vm.getDomid(), DOMAIN_CRASH)
> # but this can easily lead to very rapid restart loops against
> # which we currently have no protection
>
> (The comment being completely incorrect), but then the crash doesn't
> work because of the bug I pointed out.

I wrote that comment. I haven't been following this bit of xend. Do you mean that nowadays if you say

    on_crash = 'restart'

and the domain immediately crashes on boot, you don't get an infinite restart loop? One of the most common causes of qemu `crashing' is that it wasn't able to open the dom0 device corresponding to some emulated device for the guest's benefit, and that obviously happens at startup.

> All I want to do is mark a domain without a qemu process as crashed. Is
> that clearer?

I think that would be good, provided that we can prevent it restarting rapidly.

> And yes, it's pretty trivial to make qemu break. Most typically by
> passing bogus parameters (say, a broken kernel image, an incorrect NIC,
> etc.)

As you say.

Ian.
On Mon, Feb 23, 2009 at 04:51:10PM +0000, Ian Jackson wrote:

> > (The comment being completely incorrect), but then the crash doesn't
> > work because of the bug I pointed out.
>
> I wrote that comment. I haven't been following this bit of xend. Do
> you mean that nowadays if you say
>    on_crash = 'restart'
> and the domain immediately crashes on boot, you don't get an infinite
> restart loop ? One of the most common causes of qemu `crashing' is

AFAIK this has been the case since forever:

    rst = self._readVm('xend/previous_restart_time')
    if rst:
        rst = float(rst)
        timeout = now - rst
        if timeout < MINIMUM_RESTART_TIME:
            log.error(
                'VM %s restarting too fast (%f seconds since the last '
                'restart). Refusing to restart to avoid loops.',
                self.info['name_label'], timeout)
            self.destroy()
            return

    self._writeVm('xend/previous_restart_time', str(now))

This is from 3.1.4. Perhaps it was broken when you tried it, but it certainly seems to do its intended job on 3.3.2pre for me.

regards,
john
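For readers without a xend tree to hand, the throttling logic in that excerpt can be tried out in isolation. The following is a standalone sketch only, not xend's implementation: the `RestartThrottle` class, the plain dict standing in for `_readVm`/`_writeVm`, the injectable `clock`, and the 60-second value of `MINIMUM_RESTART_TIME` are all assumptions introduced for illustration.

```python
import time
from typing import Callable, Dict, Optional

# Assumed value for this sketch; the excerpt does not show the real one.
MINIMUM_RESTART_TIME = 60.0

RESTART_KEY = 'xend/previous_restart_time'


class RestartThrottle:
    """Refuse a restart that follows the previous one too closely."""

    def __init__(self,
                 store: Optional[Dict[str, str]] = None,
                 clock: Callable[[], float] = time.time) -> None:
        # A dict stands in for the per-VM store used by the excerpt.
        self.store: Dict[str, str] = store if store is not None else {}
        self.clock = clock

    def may_restart(self) -> bool:
        now = self.clock()
        rst = self.store.get(RESTART_KEY)
        if rst is not None:
            elapsed = now - float(rst)
            if elapsed < MINIMUM_RESTART_TIME:
                # Too fast: refuse and leave the stored timestamp
                # untouched, as the excerpt does before destroying the VM.
                return False
        # Record this restart so the next attempt is measured against it.
        self.store[RESTART_KEY] = str(now)
        return True
```

A caller would check `may_restart()` before restarting a crashed domain and destroy the domain instead when it returns False. Passing a fake `clock` makes the behaviour easy to exercise without waiting a minute.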
John Levon writes ("Re: [Xen-devel] SHUTDOWN_crash and vcpu deferrals"):

> This is from 3.1.4. Perhaps it was broken when you tried it, but it
> certainly seems to do its intended job on 3.3.2pre for me.

Oh, great. I put the comment there because I remembered it happening to me once (with some kind of pre-3.2 unstable tree I think) but perhaps I misremembered or there was something else wrong. I didn't try to reproduce it.

Well, in that case we should definitely fix Xen so that the guest can be crashed and get rid of my bogus comment.

Regards,
Ian.