Hello,

When destroying a guest (xl destroy <domid>) on NetBSD, I get the following errors in the log file:

Waiting for domain test (domid 1) to die [pid 11675]
Domain 1 is dead
Unknown shutdown reason code 255. Destroying domain.
Action for shutdown reason code 255 is destroy
Domain 1 needs to be cleaned up: destroying the domain
do_domctl failed: errno 3
libxl: error: libxl.c:762:libxl_domain_destroy: xc_domain_pause failed for 1
libxl: error: libxl_dom.c:658:userdata_path: unable to find domain info for domain 1: No such file or directory
do_domctl failed: errno 3
libxl: error: libxl.c:787:libxl_domain_destroy: xc_domain_destroy failed for 1
Done. Exiting now

The domain is destroyed, but xenstore is not cleaned up properly, and the hotplug scripts are not executed because the state of the devices doesn't get to 6 until xl exits. From the libxl code, I guess the following procedure is used to destroy the domain:

* Destroy PCI devices
* Pause domain
* Destroy device model (not applicable here, the guest is a PV with no dm)
* Destroy devices (here xl waits for the devices to reach state '6', but they never get there; the state only changes to 6 when xl exits)
* Clean xenstore
* Destroy the domain

The 'pause' ctl call fails here, but I've tried to pause a domain using 'xl pause <domid>' and it works fine.

BTW, I'm using xen-unstable, and the guest is a Debian 6.0.3 PV.

Any help on what might be happening here is welcome.

Thanks, Roger.
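For readers unfamiliar with the numbering: state '6' in the message above is the Closed value of the XenbusState enumeration in Xen's public io/xenbus.h header. The sketch below is only an illustration of what "waiting for the devices to reach state 6" amounts to. The `snapshot` dictionary is a made-up stand-in for xenstore reads, and `devices_not_closed` is a hypothetical helper, not a libxl function.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Device states as numbered in Xen's public io/xenbus.h header."""
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


def devices_not_closed(states):
    """Return the state paths whose value is not yet Closed (6).

    `states` is a hypothetical snapshot mapping a device's state path to
    the raw string stored there; it stands in for real xenstore reads.
    """
    return sorted(
        path for path, raw in states.items()
        if XenbusState(int(raw)) is not XenbusState.CLOSED
    )


# Illustrative snapshot only; the paths and values are made up.
snapshot = {
    "/local/domain/0/backend/vbd/1/51712/state": "4",
    "/local/domain/0/backend/vif/1/0/state": "6",
}
print(devices_not_closed(snapshot))
```

In this made-up snapshot the vbd backend is still Connected (4), which is the kind of situation in which a wait for state 6 would not complete.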
Also, libxl_domain_destroy is called twice: the first time from destroy_domain (xl_cmdimpl.c), and the second time from handle_domain_death. The errors shown in the log above are from the second call, the one that comes from handle_domain_death. I don't know if this is the normal behavior, but it seems quite strange that libxl_domain_destroy is called twice.
On Fri, 2011-12-02 at 09:45 +0000, Roger Pau Monné wrote:
> Also, libxl_domain_destroy is called twice, the first time it is
> called from destroy_domain (xl_cmdimpl.c), and the second time it is
> called from handle_domain_death, the errors shown on the log above are
> from the second call, the one that comes from handle_domain_death.
> Don't know if this is the normal behavior, but it seems quite strange
> that libxl_domain_destroy is called twice.

It's normal I think. handle_domain_death() is there to deal with graceful shutdown and reboot scenarios e.g. to clear up after the domain and restart as necessary.

xl destroy is explicitly the command which shoots the domain in the head and so it also calls destroy. If you want graceful you should use "xl shutdown".

Although handle_domain_death also picks up on the destroy it shouldn't be doing too much in that case since the interesting work has already happened. The interesting logs will be the xl -vvv destroy ones I think.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
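As a side note on the "do_domctl failed: errno 3" lines in the original log: on the usual Unix-like platforms errno 3 is ESRCH. That would be consistent with the second libxl_domain_destroy call acting on a domain that is already gone, although nothing in the thread confirms this reading. A quick way to decode the number, using only the Python standard library:

```python
import errno
import os

code = 3  # the value reported by "do_domctl failed: errno 3"

# errno.errorcode maps the number to its symbolic name on this platform;
# os.strerror gives the platform's human-readable message for it.
print(errno.errorcode[code], "-", os.strerror(code))
```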
2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> It's normal I think. handle_domain_death() is there to deal with
> graceful shutdown and reboot scenarios e.g. to clear up after the domain
> and restart as necessary.
>
> xl destroy is explicitly the command which shoots the domain in the head
> and so it also calls destroy. If you want graceful you should use "xl
> shutdown".
>
> Although handle_domain_death also picks up on the destroy it shouldn't
> be doing too much in that case since the interesting work has already
> happened. The interesting logs will be the xl -vvv destroy ones I think.

Devices don't get disconnected, or at least they don't get to state '6' until xl exits. Maybe I should modify libxl_domain_destroy to search xenstore and try to manually execute the hotplug scripts for devices that are still connected after calling 'libxl__devices_destroy'.

Using 'xl -vvv destroy <domid>' doesn't print any debug messages, only:

xc: debug: hypercall buffer: total allocations:131 total releases:131
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:128 misses:2 toobig:1

Thanks, Roger.
On Fri, 2011-12-02 at 10:10 +0000, Roger Pau Monné wrote:
> 2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> > It's normal I think. handle_domain_death() is there to deal with
> > graceful shutdown and reboot scenarios e.g. to clear up after the domain
> > and restart as necessary.
> >
> > xl destroy is explicitly the command which shoots the domain in the head
> > and so it also calls destroy. If you want graceful you should use "xl
> > shutdown".
> >
> > Although handle_domain_death also picks up on the destroy it shouldn't
> > be doing too much in that case since the interesting work has already
> > happened. The interesting logs will be the xl -vvv destroy ones I think.
>
> Devices don't get disconnected, or at least they don't get to state
> '6' until xl exits, maybe I should modify libxl_domain_destroy to
> search xenstore and try to manually execute hotplug scripts for
> devices that are still connected after calling
> 'libxl__devices_destroy'.

libxl_destroy_domain should be called with force = 1 in the main_destroy case, I suspect. Does that cause the scripts to be run?

> Using 'xl -vvv destroy <domid>' doesn't
> print any debug message, only:
>
> xc: debug: hypercall buffer: total allocations:131 total releases:131
> xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
> xc: debug: hypercall buffer: cache current size:2
> xc: debug: hypercall buffer: cache hits:128 misses:2 toobig:1
>
> Thanks, Roger.
2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> libxl_destroy_domain should be called with force = 1 in the main_destroy
> case, I suspect. Does that cause the scripts to be run?

Well, with force = 1 the hotplug scripts are executed, but the devices are still busy and cannot be disconnected (mainly vnd). It also crashed the server, but that's a problem with NetBSD's buggy vnd driver. Looking at the execution order in libxl_domain_destroy, shouldn't we first destroy the domain (xc_domain_destroy) and then remove the devices?
On Fri, 2011-12-02 at 10:29 +0000, Roger Pau Monné wrote:
> 2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> > libxl_destroy_domain should be called with force = 1 in the main_destroy
> > case, I suspect. Does that cause the scripts to be run?
>
> Well, with force = 1 hotplug scripts are executed, but devices are
> still busy and they cannot be disconnected (mainly vnd). Also crashed
> the server, but that's NetBSD buggy vnd driver problem. Seeing the
> execution order in libxl_domain_destroy, shouldn't we first destroy
> the domain (xc_domain_destroy) and then remove the devices?

In the force case, yes, I expect so.

In the non-force case you want to let the guest shut down its devices gracefully, so you would do devices first.

However, I'm not entirely sure that a non-forced libxl_domain_destroy makes much sense. The callsites are:

* handle_domain_death: The guest has already shut down at this point. Nothing graceful can happen.
* create_domain: We have failed to start the guest, no chance of graceful shutdown.
* destroy_domain: Semantics are explicitly the force case.
* save_domain: Domain has already suspended. There's nothing which can be done gracefully.
* migrate_domain: Already forced, domain is gone already, no chance of a graceful shutdown.
* migrate_receive: Already forced, we have failed to receive the domain, no possibility of a graceful shutdown.
* libxl_domain_create: On the failure path, so no need for a graceful option.
* libxl__destroy_device_model: Maybe this should be doing a graceful shutdown, but in that case it should either be calling libxl_domain_shutdown or writing the qemu-dm control node and waiting, at which point after some timeout perhaps a forced shutdown would be appropriate.

So it seems to me that the non-forced option in libxl_domain_destroy can be removed and we should just shoot the domain and then forcibly tear down the backends, running scripts as necessary.

The only wrinkle is the stub device-model case, but really that's already a special domain and should be treated as such.

Does that make sense to anyone else?

Ian.
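The two orderings discussed in this message can be written down as a small model. This is plain Python for illustration, not libxl code; `destroy_order` and the step labels are hypothetical names chosen for the sketch.

```python
def destroy_order(force):
    """Return the teardown steps in the order described in the thread.

    The strings are descriptive labels only, not real libxl or libxc
    function names.
    """
    if force:
        # Proposed behaviour: shoot the domain first, then forcibly tear
        # down the backends, running hotplug scripts as necessary.
        return [
            "destroy domain",
            "tear down backends (run hotplug scripts as necessary)",
        ]
    # Graceful behaviour: let the guest shut its devices down first.
    return [
        "wait for guest to close its devices",
        "destroy domain",
    ]


for force in (True, False):
    print("force =", force, "->", destroy_order(force))
```

If the non-forced option were removed as suggested, only the first branch would remain.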
2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> On Fri, 2011-12-02 at 10:29 +0000, Roger Pau Monné wrote:
>> 2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
>> > libxl_destroy_domain should be called with force = 1 in the main_destroy
>> > case, I suspect. Does that cause the scripts to be run?
>>
>> Well, with force = 1 hotplug scripts are executed, but devices are
>> still busy and they cannot be disconnected (mainly vnd). Also crashed
>> the server, but that's NetBSD buggy vnd driver problem. Seeing the
>> execution order in libxl_domain_destroy, shouldn't we first destroy
>> the domain (xc_domain_destroy) and then remove the devices?
>
> In the force case, yes, I expect so.
>
> In the non-force case you want to let the guest shutdown its devices
> gracefully so you would do devices first.
>
> However I'm not entirely sure that a non-forced libxl_domain_destroy
> makes much sense. The callsites are:
>
>       * handle_domain_death: The guest has already shutdown at this
>         point. Nothing graceful can happen.
>       * create_domain: We have failed to start the guest, no chance of
>         graceful shutdown.
>       * destroy_domain: Semantics are explicitly the force case.
>       * save_domain: Domain has already suspended. There's nothing which
>         can be done gracefully.
>       * migrate_domain: Already forced, domain is gone already, no
>         chance of a graceful shutdown.
>       * migrate_receive: Already forced, we have failed to receive the
>         domain, no possibility of a graceful shutdown.
>       * libxl_domain_create, on the failure path so no need for a
>         graceful option.
>       * libxl__destroy_device_model. Maybe this should be doing a
>         graceful shutdown but in that case it should either be calling
>         libxl_domain_shutdown or writing the qemu-dm control node and
>         waiting, at which point after some timeout perhaps a forced
>         shutdown would be appropriate.
>
> So it seems to me that the non-forced option in libxl_domain_destroy can
> be removed and we should just shoot the domain and then forcibly
> teardown the backends, running script as necessary.
>
> The only wrinkle is the stub device-model case but really that's already
> a special domain and should be treated as such.
>
> Does that make sense to anyone else?
>
> Ian.
>

Well, I think I found the underlying problem that was preventing NetBSD from correctly detaching vnd devices when destroying a domain: the frontend state needs to be manually set to 6 (/local/domain/<domid>/device/vbd/<devid>/state, and the same for vif to be more "correct") so that the kernel closes the device and it can then be correctly unmounted.

I will prepare a patch (or a series) to address this, and change libxl_domain_destroy to always use force when called.

Thanks, Roger.
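The fix described above can be sketched as follows. This is only a model under stated assumptions: `store` is a plain dictionary standing in for xenstore, `frontend_state_paths` and `close_frontends` are hypothetical helper names, and the device ids are made up. The eventual patch would write to the real xenstore from libxl instead.

```python
def frontend_state_paths(domid, devices):
    """Build the frontend state paths named in the message above.

    `devices` maps a device kind ("vbd", "vif") to a list of device ids.
    """
    return [
        f"/local/domain/{domid}/device/{kind}/{devid}/state"
        for kind, devids in devices.items()
        for devid in devids
    ]


def close_frontends(store, domid, devices):
    """Write state 6 (Closed) to every frontend state node of the domain.

    `store` is a plain dict standing in for xenstore in this sketch.
    """
    for path in frontend_state_paths(domid, devices):
        store[path] = "6"
    return store


# Made-up domain and device ids, for illustration only.
store = close_frontends({}, 1, {"vbd": [51712], "vif": [0]})
for path, state in store.items():
    print(path, "=", state)
```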