Hi xen-users,

I've recently installed two Xen 4.4.0 Dom0 hosts (xnode1 and xnode2). A PV domU migrates between them without any problem, but Remus doesn't work. Every time I start Remus it looks fine at first, but the domU on the receiving dom0 (xnode2) never drops its temporary "*-incoming" name. If I destroy the domU on xnode1 while Remus is syncing its state, the domU on xnode2 dies as well.

Here is example output from xl remus:

xnode1 ~ # xl -vvvvv remus tr1 xnode2
Password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/185)
libxl: debug: libxl.c:709:libxl_domain_remus_start: ao 0x17cdb60: create: how=(nil) callback=(nil) poller=0x17cd880
libxl: debug: libxl_dom.c:1244:libxl__toolstack_save: domain=7 toolstack data size=8
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/185)
 Savefile contains xl domain config
libxl: debug: libxl.c:736:libxl_domain_remus_start: ao 0x17cdb60: inprogress: poller=0x17cd880, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: xc_domain_save: starting save of domid 7
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 4096/65536 6%
xc: progress: Reloading memory pages: 4096/65536 6%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 8212/65536 12%
xc: progress: Reloading memory pages: 8192/65536 12%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 11284/65536 17%
xc: progress: Reloading memory pages: 11264/65536 17%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 15380/65536 23%
xc: progress: Reloading memory pages: 15360/65536 23%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 18452/65536 28%
xc: progress: Reloading memory pages: 18432/65536 28%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 22548/65536 34%
xc: progress: Reloading memory pages: 22528/65536 34%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 25620/65536 39%
xc: progress: Reloading memory pages: 25600/65536 39%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 29716/65536 45%
xc: progress: Reloading memory pages: 29696/65536 45%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 32788/65536 50%
xc: progress: Reloading memory pages: 32768/65536 50%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 36884/65536 56%
xc: progress: Reloading memory pages: 36864/65536 56%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 40980/65536 62%
xc: progress: Reloading memory pages: 40960/65536 62%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 44052/65536 67%
xc: progress: Reloading memory pages: 44032/65536 67%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 48148/65536 73%
xc: progress: Reloading memory pages: 48128/65536 73%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 51220/65536 78%
xc: progress: Reloading memory pages: 51200/65536 78%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 55316/65536 84%
xc: progress: Reloading memory pages: 55296/65536 84%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 58393/65536 89%
xc: progress: Reloading memory pages: 58368/65536 89%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 62522/65536 95%
xc: progress: Reloading memory pages: 62464/65536 95%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 65536/65536 100%
xc: detail: delta 22754ms, dom0 18%, target 0%, sent 94Mb/s, dirtied 0Mb/s 84 pages
xc: Saving memory: iter 1 (last sent 65454 skipped 82): 65536/65536 100%
xc: detail: delta 31ms, dom0 12%, target 0%, sent 88Mb/s, dirtied 0Mb/s 0 pages
xc: Saving memory: iter 2 (last sent 84 skipped 0): 65536/65536 100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
xc: progress: Reloading memory pages: 65538/65536 100%
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback: guest has suspended
xc: detail: SUSPEND shinfo 0007daf8
xc: detail: delta 202ms, dom0 15%, target 0%, sent 0Mb/s, dirtied 25Mb/s 160 pages
xc: Saving memory: iter 3 (last sent 0 skipped 0): 65536/65536 100%
xc: detail: delta 2ms, dom0 0%, target 0%, sent 2621Mb/s, dirtied 2621Mb/s 160 pages
xc: detail: Total pages sent= 65698 (1.00x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback: guest has suspended
xc: detail: SUSPEND shinfo 0007daf8
xc: detail: delta 201ms, dom0 0%, target 0%, sent 0Mb/s, dirtied 26Mb/s 160 pages
xc: Saving memory: iter 4 (last sent 160 skipped 0): 65536/65536 100%
xc: detail: delta 3ms, dom0 0%, target 0%, sent 1900Mb/s, dirtied 1900Mb/s 174 pages
xc: detail: Total pages sent= 65872 (1.01x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
........[skipped]............
-------------------------------------------------------------------------------
xnode1 ~ # xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  8191     4     r-----     200.8
tr1                                          3   256     2     ---ss-       1.1
-------------------------------------------------------------------------------
xnode2 ~ # xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  8192     4     r-----     728.9
tr1--incoming                               11   256     0     --p---       0.0
-------------------------------------------------------------------------------

Then I destroy the domU on xnode1:

xnode1 ~ # xl destroy tr1

........[continuation of xl remus output on xnode1]............
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
xc: error: rdexact failed (select returned 0): Internal error
xc: error: Error when reading batch size (110 = Connection timed out): Internal error
xc: error: error when buffering batch, finishing (110 = Connection timed out): Internal error
libxl: error: libxl_create.c:940:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:1022:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1384:libxl__destroy_domid: non-existant domain 14
libxl: error: libxl.c:1348:domain_destroy_callback: unable to destroy guest with domid 14
libxl: error: libxl_create.c:1320:domcreate_destruction_cb: unable to destroy domain 14 following failed creation
migration target: Domain creation failed (code -3).
libxl: error: libxl_dom.c:1151:libxl__domain_suspend_common_callback: guest did not suspend
xc: error: Suspend request failed: Internal error
xc: error: Domain appears not to have suspended: Internal error
xc: detail: Save exit of domid 7 with rc=1
libxl-save-helper: debug: complete r=1: Invalid argument
libxl: error: libxl_dom.c:1406:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Invalid argument
libxl: debug: libxl_event.c:1591:libxl__ao_complete: ao 0x17cdb60: complete, rc=-3
libxl: debug: libxl_event.c:1563:libxl__ao__destroy: ao 0x17cdb60: destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
libxl: debug: libxl.c:433:libxl_domain_resume: ao 0x17cdb60: create: how=(nil) callback=(nil) poller=0x17cd880
xc: error: Could not get domain info (3 = No such process): Internal error
libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 7: No such process
libxl: debug: libxl_event.c:1591:libxl__ao_complete: ao 0x17cdb60: complete, rc=-3
libxl: debug: libxl.c:436:libxl_domain_resume: ao 0x17cdb60: inprogress: poller=0x17cd880, flags=ic
libxl: debug: libxl_event.c:1563:libxl__ao__destroy: ao 0x17cdb60: destroy
xc: debug: hypercall buffer: total allocations:298 total releases:298
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:265 misses:2 toobig:31
xnode1 ~ #

And the domU dies on both hypervisors. Nothing is logged on xnode2.

domU configuration file:
-------------------------------------------------------------------------------
kernel = "/etc/xen/kernels/kernel-3.14.14-gentoo-domU"
memory = 256
name = "tr1"
disk = ['drbd:r1,xvda1,w']
root = "/dev/xvda1 ro"
vcpus=2
-------------------------------------------------------------------------------

I'm currently using the 3.14.14 kernel for both dom0 and domU.
Can anybody more experienced help me get Remus working, or at least give a hint about what steps I can take to debug it more deeply?

Thank you,
Konstantin