Hi xen-users,
I've recently installed two Xen 4.4.0 dom0 hosts (xnode1 and xnode2). A PV
domU migrates between them without any problem, but remus doesn't work.
Every time I start remus it looks good at first, but the domU never loses
its temporary "*--incoming" name on the receiving dom0 (xnode2). If I
destroy the domU on xnode1 while remus is syncing its state, the domU on
xnode2 dies as well.
Here is example output from remus:
xnode1 ~ # xl -vvvvv remus tr1 xnode2
Password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/185)
libxl: debug: libxl.c:709:libxl_domain_remus_start: ao 0x17cdb60:
create: how=(nil) callback=(nil) poller=0x17cd880
libxl: debug: libxl_dom.c:1244:libxl__toolstack_save: domain=7 toolstack
data size=8
Loading new save file <incoming migration stream> (new xl fmt info
0x0/0x0/185)
Savefile contains xl domain config
libxl: debug: libxl.c:736:libxl_domain_remus_start: ao 0x17cdb60:
inprogress: poller=0x17cd880, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: xc_domain_save: starting save of domid 7
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 4096/65536 6%xc:
progress: Reloading memory pages: 4096/65536 6%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 8212/65536 12%xc:
progress: Reloading memory pages: 8192/65536 12%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 11284/65536 17%xc:
progress: Reloading memory pages: 11264/65536 17%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 15380/65536 23%xc:
progress: Reloading memory pages: 15360/65536 23%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 18452/65536 28%xc:
progress: Reloading memory pages: 18432/65536 28%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 22548/65536 34%xc:
progress: Reloading memory pages: 22528/65536 34%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 25620/65536 39%xc:
progress: Reloading memory pages: 25600/65536 39%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 29716/65536 45%xc:
progress: Reloading memory pages: 29696/65536 45%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 32788/65536 50%xc:
progress: Reloading memory pages: 32768/65536 50%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 36884/65536 56%xc:
progress: Reloading memory pages: 36864/65536 56%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 40980/65536 62%xc:
progress: Reloading memory pages: 40960/65536 62%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 44052/65536 67%xc:
progress: Reloading memory pages: 44032/65536 67%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 48148/65536 73%xc:
progress: Reloading memory pages: 48128/65536 73%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 51220/65536 78%xc:
progress: Reloading memory pages: 51200/65536 78%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 55316/65536 84%xc:
progress: Reloading memory pages: 55296/65536 84%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 58393/65536 89%xc:
progress: Reloading memory pages: 58368/65536 89%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 62522/65536 95%xc:
progress: Reloading memory pages: 62464/65536 95%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 65536/65536 100%
xc: detail: delta 22754ms, dom0 18%, target 0%, sent 94Mb/s, dirtied
0Mb/s 84 pages
xc: Saving memory: iter 1 (last sent 65454 skipped 82): 65536/65536
100%
xc: detail: delta 31ms, dom0 12%, target 0%, sent 88Mb/s, dirtied 0Mb/s
0 pages
xc: Saving memory: iter 2 (last sent 84 skipped 0): 65536/65536 100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback:
issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback:
wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback:
guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback:
wait for the guest to suspend
xc: progress: Reloading memory pages: 65538/65536 100%
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback:
guest has suspended
xc: detail: SUSPEND shinfo 0007daf8
xc: detail: delta 202ms, dom0 15%, target 0%, sent 0Mb/s, dirtied 25Mb/s
160 pages
xc: Saving memory: iter 3 (last sent 0 skipped 0): 65536/65536 100%
xc: detail: delta 2ms, dom0 0%, target 0%, sent 2621Mb/s, dirtied
2621Mb/s 160 pages
xc: detail: Total pages sent= 65698 (1.00x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback:
issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback:
wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback:
guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback:
wait for the guest to suspend
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback:
guest has suspended
xc: detail: SUSPEND shinfo 0007daf8
xc: detail: delta 201ms, dom0 0%, target 0%, sent 0Mb/s, dirtied 26Mb/s
160 pages
xc: Saving memory: iter 4 (last sent 160 skipped 0): 65536/65536 100%
xc: detail: delta 3ms, dom0 0%, target 0%, sent 1900Mb/s, dirtied
1900Mb/s 174 pages
xc: detail: Total pages sent= 65872 (1.01x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
........[skipped]............
-------------------------------------------------------------------------------
xnode1 ~ # xl list
Name               ID   Mem VCPUs  State    Time(s)
Domain-0            0  8191     4  r-----     200.8
tr1                 3   256     2  ---ss-       1.1
-------------------------------------------------------------------------------
xnode2 ~ # xl list
Name               ID   Mem VCPUs  State    Time(s)
Domain-0            0  8192     4  r-----     728.9
tr1--incoming      11   256     0  --p---       0.0
-------------------------------------------------------------------------------
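The stuck state above can be checked for automatically while testing. This is only a minimal sketch: the helper name incoming_domains is hypothetical (it is not part of xl), and it assumes the only thing that matters is whether the first column of `xl list` ends in "--incoming". The sample text is the xnode2 listing from this mail.

```python
# Hypothetical helper, not part of xl: report domains that still carry
# the temporary "--incoming" suffix in `xl list` output.
def incoming_domains(xl_list_output):
    """Return the names of domains whose name ends in '--incoming'."""
    names = []
    for line in xl_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row ("Name ID Mem ...").
        if not fields or fields[0] == "Name":
            continue
        if fields[0].endswith("--incoming"):
            names.append(fields[0])
    return names


# Sample copied from the xnode2 listing in this mail.
SAMPLE = """\
Name               ID   Mem VCPUs  State    Time(s)
Domain-0            0  8192     4  r-----     728.9
tr1--incoming      11   256     0  --p---       0.0
"""

print(incoming_domains(SAMPLE))
```

On a real dom0 the same function could be fed the text captured from running `xl list`, for example via Python's subprocess module, and polled while remus is running.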
Then I destroy the domU on xnode1:
xnode1 ~ # xl destroy tr1
........[continuation of xl remus output on xnode1]............
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback:
issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback:
wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback:
guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback:
wait for the guest to suspend
xc: error: rdexact failed (select returned 0): Internal error
xc: error: Error when reading batch size (110 = Connection timed out):
Internal error
xc: error: error when buffering batch, finishing (110 = Connection timed
out): Internal error
libxl: error: libxl_create.c:940:libxl__xc_domain_restore_done:
restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:1022:domcreate_rebuild_done: cannot
(re-)build domain: -3
libxl: error: libxl.c:1384:libxl__destroy_domid: non-existant domain 14
libxl: error: libxl.c:1348:domain_destroy_callback: unable to destroy
guest with domid 14
libxl: error: libxl_create.c:1320:domcreate_destruction_cb: unable to
destroy domain 14 following failed creation
migration target: Domain creation failed (code -3).
libxl: error: libxl_dom.c:1151:libxl__domain_suspend_common_callback:
guest did not suspend
xc: error: Suspend request failed: Internal error
xc: error: Domain appears not to have suspended: Internal error
xc: detail: Save exit of domid 7 with rc=1
libxl-save-helper: debug: complete r=1: Invalid argument
libxl: error: libxl_dom.c:1406:libxl__xc_domain_save_done: saving
domain: domain responded to suspend request: Invalid argument
libxl: debug: libxl_event.c:1591:libxl__ao_complete: ao 0x17cdb60:
complete, rc=-3
libxl: debug: libxl_event.c:1563:libxl__ao__destroy: ao 0x17cdb60:
destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
libxl: debug: libxl.c:433:libxl_domain_resume: ao 0x17cdb60: create:
how=(nil) callback=(nil) poller=0x17cd880
xc: error: Could not get domain info (3 = No such process): Internal
error
libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed
for domain 7: No such process
libxl: debug: libxl_event.c:1591:libxl__ao_complete: ao 0x17cdb60:
complete, rc=-3
libxl: debug: libxl.c:436:libxl_domain_resume: ao 0x17cdb60: inprogress:
poller=0x17cd880, flags=ic
libxl: debug: libxl_event.c:1563:libxl__ao__destroy: ao 0x17cdb60:
destroy
xc: debug: hypercall buffer: total allocations:298 total releases:298
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:265 misses:2 toobig:31
xnode1 ~ #
And the domU dies on both hypervisors. Nothing is logged on xnode2.
The domU configuration file:
-------------------------------------------------------------------------------
kernel = "/etc/xen/kernels/kernel-3.14.14-gentoo-domU"
memory = 256
name = "tr1"
disk = ['drbd:r1,xvda1,w']
root = "/dev/xvda1 ro"
vcpus = 2
-------------------------------------------------------------------------------
I'm currently using the 3.14.14 kernel for both dom0 and domU. Can anybody
more experienced help me get remus working, or at least give me a hint
about what steps I can take to debug this further?
Thank you,
Konstantin