Kyungjin Yoo
2010-Sep-14 16:05 UTC
[Xen-devel] remus failure -xen 4.0.1: xc_restore failed only at some heavy workload
I have done some experiments with remus and ran into problems with its
failover.
I set up dom0 and domU as shown below, and the backup server is set up the
same as the primary.
Ubuntu 9.10
Xen 4.0.1-rc2
kernel for dom0 : 2.6.32.18
kernel for domU : 2.6.18.8
With an idle guest running on dom0, I run remus on the primary server and
then destroy the guest or kill remus.
In that case remus failover works, and the guest moves from the primary
server to the backup server.
But in the workload experiments, I run specweb or a kernel compile in the
guest while the primary server runs remus.
When the guest is destroyed or remus is killed, the guest doesn't survive on
the backup server, even though it was being checkpointed beforehand. During
checkpointing the guest was in the 'p' state on the backup server, but
afterwards it disappeared.
xend.log on the backup server shows this error:
----
[XXXX-XX-XX 13:56:50 6038] ERROR (XendCheckpoint:357) /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 309, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 411, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
[XXXX-XX-XX 13:56:50 6038] ERROR (XendDomain:1175) Restore failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/xen/xend/XendDomain.py", line 1159, in domain_restore_fd
    dominfo = XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 358, in restore
    raise exn
XendError: /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
----
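For anyone trying to reproduce this, here is a minimal sketch for pulling the
ERROR entries out of a xend.log excerpt so runs can be compared. It assumes
only the header layout visible in the log above ("[timestamp pid] ERROR
(Module:line) message"); the function name find_errors and the regular
expression are my own illustration, not part of xend. It also tolerates the
message being wrapped onto the following line, as mail clients tend to do.

```python
import re

# Header of an ERROR entry as it appears in the excerpt above, e.g.
# "[... 13:56:50 6038] ERROR (XendCheckpoint:357) <message>".
# This pattern is an assumption based on that excerpt only.
ERROR_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+ERROR\s+"
    r"\((?P<module>\w+):(?P<line>\d+)\)\s*(?P<message>.*)$"
)


def find_errors(lines):
    """Return one dict per ERROR header found in an iterable of log lines."""
    lines = [raw.strip() for raw in lines]
    errors = []
    for i, text in enumerate(lines):
        match = ERROR_RE.match(text)
        if not match:
            continue
        entry = match.groupdict()
        entry["line"] = int(entry["line"])
        # If the message was wrapped, take it from the next line, unless
        # that line starts a new log entry or a traceback.
        if not entry["message"] and i + 1 < len(lines):
            nxt = lines[i + 1]
            if not nxt.startswith("[") and not nxt.startswith("Traceback"):
                entry["message"] = nxt
        errors.append(entry)
    return errors
```

Calling find_errors(open(path)) on the backup server's xend.log (the path
depends on the installation) gives a list of entries whose "message" field
can be filtered for "xc_restore".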
This looks much the same as the earlier question from Shriram Rajagopalan:
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00369.html
The error also seems to have appeared with Xen live migration in the past.
Remus shares functions with live migration, and the error is raised in one
of those live migration functions.
Has anyone seen something similar with either remus or Xen live migration?
Has anyone found a cause or a solution for this?
I would appreciate any help with this.
Thank you.
Kyungjin.