Florian Heigl
2011-Sep-29 16:37 UTC
[Xen-users] CentOS domU hangs on "Restarting system" - didn't you have that one, too?
Hi,

I'm still trying to pin down one of the last issues on some systems here. I'm interested in input from people who *recognize* the following:

    Sending all processes the TERM signal... [ OK ]
    Sending all processes the KILL signal... [ OK ]
    Saving random seed: [ OK ]
    Syncing hardware clock to system time [FAILED]
    Turning off swap: [ OK ]
    Unmounting file systems: [ OK ]
    Please stand by while rebooting the system.
    Restarting the system.
     \
      \_______ this is a lie, no restart ever happens.

This error occurs sometimes, not always. It reliably goes away after a xend restart.

Setup:
======
OS: CentOS 5.4 / 32bit / Xen 3 (outdatedness grade indicator: .1.2-164.15.1.el5)
All guests (around 80) and hosts (10ish) run the same release, but I have also tested with one host running the latest and greatest Xen version from CentOS 5.7.

Things that I tried to blame so far:
------------------------------------
= Old Xen version (switching to a less old one didn't help)

= qemu VFB, due to
  https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=718620

= The event channel issue where the dom0 and domU are using different vcpus while talking to each other
  http://lists.xensource.com/archives/html/xen-devel/2009-01/msg00004.html
  This could possibly be sorted with a nightmarish hack that maps all vcpus onto one cpu at shutdown time by sshing into dom0. One would have to ensure the mapping is OK again after a reboot. Err, you can imagine how much I "like" this idea.

= domU kernel: as yet untested. I hardly have any chance of updating it; rather, I would need to backport the fix (if there was one) to 5.4.

I found some posts by people who no longer got the error after moving to something newer than CentOS 5.2, but this doesn't seem to have done away with it completely.

So far I have failed to make this issue 100% reproducible. It will show up minutes after freshly installing a Xen host, or it will not show up for a week on another one.
It may affect all VMs on a host, or it may affect only one.

You can work around it by using

    xm destroy
    plus killing any stuck qemu vfb processes (which is one of the reasons for pointing at the VFB)
    service xend restart
    xm create vm

but the xend restart introduces other issues: any unaffected VM that is rebooted during the restart will be gone with the wind, and you'd need some magic way of detecting a stuck VM and triggering the restart. Also, I'm not at all sure that a few hundred xend restarts would do no harm over time...

The low chance of reproducing the issue is one of the big problems with it[*], so if you remember this issue and did any successful troubleshooting for it (or fixed it...), let me know.

Thanks :)
Florian

[*] Let alone systems that won't even make a reboot, and what that makes me think about the QA

-- 
Mathias Kettner GmbH
Florian Heigl
Steinstr. 44
81667 München
Linux Beratung & Schulung
http://mathias-kettner.de
Tel.: 089 / 1890 4210
Fax.: 089 / 1890 4211
Mail: fh@mathias-kettner.de

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
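
The manual workaround from the post (destroy the guest, kill any stuck qemu VFB process, restart xend, recreate the guest) could be scripted roughly as below. This is only a minimal sketch under assumptions: the function names `run` and `recover_stuck_domu`, the `DRY_RUN` switch, and the `pkill` pattern matching a `qemu-dm` process by domain name are illustrative and not taken from the post. It defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the manual workaround for one stuck domU, meant to run on the dom0.
# DRY_RUN defaults to 1, so commands are only printed until you opt in.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_stuck_domu() {
    # $1 = domain name, $2 = path to the guest config file (hypothetical inputs)
    if [ "$#" -ne 2 ]; then
        echo "usage: recover_stuck_domu <domain> <config>" >&2
        return 2
    fi
    # Tear down the hung guest; give up if that already fails.
    run xm destroy "$1" || return 1
    # Assumption: the leftover VFB helper is a qemu-dm process whose command
    # line contains the domain name. Verify with ps before relying on this.
    run pkill -f "qemu-dm.*$1" || true
    # Restarting xend is what makes the hang go away, per the post.
    run service xend restart
    # Bring the guest back up from its config file.
    run xm create "$2"
}
```

Calling `recover_stuck_domu vm1 /etc/xen/vm1` (both arguments are placeholder values) prints the four steps; setting `DRY_RUN=0` would execute them. The caveat from the post still applies: the xend restart can take down unaffected guests that happen to be rebooting at that moment, and the sketch does nothing to detect a stuck VM in the first place.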