Florian Heigl
2011-Nov-22 16:33 UTC
CentOS 5.4 jumps in time and shows 9999.99% CPU - anyone.
Hi,

(repost: the mailer daemon just ate the mail as described. I hope it was yummy)

I spent half an hour asking myself if this is for xen-users or -devel. Since I can't decide I'll cross-post a little, hoping that the topic catches only people who happened to have or fix this issue.

I'm pretty amazed by one bug that I can't seem to pinpoint or get rid of right now:

This is using old CentOS 5.4 with the 3.x Xen version that came with it. Sometimes, without much reason, a domU will make a jump in time to the future, by like 30 minutes to an hour. This, as you could expect, breaks the OS quite badly. It stops most processing until that point in time is reached and then happily continues to work.

You can recognize it when all processes in top end up at 9999.99% CPU usage, or by issuing date (examples follow). There are NO related errors in dmesg or on the host. No others either. reboot -f inside the VM will not work any more once this has happened; a crash via sysrq does work.

There have been old reports of this issue, some, but not all, in combination with live migration, which we're not using. In that case the restore functions overwrote the timer information; I seem to be getting broken information instead. (wtf)

So live migration issue == fixed. Other issue == I don't know.

If you look at this and ignore the comments from other people that didn't have the same issue (i.e. they have a VMware time-acceleration platform) then it makes me think this is something introduced with 5.4:
https://www.centos.org/modules/newbb/viewtopic.php?topic_id=23402
(timmsteve and the thread starter)

I would like to fix the issue instead of switching to independent wallclock. Reasons: running independent wallclock uses up CPU time for nothing. Having all VMs run on the time of one shared, perfectly synced clock just makes a lot more sense.

Backporting a patch to old Xen wouldn't worry me too much, I just need to get a solution somehow.
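For anyone who wants to catch the jump at the moment it happens instead of noticing it later in top, here is a minimal sketch. It compares how far the wall clock moved between two samples with how far a reference clock moved, and flags the difference when it exceeds a tolerance. Everything in it is an assumption and not something from the affected systems: the names find_time_jumps and watch are made up, it needs Python 3.3 or later for time.monotonic (the Python shipped with CentOS 5.4 is much older, so it would have to be adapted), and I do not know whether the monotonic clock inside an affected domU stays sane during a jump. If it does not, the reference timestamps would have to come from outside the guest, e.g. from dom0.

```python
import time


def find_time_jumps(samples, tolerance=5.0):
    """Return (index, drift) pairs for suspicious wall-clock steps.

    samples   -- list of (reference_seconds, wall_seconds) tuples, in order
    tolerance -- how many seconds wall time may deviate from the reference
                 between two consecutive samples before it is reported
    """
    jumps = []
    for i in range(1, len(samples)):
        ref_delta = samples[i][0] - samples[i - 1][0]
        wall_delta = samples[i][1] - samples[i - 1][1]
        drift = wall_delta - ref_delta  # positive: wall clock ran ahead
        if abs(drift) > tolerance:
            jumps.append((i, drift))
    return jumps


def watch(interval=1.0, tolerance=5.0, iterations=None):
    """Poll the local clocks and print a line whenever a jump is seen.

    Assumption: time.monotonic() is used as the reference clock. Runs
    forever unless `iterations` is given.
    """
    prev = (time.monotonic(), time.time())
    count = 0
    while iterations is None or count < iterations:
        time.sleep(interval)
        cur = (time.monotonic(), time.time())
        for _, drift in find_time_jumps([prev, cur], tolerance):
            print("wall clock moved %+.1f s against the reference at %s"
                  % (drift, time.ctime(cur[1])))
        prev = cur
        count += 1
```

Calling watch() in a screen session and logging its output would at least give a timestamp to correlate with whatever the host was doing at that moment.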
I'm not sure what the bigger problem is right now: not finding a way to fix it, or not being able to reproduce it at will. Either of these would make me glad right now.

OS: CentOS 5.4
Upgrading the hosts is possible, upgrading the VMs is *not*.

Real time:
Tue Nov 22 16:37:29 2011

Time / top displayed on the system, almost 20 minutes in the future:

top - 16:52:04 up 8 days,  3:41,  2 users,  load average: 0.25, 0.08, 0.02
Tasks:  87 total,   1 running,  86 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.7%sy,  0.0%ni, 97.7%id,  0.1%wa,  0.0%hi,  0.1%si,  0.5%st
Mem:    524464k total,   514608k used,     9856k free,    53556k buffers
Swap:   522104k total,       52k used,   522052k free,    60292k cached

  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
 1380 xxx       15   0  194m  24m  12m S 9999.0  4.7  17:17.77 xxxx
    1 root      15   0  2088  732  628 S 9999.0  0.1   0:00.11 init
    2 root      RT  -5     0    0    0 S 9999.0  0.0   0:02.50 migration/0
    3 root      34  19     0    0    0 S 9999.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5     0    0    0 S 9999.0  0.0   0:00.00 watchdog/0
    5 root      10  -5     0    0    0 S 9999.0  0.0   0:00.05 events/0
    6 root      14  -5     0    0    0 S 9999.0  0.0   0:00.21 khelper
    7 root      11  -5     0    0    0 S 9999.0  0.0   0:00.00 kthread

Current Clocksource = jiffies
Independent wallclock = 0
permitted_clock_jitter = 10000000

Did anybody who had this issue *fix* it?
Is it *definitely* gone with independent clocks?
Does CentOS / RedHat really have a QA team? :)

Going to read through the kernel-xen patches in 5.7 now :/

Thanks for reading / any help,
Florian

--
the purpose of libvirt is to provide an abstraction layer hiding all xen features added since 2006 until they were finally understood and copied by the kvm devs.