Hi, all --

I'm in the process of configuring a new machine for use as a Xen server, and am having some rather significant stability issues.

The hardware/distro/kernel info is:

opensuse 10.3
Linux offxen2 2.6.22.5-31-xen #1 SMP 2007/09/21 22:29:00 UTC x86_64 x86_64 x86_64 GNU/Linux
2 x Dual-Core AMD Opteron(tm) Processor 2212 HE
8 x 2GB PC2-5300 RAM
Xen 3.1.0_15042-51

I currently have three guest domains created:
- 1 opensuse 10.3, paravirtualized
- 1 Windows XP, HVM
- 1 Windows 2003, HVM

I can start all three domains, etc. The trouble that I'm having appears as soon as I begin to stress-test the configuration. Running standard burn-in testing tools (BurnInTest, for example), I quickly manage to lock the machine up.

Testing this morning, for example, I ran BurnInTest in all four domains -- dom0, and all three domUs. It was configured to run with only CPU, Memory, and Disk tests. I'll grant that I'm putting far more than standard load on the hardware, but, well... Within 3 minutes of the tests beginning, the host machine became entirely unresponsive, both at the console and over the network.
After reboot, the following messages appear in the messages file on dom0:

Oct 26 11:16:24 offxen2 kernel: clocksource/0: Time went backwards: delta=-87834561 shadow=1219075978646569 offset=40156133
Oct 26 11:16:24 offxen2 kernel: clocksource/3: Time went backwards: delta=-60020543 shadow=1219076192448184 offset=116224924
Oct 26 11:16:24 offxen2 kernel: clocksource/3: Time went backwards: delta=-30041758 shadow=1219076192448184 offset=242467112
Oct 26 11:16:28 offxen2 kernel: clocksource/2: Time went backwards: delta=-29692236 shadow=1219079978661601 offset=564764018
Oct 26 11:16:34 offxen2 kernel: clocksource/0: Time went backwards: delta=-21729405 shadow=1219085978683108 offset=317335629
Oct 26 11:16:37 offxen2 kernel: clocksource/3: Time went backwards: delta=-15880975 shadow=1219087978689164 offset=670825178
Oct 26 11:16:37 offxen2 kernel: clocksource/1: Time went backwards: delta=-121188552 shadow=1219089233041322 offset=181068207
Oct 26 11:16:39 offxen2 kernel: clocksource/2: Time went backwards: delta=-130702951 shadow=1219090051851342 offset=632907468
Oct 26 11:16:50 offxen2 kernel: clocksource/0: Time went backwards: delta=-150041357 shadow=1219101192548457 offset=648236947
Oct 26 11:16:52 offxen2 kernel: clocksource/1: Time went backwards: delta=-43577936 shadow=1219103192555550 offset=917191565
Oct 26 11:17:00 offxen2 kernel: clocksource/2: Time went backwards: delta=-30133282 shadow=1219112051923316 offset=182801944

Similarly, these messages appear in the messages file in the opensuse domU:

Oct 26 11:14:42 mx1 kernel: clocksource/0: Time went backwards: delta=-30524524 shadow=1218974192040103 offset=299209428
Oct 26 11:14:42 mx1 kernel: klogd 1.4.1, ---------- state change ----------
Oct 26 11:14:50 mx1 kernel: clocksource/0: Time went backwards: delta=-35339594 shadow=1218980978262608 offset=653280945
Oct 26 11:14:50 mx1 kernel: clocksource/1: Time went backwards: delta=-32626156 shadow=1218980978262608 offset=685231574
Oct 26 11:15:06 mx1 kernel: clocksource/1: Time went backwards: delta=-33094544 shadow=1218997051525512 offset=883292081
Oct 26 11:15:07 mx1 kernel: clocksource/1: Time went backwards: delta=-28049569 shadow=1218998978340581 offset=27113639
Oct 26 11:15:07 mx1 kernel: clocksource/0: Time went backwards: delta=-29452075 shadow=1218999192142335 offset=173497295
Oct 26 11:15:10 mx1 kernel: clocksource/0: Time went backwards: delta=-24180432 shadow=1219001232726262 offset=495644020
Oct 26 11:15:12 mx1 kernel: clocksource/1: Time went backwards: delta=-29407880 shadow=1219004232742124 offset=267503786
Oct 26 11:15:14 mx1 kernel: clocksource/1: Time went backwards: delta=-72293790 shadow=1219005978366429 offset=618198368
Oct 26 11:15:19 mx1 kernel: clocksource/1: Time went backwards: delta=-30010141 shadow=1219010051576699 offset=685063187
Oct 26 11:15:25 mx1 kernel: clocksource/0: Time went backwards: delta=-34145209 shadow=1219016192222797 offset=464415813
Oct 26 11:15:26 mx1 kernel: clocksource/1: Time went backwards: delta=-11964123 shadow=1219017192226704 offset=564340684
Oct 26 11:15:26 mx1 kernel: clocksource/0: Time went backwards: delta=-11068356 shadow=1219017051577257 offset=721244578
Oct 26 11:15:30 mx1 kernel: clocksource/1: Time went backwards: delta=-11507778 shadow=1219022192247457 offset=191448477
Oct 26 11:15:31 mx1 kernel: clocksource/0: Time went backwards: delta=-15586265 shadow=1219022978434436 offset=66316116
Oct 26 11:15:35 mx1 kernel: clocksource/0: Time went backwards: delta=-55640150 shadow=1219027051616803 offset=76664416
Oct 26 11:15:40 mx1 kernel: clocksource/1: Time went backwards: delta=-27264629 shadow=1219031192284307 offset=827170075
Oct 26 11:15:48 mx1 kernel: clocksource/0: Time went backwards: delta=-13974148 shadow=1219039232839657 offset=399204868
Oct 26 11:15:57 mx1 kernel: clocksource/0: Time went backwards: delta=-32130780 shadow=1219047978530514 offset=790642402
Oct 26 11:15:57 mx1 kernel: clocksource/0: Time went backwards: delta=-30011595 shadow=1219048051699939 offset=989293882
Oct 26 11:16:00 mx1 kernel: clocksource/0: Time went backwards: delta=-16655720 shadow=1219051192343976 offset=787632598
Oct 26 11:16:03 mx1 kernel: clocksource/0: Time went backwards: delta=-11904065 shadow=1219054192356289 offset=971736210
Oct 26 11:16:06 mx1 kernel: clocksource/1: Time went backwards: delta=-23181648 shadow=1219057192369484 offset=459549949
Oct 26 11:16:07 mx1 kernel: clocksource/1: Time went backwards: delta=-30345878 shadow=1219058978577439 offset=435960608
Oct 26 11:16:10 mx1 kernel: clocksource/0: Time went backwards: delta=-31035353 shadow=1219061051756571 offset=795467413
Oct 26 11:16:11 mx1 kernel: clocksource/0: Time went backwards: delta=-20089266 shadow=1219062978593081 offset=101685066
Oct 26 11:16:16 mx1 kernel: printk: 50269 messages suppressed.
Oct 26 11:16:16 mx1 kernel: clocksource/1: Time went backwards: delta=-24188173 shadow=1219067232963516 offset=765471054
Oct 26 11:16:21 mx1 kernel: printk: 28000 messages suppressed.
Oct 26 11:16:21 mx1 kernel: clocksource/1: Time went backwards: delta=-30051890 shadow=1219073051795398 offset=556484104
Oct 26 11:16:26 mx1 kernel: clocksource/0: Time went backwards: delta=-60024893 shadow=1219077192453422 offset=815345740
Oct 26 11:16:31 mx1 kernel: printk: 1 messages suppressed.
Oct 26 11:16:31 mx1 kernel: clocksource/0: Time went backwards: delta=-805691001 shadow=1219082978673021 offset=152356711
Oct 26 11:16:37 mx1 kernel: printk: 2 messages suppressed.
Oct 26 11:16:37 mx1 kernel: clocksource/1: Time went backwards: delta=-10843062 shadow=1219088233054364 offset=442409918
Oct 26 11:16:43 mx1 kernel: clocksource/1: Time went backwards: delta=-40515520 shadow=1219094233037687 offset=442490158
Oct 26 11:16:55 mx1 kernel: printk: 10106 messages suppressed.
Oct 26 11:16:55 mx1 kernel: clocksource/1: Time went backwards: delta=-50129861 shadow=1219106192568569 offset=847989665

Googling around, I've found references to this issue appearing at various times in the past, but haven't seen anything that appears to be a standard and/or confirmed workaround for it. I have seen suggestions like using NTP, etc, and have tried those, but without success. I'm wondering if there are any known workarounds, or if I'm SOL on this particular issue.

For what it's worth, we saw this same issue several weeks ago, when we first configured the machine, with only light load on the box. (A single opensuse 10.3 guest domain, being tested as a mail relay, receiving and doing spam analysis on about 10-20 messages per minute.) So, while the load I was putting on is quite heavy, I do know that the same problem can occur in a non-stress environment. It just seems to happen much more quickly under stress.

Any suggestions/comments/etc welcome!

Thanks,
Ian

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
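With tens of thousands of these messages suppressed by printk, it can help to summarize what did get logged. The following is a minimal sketch, not part of the original post: it assumes the message format shown in the logs above, and the names `PATTERN` and `summarize` are hypothetical. It counts events per `clocksource/N` index and records the most negative delta seen for each.

```python
import re
from collections import defaultdict

# Assumed format, taken from the log lines quoted in the message above:
# "clocksource/N: Time went backwards: delta=... shadow=... offset=..."
PATTERN = re.compile(
    r"clocksource/(?P<idx>\d+): Time went backwards: "
    r"delta=(?P<delta>-?\d+) shadow=(?P<shadow>\d+) offset=(?P<offset>\d+)"
)


def summarize(lines):
    """Return {clocksource index: (event count, most negative delta)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # e.g. klogd or "messages suppressed" lines
        entry = stats[int(match["idx"])]
        entry[0] += 1
        entry[1] = min(entry[1], int(match["delta"]))
    return {idx: tuple(entry) for idx, entry in stats.items()}


# Three dom0 lines and one printk line copied from the logs above.
sample = [
    "Oct 26 11:16:24 offxen2 kernel: clocksource/0: Time went backwards: "
    "delta=-87834561 shadow=1219075978646569 offset=40156133",
    "Oct 26 11:16:24 offxen2 kernel: clocksource/3: Time went backwards: "
    "delta=-60020543 shadow=1219076192448184 offset=116224924",
    "Oct 26 11:16:24 offxen2 kernel: clocksource/3: Time went backwards: "
    "delta=-30041758 shadow=1219076192448184 offset=242467112",
    "Oct 26 11:16:16 mx1 kernel: printk: 50269 messages suppressed.",
]

for idx, (count, worst) in sorted(summarize(sample).items()):
    print(f"clocksource/{idx}: {count} events, worst delta {worst}")
```

To run it against a real log, pass an open file instead of `sample`, for example `summarize(open("/var/log/messages"))`; that path is an assumption about where the distribution writes kernel messages.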
Bart Verwilst
2007-Oct-27 12:17 UTC
Re: [Xen-users] Time went backwards / Stability issues
Myeah, xen on 2.6.22 is a bit flawed in that area, so it seems :) I've had the same troubles with the Ubuntu 2.6.22-14-xen kernel. You can try setting your clocksource to jiffies instead of xen, which seems to fix the problem.

I 'fixed' the problem here by just compiling my own (2.6.18) kernel. The people at xensource aren't too bothered to keep their source up to date with the latest kernels, which (after 5 releases) makes it harder and harder for external people to port, and more bugs start creeping in.

Kind regards,
Bart
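As a rough illustration of the "set your clocksource to jiffies" suggestion: the sketch below reads and writes the sysfs clocksource files. The directory is the one quoted later in this thread; the helper names are hypothetical, and the demo at the bottom runs against a scratch directory with made-up file contents so that nothing on a live system is touched.

```python
import tempfile
from pathlib import Path

# Directory quoted later in this thread; passed as a parameter so the helpers
# can be pointed at a scratch directory instead of the live system.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def available_clocksources(base=SYSFS_DIR):
    """Return the clocksource names the kernel offers."""
    return (Path(base) / "available_clocksource").read_text().split()


def current_clocksource(base=SYSFS_DIR):
    """Return the clocksource currently in use."""
    return (Path(base) / "current_clocksource").read_text().strip()


def set_clocksource(name, base=SYSFS_DIR):
    """Select a clocksource (needs root on a real system); return the new value."""
    if name not in available_clocksources(base):
        raise ValueError(f"clocksource {name!r} is not available")
    (Path(base) / "current_clocksource").write_text(name)
    return current_clocksource(base)


# Dry run against a scratch directory; the file contents here are invented.
with tempfile.TemporaryDirectory() as scratch:
    base = Path(scratch)
    (base / "available_clocksource").write_text("xen jiffies\n")
    (base / "current_clocksource").write_text("xen\n")
    print("before:", current_clocksource(base))
    print("after: ", set_clocksource("jiffies", base))
```

On a real dom0 the equivalent would be `set_clocksource("jiffies")` run as root. A change made this way does not survive a reboot; booting with the kernel parameter `clocksource=jiffies` is the usual way to make it persistent, though whether this particular Xen kernel honors it is an assumption.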
On Sat October 27 2007 8:17:56 am Bart Verwilst wrote:
> I've had the same troubles with the Ubuntu 2.6.22-14-xen kernel.

Not just 2.6.22. Happens on Fedora's 2.6.20 also. Every time I close or open my laptop lid, the error gets triggered, so it must be ACPI related. Haven't seen it in my test PV domUs, just dom0, but the only thing I keep running for any length of time is HVM.
On 10/28/07 1:05 AM, "jim burns" <jim_burn@bellsouth.net> wrote:
> On Sat October 27 2007 8:17:56 am Bart Verwilst wrote:
>> I've had the same troubles with the Ubuntu 2.6.22-14-xen kernel.
>
> Not just 2.6.22. Happens on fedora's 2.6.20 also. Everytime I close or open my
> laptop lid, the error gets triggered, so it must be acpi related. Haven't
> seen it in my test PV domUs, just dom0, but the only thing I keep running for
> any length of time is HVM.

Thanks to both Jim and Bart for their reports on this.

I found an older thread about this -- Bart's from earlier this month -- and tried the jiffies clocksource suggestion there. After changing the clocksource in _only_ dom0, the burn-in tests (that were causing an error within minutes previously) have run for 72 hours straight without causing any errors. I've had full CPU utilization for that entire period without a single clocksource issue.

My questions:

1) Is there any potential harm in changing the clocksource? That is, is it somehow important that the xen clocksource be used?

2) Bart and Jim, you guys have both seen this before; do either of you know if there's a bug filed for this? I searched the xensource bugzilla, but wasn't able to find one -- but I find bugzilla very difficult to search.

- Ian
Bart Verwilst
2007-Oct-29 21:32 UTC
Re: [Xen-users] Time went backwards / Stability issues
Hi Ian,

Well, I've posted an Ubuntu report here https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/146924 , with a reference to http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=195 , which seems pretty much the same bug.

Kind regards,

Bart Verwilst
> -----Original Message-----
> From: Bart Verwilst [mailto:lists@verwilst.be]
> Sent: Monday, October 29, 2007 5:33 PM
> To: Marlier, Ian
> Cc: xen-users@lists.xensource.com
> Subject: Re: [Xen-users] Time went backwards / Stability issues
>
> Hi Ian,
>
> Well, I've posted an Ubuntu report here
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/146924 ,
> with a reference to
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=195 ,
> which seems pretty much the same bug..
>
> Kind regards,
>
> Bart Verwilst

I'm actually not sure that 195 is the same bug; though the error message is the same, that bug was reported against the 2.0 codebase, and it's not clear to me whether the Timer module that was referenced there is still in use in the clocksource stuff.

In any case, I went ahead and filed a new bug specifically for the clocksource issue. It's here:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1098
On Mon October 29 2007 2:47:01 pm Ian Marlier wrote:
> I found an older thread about this -- Bart's from earlier this month -- and
> tried the jiffies clocksource suggestion there.

My /sys/devices/system/clocksource/clocksource0/current_clocksource is already jiffies, and I still have that laptop-lid-triggered problem.

I don't follow the xen-devel list, but I regularly read the changelog of my Fedora kernels. Starting with the 2931 kernel build, this note appeared:

* Thu Aug 09 2007 Eduardo Habkost <ehabkost@redhat.com>
- Add linux-2.6-xen-backwards-time.patch from linux-2.6.18-xen.hg changeset 87bb8705768a. This should fix bug #236307

It didn't fix my problem, so this is a work in progress. :-(