Adam Strohl
2012-Mar-10 08:07 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
I've now seen this on two different VMs on two different ESXi servers (Xeon-based hosts but different hardware otherwise and at different facilities):

Everything runs fine for weeks then (seemingly) suddenly/randomly the clock STOPS. In the first case I saw a jump backwards of about 15 minutes (and then a 'freeze' of the clock). The second time just 'time standing still' with no backwards jump. Logging accuracy is of course questionable given the nature of the issue, but nothing really jumps out (i.e., I don't see NTPd adjusting the time just before this happens or anything like that).

Naturally the clock stopping causes major issues, but the machine does technically stay running. My open sessions respond, but anything that relies on time moving forward hangs. I can't even gracefully reboot it because shutdown/etc. all rely on time moving forward (heh).

So I'm not sure if this is a VMware/ESXi issue or a FreeBSD issue, or some kind of interaction between the two. I manage lots of VMware-based FreeBSD VMs, but these are the only ESXi 5.0 servers and the only FreeBSD 9.0 VMs. I have never seen anything quite like this before, and last night, as I mentioned above, I had it happen for the second time on a different VM + ESXi server combo, so I'm not thinking it's a fluke anymore. I've looked for other reports of this in both VMware and FreeBSD contexts and am not seeing anything.

What is interesting is that the 2 servers that have shown this issue perform similar tasks, which are different from the other VMs which have not shown this issue (yet). This is 2 VMs out of a dozen VMs spread over two ESXi servers on different coasts. This might be a coincidence but seems suspicious. These two VMs run these services (whereas the other VMs don't):

- BIND
- CouchDB
- MySQL
- NFS server
- Dovecot 2.x

I would also say that these two VMs probably are the most active, have the most RAM and consume the most CPU because of what they do (vs. the others).

I have disabled NTPd since I am running the OpenVM Tools (which I believe should be keeping the time in sync with the ESXi host, which itself uses NTP). My only guess is that maybe there is some kind of collision where NTPd and OpenVMTools were adjusting the time at the same time. I'm playing the waiting game now to see what this brings (again, though, I am running NTPd and OpenVMTools on all the other VMs which have yet to show this issue).

Anyone seen anything like this? Ring any bells?

--
Adam Strohl
A-Team Systems
http://ateamsystems.com/
Bjoern A. Zeeb
2012-Mar-10 10:10 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
On 10. Mar 2012, at 08:07, Adam Strohl wrote:

> I've now seen this on two different VMs on two different ESXi servers (Xeon based hosts but different hardware otherwise and at different facilities):
>
> Everything runs fine for weeks then (seemingly) suddenly/randomly the clock STOPS.

Apart from the ntp vs. openvm-tools thing, do you have an idea what "for weeks" means in more detail? Can you check, based on last/daily mails/etc., how many days it was since the last reboot, to a) see if it's close to an integer wrap-around or b) give anyone who wants to reproduce this a clue on how long they'll have to wait?

For that matter, is it a stock 9.0 or your own kernel? What other modules are loaded?

/bz

--
Bjoern A. Zeeb
You have to have visions!
It does not matter how good you are. It matters what good you do!
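To put the wrap-around question in numbers, here is a rough sketch of how many days of uptime it takes for a tick counter to overflow. The tick rates (100 and 1000 per second) and counter widths (31 and 32 bits) are assumptions chosen for illustration, not values reported in this thread; the actual rate on a given guest can be read with `sysctl kern.clockrate`.

```python
SECONDS_PER_DAY = 86_400


def days_until_wrap(bits: int, ticks_per_second: int) -> float:
    """Days of uptime before a counter with `bits` usable bits overflows."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY


# Assumed tick rates and widths; 31 bits models a signed 32-bit counter.
for hz in (100, 1000):
    for bits in (31, 32):
        days = days_until_wrap(bits, hz)
        print(f"hz={hz:4d} bits={bits}: wraps after {days:7.2f} days")
```

If the uptime at the moment of the freeze turns out to sit close to one of these figures on both VMs, that would point toward an overflow rather than a random event.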
Ian Lepore
2012-Mar-11 17:03 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
On Sat, 2012-03-10 at 15:07 +0700, Adam Strohl wrote:
> I've now seen this on two different VMs on two different ESXi servers
> (Xeon-based hosts but different hardware otherwise and at different
> facilities):
>
> Everything runs fine for weeks then (seemingly) suddenly/randomly the
> clock STOPS. In the first case I saw a jump backwards of about 15
> minutes (and then a 'freeze' of the clock). The second time just 'time
> standing still' with no backwards jump. Logging accuracy is of course
> questionable given the nature of the issue, but nothing really jumps out
> (i.e., I don't see NTPd adjusting the time just before this happens or
> anything like that).
>
> Naturally the clock stopping causes major issues, but the machine does
> technically stay running. My open sessions respond, but anything that
> relies on time moving forward hangs. I can't even gracefully reboot it
> because shutdown/etc. all rely on time moving forward (heh).
>
> So I'm not sure if this is a VMware/ESXi issue or a FreeBSD issue, or
> some kind of interaction between the two. I manage lots of VMware-based
> FreeBSD VMs, but these are the only ESXi 5.0 servers and the only
> FreeBSD 9.0 VMs. I have never seen anything quite like this before, and
> last night, as I mentioned above, I had it happen for the second time on
> a different VM + ESXi server combo, so I'm not thinking it's a fluke
> anymore. I've looked for other reports of this in both VMware and
> FreeBSD contexts and am not seeing anything.
>
> What is interesting is that the 2 servers that have shown this issue
> perform similar tasks, which are different from the other VMs which have
> not shown this issue (yet). This is 2 VMs out of a dozen VMs spread
> over two ESXi servers on different coasts. This might be a coincidence
> but seems suspicious. These two VMs run these services (whereas the
> other VMs don't):
>
> - BIND
> - CouchDB
> - MySQL
> - NFS server
> - Dovecot 2.x
>
> I would also say that these two VMs probably are the most active, have
> the most RAM and consume the most CPU because of what they do (vs. the
> others).
>
> I have disabled NTPd since I am running the OpenVM Tools (which I
> believe should be keeping the time in sync with the ESXi host, which
> itself uses NTP). My only guess is that maybe there is some kind of
> collision where NTPd and OpenVMTools were adjusting the time at the same
> time. I'm playing the waiting game now to see what this brings (again,
> though, I am running NTPd and OpenVMTools on all the other VMs which
> have yet to show this issue).
>
> Anyone seen anything like this? Ring any bells?
>

I've run into the "time standing still" problem, but only on bringing up FreeBSD on new hardware (usually industrial single-board computers). In those cases time never advances beyond the time obtained from the RTC hardware at boot. I've never seen it happen that time runs normally for a while then stops advancing, but I have almost no experience with FreeBSD as a VM guest OS. When I have seen the problem, it's always been due to interrupt problems, such as the timer tick handler getting hung or the selected timer hardware not generating interrupts.

It seems unlikely to me that ntpd and the vm tools would be fighting in a way that caused this symptom. The way ntpd affects timing is to step the clock (which gets logged), or to numerically steer the kernel's timekeeping routines. The steering is clamped at 500 ppm; to make the clock appear to stop it would have to steer at 1e6 ppm. I've always assumed that VM guest services daemons that handle timekeeping use the same ntp_adjtime() interface to the kernel timekeeping that ntpd itself uses, so the same steering limits would apply.

If it happens again, interesting data might be found in the output of:

  sysctl kern.timecounter
  sysctl kern.eventtimer
  vmstat -i
  ntpdc -c kerninfo
  <anything unusual in dmesg output>

-- Ian
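The steering argument in the message above can be checked with a little arithmetic. This is only an illustrative sketch: the 500 ppm clamp is the figure quoted in the message, while the function names are made up for the example.

```python
SECONDS_PER_DAY = 86_400
MAX_STEER_PPM = 500  # clamp quoted in the message above


def clock_rate(steer_ppm: float) -> float:
    """Clock seconds that elapse per real second when slowed by steer_ppm."""
    return 1.0 - steer_ppm / 1_000_000


def seconds_lost_per_day(steer_ppm: float) -> float:
    """How far the clock falls behind in one day at a constant steer."""
    return SECONDS_PER_DAY * steer_ppm / 1_000_000


print("rate at the clamp:      ", clock_rate(MAX_STEER_PPM))
print("seconds lost per day:   ", seconds_lost_per_day(MAX_STEER_PPM))
print("rate needed for a stop: ", clock_rate(1_000_000))
```

At the clamp the clock still runs at very nearly full speed and drifts by well under a minute per day, which is nowhere near the complete stop described in this thread.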
Mike Tkachuk
2012-Mar-22 13:19 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
Hello Ian,

I'm also facing the same problem. I just updated to ESXi 5 Update 1 and will see if it changes anything. It really looks like an ESXi problem, but I did not experience it with FreeBSD 8.

Here is the output of the requested commands:

sysctl kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 1217640570
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 708780
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 14912
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 1849391829
kern.timecounter.tc.TSC.frequency: 3411483000
kern.timecounter.tc.TSC.quality: -100
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1

sysctl kern.eventtimer
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 7
kern.eventtimer.et.LAPIC.frequency: 33000086
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 4

vmstat -i
interrupt                total   rate
irq1: atkbd0               192      0
irq15: ata1                 17      0
irq18: em0             3519101      2
cpu0:timer            92428991     63
irq256: mpt0           1887875      1
cpu2:timer            29982105     20
cpu1:timer            46249642     31
cpu3:timer            15983874     10
cpu6:timer             4721414      3
cpu7:timer             4554357      3
cpu4:timer             8397088      5
cpu5:timer             6163947      4
Total                213888603    146

----- about 10 sec later -----

sysctl kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 1217640570
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 802778
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 24402
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 3853372129
kern.timecounter.tc.TSC.frequency: 3411483000
kern.timecounter.tc.TSC.quality: -100
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1

sysctl kern.eventtimer
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 7
kern.eventtimer.et.LAPIC.frequency: 33000086
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 4

vmstat -i
interrupt                total   rate
irq1: atkbd0               192      0
irq15: ata1                 17      0
irq18: em0             3519133      2
cpu0:timer            92429487     63
irq256: mpt0           1887875      1
cpu2:timer            29983186     20
cpu1:timer            46250719     31
cpu3:timer            15983969     10
cpu6:timer             4721451      3
cpu7:timer             4554394      3
cpu4:timer             8397125      5
cpu5:timer             6164274      4
Total                213891822    146

--
Best regards,
Mike
mailto:mike@tkachuk.name
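One way to read the two snapshots in the message above is to diff each timecounter's raw value, modulo its mask. The sketch below does that with the numbers copied from Mike's output; the helper names are invented for the example. Because the narrower counters wrap within a few seconds, the delta only shows whether a counter moved, not how much time passed.

```python
# (mask, first reading, second reading), copied from the two snapshots.
SNAPSHOTS = {
    "HPET": (4294967295, 1217640570, 1217640570),
    "ACPI-fast": (16777215, 708780, 802778),
    "i8254": (65535, 14912, 24402),
    "TSC": (4294967295, 1849391829, 3853372129),
}


def counter_delta(mask: int, first: int, second: int) -> int:
    """Ticks between two reads, modulo the counter width (handles one wrap)."""
    return (second - first) & mask


def stalled_counters(snapshots: dict) -> list:
    """Names of counters whose raw value did not change between reads."""
    return [
        name
        for name, (mask, first, second) in snapshots.items()
        if counter_delta(mask, first, second) == 0
    ]


for name, (mask, first, second) in SNAPSHOTS.items():
    print(f"{name:10s} delta={counter_delta(mask, first, second)}")
print("stalled:", stalled_counters(SNAPSHOTS))
```

In these two snapshots the HPET reading, which is the selected kern.timecounter.hardware, is identical while the other three counters moved. A zero delta could in principle be an exact wrap, but that seems unlikely for a 32-bit counter at this frequency over roughly ten seconds, so the data looks consistent with the HPET counter itself having stopped.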
Andriy Gapon
2012-Mar-22 15:07 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
on 22/03/2012 15:19 Mike Tkachuk said the following:
> kern.eventtimer.periodic: 0

It might make sense to try 1 here.
Also you could attempt to involve mav@ directly - he is an author of the code and an expert on it.

--
Andriy Gapon
Volodymyr Kostyrko
2012-Mar-22 15:33 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
Andriy Gapon wrote:
> on 22/03/2012 15:19 Mike Tkachuk said the following:
>> kern.eventtimer.periodic: 0
>
> It might make sense to try 1 here.
> Also you could attempt to involve mav@ directly - he is an author of the code
> and an expert on it.

Better ask before setting it, as this doubles the hpet0 (with HPET) or cpu0:timer (with LAPIC) interrupt rate for me.

--
Sphinx of black quartz judge my vow.
Andriy Gapon
2012-Mar-22 15:56 UTC
Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
on 22/03/2012 17:33 Volodymyr Kostyrko said the following:
> Andriy Gapon wrote:
>> on 22/03/2012 15:19 Mike Tkachuk said the following:
>>> kern.eventtimer.periodic: 0
>>
>> It might make sense to try 1 here.
>> Also you could attempt to involve mav@ directly - he is an author of the code
>> and an expert on it.
>
> Better ask before setting as this doubles hpet0 (with HPET) or cpu0:timer (with
> LAPIC) interrupt rate for me.

Does it make your system unusable?
Are you comparing with a pre-eventtimers version of FreeBSD?

--
Andriy Gapon