I just upgraded my host to Xen 3.2 built from mercurial running on debian etch x86_64. I've been gradually testing all the virtual machines created on previous versions of xen, and I just ran into a weird one:

I have a SLES 10 sp1 HVM (i386) virtual machine, and when I boot, it seems to come up fine, but apparently believes time is passing at warp speed. I can't login because it thinks the login timed out faster than I can type the password, and the console keeps going blank every half second or so because it thinks it has been inactive for a long time.

Does this sound familiar to anyone? Anything I can do to the config file or the installed OS to fix it?

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Greetings, Tom!

I smiled as I read your note; you've given me a great starting point for this email.

Not only have I seen oddities like the ones you've described myself in the past, but I also believe you're spot-on in your analysis of what's going on: "wall time" is passing too quickly on your HVM.

Here's a 50,000-foot overview of the issues of time in the context of virtualization:

- some HVMs run slower than reality, some faster
- some paravirts run slower than reality, some faster
- sometimes they run great for a while, THEN go wacky
- sometimes they run wacky for a while, then get more stable
- time issues/constraints during virtual host boot are very different from those in the running stable state
- time skew (more accurately, VH wall-clock vs. Dom0 wall-clock skew) isn't consistent, predictable, or even predictably skewed in the same direction

I've had the recent pleasure of working on 'the time problem' with a client, and they've agreed verbally that I can release the time monitoring stuff I've designed and partially built for them; I'm in the process of cleaning it up and 'completing' it on my own dime.

There may or may not be better or more expedient routes for you to take to get to the bottom of your problem... but, as a bit of potential "light at the end of that tunnel", I'll share what I'm personally doing to help along these lines.

A pretty simple (but very long-named) Proportional-Integral-Derivative (PID) control loop should be able to dampen and correct the time skews quite effectively as and when they occur in VMs. An ongoing, dynamic corrective measure for time on VMs is preferable to a brute-force "trying to tell the OS on the VM what time it is" approach.

What I've built so far is Ruby-based, lightweight (so as to keep the "observer" as minimal a part of the equation as possible), thread-safe (pretty much), and only half of a real solution. The first step is collecting the data right...
then once we have a reliable means of assessing the (current) skews of a set of VMs, we can instrument a PID control for each of them to dynamically correct them whenever they start to get their wallclocks skewed.

Tom, if you (or anyone else reading this) would like to be part of this effort, I welcome your input and your energy. I can't say when it'll emerge, since it's not a "front burner" project for me right now (though it is mighty compelling) - thus any and all help from others to get a framework such as the above in place would be quite appreciated.

Best regards,
-Paul Reiber
http://Reiber.org
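For readers unfamiliar with the idea, here is a minimal sketch of the kind of PID loop described above. It is not Paul's Ruby tool: it is a textbook PID controller written in Python, and the class name `SkewPID`, the function `simulate`, the gains, and the simulated 500 ppm drift are all hypothetical values chosen for illustration. How a rate correction would actually be applied to a real guest is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SkewPID:
    """Textbook PID controller driven by clock skew (guest minus reference, seconds)."""
    kp: float  # proportional gain (hypothetical, would need tuning)
    ki: float  # integral gain (hypothetical)
    kd: float  # derivative gain (hypothetical)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, skew: float, dt: float) -> float:
        """Return a fractional rate correction; negative means slow the guest down."""
        error = -skew  # the target skew is zero
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample to difference against
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(drift_ppm: float, steps: int, dt: float = 60.0) -> List[float]:
    """Simulate a guest clock that gains drift_ppm microseconds per second.

    The controller is sampled every dt seconds; the returned list is the
    skew (in seconds) observed at each sample.
    """
    pid = SkewPID(kp=0.01, ki=0.0001, kd=0.0)
    skew = 0.0
    correction = 0.0
    history = []
    for _ in range(steps):
        # Skew grows by the native drift plus the currently applied correction.
        skew += (drift_ppm * 1e-6 + correction) * dt
        correction = pid.update(skew, dt)
        history.append(skew)
    return history


if __name__ == "__main__":
    skews = simulate(drift_ppm=500.0, steps=200)
    print("first sample skew: %.6f s" % skews[0])
    print("last sample skew:  %.9f s" % skews[-1])
```

In this toy model the integral term is what removes a constant drift: the proportional term alone would leave a steady residual skew. Real guests do not drift at a constant rate, so the gains above should be read as placeholders only.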
On Fri, 8 Feb 2008 15:34:29 -0800 "Paul Reiber" <reiber@gmail.com> wrote:

> I've not only seen similar oddities as you've described, myself, in
> the past, but I also I believe you're spot-on in your analysis of
> what's going on; "wall time" is passing too quickly on your HVM.
> ...

I'm not sure what fixed it, but I noticed that the failsafe boot didn't have the problem, and when I copied the "apm=off acpi=off noapic" parameters from the failsafe boot entry to the primary kernel boot parameters, the problem disappeared (I have no idea why that made it disappear, or which one was important :-).
On Feb 8, 2008 3:46 PM, Tom Horsley <tom.horsley@att.net> wrote:

> On Fri, 8 Feb 2008 15:34:29 -0800
> [...]
> I'm not sure what fixed it, but I noticed that the failsafe boot didn't
> have the problem, and when I copied "apm=off acpi=off noapic" that the
> failsafe boot had to the primary kernel boot parameters, the problem
> disappeared (I have no idea why that made it disappear, or which
> one was important :-).

Outstanding!

However... my guess is that nothing actually "fixed" the problem, and that if you went back to the same conditions as when it happened last time (_really_ went back, which is sometimes downright impossible), you'd find that particular HVM time-skew problem still right where it's always been. I hope. I'll explain that better below, (again) I hope.

It's awesome that you've got HVM virt working much better with the triad of kernel params you've mentioned above. That qualifies as a "major clue" (read: it should be in a FAQ somewhere if it isn't yet) both (1) for others who want to get HVM virt working well, and (2) for me in particular, since _omitting_ those options as part of my testbed scenarios never even entered my mind until I found out you'd experienced problems when (accidentally) omitting them.

So... Thank you - big time! (getting tired of the time-related puns yet? are they... untimely?)

The wallclock on your HVM may or may not be "right" now, though. From what you've said, things appear _much better_ and the wallclock is not wildly out of kilter as it was before.

You'll still want to keep a close eye on it, and test it slowly, not quickly: for example, ask it what time it is every hour, and see if it stays relatively accurate or if there's a tendency for the answer to get, pardon my non-English, wronger and wronger over time.

That test is much more telling than, say, a forever loop that prints a timestamp then sleeps for one second.
In the second case, the observer's become way too much a part of the equation to be providing really useful and/or (time-)telling information; its results are in fact often totally misleading as compared to the results of a more subtle wallclock-sampling approach.

My recent development effort (the ruby-based time monitor and skew-correcting feedback-loop) is being influenced by a lot of fun design criteria - including subtlety, a careful attention to the fact that observers really _are_ part of whatever equations they're in... but mostly it's been fueled by a driving need for something solid to help with the particularly thorny issue of wallclock skew in virtual environments.

...I seem to be putting this thing more and more on my own "front burner" now, aren't I? ...that's a sign maybe I'll finish it up and release it sometime soon. :-)

-pbr
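The slow-sampling test suggested above (ask the guest for the time about once an hour and see whether the answer gets steadily worse) can be turned into a number with a small calculation. The sketch below is only an illustration under stated assumptions: the function name `drift_ppm` is hypothetical, the samples are made-up values, and how the paired readings are collected from the guest and the reference clock is not covered. It fits a least-squares line to the guest-minus-reference offset and reports the slope in parts per million.

```python
from typing import List, Tuple


def drift_ppm(samples: List[Tuple[float, float]]) -> float:
    """Estimate guest clock drift relative to a reference clock, in ppm.

    Each sample is (reference_seconds, guest_seconds), read at the same moment.
    A positive result means the guest clock runs fast.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    ref_mean = sum(ref for ref, _ in samples) / n
    offset_mean = sum(guest - ref for ref, guest in samples) / n
    # Least-squares slope of offset (guest - reference) against reference time.
    numerator = sum(
        (ref - ref_mean) * ((guest - ref) - offset_mean) for ref, guest in samples
    )
    denominator = sum((ref - ref_mean) ** 2 for ref, _ in samples)
    if denominator == 0:
        raise ValueError("reference timestamps must not all be equal")
    return numerator / denominator * 1e6


if __name__ == "__main__":
    # Made-up hourly samples: the guest gains 0.36 s every hour.
    hourly = [(h * 3600.0, h * 3600.0 + h * 0.36) for h in range(5)]
    print("estimated drift: %.1f ppm" % drift_ppm(hourly))
```

With the made-up samples above, a gain of 0.36 s per 3600 s works out to roughly 100 ppm. Because the readings are an hour apart, the act of sampling adds almost no load to the guest, which is the point of testing slowly rather than in a tight loop.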
On Fri, 8 Feb 2008 18:09:49 -0800 "Paul Reiber" <reiber@gmail.com> wrote:

> You'll still want to keep a close eye on it, test it slowly...

Actually the time doesn't stay in sync at all on any of the VMs when left to themselves, which is why I run ntp on all of them :-). I found about as many web pages saying I really need to run ntp as I found saying it is a horrible mistake to run ntp on the virtual machines, but for me, running ntp makes all the machines work much better. (We can do things like build software from NFS mounted source directories without getting thousands of messages about clock times being in the future :-).

On a possibly related note: I used to notice console messages in the VM consoles about "instable clock source" (usually right before they crashed), but I haven't seen any of those since I switched the host to xen 3.2.