I just found a Linux kernel use of rdtsc that MAY cause a significant failure if rdtsc is unemulated and a poorly timed migration (or save/restore) occurs under Xen or KVM.

The problem is that a call to __udelay() -- or any member of the delay() family -- may return prematurely**. Since these functions "must guarantee that we wait at least the amount of time" specified, there are likely unknown kernel circumstances where a premature return will cause problems. (Disclaimer: I haven't gone through every call site of every member of the delay function family to prove this.)

I observed this use of rdtsc on a real running released EL5U2-32b PV kernel, but the problem also exists on 2.6.31 and probably on any currently shipping PV kernel. AND due to a bug(?) in HVM management of TSC, I think it will occur in any Linux HVM as well. And, other than Xen/KVM guaranteeing that rdtsc is monotonically increasing (and tracks wallclock time across a migration, which Xen's emulated rdtsc doesn't yet do), I don't think there is a solution.

The problem can occur if a migration or save/restore results in the appearance that the physical TSC went backwards. For example:

1) A live migration occurs from machine A to machine B, and machine B was much more recently booted than machine A; or
2) A guest is saved on machine A, machine A has been running for a long time, machine A is rebooted, and the guest is restored on machine A shortly after it is booted.

If a delay() function is currently executing in the guest kernel when the above occurs and the rdtsc instruction is unemulated, the delay() function will return immediately** when the kernel vcpu regains control.

True, in many circumstances, the overhead incurred by the migration or save/restore will exceed the intended delay, and so perhaps serve the same purpose as the intended delay, but there may also be circumstances where this is not true.
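To make the failure mode concrete, here is a minimal sketch of two ways a TSC-polling delay loop could be written. It is not the kernel source: fake_rdtsc(), struct fake_tsc, delay_elapsed() and delay_deadline() are hypothetical names invented for illustration, and the TSC is replaced by a scripted list of readings so the backwards jump can be reproduced without a hypervisor. The assumption is that the loop compares an unsigned "elapsed" value against the requested cycle count; with that form, a TSC that jumps backwards makes the subtraction wrap to a huge value and the loop exits at once. A loop that instead compares against an absolute deadline keeps spinning until the TSC climbs back up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rdtsc: replays a scripted list of readings. */
struct fake_tsc {
    const uint64_t *samples;  /* scripted TSC values, in read order */
    size_t count;             /* number of scripted values (must be >= 1) */
    size_t next;              /* index of the next value to hand out */
};

static uint64_t fake_rdtsc(struct fake_tsc *t)
{
    /* Once the script is exhausted, keep returning the last value. */
    size_t i = (t->next < t->count) ? t->next++ : t->count - 1;
    return t->samples[i];
}

/*
 * Elapsed-time form: exit when (now - start) >= cycles.
 * If the TSC jumps backwards, now - start wraps around to a huge
 * unsigned value, so the loop exits immediately (premature return).
 * Returns the number of polls made, for illustration only.
 */
static unsigned long delay_elapsed(struct fake_tsc *t, uint64_t cycles)
{
    uint64_t start = fake_rdtsc(t);
    unsigned long polls = 0;

    for (;;) {
        uint64_t now = fake_rdtsc(t);
        polls++;
        if (now - start >= cycles)
            break;
    }
    return polls;
}

/*
 * Absolute-deadline form: spin while now < start + cycles.
 * If the TSC jumps backwards, the deadline is suddenly far in the
 * future. max_polls exists only so this sketch terminates.
 */
static unsigned long delay_deadline(struct fake_tsc *t, uint64_t cycles,
                                    unsigned long max_polls)
{
    uint64_t deadline = fake_rdtsc(t) + cycles;
    unsigned long polls = 0;

    while (fake_rdtsc(t) < deadline && polls < max_polls)
        polls++;
    return polls;
}
```

Feeding both loops the readings 1000000, 1000100, 50, 60, 70 (a backwards jump after the second reading) with a 5000-cycle delay, delay_elapsed() gives up on the first reading taken after the jump, while delay_deadline() runs until its poll cap.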
** Note that some clever coding in the Linux kernel sources averts a much worse disaster, namely a very extended spinwait for hours or days or more! This cleverness may not exist in all kernels -- or in applications that might implement a similar rdtsc-based __udelay()-like technique.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Oct-22 21:24 UTC
Re: [Xen-devel] non-emulated rdtsc: a smoking gun!
On 10/22/09 13:53, Dan Magenheimer wrote:
> I just found a Linux kernel use of rdtsc that
> MAY cause a significant failure if rdtsc is
> unemulated and a poorly timed migration (or
> save/restore) occurs under Xen or KVM.
>
> The problem is that a call to __udelay() --
> or any member of the delay() family -- may
> return prematurely**. Since these functions
> "must guarantee that we wait at least the
> amount of time" specified, there are likely
> unknown kernel circumstances where a
> premature return will cause problems.
> (Disclaimer: I haven't gone through every use
> of every call site of every member of the delay
> function family to prove this.)
>

It won't matter, as they're only used to control timing to external IO devices. If a domain has a passthrough device, it can't be migrated or save/restored. Nobody should be using them for software timing.

J