I'm starting to play with implementing softtsc for PV guests, but am
not adequately familiar with the low-level x86 instruction set or
emulation code in Xen.

The attached patch seems to work fine for a while. Dom0 begins the
boot process, the printk added to traps.c observes more than 256K TSC
traps (mostly in the BogoMIPS calculation), and boot continues on
loading drivers etc., but it eventually freezes after:

device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.

Any ideas on what might be stopping the dom0 boot?

Possibly related: the code added to pv_guest_cr4_fixup() in domain.c
DOES catch a couple of attempts early in boot by Linux to enable
X86_CR4_TSD. Yet the code handling RDTSC in emulate_privileged_op()
in traps.c doesn't appear to ever result in a call to do_guest_trap().
Is this a bug, at least for OSes that do care about seeing rdtsc
attempts by apps trapped?

Thanks,
Dan

diff -r 5619bed51ec4 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c  Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/domain.c  Fri Aug 21 15:33:36 2009 -0600
@@ -569,12 +569,13 @@ unsigned long pv_guest_cr4_fixup(unsigne
 {
     unsigned long hv_cr4_mask, hv_cr4 = real_cr4_to_pv_guest_cr4(read_cr4());
 
-    hv_cr4_mask = ~X86_CR4_TSD;
+    hv_cr4_mask = (opt_softtsc ? ~0L : ~X86_CR4_TSD);
     if ( cpu_has_de )
         hv_cr4_mask &= ~X86_CR4_DE;
 
     if ( (guest_cr4 & hv_cr4_mask) != (hv_cr4 & hv_cr4_mask) )
-        gdprintk(XENLOG_WARNING,
+//        gdprintk(XENLOG_WARNING,
+printk("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
                  "Attempt to change CR4 flags %08lx -> %08lx\n",
                  hv_cr4, guest_cr4);
 
diff -r 5619bed51ec4 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c  Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/hvm/hvm.c  Fri Aug 21 15:33:36 2009 -0600
@@ -61,8 +61,7 @@ unsigned int opt_hvm_debug_level __read_
 unsigned int opt_hvm_debug_level __read_mostly;
 integer_param("hvm_debug", opt_hvm_debug_level);
 
-int opt_softtsc;
-boolean_param("softtsc", opt_softtsc);
+extern int opt_softtsc;
 
 struct hvm_function_table hvm_funcs __read_mostly;
 
diff -r 5619bed51ec4 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c  Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/time.c  Fri Aug 21 15:33:36 2009 -0600
@@ -34,6 +34,9 @@
 /* opt_clocksource: Force clocksource to one of: pit, hpet, cyclone, acpi. */
 static char opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
+
+int opt_softtsc;
+boolean_param("softtsc", opt_softtsc);
 
 /*
  * opt_consistent_tscs: All TSCs tick at the exact same rate, allowing
diff -r 5619bed51ec4 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c  Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/traps.c  Fri Aug 21 15:33:36 2009 -0600
@@ -2266,6 +2266,12 @@ static int emulate_privileged_op(struct
     }
 
     case 0x31: /* RDTSC */
+{
+static unsigned long count = 0;
+++count;
+if (!(count & (count-1)))
+printk("TSC:%lu\n",count);
+}
         rdtsc(regs->eax, regs->edx);
         break;
 
diff -r 5619bed51ec4 xen/arch/x86/x86_emulate/x86_emulate.c
--- a/xen/arch/x86/x86_emulate/x86_emulate.c  Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c  Fri Aug 21 15:33:36 2009 -0600
@@ -47,6 +47,8 @@
 #define Mov         (1<<7)
 /* All operands are implicit in the opcode. */
 #define ImplicitOps (DstImplicit|SrcImplicit)
+
+extern int opt_softtsc;
 
 static uint8_t opcode_table[256] = {
     /* 0x00 - 0x07 */
@@ -3714,10 +3716,12 @@ x86_emulate(
     case 0x31: /* rdtsc */ {
         unsigned long cr4;
         uint64_t val;
+printk("DJM: RDTSC in x86_emulate\n");
         fail_if(ops->read_cr == NULL);
         if ( (rc = ops->read_cr(4, &cr4, ctxt)) )
             goto done;
-        generate_exception_if((cr4 & CR4_TSD) && !mode_ring0(), EXC_GP, 0);
+        if ( !opt_softtsc )
+            generate_exception_if((cr4 & CR4_TSD) && !mode_ring0(), EXC_GP, 0);
         fail_if(ops->read_msr == NULL);
         if ( (rc = ops->read_msr(MSR_TSC, &val, ctxt)) != 0 )
             goto done;
diff -r 5619bed51ec4 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h  Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/include/asm-x86/domain.h  Fri Aug 21 15:33:36 2009 -0600
@@ -2,6 +2,7 @@
 #define __ASM_DOMAIN_H__
 
 #include <xen/config.h>
+#include <xen/mm.h>
 #include <xen/mm.h>
 #include <asm/hvm/vcpu.h>
 #include <asm/hvm/domain.h>
@@ -426,10 +427,12 @@ unsigned long pv_guest_cr4_fixup(unsigne
 unsigned long pv_guest_cr4_fixup(unsigned long guest_cr4);
 
 /* Convert between guest-visible and real CR4 values. */
+extern int opt_softtsc;
 #define pv_guest_cr4_to_real_cr4(c) \
-    (((c) | (mmu_cr4_features & (X86_CR4_PGE | X86_CR4_PSE))) & ~X86_CR4_DE)
+    ((((c) | (mmu_cr4_features & (X86_CR4_PGE | X86_CR4_PSE))) & ~X86_CR4_DE) \
+     | (opt_softtsc ? X86_CR4_TSD : 0))
 #define real_cr4_to_pv_guest_cr4(c) \
-    ((c) & ~(X86_CR4_PGE | X86_CR4_PSE))
+    ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | (opt_softtsc ? X86_CR4_TSD : 0)))
 
 void domain_cpuid(struct domain *d,
                   unsigned int  input,

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
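The debug counter in the traps.c hunk only prints when the count is a power of two, which is why the reports in this thread read 256K, 512K, 1M, 2M. A minimal standalone sketch of that throttling idea follows; the helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when n has exactly one bit set (1, 2, 4, 8, ...). */
static bool is_power_of_two(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical stand-in for the patch's static counter: bump the
 * count and report whether this event would be logged.
 */
static bool count_and_should_log(unsigned long *count)
{
    assert(count != NULL);
    ++*count;
    return is_power_of_two(*count);
}
```

With this scheme the number of log lines grows only logarithmically with the number of traps, so even millions of emulated rdtscs produce a couple of dozen messages.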
On 08/21/09 15:17, Dan Magenheimer wrote:
> I'm starting to play with implementing softtsc for
> PV guests, but am not adequately familiar with the low
> level x86 instruction set or emulation code in Xen.
>
> The attached patch seems to work fine for a while.
> Dom0 begins the boot process and the printk added
> to traps.c observes more than 256K TSC traps (mostly
> in the BogoMIPS calculation) and continues on loading
> drivers etc but eventually freezes after:

The Xen clocksource uses rdtsc extensively for timing; emulating it
would be a bad idea. I guess it would make some sense to emulate
usermode rdtsc, but it shouldn't affect kernel rdtscs.

> device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
>
> Any ideas on what might be stopping the dom0 boot?

How dead is the system? Does it respond to sysrq-p? 'q' or '0' on the
Xen console?

    J

> On 08/21/09 15:17, Dan Magenheimer wrote:
> > I'm starting to play with implementing softtsc for
> > PV guests, but am not adequately familiar with the low
> > level x86 instruction set or emulation code in Xen.
> >
> > The attached patch seems to work fine for a while.
> > Dom0 begins the boot process and the printk added
> > to traps.c observes more than 256K TSC traps (mostly
> > in the BogoMIPS calculation) and continues on loading
> > drivers etc but eventually freezes after:
>
> The Xen clocksource uses rdtsc extensively for timing; emulating it
> would be a bad idea. I guess it would make some sense to emulate
> usermode rdtsc, but it shouldn't affect kernel rdtscs.

Enabling CR4_TSD only traps ring>0 rdtscs. Trapping guest kernel
rdtscs is ultimately necessary because the Linux kernel does NOT
adequately handle all the possible changes in TSC characteristics
that can occur if Xen moves an already booted guest from one
physical machine to another (or even from one set of pcpus
to another on certain physical machines). I recognize this
is very ugly, but it may be the only way to guarantee
correctness 100% of the time. Full TSC emulation is done by
VMware, and KVM is moving in that direction.

Lots more discussion needed here; will take it offline
(including a spark of a possible solution).

> > device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> > kjournald starting.  Commit interval 5 seconds
> > EXT3-fs: mounted filesystem with ordered data mode.
> >
> > Any ideas on what might be stopping the dom0 boot?
>
> How dead is the system? Does it respond to sysrq-p? 'q' or
> '0' on the Xen console?

The system is definitely not dead, but dom0 is busy looping or
something. I can probably isolate the code, but the Xen
changes seem small enough that it's hard to believe they
could cause this kind of problem.

Interestingly, rdtsc continues to be emulated... the counter
output 512K and 1M and 2M, though it took well over an
hour to get to 2M.

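The rule stated above ("Enabling CR4_TSD only traps ring>0 rdtscs") can be written down as a tiny predicate. This is only a sketch of the architectural rule, not Xen code; the function name is hypothetical. It also illustrates why the kernel's BogoMIPS loop shows up in the trap counter: a PV guest kernel does not run in ring 0 (ring 1 for 32-bit PV, ring 3 for 64-bit PV), so once TSD is set its rdtscs fault as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_TSD (UINT64_C(1) << 2)  /* CR4 bit 2: Time Stamp Disable */

/*
 * Hypothetical helper: would RDTSC raise #GP for code running at
 * privilege level `cpl` with the given CR4 value?
 */
static bool rdtsc_faults(uint64_t cr4, unsigned int cpl)
{
    assert(cpl <= 3);
    return (cr4 & CR4_TSD) != 0 && cpl != 0;
}
```

Under that rule the hypervisor in ring 0 never faults, while every guest rdtsc, kernel or user, ends up in the hypervisor's #GP handler when TSD is set.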
Oops, got carried away discussing the general problem
rather than the specific one... :-)

At this point, I just want to trap all rdtscs so that I
can measure how bad trapping is. But I can't do that
if dom0 (and/or a PV guest) won't boot.

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Friday, August 21, 2009 5:31 PM
> To: Jeremy Fitzhardinge
> Cc: Xen-Devel (E-mail)
> Subject: RE: [Xen-devel] softtsc for PV guests

On 08/21/09 16:31, Dan Magenheimer wrote:
> Enabling CR4_TSD only traps ring>0 rdtscs. Trapping guest kernel
> rdtscs is ultimately necessary because the Linux kernel does NOT
> adequately handle all the possible changes in TSC characteristics
> that can occur if Xen moves an already booted guest from one
> physical machine to another (or even from one set of pcpus
> to another on certain physical machines). I recognize this
> is very ugly, but it may be the only way to guarantee
> correctness 100% of the time.

PV guests already correct for that by using the data Xen provides; they
don't require Xen to do any correction or synthesis of tsc values.

> The system is definitely not dead, but dom0 is busy looping or
> something. I can probably isolate the code, but the xen
> changes seem small enough that it's hard to believe they
> could cause this kind of problem.

'0' on the Xen console will tell you where it's spinning. Oh, is it
dom0 or domU?

    J

> On 08/21/09 16:31, Dan Magenheimer wrote:
> > Enabling CR4_TSD only traps ring>0 rdtscs. Trapping guest kernel
> > rdtscs is ultimately necessary because the Linux kernel does NOT
> > adequately handle all the possible changes in TSC characteristics
> > that can occur if Xen moves an already booted guest from one
> > physical machine to another (or even from one set of pcpus
> > to another on certain physical machines). I recognize this
> > is very ugly, but it may be the only way to guarantee
> > correctness 100% of the time.
>
> PV guests already correct for that by using the data Xen provides;
> they don't require Xen to do any correction or synthesis of tsc
> values.

While I'm hoping that this is true, I am skeptical. The
PV time algorithm does depend on TSC accuracy for interpolating
over short intervals, doesn't it?

Assuming an SMP PV guest starts on a machine with "safe TSC" (e.g. a
recent multi-core single-socket) and migrates successively to
a sequence of machines with:

1) a multi-socket where TSCs are not synchronized and skew badly
2) a different multi-core single-socket with a faster TSC frequency
3) a multi-core/socket where TSC frequency varies according to
   per-cpu power-saving configuration

does the SMP PV guest maintain time properly?

And even if it does, this doesn't help applications that read
TSC directly (which, admittedly, they shouldn't, but since
the processor vendors have made TSC much "safer" on most
systems, which will probably soon account for >90% of systems
shipped, SMP app direct use of TSC will likely become more prevalent.)

> > The system is definitely not dead, but dom0 is busy looping or
> > something. I can probably isolate the code, but the xen
> > changes seem small enough that it's hard to believe they
> > could cause this kind of problem.
>
> '0' on the Xen console will tell you where it's spinning. Oh,
> is it dom0 or domU?

It's dom0. I do get an IP, but it varies pretty widely from
sample (of '0') to sample, and I haven't tried a symbol lookup
yet because I fear the samples will be buried in layers of block
drivers.

I'm still hoping for some clue without digging that deep...
All I've presumably done (assuming my patch doesn't have a weird
bug) is make rdtsc slower.

Thanks,
Dan

On 08/23/09 09:42, Dan Magenheimer wrote:
> While I'm hoping that this is true, I am skeptical. The
> PV time algorithm does depend on TSC accuracy for interpolating
> over short intervals, doesn't it?

Yes, it extrapolates, assuming that in the absence of power events, etc.,
the tsc is stable over a period of a few seconds on a given CPU.

> Assuming an SMP PV guest starts on a machine with "safe TSC" (e.g. a
> recent multi-core single-socket) and migrates successively to
> a sequence of machines with:
>
> 1) a multi-socket where TSCs are not synchronized and skew badly
> 2) a different multi-core single-socket with a faster TSC frequency
> 3) a multi-core/socket where TSC frequency varies according to
>    per-cpu power-saving configuration
>
> does the SMP PV guest maintain time properly?

It uses timing parameters from Xen. If Xen can't keep track of the tsc
and the events which affect it, and provides bad info, it will fail. But
that would mean Xen can't use the tsc internally either, so presumably
it won't be able to accurately emulate it either. The ABI never
assumes that the tsc is synchronized between CPUs, or that they're
running at even approximately the same rate.

The main risk is having the CPU asynchronously change speed under Xen,
with either no notification or a delayed notification (like thermal
events). Any synchronous speed change can be dealt with.

> And even if it does, this doesn't help applications that read
> TSC directly (which, admittedly, they shouldn't, but since
> the processor vendors have made TSC much "safer" on most
> systems, which will probably soon account for >90% of systems
> shipped, SMP app direct use of TSC will likely become more prevalent.)

Right. That's basically not supported under Linux, except as part of
certain ABIs like vgettimeofday (which is functionally identical to the
Xen PV clock ABI).

> It's dom0. I do get an IP, but it varies pretty widely from
> sample (of '0') to sample, and I haven't tried a symbol lookup
> yet because I fear they will be buried in layers of block drivers
>
> I'm still hoping for some clue without digging that deep...
> All I've presumably done (assuming my patch doesn't have a weird
> bug) is make rdtsc slower.

It's presumed to be fast in a number of places, but it shouldn't cause
it to fail. Maybe some race is coming up. If you just revert the
register write to make rdtsc trap, does it still hang?

    J

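The extrapolation described in this message can be sketched as follows: take the TSC delta since the last published snapshot, scale it to nanoseconds, and add it to the published system time. The struct and function names are illustrative and only loosely modeled on the per-vcpu time info Xen exposes; treat the exact field layout here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative timing parameters published by the hypervisor for one CPU. */
struct time_params {
    uint64_t tsc_timestamp;   /* TSC value when the parameters were published */
    uint64_t system_time_ns;  /* system time in ns at tsc_timestamp */
    uint32_t tsc_to_ns_mul;   /* fractional multiplier: ns = ticks * mul / 2^32 */
    int8_t   tsc_shift;       /* power-of-two pre-scale applied to the delta */
};

/* Scale a TSC delta to nanoseconds: (delta << shift) * mul / 2^32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    assert(shift > -64 && shift < 64);
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* Split the 64x32-bit multiply so no 128-bit type is needed. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

/* Extrapolate the current time from a snapshot and a fresh TSC reading. */
static uint64_t guest_time_ns(const struct time_params *p, uint64_t tsc_now)
{
    return p->system_time_ns +
           scale_delta(tsc_now - p->tsc_timestamp, p->tsc_to_ns_mul, p->tsc_shift);
}
```

For example, a multiplier of 0x80000000 with shift 0 means half a nanosecond per tick (a 2 GHz TSC), so 2000 ticks after the snapshot the guest computes a time 1000 ns later. The result is only as good as the assumption that the rate stays constant until the next snapshot, which is exactly the point under discussion.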
> On 08/23/09 09:42, Dan Magenheimer wrote:
> > While I'm hoping that this is true, I am skeptical. The
> > PV time algorithm does depend on TSC accuracy for interpolating
> > over short intervals, doesn't it?
>
> Yes, it extrapolates, assuming that in the absence of power
> events, etc., the tsc is stable over a period of a few seconds
> on a given CPU.

A lot can happen in a few seconds...

> > does the SMP PV guest maintain time properly?
>
> It uses timing parameters from Xen.
> If Xen can't keep track of the tsc
> and events which affect it and provides bad info, it will fail.

Let's assume that Xen CAN keep track.

How does the PV guest know if Xen's timing parameters change?
Is it required to remember Xen's timing parameters from the last
time it checked and compare them with this time?

> The ABI never
> assumes that the tsc is synchronized between CPUs, or that they're
> running at even approximately the same rate.

This is a shame, given that it IS synchronized between CPUs
and they ARE running at exactly the same rate on the vast
majority of future (single-socket multi-core) systems.
Especially given that the alternative is one to three
orders of magnitude slower.

> The main risk is having the CPU asynchronously change speed under Xen,
> with either no notification or a delayed notification (like thermal
> events). Any synchronous speed change can be dealt with.

I guess I need to understand this better.

> > And even if it does, this doesn't help applications that read
> > TSC directly (which, admittedly, they shouldn't, but since
> > the processor vendors have made TSC much "safer" on most
> > systems, which will probably soon account for >90% of systems
> > shipped, SMP app direct use of TSC will likely become more
> > prevalent.)
>
> Right. That's basically not supported under Linux, except as part of
> certain ABIs like vgettimeofday (which is functionally identical to
> the Xen PV clock ABI).

Again, a shame. I'm learning that it is not uncommon for unprivileged
code to sample "time" tens of thousands or even hundreds of thousands
of times per processor per second. Trapping all app rdtscs or Linux
going to HPET or PIT just doesn't cut it if the frequency is
this high. If TSC is "safe" 99.99% of the time, it sure would
be nice if those apps could use rdtsc.

I'm trying to find a solution that allows this to be supported
in a virtual environment (without huge loss of performance).
And I think I might have one.

> > It's dom0. I do get an IP, but it varies pretty widely from
> > sample (of '0') to sample, and I haven't tried a symbol lookup
> > yet because I fear they will be buried in layers of block drivers
> >
> > I'm still hoping for some clue without digging that deep...
> > All I've presumably done (assuming my patch doesn't have a weird
> > bug) is make rdtsc slower.
>
> It's presumed to be fast in a number of places, but it shouldn't cause
> it to fail. Maybe some race is coming up. If you just revert the
> register write to make rdtsc trap, does it still hang?

I just got a big clue... the next line of console output in a
successful boot AFTER the EXT3-fs mounting message is from
the Real Time Clock Driver. That sounds like something that
might be affected by rdtsc changes.

Dan

On 08/23/09 12:26, Dan Magenheimer wrote:
>> On 08/23/09 09:42, Dan Magenheimer wrote:
>>
>>> While I'm hoping that this is true, I am skeptical. The
>>> PV time algorithm does depend on TSC accuracy for interpolating
>>> over short intervals, doesn't it?
>>>
>> Yes, it extrapolates, assuming that in the absence of power
>> events, etc., the tsc is stable over a period of a few seconds
>> on a given CPU.
>>
> A lot can happen in a few seconds...

That only matters if things happen that Xen doesn't know about. If
something happens that affects the tsc's parameters, it will update them
immediately.

>>> does the SMP PV guest maintain time properly?
>>>
>> It uses timing parameters from Xen.
>> If Xen can't keep track of the tsc
>> and events which affect it and provides bad info, it will fail.
>>
> Let's assume that Xen CAN keep track.
>
> How does the PV guest know if Xen's timing parameters change?
> Is it required to remember Xen's timing parameters from the last
> time it checked and compare them with this time?

No, they're in the shared info area. It reads them afresh each time it
reads the tsc. The info has a version counter which gets updated when
the info changes, so the guest can make sure it has a consistent snapshot
of both the timing parameters and the tsc. The timing parameters for a
given CPU are only ever updated by that CPU, so there's no risk of races
between CPUs.

BTW, kvm presents exactly the same ABI for its guests using pvclock.
See pvclock_clocksource_read().

>> Right. That's basically not supported under Linux, except as part of
>> certain ABIs like vgettimeofday (which is functionally identical to
>> the Xen PV clock ABI).
>>
> Again, a shame. I'm learning that it is not uncommon for unprivileged
> code to sample "time" tens of thousands or even hundreds of thousands
> of times per processor per second. Trapping all app rdtscs or Linux
> going to HPET or PIT just doesn't cut it if the frequency is
> this high. If TSC is "safe" 99.99% of the time, it sure would
> be nice if those apps could use rdtsc.

They can, with the gettimeofday vsyscall (= "syscall" which executes
entirely in usermode within a kernel-provided vsyscall page).

You're trying to make rdtsc something it isn't, even in native execution.

rdtsc represents a massive lost opportunity and failure of imagination
on Intel's part; one hopes that they'll eventually redeem themselves
with a new mechanism which does actually have all the properties one
wants - and that mechanism may eventually end up with rdtsc in it
somewhere. But we're not really there yet, and I think trying to make
rdtsc that thing is a quixotic effort.

> I'm trying to find a solution that allows this to be supported
> in a virtual environment (without huge loss of performance).
> And I think I might have one.

Apps can't reliably use a raw rdtsc anyway, without making unwarranted
assumptions about the underlying hardware. Any app which does may work
well on one system, but then mysteriously fail when you move it to the
backup server.

> I just got a big clue... the next line of console output in a
> successful boot AFTER the EXT3-fs mounting message is from
> the Real Time Clock Driver. That sounds like something that
> might be affected by rdtsc changes.

Ah, yes. It may be doing some calibration thing which never converges
with a slow rdtsc. But that would be pretty obvious from looking at the
eip/rip.

    J

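The version-counter scheme described in this message is essentially a sequence lock. Below is a minimal sketch of both sides, assuming the common convention that the version is odd while an update is in progress; the struct and function names are hypothetical, and this is not the actual pvclock_clocksource_read() code. The plain (non-atomic) data fields are a simplification for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative per-CPU record shared between hypervisor and guest. */
struct time_info {
    _Atomic uint32_t version;   /* assumed odd while an update is in flight */
    uint64_t tsc_timestamp;
    uint64_t system_time_ns;
};

struct time_snapshot {
    uint64_t tsc_timestamp;
    uint64_t system_time_ns;
};

/* Writer side (the hypervisor, for this CPU only). */
static void publish_time_info(struct time_info *dst, uint64_t tsc, uint64_t ns)
{
    uint32_t v = atomic_load_explicit(&dst->version, memory_order_relaxed);
    assert((v & 1) == 0);  /* single writer per CPU: no update already open */
    atomic_store_explicit(&dst->version, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    dst->tsc_timestamp = tsc;
    dst->system_time_ns = ns;
    atomic_store_explicit(&dst->version, v + 2, memory_order_release);
}

/* Reader side (the guest): retry until a stable, even version is seen. */
static struct time_snapshot read_time_info(struct time_info *src)
{
    struct time_snapshot snap;
    uint32_t before, after;

    do {
        before = atomic_load_explicit(&src->version, memory_order_acquire);
        snap.tsc_timestamp = src->tsc_timestamp;
        snap.system_time_ns = src->system_time_ns;
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&src->version, memory_order_relaxed);
    } while ((before & 1) != 0 || before != after);

    return snap;
}
```

Because the guest re-reads the record on every time query, it never has to remember or compare old parameters; a changed version simply forces one more pass through the loop.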
> > I just got a big clue... the next line of console output in a
> > successful boot AFTER the EXT3-fs mounting message is from
> > the Real Time Clock Driver. That sounds like something that
> > might be affected by rdtsc changes.
>
> Ah, yes. It may be doing some calibration thing which never converges
> with a slow rdtsc. But that would be pretty obvious from
> looking at the eip/rip.

No, the clue led me astray. The code in RTC was never reached.

The eip pointed me to the probable answer: I think this is the first
time a userland rdtsc is executed. Xen is "reflecting" the GPF to
Linux; Linux doesn't really know what to do with the GPF and would
like to deliver a signal to userland, but there's no signal registered
for it, so nobody ever updates the IP and an infinite loop results.

So more work needed on my part to fix this.

More on the general topic later.

Dan

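The failure mode diagnosed here (the fault is handled, but nobody advances the instruction pointer, so the same rdtsc faults forever) can be illustrated with a toy emulator. This is a hedged sketch, not Xen's emulation path: the register struct and function are hypothetical, and the only architectural fact relied on is that RDTSC is the two-byte opcode 0F 31 and returns its result in EDX:EAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register file: just what RDTSC touches. */
struct toy_regs {
    size_t   ip;    /* offset of the faulting instruction in `code` */
    uint32_t eax;
    uint32_t edx;
};

/*
 * If the bytes at regs->ip are RDTSC (0F 31), load `tsc` into EDX:EAX
 * and step past the instruction. Returns false, leaving regs untouched,
 * when the instruction is something else.
 */
static bool emulate_rdtsc(struct toy_regs *regs, const uint8_t *code,
                          size_t len, uint64_t tsc)
{
    assert(regs != NULL && code != NULL);
    if (len < 2 || regs->ip > len - 2)
        return false;
    if (code[regs->ip] != 0x0f || code[regs->ip + 1] != 0x31)
        return false;

    regs->eax = (uint32_t)tsc;
    regs->edx = (uint32_t)(tsc >> 32);
    regs->ip += 2;  /* without this, the guest re-executes the same rdtsc */
    return true;
}
```

Whichever layer ends up owning the fault, it has to either complete the instruction and advance the IP as above, or deliver something that terminates the faulting task; doing neither produces the busy loop seen in dom0.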
> That only matters if things happen that Xen doesn't know about. If
> something happens that affects the tsc's parameters, it will
> update them immediately.
>
> No, they're in the shared info area. It reads them afresh
> each time it reads the tsc. The info has a version counter which
> gets updated when the info changes so the guest can make sure it
> has a consistent snapshot of both the timing parameters and the tsc.
> The timing parameters for a given CPU are only ever updated by that
> CPU, so there's no risk of races between CPUs.

OK, now looking at the code in 2.6.30, that all makes sense.
Has anyone stress-tested this code across the wide range
of TSC characteristics that might exist in migrating
around a virtualized data center? I wonder, for example,
what is the longest period of time for which vgettimeofday
will return the same result (e.g. for which time is "stopped").

> >> Right. That's basically not supported under Linux, except as
> >> part of certain ABIs like vgettimeofday (which is functionally
> >> identical to the Xen PV clock ABI).
> >>
> > Again, a shame. I'm learning that it is not uncommon for
> > unprivileged code to sample "time" tens of thousands or even
> > hundreds of thousands of times per processor per second.
> > Trapping all app rdtscs or Linux going to HPET or PIT just
> > doesn't cut it if the frequency is this high. If TSC is "safe"
> > 99.99% of the time, it sure would be nice if those apps could
> > use rdtsc.
>
> They can, with the gettimeofday vsyscall (= "syscall" which executes
> entirely in usermode within a kernel-provided vsyscall page).

Any idea what the cost of a gettimeofday vsyscall is
relative to an rdtsc? (Alternately, do I need to do
anything in a 2.6.30 kernel or when compiling a simple
C test program to enable vgettimeofday to be used?
I'd like to compare the cost myself.)

> You're trying to make rdtsc something it isn't, even in
> native execution.
>
> rdtsc represents a massive lost opportunity and failure of imagination
> on Intel's part; one hopes that they'll eventually redeem themselves
> with a new mechanism which does actually have all the properties one
> wants - and that mechanism may eventually end up with rdtsc in it
> somewhere. But we're not really there yet, and I think trying to make
> rdtsc that thing is a quixotic effort.

Windmills are my specialty :-) Intel (AMD) *has* solved the
TSC problem on the vast majority of new (single-socket multi-core)
systems. The trick is determining when the mechanism is
safe to use and when it is not.

> > I'm trying to find a solution that allows this to be supported
> > in a virtual environment (without huge loss of performance).
> > And I think I might have one.
>
> Apps can't reliably use a raw rdtsc anyway, without making unwarranted
> assumptions about the underlying hardware. Any app which
> does may work well on one system, but then mysteriously fail when
> you move it to the backup server.

Exactly. But, reliable or not, they *can* and *do* and
*will* use rdtsc. And it *will* be reliable in enough
systems that it may never be noticed as unreliable, except
as some weird bug that occurs randomly only when the app
is run in a virtual environment and which never gets
root-caused to be a TSC-related issue.

So wouldn't it be nice if apps could take advantage of
a fast synchronized rdtsc that IS reliable 99% of the
time, but be smart enough to adapt when it is NOT reliable?

And, for that matter, if rdtsc is much faster than
vgettimeofday (to be determined), wouldn't it be nice
if Linux could take advantage of a TSC clocksource that
IS reliable 99% of the time, but be smart enough to
adapt when it is NOT reliable?

Dan (Quixote)
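On the question of measuring gettimeofday against a raw rdtsc: a small userspace microbenchmark along the following lines is one way to compare them. This is a sketch under assumptions: it uses the `__rdtsc()` compiler intrinsic from `<x86intrin.h>` (GCC and Clang on x86), times each loop with `clock_gettime(CLOCK_MONOTONIC)`, and assumes the C library routes `gettimeofday()` through the kernel-provided vsyscall/vDSO path without special build flags, which is worth verifying on the kernel and libc actually in use. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() intrinsic on GCC and Clang */
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0
#endif

/* Keeps the compiler from discarding the measured loops. */
static volatile uint64_t sink;

/* Monotonic time in nanoseconds, used only to time the loops. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Average cost of one gettimeofday() call, in nanoseconds. */
static double ns_per_gettimeofday(unsigned long iters)
{
    struct timeval tv;
    uint64_t acc = 0, start, end;

    assert(iters > 0);
    start = now_ns();
    for (unsigned long i = 0; i < iters; i++) {
        gettimeofday(&tv, NULL);
        acc += (uint64_t)tv.tv_usec;
    }
    end = now_ns();
    sink = acc;
    return (double)(end - start) / (double)iters;
}

/* Average cost of one rdtsc, in nanoseconds; -1.0 on non-x86 builds. */
static double ns_per_rdtsc(unsigned long iters)
{
#if HAVE_RDTSC
    uint64_t acc = 0, start, end;

    assert(iters > 0);
    start = now_ns();
    for (unsigned long i = 0; i < iters; i++)
        acc += __rdtsc();
    end = now_ns();
    sink = acc;
    return (double)(end - start) / (double)iters;
#else
    (void)iters;
    return -1.0;
#endif
}
```

Calling both functions with a few million iterations from `main()` and printing the two averages gives a rough per-call figure. Running the same binary natively and in a guest with rdtsc trapping enabled should show how much of the difference comes from the trap-and-emulate path rather than from the instruction itself.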