thr3ads.net - Xen devel - [Xen-devel] softtsc for PV guests [Aug 2009]

Dan Magenheimer

2009-Aug-21 22:17 UTC

[Xen-devel] softtsc for PV guests

I'm starting to play with implementing softtsc for
PV guests, but am not adequately familiar with the low-level
x86 instruction set or the emulation code in Xen.

The attached patch seems to work fine for a while.
Dom0 begins the boot process, and the printk added
to traps.c observes more than 256K TSC traps (mostly
in the BogoMIPS calculation).  Boot continues on to loading
drivers etc. but eventually freezes after:

device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.

Any ideas on what might be stopping the dom0 boot?

Possibly related: the code added to pv_guest_cr4_fixup()
in domain.c DOES catch a couple of attempts early in
boot by Linux to enable X86_CR4_TSD.  Yet
the code handling RDTSC in emulate_privileged_op()
in traps.c doesn't appear to ever result in a call
to do_guest_trap().  Is this a bug, at least for OSes
that do care about seeing rdtsc attempts by apps trapped?

Thanks,
Dan


diff -r 5619bed51ec4 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/domain.c	Fri Aug 21 15:33:36 2009 -0600
@@ -569,12 +569,13 @@ unsigned long pv_guest_cr4_fixup(unsigne
 {
     unsigned long hv_cr4_mask, hv_cr4 = real_cr4_to_pv_guest_cr4(read_cr4());
 
-    hv_cr4_mask = ~X86_CR4_TSD;
+    hv_cr4_mask = (opt_softtsc ? ~0L : ~X86_CR4_TSD);
     if ( cpu_has_de )
         hv_cr4_mask &= ~X86_CR4_DE;
 
     if ( (guest_cr4 & hv_cr4_mask) != (hv_cr4 & hv_cr4_mask) )
-        gdprintk(XENLOG_WARNING,
+//        gdprintk(XENLOG_WARNING,
+printk("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
                  "Attempt to change CR4 flags %08lx -> %08lx\n",
                  hv_cr4, guest_cr4);
 
diff -r 5619bed51ec4 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/hvm/hvm.c	Fri Aug 21 15:33:36 2009 -0600
@@ -61,8 +61,7 @@ unsigned int opt_hvm_debug_level __read_
 unsigned int opt_hvm_debug_level __read_mostly;
 integer_param("hvm_debug", opt_hvm_debug_level);
 
-int opt_softtsc;
-boolean_param("softtsc", opt_softtsc);
+extern int opt_softtsc;
 
 struct hvm_function_table hvm_funcs __read_mostly;
 
diff -r 5619bed51ec4 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/time.c	Fri Aug 21 15:33:36 2009 -0600
@@ -34,6 +34,9 @@
 /* opt_clocksource: Force clocksource to one of: pit, hpet, cyclone, acpi. */
 static char opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
+
+int opt_softtsc;
+boolean_param("softtsc", opt_softtsc);
 
 /*
  * opt_consistent_tscs: All TSCs tick at the exact same rate, allowing
diff -r 5619bed51ec4 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/traps.c	Fri Aug 21 15:33:36 2009 -0600
@@ -2266,6 +2266,12 @@ static int emulate_privileged_op(struct 
     }
 
     case 0x31: /* RDTSC */
+{
+static unsigned long count = 0;
+++count;
+if (!(count & (count-1)))
+printk("TSC:%lu\n",count);
+}
         rdtsc(regs->eax, regs->edx);
         break;
 
diff -r 5619bed51ec4 xen/arch/x86/x86_emulate/x86_emulate.c
--- a/xen/arch/x86/x86_emulate/x86_emulate.c	Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c	Fri Aug 21 15:33:36 2009 -0600
@@ -47,6 +47,8 @@
 #define Mov         (1<<7)
 /* All operands are implicit in the opcode. */
 #define ImplicitOps (DstImplicit|SrcImplicit)
+
+extern int opt_softtsc;
 
 static uint8_t opcode_table[256] = {
     /* 0x00 - 0x07 */
@@ -3714,10 +3716,12 @@ x86_emulate(
     case 0x31: /* rdtsc */ {
         unsigned long cr4;
         uint64_t val;
+printk("DJM: RDTSC in x86_emulate\n");
         fail_if(ops->read_cr == NULL);
         if ( (rc = ops->read_cr(4, &cr4, ctxt)) )
             goto done;
-        generate_exception_if((cr4 & CR4_TSD) && !mode_ring0(), EXC_GP, 0);
+        if ( !opt_softtsc )
+            generate_exception_if((cr4 & CR4_TSD) && !mode_ring0(), EXC_GP, 0);
         fail_if(ops->read_msr == NULL);
         if ( (rc = ops->read_msr(MSR_TSC, &val, ctxt)) != 0 )
             goto done;
diff -r 5619bed51ec4 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Fri Aug 14 17:26:23 2009 +0100
+++ b/xen/include/asm-x86/domain.h	Fri Aug 21 15:33:36 2009 -0600
@@ -2,6 +2,7 @@
 #define __ASM_DOMAIN_H__
 
 #include <xen/config.h>
+#include <xen/mm.h>
 #include <xen/mm.h>
 #include <asm/hvm/vcpu.h>
 #include <asm/hvm/domain.h>
@@ -426,10 +427,12 @@ unsigned long pv_guest_cr4_fixup(unsigne
 unsigned long pv_guest_cr4_fixup(unsigned long guest_cr4);
 
 /* Convert between guest-visible and real CR4 values. */
+extern int opt_softtsc;
 #define pv_guest_cr4_to_real_cr4(c) \
-    (((c) | (mmu_cr4_features & (X86_CR4_PGE | X86_CR4_PSE))) & ~X86_CR4_DE)
+    ((((c) | (mmu_cr4_features & (X86_CR4_PGE | X86_CR4_PSE))) & ~X86_CR4_DE) \
+        | (opt_softtsc ? X86_CR4_TSD : 0))
 #define real_cr4_to_pv_guest_cr4(c) \
-    ((c) & ~(X86_CR4_PGE | X86_CR4_PSE))
+    ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | (opt_softtsc ? X86_CR4_TSD : 0)))
 
 void domain_cpuid(struct domain *d,
                   unsigned int  input,

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jeremy Fitzhardinge

2009-Aug-21 23:02 UTC

Re: [Xen-devel] softtsc for PV guests

On 08/21/09 15:17, Dan Magenheimer wrote:
> I'm starting to play with implementing softtsc for
> PV guests, but am not adequately familiar with the low
> level x86 instruction set or emulation code in Xen.
>
> The attached patch seems to work fine for awhile.
> Dom0 begins the boot process and the printk added
> to traps.c observes more than 256K TSC traps (mostly
> in the BogoMIPS calculation) and continues on loading
> drivers etc but eventually freezes after:

The Xen clocksource uses rdtsc extensively for timing; emulating it
would be a bad idea.  I guess it would make some sense to emulate
usermode rdtsc, but it shouldn't affect kernel rdtscs.

> device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
>
> Any ideas on what might be stopping the dom0 boot?

How dead is the system?  Does it respond to sysrq-p?  'q' or '0' on the
Xen console?

    J


Dan Magenheimer

2009-Aug-21 23:31 UTC

RE: [Xen-devel] softtsc for PV guests

> On 08/21/09 15:17, Dan Magenheimer wrote:
> > I'm starting to play with implementing softtsc for
> > PV guests, but am not adequately familiar with the low
> > level x86 instruction set or emulation code in Xen.
> >
> > The attached patch seems to work fine for awhile.
> > Dom0 begins the boot process and the printk added
> > to traps.c observes more than 256K TSC traps (mostly
> > in the BogoMIPS calculation) and continues on loading
> > drivers etc but eventually freezes after:
>
> The Xen clocksource uses rdtsc extensively for timing; emulating it
> would be a bad idea.  I guess it would make some sense to emulate
> usermode rdtsc, but it shouldn't affect kernel rdtscs.

Enabling CR4_TSD only traps ring>0 rdtscs.  Trapping guest kernel
rdtsc's is ultimately necessary because the Linux kernel does NOT
adequately handle all the possible changes in TSC characteristics
that can occur if Xen moves an already booted guest from one
physical machine to another (or even from one set of pcpus
to another on certain physical machines).  I recognize this
is very ugly, but it may be the only way to guarantee
correctness 100% of the time.  Full TSC emulation is done by
VMware and KVM is moving in that direction.
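
A minimal sketch of the hardware rule referred to above, not code
from Xen or from the patch: RDTSC raises #GP only when CR4.TSD is
set and the current privilege level is above 0.  The function name
rdtsc_raises_gp is made up for illustration; CR4.TSD is bit 2 of CR4.
Since a PV guest kernel does not run in ring 0, setting TSD would be
expected to trap guest kernel rdtscs as well as userland ones.

```c
#include <stdbool.h>

#define CR4_TSD (1UL << 2)   /* CR4 bit 2: Time Stamp Disable */

/*
 * Hypothetical helper: returns true if an RDTSC executed at privilege
 * level `cpl` (0 = most privileged) faults with #GP for this CR4 value.
 */
bool rdtsc_raises_gp(unsigned long cr4, unsigned int cpl)
{
    return (cr4 & CR4_TSD) != 0 && cpl != 0;
}
```

With TSD set, rdtsc_raises_gp(CR4_TSD, 1) and rdtsc_raises_gp(CR4_TSD, 3)
are both true, while ring 0 (the hypervisor) still reads the TSC directly.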

Lots more discussion needed here, will take it offline
(including a spark of a possible solution).

> > device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> > kjournald starting.  Commit interval 5 seconds
> > EXT3-fs: mounted filesystem with ordered data mode.
> >
> > Any ideas on what might be stopping the dom0 boot?
>
> How dead is the system?  Does it respond to sysrq-p?  'q' or
> '0' on the Xen console?

The system is definitely not dead, but dom0 is busy looping or
something.  I can probably isolate the code, but the Xen
changes seem small enough that it's hard to believe they
could cause this kind of problem.

Interestingly, rdtsc continues to be emulated... the counter
output 512K and 1M and 2M, though it took well over an
hour to get to 2M.
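
The counter only prints at 256K, 512K, 1M, 2M and so on because the
debug printk in the patch fires when `count & (count - 1)` is zero,
which is true exactly when count is a power of two.  A small
standalone sketch of that throttling idea follows; the function names
are invented for illustration and are not part of the patch.

```c
#include <stdbool.h>

/* True when count is an exact power of two (1, 2, 4, 8, ...). */
bool is_power_of_two(unsigned long count)
{
    return count != 0 && (count & (count - 1)) == 0;
}

/*
 * Number of log lines a power-of-two throttle would emit while
 * counting events 1..n.  Grows logarithmically with n.
 */
unsigned int log_lines_for(unsigned long n)
{
    unsigned int lines = 0;

    for (unsigned long c = 1; c <= n; c++)
        if (is_power_of_two(c))
            lines++;
    return lines;
}
```

Two million traps therefore cost only a couple of dozen console lines,
which keeps the logging itself from distorting the measurement much.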


Dan Magenheimer

2009-Aug-21 23:38 UTC

RE: [Xen-devel] softtsc for PV guests

Oops, got carried away discussing the general problem
rather than the specific one... :-)

At this point, I just want to trap all rdtsc's
so that I can measure how bad trapping is.
But I can't do that if dom0 (and/or a PV guest)
won't boot.

Jeremy Fitzhardinge

2009-Aug-21 23:59 UTC

Re: [Xen-devel] softtsc for PV guests

On 08/21/09 16:31, Dan Magenheimer wrote:
> Enabling CR4_TSD only traps ring>0 rdtscs.  Trapping guest kernel
> rdtsc's is ultimately necessary because the Linux kernel does NOT
> adequately handle all the possible changes in TSC characteristics
> that can occur if Xen moves an already booted guest from one
> physical machine to another (or even from one set of pcpus
> to another on certain physical machines).  I recognize this
> is very ugly, but it may be the only way to guarantee
> correctness 100% of the time.

PV guests already correct for that by using the data Xen provides; they
don't require Xen to do any correction or synthesis of tsc values.

> The system is definitely not dead, but dom0 is busy looping or
> something.  I can probably isolate the code, but the xen
> changes seem small enough that it's hard to believe they
> could cause this kind of problem.

'0' on the Xen console will tell you where it's spinning.  Oh, is it dom0
or domU?

    J



Dan Magenheimer

2009-Aug-23 16:42 UTC

RE: [Xen-devel] softtsc for PV guests

> On 08/21/09 16:31, Dan Magenheimer wrote:
> > Enabling CR4_TSD only traps ring>0 rdtscs.  Trapping guest kernel
> > rdtsc's is ultimately necessary because the Linux kernel does NOT
> > adequately handle all the possible changes in TSC characteristics
> > that can occur if Xen moves an already booted guest from one
> > physical machine to another (or even from one set of pcpus
> > to another on certain physical machines).  I recognize this
> > is very ugly, but it may be the only way to guarantee
> > correctness 100% of the time.
>
> PV guests already correct for that by using the data Xen provides; they
> don't require Xen to do any correction or synthesis of tsc values.

While I'm hoping that this is true, I am skeptical.  The
PV time algorithm does depend on TSC accuracy for interpolating
over short intervals, doesn't it?

Assuming an SMP PV guest starts on a machine with "safe TSC" (e.g. a
recent multi-core single-socket) and migrates successively to
a sequence of machines with:

1) a multi-socket where TSCs are not synchronized and skew badly
2) a different multi-core single-socket with a faster TSC frequency
3) a multi-core/socket where TSC frequency varies according to
   per-cpu power-saving configuration

does the SMP PV guest maintain time properly?

And even if it does, this doesn't help applications that read
TSC directly (which, admittedly, they shouldn't, but since
the processor vendors have made TSC much "safer" on most
systems, which will probably soon account for >90% of systems
shipped, SMP app direct use of TSC will likely become more prevalent).

> > The system is definitely not dead, but dom0 is busy looping or
> > something.  I can probably isolate the code, but the xen
> > changes seem small enough that it's hard to believe they
> > could cause this kind of problem.
>
> '0' on the Xen console will tell you where its spinning.  Oh,
> is it dom0 or domU?

It's dom0.  I do get an IP, but it varies pretty widely from
sample (of '0') to sample, and I haven't tried a symbol lookup
yet because I fear the addresses will be buried in layers of block
drivers.

I'm still hoping for some clue without digging that deep...
All I've presumably done (assuming my patch doesn't have a weird
bug) is make rdtsc slower.

Thanks,
Dan



Jeremy Fitzhardinge

2009-Aug-23 17:03 UTC

Re: [Xen-devel] softtsc for PV guests

On 08/23/09 09:42, Dan Magenheimer wrote:
> While I'm hoping that this is true, I am skeptical.  The
> PV time algorithm does depend on TSC accuracy for interpolating
> over short intervals doesn't it?

Yes, it extrapolates, assuming that in the absence of power events, etc,
the tsc is stable over a period of a few seconds on a given CPU.
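
To make the extrapolation concrete, here is a rough sketch of the
arithmetic, assuming per-CPU parameters shaped like the fields of
Xen's public vcpu_time_info (tsc_timestamp, system_time,
tsc_to_system_mul, tsc_shift).  The struct and function names below
are illustrative stand-ins, not the kernel's own code: the guest takes
the TSC delta since the last update, scales it to nanoseconds with a
shift and a 32.32 fixed-point multiplier, and adds it to the system
time recorded at that update.

```c
#include <stdint.h>

/* Illustrative copy of the per-CPU timing parameters. */
struct pv_time_params {
    uint64_t tsc_timestamp;      /* TSC value at the last update */
    uint64_t system_time;        /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point ns per (shifted) tick */
    int8_t   tsc_shift;          /* pre-scale shift applied to the delta */
};

/* Convert a TSC delta to nanoseconds: ((delta << shift) * mul) >> 32. */
uint64_t scale_tsc_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;

    /* Split the 64x32 multiply so the intermediate fits in 64 bits. */
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Extrapolate the current time from the last snapshot and a fresh TSC. */
uint64_t extrapolate_ns(const struct pv_time_params *p, uint64_t tsc_now)
{
    return p->system_time +
           scale_tsc_delta(tsc_now - p->tsc_timestamp,
                           p->tsc_to_system_mul, p->tsc_shift);
}
```

For a 2 GHz TSC the multiplier would be 0x80000000 (0.5 ns per tick)
with a shift of 0, so 2000 ticks after the update add 1000 ns.  The
result is only as good as the assumption that the rate has not changed
since the parameters were written.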

> Assuming an SMP PV guest starts on a machine with "safe TSC" (e.g. a
> recent multi-core single-socket) and migrates successively to
> a sequence of machines with:
>
> 1) a multi-socket where TSCs are not synchronized and skew badly
> 2) a different multi-core single-socket with a faster TSC frequencey
> 3) a multi-core/socket where TSC frequency varies according to
>    per-cpu power-saving configuration
>
> does the SMP PV guest maintain time properly?

It uses timing parameters from Xen.  If Xen can't keep track of the tsc
and events which affect it and provides bad info, it will fail.  But
then it means that Xen can't use the tsc internally either, so
presumably won't be able to accurately emulate it either.  The ABI never
assumes that the tsc is synchronized between CPUs, or that they're
running at even approximately the same rate.

The main risk is having the CPU asynchronously change speed under Xen,
with either no notification or a delayed notification (like thermal
events).  Any synchronous speed change can be dealt with.

> And even if it does, this doesn't help applications that read
> TSC directly (which, admittedly, they shouldn't, but since
> the processor vendors have made TSC much "safer" on most
> systems, which will probably soon account for >90% of systems
> shipped, SMP app direct use of TSC will likely become more prevalent.)

Right.  That's basically not supported under Linux, except as part of
certain ABIs like vgettimeofday (which is functionally identical to the
Xen PV clock ABI).

> It's dom0.  I do see get an IP but it varies pretty widely from
> sample (of '0') to sample and I haven't tried a symbol lookup
> yet because I fear they will be buried in layers of block drivers
>
> I'm still hoping for some clue without digging that deep...
> All I've presumably done (assuming my patch doesn't have a weird
> bug) is make rdtsc slower.

It's presumed to be fast in a number of places, but being slow shouldn't
cause it to fail.  Maybe some race is coming up.  If you revert just the
register write that makes rdtsc trap, does it still hang?

    J


Dan Magenheimer

2009-Aug-23 19:26 UTC

RE: [Xen-devel] softtsc for PV guests

> On 08/23/09 09:42, Dan Magenheimer wrote:
> > While I'm hoping that this is true, I am skeptical.  The
> > PV time algorithm does depend on TSC accuracy for interpolating
> > over short intervals doesn't it?
>
> Yes, it extrapolates, assuming that in the absence of power
> events, etc, the tsc is stable over a period of a few seconds
> on a given CPU.

A lot can happen in a few seconds...

> > does the SMP PV guest maintain time properly?
>
> It uses timing parameters from Xen.
> If Xen can't keep track of the tsc
> and events which affect it and provides bad info, it will fail.

Let's assume that Xen CAN keep track.

How does the PV guest know if Xen's timing parameters change?
Is it required to remember Xen's timing parameters from the last
time it checked and compare them with this time?

> The ABI never
> assumes that the tsc is synchronized between CPUs, or that they're
> running at even approximately the same rate.

This is a shame, given that it IS synchronized between CPUs
and they ARE running at exactly the same rate on the vast majority
of future (single-socket multi-core) systems.  Especially given that the
alternative is one-to-three orders of magnitude slower.

> The main risk is having the CPU asynchronously change speed under Xen,
> with either no notification or a delayed notification (like thermal
> events).  Any synchronous speed change can be dealt with.

I guess I need to understand this better.

> > And even if it does, this doesn't help applications that read
> > TSC directly (which, admittedly, they shouldn't, but since
> > the processor vendors have made TSC much "safer" on most
> > systems, which will probably soon account for >90% of systems
> > shipped, SMP app direct use of TSC will likely become more
> > prevalent.)
>
> Right.  That's basically not supported under Linux, except as part of
> certain ABIs like vgettimeofday (which is functionally
> identical to the Xen PV clock ABI).

Again, a shame.  I'm learning that it is not uncommon for unprivileged
code to sample "time" tens of thousands or even hundreds of thousands
of times per processor per second.  Trapping all app rdtscs or Linux
going to HPET or PIT just doesn't cut it if the frequency is
this high.  If TSC is "safe" 99.99% of the time, it sure would
be nice if those apps could use rdtsc.

I'm trying to find a solution that allows this to be supported
in a virtual environment (without huge loss of performance).
And I think I might have one.

> > It's dom0.  I do see get an IP but it varies pretty widely from
> > sample (of '0') to sample and I haven't tried a symbol lookup
> > yet because I fear they will be buried in layers of block drivers
> >
> > I'm still hoping for some clue without digging that deep...
> > All I've presumably done (assuming my patch doesn't have a weird
> > bug) is make rdtsc slower.
>
> It's presumed to be fast in a number of places, but it shouldn't cause
> it to fail.  Maybe some race is coming up.  If you just revert the
> register write to make rdtsc trap, does it still hang?

I just got a big clue... the next line of console output in a
successful boot AFTER the EXT3-fs mounting message is from
the Real Time Clock Driver.  That sounds like something that
might be affected by rdtsc changes.

Dan


Jeremy Fitzhardinge

2009-Aug-24 16:43 UTC

Re: [Xen-devel] softtsc for PV guests

On 08/23/09 12:26, Dan Magenheimer wrote:
>> On 08/23/09 09:42, Dan Magenheimer wrote:
>>> While I'm hoping that this is true, I am skeptical.  The
>>> PV time algorithm does depend on TSC accuracy for interpolating
>>> over short intervals doesn't it?
>>
>> Yes, it extrapolates, assuming that in the absence of power
>> events, etc,
>> the tsc is stable over a period of a few seconds on a given CPU.
>
> A lot can happen in a few seconds...

That only matters if things happen that Xen doesn't know about.  If
something happens that affects the tsc's parameters, Xen will update
them immediately.

>>> does the SMP PV guest maintain time properly?
>>
>> It uses timing parameters from Xen.
>> If Xen can't keep track of the tsc
>> and events which affect it and provides bad info, it will fail.
>
> Let's assume that Xen CAN keep track.
>
> How does the PV guest know if Xen's timing parameters change?
> Is it required to remember Xen's timing parameters from the last
> time it checked and compare them with this time?

No, they're in the shared info area.  The guest reads them afresh each
time it reads the tsc.  The info has a version counter which gets updated
when the info changes so the guest can make sure it has a consistent
snapshot of both the timing parameters and the tsc.  The timing parameters
for a given CPU are only ever updated by that CPU, so there's no risk of
races between CPUs.

BTW, kvm presents exactly the same ABI for its guests using pvclock. 
See pvclock_clocksource_read().
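
The version-counter protocol described above can be sketched roughly
as follows.  This is a simplified userspace illustration, not the
kernel's pvclock code: the struct layout, the function names, and the
use of C11 fences in place of the kernel's read barriers are all
assumptions made for the example.  The reader retries whenever the
version is odd (an update is in progress) or differs before and after
the copy.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for the shared per-CPU time info. */
struct time_info {
    uint32_t version;            /* odd while the writer is mid-update */
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
};

/* A consistent copy of the parameters plus the TSC read alongside them. */
struct time_snapshot {
    struct time_info info;
    uint64_t tsc;
};

/* Fixed stand-in for a real TSC read, so the sketch is deterministic. */
uint64_t fake_tsc(void)
{
    return 5000;
}

void read_time_snapshot(const volatile struct time_info *src,
                        uint64_t (*read_tsc)(void),
                        struct time_snapshot *out)
{
    uint32_t before, after;

    do {
        before = src->version;
        atomic_thread_fence(memory_order_acquire);

        out->info.tsc_timestamp     = src->tsc_timestamp;
        out->info.system_time       = src->system_time;
        out->info.tsc_to_system_mul = src->tsc_to_system_mul;
        out->info.tsc_shift         = src->tsc_shift;
        out->tsc                    = read_tsc();

        atomic_thread_fence(memory_order_acquire);
        after = src->version;
    } while ((before & 1) || before != after);

    out->info.version = before;
}
```

Because the guest never caches parameters across reads, it does not
need to remember or compare old values; a changed version simply
forces one more trip around the loop.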

>> Right.  That's basically not supported under Linux, except as part of
>> certain ABIs like vgettimeofday (which is functionally
>> identical to the Xen PV clock ABI).
>
> Again, a shame.  I'm learning that it is not uncommon for unprivileged
> code to sample "time" tens of thousands or even hundreds of thousands
> of times per processor per second.  Trapping all app rdtscs or Linux
> going to HPET or PIT just doesn't cut it if the frequency is
> this high.  If TSC is "safe" 99.99% of the time, it sure would
> be nice if those apps could use rdtsc.

They can, with the gettimeofday vsyscall (= "syscall" which executes
entirely in usermode within a kernel-provided vsyscall page).

You're trying to make rdtsc something it isn't, even in native
execution.

rdtsc represents a massive lost opportunity and failure of imagination
on Intel's part; one hopes that they'll eventually redeem themselves
with a new mechanism which does actually have all the properties one
wants - and that mechanism may eventually end up with rdtsc in it
somewhere.  But we're not really there yet, and I think trying to make
rdtsc that thing is a quixotic effort.

> I'm trying to find a solution that allows this to be supported
> in a virtual environment (without huge loss of performance).
> And I think I might have one.

Apps can't reliably use a raw rdtsc anyway, without making unwarranted
assumptions about the underlying hardware.  Any app which does may work
well on one system, but then mysteriously fail when you move it to the
backup server.

> I just got a big clue... the next line of console output in a
> successful boot AFTER the EXT3-fs mounting message is from
> the Real Time Clock Driver.  That sounds like something that
> might be affected by rdtsc changes.

Ah, yes.  It may be doing some calibration thing which never converges
with a slow rdtsc.  But that would be pretty obvious from looking at the
eip/rip.

    J


Dan Magenheimer

2009-Aug-24 17:58 UTC

RE: [Xen-devel] softtsc for PV guests

> > I just got a big clue... the next line of console output in a
> > successful boot AFTER the EXT3-fs mounting message is from
> > the Real Time Clock Driver.  That sounds like something that
> > might be affected by rdtsc changes.
>
> Ah, yes.  It may be doing some calibration thing which never converges
> with a slow rdtsc.  But that would be pretty obvious from
> looking at the eip/rip.

No, the clue led me astray.  The code in RTC was never reached.
The eip pointed me to the probable answer:  I think this is the
first time a userland rdtsc is executed.  Xen is "reflecting"
the GPF to Linux; Linux doesn't really know what to do with the
GPF and would like to deliver a signal to userland, but there's
no signal registered for it, so nobody ever updates the IP and
an infinite loop results.  So more work is needed on my part to
fix this.
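
To illustrate the hypothesis above: if whoever handles the fault
neither emulates the instruction nor moves the instruction pointer
past it, the same rdtsc faults again forever.  The sketch below is a
toy model, not Xen or Linux code; the struct, the function name, and
the treatment of rip as an offset into a byte buffer are all
simplifications invented for the example.  It shows one way the loop
could be avoided: emulate RDTSC (opcode bytes 0F 31) and advance the
IP by the two bytes of the instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register file; rip is an offset into the code buffer here. */
struct guest_regs {
    uint64_t rip;
    uint32_t eax;
    uint32_t edx;
};

/*
 * If the bytes at regs->rip are RDTSC (0F 31), load `tsc` into
 * EDX:EAX, step rip past the instruction and return true.
 * Otherwise leave the registers untouched and return false.
 */
bool emulate_rdtsc(struct guest_regs *regs, const uint8_t *code,
                   size_t len, uint64_t tsc)
{
    if (len < 2 || regs->rip > len - 2)
        return false;
    if (code[regs->rip] != 0x0f || code[regs->rip + 1] != 0x31)
        return false;

    regs->eax = (uint32_t)tsc;          /* low 32 bits */
    regs->edx = (uint32_t)(tsc >> 32);  /* high 32 bits */
    regs->rip += 2;                     /* skip the 2-byte opcode */
    return true;
}
```

Without the final `rip += 2`, returning to the guest would re-execute
the same faulting instruction, which matches the busy loop seen in dom0.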

More on the general topic later.

Dan


Dan Magenheimer

2009-Aug-24 21:20 UTC

RE: [Xen-devel] softtsc for PV guests

> That only matters if things happen that Xen doesn't know about.  If
> something happens that affects the tsc's parameters, it will
> update them immediately.
>
> No, they're in the shared info area.  It reads them afresh
> each time it reads the tsc.  The info has a version counter which
> gets updated when the info changes so the guest can make sure it
> has a consistent snapshot of both the timing parameters and the
> tsc.  The timing parameters for a given CPU are only ever updated
> by that CPU, so there's no risk of races between CPUs.

OK, now looking at the code in 2.6.30, that all makes sense.

Has anyone stress-tested this code across the wide range
of TSC characteristics that might exist in migrating around
a virtualized data center?  I wonder, for example, what is
the longest period of time for which vgettimeofday will
return the same result (i.e. for which time is "stopped").

> >> Right.  That's basically not supported under Linux, except as part of
> >> certain ABIs like vgettimeofday (which is functionally
> >> identical to the Xen PV clock ABI).
> >
> > Again, a shame.  I'm learning that it is not uncommon for unprivileged
> > code to sample "time" tens of thousands or even hundreds of thousands
> > of times per processor per second.  Trapping all app rdtscs or Linux
> > going to HPET or PIT just doesn't cut it if the frequency is
> > this high.  If TSC is "safe" 99.99% of the time, it sure would
> > be nice if those apps could use rdtsc.
>
> They can, with the gettimeofday vsyscall (= "syscall" which executes
> entirely in usermode within a kernel-provided vsyscall page).

Any idea what the cost of a gettimeofday vsyscall is relative
to an rdtsc?

(Alternately, do I need to do anything in a 2.6.30 kernel or when
compiling a simple C test program to enable vgettimeofday to be used?
I'd like to compare the cost myself.)
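
One rough way to compare the two costs from userland is a pair of
timing loops like the sketch below.  It is only an illustration under
stated assumptions: the function names are invented, __rdtsc() is the
GCC/Clang intrinsic from <x86intrin.h>, the non-x86 branch is a
stand-in so the file still compiles, and the program simply calls the
C library's gettimeofday().  Whether that call stays entirely in user
mode depends on the kernel and its active clocksource, which this
sketch does not check.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Raw, unserialized TSC read via the compiler intrinsic. */
static inline uint64_t read_counter(void)
{
    return __rdtsc();
}
#else
/* Non-x86 stand-in so the sketch still builds; NOT a TSC read. */
static inline uint64_t read_counter(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}
#endif

/* Wall-clock time in microseconds, used only to time the loops. */
static double wall_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1e6 + (double)tv.tv_usec;
}

/* Average nanoseconds per counter read; iters must be > 0. */
double ns_per_rdtsc(unsigned long iters)
{
    volatile uint64_t sink = 0;  /* keeps the loop from being removed */
    double start = wall_us();

    for (unsigned long i = 0; i < iters; i++)
        sink += read_counter();
    return (wall_us() - start) * 1000.0 / (double)iters;
}

/* Average nanoseconds per gettimeofday() call; iters must be > 0. */
double ns_per_gettimeofday(unsigned long iters)
{
    struct timeval tv;
    volatile long sink = 0;
    double start = wall_us();

    for (unsigned long i = 0; i < iters; i++) {
        gettimeofday(&tv, NULL);
        sink += (long)tv.tv_usec;
    }
    return (wall_us() - start) * 1000.0 / (double)iters;
}
```

Calling both functions with a few million iterations and printing the
two averages gives a first-order ratio.  Running the same binary with
and without rdtsc trapping enabled would also show the per-trap cost,
though results will vary with hardware and with loop overhead.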

> You're trying to make rdtsc something it isn't, even in
> native execution.
>
> rdtsc represents a massive lost opportunity and failure of imagination
> on Intel's part; one hopes that they'll eventually redeem themselves
> with a new mechanism which does actually have all the properties one
> wants - and that mechanism may eventually end up with rdtsc in it
> somewhere.  But we're not really there yet, and I think trying to make
> rdtsc that thing is a quixotic effort.

Windmills are my specialty :-)  Intel (and AMD) *have* solved the TSC
problem on the vast majority of new (single-socket multi-core) systems.
The trick is determining when the mechanism is safe to use and
when it is not.

> > I'm trying to find a solution that allows this to be supported
> > in a virtual environment (without huge loss of performance).
> > And I think I might have one.
>
> Apps can't reliably use a raw rdtsc anyway, without making unwarranted
> assumptions about the underlying hardware.  Any app which
> does may work
> well on one system, but then mysteriously fail when you move it to the
> backup server.

Exactly.

But, reliable or not, they *can* and *do* and *will* use rdtsc.
And it *will* be reliable in enough systems that it may never
be noticed as unreliable, except as some weird bug that
occurs randomly only when the app is run in a virtual environment
and which never gets root-caused to be a TSC-related issue.

So wouldn't it be nice if apps could take advantage of a fast
synchronized rdtsc that IS reliable 99% of the time, but be
smart enough to adapt when it is NOT reliable?

And, for that matter, if rdtsc is much faster than vgettimeofday
(to be determined), wouldn't it be nice if Linux could take
advantage of a TSC clocksource that IS reliable 99%
of the time, but be smart enough to adapt when it is NOT
reliable?

Dan (Quixote)
