X86: Disable PCID/INVPCID for dom0

PCID (Process-context identifier) is a facility by which a logical processor
may cache information for multiple linear-address spaces. INVPCID is a new
instruction to invalidate TLB entries. Refer to the latest Intel SDM:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

We disable PCID/INVPCID for dom0 and pv. Exposing them to dom0 and pv may
result in a performance regression, and it would trigger GP or UD depending
on whether the platform supports INVPCID or not.

This patch disables PCID/INVPCID for dom0.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

diff -r 519a0e3c982d xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c Thu Nov 17 18:06:01 2011 +0800
+++ b/xen/arch/x86/traps.c Thu Nov 17 18:41:59 2011 +0800
@@ -813,6 +813,7 @@ static void pv_cpuid(struct cpu_user_reg
         __clear_bit(X86_FEATURE_CX16 % 32, &c);
         __clear_bit(X86_FEATURE_XTPR % 32, &c);
         __clear_bit(X86_FEATURE_PDCM % 32, &c);
+        __clear_bit(X86_FEATURE_PCID % 32, &c);
         __clear_bit(X86_FEATURE_DCA % 32, &c);
         if ( !xsave_enabled(current) )
         {
diff -r 519a0e3c982d xen/include/asm-x86/cpufeature.h
--- a/xen/include/asm-x86/cpufeature.h Thu Nov 17 18:06:01 2011 +0800
+++ b/xen/include/asm-x86/cpufeature.h Thu Nov 17 18:41:59 2011 +0800
@@ -97,6 +97,7 @@
 #define X86_FEATURE_CX16 (4*32+13) /* CMPXCHG16B */
 #define X86_FEATURE_XTPR (4*32+14) /* Send Task Priority Messages */
 #define X86_FEATURE_PDCM (4*32+15) /* Perf/Debug Capability MSR */
+#define X86_FEATURE_PCID (4*32+17) /* Process Context ID */
 #define X86_FEATURE_DCA (4*32+18) /* Direct Cache Access */
 #define X86_FEATURE_SSE4_1 (4*32+19) /* Streaming SIMD Extensions 4.1 */
 #define X86_FEATURE_SSE4_2 (4*32+20) /* Streaming SIMD Extensions 4.2 */
@@ -152,6 +153,7 @@
 #define X86_FEATURE_SMEP (7*32+ 7) /* Supervisor Mode Execution Protection */
 #define X86_FEATURE_BMI2 (7*32+ 8) /* 2nd bit manipulation extensions */
 #define X86_FEATURE_ERMS (7*32+ 9) /* Enhanced REP MOVSB/STOSB */
+#define X86_FEATURE_INVPCID (7*32+10) /* Invalidate Process Context ID */
 
 #define cpu_has(c, bit) test_bit(bit, (c)->x86_capability)
 #define boot_cpu_has(bit) test_bit(bit, boot_cpu_data.x86_capability)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
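
As a reading aid for the patch, here is a minimal standalone sketch of the feature-bit arithmetic it relies on. It assumes the usual cpufeature.h encoding, where a feature number is word*32 + bit, with word 4 taken to hold CPUID leaf 1 ECX and word 7 taken to hold CPUID leaf 7 (subleaf 0) EBX. The helpers feature_word(), feature_bit() and hide_feature() are hypothetical names used only for illustration and are not Xen functions; hide_feature() merely mimics the effect of __clear_bit(X86_FEATURE_PCID % 32, &c) on the register value reported to the guest.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as in the patch: (word index * 32) + bit index. */
#define X86_FEATURE_PCID    (4*32+17) /* Process Context ID */
#define X86_FEATURE_INVPCID (7*32+10) /* Invalidate Process Context ID */

/* Hypothetical helper: which 32-bit capability word a feature lives in. */
static inline unsigned int feature_word(unsigned int feature)
{
    return feature / 32;
}

/* Hypothetical helper: bit position of the feature inside that word. */
static inline unsigned int feature_bit(unsigned int feature)
{
    return feature % 32;
}

/*
 * Hypothetical helper: return the register value with the feature's bit
 * cleared, i.e. what the guest would see once the feature is hidden.
 */
static inline uint32_t hide_feature(uint32_t reg, unsigned int feature)
{
    assert(feature_bit(feature) < 32);
    return reg & ~(UINT32_C(1) << feature_bit(feature));
}
```

With these definitions, feature_bit(X86_FEATURE_PCID) is 17, so hiding PCID clears bit 17 of the ECX value returned for CPUID leaf 1, while INVPCID maps to bit 10 of word 7.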
>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> X86: Disable PCID/INVPCID for dom0
>
> PCID (Process-context identifier) is a facility by which a logical processor
> may cache information for multiple linear-address spaces. INVPCID is a new
> instruction to invalidate TLB entries. Refer to the latest Intel SDM:
> http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
>
> We disable PCID/INVPCID for dom0 and pv. Exposing them to dom0 and pv may
> result in a performance regression, and it would trigger GP or UD depending
> on whether the platform supports INVPCID or not.
>
> This patch disables PCID/INVPCID for dom0.

Do we really need to disable it, rather than making it work?
Conceptually the feature seems usable, and the instruction would
need replacement by a hypercall anyway (just like invlpg).

Jan
Jan Beulich wrote:
>>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> X86: Disable PCID/INVPCID for dom0
>>
>> PCID (Process-context identifier) is a facility by which a logical
>> processor may cache information for multiple linear-address spaces.
>> INVPCID is a new instruction to invalidate TLB entries. Refer to the
>> latest Intel SDM:
>> http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
>>
>> We disable PCID/INVPCID for dom0 and pv. Exposing them to dom0 and
>> pv may result in a performance regression, and it would trigger GP or
>> UD depending on whether the platform supports INVPCID or not.
>>
>> This patch disables PCID/INVPCID for dom0.
>
> Do we really need to disable it, rather than making it work?
> Conceptually the feature seems usable, and the instruction would
> need replacement by a hypercall anyway (just like invlpg).
>
> Jan

It's a design choice.
Exposing PCID/INVPCID to pv would involve some additional work, such
as a coordinated PCID allocation algorithm or a change to the vmm
vcpu context switch, which would make it complex. However, exposing
PCID/INVPCID to pv has no obvious benefit and may even result in a
performance regression.
So we choose to disable PCID/INVPCID for pv.

Thanks,
Jinsong
>>> On 27.11.11 at 11:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> This patch disables PCID/INVPCID for dom0.
>>
>> Do we really need to disable it, rather than making it work?
>> Conceptually the feature seems usable, and the instruction would
>> need replacement by a hypercall anyway (just like invlpg).
>
> It's a design choice.
> Exposing PCID/INVPCID to pv would involve some additional work, such
> as a coordinated PCID allocation algorithm or a change to the vmm
> vcpu context switch, which would make it complex. However, exposing
> PCID/INVPCID to pv has no obvious benefit and may even result in a
> performance regression.

Would you mind elaborating on that statement?

Jan

> So we choose to disable PCID/INVPCID for pv.
>
> Thanks,
> Jinsong
Jan Beulich wrote:
>>>> On 27.11.11 at 11:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>> This patch disables PCID/INVPCID for dom0.
>>>
>>> Do we really need to disable it, rather than making it work?
>>> Conceptually the feature seems usable, and the instruction would
>>> need replacement by a hypercall anyway (just like invlpg).
>>
>> It's a design choice.
>> Exposing PCID/INVPCID to pv would involve some additional work, such
>> as a coordinated PCID allocation algorithm or a change to the vmm
>> vcpu context switch, which would make it complex. However, exposing
>> PCID/INVPCID to pv has no obvious benefit and may even result in a
>> performance regression.
>
> Would you mind elaborating on that statement?
>
> Jan

For pv, if we expose PCID to pv, the PCIDs of different pv domains
may conflict, which would confuse the processor's TLB.
To make PCID work for pv, we need
1. either a coordinated PCID allocation algorithm, so that the local
   PCID of a pv domain can be changed to a globally unique PCID;
2. or a 'clean' vcpu context switch logic that flushes all of the TLB.
Method 1 makes things complex w/o obvious benefit.
Method 2 needs a change to the current vcpu context switch logic
(i.e., mov cr3 only flushes TLB entries of the specific PCID if PCID
is enabled), and if flushing *all* of the TLB is required at context
switch, we lose the chance to optimize context switch by partly
flushing the TLB case by case, which may result in a performance
regression.

Thanks,
Jinsong
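
To make method 1 more concrete, it could look roughly like the sketch below. This is only an illustration under stated assumptions, not code from Xen or from this series: the table layout and the names pcid_owner, pcid_table, pcid_table_init() and pcid_translate() are all hypothetical. It relies on the PCID being a 12-bit field (4096 values) and remaps each (domain, guest PCID) pair to a distinct host PCID, so that two pv domains using the same local PCID no longer collide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HOST_PCID_COUNT 4096u /* PCID is a 12-bit field */

/* Hypothetical record of which (domain, guest PCID) pair owns a host PCID. */
struct pcid_owner {
    uint16_t domid;
    uint16_t guest_pcid;
    uint8_t  in_use;
};

/* Hypothetical global table, indexed by host PCID. */
struct pcid_table {
    struct pcid_owner slot[HOST_PCID_COUNT];
};

static void pcid_table_init(struct pcid_table *t)
{
    memset(t, 0, sizeof(*t));
}

/*
 * Translate a domain-local PCID into a globally unique host PCID.
 * Returns the host PCID, or -1 if the pool is exhausted; in that case
 * a real implementation would have to flush and recycle entries.
 * Host PCID 0 is kept out of the pool, which is purely a choice made
 * for this sketch.
 */
static int pcid_translate(struct pcid_table *t, uint16_t domid,
                          uint16_t guest_pcid)
{
    int free_slot = -1;

    for (unsigned int i = 1; i < HOST_PCID_COUNT; i++) {
        struct pcid_owner *o = &t->slot[i];

        /* Existing mapping: reuse it so cached translations stay valid. */
        if (o->in_use && o->domid == domid && o->guest_pcid == guest_pcid)
            return (int)i;

        /* Remember the first free slot in case no mapping exists yet. */
        if (!o->in_use && free_slot < 0)
            free_slot = (int)i;
    }

    if (free_slot < 0)
        return -1;

    t->slot[free_slot] = (struct pcid_owner){ domid, guest_pcid, 1 };
    return free_slot;
}
```

Even in this simplified form, the allocator needs a shared table, a lookup on every address-space switch and a recycling policy once the 4096 values run out, which gives a sense of the extra complexity referred to above.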
On 01/12/2011 10:12, "Jan Beulich" <JBeulich@suse.com> wrote:

>>> For pv, if we expose PCID to pv, the PCIDs of different pv domains
>>> may conflict, which would confuse the processor's TLB.
>>> To make PCID work for pv, we need
>>> 1. either a coordinated PCID allocation algorithm, so that the local
>>>    PCID of a pv domain can be changed to a globally unique PCID;
>>> 2. or a 'clean' vcpu context switch logic that flushes all of the TLB.
>>> Method 1 makes things complex w/o obvious benefit.
>>> Method 2 needs a change to the current vcpu context switch logic
>>> (i.e., mov cr3 only flushes TLB entries of the specific PCID if PCID
>>> is enabled), and if flushing *all* of the TLB is required at context
>>> switch, we lose the chance to optimize context switch by partly
>>> flushing the TLB case by case, which may result in a performance
>>> regression.
>>>
>>> Thanks,
>>> Jinsong
>>
>> Jan, any comments? Thanks, Jinsong
>
> No, no further comments (just don't have the time right now to think
> through the possible alternatives). So for the moment I think things
> could go in as posted by you. It's not immediately clear though
> whether the series needs to be applied in order (it would seem that's
> not a requirement, but I'd like your confirmation), as I could at most
> take care of patches 2, 3, and 6.

I'm happy for you to apply the whole lot.

Acked-by: Keir Fraser <keir@xen.org>

> Jan
Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 27.11.11 at 11:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>> This patch disables PCID/INVPCID for dom0.
>>>>
>>>> Do we really need to disable it, rather than making it work?
>>>> Conceptually the feature seems usable, and the instruction would
>>>> need replacement by a hypercall anyway (just like invlpg).
>>>
>>> It's a design choice.
>>> Exposing PCID/INVPCID to pv would involve some additional work, such
>>> as a coordinated PCID allocation algorithm or a change to the vmm
>>> vcpu context switch, which would make it complex. However, exposing
>>> PCID/INVPCID to pv has no obvious benefit and may even result in a
>>> performance regression.
>>
>> Would you mind elaborating on that statement?
>>
>> Jan
>
> For pv, if we expose PCID to pv, the PCIDs of different pv domains
> may conflict, which would confuse the processor's TLB.
> To make PCID work for pv, we need
> 1. either a coordinated PCID allocation algorithm, so that the local
>    PCID of a pv domain can be changed to a globally unique PCID;
> 2. or a 'clean' vcpu context switch logic that flushes all of the TLB.
> Method 1 makes things complex w/o obvious benefit.
> Method 2 needs a change to the current vcpu context switch logic
> (i.e., mov cr3 only flushes TLB entries of the specific PCID if PCID
> is enabled), and if flushing *all* of the TLB is required at context
> switch, we lose the chance to optimize context switch by partly
> flushing the TLB case by case, which may result in a performance
> regression.
>
> Thanks,
> Jinsong

Jan, any comments? Thanks, Jinsong
>>> On 01.12.11 at 11:01, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> On 27.11.11 at 11:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>>> This patch disables PCID/INVPCID for dom0.
>>>>>
>>>>> Do we really need to disable it, rather than making it work?
>>>>> Conceptually the feature seems usable, and the instruction would
>>>>> need replacement by a hypercall anyway (just like invlpg).
>>>>
>>>> It's a design choice.
>>>> Exposing PCID/INVPCID to pv would involve some additional work, such
>>>> as a coordinated PCID allocation algorithm or a change to the vmm
>>>> vcpu context switch, which would make it complex. However, exposing
>>>> PCID/INVPCID to pv has no obvious benefit and may even result in a
>>>> performance regression.
>>>
>>> Would you mind elaborating on that statement?
>>>
>>> Jan
>>
>> For pv, if we expose PCID to pv, the PCIDs of different pv domains
>> may conflict, which would confuse the processor's TLB.
>> To make PCID work for pv, we need
>> 1. either a coordinated PCID allocation algorithm, so that the local
>>    PCID of a pv domain can be changed to a globally unique PCID;
>> 2. or a 'clean' vcpu context switch logic that flushes all of the TLB.
>> Method 1 makes things complex w/o obvious benefit.
>> Method 2 needs a change to the current vcpu context switch logic
>> (i.e., mov cr3 only flushes TLB entries of the specific PCID if PCID
>> is enabled), and if flushing *all* of the TLB is required at context
>> switch, we lose the chance to optimize context switch by partly
>> flushing the TLB case by case, which may result in a performance
>> regression.
>>
>> Thanks,
>> Jinsong
>
> Jan, any comments? Thanks, Jinsong

No, no further comments (just don't have the time right now to think
through the possible alternatives). So for the moment I think things
could go in as posted by you. It's not immediately clear though
whether the series needs to be applied in order (it would seem that's
not a requirement, but I'd like your confirmation), as I could at most
take care of patches 2, 3, and 6.

Jan
Jan Beulich wrote:
>>>> On 01.12.11 at 11:01, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> On 27.11.11 at 11:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 24.11.11 at 16:53, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>>>> This patch disables PCID/INVPCID for dom0.
>>>>>>
>>>>>> Do we really need to disable it, rather than making it work?
>>>>>> Conceptually the feature seems usable, and the instruction would
>>>>>> need replacement by a hypercall anyway (just like invlpg).
>>>>>
>>>>> It's a design choice.
>>>>> Exposing PCID/INVPCID to pv would involve some additional work,
>>>>> such as a coordinated PCID allocation algorithm or a change to the
>>>>> vmm vcpu context switch, which would make it complex. However,
>>>>> exposing PCID/INVPCID to pv has no obvious benefit and may even
>>>>> result in a performance regression.
>>>>
>>>> Would you mind elaborating on that statement?
>>>>
>>>> Jan
>>>
>>> For pv, if we expose PCID to pv, the PCIDs of different pv domains
>>> may conflict, which would confuse the processor's TLB.
>>> To make PCID work for pv, we need
>>> 1. either a coordinated PCID allocation algorithm, so that the local
>>>    PCID of a pv domain can be changed to a globally unique PCID;
>>> 2. or a 'clean' vcpu context switch logic that flushes all of the TLB.
>>> Method 1 makes things complex w/o obvious benefit.
>>> Method 2 needs a change to the current vcpu context switch logic
>>> (i.e., mov cr3 only flushes TLB entries of the specific PCID if PCID
>>> is enabled), and if flushing *all* of the TLB is required at context
>>> switch, we lose the chance to optimize context switch by partly
>>> flushing the TLB case by case, which may result in a performance
>>> regression.
>>>
>>> Thanks,
>>> Jinsong
>>
>> Jan, any comments? Thanks, Jinsong
>
> No, no further comments (just don't have the time right now to think
> through the possible alternatives). So for the moment I think things
> could go in as posted by you. It's not immediately clear though
> whether the series needs to be applied in order (it would seem that's
> not a requirement, but I'd like your confirmation), as I could at most
> take care of patches 2, 3, and 6.
>
> Jan

Do you mean checking in patches 2/3/6 first?
It's OK to check in patches 2/3/6 first; they are ./xen code and have no
order dependence on patches 1/4/5 (which are ./tools code).

Thanks,
Jinsong
Keir Fraser wrote:
> On 01/12/2011 10:12, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>> For pv, if we expose PCID to pv, the PCIDs of different pv domains
>>>> may conflict, which would confuse the processor's TLB.
>>>> To make PCID work for pv, we need
>>>> 1. either a coordinated PCID allocation algorithm, so that the
>>>>    local PCID of a pv domain can be changed to a globally unique
>>>>    PCID;
>>>> 2. or a 'clean' vcpu context switch logic that flushes all of the
>>>>    TLB.
>>>> Method 1 makes things complex w/o obvious benefit.
>>>> Method 2 needs a change to the current vcpu context switch logic
>>>> (i.e., mov cr3 only flushes TLB entries of the specific PCID if
>>>> PCID is enabled), and if flushing *all* of the TLB is required at
>>>> context switch, we lose the chance to optimize context switch by
>>>> partly flushing the TLB case by case, which may result in a
>>>> performance regression.
>>>>
>>>> Thanks,
>>>> Jinsong
>>>
>>> Jan, any comments? Thanks, Jinsong
>>
>> No, no further comments (just don't have the time right now to think
>> through the possible alternatives). So for the moment I think things
>> could go in as posted by you. It's not immediately clear though
>> whether the series needs to be applied in order (it would seem that's
>> not a requirement, but I'd like your confirmation), as I could at
>> most take care of patches 2, 3, and 6.
>
> I'm happy for you to apply the whole lot.
>
> Acked-by: Keir Fraser <keir@xen.org>
>
>> Jan

Any comments about patches 1/4/5?

Thanks,
Jinsong