Displaying 8 results from an estimated 8 matches for "flushmask".
2019 May 27
3
[RFC PATCH 5/6] x86/mm/tlb: Flush remote and local TLBs concurrently
...flush_tlb_others(const struct cpumask *cpumask,
+static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
const struct flush_tlb_info *info)
{
u8 state;
@@ -594,6 +594,9 @@ static void kvm_flush_tlb_others(const s
* queue flush_on_enter for pre-empted vCPUs
*/
for_each_cpu(cpu, flushmask) {
+ if (cpu == smp_processor_id())
+ continue;
+
src = &per_cpu(steal_time, cpu);
state = READ_ONCE(src->preempted);
if ((state & KVM_VCPU_PREEMPTED)) {
@@ -603,7 +606,7 @@ static void kvm_flush_tlb_others(const s
}
}
- native_flush_tlb_others(flushmask, info);
+ nati...
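Read together, the hunks above rename kvm_flush_tlb_others() to kvm_flush_tlb_multi() and skip the local CPU in the preemption scan. A minimal sketch of the resulting function, with the truncated parts filled in as assumptions (the per-CPU __pv_tlb_mask scratch mask, the cmpxchg body, and the final native_flush_tlb_multi() call are inferred from context, not shown in the snippet):

static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
			const struct flush_tlb_info *info)
{
	u8 state;
	int cpu;
	struct kvm_steal_time *src;
	/* Assumed: flushmask is a per-CPU scratch copy of cpumask, as in
	 * the existing kvm_flush_tlb_others(). */
	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);

	cpumask_copy(flushmask, cpumask);

	/* queue flush_on_enter for pre-empted vCPUs */
	for_each_cpu(cpu, flushmask) {
		if (cpu == smp_processor_id())
			continue;	/* the local vCPU cannot be preempted */

		src = &per_cpu(steal_time, cpu);
		state = READ_ONCE(src->preempted);
		if (state & KVM_VCPU_PREEMPTED) {
			/* Assumed body: mark the vCPU to flush on guest
			 * entry and drop it from the IPI mask. */
			if (cmpxchg(&src->preempted, state,
				    state | KVM_VCPU_FLUSH_TLB) == state)
				__cpumask_clear_cpu(cpu, flushmask);
		}
	}

	/* Assumed completion of the truncated "+ nati..." line. */
	native_flush_tlb_multi(flushmask, info);
}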
2019 May 27
0
[RFC PATCH 5/6] x86/mm/tlb: Flush remote and local TLBs concurrently
...pumask,
> +static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
> const struct flush_tlb_info *info)
> {
> u8 state;
> @@ -594,6 +594,9 @@ static void kvm_flush_tlb_others(const s
> * queue flush_on_enter for pre-empted vCPUs
> */
> for_each_cpu(cpu, flushmask) {
> + if (cpu == smp_processor_id())
> + continue;
> +
Even this would be just an optimization; the vCPU you're running on
cannot be preempted. You can just change others to multi.
Paolo
> src = &per_cpu(steal_time, cpu);
> state = READ_ONCE(src->preempted);...
2019 May 27
1
[RFC PATCH 5/6] x86/mm/tlb: Flush remote and local TLBs concurrently
..._tlb_multi(const struct cpumask *cpumask,
> > const struct flush_tlb_info *info)
> > {
> > u8 state;
> > @@ -594,6 +594,9 @@ static void kvm_flush_tlb_others(const s
> > * queue flush_on_enter for pre-empted vCPUs
> > */
> > for_each_cpu(cpu, flushmask) {
> > + if (cpu == smp_processor_id())
> > + continue;
> > +
>
> Even this would be just an optimization; the vCPU you're running on
> cannot be preempted. You can just change others to multi.
Yeah, I know, but it felt weird so I added the explicit skip. No str...
2019 Jul 19
0
[PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
...truct cpumask *cpumask,
+static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
const struct flush_tlb_info *info)
{
u8 state;
@@ -609,6 +609,11 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
* queue flush_on_enter for pre-empted vCPUs
*/
for_each_cpu(cpu, flushmask) {
+ /*
+  * The local vCPU is never preempted, so there is no need to skip
+  * it explicitly - it will never be cleared from flushmask.
+  */
src = &per_cpu(steal_time, cpu);
state = READ_ONCE(src->preempted);
if ((state & KVM_VCPU_PREEMPTED)) {
@@ -618,7 +62...
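For the consumer side of the KVM_VCPU_FLUSH_TLB bit this loop sets: when the host later prepares to re-enter a preempted vCPU, it honours the queued flush. A rough sketch of that host-side check, where st points at the vCPU's kvm_steal_time record (based on how the steal-time path generally works; the helper name is assumed and is not part of this patch):

	/* Host side (sketch): before re-entering the vCPU, consume any
	 * flush that was queued while it was preempted. */
	if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
		kvm_vcpu_flush_tlb(vcpu);	/* helper name assumed */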
2019 Jul 02
0
[PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
...truct cpumask *cpumask,
+static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
const struct flush_tlb_info *info)
{
u8 state;
@@ -594,6 +594,11 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
* queue flush_on_enter for pre-empted vCPUs
*/
for_each_cpu(cpu, flushmask) {
+ /*
+  * The local vCPU is never preempted, so there is no need to skip
+  * it explicitly - it will never be cleared from flushmask.
+  */
src = &per_cpu(steal_time, cpu);
state = READ_ONCE(src->preempted);
if ((state & KVM_VCPU_PREEMPTED)) {
@@ -603,7 +60...
2019 May 25
3
[RFC PATCH 5/6] x86/mm/tlb: Flush remote and local TLBs concurrently
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. The current
flush_tlb_others() interface is kept, since paravirtual interfaces need
to be adapted before it can be removed. This is left for future work.
In such PV environments, TLB flushes are, at this time, not performed
concurrently.
Add a static key to tell
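The truncated last sentence introduces a static key selecting between the old and new interfaces. A hypothetical sketch of such a gate (the key name and dispatcher are illustrative, not taken from the patch):

/* Hypothetical gate: enabled where flush_tlb_multi() is available. */
DEFINE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);

static void do_tlb_flush(const struct cpumask *cpumask,
			 const struct flush_tlb_info *info)
{
	if (static_branch_likely(&flush_tlb_multi_enabled)) {
		/* New interface: remote and local TLBs, concurrently. */
		flush_tlb_multi(cpumask, info);
	} else {
		/* Unconverted PV backends: remote flush only; the local
		 * flush is still done separately by the caller. */
		flush_tlb_others(cpumask, info);
	}
}

A PV backend that only implements flush_tlb_others() would then turn the key off at boot, e.g. with static_branch_disable(&flush_tlb_multi_enabled).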
2019 Jul 02
2
[PATCH v2 0/9] x86: Concurrent TLB flushes
Currently, local and remote TLB flushes are not performed concurrently,
which introduces unnecessary overhead - each INVLPG can take 100s of
cycles. This patch-set allows TLB flushes to be run concurrently: first
request the remote CPUs to initiate the flush, then run it locally, and
finally wait for the remote CPUs to finish their work.
In addition, there are various small optimizations to avoid
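The three steps named in the cover letter map naturally onto code. A hedged sketch of the pattern (the helper names are illustrative, not from the series):

static void flush_tlb_concurrent(const struct cpumask *cpumask,
				 const struct flush_tlb_info *info)
{
	/* 1. Kick the remote CPUs; do not wait for them yet. */
	send_flush_ipis_nowait(cpumask, info);		/* illustrative helper */

	/* 2. Flush the local TLB while the remote flushes run. */
	if (cpumask_test_cpu(smp_processor_id(), cpumask))
		local_flush_tlb_info(info);		/* illustrative helper */

	/* 3. Only now wait for the remote CPUs to finish. */
	wait_for_flush_ipis(cpumask);			/* illustrative helper */
}

The win comes from overlapping step 2 with the remote work: the local INVLPGs execute while the IPIs are still in flight, instead of after the remote CPUs have already finished.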
2019 Jul 19
5
[PATCH v3 0/9] x86: Concurrent TLB flushes
[ Cover-letter is identical to v2, including benchmark results,
excluding the change log. ]
Currently, local and remote TLB flushes are not performed concurrently,
which introduces unnecessary overhead - each INVLPG can take 100s of
cycles. This patch-set allows TLB flushes to be run concurrently: first
request the remote CPUs to initiate the flush, then run it locally, and
finally wait for