hpa at zytor.com
2017-Feb-13 20:06 UTC
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On February 13, 2017 2:53:43 AM PST, Peter Zijlstra <peterz at infradead.org> wrote:>On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote: >> That way we'd end up with something like: >> >> asm(" >> push %rdi; >> movslq %edi, %rdi; >> movq __per_cpu_offset(,%rdi,8), %rax; >> cmpb $0, %[offset](%rax); >> setne %al; >> pop %rdi; >> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct >steal_time, preempted))); >> >> And if we could get rid of the sign extend on edi we could avoid all >the >> push-pop nonsense, but I'm not sure I see how to do that (then again, >> this asm foo isn't my strongest point). > >Maybe: > >movsql %edi, %rax; >movq __per_cpu_offset(,%rax,8), %rax; >cmpb $0, %[offset](%rax); >setne %al; > >?We could kill the zero or sign extend by changing the calling interface to pass an unsigned long instead of an int. It is much more likely that a zero extend is free for the caller than a sign extend. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Peter Zijlstra
2017-Feb-13 21:57 UTC
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On Mon, Feb 13, 2017 at 12:06:44PM -0800, hpa at zytor.com wrote:> >Maybe: > > > >movsql %edi, %rax; > >movq __per_cpu_offset(,%rax,8), %rax; > >cmpb $0, %[offset](%rax); > >setne %al; > > > >? > > We could kill the zero or sign extend by changing the calling > interface to pass an unsigned long instead of an int. It is much more > likely that a zero extend is free for the caller than a sign extend.Right, Boris and me talked about that on IRC. I was wondering if the argument was u32 if we could assume the top 32 bits are 0 and then use rdi without prior movzx. That would allow reducing the thing one more instruction. Also, PVOP_CALL_ARG#() have an (unsigned long) cast in them that doesn't make sense. That cast ends up resulting in the calling code doing explicit sign or zero extends into the full 64bit register for no good reason. If one removes that cast things still compile, but I worry something somehow relies on this weird behaviour and will come apart.
Waiman Long
2017-Feb-13 22:24 UTC
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On 02/13/2017 03:06 PM, hpa at zytor.com wrote:> On February 13, 2017 2:53:43 AM PST, Peter Zijlstra <peterz at infradead.org> wrote: >> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote: >>> That way we'd end up with something like: >>> >>> asm(" >>> push %rdi; >>> movslq %edi, %rdi; >>> movq __per_cpu_offset(,%rdi,8), %rax; >>> cmpb $0, %[offset](%rax); >>> setne %al; >>> pop %rdi; >>> " : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct >> steal_time, preempted))); >>> And if we could get rid of the sign extend on edi we could avoid all >> the >>> push-pop nonsense, but I'm not sure I see how to do that (then again, >>> this asm foo isn't my strongest point). >> Maybe: >> >> movsql %edi, %rax; >> movq __per_cpu_offset(,%rax,8), %rax; >> cmpb $0, %[offset](%rax); >> setne %al; >> >> ? > We could kill the zero or sign extend by changing the calling interface to pass an unsigned long instead of an int. It is much more likely that a zero extend is free for the caller than a sign extend.I have thought of that too. However, the goal is to eliminate memory read/write from/to stack. Eliminating a register sign-extend instruction won't help much in term of performance. Cheers, Longman
Peter Zijlstra
2017-Feb-13 22:31 UTC
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
On Mon, Feb 13, 2017 at 05:24:36PM -0500, Waiman Long wrote:> >> movsql %edi, %rax; > >> movq __per_cpu_offset(,%rax,8), %rax; > >> cmpb $0, %[offset](%rax); > >> setne %al;> I have thought of that too. However, the goal is to eliminate memory > read/write from/to stack. Eliminating a register sign-extend instruction > won't help much in term of performance.Problem here is that all instructions have dependencies, so if you can get rid of the sign extend mov you kill a bunch of stall cycles (I would expect). But yes, peanuts vs the stack load/stores.
Possibly Parallel Threads
- [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
- [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
- [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
- [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
- [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function