In __prepare_to_wait(), properly mark early clobbered registers. By
doing so, we at once eliminate the need to save/restore rCX and rDI.

In check_wakeup_from_wait(), make the current constraints match by
removing the code that actually alters registers. By adjusting the
resume address in __prepare_to_wait(), we can simply re-use the copying
operation there (rather than doing a second pointless copy in the
opposite direction after branching to the resume point), which at once
eliminates the need for re-loading rCX and rDI inside the asm().

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -126,6 +126,7 @@ static void __prepare_to_wait(struct wai
 {
     char *cpu_info = (char *)get_cpu_info();
     struct vcpu *curr = current;
+    unsigned long dummy;
 
     ASSERT(wqv->esp == 0);
 
@@ -140,27 +141,27 @@ static void __prepare_to_wait(struct wai
 
     asm volatile (
 #ifdef CONFIG_X86_64
-        "push %%rax; push %%rbx; push %%rcx; push %%rdx; push %%rdi; "
+        "push %%rax; push %%rbx; push %%rdx; "
         "push %%rbp; push %%r8; push %%r9; push %%r10; push %%r11; "
         "push %%r12; push %%r13; push %%r14; push %%r15; call 1f; "
-        "1: mov 80(%%rsp),%%rdi; mov 96(%%rsp),%%rcx; mov %%rsp,%%rsi; "
+        "1: mov %%rsp,%%rsi; addq $2f-1b,(%%rsp); "
         "sub %%rsi,%%rcx; cmp %3,%%rcx; jbe 2f; "
        "xor %%esi,%%esi; jmp 3f; "
         "2: rep movsb; mov %%rsp,%%rsi; 3: pop %%rax; "
         "pop %%r15; pop %%r14; pop %%r13; pop %%r12; "
         "pop %%r11; pop %%r10; pop %%r9; pop %%r8; "
-        "pop %%rbp; pop %%rdi; pop %%rdx; pop %%rcx; pop %%rbx; pop %%rax"
+        "pop %%rbp; pop %%rdx; pop %%rbx; pop %%rax"
 #else
-        "push %%eax; push %%ebx; push %%ecx; push %%edx; push %%edi; "
+        "push %%eax; push %%ebx; push %%edx; "
         "push %%ebp; call 1f; "
-        "1: mov 8(%%esp),%%edi; mov 16(%%esp),%%ecx; mov %%esp,%%esi; "
+        "1: mov %%esp,%%esi; addl $2f-1b,(%%esp); "
         "sub %%esi,%%ecx; cmp %3,%%ecx; jbe 2f; "
         "xor %%esi,%%esi; jmp 3f; "
         "2: rep movsb; mov %%esp,%%esi; 3: pop %%eax; "
-        "pop %%ebp; pop %%edi; pop %%edx; pop %%ecx; pop %%ebx; pop %%eax"
+        "pop %%ebp; pop %%edx; pop %%ebx; pop %%eax"
 #endif
-        : "=S" (wqv->esp)
-        : "c" (cpu_info), "D" (wqv->stack), "i" (PAGE_SIZE)
+        : "=&S" (wqv->esp), "=&c" (dummy), "=&D" (dummy)
+        : "i" (PAGE_SIZE), "1" (cpu_info), "2" (wqv->stack)
         : "memory" );
 
     if ( unlikely(wqv->esp == 0) )
@@ -200,7 +201,7 @@ void check_wakeup_from_wait(void)
     }
 
     asm volatile (
-        "mov %1,%%"__OP"sp; rep movsb; jmp *(%%"__OP"sp)"
+        "mov %1,%%"__OP"sp; jmp *(%0)"
         : : "S" (wqv->stack), "D" (wqv->esp),
             "c" ((char *)get_cpu_info() - (char *)wqv->esp)
         : "memory" );
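
[For illustration: the heart of the __prepare_to_wait() rework is the
"call 1f; 1: addq $2f-1b,(%rsp)" sequence. The call pushes the address of
label 1, and the add turns that saved word into the address of label 2,
i.e. of the "rep movsb" that check_wakeup_from_wait() later jumps back to.
Below is a minimal, hypothetical sketch of just this address-patching
trick -- standalone C, not code from the patch; the function name is made
up, it assumes x86-64, and (like Xen itself, which is built without
red-zone use) it assumes nothing live sits just below the stack pointer.]

#include <stdio.h>

/* Hypothetical demonstration of the "call 1f; addq $2f-1b,(%rsp)" trick:
 * the call pushes the address of label 1, the add converts the saved
 * word into the address of label 2, and here we simply pop and return
 * it.  The stack is balanced again when the asm finishes. */
static void *address_of_label_2(void)
{
    void *addr;

    asm volatile ( "call 1f; "
                   "1: addq $2f-1b,(%%rsp); " /* saved &1f becomes &2f */
                   "pop %0; "                 /* fetch the patched address */
                   "2:"
                   : "=r" (addr) );
    return addr;
}

int main(void)
{
    printf("label 2 is at %p\n", address_of_label_2());
    return 0;
}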
On 03/08/2012 09:40, "Jan Beulich" <JBeulich@suse.com> wrote:

> In __prepare_to_wait(), properly mark early clobbered registers. By
> doing so, we at once eliminate the need to save/restore rCX and rDI.
>
> In check_wakeup_from_wait(), make the current constraints match by
> removing the code that actually alters registers. By adjusting the
> resume address in __prepare_to_wait(), we can simply re-use the copying
> operation there (rather than doing a second pointless copy in the
> opposite direction after branching to the resume point), which at once
> eliminates the need for re-loading rCX and rDI inside the asm().

First of all, this is code improvement, rather than a bug fix, right? The
asm constraints are correct for the code as it is, I believe.

It also seems the patch splits into two independent parts:

A. Not sure whether the trade-off of the rCX/rDI save/restore versus more
complex asm constraints makes sense.

B. Separately, the adjustment of the restore return address, and avoiding
needing to reload rCX/rDI after label 1, as well as avoiding the copy in
check_wakeup_from_wait(), is very nice.

I'm inclined to take the second part only, and make it clearer in the
changeset comment that it is not a bug fix.

What do you think?

 -- Keir
>>> On 03.08.12 at 12:04, Keir Fraser <keir@xen.org> wrote:
> On 03/08/2012 09:40, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> In __prepare_to_wait(), properly mark early clobbered registers. By
>> doing so, we at once eliminate the need to save/restore rCX and rDI.
>>
>> In check_wakeup_from_wait(), make the current constraints match by
>> removing the code that actually alters registers. By adjusting the
>> resume address in __prepare_to_wait(), we can simply re-use the copying
>> operation there (rather than doing a second pointless copy in the
>> opposite direction after branching to the resume point), which at once
>> eliminates the need for re-loading rCX and rDI inside the asm().
>
> First of all, this is code improvement, rather than a bug fix, right? The
> asm constraints are correct for the code as it is, I believe.

No, the constraints aren't really correct at present (yet this is
not visible as a functional bug in any way) - from a formal
perspective, the early clobber specification is needed on _any_
operand that doesn't retain its value throughout an asm(). Any
future compiler could derive something from this that we don't
intend.

> It also seems the patch splits into two independent parts:
>
> A. Not sure whether the trade-off of the rCX/rDI save/restore versus more
> complex asm constraints makes sense.
>
> B. Separately, the adjustment of the restore return address, and avoiding
> needing to reload rCX/rDI after label 1, as well as avoiding the copy in
> check_wakeup_from_wait(), is very nice.
>
> I'm inclined to take the second part only, and make it clearer in the
> changeset comment that it is not a bug fix.
>
> What do you think?

The patch could be split, yes, but where exactly the split(s)
should be isn't that obvious to me. And as it's fixing the same
kind of issue on both asm()-s, it seemed sensible to keep the
changes together.

Jan
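
[For readers less familiar with the modifier under discussion: a minimal,
hypothetical example -- not code from wait.c, function name made up -- of
the miscompilation the "&" (earlyclobber) marking rules out. Without it,
gcc may place an output in the same register as an input it assumes has
already been consumed.]

/* Hypothetical: %0 is written by the lea while input %2 is still
 * unread by the add.  Without "&", gcc could give "out" and "b" the
 * same register, and the lea would then destroy "b" before its use. */
static inline unsigned long times5_plus(unsigned long a, unsigned long b)
{
    unsigned long out;

    asm ( "lea (%1,%1,4),%0; " /* out = a * 5 -- early write to %0 */
          "add %2,%0"          /* out += b -- %2 read only here */
          : "=&r" (out)
          : "r" (a), "r" (b) );
    return out;
}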
On 03/08/2012 11:34, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 03.08.12 at 12:04, Keir Fraser <keir@xen.org> wrote:
>>> On 03/08/2012 09:40, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>
>>>> In __prepare_to_wait(), properly mark early clobbered registers. By
>>>> doing so, we at once eliminate the need to save/restore rCX and rDI.
>>>>
>>>> In check_wakeup_from_wait(), make the current constraints match by
>>>> removing the code that actually alters registers. By adjusting the
>>>> resume address in __prepare_to_wait(), we can simply re-use the copying
>>>> operation there (rather than doing a second pointless copy in the
>>>> opposite direction after branching to the resume point), which at once
>>>> eliminates the need for re-loading rCX and rDI inside the asm().
>>>
>>> First of all, this is code improvement, rather than a bug fix, right? The
>>> asm constraints are correct for the code as it is, I believe.
>>
>> No, the constraints aren't really correct at present (yet this is
>> not visible as a functional bug in any way) - from a formal
>> perspective, the early clobber specification is needed on _any_
>> operand that doesn't retain its value throughout an asm(). Any
>> future compiler could derive something from this that we don't
>> intend.

I'm confused. The registers have the same values at the start and the end of
the asm statement. How can it possibly matter, even in theory, whether they
temporarily change in the middle? Is this fairly strong assumption written
down in the gcc documentation anywhere?

>>> It also seems the patch splits into two independent parts:
>>>
>>> A. Not sure whether the trade-off of the rCX/rDI save/restore versus more
>>> complex asm constraints makes sense.
>>>
>>> B. Separately, the adjustment of the restore return address, and avoiding
>>> needing to reload rCX/rDI after label 1, as well as avoiding the copy in
>>> check_wakeup_from_wait(), is very nice.
>>>
>>> I'm inclined to take the second part only, and make it clearer in the
>>> changeset comment that it is not a bug fix.
>>>
>>> What do you think?
>>
>> The patch could be split, yes, but where exactly the split(s)
>> should be isn't that obvious to me. And as it's fixing the same
>> kind of issue on both asm()-s, it seemed sensible to keep the
>> changes together.

Yes, that confused me too -- the output constraints on the second asm can
hardly be wrong, or at least matter, since it never returns! Execution state
is completely reloaded within the asm statement.

 -- Keir
>>> On 03.08.12 at 13:00, Keir Fraser <keir.xen@gmail.com> wrote:
> On 03/08/2012 11:34, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>>> On 03.08.12 at 12:04, Keir Fraser <keir@xen.org> wrote:
>>> On 03/08/2012 09:40, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>
>>>> In __prepare_to_wait(), properly mark early clobbered registers. By
>>>> doing so, we at once eliminate the need to save/restore rCX and rDI.
>>>>
>>>> In check_wakeup_from_wait(), make the current constraints match by
>>>> removing the code that actually alters registers. By adjusting the
>>>> resume address in __prepare_to_wait(), we can simply re-use the copying
>>>> operation there (rather than doing a second pointless copy in the
>>>> opposite direction after branching to the resume point), which at once
>>>> eliminates the need for re-loading rCX and rDI inside the asm().
>>>
>>> First of all, this is code improvement, rather than a bug fix, right? The
>>> asm constraints are correct for the code as it is, I believe.
>>
>> No, the constraints aren't really correct at present (yet this is
>> not visible as a functional bug in any way) - from a formal
>> perspective, the early clobber specification is needed on _any_
>> operand that doesn't retain its value throughout an asm(). Any
>> future compiler could derive something from this that we don't
>> intend.
>
> I'm confused. The registers have the same values at the start and the end of
> the asm statement. How can it possibly matter, even in theory, whether they
> temporarily change in the middle? Is this fairly strong assumption written
> down in the gcc documentation anywhere?

It's in the specification of the & modifier:

"‘&’ Means (in a particular alternative) that this operand is an
earlyclobber operand, which is modified before the instruction
is finished using the input operands. Therefore, this operand
may not lie in a register that is used as an input operand or as
part of any memory address."

Of course, here we're not having any other operands, which
is why at least at present getting this wrong does no harm.

>>> It also seems the patch splits into two independent parts:
>>>
>>> A. Not sure whether the trade-off of the rCX/rDI save/restore versus more
>>> complex asm constraints makes sense.
>>>
>>> B. Separately, the adjustment of the restore return address, and avoiding
>>> needing to reload rCX/rDI after label 1, as well as avoiding the copy in
>>> check_wakeup_from_wait(), is very nice.
>>>
>>> I'm inclined to take the second part only, and make it clearer in the
>>> changeset comment that it is not a bug fix.
>>>
>>> What do you think?
>>
>> The patch could be split, yes, but where exactly the split(s)
>> should be isn't that obvious to me. And as it's fixing the same
>> kind of issue on both asm()-s, it seemed sensible to keep the
>> changes together.
>
> Yes, that confused me too -- the output constraints on the second asm can
> hardly be wrong, or at least matter, since it never returns! Execution state
> is completely reloaded within the asm statement.

Formally they're wrong too without that change. And the fact that
the asm() does not "return" is irrelevant here, as the restriction
is because of the potential use of the register inside the asm(),
after it already got modified.

Formally it's also not permitted for the asm() to branch elsewhere,
but that is violated in so many places (Linux not the least) that
they can hardly dare to ever come up with something breaking this.

"Speaking of labels, jumps from one asm to another are not supported.
The compiler's optimizers do not know about these jumps, and therefore
they cannot take account of them when deciding how to optimize."

Plus, this likely is really targeting jumps from one asm to another
_within_ one function, albeit that's not being said explicitly.

Jan
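
[As a self-contained, hypothetical sketch of the constraint idiom the
patch adopts -- not the wait.c code itself, function name made up:
registers that an instruction both consumes and destroys, as "rep movsb"
does with rSI/rDI/rCX, are expressed as earlyclobber dummy outputs whose
initial values arrive through matching digit constraints.]

/* Hypothetical memcpy-alike mirroring the "=&S"/"=&c"/"=&D" plus
 * "1"/"2" pattern from the patch: each register is an output (the
 * result is discarded into "dummy") and is initialized through the
 * matching constraint that refers back to that output's position. */
static inline void copy_bytes(void *dst, const void *src, unsigned long n)
{
    unsigned long dummy;

    asm volatile ( "rep movsb"
                   : "=&S" (dummy), "=&D" (dummy), "=&c" (dummy)
                   : "0" ((unsigned long)src), "1" ((unsigned long)dst),
                     "2" (n)
                   : "memory" );
}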
On 03/08/2012 09:40, "Jan Beulich" <JBeulich@suse.com> wrote:

> In __prepare_to_wait(), properly mark early clobbered registers. By
> doing so, we at once eliminate the need to save/restore rCX and rDI.

Okay, this patch has my blessing as is. But please add a remark that the
existing constraints are falling foul of a strict reading of the gcc
specification, and are actually okay in practice (being very
straightforward, no memory constraints, etc). I really thought you had
found a bug in practice, but this was not the case.

> In check_wakeup_from_wait(), make the current constraints match by
> removing the code that actually alters registers. By adjusting the
> resume address in __prepare_to_wait(), we can simply re-use the copying
> operation there (rather than doing a second pointless copy in the
> opposite direction after branching to the resume point), which at once
> eliminates the need for re-loading rCX and rDI inside the asm().
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Keir Fraser <keir@xen.org>
On 03/08/2012 12:36, "Jan Beulich" <JBeulich@suse.com> wrote:

>> I'm confused. The registers have the same values at the start and the end of
>> the asm statement. How can it possibly matter, even in theory, whether they
>> temporarily change in the middle? Is this fairly strong assumption written
>> down in the gcc documentation anywhere?
>
> It's in the specification of the & modifier:
>
> "‘&’ Means (in a particular alternative) that this operand is an
> earlyclobber operand, which is modified before the instruction
> is finished using the input operands. Therefore, this operand
> may not lie in a register that is used as an input operand or as
> part of any memory address."
>
> Of course, here we're not having any other operands, which
> is why at least at present getting this wrong does no harm.

Yep, okay, that makes sense. Especially the use of an input operand to form
a memory address, although of course that cannot happen in our specific case
here. I have acked your patch, although I'd like an update to the patch
comment.

Thanks,
 -- Keir
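
[A hypothetical example of the memory-address case mentioned above --
again not from wait.c, function name made up: with an "m" input that is
still dereferenced after the output has been written, the "&" keeps gcc
from building the address out of the output's register.]

/* Hypothetical: %1's address is still needed by the add after %0 has
 * been written by the mov.  Without "&", gcc could use the register
 * holding "val" as the base of %1's addressing expression, and the
 * first instruction would corrupt the address for the second. */
static inline unsigned long read_and_double(const unsigned long *p)
{
    unsigned long val;

    asm ( "mov %1,%0; " /* val = *p  -- %0 written here */
          "add %1,%0"   /* val += *p -- %1 dereferenced again */
          : "=&r" (val)
          : "m" (*p) );
    return val;
}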
>>> On 03.08.12 at 14:08, Keir Fraser <keir.xen@gmail.com> wrote:
> On 03/08/2012 12:36, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>> I'm confused. The registers have the same values at the start and the end of
>>> the asm statement. How can it possibly matter, even in theory, whether they
>>> temporarily change in the middle? Is this fairly strong assumption written
>>> down in the gcc documentation anywhere?
>>
>> It's in the specification of the & modifier:
>>
>> "‘&’ Means (in a particular alternative) that this operand is an
>> earlyclobber operand, which is modified before the instruction
>> is finished using the input operands. Therefore, this operand
>> may not lie in a register that is used as an input operand or as
>> part of any memory address."
>>
>> Of course, here we're not having any other operands, which
>> is why at least at present getting this wrong does no harm.
>
> Yep, okay, that makes sense. Especially the use of an input operand to form
> a memory address, although of course that cannot happen in our specific case
> here. I have acked your patch, although I'd like an update to the patch
> comment.

How about this as an added initial paragraph?

This fixes theoretical issues with those constraints - operands that
get clobbered before consuming all input operands must be marked so
according to the gcc documentation. Beyond that, the change is merely
code improvement, not a bug fix.

Jan
On 03/08/2012 13:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Yep, okay, that makes sense. Especially the use of an input operand to form
>> a memory address, although of course that cannot happen in our specific case
>> here. I have acked your patch, although I'd like an update to the patch
>> comment.
>
> How about this as an added initial paragraph?
>
> This fixes theoretical issues with those constraints - operands that
> get clobbered before consuming all input operands must be marked so
> according to the gcc documentation. Beyond that, the change is merely
> code improvement, not a bug fix.

Perfect!

 K.