Krzysztof Parzyszek via llvm-dev
2016-May-24 15:03 UTC
[llvm-dev] Liveness of AL, AH and AX in x86 backend
I'm trying to see how the x86 backend deals with the relationship between AL, AH and AX, but I can't get it to generate any code that would expose an interesting scenario.

For example, I wrote this piece:

typedef struct {
  char x, y;
} struct_t;

struct_t z;

struct_t foo(char *p) {
  struct_t s;
  s.x = *p++;
  s.y = *p;
  z = s;
  s.x++;
  return s;
}

But the output at -O2 is

foo:                                    # @foo
        .cfi_startproc
# BB#0:                                 # %entry
        movb    (%rdi), %al
        movzbl  1(%rdi), %ecx
        movb    %al, z(%rip)
        movb    %cl, z+1(%rip)
        incb    %al
        shll    $8, %ecx
        movzbl  %al, %eax
        orl     %ecx, %eax
        retq

I was hoping it would do something along the lines of

        movb    (%rdi), %al
        movb    1(%rdi), %ah
        movw    %ax, z(%rip)
        incb    %al
        retq

Why is the x86 backend not getting this code? Does it know that AH:AL = AX?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
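For reference, the -O2 sequence builds the returned struct in EAX as (y << 8) | (x + 1): `shll $8, %ecx` moves y into bits 8..15, `movzbl %al, %eax` zero-extends x+1, and `orl` merges them, so on return the low byte holds x+1 and the next byte holds y, which is exactly the AL/AH split of AX. A minimal C sketch of that computation follows, assuming a little-endian target where struct_t is two bytes with no padding; `pack_return_value` and `struct_as_u16` are hypothetical helper names, not anything from the backend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the struct in the example above. */
typedef struct {
  char x, y;
} struct_t;

/* Mirrors the tail of the -O2 output: y goes into bits 8..15
   (shll $8, %ecx), x+1 is zero-extended (movzbl %al, %eax), and the
   two are OR-ed together (orl %ecx, %eax). */
static uint16_t pack_return_value(unsigned char x_plus_1, unsigned char y)
{
  return (uint16_t)(((unsigned)y << 8) | x_plus_1);
}

/* Reinterprets the two bytes of a struct_t as one 16-bit value. On a
   little-endian target such as x86, x lands in the low byte (AL) and
   y in the high byte (AH). */
static uint16_t struct_as_u16(struct_t s)
{
  uint16_t v;
  assert(sizeof s == sizeof v); /* assumes struct_t has no padding */
  memcpy(&v, &s, sizeof v);
  return v;
}
```

On a big-endian target the byte order seen by struct_as_u16 would be reversed, which is why the sketch is tied to x86.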
Smith, Kevin B via llvm-dev
2016-May-24 16:09 UTC
Try using 32-bit x86 mode rather than Intel64 (64-bit) mode. I have definitely gotten it to use both AH and AL in 32-bit x86 code generation, in particular in loops in both the SPEC2000 and SPEC2006 versions of bzip2. It can happen, but only rarely.

Kevin Smith
Krzysztof Parzyszek via llvm-dev
2016-May-24 17:03 UTC
I changed the triple to i386 and the CPU to pentium, but I still didn't get the good code.

foo:                                    # @foo
        .cfi_startproc
# BB#0:                                 # %entry
        movl    4(%esp), %eax
        movb    (%eax), %dl
        movzbl  1(%eax), %ecx
        movb    %dl, z
        movb    %cl, z+1
        incb    %dl
        shll    $8, %ecx
        movzbl  %dl, %eax
        orl     %ecx, %eax
        retl

-Krzysztof
mats petersson via llvm-dev
2016-May-24 17:03 UTC
On several variants of x86 processors, mixing `ah`, `al` and `ax` as source/destination in the same dependency chain incurs a penalty, so for THOSE processors there is a benefit to NOT using `al` and `ah` as the two halves of `ax`. I believe this is because the processor doesn't ACTUALLY treat these as parts of a bigger register internally: it executes two independent dependency chains UNTIL you start using `ax` as one register. At that point, the processor has to make sure both dependency chains, for `al` and for `ah`, have completed, and that the merged value is available in `ax`. If the code uses `cl` and `al` instead, this problem is avoided.

<<Quote from Intel Optimisation guide, page 3-44
http://www.intel.co.uk/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

A partial register stall happens when an instruction refers to a register, portions of which were previously modified by other instructions. For example, partial register stalls occurs with a read to AX while previous instructions stored AL and AH, or a read to EAX while previous instruction modified AX.

The delay of a partial register stall is small in processors based on Intel Core and NetBurst microarchitectures, and in Pentium M processor (with CPUID signature family 6, model 13), Intel Core Solo, and Intel Core Duo processors. Pentium M processors (CPUID signature with family 6, model 9) and the P6 family incur a large penalty.
<<End quote>>

So for compact code, yes, it's probably an advantage. For SOME processors in the x86 range, it's not so good for performance. Whether LLVM has the information about WHICH processor models have such penalties (or, better yet, can estimate the time lost to this sort of operation), I'm not sure. It's obviously something that CAN be programmed into a compiler; it's a matter of weighing the effort vs. reward of this particular optimisation against other things that could be done to improve the quality of the generated code.

--
Mats
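The rule quoted from the Intel guide can be restated as: a read stalls if any byte it covers was last written by an instruction whose destination did not cover the whole register being read. Below is a toy C model of just that rule, assuming only the low four bytes (EAX, AX, AH, AL) matter; it ignores timing, register renaming and everything else real hardware does, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* A register is modelled as a byte range inside EAX. */
typedef struct {
  unsigned first; /* first byte covered (AH starts at byte 1) */
  unsigned width; /* number of bytes covered */
} reg_t;

static const reg_t AL  = {0, 1};
static const reg_t AH  = {1, 1};
static const reg_t AX  = {0, 2};
static const reg_t EAX = {0, 4};

typedef struct {
  reg_t last_writer[4]; /* destination of the write that last defined each byte */
} reg_state_t;

/* Start from a state where EAX was last written as a whole. */
static void reset(reg_state_t *st)
{
  for (unsigned b = 0; b < 4; ++b)
    st->last_writer[b] = EAX;
}

static void write_reg(reg_state_t *st, reg_t r)
{
  assert(r.first + r.width <= 4);
  for (unsigned b = r.first; b < r.first + r.width; ++b)
    st->last_writer[b] = r;
}

/* Does register `outer` contain all bytes of register `inner`? */
static bool covers(reg_t outer, reg_t inner)
{
  return outer.first <= inner.first &&
         inner.first + inner.width <= outer.first + outer.width;
}

/* True if reading r needs pieces written by narrower writes to be merged. */
static bool read_would_stall(const reg_state_t *st, reg_t r)
{
  for (unsigned b = r.first; b < r.first + r.width; ++b)
    if (!covers(st->last_writer[b], r))
      return true;
  return false;
}
```

Writing AL and AH and then reading AX reports a stall, as does writing AX and then reading EAX; reading AL after a full AX or EAX write does not.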
Quentin Colombet via llvm-dev
2016-May-24 17:40 UTC
Hi Krzysztof,

> On May 24, 2016, at 8:03 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> Why is the x86 backend not getting this code?

Try enabling the sub-register liveness feature. I am guessing we think we cannot use the same register for the low and high parts. Though, I would need to see the machine instructions to be sure.

> Does it know that AH:AL = AX?

Yes, it does.

Cheers,
-Quentin
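One way to picture what sub-register liveness buys is to track liveness per lane (here, one lane each for AL and AH) instead of per whole register. The toy C model below assumes that picture; the LANE_* constants and `interferes` are hypothetical and only loosely analogous to the lane masks LLVM uses, not its actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lane masks: one bit per independently usable 8-bit half
   of a 16-bit register. */
enum {
  LANE_LO = 1u << 0,          /* AL */
  LANE_HI = 1u << 1,          /* AH */
  LANE_ALL = LANE_LO | LANE_HI /* AX */
};

/* Do two simultaneously live values conflict if both are placed in the
   same 16-bit register (AX)? */
static bool interferes(uint32_t lanes_a, uint32_t lanes_b,
                       bool track_subreg_liveness)
{
  assert(lanes_a != 0 && lanes_b != 0);
  if (!track_subreg_liveness)
    return true; /* the whole register counts as live for each value */
  return (lanes_a & lanes_b) != 0; /* conflict only on a shared lane */
}
```

With tracking off, a value in AL and another in AH look like a conflict; with it on, they can share AX because their lanes are disjoint.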
Krzysztof Parzyszek via llvm-dev
2016-May-24 18:01 UTC
Enabling subreg liveness tracking didn't do anything. By altering the allocation order I managed to get the backend to use CL/CH for the struct, but the stores were still separate (even though storing CX would be correct)...

Here's another question that falls into the same category: the function X86InstrInfo::loadRegFromStackSlot does not append any implicit uses/defs. How does it know that it won't need them? If AX was spilled in the middle of a live range of EAX, wouldn't the restore of AX need to implicitly define EAX?

We deal with such cases a lot in the Hexagon backend and it continues to be a major pain. I'm trying to understand if there are better options for us.

-Krzysztof
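The concern about the AX reload comes from x86 semantics: a 16-bit write such as `movw slot, %ax` changes only bits 0..15 and leaves bits 16..31 of EAX untouched, so the value of EAX after the reload depends on both the reloaded AX and the EAX that was live before it. A C model of that data flow follows; it assumes nothing about how X86InstrInfo::loadRegFromStackSlot is actually written, and `reload_ax_into_eax` and `spill_clobber_reload` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of a 16-bit reload (movw slot, %ax) on EAX: the low half is
   replaced, the upper half is preserved. The result therefore reads the
   old EAX as well as writing the new one. */
static uint32_t reload_ax_into_eax(uint32_t eax_before, uint16_t ax_from_slot)
{
  return (eax_before & 0xFFFF0000u) | ax_from_slot;
}

/* Spill AX, let AX be reused, then reload it. EAX ends up with its
   original value only because the upper half survived the 16-bit writes. */
static uint32_t spill_clobber_reload(uint32_t eax)
{
  const uint32_t original = eax;
  uint16_t slot = (uint16_t)eax;          /* spill:  movw %ax, slot */
  eax = (eax & 0xFFFF0000u) | 0xBEEFu;    /* AX reused for something else */
  eax = reload_ax_into_eax(eax, slot);    /* reload: movw slot, %ax */
  assert(eax == original);
  return eax;
}
```

If the reload were modelled as defining only AX, nothing would record that the full EAX value flowing out of it also depends on the EAX value flowing in.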