Mehdi Amini via llvm-dev
2016-May-24 20:02 UTC
[llvm-dev] Liveness of AL, AH and AX in x86 backend
Hi,

Could you use "MIR" to forge the example you're looking for?

--
Mehdi

> On May 24, 2016, at 10:10 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Then let me shift focus from performance to size. With either optsize or minsize, the output is still the same.
>
> As per the subject, I'm not really interested in the quality of the final code, but in the way that the x86 target deals with the structural relationship between these registers. Specifically, I'd like to see if it would generate implicit defs/uses for AX on defs/uses of AH/AL. I looked in the X86 sources and I didn't find code that would make me certain, but I'm not too familiar with that backend. Having a testcase to work with would make it a lot easier for me.
>
> -Krzysztof
>
> On 5/24/2016 12:03 PM, mats petersson wrote:
>> On several variants of x86 processors, mixing `ah`, `al` and `ax` as
>> source/destination in the same dependency chain will have some
>> penalties, so for THOSE processors, there is a benefit to NOT use `al`
>> and `ah` to reflect parts of `ax` - I believe this is caused by the fact
>> that the processor doesn't ACTUALLY see these as parts of a bigger
>> register internally, and will execute two independent dependency chains,
>> UNTIL you start using `ax` as one register. At this point, the processor
>> has to make sure both dependency chains for `al` and `ah` have
>> completed, and that the merged value is available in `ax`. If the
>> code uses `cl` and `al`, this sort of problem is avoided.
>>
>> <<Quote from Intel Optimisation guide, page 3-44
>> intel.co.uk/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
>>
>> A partial register stall happens when an instruction refers to a
>> register, portions of which were previously modified by other
>> instructions. For example, partial register stalls occur with a read
>> to AX while previous instructions stored AL and AH, or a read to EAX
>> while previous instruction modified AX.
>> The delay of a partial register stall is small in processors based on
>> Intel Core and NetBurst microarchitectures, and in Pentium M processor
>> (with CPUID signature family 6, model 13), Intel Core Solo, and Intel
>> Core Duo processors. Pentium M processors (CPUID signature with
>> family 6, model 9) and the P6 family incur a large penalty.
>> <<End quote>>
>>
>> So for compact code, yes, it's probably an advantage. For SOME
>> processors in the x86 range, not so good for performance.
>>
>> Whether LLVM has the information as to WHICH processor models have such
>> penalties (or better yet, can determine the amount of time lost for this
>> sort of operation), I'm not sure. It's obviously something that CAN be
>> programmed into a compiler, it's just a matter of understanding the
>> effort vs. reward factor for this particular type of optimisation,
>> compared to other things that could be done to improve the quality of
>> the code generated.
>>
>> --
>> Mats
>>
>> On 24 May 2016 at 17:09, Smith, Kevin B via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> Try using x86 mode rather than Intel64 mode. I have definitely
>>> gotten it to use both ah and al in 32 bit x86 code generation.
>>> In particular, I have seen that in loops for both the spec2000 and
>>> spec2006 versions of bzip. It can happen, but only rarely.
>>>
>>> Kevin Smith
>>>
>>>> -----Original Message-----
>>>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>>>> Krzysztof Parzyszek via llvm-dev
>>>> Sent: Tuesday, May 24, 2016 8:04 AM
>>>> To: LLVM Dev <llvm-dev at lists.llvm.org>
>>>> Subject: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>>>
>>>> I'm trying to see how the x86 backend deals with the relationship
>>>> between AL, AH and AX, but I can't get it to generate any code that
>>>> would expose an interesting scenario.
>>>>
>>>> For example, I wrote this piece:
>>>>
>>>> typedef struct {
>>>>   char x, y;
>>>> } struct_t;
>>>>
>>>> struct_t z;
>>>>
>>>> struct_t foo(char *p) {
>>>>   struct_t s;
>>>>   s.x = *p++;
>>>>   s.y = *p;
>>>>   z = s;
>>>>   s.x++;
>>>>   return s;
>>>> }
>>>>
>>>> But the output at -O2 is
>>>>
>>>> foo:                                    # @foo
>>>>         .cfi_startproc
>>>> # BB#0:                                 # %entry
>>>>         movb    (%rdi), %al
>>>>         movzbl  1(%rdi), %ecx
>>>>         movb    %al, z(%rip)
>>>>         movb    %cl, z+1(%rip)
>>>>         incb    %al
>>>>         shll    $8, %ecx
>>>>         movzbl  %al, %eax
>>>>         orl     %ecx, %eax
>>>>         retq
>>>>
>>>> I was hoping it would do something along the lines of
>>>>
>>>>         movb    (%rdi), %al
>>>>         movb    1(%rdi), %ah
>>>>         movh    %ax, z(%rip)
>>>>         incb    %al
>>>>         retq
>>>>
>>>> Why is the x86 backend not getting this code? Does it know that
>>>> AH:AL = AX?
>>>>
>>>> -Krzysztof
>>>>
>>>> --
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>> hosted by The Linux Foundation
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
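
For readers following the thread, the structural relationship in question (AL is bits 0-7 of AX, AH is bits 8-15) can be sketched in C. This is only an illustrative model of the register overlap and of the sequence the original post hoped for; `reg_ax_t`, `write_al`, `write_ah`, `z_model` and `foo_model` are hypothetical names, not anything from LLVM or the original test case. The 16-bit store that the post writes as `movh` would be spelled `movw` in AT&T syntax.

```c
#include <stdint.h>

/* Hypothetical model of AX: AL is bits 0-7, AH is bits 8-15. */
typedef struct {
    uint16_t ax;
} reg_ax_t;

/* Writing AL leaves AH untouched, and vice versa. */
void write_al(reg_ax_t *r, uint8_t v) {
    r->ax = (uint16_t)((r->ax & 0xFF00u) | v);
}

void write_ah(reg_ax_t *r, uint8_t v) {
    r->ax = (uint16_t)((r->ax & 0x00FFu) | ((uint16_t)v << 8));
}

uint8_t read_al(const reg_ax_t *r) { return (uint8_t)(r->ax & 0xFFu); }
uint8_t read_ah(const reg_ax_t *r) { return (uint8_t)(r->ax >> 8); }

/* Stands in for the 2-byte global `z` of the original test case. */
uint16_t z_model;

/* Mirrors the hoped-for instruction sequence, one step per instruction. */
uint16_t foo_model(const uint8_t *p) {
    reg_ax_t r = {0};
    write_al(&r, p[0]);                        /* movb (%rdi), %al          */
    write_ah(&r, p[1]);                        /* movb 1(%rdi), %ah         */
    z_model = r.ax;                            /* 16-bit store of %ax to z  */
    write_al(&r, (uint8_t)(read_al(&r) + 1));  /* incb %al: wraps inside AL */
    return r.ax;                               /* retq, result in AX        */
}
```

The point of the model is that the store and the return both read AX as a whole after AL and AH were written separately, which is exactly the def/use pattern the question is about.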
Smith, Kevin B via llvm-dev
2016-May-24 20:25 UTC
[llvm-dev] Liveness of AL, AH and AX in x86 backend
Here's some of the generated code from the current community head for bzip2.c from spec 256.bzip2, with these options:

    clang -m32 -S -O2 bzip2.c

    .LBB14_4:                               # %bsW.exit24
            subl    %eax, %ebx
            addl    $8, %eax
            movl    %ebx, %ecx
            movl    %eax, bsLive
            shll    %cl, %edi
            movl    %ebp, %ecx
            orl     %esi, %edi
            movzbl  %ch, %esi
            cmpl    $8, %eax
            movl    %edi, bsBuff
            jl      .LBB14_6

As you can see, it is using both cl and ch for different values in this basic block. This occurs in the generated code for the routine bsPutUInt32.

Kevin Smith
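
As a reading aid for the two sub-register uses in that block, here is a minimal C sketch of what `shll %cl, %edi` and `movzbl %ch, %esi` compute from the value of ECX. The function names are made up for illustration. Note that in the listing ECX is written in full by each `movl` and then read through CL or CH.

```c
#include <stdint.h>

/* CL is bits 0-7 of ECX; CH is bits 8-15. */
uint8_t low_byte_cl(uint32_t ecx)  { return (uint8_t)(ecx & 0xFFu); }
uint8_t high_byte_ch(uint32_t ecx) { return (uint8_t)((ecx >> 8) & 0xFFu); }

/* shll %cl, %edi: a 32-bit shift uses only the low 5 bits of CL as the count. */
uint32_t shl_by_cl(uint32_t edi, uint32_t ecx) {
    return edi << (low_byte_cl(ecx) & 31u);
}

/* movzbl %ch, %esi: zero-extend CH into a 32-bit register. */
uint32_t movzbl_ch(uint32_t ecx) {
    return (uint32_t)high_byte_ch(ecx);
}
```

For example, with ECX holding 0x1203, `shl_by_cl(1, 0x1203)` shifts by 3 and `movzbl_ch(0x1203)` yields 0x12.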
Krzysztof Parzyszek via llvm-dev
2016-May-24 20:52 UTC
[llvm-dev] Liveness of AL, AH and AX in x86 backend
On 5/24/2016 3:02 PM, Mehdi Amini wrote:
>
> Could you use "MIR" to forge the example you're looking for?

I have thought of that. At the moment I'm experimenting with tracking subregister liveness on Hexagon. It looks like it may be what we need, although it causes some testcase failures. If the subregister liveness is not the way to go, I'll have to try MIR.

-Krzysztof
Krzysztof Parzyszek via llvm-dev
2016-May-24 20:59 UTC
[llvm-dev] Liveness of AL, AH and AX in x86 backend
Thanks Kevin. This isn't exactly what I'm looking for, though. The ECX is explicitly defined here and CL/CH are only used. I was interested in the opposite situation, where the sub-registers are defined separately and then the super-register is used as a whole.

Hopefully the sub-register liveness tracking is what I need, so the questions about x86 may become moot.

-Krzysztof
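
The "opposite situation" described in this message (sub-registers defined separately, super-register then used as a whole) can be illustrated with a toy backward liveness pass that tracks AL and AH as separate lanes of AX. This is a sketch under stated assumptions, not LLVM's implementation or data structures: the lane constants, `insn_t` and `lanes_live_in` are hypothetical, and the pass handles only a single straight-line block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane masks: one bit per 8-bit sub-register of AX. */
enum { LANE_AL = 1, LANE_AH = 2, LANE_AX = LANE_AL | LANE_AH };

/* One instruction, reduced to the AX lanes it reads and writes. */
typedef struct {
    uint8_t uses;  /* lanes read    */
    uint8_t defs;  /* lanes written */
} insn_t;

/* Backward liveness over a straight-line block:
 *   live_before = (live_after & ~defs) | uses
 * live_before[i] receives the lanes live just before instruction i
 * (pass NULL to skip recording). Returns the lanes live on block entry. */
uint8_t lanes_live_in(const insn_t *code, size_t n,
                      uint8_t live_at_exit, uint8_t *live_before) {
    uint8_t live = live_at_exit;
    for (size_t i = n; i-- > 0; ) {
        live = (uint8_t)((live & ~code[i].defs) | code[i].uses);
        if (live_before)
            live_before[i] = live;
    }
    return live;
}
```

Fed the sequence "define AL; define AH; use AX; use and define AL" with AX live at exit, the pass reports that only the AL lane is live between the two defining instructions and that nothing is live on entry. Tracking AX as a single unit cannot express that distinction without extra implicit operands.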