Matthias Braun
2015-Jul-30 00:14 UTC
[LLVMdev] [3.7.0] Two late issues with cross compilation to mips
To reduce memory consumption, clobbered registers are handled with RegisterMask machine operands, which contain a bitset of all registers clobbered.

- Matthias

> On Jul 29, 2015, at 3:00 PM, Daniel Sanders <daniel.sanders at imgtec.com> wrote:
>
> I believe I've identified the problem with almabench but I haven't found the root cause in the compiler yet.
>
> The problem is that a caller saved register ($f14) is being moved across a call and this call sometimes clobbers the value. As a result, the value of the TWOPI constant used in the fmod() calls isn't always 2*PI.
>
> According to -print-after-all, the pass that moves the instruction is Simple Register Coalescing. The bit I'm stuck on at the moment is that I'm not sure what information is supposed to prevent this move from happening. I thought there was supposed to be an ImplicitDefine on the call instruction for each clobbered register but this doesn't seem to be the case. Am I missing something obvious?
> ________________________________________
> From: Daniel Sanders
> Sent: 29 July 2015 11:08
> To: Hans Wennborg (hans at chromium.org)
> Cc: Simon Atanasyan (simon at atanasyan.com); LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
> Subject: [3.7.0] Two late issues with cross compilation to mips
>
> Hi,
>
> Sorry for the late report but I've only just found these issues. Llvm.org isn't working for me at the moment but I'll file tickets once it is.
>
> The issues are:
>
> 1. Almabench has some significant numerical differences and fails the reference check for some configs. I'm investigating this one at the moment but early indications are that it's a similar (but different) problem to the one we had in LLVM 3.6.2.
>
> 2. Read-only exception tables have broken compatibility with the ~2 year old gcc toolchains I was using for release testing cross compilation. This isn't a problem for most test-suite runs since we can just update the assembler but is causing trouble for microMIPS. More recent toolchains lack the microMIPS multilib I was using and migrating to the new one is causing link failures. These failures are related to ELF header bits specifying the SNaN/QNaN encodings to be IEEE754-1985 or IEEE754-2008 compliant. I suspect the -mnan=2008 isn't reaching the assembler.
>
> 3. Clang is incompatible with changes to the mips-mti-linux-gnu sysroot from Imagination's mips-mti-linux-gnu toolchain. Libraries are still multilib'd (albeit with a reduced set) but some of the include paths aren't anymore. It's also no longer correct to include sysroot/include (this path is added by common code) since this skips some function definitions. Instead, we must only include sysroot/usr/include like GCC does. There may be more details but so far the fix doesn't look simple. As far as I can tell, clang's multilib expects includes and libraries to have the same layout (osSuffix() seems to control both). The good news is that it's not a regression since we can use toolchains from before this layout change.
>
> Daniel Sanders
> Leading Software Design Engineer, MIPS Processor IP
> Imagination Technologies Limited
> www.imgtec.com
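The register-mask idea described in the reply above can be illustrated with a small standalone model. This is only a sketch, not LLVM code: the register numbers, `RegMask` alias and `makePreservedMask` helper are hypothetical names made up for the example. It assumes the bit convention that, as far as I know, LLVM's `MachineOperand::clobbersPhysReg` uses, where a set bit marks a register as preserved across the call and a clear bit means the call may clobber it.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical register numbers, for illustration only. Real targets get
// their register numbering from generated tables.
enum Reg : unsigned { NoReg = 0, D0, D6, D7, D10, NumRegs };

// One bit per physical register, packed into 32-bit words.
// Assumed convention: a SET bit means "preserved across the call",
// a CLEAR bit means "the call may clobber this register".
using RegMask = std::vector<uint32_t>;

// Build a mask from the list of registers the callee preserves.
RegMask makePreservedMask(std::initializer_list<unsigned> Preserved) {
  RegMask Mask((NumRegs + 31) / 32, 0);
  for (unsigned R : Preserved) {
    assert(R < NumRegs && "register number out of range");
    Mask[R / 32] |= 1u << (R % 32);
  }
  return Mask;
}

// True if a call carrying this mask may overwrite PhysReg.
bool clobbersPhysReg(const RegMask &Mask, unsigned PhysReg) {
  assert(PhysReg < NumRegs && "register number out of range");
  return !(Mask[PhysReg / 32] & (1u << (PhysReg % 32)));
}
```

With a mask built as `makePreservedMask({D10})`, `clobbersPhysReg(Mask, D7)` is true and `clobbersPhysReg(Mask, D10)` is false. A single mask per call replaces one implicit-def operand per clobbered register, which is where the memory saving comes from.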
Daniel Sanders
2015-Jul-30 10:03 UTC
[LLVMdev] [3.7.0] Two late issues with cross compilation to mips
Thanks. This is making a lot more sense now and it's looking like this issue isn't Mips specific.

Here's the IR dump before simple register coalescing (note: I've patched the IR printer to print the contents of the regmask):

  4480B %vreg260<def> = LDC1 %vreg253, <cp#3>[TF=6]; mem:LD8[ConstantPool] AFGR64:%vreg260 GPR32:%vreg253
  4496B %vreg261<def> = FMUL_D32 %vreg247, %vreg248; AFGR64:%vreg261,%vreg247,%vreg248
  4512B ADJCALLSTACKDOWN 16, %SP<imp-def>, %SP<imp-use>
  4528B %D6<def> = COPY %vreg243; AFGR64:%vreg243
  4544B JAL <ga:@sin>, <regmask %FP %RA %D10 %D11 %D12 %D13 %D14 %D15 %F20 %F21 %F22 %F23 %F24 %F25 %F26 %F27 %F28 %F29 %F30 %F31 %S0 %S1 %S2 %S3 %S4 %S5 %S6 %S7 >, %RA<imp-def,dead>, %D6<imp-use,kill>, %SP<imp-def>, %D0<imp-def>
  4560B ADJCALLSTACKUP 16, 0, %SP<imp-def>, %SP<imp-use>
  4576B %vreg262<def> = COPY %D0<kill>; AFGR64:%vreg262
  4592B %vreg263<def> = FMUL_D32 %vreg256, %vreg262; AFGR64:%vreg263,%vreg256,%vreg262
  4608B ADJCALLSTACKDOWN 16, %SP<imp-def>, %SP<imp-use>
  4624B %vreg264<def> = FADD_D32 %vreg261, %vreg263; AFGR64:%vreg264,%vreg261,%vreg263
  4640B %D6<def> = COPY %vreg255; AFGR64:%vreg255
  4656B JAL <ga:@cos>, <regmask %FP %RA %D10 %D11 %D12 %D13 %D14 %D15 %F20 %F21 %F22 %F23 %F24 %F25 %F26 %F27 %F28 %F29 %F30 %F31 %S0 %S1 %S2 %S3 %S4 %S5 %S6 %S7 >, %RA<imp-def,dead>, %D6<imp-use,kill>, %SP<imp-def>, %D0<imp-def>
  4672B ADJCALLSTACKUP 16, 0, %SP<imp-def>, %SP<imp-use>
  4688B %vreg265<def> = COPY %D0<kill>; AFGR64:%vreg265
  4704B %vreg266<def> = FMUL_D32 %vreg258, %vreg265; AFGR64:%vreg266,%vreg258,%vreg265
  4720B ADJCALLSTACKDOWN 16, %SP<imp-def>, %SP<imp-use>
  4736B %D6<def> = COPY %vreg255; AFGR64:%vreg255
  4752B JAL <ga:@sin>, <regmask %FP %RA %D10 %D11 %D12 %D13 %D14 %D15 %F20 %F21 %F22 %F23 %F24 %F25 %F26 %F27 %F28 %F29 %F30 %F31 %S0 %S1 %S2 %S3 %S4 %S5 %S6 %S7 >, %RA<imp-def,dead>, %D6<imp-use,kill>, %SP<imp-def>, %D0<imp-def>
  4768B ADJCALLSTACKUP 16, 0, %SP<imp-def>, %SP<imp-use>
  4784B %vreg267<def> = COPY %D0<kill>; AFGR64:%vreg267
  4800B %vreg268<def> = FMUL_D32 %vreg257, %vreg267; AFGR64:%vreg268,%vreg257,%vreg267
  4816B ADJCALLSTACKDOWN 16, %SP<imp-def>, %SP<imp-use>
  4832B %vreg269<def> = FMUL_D32 %vreg0, %vreg264; AFGR64:%vreg269,%vreg0,%vreg264
  4848B %vreg270<def> = FADD_D32 %vreg266, %vreg268; AFGR64:%vreg270,%vreg266,%vreg268
  4864B %vreg271<def> = FMUL_D32 %vreg0, %vreg270; AFGR64:%vreg271,%vreg0,%vreg270
  4880B %vreg272<def> = FMUL_D32 %vreg271, %vreg259; AFGR64:%vreg272,%vreg271,%vreg259
  4896B %vreg273<def> = FMUL_D32 %vreg269, %vreg259; AFGR64:%vreg273,%vreg269,%vreg259
  4912B %vreg274<def> = FADD_D32 %vreg24, %vreg273; AFGR64:%vreg274,%vreg24,%vreg273
  4928B %vreg275<def> = FADD_D32 %vreg274, %vreg272; AFGR64:%vreg275,%vreg274,%vreg272
  4944B %D6<def> = COPY %vreg275; AFGR64:%vreg275
  4960B %D7<def> = COPY %vreg260; AFGR64:%vreg260
  4976B JAL <ga:@fmod>, <regmask %FP %RA %D10 %D11 %D12 %D13 %D14 %D15 %F20 %F21 %F22 %F23 %F24 %F25 %F26 %F27 %F28 %F29 %F30 %F31 %S0 %S1 %S2 %S3 %S4 %S5 %S6 %S7 >, %RA<imp-def,dead>, %D6<imp-use,kill>, %D7<imp-use>, %SP<imp-def>, %D0<imp-def>

The %vreg260 at 4480B is being coalesced with the %D7 at 4960B but the call preserved masks in the JAL's (jump and link) at 4544B, 4752B, and 4656B say that D7 isn't preserved. The pass doesn't call getRegMask() in any obvious way so it seems likely that it's not respecting the mask.
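The check that appears to be missing can be sketched in isolation. The code below is a toy model, not code from LLVM's register coalescer or from any patch: `Instr`, `maskClobbers` and `clobberedInRange` are hypothetical names, the slot numbers mirror the "4480B"-style indexes in the dump, and the mask again assumes that a set bit means "preserved". Before a virtual register is joined with a physical register, every instruction inside the virtual register's live range that carries a regmask has to be checked against that physical register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy instruction holding only what the check needs (not LLVM's MachineInstr).
struct Instr {
  unsigned SlotIndex;                   // e.g. 4544 for "4544B" in the dump
  bool HasRegMask;                      // true for calls such as JAL
  std::vector<uint32_t> PreservedMask;  // assumed: set bit = preserved
};

// True if a call carrying Mask may overwrite PhysReg. Registers beyond the
// end of the mask are conservatively treated as clobbered.
bool maskClobbers(const std::vector<uint32_t> &Mask, unsigned PhysReg) {
  std::size_t Word = PhysReg / 32;
  if (Word >= Mask.size())
    return true;
  return !(Mask[Word] & (1u << (PhysReg % 32)));
}

// True if any instruction strictly between DefSlot and UseSlot carries a
// regmask that clobbers PhysReg. If so, a value defined at DefSlot cannot
// safely live in PhysReg until UseSlot.
bool clobberedInRange(const std::vector<Instr> &Code, unsigned DefSlot,
                      unsigned UseSlot, unsigned PhysReg) {
  for (const Instr &I : Code) {
    if (I.SlotIndex <= DefSlot || I.SlotIndex >= UseSlot)
      continue;  // outside the live range
    if (I.HasRegMask && maskClobbers(I.PreservedMask, PhysReg))
      return true;
  }
  return false;
}
```

Applied to the dump above, with a made-up number for %D7 and the three JALs at 4544, 4656 and 4752 carrying masks that do not preserve it, `clobberedInRange(Code, 4480, 4960, D7)` returns true, so joining %vreg260 with %D7 would have to be rejected.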
Daniel Sanders
2015-Jul-30 16:05 UTC
[LLVMdev] [3.7.0] Two late issues with cross compilation to mips
Thanks again. I've got to the bottom of this and submitted a patch at http://reviews.llvm.org/D11649. It seems the register coalescer doesn't look at regmask operands.