I don't think that's going to work.

On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote:
> Thank you, I'm trying this now.
>
> On 19/07/2013 5:23 PM, Craig Topper wrote:
>
> Try adding ECX to the Defs of this part of
> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a
> Windows machine to test myself.
>
> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
>   def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
>                       "# win32 fptoui",
>                       [(X86WinFTOL RFP32:$src)]>,
>                     Requires<[In32BitMode]>;
>
>   def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
>                       "# win32 fptoui",
>                       [(X86WinFTOL RFP64:$src)]>,
>                     Requires<[In32BitMode]>;
> }
>
> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com> wrote:
>
>> Oh, excellent point, I agree. My bad. Now that I'm not assuming those
>> are the sqrt, I see the sqrtpd's in the output. Also there are three
>> fptoui's and there are 3 call instances.
>>
>> (Changing subject line again.)
>>
>> Now it looks like it's bug #13862
>>
>> On 19/07/2013 4:51 PM, Craig Topper wrote:
>>
>> I think those calls correspond to this
>>
>> %110 = fptoui double %109 to i32
>>
>> The calls are followed by an imul with 12 which matches up with what
>> occurs right after the fptoui in the IR.
>>
>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com> wrote:
>>
>>> Yes, that is the result of module-dump.ll
>>>
>>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>>
>>> Does this correspond to one of the .ll files you sent earlier?
>>>
>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> (Changing subject line as diagnosis has changed)
>>>>
>>>> I'm attaching the compiled code that I've been getting, both with
>>>> CodeGenOpt::Default and CodeGenOpt::None.
>>>> The crash isn't occurring with CodeGenOpt::None, but that seems to be
>>>> because ECX isn't being used - it still gets set to 0x7fffffff by one
>>>> of the calls to 76719BA1
>>>>
>>>> I notice that X86::SQRTPD[m|r] appear in
>>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be
>>>> removing it, but I don't get the sqrtpd instruction even with the
>>>> createJIT optimization level turned off.
>>>>
>>>> I am trying this with the Release 3.3 code - I'll try it with trunk and
>>>> see if I get a different result there. Maybe there was a recent commit
>>>> for this.
>>>>
>>>> --
>>>> Peter N
>>>>
>>>> On 19/07/2013 4:00 PM, Craig Topper wrote:
>>>>
>>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE
>>>> and I end up with SSE instructions (including sqrtpd) if I don't
>>>> disable it.
>>>>
>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
>>>>
>>>>> Is there something specifically required to enable SSE? If it's not
>>>>> detected as available (based on the target triple?) then I don't
>>>>> think we enable it specifically.
>>>>>
>>>>> Also it seems that it should handle converting to/from the vector
>>>>> types, although I can see it getting confused about needing to do
>>>>> that if it thinks SSE isn't available at all.
>>>>>
>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>>
>>>>> Hmm, maybe SSE isn't being enabled so it's falling back to emulating
>>>>> sqrt?
>>>>>
>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>
>>>>>> In the disassembly, I'm seeing three cases of
>>>>>> call 76719BA1
>>>>>>
>>>>>> I am assuming this is the sqrt function as this is the only function
>>>>>> called in the LLVM IR.
>>>>>>
>>>>>> The code at 76719BA1 is:
>>>>>>
>>>>>> 76719BA1  push  ebp
>>>>>> 76719BA2  mov   ebp,esp
>>>>>> 76719BA4  sub   esp,20h
>>>>>> 76719BA7  and   esp,0FFFFFFF0h
>>>>>> 76719BAA  fld   st(0)
>>>>>> 76719BAC  fst   dword ptr [esp+18h]
>>>>>> 76719BB0  fistp qword ptr [esp+10h]
>>>>>> 76719BB4  fild  qword ptr [esp+10h]
>>>>>> 76719BB8  mov   edx,dword ptr [esp+18h]
>>>>>> 76719BBC  mov   eax,dword ptr [esp+10h]
>>>>>> 76719BC0  test  eax,eax
>>>>>> 76719BC2  je    76719DCF
>>>>>> 76719BC8  fsubp st(1),st
>>>>>> 76719BCA  test  edx,edx
>>>>>> 76719BCC  js    7671F9DB
>>>>>> 76719BD2  fstp  dword ptr [esp]
>>>>>> 76719BD5  mov   ecx,dword ptr [esp]
>>>>>> 76719BD8  add   ecx,7FFFFFFFh
>>>>>> 76719BDE  sbb   eax,0
>>>>>> 76719BE1  mov   edx,dword ptr [esp+14h]
>>>>>> 76719BE5  sbb   edx,0
>>>>>> 76719BE8  leave
>>>>>> 76719BE9  ret
>>>>>>
>>>>>> As you can see at 76719BD5, it modifies ECX.
>>>>>>
>>>>>> I don't know that this is the sqrtpd function (for example, I'm not
>>>>>> seeing any SSE instructions here?) but whatever it is, it's being
>>>>>> called from the IR I attached earlier, and is modifying ECX under some
>>>>>> circumstances.
>>>>>>
>>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>>>>>
>>>>>> That should map directly to sqrtpd which can't modify ecx.
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>
>>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>>>>>
>>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>>>>>
>>>>>>> What is "frep.x86.sse2.sqrt.pd"? I'm only familiar with things
>>>>>>> prefixed with "llvm.x86".
>>>>>>>
>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>
>>>>>>>> After stepping through the produced assembly, I believe I have a
>>>>>>>> culprit.
>>>>>>>>
>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value
>>>>>>>> of ECX - while the produced code is expecting it to still contain its
>>>>>>>> previous value.
>>>>>>>>
>>>>>>>> Peter N
>>>>>>>>
>>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>>>>>
>>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>>
>>>>>>>> This is before any optimization passes are applied. There are two
>>>>>>>> separate modules in existence at the time, and there are no guarantees
>>>>>>>> about the order the surrounding code calls those functions, so there
>>>>>>>> may be some interaction between them? There shouldn't be, they don't
>>>>>>>> refer to any common memory etc. There is no multi-threading occurring.
>>>>>>>>
>>>>>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>>>>>> called with
>>>>>>>> - func_params 0x0018f3b0 double [3]
>>>>>>>>     [0x0] -11.339976634695301 double
>>>>>>>>     [0x1] -9.7504239056205506 double
>>>>>>>>     [0x2] -5.2900856817382804 double
>>>>>>>> at the time of the exception.
>>>>>>>>
>>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the
>>>>>>>> non-intrinsic functions referred to in these modules are the standard
>>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib
>>>>>>>> double asin( double ) ).
>>>>>>>>
>>>>>>>> Hopefully this is reproducible for you.
>>>>>>>>
>>>>>>>> --
>>>>>>>> PeterN
>>>>>>>>
>>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>>
>>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>>
>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>>
>>>>>>>>> Also further testing found me getting the same behavior with other
>>>>>>>>> SIMD instructions. The common factor is in each case, ECX is set to
>>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset .
>>>>>>>>>
>>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of
>>>>>>>>> the optimization passes.
>>>>>>>>>
>>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>>
>>>>>>>>> Peter N
>>>>>>>>>
>>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>>
>>>>>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>>>>>> issue:
>>>>>>>>>>
>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>>
>>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>>
>>>>>>>>>> Solomon
>>>>>>>>>>
>>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>
>>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code
>>>>>>>>>>> is attempting to access unaligned memory with an SSE2 instruction.
>>>>>>>>>>> However this only happens under certain conditions that seem (but
>>>>>>>>>>> may not be) related to the stack's state on calling the function.
>>>>>>>>>>>
>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>>> generate a JIT generated function. This function is primarily
>>>>>>>>>>> mathematical, so we use the Vector types to take advantage of SIMD
>>>>>>>>>>> instructions (as well as a few SSE2 intrinsics).
>>>>>>>>>>>
>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to
>>>>>>>>>>> the LLVM Function/Module. It crashes with what is reported as a
>>>>>>>>>>> memory access error (accessing 0xffffffff), however it's suggested
>>>>>>>>>>> that this is how the SSE fault raising mechanism appears.
>>>>>>>>>>>
>>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>>> memory access.
>>>>>>>>>>> ECX is always set to 0x7fffffff - however I don't know if this is
>>>>>>>>>>> part of the SSE error reporting process or is part of the situation
>>>>>>>>>>> causing the error.
>>>>>>>>>>>
>>>>>>>>>>> I haven't worked out exactly what code path etc is causing this
>>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we
>>>>>>>>>>> haven't tried 3.0 or 3.1). I currently suspect the use of
>>>>>>>>>>> GlobalVariable (we first discovered the crash when using a feature
>>>>>>>>>>> that uses them), however I have attempted using setAlignment on the
>>>>>>>>>>> GlobalVariables without any change.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Peter N
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
~Craig
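As a rough aid to reading the disassembly quoted in this thread, the last few instructions of the routine at 76719BA1 (`fstp dword ptr [esp]`, `mov ecx,dword ptr [esp]`, `add ecx,7FFFFFFFh`, `sbb eax,0`) can be modeled in C++. This is only a sketch under stated assumptions: `TailResult` and `model_tail` are hypothetical names, `diff` stands for the value left on the x87 stack by `fsubp` (input minus rounded input), and nothing outside those four instructions is modeled.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "model assumes a 32-bit IEEE-754 float");

// Hypothetical model of 76719BD2..76719BDE. Not taken from any real source.
struct TailResult {
    std::uint32_t ecx;  // ECX after "add ecx,7FFFFFFFh"
    std::uint32_t eax;  // EAX after "sbb eax,0"
    bool carry;         // carry flag produced by the add
};

TailResult model_tail(float diff, std::uint32_t eax) {
    // fstp dword ptr [esp] / mov ecx,dword ptr [esp]:
    // ECX receives the raw bit pattern of the 32-bit float.
    std::uint32_t bits;
    std::memcpy(&bits, &diff, sizeof bits);

    // add ecx,7FFFFFFFh: carry is set only when bits > 0x80000000,
    // i.e. when diff is negative and non-zero.
    std::uint64_t wide = static_cast<std::uint64_t>(bits) + 0x7FFFFFFFu;
    bool carry = (wide >> 32) != 0;

    // sbb eax,0: subtract the carry from the low word of the result.
    return {static_cast<std::uint32_t>(wide), eax - (carry ? 1u : 0u), carry};
}
```

Under this model, `model_tail(0.0f, eax)` leaves ECX at 0x7FFFFFFF, which would be consistent with the value reported in the thread, although the thread itself does not confirm that this is the path being taken. Either way, ECX is overwritten whenever this tail executes.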
That does appear to have worked. All my tests are passing now.

I'll hand this out to our other devs & testers and make sure it's working for them as well (not just on my machine).

Thank you, again.

--
Peter N

On 19/07/2013 5:45 PM, Craig Topper wrote:
> I don't think that's going to work.
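Why adding ECX to `Defs` helps can be illustrated with a toy model of a clobber list. This is a sketch, not LLVM's register allocator: all names are hypothetical, and it assumes the list read `[EAX, EDX, EFLAGS]` before the change, since the thread only shows the list after ECX was added.

```cpp
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Toy rule: a value may stay in `reg` across an instruction only if the
// instruction does not declare `reg` as defined (clobbered).
bool may_keep_live_across(const RegSet& declared_defs, const std::string& reg) {
    return declared_defs.count(reg) == 0;
}

// A declaration is unsound if the code really writes a register that is
// missing from the declared list.
bool declaration_is_unsound(const RegSet& declared_defs,
                            const RegSet& actually_written) {
    for (const std::string& reg : actually_written) {
        if (declared_defs.count(reg) == 0) return true;
    }
    return false;
}

// General-purpose registers and flags written by the routine at 76719BA1,
// read off the disassembly quoted in the thread.
const RegSet kWrittenByRoutine = {"EAX", "EDX", "ECX", "EFLAGS"};

// ASSUMPTION: the Defs list before the suggested change.
const RegSet kDefsBefore = {"EAX", "EDX", "EFLAGS"};

// The Defs list from the suggested change.
const RegSet kDefsAfter = {"EAX", "EDX", "ECX", "EFLAGS"};
```

With `kDefsBefore`, the toy rule allows a value to stay in ECX across the pseudo-instruction even though the routine overwrites it. With `kDefsAfter`, it does not, which matches the report that the tests pass once ECX is listed.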
Here's my attempt at a fix. Adding Jakob to make sure I did this right. On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman <peter at uformia.com> wrote:> That does appear to have worked. All my tests are passing now. > > I'll hand this out to our other devs & testers and make sure it's working > for them as well (not just on my machine). > > Thank you, again. > > -- > Peter N > > > On 19/07/2013 5:45 PM, Craig Topper wrote: > > I don't think that's going to work. > > > On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote: > >> Thank you, I'm trying this now. >> >> >> On 19/07/2013 5:23 PM, Craig Topper wrote: >> >> Try adding ECX to the Defs of this part of >> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a >> Windows machine to test myself. >> >> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { >> def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), >> "# win32 fptoui", >> [(X86WinFTOL RFP32:$src)]>, >> Requires<[In32BitMode]>; >> >> def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), >> "# win32 fptoui", >> [(X86WinFTOL RFP64:$src)]>, >> Requires<[In32BitMode]>; >> } >> >> >> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com> wrote: >> >>> Oh, excellent point, I agree. My bad. Now that I'm not assuming those >>> are the sqrt, I see the sqrtpd's in the output. Also there are three >>> fptoui's and there are 3 call instances. >>> >>> (Changing subject line again.) >>> >>> Now it looks like it's bug #13862 >>> >>> On 19/07/2013 4:51 PM, Craig Topper wrote: >>> >>> I think those calls correspond to this >>> >>> %110 = fptoui double %109 to i32 >>> >>> The calls are followed by an imul with 12 which matches up with what >>> occurs right after the fptoui in the IR. 
>>> >>> >>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com>wrote: >>> >>>> Yes, that is the result of module-dump.ll >>>> >>>> >>>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>>> >>>> Does this correspond to one of the .ll files you sent earlier? >>>> >>>> >>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com>wrote: >>>> >>>>> (Changing subject line as diagnosis has changed) >>>>> >>>>> I'm attaching the compiled code that I've been getting, both with >>>>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with >>>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it >>>>> still gets set to 0x7fffffff by one of the calls to 76719BA1 >>>>> >>>>> I notice that X86::SQRTPD[m|r] appear in >>>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be >>>>> removing it, but I don't get the sqrtpd instruction even if the createJIT >>>>> optimization level turned off. >>>>> >>>>> I am trying this with the Release 3.3 code - I'll try it with trunk >>>>> and see if I get a different result there. Maybe there was a recent commit >>>>> for this. >>>>> >>>>> -- >>>>> Peter N >>>>> >>>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>>> >>>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE >>>>> and I end up with SSE instructions(including sqrtpd) if I don't disable it. >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com>wrote: >>>>> >>>>>> Is there something specifically required to enable SSE? If it's not >>>>>> detected as available (based from the target triple?) then I don't think we >>>>>> enable it specifically. >>>>>> >>>>>> Also it seems that it should handle converting to/from the vector >>>>>> types, although I can see it getting confused about needing to do that if >>>>>> it thinks SSE isn't available at all. 
>>>>>> >>>>>> >>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>>> >>>>>> Hmm, maybe sse isn't being enabled so its falling back to emulating >>>>>> sqrt? >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com>wrote: >>>>>> >>>>>>> In the disassembly, I'm seeing three cases of >>>>>>> call 76719BA1 >>>>>>> >>>>>>> I am assuming this is the sqrt function as this is the only function >>>>>>> called in the LLVM IR. >>>>>>> >>>>>>> The code at 76719BA1 is: >>>>>>> >>>>>>> 76719BA1 push ebp >>>>>>> 76719BA2 mov ebp,esp >>>>>>> 76719BA4 sub esp,20h >>>>>>> 76719BA7 and esp,0FFFFFFF0h >>>>>>> 76719BAA fld st(0) >>>>>>> 76719BAC fst dword ptr [esp+18h] >>>>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>>>> 76719BB4 fild qword ptr [esp+10h] >>>>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>>>> 76719BC0 test eax,eax >>>>>>> 76719BC2 je 76719DCF >>>>>>> 76719BC8 fsubp st(1),st >>>>>>> 76719BCA test edx,edx >>>>>>> 76719BCC js 7671F9DB >>>>>>> 76719BD2 fstp dword ptr [esp] >>>>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>>>> 76719BD8 add ecx,7FFFFFFFh >>>>>>> 76719BDE sbb eax,0 >>>>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>>>> 76719BE5 sbb edx,0 >>>>>>> 76719BE8 leave >>>>>>> 76719BE9 ret >>>>>>> >>>>>>> >>>>>>> As you can see at 76719BD5, it modifies ECX . >>>>>>> >>>>>>> I don't know that this is the sqrtpd function (for example, I'm not >>>>>>> seeing any SSE instructions here?) but whatever it is, it's being called >>>>>>> from the IR I attached earlier, and is modifying ECX under some >>>>>>> circumstances. >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>>>> >>>>>>> That should map directly to sqrtpd which can't modify ecx. 
>>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com>wrote: >>>>>>> >>>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>>>>>>> >>>>>>>> >>>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>>>> >>>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things >>>>>>>> prefixed with "llvm.x86". >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com>wrote: >>>>>>>> >>>>>>>>> After stepping through the produced assembly, I believe I have a >>>>>>>>> culprit. >>>>>>>>> >>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value >>>>>>>>> of ECX - while the produced code is expecting it to still contain its >>>>>>>>> previous value. >>>>>>>>> >>>>>>>>> Peter N >>>>>>>>> >>>>>>>>> >>>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>>>>>> >>>>>>>>> I've attached the module->dump() that our code is producing. >>>>>>>>> Unfortunately this is the smallest test case I have available. >>>>>>>>> >>>>>>>>> This is before any optimization passes are applied. There are two >>>>>>>>> separate modules in existence at the time, and there are no guarantees >>>>>>>>> about the order the surrounding code calls those functions, so there may be >>>>>>>>> some interaction between them? There shouldn't be, they don't refer to any >>>>>>>>> common memory etc. There is no multi-threading occurring. >>>>>>>>> >>>>>>>>> The function in module-dump.ll (called crashfunc in this file) is >>>>>>>>> called with >>>>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>>>> [0x0] -11.339976634695301 double >>>>>>>>> [0x1] -9.7504239056205506 double >>>>>>>>> [0x2] -5.2900856817382804 double >>>>>>>>> at the time of the exception. >>>>>>>>> >>>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the >>>>>>>>> non-intrinsic functions referred to in these modules are the standard >>>>>>>>> equivalents from the MSVC library (e.g. 
>>>>>>>>> @asin is the standard C lib
>>>>>>>>> double asin( double ) ).
>>>>>>>>>
>>>>>>>>> Hopefully this is reproducible for you.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> PeterN
>>>>>>>>>
>>>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>>>
>>>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>>>
>>>>>>>>>> Also, further testing found me getting the same behavior with
>>>>>>>>>> other SIMD instructions. The common factor is that in each case, ECX is set to
>>>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset.
>>>>>>>>>>
>>>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>>>>>>> optimization passes.
>>>>>>>>>>
>>>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>>>
>>>>>>>>>> Peter N
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>>>
>>>>>>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>>>>>>> issue:
>>>>>>>>>>>
>>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>>>
>>>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>>>
>>>>>>>>>>> Solomon
>>>>>>>>>>>
>>>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>>>>>>> attempting to access unaligned memory with an SSE2 instruction.
>>>>>>>>>>>> However, this only happens under certain conditions that seem (but may not
>>>>>>>>>>>> be) related to the stack's state on calling the function.
>>>>>>>>>>>>
>>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>>>> generate a JIT-compiled function. This function is primarily mathematical,
>>>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well
>>>>>>>>>>>> as a few SSE2 intrinsics).
>>>>>>>>>>>>
>>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the
>>>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access
>>>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the
>>>>>>>>>>>> SSE fault raising mechanism appears.
>>>>>>>>>>>>
>>>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>>>> memory access.
>>>>>>>>>>>> ECX is always set to 0x7fffffff - however I don't know if this
>>>>>>>>>>>> is part of the SSE error reporting process or is part of the situation
>>>>>>>>>>>> causing the error.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't worked out exactly what code path etc. is causing this
>>>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Peter N
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
~Craig

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ftol.patch
Type: application/octet-stream
Size: 1262 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/28b21fa0/attachment.obj>
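The thread reports ECX holding 0x7fffffff when an instruction of the form movapd xmm, [ecx+offset] faults. As a rough illustration only (not code from any of the posters), the C++ sketch below checks the 16-byte alignment that movapd requires of its memory operand. The helper names isAligned16 and effectiveAddress are hypothetical, and the sketch assumes the displacement is itself a multiple of 16, as it would be for a 16-byte-aligned vector slot. Under that assumption, a base of 0x7fffffff can never produce an aligned address, which is consistent with (though not proof of) the clobbered register being the cause of the crash.

```cpp
#include <cassert>
#include <cstdint>

// movapd requires its memory operand to be 16-byte aligned:
// the low four bits of the effective address must be zero.
constexpr bool isAligned16(std::uint32_t addr) {
    return (addr & 0xFu) == 0;
}

// Hypothetical helper: effective address of [ecx + disp] in 32-bit mode.
// Unsigned arithmetic wraps modulo 2^32, like the hardware address computation.
constexpr std::uint32_t effectiveAddress(std::uint32_t ecx, std::uint32_t disp) {
    return ecx + disp;
}

// ECX value observed in the thread after the call at 76719BA1.
constexpr std::uint32_t kClobberedEcx = 0x7FFFFFFFu;

// Assumption: displacements are multiples of 16 (aligned vector slots).
static_assert(!isAligned16(effectiveAddress(kClobberedEcx, 0x00u)),
              "0x7fffffff itself is not 16-byte aligned");
static_assert(!isAligned16(effectiveAddress(kClobberedEcx, 0x10u)),
              "adding an aligned displacement keeps the low nibble at 0xF");

// A base that really is 16-byte aligned passes the same check.
static_assert(isAligned16(effectiveAddress(0x0018F3B0u, 0x20u)),
              "aligned base plus aligned displacement stays aligned");
```

The static_assert lines are evaluated at compile time, so the file only compiles if the claims hold. The aligned base 0x0018f3b0 is borrowed from the func_params address quoted in the thread purely as a convenient example value.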