Try adding ECX to the Defs of this part of
lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a
Windows machine to test myself.

let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
  def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
                      "# win32 fptoui",
                      [(X86WinFTOL RFP32:$src)]>,
                    Requires<[In32BitMode]>;

  def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
                      "# win32 fptoui",
                      [(X86WinFTOL RFP64:$src)]>,
                    Requires<[In32BitMode]>;
}

On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com> wrote:

> Oh, excellent point, I agree. My bad. Now that I'm not assuming those
> are the sqrt, I see the sqrtpd's in the output. Also there are three
> fptoui's and there are 3 call instances.
>
> (Changing subject line again.)
>
> Now it looks like it's bug #13862
>
> On 19/07/2013 4:51 PM, Craig Topper wrote:
>
> I think those calls correspond to this
>
>   %110 = fptoui double %109 to i32
>
> The calls are followed by an imul with 12 which matches up with what
> occurs right after the fptoui in the IR.
>
> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com> wrote:
>
>> Yes, that is the result of module-dump.ll
>>
>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>
>> Does this correspond to one of the .ll files you sent earlier?
>>
>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com> wrote:
>>
>>> (Changing subject line as diagnosis has changed)
>>>
>>> I'm attaching the compiled code that I've been getting, both with
>>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with
>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it
>>> still gets set to 0x7fffffff by one of the calls to 76719BA1
>>>
>>> I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef.
>>> I was thinking an optimization might be removing it, but I don't get the
>>> sqrtpd instruction even if the createJIT optimization level turned off.
>>>
>>> I am trying this with the Release 3.3 code - I'll try it with trunk and
>>> see if I get a different result there. Maybe there was a recent commit for
>>> this.
>>>
>>> --
>>> Peter N
>>>
>>> On 19/07/2013 4:00 PM, Craig Topper wrote:
>>>
>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE and
>>> I end up with SSE instructions(including sqrtpd) if I don't disable it.
>>>
>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> Is there something specifically required to enable SSE? If it's not
>>>> detected as available (based from the target triple?) then I don't think we
>>>> enable it specifically.
>>>>
>>>> Also it seems that it should handle converting to/from the vector
>>>> types, although I can see it getting confused about needing to do that if
>>>> it thinks SSE isn't available at all.
>>>>
>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>
>>>> Hmm, maybe sse isn't being enabled so its falling back to emulating
>>>> sqrt?
>>>>
>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>>>>
>>>>> In the disassembly, I'm seeing three cases of
>>>>>   call 76719BA1
>>>>>
>>>>> I am assuming this is the sqrt function as this is the only function
>>>>> called in the LLVM IR.
>>>>>
>>>>> The code at 76719BA1 is:
>>>>>
>>>>> 76719BA1 push ebp
>>>>> 76719BA2 mov ebp,esp
>>>>> 76719BA4 sub esp,20h
>>>>> 76719BA7 and esp,0FFFFFFF0h
>>>>> 76719BAA fld st(0)
>>>>> 76719BAC fst dword ptr [esp+18h]
>>>>> 76719BB0 fistp qword ptr [esp+10h]
>>>>> 76719BB4 fild qword ptr [esp+10h]
>>>>> 76719BB8 mov edx,dword ptr [esp+18h]
>>>>> 76719BBC mov eax,dword ptr [esp+10h]
>>>>> 76719BC0 test eax,eax
>>>>> 76719BC2 je 76719DCF
>>>>> 76719BC8 fsubp st(1),st
>>>>> 76719BCA test edx,edx
>>>>> 76719BCC js 7671F9DB
>>>>> 76719BD2 fstp dword ptr [esp]
>>>>> 76719BD5 mov ecx,dword ptr [esp]
>>>>> 76719BD8 add ecx,7FFFFFFFh
>>>>> 76719BDE sbb eax,0
>>>>> 76719BE1 mov edx,dword ptr [esp+14h]
>>>>> 76719BE5 sbb edx,0
>>>>> 76719BE8 leave
>>>>> 76719BE9 ret
>>>>>
>>>>> As you can see at 76719BD5, it modifies ECX .
>>>>>
>>>>> I don't know that this is the sqrtpd function (for example, I'm not
>>>>> seeing any SSE instructions here?) but whatever it is, it's being called
>>>>> from the IR I attached earlier, and is modifying ECX under some
>>>>> circumstances.
>>>>>
>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>>>>
>>>>> That should map directly to sqrtpd which can't modify ecx.
>>>>>
>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>
>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>>>>
>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>>>>
>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things
>>>>>> prefixed with "llvm.x86".
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>
>>>>>>> After stepping through the produced assembly, I believe I have a
>>>>>>> culprit.
>>>>>>>
>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of
>>>>>>> ECX - while the produced code is expecting it to still contain its previous
>>>>>>> value.
>>>>>>>
>>>>>>> Peter N
>>>>>>>
>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>>>>
>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>
>>>>>>> This is before any optimization passes are applied. There are two
>>>>>>> separate modules in existence at the time, and there are no guarantees
>>>>>>> about the order the surrounding code calls those functions, so there may be
>>>>>>> some interaction between them? There shouldn't be, they don't refer to any
>>>>>>> common memory etc. There is no multi-threading occurring.
>>>>>>>
>>>>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>>>>> called with
>>>>>>> - func_params 0x0018f3b0 double [3]
>>>>>>>     [0x0] -11.339976634695301 double
>>>>>>>     [0x1] -9.7504239056205506 double
>>>>>>>     [0x2] -5.2900856817382804 double
>>>>>>> at the time of the exception.
>>>>>>>
>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the
>>>>>>> non-intrinsic functions referred to in these modules are the standard
>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib
>>>>>>> double asin( double ) ).
>>>>>>>
>>>>>>> Hopefully this is reproducible for you.
>>>>>>>
>>>>>>> --
>>>>>>> PeterN
>>>>>>>
>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>
>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>
>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>
>>>>>>>> Also further testing found me getting the same behavior with other
>>>>>>>> SIMD instructions. The common factor is in each case, ECX is set to
>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset .
>>>>>>>>
>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>>>>> optimization passes.
>>>>>>>>
>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>
>>>>>>>> Peter N
>>>>>>>>
>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>
>>>>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>>>>> issue:
>>>>>>>>>
>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>
>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>
>>>>>>>>> Solomon
>>>>>>>>>
>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction.
>>>>>>>>>> However this only happens under certain conditions that seem (but may not
>>>>>>>>>> be) related to the stacks state on calling the function.
>>>>>>>>>>
>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>> generate a JIT generated function. This function is primarily mathematical,
>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well
>>>>>>>>>> as a few SSE2 intrinsics).
>>>>>>>>>>
>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the
>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access
>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the
>>>>>>>>>> SSE fault raising mechanism appears.
>>>>>>>>>>
>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>>   movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>> memory access.
>>>>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is
>>>>>>>>>> part of the SSE error reporting process or is part of the situation causing
>>>>>>>>>> the error.
>>>>>>>>>>
>>>>>>>>>> I haven't worked out exactly what code path etc is causing this
>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Peter N
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
~Craig
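The reasoning behind the suggested Defs change can be sketched as a small check: collect the registers the helper at 76719BA1 writes, then compare them with the registers the pseudo-instruction declares as clobbered. The Python below is only an illustrative model, not LLVM code. The disassembly is copied from the thread; the function names, the mnemonic tables, and the assumption that the Defs list was [EAX, EDX, EFLAGS] before ECX was added are all hypothetical or inferred from the message above. It scans only the fall-through path shown in the listing (the targets of the je and js branches were not posted), and it ignores x87 stack effects.

```python
# Disassembly of the helper at 76719BA1, copied from the thread.
HELPER_DISASM = """\
76719BA1 push ebp
76719BA2 mov ebp,esp
76719BA4 sub esp,20h
76719BA7 and esp,0FFFFFFF0h
76719BAA fld st(0)
76719BAC fst dword ptr [esp+18h]
76719BB0 fistp qword ptr [esp+10h]
76719BB4 fild qword ptr [esp+10h]
76719BB8 mov edx,dword ptr [esp+18h]
76719BBC mov eax,dword ptr [esp+10h]
76719BC0 test eax,eax
76719BC2 je 76719DCF
76719BC8 fsubp st(1),st
76719BCA test edx,edx
76719BCC js 7671F9DB
76719BD2 fstp dword ptr [esp]
76719BD5 mov ecx,dword ptr [esp]
76719BD8 add ecx,7FFFFFFFh
76719BDE sbb eax,0
76719BE1 mov edx,dword ptr [esp+14h]
76719BE5 sbb edx,0
76719BE8 leave
76719BE9 ret
"""

# Hypothetical, simplified tables covering only the mnemonics in the listing.
WRITES_DEST = {"mov", "add", "sub", "sbb", "and"}    # first operand is written
WRITES_FLAGS = {"add", "sub", "sbb", "and", "test"}  # EFLAGS is updated

# ESP and EBP are restored by the prologue/epilogue (push ebp ... leave),
# so only the remaining general-purpose registers are tracked.
TRACKED_REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}

# Assumption: the Defs list before the suggested change lacked ECX.
DEFS_BEFORE = {"EAX", "EDX", "EFLAGS"}
DEFS_AFTER = {"EAX", "EDX", "ECX", "EFLAGS"}


def clobbered_registers(disasm):
    """Return the tracked registers (and EFLAGS) written by the listing."""
    clobbered = set()
    for line in disasm.splitlines():
        parts = line.split(None, 2)  # address, mnemonic, operands
        if len(parts) < 2:
            continue
        mnemonic = parts[1]
        operands = parts[2] if len(parts) > 2 else ""
        dest = operands.split(",")[0].strip()
        if mnemonic in WRITES_DEST and dest in TRACKED_REGS:
            clobbered.add(dest.upper())
        if mnemonic in WRITES_FLAGS:
            clobbered.add("EFLAGS")
    return clobbered


def undeclared_clobbers(disasm, declared_defs):
    """Registers the helper writes that the Defs list does not mention."""
    return clobbered_registers(disasm) - set(declared_defs)


print(sorted(undeclared_clobbers(HELPER_DISASM, DEFS_BEFORE)))  # ['ECX']
print(sorted(undeclared_clobbers(HELPER_DISASM, DEFS_AFTER)))   # []
```

Under these assumptions the only undeclared write is ECX, which matches the symptom reported in the thread: a value kept live in ECX across the call comes back overwritten. Whether the Defs change alone resolves the crash is not established here.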
Thank you, I'm trying this now.

On 19/07/2013 5:23 PM, Craig Topper wrote:
> Try adding ECX to the Defs of this part of
> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have
> a Windows machine to test myself.
>
> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
>   def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
>                       "# win32 fptoui",
>                       [(X86WinFTOL RFP32:$src)]>,
>                     Requires<[In32BitMode]>;
>
>   def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
>                       "# win32 fptoui",
>                       [(X86WinFTOL RFP64:$src)]>,
>                     Requires<[In32BitMode]>;
> }
I don't think that's going to work.

On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote:

> Thank you, I'm trying this now.

--
~Craig
Reasonably Related Threads
- [LLVMdev] fptoui calling a function that modifies ECX