Here's my attempt at a fix. Adding Jakob to make sure I did this right.

On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman <peter at uformia.com> wrote:

> That does appear to have worked. All my tests are passing now.
>
> I'll hand this out to our other devs & testers and make sure it's working
> for them as well (not just on my machine).
>
> Thank you, again.
>
> --
> Peter N
>
> On 19/07/2013 5:45 PM, Craig Topper wrote:
>
> I don't think that's going to work.
>
> On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote:
>
>> Thank you, I'm trying this now.
>>
>> On 19/07/2013 5:23 PM, Craig Topper wrote:
>>
>> Try adding ECX to the Defs of this part of
>> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a
>> Windows machine to test myself.
>>
>> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
>>   def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
>>                       "# win32 fptoui",
>>                       [(X86WinFTOL RFP32:$src)]>,
>>                     Requires<[In32BitMode]>;
>>
>>   def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
>>                       "# win32 fptoui",
>>                       [(X86WinFTOL RFP64:$src)]>,
>>                     Requires<[In32BitMode]>;
>> }
>>
>> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com> wrote:
>>
>>> Oh, excellent point, I agree. My bad. Now that I'm not assuming those
>>> are the sqrt, I see the sqrtpd's in the output. Also there are three
>>> fptoui's and there are 3 call instances.
>>>
>>> (Changing subject line again.)
>>>
>>> Now it looks like it's bug #13862
>>>
>>> On 19/07/2013 4:51 PM, Craig Topper wrote:
>>>
>>> I think those calls correspond to this
>>>
>>>   %110 = fptoui double %109 to i32
>>>
>>> The calls are followed by an imul with 12 which matches up with what
>>> occurs right after the fptoui in the IR.
>>>
>>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> Yes, that is the result of module-dump.ll
>>>>
>>>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>>>
>>>> Does this correspond to one of the .ll files you sent earlier?
>>>>
>>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com> wrote:
>>>>
>>>>> (Changing subject line as diagnosis has changed)
>>>>>
>>>>> I'm attaching the compiled code that I've been getting, both with
>>>>> CodeGenOpt::Default and CodeGenOpt::None. The crash isn't occurring with
>>>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it
>>>>> still gets set to 0x7fffffff by one of the calls to 76719BA1
>>>>>
>>>>> I notice that X86::SQRTPD[m|r] appear in
>>>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be
>>>>> removing it, but I don't get the sqrtpd instruction even with the createJIT
>>>>> optimization level turned off.
>>>>>
>>>>> I am trying this with the Release 3.3 code - I'll try it with trunk
>>>>> and see if I get a different result there. Maybe there was a recent commit
>>>>> for this.
>>>>>
>>>>> --
>>>>> Peter N
>>>>>
>>>>> On 19/07/2013 4:00 PM, Craig Topper wrote:
>>>>>
>>>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE
>>>>> and I end up with SSE instructions (including sqrtpd) if I don't disable it.
>>>>>
>>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>
>>>>>> Is there something specifically required to enable SSE? If it's not
>>>>>> detected as available (based on the target triple?) then I don't think we
>>>>>> enable it specifically.
>>>>>>
>>>>>> Also it seems that it should handle converting to/from the vector
>>>>>> types, although I can see it getting confused about needing to do that if
>>>>>> it thinks SSE isn't available at all.
>>>>>>
>>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>>>
>>>>>> Hmm, maybe sse isn't being enabled so it's falling back to emulating
>>>>>> sqrt?
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>
>>>>>>> In the disassembly, I'm seeing three cases of
>>>>>>>   call 76719BA1
>>>>>>>
>>>>>>> I am assuming this is the sqrt function as this is the only function
>>>>>>> called in the LLVM IR.
>>>>>>>
>>>>>>> The code at 76719BA1 is:
>>>>>>>
>>>>>>>   76719BA1 push ebp
>>>>>>>   76719BA2 mov ebp,esp
>>>>>>>   76719BA4 sub esp,20h
>>>>>>>   76719BA7 and esp,0FFFFFFF0h
>>>>>>>   76719BAA fld st(0)
>>>>>>>   76719BAC fst dword ptr [esp+18h]
>>>>>>>   76719BB0 fistp qword ptr [esp+10h]
>>>>>>>   76719BB4 fild qword ptr [esp+10h]
>>>>>>>   76719BB8 mov edx,dword ptr [esp+18h]
>>>>>>>   76719BBC mov eax,dword ptr [esp+10h]
>>>>>>>   76719BC0 test eax,eax
>>>>>>>   76719BC2 je 76719DCF
>>>>>>>   76719BC8 fsubp st(1),st
>>>>>>>   76719BCA test edx,edx
>>>>>>>   76719BCC js 7671F9DB
>>>>>>>   76719BD2 fstp dword ptr [esp]
>>>>>>>   76719BD5 mov ecx,dword ptr [esp]
>>>>>>>   76719BD8 add ecx,7FFFFFFFh
>>>>>>>   76719BDE sbb eax,0
>>>>>>>   76719BE1 mov edx,dword ptr [esp+14h]
>>>>>>>   76719BE5 sbb edx,0
>>>>>>>   76719BE8 leave
>>>>>>>   76719BE9 ret
>>>>>>>
>>>>>>> As you can see at 76719BD5, it modifies ECX.
>>>>>>>
>>>>>>> I don't know that this is the sqrtpd function (for example, I'm not
>>>>>>> seeing any SSE instructions here?) but whatever it is, it's being called
>>>>>>> from the IR I attached earlier, and is modifying ECX under some
>>>>>>> circumstances.
>>>>>>>
>>>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>>>>>>
>>>>>>> That should map directly to sqrtpd which can't modify ecx.
>>>>>>>
>>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>
>>>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>>>>>>
>>>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>>>>>>
>>>>>>>> What is "frep.x86.sse2.sqrt.pd"? I'm only familiar with things
>>>>>>>> prefixed with "llvm.x86".
>>>>>>>>
>>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>
>>>>>>>>> After stepping through the produced assembly, I believe I have a
>>>>>>>>> culprit.
>>>>>>>>>
>>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value
>>>>>>>>> of ECX - while the produced code is expecting it to still contain its
>>>>>>>>> previous value.
>>>>>>>>>
>>>>>>>>> Peter N
>>>>>>>>>
>>>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>>>>>>
>>>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>>>
>>>>>>>>> This is before any optimization passes are applied. There are two
>>>>>>>>> separate modules in existence at the time, and there are no guarantees
>>>>>>>>> about the order the surrounding code calls those functions, so there may be
>>>>>>>>> some interaction between them? There shouldn't be, they don't refer to any
>>>>>>>>> common memory etc. There is no multi-threading occurring.
>>>>>>>>>
>>>>>>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>>>>>>> called with
>>>>>>>>>   - func_params 0x0018f3b0 double [3]
>>>>>>>>>       [0x0] -11.339976634695301 double
>>>>>>>>>       [0x1] -9.7504239056205506 double
>>>>>>>>>       [0x2] -5.2900856817382804 double
>>>>>>>>> at the time of the exception.
>>>>>>>>>
>>>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the
>>>>>>>>> non-intrinsic functions referred to in these modules are the standard
>>>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib
>>>>>>>>> double asin( double ) ).
>>>>>>>>>
>>>>>>>>> Hopefully this is reproducible for you.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> PeterN
>>>>>>>>>
>>>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>>>
>>>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>>>
>>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>>>
>>>>>>>>>> Also further testing found me getting the same behavior with
>>>>>>>>>> other SIMD instructions. The common factor is that in each case, ECX is
>>>>>>>>>> set to 0x7fffffff, and it's an operation using xmm ptr ecx+offset.
>>>>>>>>>>
>>>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>>>>>>> optimization passes.
>>>>>>>>>>
>>>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>>>
>>>>>>>>>> Peter N
>>>>>>>>>>
>>>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>>>
>>>>>>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>>>>>>> issue:
>>>>>>>>>>>
>>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>>>
>>>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>>>
>>>>>>>>>>> Solomon
>>>>>>>>>>>
>>>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>>>>>>> attempting to access unaligned memory with an SSE2 instruction.
>>>>>>>>>>>> However this only happens under certain conditions that seem (but may not
>>>>>>>>>>>> be) related to the stack's state on calling the function.
>>>>>>>>>>>>
>>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>>>> generate a JIT generated function. This function is primarily mathematical,
>>>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well
>>>>>>>>>>>> as a few SSE2 intrinsics).
>>>>>>>>>>>>
>>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the
>>>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access
>>>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the
>>>>>>>>>>>> SSE fault raising mechanism appears.
>>>>>>>>>>>>
>>>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>>>>   movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>>>> memory access.
>>>>>>>>>>>> ECX is always set to 0x7fffffff - however I don't know if this
>>>>>>>>>>>> is part of the SSE error reporting process or is part of the situation
>>>>>>>>>>>> causing the error.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't worked out exactly what code path etc is causing this
>>>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Peter N
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
~Craig

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ftol.patch
Type: application/octet-stream
Size: 1262 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/28b21fa0/attachment.obj>
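For readers following the diagnosis, here is a toy model of why leaving ECX out of the Defs list can corrupt a live value. It is only a sketch and not LLVM code: the register list, `pick_register`, and `value_after_call` are hypothetical names invented for illustration, and the "allocator" is far simpler than LLVM's. The one thing taken from the thread is that the called routine writes EAX, ECX and EDX (per the disassembly quoted earlier), while the pseudo-instruction presumably declared only some of them before the change. EFLAGS is left out because the model only tracks general-purpose registers.

```python
# Toy model (not LLVM API): a value must stay live across a call.
# The "allocator" trusts the declared Defs list; the "machine" applies
# the registers the callee really overwrites.

REGS = ["eax", "ecx", "edx", "esi"]   # hypothetical allocation order
GARBAGE = 0x7FFFFFFF                  # value the thread observed in ECX


def pick_register(declared_defs):
    """Pick the first register the allocator believes survives the call."""
    for reg in REGS:
        if reg not in declared_defs:
            return reg
    raise RuntimeError("no register free; a real allocator would spill")


def value_after_call(live_value, declared_defs, actual_clobbers):
    """Hold live_value in a register across the call and read it back."""
    regs = dict.fromkeys(REGS, 0)
    home = pick_register(declared_defs)
    regs[home] = live_value
    for reg in actual_clobbers:       # what the callee really overwrites
        regs[reg] = GARBAGE
    return home, regs[home]


ACTUAL = {"eax", "ecx", "edx"}        # per the disassembly in the thread

before = value_after_call(0x1000, {"eax", "edx"}, ACTUAL)         # ECX not declared
after = value_after_call(0x1000, {"eax", "edx", "ecx"}, ACTUAL)   # ECX declared

print(before)  # ('ecx', 2147483647)
print(after)   # ('esi', 4096)
```

In the first case the model parks the value in ECX because nothing says the call overwrites it, and reads back garbage. Once ECX is declared, the value goes to a register the call leaves alone. That matches the symptom reported above, where a later `[ecx+offset]` access used 0x7fffffff as its base.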
I've applied this and the test cases I have here continue to work, so it looks good to me.

I've run into another (seemingly unrelated) issue which I'll describe in a separate email to the dev list.

--
Peter N

On 20/07/2013 5:30 AM, Craig Topper wrote:
> Here's my attempt at a fix. Adding Jakob to make sure I did this right.
Committed in r186787 On Sat, Jul 20, 2013 at 12:44 AM, Peter Newman <peter at uformia.com> wrote:> I've applied this and the test cases I have here continue to work, so it > looks good to me. > > I've ran into another (seemingly unrelated) issue which I'll describe in a > separate email to the dev list. > > -- > Peter N > > > On 20/07/2013 5:30 AM, Craig Topper wrote: > > Here's my attempt at a fix. Adding Jakob to make sure I did this right. > > > On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman <peter at uformia.com> wrote: > >> That does appear to have worked. All my tests are passing now. >> >> I'll hand this out to our other devs & testers and make sure it's working >> for them as well (not just on my machine). >> >> Thank you, again. >> >> -- >> Peter N >> >> >> On 19/07/2013 5:45 PM, Craig Topper wrote: >> >> I don't think that's going to work. >> >> >> On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote: >> >>> Thank you, I'm trying this now. >>> >>> >>> On 19/07/2013 5:23 PM, Craig Topper wrote: >>> >>> Try adding ECX to the Defs of this part of >>> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a >>> Windows machine to test myself. >>> >>> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { >>> def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), >>> "# win32 fptoui", >>> [(X86WinFTOL RFP32:$src)]>, >>> Requires<[In32BitMode]>; >>> >>> def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), >>> "# win32 fptoui", >>> [(X86WinFTOL RFP64:$src)]>, >>> Requires<[In32BitMode]>; >>> } >>> >>> >>> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com>wrote: >>> >>>> Oh, excellent point, I agree. My bad. Now that I'm not assuming those >>>> are the sqrt, I see the sqrtpd's in the output. Also there are three >>>> fptoui's and there are 3 call instances. >>>> >>>> (Changing subject line again.) 
>>>> >>>> Now it looks like it's bug #13862 >>>> >>>> On 19/07/2013 4:51 PM, Craig Topper wrote: >>>> >>>> I think those calls correspond to this >>>> >>>> %110 = fptoui double %109 to i32 >>>> >>>> The calls are followed by an imul with 12 which matches up with what >>>> occurs right after the fptoui in the IR. >>>> >>>> >>>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com>wrote: >>>> >>>>> Yes, that is the result of module-dump.ll >>>>> >>>>> >>>>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>>>> >>>>> Does this correspond to one of the .ll files you sent earlier? >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com>wrote: >>>>> >>>>>> (Changing subject line as diagnosis has changed) >>>>>> >>>>>> I'm attaching the compiled code that I've been getting, both with >>>>>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with >>>>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it >>>>>> still gets set to 0x7fffffff by one of the calls to 76719BA1 >>>>>> >>>>>> I notice that X86::SQRTPD[m|r] appear in >>>>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be >>>>>> removing it, but I don't get the sqrtpd instruction even if the createJIT >>>>>> optimization level turned off. >>>>>> >>>>>> I am trying this with the Release 3.3 code - I'll try it with trunk >>>>>> and see if I get a different result there. Maybe there was a recent commit >>>>>> for this. >>>>>> >>>>>> -- >>>>>> Peter N >>>>>> >>>>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>>>> >>>>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE >>>>>> and I end up with SSE instructions(including sqrtpd) if I don't disable it. >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com>wrote: >>>>>> >>>>>>> Is there something specifically required to enable SSE? If it's >>>>>>> not detected as available (based from the target triple?) 
>>>>>>> then I don't think we enable it specifically.
>>>>>>>
>>>>>>> Also it seems that it should handle converting to/from the vector
>>>>>>> types, although I can see it getting confused about needing to do that if
>>>>>>> it thinks SSE isn't available at all.
>>>>>>>
>>>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>>>>
>>>>>>> Hmm, maybe sse isn't being enabled so its falling back to emulating
>>>>>>> sqrt?
>>>>>>>
>>>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>
>>>>>>>> In the disassembly, I'm seeing three cases of
>>>>>>>> call 76719BA1
>>>>>>>>
>>>>>>>> I am assuming this is the sqrt function as this is the only
>>>>>>>> function called in the LLVM IR.
>>>>>>>>
>>>>>>>> The code at 76719BA1 is:
>>>>>>>>
>>>>>>>> 76719BA1  push  ebp
>>>>>>>> 76719BA2  mov   ebp,esp
>>>>>>>> 76719BA4  sub   esp,20h
>>>>>>>> 76719BA7  and   esp,0FFFFFFF0h
>>>>>>>> 76719BAA  fld   st(0)
>>>>>>>> 76719BAC  fst   dword ptr [esp+18h]
>>>>>>>> 76719BB0  fistp qword ptr [esp+10h]
>>>>>>>> 76719BB4  fild  qword ptr [esp+10h]
>>>>>>>> 76719BB8  mov   edx,dword ptr [esp+18h]
>>>>>>>> 76719BBC  mov   eax,dword ptr [esp+10h]
>>>>>>>> 76719BC0  test  eax,eax
>>>>>>>> 76719BC2  je    76719DCF
>>>>>>>> 76719BC8  fsubp st(1),st
>>>>>>>> 76719BCA  test  edx,edx
>>>>>>>> 76719BCC  js    7671F9DB
>>>>>>>> 76719BD2  fstp  dword ptr [esp]
>>>>>>>> 76719BD5  mov   ecx,dword ptr [esp]
>>>>>>>> 76719BD8  add   ecx,7FFFFFFFh
>>>>>>>> 76719BDE  sbb   eax,0
>>>>>>>> 76719BE1  mov   edx,dword ptr [esp+14h]
>>>>>>>> 76719BE5  sbb   edx,0
>>>>>>>> 76719BE8  leave
>>>>>>>> 76719BE9  ret
>>>>>>>>
>>>>>>>> As you can see at 76719BD5, it modifies ECX .
>>>>>>>>
>>>>>>>> I don't know that this is the sqrtpd function (for example, I'm not
>>>>>>>> seeing any SSE instructions here?) but whatever it is, it's being called
>>>>>>>> from the IR I attached earlier, and is modifying ECX under some
>>>>>>>> circumstances.
>>>>>>>>
>>>>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>>>>>>>
>>>>>>>> That should map directly to sqrtpd which can't modify ecx.
>>>>>>>>
>>>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>
>>>>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>>>>>>>
>>>>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>>>>>>>
>>>>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things
>>>>>>>>> prefixed with "llvm.x86".
>>>>>>>>>
>>>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>>
>>>>>>>>>> After stepping through the produced assembly, I believe I have
>>>>>>>>>> a culprit.
>>>>>>>>>>
>>>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value
>>>>>>>>>> of ECX - while the produced code is expecting it to still contain its
>>>>>>>>>> previous value.
>>>>>>>>>>
>>>>>>>>>> Peter N
>>>>>>>>>>
>>>>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>>>>>>>
>>>>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>>>>
>>>>>>>>>> This is before any optimization passes are applied. There are two
>>>>>>>>>> separate modules in existence at the time, and there are no guarantees
>>>>>>>>>> about the order the surrounding code calls those functions, so there may be
>>>>>>>>>> some interaction between them? There shouldn't be, they don't refer to any
>>>>>>>>>> common memory etc. There is no multi-threading occurring.
>>>>>>>>>>
>>>>>>>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>>>>>>>> called with
>>>>>>>>>> - func_params 0x0018f3b0 double [3]
>>>>>>>>>>   [0x0] -11.339976634695301 double
>>>>>>>>>>   [0x1] -9.7504239056205506 double
>>>>>>>>>>   [0x2] -5.2900856817382804 double
>>>>>>>>>> at the time of the exception.
>>>>>>>>>>
>>>>>>>>>> This is compiled on a "i686-pc-win32" triple.
>>>>>>>>>> All of the non-intrinsic functions referred to in these modules are
>>>>>>>>>> the standard equivalents from the MSVC library (e.g. @asin is the
>>>>>>>>>> standard C lib double asin( double ) ).
>>>>>>>>>>
>>>>>>>>>> Hopefully this is reproducible for you.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> PeterN
>>>>>>>>>>
>>>>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>>>>
>>>>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>>>>
>>>>>>>>>>> Also further testing found me getting the same behavior with
>>>>>>>>>>> other SIMD instructions. The common factor is in each case, ECX is set to
>>>>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset .
>>>>>>>>>>>
>>>>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>>>>>>>> optimization passes.
>>>>>>>>>>>
>>>>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>>>>
>>>>>>>>>>> Peter N
>>>>>>>>>>>
>>>>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>>>>
>>>>>>>>>>>> As someone off list just told me, perhaps my new bug is the
>>>>>>>>>>>> same issue:
>>>>>>>>>>>>
>>>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>>>>
>>>>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>>>>
>>>>>>>>>>>> Solomon
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction.
>>>>>>>>>>>>> However this only happens under certain conditions that seem (but may not
>>>>>>>>>>>>> be) related to the stacks state on calling the function.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>>>>> generate a JIT generated function. This function is primarily mathematical,
>>>>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well
>>>>>>>>>>>>> as a few SSE2 intrinsics).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the
>>>>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access
>>>>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the
>>>>>>>>>>>>> SSE fault raising mechanism appears.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>>>>> memory access.
>>>>>>>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this
>>>>>>>>>>>>> is part of the SSE error reporting process or is part of the situation
>>>>>>>>>>>>> causing the error.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I haven't worked out exactly what code path etc is causing
>>>>>>>>>>>>> this crash.
>>>>>>>>>>>>> I'm hoping that someone can tell me if there were any changed
>>>>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Peter N
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
~Craig
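For readers following the thread, the effect of adding ECX to the Defs list quoted above can be illustrated with a toy model. This is only a sketch and not LLVM code: the function names, the register subset, and the assumption that the earlier Defs list lacked only ECX are all hypothetical, chosen to mirror the register names and the disassembly that appear in the thread.

```python
# Toy model only: this is not LLVM code and it uses no LLVM API.
# Register names follow the thread; all function names are hypothetical.

ALLOCATABLE = ("EAX", "ECX", "EDX", "ESI", "EDI")


def registers_assumed_preserved(declared_defs):
    """Registers a toy allocator would trust to survive the pseudo-instruction."""
    return [reg for reg in ALLOCATABLE if reg not in declared_defs]


def undeclared_clobbers(declared_defs, observed_writes):
    """Registers the helper overwrites that the instruction description omits."""
    return sorted(set(observed_writes) - set(declared_defs))


# Registers written by the helper at 76719BA1 according to the disassembly
# in the thread (mov/add ecx, mov/sbb eax, mov/sbb edx, plus the flags).
OBSERVED_WRITES = {"EAX", "ECX", "EDX", "EFLAGS"}

# Assumption: the Defs list before the suggested change lacked only ECX.
DEFS_BEFORE = {"EAX", "EDX", "EFLAGS"}
# Defs list as quoted in the suggested change.
DEFS_AFTER = {"EAX", "EDX", "ECX", "EFLAGS"}

for label, defs in (("before", DEFS_BEFORE), ("after", DEFS_AFTER)):
    print(label,
          "| assumed preserved:", registers_assumed_preserved(defs),
          "| undeclared clobbers:", undeclared_clobbers(defs, OBSERVED_WRITES))
```

In this model, ECX counts as preserved only while it is missing from the declared list. That matches the symptom reported in the thread, where generated code kept using ECX as an address base after the call had overwritten it.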