Craig Topper
2013-Jul-19 06:00 UTC
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions (including sqrtpd) if I don't disable it.

On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:

> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think we
> enable it specifically.
>
> Also it seems that it should handle converting to/from the vector types,
> although I can see it getting confused about needing to do that if it
> thinks SSE isn't available at all.
>
> On 19/07/2013 3:47 PM, Craig Topper wrote:
>
> Hmm, maybe sse isn't being enabled so its falling back to emulating sqrt?
>
> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>
>> In the disassembly, I'm seeing three cases of
>> call 76719BA1
>>
>> I am assuming this is the sqrt function as this is the only function
>> called in the LLVM IR.
>>
>> The code at 76719BA1 is:
>>
>> 76719BA1 push ebp
>> 76719BA2 mov ebp,esp
>> 76719BA4 sub esp,20h
>> 76719BA7 and esp,0FFFFFFF0h
>> 76719BAA fld st(0)
>> 76719BAC fst dword ptr [esp+18h]
>> 76719BB0 fistp qword ptr [esp+10h]
>> 76719BB4 fild qword ptr [esp+10h]
>> 76719BB8 mov edx,dword ptr [esp+18h]
>> 76719BBC mov eax,dword ptr [esp+10h]
>> 76719BC0 test eax,eax
>> 76719BC2 je 76719DCF
>> 76719BC8 fsubp st(1),st
>> 76719BCA test edx,edx
>> 76719BCC js 7671F9DB
>> 76719BD2 fstp dword ptr [esp]
>> 76719BD5 mov ecx,dword ptr [esp]
>> 76719BD8 add ecx,7FFFFFFFh
>> 76719BDE sbb eax,0
>> 76719BE1 mov edx,dword ptr [esp+14h]
>> 76719BE5 sbb edx,0
>> 76719BE8 leave
>> 76719BE9 ret
>>
>> As you can see at 76719BD5, it modifies ECX .
>>
>> I don't know that this is the sqrtpd function (for example, I'm not
>> seeing any SSE instructions here?) but whatever it is, it's being called
>> from the IR I attached earlier, and is modifying ECX under some
>> circumstances.
>>
>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>
>> That should map directly to sqrtpd which can't modify ecx.
>>
>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>
>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>
>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>
>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things prefixed
>>> with "llvm.x86".
>>>
>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> After stepping through the produced assembly, I believe I have a
>>>> culprit.
>>>>
>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of
>>>> ECX - while the produced code is expecting it to still contain its previous
>>>> value.
>>>>
>>>> Peter N
>>>>
>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>
>>>> I've attached the module->dump() that our code is producing.
>>>> Unfortunately this is the smallest test case I have available.
>>>>
>>>> This is before any optimization passes are applied. There are two
>>>> separate modules in existence at the time, and there are no guarantees
>>>> about the order the surrounding code calls those functions, so there may be
>>>> some interaction between them? There shouldn't be, they don't refer to any
>>>> common memory etc. There is no multi-threading occurring.
>>>>
>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>> called with
>>>> - func_params 0x0018f3b0 double [3]
>>>> [0x0] -11.339976634695301 double
>>>> [0x1] -9.7504239056205506 double
>>>> [0x2] -5.2900856817382804 double
>>>> at the time of the exception.
>>>>
>>>> This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic
>>>> functions referred to in these modules are the standard equivalents from
>>>> the MSVC library (e.g. @asin is the standard C lib double asin( double )
>>>> ).
>>>>
>>>> Hopefully this is reproducible for you.
>>>>
>>>> --
>>>> PeterN
>>>>
>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>
>>>> Are you able to send any IR for others to reproduce this issue?
>>>>
>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>
>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>> applied the fix to my source and it didn't make a difference.
>>>>>
>>>>> Also further testing found me getting the same behavior with other
>>>>> SIMD instructions. The common factor is in each case, ECX is set to
>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset .
>>>>>
>>>>> Additionally, turning the optimization level passed to createJIT down
>>>>> appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>> optimization passes.
>>>>>
>>>>> I'm going to dig through the passes controlled by that parameter and
>>>>> see if I can narrow down which optimization is causing it.
>>>>>
>>>>> Peter N
>>>>>
>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>
>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>> issue:
>>>>>>
>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>
>>>>>> Do you happen to be using FastISel?
>>>>>>
>>>>>> Solomon
>>>>>>
>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>>
>>>>>>> I'm currently in the process of debugging a crash occurring in our
>>>>>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction.
>>>>>>> However this only happens under certain conditions that seem (but may not
>>>>>>> be) related to the stacks state on calling the function.
>>>>>>>
>>>>>>> Our program acts as a front-end, using the LLVM C++ API to generate
>>>>>>> a JIT generated function. This function is primarily mathematical, so we
>>>>>>> use the Vector types to take advantage of SIMD instructions (as well as a
>>>>>>> few SSE2 intrinsics).
>>>>>>>
>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has continued
>>>>>>> to fail in 3.3. It fails with no optimizations applied to the LLVM
>>>>>>> Function/Module. It crashes with what is reported as a memory access error
>>>>>>> (accessing 0xffffffff), however it's suggested that this is how the SSE
>>>>>>> fault raising mechanism appears.
>>>>>>>
>>>>>>> The generated instruction varies, but it seems to often be similar
>>>>>>> to (I don't have it in front of me, sorry):
>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>> Where the xmm register changes, and the second parameter is a memory
>>>>>>> access.
>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is
>>>>>>> part of the SSE error reporting process or is part of the situation causing
>>>>>>> the error.
>>>>>>>
>>>>>>> I haven't worked out exactly what code path etc is causing this
>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>
>>>>>>> --
>>>>>>> Peter N
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>
>>>> --
>>>> ~Craig
>>>>
>>>
>>> --
>>> ~Craig
>>>
>>
>> --
>> ~Craig
>>
>
> --
> ~Craig

--
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130718/f10017cb/attachment.html>
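
The register arithmetic discussed above can be checked by hand. What follows is a minimal Python sketch, not LLVM or C runtime code, that models the `fstp` / `mov ecx` / `add ecx,7FFFFFFFh` sequence from the quoted listing and then tests whether a `movapd` operand based on the resulting ECX could be 16-byte aligned. All function names are made up for illustration. Treating the value stored by `fstp` as a remainder of 0.0 is an assumption. So is the displacement `0x100000`, since the original report elides the real one (`0x???????`).

```python
import struct

MASK32 = 0xFFFFFFFF  # x86-32 registers wrap at 32 bits


def float_bits(value):
    """Return the IEEE-754 single-precision bit pattern of value as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def ecx_after_helper(remainder):
    """Model `fstp dword ptr [esp]; mov ecx,[esp]; add ecx,7FFFFFFFh`.

    `remainder` stands for whatever is left on the x87 stack after `fsubp`;
    reading it as "input minus its rounded integer" is an assumption.
    """
    return (float_bits(remainder) + 0x7FFFFFFF) & MASK32


def movapd_address(base, displacement):
    """Effective address of `movapd xmm, xmmword ptr [base+displacement]`."""
    return (base + displacement) & MASK32


def is_16_byte_aligned(address):
    """movapd requires its memory operand to be 16-byte aligned."""
    return address % 16 == 0


# A remainder of +0.0 has an all-zero bit pattern, so ECX ends up 0x7FFFFFFF.
ecx = ecx_after_helper(0.0)

# Hypothetical, 16-byte-aligned displacement (the real one is elided upthread).
displacement = 0x100000
address = movapd_address(ecx, displacement)

print(hex(ecx), hex(address), is_16_byte_aligned(address))
```

Under these assumptions, the low four bits of 0x7FFFFFFF are all set, so adding any 16-byte-aligned displacement leaves the address misaligned. That would be consistent with the `movapd` faults reported in the thread whenever ECX holds that value.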
Peter Newman
2013-Jul-19 06:34 UTC
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
(Changing subject line as diagnosis has changed)

I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None. The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1.

I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be removing it, but I don't get the sqrtpd instruction even if the createJIT optimization level is turned off.

I am trying this with the Release 3.3 code - I'll try it with trunk and see if I get a different result there. Maybe there was a recent commit for this.

--
Peter N

On 19/07/2013 4:00 PM, Craig Topper wrote:
> Hmm, I'm not able to get those .ll files to compile if I disable SSE
> and I end up with SSE instructions (including sqrtpd) if I don't
> disable it.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/d61f6a7f/attachment.html> -------------- next part -------------- 002E00D0 push ebp 002E00D1 mov ebp,esp 002E00D3 push ebx 002E00D4 push edi 002E00D5 push esi 002E00D6 and esp,0FFFFFFF0h 002E00DC sub esp,110h 002E00E2 mov eax,dword ptr [ebp+8] 002E00E5 movddup xmm0,mmword ptr [eax+10h] 002E00EA movapd xmmword ptr [esp+80h],xmm0 002E00F3 movddup xmm0,mmword ptr [eax+8] 002E00F8 movapd xmmword ptr [esp+70h],xmm0 002E00FE movddup xmm0,mmword ptr [eax] 002E0102 movapd xmmword ptr [esp+60h],xmm0 002E0108 xorpd xmm0,xmm0 002E010C movapd xmmword ptr [esp+0C0h],xmm0 002E0115 xorpd xmm1,xmm1 002E0119 xorpd xmm7,xmm7 002E011D movapd xmmword ptr [esp+0A0h],xmm1 002E0126 movapd xmmword ptr [esp+0B0h],xmm7 002E012F movapd xmm3,xmm1 002E0133 movlpd qword ptr [esp+0F0h],xmm3 002E013C movhpd qword ptr [esp+0E0h],xmm3 002E0145 movlpd qword ptr [esp+100h],xmm7 002E014E pshufd xmm0,xmm7,44h 002E0153 movdqa xmm5,xmm0 002E0157 xorpd xmm4,xmm4 002E015B mulpd xmm5,xmm4 002E015F pshufd xmm2,xmm3,44h 002E0164 movdqa xmm1,xmm2 002E0168 mulpd xmm1,xmm4 002E016C xorpd xmm7,xmm7 002E0170 movapd xmm4,xmmword ptr [esp+70h] 002E0176 subpd xmm4,xmm1 002E017A pshufd xmm3,xmm3,0EEh 002E017F subpd xmm4,xmm3 002E0183 subpd xmm4,xmm5 002E0187 fld qword ptr [esp+0F0h] 002E018E call 76719BA1 CALL 002E0193 imul ebx,eax,0Ch 002E0196 lea esi,[ebx+3] 002E0199 shl esi,4 002E019C movapd xmm6,xmmword ptr [esi+2C0030h] 002E01A4 mulpd xmm6,xmm4 002E01A8 mulpd xmm3,xmm7 002E01AC movapd xmm7,xmmword ptr [esp+60h] 002E01B2 subpd xmm7,xmm2 002E01B6 subpd xmm7,xmm3 002E01BA subpd xmm7,xmm5 002E01BE movapd xmm2,xmmword ptr [esi+2C0020h] 002E01C6 mulpd xmm2,xmm7 002E01CA addpd xmm2,xmm6 002E01CE movapd xmm5,xmmword ptr [esp+80h] 002E01D7 subpd xmm5,xmm1 002E01DB subpd xmm5,xmm3 002E01DF mulpd xmm0,xmmword ptr ds:[2E0010h] 002E01E7 subpd xmm5,xmm0 002E01EB movapd xmm6,xmmword ptr [esi+2C0040h] 002E01F3 mulpd xmm6,xmm5 002E01F7 addpd xmm6,xmm2 002E01FB 
addpd xmm6,xmmword ptr [esi+2C0050h] 002E0203 fld qword ptr [esp+0E0h] 002E020A call 76719BA1 CALL 002E020F imul edi,eax,0Ch 002E0212 lea ecx,[edi+3] First time ECX is touched 002E0215 shl ecx,4 002E0218 movapd xmm0,xmmword ptr [ecx+2C0030h] *** 002E0220 mulpd xmm0,xmm6 002E0224 mov eax,ebx 002E0226 shl eax,4 002E0229 movapd xmm1,xmmword ptr [eax+2C0010h] 002E0231 mulpd xmm1,xmm7 002E0235 or ebx,1 002E0238 shl ebx,4 002E023B movapd xmm2,xmmword ptr [ebx+2C0010h] 002E0243 mulpd xmm2,xmm4 002E0247 addpd xmm2,xmm1 002E024B movapd xmm3,xmmword ptr [ebx+2C0020h] 002E0253 mulpd xmm3,xmm5 002E0257 addpd xmm3,xmm2 002E025B addpd xmm3,xmmword ptr [esi+2C0010h] 002E0263 movapd xmm1,xmmword ptr [ecx+2C0020h] *** 002E026B mulpd xmm1,xmm3 002E026F addpd xmm1,xmm0 002E0273 mulpd xmm4,xmmword ptr [esi+2C0070h] 002E027B mulpd xmm7,xmmword ptr [esi+2C0060h] 002E0283 addpd xmm7,xmm4 002E0287 mulpd xmm5,xmmword ptr [esi+2C0080h] 002E028F addpd xmm5,xmm7 002E0293 addpd xmm5,xmmword ptr [esi+2C0090h] 002E029B movapd xmm7,xmmword ptr [ecx+2C0040h] *** 002E02A3 mulpd xmm7,xmm5 002E02A7 addpd xmm7,xmm1 002E02AB addpd xmm7,xmmword ptr [ecx+2C0050h] *** 002E02B3 fld qword ptr [esp+100h] 002E02BA call 76719BA1 CALL 002E02BF imul edx,eax,0Ch 002E02C2 lea eax,[edx+3] 002E02C5 shl eax,4 002E02C8 movapd xmm1,xmmword ptr [eax+2C0130h] 002E02D0 mulpd xmm1,xmm7 002E02D4 lea esi,[edi+1] 002E02D7 shl esi,4 002E02DA movapd xmm0,xmmword ptr [esi+2C0010h] 002E02E2 mulpd xmm0,xmm6 002E02E6 shl edi,4 002E02E9 movapd xmm2,xmmword ptr [edi+2C0010h] 002E02F1 mulpd xmm2,xmm3 002E02F5 addpd xmm2,xmm0 002E02F9 movapd xmm0,xmmword ptr [esi+2C0020h] 002E0301 mulpd xmm0,xmm5 002E0305 addpd xmm0,xmm2 002E0309 addpd xmm0,xmmword ptr [ecx+2C0010h] *** 002E0311 movapd xmm2,xmmword ptr [eax+2C0120h] *** 002E0319 mulpd xmm2,xmm0 002E031D addpd xmm2,xmm1 002E0321 mulpd xmm6,xmmword ptr [ecx+2C0070h] *** 002E0329 mulpd xmm3,xmmword ptr [ecx+2C0060h] *** 002E0331 addpd xmm3,xmm6 002E0335 mulpd xmm5,xmmword ptr 
[ecx+2C0080h] *** 002E033D addpd xmm5,xmm3 002E0341 addpd xmm5,xmmword ptr [ecx+2C0090h] *** 002E0349 movapd xmm6,xmmword ptr [eax+2C0140h] *** 002E0351 mulpd xmm6,xmm5 002E0355 addpd xmm6,xmm2 002E0359 addpd xmm6,xmmword ptr [eax+2C0150h] *** 002E0361 movapd xmm2,xmm6 002E0365 mulpd xmm2,xmmword ptr ds:[2E00C0h] 002E036D lea ecx,[edx+1] ECX set 002E0370 shl ecx,4 002E0373 movapd xmm1,xmmword ptr [ecx+2C00D0h] 002E037B mulpd xmm1,xmm7 002E037F shl edx,4 002E0382 movapd xmm3,xmmword ptr [edx+2C00D0h] 002E038A mulpd xmm3,xmm0 002E038E addpd xmm3,xmm1 002E0392 movapd xmm4,xmmword ptr [ecx+2C00E0h] 002E039A mulpd xmm4,xmm5 002E039E addpd xmm4,xmm3 002E03A2 addpd xmm4,xmmword ptr [eax+2C00D0h] 002E03AA movapd xmm3,xmm4 002E03AE addpd xmm3,xmmword ptr ds:[2E0020h] 002E03B6 movapd xmm1,xmm3 002E03BA subpd xmm1,xmm2 002E03BE movapd xmmword ptr [esp+90h],xmm1 002E03C7 movapd xmm2,xmmword ptr ds:[2E0030h] 002E03CF mulpd xmm1,xmm2 002E03D3 mulpd xmm7,xmmword ptr [eax+2C00F0h] 002E03DB mulpd xmm0,xmmword ptr [eax+2C00E0h] 002E03E3 addpd xmm0,xmm7 002E03E7 mulpd xmm5,xmmword ptr [eax+2C0100h] 002E03EF addpd xmm5,xmm0 002E03F3 addpd xmm5,xmmword ptr [eax+2C0110h] 002E03FB movapd xmm0,xmm5 002E03FF addpd xmm0,xmmword ptr ds:[2E0040h] 002E0407 movapd xmm7,xmm0 002E040B movapd xmm2,xmmword ptr ds:[2E0050h] 002E0413 mulpd xmm7,xmm2 002E0417 subpd xmm7,xmm1 002E041B xorpd xmm1,xmm1 002E041F mulpd xmm3,xmm1 002E0423 addpd xmm3,xmm6 002E0427 movapd xmm2,xmm3 002E042B mulpd xmm2,xmm1 002E042F addpd xmm2,xmm7 002E0433 addpd xmm2,xmm1 002E0437 movapd xmm1,xmmword ptr ds:[2E0060h] 002E043F mulpd xmm2,xmm1 002E0443 mulpd xmm2,xmm2 002E0447 movapd xmm1,xmmword ptr [esp+90h] 002E0450 mulpd xmm1,xmmword ptr ds:[2E0050h] 002E0458 mulpd xmm0,xmmword ptr ds:[2E0030h] 002E0460 addpd xmm0,xmm1 002E0464 addpd xmm0,xmmword ptr ds:[2E00C0h] 002E046C movapd xmm1,xmmword ptr ds:[2E0060h] 002E0474 mulpd xmm0,xmm1 002E0478 mulpd xmm0,xmm0 002E047C addpd xmm0,xmm2 002E0480 mulpd xmm7,xmmword ptr 
ds:[2E00C0h] 002E0488 xorpd xmm2,xmm2 002E048C subpd xmm3,xmm7 002E0490 addpd xmm3,xmm2 002E0494 mulpd xmm3,xmm1 002E0498 mulpd xmm3,xmm3 002E049C addpd xmm3,xmm0 002E04A0 movapd xmm7,xmmword ptr ds:[2E0070h] 002E04A8 movapd xmm0,xmm7 002E04AC subpd xmm0,xmm3 002E04B0 movapd xmm1,xmmword ptr ds:[2E0080h] 002E04B8 mulpd xmm5,xmm1 002E04BC mulpd xmm5,xmm5 002E04C0 mulpd xmm4,xmm1 002E04C4 mulpd xmm4,xmm4 002E04C8 addpd xmm4,xmm5 002E04CC mulpd xmm6,xmm1 002E04D0 mulpd xmm6,xmm6 002E04D4 addpd xmm6,xmm4 002E04D8 movapd xmm2,xmm7 002E04DC subpd xmm2,xmm6 002E04E0 movapd xmm1,xmm2 002E04E4 addpd xmm1,xmm0 002E04E8 mulpd xmm0,xmm0 002E04EC mulpd xmm2,xmm2 002E04F0 addpd xmm2,xmm0 002E04F4 sqrtpd xmm0,xmm2 002E04F8 addpd xmm0,xmm1 002E04FC addpd xmm2,xmm7 002E0500 movsd xmm4,mmword ptr ds:[2E0090h] 002E0508 movapd xmm1,xmm4 002E050C divsd xmm1,xmm2 002E0510 unpckhpd xmm2,xmm2 002E0514 movapd xmm3,xmm4 002E0518 divsd xmm3,xmm2 002E051C unpcklpd xmm1,xmm3 002E0520 mulpd xmm1,xmmword ptr ds:[2E00A0h] 002E0528 addpd xmm1,xmm0 002E052C movapd xmm3,xmmword ptr [esp+0A0h] 002E0535 movapd xmm0,xmm3 002E0539 unpckhpd xmm0,xmm0 002E053D movapd xmm2,xmm3 002E0541 movapd xmm6,xmm3 002E0545 addsd xmm2,xmm0 002E0549 movapd xmm3,xmmword ptr [esp+0B0h] 002E0552 addsd xmm2,xmm3 002E0556 movapd xmm7,xmm3 002E055A xorpd xmm3,xmm3 002E055E ucomisd xmm2,xmm3 002E0562 setnp al 002E0565 sete cl 002E0568 test al,cl 002E056A jne 002E059A 002E0570 movapd xmm5,xmmword ptr [esp+0C0h] 002E0579 movapd xmm2,xmm5 002E057D addpd xmm2,xmm1 002E0581 mulpd xmm1,xmm1 002E0585 mulpd xmm5,xmm5 002E0589 addpd xmm5,xmm1 002E058D sqrtpd xmm5,xmm5 002E0591 addpd xmm5,xmm2 002E0595 jmp 002E059E 002E059A movapd xmm5,xmm1 002E059E movapd xmmword ptr [esp+0C0h],xmm5 002E05A7 movapd xmm2,xmm6 002E05AB addsd xmm2,xmm4 002E05AF ucomisd xmm2,xmm4 002E05B3 xorpd xmm1,xmm1 002E05B7 jae 002E05C1 002E05BD movapd xmm1,xmm2 002E05C1 jb 002E05CB 002E05C7 addsd xmm0,xmm4 002E05CB ucomisd xmm0,xmm4 002E05CF xorpd xmm2,xmm2 
002E05D3 jae 002E05DD 002E05D9 movapd xmm2,xmm0 002E05DD movsd xmm6,xmm1 002E05E1 unpcklpd xmm6,xmm2 002E05E5 movapd xmm1,xmm6 002E05E9 movapd xmm0,xmm7 002E05ED jb 002E05F7 002E05F3 addsd xmm0,xmm4 002E05F7 ucomisd xmm0,mmword ptr ds:[2E00B0h] 002E05FF jae 002E0609 002E0605 movapd xmm3,xmm0 002E0609 movsd xmm7,xmm3 002E060D jb 002E011D 002E0613 movapd xmm0,xmmword ptr [esp+0C0h] 002E061C movlpd qword ptr [esp+0D0h],xmm0 002E0625 fld qword ptr [esp+0D0h] 002E062C lea esp,[ebp-0Ch] 002E062F pop esi 002E0630 pop edi 002E0631 pop ebx 002E0632 pop ebp 002E0633 ret -------------- next part -------------- 002B00B8 push ebp 002B00B9 mov ebp,esp 002B00BB and esp,0FFFFFFF0h 002B00C1 sub esp,540h 002B00C7 mov eax,dword ptr [ebp+8] 002B00CA movsd xmm0,mmword ptr [eax+10h] 002B00CF unpcklpd xmm0,xmm0 002B00D3 movsd xmm1,mmword ptr [eax] 002B00D7 movsd xmm2,mmword ptr [eax+8] 002B00DC unpcklpd xmm2,xmm2 002B00E0 unpcklpd xmm1,xmm1 002B00E4 xorps xmm3,xmm3 002B00E7 movaps xmm4,xmm3 002B00EA movaps xmm5,xmm3 002B00ED movaps xmmword ptr [esp+4F0h],xmm2 002B00F5 movaps xmmword ptr [esp+4E0h],xmm0 002B00FD movaps xmmword ptr [esp+4D0h],xmm1 002B0105 movaps xmmword ptr [esp+4C0h],xmm5 002B010D movaps xmmword ptr [esp+4B0h],xmm3 002B0115 movaps xmmword ptr [esp+4A0h],xmm4 002B011D movaps xmm0,xmmword ptr [esp+4C0h] 002B0125 movaps xmm1,xmmword ptr [esp+4B0h] 002B012D movaps xmm2,xmmword ptr [esp+4A0h] 002B0135 movaps xmm3,xmm1 002B0138 movaps xmm4,xmm1 002B013B shufpd xmm4,xmm4,0 002B0140 movaps xmm5,xmmword ptr [esp+4D0h] 002B0148 subpd xmm5,xmm4 002B014C xorps xmm6,xmm6 002B014F mulpd xmm4,xmm6 002B0153 xorps xmm7,xmm7 002B0156 movaps xmmword ptr [esp+490h],xmm0 002B015E movaps xmm0,xmmword ptr [esp+4F0h] 002B0166 subpd xmm0,xmm4 002B016A movaps xmmword ptr [esp+480h],xmm0 002B0172 movaps xmm0,xmmword ptr [esp+4E0h] 002B017A subpd xmm0,xmm4 002B017E movaps xmm4,xmm1 002B0181 shufpd xmm4,xmm4,3 002B0186 movaps xmmword ptr [esp+470h],xmm0 002B018E movaps xmm0,xmm4 002B0191 mulpd 
xmm0,xmm6 002B0195 subpd xmm5,xmm0 002B0199 movaps xmmword ptr [esp+460h],xmm0 002B01A1 movaps xmm0,xmmword ptr [esp+480h] 002B01A9 subpd xmm0,xmm4 002B01AD movaps xmm4,xmmword ptr [esp+470h] 002B01B5 movaps xmmword ptr [esp+450h],xmm0 002B01BD movaps xmm0,xmmword ptr [esp+460h] 002B01C5 subpd xmm4,xmm0 002B01C9 movaps xmm0,xmm2 002B01CC movsd mmword ptr [esp+448h],xmm0 002B01D5 movaps xmm0,xmm2 002B01D8 shufpd xmm0,xmm0,0 002B01DD movaps xmmword ptr [esp+430h],xmm0 002B01E5 mulpd xmm0,xmm6 002B01E9 subpd xmm5,xmm0 002B01ED movaps xmmword ptr [esp+420h],xmm0 002B01F5 movaps xmm0,xmmword ptr [esp+450h] 002B01FD movaps xmmword ptr [esp+410h],xmm1 002B0205 movaps xmm1,xmmword ptr [esp+420h] 002B020D subpd xmm0,xmm1 002B0211 movapd xmm1,xmmword ptr ds:[2B0010h] 002B0219 movaps xmmword ptr [esp+400h],xmm0 002B0221 movaps xmm0,xmmword ptr [esp+430h] 002B0229 mulpd xmm0,xmm1 002B022D subpd xmm4,xmm0 002B0231 movaps xmm0,xmmword ptr [esp+410h] 002B0239 movlpd qword ptr [esp+520h],xmm0 002B0242 fld qword ptr [esp+520h] 002B0249 call 76719BA1 002B024E imul eax,eax,0Ch 002B0251 mov edx,eax 002B0253 shl edx,4 002B0256 movapd xmm1,xmmword ptr [edx+330010h] 002B025E mov edx,eax 002B0260 or edx,1 002B0263 shl edx,4 002B0266 movapd xmm0,xmmword ptr [edx+330010h] 002B026E movaps xmmword ptr [esp+3F0h],xmm0 002B0276 movapd xmm0,xmmword ptr [edx+330020h] 002B027E or eax,3 002B0281 shl eax,4 002B0284 movaps xmmword ptr [esp+3E0h],xmm0 002B028C movapd xmm0,xmmword ptr [eax+330010h] 002B0294 movaps xmmword ptr [esp+3D0h],xmm0 002B029C movapd xmm0,xmmword ptr [eax+330020h] 002B02A4 movaps xmmword ptr [esp+3C0h],xmm0 002B02AC movapd xmm0,xmmword ptr [eax+330030h] 002B02B4 movaps xmmword ptr [esp+3B0h],xmm0 002B02BC movapd xmm0,xmmword ptr [eax+330040h] 002B02C4 movaps xmmword ptr [esp+3A0h],xmm0 002B02CC movapd xmm0,xmmword ptr [eax+330050h] 002B02D4 movaps xmmword ptr [esp+390h],xmm0 002B02DC movapd xmm0,xmmword ptr [eax+330060h] 002B02E4 movaps xmmword ptr [esp+380h],xmm0 002B02EC 
movapd xmm0,xmmword ptr [eax+330070h] 002B02F4 movaps xmmword ptr [esp+370h],xmm0 002B02FC movapd xmm0,xmmword ptr [eax+330080h] 002B0304 movaps xmmword ptr [esp+360h],xmm0 002B030C movapd xmm0,xmmword ptr [eax+330090h] 002B0314 movaps xmmword ptr [esp+350h],xmm0 002B031C movaps xmm0,xmmword ptr [esp+3E0h] 002B0324 mulpd xmm0,xmm4 002B0328 movaps xmmword ptr [esp+340h],xmm0 002B0330 movaps xmm0,xmmword ptr [esp+3F0h] 002B0338 movaps xmmword ptr [esp+330h],xmm1 002B0340 movaps xmm1,xmmword ptr [esp+400h] 002B0348 mulpd xmm0,xmm1 002B034C movaps xmm1,xmmword ptr [esp+330h] 002B0354 mulpd xmm1,xmm5 002B0358 addpd xmm1,xmm0 002B035C movaps xmm0,xmmword ptr [esp+340h] 002B0364 addpd xmm0,xmm1 002B0368 movaps xmm1,xmmword ptr [esp+3D0h] 002B0370 addpd xmm1,xmm0 002B0374 movaps xmm0,xmm4 002B0377 movaps xmmword ptr [esp+320h],xmm1 002B037F movaps xmm1,xmmword ptr [esp+3A0h] 002B0387 mulpd xmm0,xmm1 002B038B movaps xmm1,xmmword ptr [esp+3B0h] 002B0393 movaps xmmword ptr [esp+310h],xmm0 002B039B movaps xmm0,xmmword ptr [esp+400h] 002B03A3 mulpd xmm1,xmm0 002B03A7 movaps xmm0,xmmword ptr [esp+3C0h] 002B03AF mulpd xmm0,xmm5 002B03B3 addpd xmm0,xmm1 002B03B7 movaps xmm1,xmmword ptr [esp+310h] 002B03BF addpd xmm1,xmm0 002B03C3 movaps xmm0,xmmword ptr [esp+390h] 002B03CB addpd xmm0,xmm1 002B03CF movaps xmm1,xmmword ptr [esp+360h] 002B03D7 mulpd xmm4,xmm1 002B03DB movaps xmm1,xmmword ptr [esp+400h] 002B03E3 movaps xmmword ptr [esp+300h],xmm0 002B03EB movaps xmm0,xmmword ptr [esp+370h] 002B03F3 mulpd xmm1,xmm0 002B03F7 movaps xmm0,xmmword ptr [esp+380h] 002B03FF mulpd xmm5,xmm0 002B0403 addpd xmm5,xmm1 002B0407 addpd xmm5,xmm4 002B040B movaps xmm0,xmmword ptr [esp+350h] 002B0413 addpd xmm0,xmm5 002B0417 movaps xmm1,xmmword ptr [esp+410h] 002B041F movhpd qword ptr [esp+510h],xmm1 002B0428 fld qword ptr [esp+510h] 002B042F call 76719BA1 002B0434 imul eax,eax,0Ch 002B0437 mov edx,eax 002B0439 shl edx,4 002B043C movapd xmm4,xmmword ptr [edx+330010h] 002B0444 mov edx,eax 002B0446 or 
edx,1 002B0449 shl edx,4 002B044C movapd xmm5,xmmword ptr [edx+330010h] 002B0454 movapd xmm1,xmmword ptr [edx+330020h] 002B045C or eax,3 002B045F shl eax,4 002B0462 movaps xmmword ptr [esp+2F0h],xmm0 002B046A movapd xmm0,xmmword ptr [eax+330010h] 002B0472 movaps xmmword ptr [esp+2E0h],xmm0 002B047A movapd xmm0,xmmword ptr [eax+330020h] 002B0482 movaps xmmword ptr [esp+2D0h],xmm0 002B048A movapd xmm0,xmmword ptr [eax+330030h] 002B0492 movaps xmmword ptr [esp+2C0h],xmm0 002B049A movapd xmm0,xmmword ptr [eax+330040h] 002B04A2 movaps xmmword ptr [esp+2B0h],xmm0 002B04AA movapd xmm0,xmmword ptr [eax+330050h] 002B04B2 movaps xmmword ptr [esp+2A0h],xmm0 002B04BA movapd xmm0,xmmword ptr [eax+330060h] 002B04C2 movaps xmmword ptr [esp+290h],xmm0 002B04CA movapd xmm0,xmmword ptr [eax+330070h] 002B04D2 movaps xmmword ptr [esp+280h],xmm0 002B04DA movapd xmm0,xmmword ptr [eax+330080h] 002B04E2 movaps xmmword ptr [esp+270h],xmm0 002B04EA movapd xmm0,xmmword ptr [eax+330090h] 002B04F2 movaps xmmword ptr [esp+260h],xmm0 002B04FA movaps xmm0,xmmword ptr [esp+2F0h] 002B0502 mulpd xmm0,xmm1 002B0506 movaps xmm1,xmmword ptr [esp+300h] 002B050E mulpd xmm1,xmm5 002B0512 movaps xmm5,xmmword ptr [esp+320h] 002B051A mulpd xmm5,xmm4 002B051E addpd xmm5,xmm1 002B0522 addpd xmm5,xmm0 002B0526 movaps xmm0,xmmword ptr [esp+2E0h] 002B052E addpd xmm0,xmm5 002B0532 movaps xmm1,xmmword ptr [esp+2F0h] 002B053A movaps xmm4,xmmword ptr [esp+2B0h] 002B0542 mulpd xmm1,xmm4 002B0546 movaps xmm4,xmmword ptr [esp+300h] 002B054E movaps xmm5,xmmword ptr [esp+2C0h] 002B0556 mulpd xmm4,xmm5 002B055A movaps xmm5,xmmword ptr [esp+320h] 002B0562 movaps xmmword ptr [esp+250h],xmm0 002B056A movaps xmm0,xmmword ptr [esp+2D0h] 002B0572 mulpd xmm5,xmm0 002B0576 addpd xmm5,xmm4 002B057A addpd xmm5,xmm1 002B057E movaps xmm0,xmmword ptr [esp+2A0h] 002B0586 addpd xmm0,xmm5 002B058A movaps xmm1,xmmword ptr [esp+2F0h] 002B0592 movaps xmm4,xmmword ptr [esp+270h] 002B059A mulpd xmm1,xmm4 002B059E movaps xmm4,xmmword ptr 
[esp+300h] 002B05A6 movaps xmm5,xmmword ptr [esp+280h] 002B05AE mulpd xmm4,xmm5 002B05B2 movaps xmm5,xmmword ptr [esp+320h] 002B05BA movaps xmmword ptr [esp+240h],xmm0 002B05C2 movaps xmm0,xmmword ptr [esp+290h] 002B05CA mulpd xmm5,xmm0 002B05CE addpd xmm5,xmm4 002B05D2 addpd xmm5,xmm1 002B05D6 movaps xmm0,xmmword ptr [esp+260h] 002B05DE addpd xmm0,xmm5 002B05E2 movlpd qword ptr [esp+530h],xmm2 002B05EB fld qword ptr [esp+530h] 002B05F2 call 76719BA1 002B05F7 imul eax,eax,0Ch 002B05FA mov edx,eax 002B05FC shl edx,4 002B05FF movapd xmm1,xmmword ptr [edx+3300D0h] 002B0607 mov edx,eax 002B0609 or edx,1 002B060C shl edx,4 002B060F movapd xmm4,xmmword ptr [edx+3300D0h] 002B0617 movapd xmm5,xmmword ptr [edx+3300E0h] 002B061F or eax,3 002B0622 shl eax,4 002B0625 movaps xmmword ptr [esp+230h],xmm0 002B062D movapd xmm0,xmmword ptr [eax+3300D0h] 002B0635 movaps xmmword ptr [esp+220h],xmm0 002B063D movapd xmm0,xmmword ptr [eax+3300E0h] 002B0645 movaps xmmword ptr [esp+210h],xmm0 002B064D movapd xmm0,xmmword ptr [eax+3300F0h] 002B0655 movaps xmmword ptr [esp+200h],xmm0 002B065D movapd xmm0,xmmword ptr [eax+330100h] 002B0665 movaps xmmword ptr [esp+1F0h],xmm0 002B066D movapd xmm0,xmmword ptr [eax+330110h] 002B0675 movaps xmmword ptr [esp+1E0h],xmm0 002B067D movapd xmm0,xmmword ptr [eax+330120h] 002B0685 movaps xmmword ptr [esp+1D0h],xmm0 002B068D movapd xmm0,xmmword ptr [eax+330130h] 002B0695 movaps xmmword ptr [esp+1C0h],xmm0 002B069D movapd xmm0,xmmword ptr [eax+330140h] 002B06A5 movaps xmmword ptr [esp+1B0h],xmm0 002B06AD movapd xmm0,xmmword ptr [eax+330150h] 002B06B5 movaps xmmword ptr [esp+1A0h],xmm0 002B06BD movaps xmm0,xmmword ptr [esp+230h] 002B06C5 mulpd xmm0,xmm5 002B06C9 movaps xmm5,xmmword ptr [esp+240h] 002B06D1 mulpd xmm5,xmm4 002B06D5 movaps xmm4,xmmword ptr [esp+250h] 002B06DD mulpd xmm4,xmm1 002B06E1 addpd xmm4,xmm5 002B06E5 addpd xmm4,xmm0 002B06E9 movaps xmm0,xmmword ptr [esp+220h] 002B06F1 addpd xmm0,xmm4 002B06F5 movaps xmm1,xmmword ptr [esp+230h] 002B06FD 
movaps xmm4,xmmword ptr [esp+1F0h] 002B0705 mulpd xmm1,xmm4 002B0709 movaps xmm4,xmmword ptr [esp+240h] 002B0711 movaps xmm5,xmmword ptr [esp+200h] 002B0719 mulpd xmm4,xmm5 002B071D movaps xmm5,xmmword ptr [esp+250h] 002B0725 movaps xmmword ptr [esp+190h],xmm0 002B072D movaps xmm0,xmmword ptr [esp+210h] 002B0735 mulpd xmm5,xmm0 002B0739 addpd xmm5,xmm4 002B073D addpd xmm5,xmm1 002B0741 movaps xmm0,xmmword ptr [esp+1E0h] 002B0749 addpd xmm0,xmm5 002B074D movaps xmm1,xmmword ptr [esp+230h] 002B0755 movaps xmm4,xmmword ptr [esp+1B0h] 002B075D mulpd xmm1,xmm4 002B0761 movaps xmm4,xmmword ptr [esp+240h] 002B0769 movaps xmm5,xmmword ptr [esp+1C0h] 002B0771 mulpd xmm4,xmm5 002B0775 movaps xmm5,xmmword ptr [esp+250h] 002B077D movaps xmmword ptr [esp+180h],xmm0 002B0785 movaps xmm0,xmmword ptr [esp+1D0h] 002B078D mulpd xmm5,xmm0 002B0791 addpd xmm5,xmm4 002B0795 addpd xmm5,xmm1 002B0799 movaps xmm0,xmmword ptr [esp+1A0h] 002B07A1 addpd xmm0,xmm5 002B07A5 movapd xmm1,xmmword ptr ds:[2B0020h] 002B07AD movaps xmm4,xmmword ptr [esp+190h] 002B07B5 mulpd xmm4,xmm1 002B07B9 mulpd xmm4,xmm4 002B07BD movaps xmm5,xmmword ptr [esp+180h] 002B07C5 mulpd xmm5,xmm1 002B07C9 mulpd xmm5,xmm5 002B07CD movaps xmmword ptr [esp+170h],xmm0 002B07D5 mulpd xmm0,xmm1 002B07D9 mulpd xmm0,xmm0 002B07DD addpd xmm4,xmm5 002B07E1 addpd xmm4,xmm0 002B07E5 movapd xmm0,xmmword ptr ds:[2B0030h] 002B07ED movaps xmm1,xmm0 002B07F0 subpd xmm1,xmm4 002B07F4 movapd xmm4,xmmword ptr ds:[2B0040h] 002B07FC movaps xmm5,xmmword ptr [esp+190h] 002B0804 addpd xmm5,xmm4 002B0808 movapd xmm4,xmmword ptr ds:[2B0050h] 002B0810 movaps xmmword ptr [esp+160h],xmm0 002B0818 movaps xmm0,xmmword ptr [esp+180h] 002B0820 addpd xmm0,xmm4 002B0824 movaps xmm4,xmmword ptr [esp+170h] 002B082C mulpd xmm4,xmm6 002B0830 movaps xmmword ptr [esp+150h],xmm0 002B0838 movaps xmm0,xmm5 002B083B subpd xmm0,xmm4 002B083F mulpd xmm5,xmm6 002B0843 movaps xmm4,xmmword ptr [esp+170h] 002B084B addpd xmm5,xmm4 002B084F movapd xmm4,xmmword ptr 
ds:[2B0060h] 002B0857 movaps xmmword ptr [esp+140h],xmm0 002B085F mulpd xmm0,xmm4 002B0863 movaps xmmword ptr [esp+130h],xmm0 002B086B movapd xmm0,xmmword ptr ds:[2B0070h] 002B0873 movaps xmmword ptr [esp+120h],xmm0 002B087B movaps xmm0,xmmword ptr [esp+150h] 002B0883 movaps xmmword ptr [esp+110h],xmm1 002B088B movaps xmm1,xmmword ptr [esp+120h] 002B0893 mulpd xmm0,xmm1 002B0897 movaps xmm1,xmmword ptr [esp+130h] 002B089F addpd xmm0,xmm1 002B08A3 movaps xmm1,xmmword ptr [esp+140h] 002B08AB movaps xmmword ptr [esp+100h],xmm0 002B08B3 movaps xmm0,xmmword ptr [esp+120h] 002B08BB mulpd xmm1,xmm0 002B08BF movaps xmm0,xmmword ptr [esp+150h] 002B08C7 mulpd xmm0,xmm4 002B08CB subpd xmm0,xmm1 002B08CF movaps xmm1,xmm5 002B08D2 mulpd xmm1,xmm6 002B08D6 addpd xmm1,xmm0 002B08DA mulpd xmm0,xmm6 002B08DE subpd xmm5,xmm0 002B08E2 movaps xmm0,xmmword ptr [esp+100h] 002B08EA addpd xmm0,xmm6 002B08EE addpd xmm1,xmm6 002B08F2 addpd xmm5,xmm6 002B08F6 movapd xmm4,xmmword ptr ds:[2B0080h] 002B08FE mulpd xmm0,xmm4 002B0902 mulpd xmm0,xmm0 002B0906 mulpd xmm1,xmm4 002B090A mulpd xmm1,xmm1 002B090E mulpd xmm5,xmm4 002B0912 mulpd xmm5,xmm5 002B0916 addpd xmm0,xmm1 002B091A addpd xmm0,xmm5 002B091E movaps xmm1,xmmword ptr [esp+160h] 002B0926 subpd xmm1,xmm0 002B092A movaps xmm0,xmmword ptr [esp+110h] 002B0932 mulpd xmm0,xmm0 002B0936 movaps xmm4,xmm1 002B0939 mulpd xmm4,xmm4 002B093D addpd xmm0,xmm4 002B0941 movaps xmm4,xmm0 002B0944 movaps xmm5,xmmword ptr [esp+160h] 002B094C addpd xmm4,xmm5 002B0950 movaps xmm6,xmm4 002B0953 movsd xmm5,mmword ptr ds:[2B0090h] 002B095B movaps xmmword ptr [esp+0F0h],xmm0 002B0963 movaps xmm0,xmm5 002B0966 divsd xmm0,xmm6 002B096A unpckhpd xmm4,xmm4 002B096E movaps xmm6,xmm5 002B0971 divsd xmm6,xmm4 002B0975 unpcklpd xmm0,xmm6 002B0979 movapd xmm4,xmmword ptr ds:[2B00A0h] 002B0981 mulpd xmm0,xmm4 002B0985 movaps xmm4,xmmword ptr [esp+110h] 002B098D addpd xmm4,xmm1 002B0991 movaps xmm1,xmmword ptr [esp+0F0h] 002B0999 sqrtpd xmm6,xmm1 002B099D addpd xmm6,xmm4 
002B09A1 addpd xmm0,xmm6 002B09A5 movaps xmm4,xmmword ptr [esp+410h] 002B09AD unpckhpd xmm4,xmm4 002B09B1 movaps xmm6,xmm3 002B09B4 addsd xmm6,xmm4 002B09B8 movsd xmm1,mmword ptr [esp+448h] 002B09C1 addsd xmm1,xmm6 002B09C5 movaps xmm6,xmm0 002B09C8 mulpd xmm6,xmm6 002B09CC movaps xmmword ptr [esp+0E0h],xmm0 002B09D4 movaps xmm0,xmmword ptr [esp+490h] 002B09DC mulpd xmm0,xmm0 002B09E0 addpd xmm0,xmm6 002B09E4 movaps xmm6,xmmword ptr [esp+490h] 002B09EC movaps xmmword ptr [esp+0D0h],xmm0 002B09F4 movaps xmm0,xmmword ptr [esp+0E0h] 002B09FC addpd xmm6,xmm0 002B0A00 movaps xmm0,xmmword ptr [esp+0D0h] 002B0A08 sqrtpd xmm0,xmm0 002B0A0C addpd xmm0,xmm6 002B0A10 addsd xmm3,xmm5 002B0A14 movaps xmm6,xmm4 002B0A17 addsd xmm6,xmm5 002B0A1B movaps xmmword ptr [esp+0C0h],xmm0 002B0A23 movsd xmm0,mmword ptr [esp+448h] 002B0A2C addsd xmm0,xmm5 002B0A30 ucomisd xmm1,xmm7 002B0A34 setnp cl 002B0A37 sete ch 002B0A3A test cl,ch 002B0A3C movaps xmm1,xmmword ptr [esp+0E0h] 002B0A44 movsd mmword ptr [esp+0B8h],xmm7 002B0A4D movsd mmword ptr [esp+0B0h],xmm0 002B0A56 movaps xmmword ptr [esp+0A0h],xmm2 002B0A5E movsd mmword ptr [esp+98h],xmm4 002B0A67 movsd mmword ptr [esp+90h],xmm6 002B0A70 movsd mmword ptr [esp+88h],xmm5 002B0A79 movsd mmword ptr [esp+80h],xmm3 002B0A82 movaps xmmword ptr [esp+70h],xmm1 002B0A87 jne 002B0A9A 002B0A8D movaps xmm0,xmmword ptr [esp+0C0h] 002B0A95 movaps xmmword ptr [esp+70h],xmm0 002B0A9A movaps xmm0,xmmword ptr [esp+70h] 002B0A9F movsd xmm1,mmword ptr [esp+80h] 002B0AA8 movsd xmm2,mmword ptr [esp+88h] 002B0AB1 ucomisd xmm1,xmm2 002B0AB5 movsd xmm3,mmword ptr [esp+0B8h] 002B0ABE movaps xmmword ptr [esp+60h],xmm0 002B0AC3 movsd mmword ptr [esp+58h],xmm3 002B0AC9 jae 002B0ADE 002B0ACF movsd xmm0,mmword ptr [esp+80h] 002B0AD8 movsd mmword ptr [esp+58h],xmm0 002B0ADE movsd xmm0,mmword ptr [esp+58h] 002B0AE4 movsd xmm1,mmword ptr [esp+90h] 002B0AED movsd mmword ptr [esp+50h],xmm0 002B0AF3 movsd mmword ptr [esp+48h],xmm1 002B0AF9 jae 002B0B0E 002B0AFF movsd 
xmm0,mmword ptr [esp+98h] 002B0B08 movsd mmword ptr [esp+48h],xmm0 002B0B0E movsd xmm0,mmword ptr [esp+48h] 002B0B14 movsd xmm1,mmword ptr [esp+88h] 002B0B1D ucomisd xmm0,xmm1 002B0B21 movsd xmm2,mmword ptr [esp+0B8h] 002B0B2A movsd mmword ptr [esp+40h],xmm0 002B0B30 movsd mmword ptr [esp+38h],xmm2 002B0B36 jae 002B0B48 002B0B3C movsd xmm0,mmword ptr [esp+40h] 002B0B42 movsd mmword ptr [esp+38h],xmm0 002B0B48 movsd xmm0,mmword ptr [esp+38h] 002B0B4E movaps xmm1,xmmword ptr [esp+410h] 002B0B56 movsd xmm2,mmword ptr [esp+50h] 002B0B5C movsd xmm1,xmm2 002B0B60 unpcklpd xmm1,xmm0 002B0B64 movsd xmm0,mmword ptr [esp+0B0h] 002B0B6D movaps xmmword ptr [esp+20h],xmm1 002B0B72 movsd mmword ptr [esp+18h],xmm0 002B0B78 jae 002B0B8D 002B0B7E movsd xmm0,mmword ptr [esp+448h] 002B0B87 movsd mmword ptr [esp+18h],xmm0 002B0B8D movsd xmm0,mmword ptr [esp+18h] 002B0B93 movsd xmm1,mmword ptr ds:[2B00B0h] 002B0B9B ucomisd xmm0,xmm1 002B0B9F movsd xmm1,mmword ptr [esp+0B8h] 002B0BA8 movsd mmword ptr [esp+10h],xmm0 002B0BAE movsd mmword ptr [esp+8],xmm1 002B0BB4 jae 002B0BC6 002B0BBA movsd xmm0,mmword ptr [esp+10h] 002B0BC0 movsd mmword ptr [esp+8],xmm0 002B0BC6 movsd xmm0,mmword ptr [esp+8] 002B0BCC movaps xmm1,xmmword ptr [esp+0A0h] 002B0BD4 movsd xmm1,xmm0 002B0BD8 movaps xmm0,xmmword ptr [esp+60h] 002B0BDD movaps xmm2,xmmword ptr [esp+20h] 002B0BE2 movaps xmmword ptr [esp+4B0h],xmm2 002B0BEA movaps xmmword ptr [esp+4C0h],xmm0 002B0BF2 movaps xmmword ptr [esp+4A0h],xmm1 002B0BFA jb 002B011D 002B0C00 jmp 002B0C05 002B0C05 movaps xmm0,xmmword ptr [esp+60h] 002B0C0A movaps xmm1,xmmword ptr [esp+60h] 002B0C0F movlpd qword ptr [esp+500h],xmm1 002B0C18 fld qword ptr [esp+500h] 002B0C1F movsd mmword ptr [esp],xmm0 002B0C24 mov esp,ebp 002B0C26 pop ebp 002B0C27 ret
Craig Topper
2013-Jul-19 06:59 UTC
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
The calls represent the MSVC _ftol2 function I think.

On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com> wrote:
> (Changing subject line as diagnosis has changed)
>
> I'm attaching the compiled code that I've been getting, both with
> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with
> CodeGenOpt::None, but that seems to be because ECX isn't being used - it
> still gets set to 0x7fffffff by one of the calls to 76719BA1
>
> I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef. I
> was thinking an optimization might be removing it, but I don't get the
> sqrtpd instruction even if the createJIT optimization level turned off.
>
> I am trying this with the Release 3.3 code - I'll try it with trunk and
> see if I get a different result there. Maybe there was a recent commit for
> this.
>
> --
> Peter N

--
~Craig
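For readers following the diagnosis: the `fptoui double ... to i32` in the IR is what ends up as a helper call on this 32-bit target. The block below is only a minimal sketch of the conversion's semantics, assuming the helper behaves like a truncating double-to-64-bit-integer conversion whose low 32 bits are kept. `fptoui_via_i64` is a hypothetical name for illustration, not an MSVC or LLVM function, and the sketch says nothing about which registers the real helper clobbers.

```cpp
#include <cstdint>

// Hypothetical helper (illustration only, not an MSVC or LLVM API).
// Models `fptoui double to i32` when it is lowered through a 64-bit
// signed conversion. Only meaningful for inputs in [0, 2^32); the IR
// instruction is undefined outside that range as well.
inline std::uint32_t fptoui_via_i64(double x) {
    // Truncate toward zero into a 64-bit signed integer.
    std::int64_t wide = static_cast<std::int64_t>(x);
    // Keep the low 32 bits as the unsigned result.
    return static_cast<std::uint32_t>(wide);
}
```

Going through a 64-bit intermediate is what lets values at or above 2^31 survive the conversion, which a plain 32-bit signed truncation could not represent.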
Oh, excellent point, I agree. My bad. Now that I'm not assuming those are the sqrt, I see the sqrtpd's in the output. Also there are three fptoui's and there are 3 call instances.

(Changing subject line again.)

Now it looks like it's bug #13862

On 19/07/2013 4:51 PM, Craig Topper wrote:
> I think those calls correspond to this
>
> %110 = fptoui double %109 to i32
>
> The calls are followed by an imul with 12 which matches up with what
> occurs right after the fptoui in the IR.
>
> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com> wrote:
>> Yes, that is the result of module-dump.ll
>>
>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>> Does this correspond to one of the .ll files you sent earlier?
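The "imul with 12" observation can be checked against the listing: after the first call the code does `imul eax,eax,0Ch`, `shl edx,4`, then `movapd xmm4,xmmword ptr [edx+330010h]`. The sketch below reproduces that address arithmetic and the 16-byte alignment that `movapd` requires. The function and constant names are made up for illustration, and pairing the reported ECX value 0x7fffffff with this particular displacement is an assumption, since the displacement in the original crash report was not given.

```cpp
#include <cstdint>

// Displacement taken from the listing (movapd xmm4,[edx+330010h]).
constexpr std::uint32_t kTableBase = 0x330010u;

// Address formed from the converted index: index * 12, shifted left by 4
// (16-byte vector elements), plus the displacement.
constexpr std::uint32_t element_address(std::uint32_t index) {
    return ((index * 12u) << 4) + kTableBase;
}

// movapd faults unless its memory operand is 16-byte aligned.
constexpr bool is_aligned16(std::uint32_t addr) {
    return (addr & 0xFu) == 0u;
}

// Every address built from a real index stays aligned ...
static_assert(is_aligned16(element_address(0)), "index 0 is aligned");
static_assert(is_aligned16(element_address(7)), "index 7 is aligned");
// ... but a base register clobbered to 0x7fffffff (assumed pairing with
// this displacement) cannot produce an aligned address.
static_assert(!is_aligned16(0x7fffffffu + kTableBase), "clobbered base is unaligned");
```

Note that `element_address(1)` comes out as 0x3300D0, which matches the displacement used after the second call in the listing.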
Try adding ECX to the Defs of this part of lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a Windows machine to test myself.

    let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
      def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
                          "# win32 fptoui",
                          [(X86WinFTOL RFP32:$src)]>,
                        Requires<[In32BitMode]>;

      def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
                          "# win32 fptoui",
                          [(X86WinFTOL RFP64:$src)]>,
                        Requires<[In32BitMode]>;
    }

On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com> wrote:
> Oh, excellent point, I agree. My bad. Now that I'm not assuming those
> are the sqrt, I see the sqrtpd's in the output. Also there are three
> fptoui's and there are 3 call instances.
>
> (Changing subject line again.)
>
> Now it looks like it's bug #13862
>
> On 19/07/2013 4:51 PM, Craig Topper wrote:
>
> I think those calls correspond to this
>
> %110 = fptoui double %109 to i32
>
> The calls are followed by an imul with 12 which matches up with what
> occurs right after the fptoui in the IR.
>
>
> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com> wrote:
>
>> Yes, that is the result of module-dump.ll
>>
>>
>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>
>> Does this correspond to one of the .ll files you sent earlier?
>>
>>
>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com> wrote:
>>
>>> (Changing subject line as diagnosis has changed)
>>>
>>> I'm attaching the compiled code that I've been getting, both with
>>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with
>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it
>>> still gets set to 0x7fffffff by one of the calls to 76719BA1
>>>
>>> I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef.
>>> I was thinking an optimization might be removing it, but I don't get the
>>> sqrtpd instruction even if the createJIT optimization level turned off.
>>>
>>> I am trying this with the Release 3.3 code - I'll try it with trunk and
>>> see if I get a different result there. Maybe there was a recent commit for
>>> this.
>>>
>>> --
>>> Peter N
>>>
>>> On 19/07/2013 4:00 PM, Craig Topper wrote:
>>>
>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE and
>>> I end up with SSE instructions (including sqrtpd) if I don't disable it.
>>>
>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> Is there something specifically required to enable SSE? If it's not
>>>> detected as available (based from the target triple?) then I don't think we
>>>> enable it specifically.
>>>>
>>>> Also it seems that it should handle converting to/from the vector
>>>> types, although I can see it getting confused about needing to do that if
>>>> it thinks SSE isn't available at all.
>>>>
>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>
>>>> Hmm, maybe SSE isn't being enabled so it's falling back to emulating
>>>> sqrt?
>>>>
>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>>>>
>>>>> In the disassembly, I'm seeing three cases of
>>>>> call 76719BA1
>>>>>
>>>>> I am assuming this is the sqrt function as this is the only function
>>>>> called in the LLVM IR.
>>>>>
>>>>> The code at 76719BA1 is:
>>>>>
>>>>> 76719BA1  push   ebp
>>>>> 76719BA2  mov    ebp,esp
>>>>> 76719BA4  sub    esp,20h
>>>>> 76719BA7  and    esp,0FFFFFFF0h
>>>>> 76719BAA  fld    st(0)
>>>>> 76719BAC  fst    dword ptr [esp+18h]
>>>>> 76719BB0  fistp  qword ptr [esp+10h]
>>>>> 76719BB4  fild   qword ptr [esp+10h]
>>>>> 76719BB8  mov    edx,dword ptr [esp+18h]
>>>>> 76719BBC  mov    eax,dword ptr [esp+10h]
>>>>> 76719BC0  test   eax,eax
>>>>> 76719BC2  je     76719DCF
>>>>> 76719BC8  fsubp  st(1),st
>>>>> 76719BCA  test   edx,edx
>>>>> 76719BCC  js     7671F9DB
>>>>> 76719BD2  fstp   dword ptr [esp]
>>>>> 76719BD5  mov    ecx,dword ptr [esp]
>>>>> 76719BD8  add    ecx,7FFFFFFFh
>>>>> 76719BDE  sbb    eax,0
>>>>> 76719BE1  mov    edx,dword ptr [esp+14h]
>>>>> 76719BE5  sbb    edx,0
>>>>> 76719BE8  leave
>>>>> 76719BE9  ret
>>>>>
>>>>> As you can see at 76719BD5, it modifies ECX.
>>>>>
>>>>> I don't know that this is the sqrtpd function (for example, I'm not
>>>>> seeing any SSE instructions here?) but whatever it is, it's being called
>>>>> from the IR I attached earlier, and is modifying ECX under some
>>>>> circumstances.
>>>>>
>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>>>>
>>>>> That should map directly to sqrtpd which can't modify ecx.
>>>>>
>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>
>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>>>>
>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>>>>
>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things
>>>>>> prefixed with "llvm.x86".
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>
>>>>>>> After stepping through the produced assembly, I believe I have a
>>>>>>> culprit.
>>>>>>>
>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of
>>>>>>> ECX - while the produced code is expecting it to still contain its previous
>>>>>>> value.
>>>>>>>
>>>>>>> Peter N
>>>>>>>
>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>>>>
>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>
>>>>>>> This is before any optimization passes are applied. There are two
>>>>>>> separate modules in existence at the time, and there are no guarantees
>>>>>>> about the order the surrounding code calls those functions, so there may be
>>>>>>> some interaction between them? There shouldn't be, they don't refer to any
>>>>>>> common memory etc. There is no multi-threading occurring.
>>>>>>>
>>>>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>>>>> called with
>>>>>>> - func_params 0x0018f3b0 double [3]
>>>>>>>   [0x0] -11.339976634695301 double
>>>>>>>   [0x1] -9.7504239056205506 double
>>>>>>>   [0x2] -5.2900856817382804 double
>>>>>>> at the time of the exception.
>>>>>>>
>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the
>>>>>>> non-intrinsic functions referred to in these modules are the standard
>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib
>>>>>>> double asin( double ) ).
>>>>>>>
>>>>>>> Hopefully this is reproducible for you.
>>>>>>>
>>>>>>> --
>>>>>>> PeterN
>>>>>>>
>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>
>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>
>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>
>>>>>>>> Also further testing found me getting the same behavior with other
>>>>>>>> SIMD instructions. The common factor is in each case, ECX is set to
>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset.
>>>>>>>>
>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>>>>> optimization passes.
>>>>>>>>
>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>
>>>>>>>> Peter N
>>>>>>>>
>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>
>>>>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>>>>> issue:
>>>>>>>>>
>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>
>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>
>>>>>>>>> Solomon
>>>>>>>>>
>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>>>>> attempting to access unaligned memory with an SSE2 instruction.
>>>>>>>>>> However this only happens under certain conditions that seem (but may not
>>>>>>>>>> be) related to the stack's state on calling the function.
>>>>>>>>>>
>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>> generate a JIT generated function. This function is primarily mathematical,
>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well
>>>>>>>>>> as a few SSE2 intrinsics).
>>>>>>>>>>
>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the
>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access
>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the
>>>>>>>>>> SSE fault raising mechanism appears.
>>>>>>>>>>
>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>> memory access.
>>>>>>>>>> ECX is always set to 0x7fffffff - however I don't know if this is
>>>>>>>>>> part of the SSE error reporting process or is part of the situation causing
>>>>>>>>>> the error.
>>>>>>>>>>
>>>>>>>>>> I haven't worked out exactly what code path etc is causing this
>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Peter N
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
~Craig
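The Defs change suggested at the top of this message can be illustrated with a small model. The C++ sketch below is not LLVM code: Instr, undeclaredClobbers, winFtolBefore and winFtolAfter are hypothetical names made up for illustration, and it assumes the original Defs list was [EAX, EDX, EFLAGS], i.e. the suggested list minus ECX. It only shows the idea that a register the helper really overwrites (ECX, per the disassembly at 76719BD5) but that the pseudo-instruction does not declare is one the register allocator may keep a live value in across the call.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified model of a pseudo-instruction: the registers it
// declares as clobbered (its "Defs") and the registers the code it expands
// to really overwrites. This is an illustration, not an LLVM data structure.
struct Instr {
    std::string name;
    std::set<std::string> declaredDefs;
    std::set<std::string> actualClobbers;
};

// Returns the registers that are overwritten without being declared.
// A register allocator that trusts declaredDefs may keep a live value in
// any of these registers across the instruction.
std::set<std::string> undeclaredClobbers(const Instr &instr) {
    std::set<std::string> missing;
    for (const auto &reg : instr.actualClobbers) {
        if (instr.declaredDefs.count(reg) == 0) {
            missing.insert(reg);
        }
    }
    return missing;
}

// Assumed state before the suggested change: ECX is absent from Defs, while
// the helper at 76719BA1 writes EAX, EDX, ECX and the flags.
Instr winFtolBefore() {
    return Instr{"WIN_FTOL_64",
                 {"EAX", "EDX", "EFLAGS"},
                 {"EAX", "EDX", "ECX", "EFLAGS"}};
}

// State after the suggested change: ECX is listed in Defs.
Instr winFtolAfter() {
    return Instr{"WIN_FTOL_64",
                 {"EAX", "EDX", "ECX", "EFLAGS"},
                 {"EAX", "EDX", "ECX", "EFLAGS"}};
}
```

In this model, undeclaredClobbers(winFtolBefore()) contains only ECX, and undeclaredClobbers(winFtolAfter()) is empty, which is the condition the suggested patch is aiming for. Whether the real patch fixes the crash still needs testing on Windows.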