Try adding ECX to the Defs of this part of
lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a
Windows machine to test myself.

let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
  def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
                      "# win32 fptoui",
                      [(X86WinFTOL RFP32:$src)]>,
                    Requires<[In32BitMode]>;

  def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
                      "# win32 fptoui",
                      [(X86WinFTOL RFP64:$src)]>,
                    Requires<[In32BitMode]>;
}

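To see why the Defs list matters here, a toy model may help. The sketch below is not LLVM code: `run_call`, the register dictionary and the save/restore strategy are all made up for illustration (a real register allocator would more likely keep the live value out of ECX altogether). It only shows that a register the callee overwrites, but the instruction does not declare, ends up holding the callee's value.

```python
# Toy model only; every name here is hypothetical and none of it is LLVM API.
def run_call(declared_defs, actual_clobbers):
    """Return the value found in ECX after a call-like pseudo instruction."""
    regs = {"EAX": 0, "ECX": 1000, "EDX": 0}  # ECX holds a live base address
    live_across_call = {"ECX"}

    # The "allocator" only protects live registers that the instruction
    # declares it overwrites (its Defs list).
    saved = {r: regs[r] for r in live_across_call if r in declared_defs}

    # The callee overwrites whatever it really overwrites.
    for r in actual_clobbers:
        regs[r] = 0x7FFFFFFF

    regs.update(saved)  # restore only the registers that were protected
    return regs["ECX"]


ACTUAL = {"EAX", "ECX", "EDX"}
print(hex(run_call({"EAX", "EDX"}, ACTUAL)))         # ECX missing from Defs
print(hex(run_call({"EAX", "EDX", "ECX"}, ACTUAL)))  # ECX listed in Defs
```

With ECX missing from the declared set, the first call returns the callee's 0x7fffffff instead of the original 1000; with ECX listed, the original value survives.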
On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman <peter at uformia.com>
wrote:
> Oh, excellent point, I agree. My bad. Now that I'm not assuming those
> are the sqrt, I see the sqrtpd's in the output. Also there are three
> fptoui's and there are 3 call instances.
>
> (Changing subject line again.)
>
> Now it looks like it's bug #13862
>
> On 19/07/2013 4:51 PM, Craig Topper wrote:
>
> I think those calls correspond to this
>
> %110 = fptoui double %109 to i32
>
> The calls are followed by an imul with 12 which matches up with what
> occurs right after the fptoui in the IR.
>
>
> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman <peter at uformia.com> wrote:
>
>> Yes, that is the result of module-dump.ll
>>
>>
>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>
>> Does this correspond to one of the .ll files you sent earlier?
>>
>>
>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman <peter at uformia.com> wrote:
>>
>>> (Changing subject line as diagnosis has changed)
>>>
>>> I'm attaching the compiled code that I've been getting, both with
>>> CodeGenOpt::Default and CodeGenOpt::None. The crash isn't occurring with
>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it
>>> still gets set to 0x7fffffff by one of the calls to 76719BA1.
>>>
>>> I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef.
>>> I was thinking an optimization might be removing it, but I don't get the
>>> sqrtpd instruction even if the createJIT optimization level is turned
>>> off.
>>>
>>> I am trying this with the Release 3.3 code - I'll try it with trunk and
>>> see if I get a different result there. Maybe there was a recent commit for
>>> this.
>>>
>>> --
>>> Peter N
>>>
>>> On 19/07/2013 4:00 PM, Craig Topper wrote:
>>>
>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE and
>>> I end up with SSE instructions (including sqrtpd) if I don't disable it.
>>>
>>>
>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> Is there something specifically required to enable SSE? If it's not
>>>> detected as available (based on the target triple?) then I don't think we
>>>> enable it specifically.
>>>>
>>>> Also it seems that it should handle converting to/from the vector
>>>> types, although I can see it getting confused about needing to do that if
>>>> it thinks SSE isn't available at all.
>>>>
>>>>
>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>
>>>> Hmm, maybe SSE isn't being enabled so it's falling back to emulating
>>>> sqrt?
>>>>
>>>>
>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman <peter at uformia.com> wrote:
>>>>
>>>>> In the disassembly, I'm seeing three cases of
>>>>> call 76719BA1
>>>>>
>>>>> I am assuming this is the sqrt function as this is the only function
>>>>> called in the LLVM IR.
>>>>>
>>>>> The code at 76719BA1 is:
>>>>>
>>>>> 76719BA1 push ebp
>>>>> 76719BA2 mov ebp,esp
>>>>> 76719BA4 sub esp,20h
>>>>> 76719BA7 and esp,0FFFFFFF0h
>>>>> 76719BAA fld st(0)
>>>>> 76719BAC fst dword ptr [esp+18h]
>>>>> 76719BB0 fistp qword ptr [esp+10h]
>>>>> 76719BB4 fild qword ptr [esp+10h]
>>>>> 76719BB8 mov edx,dword ptr [esp+18h]
>>>>> 76719BBC mov eax,dword ptr [esp+10h]
>>>>> 76719BC0 test eax,eax
>>>>> 76719BC2 je 76719DCF
>>>>> 76719BC8 fsubp st(1),st
>>>>> 76719BCA test edx,edx
>>>>> 76719BCC js 7671F9DB
>>>>> 76719BD2 fstp dword ptr [esp]
>>>>> 76719BD5 mov ecx,dword ptr [esp]
>>>>> 76719BD8 add ecx,7FFFFFFFh
>>>>> 76719BDE sbb eax,0
>>>>> 76719BE1 mov edx,dword ptr [esp+14h]
>>>>> 76719BE5 sbb edx,0
>>>>> 76719BE8 leave
>>>>> 76719BE9 ret
>>>>>
>>>>>
>>>>> As you can see at 76719BD5, it modifies ECX .
>>>>>
>>>>> I don't know that this is the sqrtpd function (for example, I'm not
>>>>> seeing any SSE instructions here?) but whatever it is, it's being called
>>>>> from the IR I attached earlier, and is modifying ECX under some
>>>>> circumstances.
>>>>>
>>>>>
>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote:
>>>>>
>>>>> That should map directly to sqrtpd which can't modify ecx.
>>>>>
>>>>>
>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>
>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd
>>>>>>
>>>>>>
>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote:
>>>>>>
>>>>>> What is "frep.x86.sse2.sqrt.pd"? I'm only familiar with things
>>>>>> prefixed with "llvm.x86".
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>
>>>>>>> After stepping through the produced assembly, I believe I have a
>>>>>>> culprit.
>>>>>>>
>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of
>>>>>>> ECX - while the produced code is expecting it to still contain its
>>>>>>> previous value.
>>>>>>>
>>>>>>> Peter N
>>>>>>>
>>>>>>>
>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote:
>>>>>>>
>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>
>>>>>>> This is before any optimization passes are applied. There are two
>>>>>>> separate modules in existence at the time, and there are no guarantees
>>>>>>> about the order the surrounding code calls those functions, so there may be
>>>>>>> some interaction between them? There shouldn't be, they don't refer to any
>>>>>>> common memory etc. There is no multi-threading occurring.
>>>>>>>
>>>>>>> The function in module-dump.ll (called crashfunc in this file) is
>>>>>>> called with
>>>>>>> - func_params 0x0018f3b0 double [3]
>>>>>>> [0x0] -11.339976634695301 double
>>>>>>> [0x1] -9.7504239056205506 double
>>>>>>> [0x2] -5.2900856817382804 double
>>>>>>> at the time of the exception.
>>>>>>>
>>>>>>> This is compiled on an "i686-pc-win32" triple. All of the
>>>>>>> non-intrinsic functions referred to in these modules are the standard
>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib
>>>>>>> double asin( double ) ).
>>>>>>>
>>>>>>> Hopefully this is reproducible for you.
>>>>>>>
>>>>>>> --
>>>>>>> PeterN
>>>>>>>
>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote:
>>>>>>>
>>>>>>> Are you able to send any IR for others to reproduce this issue?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I
>>>>>>>> applied the fix to my source and it didn't make a difference.
>>>>>>>>
>>>>>>>> Also further testing found me getting the same behavior with other
>>>>>>>> SIMD instructions. The common factor is in each case, ECX is set to
>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset .
>>>>>>>>
>>>>>>>> Additionally, turning the optimization level passed to createJIT
>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the
>>>>>>>> optimization passes.
>>>>>>>>
>>>>>>>> I'm going to dig through the passes controlled by that parameter
>>>>>>>> and see if I can narrow down which optimization is causing it.
>>>>>>>>
>>>>>>>> Peter N
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>>>>>>>
>>>>>>>>> As someone off list just told me, perhaps my new bug is the same
>>>>>>>>> issue:
>>>>>>>>>
>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>
>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>
>>>>>>>>> Solomon
>>>>>>>>>
>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> I'm currently in the process of debugging a crash occurring in
>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>>>>>>>> attempting to access unaligned memory with an SSE2 instruction.
>>>>>>>>>> However this only happens under certain conditions that seem (but may not
>>>>>>>>>> be) related to the stack's state on calling the function.
>>>>>>>>>>
>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>> generate a JIT generated function. This function is primarily mathematical,
>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well
>>>>>>>>>> as a few SSE2 intrinsics).
>>>>>>>>>>
>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the
>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access
>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the
>>>>>>>>>> SSE fault raising mechanism appears.
>>>>>>>>>>
>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>> memory access.
>>>>>>>>>> ECX is always set to 0x7fffffff - however I don't know if this is
>>>>>>>>>> part of the SSE error reporting process or is part of the situation causing
>>>>>>>>>> the error.
>>>>>>>>>>
>>>>>>>>>> I haven't worked out exactly what code path etc is causing this
>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed
>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't
>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first
>>>>>>>>>> discovered the crash when using a feature that uses them), however I have
>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Peter N
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>>>
>>>>>>>>>
--
~Craig
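
A side note on the routine at 76719BA1 quoted above: the pair `mov ecx,dword ptr [esp]` / `add ecx,7FFFFFFFh` looks like a way to set the carry flag for the following `sbb`, with ECX used only as scratch. The Python sketch below models just those two instructions. `float_bits` and `add_7fffffff` are names invented here, and treating the dword at [esp] as a single-precision value is an assumption based on the preceding `fstp dword ptr [esp]`.

```python
import struct

MASK32 = 0xFFFFFFFF


def float_bits(x):
    """Bit pattern of x when stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def add_7fffffff(bits):
    """Model `add ecx,7FFFFFFFh`: return (new ECX, carry flag)."""
    total = bits + 0x7FFFFFFF
    return total & MASK32, total > MASK32


# Try a zero, a positive and a negative stored value.
for value in (0.0, 0.25, -0.25):
    ecx, carry = add_7fffffff(float_bits(value))
    print(f"{value:+.2f} -> ecx={ecx:#010x} carry={carry}")
```

With a stored value of 0.0 this leaves ECX at 0x7fffffff with the carry clear, which matches the value reported in the thread. Under this model the carry is only set when the stored value is negative and non-zero.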

Thank you, I'm trying this now.

I don't think that's going to work.

On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote:
> Thank you, I'm trying this now.

--
~Craig