Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86"
2013 Aug 28
0
[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
On Wed, Aug 28, 2013 at 2:16 AM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:
> I found that there is no difference in the code generator between Ordered
> and Unordered FP compare instructions.
> FUCOMISS, FUCOMISD are generated in both cases.
>
Yes. That's how fcmp is defined in LangRef.
-Eli
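
For readers following along, here is a rough scalar sketch of how the LangRef defines a few fcmp predicates. In the IR, "ordered" and "unordered" describe the result when an operand is NaN; they do not say whether the compare raises an exception, which is the distinction between COMISS and UCOMISS. The function names below are hypothetical, chosen only to mirror the predicate names; they are not LLVM APIs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar models of LLVM fcmp predicates (names are illustrative).
// "Ordered" means neither operand is NaN; "unordered" means at least one is.
bool fcmp_ord(float a, float b) { return !std::isnan(a) && !std::isnan(b); }
bool fcmp_uno(float a, float b) { return !fcmp_ord(a, b); }

// oeq: ordered AND equal.       ueq: unordered OR equal.
bool fcmp_oeq(float a, float b) { return fcmp_ord(a, b) && a == b; }
bool fcmp_ueq(float a, float b) { return fcmp_uno(a, b) || a == b; }

// one: ordered AND not equal.   une: unordered OR not equal.
bool fcmp_one(float a, float b) { return fcmp_ord(a, b) && a != b; }
bool fcmp_une(float a, float b) { return fcmp_uno(a, b) || a != b; }

// Small self-check showing where the ordered and unordered forms diverge.
void self_check() {
    const float qnan = std::numeric_limits<float>::quiet_NaN();
    assert(fcmp_oeq(1.0f, 1.0f) && fcmp_ueq(1.0f, 1.0f)); // agree without NaN
    assert(!fcmp_oeq(qnan, 1.0f));                         // ordered form is false
    assert(fcmp_ueq(qnan, 1.0f));                          // unordered form is true
    assert(fcmp_une(qnan, qnan));                          // matches C's `!=` on NaN
}
```

Calling `self_check()` exercises the NaN cases. The two forms only differ when a NaN is involved, so a single non-signaling compare instruction plus a parity-flag test is enough to implement either one.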
2013 Aug 29
2
[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
Should I open a ticket for this?
- Elena
From: Eli Friedman [mailto:eli.friedman at gmail.com]
Sent: Wednesday, August 28, 2013 19:51
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
On Wed, Aug 28, 2013 at 2:16 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com<mailto:elena.demikhovsky at
2013 Aug 29
1
[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
On 29 August 2013 10:12, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> But this is another case. LLVM IR distinguishes between ordered and unordered compare and X86 backend has appropriate instructions.
I think LLVM uses ordered/unordered compare to mean something
different to what the x86 instructions do. For example, "not equal":
fcmp une == unordered not
2013 Aug 29
2
[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
On 29 Aug 2013, at 08:19, Tim Northover <t.p.northover at gmail.com> wrote:
> If so, a compare that used that instruction would have to become more
> like an "invoke" with a landingpad for the exception and so on,
> wouldn't it? The current fcmp can already distinguish between ordered
> and unordered, because ucomiss provides that information.
There are currently
2013 Aug 29
0
[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
But this is another case. LLVM IR distinguishes between ordered and unordered compare and X86 backend has appropriate instructions.
But during DAG selection we just lose this information and always generate unordered fcmp.
I.e. for an ordered fcmp, vcomiss should be generated, and for an unordered fcmp, vucomiss.
- Elena
-----Original Message-----
From: Dr D. Chisnall [mailto:dc552 at
2013 Aug 29
0
[LLVMdev] Ordered / Unordered FP compare are not handled properly on X86
On 29 August 2013 06:31, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> Should I open a ticket for this?
I think he was saying this is intended behaviour. Isn't the difference
between ucomiss and comiss just whether an exception is raised for
NaN?
If so, a compare that used that instruction would have to become more
like an "invoke" with a landingpad for the
2016 May 19
3
Working on FP SCEV Analysis
> One option would be to extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support as Sanjoy suggested. I *think* that that could be sufficient to handle case B.
I implemented this with FP SCEV and the code looks very structured, including SCEVExpander. Extending the existing structures without implementing FP SCEV will be problematic.
And
2016 May 20
0
Working on FP SCEV Analysis
Hi Hideki,
I like this summary overall, thanks. More below.
> On May 20, 2016, at 10:04 AM, Saito, Hideki <hideki.saito at intel.com> wrote:
>
>
> To the best of my experience, handling case B (secondary induction) is a must-have, and if I’m not mistaken,
> people aren’t opposed to that.
>
> For me, handling case A (primary induction) is “why not?”, but I certainly
2016 May 16
6
Working on FP SCEV Analysis
[+CC Andy]
Hi Elena,
I don't have any fundamental issues with teaching SCEV about floating
point types, but given this will be a major change, I think a high
level roadmap should be discussed on llvm-dev before we start
reviewing and committing changes.
Here are some issues that I think are worth discussing:
- Core motivation: why do we even care about optimizing floating
point
2016 May 20
5
Working on FP SCEV Analysis
To the best of my experience, handling case B (secondary induction) is a must-have, and if I’m not mistaken,
people aren’t opposed to that.
For me, handling case A (primary induction) is “why not?”, but I certainly admit that that can be very naïve
thinking coming from a lack of good understanding of SCEV and its proper usage. Now, let’s assume we
can postpone discussion about case A. What is the
2016 May 24
1
Working on FP SCEV Analysis
Adding support for FP inductions through isInductionPHI() is certainly
possible; I have a relatively small local patch that does exactly that
for simple fp add-recurrence cases, along with changes to the
vectorizer to make it aware of FP inductions. It won't give you
the powerful reasoning capabilities of SCEV, but for the B-like cases
it should work.
Amara
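
To make the "B-like" case concrete, here is a minimal sketch of a loop with an integer primary induction and a floating-point secondary induction, next to the closed form a vectorizer would want to use. The function names are hypothetical and this is not the patch mentioned above. The two versions are only guaranteed to agree when no rounding occurs, which is why a real transformation would need relaxed floating-point flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Case B style loop: `i` is the primary (integer) induction variable and
// `x` is a secondary FP induction advanced by a loop-invariant step,
// i.e. the simple add-recurrence {start,+,step}.
std::vector<float> fill_recurrence(float start, float step, std::size_t n) {
    std::vector<float> out(n);
    float x = start;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = x;
        x += step;  // loop-carried FP dependence
    }
    return out;
}

// Closed form: each element depends only on `i`, so iterations are
// independent. Rounding may differ from the loop above in general.
std::vector<float> fill_closed_form(float start, float step, std::size_t n) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = start + static_cast<float>(i) * step;
    }
    return out;
}

// With exactly representable values no rounding happens, so both agree.
void check_exact_case() {
    assert(fill_recurrence(1.0f, 0.5f, 8) == fill_closed_form(1.0f, 0.5f, 8));
}
```

The inputs in `check_exact_case()` were chosen so that every intermediate value is exactly representable in `float`; with a step such as 0.1f the two functions can differ in the last bits.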
On 20 May 2016 at 19:31, Adam
2016 May 18
4
Working on FP SCEV Analysis
Demikhovsky, Elena wrote:
> > Even then, I'd personally want to see further evidence of why the
> > correct solution is to model the floating point IV in SCEV rather than
> > find a more powerful way of converting the IV to an integer that models
> > the non-integer values taken on by the IV. As an example, if the use
> > case is the following code with appropriate flags to
2016 May 18
4
Working on FP SCEV Analysis
On Tue, May 17, 2016 at 8:49 PM Owen Anderson <resistor at mac.com> wrote:
>
> On May 16, 2016, at 2:42 PM, Sanjoy Das via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> - Core motivation: why do we even care about optimizing floating
> point induction variables? What situations are they common in? Do
> programmers _expect_ compilers to optimize them
2016 Feb 26
2
how to force llvm generate gather intrinsic
If I'm understanding correctly, you're saying that vgather* is slow on all
of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will
not generate it for any of those machines.
Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() &&
!hasAVX512()". It could break for some hypothetical future processor that
manages to
2016 Feb 25
2
how to force llvm generate gather intrinsic
It seems that http://reviews.llvm.org/D15690 only implemented
gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to
enable gather for AVX/2? Thanks.
Best,
Zhi
On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> I don't think gather has been enabled for AVX2 as of r261875.
> Masked load/store were enabled for AVX with:
>
2016 Feb 26
0
how to force llvm generate gather intrinsic
That makes great sense. It would be great if we had a profitability model to
decide when gathers are worth using. It would also be good to have a
compiler option that lets users make LLVM generate the gather
instructions regardless of whether they are faster or slower.
Best,
Zhi
On Fri, Feb 26, 2016 at 12:49 PM, Sanjay Patel <spatel at rotateright.com>
wrote:
> If I'm understanding
2016 Feb 25
2
how to force llvm generate gather intrinsic
Yes, masked load/store/gather/scatter are completed.
- Elena
From: zhi chen [mailto:zchenhn at gmail.com]
Sent: Thursday, February 25, 2016 01:20
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] how to
2014 Oct 27
4
[LLVMdev] Adding masked vector load and store intrinsics
We are just following the common recommendation to start with intrinsics:
http://llvm.org/docs/ExtendingLLVM.html
- Elena
From: Owen Anderson [mailto:resistor at mac.com]
Sent: Sunday, October 26, 2014 23:57
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu; dag at cray.com
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
What is the motivation for using intrinsics
2014 Oct 28
2
[LLVMdev] Adding masked vector load and store intrinsics
Many overloaded intrinsics could be replaced with instructions: fabs, fma, or sqrt.
Chandler will probably explain the criteria. What is the difference between fma and fadd? Or fptrunc and fabs?
A new instruction like
%a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1>%mask
is possible, but may not be very useful for most targets.
So we are starting with intrinsics.
-
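
As a rough scalar model of what the `loadm` line above would mean, assuming the usual masked-load semantics (enabled lanes read memory, disabled lanes take the pass-through value and never touch memory). The function name and the fixed width of four lanes are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::int32_t, 4>;
using Mask4 = std::array<bool, 4>;

// Hypothetical scalar emulation of a 4-lane masked load.
// Lanes whose mask bit is set are read from `addr`; the remaining lanes
// keep the corresponding `passthru` element and memory is not accessed.
Vec4 masked_load(const std::int32_t* addr, const Vec4& passthru, const Mask4& mask) {
    assert(addr != nullptr);
    Vec4 result = passthru;
    for (std::size_t lane = 0; lane < result.size(); ++lane) {
        if (mask[lane]) {
            result[lane] = addr[lane];
        }
    }
    return result;
}
```

For example, loading from `{10, 20, 30, 40}` with mask `{true, false, true, false}` and pass-through `{-1, -2, -3, -4}` yields `{10, -2, 30, -4}`. The per-lane branch is what a target without native masked loads would have to expand the operation into.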
2016 Feb 26
0
how to force llvm generate gather intrinsic
No. The gather operation is slow on AVX2 processors.
- Elena
From: zhi chen [mailto:zchenhn at gmail.com]
Sent: Thursday, February 25, 2016 20:48
To: Sanjay Patel <spatel at rotateright.com>
Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] how to force