Displaying 20 results from an estimated 5000 matches similar to: "[X86] FMA transformation restrictions"
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Kay,
My patch will partially address your bug. For now I'm just looking to
switch the default FMA from vfmadd213xx to vfmadd231xx. That will
cause the code in PR17229 to compile as desired, but would regress
code like:
double foo(double a, double b, double c) {
  return a * b + c;
}
Which will now require a vmovaps + vfmadd231.
If this impacts real benchmarks we could add an
2013 Dec 23
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Elena,
Thank you very much for looking in to that.
I'll go ahead and remove the isCommutable flag from those
instructions, since it sounds like that's the right thing to do. I
would still like to change the default from the 231 variant to 213
too, as this will reduce code-size for accumulator-style loops. I have
at least one benchmark that shows significant speedups when this
change
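A minimal sketch of the accumulator-style loop being discussed (the kernel and names here are illustrative, not taken from the benchmark in the thread): each iteration computes acc = x[i]*y[i] + acc, which matches the 231 form (dest = src2*src3 + dest), so the accumulator can stay in one register across iterations with no extra vmovaps. Whether FMAs are formed for this pattern still depends on the fp-contract / fast-math settings.

double dot(const double *x, const double *y, int n) {
  double acc = 0.0;
  for (int i = 0; i < n; ++i)
    acc = x[i] * y[i] + acc;  /* candidate for vfmadd231sd once FMAs are formed */
  return acc;
}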
2013 Dec 20
0
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Lang,
Unfortunately, I don't have an answer on the commutability question, but I
wanted to let you know that I filed a bug on this:
http://llvm.org/bugs/show_bug.cgi?id=17229
This also shows a memory operand variant of the fma that you may want to
consider in your patch and testcases.
Thanks!
On Thu, Dec 19, 2013 at 10:45 PM, Lang Hames <lhames at gmail.com> wrote:
> Hi all,
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi all,
The 213 variant of the FMA3 instructions is currently marked
commutable (see X86InstrFMA.td). Is that safe? According to the ISA
the FMA3 instructions aren't commutable for non-numeric results, so
I'd have thought commuting this would only be valid in fast-math mode?
For the curious, the reason that I'm asking is that we currently
always select the 213 variant, but this
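For illustration only, one way to observe the non-numeric (NaN) sensitivity that makes commuting the multiplicands bit-inexact: feed fma() two quiet NaNs with distinct payloads and compare the raw result bits for the two operand orders. Whether the payloads actually differ depends on the hardware's and libm's NaN propagation rules, so treat this as a probe rather than a proof.

#include <math.h>
#include <stdio.h>
#include <string.h>

static unsigned long long bits(double x) {
  unsigned long long u;
  memcpy(&u, &x, sizeof u);   /* reinterpret the double as raw bits */
  return u;
}

int main(void) {
  double n1 = nan("1");                        /* quiet NaN, payload 1 */
  double n2 = nan("2");                        /* quiet NaN, payload 2 */
  printf("%llx\n", bits(fma(n1, n2, 0.0)));    /* which NaN propagates...       */
  printf("%llx\n", bits(fma(n2, n1, 0.0)));    /* ...may depend on operand order */
  return 0;
}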
2012 Dec 12
3
[LLVMdev] Question about FMA formation
Hi, Dear All:
I'm going to implement FMA formation. On some architectures, "FMA a, b, c"
is more precise than "a * b + c". I'm wondering if FMA could be less
precise. In the former case, can we enable FMA formation despite a
restrictive FP mode?
Thanks
Shuxin
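To make the precision question concrete, here is a small self-contained example (not from the thread) showing that the fused operation performs a single rounding: fma(a, b, -p) recovers the rounding error of the product exactly, something the separately rounded multiply and subtract cannot do. In this sense, for a single a*b + c, the fused result is never less accurate than the unfused pair.

#include <math.h>
#include <stdio.h>

int main(void) {
  double a = 1.0 + ldexp(1.0, -30);  /* 1 + 2^-30, exactly representable */
  double b = a;
  double p = a * b;                  /* rounded product: the 2^-60 term is lost */
  double err = fma(a, b, -p);        /* exact a*b - p with one rounding: 2^-60  */
  printf("%a\n", err);               /* prints 0x1p-60 */
  return 0;
}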
2012 Dec 13
0
[LLVMdev] Question about FMA formation
On Dec 12, 2012, at 3:40 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Dear All:
>
> I'm going to implement FMA formation. On some architectures, "FMA a, b, c" is more precise than
> "a * b + c". I'm wondering if FMA could be less precise. In the former case, can we enable FMA
> formation despite a restrictive FP mode?
>
I believe
2012 Feb 08
0
[LLVMdev] Clarifying FMA-related TargetOptions
Hi Owen,
Having looked into this due to Clang failing PlumHall with it recently I can give an opinion...
I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms which give incorrect results when FMA is enabled, examples being those that do floating point comparisons such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+c
2012 Dec 13
0
[LLVMdev] Question about FMA formation
Hi Michael, Shuxin,
> Shuxin was showing some more complicated patterns that required
> re-association to match (fast-math flags permitting). For those, we're
> considering if having a re-associate-for-FMA functionality in
> codegen-prepare would solve that problem. Thus, we can re-associate in
> codegen-prepare and emit FMA in fast-isel.
>
Yep. I misread the association
2012 Dec 13
0
[LLVMdev] Question about FMA formation
Hi, Eli, Mike and Lang:
Thank you all for the input. This is one example which might be
difficult for isel:
a*b + c*d + e => a*b + (c*d + e).
Thanks
Shuxin
On 12/12/12 4:43 PM, Lang Hames wrote:
> A little background:
>
> The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma
> in C. llvm.fmuladd.* is generated by clang when it sees an expression
> of the
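To spell out the grouping issue with a*b + c*d + e (purely illustrative, using the C fma() function to stand in for the target instruction): with the source grouping (a*b + c*d) + e, one multiply survives and the outer add cannot be fused, whereas the re-associated a*b + (c*d + e) maps onto two fused operations. The rewrite changes rounding, which is why it needs fast-math-style permission.

#include <math.h>

/* Source grouping (a*b + c*d) + e: fmul + fma + fadd. */
double as_written(double a, double b, double c, double d, double e) {
  return fma(a, b, c * d) + e;
}

/* Re-associated a*b + (c*d + e): two FMAs, no standalone mul or add. */
double reassociated(double a, double b, double c, double d, double e) {
  return fma(a, b, fma(c, d, e));
}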
2012 Dec 13
2
[LLVMdev] Question about FMA formation
On Dec 12, 2012, at 5:20 PM, Eric Christopher <echristo at gmail.com> wrote:
> Why not just form them via a fast IR level pass and just have patterns match in fast isel instead of trying to form code? Or are we saying the same thing? (Your words of "fast isel spot"ting and "form better code" caused me to think you mean to do optimizations within the fast isel pass).
2012 Feb 08
1
[LLVMdev] Clarifying FMA-related TargetOptions
On Feb 8, 2012, at 10:42 AM, Hal Finkel wrote:
> In my experience, users of numerical codes expect that the compiler will
> use FMA instructions where it can, unless specifically asked to avoid
> doing so by the user. Even though this can sometimes produce a different
> result (*almost* always a better one), the performance gain is too large
> to be ignored by default. I highly
2014 Dec 10
2
[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?
Thanks! That’s probably close enough for practical purposes. I looked at the overrides on various targets, and they all return true if the FMA hardware exists.
- Arch
From: Jingyue Wu [mailto:jingyue at google.com]
Sent: Wednesday, December 10, 2014 2:56 PM
To: Robison, Arch
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?
Does
2016 Nov 19
2
FMA canonicalization in IR
On Nov 19, 2016 10:26 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>
> If I have my FMA intrinsics story straight now (thanks for the explanation, Hal!), I think it raises another question about IR canonicalization (and may affect the proposed revision to IR FMF):
No, I think that we specifically
2012 Dec 13
3
[LLVMdev] Question about FMA formation
A little background:
The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma in
C. llvm.fmuladd.* is generated by clang when it sees an expression of the
form 'a * b + c' within a single source statement.
If you want to opportunistically form FMA target instructions my
inclination would be to skip llvm.fmuladd.* and just form them from a*b+c
expressions at isel time. I
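A small sketch of the single-statement rule being described (function names are made up; behavior assumes clang-style handling of the FP_CONTRACT pragma): the multiply and add must occur in the same statement for the front end to emit llvm.fmuladd; splitting them across statements keeps a separate fmul and fadd.

#pragma STDC FP_CONTRACT ON
double one_statement(double a, double b, double c) {
  return a * b + c;   /* single statement: clang may emit llvm.fmuladd */
}

double two_statements(double a, double b, double c) {
  double p = a * b;   /* separate statements: stays as fmul...                 */
  return p + c;       /* ...followed by fadd (no contraction under the pragma) */
}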
2012 Feb 08
1
[LLVMdev] Clarifying FMA-related TargetOptions
On Feb 8, 2012, at 10:44 AM, James Molloy wrote:
> Hi Owen,
>
> Having looked into this due to Clang failing PlumHall with it recently I can give an opinion...
>
> I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms which give incorrect results when FMA is enabled, examples being those that do floating point comparisons such as: a * b + c - d. If
2012 Feb 08
0
[LLVMdev] Clarifying FMA-related TargetOptions
On Wed, 2012-02-08 at 10:11 -0800, Owen Anderson wrote:
> Hello everyone,
>
>
> I'd like to propose the attached patch to form FMA intrinsics
> aggressively, but in order to do so I need some clarification on the
> intended semantics for the various FP precision-related
> TargetOptions. I've summarized the three relevant ones below:
>
>
> UnsafeFPMath -
2014 Dec 10
2
[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?
For the Julia language JIT, we'd like to be able to tell whether the llvm.fma.* intrinsic has hardware support. What's the best way to query LLVM (JIT) for this information?
The information would be used in situations where the user wants to use different algorithms depending on whether FMA hardware is present or not.
- Arch D. Robison
2012 Feb 08
6
[LLVMdev] Clarifying FMA-related TargetOptions
Hello everyone,
I'd like to propose the attached patch to form FMA intrinsics aggressively, but in order to do so I need some clarification on the intended semantics for the various FP precision-related TargetOptions. I've summarized the three relevant ones below:
UnsafeFPMath - Defaults to off, enables "less precise" results than permitted by IEEE754. Comments specifically
2019 Sep 02
2
AVX2 codegen - question reg. FMA generation
On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> It appears you need 'reassoc' on fmul/fadd:
> https://godbolt.org/z/nuTzx2
Thanks very much, that was it. Either that or providing
-enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since
using FMAs here instead of mul/add appears to be safer (the reverse is
unsafe).
~ Uday
2012 Dec 13
2
[LLVMdev] Question about FMA formation
On Dec 12, 2012, at 4:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Eli, Mike and Lang:
>
> Thank you all for the input. This is one example which might be difficult for isel:
> a*b + c*d + e => a*b + (c*d + e).
>
You hit send right when I did!
For your example, do you mean that it's grouped like:
(fadd (fadd (fmul a b) (fmul c d)) e)
How would your