search for: fma

Displaying 20 results from an estimated 395 matches for "fma".

2013 Jan 11
3
[LLVMdev] Documentation of fmuladd intrinsic
On Fri, Jan 11, 2013 at 1:08 PM, Andrew Booker <andrew.booker at arm.com> wrote: > The fmuladd intrinsic is described as saying that a multiply and > addition sequence can be fused into an fma instruction "if the code > generator determines that the fused expression would be legal and > efficient". (http://llvm.org/docs/LangRef.html#llvm-fma-intrinsic) > > I've spent a bit of time puzzling over how a code generator is supposed > to know if it's legal to g...
2012 Dec 12
3
[LLVMdev] Question about FMA formation
Hi, Dear All: I'm going to implement FMA formation. On some architectures, "FMA a, b, c" is more precise than "a * b + c". I'm wondering if FMA could be less precise. In the former case, can we enable FMA formation despite restrictive FP mode? Thanks Shuxin
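The precision claim in this snippet can be illustrated with a small sketch. It assumes IEEE double arithmetic and emulates a single-rounding fused multiply-add with exact rational arithmetic; the helper name fused_mul_add is hypothetical and is not an LLVM or hardware API.

```python
from fractions import Fraction

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding (finite inputs only).

    Hypothetical helper for illustration, not a real FMA instruction.
    """
    # Fraction(float) is exact, so the sum below carries no rounding error;
    # float() then rounds once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30            # exactly representable as a double
p = a * a                     # rounded product: the 2**-60 term is lost

unfused_residual = a * a - p              # multiply rounds first, giving 0.0
fused_residual = fused_mul_add(a, a, -p)  # single rounding keeps 2**-60

print(unfused_residual, fused_residual)
```

Under these assumptions the fused form recovers the rounding error of the product, which the separate multiply and add discard.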
2013 Jan 11
0
[LLVMdev] Documentation of fmuladd intrinsic
...ubject: Re: [LLVMdev] Documentation of fmuladd intrinsic > > > On Fri, Jan 11, 2013 at 1:08 PM, Andrew Booker < > andrew.booker at arm.com > wrote: > > > > The fmuladd intrinsic is described as saying that a multiply and > addition sequence can be fused into an fma instruction "if the code > generator determines that the fused expression would be legal and > efficient". ( http://llvm.org/docs/LangRef.html#llvm-fma-intrinsic ) > > I've spent a bit of time puzzling over how a code generator is > supposed > to know if it's le...
2016 Sep 12
4
[X86] FMA transformation restrictions
I noticed that the operand commuting code in X86InstrInfo.cpp treats scalar FMA intrinsics specially. It prevents operand commuting on these scalar instructions because the scalar FMA instructions preserve the upper bits of the vector. Presumably, the restrictions are there because commuting operands potentially changes the result upper bits. However, AFAIK the Intel and GN...
2012 Dec 13
3
[LLVMdev] Question about FMA formation
A little background: The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma in C. llvm.fmuladd.* is generated by clang when it sees an expression of the form 'a * b + c' within a single source statement. If you want to opportunistically form FMA target instructions my inclination would be to skip llvm.fmuladd.* and just form them from a*b+c expressions at isel time. I don't see any fundamental problem with forming llvm.fmuladd.* to model FMA formation opportunities in an IR pass though. - Lang. On Wed, Dec 12, 2012 at 4:11 PM, Micha...
2013 Jan 11
3
[LLVMdev] Documentation of fmuladd intrinsic
Out of curiosity, what is the use-case for isFMAFasterThanMulAndAdd? If a target declares that FMA is actually slower for a given type, why not just declare it as illegal for that type? Wouldn't that accomplish the same thing without another target hook? I feel like I'm missing something here. On Fri, Jan 11, 2013 at 2:40 PM, Hal Fin...
2012 Feb 08
6
[LLVMdev] Clarifying FMA-related TargetOptions
Hello everyone, I'd like to propose the attached patch to form FMA intrinsics aggressively, but in order to do so I need some clarification on the intended semantics for the various FP precision-related TargetOptions. I've summarized the three relevant ones below: UnsafeFPMath - Defaults to off, enables "less precise" results than permitted by IEEE...
2014 Dec 10
2
[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?
Thanks! That’s probably close enough for practical purposes. I looked at the overrides on various targets, and they all return true if the FMA hardware exists. - Arch From: Jingyue Wu [mailto:jingyue at google.com] Sent: Wednesday, December 10, 2014 2:56 PM To: Robison, Arch Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Best way for JIT to query whether llvm.fma.* is fast? Does TargetLowering::isFMAFasterThanFMulAndFAdd (http://llv...
2016 Nov 18
2
what does -ffp-contract=fast allow?
...4481 > If we set "on" in the invocation *and* we set "ON" in the source, > clang will generate @llvm.fmuladd intrinsics for expressions like > x*y+z. If you split that into 2 lines in C with a temp variable > assignment, it's no longer a single expression, so no FMA for you. > The @llvm.fmuladd intrinsic is our way of preserving the C source > information through the optimizer. If we don't end up producing an > FMA instruction for the target in this case, it's a bug. This is not correct. First, the behavior of -ffp-contract=on/off should ju...
2012 Feb 08
0
[LLVMdev] Clarifying FMA-related TargetOptions
Hi Owen, Having looked into this due to Clang failing PlumHall with it recently I can give an opinion... I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms which give incorrect results when FMA is enabled, examples being those that do floating point comparisons such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+c " will have been fused with the multiply...
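A minimal sketch of the comparison hazard described in this snippet, assuming IEEE double arithmetic. The fused step is emulated with exact rationals (the helper fused_mul_add is hypothetical), and the values of a, b, c and d are chosen for illustration only.

```python
from fractions import Fraction

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding (finite inputs only).

    Hypothetical helper for illustration, not a real FMA instruction.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = b = 1.0 + 2.0**-30
c = d = 255 * 2.0**-61        # c == d, just under half an ulp of a*b

product = a * b                      # rounded a*b
unfused = (a * b + c) - d            # multiply, add, subtract: three roundings
fused = fused_mul_add(a, b, c) - d   # "+c" fused with the multiply

print(unfused == product)   # the unfused form compares equal to a*b
print(fused == product)     # the fused form does not
```

With these inputs the bits of a*b that a separate multiply would round away push the fused sum over a rounding boundary, so the result differs from a*b by one ulp even though c == d.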
2012 Dec 13
0
[LLVMdev] Question about FMA formation
On Dec 12, 2012, at 3:40 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote: > Hi, Dear All: > > I'm going to implement FMA formation. On some architectures, "FMA a, b, c" is more precise than > "a * b + c". I'm wondering if FMA could be less precise. In the former case, can we enable FMA > formation despite restrictive FP mode? > I believe that a pass to form fmuladd[1] intrinsic cal...
2012 Dec 13
2
[LLVMdev] Question about FMA formation
...sel. Fast-isel allows us to skip extra representations of the program, and replacing IR with intrinsic calls is similar to having an extra representation, albeit only for part of the program. However, the basic task of spotting an fadd of an fmul is simple enough that fast-isel could just emit the FMA equivalent if it likes. This has the benefit that we avoid the extra representation, but the downside that it makes fast-isel a little more complicated and it only does simple patterns. Shuxin was showing some more complicated patterns that required re-association to match (fast-math flags permit...
2016 Nov 19
2
FMA canonicalization in IR
On Nov 19, 2016 10:26 AM, Sanjay Patel <spatel at rotateright.com> wrote: > > If I have my FMA intrinsics story straight now (thanks for the explanation, Hal!), I think it raises another question about IR canonicalization (and may affect the proposed revision to IR FMF): No, I think that we specifically don't want to canonicalize to fmuladd at the IR level at all. If the backend has the...
2013 Jul 08
1
[LLVMdev] API break for out-of-tree targets implementing TargetLoweringBase::isFMAFasterThanMulAndAdd
Hello, To any out-of-tree targets, please be aware that I intend to commit a patch that will break the build of any target implementing TargetLoweringBase::isFMAFasterThanMulAndAdd, for the reasons described below. (Basically, the current interface definition is broken and not followed, and no in-tree target was doing the right thing with it, so it is unlikely any out-of-tree target is either...) To un-break your build after this patch goes through, you wi...
2012 Dec 13
0
[LLVMdev] Question about FMA formation
...A little background: > > The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma > in C. llvm.fmuladd.* is generated by clang when it sees an expression > of the form 'a * b + c' within a single source statement. > > If you want to opportunistically form FMA target instructions my > inclination would be to skip llvm.fmuladd.* and just form them from > a*b+c expressions at isel time. I don't see any fundamental problem > with forming llvm.fmuladd.* to model FMA formation opportunities in an > IR pass though. > > - Lang. > ...
2012 Feb 08
1
[LLVMdev] Clarifying FMA-related TargetOptions
On Feb 8, 2012, at 10:42 AM, Hal Finkel wrote: > In my experience, users of numerical codes expect that the compiler will > use FMA instructions where it can, unless specifically asked to avoid > doing so by the user. Even though this can sometimes produce a different > result (*almost* always a better one), the performance gain is too large > to be ignored by default. I highly recommend that we continue to enable >...
2012 Dec 13
0
[LLVMdev] Question about FMA formation
Hi Michael, Shuxin, > Shuxin was showing some more complicated patterns that required > re-association to match (fast-math flags permitting). For those, we're > considering if having a re-associate-for-FMA functionality in > codegen-prepare would solve that problem. Thus, we can re-associate in > codegen-prepare and emit FMA in fast-isel. > Yep. I misread the association on Shuxin's example, but even ((a*b) + (c*d)) + e would match to a 3-instruction sequence: (fadd (fma a b (fmul c d)) e). If...
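The match mentioned in this snippet can be sketched as a toy rewrite over nested tuples. This is an illustration only: the tuple encoding and the function form_fma are hypothetical and do not reflect how LLVM's instruction selector represents or matches nodes.

```python
def form_fma(expr):
    """Rewrite fadd(fmul(a, b), x) into fma(a, b, x), bottom-up.

    Expressions are tuples such as ("fadd", lhs, rhs); leaves are names.
    Hypothetical encoding for illustration only.
    """
    if not isinstance(expr, tuple):
        return expr                          # leaf: a value name
    op, *args = expr
    args = [form_fma(arg) for arg in args]   # rewrite operands first
    if op == "fadd":
        lhs, rhs = args
        if isinstance(lhs, tuple) and lhs[0] == "fmul":
            return ("fma", lhs[1], lhs[2], rhs)
        if isinstance(rhs, tuple) and rhs[0] == "fmul":
            return ("fma", rhs[1], rhs[2], lhs)
    return (op, *args)

# ((a*b) + (c*d)) + e
expr = ("fadd", ("fadd", ("fmul", "a", "b"), ("fmul", "c", "d")), "e")
result = form_fma(expr)
print(result)
```

The inner fadd absorbs one fmul into an fma and keeps the other fmul as its addend, so three operations remain: fmul, fma, fadd.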
2014 Dec 10
2
[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?
For the Julia language JIT, we'd like to be able to tell whether the llvm.fma.* intrinsic has hardware support. What's the best way to query LLVM (JIT) for this information? The information would be used in situations where the user wants to use different algorithms depending on whether FMA hardware is present or not. - Arch D. Robison ...
2012 Feb 08
0
[LLVMdev] Clarifying FMA-related TargetOptions
On Wed, 2012-02-08 at 10:11 -0800, Owen Anderson wrote: > Hello everyone, > > > I'd like to propose the attached patch to form FMA intrinsics > aggressively, but in order to do so I need some clarification on the > intended semantics for the various FP precision-related > TargetOptions. I've summarized the three relevant ones below: > > > UnsafeFPMath - Defaults to off, enables "less precise"...
2012 Feb 08
1
[LLVMdev] Clarifying FMA-related TargetOptions
On Feb 8, 2012, at 10:44 AM, James Molloy wrote: > Hi Owen, > > Having looked into this due to Clang failing PlumHall with it recently I can give an opinion... > > I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms which give incorrect results when FMA is enabled, examples being those that do floating point comparisons such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+c " will have been fused with the multiply...