similar to: [LLVMdev] controlling(enable/disable) FMA instruction generation

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] controlling(enable/disable) FMA instruction generation"

2012 Nov 15
2
[LLVMdev] X86 rsqrt instruction generated
Hi, We have implemented the rsqrt instruction generation for X86 target architecture. We have introduced a flag -fp-rsqrt flag which controls the generatation of X86 rsqrt instruction generation. We have observed minor effects on precision due to rsqrt and hence has put these transformations under the mentioned flag. Note that -fp-rsqrt is only enabled with -enable-unsafe-fp-math flag presently.
2012 Sep 06
1
[LLVMdev] Error running spec benchmark with FMA4 on X86
Hi All, I am facing miscompare error when running povray (and few other C/C++ benchmarks) from spec cpu2006 suite enabling FMA4 (and disabling FMA3). I have used -ffp-contract=fast to turn on this option. (Compilation options and targets pasted below). >>>>>>>> clang version 3.2 (trunk 163295:163308) (llvm/trunk 163295) Target: x86_64-unknown-linux-gnu Thread model: posix
2012 Nov 15
0
[LLVMdev] X86 rcp instruction generated
Hi, We have implemented the rcp instruction generation for X86 target architecture. We have introduced a flag -fp-rcp flag which controls the generatation of X86 rcp instruction generation. We have observed minor effects on precision and hence hve put these transformations under the mentioned flag. Note that -fp-rcp is only enabled with -enable-unsafe-fp-math flag presently. Moreover we have
2012 Nov 15
0
[LLVMdev] X86 rsqrt instruction generated
On Wed, Nov 14, 2012 at 10:43 PM, Chakraborty, Soham <Soham.Chakraborty at amd.com> wrote: > Hi, > > > > We have implemented the rsqrt instruction generation for X86 target > architecture. We have introduced a flag -fp-rsqrt flag which controls the > generatation of X86 rsqrt instruction generation. > > We have observed minor effects on precision due to rsqrt and
2012 Dec 03
1
[LLVMdev] X86 rsqrt instruction generated
Hi, Please find attached the modified patch and description. We have modified and retested the patch taking into consideration the comments and inputs provided earlier. Thanks & Regards, soham -----Original Message----- From: Eli Friedman [mailto:eli.friedman at gmail.com] Sent: Thursday, November 15, 2012 12:59 PM To: Chakraborty, Soham Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev]
2010 Feb 25
0
[LLVMdev] AVX support
On Thursday 25 February 2010 15:33:58 Jan Sjodin wrote: > I have seen some re-factoring work done to prepare for AVX support. What > are the plans (time wise) to add the AVX patterns to the backend? Has > anyone thought about FMA4? Oh yes. :) FMA4 will have a different feature bit than AVX or FMA3. FMA4 is our top priority after AVX due to Bulldozer. What would you like to see for
2013 Sep 17
0
[LLVMdev] conditional assignment in selectionDAG
Hi, I am trying following transformation in X86 selection dag. lhs = rhs; // lhs and rhs are f32 => if(rhs == 0.0) lhs = rhs; else lhs = rhs'; i.e. conditionally replace rhs by rhs'. I guess it can be done using ISD::SELECT node(for float values)? In that case I have to create a condition node. Can you please suggest/refer to how to do so? You may please suggest
2013 Dec 20
0
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Lang, Unfortunately, I don't have an answer on the commutability question, but I wanted to let you know that I filed a bug on this: http://llvm.org/bugs/show_bug.cgi?id=17229 This also shows a memory operand variant of the fma that you may want to consider in your patch and testcases. Thanks! On Thu, Dec 19, 2013 at 10:45 PM, Lang Hames <lhames at gmail.com> wrote: > Hi all,
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi all, The 213 variant of the FMA3 instructions is currently marked commutable (see X86InstrFMA.td). Is that safe? According to the ISA the FMA3 instructions aren't commutable for non-numeric results, so I'd have thought commuting this would only be valid in fast-math mode? For the curious, the reason that I'm asking is that we currently always select the 213 variant, but this
2013 Dec 23
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Elena, Thank you very much for looking in to that. I'll go ahead and remove the isCommutable flag from those instructions, since it sounds like that's the right thing to do. I would still like to change the default from the 231 variant to 213 too, as this will reduce code-size for accumulator-style loops. I have at least one benchmark that shows significant speedups when this change
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Kay, My patch will partially address your bug. For now I'm just looking to switch the default FMA from vfmadd213xx to vfmadd231xx. That will cause the code in PR17229 to compile as desired, but would regress code like: double foo(double a, double b, double c) { return a * b + c; } Which will now require a vmovaps + vfmadd231. If this impacts real benchmarks we could add an
2016 Sep 14
2
undef * 0
Hi, > Both A and B are undef: > LHS = (undef & undef) | (undef & undef) = undef // Since ~undef = undef > RHS = undef > Thus transform is correct. LLVM documentation (http://llvm.org/docs/LangRef.html#undefined-values) suggests that it is unsafe to consider (a & undef = undef) and (a | undef = undef). "As such, it is unsafe to optimize or assume
2012 Nov 16
2
[LLVMdev] Operand order in dag pattern matching in td files
Hi, I have a simple question w.r.t the order of operands used in dag pattern matching in target files. Some of them seem intuitive. But I want to get it clarified anyway. I am using a pattern from X86InstrFMA.td in the below example. Consider FMA3 pattern (simplified). let Constraints = "$src1 = $dst" in { multiclass fma3s_rm<bits<8> opc, string OpcodeStr, X86MemOperand
2012 Nov 16
0
[LLVMdev] Operand order in dag pattern matching in td files
On 16 November 2012 13:41, Anitha B Gollamudi <anitha.boyapati at gmail.com> wrote: > Hi, > > I have a simple question w.r.t the order of operands used in dag > pattern matching in target files. Some of them seem intuitive. But I > want to get it clarified anyway. I am using a pattern from > X86InstrFMA.td in the below example. Consider FMA3 pattern > (simplified). >
2012 Nov 27
2
order.max specification problem in the ar.ols function
Hello I am facing a curious problem.I have a time series data with which i want to fit auto-regressive model of order p, where p runs from 1:9.I am using a for loop which will fit an AR(p) model for each value of p using the *ar.ols* function. I am using the following code for ( p in 1:9){ a=ar.ols (x=data.ts, order.max=p, demean=T, intercept=T) } Specifying the *order.max* to be p, it gives me a
2012 Nov 08
2
[LLVMdev] X86 Tablegen Description and VEX.W
Hi, A question from r162999 changes: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrFMA.td?r1=162999&r2=162998&pathrev=162999 For the multiclass "fma4s", why is "mr" not inherited from "VEX_W" and "MemOp4" like those of "rm" or "rr" ? multiclass fma4s< > ... def mr : FMA4<opc, MRMSrcMem, (outs
2012 Nov 08
2
[LLVMdev] X86 Tablegen Description and VEX.W
On 8 November 2012 11:12, Cameron McInally <cameron.mcinally at nyu.edu> wrote: > On Wed, Nov 7, 2012 at 10:52 PM, Anitha Boyapati <anitha.boyapati at gmail.com> > wrote: > ... >> >> For the multiclass "fma4s", why is "mr" not inherited from "VEX_W" and >> "MemOp4" like those of "rm" or "rr" ? >
2012 Nov 16
1
[LLVMdev] Operand order in dag pattern matching in td files
You've unfortunately chosen a complex example. Your second question is needs be answered first. null_frag causes the pattern to be dropped. Now having covered that the reason the operands are in the order they are is because the only instruction that doesn't use null_frag is this one defm r213 : fma3s_rm<opc213, !strconcat(OpStr, !strconcat("213", PackTy)),
2012 Nov 05
1
[LLVMdev] adding architecture specific flag
Hi, Can anybody please suggest where to add architecture specific code generation flags(e.g. X86) in llvm? Thanks in advance. Best Regards, soham
2016 Sep 02
4
undef * 0
What is the value of undef * 0 in LLVM? According to its definition in the LLVM IR reference; "The string ‘undef‘ can be used anywhere a constant is expected..." Am I correct to say that undef * 0 = 0 following this definition? Best Regards, soham