thr3ads.net - similar to: "[LLVMdev] Commutability of X86 FMA3 instructions."

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Commutability of X86 FMA3 instructions."

[LLVMdev] Commutability of X86 FMA3 instructions.

2013 Dec 20

[LLVMdev] Commutability of X86 FMA3 instructions.

Hi Kay, My patch will partially address your bug. For now I'm just looking to switch the default FMA from vfmadd213xx to vfmadd231xx. That will cause the code in PR17229 to compile as desired, but would regress code like: double foo(double a, double b, double c) { return a * b + c; } Which will now require a vmovaps + vfmadd231. If this impacts real benchmarks we could add an

[LLVMdev] Commutability of X86 FMA3 instructions.

2013 Dec 20

[LLVMdev] Commutability of X86 FMA3 instructions.

Hi Lang, Unfortunately, I don't have an answer on the commutability question, but I wanted to let you know that I filed a bug on this: http://llvm.org/bugs/show_bug.cgi?id=17229 This also shows a memory operand variant of the fma that you may want to consider in your patch and testcases. Thanks! On Thu, Dec 19, 2013 at 10:45 PM, Lang Hames <lhames at gmail.com> wrote: > Hi all,

[LLVMdev] Commutability of X86 FMA3 instructions.

2013 Dec 23

[LLVMdev] Commutability of X86 FMA3 instructions.

Hi Elena, Thank you very much for looking in to that. I'll go ahead and remove the isCommutable flag from those instructions, since it sounds like that's the right thing to do. I would still like to change the default from the 231 variant to 213 too, as this will reduce code-size for accumulator-style loops. I have at least one benchmark that shows significant speedups when this change

[LLVMdev] Operand order in dag pattern matching in td files

2012 Nov 16

[LLVMdev] Operand order in dag pattern matching in td files

Hi, I have a simple question w.r.t the order of operands used in dag pattern matching in target files. Some of them seem intuitive. But I want to get it clarified anyway. I am using a pattern from X86InstrFMA.td in the below example. Consider FMA3 pattern (simplified). let Constraints = "$src1 = $dst" in { multiclass fma3s_rm<bits<8> opc, string OpcodeStr, X86MemOperand

[LLVMdev] Operand order in dag pattern matching in td files

2012 Nov 16

[LLVMdev] Operand order in dag pattern matching in td files

On 16 November 2012 13:41, Anitha B Gollamudi <anitha.boyapati at gmail.com> wrote: > Hi, > > I have a simple question w.r.t the order of operands used in dag > pattern matching in target files. Some of them seem intuitive. But I > want to get it clarified anyway. I am using a pattern from > X86InstrFMA.td in the below example. Consider FMA3 pattern > (simplified). >

[LLVMdev] Error running spec benchmark with FMA4 on X86

2012 Sep 06

[LLVMdev] Error running spec benchmark with FMA4 on X86

Hi All, I am facing miscompare error when running povray (and few other C/C++ benchmarks) from spec cpu2006 suite enabling FMA4 (and disabling FMA3). I have used -ffp-contract=fast to turn on this option. (Compilation options and targets pasted below). >>>>>>>> clang version 3.2 (trunk 163295:163308) (llvm/trunk 163295) Target: x86_64-unknown-linux-gnu Thread model: posix

How to tell LLVM to treat Commutable library calls as such, for example multiplication?

2019 Jun 11

How to tell LLVM to treat Commutable library calls as such, for example multiplication?

A few library calls are commutable by definition, for example multiplications. I defined them as LibCalls for my architecture. However, I found that arguments are always passed in the order they are generated by Clang thus missing possible optimisations. For example, the following IR code ; Function Attrs: minsize norecurse nounwind optsize readnone define dso_local i16 @multTest(i16 %a, i16

[LLVMdev] [PATCH] Lost instcombine opportunity: "or"s of "icmp"s (commutability)

2008 Oct 08

[LLVMdev] [PATCH] Lost instcombine opportunity: "or"s of "icmp"s (commutability)

Here's an initial stab, but I'm not too happy about the temporarily adding new instructions then removing it because returning it will have it added back in to replace other uses. I also added a couple test cases pass with the new InstructionCombining changes (the old code only passes one of the added tests). Also, this change exposes some simplification for

[LLVMdev] Operand order in dag pattern matching in td files

2012 Nov 16

[LLVMdev] Operand order in dag pattern matching in td files

You've unfortunately chosen a complex example. Your second question is needs be answered first. null_frag causes the pattern to be dropped. Now having covered that the reason the operands are in the order they are is because the only instruction that doesn't use null_frag is this one defm r213 : fma3s_rm<opc213, !strconcat(OpStr, !strconcat("213", PackTy)),

[LLVMdev] Lost instcombine opportunity: "or"s of "icmp"s (commutability)

2008 Oct 08

[LLVMdev] Lost instcombine opportunity: "or"s of "icmp"s (commutability)

instcombine can handle certain orders of "icmp"s that are "or"ed together: x != 5 OR x > 10 OR x == 8 becomes.. x != 5 OR x == 8 becomes.. x != 5 However, a different ordering prevents the simplification: x == 8 OR x > 10 OR x != 5 becomes.. %or.eq8.gt10 OR x != 5 and that can't be simplified because we now have an "or" OR "icmp". What would I

[LLVMdev] Multiclass patterns

2009 Feb 10

[LLVMdev] Multiclass patterns

On Tue, Feb 10, 2009 at 8:27 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote: > Bill, > Sorry if I wasn't clear enough. I wasn't referring to multiclass's that > define other classes, but with using patterns inside of a multiclass to > reduce redundant code. > For example: > multiclass IntSubtract<SDNode node> > { > def _i8 : Pat<(sub

AVX2 codegen - question reg. FMA generation

2019 Sep 02

AVX2 codegen - question reg. FMA generation

Hello, On the appended reasonably simple test case that has an fmul/fadd sequence on <8 x float> vector types, I don't see the x86-64 code generator (with cpu set to haswell or later types) turning it into an AVX2 FMA instructions. Here's the snippet in the output it generates: $ llc -O3 -mcpu=skylake --------------------- .LBB0_2: # =>This Inner

[LLVMdev] Help me improve two-address code

2009 Apr 16

[LLVMdev] Help me improve two-address code

Evan Cheng wrote: > On Apr 16, 2009, at 3:17 PM, Greg McGary wrote: > >> Is there some optimizer knob I'm not turning properly? In more complex >> cases, GCC does poorly with two-address operand choices and so bloats >> the code with unnecessary register moves. I have high hopes LLVM >> can do better, so this result for a simple case is bothersome. >>

[LLVMdev] removing unnecessary moves with 2-address machine

2008 Mar 05

[LLVMdev] removing unnecessary moves with 2-address machine

I am new to LLVM and am trying to write a backend for a simple 2-address machine. For a very simple program: define i16 @add2aa(i16 %a, i16 %b) nounwind { entry: %tmp3 = add i16 %b, %a ; <i16> [#uses=1] ret i16 %tmp3 } The arguments a and b are passed in r15 and r14. The result is returned in r15. The generated code is: add2aa: add.w r15,r14

[LLVMdev] "icmp eq", "icmp ne" not commuting operands on ARM

2009 Jun 26

[LLVMdev] "icmp eq", "icmp ne" not commuting operands on ARM

NE and EQ comparisons should be able to commute their operands. But, for ARM at least, this does not seem to be happening. The first sequence below generates CMN (compare negated) but the second does not (complete test attached). These seem to map to ARMcmpNZ. Where would I look to see if that is marked commutative? %nb = sub i32 0, %b %tmp = icmp ne i32 %a, %nb %nb = sub

[LLVMdev] Help me improve two-address code

2009 Apr 16

[LLVMdev] Help me improve two-address code

On Apr 16, 2009, at 3:17 PM, Greg McGary wrote: > I have my new port limping enough to compile a very basic function: > > int > foo (int a, int b, int c, int d) > { > return a + b - c + d; > } > > clang-cc -O2 yields: > > define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) nounwind readnone { > entry: > %add = add i32 %b, %a ; <i32> [#uses=1]

[LLVMdev] Stack alignment in kernel

2012 Mar 01

[LLVMdev] Stack alignment in kernel

I'm running in AVX mode, but the stack before call to kernel is aligned to 16 bit. Could you, please, tell me where it should be specified? Thank you. - Elena --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or

[LLVMdev] bitwise AND selector node not commutative?

2009 Jun 26

[LLVMdev] bitwise AND selector node not commutative?

On Jun 25, 2009, at 6:06 PM, Evan Cheng wrote: > > On Jun 25, 2009, at 4:38 PM, David Goodwin wrote: > >> Using the Thumb-2 target we see that ORN ( a | ^b) and BIC (a & ^b) >> have similar patterns, as we would expect: >> >> defm t2BIC : T2I_bin_irs<"bic", BinOpFrag<(and node:$LHS, (not >> node:$RHS))>>; >> defm t2ORN :

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

[LLVMdev] Stack alignment on X86 AVX seems incorrect

Hi Elena, You're correct. LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM,

[LLVMdev] bitwise AND selector node not commutative?

2009 Jun 26

[LLVMdev] bitwise AND selector node not commutative?

On Jun 25, 2009, at 4:38 PM, David Goodwin wrote: > Using the Thumb-2 target we see that ORN ( a | ^b) and BIC (a & ^b) > have similar patterns, as we would expect: > > defm t2BIC : T2I_bin_irs<"bic", BinOpFrag<(and node:$LHS, (not node: > $RHS))>>; > defm t2ORN : T2I_bin_irs<"orn", BinOpFrag<(or node:$LHS, (not node: >

similar to: [LLVMdev] Commutability of X86 FMA3 instructions.