similar to: [LLVMdev] Help me improve two-address code

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Help me improve two-address code"

2009 Apr 16
0
[LLVMdev] Help me improve two-address code
On Apr 16, 2009, at 3:17 PM, Greg McGary wrote: > I have my new port limping enough to compile a very basic function: > > int > foo (int a, int b, int c, int d) > { > return a + b - c + d; > } > > clang-cc -O2 yields: > > define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) nounwind readnone { > entry: > %add = add i32 %b, %a ; <i32> [#uses=1]
2009 Apr 16
3
[LLVMdev] Help me improve two-address code
Evan Cheng wrote: > On Apr 16, 2009, at 3:17 PM, Greg McGary wrote: > >> Is there some optimizer knob I'm not turning properly? In more complex >> cases, GCC does poorly with two-address operand choices and so bloats >> the code with unnecessary register moves. I have high hopes LLVM >> can do better, so this result for a simple case is bothersome. >>
2009 Apr 17
0
[LLVMdev] Help me improve two-address code
On Apr 16, 2009, at 4:25 PM, Greg McGary wrote: > Evan Cheng wrote: >> On Apr 16, 2009, at 3:17 PM, Greg McGary wrote: >> >>> Is there some optimizer knob I'm not turning properly? In more >>> complex >>> cases, GCC does poorly with two-address operand choices and so >>> bloats >>> the code with unnecessary register moves. I
2009 Apr 20
4
[LLVMdev] Unnecessary moves after sign-extension in 2-address target
My two-address target machine has sign-extension instructions to extend i8->i32 and i16->i32. When I compile this simple program: int sext (unsigned a, unsigned b, int c) { return (signed char) a + (signed short) b + c; } I get this IR: define i32 @sext(i32 %a, i32 %b, i32 %c) nounwind readnone { entry: %conv = trunc i32 %a to i8 ; <i8>
2009 Apr 21
3
[LLVMdev] Unnecessary moves after sign-extension in 2-address target
Dan Gohman wrote: > On Apr 19, 2009, at 6:15 PM, Greg McGary wrote: > >> Because sextb_r and sextw_r have destination tied to source operands, >> TwoAddressInstructionPass thinks it needs a copy. However, since the >> sext kills its source, the copy is unnecessary. Why does this happen? >> Is TwoAddressInstructionPass relying on a later pass to notice this
2014 Oct 03
2
[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
Hi Tom, Matt, I'm running into strange issues with the cos test (piglit generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c) I have been seeing random failures (incorrect results) for some time and tried to investigate. the weird part is that the failures are not 100% reproducible, sometimes the tests pass, or partly pass (it's usually float8 and float16 subtests that
2008 Jul 08
3
[LLVMdev] Implementing llvm.atomic.cmp.swap.i32 on PowerPC
Hi Evan, Evan Cheng wrote: > The patch looks great. But I do have one comment: > > +let usesCustomDAGSchedInserter = 1 in { > + let Uses = [CR0] in { > + let Uses = [R0] in > + def ATOMIC_LOAD_ADD_I32 : Pseudo< > > The "let Uses = [R0]" is not needed. The pseudo instruction will be > expanded like this later: > > + BuildMI(BB,
2009 Apr 21
0
[LLVMdev] Unnecessary moves after sign-extension in 2-address target
On Apr 19, 2009, at 6:15 PM, Greg McGary wrote: > > Because sextb_r and sextw_r have destination tied to source operands, > TwoAddressInstructionPass thinks it needs a copy. However, since the > sext kills its source, the copy is unnecessary. Why does this happen? > Is TwoAddressInstructionPass relying on a later pass to notice this > and > transform it again? Yes, the
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
Maybe it would illustrative to give an IR example of the case I'm interested in. Consider define void @"julia_transform_bvn_derivs_hessian!"(double* %data, double* %data2, double *%data3, double *%out) { %element11 = getelementptr inbounds double, double* %data, i32 1 %load10 = load double, double* %data %load11 = load double, double* %element11 %element21 =
2008 Jul 08
0
[LLVMdev] Implementing llvm.atomic.cmp.swap.i32 on PowerPC
Look for createVirtualRegister. These are examples in PPCISelLowering.cpp. Evan On Jul 8, 2008, at 8:24 AM, Gary Benson wrote: > Hi Evan, > > Evan Cheng wrote: >> The patch looks great. But I do have one comment: >> >> +let usesCustomDAGSchedInserter = 1 in { >> + let Uses = [CR0] in { >> + let Uses = [R0] in >> + def ATOMIC_LOAD_ADD_I32 :
2008 Jul 04
0
[LLVMdev] Implementing llvm.atomic.cmp.swap.i32 on PowerPC
Hi Gary, The patch looks great. But I do have one comment: +let usesCustomDAGSchedInserter = 1 in { + let Uses = [CR0] in { + let Uses = [R0] in + def ATOMIC_LOAD_ADD_I32 : Pseudo< The "let Uses = [R0]" is not needed. The pseudo instruction will be expanded like this later: + BuildMI(BB, TII->get(is64bit ? PPC::LDARX : PPC::LWARX), dest) +
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi all, The 213 variant of the FMA3 instructions is currently marked commutable (see X86InstrFMA.td). Is that safe? According to the ISA the FMA3 instructions aren't commutable for non-numeric results, so I'd have thought commuting this would only be valid in fast-math mode? For the curious, the reason that I'm asking is that we currently always select the 213 variant, but this
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Kay, My patch will partially address your bug. For now I'm just looking to switch the default FMA from vfmadd213xx to vfmadd231xx. That will cause the code in PR17229 to compile as desired, but would regress code like: double foo(double a, double b, double c) { return a * b + c; } Which will now require a vmovaps + vfmadd231. If this impacts real benchmarks we could add an
2018 Nov 14
2
Fw: How to define an instruction
--------- Forwarded Message --------- From: Tianhao Shen <17862703959 at 163.com> Date: 11/14/2018 09:31 To: craig.topper at gmail.com <craig.topper at gmail.com> Subject: Re: [llvm-dev] How to define an instruction Hi, Craig Thank you for replying to me. I guess that you misunderstand my meaning about "can'r run". I just want to run my instruction by LLVM using the
2013 Dec 20
0
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Lang, Unfortunately, I don't have an answer on the commutability question, but I wanted to let you know that I filed a bug on this: http://llvm.org/bugs/show_bug.cgi?id=17229 This also shows a memory operand variant of the fma that you may want to consider in your patch and testcases. Thanks! On Thu, Dec 19, 2013 at 10:45 PM, Lang Hames <lhames at gmail.com> wrote: > Hi all,
2013 Dec 23
2
[LLVMdev] Commutability of X86 FMA3 instructions.
Hi Elena, Thank you very much for looking in to that. I'll go ahead and remove the isCommutable flag from those instructions, since it sounds like that's the right thing to do. I would still like to change the default from the 231 variant to 213 too, as this will reduce code-size for accumulator-style loops. I have at least one benchmark that shows significant speedups when this change
2019 Jun 11
3
How to tell LLVM to treat Commutable library calls as such, for example multiplication?
A few library calls are commutable by definition, for example multiplications. I defined them as LibCalls for my architecture. However, I found that arguments are always passed in the order they are generated by Clang thus missing possible optimisations. For example, the following IR code ; Function Attrs: minsize norecurse nounwind optsize readnone define dso_local i16 @multTest(i16 %a, i16
2018 Nov 14
2
Fw: How to define an instruction
Thank you for answering my confusion. I have another questions. If I add really instructions instead intrinsics ,can I reach my purpose? I guess ,the answer is "can't". I don't find the anything about how machine to do about instructions,especially "ALU" instructions. Thank you again, Tianhao Shen On 11/14/2018 13:42,Craig Topper<craig.topper at gmail.com>
2018 Sep 17
2
error about adding an trinsics
Hi,every one. This problem has been bothering me for several days.I really hope that you can help me. I want to add an trinsics in X86. This trinsics can compare two numbers and return the larger. There are the changes I do as fllowing. In /tools/clang/include/clang/Basic/BuiltinsX86.def : BUILTIN(__builtin_x86_max_qb, "iii", "") In include/llvm/IR/IntrinsicsX86.td : let
2009 Apr 16
2
[LLVMdev] How do I model MUL with multiply-accumulate instruction?
The only multiplication instruction on my target CPU is multiply-and-accumulate. The result goes into a special register that can destructively read at the end of a sequence of multiply-adds. The following sequence is required to so a simple multiply: acc r0 # clear accumulator, discarding its value (r0 reads as 0, and sinks writes) mac rSRC1, rSRC2 # multiply sources, store