thr3ads.net - similar to: "[LLVMdev] X86 - Help on fixing a poor code generation bug"

Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] X86 - Help on fixing a poor code generation bug"

[LLVMdev] X86 - Help on fixing a poor code generation bug

2013 Dec 05

Hi Andrea, Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS performs an ADD plus a BLEND operation, and you can easily

[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.

2012 Oct 24

Hi, I don't know whether my LLVM IR code is faulty or whether I have spotted a bug in the RegisterCoalescing pass, so I'm posting my issue on the ML. The shader and the print-before-all dump are given below. The interesting part is the vreg6/vreg48 reduction: before RegCoalescing, the machine code is:

// BEFORE LOOP
... Some COPYs....
400B %vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2

Greedy register allocator allocates live sub-register

2016 Mar 10

Hi all, I've come across a problem with register allocation whose root cause I have been unable to track down.

6728B %vreg304<def> = COPY %vreg278; VRF128:%vreg304,%vreg278
6736B %vreg302<def> = COPY %vreg278; VRF128:%vreg302,%vreg278
6752B %vreg278<def,tied1> = foo %vreg278<tied0>, %vreg277, 14, pred:1, pred:%noreg, 5; VRF128:%vreg278 VRF64_l:%vreg277
* bar

[LLVMdev] Adding a stack probe function attribute

2015 Aug 16

I started to implement inlining of the stack probe function based on Microsoft's inlined stack probes in https://github.com/Microsoft/llvm/tree/MS. Do we know why the stack pointer cannot be updated in a loop (which results in ideal code)? I noticed that was commented in Microsoft's code. I suspect this is due to debug or unwinding information, since it is allowed on Windows x86-32. I

TwoAddressInstructionPass bug?

2017 Nov 30

Hi, we are in the midst of some interesting work that began with setting 'guessInstructionProperties = 0' in the SystemZ backend. We have found this to be useful, and discovered many instructions where the hasSideEffects flag was set when it should not have been. The attached patch and test case trigger an assert in TwoAddress. (bin/llc ./tc_TwoAddr_crash.ll

[LLVMdev] [PATCH] [MachineSinking] Conservatively clear kill flags after coalescing.

2014 Sep 05

Hi Quentin, Jonas looked further into the problem below, and asked me to submit his patch. Note that we have our own out-of-tree target, and we have not been able to reproduce this problem on an in-tree target. /Patrik Hägglund

[MachineSinking] Conservatively clear kill flags after coalescing. This solves the problem of having a kill flag inside a loop with a definition of the register prior to

[LLVMdev] Register scavenger and SP/FP adjustments

2013 Sep 26

Consider this example:

--- ex.ll ---
declare void @bar()

; Function Attrs: nounwind optsize
define void @main() {
entry:
  %hin = alloca [256 x i32], align 4
  %xin = alloca [256 x i32], align 4
  call void @bar()
  ret void
}
-------------

Freshly built llc:
llc -O2 -march=x86 < ex.ll -print-before-all
# *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
#

[LLVMdev] Register scavenger and SP/FP adjustments

2013 Sep 26

The code has changed a lot over the years. It looks like the assumption was broken at some point. calculateCallsInformation() may have eliminated the pseudo set up instructions already.

// If call frames are not being included as part of the stack frame, and

[LLVMdev] Register scavenger and SP/FP adjustments

2013 Sep 26

Thanks, I'll look into that. Still, the case where the function does not call anything remains---in such a situation there are no ADJCALLSTACK pseudos, so regardless of what that function you pointed at does, there won't be any target-independent information about the SP adjustment by the time the frame index elimination runs. Would it make sense to have ADJCALLSTACK pseudos every

[newbie] trouble with global variables and CreateLoad/Store in JIT

2017 Jun 05

Since the getelementptrs were implicitly generated by the CreateStore/Load I'm not sure how to get access to them. So I hacked the assignment to be done thrice: once using a manual decomposition into two GEPs and stores, once using the "big" CreateStore, once via the setGlobal function, printing addresses and memory contents at each point to the degree that I have access to them.

[LLVMdev] [PATCH] [MachineSinking] Conservatively clear kill flags after coalescing.

2014 Sep 05

On Sep 5, 2014, at 10:21 AM, Juergen Ributzka <juergen at apple.com> wrote: > clearKillFlags seems a little "overkill" to me. In this case you could just simply transfer the value of the kill flag from the SrcReg to the DstReg. We are extending the live-range of SrcReg. I do not see how you could relate that to the kill flag of DstReg. Therefore, I still think, this is the

ARM: Predicated returns considered analyzable?

2015 Aug 10

Hello, The function ARMBaseInstrInfo::AnalyzeBranch contains the following piece of code:

    } else if (I->isReturn()) {
      // Returns can't be analyzed, but we should run cleanup.
      CantAnalyze = !isPredicated(I);
    } else {

This could lead to cases where for a block that ends with a conditional return, AnalyzeBranch returns false (i.e. analyzed), both TBB and FBB are

MachineSink optimization in code containing a setjmp

2015 Oct 13

Hello LLVM-dev, I think I've found an issue with the MachineSink optimization on a program that uses setjmp. It looks like MachineSink will happily move a machine instruction into a following machine basic block (not necessarily a successor), even when that later block can be reached through a setjmp. Here is some example debug output from llc that I'm seeing: Sinking along critical

[newbie] trouble with global variables and CreateLoad/Store in JIT

2017 Jun 06

On Mon, Jun 5, 2017 at 1:34 PM, Nikodemus Siivola <nikodemus at random-state.net> wrote:
> Uh. Turns out that if I hide the pointer to @foo from LLVM by passing it
> through an opaque identity function ... then everything works fine.
>
> Is this a bug in LLVM or is there some magic involving globals I'm
> misunderstanding?
>
This looks like a bug in the handling of

[InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass

2017 Aug 02

Hi, We recently found a testcase showing that simplifications in instcombine sometimes change an instruction without reducing its cost, while causing problems in the TwoAddressInstruction pass. It looks like the problem is generic, and other simplifications may have the same issue. I want to get some ideas about the best way to fix this kind of problem. The testcase:

[newbie] trouble with global variables and CreateLoad/Store in JIT

2017 Jun 06

That's useful to know that the static compilation code path works. Furthermore, as expected from that:

52: c7 05 04 00 00 00 d5 00 00 00 movl $213, 4
00000054: IMAGE_REL_I386_DIR32 _foo

It looks like the offset `4` of the second field of your struct is correct in the object file, so this does seem to be a problem in the JIT-specific linking/loading.

ARM: Predicated returns considered analyzable?

2015 Aug 12

Doh. I missed the list in my first reply... Here's the replay of the conversation:

-----
Renato:
On 10 August 2015 at 14:05, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> --> %SP<def,tied1> = t2LDMIA_RET %SP<tied0>, pred:8, pred:%CPSR,
> %R7<def>, %PC<def>, %SP<imp-use,undef>, %R7<imp-use,undef>,
>

[newbie] trouble with global variables and CreateLoad/Store in JIT

2017 Jun 07

My code was hinky, but only in the sense that I was accidentally duplicating the definition variable in the module where the function was. With only the declaration in the second module, loading the bitcode reproduces the issue. Managed an lli reproduction:

$ cat jit-0.ll
target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
target triple =

[InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass

2017 Aug 02

On Wed, Aug 2, 2017 at 3:36 PM, Matthias Braun <mbraun at apple.com> wrote:
> So to write this in a more condensed form, you have:
>
> %v0 = ...
> %v1 = and %v0, 255
> %v2 = and %v1, 31
> use %v1
> use %v2
>
> and transform this to
> %v0 = ...
> %v1 = and %v0, 255
> %v2 = and %v0, 31
> ...
>
> This is a classical problem with instruction

[LLVMdev] Predicated Vector Operations

2013 May 09

On May 9, 2013, at 3:05 PM, Jeff Bush <jeffbush001 at gmail.com> wrote:
> On Thu, May 9, 2013 at 8:10 AM, <dag at cray.com> wrote:
>> Jeff Bush <jeffbush001 at gmail.com> writes:
>>
>>> %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
>>> %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
>>> %sum = fadd %tx, %ty
>>> %newvalue
