thr3ads.net - similar to: "[cfe-dev] CFG simplification question, and preservation of branching in the original code"

Displaying 20 results from an estimated 10000 matches similar to: "[cfe-dev] CFG simplification question, and preservation of branching in the original code"

[cfe-dev] CFG simplification question, and preservation of branching in the original code

2019 Sep 29

[cfe-dev] CFG simplification question, and preservation of branching in the original code

On Sun, Sep 29, 2019 at 3:35 PM Joan Lluch via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi Sanjay, > > Actually, the CodeGenPrepare::optimizeSelectInst is not doing the best it could do in some circumstances: The case of “OptSize" for targets not supporting Select was already mentioned to be detrimental. > > For targets that actually have selects, but branches

[cfe-dev] CFG simplification question, and preservation of branching in the original code

2019 Sep 30

[cfe-dev] CFG simplification question, and preservation of branching in the original code

On Mon, Sep 30, 2019 at 11:52 AM Joan Lluch <joan.lluch at icloud.com> wrote: > > Hi Roman, > > Is "test" actually an implementation of a 64-bit-wide multiplication > compiler-rt builtin? > Then i'd think the main problem is that it is being optimized in the > first place, you could end up with endless recursion… > > > No, this is not a compiler-rt

[cfe-dev] CFG simplification question, and preservation of branching in the original code

2019 Sep 30

[cfe-dev] CFG simplification question, and preservation of branching in the original code

For the MSP430 example, I'm guess its InstCombiner::transformSExtICmp or InstCombiner::transformZExtICmp ~Craig On Mon, Sep 30, 2019 at 2:21 PM Support IMAP <support at sweetwilliamsl.com> wrote: > Hi all, > > Ok, I just found a much simpler example of the same issue. > > Consider the following code > > int cmpge32_0(long a) { > return a>=0; > } >

[cfe-dev] CFG simplification question, and preservation of branching in the original code

2019 Oct 01

[cfe-dev] CFG simplification question, and preservation of branching in the original code

Hi Sanjay, Thanks for your reply. > So yes, the IR optimizer (instcombine is the specific pass) sometimes turns icmp (and select) sequences into ALU ops. Instcombine is almost entirely *target-independent* and should remain that way. The (sometimes unfortunate) decision to create shifts were made based on popular targets of the time (PowerPC and/or x86), and other targets may have suffered

[cfe-dev] CFG simplification question, and preservation of branching in the original code

2019 Oct 03

[cfe-dev] CFG simplification question, and preservation of branching in the original code

Hi all, > On 2 Oct 2019, at 14:34, Sanjay Patel <spatel at rotateright.com> wrote > Providing target options/overrides to code that is supposed to be target-independent sounds self-defeating to me. I doubt that proposal would gain much support. > Of course, if you're customizing LLVM for your own out-of-trunk backend, you can do anything you'd like if you're willing to

[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

2019 Nov 14

[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

For any of the examples shown below, if the logical equivalent using cmp + other IR instructions is no more than the number of IR instructions as the variant that uses shift, we should consider reversing the canonicalization. To make that happen, you would need to show that at least the minimal cases have codegen that is equal or better using the cmp form for at least a few in-tree targets. My

[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

2019 Nov 13

[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

As before, I'm not convinced that we want to allow target-based enable/disable in instcombine for performance. That undermines having a target-independent canonical form in the 1st place. It's not clear to me what the remaining motivating cases look like. If you could post those here or as bugs, I think you'd have a better chance of finding an answer. Let's take a minimal example

Dataflow analysis regression in 3.7

2017 Jul 06

Dataflow analysis regression in 3.7

On Thu, Jul 6, 2017 at 7:00 AM, Davide Italiano <davide at freebsd.org> wrote: > On Wed, Jul 5, 2017 at 3:59 PM, Johan Engelen via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Hi all, > > I just found an optimization regression regarding simple > > dataflow/constprop analysis: > > https://godbolt.org/g/Uz8P7t > > > > This code >

How to tell LLVM to treat Commutable library calls as such, for example multiplication?

2019 Jun 11

How to tell LLVM to treat Commutable library calls as such, for example multiplication?

A few library calls are commutable by definition, for example multiplications. I defined them as LibCalls for my architecture. However, I found that arguments are always passed in the order they are generated by Clang thus missing possible optimisations. For example, the following IR code ; Function Attrs: minsize norecurse nounwind optsize readnone define dso_local i16 @multTest(i16 %a, i16

Disabling select instructions

2020 Jan 31

Disabling select instructions

I agree with John; also, if you decide to go this route, you can reuse the code from CodeGenPrepare::optimizeSelectInst: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/CodeGenPrepare.cpp#L6065 Alexey On Thu, Jan 30, 2020 at 9:00 PM John Regehr via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Several different passes introduce select instructions, such as >

[LLVMdev] Correct usage of `llvm.assume` for loop vectorization alignment?

2014 Dec 26

[LLVMdev] Correct usage of `llvm.assume` for loop vectorization alignment?

Using LLVM ToT and Hal's helpful slide deck [1], I've been trying to use `llvm.assume` to communicate pointer alignment guarantees to vector load and store instructions. For example, in [2] %5 and %9 are guaranteed to be 32-byte aligned. However, if I run this IR through `opt -O3 -datalayout -S`, the vectorized loads and stores are still 1-byte aligned [3]. What's going wrong? Do I

Optimizing Compare instruction selection

2019 Jun 05

Optimizing Compare instruction selection

Hi Eli, Thanks again for your reply. I am unsure about implementing the getCrossCopyRegClass for my target. My target does not support or allow moves to and from the SR. The SR exists because it has implicit involvement in some instructions, but it is opaque to the assembler and to the user as a register. I mean, there are no instructions to directly move or read it, or even access it directly.

Optimizing Compare instruction selection

2019 Jun 02

Optimizing Compare instruction selection

Hi Eli, Thank you very much for your response. In fact, I had already tried the X86 approach before, i.e explicitly using the status register. This is the approach that appeals more to me. I left it parked because it also produced some problems (but I left it commented out). So I have now re-lived the code, and it works fine in most cases, but there’s a particular case that causes LLVM to stop

how to generate select-free LLVM IR?

2019 Aug 01

how to generate select-free LLVM IR?

I run "clang ... --emit-llvm ....". I get LLVM IR select instructions out. I don't want select instructions, but want explicit control flow. Is there a way to do this? I'm scared of disabling the SimplifyCFG pass since it appears to do much more than just rewrite to use select statements. -------------- next part -------------- An HTML attachment was scrubbed... URL:

NSW and ExtLdPromotion()

2015 Aug 11

NSW and ExtLdPromotion()

Hi, All: I have a testcase which produced incorrect result, it's caused by the combination of nsw flag and ExtLdPromotion, I am leaning to say Clang set nsw flag incorrectly, but please let me know if I was wrong. Here is the reduced testcase: long long foo(int *a) { long long c; c = *a * 1405; return c; } Clang emitted the following IR (It is done by EmitMUL() in

Disabling select instructions

2020 Jan 30

Disabling select instructions

Hi, I would like to know if there's a way to avoid select instructions during the IR generation. What are the optimization passes that can result in a select instruction? i.e. I want to preserve branches in my code without disabling any other optimizations applicable. For example, void foo(int* x, int* y){ if(*x > 0){ *y = *x + 10; } else{ *y = *x + 20; } }

Optimizing Compare instruction selection

2019 Jun 01

Optimizing Compare instruction selection

I attempt to optimize the use of the ‘CMP’ instruction on my architecture by removing the instruction instances where the Status Register already had the correct status flags. The cmp instruction in my architecture is the typical one that compares two registers, or a register with an immediate, and sets the Status Flags accordingly. I implemented my ‘cmp’ instruction in LLVM by custom lowering

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Hello, FYI, this is a bug http://llvm.org/bugs/show_bug.cgi?id=13324 When compiling the following code for sm_20, func params are by some reason given with .align 0, which is invalid. Problem does not occur if compiled for sm_10. > cat test.ll ; ModuleID = '__kernelgen_main_module' target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64" target triple =

should we do this time-consuming transform in InstCombine?

2018 Dec 18

should we do this time-consuming transform in InstCombine?

Hi Roman, Thanks for your good idea. I think it can solve the abs issue very well. I can continue with my work now^-^. But if it is not abs and there is no select, %res = OP i32 %b, %a %sub = sub i32 0, %b %res2 = OP i32 %sub, %a theoretically, we can still do the following transform for the above pattern: %res2 = OP i32 %sub, %a ==> %res2 = sub i32 0, %res Not sure whether we can do it

How to change CLang struct alignment behaviour?

2019 May 13

How to change CLang struct alignment behaviour?

Hi Joan, On Mon, 13 May 2019 at 18:01, Joan Lluch <joan.lluch at icloud.com> wrote: > After looking at it a bit further, I think this is a Clang thing. Clang issues “align 2” if the struct has at least one int (2 bytes), but also if the entire struct size is multiple of 2. For example a struct with 4 char members. In these cases the LLVM backend correctly creates word sized load/stores

similar to: [cfe-dev] CFG simplification question, and preservation of branching in the original code