similar to: Unlikely branches can have expensive contents hoisted

Displaying 20 results from an estimated 6000 matches similar to: "Unlikely branches can have expensive contents hoisted"

2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Hey list, I've recently done the first test run of bumping our Burst compiler from LLVM 10 -> 11 now that the branch has been cut, and have noticed an apparent loop vectorization codegen regression for X86 with AVX or AVX2 enabled. The following IR example is vectorized 4 wide with LLVM 11 and trunk, whereas in LLVM 10 it was (correctly, as per what we want) vectorized 8 wide, matching the
2020 May 27
4
default behavior or
Hi Devs, going by this link https://llvm.org/docs/LangRef.html#floatenv, it says that floating point operations do not have side effects by default. But when I compile a test case, i.e. cat a.c float foo(float a, float b) { return a+b; } $clang a.c -O2 -S -emit-llvm it emits IR like: $cat a.ll --------------------------------------- ; ModuleID = 'a.c' source_filename = "a.c" target
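For reference, a minimal sketch of the kind of IR clang typically produces for the a.c above (dso_local, parameter and function attributes, metadata, and the data layout are omitted); the key point is the plain fadd with no constrained-FP intrinsic, i.e. no floating-point-environment side effects are modelled by default:
-----------
define float @foo(float %a, float %b) {
entry:
  ; an ordinary fadd: no FP-environment side effects by default
  %add = fadd float %a, %b
  ret float %add
}
-----------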
2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Tried a bunch of them there (x86-64, haswell, znver2) and they all defaulted to 4-wide - haswell additionally caused some extra loop unrolling but still with 8-wide pows. Cheers, -Neil. On Thu, Jul 16, 2020 at 2:39 PM Roman Lebedev <lebedev.ri at gmail.com> wrote: > Did you specify the target CPU the code should be optimized for? > For clang that is -march=native/znver2/... /
2020 May 31
2
LLC crash while handling DEBUG info
Hi- Here is a simple C++ function: ----------- void foo() { } ----------- Let's say the above function is compiled to generate LLVM IR with the -g flag using the command line `clang++ -g -O0 -S -emit-llvm foo.cpp`; we get the IR below ----------- ; ModuleID = 'foo.cpp' source_filename = "foo.cpp" target datalayout =
2020 Jul 16
4
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
So for us we use SLEEF to actually implement the libcalls (LLVM intrinsics) that LLVM by default would generate - and since SLEEF has highly optimal 8-wide pow, optimized for AVX and AVX2, we really want to use that. So we would not see 4/8 libcalls and instead see 1 call to something that lights up the ymm registers. I guess the problem then is that the default expectation is that pow would be
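As an illustration of the point (a hypothetical snippet, not taken from the thread): with an 8-wide SLEEF pow mapped in, the vectorized loop body would contain a single 8-wide call rather than eight scalar powf libcalls, roughly of this shape:
-----------
declare <8 x float> @llvm.pow.v8f32(<8 x float>, <8 x float>)

define <8 x float> @pow8(<8 x float> %x, <8 x float> %y) {
  ; one 8-wide call, which a SLEEF-backed lowering can turn into a single
  ; ymm-register library call instead of 8 scalar powf calls
  %r = call <8 x float> @llvm.pow.v8f32(<8 x float> %x, <8 x float> %y)
  ret <8 x float> %r
}
-----------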
2020 May 27
2
By default clang does not emit trap insn
looks like experimental/work in progress support: https://reviews.llvm.org/D62731 On Tue, May 26, 2020 at 10:39 PM kamlesh kumar via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > > On Wed, May 27, 2020 at 11:06 AM kamlesh kumar <kamleshbhalui at gmail.com> > wrote: > >> Hi Devs, >> going by this link https://llvm.org/docs/LangRef.html#floatenv >>
2020 May 31
2
LLC crash while handling DEBUG info
Hi David If you look at line https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Verifier.cpp#L1160 there is an IR verification check which asserts that only in the case of `spFlags = DISPFlagDefinition` should the compilation unit (`unit` field) be present. Otherwise, it should *not* be present. In the crash case, `spFlags = DISPFlagOptimized`. So, I guess, the `unit` field should *not* be present,
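Roughly, the rule being referred to has this shape (an illustrative C++ sketch of the described check, not the verbatim Verifier.cpp code):
-----------
#include "llvm/IR/DebugInfoMetadata.h"
#include <cassert>

// A DISubprogram that is a definition must carry a compile unit;
// a declaration must not.
void checkSubprogramUnit(const llvm::DISubprogram &SP) {
  if (SP.isDefinition())
    assert(SP.getUnit() && "subprogram definitions must have a compile unit");
  else
    assert(!SP.getUnit() && "subprogram declarations must not have a compile unit");
}
-----------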
2020 Nov 17
2
JIT compiling CUDA source code
We have an application that allows the user to compile and execute C++ code on the fly, using Orc JIT v2, via the LLJIT class. And we would like to extend it to allow the user to provide CUDA source code as well, for GPU programming. But I am having a hard time figuring out how to do it. To JIT compile C++ code, we do basically as follows: 1. call Driver::BuildCompilation(), which returns a
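For context, a minimal LLJIT sketch of the kind of setup described (not the poster's exact pipeline; it assumes the IR module has already been produced by the clang pieces above, and error handling is abbreviated):
-----------
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::orc;

// Hand an already compiled IR module to LLJIT and look up a symbol in it.
Error runWithLLJIT(ThreadSafeModule TSM) {
  auto JIT = LLJITBuilder().create();
  if (!JIT)
    return JIT.takeError();
  if (Error Err = (*JIT)->addIRModule(std::move(TSM)))
    return std::move(Err);
  // "entry" is a placeholder symbol name; depending on the LLVM version the
  // result is a JITEvaluatedSymbol or an ExecutorAddr, whose address is then
  // cast to the function's type and called.
  auto Sym = (*JIT)->lookup("entry");
  if (!Sym)
    return Sym.takeError();
  return Error::success();
}
-----------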
2020 May 31
2
LLC crash while handling DEBUG info
I am a bit confused - `unit` must be present for definitions, and `optimized` is also a `definition`, so `unit` must be present for `optimized` too. Am I right? Mahesha On Sun, May 31, 2020 at 10:14 PM David Blaikie <dblaikie at gmail.com> wrote: > definition and optimized are orthogonal (a function could be both, or > neither) - one says this DISubprogram describes a function
2020 Jun 13
2
target-features attribute prevents inlining?
Hello, I'm new to LLVM and I recently hit a weird problem with inlining behavior. I managed to get a minimal repro and identify the symptom of the issue, but I couldn't understand the root cause or how I should properly handle it. Below is IR code consisting of two functions '_Z2fnP10TestStructi' and 'testfn', with the latter calling the former. One would expect the
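As an illustration of the pattern described (a hypothetical reduction, not the poster's actual IR): the callee carries a "target-features" attribute that the caller lacks, so the inliner refuses to inline the call:
-----------
%struct.TestStruct = type opaque

define void @_Z2fnP10TestStructi(%struct.TestStruct* %p, i32 %i) #0 {
  ret void
}

define void @testfn(%struct.TestStruct* %p, i32 %i) {
  ; not inlined: @testfn does not advertise the callee's features, so
  ; inlining could move AVX2 code into a context that never checks for it
  call void @_Z2fnP10TestStructi(%struct.TestStruct* %p, i32 %i)
  ret void
}

attributes #0 = { "target-features"="+avx2" }
-----------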
2020 Jun 01
2
LLC crash while handling DEBUG info
Let's forget about my malformed IR if it is adding additional confusion here. I mentioned it to ease the conversation, but if it is causing confusion rather than making the discussion flow easier, then we'd better ignore it. The whole trigger for this email thread is that one of the applications is crashing with the stack trace that I mentioned earlier. The crash is during the
2020 Nov 19
1
JIT compiling CUDA source code
It sounds like you are currently emitting an LLVM module? The best strategy is probably to emit a PTX module and then pass that to the CUDA driver. This is what we do on the Julia side in CUDA.jl. Nvidia has a somewhat helpful tutorial on this at https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp and
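A rough sketch of the CUDA driver-API side of that suggestion (written here for illustration; the kernel name "myKernel" is a placeholder and error checking is reduced to a single macro):
-----------
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

#define CU_CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
    std::fprintf(stderr, "CUDA driver error %d at %s\n", (int)r_, #call);   \
    std::exit(1); } } while (0)

// Load a PTX image (e.g. produced by LLVM's NVPTX backend or NVRTC)
// and obtain a kernel handle for later launching with cuLaunchKernel.
CUfunction loadKernelFromPTX(const char *ptx) {
  CU_CHECK(cuInit(0));
  CUdevice dev;
  CU_CHECK(cuDeviceGet(&dev, 0));
  CUcontext ctx;
  CU_CHECK(cuCtxCreate(&ctx, 0, dev));
  CUmodule mod;
  CU_CHECK(cuModuleLoadData(&mod, ptx));
  CUfunction fn;
  CU_CHECK(cuModuleGetFunction(&fn, mod, "myKernel"));
  return fn;
}
-----------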
2020 Jun 13
2
target-features attribute prevents inlining?
Hi David, Thanks for your quick response! I now understand the reason that inlining cannot be done on functions with different target-attributes. Thanks for your explanation! However, I think I didn't fully understand your solution; it would be nice if you would like to elaborate a bit more. Here's a bit more info on my current workflow: (1) The clang++ compiler builds C++ source file
2020 Jun 13
2
target-features attribute prevents inlining?
Thank you so much David! After thinking a bit more I agree with you that attempting to add 'target-features' to my functions seems to be the safest approach of all. I noticed that if I mark the clang++ function as 'AlwaysInline', the inlining is performed normally. Is this a potential bug, given what you said that LLVM may accidentally move code using advanced CPU features outside
2020 May 23
2
Loop Unroll
This is my example (for.c): #include <stdio.h> int add(int a, int b) { return a + b; } int main() { int a, b, c, d; a = 5; b = 15; c = add(a, b); d = 0; for(int i=0;i<16;i++) d = add(c, d); } I run: $ clang -O0 -Xclang -disable-O0-optnone -emit-llvm for.c -S -o forO0.ll $ opt -O0 -S --loop-unroll --unroll-count=4 -view-cfg forO0.ll -o for-opt00-unroll4.ll
2020 May 22
4
Loop Unroll
Hi, I'm interested in finding a pass for loop unrolling in the LLVM compiler. I tried opt --loop-unroll --unroll-count=4, but it doesn't work well. Which pass can I use, and how? I would also like to know if there is any way to mark the loops that I want to be unrolled. Thank you.
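As an aside not taken from this thread: one way to mark specific loops is clang's loop pragma, which attaches llvm.loop metadata that the unrolling pass honours. A minimal example (the function is hypothetical):
-----------
// Ask clang to unroll this particular loop 4x; the pragma attaches
// llvm.loop metadata that the loop-unroll pass reads.
void scale(int *d, int c, int n) {
#pragma clang loop unroll_count(4)
  for (int i = 0; i < n; i++)
    d[i] += c;
}
-----------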
2020 Feb 03
2
Eliminate some two entry PHI nodes - SimplifyCFG
SimplifyCFG FoldTwoEntryPhiNode looks to simplify all 2 entry phi nodes in a block; if it can't do them all, then it won't do any and returns. There is a lot of code directly in this function geared toward this requirement. Is it currently possible to get this function (or pass) to fold only "some" of the phis (without having to fold them all)? I understand that
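For reference, a hypothetical example (not from the thread) of the kind of two-entry PHI node this transform targets; SimplifyCFG can fold the branch and PHI into a select:
-----------
define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %take.a, label %join
take.a:
  br label %join
join:
  ; two-entry PHI: foldable into  %r = select i1 %cmp, i32 %a, i32 %b
  %r = phi i32 [ %a, %take.a ], [ %b, %entry ]
  ret i32 %r
}
-----------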
2020 Feb 05
2
Eliminate some two entry PHI nodes - SimplifyCFG
Conditional on the target supporting cmov? Though that's probably not optimal. On Wed, Feb 5, 2020, 7:47 AM Nicolai Hähnle <nhaehnle at gmail.com> wrote: > Hi Ryan, > > On Mon, Feb 3, 2020 at 7:08 PM Ryan Taylor via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > SimplifyCFG FoldTwoEntryPhiNode looks to simplify all 2 entry phi nodes > in a block, if it
2020 May 26
3
Loop Unroll
Awesome, thanks! Now I have another question. I have a matrix multiplication code. This is my code: #include <stdio.h> #include <stdlib.h> #define n 4 int main(int argc, char *argv[]) { int i, j, k; int A[n][n], B[n][n], C[n][n]; for(i=0;i<n;i++){ for(j=0;j<n;j++){ A[i][j] = 1; B[i][j] = 2; C[i][j] = 0; } }
2007 Nov 22
2
[LLVMdev] llvm-gcc cannot emit @llvm.pow.* ?
2007/11/22, Duncan Sands <baldrick at free.fr>: > > Hi, > > > Current llvm-gcc cannot emit llvm intrinsic functions like llvm.pow.* and > > llvm.sin.* > > For example: > > > > double foo(double x, double y) { > > return pow(x,y); > > } > > > > will be compiled into ll: > > > > define double @foo(double %x, double %y) {