similar to: target-features attribute prevents inlining?

Displaying 20 results from an estimated 7000 matches similar to: "target-features attribute prevents inlining?"

2020 Jun 13
2
target-features attribute prevents inlining?
Hi David, Thanks for your quick response! I now understand the reason that inlining cannot be done on functions with different target-attributes. Thanks for your explanation! However, I don't think I fully understood your solution; it would be nice if you could elaborate a bit more. Here's a bit more info on my current workflow: (1) The clang++ compiler builds C++ source file
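A minimal sketch of the mismatch being discussed, on an x86 target (function names and bodies are illustrative, not taken from the thread): the callee is built with an extra feature set, so its 'target-features' attribute in the emitted IR is not a subset of the caller's, and the inliner leaves the call outlined.

// Illustrative only: dot8() carries target("avx2") while caller() does not,
// so at -O2 the inliner refuses to merge them even though dot8() is small.
__attribute__((target("avx2")))
static int dot8(const int *a, const int *b) {
  int s = 0;
  for (int i = 0; i < 8; ++i)
    s += a[i] * b[i];   // may be vectorized with AVX2 instructions
  return s;
}

int caller(const int *a, const int *b) {
  return dot8(a, b);    // stays an out-of-line call: feature sets differ
}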
2020 Jun 13
2
target-features attribute prevents inlining?
Thank you so much David! After thinking a bit more, I agree with you that adding 'target-features' to my functions seems to be the safest approach of all. I noticed that if I mark the clang++ function as 'AlwaysInline', the inlining is performed normally. Is this a potential bug, given what you said about LLVM possibly accidentally moving code that uses advanced CPU features outside
2020 May 31
2
LLC crash while handling DEBUG info
Hi- Here is the simple C++ function: ----------- void foo() { } ----------- Let's say the above function is compiled to LLVM IR with the -g flag using the command line `clang++ -g -O0 -S -emit-llvm foo.cpp`; we get the IR below: ----------- ; ModuleID = 'foo.cpp' source_filename = "foo.cpp" target datalayout =
2020 May 31
2
LLC crash while handling DEBUG info
Hi David, If you look at line https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Verifier.cpp#L1160 there is an IR verification check which asserts that the compilation unit (`unit` field) should be present only in the case of `spFlags = DISPFlagDefinition`. Otherwise, it should *not* be present. In the crash case, `spFlags = DISPFlagOptimized`, so I guess the `unit` field should *not* be present,
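Restated in code, the rule that check enforces looks roughly like the sketch below (my paraphrase against the LLVM C++ API, not the upstream Verifier source):

#include "llvm/IR/DebugInfoMetadata.h"

// A DISubprogram carrying DISPFlagDefinition must point at a DICompileUnit via
// its 'unit' field; a non-definition DISubprogram must leave 'unit' empty.
static bool subprogramUnitIsConsistent(const llvm::DISubprogram &SP) {
  return SP.isDefinition() ? SP.getUnit() != nullptr
                           : SP.getUnit() == nullptr;
}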
2020 May 27
4
default behavior or
Hi Devs, going by this link https://llvm.org/docs/LangRef.html#floatenv it says that floating-point operations do not have side effects by default. But when I compile a test case, i.e. cat a.c float foo(float a, float b) { return a+b; } $ clang a.c -O2 -S -emit-llvm it emits IR like: $ cat a.ll --------------------------------------- ; ModuleID = 'a.c' source_filename = "a.c" target
2020 May 31
2
LLC crash while handling DEBUG info
I am a bit confused - `unit` must be present for definitions, and `optimized` is also a `definition`, so `unit` must be present for `optimized` too. Am I right? Mahesha On Sun, May 31, 2020 at 10:14 PM David Blaikie <dblaikie at gmail.com> wrote: > definition and optimized are orthogonal (a function could be both, or > neither) - one says this DISubprogram describes a function
2020 Jun 01
2
LLC crash while handling DEBUG info
Let's forget about my malformed IR if it is adding confusion here. I mentioned it to ease the conversation, but if it is causing confusion rather than making the discussion flow more easily, then we had better ignore it. The whole trigger for this email thread is that one of our applications is crashing with the stack trace I mentioned earlier. The crash is during the
2020 May 27
2
By default clang does not emit trap insn
looks like experimental/work in progress support: https://reviews.llvm.org/D62731 On Tue, May 26, 2020 at 10:39 PM kamlesh kumar via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > > On Wed, May 27, 2020 at 11:06 AM kamlesh kumar <kamleshbhalui at gmail.com> > wrote: > >> Hi Devs, >> going by this link https://llvm.org/docs/LangRef.html#floatenv >>
2003 Dec 15
1
[LLVMdev] Assertion failed in Pass.cpp
Hi all, I am trying to write a pass for the llc tool. I register this pass with the RegisterLLC template. However, when I try to run llc and load up the pass, I get a failed assertion: $ /storage/anshuman/llvmCVS/llvm/tools/Debug/llc -load=./libTest.so --help ... ... llc: Pass.cpp:327: void llvm::RegisterPassBase::unregisterPass(llvm::PassInfo*): Assertion `I != PassInfoMap->end()
2005 Jan 24
4
converting R objects to C types in .Call
Dear People, I'm trying to write an R wrapper for a C++ library, using .Call. I've never used .Call before. I'm currently having some difficulty converting an R character string to a C one. Here is a little test program. #include <R.h> #include <Rinternals.h> #include <stdio.h> SEXP testfn(SEXP chstr) { char * charptr = CHAR(chstr); printf("%s",
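One common fix for this kind of snippet, sketched below under the assumption that the argument is a length-1 character vector: in R's C API a character vector is a STRSXP whose elements are CHARSXP objects, so the C string is reached through STRING_ELT() rather than through the vector SEXP itself.

#include <R.h>
#include <Rinternals.h>

// Sketch only: extern "C" keeps the symbol unmangled so .Call can find it
// when this file is compiled as C++.
extern "C" SEXP testfn(SEXP chstr) {
  const char *charptr = CHAR(STRING_ELT(chstr, 0)); // first element of the vector
  Rprintf("%s\n", charptr);   // Rprintf rather than printf inside R
  return R_NilValue;          // a .Call entry point must return a SEXP
}

Built with R CMD SHLIB and loaded with dyn.load(), this can be exercised from R as .Call("testfn", "hello").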
2020 Sep 04
2
Performance of JIT execution
Hello, I recently noticed a performance gap between JIT execution and native code for the following simple function, which computes the Fibonacci sequence: uint64_t fib(int n) { if (n <= 2) { return 1; } else { return fib(n-1) + fib(n-2); } } When compiled natively using clang++ with -O3, it took 0.17s to compute fib(40). However, when executing using LLJIT, fed with the IR output of "clang++
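One knob worth checking in a comparison like this, sketched under the assumption that the slowdown comes from code quality rather than JIT overhead (the thread itself does not confirm the cause): LLJIT's code generator can be configured for an -O3-style optimization level.

#include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/CodeGen.h"

using namespace llvm;
using namespace llvm::orc;

// Build an LLJIT instance whose TargetMachine uses aggressive codegen,
// roughly matching what a natively built -O3 binary gets.
Expected<std::unique_ptr<LLJIT>> makeOptimizingJIT() {
  auto JTMB = JITTargetMachineBuilder::detectHost();
  if (!JTMB)
    return JTMB.takeError();
  JTMB->setCodeGenOptLevel(CodeGenOpt::Aggressive);
  return LLJITBuilder().setJITTargetMachineBuilder(std::move(*JTMB)).create();
}

Note that this only affects instruction selection and codegen; the IR handed to the JIT is compiled as given, so unoptimized IR stays unoptimized unless an IR transform layer is added or pre-optimized IR is fed in.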
2016 May 17
2
Function arguments pass by value or by reference.
Now, I am using LLVM 3.3 to do some processing of functions; however, there are some difficult things I can't handle by myself, so I would like your help to get it done properly. Q1. There is a call to a declared function: call i32 @create(i64* %tid, %union.t* %pab, i8* (i8*)* @worker, i8* null) // callInst The store instruction goes like this: store i8* (i32, double, i32*)* %fp, i8* (i32, double, i32*)**
2020 Oct 03
2
Another tail call optimization question
Hello, Could anyone kindly explain to me why the 'g()' in the following function cannot have tail call optimization? > void f(int* x); > void g(); > void h(int v) { > f(&v); > g(); > } > A while ago I was taught that tail call optimization cannot apply if local variables need to be kept alive, but 'g()' doesn't seem to require anything to be
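A contrasting sketch based on my reading of the constraint (not an authoritative answer from the thread): once f() has received &v, the optimizer must assume f stashed that pointer somewhere g() could reach, so v's stack slot has to remain valid across the call to g(), and that call cannot reuse h's frame.

void f(int *x);
void g();

void h(int v) {
  f(&v);       // the address of the local escapes into f()
  g();         // so this call is not eligible for tail-call optimization
}

void h2(int) {
  f(nullptr);  // no local address escapes
  g();         // typically emitted as a tail call (a plain jmp) at -O2
}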
2020 May 23
2
Loop Unroll
This is my example (for.c): #include <stdio.h> int add(int a, int b) { return a + b; } int main() { int a, b, c, d; a = 5; b = 15; c = add(a, b); d = 0; for(int i=0;i<16;i++) d = add(c, d); } I run: $ clang -O0 -Xclang -disable-O0-optnone -emit-llvm for.c -S -o forO0.ll $ opt -O0 -S --loop-unroll --unroll-count=4 -view-cfg forO0.ll -o for-opt00-unroll4.ll
2020 Jul 22
2
Unlikely branches can have expensive contents hoisted
Hey all, me again. So I'm looking at llvm.expect, specifically for branch hints. In the following example LLVM will hoist the pow/cos calls into the entry block even though I've used the llvm.expect intrinsic to make it clear that one of the calls is unlikely to occur. target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple =
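A C++-level sketch of the shape the IR presumably came from (the post shows raw IR, so the exact source is my guess); clang lowers __builtin_expect through the llvm.expect intrinsic mentioned above.

#include <cmath>

double compute(double x, bool rare) {
  if (__builtin_expect(rare, 0)) {
    // cold path: the expensive calls the hoisting complaint is about
    return std::pow(x, 3.0) + std::cos(x);
  }
  // hot path: cheap arithmetic only
  return x + 1.0;
}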
2020 May 26
3
Loop Unroll
Awesome, thanks! Now I have another question. I have some matrix multiplication code. This is my code: #include <stdio.h> #include <stdlib.h> #define n 4 int main(int argc, char *argv[]) { int i, j, k; int A[n][n], B[n][n], C[n][n]; for(i=0;i<n;i++){ for(j=0;j<n;j++){ A[i][j] = 1; B[i][j] = 2; C[i][j] = 0; } }
2020 May 22
4
Loop Unroll
Hi, I'm interested in finding a pass for loop unrolling in the LLVM compiler. I tried opt --loop-unroll --unroll-count=4, but it doesn't work well. Which pass can I use, and how? I would also like to know if there is any way to mark the loops that I want to be unrolled. Thank you.
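One source-level way to mark an individual loop for unrolling, assuming clang is the frontend in use (the thread does not say): the loop pragma below attaches !llvm.loop metadata that the unroller honors, so only this loop is affected.

int sum(const int *v, int n) {
  int s = 0;
#pragma clang loop unroll_count(4)   // request a 4x unroll of this loop only
  for (int i = 0; i < n; ++i)
    s += v[i];
  return s;
}

When driving the unroller from opt instead, it generally wants the loop in canonical form first (values promoted to registers, loop simplified/rotated), which is one common reason a bare --loop-unroll on -O0 output appears to do nothing.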
2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Hey list, I've recently done the first test run of bumping our Burst compiler from LLVM 10 -> 11 now that the branch has been cut, and have noticed an apparent loop vectorization codegen regression for X86 with AVX or AVX2 enabled. The following IR example is vectorized 4 wide with LLVM 11 and trunk, whereas LLVM 10 (correctly, as per what we want) vectorized it 8 wide matching the
2020 Nov 19
1
JIT compiling CUDA source code
Sounds right now like you are emitting an LLVM module? The best strategy is probably to emit a PTX module and then pass that to the CUDA driver. This is what we do on the Julia side in CUDA.jl. Nvidia has a somewhat helpful tutorial on this at https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp and
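For context, the "hand the PTX to the CUDA driver" step can look roughly like the sketch below: a minimal, error-handling-free outline using the CUDA driver API, where the kernel name is a placeholder rather than something taken from the thread or the linked sample.

#include <cuda.h>
#include <string>

// Minimal outline: load a PTX image produced by the JIT and launch one kernel.
// Every cu* call returns a CUresult that real code must check.
void launchFromPTX(const std::string &ptx) {
  cuInit(0);
  CUdevice dev;
  cuDeviceGet(&dev, 0);
  CUcontext ctx;
  cuCtxCreate(&ctx, 0, dev);

  CUmodule mod;
  cuModuleLoadData(&mod, ptx.c_str());            // PTX text, null-terminated

  CUfunction kernel;
  cuModuleGetFunction(&kernel, mod, "my_kernel"); // placeholder kernel name

  // A parameterless launch for brevity; real kernels take an array of
  // pointers to their arguments in place of the first nullptr.
  cuLaunchKernel(kernel, /*grid*/ 1, 1, 1, /*block*/ 256, 1, 1,
                 /*sharedMemBytes*/ 0, /*stream*/ nullptr,
                 /*kernelParams*/ nullptr, /*extra*/ nullptr);
  cuCtxSynchronize();

  cuModuleUnload(mod);
  cuCtxDestroy(ctx);
}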
2020 Nov 17
2
JIT compiling CUDA source code
We have an application that allows the user to compile and execute C++ code on the fly, using ORC JIT v2 via the LLJIT class. We would like to extend it to allow the user to provide CUDA source code as well, for GPU programming, but I am having a hard time figuring out how to do it. To JIT compile C++ code, we basically do the following: 1. call Driver::BuildCompilation(), which returns a