Displaying 20 results from an estimated 2000 matches similar to: "default behavior or"
2020 May 27
2
By default clang does not emit trap insn
looks like experimental/work in progress support:
https://reviews.llvm.org/D62731
On Tue, May 26, 2020 at 10:39 PM kamlesh kumar via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Wed, May 27, 2020 at 11:06 AM kamlesh kumar <kamleshbhalui at gmail.com>
> wrote:
>
>> Hi Devs,
>> going by this link https://llvm.org/docs/LangRef.html#floatenv
>>
2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Hey list,
I've recently done the first test run of bumping our Burst compiler from
LLVM 10 -> 11 now that the branch has been cut, and have noticed an
apparent loop vectorization codegen regression for X86 with AVX or AVX2
enabled. The following IR example is vectorized to 4 wide with LLVM 11 and
trunk whereas in LLVM 10 it (correctly as per what we want) vectorized it 8
wide matching the
2020 May 31
2
LLC crash while handling DEBUG info
Hi-
Here is the simple C++ function:
-----------
void foo() {
}
-----------
Let's say the above function is compiled to generate LLVM IR with the -g flag
using the command line `clang++ -g -O0 -S -emit-llvm foo.cpp`; we get the
IR below
-----------
; ModuleID = 'foo.cpp'
source_filename = "foo.cpp"
target datalayout =
2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Tried a bunch of them there (x86-64, haswell, znver2) and they all
defaulted to 4-wide - haswell additionally caused some extra loop unrolling
but still with 8-wide pows.
Cheers,
-Neil.
On Thu, Jul 16, 2020 at 2:39 PM Roman Lebedev <lebedev.ri at gmail.com> wrote:
> Did you specify the target CPU the code should be optimized for?
> For clang that is -march=native/znver2/... /
2020 May 31
2
LLC crash while handling DEBUG info
Hi David
If you look at line
https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Verifier.cpp#L1160
there is IR verification which asserts that only in case of `spFlags
= DISPFlagDefinition`, the compilation unit (`unit` field) should be
present. Otherwise, it should *not* be present. In the crash case,
`spFlags = DISPFlagOptimized`. So, I guess, `unit` field should *not* be
present,
2020 May 31
2
LLC crash while handling DEBUG info
I am a bit confused - `unit` must be present for definitions, and `optimized`
is also a `definition`, so `unit` must be present for `optimized` too. Am
I right?
Mahesha
On Sun, May 31, 2020 at 10:14 PM David Blaikie <dblaikie at gmail.com> wrote:
> definition and optimized are orthogonal (a function could be both, or
> neither) - one says this DISubprogram describes a function
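The "orthogonal" point can be pictured as two independent bits in one flag word. The following is a minimal sketch only: the enum values and helper names are hypothetical and are not copied from the LLVM headers, and the `unit` rule is the one described in this thread, not a restatement of the actual verifier code.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not LLVM's real encoding. */
enum sp_flags {
    SP_FLAG_DEFINITION = 1u << 0,
    SP_FLAG_OPTIMIZED  = 1u << 1
};

/* Each property is tested on its own bit, so a subprogram can be
   both, either, or neither. */
bool is_definition(unsigned flags) { return (flags & SP_FLAG_DEFINITION) != 0; }
bool is_optimized(unsigned flags)  { return (flags & SP_FLAG_OPTIMIZED) != 0; }

/* Rule as described in the thread: a `unit` is expected exactly when
   the definition bit is set, regardless of the optimized bit. */
bool unit_expected(unsigned flags) { return is_definition(flags); }
```

Under these assumptions, a flag word carrying only the optimized bit describes a declaration, so no `unit` would be expected for it.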
2020 Jul 22
2
Unlikely branches can have expensive contents hoisted
Hey all - me again,
So I'm looking at llvm.expect specifically for branch hints. In the
following example LLVM will hoist the pow/cos calls into the entry block
even though I've used the llvm.expect intrinsic to make it clear that one
of the calls is unlikely to occur.
target datalayout =
"e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple =
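For readers coming from C, a branch hint of this kind is usually written with `__builtin_expect` (a GCC/Clang builtin), which clang, as far as I understand, lowers to the llvm.expect intrinsic when optimizing. The sketch below is not the IR from this post: `expensive` is a hypothetical stand-in for the pow/cos calls and `select_path` is a made-up name.

```c
/* Hypothetical stand-in for an expensive call such as pow or cos:
   sums the integers 0 .. x-1. */
static long expensive(long x) {
    long acc = 0;
    for (long i = 0; i < x; i++)
        acc += i;
    return acc;
}

/* The second argument of __builtin_expect is the value the condition
   is expected to have, so 0 marks the branch as unlikely. */
long select_path(int rare, long x) {
    if (__builtin_expect(rare != 0, 0))
        return expensive(x);   /* hinted as the unlikely path */
    return x + 1;              /* expected path */
}
```

The hint only informs branch weights; it does not by itself promise that the optimizer keeps the expensive call inside the unlikely block, which is the behaviour this post is asking about.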
2020 Jun 01
2
LLC crash while handling DEBUG info
Let's forget about my malformed IR if it is adding confusion here. I
mentioned it to ease the conversation, but if it is causing confusion
rather than making the discussion flow easier, then we had better
ignore it.
The whole reason for this email is that one of the
applications is crashing with the stack trace that I mentioned earlier. The
crash is during the
2020 Jul 16
4
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
So for us we use SLEEF to actually implement the libcalls (LLVM intrinsics)
that LLVM by default would generate - and since SLEEF has highly optimal
8-wide pow, optimized for AVX and AVX2, we really want to use that.
So we would not see 4/8 libcalls and instead see 1 call to something that
lights up the ymm registers. I guess the problem then is that the default
expectation is that pow would be
2020 Nov 17
2
JIT compiling CUDA source code
We have an application that allows the user to compile and execute C++ code
on the fly, using Orc JIT v2, via the LLJIT class. And we would like to
extend it to allow the user to provide CUDA source code as well, for GPU
programming. But I am having a hard time figuring out how to do it.
To JIT compile C++ code, we do basically as follows:
1. call Driver::BuildCompilation(), which returns a
2020 Jun 13
2
target-features attribute prevents inlining?
Hello,
I'm new to LLVM and I recently hit a weird problem about inlining behavior.
I managed to get a minimal repro and the symptom of the issue, but I
couldn't understand the root cause or how I should properly handle this
issue.
Below is an IR code consisting of two functions '_Z2fnP10TestStructi' and
'testfn', with the latter calling the former. One would expect the
2020 Nov 19
1
JIT compiling CUDA source code
Sounds like right now you are emitting an LLVM module?
The best strategy is probably to emit a PTX module and then pass
that to the CUDA driver. This is what we do on the Julia side in CUDA.jl.
Nvidia has a somewhat helpful tutorial on this at
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
and
2020 May 23
2
Loop Unroll
This is my example (for.c):
#include <stdio.h>
int add(int a, int b) {
    return a + b;
}
int main() {
    int a, b, c, d;
    a = 5;
    b = 15;
    c = add(a, b);
    d = 0;
    for(int i=0;i<16;i++)
        d = add(c, d);
}
I run:
$ clang -O0 -Xclang -disable-O0-optnone -emit-llvm for.c -S -o forO0.ll
$ opt -O0 -S --loop-unroll --unroll-count=4 -view-cfg forO0.ll -o for-opt00-unroll4.ll
2020 Jun 13
2
target-features attribute prevents inlining?
Hi David,
Thanks for your quick response!
I now understand the reason that inlining cannot be done on functions with
different target-attributes. Thanks for your explanation!
However, I think I didn't fully understand your solution; it would be nice
if you would like to elaborate a bit more. Here's a bit more info on my
current workflow:
(1) The clang++ compiler builds C++ source file
2020 May 22
4
Loop Unroll
Hi,
I'm interested in finding a pass for loop unrolling in the LLVM compiler. I tried
opt --loop-unroll --unroll-count=4, but it doesn't work well.
Which pass can I use, and how?
I would also like to know if there is any way to mark the loops that I want
to be unrolled.
Thank you.
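On the question of marking individual loops: one option, assuming the source is compiled with clang, is the `#pragma clang loop unroll_count(N)` directive placed directly before the loop. The sketch below borrows the `add` loop shape from the for.c example posted in this thread; the function name `accumulate` is hypothetical. The pragma is a hint that takes effect only when the unroll pass actually runs, so it is not expected to change anything at plain -O0.

```c
/* Sketch: request 4x unrolling of one specific loop (clang-specific pragma). */
static int add(int a, int b) {
    return a + b;
}

/* Hypothetical helper: adds c to an accumulator `trips` times. */
int accumulate(int c, int trips) {
    int d = 0;
#pragma clang loop unroll_count(4)
    for (int i = 0; i < trips; i++)
        d = add(c, d);
    return d;
}
```

Other compilers typically ignore the unknown pragma (possibly with a warning), so the function computes the same result either way; only the generated loop structure differs.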
2020 Jun 13
2
target-features attribute prevents inlining?
Thank you so much David! After thinking a bit more I agree with you that
attempting to add 'target-features' to my functions seems to be the safest
approach of all.
I noticed that if I mark the clang++ function as 'AlwaysInline', the
inlining is performed normally. Is this a potential bug, given what you
said that LLVM may accidentally move code using advanced cpu features
outside
2020 May 26
3
Loop Unroll
Awesome, thanks!
Now I have another question. I have a matrix multiplication code. This is
my code:
#include <stdio.h>
#include <stdlib.h>
#define n 4
int main(int argc, char *argv[]) {
    int i, j, k;
    int A[n][n], B[n][n], C[n][n];
    for(i=0;i<n;i++){
        for(j=0;j<n;j++){
            A[i][j] = 1;
            B[i][j] = 2;
            C[i][j] = 0;
        }
    }
2019 Feb 25
3
Why is there still ineffective code after -o3 optimization?
Hi,
I have an IR module produced by random generation (mostly ineffective
instructions).
It has a function with a void return type and two function arguments, one
of which is a reference.
Therefore, I expect every instruction that does not alter the value at the 2nd
argument's address to be ineffective.
Here is the function definition (see below for full ll):
define void @_Z27entityMainDataInputCallbackdRd(double
2020 Jan 21
4
aarch64 does not emit DW_AT_Location
Hi Devs,
Debug info emitted by LLVM does not contain DW_AT_Location for a formal
parameter if it is an aggregate, as in the cases below:
1) The aggregate contains more than 4 homogeneous members and its size is
more than 128 bits, i.e.
typedef struct{
    int a,b,c,d,e;
}mystruct;
void foo(mystruct ms){
}
2) The aggregate contains heterogeneous types and its size is more than 128 bits,
i.e.
typedef struct{
    int a,b;
    float c,d,e;
}mystruct;
void
2011 Dec 23
1
execute command just after Dial()
Hello,
I'm using AGI scripting with asterisk and need to execute certain commands just after Dial(). But once dial command is executed, further commands/instructions are ignored.
$agi->exec("Dial","SIP/100");
$dialstatus = $agi->get_variable("DIALSTATUS");
if($dialstatus['data']=="ANSWER")
{
do something.......