thr3ads.net - similar to: "Comparing Clang and GCC: only clang stores updated value in each iteration."

Displaying 20 results from an estimated 600 matches similar to: "Comparing Clang and GCC: only clang stores updated value in each iteration."

Comparing Clang and GCC: only clang stores updated value in each iteration.

2018 Sep 21

Comparing Clang and GCC: only clang stores updated value in each iteration.

Hi Philip and Eli, > I think your example may be a bit over reduced. Unless I'm misreading > this, a starts at 1, is incremented one each iteration, and then is > tested against zero. The only way this loop can exit is if a has > wrapped around and C++ states that signed integers are assumed to not > overflow. We can/should be replacing the whole loop with an

Aliasing rules difference between GCC and Clang

2018 Sep 20

Aliasing rules difference between GCC and Clang

Hi, I found a difference between Clang and GCC in alias handling. This was with a benchmark where Clang was considerably slower, and in a hot function which does many more loads from the same address due to stores between the uses. In other words, a value is loaded and used, another value is stored, and then the first value is loaded once again before its second use. This happens many times,

Aliasing rules difference between GCC and Clang

2018 Sep 21

Aliasing rules difference between GCC and Clang

> > I would say that GCC is wrong and should also have a version where a > could be equal to b. There is no restrict keyword, so they could be > equal. > This was between a/b and c, not between a and b. Could you explain your opinion a bit more in detail, please? /Jonas > Cheers, > > Matthieu > > Le jeu. 20 sept. 2018 à 16:56, Jonas Paulsson via llvm-dev >

A code layout related side-effect introduced by rL318299

2017 Dec 19

A code layout related side-effect introduced by rL318299

Hi, Recently 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on an multi exits loop, and the loop rotation introduced code layout issue. The performance regression is a side-effect of rL318299. I got two testcases a.ll and b.ll attached to illustrate the problem. a.ll

A code layout related side-effect introduced by rL318299

2017 Dec 19

A code layout related side-effect introduced by rL318299

On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote: > The introduction of cleanup.cond block in b.ll without loop-rotation > already makes the layout worse than a.ll. > > > Without introducing cleanup.cond block, the layout out is > > entry->while.cond -> while.body->ret > > All the arrows are hot fall through edges which is

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

KNL Assembly Code for Matrix Multiplication

Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=

[LLVMdev] [Q] x86 peephole deficiency

2010 Oct 07

[LLVMdev] [Q] x86 peephole deficiency

Hi all, I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) and now I am running into a deficiency of the x86 peephole optimizer (or jump-threader?). Here is what I get: andl $3, %edi je .LBB0_4 # BB#2: # %nz # in Loop: Header=BB0_1 Depth=1 cmpl $2, %edi

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

2012 Apr 25

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

For the following code fragment, ; <label>:27 ; preds = %27, %entry %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8 %29 = icmp slt i32 %28, 0 br i1 %29, label %27, label %loop.exit loop.exit: ; preds = %27 llc will generate following MIPS code, $BB0_1: lui $3, 32800 ori $3, $3, 1032 lw

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20

[LLVMdev] Is va_arg correct on Mips backend?

I didn't have Mips board. I compile as the commands and check the asm output as below. 1. Question: The distance of caller arg[4] and arg[5] is 4 bytes. But the the callee get every arg[] by 8 bytes offset (arg_ptr1+8 or arg_ptr2+8). I assume the #BB#4 and #BB#5 are the arg_ptr which is the pointer to access the stack arguments. 2. Question: Stack memory 28($sp) has no initial value. If

Aarch64: unaligned access despite -mstrict-align

2020 Jun 01

Aarch64: unaligned access despite -mstrict-align

Hi, I experienced a crash in code compiled with Clang 10.0.0 due to a misaligned 64-bit data access. The (ARMv8) CPU is configured with SCTL.A == 1 (alignment check enable). With SCTLR.A == 0 the code runs as expected. After some investigation I came up with the following reproducer: ---8<-------8<-------8<-------8<-------8<-------8<-------8<------- $ cat test.c extern char

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20

[LLVMdev] Is va_arg correct on Mips backend?

Does it make a difference if you give the "-target" option to clang? $ clang -target mips-linux-gnu ch8_3.cpp -o ch8_3.bc -emit-llvm -c The .s file generated this way looks quite different from the one in your email. On Tue, Feb 19, 2013 at 5:06 PM, Jonathan <gamma_chen at yahoo.com.tw> wrote: > I didn't have Mips board. I compile as the commands and check the asm >

[LLVMdev] [Q] x86 peephole deficiency

2010 Oct 07

[LLVMdev] [Q] x86 peephole deficiency

On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote: > Hi all, > > I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) > and now I am running into a deficiency of the x86 > peephole optimizer (or jump-threader?). Here is what I get: > > > andl $3, %edi > je .LBB0_4 > # BB#2: # %nz >

[LLVMdev] Branch delay slots broken.

2010 Dec 14

[LLVMdev] Branch delay slots broken.

The Sparc, Microblaze, and Mips code generators implement branch delay slots. They all seem to exhibit the same bug, which is not surprising since the code is very similar. If I compile code with this snippit: while (n--) *s++ = (char) c; I get this (for the Microblaze): swi r19, r1, 0 add r3, r0, r0 cmp r3, r3, r7 beqid r3,

Naming a random effect in lmer

2009 May 21

Naming a random effect in lmer

Dear guRus: I am using lmer for a mixed model that includes a random intercept for a set of effects that have the same distribution, Normal(0, sig2b). This set of effects is of variable size, so I am using an as.formula statement to create the formula for lmer. For example, if the set of random effects has dimension 8, then the lmer call is: Zs<-

[LLVMdev] .globl

2013 Aug 29

[LLVMdev] .globl

I need to be able to emit .globl for the soft float routines used by mips16. The routines are called but there is no .globl definition for them. How can I do this? Background: I have a strange issue that I encountered with mips16 hard float. Part of mips16 hard float is to emit calls to runtime routines with the same signature as usual soft float routines, except that they are implemented

[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.

2019 Jun 30

[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.

Hi All, The following code : void hexagon2( int *a, int *res ) { int i = 100; while ( i-- ) { *res++ = *a++; } } gets compiled as a sub-optimal Software loop on LLVM 9.0 instead of a Hardware loop, whereas it was compiled as a Hardware Loop in LLVM 7.0. This is the final assembly code generated by LLVM 9.0 : .text .file "main.c" .globl hexagon2 // --

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

2012 Apr 29

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

On Apr 24, 2012, at 11:48 PM, Fan Dawei wrote: > For the following code fragment, > > ; <label>:27 ; preds = %27, %entry > %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8 > %29 = icmp slt i32 %28, 0 > br i1 %29, label %27, label %loop.exit > > loop.exit: ; preds = %27

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19

[LLVMdev] Is va_arg correct on Mips backend?

Which part of the generated code do you think is not correct? Could you be more specific? I compiled this program with clang and ran it on a mips board. It returns the expected result (21). On Tue, Feb 19, 2013 at 4:15 AM, Jonathan <gamma_chen at yahoo.com.tw> wrote: > I check the Mips backend for the following C code fragment compile result. > It seems not correct. Is it my

LoopVectorizer with ifconversion

2017 Mar 17

LoopVectorizer with ifconversion

On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote: > In general, this is true everywhere. In a large vectorized loop, this cost > may well be worthwhile. The idea is that the cost model should account for > all of these costs. If it doesn't properly, we should fix that. Isn't this only worth when the SIMD instructions can be conditionalised per lane? I

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19

[LLVMdev] Is va_arg correct on Mips backend?

I check the Mips backend for the following C code fragment compile result. It seems not correct. Is it my misunderstand or it's a bug. //ch8_3.cpp #include <stdarg.h> int sum_i(int amount, ...) { int i = 0; int val = 0; int sum = 0; va_list vl; va_start(vl, amount); for (i = 0; i < amount; i++) { val = va_arg(vl, int); sum += val; } va_end(vl);

similar to: Comparing Clang and GCC: only clang stores updated value in each iteration.