thr3ads.net - similar to: "[LLVMdev] Weird volatile propagation ?"

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Weird volatile propagation ?"

[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20

As a result of my investigations, the thread is also added to cfe-dev. The context: while porting my company code from the LLVM/Clang releases 3.1 to 3.2, I stumbled on a code size and performance regression. The testcase is: $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) {

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20

I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16,

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 28

Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct and prevents optimizations. A very simple example of this is: $ cat test.c #include <stdint.h>

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 29

I can't think of a better way to do this, so I think it's ok. I also submitted a complementary patch on llvm-commits clarifying volatile semantics. -Andy On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison <arnaud.allarddegrandmaison at parrot.com> wrote: > Hi All, > > In the language reference manual, the access behavior of the memcpy, > memmove and memset

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 21

On 01/20/2013 10:56 PM, Chandler Carruth wrote: > I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this > seems like an easy and simple deficiency in the IR intrinsic for > memcpy. See below. > > On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison > <arnaud.allarddegrandmaison at parrot.com > <mailto:arnaud.allarddegrandmaison at

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 31

Thanks Andy and Chandler, After specifying the volatile access behaviour, the second step was to autoupgrade the memmove/memcpy intrinsics, and implement (is|set)Volatile in terms of (is|set)(Src|Dest)Volatile, with no functional change. 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch is the one you already reviewed, unaltered.

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Feb 03

Same patches as before, but 0002-memcpy has been updated to put the (is|set)SrcVolatile methods where they logically belong: MemTransferInst. This makes the (is|set)Volatile methods look a bit ugly to keep compatibility with existing behaviour, but they will hopefully disappear when all users have moved to the new interface --- in the next series of patches. I plan to give a try to phabricator

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 16

At 2013-12-15 22:43:34,"Caldarale, Charles R" <Chuck.Caldarale at unisys.com> wrote: >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] >> On Behalf Of Haishan >> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3 > >> My clang version is 3.3 and debug build. > >> //test.c >> int a[6] = {1, 2, 3, 4, 5,

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 21

The flag -enable-aa-sched-mi should do what you want in the MachineScheduler pass. If you want to do it in the selection DAG, there is a subtarget hook that might do it: TargetSubtargetInfo::useAA() LLVM won’t generate the schedule you want anyway for Intel core processors, but the alias analysis can be useful in general. -Andy On Dec 16, 2013, at 6:03 AM, Haishan <hndxvon at

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 15

Hi, I compile a test case (test.c) into an object file (test.o) using clang as follows: "clang -target arm -integrated-as -c test.c -o test.o" My clang version is 3.3 and debug build. //test.c int a[6] = {1, 2, 3, 4, 5, 6}; int main() { a[0] = a[5]; a[1] = a[4]; a[2] = a[5]; } //end test.c Then test.dump is generated by using the objdump tool. //test.dump ldr r1, [r0, #20]

[RFC] bitfield access shrinking

2017 Mar 09

On 03/09/2017 12:28 PM, Krzysztof Parzyszek via llvm-dev wrote: > We could add intrinsics to extract/insert a bitfield, which would > simplify a lot of that bitwise logic. But then you need to teach a bunch of places about how to simplify them, fold using bitwise logic and other things that reduce demanded bits into them, etc. This seems like a difficult tradeoff. -Hal > >

[LLVMdev] wrong code generation for memcpy function in SROA optimization pass

2013 Nov 24

The SROA optimization pass performs some optimizations and transforms on the memcpy function, such as turning it into ld/st operations. When someone writes code with size > sizeof(dest) in memcpy(*dest, *src, size), wrong code is very likely to be generated. For example, consider this testcase: int main() { char ch; short sh = 0x1234; memcpy(&ch,&sh,2); printf("ch=0x%02x\n",ch); } At

Avoiding during my pass the optimization (copy propagation) of my LLVM IR code (at generation)

2016 Dec 30

Hello. I'm writing an LLVM pass that is working on LLVM IR. To my surprise the following LLVM pass code generates optimized code - it does copy propagation on it. Value *vecShuffleOnePtr = Builder.CreateGEP(ptr_B, vecShuffleOne, "VectorGep"); ... packed_gather_params.push_back(vecShuffleOnePtr); CallInst *callGather =

Optimization issues (Alias Analysis?)

2016 Jul 04

Hey, I am currently working on a VM which is based on LLVM and I would like to use its optimizer, but somehow it can't detect something very simple (I guess). This is the LLVM IR: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple = "i386-unknown-linux-gnu" %struct.regs = type { i32, i32, i32 } define void @Test(%struct.regs* noalias

Intel AMX programming model discussion.

2020 Aug 14

Hi, Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Capability of Intel AMX implementation is enumerated by palettes. Two palettes are supported: palette 0 represents the initialized state and

Intel AMX programming model discussion.

2020 Aug 14

[Yuanke] The AMX register is special. It needs to be configured before use and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that accesses a tile register. On the other side, the ldtilecfg should post-dominated

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

I tried to manually assign each of 3 arrays a unique TBAA node. But it does not seem to help: alias analysis still considers arrays as may-alias, which most likely prevents the desired optimization. Below is the sample code with TBAA metadata inserted. Could you please suggest what might be wrong with it? Many thanks, - D. marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 15

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Haishan > Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3 > My clang version is 3.3 and debug build. > //test.c > int a[6] = {1, 2, 3, 4, 5, 6}; > int main() { > a[0] = a[5]; > a[1] = a[4]; > a[2] = a[5]; > } > //end test.c > Then test.dump is

Intel AMX programming model discussion.

2020 Aug 18

The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floating point) controls the shape of all the tile registers, and if you change the tile configuration every single tile register is cleared. In practice, if we have to change the configuration while any of the tile registers are live, performance is going to be terrible.

Redundant ptrtoint/inttoptr instructions

2020 Jul 02

Hi all, We noticed a lot of unnecessary ptrtoint instructions that stand in the way of some of our optimizations; the code pattern looks like this: bb1: %int1 = ptrtoint %struct.s* %ptr1 to i64 bb2: %int2 = ptrtoint %struct.s* %ptr2 to i64 bb3: %phi.node = phi i64 [ %int1, %bb1 ], [ %int2, %bb2 ] %ptr = inttoptr i64 %phi.node to %struct.s* In short, the pattern above arises due to: 1.
