thr3ads.net - similar to: "Optimization issues (Alias Analysis?)"

Displaying 20 results from an estimated 4000 matches similar to: "Optimization issues (Alias Analysis?)"

2013 Jan 18

[LLVMdev] Weird volatile propagation ?

Hi All,

Using clang+llvm at head, I noticed weird behaviour with the following reduced testcase:

    $ cat test.c
    #include <stdint.h>

    struct R {
      uint16_t a;
      uint16_t b;
    };

    volatile struct R * const addr = (volatile struct R *) 416;

    void test(uint16_t a) {
      struct R r = { a, 1 };
      *addr = r;
    }

    $ clang -O2 -o - -emit-llvm -S -c test.c
    ; ModuleID = 'test.c'
    target

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 28

Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct and prevents optimizations. A very simple example of this is: $ cat test.c #include <stdint.h>

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 29

I can't think of a better way to do this, so I think it's ok. I also submitted a complementary patch on llvm-commits clarifying volatile semantics. -Andy On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison <arnaud.allarddegrandmaison at parrot.com> wrote: > Hi All, > > In the language reference manual, the access behavior of the memcpy, > memmove and memset

[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20

As a result of my investigations, the thread is also added to cfe-dev.

The context: while porting my company's code from LLVM/Clang release 3.1 to 3.2, I stumbled on a code size and performance regression. The testcase is:

    $ cat test.c
    #include <stdint.h>

    struct R {
      uint16_t a;
      uint16_t b;
    };

    volatile struct R * const addr = (volatile struct R *) 416;

    void test(uint16_t a) {
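
For readers who want to experiment with the volatile aggregate copy discussed here, the following is a minimal sketch and not the poster's code or a fix from the thread. It contrasts the aggregate assignment from the testcase with a member-wise version that spells out one volatile store per field. The function names `write_aggregate` and `write_members` are hypothetical, and the destination is passed as a parameter (instead of the fixed address 416) so the code can run on an ordinary host.

```c
#include <stdint.h>

struct R {
    uint16_t a;
    uint16_t b;
};

/* Aggregate copy, as in the testcase: the compiler chooses how the
   volatile struct is written (for example via a memcpy-like copy). */
void write_aggregate(volatile struct R *dst, uint16_t a) {
    struct R r = { a, 1 };
    *dst = r;
}

/* Member-wise copy: each field is written by one explicit volatile
   16-bit store, so the access pattern is spelled out in the source. */
void write_members(volatile struct R *dst, uint16_t a) {
    dst->a = a;
    dst->b = 1;
}
```

Both functions leave the same values in memory; what may differ is the code a given compiler version generates for them, which can be compared with `clang -O2 -S -emit-llvm` as in the post.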

Redundant ptrtoint/inttoptr instructions

2020 Jul 02

Hi all,

We noticed a lot of unnecessary ptrtoint instructions that stand in the way of some of our optimizations; the code pattern looks like this:

    bb1:
      %int1 = ptrtoint %struct.s* %ptr1 to i64
    bb2:
      %int2 = ptrtoint %struct.s* %ptr2 to i64
    bb3:
      %phi.node = phi i64 [ %int1, %bb1 ], [ %int2, %bb2 ]
      %ptr = inttoptr i64 %phi.node to %struct.s*

In short, the pattern above arises due to: 1.
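
As a rough source-level analogy for the pattern above (an illustration only, since the post does not show the original source and a compiler is not guaranteed to emit exactly this IR), the sketch below selects between two pointers once through an integer round trip and once while staying in pointer type. The names `struct s`, `select_via_int` and `select_via_ptr` are assumptions made for the example.

```c
#include <stdint.h>

struct s { int value; };

/* Converts both pointers to integers, merges the integers, and converts
   the result back: the shape of ptrtoint + phi i64 + inttoptr. */
struct s *select_via_int(int cond, struct s *p1, struct s *p2) {
    uintptr_t bits = cond ? (uintptr_t)p1 : (uintptr_t)p2;
    return (struct s *)bits;
}

/* Keeps the value a pointer throughout, so the merge can be a phi of
   pointer type with no integer conversions. */
struct s *select_via_ptr(int cond, struct s *p1, struct s *p2) {
    return cond ? p1 : p2;
}
```

The two functions return the same pointer, but the second form keeps pointer provenance visible to the optimizer.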

LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

2017 Jun 20

On 06/20/2017 03:26 AM, Hal Finkel wrote: > Hi, Adrien, Hello Hal! Thanks for your answer! > Thanks for reporting this. I recommend that you file a bug report at > https://bugs.llvm.org/ Will do! > Whenever I see reports of missed optimization opportunities in the face > of ptrtoint/inttoptr, my first question is: why are these instructions > present in the first place? At

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20

I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16,

Help with SROA throwing away no-alias information

2020 Jan 17

I'm having an issue where SROA will throw away no-alias information on some loads after inlining, because the loads are derived from a store to an alloca which can be removed after inlining. The pointers that were originally stored into the alloca do *not* have any aliasing information - the only context that allowed me to assert aliasing was that the inlined function guaranteed it to be so.

[LLVMdev] Alias analysis issue with structs on PPC

2015 Mar 17

Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most

Redundant ptrtoint/inttoptr instructions

2020 Jul 02

My general feeling is this: No optimizations should be creating int2ptr/ptr2int. We really need to fix them all. They should use pointer casts and i8* GEPs. This has, unfortunately, been a problem for a long time. As Johannes says, optimizing int2ptr/ptr2int is very tricky. In part, because all dependencies, including implicit control dependencies, end up being part of the resulting aliasing

Memory Store/Load Optimization Issue (Emulating stack)

2016 Feb 10

Thank you for the hint. I adjusted the code and it works. The code after replacing inttoptr with getelementptr:

    define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {
    entry:
      ; push foo (On "stack")
      %sp_1 = getelementptr i8, i8* %sp, i32 -4
      %sp_1_ptr = bitcast i8* %sp_1 to i32*
      store i32 %foo, i32* %sp_1_ptr, align 4
      ; push bar
      %sp_2 = getelementptr i8, i8* %sp_1,
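
The same idea can be sketched in C: keep the emulated stack pointer as a byte pointer and move it with pointer arithmetic, which is the source-level counterpart of getelementptr, instead of doing integer arithmetic and casting back. This is only an illustration under stated assumptions; the helper names `push_u32` and `pop_u32` are hypothetical and do not appear in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Push: move the stack pointer down by 4 bytes, then store the value
   at the new top of the emulated stack. */
unsigned char *push_u32(unsigned char *sp, uint32_t value) {
    sp -= sizeof value;
    memcpy(sp, &value, sizeof value);
    return sp;
}

/* Pop: load the value at the stack pointer, then move the pointer up
   by 4 bytes. */
unsigned char *pop_u32(unsigned char *sp, uint32_t *out) {
    memcpy(out, sp, sizeof *out);
    return sp + sizeof *out;
}
```

A caller would start with `sp` pointing one past the end of a buffer and thread the returned pointer through each push and pop. Because `sp` is always derived from the same buffer, the compiler can reason about which stores and loads refer to the same location.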

Memory Store/Load Optimization Issue (Emulating stack)

2016 Feb 10

Thanks for the answers, although I am not sure I have understood the docs about how inttoptr/ptrtoint differ from GEP. They say: "It’s invalid to take a GEP from one object, address into a different separately allocated object, and dereference it.". To go back to why I am doing this, I would like to "emulate" some x86 instructions with

Memory Store/Load Optimization Issue (Emulating stack)

2016 Feb 12

Hi again, So I finally gave up on trying to get through the conversion (x86 push, pop, mov, add) because it involves a lot of crazy pointer arithmetic, and since inttoptr and ptrtoint don't provide any alias analysis information. Daniel, you said it doesn't make much sense to provide it, but in my case it is actually very much needed; you didn't say it wasn't possible to

Memory Store/Load Optimization Issue (Emulating stack)

2016 Feb 08

Hello, I am trying to emulate the "stack" as on x86 when using push/pop, so that afterwards I can use LLVM's optimizer passes to simplify (reduce junk) the code. The LLVM IR code:

    define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) {
      ; push foo (On "stack")
      %sp_1 = sub i32 %sp, 4
      %sp_1_ptr = inttoptr i32 %sp_1 to i32*
      store i32 %foo, i32* %sp_1_ptr, align

[LLVMdev] Alias analysis issue with structs on PPC

2015 Mar 15

On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com> wrote: > Hi Daniel, > > Thanks for your feedback. I would prefer not to write a new AA. Can't we > directly implement that traversal in BasicAA? > Can I ask why? Outside of the "well, it's another pass", i mean? BasicAA is stateless, so you can't cache, and you really don't

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

I tried to manually assign each of 3 arrays a unique TBAA node. But it does not seem to help: alias analysis still considers arrays as may-alias, which most likely prevents the desired optimization. Below is the sample code with TBAA metadata inserted. Could you please suggest what might be wrong with it? Many thanks, - D. marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa

LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

2017 Jun 17

Hello all, There is a missed vectorization opportunity with clang 4.0 on the attached file. Indeed, when compiled with -O2, the "op_distance" function gets vectorized, but not the "op" one. For information, this test case has been reduced from a file generated by the Pythran compiler (https://github.com/serge-sans-paille/pythran). If we take a look at the generated

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

Dear all, Attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is to unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently LLVM's unroller, together with all standard optimizations, produces code which stores the value to memory after every unrolled iteration, which is
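
The difference between the two behaviours can be illustrated at the source level. The sketch below is not the attached kernel, which is not shown here; `reduce_store_each` and `reduce_store_once` are hypothetical names. The first function accumulates directly through the output pointer, so if the compiler cannot prove that `out` does not alias `in`, it has to keep a store per iteration. The second accumulates in a local variable and writes memory once.

```c
#include <stddef.h>

/* Accumulates through the output pointer: unless the compiler can prove
   that out does not alias in, every iteration needs its own store. */
void reduce_store_each(const float *in, size_t n, float *out) {
    *out = 0.0f;
    for (size_t i = 0; i < n; ++i)
        *out += in[i];
}

/* Accumulates in a local variable, which can live in a register, and
   stores the final sum to memory exactly once. */
void reduce_store_once(const float *in, size_t n, float *out) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += in[i];
    *out = acc;
}
```

When the arrays are known not to overlap, the optimizer can turn the first form into the second by itself, which is why the aliasing information matters for this kind of kernel.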

What is the status of the "Killing Undef and Spreading Poison" RFC?

2018 Mar 20

Hi Nuno, Except for one bit, which was that it requires correct typing of load/store > operations. That is, if you load an i32, it means you are loading a single > 32-bit integer, not two 16-bit integers or something else. > This is a valid concern because currently neither LLVM nor clang respects this > property. Clang may pass several parameters as a single variable, LLVM has >

RFC: A change in InstCombine canonical form

2016 Mar 16

=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445) IR contains code for loading a float from float * and storing it to a float * address. After canonicalization of load in InstCombine [1], new bitcasts are added to the IR (see bottom of the email for code samples). This prevents select speculation in SROA from working. Also, after SROA we have bitcasts from int32 to float.
