
Displaying 20 results from an estimated 4000 matches similar to: "Optimization issues (Alias Analysis?)"

2013 Jan 18
2
[LLVMdev] Weird volatile propagation ?
Hi All, Using clang+llvm at head, I noticed a weird behaviour with the following reduced testcase: $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) { struct R r = { a, 1 }; *addr = r; } $ clang -O2 -o - -emit-llvm -S -c test.c ; ModuleID = 'test.c' target
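For readability, the reduced C testcase quoted in the excerpt above, reflowed (the IR output that follows it is truncated in the listing):

    #include <stdint.h>

    struct R {
      uint16_t a;
      uint16_t b;
    };

    volatile struct R * const addr = (volatile struct R *) 416;

    void test(uint16_t a) {
      struct R r = { a, 1 };
      *addr = r;
    }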
2013 Jan 28
4
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct, preventing optimizations. A very simple example of this is: $ cat test.c #include <stdint.h>
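For context, a minimal sketch of the intrinsic the thread is about, using the 3.x-era signature (which still carried an explicit alignment parameter; current LLVM expresses alignment through attributes instead). The trailing i1 is the isvolatile flag whose exact guarantees the thread asks to pin down.

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

    define void @copy4(i8* %dst, i8* %src) {
      ; volatile copy of 4 bytes: the last argument (isvolatile) is true
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 4, i32 4, i1 true)
      ret void
    }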
2013 Jan 29
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
I can't think of a better way to do this, so I think it's ok. I also submitted a complementary patch on llvm-commits clarifying volatile semantics. -Andy On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison <arnaud.allarddegrandmaison at parrot.com> wrote: > Hi All, > > In the language reference manual, the access behavior of the memcpy, > memmove and memset
2013 Jan 20
0
[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
As a result of my investigations, the thread is also added to cfe-dev. The context: while porting my company's code from LLVM/Clang release 3.1 to 3.2, I stumbled on a code size and performance regression. The testcase is: $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) {
2020 Jul 02
3
Redundant ptrtoint/inttoptr instructions
Hi all, We noticed a lot of unnecessary ptrtoint instructions that stand in the way of some of our optimizations; the code pattern looks like this: bb1: %int1 = ptrtoint %struct.s* %ptr1 to i64 bb2: %int2 = ptrtoint %struct.s* %ptr2 to i64 bb3: %phi.node = phi i64 [ %int1, %bb1 ], [ %int2, %bb2 ] %ptr = inttoptr i64 %phi.node to %struct.s* In short, the pattern above arises due to: 1.
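Reflowed into a self-contained function (the body of %struct.s is a placeholder; the thread does not show it), the pattern from the excerpt looks like this, and the merge it performs can be written as a phi over the pointers themselves, without the round-trip through i64:

    %struct.s = type { i32 }   ; placeholder

    define %struct.s* @merge(i1 %c, %struct.s* %ptr1, %struct.s* %ptr2) {
    entry:
      br i1 %c, label %bb1, label %bb2
    bb1:
      %int1 = ptrtoint %struct.s* %ptr1 to i64
      br label %bb3
    bb2:
      %int2 = ptrtoint %struct.s* %ptr2 to i64
      br label %bb3
    bb3:
      %phi.node = phi i64 [ %int1, %bb1 ], [ %int2, %bb2 ]
      %ptr = inttoptr i64 %phi.node to %struct.s*
      ret %struct.s* %ptr
    }

    ; the same merge without the integer round-trip (a sketch of what the
    ; thread would like optimizations to produce instead):
    ;   %ptr = phi %struct.s* [ %ptr1, %bb1 ], [ %ptr2, %bb2 ]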
2017 Jun 20
3
LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions
On 06/20/2017 03:26 AM, Hal Finkel wrote: > Hi, Adrien, Hello Hal! Thanks for your answer! > Thanks for reporting this. I recommend that you file a bug report at > https://bugs.llvm.org/ Will do! > Whenever I see reports of missed optimization opportunities in the face > of ptrtoint/inttoptr, my first question is: why are these instructions > present in the first place? At
2013 Jan 20
2
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16,
2020 Jan 17
3
Help with SROA throwing away no-alias information
I'm having an issue where SROA will throw away no-alias information on some loads after inlining, because the loads are derived from a store to an alloca which can be removed after inlining. The pointers that were originally stored into the alloca do *not* have any aliasing information - the only context that allowed me to assert anything about aliasing was that the inlined function guaranteed it to be so.
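Not the poster's code, but a minimal sketch of where such no-alias information usually comes from: noalias arguments on a callee, which the inliner turns into !alias.scope / !noalias metadata on the inlined loads and stores. The thread's complaint is that SROA then drops that metadata when it rewrites loads derived from a removable alloca.

    ; the callee promises its two pointer arguments do not alias
    define void @callee(i32* noalias %p, i32* noalias %q) {
      %v = load i32, i32* %p, align 4
      store i32 %v, i32* %q, align 4
      ret void
    }

    define void @caller(i32* %a, i32* %b) {
      ; once @callee is inlined here, the inlined load/store carry
      ; scoped-noalias (!alias.scope / !noalias) metadata
      call void @callee(i32* %a, i32* %b)
      ret void
    }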
2015 Mar 17
2
[LLVMdev] Alias analysis issue with structs on PPC
Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most
2020 Jul 02
3
Redundant ptrtoint/inttoptr instructions
My general feeling is this: No optimizations should be creating int2ptr/ptr2int. We really need to fix them all. They should use pointer casts and i8* GEPs. This has, unfortunately, been a problem for a long time. As Johannes says, optimizing int2ptr/ptr2int is very tricky. In part, because all dependencies, including implicit control dependencies, end up being part of the resulting aliasing
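A sketch of the rewrite being recommended, in typed-pointer (pre-opaque-pointer) syntax: address arithmetic done through an i8* GEP instead of a ptrtoint/add/inttoptr round-trip, so the result stays visibly based on the original pointer for alias analysis. The struct layout is a placeholder.

    %struct.s = type { i32, i32 }   ; placeholder layout

    ; integer-based addressing (hard for alias analysis to reason about):
    define i32* @field_int(%struct.s* %ptr) {
      %i  = ptrtoint %struct.s* %ptr to i64
      %i4 = add i64 %i, 4
      %q  = inttoptr i64 %i4 to i32*
      ret i32* %q
    }

    ; pointer-based equivalent using an i8* GEP:
    define i32* @field_gep(%struct.s* %ptr) {
      %raw = bitcast %struct.s* %ptr to i8*
      %p4  = getelementptr i8, i8* %raw, i64 4
      %q   = bitcast i8* %p4 to i32*
      ret i32* %q
    }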
2016 Feb 10
4
Memory Store/Load Optimization Issue (Emulating stack)
Thank you for the hint. I adjusted the code and it works. The code after replacing inttoptr with getelementptr: define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) { entry: ; push foo (On "stack") %sp_1 = getelementptr i8, i8* %sp, i32 -4 %sp_1_ptr = bitcast i8* %sp_1 to i32* store i32 %foo, i32* %sp_1_ptr, align 4 ; push bar %sp_2 = getelementptr i8, i8* %sp_1,
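The excerpt's code is cut off mid-instruction; below is a complete, self-contained sketch of the same GEP-based scheme (illustrative only, not the poster's exact function):

    define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {
    entry:
      ; push foo (on the emulated "stack")
      %sp_1 = getelementptr i8, i8* %sp, i32 -4
      %sp_1_ptr = bitcast i8* %sp_1 to i32*
      store i32 %foo, i32* %sp_1_ptr, align 4
      ; push bar
      %sp_2 = getelementptr i8, i8* %sp_1, i32 -4
      %sp_2_ptr = bitcast i8* %sp_2 to i32*
      store i32 %bar, i32* %sp_2_ptr, align 4
      ; pop both values again and return them with the final stack pointer
      %v1 = load i32, i32* %sp_2_ptr, align 4
      %v2 = load i32, i32* %sp_1_ptr, align 4
      %r0 = insertvalue { i32, i32, i8* } undef, i32 %v1, 0
      %r1 = insertvalue { i32, i32, i8* } %r0, i32 %v2, 1
      %r2 = insertvalue { i32, i32, i8* } %r1, i8* %sp_2, 2
      ret { i32, i32, i8* } %r2
    }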
2016 Feb 10
2
Memory Store/Load Optimization Issue (Emulating stack)
Thanks for the answers, although I am not sure I've understood the docs about how inttoptr/ptrtoint differ from gep. They say: "It’s invalid to take a GEP from one object, address into a different separately allocated object, and dereference it." To go back to why I am doing this: I would like to "emulate" some x86 instructions with
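The rule being quoted is about provenance: a GEP result remains based on the object it was derived from. A small sketch of what the rule forbids (not from the thread):

    define i32 @cross_object() {
      %a = alloca [4 x i32], align 4
      %b = alloca [4 x i32], align 4
      ; %p is derived from %a but indexes past its end; even if its runtime
      ; address happened to coincide with an element of %b, dereferencing it
      ; to reach %b is not defined behaviour:
      %p = getelementptr [4 x i32], [4 x i32]* %a, i64 0, i64 6
      %v = load i32, i32* %p, align 4
      ret i32 %v
    }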
2016 Feb 12
2
Memory Store/Load Optimization Issue (Emulating stack)
Hi again, So I finally gave up on trying to get through converting the x86 push/pop/mov/add sequences, because it involves a lot of crazy pointer arithmetic and because inttoptr and ptrtoint don't provide any alias analysis information. Daniel, you said it doesn't make much sense to provide it, but in my case it is actually very much needed; you didn't say it wasn't possible to
2016 Feb 08
2
Memory Store/Load Optimization Issue (Emulating stack)
Hello, I am trying to emulate the "stack" like on x86 when using push/pop, so that afterwards I can use LLVM's optimizer passes to simplify (reduce junk in) the code. The LLVM IR code: define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) { ; push foo (On "stack") %sp_1 = sub i32 %sp, 4 %sp_1_ptr = inttoptr i32 %sp_1 to i32* store i32 %foo, i32* %sp_1_ptr, align
2015 Mar 15
5
[LLVMdev] Alias analysis issue with structs on PPC
On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com> wrote: > Hi Daniel, > > Thanks for your feedback. I would prefer not to write a new AA. Can't we > directly implement that traversal in BasicAA? > Can I ask why? Outside of the "well, it's another pass", I mean? BasicAA is stateless, so you can't cache, and you really don't
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of the 3 arrays a unique TBAA node, but it does not seem to help: alias analysis still considers the arrays as may-alias, which most likely prevents the desired optimization. Below is the sample code with TBAA metadata inserted. Could you please suggest what might be wrong with it? Many thanks, - D. marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
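For reference, a minimal sketch of per-array TBAA tags in current IR syntax (the 2013 thread predates struct-path TBAA and spelled the metadata with the older 'metadata !{...}' form); sibling type nodes under a common root are treated as no-alias. The thread reports that even with such tags the arrays were still classified as may-alias.

    define void @add(float* %A, float* %B, float* %C, i64 %i) {
      %pa = getelementptr float, float* %A, i64 %i
      %pb = getelementptr float, float* %B, i64 %i
      %pc = getelementptr float, float* %C, i64 %i
      %a = load float, float* %pa, align 4, !tbaa !4
      %b = load float, float* %pb, align 4, !tbaa !5
      %s = fadd float %a, %b
      store float %s, float* %pc, align 4, !tbaa !6
      ret void
    }

    !0 = !{!"kernel tbaa root"}
    !1 = !{!"array A", !0}
    !2 = !{!"array B", !0}
    !3 = !{!"array C", !0}
    !4 = !{!1, !1, i64 0}
    !5 = !{!2, !2, i64 0}
    !6 = !{!3, !3, i64 0}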
2017 Jun 17
5
LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions
Hello all, There is a missed vectorization opportunity with clang 4.0 on the attached file. Indeed, when compiled with -O2, the "op_distance" function gets vectorized, but not the "op" one. For information, this test case has been reduced from a file generated by the Pythran compiler (https://github.com/serge-sans-paille/pythran). If we take a look at the generated
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all, Attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is to unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently LLVM's unroller together with all standard optimizations produces code which stores the value to memory after every unrolled iteration, which is
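A sketch of the desired result (not the attached notunrolled.ll): the loop unrolled by two with the partial sum carried in a phi, so memory is written only once after the loop. It assumes %n is a positive multiple of 2.

    define void @reduce(float* %in, float* %out, i64 %n) {
    entry:
      br label %loop
    loop:
      %i   = phi i64   [ 0, %entry ], [ %i.next, %loop ]
      %acc = phi float [ 0.000000e+00, %entry ], [ %acc.1, %loop ]
      %p0 = getelementptr float, float* %in, i64 %i
      %x0 = load float, float* %p0, align 4
      %acc.0 = fadd float %acc, %x0
      %i.1 = add i64 %i, 1
      %p1 = getelementptr float, float* %in, i64 %i.1
      %x1 = load float, float* %p1, align 4
      %acc.1 = fadd float %acc.0, %x1
      %i.next = add i64 %i, 2
      %done = icmp uge i64 %i.next, %n
      br i1 %done, label %exit, label %loop
    exit:
      ; the accumulated value reaches memory only once
      store float %acc.1, float* %out, align 4
      ret void
    }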
2018 Mar 20
3
What is the status of the "Killing Undef and Spreading Poison" RFC?
Hi Nuno, Except for one bit, which was that it requires correct typing of load/store > operations. That is, if you load an i32, it means you are loading a single > 32-bit integer, not two 16-bit integers or something else. > This is a valid concern because currently neither LLVM nor clang respects this > property. Clang may pass several parameters as a single variable, LLVM has >
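A sketch of the kind of mismatch being described: on some ABIs clang passes a struct of two 16-bit fields as a single coerced i32 argument, so the IR-level values no longer match the source-level field types. The exact coercion is target dependent; this is only an illustration.

    ; struct S { uint16_t a; uint16_t b; }; void f(struct S s);
    ; may be lowered by clang to something like:
    define void @f(i32 %s.coerce) {
      %a = trunc i32 %s.coerce to i16       ; field a (low half)
      %b.shift = lshr i32 %s.coerce, 16
      %b = trunc i32 %b.shift to i16        ; field b (high half)
      ret void
    }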
2016 Mar 16
3
RFC: A change in InstCombine canonical form
=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445) The IR contains code for loading a float from a float * and storing it to a float * address. After canonicalization of the load in InstCombine [1], new bitcasts are added to the IR (see bottom of the email for code samples). This prevents select speculation in SROA from working. Also, after SROA we have bitcasts from i32 to float.
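A sketch of the pattern the RFC describes (the exact output depends on the pass pipeline and LLVM version): a float load canonicalized to an integer load plus a bitcast, which is the form that later blocks SROA's select speculation.

    ; straightforward form:
    ;   %val = load float, float* %addr, align 4
    ; canonicalized form with the extra bitcasts:
    define float @load_float(float* %addr) {
      %addr.i = bitcast float* %addr to i32*
      %val.i  = load i32, i32* %addr.i, align 4
      %val    = bitcast i32 %val.i to float
      ret float %val
    }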