thr3ads.net - similar to: "Help with SROA throwing away no-alias information"

Displaying 20 results from an estimated 3000 matches similar to: "Help with SROA throwing away no-alias information"

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16


Hey list, I've recently done the first test run of bumping our Burst compiler from LLVM 10 -> 11 now that the branch has been cut, and have noticed an apparent loop vectorization codegen regression for X86 with AVX or AVX2 enabled. The following IR example is vectorized 4 wide with LLVM 11 and trunk, whereas LLVM 10 (correctly, as per what we want) vectorized it 8 wide, matching the

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16


Tried a bunch of them there (x86-64, haswell, znver2) and they all defaulted to 4-wide - haswell additionally caused some extra loop unrolling but still with 8-wide pows. Cheers, -Neil. On Thu, Jul 16, 2020 at 2:39 PM Roman Lebedev <lebedev.ri at gmail.com> wrote: > Did you specify the target CPU the code should be optimized for? > For clang that is -march=native/znver2/... /

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16


So for us we use SLEEF to actually implement the libcalls (LLVM intrinsics) that LLVM by default would generate - and since SLEEF has highly optimal 8-wide pow, optimized for AVX and AVX2, we really want to use that. So we would not see 4/8 libcalls and instead see 1 call to something that lights up the ymm registers. I guess the problem then is that the default expectation is that pow would be

[LLVMdev] Alias analysis issue with structs on PPC

2015 Mar 17


Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most

RFC: SROA for method argument

2017 May 09


Hi, I am working to improve SROA to generate better code when a method has a struct in its arguments. I would appreciate it if I could have any suggestions or comments on how I can best proceed with this optimization. * Problem * I observed that LLVM often generates redundant instructions around glibc’s istreambuf_iterator. The problem comes from the scalar replacement (SROA) for methods with an

[LLVMdev] Debug info for lazy variables triggers SROA assertion

2015 Jun 01


Hi! I created a bug report (https://llvm.org/bugs/show_bug.cgi?id=23712) for this failure but then I realized that my approach may be wrong. The following D source contains a lazy variable: void bar(lazy bool val) { val(); } The lazy variable val is translated to a delegate. The signature and the first IR lines are: define void @_D7opover23barFLbZv({ i8*, i1 (i8*)* } %val_arg) #0 {

Optimization generate super long function definition

2020 Apr 12


Hi all, sorry to have sent the same question around. I am quite desperately looking for a solution to this problem and I figured the mailing list is the best bet. In my code, I generate the following function: define i32 @gl.qi([500 x i32] %x, i32 %i) { entry: %x. = alloca [500 x i32] %i. = alloca i32 %0 = alloca [500 x i32] store [500 x i32] %x, [500 x i32]* %x. store i32 %i, i32*

[LLVMdev] Alias analysis issue with structs on PPC

2015 Mar 15


On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com> wrote: > Hi Daniel, > > Thanks for your feedback. I would prefer not to write a new AA. Can't we > directly implement that traversal in BasicAA? > Can I ask why? Outside of the "well, it's another pass", i mean? BasicAA is stateless, so you can't cache, and you really don't

RFC: A change in InstCombine canonical form

2016 Mar 16


=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445) IR contains code for loading a float from float * and storing it to a float * address. After canonicalization of load in InstCombine [1], new bitcasts are added to the IR (see bottom of the email for code samples). This prevents select speculation in SROA from working. Also, after SROA we have bitcasts from int32 to float.

[LLVMdev] Weird volatile propagation ?

2013 Jan 18


Hi All, Using clang+llvm at head, I noticed a weird behaviour with the following reduced testcase : $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) { struct R r = { a, 1 }; *addr = r; } $ clang -O2 -o - -emit-llvm -S -c test.c ; ModuleID = 'test.c' target

RFC: A change in InstCombine canonical form

2016 Mar 16


On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > How does it interact with the "typeless pointers" work? > Right - the goal of the typeless pointer work is to fix all these bugs related to "didn't look through bitcasts" in optimizations. Sometimes that's going to mean more work (because the code

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 28


Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct and prevents optimizations. A very simple example of this is: $ cat test.c #include <stdint.h>

RFC: A change in InstCombine canonical form

2016 Mar 16


On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote: > David, > > Could you give us an update on the status of typeless pointer work? How > much work is left and when you think it might be ready? > It's a bit of an onion peel, really - since it will eventually involve generalizing/fixing every optimization that's currently leaning on typed

SROA and volatile memcpy/memset

2015 Nov 10


On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote: > On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote: >> I have a customer testcase where SROA splits a volatile memcpy and we end up >> generating bad code[1]. While this looks like a bug, simply preventing SROA >> from splitting volatile memory intrinsics causes basictest.ll for SROA

[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20


As a result of my investigations, the thread is also added to cfe-dev. The context: while porting my company code from the LLVM/Clang releases 3.1 to 3.2, I stumbled on a code size and performance regression. The testcase is: $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) {

RFC: A change in InstCombine canonical form

2016 Mar 22


Back to the discussion on the RFC, I still see some advantage in following the proposed solution. I see two paths forward: 1- Change canonical form, possibly lower memcpy to non-integer load and store in InstCombine. Then teach the backends to convert that to integer load and store if that is more profitable. Notice that we are talking about loads that have no use other than store. So it is a

RFC: A change in InstCombine canonical form

2016 Mar 22


I don't really mind, but the intermediate stage will not be very nice: a lot of code / tests will need to be written with bitcasts, all while they are destined to disappear. The added value isn't clear to me considering the added work. I'm not sure it wouldn't add more work for all the cleanup required by the "typeless pointer", but I'm not sure

[LLVMdev] [RFC] Poor code generation for paired load

2013 Aug 09


Hi, I am investigating poor code generation on x86-64 involving a 64-bit structure with two 32-bit fields (in the attached examples float, but similar behavior is exposed with i32, and we can probably generalize that to smaller types too). The root cause of the problem is in SROA, although I am not sure we should fix something there. That is why I need your advice. ** Problem ** 64-bit

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20


I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16,

RFC: A change in InstCombine canonical form

2016 Mar 22


I don't know enough about the tradeoff for 1, but 2 seems like a bandaid for something that is neither a correctness issue nor a regression. I'm not sure it justifies "bandaid patches" while there is a clear path forward, i.e. typeless pointers, unless there is an acknowledgement that typeless pointers won't be there for a couple of years. -- Mehdi > On Mar 22, 2016,
