similar to: Help with SROA throwing away no-alias information

Displaying 20 results from an estimated 3000 matches similar to: "Help with SROA throwing away no-alias information"

2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Hey list, I've recently done the first test run of bumping our Burst compiler from LLVM 10 -> 11 now that the branch has been cut, and have noticed an apparent loop vectorization codegen regression for X86 with AVX or AVX2 enabled. The following IR example is vectorized to 4 wide with LLVM 11 and trunk whereas in LLVM 10 it (correctly as per what we want) vectorized it 8 wide matching the
2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
Tried a bunch of them there (x86-64, haswell, znver2) and they all defaulted to 4-wide - haswell additionally caused some extra loop unrolling but still with 8-wide pows. Cheers, -Neil. On Thu, Jul 16, 2020 at 2:39 PM Roman Lebedev <lebedev.ri at gmail.com> wrote: > Did you specify the target CPU the code should be optimized for? > For clang that is -march=native/znver2/... /
2020 Jul 16
4
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
So for us we use SLEEF to actually implement the libcalls (LLVM intrinsics) that LLVM by default would generate - and since SLEEF has highly optimal 8-wide pow, optimized for AVX and AVX2, we really want to use that. So we would not see 4/8 libcalls and instead see 1 call to something that lights up the ymm registers. I guess the problem then is that the default expectation is that pow would be
2015 Mar 17
2
[LLVMdev] Alias analysis issue with structs on PPC
Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most
2017 May 09
3
RFC: SROA for method argument
Hi, I am working to improve SROA to generate better code when a method has a struct in its arguments. I would appreciate it if I could have any suggestions or comments on how I can best proceed with this optimization. * Problem * I observed that LLVM often generates redundant instructions around glibc’s istreambuf_iterator. The problem comes from the scalar replacement (SROA) for methods with an
2015 Jun 01
2
[LLVMdev] Debug info for lazy variables triggers SROA assertion
Hi! I created a bug report (https://llvm.org/bugs/show_bug.cgi?id=23712) for this failure but then I realized that my approach may be wrong. The following D source contains a lazy variable: void bar(lazy bool val) { val(); } The lazy variable val is translated to a delegate. The signature and the first IR lines are: define void @_D7opover23barFLbZv({ i8*, i1 (i8*)* } %val_arg) #0 {
2020 Apr 12
2
Optimization generate super long function definition
Hi all, sorry to have sent the same question around. I am quite desperately looking for a solution to this problem and I figured the mailing list is the best bet. In my code, I generate the following function: define i32 @gl.qi([500 x i32] %x, i32 %i) { entry: %x. = alloca [500 x i32] %i. = alloca i32 %0 = alloca [500 x i32] store [500 x i32] %x, [500 x i32]* %x. store i32 %i, i32*
2015 Mar 15
5
[LLVMdev] Alias analysis issue with structs on PPC
On Sun, Mar 15, 2015 at 4:34 PM Olivier Sallenave <ol.sall at gmail.com> wrote: > Hi Daniel, > > Thanks for your feedback. I would prefer not to write a new AA. Can't we > directly implement that traversal in BasicAA? > Can I ask why? Outside of the "well, it's another pass", i mean? BasicAA is stateless, so you can't cache, and you really don't
2016 Mar 16
3
RFC: A change in InstCombine canonical form
=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445) IR contains code for loading a float from float * and storing it to a float * address. After canonicalization of load in InstCombine [1], new bitcasts are added to the IR (see bottom of the email for code samples). This prevents select speculation in SROA to work. Also after SROA we have bitcasts from int32 to float.
2013 Jan 18
2
[LLVMdev] Weird volatile propagation ?
Hi All, Using clang+llvm at head, I noticed a weird behaviour with the following reduced testcase : $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) { struct R r = { a, 1 }; *addr = r; } $ clang -O2 -o - -emit-llvm -S -c test.c ; ModuleID = 'test.c' target
2016 Mar 16
2
RFC: A change in InstCombine canonical form
On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > How do it interact with the "typeless pointers" work? > Right - the goal of the typeless pointer work is to fix all these bugs related to "didn't look through bitcasts" in optimizations. Sometimes that's going to mean more work (because the code
2013 Jan 28
4
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct and prevent optimizations. A very simple example of this is : $ cat test.c #include <stdint.h>
2016 Mar 16
3
RFC: A change in InstCombine canonical form
On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote: > David, > > Could you give us an update on the status of typeless pointer work? How > much work is left and when you think it might be ready? > It's a bit of an onion peel, really - since it will eventually involve generalizing/fixing every optimization that's currently leaning on typed
2015 Nov 10
4
SROA and volatile memcpy/memset
On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote: > On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote: >> I have a customer testcase where SROA splits a volatile memcpy and we end up >> generating bad code[1]. While this looks like a bug, simply preventing SROA >> from splitting volatile memory intrinsics causes basictest.ll for SROA
2013 Jan 20
0
[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
As a results of my investigations, the thread is also added to cfe-dev. The context : while porting my company code from the LLVM/Clang releases 3.1 to 3.2, I stumbled on a code size and performance regression. The testcase is : $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) {
2016 Mar 22
0
RFC: A change in InstCombine canonical form
Back to the discussion on the RFC, I still see some advantage in following the proposed solution. I see two paths forward: 1- Change canonical form, possibly lower memcpy to non-integer load and store in InstCombine. Then teach the backends to convert that to integer load and store if that is more profitable. Notice that we are talking about loads that have no use other than store. So it is a
2016 Mar 22
4
RFC: A change in InstCombine canonical form
I don't really mind, but the intermediate stage will not be very nice: that a lot of code / tests that needs to be written with bitcast, and all of that while they are deemed to disappear. The added value isn't clear to me considering the added work. I'm not sure it wouldn't add more work for all the cleanup required by the "typeless pointer", but I'm not sure
2013 Aug 09
2
[LLVMdev] [RFC] Poor code generation for paired load
Hi, I am investigating a poor code generation on x86-64 involving a 64-bits structure with two 32-bits fields (in the attached examples float, but similar behavior is exposed with i32, and we can probably generalize that to smaller types too). The root cause of the problem is in SROA, although I am not sure we should fix something there. That is why I need your advices. ** Problem ** 64-bits
2013 Jan 20
2
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16,
2016 Mar 22
2
RFC: A change in InstCombine canonical form
I don't know enough about the tradeoff for 1, but 2 seems like a bandaid for something that is not a correctness issue neither a regression. I'm not sure it justifies "bandaid patches" while there is a clear path forward, i.e. typeless pointers, unless there is an acknowledgement that typeless pointers won't be there before a couple of years. -- Mehdi > On Mar 22, 2016,