similar to: A 4x slower initialization loop in LLVM vs GCC and MSVC

Displaying 20 results from an estimated 2000 matches similar to: "A 4x slower initialization loop in LLVM vs GCC and MSVC"

2020 Oct 01
3
A 4x slower initialization loop in LLVM vs GCC and MSVC
> On Oct 1, 2020, at 20:45, Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi, > >> On Sep 27, 2020, at 12:52, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi everyone, >> >> I was watching this video [1]. There's an example of an initialization loop for which >> Clang unfortunately
2016 Dec 26
1
Multiple simplifycfg pass make some loop significantly slower
Hi all, I am noticing a significant degradation in execution performance in loops with just one backedge than loops with two backedges. Unifying the backedges into one will also cause the slowdown. To replicate this problem, I used the C code in https://gist.github.com/sklam/11f11a410258ca191e6f263262a4ea65 and checked against clang-3.8 and clang-4.0 nightly. Depending on where I put the
2012 Apr 04
1
[LLVMdev] scalar replacement of aggregates slower?
I just upgraded our optimizer to LLVM 3.0 from 2.8 and noticed that the scalar replacement of aggregates pass takes a lot longer for some code. Has there been a performance regression in this pass, or does it do more work? LLVM 3.0: Total Execution Time: 1.0600 seconds (1.0526 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.5100
2016 Mar 16
3
RFC: A change in InstCombine canonical form
=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445) IR contains code for loading a float from float * and storing it to a float * address. After canonicalization of load in InstCombine [1], new bitcasts are added to the IR (see bottom of the email for code samples). This prevents select speculation in SROA to work. Also after SROA we have bitcasts from int32 to float.
2017 May 09
3
RFC: SROA for method argument
Hi, I am working to improve SROA to generate better code when a method has a struct in its arguments. I would appreciate it if I could have any suggestions or comments on how I can best proceed with this optimization. * Problem * I observed that LLVM often generates redundant instructions around glibc’s istreambuf_iterator. The problem comes from the scalar replacement (SROA) for methods with an
2013 Jan 18
2
[LLVMdev] Weird volatile propagation ?
Hi All, Using clang+llvm at head, I noticed a weird behaviour with the following reduced testcase : $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) { struct R r = { a, 1 }; *addr = r; } $ clang -O2 -o - -emit-llvm -S -c test.c ; ModuleID = 'test.c' target
2016 Mar 16
2
RFC: A change in InstCombine canonical form
On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > How do it interact with the "typeless pointers" work? > Right - the goal of the typeless pointer work is to fix all these bugs related to "didn't look through bitcasts" in optimizations. Sometimes that's going to mean more work (because the code
2013 Jan 28
4
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct and prevent optimizations. A very simple example of this is : $ cat test.c #include <stdint.h>
2016 Mar 16
3
RFC: A change in InstCombine canonical form
On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote: > David, > > Could you give us an update on the status of typeless pointer work? How > much work is left and when you think it might be ready? > It's a bit of an onion peel, really - since it will eventually involve generalizing/fixing every optimization that's currently leaning on typed
2015 Nov 10
4
SROA and volatile memcpy/memset
On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote: > On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote: >> I have a customer testcase where SROA splits a volatile memcpy and we end up >> generating bad code[1]. While this looks like a bug, simply preventing SROA >> from splitting volatile memory intrinsics causes basictest.ll for SROA
2013 Jan 20
0
[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
As a results of my investigations, the thread is also added to cfe-dev. The context : while porting my company code from the LLVM/Clang releases 3.1 to 3.2, I stumbled on a code size and performance regression. The testcase is : $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) {
2016 Mar 22
0
RFC: A change in InstCombine canonical form
Back to the discussion on the RFC, I still see some advantage in following the proposed solution. I see two paths forward: 1- Change canonical form, possibly lower memcpy to non-integer load and store in InstCombine. Then teach the backends to convert that to integer load and store if that is more profitable. Notice that we are talking about loads that have no use other than store. So it is a
2016 Mar 22
4
RFC: A change in InstCombine canonical form
I don't really mind, but the intermediate stage will not be very nice: that a lot of code / tests that needs to be written with bitcast, and all of that while they are deemed to disappear. The added value isn't clear to me considering the added work. I'm not sure it wouldn't add more work for all the cleanup required by the "typeless pointer", but I'm not sure
2013 Aug 09
2
[LLVMdev] [RFC] Poor code generation for paired load
Hi, I am investigating a poor code generation on x86-64 involving a 64-bits structure with two 32-bits fields (in the attached examples float, but similar behavior is exposed with i32, and we can probably generalize that to smaller types too). The root cause of the problem is in SROA, although I am not sure we should fix something there. That is why I need your advices. ** Problem ** 64-bits
2013 Jan 20
2
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16,
2016 Mar 22
2
RFC: A change in InstCombine canonical form
I don't know enough about the tradeoff for 1, but 2 seems like a bandaid for something that is not a correctness issue neither a regression. I'm not sure it justifies "bandaid patches" while there is a clear path forward, i.e. typeless pointers, unless there is an acknowledgement that typeless pointers won't be there before a couple of years. -- Mehdi > On Mar 22, 2016,
2016 Jul 04
2
Optimization issues (Alias Analysis?)
Hey, I am currently working on a VM which is based on LLVM and I would like to use its optimizer, but it somehow it can't detect something very simple (I guess.) This is the LLVM IR: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple = "i386-unknown-linux-gnu" %struct.regs = type { i32, i32, i32 } define void @Test(%struct.regs* noalias
2013 Aug 12
2
[LLVMdev] [RFC] Poor code generation for paired load
Hi Eli, Thanks for the feedbacks. On Aug 9, 2013, at 8:00 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Fri, Aug 9, 2013 at 4:58 PM, Quentin Colombet <qcolombet at apple.com> wrote: >> Hi, >> >> I am investigating a poor code generation on x86-64 involving a 64-bits >> structure with two 32-bits fields (in the attached examples float, but
2015 Nov 10
2
SROA and volatile memcpy/memset
Hi, I have a customer testcase where SROA splits a volatile memcpy and we end up generating bad code[1]. While this looks like a bug, simply preventing SROA from splitting volatile memory intrinsics causes basictest.ll for SROA to fail. Not only that, but it also seems like handling of volatile memory transfers was done with some intent. What are the design decisions in SROA regarding
2016 Mar 22
2
RFC: A change in InstCombine canonical form
I feel very strongly that blocking work on making optimization bitcast-ignorant on the typeless pointer work would be a major mistake. Unless we expected the typeless pointer work to be concluded within the near term (say 3-6 months maximum), we should not block any development which would be accepted in the typeless pointer work wasn't planned. In my view, this is one of the largest