Displaying 20 results from an estimated 2000 matches similar to: "A 4x slower initialization loop in LLVM vs GCC and MSVC"
2020 Oct 01
3
A 4x slower initialization loop in LLVM vs GCC and MSVC
> On Oct 1, 2020, at 20:45, Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
>> On Sep 27, 2020, at 12:52, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi everyone,
>>
>> I was watching this video [1]. There's an example of an initialization loop for which
>> Clang unfortunately
2016 Dec 26
1
Multiple simplifycfg pass make some loop significantly slower
Hi all,
I am noticing a significant degradation in execution performance in loops
with just one backedge compared to loops with two backedges. Unifying the
backedges into one also causes the slowdown.
To replicate this problem, I used the C code in
https://gist.github.com/sklam/11f11a410258ca191e6f263262a4ea65 and checked
against clang-3.8 and clang-4.0 nightly. Depending on where I put the
2012 Apr 04
1
[LLVMdev] scalar replacement of aggregates slower?
I just upgraded our optimizer to LLVM 3.0 from 2.8 and noticed that the
scalar replacement of aggregates pass takes a lot longer for some code.
Has there been a performance regression in this pass, or does it do more
work?
LLVM 3.0:
Total Execution Time: 1.0600 seconds (1.0526 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.5100
2016 Mar 16
3
RFC: A change in InstCombine canonical form
=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445)
IR contains code for loading a float from float * and storing it to a float
* address. After canonicalization of load in InstCombine [1], new bitcasts
are added to the IR (see bottom of the email for code samples). This
prevents select speculation in SROA from working. Also after SROA we have
bitcasts from int32 to float.
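A minimal C sketch of the kind of source pattern described above, where a float is loaded from a float * only to be stored to another float *. This is not the code sample from the original email or the bug report; the function name copy_float is hypothetical, and whether a given LLVM version emits integer loads with bitcasts for it is not asserted here.

```c
/* Hypothetical illustration: the loaded float has no use other than the
   store, which is the load/store shape the RFC is concerned with. */
void copy_float(float *dst, const float *src)
{
  *dst = *src;
}
```

Compiling such a function with clang -O2 -S -emit-llvm and inspecting the IR is one way to check how a particular compiler version represents the load and store.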
2017 May 09
3
RFC: SROA for method argument
Hi,
I am working to improve SROA to generate better code when a method has a
struct in its arguments. I would appreciate it if I could have any
suggestions or comments on how I can best proceed with this optimization.
* Problem *
I observed that LLVM often generates redundant instructions around glibc’s
istreambuf_iterator. The problem comes from the scalar replacement (SROA)
for methods with an
2013 Jan 18
2
[LLVMdev] Weird volatile propagation ?
Hi All,
Using clang+llvm at head, I noticed a weird behaviour with the following
reduced testcase :
$ cat test.c
#include <stdint.h>
struct R {
  uint16_t a;
  uint16_t b;
};
volatile struct R * const addr = (volatile struct R *) 416;
void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}
$ clang -O2 -o - -emit-llvm -S -c test.c
; ModuleID = 'test.c'
target
2016 Mar 16
2
RFC: A change in InstCombine canonical form
On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> How does it interact with the "typeless pointers" work?
>
Right - the goal of the typeless pointer work is to fix all these bugs
related to "didn't look through bitcasts" in optimizations. Sometimes
that's going to mean more work (because the code
2013 Jan 28
4
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Hi All,
In the language reference manual, the access behavior of the memcpy,
memmove and memset intrinsics is not well defined with respect to the
volatile flag. The LRM even states that "it is unwise to depend on it".
This forces optimization passes to be conservatively correct and prevents
optimizations.
A very simple example of this is :
$ cat test.c
#include <stdint.h>
2016 Mar 16
3
RFC: A change in InstCombine canonical form
On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote:
> David,
>
> Could you give us an update on the status of typeless pointer work? How
> much work is left and when do you think it might be ready?
>
It's a bit of an onion peel, really - since it will eventually involve
generalizing/fixing every optimization that's currently leaning on typed
2015 Nov 10
4
SROA and volatile memcpy/memset
On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote:
> On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote:
>> I have a customer testcase where SROA splits a volatile memcpy and we end up
>> generating bad code[1]. While this looks like a bug, simply preventing SROA
>> from splitting volatile memory intrinsics causes basictest.ll for SROA
2013 Jan 20
0
[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
As a result of my investigations, cfe-dev has also been added to this thread.
The context : while porting my company code from the LLVM/Clang releases
3.1 to 3.2, I stumbled on a code size and performance regression. The
testcase is :
$ cat test.c
#include <stdint.h>
struct R {
  uint16_t a;
  uint16_t b;
};
volatile struct R * const addr = (volatile struct R *) 416;
void test(uint16_t a)
{
2016 Mar 22
0
RFC: A change in InstCombine canonical form
Back to the discussion on the RFC, I still see some advantage in following
the proposed solution. I see two paths forward:
1- Change canonical form, possibly lower memcpy to non-integer load and
store in InstCombine. Then teach the backends to convert that to integer
load and store if that is more profitable. Notice that we are talking about
loads that have no use other than store. So it is a
2016 Mar 22
4
RFC: A change in InstCombine canonical form
I don't really mind, but the intermediate stage will not be very nice: a lot of code and tests will need to be written with bitcasts, all while those bitcasts are destined to disappear. The added value isn't clear to me considering the added work. I'm not sure it wouldn't add more work for all the cleanup required by the "typeless pointer", but I'm not sure
2013 Aug 09
2
[LLVMdev] [RFC] Poor code generation for paired load
Hi,
I am investigating poor code generation on x86-64 involving a 64-bit structure with two 32-bit fields (float in the attached examples, but similar behavior occurs with i32, and we can probably generalize that to smaller types too).
The root cause of the problem is in SROA, although I am not sure we should fix something there. That is why I need your advice.
** Problem **
64-bits
2013 Jan 20
2
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this
seems like an easy and simple deficiency in the IR intrinsic for memcpy.
See below.
On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison <
arnaud.allarddegrandmaison at parrot.com> wrote:
> define void @test(i16 zeroext %a) nounwind uwtable {
> %r.sroa.0 = alloca i16, align 2
> %r.sroa.1 = alloca i16,
2016 Mar 22
2
RFC: A change in InstCombine canonical form
I don't know enough about the tradeoff for 1, but 2 seems like a bandaid for something that is neither a correctness issue nor a regression. I'm not sure it justifies "bandaid patches" while there is a clear path forward, i.e. typeless pointers, unless there is an acknowledgement that typeless pointers won't be there for a couple of years.
--
Mehdi
> On Mar 22, 2016,
2016 Jul 04
2
Optimization issues (Alias Analysis?)
Hey,
I am currently working on a VM which is based on LLVM and I would like to
use its optimizer, but somehow it can't detect something very simple (I
guess.)
This is the LLVM IR:
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"
%struct.regs = type { i32, i32, i32 }
define void @Test(%struct.regs* noalias
2013 Aug 12
2
[LLVMdev] [RFC] Poor code generation for paired load
Hi Eli,
Thanks for the feedback.
On Aug 9, 2013, at 8:00 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Aug 9, 2013 at 4:58 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> Hi,
>>
>> I am investigating poor code generation on x86-64 involving a 64-bit
>> structure with two 32-bit fields (in the attached examples float, but
2015 Nov 10
2
SROA and volatile memcpy/memset
Hi,
I have a customer testcase where SROA splits a volatile memcpy and we
end up generating bad code[1]. While this looks like a bug, simply
preventing SROA from splitting volatile memory intrinsics causes
basictest.ll for SROA to fail. Not only that, but it also seems like
handling of volatile memory transfers was done with some intent.
What are the design decisions in SROA regarding
2016 Mar 22
2
RFC: A change in InstCombine canonical form
I feel very strongly that blocking work on making optimizations
bitcast-ignorant on the typeless pointer work would be a major mistake.
Unless we expect the typeless pointer work to be concluded within the
near term (say 3-6 months maximum), we should not block any development
which would be accepted if the typeless pointer work wasn't planned.
In my view, this is one of the largest