similar to: [LLVMdev] Aliasing of volatile and non-volatile

Displaying 18 results from an estimated 10000 matches similar to: "[LLVMdev] Aliasing of volatile and non-volatile"

2013 Sep 07
0
[LLVMdev] Aliasing of volatile and non-volatile
Are you sure this is an alias problem? What is happening is LLVM is leaving the code looking like this: int foo(int *p, volatile int *q, int n) { int i, s = 0; for (i = 0; i < n; ++i) s += *p + *q; return s; } but GCC is changing the code to look like this: int foo(int *p, volatile int *q, int n) { int i, s = 0; int t; t = *p; for (i = 0; i < n; ++i) s += t + *q;
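The snippet above flattens both versions; reconstructed here for readability (the tail of the GCC version is cut off in the snippet, so its closing lines are an assumption):

    /* what LLVM leaves: the non-volatile load of *p stays in the loop */
    int foo(int *p, volatile int *q, int n) {
        int i, s = 0;
        for (i = 0; i < n; ++i)
            s += *p + *q;
        return s;
    }

    /* what GCC emits: *p hoisted, only the volatile access repeats */
    int foo_gcc(int *p, volatile int *q, int n) {
        int i, s = 0;
        int t = *p;
        for (i = 0; i < n; ++i)
            s += t + *q;
        return s;
    }

The hoist is only valid if the volatile accesses cannot modify *p, which is exactly the aliasing question in the subject line.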
2011 Mar 24
2
[LLVMdev] GCC vs. LLVM difference on simple code example
Hi, I have a question on why gcc and llvm-gcc compile the following simple code snippet differently: extern int a; extern int *b; void foo() { int i; for (i = 1; i < 100; ++i) a += b[i]; } gcc compiles this function hoisting the load of the global variable "b" outside of the loop, while llvm-gcc keeps it inside the loop. This results in slower code on the part of
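For reference, a hand-hoisted variant of the loop in question (a sketch of what gcc effectively emits; since a and b are distinct globals and the loop only stores to a, the load of the pointer b is loop-invariant):

    extern int a;
    extern int *b;

    void foo(void) {
        int *tb = b;               /* load of the global pointer, hoisted by hand */
        int i;
        for (i = 1; i < 100; ++i)
            a += tb[i];
    }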
2012 Jan 04
1
[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?
I wrote a small function and tested it under clang and gcc, file test.c: double X[100]; double Y[100]; double DA = 0.3; int f() { int i; for (i = 0; i < 100; i++) Y[i] = Y[i] - DA * X[i]; return 0; } clang -S -O3 -o test.s test.c -march=native -ccc-echo result: "D:/work/trunk/bin/Release/clang.exe" -cc1 -triple i686-pc-win32 -S -disable-free -disable-llvm-verifier
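If the goal is specifically SSE2 data movement instructions, one cross-check is to write the same loop with explicit SSE2 intrinsics from <emmintrin.h> (a sketch; the trip count 100 is even, so no scalar tail loop is needed):

    #include <emmintrin.h>

    double X[100];
    double Y[100];
    double DA = 0.3;

    int f(void) {
        const __m128d da = _mm_set1_pd(DA);       /* broadcast DA into both lanes */
        int i;
        for (i = 0; i < 100; i += 2) {
            __m128d x = _mm_loadu_pd(&X[i]);      /* movupd: SSE2 data movement */
            __m128d y = _mm_loadu_pd(&Y[i]);
            y = _mm_sub_pd(y, _mm_mul_pd(da, x)); /* Y[i] - DA * X[i], two lanes */
            _mm_storeu_pd(&Y[i], y);              /* movupd store */
        }
        return 0;
    }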
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the
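A minimal illustration of the pattern the pass targets (a hypothetical example, not taken from the RFC):

    /* 'dst + n' after the loop reevaluates a value that is already
       available as the exit value of 'p'; the pass would reuse 'p'. */
    int *copy_end(int *dst, const int *src, int n) {
        int *p = dst;
        int i;
        for (i = 0; i < n; ++i)
            *p++ = src[i];
        return dst + n;   /* LoopExitValues: replaceable by 'return p;' */
    }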
2016 Jun 30
4
Help required regarding IPRA and Local Function optimization
Hello Mentors, I am currently tracking down a bug in a Local Function related optimization that causes runtime failures in some test cases. Those test cases contain very large functions with recursion and object-oriented code, so I am not able to find a pattern that causes the failure. So I tried the following simple case to understand the expected behavior of this optimization. Consider
2011 Dec 14
2
[LLVMdev] Failure to optimize ? operator
I don't understand your point. Which version is better does NOT depend on what inputs are passed to the function. The code compiled (as per llvm) for f1 will never take more time to execute than f2: for x > 0, T(f1) < T(f2); for x <= 0, T(f1) = T(f2), where T() is the time to execute the given function. So T(f1) <= T(f2) always. I would call this a missed
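The f1/f2 bodies are not shown in the snippet; a hypothetical pair with the property being described (identical results, but f1 compiled branchlessly) might look like:

    int f1(int x) { return x > 0 ? x : 0; }           /* ?: lowered to cmov/select */
    int f2(int x) { if (x > 0) return x; return 0; }  /* may keep a conditional branch */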
2016 Jun 30
0
Help required regarding IPRA and Local Function optimization
One more interesting thing I have noticed is the following: in the sqlite3 code, consider 3 functions, namely sqlite3Update, sqlite3Select, and sqlite3WhereBegin. sqlite3WhereBegin is called by both sqlite3Update and sqlite3Select, but according to the CallGraphSCC order, sqlite3Update is codegened first. In that case, during the RegMask propagation phase, the default regmask is used for the call site of
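A reduced sketch of the ordering issue described (hypothetical function names):

    /* If caller1 is codegened before callee (as the CallGraphSCC order
       may arrange), IPRA has no precise regmask for callee yet and must
       fall back to the conservative default at that call site. */
    int callee(int x);                               /* codegened later */
    int caller1(int x) { return callee(x) + 1; }     /* default regmask used here */
    int caller2(int x) { return callee(x) * 2; }     /* codegened after callee:
                                                        precise regmask available */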
2017 Mar 07
4
[BUG Report] -dead_strip, strips prefix data unconditionally on macOS
Firstly, do you need "main.dsp" defined as an external symbol, or can all external references go via "main"? If the answer is the latter, that will make the solution simpler. If it is only the latter, you will need to make a change to LLVM here: http://llvm-cs.pcc.me.uk/lib/CodeGen/AsmPrinter/AsmPrinter.cpp#650 Basically you would need to add a hook to the TargetLoweringObjectFile
2017 Aug 21
3
DragonEgg for GCC v8.x and LLVM v6.x is just able to work
Hi LLVM and GCC developers, My sincere thanks go to: * Duncan, the core developer of llvm-gcc and dragonegg http://llvm.org/devmtg/2009-10/Sands_LLVMGCCPlugin.pdf * David, the innovator and developer of GCC https://dmalcolm.fedorapeople.org/gcc/global-state/requirements.html and others who kindly responded and patiently and carefully taught me how to migrate GCC v4.8.x to
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after
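The full program is not shown in the snippet, but the shape described is roughly the following (a hypothetical reduction; once sum() is inlined and the loop is unrolled, many partial values can be live at once, which is where spills appear on register-poor 32-bit x86):

    static double sum(const double *a, int n) {
        double s = 0.0;
        int i;
        for (i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    double foo(const double *a, const double *b, int n) {
        return sum(a, n) + sum(b, n);   /* both bodies inlined into foo */
    }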
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: den 28 mars 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior
2017 Mar 07
2
[BUG Report] -dead_strip, strips prefix data unconditionally on macOS
I suspect that the format isn't important if you do that, but I wouldn't recommend it, at least because inlining (and other inter-procedural optimizations) are not expected to work correctly if you produce IR like that. Peter On Mon, Mar 6, 2017 at 6:44 PM, Moritz Angermann <moritz.angermann at gmail.com > wrote: > Peter, > > thanks again! Yes, we only need to refer to
2020 Aug 05
10
[RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data
Greetings, We present “Machine Function Splitter”, a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently introduced in LLVM from the Propeller project. The pass targets functions with profile coverage, identifies cold blocks and moves them to a separate section. The linker groups all cold blocks across functions
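For intuition, the kind of function the splitter targets (a hypothetical example; with profile data, the unlikely path would be moved to a separate cold section):

    void log_error(const char *msg);   /* assumed external helper */

    int process(int x) {
        if (__builtin_expect(x < 0, 0)) {   /* cold per profile */
            log_error("negative input");
            return -1;
        }
        return x * 2;                       /* hot path stays in .text */
    }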
2012 Oct 02
4
[LLVMdev] interesting possible compiler bug
This code is essentially from an LTP test (http://ltp.sourceforge.net/). #include <stdlib.h> int main() { void *curr; do { curr = malloc(1); } while (curr); return 0; } If you compile it with no optimization, it will keep the malloc calls. If you compile it with -O2, it will create an infinite loop, i.e. assuming that malloc
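One common way to keep the calls at -O2 is to make the pointer object volatile, so the compiler cannot reason the loop away (a sketch; this pins the stores to curr, it does not change what malloc itself may do):

    #include <stdlib.h>

    int main(void) {
        void * volatile curr;     /* volatile object: every store must happen */
        do {
            curr = malloc(1);
        } while (curr);
        return 0;
    }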
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use lock+add unconditionally, as we always did on old 32-bit. Results: perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0 Before: 0.922565990 seconds time elapsed ( +- 1.15% ) After: 0.578667024 seconds time elapsed
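For reference, the two barrier forms being compared, written as GNU C inline asm (a sketch of the shape only; the classic 32-bit lock+add form is shown, the actual kernel patch differs in details):

    /* serializing fence instruction */
    static inline void mb_mfence(void) {
        __asm__ __volatile__("mfence" ::: "memory");
    }

    /* locked read-modify-write on a dummy stack location (32-bit form) */
    static inline void mb_lock_add(void) {
        __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
    }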
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So let's use the locked variant everywhere. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h The documentation fixes are included first - I verified that they do not change the generated code at all.
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
the clang 3.5 loop optimizer seems to kick in unintentionally for simple loops. A very simple example: ---- const int SIZE = 3; int the_func(int* p_array) { int dummy = 0; #if defined(ITER) for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p; #else for(int i = 0; i < SIZE; ++i) dummy += p_array[i]; #endif return dummy; } int main(int argc, char** argv) {
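If the extra loop optimization is unwanted for such tiny loops, clang's per-loop pragmas (available since clang 3.5; whether they suffice here is an assumption) can pin the behavior:

    const int SIZE = 3;

    int the_func(int *p_array) {
        int dummy = 0;
    #pragma clang loop vectorize(disable) interleave(disable)
        for (int i = 0; i < SIZE; ++i)
            dummy += p_array[i];   /* left as a plain scalar loop */
        return dummy;
    }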