Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Aliasing of volatile and non-volatile"
2013 Sep 07
0
[LLVMdev] Aliasing of volatile and non-volatile
Are you sure this is an alias problem?
What is happening is LLVM is leaving the code looking like this:
int foo(int *p, volatile int *q, int n) {
  int i, s = 0;
  for (i = 0; i < n; ++i)
    s += *p + *q;
  return s;
}
but GCC is changing the code to look like this:
int foo(int *p, volatile int *q, int n) {
  int i, s = 0;
  int t;
  t = *p;
  for (i = 0; i < n; ++i)
    s += t + *q;
  return s;
}
2011 Mar 24
2
[LLVMdev] GCC vs. LLVM difference on simple code example
Hi,
I have a question on why gcc and llvm-gcc compile the following simple code
snippet differently:
extern int a;
extern int *b;
void foo() {
  int i;
  for (i = 1; i < 100; ++i)
    a += b[i];
}
gcc compiles this function hoisting the load of the global variable "b"
outside of the loop, while llvm-gcc keeps it inside the loop. This results
in slower code on the part of
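For illustration, the hoisted form gcc produces corresponds roughly to this source-level rewrite (my sketch, not code from the thread; it relies on nothing inside the loop being able to modify the global pointer b):
extern int a;
extern int *b;
void foo() {
  int i;
  int *t = b;  /* load of the global pointer hoisted out of the loop */
  for (i = 1; i < 100; ++i)
    a += t[i];
}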
2012 Jan 04
1
[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?
I wrote a small function and tested it under clang and gcc,
file test.c:
double X[100]; double Y[100]; double DA = 0.3;
int f()
{
  int i;
  for (i = 0; i < 100; i++)
    Y[i] = Y[i] - DA * X[i];
  return 0;
}
clang -S -O3 -o test.s test.c -march=native -ccc-echo
result:
"D:/work/trunk/bin/Release/clang.exe" -cc1 -triple i686-pc-win32 -S
-disable-fr
e -disable-llvm-verifier
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem
<jvanadrighem at gmail.com> wrote:
> Do you have some specific performance measurements?
Averaging 4 runs of 10000 iterations each of Coremark on my X86_64
desktop showed:
-O2 performance: +2.9% faster with the L.E.V. pass
-Os size: 1.5% smaller with the L.E.V. pass
In the case of Coremark, the benefit comes mainly from the matrix
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
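To illustrate the pattern (my own sketch, not an example from the proposal), a value recomputed after the loop can be replaced by the exit value that a loop variable already holds:
int sum_and_end(int *a, int n, int **end) {
  int s = 0;
  int *p;
  for (p = a; p != a + n; ++p)
    s += *p;
  *end = a + n;  /* same as the exit value of p, so the recomputation can be removed */
  return s;
}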
2016 Jun 30
4
Help required regarding IPRA and Local Function optimization
Hello Mentors,
I am currently tracking down a bug in the Local Function related optimization
that causes runtime failures in some test cases. Those test cases contain very
large functions with recursion and object-oriented code, so I have not been
able to find the pattern that triggers the failure. I therefore tried the
following simple case to understand the expected behavior of this
optimization.
Consider
2011 Dec 14
2
[LLVMdev] Failure to optimize ? operator
I don't understand your point. Which version is better does NOT
depend on what inputs are passed to the function. The code llvm
generates for f1 never takes longer to execute than f2's:
for x > 0 => T(f1) < T(f2)
for x <= 0 => T(f1) = T(f2)
where T() is the time to execute the given function.
So always T(f1) <= T(f2).
I would call this a missed
2016 Jun 30
0
Help required regarding IPRA and Local Function optimization
One more interesting thing I have noticed is the following:
In the sqlite3 code, consider three functions, namely sqlite3Update,
sqlite3Select and sqlite3WhereBegin. sqlite3WhereBegin is called by both
sqlite3Update and sqlite3Select, but according to CallGraphSCC sqlite3Update
is codegened first; in that case, during the RegMask propagation phase, the
default regmask is used for the call site of
2017 Mar 07
4
[BUG Report] -dead_strip, strips prefix data unconditionally on macOS
Firstly, do you need "main.dsp" defined as an external symbol, or can all
external references go via "main"? If the answer is the latter, that will
make the solution simpler.
If only the latter, you will need to make a change to LLVM here:
http://llvm-cs.pcc.me.uk/lib/CodeGen/AsmPrinter/AsmPrinter.cpp#650
Basically you would need to add a hook to the TargetLoweringObjectFile
2017 Aug 21
3
DragonEgg for GCC v8.x and LLVM v6.x is just able to work
Hi LLVM and GCC developers,
My sincere thanks go to:
* Duncan, the core developer of llvm-gcc and dragonegg
http://llvm.org/devmtg/2009-10/Sands_LLVMGCCPlugin.pdf
* David, the innovator and developer of GCC
https://dmalcolm.fedorapeople.org/gcc/global-state/requirements.html
and others who kindly responded and patiently and carefully taught me
how to migrate GCC v4.8.x to
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).
Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: 28 March 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior
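For reference, the suggested workaround would be passed through the clang driver roughly like this (my sketch; file.c is a placeholder and the mail above only names the option):
clang -O2 -mllvm -unroll-count=1 file.c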
2017 Mar 07
2
[BUG Report] -dead_strip, strips prefix data unconditionally on macOS
I suspect that the format isn't important if you do that, but I wouldn't
recommend it, at least because inlining (and other inter-procedural
optimizations) are not expected to work correctly if you produce IR like
that.
Peter
On Mon, Mar 6, 2017 at 6:44 PM, Moritz Angermann <moritz.angermann at gmail.com> wrote:
> Peter,
>
> thanks again! Yes, we only need to refer to
2020 Aug 05
10
[RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data
Greetings,
We present “Machine Function Splitter”, a codegen optimization pass which
splits functions into hot and cold parts. This pass leverages the basic
block sections feature recently introduced in LLVM from the Propeller
project. The pass targets functions with profile coverage, identifies cold
blocks and moves them to a separate section. The linker groups all cold
blocks across functions
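To make the idea concrete, here is a small sketch of mine (not from the RFC) of the kind of function the pass targets: with a profile showing the error path is never taken, the cold block can be placed in a separate section while the hot loop stays in the original one:
extern void log_fatal(const char *msg);  /* hypothetical cold helper */
int process(int *buf, int n) {
  if (n < 0) {               /* cold according to the profile */
    log_fatal("bad length");
    return -1;
  }
  int s = 0;                 /* hot path stays in the original section */
  for (int i = 0; i < n; ++i)
    s += buf[i];
  return s;
}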
2012 Oct 02
4
[LLVMdev] interesting possible compiler bug
This code is essentially from an LTP test (http://ltp.sourceforge.net/).
#include <stdlib.h>
int main() {
  void *curr;
  do {
    curr = malloc(1);
  } while (curr);
  return 0;
}
If you compile it with no optimization, it will keep the malloc calls.
If you compile it with -O2, it will create an infinite loop, i.e.
assuming that malloc
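As a side note not taken from this thread: if the intent is to keep the allocation calls alive at -O2, one common approach is to make each result observable, for example through a volatile-qualified pointer (a sketch; preserving the calls is still at the optimizer's discretion):
#include <stdlib.h>
int main() {
  void *volatile curr;  /* stores to a volatile object are observable, so each malloc result is kept */
  do {
    curr = malloc(1);
  } while (curr);
  return 0;
}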
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed
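For readers unfamiliar with the trick, the locked-instruction barrier amounts to something like the following inline asm (a simplified 64-bit sketch, not the exact macro from the patch):
/* Full memory barrier via a lock-prefixed read-modify-write of a dead stack slot. */
#define smp_mb_sketch() \
        asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")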
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So let's use the locked variant everywhere.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
The documentation fixes are included first - I verified that
they do not change the generated code at all.
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
The clang 3.5 loop optimizer seems to kick in unintentionally for simple loops.
A very simple example:
----
const int SIZE = 3;
int the_func(int* p_array)
{
  int dummy = 0;
#if defined(ITER)
  for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p;
#else
  for(int i = 0; i < SIZE; ++i) dummy += p_array[i];
#endif
  return dummy;
}
int main(int argc, char** argv)
{