search for: speedup

Displaying 20 results from an estimated 983 matches for "speedup".

2015 May 03
2
[LLVMdev] libiomp, not libgomp as default library linked with -fopenmp
A couple more data points. Current llvm 3.7svn with the two outstanding OPENMP patches can build the openmp support in gdl 0.9.5 (which completely passes its test suite) and apbs 1.4.1's limited openmp support. On Sat, May 2, 2015 at 11:11 PM, Jack Howarth < howarth.mailing.lists at gmail.com> wrote: > On a positive note, current llvm 3.7svn with the two outstanding > OPENMP
2015 Jul 30
4
[LLVMdev] RFC: Callee speedup estimation in inline cost analysis
...oposal ------------- LLVM inlines a function if the size growth (in the given context) is less than a threshold. The threshold is increased based on certain characteristics of the called function (inline keyword and the fraction of vector instructions, for example). I propose the use of estimated speedup (estimated reduction in dynamic instruction count to be precise) as another factor that controls threshold. This would allow larger functions whose inlining potentially reduces execution time to be inlined. The dynamic instruction count of (an uninlined) function F is DI(F) = Sum_BB(Freq(BB) * In...
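The formula truncated above defines DI(F) as a frequency-weighted sum over basic blocks. A minimal sketch of that estimate, with made-up block frequencies and instruction counts (none of the numbers are from the RFC):

```python
# Hedged sketch: estimate the dynamic instruction count DI(F) as the
# sum over basic blocks BB of Freq(BB) * InstructionCount(BB).
def dynamic_instruction_count(blocks):
    """blocks: list of (frequency, instruction_count) pairs, one per basic block."""
    return sum(freq * count for freq, count in blocks)

# A hypothetical function with an entry block, a hot loop body, and an exit.
blocks = [
    (1, 5),    # entry: executed once, 5 instructions
    (100, 8),  # loop body: 100 iterations, 8 instructions
    (1, 3),    # exit: executed once, 3 instructions
]
print(dynamic_instruction_count(blocks))  # 1*5 + 100*8 + 1*3 = 808
```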
2015 Jul 31
0
[LLVMdev] RFC: Callee speedup estimation in inline cost analysis
Just nitpicking: 1) DI(F) should include a component that estimates the prologue/epilogue cost (frameSetupCost) which InlinedDF does not have 2) The speedup should include the callsite cost associated with 'C' (call instr, argument passing): Speedup(F,C) = (DI(F) + CallCost(C) - InlinedDF(F,C))/DI(F). Otherwise the proposal looks reasonable to me. David On Thu, Jul 30, 2015 at 2:25 PM, Easwaran Raman <eraman at google.com> wrote:...
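The corrected formula in the reply can be exercised with toy numbers. All values below are invented for illustration; only the shape of the computation comes from the thread:

```python
# Sketch of the speedup estimate as amended in the reply:
#   Speedup(F, C) = (DI(F) + CallCost(C) - InlinedDF(F, C)) / DI(F)
# where DI(F) is the callee's standalone dynamic instruction count,
# CallCost(C) the call overhead at callsite C, and InlinedDF(F, C)
# the dynamic instruction count of F once inlined at C.
def estimated_speedup(di_f, call_cost, inlined_df):
    return (di_f + call_cost - inlined_df) / di_f

# Hypothetical callee: 800 dynamic instructions standalone, 700 once
# inlined (constants from the callsite simplify it), call overhead 10.
print(estimated_speedup(800, 10, 700))  # (800 + 10 - 700) / 800 = 0.1375
```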
2013 Jun 02
4
[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn
...Turning on LLVM's vectorizer gives a 2% slowdown. > aermod 16.03 14.45 16.13 Turning on LLVM's vectorizer gives a 2.5% slowdown. > air 6.80 5.28 5.73 > capacita 39.89 35.21 34.96 Turning on LLVM's vectorizer gives a 5% speedup. GCC gets a 5.5% speedup from its vectorizer. > channel 2.06 2.29 2.69 GCC gets a 30% speedup from its vectorizer which LLVM doesn't get. On the other hand, without vectorization LLVM's version runs 23% faster than GCC's, so while GCC's vectorizer lea...
2007 Jun 24
1
rsync summary details...
Hi, I'm trying to figure out some of these details: sent 34108 bytes received 6913101 bytes 19487.26 bytes/sec total size is 231889639875 speedup is 33378.82 1. Is the 6913101 really in bytes? 2. What is the 231889639875 measurement? Bytes? Bits? 3. What does "speedup" mean exactly? Thanks in advance, Shai
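To answer the questions in order: both counters are in bytes, the total size is in bytes, and rsync's speedup is the total size of the files divided by the bytes actually sent over the wire (sent plus received), i.e. how much the delta-transfer algorithm saved versus copying everything. The numbers in the message check out under that definition:

```python
# rsync's reported "speedup" is total file size divided by the bytes
# actually transferred (sent + received): the factor saved by the
# delta-transfer algorithm versus shipping every byte.
sent = 34108              # bytes sent
received = 6913101        # bytes received
total_size = 231889639875 # total size of the files, in bytes

speedup = total_size / (sent + received)
print(round(speedup, 2))  # 33378.82, matching the reported value
```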
2015 Sep 16
3
RFC: speedups with instruction side-data (ADCE, perhaps others?)
...st of managing the set) + (cost of eraseinstruction), which in our case turns out to be 1/3 the former and 2/3 the latter (roughly). —escha > On Sep 15, 2015, at 6:50 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > > Can someone provide the file used to demonstrate the speedup here? > I'd be glad to take a quick crack at seeing if i can achieve the same speedup. > > > On Tue, Sep 15, 2015 at 2:16 PM, Owen Anderson via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> >> On Sep 14, 2015, at 5:02 PM, Mehdi Amini via llvm-dev >>...
2013 Jun 02
0
[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn
...2% slowdown. > >> aermod 16.03 14.45 16.13 > > Turning on LLVM's vectorizer gives a 2.5% slowdown. > >> air 6.80 5.28 5.73 >> capacita 39.89 35.21 34.96 > > Turning on LLVM's vectorizer gives a 5% speedup. GCC gets a 5.5% speedup from > its vectorizer. > >> channel 2.06 2.29 2.69 > > GCC gets a 30% speedup from its vectorizer which LLVM doesn't get. On the > other hand, without vectorization LLVM's version runs 23% faster than GCC's, so &...
2011 Nov 08
3
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
...lang -O3' against 'clang -O3 -mllvm -vectorize'? Yes. [I've tested the current patch directly using opt -vectorize -unroll-allow-partial; for running the test suite I recompiled llvm/clang to hardcode the options as I wanted them]. > > > The largest three performance speedups are: > > SingleSource/Benchmarks/BenchmarkGame/puzzle - 59.2% speedup > > SingleSource/UnitTests/Vector/multiplies - 57.7% speedup > > SingleSource/Benchmarks/Misc/flops-7 - 50.75% speedup > > > > The largest three performance slowd...
2011 Nov 08
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
...ang -O3 -mllvm -unroll-allow-partial' with 'clang -O3 -mllvm -unroll-allow-partial -mllvm -vectorize'. It will show how much of the runtime overhead is due to the unrolling (produces more code that needs to be optimized) and which part is due to vectorization. The same counts for the speedup. How much is caused by unrolling and how much is actually caused by your pass. >>> The largest three performance speedups are: >>> SingleSource/Benchmarks/BenchmarkGame/puzzle - 59.2% speedup >>> SingleSource/UnitTests/Vector/multiplies - 57.7% sp...
2001 Sep 08
5
Patch
Hello, a short question: what is the syntax for applying interactivity.patch and ext3-dir-speedup.patch? patch -p0 ext3-dir-speedup.patch doesn't work -- Frank
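The command fails because patch does not take the patch file as a positional argument: a bare filename after the options names the file to be patched, not the patch. The patch itself is read from standard input, or named with -i. A minimal sketch, using throwaway demo files as stand-ins for the real kernel patches:

```shell
# "patch -p0 foo.patch" would treat foo.patch as the file TO BE patched.
# The patch must arrive on stdin (or via -i).  The demo files below are
# illustrative stand-ins, not the actual kernel patches.
printf 'old line\n' > demo.txt
printf 'new line\n' > demo.new
diff -u demo.txt demo.new > demo.patch || true  # diff exits 1 when files differ
rm demo.new                                     # leave only the target file

patch -p0 < demo.patch    # correct: patch read from standard input
# equivalent: patch -p0 -i demo.patch
cat demo.txt              # the file now contains the patched line
```

So the original invocations would be `patch -p0 < interactivity.patch` and `patch -p0 < ext3-dir-speedup.patch`, run from the directory the patches were made against (use -p1 instead if the patch headers carry a leading directory component).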
2013 Jun 03
0
[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn
...fmul's. > > I'm not sure what the best way to implement this optimization in LLVM > is. Maybe > Shuxin has some ideas. > > So it looks like a missed fast-math optimization rather than anything > to do with > vectorization, which is strange as GCC only gets the big speedup when > vectorization is turned on. > > Ciao, Duncan. > >> >> Thanks, >> Nadav >> >> >> On Jun 2, 2013, at 1:27, Duncan Sands <duncan.sands at gmail.com >> <mailto:duncan.sands at gmail.com>> wrote: >> >>> Hi Jack, than...
2010 May 17
0
[LLVMdev] selection dag speedups / llc speedups
On May 14, 2010, at 11:24 AM, Jan Voung wrote: > I'm sure this has been asked many times, but is there current work on decreasing the time taken by the DAG-based instruction selector, or the other phases of llc? I am just beginning to dive into LLVM, and I am interested in compile-time reductions that do not reduce code quality dramatically. For example, simply switching on
2010 May 19
0
[LLVMdev] selection dag speedups / llc speedups
On May 18, 2010, at 12:07 PM, Jan Voung wrote: > Here are some recent stats of the fast vs local vs linear scan at O0 on "opt -std-compile-opts" processed bitcode files. The fast regalloc is still certainly faster at codegen than local with such bitcode files. Let me know if the link doesn't work: > >
2015 Sep 14
3
RFC: speedups with instruction side-data (ADCE, perhaps others?)
I did something similar for dominators, for GVN, etc. All see significant speedups. However, the answer i got back when i mentioned this was "things like ptrset and densemap should only have a small performance difference from side data when used and sized right", and i've found this to mostly be true after looking harder. In the case you are looking at, i see:...
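The side-data idea being weighed in this thread can be sketched in miniature. This is an illustrative Python analogue, not LLVM's actual C++ API: the mark phase of a dead-code-elimination pass can record liveness either as a flag stored on the instruction itself (side data) or as membership in an external set (a DenseSet/SmallPtrSet in LLVM):

```python
# Illustrative contrast between the two bookkeeping strategies.
class Inst:
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)
        self.live = False          # side data: one flag stored on the instruction

def mark_live_side_data(roots):
    """Worklist walk that marks liveness via the per-instruction flag."""
    work = list(roots)
    while work:
        inst = work.pop()
        if inst.live:
            continue
        inst.live = True
        work.extend(inst.operands)  # operands of a live inst are live too

def mark_live_set(roots):
    """Same walk, but liveness is membership in an external set."""
    live, work = set(), list(roots)
    while work:
        inst = work.pop()
        if inst in live:
            continue
        live.add(inst)
        work.extend(inst.operands)
    return live
```

Both walks visit exactly the same instructions; the thread's dispute is purely about the constant factors of a flag write versus hash-set insertions and lookups at LLVM's scale, and whether a properly sized DenseSet closes the gap.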
2011 Nov 08
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
...ch will probably > work for you. Hey Hal, this is great news, especially as the numbers seem to show that vectorization has a significant performance impact. What did you compare exactly? 'clang -O3' against 'clang -O3 -mllvm -vectorize'? > The largest three performance speedups are: > SingleSource/Benchmarks/BenchmarkGame/puzzle - 59.2% speedup > SingleSource/UnitTests/Vector/multiplies - 57.7% speedup > SingleSource/Benchmarks/Misc/flops-7 - 50.75% speedup > > The largest three performance slowdowns are: > MultiSourc...
2010 May 18
0
[LLVMdev] selection dag speedups / llc speedups
On May 17, 2010, at 9:09 PM, Rafael Espindola wrote: >> The fast and local register allocators are meant to be used on unoptimized code, a 'Debug build'. While they do work on optimized code, they do not give good results. Their primary goal is compile time, not code quality. > > Yes, we have a somewhat uncommon use case. It is fine to spend time > optimizing bitcode (LTO
2011 Nov 08
1
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
...t all of the bugs that it revealed have now been fixed. There are still two programs that don't compile with vectorization turned on, and I'm working on those now, but in case anyone feels like playing with vectorization, this patch will probably work for you. The largest three performance speedups are: SingleSource/Benchmarks/BenchmarkGame/puzzle - 59.2% speedup SingleSource/UnitTests/Vector/multiplies - 57.7% speedup SingleSource/Benchmarks/Misc/flops-7 - 50.75% speedup The largest three performance slowdowns are: MultiSource/Benchmarks/MiBench&...
2015 Sep 15
7
RFC: speedups with instruction side-data (ADCE, perhaps others?)
...use. > I agree that the approach does not scale/generalize well, and we should try to find an alternative if possible. Now *if* it is the only way to improve performance significantly, we might have to weigh the tradeoff. Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here? —Owen
2004 Apr 27
0
[LLVMdev] LLVM benchmarks against GCC
...--------------- > 1. Programs/External: > > a) CBE code is already comparable with GCC code > (some tests are slower, but some quicker.) > b) LLC code is still rather slower than GCC code This is about right. With the CBE, we are *consistently* faster on 179.art (a 2-2.5x speedup), 252.eon (~20% speedup), 255.vortex (~15% speedup), and 130.li (~20% speedup). Some of the other benchmarks we lag behind, others are extremely noisy. LLC generates code that is generally pretty slow compared to the CBE on X86. This is largely due to the lack of a global register allocator for floati...
2010 May 18
2
[LLVMdev] selection dag speedups / llc speedups
> The fast and local register allocators are meant to be used on unoptimized code, a 'Debug build'. While they do work on optimized code, they do not give good results. Their primary goal is compile time, not code quality. Yes, we have a somewhat uncommon use case. It is fine to spend time optimizing bitcode (LTO is OK), but we want to make the final IL -> Executable translation