search for: dhryston

Displaying 20 results from an estimated 62 matches for "dhryston".

2009 Nov 10
0
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
On Nov 9, 2009, at 7:34 PM, Neel Nagar wrote: > I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the > memcpy intrinsic. I used the Neon load multiple instruction to move up > to 48 bytes at a time. Over 15 scalar instructions collapsed down > into these 2 Neon instructions. > > fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 35...
2009 Nov 10
4
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time. Over 15 scalar instructions collapsed down into these 2 Neon instructions. fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 359 fstmiad r1, {d...
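The block-move idea in this thread can be sketched in portable C: copy in 48-byte chunks (the size of the six 64-bit D registers d0 to d5 loaded by fldmiad) and finish the remainder byte by byte. This is a minimal sketch, not the patch under discussion. It uses no NEON instructions, and `copy_in_blocks` and `BLOCK_BYTES` are hypothetical names. Whether a compiler turns each fixed-size 48-byte `memcpy` into a vector load/store pair depends on the target and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: bytes per block, matching six 64-bit D registers
   (d0-d5) as in the fldmiad/fstmiad pair quoted above. */
#define BLOCK_BYTES 48

/* Portable sketch of a copy expanded into 48-byte block moves plus a
   byte-wise tail. This is NOT NEON code; on ARM a compiler may lower each
   fixed-size memcpy below to vector loads and stores. */
void *copy_in_blocks(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Whole blocks: the length is a compile-time constant here. */
    while (n >= BLOCK_BYTES) {
        memcpy(d, s, BLOCK_BYTES);
        d += BLOCK_BYTES;
        s += BLOCK_BYTES;
        n -= BLOCK_BYTES;
    }

    /* Remaining 0..47 bytes, one at a time. */
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

Like `memcpy`, the sketch assumes the source and destination do not overlap.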
2009 Nov 10
3
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
On Nov 9, 2009, at 5:59 PM, David Conrad wrote: > On Nov 9, 2009, at 7:34 PM, Neel Nagar wrote: > >> I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the >> memcpy intrinsic. I used the Neon load multiple instruction to move >> up >> to 48 bytes at a time. Over 15 scalar instructions collapsed down >> into these 2 Neon instructions. Nice. Thanks for working on this. It has long been on...
2010 Apr 27
3
[LLVMdev] Phoronix: Benchmarking LLVM & Clang Against GCC 4.5
FYI http://www.phoronix.com/scan.php?page=article&item=gcc_llvm_clang&num=1
2010 Apr 27
0
[LLVMdev] Phoronix: Benchmarking LLVM & Clang Against GCC 4.5
On 27 April 2010 08:18, Stefano Delli Ponti <stefano.delliponti at gmail.com> wrote: > FYI > http://www.phoronix.com/scan.php?page=article&item=gcc_llvm_clang&num=1 For Apache and Dhrystone, the performance boost is good (but only the former is really important), but for the rest, especially those with image/sound processing, and HMMER, it's still far behind. Is this only because there is no auto vectorization in LLVM? Would be good to know why some programs were not compiled wit...
2010 Mar 24
1
[LLVMdev] [cfe-dev] 2.7 Pre-release1 available for testing
...eural 1.0 -> 0.9 MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm 1.06 -> 0.9 MultiSource/Benchmarks/Olden/treeadd/treeadd 11.44 -> 9.89 MultiSource/Benchmarks/Olden/tsp/tsp 1.14 -> 1.02 MultiSource/Benchmarks/Ptrdist/anagram/anagram 1.33 -> 1.23 SingleSource/Benchmarks/Dhrystone/dry 7.32 -> 5.16 SingleSource/Benchmarks/Dhrystone/fldry 8.02 -> 6.65 .... I'll have to write a script to compare the results; it's boring and inaccurate to do by hand. Will go through the bugzilla tomorrow and see if I need to open new bugs for this stuff. > > /To test cl...
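The poster mentions wanting a script to compare these before/after figures. A minimal C sketch of such a comparison follows. `struct result`, `percent_change`, and `print_changes` are hypothetical names, and since the excerpt does not say whether a lower number is better, the sketch only reports the relative change without judging it.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark line of the form "name before -> after". */
struct result {
    const char *name;
    double before;
    double after;
};

/* Relative change in percent; negative means the second number is smaller.
   Whether smaller is better depends on what the column measures. */
double percent_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Print each pair together with its relative change. */
void print_changes(const struct result *rows, size_t count) {
    for (size_t i = 0; i < count; i++) {
        printf("%s: %.2f -> %.2f (%+.1f%%)\n", rows[i].name, rows[i].before,
               rows[i].after, percent_change(rows[i].before, rows[i].after));
    }
}
```

For example, the Dhrystone pair `7.32 -> 5.16` works out to a change of about -29.5%.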
2004 Apr 30
5
[LLVMdev] Benchmarks
Dear List, There's been some recent discussion on the list about benchmarks. I just read a Dr. Dobbs article on the relative runtime performance of various compilers (8 of them compared) on Intel platforms. The test focused on mainly template type things but offers Dhrystone and zlib for comparisons. There's no clear winner as all compilers perform well in some areas and poorly in others. The overall rating (Table 2 in the article) ranks the compilers thusly (higher is better): Intel 8.0 9.22 VC++ 7.1 7.56 CodeWarrior 7.44 GCC 3.2 6.67 VC++ 6.0 6.00 Comeau 4.3....
2010 Mar 30
0
[LLVMdev] [cfe-dev] 2.7 Pre-release1 available for testing
...; MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm 1.06 -> 0.9 > MultiSource/Benchmarks/Olden/treeadd/treeadd 11.44 -> 9.89 > MultiSource/Benchmarks/Olden/tsp/tsp 1.14 -> 1.02 > MultiSource/Benchmarks/Ptrdist/anagram/anagram 1.33 -> 1.23 > SingleSource/Benchmarks/Dhrystone/dry 7.32 -> 5.16 > SingleSource/Benchmarks/Dhrystone/fldry 8.02 -> 6.65 > .... > Unfortunately, we just don't have enough man power to have performance be a release criteria at this time. We also need a better infrastructure in place to track this stuff (Daniel is working o...
2008 Dec 15
2
[LLVMdev] A faster instruction selector?
...ceton.edu/software/lcc), in order to see how much faster instruction selection could be. lcc uses a BURG-type (called lburg) instruction-selector. The following is for x86/linux (ubuntu) I am interested in JIT performance so I have only counted user+sys time. Compiling a small test program (the dhrystone benchmark): lcc (Fraser and Hanson) (after pre-processing): 4ms. llc : 16ms. Incidentally, optimisation is respectably fast opt -O3 : 16ms. (My machine is quite slow) Using -time-passes shows that almost all of the t...
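The timings above count only user+sys time. One way to collect that figure for an external tool on a POSIX system is to read `getrusage(RUSAGE_CHILDREN, ...)` before and after running it. This is a sketch under that assumption: `child_cpu_ms` is a hypothetical helper, and the command passed to it (for example an `llc` invocation on a Dhrystone bitcode file) is up to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to milliseconds. */
static double tv_ms(struct timeval tv) {
    return (double)tv.tv_sec * 1000.0 + (double)tv.tv_usec / 1000.0;
}

/* Hypothetical helper: runs `cmd` through the shell and returns the
   user+sys CPU time (ms) charged to terminated, waited-for children during
   the call, or -1.0 if getrusage() or system() fails. */
double child_cpu_ms(const char *cmd) {
    assert(cmd != NULL);
    struct rusage before, after;

    if (getrusage(RUSAGE_CHILDREN, &before) != 0)
        return -1.0;
    if (system(cmd) == -1) /* system() waits for the child to finish */
        return -1.0;
    if (getrusage(RUSAGE_CHILDREN, &after) != 0)
        return -1.0;

    return (tv_ms(after.ru_utime) - tv_ms(before.ru_utime)) +
           (tv_ms(after.ru_stime) - tv_ms(before.ru_stime));
}
```

The resolution of `ru_utime` and `ru_stime` is platform-dependent, so very short runs such as the 4 ms case may need to be repeated and averaged.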
2010 Apr 21
1
[LLVMdev] "Benchmarking LLVM & Clang Against GCC 4.5"
For interest. It looks like LLVM is the ultimate Dhrystone compiler! :) http://www.phoronix.com/scan.php?page=article&item=gcc_llvm_clang&num=1 It's nice that they compared against llvm 2.7 prerelease. -Chris
2002 Oct 22
0
gcc 3.2 performance
...topic, but may be of interest: I've done some comparison of the overall performance obtained with gcc 3.2.0 versus gcc 2.95.3 (linux/x86 platform). Has anybody done similar things, and/or does anybody have any comments on this? (May be handled off-list AFAIAC.) 1) On simple benchmarks (dhrystone, "floating dhrystone", some of my own), 3.2 is faster than 2.95.3, but with -O2 only. With -O3, execution times on these can become 2x longer with 3.2. This appears to be due to -finline-functions, which is activated by -O3. 2) Comparison with my own graphing programme with its own expre...
2004 May 02
0
[LLVMdev] Benchmarks
...> Dear List, > > There's been some recent discussion on the list about benchmarks. I just > read a Dr. Dobbs article on the relative runtime performance of various > compilers (8 of them compared) on Intel platforms. The test focused on > mainly template type things but offers Dhrystone and zlib for > comparisons. I bought the issue and took a look. I suspect that LLVM will do extremely well on these tests, but it doesn't look like there is a publicly available download for his benchmarks. I'm going to email the author and see if we can get a copy. -Chris -- ht...
2010 Apr 27
1
[LLVMdev] Phoronix: Benchmarking LLVM & Clang Against GCC 4.5
On Tue, Apr 27, 2010 at 09:37:53AM +0100, Renato Golin wrote: > On 27 April 2010 08:18, Stefano Delli Ponti > <stefano.delliponti at gmail.com> wrote: > > FYI > > http://www.phoronix.com/scan.php?page=article&item=gcc_llvm_clang&num=1 > > For Apache and Dhrystone, the performance boost is good (but only the > former is really important), but for the rest, especially those with > image/sound processing, and HMMER, it's still far behind. Is this only > because there is no auto vectorization in LLVM? Doesn't llvm-gcc still lack autovectorizat...
2010 Mar 17
9
[LLVMdev] 2.7 Pre-release1 available for testing
The 2.7 binaries are available for testing: http://llvm.org/pre-releases/2.7/pre-release1/ You will also find the source tarballs there as well. We rely on the community to help make our releases great, so please help test 2.7 if you can. Please follow these instructions to test 2.7: To test llvm-gcc: 1) Compile llvm from source and untar the llvm-test in the projects directory (name it
2017 Jun 06
3
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
...if you can point me to the biggest offender, I can have a look. > > So the biggest offenders on the mem_bytes metric in LNT are: > O0 -g O0 -g gisel-with-localizer O0 -g gisel-without-localizer > SingleSource/Benchmarks/Misc/perlin 14272 14640 18344 25.95% > SingleSource/Benchmarks/Dhrystone/dry 16560 17144 20160 18.21% > SingleSource/Benchmarks/Stanford/QueensProfile 13912 14192 15136 6.79% > MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url 71400 72272 75504 4.53% > > I haven't had time to investigate what exact changes make the code size go up that much wit...
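The last column appears to be the gap between the two GlobalISel configurations, taken relative to the `O0 -g` baseline: for perlin, (18344 - 14640) / 14272 ≈ 25.95%, and for Dhrystone, (20160 - 17144) / 16560 ≈ 18.21%; the other two rows check out the same way. The thread excerpt does not state this formula, so treat the helper below (a hypothetical name) as a reconstruction that reproduces the listed values, not as the LNT definition.

```c
#include <assert.h>

/* Hypothetical reconstruction of the percentage column: the difference
   between the two GlobalISel runs, relative to the "O0 -g" baseline.
   Inferred from the numbers; not stated in the thread. */
double localizer_gap_percent(long baseline, long with_localizer,
                             long without_localizer) {
    assert(baseline > 0);
    return (double)(without_localizer - with_localizer) / (double)baseline *
           100.0;
}
```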
2008 Mar 03
1
Speex requirements on a TI Davinci / ARM926EJ-Sid(wb)
...eering project. I have it up and running on the ARM, and I just wanted to see if anyone could sanity-check my results before I continue. Brief version: WB decode takes ~24MIPS, encode takes ~243MIPS. NB decode takes ~10MIPS, encode takes ~102MIPS. (And by MIPS, I mean ARM CPU cycles, not Dhrystone MIPS) If I manually (because I couldn't figure out how to make configure do it) add "-mcpu=arm926ej-s" and "-DSHORTCUTS", the wideband numbers drop to 22MIPS and 219MIPS, respectively. More Details: The commands/options I used to config Speex: # export ARM_I...
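Reading the post's "MIPS" as millions of CPU cycles per second of audio (an assumption; the post only says it means ARM CPU cycles), the figures convert directly into a share of one core and into the saving from the extra flags. The helper names are hypothetical, and the clock value is left to the caller because the post does not give the board's clock speed.

```c
#include <assert.h>

/* Fraction of one core needed for real time, assuming the figure is
   millions of CPU cycles per second of audio. clock_mhz is an input the
   caller must supply; it is not taken from the post. */
double realtime_load(double mcycles_per_sec, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return mcycles_per_sec / clock_mhz;
}

/* Percent of cycles saved by a build change, e.g. 243 -> 219. */
double percent_saved(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

With these helpers, the wideband encode change from 243 to 219 works out to roughly a 9.9% cycle reduction.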
2017 Jun 12
1
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
...an point me to the biggest offender, I can have a > look. > > > So the biggest offenders on the mem_bytes metric in LNT are: > O0 -g O0 -g gisel-with-localizer O0 -g gisel-without-localizer > SingleSource/Benchmarks/Misc/perlin 14272 14640 18344 25.95% > SingleSource/Benchmarks/Dhrystone/dry 16560 17144 20160 18.21% > SingleSource/Benchmarks/Stanford/QueensProfile 13912 14192 15136 6.79% > MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url 71400 72272 > 75504 4.53% > > I haven't had time to investigate what exact changes make the code size go > up that...
2017 May 12
2
FENV_ACCESS and floating point LibFunc calls
On 11 May 2017 at 18:30, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote: > I note that on your bug that you have stated that the branch is faster than > the conditional move. Faster code is a side effect of the fix in this > particular case. On the contrary: the faster code is pretty much the only reason this can happen before the rest of the FENV support lands.
2010 Mar 30
2
[LLVMdev] [cfe-dev] 2.7 Pre-release1 available for testing
...nchmarks/MiBench/telecomm-gsm/telecomm-gsm 1.06 -> 0.9 >> MultiSource/Benchmarks/Olden/treeadd/treeadd 11.44 -> 9.89 >> MultiSource/Benchmarks/Olden/tsp/tsp 1.14 -> 1.02 >> MultiSource/Benchmarks/Ptrdist/anagram/anagram 1.33 -> 1.23 >> SingleSource/Benchmarks/Dhrystone/dry 7.32 -> 5.16 >> SingleSource/Benchmarks/Dhrystone/fldry 8.02 -> 6.65 >> .... >> > > Unfortunately, we just don't have enough man power to have performance be a release criteria at this time. We also need a better infrastructure in place to track this stuff...
2017 Sep 05
4
Lowering llvm.memset for ARM target
As reported in an earlier thread (http://clang-developers.42468.n3.nabble.com/Disable-memset-synthesis-tp4057810.html), we noticed in some cases that the llvm.memset intrinsic, if lowered to stores, could help with performance. Here's a test case: If LIMIT is > 8, I see that a call to memset is emitted for arm & aarch64, but not for x86 target. typedef struct { int v0[100]; }
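The excerpt's own test case is cut off after the struct definition, so the code below is not it. It is a hypothetical function of a similar shape that clears the first `LIMIT` ints of a 100-element array; `hypothetical_buf`, `clear_prefix`, and the default `LIMIT` of 16 are all assumptions. At -O2, LLVM's loop idiom recognition may turn such a loop into an `llvm.memset` intrinsic, and the backend then decides whether to expand it into stores or emit a `memset` call. Compiling it for ARM, AArch64, and x86 with different `-DLIMIT=` values and inspecting the assembly is one way to look at the behavior reported in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default; override with -DLIMIT=<n>. The thread reports a
   memset call on ARM/AArch64 once LIMIT exceeds 8. */
#ifndef LIMIT
#define LIMIT 16
#endif

#if LIMIT < 1 || LIMIT > 100
#error "LIMIT must be between 1 and 100 for this sketch"
#endif

/* Hypothetical type, shaped like the struct in the excerpt. */
typedef struct {
    int v0[100];
} hypothetical_buf;

/* Zero the first LIMIT elements. LLVM may recognize this loop as a memset
   idiom; how the resulting llvm.memset is lowered is target-dependent. */
void clear_prefix(hypothetical_buf *b) {
    assert(b != NULL);
    for (int i = 0; i < LIMIT; i++) {
        b->v0[i] = 0;
    }
}
```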