Martijn van Beurden
2022-Nov-26 17:12 UTC
[flac-dev] Performance tests on POWER8 or POWER9
Hi all,

Last year the POWER8- and POWER9-specific improvements (for PowerPC) were completely rewritten, but as of yet no accurate performance tests of them have been performed. I have validated functionality and done rough performance checks through Travis CI, but the numbers I get that way vary wildly.

Recently the C code that these improvements mirror has been changed to allow compilers to autovectorize it. It would be nice if someone with access to POWER hardware could compare builds with and without the POWER-specific improvements. In other words, a plain build and one with either --disable-asm-optimizations (for an autotools build) or -DWITH_ASM=0 (for CMake).

If anyone could do such a comparison, that would be great!

Kind regards,

Martijn van Beurden
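For anyone who wants to try the comparison requested above, a minimal timing sketch in Python might look like the following. It is only an illustration: the script name, the binary paths, and the input file in the usage comment are hypothetical, and the run count of 5 is an arbitrary default.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is not timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage, once per build being compared:
    #   python bench.py ./build-default/flac -8 -f -o out.flac input.wav
    #   python bench.py ./build-noasm/flac   -8 -f -o out.flac input.wav
    times = time_command(sys.argv[1:])
    print("runs:", " ".join(f"{t:.2f}" for t in times))
    print(f"best: {min(times):.2f} s, worst: {max(times):.2f} s")
```

Running it once against each build on the same input file and comparing the best times gives a first impression. The spread between best and worst shows how noisy the machine is.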
Martijn van Beurden
2022-Dec-03 09:49 UTC
[flac-dev] Performance tests on POWER8 or POWER9
Hi all,

I've tried once again to get performance figures through Travis CI for powerpc. It took quite a few tries, but I got a clean run here: https://app.travis-ci.com/github/ktmf01/flac/builds/258454024

Three different builds are compared: one with the default configuration, one with arch-specific optimizations turned off, and one with both arch-specific optimizations and associative math disabled (associative math is required for autovectorization, so this build has no vectorization at all). If you open the build logs, the bottom of each shows 8 execution times of the same test. In many Travis runs these figures varied widely, sometimes by as much as 100%. However, the linked results fall within a very narrow margin, so they should be dependable. They are also the lowest numbers I've found overall.

The results are actually the inverse of what you'd expect. The build with all (auto)vectorization disabled is the fastest with a time of 23.1 sec. The one with autovectorization (so with arch-specific optimization disabled) is slightly slower at 23.9 sec. The default build with arch-specific optimization is the slowest at 24.0 sec. This difference is too small to be conclusive.

So, while there is some uncertainty as to which build is the fastest, it seems the PPC-specific code brings no improvement whatsoever. My plan is to remove all PPC-specific code unless someone can convince me otherwise in the next few weeks. This removal has the added benefit of reducing the amount of code that is not being regularly tested and fuzzed.

Kind regards,

Martijn van Beurden
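To put the three reported times in proportion, the gaps can be expressed relative to the fastest build. The Python sketch below does that with the figures quoted above. The `distinguishable` helper is an assumed rule of thumb for illustration (a gap counts only if it exceeds the run-to-run spread of either sample), not a method used in the message.

```python
def spread(times):
    """Relative run-to-run spread of one build: (max - min) / min."""
    fastest = min(times)
    return (max(times) - fastest) / fastest


def relative_gap(baseline, other):
    """How much slower `other` is than `baseline`, as a fraction of `baseline`."""
    return (other - baseline) / baseline


def distinguishable(times_a, times_b):
    """Assumed rule of thumb: the gap between the best times must exceed
    the larger of the two run-to-run spreads to count as a real difference."""
    gap = abs(relative_gap(min(times_a), min(times_b)))
    return gap > max(spread(times_a), spread(times_b))


# Best times in seconds, as reported in the message above.
reported = {
    "no vectorization": 23.1,
    "autovectorized": 23.9,
    "arch-specific": 24.0,
}
baseline = reported["no vectorization"]
for name, best in reported.items():
    print(f"{name}: {best:.1f} s ({relative_gap(baseline, best):+.1%})")
```

With these figures both vectorized builds come out under 4% slower than the build without vectorization. Whether that counts as a real difference depends on the per-run spread in the build logs, which `distinguishable` would need as input.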