Tomas Kalibera
2022-Apr-13 06:00 UTC
[Rd] Matrix issues when building R with znver3 architecture under GCC 11
Hi Kieran,

On 4/12/22 02:36, Kieran Short wrote:
> Hello,
>
> I'm new to this list, and have subscribed particularly because I've come
> across an issue with building R from source with an AMD-based Zen
> architecture under GCC11. Please don't attack me for my linux operating
> system choice, but it is Ubuntu 20.04 with Linux Kernel 5.10.102.1 -
> microsoft-standard-WSL2. I've built GCC11 using GCC8 (the standard GCC
> under Ubuntu20.04 WSL release), under Windows11 with wslg. WSL2/g runs as a
> hypervisor with ports to all system resources including display, GPU (cuda,
> etc).
>
> The reason why I am posting this email is that I am trying to compile R
> using the AMD Zen3 platform architecture rather than x86/64, because it has
> processor-specific optimizations that improve performance over the standard
> x86/64 in benchmarks. The Zen3 architecture optimizations are not available
> in earlier versions of GCC (actually, they have possibly been backported to
> GCC10 now). Since Ubuntu 20.04 doesn't have GCC11, I compiled the GCC11
> compiler using the native GCC8.
>
> The GCC11 I have built can build R 4.1.3 with a standard x86-64
> architecture and pass all tests with "make check-all".
> I configured that with:
>> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
> CXXFLAGS="-O3 -march=x86-64" CFLAGS="-O3 -march=x86-64" FFLAGS="-O3
> -march=x86-64" --enable-memory-profiling --enable-R-shlib
> and built with
>> make -j 32 -O
>> make check-all
> ## PASS.
>
> So I can build R in my environment with GCC11.
> In configure, I am using references to "gcc-11.2" "gfortran-11.2" and
> "g++-11.2" because I compiled GCC11 compilers with these suffixes.
>
> Now, I'm using a 32 thread (16 core) AMD Zen3 CPU (a 5950x), and want to
> use it to its full potential. Zen3 optimizations are available as a
> -march=znver3 option in GCC11. The znver3 optimizations improve performance
> in Phoronix Test Suite benchmarks (I'm not aware of anyone that has
> compiled R with them). See:
> https://www.phoronix.com/scan.php?page=article&item=amd-5950x-gcc11
>
> However, the R 4.1.3 build (made with "make -j 32 -O"), configured with
> -march=znver3, produces an R that fails "make check-all".
>
>> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
> CXXFLAGS="-O2 -march=znver3" CFLAGS="-O2 -march=znver3" FFLAGS="-O2
> -march=znver3" --enable-memory-profiling --enable-R-shlib
> or
>> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
> CXXFLAGS="-O3 -march=znver3" CFLAGS="-O3 -march=znver3" FFLAGS="-O3
> -march=znver3" --enable-memory-profiling --enable-R-shlib
>
> The fail is always in the factorizing.R Matrix.R tests, and in particular,
> there are a number of errors and a fatal error.
> I have attached the output because I cannot really understand what is going
> wrong. But results returned from matrix calculations are obviously odd with
> -march=znver3 in GCC 11. There is another backwards-compatible architecture
> option "znver2" and this has EXACTLY the same result.
>
> While there are other warnings and errors (many in assert.EQ() ), the
> factorizing.R script continues. The fatal error (at line 2662 in the
> attached factorizing.Rout.fail text file) is:
>
>> ## problematic rank deficient rankMatrix() case -- only seen in large
> cases ??
>> Z. <- readRDS(system.file("external", "Z_NA_rnk.rds", package="Matrix"))
>> tools::assertWarning(rnkZ. <- rankMatrix(Z., method = "qr")) # gave errors
> Error in assertCondition(expr, classes, .exprString = d.expr) :
> Failed to get warning in evaluating rnkZ. <- rankMatrix(Z., method ...
> Calls: <Anonymous> -> assertCondition
> Execution halted
>
> Can anybody shed light on what might be going on here? 'make check-all'
> passes all the other checks. It is just factorizing.R in Matrix that fails
> (other matrix tests run ok).
> Sorry this is a bit long-winded, but I thought details might be important.

R gets used and tested most with the default optimizations, without use of model-specific instructions and with -O2 (GCC). It happens from time to time that some people try other optimization options and run into problems. In principle, there are these cases (seen before):

(1) the test in the R package (or R) is wrong - it (unintentionally) expects behavior which has been observed in builds with default optimizations, but is not necessarily the only correct one; in case of numerical tolerances set empirically, they could simply be too tight

(2) the algorithm in the R package or R has a bug - the result is really wrong and it is because the algorithm is (unintentionally) not portable enough, it (unintentionally) only works with default optimizations or lower; in case of numerical results, this can be because it expects more precision from the floating point computations than mandated by IEEE, or assumes behavior not mandated

(3) the optimization by design violates some properties the algorithm knowingly depends on; with numerical computations, this can be a sort of "fast" (and similarly referred to) mode which violates the IEEE floating point standard by design, with the aim of better performance; due to the nature of the algorithm depending on IEEE, and poor luck, the results end up completely wrong

(4) there is a bug in the C or Fortran compiler (GCC as we use GCC) that only shows up with the unusual optimizations; the compiler produces wrong code

So, when you run into a problem like this and want to get it fixed, the first thing is to identify which of the above cases it is, and in case of (1) and (2) also to differentiate between base R and a package (and which concrete package). Different people maintain these things and you would ideally narrow down the problem to a very small, isolated, reproducible example to support your claim about where the bug is. If you do this right, the problem can often get fixed very fast.
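As a first step toward telling the cases above apart, it can help to see which model-specific instruction sets a given -march value switches on relative to the default. The following is a minimal sketch, not something from the thread: the function names are made up for illustration, and it assumes a GCC-compatible driver that supports `-Q --help=target` (gcc-11.2 here).

```shell
# Sketch only: function names are hypothetical helpers, not part of GCC or R.

# Print the target (-m...) options the compiler reports as enabled
# for one -march value, one option per line, sorted.
enabled_target_opts() {
    local cc="$1" march="$2"
    # -Q --help=target lists each option with its state, e.g. "[enabled]".
    "$cc" -Q -O2 -march="$march" --help=target |
        awk '$2 == "[enabled]" { print $1 }' | LC_ALL=C sort
}

# Print the options enabled for the tuned -march but not for the base one.
extra_target_opts() {
    local cc="$1" base="$2" tuned="$3" tmp
    tmp=$(mktemp -d) || return 1
    enabled_target_opts "$cc" "$base"  > "$tmp/base"
    enabled_target_opts "$cc" "$tuned" > "$tmp/tuned"
    # comm -13 keeps only the lines unique to the second (tuned) file.
    LC_ALL=C comm -13 "$tmp/base" "$tmp/tuned"
    rm -rf "$tmp"
}
```

With the compiler from this thread, the call would be `extra_target_opts gcc-11.2 x86-64 znver3`. If, for example, `-mfma` shows up in the output, fused multiply-add contraction would be one candidate to examine under case (2) or (3); that is a guess to test, not a diagnosis.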
Such an example for (1) could be: a few lines of standalone R code using Matrix that produce correct results, but the test is not happy. With pointers to the real check in the tests that is wrong. And an explanation why the result is wrong.

For (2)-(4) it would be a minimal standalone C/Fortran example including only the critical function/part of the algorithm that is not correct/not portable/not compiled correctly, with results obtained with optimizations where it works and where it doesn't. The exception is when you find an obvious bug in R that is easy to explain (2); then the example would not have to be standalone. With such a standalone C example, you could easily test the results with different optimizations and compilers, it is easier to analyze, and easier to produce a bug report for GCC. What would make it harder in this case is that it needs special hardware, but you could still try with the example, and worry about that later (one option is running in an emulator, and again a standalone example really helps here). In principle, as it needs special hardware, the chances that someone else would do this work are smaller. Indeed, if it turns out to be (3), it is unlikely to get resolved, but at least it would get isolated (you would know what not to run).

As a user, you may run into a problem like this and not want to get it fixed, but just work around it somehow. First, that may be dangerous: one could get incorrect results from computations, though it may be acceptable in, say, applications where the results are verified externally. You could try disabling individual specific optimizations until the tests pass. You could try with later versions of gcc-11 (even unreleased) or gcc-12. Still, a lot of this is easier with a small example, too. You could ignore the failing test. And it may not be worth it - it may be that you could get your speedups in a different, but more reliable way.
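The idea of testing one standalone example under several optimization settings can be sketched roughly as follows. This is only an illustration under stated assumptions: `repro.c` is a hypothetical standalone C file that prints its numerical results to stdout, and the function names are invented for this sketch.

```shell
# Sketch only: repro.c and both function names are hypothetical.

# Compile SRC with the given flags, run the result, print its output.
run_with_flags() {
    local cc="$1" src="$2" flags="$3" exe rc
    exe=$(mktemp) || return 1
    # $flags is deliberately unquoted so it splits into separate options.
    "$cc" $flags -o "$exe" "$src" -lm && "$exe"
    rc=$?
    rm -f "$exe"
    return $rc
}

# Compare the output under each flag set with the output under the first.
compare_flag_sets() {
    local cc="$1" src="$2" ref out flags
    shift 2
    ref=$(run_with_flags "$cc" "$src" "$1")
    for flags in "$@"; do
        out=$(run_with_flags "$cc" "$src" "$flags")
        if [ "$out" = "$ref" ]; then
            echo "same    $flags"
        else
            echo "DIFFERS $flags"
        fi
    done
}
```

A call such as `compare_flag_sets gcc-11.2 repro.c "-O2" "-O2 -march=znver3"` would then flag the settings whose printed results differ from the reference build. Printing enough digits in the example (for instance with `%.17g`) matters, since small differences are otherwise hidden.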
Using wsl2 on its own should not necessarily be a problem and the way you built gcc from the description should be ok, but at some point it would be worth checking under Linux and running natively - because even if these are numerical differences, they could be in principle caused by running on Windows (or in wsl2), at least in the past such differences were seen (related to (2) above). I would recommend checking on Linux natively once you have at least a standalone R example.

Best
Tomas

>
> best regards,
> Kieran
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
Kieran Short
2022-Apr-13 09:20 UTC
[Rd] Matrix issues when building R with znver3 architecture under GCC 11
Hi Tomas,

Many thanks for your thorough response, it is very much appreciated and what you say makes perfect sense to me. I was relying on the in-built R compilation checks, and I have been working on the assumption that everything on the R side is correct (including the Matrix package). Indeed, R 4.1.3 builds and "make check-all" passes with the more general -march=x86-64 architecture compiled with -O3 optimizations (in my hands, on the Zen3 system). So I had no underlying reason to believe R or its packages were the problem when -march=znver3 was trialed. I found it interesting that it was only the one factorizing.R script in the Matrix suite that failed (out of the seemingly hundreds of remaining checks overall which passed). So I was more wondering if there might have been prior knowledge within the brains trust on this list that "oh the factorizing.R matrix test does ABC error when R or the package is compiled with GCC using XYZ flags". As you'll read ahead, you can say that now. :)

I don't think I have the capability to determine the root trigger in R itself, the package, or the compiler (whichever one, or combination, it actually is). However, assuming R isn't the issue, what I have done is go through the GCC optimizations, and I have now isolated the culprit optimization which crashes factorizing.R. It is "-fexpensive-optimizations". If I use "-fno-expensive-optimizations" paired with -O2 or -O3 optimizations, all "make check-all" checks pass.
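A search of this kind, disabling one optimization at a time until the check passes, could be scripted roughly like this. It is a sketch under assumptions and not the procedure actually used in this thread: the function names are invented, and `check` stands for a hypothetical user-supplied command that rebuilds with the extra flag it is given and exits 0 when the failing test passes.

```shell
# Sketch only: function names and the check command are hypothetical.

# List the positive -f optimization flags enabled at a given -O level.
enabled_optimizers() {
    local cc="$1" level="$2"
    "$cc" -Q "$level" --help=optimizers |
        awk '$2 == "[enabled]" && $1 ~ /^-f/ && $1 !~ /^-fno-/ && $1 !~ /=/ { print $1 }'
}

# -fexpensive-optimizations -> -fno-expensive-optimizations
negate_flag() {
    printf '%s\n' "$1" | sed 's/^-f/-fno-/'
}

# Try each disabled form in turn; print the first one that makes CHECK pass.
find_culprit() {
    local cc="$1" level="$2" check="$3" flag off
    for flag in $(enabled_optimizers "$cc" "$level"); do
        off=$(negate_flag "$flag")
        if "$check" "$off"; then
            echo "$off"
            return 0
        fi
    done
    return 1
}
```

For a full R build each call to the check command means a reconfigure, rebuild and test run, so in practice this is slow; it becomes much cheaper once the failure is reduced to a small standalone example.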
So I can build a fully checked and passed R 4.1.3 under my environment now with:

    ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2 \
        CXXFLAGS="-O3 -march=znver3 -fno-expensive-optimizations -flto" \
        CFLAGS="-O3 -march=znver3 -fno-expensive-optimizations -flto" \
        FFLAGS="-O3 -march=znver3 -fno-expensive-optimizations -flto" \
        --enable-memory-profiling --enable-R-shlib

I'm yet to benchmark whether the loss of that particular optimization flag negates the advantages of using znver3 as a core architecture target over an x86-64 target in the first place. So I think I've solved my own problem (at least, it appears that way based on the checks).

So the remaining question is, what method or package does the development team use (if any) for testing the speed of various base R calculations?

best regards,
Kieran
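For the benchmarking step mentioned above, one simple way to compare two builds on the same expression is to let each build time it with R's own system.time(). This is a minimal sketch and not an answer on what the R developers use; the function names and any install paths are hypothetical.

```shell
# Sketch only: function names are hypothetical; each argument after the
# expression is the Rscript binary of one build to compare.

# Print the elapsed seconds R reports for evaluating EXPR.
# EXPR must not contain double quotes, since it is embedded in a
# double-quoted shell string.
elapsed_seconds() {
    local rscript="$1" expr="$2"
    "$rscript" -e "cat(system.time({ $expr })[['elapsed']], '\n', sep = '')"
}

# Time the same expression with each build and print one line per build.
compare_builds() {
    local expr="$1" rscript
    shift
    for rscript in "$@"; do
        printf '%s\t%s\n' "$rscript" "$(elapsed_seconds "$rscript" "$expr")"
    done
}
```

An invocation might look like `compare_builds 'set.seed(1); m <- matrix(rnorm(1e6), 1000); qr(m)' /path/to/R-x86-64/bin/Rscript /path/to/R-znver3/bin/Rscript`, with the paths replaced by wherever the two builds live. A single run is noisy, so repeating it several times and comparing medians would give a fairer picture.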