Paul Johnson
2016-Oct-17 23:44 UTC
[Rd] Cluster: Various GCC, how important is consistency?
On a cluster that is based on RedHat 6.2, we are updating to R-3.3.1. I have, from time to time, run into problems with various R packages and some older versions of GCC. I wish we had newer Linux in the cluster, but with 1000s of nodes running 1000s of jobs, well, they don't want a restart.

Administrator suggested I try to build with the GCC that is provided with the nodes, which is gcc-4.4.7. To my surprise, R-3.3.1 compiled with that. After that, I got quite far, many 100s of packages compiled, but then I hit a snag: RcppArmadillo explicitly refuses to build with anything older than gcc-4.6. The OpenMx and emplik packages also refuse to compile with old gcc.

The cluster uses a module system, so it is easy enough to swap in various gcc versions to see what compiles.

I did succeed in compiling RcppArmadillo with gcc 4.9.2. But Rcpp is not picky; it compiled with gcc-4.4.7.

I worry...

1) Will reliance on various GCC versions make the packages incompatible with R, or with each other?

I logged out, logged back in, and with R 3.3.1 I can run

    library(RcppArmadillo)
    library(Rcpp)

with no errors so far. But I'm not stress testing it much.

Should I rebuild everything?

I expect that if I were to use gcc-6 on one package, it would not be compatible with binaries built with 4.4.7. But is there a zone of tolerance allowing 4.4.7 and 4.9 packages to coexist?

2) If I build with a non-default GCC, are all of the R users going to hit trouble if they don't have the same GCC I use? Unless I make some extraordinary effort, they are getting GCC 4.4.7. If they try to install a package, they are getting that GCC, not the one I use to build RcppArmadillo or the other trouble cases (or everything, if you say I need to go back and rebuild).

From an administrative point of view, should I tie R-3.3.1 to a particular version of GCC? I think I could learn how to do that.

On the cluster, they use the module framework. There are about 50 versions of GCC. It is easy enough to ask for a newer one:

    $ module load gcc/4.9.2

It puts the gcc 4.9.2 binaries and shared libraries at the front of the PATHs.

pj

--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write me directly, address me at pauljohn at ku.edu.
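For anyone following along, the "is the active gcc new enough?" question can be checked before starting a long build. The following is only a sketch: the helper name version_ge is made up for illustration, the 4.6 threshold is the one reported above for RcppArmadillo, and the comparison assumes GNU sort with the -V option is available on the node.

```sh
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Hypothetical helper; relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Threshold taken from the thread (RcppArmadillo refuses gcc older than 4.6).
required="4.6"
# Falls back to "0" when no gcc is on the PATH.
current="$(gcc -dumpversion 2>/dev/null || echo 0)"

if version_ge "$current" "$required"; then
    echo "gcc $current is new enough"
else
    echo "gcc $current is too old; try: module load gcc/4.9.2"
fi
```

Running this once before and once after the module load shows which compiler a subsequent package install would pick up from the PATH.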
Simon Urbanek
2016-Oct-18 01:05 UTC
[Rd] Cluster: Various GCC, how important is consistency?
There are many issues with different gcc versions, but they can at least be minimized by using static linking, i.e. you should at the very least use

    -static-libstdc++ -static-libgcc

to make sure you don't mix runtime versions. We run into the same problem since C++11 compilers are rare on production machines, but as long as you can isolate the packages away from the dynamically loaded code it often works since R only works at symbol level as long as you have a self-contained binary. The only other thing to worry about are ABI changes, but unless you use Fortran they tend to be compatible enough.

Cheers,
Simon
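One way the static-linking advice might be applied is through a per-user Makevars file. This is a sketch under assumptions not stated in the thread: that R reads ~/.R/Makevars when building packages, that LDFLAGS reaches the link step for a package's shared object, and that the selected GCC accepts both flags (the stock 4.4.7 may not). The function name write_static_makevars is made up for illustration.

```sh
#!/bin/sh
# write_static_makevars DIR
# Writes DIR/Makevars asking the linker to embed the GCC runtime libraries.
# Assumption: R passes LDFLAGS to the link step for package shared objects
# and reads a per-user Makevars file from ~/.R/.
write_static_makevars() {
    mkdir -p "$1" || return 1
    cat > "$1/Makevars" <<'EOF'
# Flags suggested in the thread; they need a GCC that accepts them.
LDFLAGS += -static-libstdc++ -static-libgcc
EOF
}

# Demonstrate in a scratch directory rather than touching a real ~/.R.
demo_dir="$(mktemp -d)"
write_static_makevars "$demo_dir"
cat "$demo_dir/Makevars"
```

To try it for real, the same function could be pointed at "$HOME/.R" after backing up any existing Makevars there; whether the resulting packages load cleanly still has to be tested on the cluster.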
Gabriel Becker
2016-Oct-18 04:27 UTC
[Rd] Cluster: Various GCC, how important is consistency?
This absolutely causes its own problems (and they may be bad enough that you shouldn't do it) but you can also install an older version of RcppArmadillo. My switchr package makes this more convenient from within R, but grabbing tarballs from the CRAN web archive also works (in fact that's what switchr will do in this case).

This, of course, will never be more than a stopgap. Eventually, sadly, you'll likely need a newer operating system. We have the same problems on our cluster.

Best of luck,
~G
Jeroen Ooms
2016-Oct-18 10:55 UTC
[Rd] Cluster: Various GCC, how important is consistency?
On Tue, Oct 18, 2016 at 1:44 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> Administrator suggested I try to build with the GCC that is provided
> with the nodes, which is gcc-4.4.7.

Redhat provides an alternative compiler (gcc 5.3 based) in one of its opt-in repositories called "Red Hat Developer Toolset" (RDT). In CentOS you install it as follows:

    yum install -y centos-release-scl
    yum install -y devtoolset-4-gcc-c++

This compiler is specifically designed to be used alongside the EL6 stock gcc 4.4.7. It includes a simple 'enable' script which will put RDT gcc and g++ in front of your PATH and LD_LIBRARY_PATH and so on.

So what I do on CentOS is install R from EPEL (built with stock gcc 4.4.7) and whenever I need to install an R package that uses e.g. CXX11, simply start an R shell using the RDT compilers:

    source /opt/rh/devtoolset-4/enable
    R

From what I have been able to test, this works pretty well (though I am not a regular EL user). But I was able to build R packages that use C++11 (such as feather) and once installed, these packages can be used even in a regular R session (without RDT enabled).
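To see which compiler runtime a package built this way actually depends on, its shared object can be inspected with ldd. The sketch below is illustrative only: the function names are made up, and the path passed in is whatever .so file the package installed (nothing here is specific to devtoolset or to any particular package).

```sh
#!/bin/sh
# filter_runtime_libs: keep only the GCC runtime lines from ldd-style input.
filter_runtime_libs() {
    grep -E 'libstdc\+\+|libgcc_s|libgfortran'
}

# check_shared_object FILE: show which GCC runtime libraries the dynamic
# loader would resolve for FILE in the current environment.
check_shared_object() {
    ldd "$1" | filter_runtime_libs
}

# Only runs when a path is given, e.g. the .so under a package's libs/ directory.
if [ -n "${1:-}" ]; then
    check_shared_object "$1"
fi
```

Running it once in a plain session and once after sourcing the enable script would show whether the resolved libstdc++ path changes between the two environments. A statically linked package would be expected to print nothing for libstdc++.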
Paul Johnson
2016-Oct-18 15:04 UTC
[Rd] Cluster: Various GCC, how important is consistency?
Dear Jeroen,

Did you rebuild R-3.3.1 and all of the packages with GCC-5.3 in order to make this work?

The part that worries me is that the shared libraries won't be consistent, with various versions of GCC in play.

pj