Kevin Egan
2020-Aug-07 20:24 UTC
[R] Reproducibility Between Local and Remote Computer with R
I posted this question:

> I am currently using R, RStudio, and a remote computer (using an R script) to run the same code. I start by calling set.seed(123) in all three versions of the code, then use glmnet to assess a matrix. Ultimately, I am having trouble reproducing the results between my local and remote computers. I am using R version 4.0.2 locally and R version 3.6.0 remotely.
>
> After running several tests, I'm wondering if there is a difference between the two versions of R which may lead to slightly different coefficients. If anyone has any insight I would appreciate it.
>
> Thanks.

and found that there were slight differences in the output of rnorm between R-4.0.2 and R-3.6.0, but no differences for runif between the two systems. My original code uses rnorm, and I was wondering whether this may be the reason I am finding slight differences in the coefficients from glmnet and lars between my local computer (R-4.0.2) and my remote computer (R-3.6.0). I am running my code locally on Mac OS X and remotely on what I believe is an HPC.

Thanks.
Jeff Newmiller
2020-Aug-08 13:17 UTC
[R] Reproducibility Between Local and Remote Computer with R
Compare the sessionInfo outputs for the different environments.
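One way to make that comparison less error-prone than reading two printouts side by side is to collect the relevant fields into a list on each machine and diff them. The following is only a sketch: the helper names `env_summary()` and `diff_env()` are made up for illustration, and the choice of fields is an assumption about what is likely to matter here.

```r
# Collect the parts of the environment most likely to affect numeric results.
# env_summary() and diff_env() are hypothetical helper names.
env_summary <- function() {
  si <- sessionInfo()
  list(
    r_version = R.version.string,
    platform  = si$platform,
    running   = si$running,
    blas      = si$BLAS,
    lapack    = si$LAPACK,
    rng_kind  = RNGkind(),
    locale    = Sys.getlocale()
  )
}

# Return the names of the fields that differ between two summaries.
diff_env <- function(a, b) {
  keys <- union(names(a), names(b))
  same <- vapply(keys, function(k) identical(a[[k]], b[[k]]), logical(1))
  keys[!same]
}

local_env <- env_summary()
print(local_env)
```

In practice each machine could write its summary with saveRDS(), and the two files could then be read on one machine and passed to diff_env().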
Marc Schwartz
2020-Aug-08 13:34 UTC
[R] Reproducibility Between Local and Remote Computer with R
Hi,

I was initially going to suggest that the change in the RNG might be the source; however, that change was made in 3.6.0 and would have applied to runif() and sample():

"sample.kind can be "Rounding" or "Rejection", or partial matches to these. The former was the default in versions prior to 3.6.0: it made sample noticeably non-uniform on large populations, and should only be used for reproduction of old results. See PR#17494 for a discussion."

Three other possibilities:

1. Read news() for your local 4.0.2 installation, as some changes were made, including changes to round(), that could be applicable here.

2. Check whether the version of glmnet is the same on both machines. There have been changes to that package that might be relevant here, and you might read the README and NEWS files for the package on CRAN to see if there is any relevant information there.

3. There is always a chance that different hardware and OS versions could lead to differences, especially out at a number of decimal places, that could alter results.

If you, or an admin on your behalf, have the ability to update the remote machine (both R and installed packages), that can help to reduce the number of variables at play here.

Regards,

Marc Schwartz
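For the second point, the installed versions can be printed on each machine and compared. This is a minimal sketch; `pkg_versions()` is a hypothetical helper name, and the package list is only a guess at what the original script loads.

```r
# Report the installed version of each package, or NA if it is not installed.
# pkg_versions() is a hypothetical helper name.
pkg_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    tryCatch(as.character(packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
}

# Assumed package list; adjust to whatever the script actually uses.
print(pkg_versions(c("glmnet", "lars", "foreach", "Matrix")))
```

Running the same call locally and remotely gives two named character vectors that can be compared directly.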
Kevin Egan
2020-Aug-08 14:15 UTC
[R] Reproducibility Between Local and Remote Computer with R
Local:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
 [1] crayon_1.3.4 dplyr_1.0.0 R6_2.4.1 lifecycle_0.2.0 magrittr_1.5 pillar_1.4.3
 [7] rlang_0.4.7 rstudioapi_0.11 vctrs_0.3.1 generics_0.0.2 ellipsis_0.3.0 tools_4.0.2
[13] glue_1.4.1 purrr_0.3.4 yaml_2.2.1 compiler_4.0.2 pkgconfig_2.0.3 tidyselect_1.1.0
[19] tibble_3.0.1

Remote:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /ddn/apps/Cluster-Apps/intel/2019.5/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.6.3
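The two machines use different BLAS/LAPACK libraries (Accelerate locally, Intel MKL remotely). Different libraries may add up the same numbers in a different order, and floating-point addition is not associative, so tiny discrepancies in fitted coefficients would not be surprising. The sketch below only illustrates that general effect and a tolerance-based comparison; it does not show that this is what happened here, and `coef_match()` is a hypothetical helper name with an arbitrarily chosen tolerance.

```r
# The same three numbers, added in two different orders.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

print(identical(a, b))          # exact comparison: FALSE
print(isTRUE(all.equal(a, b)))  # comparison with tolerance: TRUE

# Compare two coefficient vectors up to a tolerance instead of exactly.
# coef_match() is a hypothetical helper; tol = 1e-8 is an assumed default.
coef_match <- function(x, y, tol = 1e-8) {
  isTRUE(all.equal(x, y, tolerance = tol))
}

print(coef_match(c(a, 1), c(b, 1)))
```

If coefficients from the two machines agree under such a tolerance, the remaining difference is more likely numerical noise than a different random number stream.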
Abby Spurdle
2020-Aug-08 23:05 UTC
[R] Reproducibility Between Local and Remote Computer with R
Hi Kevin,

Intuitively, the first step would be to ensure that all versions of R, and all the R packages, are the same.

However, you mention HPC. And the glmnet package imports the foreach package, which appears (after a quick glance) to support multi-core and parallel computing.

If your code uses parallel computing (?), you may need to look at how random numbers, and related results, are handled...
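If parallel execution is involved, one common approach is to give each task its own L'Ecuyer-CMRG stream, so that results do not depend on which worker runs which task or in what order. The sketch below runs the tasks sequentially with lapply() purely to show the idea; `run_task()` and `n_tasks` are illustrative names, and whether the original script runs anything in parallel at all (for example via the `parallel = TRUE` argument of cv.glmnet) is an assumption.

```r
library(parallel)

# Switch to a generator that supports independent streams, then seed it.
RNGkind("L'Ecuyer-CMRG")
set.seed(123)

# Derive one independent stream per task from the seeded state.
n_tasks <- 4
streams <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Each task installs its own stream before drawing random numbers.
run_task <- function(i) {
  assign(".Random.seed", streams[[i]], envir = globalenv())
  rnorm(3)
}

# The result for a given task is the same whatever the execution order.
forward  <- lapply(1:4, run_task)
backward <- rev(lapply(4:1, run_task))
print(identical(forward, backward))
```

With real workers, parallel::clusterSetRNGStream() (for clusters) or mclapply(..., mc.set.seed = TRUE) under "L'Ecuyer-CMRG" set up per-worker streams in a similar spirit.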
Duncan Murdoch
2020-Aug-08 23:13 UTC
[R] Reproducibility Between Local and Remote Computer with R
On 08/08/2020 9:34 a.m., Marc Schwartz via R-help wrote:
> Hi,
>
> I was initially going to think that the change in the RNG might be the source, however, that change was made in 3.6.0 and would have applied to runif() and sample():
>
> "sample.kind can be "Rounding" or "Rejection", or partial matches to these. The former was the default in versions prior to 3.6.0: it made sample noticeably non-uniform on large populations, and should only be used for reproduction of old results. See PR#17494 for a discussion."

That still may be an issue. If a user saves a workspace in an old version and reloads it in a newer version, I believe they get the old version of the RNG. You need to check that the output of RNGkind() matches on all machines to know that they're using the same RNGs.

Duncan Murdoch
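The check described above, and one way to remove the ambiguity, can be sketched as follows. The three kinds set here are the defaults from R 3.6.0 onward; pinning them explicitly at the top of the script is a suggestion, not something the thread confirms was needed. The `sample.kind` argument is not available before R 3.6.0.

```r
# Show the generators currently in use:
# uniform generator, normal generator, and sample() method.
print(RNGkind())

# Pin all three explicitly so a reloaded workspace cannot change them.
RNGkind(kind = "Mersenne-Twister",
        normal.kind = "Inversion",
        sample.kind = "Rejection")

# With the kinds fixed, the same seed gives the same draws within a session.
set.seed(123)
x1 <- rnorm(5)
set.seed(123)
x2 <- rnorm(5)
print(identical(x1, x2))
```

Printing RNGkind() at the start of both the local and the remote run makes any mismatch visible in the logs.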
stephen sefick
2020-Aug-08 23:18 UTC
[R] Reproducibility Between Local and Remote Computer with R
Caveat: I have only skimmed this email thread, so please forgive me if I have missed something.

Are you able to use renv, packrat, Docker, or Anaconda? Your compute environments are very different.

Kindest regards,

Stephen Sefick
Kevin Egan
2020-Aug-09 12:33 UTC
[R] Reproducibility Between Local and Remote Computer with R
Hi Abby, After running a few tests on my local and remote versions of R, this seems to be the most plausible answer to the problem. I put set.seed(123) several times within my code and produced the same results but would rather not have to do that if possible.
stephen sefick
2020-Aug-09 13:42 UTC
[R] Reproducibility Between Local and Remote Computer with R
Hi Kevin,

I think Abby has suggested something similar to what I think the problem is related to: environment setup.

Some possible solutions:
The renv and packrat packages are a way to version your packages to help with reproducibility. Anaconda might be a solution for the R version and package version problem, if installed on your HPC. Docker could work as well (maybe the best option if installed). There are other workarounds, but I would have to know how your particular HPC/compute environment is set up to comment further.

Brass tacks:
I think you need to ensure all your versions (R and add-on packages) are the same.

Fwiw,

Stephen

On Sun, Aug 9, 2020, 08:26 Kevin Egan wrote:
> Hi Stephen,
>
> I believe I am using Renv, but on my remote computer I am running batch files.
>
> Thanks,
>
> Kevin
Kevin Egan
2020-Aug-09 14:40 UTC
[R] Reproducibility Between Local and Remote Computer with R
Hi Stephen,

Thanks. I'm now trying to use R 3.6.3 on the HPC, and I was able to run a few tests remotely and get reproducible results. The batches have not yet run, but I'm hoping they will give reproducible results when they do.

Thanks,

Kevin