It might also be a BLAS+processor problem - I got bit pretty hard by
that, with an example here:
https://stat.ethz.ch/pipermail/r-help/2019-July/463477.html
With a key excerpt here:
On Thu, Jul 18, 2019 at 1:59 PM Ivan Krylov <krylov.r00t using gmail.com>
wrote:> Yes, this might be bad. I have heard about OpenBLAS (specifically, the
> matrix product routine) misbehaving on certain AVX-512 capable
> processors, so much that they had to disable some optimizations in
> 0.3.6 [*], which you already have installed. Still, would `env
> OPENBLAS_CORETYPE=Haswell R --vanilla` give a better result?
>
On Fri, Dec 3, 2021 at 10:29 AM Labone, Thomas <labone at email.sc.edu>
wrote:>
> Thanks for the feedback everyone. If you go to
https://github.com/csantill/RPerformanceWBLAS/blob/master/RPerformanceBLAS.md
you will find the Linux commands to change the default math library. When I
switch the BLAS library from MKL to the system default (see sessionInfo below),
everything works as expected. I installed version 2020.0-166-1 of
"Intel-MKL" from the Linux Mint Software Manager. I may be coming to a
hasty conclusion, but there appears to be something wrong with that package or
how it interacts with other system software. Any suggestions on who I should
notify about the problem (e.g., Intel, Mint, Ubuntu)?
>
> R version 4.1.2 (2021-11-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Linux Mint 20.2
>
> Matrix products: default
> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8
> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.2 tools_4.1.2
>
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
>
> ________________________________
> From: Labone, Thomas <labone at email.sc.edu>
> Sent: Thursday, December 2, 2021 11:53 AM
> To: Bill Dunlap <williamwdunlap at gmail.com>
> Cc: r-help at r-project.org <r-help at r-project.org>
> Subject: Re: [R] Problem with lm Giving Wrong Results
>
> > summary(fit)
>
> Call:
> lm(formula = log(k) ~ Z)
>
> Residuals:
> Min 1Q Median 3Q Max
> -21.241 1.327 1.776 2.245 4.418
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.03465 0.01916 -1.809 0.0705 .
> Z -0.24207 0.01916 -12.634 <2e-16 ***
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> Residual standard error: 1.914 on 9998 degrees of freedom
> Multiple R-squared: 0.01467, Adjusted R-squared: 0.01457
> F-statistic: 148.8 on 1 and 9998 DF, p-value: < 2.2e-16
>
> > summary(k)
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 0.2735 3.7658 5.9052 7.5113 9.4399 82.9531
> > summary(Z)
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -3.8906 -0.6744 0.0000 0.0000 0.6744 3.8906
> > summary(gm*gsd^Z)
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 0.3767 0.8204 0.9659 0.9947 1.1372 2.4772
> >
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
> ________________________________
> From: Bill Dunlap <williamwdunlap at gmail.com>
> Sent: Thursday, December 2, 2021 10:31 AM
> To: Labone, Thomas <labone at email.sc.edu>
> Cc: r-help at r-project.org <r-help at r-project.org>
> Subject: Re: [R] Problem with lm Giving Wrong Results
>
> On the 'bad' machines, what did you get for
> summary(fit)
> summary(k)
> summary(Z)
> summary(gm*gsd^Z)
> ?
>
> -Bill
>
> On Thu, Dec 2, 2021 at 6:18 AM Labone, Thomas <labone at
email.sc.edu<mailto:labone at email.sc.edu>> wrote:
> In the code below the first and second plots should look pretty much the
same, the only difference being that the first has n=1000 points and the second
n=10000 points. On two of my Linux machines (info below) the second plot is a
horizontal line (incorrect answer from lm), but on my Windows 10 machine and a
third Linux machine it works as expected. The interesting thing is that the code
works as expected for n <= 4095 but fails for n>=4096 (which equals 2^12).
Can anyone else reproduce this problem? Any ideas on how to fix it?
>
> set.seed(132)
>
>
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> # This works
> n <- 1000# OK <= 4095
> Z <- qnorm(ppoints(n))
>
> k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))
>
> quantile(k,probs=c(0.025,0.5,0.975))
> summary(k)
>
> fit <- lm(log(k) ~ Z)
> summary(fit)
>
> gm <- exp(coef(fit)[1])
> gsd <- exp(coef(fit)[2])
> gm
> gsd
>
> plot(Z,k,log="y",xlim=c(-4,4),ylim=c(0.1,100))
> lines(Z,gm*gsd^Z,col="red")
>
>
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> #this does not
> n <- 10000# fails >= 4096 = 2^12
> Z <- qnorm(ppoints(n))
>
> k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))
>
> quantile(k,probs=c(0.025,0.5,0.975))
> summary(k)
>
> fit <- lm(log(k) ~ Z)
> summary(fit)
>
> gm <- exp(coef(fit)[1])
> gsd <- exp(coef(fit)[2])
> gm
> gsd
>
> plot(Z,k,log="y",xlim=c(-4,4),ylim=c(0.1,100))
> lines(Z,gm*gsd^Z,col="red")
>
>
>
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > sessionInfo() #for two Linux machines having problem
> R version 4.1.2 (2021-11-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Linux Mint 20.2
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
> [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.2 Matrix_1.3-4 tools_4.1.2 expm_0.999-6
grid_4.1.2 lattice_0.20-45
>
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > sessionInfo() # for a third Linux machine not having the problem
> R version 4.1.1 (2021-08-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Linux Mint 19.3
>
> Matrix products: default
> BLAS/LAPACK:
/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_rt.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8
> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.1 tools_4.1.1
>
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org<mailto:R-help at r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Sarah Goslee (she/her)
http://www.numberwright.com