luke-tierney at uiowa.edu
2021-Aug-13 14:58 UTC
[Rd] [External] svd For Large Matrix
[copying the list]

svd() does support matrices with long vector data. Your example works fine for me on a machine with enough memory with either the reference BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I believe, by a version of OpenBLAS). Take a look at sessionInfo() to see what you are using and consider switching to another BLAS/LAPACK if necessary. Running under gdb may help tracking down where the issue is and reporting it for the BLAS/LAPACK you are using.

Best,

luke

On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote:

> Good day,
>
> I have a real scenario involving 45 million biological cells (samples) and 60 proteins (variables) which leads to a segmentation fault for svd. I thought this might be a good example of why it might benefit from a long vector upgrade.
>
> test <- matrix(rnorm(45000000*60), ncol = 60)
> testSVD <- svd(test)
>
>  *** caught segfault ***
> address 0x7fe93514d618, cause 'memory not mapped'
>
> Traceback:
> 1: La.svd(x, nu, nv)
> 2: svd(test)
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall
Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
email: luke-tierney at uiowa.edu
WWW: http://www.stat.uiowa.edu
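A minimal sketch of the check suggested above, for anyone following the thread. It reports which BLAS/LAPACK the running R session is linked against and whether a matrix of given dimensions would be stored as a long vector. The helper name is_long_matrix is hypothetical, introduced only for illustration; the BLAS and LAPACK components of sessionInfo() are assumed to be available (they are in recent R versions, but may be empty on some builds).

```r
# Which BLAS and LAPACK is this R session actually using?
si <- sessionInfo()
cat("BLAS:   ", si$BLAS, "\n")      # path to the BLAS shared library (may be empty)
cat("LAPACK: ", si$LAPACK, "\n")    # path to the LAPACK shared library (may be empty)

lapack_version <- La_version()      # version string of the LAPACK in use
cat("LAPACK version: ", lapack_version, "\n")

# Hypothetical helper: would an nrow x ncol matrix be a long vector,
# i.e. have more than .Machine$integer.max elements?
is_long_matrix <- function(nrow, ncol) {
  # Multiply as doubles so the product itself cannot overflow an R integer.
  as.numeric(nrow) * as.numeric(ncol) > .Machine$integer.max
}

is_long_matrix(45000000, 60)

# To see where a crash happens, R can be started under a debugger from the shell:
#   R -d gdb
# then type 'run' at the gdb prompt and reproduce the problem.
```

If the reported BLAS/LAPACK is not the reference implementation, that is the library to which a bug report should be directed.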
On 13/08/2021 15:58, luke-tierney at uiowa.edu wrote:

> [copying the list]
>
> svd() does support matrices with long vector data. Your example works
> fine for me on a machine with enough memory with either the reference
> BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I
> believe, by a version of openBLAS). Take a look at sessionInfo() to
> see what you are using and consider switching to another BLAS/LAPACK
> if necessary. Running under gdb may help tracking down where the issue
> is and reporting it for the BLAS/LAPACK you are using.

See also

https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Large-matrices

which (to nuance Prof Tierney's comment) mentions that svd on long-vector *complex* data has been known to segfault (with the reference BLAS/LAPACK).

My guess was that this was an out-of-memory condition not handled elegantly by the OS. (There are many reasons why the posting guide asks for the output of sessionInfo().)

We do not have the statistical context, but it seems unlikely that anyone is interested in each of the 45m samples, and for information on the proteins a quite small sample of cells would suffice. It also seems unlikely that all 45m left singular vectors are required (most likely none are, in which case the underlying LAPACK routine can use a more efficient calculation).

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
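The two suggestions above (skip the left singular vectors, and work from a sample of cells) might look roughly like the following. This is only a sketch on a small simulated stand-in; the matrix size, the sample size of 1000 rows and the seed are arbitrary assumptions for illustration, not values from the thread.

```r
set.seed(1)

# Small stand-in for the 45 million x 60 matrix (sizes chosen for illustration only).
x <- matrix(rnorm(10000 * 60), ncol = 60)

# nu = 0 asks svd() not to compute any left singular vectors;
# the result then contains only d (singular values) and v (right singular vectors).
s_full <- svd(x, nu = 0)

# Information about the 60 variables can also be estimated from a random sample of rows.
idx   <- sample(nrow(x), 1000)
s_sub <- svd(x[idx, , drop = FALSE], nu = 0)

# Singular values grow roughly with sqrt(number of rows) when rows are
# independent draws, so rescale before comparing the two fits.
d_full <- s_full$d / sqrt(nrow(x))
d_sub  <- s_sub$d  / sqrt(length(idx))
summary(d_sub / d_full)
```

Whether a subsample is adequate depends on the statistical question, which the thread does not state.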
Good day,

Ah, I was confident it wouldn't be environment-specific but it is. My environment is

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

It crashes at about 180 GB RAM usage. The server has 1024 GB physical RAM in it. Modestly downsampling to 30 million cells avoids the segmentation fault. The segmentation fault originates from BLAS:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7649c10 in ATL_dgecopy () from /usr/lib/x86_64-linux-gnu/libblas.so.3

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
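One piece of arithmetic that may be worth having alongside the report above: with 60 columns, 45 million rows gives more elements than a signed 32-bit integer can index, while 30 million rows does not. The sketch below only does that arithmetic. Whether this ATLAS build actually uses 32-bit indices internally is an assumption that the thread does not establish, so this is a possible line of inquiry rather than a diagnosis.

```r
p       <- 60                       # number of columns (proteins)
int_max <- .Machine$integer.max     # 2147483647, largest signed 32-bit integer

n_fail <- 45e6                      # row count that segfaults in the report
n_ok   <- 30e6                      # row count reported to work

n_fail * p > int_max                # TRUE:  2.7e9 elements, a long vector
n_ok   * p > int_max                # FALSE: 1.8e9 elements, not a long vector

# Largest number of rows for which a 60-column matrix is still not a long vector.
max_rows <- floor(int_max / p)      # 35791394

# Size of the input matrix alone, in GB (8 bytes per double).
input_gb <- n_fail * p * 8 / 1e9    # 21.6
```

Testing row counts just below and just above max_rows would show whether the failure tracks this boundary.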