On 07/06/2021 at 09:00, Dario Strbenac wrote:
> Good day,
>
> I notice that summing rows of a large dgTMatrix fails.
>
> library(Matrix)
> aMatrix <- new("dgTMatrix",
>     i = as.integer(sample(200000, 10000) - 1),
>     j = as.integer(sample(50000, 10000) - 1),
>     x = rnorm(10000),
>     Dim = c(200000L, 50000L)
> )
> totals <- rowSums(aMatrix == 0) # Segmentation fault.
On my R v4.1 (Ubuntu 18), I don't get a segfault, but I do get an error
message:

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for
  function 'rowSums': cannot allocate vector of size 372.5 Gb
The reason is quite clear: the intermediate logical matrix 'aMatrix == 0'
is almost dense, having 200000L*50000L - 10000L (about 10^10) non-zero
entries. That is a little bit too much ;) for my modest laptop. So I can
propose a workaround:

    totals <- 50000 - rowSums(aMatrix != 0)
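
For completeness, here is a minimal self-contained sketch of that workaround,
reusing the construction from the original message (the set.seed() call is
only added here to make the example reproducible):

    library(Matrix)
    set.seed(42)  # only for reproducibility of this example
    aMatrix <- new("dgTMatrix",
        i = as.integer(sample(200000, 10000) - 1),
        j = as.integer(sample(50000, 10000) - 1),
        x = rnorm(10000),
        Dim = c(200000L, 50000L)
    )
    ## 'aMatrix != 0' should stay sparse (its pattern is just the stored
    ## entries), so rowSums() only has to visit the 10000 non-zeros instead
    ## of a dense 200000 x 50000 logical matrix.
    totals <- ncol(aMatrix) - rowSums(aMatrix != 0)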
Hoping it helps.
Best,
Serguei.
>
> The server has 768 GB of RAM and it was never close to being consumed
> by this. Converting it to an ordinary matrix works fine.
>
> big <- as.matrix(aMatrix)
> totals <- rowSums(big == 0) # Uses more RAM, but there is no
> # segmentation fault and a result is returned.
>
> May it be made more robust for dgTMatrix?
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
>