Nathaniel Graham
2013-Dec-13 17:23 UTC
[R] disabling sparse Matrix index checking during assignment
The project I'm working on requires producing a number of large (250,000 x 250,000) sparse logical matrices. I'm currently doing this by updating the elements (turning FALSE to TRUE) of a matrix in batches as they're identified, like so:

    x[idx.matrix] <- TRUE

where x is created via Matrix(nrow = n, ncol = n, data = FALSE) and n is approximately 250,000. The idx.matrix is a two-column matrix of (row, column) indices to be assigned to. This is done many times.

After profiling, I've found that the lion's share of the work is taking place in the internal calls that check for duplicates, etc., in the indices passed to [<-. For instance, anyDuplicated.default is one of the most time-consuming portions of my code according to Rprof. This makes the whole process quite slow, as there are frequently thousands of index pairs in each call.

I'd like to disable as many of these checks as possible. I can guarantee that there aren't any duplicates, and even if there are I don't especially care, since that would only mean that a value is assigned TRUE twice instead of once.

I've tried a number of other approaches, such as creating a data.table of all the indices to be changed and doing the assignment once, but the temporary memory usage becomes enormous (I run out of memory on a 32GB machine). I've also tried creating a temporary sparseMatrix and using '|' like so:

    # a, b are numeric vectors of indices
    x <- x | sparseMatrix(a, b, x = TRUE, dims = x@Dim, check = FALSE)

but this turns out to be slower than assignment; most of its time is spent in the logical OR operation.

Is there a way to speed this process up substantially?

Thanks in advance for your help.

-------
Nathaniel Graham
npgraham1 at gmail.com
npgraham1 at uky.edu
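
A scaled-down sketch of the two approaches described in the post, for anyone who wants to reproduce the comparison. The size n, the batch size k, the seed, and the randomly drawn index pairs are assumptions made purely for illustration (the real n is about 250,000); only the assignment line and the '|' line follow the post, with dim(x2) used in place of x@Dim.

```r
library(Matrix)

set.seed(42)
n <- 500L                      # assumption: scaled down from ~250,000
k <- 2000L                     # assumption: index pairs in one batch

# Draw k distinct cells so the batch contains no duplicate (row, col) pairs.
lin <- sample.int(n * n, k)
a <- (lin - 1L) %% n + 1L      # row indices
b <- (lin - 1L) %/% n + 1L     # column indices
idx.matrix <- cbind(a, b)      # two-column index matrix, as in the post

# Approach 1: matrix-indexed assignment into an all-FALSE sparse matrix.
x1 <- Matrix(nrow = n, ncol = n, data = FALSE)
x1[idx.matrix] <- TRUE

# Approach 2: build a temporary sparse matrix and combine it with '|'.
x2 <- Matrix(nrow = n, ncol = n, data = FALSE)
x2 <- x2 | sparseMatrix(a, b, x = TRUE, dims = dim(x2), check = FALSE)
```

Wrapping each approach in system.time(), or running the script under Rprof, should show where the time goes on a given machine; at this reduced size the difference may be too small to measure reliably.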