Hi all --- I have a large sparse matrix, call it P:

```
> str(P)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:7868093] 4221 6098 8780 10313 11102 14243 20570 22145 24468 24977 ...
  ..@ p       : int [1:7357] 0 0 269 388 692 2434 3662 4179 4205 4256 ...
  ..@ Dim     : int [1:2] 1303967 7356
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:7868093] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()
```

I'd like to row-normalize it (say, with the L2 norm). The straightforward approach would be something like:

```
row_normalized_P <- P / sqrt(rowSums(P^2))
```

But this causes a memory-allocation error, since the `rowSums` result appears to be recycled (appropriately) into a _dense_ matrix with dimensions equal to `dim(P)`.

Given that P is known to be sparse (or at the very least is stored in sparse format), does anyone know of a non-iterative approach to achieve the desired `row_normalized_P` shown above? (I.e. the resultant matrix will be exactly as sparse as P itself, and I'd like to avoid ever allocating a dense matrix, apart from the rowSums vector, during the normalization.)

The only semi-efficient method I've found around this is to `apply` across rows (more accurately, across blocks of rows coerced into dense sub-matrices of P), but I'd like to remove the looping logic from my codebase if I can, and I'm wondering if perhaps there's a built-in in the Matrix package (that I'm just not aware of) that helps with this particular type of computation.

Cheers and thanks for any help!

-murat
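(For anyone reading along in the archive: the block-wise workaround described above is not shown in the thread, but it might look roughly like the sketch below. The toy `rsparsematrix` stand-in for P and the `block_size` value are illustrative assumptions of mine, not the poster's actual code.)

```
## Rough sketch of the block-wise workaround: process P in row blocks,
## densify each block, normalize its rows, and re-assemble a sparse result.
library(Matrix)

set.seed(1)
P <- rsparsematrix(1000, 50, density = 0.01)   # small stand-in for the real P
block_size <- 100                              # arbitrary illustrative block size

blocks <- lapply(seq(1, nrow(P), by = block_size), function(start) {
  idx   <- start:min(start + block_size - 1, nrow(P))
  block <- as.matrix(P[idx, , drop = FALSE])   # dense sub-matrix of P
  norms <- sqrt(rowSums(block^2))
  norms[norms == 0] <- 1                       # leave all-zero rows untouched
  Matrix(block / norms, sparse = TRUE)         # back to sparse storage
})
row_normalized_P <- do.call(rbind, blocks)
```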
> On 4 May 2017, at 20:13, Murat Tasan <mmuurr at gmail.com> wrote:
>
> The only semi-efficient method I've found around this is to `apply` across
> rows (more accurately, across blocks of rows coerced into dense
> sub-matrices of P), but I'd like to remove the looping logic from my
> codebase if I can, and I'm wondering if perhaps there's a built-in in the
> Matrix package (that I'm just not aware of) that helps with this particular
> type of computation.

The "wordspace" package has an efficient C-level implementation for this purpose:

    P.norm <- normalize.rows(P)

which is shorthand for

    P.norm <- scaleMargins(P, rows = 1 / rowNorms(P, method = "euclidean"))

Best,
Stefan
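(Again for readers of the archive: a minimal, self-contained version of the wordspace approach might look like the following. The toy matrix standing in for P, and the removal of empty rows, which would otherwise have zero norm, are assumptions of mine rather than part of the original exchange.)

```
## Minimal sketch of the wordspace approach on a toy sparse matrix.
library(Matrix)
library(wordspace)

set.seed(42)
P <- rsparsematrix(200, 20, density = 0.1)   # toy stand-in for the real P
P <- P[Matrix::rowSums(P^2) > 0, ]           # drop empty (zero-norm) rows

P.norm <- normalize.rows(P, method = "euclidean")

## the longer form spelled out in the reply above
P.norm2 <- scaleMargins(P, rows = 1 / rowNorms(P, method = "euclidean"))

## every remaining row should now have (numerically) unit L2 norm
summary(rowNorms(P.norm, method = "euclidean"))
```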
Thanks, Stefan, I'll take a look!

Also, I figured out another solution (~15 minutes after posting :-/):

```
row_normalized_P <- Matrix::Diagonal(x = 1 / sqrt(Matrix::rowSums(P^2))) %*% P
```

Cheers,

-m

On Thu, May 4, 2017 at 12:23 PM, Stefan Evert <stefanML at collocations.de> wrote:

> > On 4 May 2017, at 20:13, Murat Tasan <mmuurr at gmail.com> wrote:
> >
> > The only semi-efficient method I've found around this is to `apply` across
> > rows (more accurately, across blocks of rows coerced into dense
> > sub-matrices of P), but I'd like to remove the looping logic from my
> > codebase if I can, and I'm wondering if perhaps there's a built-in in the
> > Matrix package (that I'm just not aware of) that helps with this particular
> > type of computation.
>
> The "wordspace" package has an efficient C-level implementation for this
> purpose:
>
>     P.norm <- normalize.rows(P)
>
> which is shorthand for
>
>     P.norm <- scaleMargins(P, rows = 1 / rowNorms(P, method = "euclidean"))
>
> Best,
> Stefan
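(One last note for the archive: a quick sanity check of this Diagonal-based construction, run on a toy matrix of my own rather than the original P, might look like this; the result stays in sparse storage and every non-empty row comes out with unit L2 norm.)

```
## Sanity check of the Diagonal(...) %*% P construction on a toy matrix.
library(Matrix)

set.seed(7)
P <- rsparsematrix(200, 20, density = 0.1)   # toy stand-in for the real P
P <- P[Matrix::rowSums(P^2) > 0, ]           # avoid dividing by zero norms

row_normalized_P <- Matrix::Diagonal(x = 1 / sqrt(Matrix::rowSums(P^2))) %*% P

class(row_normalized_P)                             # still a sparse matrix class
range(sqrt(Matrix::rowSums(row_normalized_P^2)))    # ~1 for every row
```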