Hi all --- I have a large sparse matrix, call it P:

```
> str(P)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:7868093] 4221 6098 8780 10313 11102 14243 20570 22145 24468 24977 ...
  ..@ p       : int [1:7357] 0 0 269 388 692 2434 3662 4179 4205 4256 ...
  ..@ Dim     : int [1:2] 1303967 7356
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:7868093] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()
```

I'd like to row-normalize it (say, with the L2 norm). The straightforward approach would be something like:

```
row_normalized_P <- P / sqrt(rowSums(P^2))
```

But this causes a memory-allocation error, since the `rowSums` result appears to be recycled (appropriately) into a _dense_ matrix with dimensions equal to `dim(P)`.

Given that P is known to be sparse (or at the very least is stored in sparse format), does anyone know of a non-iterative approach to achieve the desired `row_normalized_P` shown above? (I.e. the resultant matrix will be exactly as sparse as P itself, and I'd like to avoid ever allocating a dense matrix, apart from the rowSums vector, during the normalization.)

The only semi-efficient method I've found around this is to `apply` across rows (more accurately, across blocks of rows coerced into dense sub-matrices of P), but I'd like to remove the looping logic from my codebase if I can, and I'm wondering if perhaps there's a built-in in the Matrix package (that I'm just not aware of) that helps with this particular type of computation.

Cheers and thanks for any help!

-murat
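(For anyone reading along in the archive: the block-wise workaround described above is not shown in the thread, but it might look roughly like the sketch below. The toy `rsparsematrix` stand-in for P and the `block_size` value are illustrative assumptions of mine, not the poster's actual code.)

```
## Rough sketch of the block-wise workaround: process P in row blocks,
## densify each block, normalize its rows, and re-assemble a sparse result.
library(Matrix)

set.seed(1)
P <- rsparsematrix(1000, 50, density = 0.01)   # small stand-in for the real P
block_size <- 100                              # arbitrary illustrative block size

blocks <- lapply(seq(1, nrow(P), by = block_size), function(start) {
  idx   <- start:min(start + block_size - 1, nrow(P))
  block <- as.matrix(P[idx, , drop = FALSE])   # dense sub-matrix of P
  norms <- sqrt(rowSums(block^2))
  norms[norms == 0] <- 1                       # leave all-zero rows untouched
  Matrix(block / norms, sparse = TRUE)         # back to sparse storage
})
row_normalized_P <- do.call(rbind, blocks)
```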
> On 4 May 2017, at 20:13, Murat Tasan <mmuurr at gmail.com> wrote:
>
> The only semi-efficient method I've found around this is to `apply` across
> rows (more accurately, across blocks of rows coerced into dense
> sub-matrices of P), but I'd like to remove the looping logic from my
> codebase if I can, and I'm wondering if perhaps there's a built-in in the
> Matrix package (that I'm just not aware of) that helps with this particular
> type of computation.

The "wordspace" package has an efficient C-level implementation for this purpose:

    P.norm <- normalize.rows(P)

which is shorthand for

    P.norm <- scaleMargins(P, rows = 1 / rowNorms(P, method = "euclidean"))

Best,
Stefan
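(Again for readers of the archive: a minimal, self-contained version of the wordspace approach might look like the following. The toy matrix standing in for P, and the removal of empty rows, which would otherwise have zero norm, are assumptions of mine rather than part of the original exchange.)

```
## Minimal sketch of the wordspace approach on a toy sparse matrix.
library(Matrix)
library(wordspace)

set.seed(42)
P <- rsparsematrix(200, 20, density = 0.1)   # toy stand-in for the real P
P <- P[Matrix::rowSums(P^2) > 0, ]           # drop empty (zero-norm) rows

P.norm <- normalize.rows(P, method = "euclidean")

## the longer form spelled out in the reply above
P.norm2 <- scaleMargins(P, rows = 1 / rowNorms(P, method = "euclidean"))

## every remaining row should now have (numerically) unit L2 norm
summary(rowNorms(P.norm, method = "euclidean"))
```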
Thanks, Stefan, I'll take a look!

Also, I figured out another solution (~15 minutes after posting :-/):

```
row_normalized_P <- Matrix::Diagonal(x = 1 / sqrt(Matrix::rowSums(P^2))) %*% P
```

Cheers,

-m

On Thu, May 4, 2017 at 12:23 PM, Stefan Evert <stefanML at collocations.de> wrote:

> > On 4 May 2017, at 20:13, Murat Tasan <mmuurr at gmail.com> wrote:
> >
> > The only semi-efficient method I've found around this is to `apply` across
> > rows (more accurately, across blocks of rows coerced into dense
> > sub-matrices of P), but I'd like to remove the looping logic from my
> > codebase if I can, and I'm wondering if perhaps there's a built-in in the
> > Matrix package (that I'm just not aware of) that helps with this particular
> > type of computation.
>
> The "wordspace" package has an efficient C-level implementation for this
> purpose:
>
>     P.norm <- normalize.rows(P)
>
> which is shorthand for
>
>     P.norm <- scaleMargins(P, rows = 1 / rowNorms(P, method = "euclidean"))
>
> Best,
> Stefan
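(One last note for the archive: a quick sanity check of this Diagonal-based construction, run on a toy matrix of my own rather than the original P, might look like this; the result stays in sparse storage and every non-empty row comes out with unit L2 norm.)

```
## Sanity check of the Diagonal(...) %*% P construction on a toy matrix.
library(Matrix)

set.seed(7)
P <- rsparsematrix(200, 20, density = 0.1)   # toy stand-in for the real P
P <- P[Matrix::rowSums(P^2) > 0, ]           # avoid dividing by zero norms

row_normalized_P <- Matrix::Diagonal(x = 1 / sqrt(Matrix::rowSums(P^2))) %*% P

class(row_normalized_P)                             # still a sparse matrix class
range(sqrt(Matrix::rowSums(row_normalized_P^2)))    # ~1 for every row
```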