thr3ads.net - R help - [R] sparse vectors [Sep 2009]

Robin Hankin

2009-Sep-08 13:06 UTC

[R] sparse vectors

Hi

I deal with long vectors almost all of whose elements are zero.
Typically, the length will be ~5e7 with ~100 nonzero elements.

I want to deal with these objects using a sort of sparse
vector.

The problem is that I want to be able to 'add' two such
vectors. 

Toy problem follows.  Suppose I have two such objects, 'a' and
'b':



 > a
$index
[1]    20   30 100000000

$val
[1] 2.2 3.3 4.4



 > b
$index
[1]   3  30

$val
[1] 0.1 0.1

 >


What I want is the "sum" of these:

 > AplusB
$index
[1]    3   20   30 100000000

$val
[1]  0.1 2.2 3.4 4.4

 >


See how the value for index=30 (being common to both) is 3.4
(=3.3+0.1).   What's the best R idiom to achieve this?



-- 
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877

Henrique Dallazuanna

2009-Sep-08 13:19 UTC

[R] sparse vectors

Try this:

abMerge <- merge(a, b, by = 'index', all = TRUE)
list(index = abMerge$index, val = rowSums(abMerge[,2:3], na.rm = TRUE))
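For reference, here is a small sketch of the merge approach wrapped in a function and run on the toy data from the original post. The name sparse_add_merge is made up for illustration, and the sketch assumes each index occurs at most once within a vector.

```r
a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))

# Hypothetical wrapper around the merge() idea
sparse_add_merge <- function(x, y) {
  # merge() coerces both lists to data frames; all = TRUE keeps indices
  # that occur in only one operand (the missing value becomes NA)
  m <- merge(x, y, by = "index", all = TRUE)
  # the value columns are named val.x and val.y after the merge
  val <- rowSums(m[, c("val.x", "val.y")], na.rm = TRUE)
  list(index = m$index, val = unname(val))
}

AplusB <- sparse_add_merge(a, b)
AplusB
```

Because merge() sorts on the 'by' column by default, the resulting index comes back in increasing order.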


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Dimitris Rizopoulos

2009-Sep-08 13:27 UTC

[R] sparse vectors

one simple way could be:

sparse.vec <- function (..., fun = sum) {
     lis <- list(...)
     values <- unlist(lapply(lis, "[[", "value"))
     inds <- factor(unlist(lapply(lis, "[[", "index")))
     out <- tapply(values, inds, FUN = fun)
     list(index = as.numeric(levels(inds)), values = out)
}

a <- list(index = c(20, 30, 100000000), value = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), value = c(0.1, 0.1))
sparse.vec(a, b)
sparse.vec(a, b, fun = prod)
sparse.vec(a, b, fun = function(x) Reduce("-", x))


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

Steve Lianoglou

2009-Sep-08 13:36 UTC

[R] sparse vectors

Hi,

On Sep 8, 2009, at 9:06 AM, Robin Hankin wrote:
> Hi
>
> I deal with long vectors almost all of whose elements are zero.
> Typically, the length will be ~5e7 with ~100 nonzero elements.
>
> I want to deal with these objects using a sort of sparse
> vector.
Would using sparse matrices (from the Matrix or SparseM packages) be  
overkill?

-steve
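
A rough sketch of what the sparse-matrix route could look like with the Matrix package, storing each vector as a one-column sparse matrix. The helper name as_sparse_col and the common length n are assumptions made for this illustration, not something given in the thread.

```r
library(Matrix)

n <- 100000000  # assumed common length of both vectors

# Hypothetical helper: store an index/val list as an n x 1 sparse matrix
as_sparse_col <- function(v, n) {
  sparseMatrix(i = v$index, j = rep(1, length(v$index)),
               x = v$val, dims = c(n, 1))
}

a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))

# Adding two sparse matrices gives a sparse result; only the nonzero
# entries are stored
s <- as_sparse_col(a, n) + as_sparse_col(b, n)

# summary() of a sparse matrix lists its nonzero entries as (i, j, x) triplets
trip <- summary(s)
AplusB <- list(index = trip$i, val = trip$x)
AplusB
```

Whether this is overkill probably depends on how much more than addition is needed; the package already supplies subsetting and other arithmetic.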
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

(Ted Harding)

2009-Sep-08 13:42 UTC

[R] sparse vectors

On 08-Sep-09 13:06:28, Robin Hankin wrote:
> Hi
> See how the value for index=30 (being common to both) is 3.4
> (=3.3+0.1).   What's the best R idiom to achieve this?
I don't know about "the best", Robin, but how about something
like:

  indices <- sort(unique(c(a$index,b$index)))
  N       <- length(indices)
  values  <- NULL
  for(i in indices){
    if(i %in% a$index){A <- a$val[a$index==i]} else A <- 0
    if(i %in% b$index){B <- b$val[b$index==i]} else B <- 0
    values <- c(values,A+B)
  }
  AplusB <- list(index=indices,val=values)

## Test:
  a<-list(index=c(20,30,100000000),val=c(2.2,3.3,4.4))
  b<-list(index=c(3,30),val=c(0.1, 0.1))
  indices <- sort(unique(c(a$index,b$index)))
  N       <- length(indices)
  values  <- NULL
  for(i in indices){
    if(i %in% a$index){A <- a$val[a$index==i]} else A <- 0
    if(i %in% b$index){B <- b$val[b$index==i]} else B <- 0
    values <- c(values,A+B)
  }
  AplusB <- list(index=indices,val=values)
  AplusB
  # $index
  # [1] 3e+00 2e+01 3e+01 1e+08
  # $val
  # [1] 0.1 2.2 3.4 4.4
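
The same idea can also be sketched without the explicit loop, using match() to locate each operand's indices in the merged index set. The function name sparse_add is made up here, and the sketch assumes each index occurs at most once within a vector, as in the toy data.

```r
# Hypothetical loop-free variant of the approach above
sparse_add <- function(x, y) {
  idx <- sort(unique(c(x$index, y$index)))
  val <- numeric(length(idx))          # start from all zeros
  ix <- match(x$index, idx)            # positions of x's indices in idx
  iy <- match(y$index, idx)            # positions of y's indices in idx
  val[ix] <- val[ix] + x$val
  val[iy] <- val[iy] + y$val
  list(index = idx, val = val)
}

a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))
AplusB <- sparse_add(a, b)
AplusB
```

This avoids growing 'values' one element at a time, which may matter little for ~100 nonzero elements but keeps the cost proportional to the number of nonzeros.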

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Sep-09                                       Time: 14:42:53
------------------------------ XFMail ------------------------------

Martin Morgan

2009-Sep-08 16:31 UTC

[R] sparse vectors

Hi Robin --

Robin Hankin wrote:
> Hi
> 
> I deal with long vectors almost all of whose elements are zero.
> Typically, the length will be ~5e7 with ~100 nonzero elements.
> 
> I want to deal with these objects using a sort of sparse
> vector.
> 
> The problem is that I want to be able to 'add' two such
> vectors.
> Toy problem follows.  Suppose I have two such objects, 'a' and 'b':

The Bioconductor package IRanges has an Rle (run length encoding) class
with math. operations defined on it.

## once only, to install IRanges
source("http://bioconductor.org/biocLite.R")
biocLite("IRanges")

## load library
library(IRanges)

It represents runs encoded by their length, rather than by their ends, so

ree2Rle <- function(ends, values)
{
    ## untested
    idx <- diff(c(0, ends)) - 1L
    len <- integer(2*length(idx))
    len[c(TRUE, FALSE)] <- idx
    len[c(FALSE, TRUE)] <- 1L

    val <- vector(typeof(values), 2*length(idx))
    val[c(FALSE, TRUE)] <- values
    Rle(lengths=len, values=val)
}
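
To illustrate the conversion from end positions to run lengths without installing IRanges, here is a base-R sketch on a short made-up vector (the positions 3, 5, 9 and their values are invented for the example). It uses base R's inverse.rle() instead of the Rle class, so it only shows the encoding and does not exercise IRanges itself.

```r
ends   <- c(3, 5, 9)         # made-up positions of the nonzero elements
values <- c(0.1, 2.2, 4.4)   # made-up nonzero values

gaps <- diff(c(0, ends)) - 1           # zeros preceding each nonzero element
lens <- as.vector(rbind(gaps, 1))      # interleave: gap, 1, gap, 1, ...
vals <- as.vector(rbind(0, values))    # interleave: 0, value, 0, value, ...

# Expand the run-length encoding back to a dense vector to check it
enc   <- structure(list(lengths = lens, values = vals), class = "rle")
dense <- inverse.rle(enc)

which(dense != 0)   # recovers the end positions
dense[dense != 0]   # recovers the values
```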

Since we're adding vectors, and R has recycling rules, we create Rle's
of the same length (by adding a '0' at the last position of b)

a <- ree2Rle(c(20,30, 10000000), c(2.2,3.3,4.4))
b <- ree2Rle(c(3, 30, length(a)), c(.1, .1, 0))

and then do the math
> system.time(abPlus <- a + b)
   user  system elapsed
  0.000   0.000   0.001
> abPlus
  'numeric' Rle instance of length 10000000 with 8 runs
  Lengths:  2 1 16 1 9 1 9999969 1
  Values :  0 0.1 0 2.2 0 3.4 0 4.4

the ends are
> cumsum(runLength(abPlus))[runValue(abPlus) != 0]
[1]        3       20       30 10000000

and values runValue(abPlus)[runValue(abPlus) != 0]

Martin

