Hi

I deal with long vectors almost all of whose elements are zero.
Typically, the length will be ~5e7 with ~100 nonzero elements.

I want to deal with these objects using a sort of sparse vector.

The problem is that I want to be able to 'add' two such vectors.

Toy problem follows. Suppose I have two such objects, 'a' and 'b':

> a
$index
[1] 20 30 100000000

$val
[1] 2.2 3.3 4.4

> b
$index
[1] 3 30

$val
[1] 0.1 0.1

What I want is the "sum" of these:

> AplusB
$index
[1] 3 20 30 100000000

$val
[1] 0.1 2.2 3.4 4.4

See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1).
What's the best R idiom to achieve this?

-- 
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877
Try this:

## outer join on 'index'; an index present in only one vector gets NA in the other val column
abMerge <- merge(a, b, by = 'index', all = TRUE)
## sum the two val columns row by row, treating NA as zero
list(index = abMerge$index, val = rowSums(abMerge[, 2:3], na.rm = TRUE))

On Tue, Sep 8, 2009 at 10:06 AM, Robin Hankin <rksh1@cam.ac.uk> wrote:
> See how the value for index=30 (being common to both) is 3.4
> (=3.3+0.1). What's the best R idiom to achieve this?

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
One simple way could be:

sparse.vec <- function (..., fun = sum) {
    lis <- list(...)
    ## pool the values and the indices of all input vectors
    values <- unlist(lapply(lis, "[[", "value"))
    inds <- factor(unlist(lapply(lis, "[[", "index")))
    ## combine the values that share an index
    out <- tapply(values, inds, FUN = fun)
    list(index = as.numeric(levels(inds)), values = out)
}

a <- list(index = c(20, 30, 100000000), value = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), value = c(0.1, 0.1))

sparse.vec(a, b)
sparse.vec(a, b, fun = prod)
sparse.vec(a, b, fun = function(x) Reduce("-", x))

I hope it helps.

Best,
Dimitris

Robin Hankin wrote:
> See how the value for index=30 (being common to both) is 3.4
> (=3.3+0.1). What's the best R idiom to achieve this?

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Hi,

On Sep 8, 2009, at 9:06 AM, Robin Hankin wrote:
> I deal with long vectors almost all of whose elements are zero.
> Typically, the length will be ~5e7 with ~100 nonzero elements.
>
> I want to deal with these objects using a sort of sparse
> vector.

Would using sparse matrices (from the Matrix or SparseM packages) be overkill?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
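
For reference, the Matrix suggestion above might look roughly like the sketch below. It assumes the Matrix package is installed and uses its sparseVector() constructor; the common length n is an assumption chosen to cover the largest index in the toy example, and it is worth checking on your own Matrix version that the addition stays sparse.

```r
library(Matrix)

## Toy vectors from the original post.
## n is an assumed common length, not something given in the thread.
n <- 1e8
a <- sparseVector(x = c(2.2, 3.3, 4.4), i = c(20, 30, 1e8), length = n)
b <- sparseVector(x = c(0.1, 0.1), i = c(3, 30), length = n)

## Element-wise sum of the two sparse vectors
AplusB <- a + b

## Recover the list(index, val) form used in this thread
## from the slots of the result
res <- list(index = AplusB@i, val = AplusB@x)
res
```

The slots @i and @x hold the positions and values of the nonzero entries, so converting to and from the list representation costs only as much as the number of nonzero elements.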
On 08-Sep-09 13:06:28, Robin Hankin wrote:
> See how the value for index=30 (being common to both) is 3.4
> (=3.3+0.1). What's the best R idiom to achieve this?

I don't know about "the best", Robin, but how about something like:

## Test data:
a <- list(index=c(20,30,100000000), val=c(2.2,3.3,4.4))
b <- list(index=c(3,30), val=c(0.1,0.1))

## all indices that occur in either vector, in increasing order
indices <- sort(unique(c(a$index,b$index)))
values <- NULL
for(i in indices){
  ## an index missing from a vector contributes 0
  if(i %in% a$index){A <- a$val[a$index==i]} else A <- 0
  if(i %in% b$index){B <- b$val[b$index==i]} else B <- 0
  values <- c(values,A+B)
}
AplusB <- list(index=indices,val=values)

AplusB
# $index
# [1] 3e+00 2e+01 3e+01 1e+08
# $val
# [1] 0.1 2.2 3.4 4.4

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Sep-09 Time: 14:42:53
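
The same idea can be written without an explicit loop. The following is only a sketch: sparse_add is a hypothetical helper name, not something from the thread, and it assumes that no index is repeated within a single vector.

```r
## Hypothetical helper (not from the thread): add two sparse vectors
## stored as list(index = ..., val = ...).
## Assumes no index is repeated within a single vector.
sparse_add <- function(a, b) {
  ## union of the indices, in increasing order
  index <- sort(unique(c(a$index, b$index)))
  val <- numeric(length(index))
  ## positions of each vector's indices within the union
  ia <- match(a$index, index)
  ib <- match(b$index, index)
  ## accumulate the contributions of a, then of b
  val[ia] <- val[ia] + a$val
  val[ib] <- val[ib] + b$val
  list(index = index, val = val)
}

a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))
AplusB <- sparse_add(a, b)
AplusB
```

Using match() avoids growing the result inside a loop, so the work scales with the number of nonzero elements rather than with repeated scans of both index vectors.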
Hi Robin --

Robin Hankin wrote:
> I deal with long vectors almost all of whose elements are zero.
> Typically, the length will be ~5e7 with ~100 nonzero elements.
>
> I want to deal with these objects using a sort of sparse
> vector.
>
> The problem is that I want to be able to 'add' two such
> vectors.

The Bioconductor package IRanges has an Rle (run length encoding) class with math operations defined on it.

## once only, to install IRanges
source("http://bioconductor.org/biocLite.R")
biocLite("IRanges")
## load library
library(IRanges)

It represents runs encoded by their length, rather than by their ends, so

ree2Rle <- function(ends, values) {
    ## untested
    ## number of zeros preceding each nonzero element
    idx <- diff(c(0, ends)) - 1L
    ## run lengths alternate: a run of zeros, then a single nonzero element
    len <- integer(2*length(idx))
    len[c(TRUE, FALSE)] <- idx
    len[c(FALSE, TRUE)] <- 1L
    ## run values alternate: 0, then the nonzero value
    val <- vector(typeof(values), 2*length(idx))
    val[c(FALSE, TRUE)] <- values
    Rle(lengths=len, values=val)
}

Since we're adding vectors, and R has recycling rules, we create Rle's of the same length (by adding a '0' at the last position of b)

a <- ree2Rle(c(20,30, 10000000), c(2.2,3.3,4.4))
b <- ree2Rle(c(3, 30, length(a)), c(.1, .1, 0))

and then do the math

> system.time(abPlus <- a + b)
   user  system elapsed
  0.000   0.000   0.001
> abPlus
'numeric' Rle instance of length 10000000 with 8 runs
  Lengths: 2 1 16 1 9 1 9999969 1
  Values : 0 0.1 0 2.2 0 3.4 0 4.4

the ends are

> cumsum(runLength(abPlus))[runValue(abPlus) != 0]
[1] 3 20 30 10000000

and values

runValue(abPlus)[runValue(abPlus) != 0]

Martin