Hi, all together. I have - a maybe trivial - problem with aggregating a list of weights. Here is the problem: - At first I have set of nodes (X/Y-coordinates) and associated weights, where the set of nodes is typically not unique - I want to get a set of unique nodes and the sum of associated weights I am grateful for any help See for example: # weights: w <- c(1, 1, 1, 1, 1) # not unique set of nodes (X/Y-coordinates): nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T) desired Result: #nodes [,1] [,2] [1,] 1 2 [2,] 3 4 [3,] 5 6 [4,] 1 4 #weights 2 1 1 1 That is my solution, but it is very slow (typical size of nodes --> 200000x2): weights <- c(1, 1, 1, 1, 1) nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T) ## to be replaced by a faster code drop.index <- duplicated(nodes) n.unique <- nodes[!drop.index, ] w.unique <- numeric(length(n.unique[,1])) lw <- length(weights) for (i in seq_along(w.unique)){ index <- as.logical(2==rowSums(nodes==matrix(rep(n.unique[i,],lw),byrow = TRUE, nrow=lw))) w.unique[i] <- sum(weights[index]) } ## n.unique w.unique ^ | X | / | /eiser, Constantin | / Gutenberg University of Mainz, Germany | * /\ / Chair of Statistics & Econometrics | \ / \ / Jakob-Welder-Weg 4, 55128 Mainz | \/ \/ House of Law and Economics II, Room 00-116 | Tel: 0049 6131 39 22715 +--------------------------------------------------------->
You say "typical size of nodes --> 200000x2" Do you mean that nodes has many many rows? Or many many columns? Or both? This is minimalist coding, but I'm not sure how fast it will run on your data aggregate(w, as.data.frame(nodes), sum) Jean "Weiser, Constantin" <constantin.weiser@uni-mainz.de> wrote on 06/28/2012 08:06:27 AM:> Hi, all together. I have - a maybe trivial - problem with aggregating a > list of weights. > > Here is the problem: > - At first I have set of nodes (X/Y-coordinates) and associatedweights,> where the set > of nodes is typically not unique > - I want to get a set of unique nodes and the sum of associated weights > > I am grateful for any help > > > See for example: > > # weights: > w <- c(1, 1, 1, 1, 1) > > # not unique set of nodes (X/Y-coordinates): > nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T) > > > desired Result: > > #nodes > [,1] [,2] > [1,] 1 2 > [2,] 3 4 > [3,] 5 6 > [4,] 1 4 > > > > #weights > 2 1 1 1 > > > > That is my solution, but it is very slow (typical size of nodes --> > 200000x2): > > weights <- c(1, 1, 1, 1, 1) > nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T) > > > ## to be replaced by a faster code > drop.index <- duplicated(nodes) > n.unique <- nodes[!drop.index, ] > w.unique <- numeric(length(n.unique[,1])) > > lw <- length(weights) > for (i in seq_along(w.unique)){ > index <-as.logical(2==rowSums(nodes==matrix(rep(n.unique[i,],lw),byrow> = TRUE, nrow=lw))) > w.unique[i] <- sum(weights[index]) > } > ## > > n.unique > w.unique > > > > > > ^ > | X > | / > | /eiser, Constantin > | / Gutenberg University of Mainz, Germany > | * /\ / Chair of Statistics & Econometrics > | \ / \ / Jakob-Welder-Weg 4, 55128 Mainz > | \/ \/ House of Law and Economics II, Room 00-116 > | Tel: 0049 6131 39 22715 > +--------------------------------------------------------->[[alternative HTML version deleted]]
Hello, Would around two orders of magnitude interess? f1 <- function(Nodes, Weights){ drop.index <- duplicated(Nodes) n.unique <- Nodes[!drop.index, ] w.unique <- numeric(length(n.unique[,1])) lw <- length(Weights) for (i in seq_along(w.unique)){ index <- as.logical(2 == rowSums( Nodes == matrix(rep(n.unique[i,],lw), byrow=TRUE, nrow=lw))) w.unique[i] <- sum(Weights[index]) } list(n.unique=n.unique, w.unique=w.unique) } f2 <- function(Nodes, Weights){ rows <- paste(Nodes[,1], Nodes[,2], sep=".") w.uniq <- tapply(Weights, rows, sum) attributes(w.uniq) <- NULL ord <- order(unique(rows)) list(n.unique=unique(Nodes), w.unique=w.uniq[order(ord)]) } # Test it M <- 100 # see text below n <- 2e5 set.seed(1234) nd <- matrix(sample(M, n*2, TRUE), n, 2) ww <- rep(1, n) t1 <- system.time(r1 <- f1(nd, ww)) t2 <- system.time(r2 <- f2(nd, ww)) identical(r1, r2) print(rbind(t1=t1, t2=t2, ratio=t1/t2), digits=3) user.self sys.self elapsed user.child sys.child t1 310.41 67 379.07 NA NA t2 5.59 0 5.62 NA NA ratio 55.53 Inf 67.45 NA NA With bigger M the number of pairwise combinations increases and so does the number of unique rows. This causes the time taken by f1 to really increase, but f2 scales up rather slowly. The ratio above becomes really better and better. Hope this helps, Rui Barradas Em 28-06-2012 14:06, Weiser, Constantin escreveu:> Hi, all together. I have - a maybe trivial - problem with aggregating a > list of weights. > > Here is the problem: > - At first I have set of nodes (X/Y-coordinates) and associated weights, > where the set > of nodes is typically not unique > - I want to get a set of unique nodes and the sum of associated weights > > I am grateful for any help > > > See for example: > > # weights: > w <- c(1, 1, 1, 1, 1) > > # not unique set of nodes (X/Y-coordinates): > nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T) > > > desired Result: > > #nodes > [,1] [,2] > [1,] 1 2 > [2,] 3 4 > [3,] 5 6 > [4,] 1 4 > > > > #weights > 2 1 1 1 > > > > That is my solution, but it is very slow (typical size of nodes --> > 200000x2): > > weights <- c(1, 1, 1, 1, 1) > nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T) > > > ## to be replaced by a faster code > drop.index <- duplicated(nodes) > n.unique <- nodes[!drop.index, ] > w.unique <- numeric(length(n.unique[,1])) > > lw <- length(weights) > for (i in seq_along(w.unique)){ > index <- as.logical(2==rowSums(nodes==matrix(rep(n.unique[i,],lw),byrow > = TRUE, nrow=lw))) > w.unique[i] <- sum(weights[index]) > } > ## > > n.unique > w.unique > > > > > > ^ > | X > | / > | /eiser, Constantin > | / Gutenberg University of Mainz, Germany > | * /\ / Chair of Statistics & Econometrics > | \ / \ / Jakob-Welder-Weg 4, 55128 Mainz > | \/ \/ House of Law and Economics II, Room 00-116 > | Tel: 0049 6131 39 22715 > +---------------------------------------------------------> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Maybe Matching Threads
- Using missing() in a S4 method with extra arguments
- Problem with documentation of user-defined operator (S4 method)
- Package with multiple shared libraries
- How to find out if a data frame has automatic row names?
- [R] Sweave stops when opening X11 device fails