Christoph Lehmann
2005-Feb-02 12:49 UTC
[Fwd: Re: [R] vectorization of a data-aggregation loop]
great! many thanks, Phil

Cheers
christoph

Phil Spector wrote:
> Christoph -
>    I think reshape is the function you're looking for:
>
>> tt <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
> + c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c', 'c',
> + 'a', 'b', 'a', 'b', 'c')))
>> reshape(aggregate(as.numeric(tt$iwv),list(id=tt$id,type=tt$type),sum),idvar="id",timevar="type",direction="wide")
>>
>   id x.a x.b x.c
> 1  1   6  13   6
> 2  2  10  NA   7
> 3  3   9  14   7
>
>                                        - Phil Spector
>                                          Statistical Computing Facility
>                                          Department of Statistics
>                                          UC Berkeley
>                                          spector at stat.berkeley.edu
>
>
> On Tue, 1 Feb 2005, Christoph Lehmann wrote:
>
>> Hi
>> I have a simple question:
>>
>> the following data.frame
>>
>>    id iwv type
>> 1   1   1    a
>> 2   1   2    b
>> 3   1  11    b
>> 4   1   5    a
>> 5   1   6    c
>> 6   2   4    c
>> 7   2   3    c
>> 8   2  10    a
>> 9   3   6    b
>> 10  3   9    a
>> 11  3   8    b
>> 12  3   7    c
>>
>> shall be aggregated into the form:
>>
>>   id t.a t.b t.c
>> 1  1   6  13   6
>> 6  2  10   0   7
>> 9  3   9  14   7
>>
>> This means that for each 'type' (a, b, c) a new column is introduced,
>> which gets the sum of iwv over the respective observations of each 'id'.
>>
>> Of course I can do this transformation/aggregation in a loop (see
>> below), but is there a way to do this more efficiently, e.g. using
>> tapply (or something similar), since I have a lot of rows?
>>
>> thanks for a hint
>>
>> christoph
>>
>> #------------------------------------------------------------------------------
>>
>> # the loop-way
>> t <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
>>      c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c',
>>      'c', 'a', 'b', 'a', 'b', 'c')))
>> names(t) <- c("id", "iwv", "type")
>> t$iwv <- as.numeric(t$iwv)
>> t
>>
>> # define the additional columns (type.a, type.b, type.c)
>> tt <- rep(0, nrow(t) * length(levels(t$type)))
>> dim(tt) <- c(nrow(t), length(levels(t$type)))
>> tt <- data.frame(tt)
>> dimnames(tt)[[2]] <- paste("t.", levels(t$type), sep = "")
>> t <- cbind(t, tt)
>> t
>>
>> obs <- 0
>> obs.previous <- 0
>> row.elim <- rep(FALSE, nrow(t))
>> ta <- which((names(t) == "t.a")) #number of column which codes the first type
>> r.ctr <- 0
>> for (i in 1:nrow(t)){
>>     obs <- t[i,]$id
>>     if (obs == obs.previous) {
>>         row.elim[i] <- TRUE
>>         r.ctr <- r.ctr + 1 #increment
>>         type.col <- as.numeric(t[i,]$type)
>>         t[i - r.ctr, ta - 1 + type.col] <- t[i - r.ctr, ta - 1 +
>>             type.col] + t[i,]$iwv
>>     }
>>     else {
>>         r.ctr <- 0 #record counter
>>         type.col <- as.numeric(t[i,]$type)
>>         t[i, ta - 1 + type.col] <- t[i,]$iwv
>>     }
>>     obs.previous <- obs
>> }
>>
>> t <- t[!row.elim,]
>> t <- subset(t, select = -c(iwv, type))
>> t
>>
>> #------------------------------------------------------------------------------
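
The tapply route asked about in the question can be sketched as follows. This is a minimal sketch and not code from the thread: it assumes the column names id, iwv and type from the question, and it enters the iwv values directly as they appear in the printed data frame. Those printed values look like the factor level codes that as.numeric() returns when cbind() and data.frame() have turned the numbers into a factor, which is also why the same code gives different numbers in R 4.0 and later, where strings are no longer converted to factors by default. The names d, sums and wide are chosen here for illustration only.

```r
# Data as printed in the question (iwv typed in directly as numbers).
d <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = factor(c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c"))
)

# Sum iwv within every id/type cell. tapply returns an id-by-type matrix
# with NA wherever a combination does not occur in the data.
sums <- tapply(d$iwv, list(id = d$id, type = d$type), sum)
sums[is.na(sums)] <- 0   # the question asks for 0 rather than NA

# Assemble the wide data frame and apply the t.a, t.b, t.c column names.
wide <- data.frame(id = as.numeric(rownames(sums)), sums)
names(wide)[-1] <- paste("t.", colnames(sums), sep = "")
wide
```

With this input, wide should match the target table from the question, apart from the row names. If a cross-tabulation is acceptable as the result, xtabs(iwv ~ id + type, data = d) computes the same sums and already reports 0 for empty cells.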