Young Cho
2007-May-29  20:37 UTC
[R] summing up column values for unique IDs when multiple IDs exist in data frame
I have data.frames with IDs and multiple columns. Because some of the IDs show up more than once, I need to sum up the column values to create a new data frame with unique IDs.

I hope there is a cheaper way of doing it. Because the data frame is huge, the task takes almost an hour. Thanks so much in advance!

Young

# ------------------------- examples are here and sum.dup.r is at the bottom.

> x = data.frame(ID = c('A','B','C','A'), val=c(0.1,0.001,-0.1,0.2))
> x
  ID    val
1  A  0.100
2  B  0.001
3  C -0.100
4  A  0.200
> sum.dup(x)
  ID    val
1  A  0.300
2  B  0.001
3  C -0.100

sum.dup <- function( x ){

  # rows whose ID has already appeared earlier in the data frame
  d.row = which(duplicated(x$ID))
  if( length(d.row) > 0){
    id = x$ID[d.row]
    com.val = x[-d.row,]
    # for each repeated ID, overwrite val with the sum over all of its rows
    for(i in 1:length(id)){
      s = sum(x$val[ x$ID == id[i] ])
      com.val$val[ com.val$ID == id[i] ] = s
    }
    ix = sort(as.character(com.val[,1]),index.return=T)
    return(com.val[ix$ix,])
  }else{
    ix = sort(as.character(x[,1]),index.return=T)
    return(x[ix$ix,])
  }

}
Seth Falcon
2007-May-29  21:47 UTC
"Young Cho" <young.stat at gmail.com> writes:

> I have data.frame's with IDs and multiple columns. B/c some of IDs
> showed up more than once, I need sum up colum values to creat a new
> dataframe with unique ids.
>
> I hope there are some cheaper ways of doing it... Because the
> dataframe is huge, it takes almost an hour to do the task. Thanks
> so much in advance!

Does this do what you want in a faster way?

    sum_dup <- function(df) {
        idIdx <- split(1:nrow(df), as.character(df$ID))
        whID <- match("ID", names(df))
        colNms <- names(df)[-whID]
        ans <- lapply(colNms, function(cn) {
            unlist(lapply(idIdx, function(x) sum(df[[cn]][x])),
                   use.names=FALSE)
        })
        attributes(ans) <- list(names=colNms,
                                row.names=names(idIdx),
                                class="data.frame")
        ans
    }

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
jim holtman
2007-May-30  00:06 UTC
try this:

> x <- " ID val
+ 1 A 0.100
+ 2 B 0.001
+ 3 C -0.100
+ 4 A 0.200
+ "
> x <- read.table(textConnection(x), header=TRUE)
> (z <- tapply(x$val, x$ID, sum))
     A      B      C
 0.300  0.001 -0.100
> data.frame(ID=names(z), val=z)
  ID    val
A  A  0.300
B  B  0.001
C  C -0.100

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
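The tapply() call above sums a single value column. For a data frame with several value columns, aggregate() applies the same group-and-sum step to each column and returns a data frame directly. The following is only a sketch on the toy data from the thread: the `cnt` column is hypothetical, and all non-ID columns are assumed to be numeric.

```r
# Toy data from the thread, plus a hypothetical second numeric column.
x <- data.frame(ID  = c("A", "B", "C", "A"),
                val = c(0.1, 0.001, -0.1, 0.2),
                cnt = c(1, 2, 3, 4))

# Every column except ID is summed within each ID group.
value_cols <- setdiff(names(x), "ID")

# The name used in the 'by' list ("ID") becomes the name of the
# grouping column in the result; rows come back sorted by group.
agg <- aggregate(x[value_cols], by = list(ID = x$ID), FUN = sum)
print(agg)
```

Whether this is fast enough for a very large data frame would need to be timed on the real data; no benchmark is implied here.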