thr3ads.net - R help - [R] summing up colum values for unique IDs when multiple ID's exist in data frame [May 2007]

Young Cho

2007-May-29 20:37 UTC

[R] summing up colum values for unique IDs when multiple ID's exist in data frame

I have data frames with IDs and multiple columns. Because some of the IDs
show up more than once, I need to sum up the column values to create a new
data frame with unique IDs.

I hope there is a cheaper way of doing it...  Because the data frame is
huge, it takes almost an hour to do the task.  Thanks so much in advance!

Young

# -------------------------  examples are here and sum.dup.r is at the bottom.
> x = data.frame(ID = c('A','B','C','A'), val=c(0.1,0.001,-0.1,0.2))
> x
  ID    val
1  A  0.100
2  B  0.001
3  C -0.100
4  A  0.200
> sum.dup(x)
  ID    val
1  A  0.300
2  B  0.001
3  C -0.100



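sum.dup <- function( x ){

        # rows whose ID has already appeared earlier in the data frame
        d.row = which(duplicated(x$ID))
        if( length(d.row) > 0){
                id = x$ID[d.row]
                # keep only the first occurrence of each ID
                com.val = x[-d.row,]
                for(i in seq_along(id)){
                        # total of val over all rows sharing this ID
                        s = sum(x$val[ x$ID == id[i] ])
                        com.val$val[ com.val$ID == id[i] ] = s
                }
                # order the result by ID
                ix = sort(as.character(com.val[,1]),index.return=TRUE)
                return(com.val[ix$ix,])
        }else{
                # no duplicates: just order by ID
                ix = sort(as.character(x[,1]),index.return=TRUE)
                return(x[ix$ix,])
        }

}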


Seth Falcon

2007-May-29 21:47 UTC


"Young Cho" <young.stat at gmail.com> writes:
> I have data frames with IDs and multiple columns. Because some of the
> IDs show up more than once, I need to sum up the column values to
> create a new data frame with unique IDs.
>
> I hope there is a cheaper way of doing it...  Because the
> data frame is huge, it takes almost an hour to do the task.  Thanks
> so much in advance!

Does this do what you want in a faster way?

sum_dup <- function(df) {
    idIdx <- split(1:nrow(df), as.character(df$ID))
    whID <- match("ID", names(df))
    colNms <- names(df)[-whID]
    ans <- lapply(colNms, function(cn) {
        unlist(lapply(idIdx,
                      function(x) sum(df[[cn]][x])),
               use.names=FALSE)
    })
    attributes(ans) <- list(names=colNms,
                            row.names=names(idIdx),
                            class="data.frame")
    ans
}
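
For readers who want a shorter route to the same per-ID sums, base R's rowsum() may also be worth trying. The following is only a sketch on the toy data from the original post, not something proposed in the thread; the names res and sums are made up for illustration, and it assumes every non-ID column is numeric.

```r
# Toy data from the original post (columns ID and val).
x <- data.frame(ID = c("A", "B", "C", "A"),
                val = c(0.1, 0.001, -0.1, 0.2))

# rowsum() sums each column within the groups given by `group`.
# The ID column is dropped from the data and passed as the grouping vector.
sums <- rowsum(x[setdiff(names(x), "ID")], group = x$ID)

# Group labels come back as row names; move them into an ID column.
res <- data.frame(ID = rownames(sums), sums, row.names = NULL)
print(res)
```

By default rowsum() returns the groups in sorted order, which matches the ordering the original sum.dup() produces.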


-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org

jim holtman

2007-May-30 00:06 UTC


try this:
> x <- " ID    val
+ 1  A  0.100
+ 2  B  0.001
+ 3  C -0.100
+ 4  A  0.200
+ "
> x <- read.table(textConnection(x), header=TRUE)
> (z <- tapply(x$val, x$ID, sum))
     A      B      C
 0.300  0.001 -0.100
> data.frame(ID=names(z), val=z)
  ID    val
A  A  0.300
B  B  0.001
C  C -0.100
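
The tapply() call above sums a single column. Since the original question mentions multiple columns, here is a hedged sketch of one way to extend the idea with aggregate(). The second column, val2, is hypothetical and was added only to show the multi-column case; it does not appear in the thread.

```r
# Toy data from the thread plus a hypothetical second value column (val2).
x <- data.frame(ID = c("A", "B", "C", "A"),
                val = c(0.1, 0.001, -0.1, 0.2),
                val2 = c(1, 2, 3, 4))

# Sum every non-ID column within each ID; the dot stands for "all other columns".
res <- aggregate(. ~ ID, data = x, FUN = sum)
print(res)
```

Note that the formula interface of aggregate() drops rows containing NA by default, so data with missing values may need an explicit na.action.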




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

