Dear All,

I am new to R. I have a two-column data frame with more than ten thousand rows, something like the one below. I want to add up all duplicated items, e.g. add the three "aa" rows together to get a single row with gene=aa, value=74. How can I do that? Thanks for your help!

gene  value
aa    20
bb    10
cc    9
aa    30
aa    24
dd    100
ee    55
....  ...

Many thanks.

Best Regards,
hon

WONG, Hon-Kit (Stephen)
Cleary Lab, Dept of Pathology
Stanford University
Hi:

There are many ways to do this sort of thing in R; one way is (naming your example data frame d)

    aggregate(value ~ gene, data = d, FUN = sum)
      gene value
    1   aa    74
    2   bb    10
    3   cc     9
    4   dd   100
    5   ee    55

This line uses the formula method of aggregate(), which is available in R 2.11.0 and later.

HTH,
Dennis

On Fri, May 13, 2011 at 4:06 PM, wong, honkit (Stephen) <honkit at stanford.edu> wrote:
> I want to add up all duplicated items [...] How can I
> do that??
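Since there are "many ways" to do this, here is a minimal sketch of two other base-R routes next to aggregate(), using the example data from the original question. The object names (by_aggregate, by_tapply, by_rowsum) are made up for illustration; tapply() and rowsum() are standard base-R functions, but they return a named vector and a one-column matrix rather than a data frame.

```r
# Example data from the original post
d <- data.frame(gene  = c("aa", "bb", "cc", "aa", "aa", "dd", "ee"),
                value = c(20, 10, 9, 30, 24, 100, 55),
                stringsAsFactors = FALSE)

# Formula interface (R >= 2.11.0): returns a data frame, one row per gene
by_aggregate <- aggregate(value ~ gene, data = d, FUN = sum)

# tapply: returns a named vector of sums, one element per gene
by_tapply <- tapply(d$value, d$gene, sum)

# rowsum: returns a one-column matrix whose row names are the genes
by_rowsum <- rowsum(d$value, d$gene)

print(by_aggregate)
print(by_tapply)
print(by_rowsum)
```

All three give the same totals; which one is most convenient probably depends on whether you want a data frame back (aggregate) or just the sums (tapply, rowsum).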
Hi,

On Fri, May 13, 2011 at 7:06 PM, wong, honkit (Stephen) <honkit at stanford.edu> wrote:
> I want to add up all duplicated items [...] How can I
> do that??

In addition to Dennis' suggestion to use the aggregate function, you could look at the plyr or data.table packages.

For instance, as Dennis suggested, let's assume your data is in a data.frame object named `d`:

R> d <- data.frame(gene=c('aa', 'bb', 'cc', 'aa', 'aa', 'dd', 'ee'),
                   value=c(20, 10, 9, 30, 24, 100, 55))

Using data.table:

R> library(data.table)
R> dd <- data.table(d, key='gene') # note this will reorder the data in dd
R> dd[, list(total=sum(value)), by=gene]
     gene total
[1,]   aa    74
[2,]   bb    10
[3,]   cc     9
[4,]   dd   100
[5,]   ee    55

Or using plyr:

R> library(plyr)
R> ddply(idata.frame(d), .(gene), summarize, total=sum(value))
  gene total
1   aa    74
2   bb    10
3   cc     9
4   dd   100
5   ee    55

Note that you don't have to use idata.frame(d) -- you can just do:

R> ddply(d, .(gene), summarize, total=sum(value))

but using idata.frame(d) helps to calculate the result faster, which is especially noticeable for larger data.frames.

Using data.table will likely be faster still (again, more noticeable with larger data.frames), but (for one thing) be aware that the order of the rows in dd will be different from the order in d: they will be ordered by the key column(s).

Also, working with data.table objects is somewhat similar to working with "normal" data.frame objects, but they differ in important ways (e.g. how to index columns using the [] syntax, for starters).

You should go through the plyr tutorial(s) (at: http://had.co.nz/plyr/), or the vignette(s) that come with data.table, for more info/help/use-cases if you plan to go that route.
Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
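To make the two data.table caveats above concrete (row order and [] indexing), here is a small sketch. It assumes a reasonably recent data.table release; indexing behaviour has changed between versions, so treat the comments as describing current behaviour rather than what the 2011 version did. The names v_df, v_dt and v_vec are made up for illustration.

```r
library(data.table)

d <- data.frame(gene  = c("aa", "bb", "cc", "aa", "aa", "dd", "ee"),
                value = c(20, 10, 9, 30, 24, 100, 55),
                stringsAsFactors = FALSE)
dd <- data.table(d, key = "gene")   # setting a key sorts the rows by gene

# Row order: d keeps the input order, dd is ordered by the key column
print(d$gene)
print(dd$gene)

# data.frame: a quoted column name in the second index returns a plain vector
v_df <- d[, "value"]

# data.table: the same call returns a one-column data.table ...
v_dt <- dd[, "value"]

# ... while an unquoted column name is evaluated inside the table
# and returns a plain vector
v_vec <- dd[, value]

str(v_df)
str(v_dt)
str(v_vec)
```

If the original row order matters, one option is to keep a copy of d (or add an explicit row-index column) before setting the key.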
And another approach:

================================================================
library(reshape2)
# c("a", ...) stands in for Hmisc's Cs(a, ...) so no extra package is needed
mydata <- data.frame(aa = sample(c("a", "b", "c", "d", "e"), 10, replace=TRUE),
                     bb = sample(1:10, 10, replace=TRUE))
(m1 <- melt(mydata, id.vars="aa"))   # long format: aa, variable, value
# melt() alone does not add anything up; this cast step (an assumed
# follow-up, not in the original post) sums bb within each level of aa
dcast(m1, aa ~ variable, fun.aggregate=sum)
================================================================

--- On Fri, 5/13/11, wong, honkit (Stephen) <honkit at stanford.edu> wrote:
> From: wong, honkit (Stephen) <honkit at stanford.edu>
> Subject: [R] Adding same items together in data.frame
> To: r-help at r-project.org
> Received: Friday, May 13, 2011, 7:06 PM
>
> I want to add up all duplicated items [...] How can I
> do that??