Juliet Hannah
2010-Sep-07 16:00 UTC
[R] average columns of data frame corresponding to replicates
Hi Group,

I have a data frame below. Within this data frame there are samples (columns) that are measured more than once. Samples are indicated by "idx". So "id1" is present in columns 1, 3, and 5. Not every id is repeated. I would like to create a new data frame so that the repeated ids are averaged. For example, in the new data frame, columns 1, 3, and 5 of the original will be replaced by 1 new column that is the mean of these three.

Thanks for any suggestions.

Juliet

myData <- data.frame("sample1.id1" = rep(1,10),
                     "sample1.id2" = rep(2,10),
                     "sample2.id1" = rep(2,10),
                     "sample1.id3" = 1:10,
                     "sample3.id1" = rep(1,10),
                     "sample1.id4" = 1:10,
                     "sample2.id2" = rep(1,10))

repeat_ids <- c("id1","id2")
jim holtman
2010-Sep-09 11:31 UTC
[R] average columns of data frame corresponding to replicates
try this:

> myData
   sample1.id1 sample1.id2 sample2.id1 sample1.id3 sample3.id1 sample1.id4 sample2.id2
1            1           2           2           1           1           1           1
2            1           2           2           2           1           2           1
3            1           2           2           3           1           3           1
4            1           2           2           4           1           4           1
5            1           2           2           5           1           5           1
6            1           2           2           6           1           6           1
7            1           2           2           7           1           7           1
8            1           2           2           8           1           8           1
9            1           2           2           9           1           9           1
10           1           2           2          10           1          10           1
> newData <- NULL
> for (i in repeat_ids){
+     # determine the columns to use
+     colIndx <- grep(paste(i, "$", sep=''), colnames(myData))
+     if (length(colIndx) == 0) next  # make sure it exists
+     # create the average of the columns
+     newData <- cbind(newData, rowMeans(myData[, colIndx], na.rm=TRUE))
+     colnames(newData)[ncol(newData)] <- i  # add the name
+ }
> newData
           id1 id2
 [1,] 1.333333 1.5
 [2,] 1.333333 1.5
 [3,] 1.333333 1.5
 [4,] 1.333333 1.5
 [5,] 1.333333 1.5
 [6,] 1.333333 1.5
 [7,] 1.333333 1.5
 [8,] 1.333333 1.5
 [9,] 1.333333 1.5
[10,] 1.333333 1.5
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
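
The loop in the reply returns a matrix holding only the ids listed in repeat_ids. If the goal is a data frame that also keeps the unrepeated ids (id3 and id4 here), one possible variant is sketched below. It is not from the thread. It assumes the id is whatever follows the last "." in each column name, and the names ids and averaged are made up for illustration. It also uses drop = FALSE so that rowMeans() still receives a data frame when an id matches a single column.

```r
# Example data from the original post.
myData <- data.frame(sample1.id1 = rep(1, 10),
                     sample1.id2 = rep(2, 10),
                     sample2.id1 = rep(2, 10),
                     sample1.id3 = 1:10,
                     sample3.id1 = rep(1, 10),
                     sample1.id4 = 1:10,
                     sample2.id2 = rep(1, 10))

# Assumption: the id is the text after the last "." in the column name,
# e.g. "sample2.id1" -> "id1".
ids <- sub("^.*\\.", "", colnames(myData))

# For each distinct id, average all columns carrying that id.
# drop = FALSE keeps a one-column selection as a data frame,
# so rowMeans() also works for ids that are not repeated.
averaged <- sapply(unique(ids), function(id) {
  rowMeans(myData[, ids == id, drop = FALSE], na.rm = TRUE)
})

# sapply() returns a matrix with one column per id; convert it
# to a data frame as requested in the original question.
newData <- as.data.frame(averaged)
```

With this layout, newData has one column per id in order of first appearance (id1, id2, id3, id4), and no repeat_ids vector has to be maintained by hand. If the column names do not follow the "sampleN.idM" pattern, the sub() pattern would need adjusting.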