Is there any function/way to merge/unite the following data GENEID col1 col2 col3 col4 G234064 1 0 0 0 G234064 1 0 0 0 G234064 1 0 0 0 G234064 0 1 0 0 G234065 0 1 0 0 G234065 0 1 0 0 G234065 0 1 0 0 G234065 0 0 1 0 G234065 0 0 1 0 G234065 0 0 0 1 into GENEID col1 col2 col3 col4 G234064 1 1 0 0 // 1 appears in col1 and col2 above, rest are zero G234065 0 1 1 1 // 1 appears in col2 , 3 and 4 above. Thanks -- Fahim [[alternative HTML version deleted]]
try this:> x <- read.table(textConnection(" GENEID col1 col2 col3 col4+ G234064 1 0 0 0 + G234064 1 0 0 0 + G234064 1 0 0 0 + G234064 0 1 0 0 + G234065 0 1 0 0 + G234065 0 1 0 0 + G234065 0 1 0 0 + G234065 0 0 1 0 + G234065 0 0 1 0 + G234065 0 0 0 1"), header=TRUE, as.is=TRUE)> closeAllConnections() > # split the data and process the columns > do.call(rbind, lapply(split(x, x$GENEID), function(z){+ # 'or' the columns + + colOR <- t(apply(z[-1], 2, any)) + 0L + data.frame(GENEID=z[[1]][1], colOR) + })) GENEID col1 col2 col3 col4 G234064 G234064 1 1 0 0 G234065 G234065 0 1 1 1>On Sat, Jul 24, 2010 at 12:10 AM, Fahim Md <fahim.md at gmail.com> wrote:> Is there any function/way to merge/unite the following data > > ?GENEID ? ? ?col1 ? ? ? ? ?col2 ? ? ? ? ? ? col3 ? ? ? ? ? ? ? ?col4 > ?G234064 ? ? ? ? 1 ? ? ? ? ? ? 0 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234064 ? ? ? ? 1 ? ? ? ? ? ? 0 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234064 ? ? ? ? 1 ? ? ? ? ? ? 0 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234064 ? ? ? ? 0 ? ? ? ? ? ? 1 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 1 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 1 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 1 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 0 ? ? ? ? ? ? ? ? ?1 ? ? ? ? ? ? ? ? ? 0 > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 0 ? ? ? ? ? ? ? ? ?1 ? ? ? ? ? ? ? ? ? 0 > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 0 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 1 > > > into > GENEID ? ? ?col1 ? ? ? ? ?col2 ? ? ? ? ? ? col3 ? ? ? ? ? ? ? ?col4 > ?G234064 ? ? ? ? 1 ? ? ? ? ? ? 1 ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? 0 > // 1 appears in col1 and col2 above, rest are zero > ?G234065 ? ? ? ? 0 ? ? ? ? ? ? 1 ? ? ? ? ? ? ? ? ?1 ? ? ? ? ? ? ? ? ? 1 > // 1 appears in col2 , 3 and 4 above. > > > Thanks > > > > -- > Fahim > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Fahim Md wrote:> Is there any function/way to merge/unite the following data > > GENEID col1 col2 col3 col4 > G234064 1 0 0 0 > G234064 1 0 0 0 > G234064 1 0 0 0 > G234064 0 1 0 0 > G234065 0 1 0 0 > G234065 0 1 0 0 > G234065 0 1 0 0 > G234065 0 0 1 0 > G234065 0 0 1 0 > G234065 0 0 0 1 > > > into > GENEID col1 col2 col3 col4 > G234064 1 1 0 0 > // 1 appears in col1 and col2 above, rest are zero > G234065 0 1 1 1 > // 1 appears in col2 , 3 and 4 above. > > > ThankWarning on terminology: there is a "merge" function in R that lines up rows from different tables to make a new set of longer rows (more columns). The usual term for combining column values from multiple rows is "aggregation". In addition to the example offered by Jim Holtzman, here are some other options in no particular order: x <- read.table(textConnection(" GENEID col1 col2 col3 col4 G234064 1 0 0 0 G234064 1 0 0 0 G234064 1 0 0 0 G234064 0 1 0 0 G234065 0 1 0 0 G234065 0 1 0 0 G234065 0 1 0 0 G234065 0 0 1 0 G234065 0 0 1 0 G234065 0 0 0 1 "), header=TRUE, as.is=TRUE, row.names=NULL) closeAllConnections() # syntactic repackaging of Jim's basic approach library(plyr) ddply( x, .(GENEID), function(df) {with(as.integer(c(col1=any(col1),col2=any(col2),col3=any(col3),col4=any(col4))))} ) # if you are familiar with SQL, this approach allows you merge data frames and aggregate rows in the same step, but has a somewhat limited range of aggregating functions available library(sqldf) sqldf("SELECT GENEID, max(col1) AS col1, max(col2) AS col2, max(col3) AS col3, max(col4) AS col4 FROM x GROUP BY GENEID") # when the data you want to crunch is separate from the key(s) or you can separate them yourself crunch <- function(tmp){as.integer(any(tmp))} aggregate( x[,c("col1","col2","col3","col4")], by=list(x$GENEID), FUN="crunch" ) # this is typically used if you want to aggregate many columns with the same set of operations library(doBy) summaryBy( .~GENEID, data=x, FUN=c(sum,crunch) )