Hello -- I am trying to merge columns in a data frame based on substring matches in the column names. I would appreciate it if somebody could suggest a faster/cleaner approach (e.g. I would have really liked to avoid the if-else piece, but rowSums does not work without it). Thanks.

data.df <- data.frame(aa=c(1,1,0), bbcc=c(1,0,0), aab=c(0,1,0), aac=c(0,0,1), bbk=c(1,0,1))
col2 <- substr(colnames(data.df),1,2)

col2.uniq <- unique(col2)
names(col2.uniq) <- col2.uniq

data.frame(sapply(col2.uniq, function(col) {
  wcol <- which(col==col2)
  if(length(wcol)>1) {
    tmp <- rowSums(data.df[,wcol])
  } else {
    tmp <- data.df[,wcol]
  }
  as.numeric(tmp>0)
}))

Additional clarification: the problem only arises when a single column is selected from the original data frame. To reproduce it, modify the original example as follows (and recompute col2 and col2.uniq):

data.df <- data.frame(aa=c(1,1,0), cc=c(1,0,0), aab=c(0,1,0), aac=c(0,0,1), bb=c(1,0,1))

And the following seems to work:

data.frame(sapply(col2.uniq, function(col) {
  wcol <- which(col==col2)
  as.numeric(rowSums(data.frame(data.df[,wcol]))>0)
}))

I had to wrap data.df[,wcol] in another data.frame to handle situations where wcol has one element. Is there a better approach?

---- Chuck White <chuckwhite8 at charter.net> wrote:
> Hello -- I am trying to merge columns in a dataframe based on substring
> matches in colnames. [...]

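The single-column problem described above can be reproduced in isolation. This is only a minimal sketch using the modified example data; the names two.cols, one.col and res are introduced here for illustration and are not part of the original post.

```r
# Modified example data from the post above.
data.df <- data.frame(aa = c(1, 1, 0), cc = c(1, 0, 0), aab = c(0, 1, 0),
                      aac = c(0, 0, 1), bb = c(1, 0, 1))

two.cols <- data.df[, c(1, 3)]  # two columns selected: still a data frame
one.col  <- data.df[, 2]        # one column selected: simplified to a plain vector

print(is.data.frame(two.cols))
print(is.data.frame(one.col))
print(dim(one.col))             # a plain vector has no dimensions

# rowSums() needs an object with at least two dimensions, so the
# single-column case raises an error; capture the message instead of stopping.
res <- tryCatch(rowSums(one.col), error = function(e) conditionMessage(e))
print(res)
```

This is why the original code needed either the if-else branch or the extra data.frame() wrapper.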
Yes. Use drop=FALSE, so that a single-column selection stays a data frame:

data.df[,wcol,drop=FALSE]

For an explanation of drop see ?"[.data.frame"

"Chuck White" <chuckwhite8 at charter.net> wrote in message news:20100202212800.O8XBU.681696.root at mp11...
> [...]
> I had to wrap data.df[,wcol] in another data.frame to handle situations
> where wcol had one element. Is there a better approach?
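
For completeness, here is a sketch of the drop=FALSE suggestion applied to the modified example from the thread. The variable name merged is introduced here for illustration; everything else follows the code already posted.

```r
# Modified example data: "cc" and "bb" each match only one column.
data.df <- data.frame(aa = c(1, 1, 0), cc = c(1, 0, 0), aab = c(0, 1, 0),
                      aac = c(0, 0, 1), bb = c(1, 0, 1))

# Group columns by the first two characters of their names.
col2 <- substr(colnames(data.df), 1, 2)
col2.uniq <- unique(col2)
names(col2.uniq) <- col2.uniq

# drop = FALSE keeps a one-column selection as a data frame, so rowSums()
# works without the if-else branch or the extra data.frame() wrapper.
merged <- data.frame(sapply(col2.uniq, function(col) {
  wcol <- which(col == col2)
  as.numeric(rowSums(data.df[, wcol, drop = FALSE]) > 0)
}))
print(merged)
```

Each output column is 1 where any of the matching input columns is non-zero in that row, and 0 otherwise.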