Hello -- I am trying to merge columns in a data frame based on substring matches
in the column names. I would appreciate it if somebody could suggest a
faster/cleaner approach (e.g., I would have really liked to avoid the if-else
piece, but rowSums does not like that). Thanks.

data.df <- data.frame(aa=c(1,1,0), bbcc=c(1,0,0), aab=c(0,1,0), aac=c(0,0,1),
                      bbk=c(1,0,1))
col2 <- substr(colnames(data.df),1,2)

col2.uniq <- unique(col2)
names(col2.uniq) <- col2.uniq

data.frame(sapply(col2.uniq, function(col) {
    wcol <- which(col==col2)
    if(length(wcol)>1) {
        tmp <- rowSums(data.df[,wcol])
    } else {
        tmp <- data.df[,wcol]
    }
    as.numeric(tmp>0)
}))

Additional clarification: the problem arises only when a single column is
selected from the original data frame. To reproduce it, make the following
modification to the original example:

data.df <- data.frame(aa=c(1,1,0), cc=c(1,0,0), aab=c(0,1,0), aac=c(0,0,1),
                      bb=c(1,0,1))

And the following seems to work:

data.frame(sapply(col2.uniq, function(col) {
    wcol <- which(col==col2)
    as.numeric(rowSums(data.frame(data.df[,wcol]))>0)
}))

I had to wrap data.df[,wcol] in another data.frame to handle situations where
wcol had one element. Is there a better approach?
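
To make the single-column case concrete, here is a minimal sketch of what happens when one column is selected. The names small.df, one.col, and res are made up for this illustration and are not part of the example above.

```r
# Hypothetical two-column data frame, only to show the single-column case
small.df <- data.frame(aa=c(1,1,0), cc=c(1,0,0))

# Selecting one column with [, j] returns a plain vector, not a data frame
one.col <- small.df[, 2]
is.data.frame(one.col)   # FALSE
is.null(dim(one.col))    # TRUE: a vector has no dimensions

# rowSums needs at least two dimensions, so it stops with an error here;
# tryCatch captures the error message instead of aborting
res <- tryCatch(rowSums(one.col), error = function(e) conditionMessage(e))

# Wrapping the vector in data.frame() restores two dimensions
rowSums(data.frame(one.col))
```

This is why the if-else branch, or the extra data.frame() wrapper, was needed in the first place.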
Yes. Use

data.df[,wcol,drop=FALSE]

For an explanation of drop, see ?"[.data.frame"

"Chuck White" <chuckwhite8 at charter.net> wrote in message
news:20100202212800.O8XBU.681696.root at mp11...
> I had to wrap data.df[,wcol] in another data.frame to handle situations
> where wcol had one element. Is there a better approach?
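
As a minimal sketch of the drop=FALSE suggestion, here it is applied to the modified example from earlier in the thread (the one where cc and bb each match only one column). The name merged.df is made up for illustration; everything else follows the posted code.

```r
data.df <- data.frame(aa=c(1,1,0), cc=c(1,0,0), aab=c(0,1,0), aac=c(0,0,1),
                      bb=c(1,0,1))

# Group columns by the first two characters of their names
col2 <- substr(colnames(data.df), 1, 2)
col2.uniq <- unique(col2)
names(col2.uniq) <- col2.uniq

merged.df <- data.frame(sapply(col2.uniq, function(col) {
    wcol <- which(col == col2)
    # drop=FALSE keeps the result a data frame even when wcol has one
    # element, so rowSums works without an if-else or an extra wrapper
    as.numeric(rowSums(data.df[, wcol, drop=FALSE]) > 0)
}))
merged.df
#   aa cc bb
# 1  1  1  1
# 2  1  0  0
# 3  1  0  1
```

With drop=FALSE the single-column and multi-column groups go through the same code path, which removes the need for the if-else piece.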