Hi all,

I have a data frame with duplicate column names, and I want to remove the duplicates by summing, row by row, the columns in each group of duplicates. The data contain lots of NAs.

Data:

    dfrm <- data.frame(a = 1:4, b = 1:4, cc = 1:4, dd = 1:4, ee = 1:4)
    names(dfrm) <- c("a", "a", "b", "b", "b")
    dfrm[3, 1:3] <- NA
    dfrm
       a  a  b b b
    1  1  1  1 1 1
    2  2  2  2 2 2
    3 NA NA NA 3 3
    4  4  4  4 4 4

I did:

    sapply(unique(names(dfrm)), function(x) {
        rowSums(dfrm[, grep(x, names(dfrm)), drop = FALSE])})

which works. However, I want rowSums to be conditional:

1) If there is at least one non-NA value in a row of a group of duplicates, apply rowSums to get the sum of the non-NA values, regardless of any other NAs in that row of the group.
2) If all values in a row of a group of duplicates are NA, I get NA.

For my data dfrm I would get

       a  b
    1  2  3
    2  4  6
    3 NA  6
    4  8 12

I can't simply use na.rm = TRUE or na.rm = FALSE: TRUE turns an all-NA row into 0, and FALSE gives NA as soon as one value is missing.

I tried:

    sapply(unique(names(dfrm)), function(x)
        ifelse(any(!is.na(dfrm[, grep(x, names(dfrm))])),
               rowSums(dfrm[, grep(x, names(dfrm)), drop = FALSE], na.rm = TRUE),
               NA))

and it doesn't work.

Can someone please help me?
Thanks in advance.

--
View this message in context: http://r.789695.n4.nabble.com/conditional-rowsums-in-sapply-tp3526332p3526332.html
Sent from the R help mailing list archive at Nabble.com.
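A note on why the ifelse() attempt above returns only one value per group: any() collapses the whole group to a single TRUE or FALSE, and ifelse() returns a result with the same length as its test. The sketch below illustrates this on the example data; it builds the data frame with check.names = FALSE and sets row 3 of the first three columns to NA, which is an assumption based on the printed table, and it selects the "a" group by exact name match rather than grep().

```r
# Example data as printed in the post (row 3 is NA in the first three columns).
dfrm <- data.frame(a = 1:4, a = 1:4, b = 1:4, b = 1:4, b = 1:4,
                   check.names = FALSE)
dfrm[3, 1:3] <- NA

# any() reduces the whole "a" group to one logical value.
test <- any(!is.na(dfrm[, names(dfrm) == "a"]))
length(test)

# ifelse() returns something shaped like its test, so only the first
# row sum survives instead of one value per row.
sums <- rowSums(dfrm[, names(dfrm) == "a", drop = FALSE], na.rm = TRUE)
attempt <- ifelse(test, sums, NA)
length(attempt)
```

The test therefore has to be made per row, not per group, which is what the replies below do in different ways.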
Assuming that the row entries for the columns with the same name never sum to zero (for instance, they are not all zero), you can try something along the following lines:

    dfrm <- data.frame(a = 1:4, a = 1:4, b = 1:4, b = 1:4, b = 1:4,
                       check.names = FALSE)
    dfrm[3, 1:3] <- NA
    dfrm

    # unlist() names each value <column name><row index>: a1, a2, ..., b4
    vals <- unlist(dfrm)
    # sum all values that share a column name and a row index
    res <- tapply(vals, names(vals), sum, na.rm = TRUE)
    # with na.rm = TRUE an all-NA group sums to 0, so map 0 back to NA
    res[res == 0] <- NA
    as.data.frame(matrix(res, ncol = 2))

I hope it helps.

Best,
Dimitris

On 5/16/2011 4:25 PM, Assu wrote:
> [...]
> I want rowSums conditional:
> 1) if there is at least one value non NA in a row of each group of
> duplicates, apply rowSums to get the value independently of the existence of
> other NA's in the group row.
> 2) if all values in a row of duplicates are NA, I get NA
> [...]
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
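The tapply() approach above relies on the stated assumption that no group sums to zero. A possible variant that does not need that assumption is sketched below. It is only an illustration: the helper name sum_or_na is hypothetical, and grouping by an explicit (row index, column name) pair is a choice made here so that the rows stay in numeric order even with ten or more rows.

```r
dfrm <- data.frame(a = 1:4, a = 1:4, b = 1:4, b = 1:4, b = 1:4,
                   check.names = FALSE)
dfrm[3, 1:3] <- NA

# as.matrix() keeps the duplicated column names.
m <- as.matrix(dfrm)

# Hypothetical helper: NA when every value is missing, otherwise the
# sum of the non-missing values (a genuine sum of 0 stays 0).
sum_or_na <- function(v) if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)

# Group each cell by its row index and its column name.
res <- tapply(as.vector(m),
              list(row = as.vector(row(m)),
                   col = colnames(m)[as.vector(col(m))]),
              sum_or_na)
res
```

The result is a matrix with one row per original row and one column per unique column name; as.data.frame(res) converts it back to a data frame if needed.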
On May 16, 2011, at 10:25 AM, Assu wrote:
> [...]
> I want rowSums conditional:
> 1) if there is at least one value non NA in a row of each group of
> duplicates, apply rowSums to get the value independently of the
> existence of other NA's in the group row.
> 2) if all values in a row of duplicates are NA, I get NA
> [...]
> and it doesn't work.
> Can someone please help me?
> Thanks in advance.

You didn't like the answer I posted last night on SO?

    sapply(unique(names(dfrm)), function(x)
        apply(dfrm[grep(x, names(dfrm))], 1, function(y)
            if ( all(is.na(y)) ) {NA} else { sum(y, na.rm=TRUE) } ) )

--
David.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
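For larger data, the same per-row rule can also be written without apply(), by calling rowSums() twice: once for the totals and once to count the non-missing values in each row. This is a sketch rather than a replacement for the answer above. The function name sum_groups is hypothetical, and it matches column names exactly with == instead of grep(), which is an assumption that matters only when one name is contained in another (for example "a" and "ab").

```r
dfrm <- data.frame(a = 1:4, a = 1:4, b = 1:4, b = 1:4, b = 1:4,
                   check.names = FALSE)
dfrm[3, 1:3] <- NA

# Hypothetical helper: one column of row sums per unique column name.
sum_groups <- function(df) {
  sapply(unique(names(df)), function(nm) {
    block <- as.matrix(df[names(df) == nm])     # exact name match
    total <- rowSums(block, na.rm = TRUE)       # all-NA rows give 0 here
    total[rowSums(!is.na(block)) == 0] <- NA    # so set those rows to NA
    total
  })
}

res <- sum_groups(dfrm)
res
```

Rows that contain at least one non-missing value keep their na.rm = TRUE sum, and only rows where every value in the group is missing become NA.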