Say I have the following data:

testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))

> testDat
   A  B
1  1 NA
2 NA NA
3  3  3

rowSums() with na.rm=TRUE generates the following, which is not desired:

> rowSums(testDat[, c('A', 'B')], na.rm=T)
[1] 1 0 6

rowSums() with na.rm=F generates the following, which is also not desired:

> rowSums(testDat[, c('A', 'B')], na.rm=F)
[1] NA NA  6

I see why this occurs, but what I hope to have returned would be:

[1]  1 NA  6

To get what I want I could do the following, but normally my ideas are bad ideas and there are codified and proper ways to do things.

rr <- numeric(nrow(testDat))
for(i in 1:nrow(testDat)) rr[i] <- if(all(is.na(testDat[i,]))) NA else sum(testDat[i,], na.rm=T)

> rr
[1]  1 NA  6

Is there a "proper" way to do this? In my real data, nrow is over 100,000.

Thanks,
Harold

> sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] MiscPsycho_1.2  lattice_0.17-13 statmod_1.3.6

loaded via a namespace (and not attached):
[1] grid_2.7.2

On 9/24/2008 10:06 AM, Doran, Harold wrote:
> Say I have the following data:
>
> testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
>
>> testDat
>    A  B
> 1  1 NA
> 2 NA NA
> 3  3  3
>
> rowSums() with na.rm=TRUE generates the following, which is not desired:
>
>> rowSums(testDat[, c('A', 'B')], na.rm=T)
> [1] 1 0 6
>
> rowSums() with na.rm=F generates the following, which is also not
> desired:
>
>> rowSums(testDat[, c('A', 'B')], na.rm=F)
> [1] NA NA  6
>
> I see why this occurs, but what I hope to have returned would be:
> [1]  1 NA  6
>
> To get what I want I could do the following, but normally my ideas are
> bad ideas and there are codified and proper ways to do things.
>
> rr <- numeric(nrow(testDat))
> for(i in 1:nrow(testDat)) rr[i] <- if(all(is.na(testDat[i,]))) NA else
> sum(testDat[i,], na.rm=T)
>
>> rr
> [1]  1 NA  6
>
> Is there a "proper" way to do this? In my real data, nrow is over
> 100,000

I don't know if it is "proper", but here is a slightly different way that I find easier to read:

apply(testDat, 1, function(x){
  ifelse(all(is.na(x)), NA, sum(x, na.rm=TRUE))
})
[1]  1 NA  6

Hope this helps,

Chuck

> Thanks,
> Harold
>
>> sessionInfo()
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] MiscPsycho_1.2 lattice_0.17-13 statmod_1.3.6
>
> loaded via a namespace (and not attached):
> [1] grid_2.7.2
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

Try the following:

testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))

ind <- rowSums(is.na(testDat)) == length(testDat)
out <- rowSums(testDat, na.rm = TRUE)
out[ind] <- NA
out

I hope it helps.

Best,
Dimitris

Doran, Harold wrote:
> Say I have the following data:
>
> testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
>
> [...]
>
> I see why this occurs, but what I hope to have returned would be:
> [1]  1 NA  6
>
> [...]

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
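
The vectorised approach above can be wrapped in a small helper for reuse. This is only a sketch: the function name `row_sums_keep_na` is made up for illustration and does not come from the thread. It uses `ncol()` rather than `length()` for the column count, because `length()` of a data frame is its number of columns but `length()` of a matrix is its total number of elements, so the `length()` form would miss all-NA rows if a matrix were passed in.

```r
# Hypothetical helper (the name is an assumption, not from the thread):
# row sums that ignore NAs, except that rows consisting only of NAs
# come back as NA instead of 0.
row_sums_keep_na <- function(x) {
  out <- rowSums(x, na.rm = TRUE)
  # A row is "all missing" when its NA count equals the number of columns
  all_missing <- rowSums(is.na(x)) == ncol(x)
  out[all_missing] <- NA
  out
}

testDat <- data.frame(A = c(1, NA, 3), B = c(NA, NA, 3))
res_df <- row_sums_keep_na(testDat)

# The same data as a plain numeric matrix
testMat <- matrix(c(1, NA, 3, NA, NA, 3), ncol = 2)
res_mat <- row_sums_keep_na(testMat)
```

Both calls should give the values 1, NA, 6 that the original question asks for.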

I guess the fastest way would be:

rs <- rowSums( testDat, na.rm = TRUE )
rs[ which( rowMeans(is.na(testDat)) == 1 ) ] <- NA

since both rowSums and rowMeans are internally coded in C.

Regards, Adai

Doran, Harold wrote:
> Say I have the following data:
>
> testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
>
> [...]
>
> Is there a "proper" way to do this? In my real data, nrow is over
> 100,000
>
> [...]
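
The speed claim above can be checked with a rough timing sketch. The data here are simulated (the 30% missingness rate, the seed, and the object names `big`, `r_apply`, `r_vec` are all assumptions for illustration, not the poster's real data), and the timings will differ by machine and R version, so no figures are quoted.

```r
set.seed(1)   # arbitrary seed, only so the simulated data are reproducible
n <- 100000   # same order of magnitude as the row count in the question

# Simulated stand-in for the real data: two columns, roughly 30% missing each
big <- data.frame(A = ifelse(runif(n) < 0.3, NA, rnorm(n)),
                  B = ifelse(runif(n) < 0.3, NA, rnorm(n)))

# Row-by-row approach: an R function is called once per row
t_apply <- system.time(
  r_apply <- apply(big, 1, function(x) {
    if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
  })
)

# Vectorised approach: two passes over the data in compiled code
t_vec <- system.time({
  r_vec <- rowSums(big, na.rm = TRUE)
  r_vec[rowMeans(is.na(big)) == 1] <- NA
})

# Compare user, system and elapsed times side by side
print(rbind(apply = t_apply[1:3], vectorised = t_vec[1:3]))

# Both approaches should agree on every row, including the all-NA rows
stopifnot(isTRUE(all.equal(unname(r_apply), unname(r_vec))))
```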

On 09/24/2008 09:06 AM Doran, Harold wrote:
> Say I have the following data:
>
> testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
>
> [...]
>
> I see why this occurs, but what I hope to have returned would be:
> [1]  1 NA  6
>
> [...]
>
> Is there a "proper" way to do this? In my real data, nrow is over
> 100,000
>
> Thanks,
> Harold

The behavior you observe is documented in ?rowSums in the Value section:

  If there are no values in a range to be summed over (after removing
  missing values with na.rm = TRUE), that component of the output is set
  to 0 (*Sums) or NA (*Means), consistent with sum and mean.

So:

> sum(c(NA, NA), na.rm = TRUE)
[1] 0

This follows from the definition of the sum of an empty set being 0, which I got burned on myself a while back.

You could feasibly use:

Res <- rowSums(testDat, na.rm = TRUE)
is.na(Res) <- rowSums(is.na(testDat)) == ncol(testDat)

HTH,

Marc Schwartz
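
Two pieces of the reply above may be unfamiliar, so here is a short illustration of each. The variable names (`empty_sum`, `nothing_sum`, `x`) are made up for the example. The first part shows that once na.rm = TRUE has removed every element, sum() is summing nothing and returns 0. The second shows the replacement form `is.na(x) <- ...`, which sets the selected elements to NA and is what the last line of the suggested code relies on.

```r
# Every element is missing, so after removal there is nothing left to add
empty_sum <- sum(c(NA, NA), na.rm = TRUE)
empty_sum      # 0

# Summing a zero-length vector gives the same answer
nothing_sum <- sum(numeric(0))
nothing_sum    # 0

# Replacement form of is.na(): mark chosen positions as missing.
# This has the same effect as x[c(FALSE, TRUE, FALSE)] <- NA
x <- c(10, 20, 30)
is.na(x) <- c(FALSE, TRUE, FALSE)
x              # 10 NA 30
```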