Michael Kubovy
2007-Feb-09 02:38 UTC
[R] How to count the number of NAs in each column of a df?
I would like to remove columns of a df which have too many NAs. I think that summary() should give me the information, but I don't know how to access it. Advice?
_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS: P.O.Box 400400 Charlottesville, VA 22904-4400
Parcels: Room 102 Gilmer Hall, McCormick Road, Charlottesville, VA 22903
Office: B011 +1-434-982-4729
Lab: B019 +1-434-982-4751
Fax: +1-434-982-4766
WWW: http://www.people.virginia.edu/~mk9y/
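For readers following the question: summary() shows NA counts in its printed output, but they are awkward to extract programmatically. A minimal sketch of a more direct route is below, assuming a small made-up data frame (the name `df` and its contents are illustrative only, not the poster's data).

```r
# Hypothetical example data standing in for the poster's data frame.
df <- data.frame(x = c(1, NA, 3),
                 y = c(NA, NA, "a"),
                 z = 1:3)

# is.na() on a data frame returns a logical matrix of the same shape;
# colSums() then counts the TRUE entries (the NAs) in each column.
na_counts <- colSums(is.na(df))
na_counts
```

The result is a named numeric vector with one entry per column, so it can be compared against a threshold and used directly as a column index.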
Richard M. Heiberger
2007-Feb-09 03:16 UTC
[R] How to count the number of NAs in each column of a df?
# Keep only the columns of mydf that contain fewer than k NAs.
drop.col.kna <- function(mydf, k)
  mydf[sapply(mydf, function(x) sum(is.na(x))) < k]

# Test data: columns A, B, C, D contain 3, 4, 1, and 0 NAs respectively.
tmp <- data.frame(matrix(1:24, 6, 4, dimnames=list(letters[1:6], LETTERS[1:4])))
tmp[1:3,1] <- NA
tmp[2:5,2] <- NA
tmp[6,3] <- NA

drop.col.kna(tmp, 0)
drop.col.kna(tmp, 1)
drop.col.kna(tmp, 2)
drop.col.kna(tmp, 3)
drop.col.kna(tmp, 4)
drop.col.kna(tmp, 5)
drop.col.kna(tmp, 6)
Chuck Cleland
2007-Feb-09 09:40 UTC
[R] How to count the number of NAs in each column of a df?
Possibly simpler (it does not require a new function definition and seems highly intuitive) might be something like this:

tmp.dropna <- tmp[,colSums(is.na(tmp)) < 2]
tmp.dropna
   C  D
a 13 19
b 14 20
c 15 21
d 16 22
e 17 23
f NA 24

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
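One caveat with the `tmp[, condition]` form may be worth illustrating: when exactly one column passes the test, matrix-style indexing on a data frame returns a plain vector unless drop = FALSE is given. The sketch below rebuilds the test data from earlier in the thread; the variable names `keep`, `as_vector`, and `as_df` are illustrative only.

```r
# Rebuild the test data frame used earlier in the thread.
tmp <- data.frame(matrix(1:24, 6, 4,
                         dimnames = list(letters[1:6], LETTERS[1:4])))
tmp[1:3, 1] <- NA
tmp[2:5, 2] <- NA
tmp[6, 3] <- NA

# Only column D has zero NAs, so exactly one column passes this test.
keep <- colSums(is.na(tmp)) < 1

as_vector <- tmp[, keep]                # collapses to a plain vector
as_df     <- tmp[, keep, drop = FALSE]  # stays a one-column data frame

is.data.frame(as_vector)
is.data.frame(as_df)
```

List-style indexing, `tmp[keep]`, which is the form used in drop.col.kna, also keeps the data frame structure.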
Michael Kubovy
2007-Feb-09 15:43 UTC
[R] How to count the number of NAs in each column of a df?
Dear Jim (25 minutes!), Richard (27 minutes!), and Chuck,

Thanks to your hints, I have come up with what I hope is a pithy idiom that drops columns of a dataframe (df) in which the number of NAs is > (e.g.) 30.

tmp <- df
# Keep the columns with at most 30 NAs, i.e. drop those with more than 30.
tmp <- tmp[, which(as.numeric(colSums(is.na(tmp))) <= 30)]
df <- tmp

I wonder if we have a place to keep R programming idioms (which probably get unnecessarily reinvented). Is the R-Wiki suitable?
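The stated goal, dropping columns with more than some number of NAs, amounts to keeping the columns whose NA count is at or below the threshold. A minimal sketch of that as a reusable helper follows; the function name `drop_na_cols`, its argument `max_na`, and the example data are hypothetical and do not come from the thread.

```r
# Hypothetical helper: keep the columns of d with at most max_na NAs.
# drop = FALSE keeps the result a data frame even if one column survives.
drop_na_cols <- function(d, max_na) {
  d[, colSums(is.na(d)) <= max_na, drop = FALSE]
}

# Made-up data: p has 3 NAs, q has 1 NA, r has none.
d <- data.frame(p = c(NA, NA, NA, 4),
                q = c(1, NA, 3, 4),
                r = 1:4)

res <- drop_na_cols(d, max_na = 1)
names(res)
```

With max_na = 1, column p is dropped and q and r are kept. No which() or as.numeric() wrapper is needed here, since the logical vector from the comparison can index columns directly.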