Hi, I have a data frame df with 3 columns. Some rows are NA across all 3 columns. How can I remove the rows that are NA in every column?

df = data.frame(col1=c(1:3,NA,NA,4), col2=c(7:9,NA,NA,NA), col3=c(2:4,NA,NA,4))

Thanks,
Joseph
Learn to use the power and flexibility of R subscripting.

## Warning: untested
apply(df, 1, function(x) any(!is.na(x)))

gives TRUE for every row that is not entirely NA. So put this expression in the first (row) position of a subscript for the data frame:

df[apply(df, 1, function(x) any(!is.na(x))), ]

Cheers,
Bert Gunter
Genentech

-----Original Message-----
From: joseph
Sent: Thursday, February 14, 2008 8:53 PM
To: r-help at r-project.org
Subject: [R] Remove rows with NA across all columns
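Here is a quick check of Bert's expression on Joseph's example data. It is a sketch that was not run in the original thread, and the variable names keep, keep2 and cleaned are illustrative only. It also shows an equivalent form that applies all() to is.na(df), which states the condition "drop rows that are entirely NA" directly.

```r
# Joseph's example data from the question
df <- data.frame(col1 = c(1:3, NA, NA, 4),
                 col2 = c(7:9, NA, NA, NA),
                 col3 = c(2:4, NA, NA, 4))

# Bert's expression: TRUE for each row with at least one non-NA value
keep <- apply(df, 1, function(x) any(!is.na(x)))

# Equivalent form: negate "every value in the row is NA"
keep2 <- !apply(is.na(df), 1, all)

# Use the logical vector as the row subscript; the empty column
# subscript keeps all columns
cleaned <- df[keep, ]
print(cleaned)
```

Rows 4 and 5 are the all-NA rows, so cleaned keeps rows 1, 2, 3 and 6. Row 6 still has an NA in col2; only rows that are NA in every column are dropped.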
There were two queries recently about removing rows or columns that are all NA. Three respondents suggested combinations of apply() with any() or all(). I cringe when I see apply() used unnecessarily: rowSums() and colSums() are much faster and give more readable code. (Two respondents did suggest colSums() for the second query.)

# original small data frame
df <- data.frame(col1=c(1:3,NA,NA,4), col2=c(7:9,NA,NA,NA), col3=c(2:4,NA,NA,4))
system.time( for(i in 1:10^4) temp <- rowSums(is.na(df)) < 3)  # .078   (3 = ncol(df))
system.time( for(i in 1:10^4) temp <- apply(df, 1, function(x) any(!is.na(x))))  # 3.33

# larger data frame: 1000 rows x 100 columns, about 99% NA
x <- matrix(runif(10^5), 10^3)
x[ runif(10^5) < .99 ] <- NA
df2 <- data.frame(x)
system.time( for(i in 1:100) temp <- rowSums(is.na(df2)) < 100)  # .34   (100 = ncol(df2))
system.time( for(i in 1:100) temp <- apply(df2, 1, function(x) any(!is.na(x))))
# note: as first posted, this line timed apply(df, ...) over 1:10^4 again,
# i.e. the small data frame, giving 3.34; no df2 timing for apply() was posted

Tim Hesterberg
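The rowSums() tests above hardcode the column count (3 for df, 100 for df2). A slightly more general sketch compares against ncol() and nrow() instead and drops all-NA rows and all-NA columns in one subscript, which covers both of the recent queries. The extra column col4 is hypothetical, added only so the column case has something to drop.

```r
df <- data.frame(col1 = c(1:3, NA, NA, 4),
                 col2 = c(7:9, NA, NA, NA),
                 col3 = c(2:4, NA, NA, 4))
df$col4 <- NA  # hypothetical all-NA column, for illustration only

na <- is.na(df)  # logical matrix, same shape as df

# A row is kept if its NA count is below the number of columns
row_keep <- rowSums(na) < ncol(df)

# A column is kept if its NA count is below the number of rows
col_keep <- colSums(na) < nrow(df)

# drop = FALSE keeps a data frame even if only one column survives
df_clean <- df[row_keep, col_keep, drop = FALSE]
print(df_clean)
```

Both logical vectors are computed from the original df, so the row and column decisions do not depend on each other.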