Dear R users,

I'm a new but already fascinated R user, so please forgive my ignorance. I have a problem; I have read most of the help pages but couldn't find the solution. The problem is as follows.

I have a large data set with 10,000 rows and more than 100 columns, something like:

var1,var2,var3,var4.......var120
-------------------------------------------
12,12,345,657,67,8.....
12,12,345,657,0,8.....
NA,12,345,657,NA,8.....
12,12,NA,657,67,8.....
12,12,345,657,NA,8.....

I would like to select only the rows where none of the variables is NA. I could do something like:

df <- subset(
    df
    , !is.na(var1) & !is.na(var2) &
    !is.na(var3) & !is.na(var4) & !is.na(var5)......................
);

But that would be a very bad solution, because I have more than 100 variables and it would be lengthy code to maintain. It might also be an error-prone programming style. Am I right?

My question is whether there is some smarter way of doing this that would work even if I had 1000 variables.
?complete.cases

On Jan 1, 2008 8:50 PM, Marko Milicic <milicic.marko at gmail.com> wrote:
> I would like to select only rows where all variables are not NA....
>
> my question is if there is some smarter way of doing this which would
> work even if I have 1000 variables???
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
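For readers who want to see what the help page pointed to above leads to, here is a minimal sketch of `complete.cases()` applied to this kind of problem. The data frame below is a made-up five-column stand-in for the poster's data (the real one has more than 100 columns), and the names `keep` and `df_complete` are hypothetical.

```r
# Hypothetical toy data standing in for the poster's 10,000 x 120 data set
df <- data.frame(
  var1 = c(12, 12, NA, 12, 12),
  var2 = c(12, 12, 12, 12, 12),
  var3 = c(345, 345, 345, NA, 345),
  var4 = c(657, 657, 657, 657, 657),
  var5 = c(67, 0, NA, 67, NA)
)

# complete.cases() returns one logical value per row:
# TRUE when the row contains no NA in any column
keep <- complete.cases(df)

# Logical row indexing keeps only the complete rows,
# no matter how many columns the data frame has
df_complete <- df[keep, ]

# To check only some of the columns, pass just those columns
keep_first_two <- complete.cases(df[, c("var1", "var2")])
```

Because the test is computed over whole rows, the same two lines should work unchanged for 100 or 1000 variables; no column names need to be listed.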
You could try

> complete.case.df <- na.omit(df)

Ross Darnell

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Marko Milicic
Sent: Wednesday, 2 January 2008 11:50 AM
To: r-help at r-project.org
Subject: [R] Subsetting data frame problem....

I would like to select only rows where all variables are not NA....

my question is if there is some smarter way of doing this which would
work even if I have 1000 variables???
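As a hedged illustration of the `na.omit()` suggestion, the sketch below runs it on a small made-up data frame (not the poster's actual data). It also reads the `na.action` attribute that `na.omit()` attaches to its result, which records which rows were dropped; the name `dropped` is hypothetical.

```r
# Hypothetical toy data; the real data set has 10,000 rows and 100+ columns
df <- data.frame(
  var1 = c(12, 12, NA, 12, 12),
  var2 = c(12, 12, 12, 12, 12),
  var3 = c(345, 345, 345, NA, 345),
  var4 = c(657, 657, 657, 657, 657),
  var5 = c(67, 0, NA, 67, NA)
)

# na.omit() drops every row that has an NA in any column
complete.case.df <- na.omit(df)

# The positions of the removed rows are stored in the "na.action" attribute
dropped <- attr(complete.case.df, "na.action")

# Original row names are preserved on the rows that remain
remaining <- rownames(complete.case.df)
```

One design difference worth noting: `na.omit()` returns the filtered data frame directly, while `complete.cases()` returns a logical vector that can be reused, for example to inspect the incomplete rows with `df[!complete.cases(df), ]`.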