Dear all,

Say I have the following dataset:

> DF
      x   y   z
[1]   1   1   1
[2]   2   2   2
[3]   3   3  NA
[4]   4  NA   4
[5]  NA   5   5

And I want to omit all the rows which have NA, but only in columns x and y, so that I get:

 x  y  z
 1  1  1
 2  2  2
 3  3  NA

If I use na.omit(DF), I would also delete the row for which z=NA, thus obtaining

 x  y  z
 1  1  1
 2  2  2

But this is not what I want, of course.
If I use na.omit(DF[,1:2]), then I obtain

 x  y
 1  1
 2  2
 3  3

which is OK for the x and y columns, but I wouldn't get the corresponding values for z (i.e. 1 2 NA).

Any suggestions about how to obtain the desired results efficiently (the actual dataset has millions of records and almost 50 columns, and I would apply the procedure on 12 of these columns)?

Sincerely,

Jose Luis

Jose Luis Iparraguirre
Senior Research Economist
Economic Research Institute of Northern Ireland

On 01-Apr-09 15:49:40, Jose Iparraguirre D'Elia wrote:
> Dear all,
> Say I have the following dataset:
>
>> DF
>       x   y   z
> [1]   1   1   1
> [2]   2   2   2
> [3]   3   3  NA
> [4]   4  NA   4
> [5]  NA   5   5
>
> And I want to omit all the rows which have NA, but only in columns X
> and Y, so that I get:
>
>  x  y  z
>  1  1  1
>  2  2  2
>  3  3  NA

Roll up your sleeves, and spell out in detail the condition you need:

  DF <- data.frame(x=c(1,2,3,4,NA), y=c(1,2,3,NA,5), z=c(1,2,NA,4,5))
  DF
  #    x  y  z
  # 1  1  1  1
  # 2  2  2  2
  # 3  3  3 NA
  # 4  4 NA  4
  # 5 NA  5  5

  DF[!(is.na(rowSums(DF[,(1:2)]))),]
  #   x y  z
  # 1 1 1  1
  # 2 2 2  2
  # 3 3 3 NA

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Apr-09                                       Time: 18:00:53
------------------------------ XFMail ------------------------------

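A note on the rowSums() approach above: it relies on NA propagating through a numeric sum, so it assumes every screened column is numeric. If one of the 12 columns were character, rowSums() would stop with an error. A minimal sketch of a variant that counts missing values instead, assuming a hypothetical extra character column g and an assumed vector of column names cols (neither is in the original post):

```r
# Toy data from the thread, plus a hypothetical character column `g`
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5),
                 g = c("a", "b", "c", NA, "e"),
                 stringsAsFactors = FALSE)

# Columns that must be non-missing (names assumed for illustration)
cols <- c("x", "y", "g")

# is.na() on a data frame returns a logical matrix, so rowSums() counts
# the missing values per row whatever the column types are
keep <- rowSums(is.na(DF[cols])) == 0

# Rows 1 to 3 survive; the NA in z on row 3 is ignored because z is not in `cols`
DF[keep, ]
```

Selecting the columns by name rather than by position is a design choice that should be easier to maintain when the screened columns are scattered across a wide data frame.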
First input the data frame:

> Lines <- "x y z
+ 1 1 1
+ 2 2 2
+ 3 3 NA
+ 4 NA 4
+ NA 5 5"
>
> DF <- read.table(textConnection(Lines), header = TRUE)

> # Now use complete.cases to get the required rows:
>
> DF[complete.cases(DF[1:2]),]
  x y  z
1 1 1  1
2 2 2  2
3 3 3 NA

On Wed, Apr 1, 2009 at 11:49 AM, Jose Iparraguirre D'Elia <Jose at erini.ac.uk> wrote:
> Dear all,
>
> Say I have the following dataset:
>
>> DF
>       x   y   z
> [1]   1   1   1
> [2]   2   2   2
> [3]   3   3  NA
> [4]   4  NA   4
> [5]  NA   5   5
>
> And I want to omit all the rows which have NA, but only in columns X and Y, so that I get:
>
>  x  y  z
>  1  1  1
>  2  2  2
>  3  3  NA

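The original question mentions millions of records and 12 of about 50 columns. The complete.cases() idiom above carries over unchanged to that setting; only the column selection changes. A rough sketch on simulated data, where the size n, the column names v1 to v5 and the vector check are all made up for illustration and do not come from the real dataset:

```r
set.seed(1)
n <- 1e5                                   # illustrative size, not the real data

# Simulated numeric data frame with hypothetical column names v1..v5
big <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(big) <- paste0("v", 1:5)

# Scatter some missing values; NAs in v5 should NOT cause a row to be dropped
big$v1[sample(n, 100)] <- NA
big$v2[sample(n, 100)] <- NA
big$v5[sample(n, 100)] <- NA

# Columns to screen for NA (assumed names; the real case would list 12)
check <- c("v1", "v2")

# Keep every column, but only the rows that are complete in `check`
kept <- big[complete.cases(big[check]), ]
```

On the real data, wrapping the last line in system.time() would be one way to check whether the speed is acceptable.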
On Wed, 2009-04-01 at 16:49 +0100, Jose Iparraguirre D'Elia wrote:
> Dear all,
>
> Say I have the following dataset:
>
> > DF
>       x   y   z
> [1]   1   1   1
> [2]   2   2   2
> [3]   3   3  NA
> [4]   4  NA   4
> [5]  NA   5   5
>
> And I want to omit all the rows which have NA, but only in columns X and Y, so that I get:
>
>  x  y  z
>  1  1  1
>  2  2  2
>  3  3  NA

Hi Jose Luis,

I think this script is sufficient for your problem:

  tab <- matrix(c(1,1,1,2,2,2,3,3,NA,4,NA,4,NA,5,5), ncol=3, byrow=TRUE)
  tab[!is.na(tab[,1]) & !is.na(tab[,2]),]

--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil

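Chaining !is.na() tests with & as above is fine for two columns but gets long for twelve. One possible way to generalise the same idea to a data frame is to build the per-column tests with lapply() and combine them with Reduce(). This is only a sketch; the vector cols is an assumed name, not part of the original script:

```r
# Toy data from the thread, as a data frame rather than a matrix
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Columns to screen for NA (assumed name; extend to the 12 real columns)
cols <- c("x", "y")

# One logical vector per column (TRUE where the value is present),
# then AND them together element-wise
keep <- Reduce(`&`, lapply(DF[cols], function(v) !is.na(v)))

DF[keep, ]

# Sanity check: this selects the same rows as complete.cases()
stopifnot(identical(keep, complete.cases(DF[cols])))
```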
Mark, Ted, Gabor,

Thanks for all your input.

José

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: 01 April 2009 18:12
To: Jose Iparraguirre D'Elia
Cc: r-help at r-project.org
Subject: Re: [R] A query about na.omit

First input the data frame:

> Lines <- "x y z
+ 1 1 1
+ 2 2 2
+ 3 3 NA
+ 4 NA 4
+ NA 5 5"
>
> DF <- read.table(textConnection(Lines), header = TRUE)

> # Now use complete.cases to get the required rows:
>
> DF[complete.cases(DF[1:2]),]
  x y  z
1 1 1  1
2 2 2  2
3 3 3 NA