thr3ads.net - R help - [R] A query about na.omit [Apr 2009]

Jose Iparraguirre D'Elia

2009-Apr-01 15:49 UTC

[R] A query about na.omit

Dear all,
 
Say I have the following dataset:
 > DF
       x     y     z
 [1]   1     1     1
 [2]   2     2     2
 [3]   3     3    NA
 [4]   4    NA     4
 [5]  NA     5     5
 
And I want to omit every row that has an NA in column x or y (ignoring NAs in other
columns), so that I get:
 
 x  y  z
1  1  1
2  2  2
3  3  NA
 
If I use na.omit(DF), I also delete the row where z is NA, thus obtaining
 
x y z
1 1 1
2 2 2
 
But this is not what I want, of course. 
If I use na.omit(DF[,1:2]), then I obtain
 
x y 
1 1
2 2
3 3
 
which is correct for columns x and y, but I lose the corresponding values
of z (i.e., 1, 2, NA).
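For anyone who wants to reproduce the problem, here is a minimal sketch. It rebuilds DF with data.frame(), using the values from the listing above, and runs both na.omit() calls. The object names all_cols and xy_only are only illustrative.

```r
# Rebuild the data frame from the question (values copied from the listing).
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# na.omit() drops any row with an NA in *any* column,
# so row 3 (where only z is NA) disappears as well.
all_cols <- na.omit(DF)
print(all_cols)

# Dropping z first keeps the right rows (1, 2, 3),
# but the z values are no longer part of the result.
xy_only <- na.omit(DF[, 1:2])
print(xy_only)
```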
 
Any suggestions on how to obtain the desired result efficiently? The actual
dataset has millions of records and almost 50 columns, and I would apply the
procedure to 12 of these columns.
 
Sincerely,
 
Jose Luis 
 
Jose Luis Iparraguirre
Senior Research Economist 
Economic Research Institute of Northern Ireland
 
 

	[[alternative HTML version deleted]]

(Ted Harding)

2009-Apr-01 17:00 UTC

[R] A query about na.omit

On 01-Apr-09 15:49:40, Jose Iparraguirre D'Elia wrote:
> Dear all,
> Say I have the following dataset:
>  
>> DF
>         x     y     z
> [1]   1     1     1
> [2]   2     2     2
> [3]   3     3    NA
> [4]   4   NA   4
> [5]  NA  5     5
>  
> And I want to omit all the rows which have NA, but only in columns X
> and Y, so that I get:
>  
>  x  y  z
> 1  1  1
> 2  2  2
> 3  3  NA
Roll up your sleeves, and spell out in detail the condition you need:

  DF<-data.frame(x=c(1,2,3,4,NA),y=c(1,2,3,NA,5),z=c(1,2,NA,4,5))
  DF
#    x  y  z
# 1  1  1  1
# 2  2  2  2
# 3  3  3 NA
# 4  4 NA  4
# 5 NA  5  5

  DF[!(is.na(rowSums(DF[,(1:2)]))),]
#   x y  z
# 1 1 1  1
# 2 2 2  2
# 3 3 3 NA
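Ted's one-liner works because rowSums() uses na.rm = FALSE by default. A row's sum is therefore NA whenever any of the selected entries is NA. The sketch below applies the same idea with the columns chosen by name. The vector cols is a hypothetical stand-in for the 12 relevant columns.

Two caveats apply, as far as I can tell:
- rowSums() requires numeric (or logical) columns.
- A row containing both Inf and -Inf sums to NaN. is.na() also treats NaN as missing, so such a row would be dropped.

```r
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Hypothetical: names of the columns that must not contain NA.
cols <- c("x", "y")

# One sum per row; NA propagates because na.rm = FALSE by default.
# DF[cols] (list-style indexing) stays a data frame even if cols has length 1.
row_totals <- rowSums(DF[cols])

# Keep rows whose total is not NA, with all columns (including z) intact.
kept <- DF[!is.na(row_totals), ]
print(kept)
```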

Hoping this helps,
Ted.
> If I use na.omit(DF), I would delete the row for which z=NA, obtaining
> thus
>  
> x y z
> 1 1 1
> 2 2 2
>  
> But this is not what I want, of course. 
> If I use na.omit(DF[,1:2]), then I obtain
>  
> x y 
> 1 1
> 2 2
> 3 3
>  
> which is OK for x and y columns, but I wouldn't get the corresponding
> values for z (ie 1 2 NA)
>  
> Any suggestions about how to obtain the desired results efficiently
> (the actual dataset has millions of records and almost 50 columns, and
> I would apply the procedure on 12 of these columns)?
>  
> Sincerely,
>  
> Jose Luis 
>  
> Jose Luis Iparraguirre
> Senior Research Economist 
> Economic Research Institute of Northern Ireland
>  
>  
> 
>       [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Apr-09                                       Time: 18:00:53
------------------------------ XFMail ------------------------------

Gabor Grothendieck

2009-Apr-01 17:11 UTC

[R] A query about na.omit

First input the data frame:

> Lines <- "x     y     z
+    1     1     1
+    2     2     2
+    3     3    NA
+    4   NA   4
+   NA  5     5"
> DF <- read.table(textConnection(Lines), header = TRUE)
> # Now use complete.cases to get the required rows:
>
> DF[complete.cases(DF[1:2]),]
  x y  z
1 1 1  1
2 2 2  2
3 3 3 NA
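Gabor's complete.cases() call selects the columns by position (DF[1:2]). With about 50 columns, selecting the 12 relevant ones by name is probably less error-prone. The sketch below does this, with key_cols as a hypothetical vector of column names. Unlike the rowSums() approach, complete.cases() also accepts character and factor columns, which the last lines illustrate using a made-up label column.

```r
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Hypothetical: with the real data this would list the 12 relevant names.
key_cols <- c("x", "y")

# complete.cases() returns one TRUE/FALSE per row: TRUE when none of the
# supplied columns is NA. Other columns (here z) are not inspected.
ok <- complete.cases(DF[key_cols])
result <- DF[ok, ]
print(result)

# Non-numeric columns work too (label is an illustrative character column).
DF$label <- c("a", NA, "c", "d", "e")
with_label <- DF[complete.cases(DF[c("x", "label")]), ]
print(with_label)
```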


On Wed, Apr 1, 2009 at 11:49 AM, Jose Iparraguirre D'Elia
<Jose at erini.ac.uk> wrote:
> Dear all,
>
> Say I have the following dataset:
>
>> DF
>         x     y     z
> [1]     1     1     1
> [2]     2     2     2
> [3]     3     3    NA
> [4]     4    NA     4
> [5]    NA     5     5
>
> And I want to omit all the rows which have NA, but only in columns X and Y,
> so that I get:
>
>  x  y  z
> 1  1  1
> 2  2  2
> 3  3  NA
>
> If I use na.omit(DF), I would delete the row for which z=NA, obtaining thus
>
> x y z
> 1 1 1
> 2 2 2
>
> But this is not what I want, of course.
> If I use na.omit(DF[,1:2]), then I obtain
>
> x y
> 1 1
> 2 2
> 3 3
>
> which is OK for x and y columns, but I wouldn't get the corresponding
> values for z (ie 1 2 NA)
>
> Any suggestions about how to obtain the desired results efficiently (the
> actual dataset has millions of records and almost 50 columns, and I would apply
> the procedure on 12 of these columns)?
>
> Sincerely,
>
> Jose Luis
>
> Jose Luis Iparraguirre
> Senior Research Economist
> Economic Research Institute of Northern Ireland

Bernardo Rangel Tura

2009-Apr-01 19:19 UTC

[R] A query about na.omit

On Wed, 2009-04-01 at 16:49 +0100, Jose Iparraguirre D'Elia wrote:
> Dear all,
>  
> Say I have the following dataset:
>  
> > DF
>         x     y     z
> [1]   1     1     1
> [2]   2     2     2
> [3]   3     3    NA
> [4]   4   NA   4
> [5]  NA  5     5
>  
> And I want to omit all the rows which have NA, but only in columns X and Y,
> so that I get:
>  
>  x  y  z
> 1  1  1
> 2  2  2
> 3  3  NA
>  
> If I use na.omit(DF), I would delete the row for which z=NA, obtaining thus
>  
> x y z
> 1 1 1
> 2 2 2
>  
> But this is not what I want, of course. 
> If I use na.omit(DF[,1:2]), then I obtain
>  
> x y 
> 1 1
> 2 2
> 3 3
>  
> which is OK for x and y columns, but I wouldn't get the corresponding
> values for z (ie 1 2 NA)
>  
> Any suggestions about how to obtain the desired results efficiently (the
> actual dataset has millions of records and almost 50 columns, and I would apply
> the procedure on 12 of these columns)?
>  
> Sincerely,
>  
> Jose Luis 
>  
> Jose Luis Iparraguirre
> Senior Research Economist 
> Economic Research Institute of Northern Ireland
>  
Hi Jose Luis,

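I think this script is sufficient for your problem:

# Rebuild the example as a numeric matrix, one record per row
tab <- matrix(c(1,1,1, 2,2,2, 3,3,NA, 4,NA,4, NA,5,5), ncol = 3, byrow = TRUE)
# Keep the rows where neither column 1 nor column 2 is NA
tab[!is.na(tab[, 1]) & !is.na(tab[, 2]), ]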
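Bernardo's condition chains one !is.na() test per column with &. Written out by hand for 12 columns, that expression gets long. The sketch below builds the same condition for any number of columns of a data frame, using base R's Reduce(). The vector cols is a hypothetical list of column names. The last line, rowSums(is.na(...)) == 0, is an alternative formulation of the same test.

```r
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Hypothetical: the columns to screen for NA.
cols <- c("x", "y")

# One logical vector per column: TRUE where the value is present.
present <- lapply(DF[cols], function(col) !is.na(col))

# Combine them with `&`, the n-column analogue of
# !is.na(tab[, 1]) & !is.na(tab[, 2]).
keep <- Reduce(`&`, present)
result <- DF[keep, ]
print(result)

# Alternative: count the NAs in each row and keep rows with none.
keep2 <- rowSums(is.na(DF[cols])) == 0
```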

-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil

Jose Iparraguirre D'Elia

2009-Apr-02 09:53 UTC

[R] A query about na.omit

Mark, Ted, Gabor,

Thanks for all your input.
José

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] 
Sent: 01 April 2009 18:12
To: Jose Iparraguirre D'Elia
Cc: r-help at r-project.org
Subject: Re: [R] A query about na.omit

