thr3ads.net - R help - [R] sometimes removing NAs from code [Oct 2011]


Schatzi

2011-Oct-26 15:25 UTC

[R] sometimes removing NAs from code

Sometimes I have NA values within specific columns of a dataframe (in this
example, the first two columns can have NAs). If there are NA values, I
would like them to be removed.

I have been using the code:

y<-c(NA,5,4,2,5,6,NA)
z<-c(NA,3,4,NA,1,3,7)
x<-1:7
adata<-data.frame(y,z,x)
adata<-adata[-which(apply(adata[,1:2],1,function(x)any(is.na(x)))),]

This works well if there are NA values, but when a dataset doesn't have NA
values, this code messes up the data frame (it comes back with zero rows). I
tried to pick apart this code but could not understand why it didn't work
when there were no NA values.


If there are no NA values and I run just the part:
apply(adata[,1:2],1,function(x)any(is.na(x)))
it results in:
    2     3     5     6 
FALSE FALSE FALSE FALSE 

I was thinking that I can put in an if statement, but I think there has to
be a better way.

Any ideas/help? Thank you.

-----
In theory, practice and theory are the same. In practice, they are not - Albert
Einstein
--
View this message in context:
http://r.789695.n4.nabble.com/sometimes-removing-NAs-from-code-tp3941009p3941009.html
Sent from the R help mailing list archive at Nabble.com.

Natalie Van Zuydam

2011-Oct-26 15:36 UTC

Hi,

Why don't you give subset a try:

adata <- subset(adata, !is.na(y) & !is.na(z))

I'm not sure if you want to use AND or OR for this statement.
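A minimal sketch of the subset() approach, run on the data from the original post and on a hypothetical NA-free data frame (clean, made up for illustration). With AND, a row is kept only when both y and z are present, and an NA-free input keeps all its rows:

```r
# Data from the original post
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)

# Keep rows where both y and z are non-missing (AND)
kept <- subset(adata, !is.na(y) & !is.na(z))
print(kept)

# Hypothetical NA-free data frame: every row should survive
clean <- data.frame(y = 1:3, z = 4:6, x = 7:9)
kept_clean <- subset(clean, !is.na(y) & !is.na(z))
print(nrow(kept_clean))
```

Using OR instead (!is.na(y) | !is.na(z)) would keep a row as long as at least one of the two columns is present.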

Best wishes,
Natalie

Marc Schwartz

2011-Oct-26 15:40 UTC

Presuming that you want to remove an entire row if any of the elements in that
row are NAs, see ?na.omit

> na.omit(adata)
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6
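A small sketch, using base R only, of two properties of na.omit() that matter here: it returns every row of an NA-free data frame (so it avoids the zero-row problem), and it checks every column. The copy adata2 and the NA placed in its x column are made up for illustration; they show that an NA outside the first two columns also drops the row.

```r
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)

# na.omit() drops any row with an NA in *any* column
omitted <- na.omit(adata)
print(omitted)

# On a data frame with no NAs, na.omit() returns every row
clean <- data.frame(y = 1:3, z = 4:6, x = 7:9)
print(nrow(na.omit(clean)))

# Hypothetical variant: an NA in column x (outside the first two columns)
# is also enough for na.omit() to drop the row
adata2 <- adata
adata2$x[2] <- NA
print(rownames(na.omit(adata2)))
```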

HTH,

Marc Schwartz

Sarah Goslee

2011-Oct-26 15:50 UTC

Hi,

Thanks for the example. Your problem is caused by the which() call.

If there are NA values, which() returns the row numbers where the NAs are:
> which(apply(adata[,1:2],1,function(x)any(is.na(x))))
[1] 1 4 7
> bdata <- data.frame(1:7, 1:7, 1:7)
> which(apply(bdata[,1:2],1,function(x)any(is.na(x))))
integer(0)

But if there aren't any, which() returns integer(0), a zero-length vector,
and -integer(0) is still integer(0). How does R subset on an empty row
index? Unhelpfully: it selects zero rows.

Fortunately you don't need the which() at all: the logical vector returned by
your apply statement is entirely sufficient (with added negation):
> adata[apply(adata[,1:2],1,function(x)!any(is.na(x))), ]
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6
> bdata[apply(bdata[,1:2],1,function(x)!any(is.na(x))), ]
  X1.7 X1.7.1 X1.7.2
1    1      1      1
2    2      2      2
3    3      3      3
4    4      4      4
5    5      5      5
6    6      6      6
7    7      7      7
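To make the integer(0) pitfall concrete, here is a minimal sketch on a hypothetical NA-free data frame (bdata with made-up column names a, b, c). It compares the -which() idiom with plain logical indexing:

```r
# Hypothetical NA-free data frame, standing in for "a dataset without NAs"
bdata <- data.frame(a = 1:7, b = 1:7, c = 1:7)

has_na <- apply(bdata[, 1:2], 1, function(r) any(is.na(r)))
idx <- which(has_na)
print(idx)                  # integer(0): no TRUEs, so an empty index
print(identical(-idx, idx)) # negating an empty vector changes nothing

# Indexing with an empty vector selects zero rows, not "all but none"
broken <- bdata[-idx, ]
print(nrow(broken))

# Logical indexing avoids the problem
fixed <- bdata[!has_na, ]
print(nrow(fixed))
```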

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

jim holtman

2011-Oct-26 15:53 UTC

?complete.cases
> y<-c(NA,5,4,2,5,6,NA)
> z<-c(NA,3,4,NA,1,3,7)
> x<-1:7
> adata<-data.frame(y,z,x)
> adata
   y  z x
1 NA NA 1
2  5  3 2
3  4  4 3
4  2 NA 4
5  5  1 5
6  6  3 6
7 NA  7 7
> adata[complete.cases(adata),]
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6
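A short sketch of complete.cases() on the thread's data. It assumes you only care about NAs in the first two columns, as in the original question. complete.cases() also accepts several vectors at once, which gives the same result as passing the two-column data frame. The clean data frame is hypothetical.

```r
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)

# complete.cases() returns one TRUE/FALSE per row: TRUE when no NA
ok_all <- complete.cases(adata)
print(ok_all)

# Restrict the check to the first two columns only ...
ok_yz <- complete.cases(adata[, 1:2])
# ... or pass the columns as separate vectors; the result is the same
ok_yz2 <- complete.cases(adata$y, adata$z)
print(identical(ok_yz, ok_yz2))

# Because the result is logical, an NA-free data frame keeps all its rows
clean <- data.frame(y = 1:3, z = 4:6, x = 7:9)
print(nrow(clean[complete.cases(clean[, 1:2]), ]))
```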


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

William Dunlap

2011-Oct-26 15:54 UTC

Instead of
   d[-which(condition)]
use
   d[!condition]
where 'condition' is a logical vector.

which(condition) returns integer(0) (an integer vector
of length 0) if there are no TRUEs in 'condition'.
-integer(0) is identical to integer(0) and d[integer(0)]
means to select zero elements from d.

!condition means to flip the senses of all the TRUEs and
FALSEs (and to leave NAs alone), so d[!condition] returns
the elements of d for which condition is not TRUE (along
with NAs for NAs in condition, but you won't have any
of them in your example).
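A tiny vector example of the NA behaviour described above (d and condition are made-up values): ! leaves an NA in place, the NA index produces an NA element, and wrapping the same flipped condition in which() silently drops that position instead.

```r
d <- c(10, 20, 30, 40)
condition <- c(TRUE, FALSE, NA, FALSE)

# ! flips TRUE/FALSE and leaves NA as NA
flipped <- !condition
print(flipped)            # FALSE TRUE NA TRUE

# An NA in a logical index produces an NA element in the result
print(d[flipped])         # 20 NA 40

# which() treats NA as "not TRUE", so positive indexing through
# which() drops the NA position instead of returning NA
print(d[which(flipped)])  # 20 40
```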

By the way, your use of apply() slows things down and
might lead to errors, because apply() first converts the data frame
to a matrix (coercing all columns to a common type).  Try replacing
  apply(adata[,1:2],1,function(x)any(is.na(x)))
by
  is.na(adata$y) | is.na(adata$z)
or
  rowSums(is.na(adata[,1:2])) > 0
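A sketch, using the data from the original post, that checks the three row-flagging expressions agree. It also shows the matrix conversion inside apply() with a hypothetical data frame (mixed) whose second column is character:

```r
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)

# Three ways to flag rows with an NA in column 1 or 2
via_apply   <- apply(adata[, 1:2], 1, function(r) any(is.na(r)))
via_or      <- is.na(adata$y) | is.na(adata$z)
via_rowsums <- rowSums(is.na(adata[, 1:2])) > 0

# All three give the same TRUE/FALSE pattern (names stripped to compare)
print(identical(unname(via_apply), via_or))
print(identical(unname(via_rowsums), via_or))

# apply() turns the data frame into a matrix first; with a character
# column present, every row reaches the function as a character vector
mixed <- data.frame(y = c(1, NA), label = c("a", "b"))
print(apply(mixed, 1, function(r) is.character(r)))
```

The vectorised versions work column by column on the data frame, so no such coercion happens.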

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

Schatzi

2011-Oct-26 17:25 UTC

Thank you for the help and explanations. I used the "complete.cases" function
and it is working great.

adata[complete.cases(adata[,1:2]),]
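If this comes up repeatedly, the line above could be wrapped in a small function. drop_na_rows() below is a hypothetical helper, not part of base R; it is a sketch that simply applies complete.cases() to the chosen columns, given as positions or names:

```r
# Hypothetical helper: keep rows with no NA in the chosen columns
drop_na_rows <- function(df, cols = seq_along(df)) {
  df[complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)

# Same rows as adata[complete.cases(adata[,1:2]),]
print(drop_na_rows(adata, 1:2))

# An NA-free data frame comes back with all its rows
clean <- data.frame(y = 1:3, z = 4:6, x = 7:9)
print(nrow(drop_na_rows(clean, 1:2)))

# Column names work as well as positions
print(nrow(drop_na_rows(adata, c("y", "z"))))
```

The drop = FALSE arguments keep a one-column selection as a data frame, so the helper also behaves when cols names a single column.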



--
View this message in context:
http://r.789695.n4.nabble.com/sometimes-removing-NAs-from-code-tp3941009p3941431.html
Sent from the R help mailing list archive at Nabble.com.
