Sometimes I have NA values within specific columns of a dataframe (in this example, the first two columns can have NAs). If there are NA values, I would like them to be removed.

I have been using the code:

y<-c(NA,5,4,2,5,6,NA)
z<-c(NA,3,4,NA,1,3,7)
x<-1:7
adata<-data.frame(y,z,x)
adata<-adata[-which(apply(adata[,1:2],1,function(x)any(is.na(x)))),]

This works well if there are NA values, but when a dataset doesn't have NA values, this code messes up the dataframe. I was trying to pick apart this code and could not understand why it didn't work when there were no NA values.

If there are no NA values and I run just the part:

apply(adata[,1:2],1,function(x)any(is.na(x)))

it results in:

    2     3     5     6
FALSE FALSE FALSE FALSE

I was thinking that I can put in an if statement, but I think there has to be a better way.

Any ideas/help? Thank you.

-----
In theory, practice and theory are the same. In practice, they are not - Albert Einstein
--
View this message in context: http://r.789695.n4.nabble.com/sometimes-removing-NAs-from-code-tp3941009p3941009.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

Why don't you give subset a try:

adata <- subset(adata, is.na(z)==FALSE & is.na(y)==FALSE)

I'm not sure if you want to use AND or OR for this statement.

Best wishes,
Natalie

On 26/10/2011 16:25, Schatzi wrote:
> Sometimes I have NA values within specific columns of a dataframe (in this
> example, the first two columns can have NAs). If there are NA values, I
> would like them to be removed.
> [...]
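To make the AND/OR question above concrete, here is a minimal sketch using the adata frame from the original post. The names keep_and and keep_or are only for illustration, and !is.na() is used as shorthand for is.na()==FALSE.

```r
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)

# AND: keep a row only when both y and z are present
keep_and <- subset(adata, !is.na(y) & !is.na(z))

# OR: keep a row when at least one of y or z is present
keep_or <- subset(adata, !is.na(y) | !is.na(z))

rownames(keep_and)  # rows 2, 3, 5, 6
rownames(keep_or)   # rows 2 through 7
```

For the stated goal (drop a row if either of the first two columns is NA), the AND form appears to be the one that matches.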
On Oct 26, 2011, at 10:25 AM, Schatzi wrote:

> Sometimes I have NA values within specific columns of a dataframe (in this
> example, the first two columns can have NAs). If there are NA values, I
> would like them to be removed.
> [...]
> Any ideas/help? Thank you.

Presuming that you want to remove an entire row if any of the elements in that row are NAs, see ?na.omit

> na.omit(adata)
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6

HTH,

Marc Schwartz
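One thing worth checking with na.omit() is that it inspects every column, not just the first two. The sketch below assumes a variant of the example data in which x also contains an NA (that NA is hypothetical, added only to show the difference); all_cols and yz_only are illustrative names.

```r
adata <- data.frame(
  y = c(NA, 5, 4, 2, 5, 6, NA),
  z = c(NA, 3, 4, NA, 1, 3, 7),
  x = c(1, 2, NA, 4, 5, 6, 7)   # hypothetical: NA added in x for illustration
)

# na.omit() looks at every column, so row 3 is dropped because of x
all_cols <- na.omit(adata)

# Restricting the check to y and z keeps row 3
yz_only <- adata[complete.cases(adata[, c("y", "z")]), ]

rownames(all_cols)  # "2" "5" "6"
rownames(yz_only)   # "2" "3" "5" "6"
```

If only certain columns should trigger removal, the check has to be limited to those columns.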
Hi,

On Wed, Oct 26, 2011 at 11:25 AM, Schatzi <adele_thompson at cargill.com> wrote:
> [...]
> adata<-adata[-which(apply(adata[,1:2],1,function(x)any(is.na(x)))),]
>
> This works well if there are NA values, but when a dataset doesn't have NA
> values, this code messes up the dataframe. I was trying to pick apart this
> code and could not understand why it didn't work when there were no NA
> values.

Thanks for the example. Your problem is because of the which() statement. If there are NA values, which() returns the row numbers where the NAs are:

> which(apply(adata[,1:2],1,function(x)any(is.na(x))))
[1] 1 4 7

But if there aren't any, which() returns integer(0), a zero-length vector:

> bdata <- data.frame(1:7, 1:7, 1:7)
> which(apply(bdata[,1:2],1,function(x)any(is.na(x))))
integer(0)

How does R subset on an empty row index? Unhelpfully: negating integer(0) still gives integer(0), so zero rows are selected. Fortunately you don't need the which() at all: the logical vector returned by your apply statement is entirely sufficient (with added negation):

> adata[apply(adata[,1:2],1,function(x)!any(is.na(x))), ]
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6
> bdata[apply(bdata[,1:2],1,function(x)!any(is.na(x))), ]
  X1.7 X1.7.1 X1.7.2
1    1      1      1
2    2      2      2
3    3      3      3
4    4      4      4
5    5      5      5
6    6      6      6
7    7      7      7

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
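The failure mode described above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical frame bdata with no NAs; has_na and idx are illustrative names.

```r
bdata <- data.frame(a = 1:7, b = 1:7, d = 1:7)  # hypothetical frame with no NAs

# Logical vector: TRUE for rows where either of the first two columns is NA
has_na <- apply(bdata[, 1:2], 1, function(r) any(is.na(r)))

idx <- which(has_na)      # integer(0): no rows contain NA
length(idx)               # 0
identical(-idx, idx)      # TRUE: negating an empty vector changes nothing

nrow(bdata[-idx, ])       # 0 rows: every row is lost
nrow(bdata[!has_na, ])    # 7 rows: logical indexing keeps them all
```

The logical form behaves the same whether or not any NAs are present, which is why no if statement is needed.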
?complete.cases

> y<-c(NA,5,4,2,5,6,NA)
> z<-c(NA,3,4,NA,1,3,7)
> x<-1:7
> adata<-data.frame(y,z,x)
> adata
   y  z x
1 NA NA 1
2  5  3 2
3  4  4 3
4  2 NA 4
5  5  1 5
6  6  3 6
7 NA  7 7
> adata[complete.cases(adata),]
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6

On Wed, Oct 26, 2011 at 11:25 AM, Schatzi <adele_thompson at cargill.com> wrote:
> Sometimes I have NA values within specific columns of a dataframe (in this
> example, the first two columns can have NAs). If there are NA values, I
> would like them to be removed.
> [...]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Instead of
   d[-which(condition)]
use
   d[!condition]
where 'condition' is a logical vector.

which(condition) returns integer(0) (an integer vector of length 0) if there are no TRUEs in 'condition'. -integer(0) is identical to integer(0), and d[integer(0)] means to select zero elements from d.

!condition means to flip the senses of all the TRUEs and FALSEs (and to leave NAs alone), so d[!condition] returns the elements of d for which condition is not TRUE (along with NAs for NAs in condition, but you won't have any of them in your example).

By the way, your use of apply() slows things down and might lead to errors. Try replacing
   apply(adata[,1:2],1,function(x)any(is.na(x)))
by
   is.na(adata$y) | is.na(adata$z)
or
   rowSums(is.na(adata[,1:2])) > 0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Schatzi
> Sent: Wednesday, October 26, 2011 8:25 AM
> To: r-help at r-project.org
> Subject: [R] sometimes removing NAs from code
>
> Sometimes I have NA values within specific columns of a dataframe (in this
> example, the first two columns can have NAs). If there are NA values, I
> would like them to be removed.
> [...]
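As a quick check that the two vectorized replacements suggested above agree with the apply() version, here is a minimal sketch on the example data from the original post. The names by_apply, by_or and by_rowsums are illustrative only.

```r
adata <- data.frame(
  y = c(NA, 5, 4, 2, 5, 6, NA),
  z = c(NA, 3, 4, NA, 1, 3, 7),
  x = 1:7
)

by_apply   <- apply(adata[, 1:2], 1, function(r) any(is.na(r)))
by_or      <- is.na(adata$y) | is.na(adata$z)
by_rowsums <- rowSums(is.na(adata[, 1:2])) > 0

# unname() because apply() and rowSums() may carry row names along
all(unname(by_apply) == by_or)
all(unname(by_rowsums) == by_or)

adata[!by_or, ]   # rows 2, 3, 5, 6 remain
```

The is.na() | is.na() form names the columns explicitly, while the rowSums() form scales more easily to many columns; which one reads better probably depends on how many columns are being checked.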
Thank you for the help and explanations. I used the "complete.cases" function and it is working great.

adata[complete.cases(adata[,1:2]),]
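For reuse, the complete.cases() approach can be wrapped in a small function and checked against both situations from the thread (data with and without NAs). This is only a sketch: drop_na_rows is a hypothetical helper name, not a base R function, and no_na is made-up data.

```r
# drop_na_rows is a hypothetical helper name, not part of base R
drop_na_rows <- function(d, cols) {
  # drop = FALSE keeps a data frame even when a single column is selected
  d[complete.cases(d[, cols, drop = FALSE]), , drop = FALSE]
}

with_na <- data.frame(
  y = c(NA, 5, 4, 2, 5, 6, NA),
  z = c(NA, 3, 4, NA, 1, 3, 7),
  x = 1:7
)
no_na <- data.frame(y = 1:7, z = 1:7, x = 1:7)

nrow(drop_na_rows(with_na, 1:2))  # 4
nrow(drop_na_rows(no_na, 1:2))    # 7
```

Because complete.cases() returns a logical vector, the no-NA case leaves the data frame untouched rather than emptying it.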