Hi R-helpers!

I have the following dataframe:

firm<-c(rep(1:3,4))
year<-c(rep(2001:2003,4))
X1<-rep(c(10,NA),6)
X2<-rep(c(5,NA,2),4)
data<-data.frame(firm, year,X1,X2)
data

So I want to obtain the same dataframe with a variable X3 that is:

X1, if X2=NA
X2, if X1=NA
X1+X2 if X1 and X2 are not NA

So my final data is

X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
finaldata<-data.frame(firm, year,X1,X2,X3)

I've tried this

finaldata<-ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))

But I got just NA in X3.
Anyone could help me with this?

Thanks in advance,

Cecília (Universidade de Aveiro - Portugal)

On 6/8/2009 1:48 PM, Cecilia Carmo wrote:
> So I want to obtain the same dataframe with a variable X3 that is:
> X1, if X2=NA
> X2, if X1=NA
> X1+X2 if X1 and X2 are not NA
>
> So my final data is
> X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
> finaldata<-data.frame(firm, year,X1,X2,X3)

library(fortunes)
fortune("dog")

firm <- c(rep(1:3, 4))
year <- c(rep(2001:2003, 4))
X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)

mydata <- data.frame(firm, year, X1, X2)

mydata$X3 <- with(mydata,
    ifelse( is.na(X1) & !is.na(X2), X2,
    ifelse(!is.na(X1) &  is.na(X2), X1,
    ifelse(!is.na(X1) & !is.na(X2), X1 + X2, NA))))

mydata
   firm year X1 X2 X3
1     1 2001 10  5 15
2     2 2002 NA NA NA
3     3 2003 10  2 12
4     1 2001 NA  5  5
5     2 2002 10 NA 10
6     3 2003 NA  2  2
7     1 2001 10  5 15
8     2 2002 NA NA NA
9     3 2003 10  2 12
10    1 2001 NA  5  5
11    2 2002 10 NA 10
12    3 2003 NA  2  2

> I've tried this
> finaldata<-ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))
>
> But I got just NA in X3.
> Anyone could help me with this?

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)

On Jun 8, 2009, at 1:48 PM, Cecilia Carmo wrote:
> So I want to obtain the same dataframe with a variable X3 that is:
> X1, if X2=NA
> X2, if X1=NA
> X1+X2 if X1 and X2 are not NA
>
> So my final data is
> X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
> finaldata<-data.frame(firm, year,X1,X2,X3)

data$X3 <- with(data, X1+X2)
# creates values for the valid pairs and NA otherwise
data[is.na(data$X3),"X3"] <- data$X1[is.na(data$X3)]
# NA's replaced with X1 including NA's
data[is.na(data$X3),"X3"] <- data$X2[is.na(data$X3)]
# remaining X1-NA's replaced with X2

> data
   firm year X1 X2 X3
1     1 2001 10  5 15
2     2 2002 NA NA NA
3     3 2003 10  2 12
4     1 2001 NA  5  5
5     2 2002 10 NA 10
6     3 2003 NA  2  2
7     1 2001 10  5 15
8     2 2002 NA NA NA
9     3 2003 10  2 12
10    1 2001 NA  5  5
11    2 2002 10 NA 10
12    3 2003 NA  2  2

> I've tried this
> finaldata<-ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))
> But I got just NA in X3.
> Anyone could help me with this?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

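One way to confirm that a candidate solution gives what was asked for is to compare it against the expected X3 vector from the original question. The sketch below repeats the three-step fill from this message on a fresh data frame and checks the result with all.equal(). The names `df`, `expected`, and `matches` are illustrative and do not appear in the thread.

```r
# Rebuild the example data from the original post.
firm <- rep(1:3, 4)
year <- rep(2001:2003, 4)
X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)
df <- data.frame(firm, year, X1, X2)   # 'df' is an illustrative name

# Expected result, copied from the original question.
expected <- c(15, NA, 12, 5, 10, 2, 15, NA, 12, 5, 10, 2)

# Three-step fill: sum first, then fall back to X1, then to X2.
df$X3 <- with(df, X1 + X2)
df[is.na(df$X3), "X3"] <- df$X1[is.na(df$X3)]
df[is.na(df$X3), "X3"] <- df$X2[is.na(df$X3)]

# all.equal() accepts NA entries as long as they sit in the same positions.
matches <- isTRUE(all.equal(df$X3, expected))
print(matches)
```

The same comparison can be reused for any of the other approaches in the thread by swapping in the code that builds X3.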
On Mon, Jun 8, 2009 at 1:48 PM, Cecilia Carmo <cecilia.carmo@ua.pt> wrote:
> I've tried this
>
> finaldata<-ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))
> But I got just NA in X3.
> Anyone could help me with this?

The problem here is that comparing NA to anything always gives NA, even for NA==NA. To check for NA, you need to use is.na, e.g.

data$X3 <- ifelse( is.na(data$X1), data$X2,
                   ifelse( is.na(data$X2), data$X1, data$X1+data$X2 ))

(you don't need to handle the is.na(X1) & is.na(X2) case specially: when X1 is NA the result is X2, which is itself NA if both are missing)

which you can make more compact using 'with':

data$X3 <- with(data, ifelse( is.na(X1), X2, ifelse( is.na(X2), X1, X1+X2 )))

Hope this helps,

-s

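The behaviour described above is easy to see on a small vector. This is a minimal sketch; the vector `x` and the result names are made up for illustration.

```r
x <- c(10, NA, 3)   # illustrative vector, not from the thread

# Comparing with == propagates the missing value instead of testing for it.
eq_result <- x == NA
na_self   <- NA == NA

# is.na() is the test for missingness and always returns TRUE or FALSE.
na_test <- is.na(x)

# ifelse() returns NA wherever its test is NA, which is why a test built
# on == NA produces a result that is NA everywhere.
all_na <- ifelse(x == NA, "missing", "present")
fixed  <- ifelse(is.na(x), "missing", "present")

print(eq_result)
print(na_test)
print(all_na)
print(fixed)
```

Because every element of `x == NA` is NA, neither branch of ifelse() is ever selected, which matches the "just NA in X3" symptom in the original post.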
I think one of the other good suggestions might have had a typo in it. And, I would like to append an alternate approach that can be generalized to more columns.

In my opinion, nested ifelse() expressions are difficult to read and understand, and therefore difficult to get right. Easier to write one expression for each of your criteria. But do the last one first.

## X1+X2 if X1 and X2 are not NA
data$X3 <- with(data, X1+X2)

## X1, if X2=NA
data$X3[is.na(data$X2)] <- data$X1[is.na(data$X2)]

## X2, if X1=NA
data$X3[is.na(data$X1)] <- data$X2[is.na(data$X1)]

But, what if you had three columns, X1, X2, X3, and wanted X4 to be the sum of the others, excluding NA values (which is essentially what you're doing)?

data$X4 <- apply(data[,c('X1','X2','X3')], 1,
                 function(xr) {if (any(!is.na(xr))) sum(xr,na.rm=TRUE) else NA} )

Example:

firm<-c(rep(1:3,4))
year<-c(rep(2001:2003,4))
X1<-rep(c(10,NA),6)
X2<-rep(c(5,NA,2),4)
X3 <- c(NA,NA,rep(1,9),NA)
data<-data.frame(firm, year,X1,X2,X3)

data$X4 <- apply(data[,c('X1','X2','X3')], 1,
                 function(xr) {if (any(!is.na(xr))) sum(xr,na.rm=TRUE) else NA} )

print(data)

Or, for your case, with just the two columns:

data$X3 <- apply(data[,c('X1','X2')], 1,
                 function(xr) {if (any(!is.na(xr))) sum(xr,na.rm=TRUE) else NA} )

should do it.

-Don

At 6:48 PM +0100 6/8/09, Cecilia Carmo wrote:
>So I want to obtain the same dataframe with a variable X3 that is:
>X1, if X2=NA
>X2, if X1=NA
>X1+X2 if X1 and X2 are not NA
>
>So my final data is
>X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
>finaldata<-data.frame(firm, year,X1,X2,X3)
>
>I've tried this
>finaldata<-ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))
>But I got just NA in X3.
>Anyone could help me with this?

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA

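The row-wise apply() call above can also be expressed with rowSums(), which works on the whole matrix at once. This is an alternative that was not proposed in the thread, sketched under the assumption that all selected columns are numeric. Since rowSums(..., na.rm = TRUE) returns 0 for a row with no observed values, those rows are reset to NA afterwards. The helper name `sum_or_na` and the data frame name `dat` are hypothetical.

```r
# Hypothetical helper: row sums that ignore NA, but stay NA when every
# value in the row is missing.
sum_or_na <- function(df, cols) {
  m <- as.matrix(df[, cols, drop = FALSE])
  total <- rowSums(m, na.rm = TRUE)        # an all-NA row sums to 0 here
  all_missing <- rowSums(!is.na(m)) == 0   # rows with no observed value
  total[all_missing] <- NA
  unname(total)
}

X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)
dat <- data.frame(X1, X2)   # 'dat' is an illustrative name

dat$X3 <- sum_or_na(dat, c("X1", "X2"))
print(dat$X3)
```

Adding more columns only requires extending the `cols` argument, in the same way as the column list passed to apply() above.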
Though I do agree that the way you've written the general case with any/is.na and sum/na.rm is cleaner and clearer because more general, I don't agree at all with what you say about nested ifelse's vs. a series of assignments:

On Mon, Jun 8, 2009 at 3:36 PM, Don MacQueen <macq@llnl.gov> wrote:
> In my opinion, nested ifelse() expressions are difficult to read and
> understand, and therefore difficult to get right.
> Easier to write one expression for each of your criteria. But do the last
> one first

In the ifelse case, it is easy to trace exactly what happens in each case, because all the cases are disjoint. This becomes especially clear if written with a lot of whitespace and proper indentation:

ifelse( is.na(X1),
        X2,                  # the is.na(X1) case
        ifelse( is.na(X2),   # the !is.na(X1) case
                X1,          # the !is.na(X1) & is.na(X2) case
                X1+X2 ))     # the !is.na(X1) & !is.na(X2) case

I suppose it might be clearer for some users at least if you wrote out *all* the cases, even though they're not necessary:

ifelse( is.na(X1),
        ifelse( is.na(X2),   # the is.na(X1) cases
                NA,          # the is.na(X1) & is.na(X2) case
                X2 ),        # the is.na(X1) & !is.na(X2) case
        ifelse( is.na(X2),   # the !is.na(X1) cases
                X1,          # the !is.na(X1) & is.na(X2) case
                X1+X2 ))     # the !is.na(X1) & !is.na(X2) case

On the other hand, with the multiple assignment case, if you're not careful, it's easy to have different statements overwriting each other's results in unintended ways.

For those who've been around programming for a while, they may recall Dijkstra's "goto considered harmful" letter -- which is echoed by functional programming's "assignment considered harmful"!

-s
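The overwriting risk mentioned above can be seen by running the same three assignments in a different order. This is only a sketch on the thread's example vectors; `good`, `bad`, and `nested` are illustrative names.

```r
X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)

# Order used earlier in the thread: sum first, then patch the NA cases.
good <- X1 + X2
good[is.na(X2)] <- X1[is.na(X2)]
good[is.na(X1)] <- X2[is.na(X1)]

# Same statements with the sum moved last: it overwrites both patches.
bad <- rep(NA_real_, length(X1))
bad[is.na(X2)] <- X1[is.na(X2)]
bad[is.na(X1)] <- X2[is.na(X1)]
bad <- X1 + X2

# The nested ifelse() has no ordering to get wrong: each element falls
# into exactly one branch.
nested <- ifelse(is.na(X1), X2, ifelse(is.na(X2), X1, X1 + X2))

print(isTRUE(all.equal(good, nested)))   # do the two correct versions agree?
print(sum(is.na(bad)) - sum(is.na(good)))  # extra NAs caused by the bad order
```

Both styles give the intended answer when written carefully; the difference is that the sequential version depends on statement order while the nested version does not.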