Marna Wagley
2021-Apr-22 23:45 UTC
[R] Conditional extraction of values in a data.frame in r
Hi R Users,

I have been struggling to extract data based on conditional values in different columns. I have a very big dataset (many rows) and a couple of columns. Here is an example of the dataset:

daT <- structure(list(
  ID = c("id1", "id2", "id3", "id4", "id5", "id6", "id7"),
  First_detectiondate = c("7/21/2015", "5/19/2015", "5/27/2015", NA, "9/25/2015", NA, NA),
  Second_detectiondate = c(NA, NA, "6/1/2015", "5/29/2015", NA, NA, "4/17/2015"),
  third_detectiondate = c(NA, "5/21/2015", "6/20/2015", NA, NA, "", NA)
), class = "data.frame", row.names = c(NA, -7L))
head(daT)

I want to apply a condition such as: if any of columns 2, 3, or 4 has a date, get the latest of those dates. If there is no date, put NA. I am looking for output like the following table:

output <- structure(list(
  ID = c("id1", "id2", "id3", "id4", "id5", "id6", "id7"),
  First_detectiondate = c("7/21/2015", "5/19/2015", "5/27/2015", NA, "9/25/2015", NA, NA),
  Second_detectiondate = c(NA, NA, "6/1/2015", "5/29/2015", NA, NA, "4/17/2015"),
  third_detectiondate = c(NA, "5/21/2015", "6/20/2015", NA, NA, "", NA),
  output1 = c("7/21/2015", "5/21/2015", "6/20/2015", "5/29/2015", "9/25/2015", NA, "4/17/2015")
), class = "data.frame", row.names = c(NA, -7L))
head(output)

Is there a way to get a table similar to the table "output"?

Thank you very much for your help.

Sincerely,
MW
Hi Marna,

This may be what you want:

get_latest <- function(x, format = "%m/%d/%Y") {
  x <- unlist(x)
  # treat empty strings as missing values
  x[nchar(x) == 0] <- NA
  # no dates at all in this row: return NA
  if (all(is.na(x))) return(NA)
  # parse the character dates, take the latest, and return it as character
  else return(format(max(as.Date(x, format), na.rm = TRUE), format))
}
# apply the function row by row to columns 2 to 4
daT$output1 <- apply(daT[, 2:4], 1, get_latest)

The empty value in daT gave a bit of trouble. I have written it to accept and return character dates.

Jim
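For a very large number of rows, a column-wise approach may be faster than calling a function on every row with apply(). The following is only a sketch of one possible alternative, not part of the replies above: it assumes all dates use the "%m/%d/%Y" format, relies on as.Date() returning NA for the empty string when a format is supplied, and uses pmax() with na.rm = TRUE to take the row-wise maximum across the three columns. The names date_cols, parsed, latest, and latest_date are made up for illustration.

```r
# Example data from the original post
daT <- structure(list(
  ID = c("id1", "id2", "id3", "id4", "id5", "id6", "id7"),
  First_detectiondate = c("7/21/2015", "5/19/2015", "5/27/2015", NA, "9/25/2015", NA, NA),
  Second_detectiondate = c(NA, NA, "6/1/2015", "5/29/2015", NA, NA, "4/17/2015"),
  third_detectiondate = c(NA, "5/21/2015", "6/20/2015", NA, NA, "", NA)
), class = "data.frame", row.names = c(NA, -7L))

# Columns holding the character dates (illustrative variable name)
date_cols <- c("First_detectiondate", "Second_detectiondate", "third_detectiondate")

# Parse each column once; "" and NA both become NA dates
parsed <- lapply(daT[date_cols], as.Date, format = "%m/%d/%Y")

# Row-wise latest date across the columns; rows with no date stay NA
latest <- do.call(pmax, c(unname(parsed), na.rm = TRUE))

# Keep a proper Date column and a character version
daT$latest_date <- latest
daT$output1 <- format(latest, "%m/%d/%Y")
```

Note that format() with "%m" and "%d" zero-pads the result, so the first row comes out as "07/21/2015" rather than "7/21/2015". Keeping the Date column (latest_date here) avoids that issue and also sorts correctly.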