HI All, I have a sample of data frame DF1<-read.table(text="name ddate A 2019-10-28 A 2018-01-25 A 2020-01-12 A 2017-10-20 B 2020-11-20 B 2019-10-20 B 2017-05-20 B 2020-01-20 c 2009-10-01 ",header=TRUE) 1. I want sort by name and ddate on decreasing order and the output should like as follow A 2020-01-12 A 2019-01-12 A 2018-01-25 A 2017-10-20 B 2020-11-21 B 2020-11-01 B 2019-10-20 B 2017-05-20 c 2009-10-01 2. Take the top two rows by group( names) and the out put should like A 2020-01-12 A 2019-01-12 B 2020-11-21 B 2020-11-01 c 2009-10-01 3. Within each group (name) get the date difference between the first and second rows dates. If a group has only one row then the difference should be 0 The final out put is Name diff A 365 B 20 C 0 Here is my attempt and have an issue at the sorting DF1$DTime <- as.POSIXct(DF1$ddate , format = "%Y-%m-%d") DF2 <- DF1[order(DF1$name, ((as.Date(DF1$DTime, decreasing = TRUE)))), ] not working Any help? Thank you
## the data you gave us DF1 <- read.table(text="name ddate A 2019-10-28 A 2018-01-25 A 2020-01-12 A 2017-10-20 B 2020-11-20 B 2019-10-20 B 2017-05-20 B 2020-01-20 c 2009-10-01 ", header=TRUE, colClasses=c("character", "POSIXct")) DF1 D2 <- split(DF1, DF1$name) D2 sapply(D2, function(x) { DD <- c(sort(x$ddate, decreasing=TRUE), min(x$ddate)) DD[1]-DD[2] }) ## the data that your intended answer is based on DF1 <- read.table(text="name ddate A 2019-01-12 ## intended value A 2018-01-25 A 2020-01-12 A 2017-10-20 B 2020-11-20 B 2019-10-20 B 2017-05-20 B 2020-01-20 c 2009-10-01 ", header=TRUE, colClasses=c("character", "POSIXct")) DF1 D2 <- split(DF1, DF1$name) D2 sapply(D2, function(x) { DD <- c(sort(x$ddate, decreasing=TRUE), min(x$ddate)) DD[1]-DD[2] }) On Thu, May 14, 2020 at 11:00 PM Val <valkremk at gmail.com> wrote:> HI All, > I have a sample of data frame > DF1<-read.table(text="name ddate > A 2019-10-28 > A 2018-01-25 > A 2020-01-12 > A 2017-10-20 > B 2020-11-20 > B 2019-10-20 > B 2017-05-20 > B 2020-01-20 > c 2009-10-01 ",header=TRUE) > > 1. I want sort by name and ddate on decreasing order and the output > should like as follow > A 2020-01-12 > A 2019-01-12 > A 2018-01-25 > A 2017-10-20 > B 2020-11-21 > B 2020-11-01 > B 2019-10-20 > B 2017-05-20 > c 2009-10-01 > > 2. Take the top two rows by group( names) and the out put should like > A 2020-01-12 > A 2019-01-12 > B 2020-11-21 > B 2020-11-01 > c 2009-10-01 > > 3. Within each group (name) get the date difference between the > first and second rows dates. If a group has only one row then the > difference should be 0 > > The final out put is > Name diff > A 365 > B 20 > C 0 > > Here is my attempt and have an issue at the sorting > DF1$DTime <- as.POSIXct(DF1$ddate , format = "%Y-%m-%d") > DF2 <- DF1[order(DF1$name, ((as.Date(DF1$DTime, decreasing = TRUE)))), ] > > not working > Any help? > > Thank you > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Hi Val, Your problem is keeping the orders straight. You want dates decreasing and names increasing: DF1<-read.table(text="name ddate A 2019-10-28 A 2018-01-25 A 2020-01-12 A 2017-10-20 B 2020-11-20 B 2019-10-20 B 2017-05-20 B 2020-01-20 C 2009-10-01 ",header=TRUE) DF1$Time<-as.POSIXct(DF1$ddate , format = "%Y-%m-%d") # get dates in decreasing order DF2<-DF1[order(DF1$Time,decreasing=TRUE),] # create the final output with the names in increasing order DF3<-data.frame(name=sort(unique(DF2$name))) # create a function that returns the first diff of a vector getdate1diff<-function(x) return(ifelse(length(x)>1,abs(diff(x))[1],0)) # apply the function by the unique names in DF2 DF3$date1diff<-by(DF2$Time,DF2$name,getdate1diff) DF3 Jim On Fri, May 15, 2020 at 1:00 PM Val <valkremk at gmail.com> wrote:> > HI All, > I have a sample of data frame > DF1<-read.table(text="name ddate > A 2019-10-28 > A 2018-01-25 > A 2020-01-12 > A 2017-10-20 > B 2020-11-20 > B 2019-10-20 > B 2017-05-20 > B 2020-01-20 > c 2009-10-01 ",header=TRUE) > > 1. I want sort by name and ddate on decreasing order and the output > should like as follow > A 2020-01-12 > A 2019-01-12 > A 2018-01-25 > A 2017-10-20 > B 2020-11-21 > B 2020-11-01 > B 2019-10-20 > B 2017-05-20 > c 2009-10-01 > > 2. Take the top two rows by group( names) and the out put should like > A 2020-01-12 > A 2019-01-12 > B 2020-11-21 > B 2020-11-01 > c 2009-10-01 > > 3. Within each group (name) get the date difference between the > first and second rows dates. If a group has only one row then the > difference should be 0 > > The final out put is > Name diff > A 365 > B 20 > C 0 > > Here is my attempt and have an issue at the sorting > DF1$DTime <- as.POSIXct(DF1$ddate , format = "%Y-%m-%d") > DF2 <- DF1[order(DF1$name, ((as.Date(DF1$DTime, decreasing = TRUE)))), ] > > not working > Any help? > > Thank you > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Hello, Here is a dplyr solution. arrange() sorts by name and desc(ddate) and top_n keeps the first 2 after grouping by ddate. Then it's a matter of being careful with diff() in the summarise instruction. library(dplyr) DF1 %>% mutate(ddate = as.Date(ddate)) %>% arrange(name, desc(ddate)) %>% group_by(name) %>% top_n(2) %>% summarise(diff = ifelse(n() > 1, diff(rev(ddate)), 0)) #Selecting by ddate ## A tibble: 3 x 2 # name diff # <chr> <dbl> #1 A 76 #2 B 305 #3 c 0 Hope this helps, Rui Barradas ?s 03:58 de 15/05/20, Val escreveu:> HI All, > I have a sample of data frame > DF1<-read.table(text="name ddate > A 2019-10-28 > A 2018-01-25 > A 2020-01-12 > A 2017-10-20 > B 2020-11-20 > B 2019-10-20 > B 2017-05-20 > B 2020-01-20 > c 2009-10-01 ",header=TRUE) > > 1. I want sort by name and ddate on decreasing order and the output > should like as follow > A 2020-01-12 > A 2019-01-12 > A 2018-01-25 > A 2017-10-20 > B 2020-11-21 > B 2020-11-01 > B 2019-10-20 > B 2017-05-20 > c 2009-10-01 > > 2. Take the top two rows by group( names) and the out put should like > A 2020-01-12 > A 2019-01-12 > B 2020-11-21 > B 2020-11-01 > c 2009-10-01 > > 3. Within each group (name) get the date difference between the > first and second rows dates. If a group has only one row then the > difference should be 0 > > The final out put is > Name diff > A 365 > B 20 > C 0 > > Here is my attempt and have an issue at the sorting > DF1$DTime <- as.POSIXct(DF1$ddate , format = "%Y-%m-%d") > DF2 <- DF1[order(DF1$name, ((as.Date(DF1$DTime, decreasing = TRUE)))), ] > > not working > Any help? > > Thank you > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >