Hello,

I have a data frame with 3 variables of interest: transaction, date (tdate) and time (event_tim).
How could I create a 4th variable (last_trans) that flags the last transaction of the day, for each day?

In SAS I use:

proc sort data=all6;
  by tdate event_tim;
run;

/*Create last transaction flag per day*/
data all6;
  set all6;
  by tdate event_tim;
  last_trans=last.tdate;
run;

Thanks ahead for any suggestions.

--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
Sent from the R help mailing list archive at Nabble.com.
Suppose your data frame is

d <- data.frame(
    stringsAsFactors = FALSE,
    transaction = c("T01", "T02", "T03", "T04", "T05",
                    "T06", "T07", "T08", "T09", "T10"),
    date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
             "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),
    time = c("08:00", "09:00", "10:00", "11:00", "12:00",
             "13:00", "14:00", "15:00", "16:00", "17:00"))

(Convert the date and time to your favorite classes; it doesn't matter here.)

A general way to say whether an item is the last of its group is:

isLastInGroup <- function(...) {
    ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
}
is_last_of_dayA <- with(d, isLastInGroup(date))

If you know your data is sorted by date, you could save a little time for large datasets by using

isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
is_last_of_dayB <- isLastInRun(d$date)

The above d is sorted by date, so you get the same results for both:

> cbind(d, is_last_of_dayA, is_last_of_dayB)
   transaction       date  time is_last_of_dayA is_last_of_dayB
1          T01 2012-10-19 08:00           FALSE           FALSE
2          T02 2012-10-19 09:00           FALSE           FALSE
3          T03 2012-10-19 10:00           FALSE           FALSE
4          T04 2012-10-19 11:00            TRUE            TRUE
5          T05 2012-10-22 12:00            TRUE            TRUE
6          T06 2012-10-23 13:00           FALSE           FALSE
7          T07 2012-10-23 14:00           FALSE           FALSE
8          T08 2012-10-23 15:00           FALSE           FALSE
9          T09 2012-10-23 16:00           FALSE           FALSE
10         T10 2012-10-23 17:00            TRUE            TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of ramoss
> Sent: Friday, October 19, 2012 10:52 AM
> To: r-help at r-project.org
> Subject: [R] Creating a new by variable in a dataframe
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
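
For readers mapping this back to the column names in the original question, here is a minimal sketch that sorts first and then applies the same ave() idea to get a 0/1 flag, similar in spirit to SAS's last.tdate. The data frame below is made up for illustration; only the names all6, tdate, event_tim and last_trans come from the question, and the times are assumed to be zero-padded "HH:MM" strings so that they sort correctly as text.

```r
# Hypothetical example data; only the column names come from the question.
all6 <- data.frame(
  transaction = c("T03", "T01", "T02", "T05", "T04"),
  tdate       = as.Date(c("2012-10-19", "2012-10-19", "2012-10-19",
                          "2012-10-22", "2012-10-22")),
  event_tim   = c("10:00", "08:00", "09:00", "12:30", "11:15"),
  stringsAsFactors = FALSE
)

# Rough equivalent of proc sort: order by date, then by time within date.
all6 <- all6[order(all6$tdate, all6$event_tim), ]

# Within each date, mark the row sitting in the last position.
# seq_len() supplies row positions; the largest position per date is the last row.
is_last <- ave(seq_len(nrow(all6)), all6$tdate,
               FUN = function(i) i == max(i))

# Store as 0/1 integers rather than TRUE/FALSE.
all6$last_trans <- as.integer(is_last)
all6
```

The explicit order() step matters here: the position-based flag is only meaningful once rows are in time order within each date.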
Hi,

Maybe this helps you:

dat1 <- read.table(text="
tdate  event_tim  transaction
1/10/2012   2   14
1/10/2012   4   28
1/10/2012   6   42
1/10/2012   8   14
2/10/2012   6   46
2/10/2012   9   64
2/10/2012   8   71
3/10/2012   3   85
3/10/2012   1   14
3/10/2012   4   28
9/10/2012   5   51
9/10/2012   9   66
9/20/2012  12   84
", sep="", header=TRUE, stringsAsFactors=FALSE)
# Sort by date (still character at this point), then by time within each date
dat2 <- dat1[with(dat1, order(tdate, event_tim)), ]
dat2$tdate <- as.Date(dat2$tdate, format="%m/%d/%Y")
dat3 <- dat2
dat3$last_trans <- NA
library(plyr)
# Keep the last row of each day and copy its transaction value into last_trans
dat4 <- merge(dat3, ddply(dat2, .(tdate), tail, 1))
dat4$last_trans <- dat4$transaction
# Merge back so that all other rows get last_trans = NA
res <- merge(dat4, dat2, all=TRUE)
res
#        tdate event_tim transaction last_trans
#1  2012-01-10         2          14         NA
#2  2012-01-10         4          28         NA
#3  2012-01-10         6          42         NA
#4  2012-01-10         8          14         14
#5  2012-02-10         6          46         NA
#6  2012-02-10         8          71         NA
#7  2012-02-10         9          64         64
#8  2012-03-10         1          14         NA
#9  2012-03-10         3          85         NA
#10 2012-03-10         4          28         28
#11 2012-09-10         5          51         NA
#12 2012-09-10         9          66         66
#13 2012-09-20        12          84         84

----- Original Message -----
From: ramoss <ramine.mossadegh at finra.org>
To: r-help at r-project.org
Cc:
Sent: Friday, October 19, 2012 1:51 PM
Subject: [R] Creating a new by variable in a dataframe
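
One caveat about the code above: it orders tdate while it is still a character column. That happens to work for this sample, but text order is not chronological in general (for example, "10/1/2012" sorts before "9/20/2012" as text). Here is a minimal sketch, on made-up data, that converts to Date first and then flags the last row per day with duplicated(fromLast = TRUE) instead of a merge. Using a logical flag is a choice made for this sketch; the code above stores the transaction value instead.

```r
# Hypothetical data in month/day/year text form, deliberately unsorted.
dat <- data.frame(
  tdate       = c("9/20/2012", "10/1/2012", "9/20/2012", "10/1/2012"),
  event_tim   = c(12, 3, 5, 7),
  transaction = c(84, 85, 51, 66),
  stringsAsFactors = FALSE
)

# As text, the October date sorts ahead of the September one.
sort(unique(dat$tdate))

# Convert to Date first, then sort by date and time.
dat$tdate <- as.Date(dat$tdate, format = "%m/%d/%Y")
dat <- dat[order(dat$tdate, dat$event_tim), ]

# duplicated(fromLast = TRUE) is FALSE only for the last occurrence of each date,
# so negating it marks the last row of every day.
dat$last_trans <- !duplicated(dat$tdate, fromLast = TRUE)
dat
```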
Thanks for all the help, guys. This worked for me:

library(plyr)  # provides arrange() and ddply()
all6 <- arrange(all6, tdate, event_tim)
lt <- ddply(all6, .(tdate), tail, 1)
lt$last_trans <- 'Y'
all6 <- merge(all6, lt, by.x=c("tdate","event_tim"), by.y=c("tdate","event_tim"), all.x=TRUE)

--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782p4646799.html
Sent from the R help mailing list archive at Nabble.com.
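
A note on the merge() call above: transaction is present in both all6 and lt but is not one of the by columns, so the result carries it twice, as transaction.x and transaction.y. Here is a minimal sketch of one way to avoid that by merging only the key columns plus the flag. The data is made up, the name flagged is hypothetical, and the last rows are picked with base R's split() and tail() rather than ddply() so the example runs without plyr.

```r
# Hypothetical data using the column names from the thread.
all6 <- data.frame(
  tdate       = as.Date(c("2012-10-19", "2012-10-19", "2012-10-22")),
  event_tim   = c(8, 11, 12),
  transaction = c(14, 28, 46)
)
all6 <- all6[order(all6$tdate, all6$event_tim), ]

# Last row of each day (base R stand-in for ddply(all6, .(tdate), tail, 1)),
# keeping only the merge keys.
lt <- do.call(rbind, lapply(split(all6, all6$tdate), tail, 1))
lt <- lt[, c("tdate", "event_tim")]
lt$last_trans <- "Y"

# Only tdate and event_tim are shared, so no .x/.y columns appear.
flagged <- merge(all6, lt, by = c("tdate", "event_tim"), all.x = TRUE)

# Optional: turn the NA left by the merge into an explicit "N".
flagged$last_trans[is.na(flagged$last_trans)] <- "N"
flagged
```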
Hi,

In addition to merge(), you can also use join():

dat1 <- read.table(text="
tdate  event_tim  transaction
1/10/2012   2   14
1/10/2012   4   28
1/10/2012   6   42
1/10/2012   8   14
2/10/2012   6   46
2/10/2012   9   64
2/10/2012   8   71
3/10/2012   3   85
3/10/2012   1   14
3/10/2012   4   28
9/10/2012   5   51
9/10/2012   9   66
9/20/2012  12   84
", sep="", header=TRUE, stringsAsFactors=FALSE)
dat2 <- dat1[with(dat1, order(tdate, event_tim)), ]
aggres <- aggregate(dat2[, -1], by=list(tdate=dat2$tdate), tail, 1)
aggres$last_trans <- "Y"
library(plyr)
join(dat2, aggres, by=intersect(names(dat2), names(aggres)), type="full")
#       tdate event_tim transaction last_trans
#1  1/10/2012         2          14       <NA>
#2  1/10/2012         4          28       <NA>
#3  1/10/2012         6          42       <NA>
#4  1/10/2012         8          14          Y
#5  2/10/2012         6          46       <NA>
#6  2/10/2012         8          71       <NA>
#7  2/10/2012         9          64          Y
#8  3/10/2012         1          14       <NA>
#9  3/10/2012         3          85       <NA>
#10 3/10/2012         4          28          Y
#11 9/10/2012         5          51       <NA>
#12 9/10/2012         9          66          Y
#13 9/20/2012        12          84          Y

A.K.
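
If only the flag is needed, the sort and the join can both be skipped. Here is a minimal sketch, assuming event_tim is numeric as in the sample data and that ties within a day are acceptable (every row tied for the latest time gets a "Y"). The data frame is a subset of the rows of dat1; the names dat and day_max are introduced here for illustration.

```r
# A few rows taken from dat1 above, left in their original (unsorted) order.
dat <- data.frame(
  tdate       = c("1/10/2012", "1/10/2012", "2/10/2012", "2/10/2012", "2/10/2012"),
  event_tim   = c(2, 8, 6, 9, 8),
  transaction = c(14, 14, 46, 64, 71),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum event_tim found on that row's date.
day_max <- ave(dat$event_tim, dat$tdate, FUN = max)

# A row is the last transaction of its day when its time equals that maximum.
dat$last_trans <- ifelse(dat$event_tim == day_max, "Y", NA)
dat
```

Because the comparison is done per row against the per-day maximum, the result does not depend on row order.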