Hello,
I have a data frame with 3 variables of interest: transaction, date (tdate) and
time (event_tim).
How could I create a 4th variable (last_trans) that flags the last
transaction of each day?
In SAS I use:
proc sort data=all6;
  by tdate event_tim;
run;
/*Create last transaction flag per day*/
data all6;
  set all6;
  by tdate event_tim;
  last_trans=last.tdate;
run;
Thanks ahead for any suggestions.
--
View this message in context:
http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
Sent from the R help mailing list archive at Nabble.com.
Suppose your data frame is
d <- data.frame(
    stringsAsFactors = FALSE,
    transaction = c("T01", "T02", "T03", "T04", "T05",
                    "T06", "T07", "T08", "T09", "T10"),
    date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19",
             "2012-10-22", "2012-10-23", "2012-10-23", "2012-10-23",
             "2012-10-23", "2012-10-23"),
    time = c("08:00", "09:00", "10:00", "11:00", "12:00",
             "13:00", "14:00", "15:00", "16:00", "17:00"))
(Convert the date and time to your favorite classes, it doesn't matter
here.)
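If you do want real date and time classes, one possible conversion is sketched below. The two-row data frame and the column name datetime are made up for illustration, and the UTC time zone is an assumption; use whatever zone your data was recorded in.

```r
# Hypothetical two-row example with the same column layout as d
d_small <- data.frame(
    date = c("2012-10-19", "2012-10-19"),
    time = c("08:00", "09:00"),
    stringsAsFactors = FALSE)

# "YYYY-MM-DD" strings are parsed by as.Date() without an explicit format
d_small$date <- as.Date(d_small$date)

# Combine date and time into a single POSIXct time stamp (UTC assumed)
d_small$datetime <- as.POSIXct(paste(d_small$date, d_small$time),
                               format = "%Y-%m-%d %H:%M", tz = "UTC")
```

Sorting on the combined time stamp then orders by date and time in one step.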
A general way to say if an item is the last of its group is:
isLastInGroup <- function(...) ave(logical(length(..1)), ...,
FUN=function(x)seq_along(x)==length(x))
is_last_of_dayA <- with(d, isLastInGroup(date))
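Note that "last" here means last in row order within each group, so to flag the latest transaction of a day the rows should be in time order within the day. A small sketch with made-up, unsorted data (d2 and its values are hypothetical):

```r
isLastInGroup <- function(...) ave(logical(length(..1)), ...,
    FUN = function(x) seq_along(x) == length(x))

# Hypothetical data whose rows are not in time order
d2 <- data.frame(
    date = c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19"),
    time = c("17:00", "08:00", "13:00", "11:00"),
    stringsAsFactors = FALSE)

# Sort by date, then time, so the last row of each day is the latest one
d2 <- d2[order(d2$date, d2$time), ]
d2$last_trans <- isLastInGroup(d2$date)
```

After the sort, the rows flagged TRUE are the 11:00 and 17:00 transactions.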
If you know your data is sorted by date you could save a little time for large
datasets by using
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date so you get the same results for both:
> cbind(d, is_last_of_dayA, is_last_of_dayB)
   transaction       date  time is_last_of_dayA is_last_of_dayB
1          T01 2012-10-19 08:00           FALSE           FALSE
2          T02 2012-10-19 09:00           FALSE           FALSE
3          T03 2012-10-19 10:00           FALSE           FALSE
4          T04 2012-10-19 11:00            TRUE            TRUE
5          T05 2012-10-22 12:00            TRUE            TRUE
6          T06 2012-10-23 13:00           FALSE           FALSE
7          T07 2012-10-23 14:00           FALSE           FALSE
8          T08 2012-10-23 15:00           FALSE           FALSE
9          T09 2012-10-23 16:00           FALSE           FALSE
10         T10 2012-10-23 17:00            TRUE            TRUE
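To see why the sortedness assumption matters, here is a small sketch in which the two functions disagree. The vector dates is made up for illustration; its days are interleaved rather than contiguous.

```r
isLastInGroup <- function(...) ave(logical(length(..1)), ...,
    FUN = function(x) seq_along(x) == length(x))
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical unsorted dates: the two days alternate
dates <- c("2012-10-19", "2012-10-23", "2012-10-19", "2012-10-23")

by_group <- isLastInGroup(dates)  # FALSE FALSE  TRUE  TRUE
by_run   <- isLastInRun(dates)    #  TRUE  TRUE  TRUE  TRUE
```

isLastInRun treats every change of value as the end of a group, so on unsorted input it flags too many rows; isLastInGroup flags only the final occurrence of each date.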
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ramoss
> Sent: Friday, October 19, 2012 10:52 AM
> To: r-help at r-project.org
> Subject: [R] Creating a new by variable in a dataframe
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi,
Maybe this helps you:
dat1<-read.table(text="
tdate  event_tim  transaction
1/10/2012   2   14
1/10/2012   4   28
1/10/2012   6   42
1/10/2012   8   14
2/10/2012   6   46
2/10/2012   9   64
2/10/2012   8   71
3/10/2012   3   85
3/10/2012   1   14
3/10/2012   4   28
9/10/2012   5   51
9/10/2012   9   66
9/20/2012  12   84
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[with(dat1,order(tdate,event_tim)),]
dat2$tdate<-as.Date(dat2$tdate,format="%m/%d/%Y")
dat3<-dat2
dat3$last_trans<-NA
library(plyr)
dat4<-merge(dat3,ddply(dat2,.(tdate),tail,1))
dat4$last_trans<-dat4$transaction
res<-merge(dat4,dat2,all=TRUE)
res
#        tdate event_tim transaction last_trans
#1  2012-01-10         2          14         NA
#2  2012-01-10         4          28         NA
#3  2012-01-10         6          42         NA
#4  2012-01-10         8          14         14
#5  2012-02-10         6          46         NA
#6  2012-02-10         8          71         NA
#7  2012-02-10         9          64         64
#8  2012-03-10         1          14         NA
#9  2012-03-10         3          85         NA
#10 2012-03-10         4          28         28
#11 2012-09-10         5          51         NA
#12 2012-09-10         9          66         66
#13 2012-09-20        12          84         84
Thanks for all the help guys.
This worked for me:
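library(plyr)  # provides arrange() and ddply()
all6 <- arrange(all6, tdate, event_tim)
# Last row of each day; keep only the key columns so that merge() does not
# duplicate the remaining columns with .x/.y suffixes
lt <- ddply(all6, .(tdate), tail, 1)[, c("tdate", "event_tim")]
lt$last_trans <- 'Y'
all6 <- merge(all6, lt, by = c("tdate", "event_tim"), all.x = TRUE)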
Hi,
In addition to merge(), you can also use join():
dat1<-read.table(text="
tdate  event_tim  transaction
1/10/2012   2   14
1/10/2012   4   28
1/10/2012   6   42
1/10/2012   8   14
2/10/2012   6   46
2/10/2012   9   64
2/10/2012   8   71
3/10/2012   3   85
3/10/2012   1   14
3/10/2012   4   28
9/10/2012   5   51
9/10/2012   9   66
9/20/2012  12   84
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[with(dat1,order(tdate,event_tim)),]
aggres<-aggregate(dat2[,-1],by=list(tdate=dat2$tdate),tail,1)
aggres$last_trans<-"Y"
library(plyr)
join(dat2,aggres,by=intersect(names(dat2),names(aggres)),type="full")
#       tdate event_tim transaction last_trans
#1  1/10/2012         2          14       <NA>
#2  1/10/2012         4          28       <NA>
#3  1/10/2012         6          42       <NA>
#4  1/10/2012         8          14          Y
#5  2/10/2012         6          46       <NA>
#6  2/10/2012         8          71       <NA>
#7  2/10/2012         9          64          Y
#8  3/10/2012         1          14       <NA>
#9  3/10/2012         3          85       <NA>
#10 3/10/2012         4          28          Y
#11 9/10/2012         5          51       <NA>
#12 9/10/2012         9          66          Y
#13 9/20/2012        12          84          Y
A.K.
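For comparison, a 0/1 flag in the spirit of the SAS last.tdate variable can also be built in base R without a merge or join. This is only a sketch under the assumption that one flag per day is wanted; the data frame dat and its values are made up for illustration.

```r
# Hypothetical data, deliberately not in time order within a day
dat <- data.frame(
    tdate = as.Date(c("2012-01-10", "2012-01-10",
                      "2012-02-10", "2012-02-10", "2012-02-10")),
    event_tim = c(8, 2, 6, 9, 8),
    transaction = c(14, 14, 46, 64, 71))

# Sort by date and time, as PROC SORT does in the original question
dat <- dat[order(dat$tdate, dat$event_tim), ]

# Scanning from the end, the first occurrence of each date is its last row,
# so rows that are not duplicates in that direction get the flag
dat$last_trans <- as.integer(!duplicated(dat$tdate, fromLast = TRUE))
```

Unlike the merge-based versions, this leaves no NA values: every row gets 0 or 1, and rows that tie on event_tim within a day cannot both be flagged.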