Hi,

I have the following data set:

id  event   time (in sec)
1   add     1373502892
2   add     1373502972
3   delete  1373502995
4   view    1373503896
5   add     1373503996
...

I'd like to add a new column, "time on task", which is the time elapsed between two consecutive events (id2 - id1, ...). What would be the best approach to do that?

Thanks,
Srecko
Hi,

Try:

dat1 <- read.table(text="
id event  time
1  add    1373502892
2  add    1373502972
3  delete 1373502995
4  view   1373503896
5  add    1373503996
", sep="", header=TRUE, stringsAsFactors=FALSE)

# diff() returns n-1 differences; the leading NA keeps the column at n rows
dat1$time_on_task <- c(NA, diff(dat1$time))
dat1
#  id  event       time time_on_task
#1  1    add 1373502892           NA
#2  2    add 1373502972           80
#3  3 delete 1373502995           23
#4  4   view 1373503896          901
#5  5    add 1373503996          100

# Not sure whether this depends on the values of "event" or not.

A.K.

----- Original Message -----
From: srecko joksimovic <sreckojoksimovic at gmail.com>
To: R help <R-help at r-project.org>
Cc:
Sent: Thursday, August 29, 2013 1:52 PM
Subject: [R] Add new calculated column to data frame

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Arun,

There is one more question. You explained to me how to use split(dat1,cumsum(dat1$action=="login")) in one of the previous questions, and that is great. Now, if I have something like this:

id  module  event   time        time_on_task
1   sys     login   1373502892  80
2   task    add     1373502892  80
3   task    add     1373502972  23
4   sys     login   1373502892  80
5   list    delete  1373502995  901
6   list    view    1373503896  100
7   task    add     1373503996  NA

I know how to split at each "login" occurrence, and I know how to add a new column with time differences. But how do I add a new column "category" that is calculated from the columns "module" and "event"? For example, if module=task and event=add => category=A...

Srecko

On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink111@yahoo.com> wrote:
> Hi Srecko,
> No problem.
> Regards,
> Arun
>
> ________________________________
> From: srecko joksimovic <sreckojoksimovic@gmail.com>
> To: arun <smartpink111@yahoo.com>
> Sent: Thursday, August 29, 2013 2:22 PM
> Subject: Re: [R] Add new calculated column to data frame
>
> Sorry... I should have figured it out...
>
> Thanks so much!
> Srecko
>
> On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink111@yahoo.com> wrote:
>
> >Hi,
> >The one you showed is:
> >
> >dat1$time_on_task <- c(diff(dat1$time), NA)
> >dat1
> >#  id  event       time time_on_task
> >#1  1    add 1373502892           80
> >#2  2    add 1373502972           23
> >#3  3 delete 1373502995          901
> >#4  4   view 1373503896          100
> >#5  5    add 1373503996           NA
> >
> >________________________________
> >From: srecko joksimovic <sreckojoksimovic@gmail.com>
> >To: arun <smartpink111@yahoo.com>
> >Cc: R help <r-help@r-project.org>
> >Sent: Thursday, August 29, 2013 2:15 PM
> >Subject: Re: [R] Add new calculated column to data frame
> >
> >Thanks Arun,
> >
> >this is great. However, it should be just a little bit different:
> >
> >#  id  event       time time_on_task
> >#1  1    add 1373502892           80
> >#2  2    add 1373502972           23
> >#3  3 delete 1373502995          901
> >#4  4   view 1373503896          100
> >#5  5    add 1373503996           NA
> >
> >When I calculate the difference, I need to know how long each activity was.
> >It is id2-id1 for the first activity...
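The two pieces mentioned in this message (splitting at each "login" and taking time differences) can also be combined. The sketch below is only one possible way to do it, not something proposed in the thread: it assumes the column is named "event", that rows are already in chronological order within a session, and it uses ave() to apply the difference inside each session. The "session" column name is introduced here for illustration.

```r
# id, module, event and time columns taken from the example in this message
dat1 <- read.table(text="
id module event  time
1  sys    login  1373502892
2  task   add    1373502892
3  task   add    1373502972
4  sys    login  1373502892
5  list   delete 1373502995
6  list   view   1373503896
7  task   add    1373503996
", header=TRUE, stringsAsFactors=FALSE)

# session counter: increases by one at every "login" row
dat1$session <- cumsum(dat1$event == "login")

# time until the next event of the same session; NA for a session's last event
dat1$time_on_task <- ave(dat1$time, dat1$session,
                         FUN = function(x) c(diff(x), NA))
dat1
```

With this layout the last event of every session gets NA, because there is no later event in that session to measure against.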
Hi,

It's not really clear, but you can try this:

dat1 <- read.table(text="
id module event  time       time_on_task Categ url
1  sys    login  1373502892 80           B     http://post/add?id=42&idp=45
2  task   add    1373502892 80           A     http://post/add?id=33&idp=45
3  task   add    1373502972 23           A     http://post/add?id=34&idp=45
4  sys    login  1373502892 80           B     http://post/add?id=39&idp=42
5  list   delete 1373502995 901          C     http://post/add?id=37&idp=41
6  list   view   1373503896 100          D     http://post/add?id=36&idp=46
7  task   add    1373503996 NA           A     http://post/add?id=31&idp=45
", sep="", header=TRUE, stringsAsFactors=FALSE)

# pull the number after "id=" out of the URLs of the category A rows
vec1 <- as.numeric(gsub(".*\\?.*=(\\d+)\\&.*", "\\1", dat1$url[dat1$Categ=="A"]))
vec1
#[1] 33 34 31

dat2 <- read.table(text="
id idpost idtopic iduser
1  45     33      101
2  46     34      102
3  47     33      103
4  48     33      101
5  49     35      104
", sep="", header=TRUE)

# category A rows whose id does not occur in dat2$idtopic become "F"
dat1$Categ[dat1$Categ=="A"][!vec1 %in% dat2$idtopic] <- "F"
dat1
#  id module  event       time time_on_task Categ                          url
#1  1    sys  login 1373502892           80     B http://post/add?id=42&idp=45
#2  2   task    add 1373502892           80     A http://post/add?id=33&idp=45
#3  3   task    add 1373502972           23     A http://post/add?id=34&idp=45
#4  4    sys  login 1373502892           80     B http://post/add?id=39&idp=42
#5  5   list delete 1373502995          901     C http://post/add?id=37&idp=41
#6  6   list   view 1373503896          100     D http://post/add?id=36&idp=46
#7  7   task    add 1373503996           NA     F http://post/add?id=31&idp=45

A.K.
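The gsub() pattern above works for URLs where the id is followed by "&". As a hedged alternative, the sketch below looks for the "id" parameter by name and returns NA when a URL has no such parameter (for example a bare "http://"). The helper name extract_id is hypothetical and not part of the thread; regexec() and regmatches() are base R functions.

```r
# hypothetical helper: numeric value of the "id" query parameter, or NA
extract_id <- function(url) {
  # regexec() records the full match and the captured group for each URL
  m <- regmatches(url, regexec("[?&]id=([0-9]+)", url))
  # a successful match has two elements: the full match and the digits
  as.numeric(vapply(m,
                    function(x) if (length(x) == 2) x[2] else NA_character_,
                    character(1)))
}

extract_id(c("http://post/add?id=33&idp=45", "http://"))
```

Because the pattern requires "id=" directly after "?" or "&", the "idp" parameter is not picked up by mistake.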
________________________________
From: srecko joksimovic <sreckojoksimovic at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Thursday, August 29, 2013 5:38 PM
Subject: Re: [R] Add new calculated column to data frame
Hi Arun,
I really appreciate your help, and we did a great job :)
but, now I think that R can do anything, so I'd like to try one more thing,
if you don't mind...
from the table with categories,

#  id module  event       time time_on_task Categ  url
#1  1    sys  login 1373502892           80     B  http:
#2  2   task    add 1373502892           80     A  http:
#3  3   task    add 1373502972           23     A  http:
#4  4    sys  login 1373502892           80     B  http:
#5  5   list delete 1373502995          901     C
#6  6   list   view 1373503896          100     D
#7  7   task    add 1373503996           NA     A

I'd like to use only a certain category (for example A). Each of these rows
has a URL whose format is something like http://post/add?id=33&idp=45.
The first step would be to extract this id (33 in this case). Based on that
value, I want to find all "iduser" values from the following table:

id idpost idtopic iduser
1  45     33      101
2  46     34      102
3  47     33      103
4  48     33      101
5  49     35      104

The next step would be to check whether at least one of these values (iduser)
is not in the vector "users" (only ids). If that is the case, I want to
change the category to F; if not, I want to keep the same category.

If this is too much for one question, I'll implement this in Java, but
I'd really like to try this with R. Maybe the id extraction from the URL is
the most important problem... I tried most of these steps, but I am still not
able to put them all together...

Thank you so much for your time.
Srecko
On Thu, Aug 29, 2013 at 12:22 PM, arun <smartpink111 at yahoo.com> wrote:
>Hi Srecko,
>No problem.
>
>Arun
>
>________________________________
>From: srecko joksimovic <sreckojoksimovic at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Thursday, August 29, 2013 3:19 PM
>Subject: Re: [R] Add new calculated column to data frame
>
>This is great Arun, thank you again.
>
>I was thinking of using sqldf and issuing a query for each module-action
>combination, but this is much better. Since I have a table with categories
>(module, action, category), I could create a vector "levels" based on
>the first two columns and a vector "labels" based on the category column,
>and that should do the work...
>
>Best,
>Srecko
>
>On Thu, Aug 29, 2013 at 12:16 PM, arun <smartpink111 at yahoo.com> wrote:
>
>>Hi Srecko,
>>
>>You didn't mention the order in which the letters are assigned. If
>>you need a different order, just change the order in the
>>",levels=c(....),".
>>Arun
>>
>>----- Original Message -----
>>From: arun <smartpink111 at yahoo.com>
>>To: srecko joksimovic <sreckojoksimovic at gmail.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Thursday, August 29, 2013 3:13 PM
>>Subject: Re: [R] Add new calculated column to data frame
>>
>>Hi,
>>You could try this:
>>dat1 <- read.table(text="
>>id module event  time       time_on_task
>>1  sys    login  1373502892 80
>>2  task   add    1373502892 80
>>3  task   add    1373502972 23
>>4  sys    login  1373502892 80
>>5  list   delete 1373502995 901
>>6  list   view   1373503896 100
>>7  task   add    1373503996 NA
>>", sep="", header=TRUE, stringsAsFactors=FALSE)
>>
>># build a "module_event" key and map each key to a letter via factor levels
>>dat1$Categ <- as.character(factor(with(dat1, paste(module, event, sep="_")),
>>    levels=c("task_add","sys_login","list_delete","list_view"),
>>    labels=LETTERS[1:4]))
>>
>>dat1
>>#  id module  event       time time_on_task Categ
>>#1  1    sys  login 1373502892           80     B
>>#2  2   task    add 1373502892           80     A
>>#3  3   task    add 1373502972           23     A
>>#4  4    sys  login 1373502892           80     B
>>#5  5   list delete 1373502995          901     C
>>#6  6   list   view 1373503896          100     D
>>#7  7   task    add 1373503996           NA     A
>>A.K.
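The idea raised in the quoted exchange, driving the mapping from a (module, event, category) table rather than typing the levels by hand, can be sketched with match() on a pasted key. This is an illustrative sketch under assumptions: the "categories" data frame and the key() helper are hypothetical names, and the letter assignments simply repeat the ones used in the factor() call.

```r
dat1 <- data.frame(
  module = c("sys", "task", "task", "sys", "list", "list", "task"),
  event  = c("login", "add", "add", "login", "delete", "view", "add"),
  stringsAsFactors = FALSE
)

# hypothetical lookup table: one row per module/event combination
categories <- data.frame(
  module   = c("task", "sys", "list", "list"),
  event    = c("add", "login", "delete", "view"),
  category = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# the same "module_event" key on both sides
key <- function(d) paste(d$module, d$event, sep = "_")

# match() gives the row of `categories` for each row of dat1;
# combinations missing from the lookup table end up as NA
dat1$Categ <- categories$category[match(key(dat1), key(categories))]
dat1
```

Adding a new module/event combination then only means adding a row to the lookup table.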
Hi Srecko,

Try this:

dat1 <- read.table(text="
id module event  time       time_on_task Categ url
1  sys    login  1373502892 80           B     http://
2  task   add    1373502892 80           A     http://post/add?id=33&idp=67
3  task   add    1373502972 23           A     http://post/add?id=34&idp=67
4  sys    login  1373502892 80           B     http://
5  list   delete 1373502995 901          C     http://
6  list   view   1373503896 100          D     http://
7  task   add    1373503996 NA           A     http://post/add?id=35&idp=99
", sep="", header=TRUE, stringsAsFactors=FALSE)

# topic ids taken from the URLs of the category A rows
vec1 <- as.numeric(gsub(".*\\?.*=(\\d+)\\&.*", "\\1", dat1$url[dat1$Categ=="A"]))

dat2 <- read.table(text="
id idpost idtopic iduser
1  45     33      101
2  46     34      102
3  47     33      103
4  48     33      101
5  49     35      104
", sep="", header=TRUE)

student_list <- c(101:102, 104:107)

# per topic: TRUE if every user who posted in it is in student_list
vec2 <- with(dat2, tapply(iduser, list(idtopic), FUN=function(x) all(x %in% student_list)))

# category A rows whose topic has a user outside student_list become "F"
dat1$Categ[dat1$Categ=="A"][match(vec1, as.numeric(names(vec2)))[!vec2]] <- "F"
dat1
#  id module  event       time time_on_task Categ                          url
#1  1    sys  login 1373502892           80     B                      http://
#2  2   task    add 1373502892           80     F http://post/add?id=33&idp=67
#3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
#4  4    sys  login 1373502892           80     B                      http://
#5  5   list delete 1373502995          901     C                      http://
#6  6   list   view 1373503896          100     D                      http://
#7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99

A.K.
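The final assignment above indexes by position, which appears to rely on the category A rows lining up one-to-one, in the same order, with the sorted topic ids (as they do in this example). A variant that looks each row's topic up by name might be more forgiving when a topic repeats or is missing from the second table. The sketch below is an assumption-laden illustration, not the method from the thread; the variable names is_A, topic, topic_ok and row_ok are made up here, and the small dat1 is a reduced example in which topic 33 occurs twice.

```r
dat1 <- data.frame(
  id    = 1:4,
  Categ = c("A", "A", "B", "A"),
  url   = c("http://post/add?id=33&idp=67",
            "http://post/add?id=33&idp=67",
            "http://",
            "http://post/add?id=34&idp=67"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(idtopic = c(33, 34, 33, 33, 35),
                   iduser  = c(101, 102, 103, 101, 104))
student_list <- c(101:102, 104:107)

is_A  <- dat1$Categ == "A"
# topic id of every category A row
topic <- as.numeric(sub(".*[?&]id=([0-9]+).*", "\\1", dat1$url[is_A]))

# per topic: are all posting users in student_list?
topic_ok <- tapply(dat2$iduser, dat2$idtopic,
                   function(x) all(x %in% student_list))

# look the result up by topic id (as a name), not by position
row_ok <- topic_ok[as.character(topic)]

# rows whose topic is absent from dat2 give NA and are left unchanged
dat1$Categ[is_A][!is.na(row_ok) & !row_ok] <- "F"
dat1$Categ
```

Here both rows that point at topic 33 are changed to "F", while the row for topic 34 keeps its category.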
________________________________
From: srecko joksimovic <sreckojoksimovic at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Thursday, August 29, 2013 6:04 PM
Subject: Re: [R] Add new calculated column to data frame
"Did you mean to separate the number 33 from the link? ", yes that is
correct. It should be something like this:
#  id module  event       time time_on_task Categ                          url
#1  1    sys  login 1373502892           80     B                      http://
#2  2   task    add 1373502892           80     A http://post/add?id=33&idp=67
#3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
#4  4    sys  login 1373502892           80     B                      http://
#5  5   list delete 1373502995          901     C                      http://
#6  6   list   view 1373503896          100     D                      http://
#7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99

From this table I should get 3 rows with 3 URLs:
http://post/add?id=33&idp=67, http://post/add?id=34&idp=67,
and http://post/add?id=35&idp=99

For each of them, I need to extract the id (33, 34, and 35). Once I do that,
I need to obtain the users from this table:

id idpost idtopic iduser
1  45     33      101
2  46     34      102
3  47     33      103
4  48     33      101
5  49     35      104

again, for each id. This means:

id = 33 => 101, 103
id = 34 => 102
id = 35 => 104

Next, for each vector I need to check whether or not all its values are in
the students list (101, 102, 104, 105, 106, 107):

id = 33 => FALSE (since 103 is not in the list)
id = 34 => TRUE
id = 35 => TRUE

This means that the category for row 2 in the first table is not A any more,
but F...

Thanks,
Srecko
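The intermediate results listed in this message (the users per id and the TRUE/FALSE check) can be reproduced step by step, which may help when checking the logic before assigning categories. This is a minimal sketch using split(); the names users_by_topic and all_students are introduced here for illustration only.

```r
dat2 <- read.table(text="
id idpost idtopic iduser
1  45     33      101
2  46     34      102
3  47     33      103
4  48     33      101
5  49     35      104
", header=TRUE)

student_list <- c(101, 102, 104, 105, 106, 107)

# distinct users per topic id, as a named list ("33", "34", "35")
users_by_topic <- lapply(split(dat2$iduser, dat2$idtopic), unique)

# TRUE only when every user of a topic is in the students list
all_students <- sapply(users_by_topic, function(u) all(u %in% student_list))

users_by_topic
all_students
```

For topic 33 the list holds 101 and 103, so the check is FALSE there and TRUE for topics 34 and 35, in line with the values written out in the message.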
On Thu, Aug 29, 2013 at 2:56 PM, arun <smartpink111 at yahoo.com> wrote:
>Hi Srecko,
>Did you mean to separate the number 33 from the link? Could you provide a
>reproducible example with the output you expected?
>Tx.
>
>Arun