Given a dataframe:

dat <- data.frame(S = factor(c(rep('a',2), rep('b',1), rep('c',3)), levels = c('b','a','c')),
                  D = c(5,1,3,2,3,4))

where S is a subject identifier and D a visit (actually a date in my real dataset), I would like to generate another column giving the visit number:

R = c(2,1,1,1,2,3)

My current solution uses nested loops and is slow and ugly. I've looked at by() but can't see how to keep the order of R correct.

Thanks,
Tom
Hello,

Aren't the levels in your example wrong? If the levels are levels=c('a','b','c') rather than c('b','a','c'), then the following will do the job:

unname(unlist(tapply(dat$D, dat$S, order)))

Hope this helps,
Rui Barradas

On 04-02-2015 19:34, Tom Wright wrote:
> Given a dataframe:
> dat<-data.frame(S=factor(c(rep('a',2),rep('b',1),rep('c',3)),levels=c('b','a','c')),
> D=c(5,1,3,2,3,4))
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
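To see why the level order matters for this suggestion, here is a small sketch using the dat from the original post (dat_sorted, res_posted and res_sorted are names introduced only for this illustration). tapply() returns its groups in factor-level order, so unlist() concatenates them in that order rather than in row order.

```r
dat <- data.frame(
  S = factor(c(rep("a", 2), rep("b", 1), rep("c", 3)), levels = c("b", "a", "c")),
  D = c(5, 1, 3, 2, 3, 4)
)

# Levels as posted (b, a, c): the groups come back in the order b, a, c
res_posted <- unname(unlist(tapply(dat$D, dat$S, order)))

# Hypothetical copy of the same data with levels in a, b, c order
dat_sorted <- transform(dat, S = factor(S, levels = c("a", "b", "c")))
res_sorted <- unname(unlist(tapply(dat_sorted$D, dat_sorted$S, order)))

print(res_posted)  # 1 2 1 1 2 3  (does not line up with the rows)
print(res_sorted)  # 2 1 1 1 2 3  (matches the requested R)
```

The second result lines up with the rows only because the rows of dat happen to be grouped in the same order as the levels.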
tapply() (of which by() is essentially a wrapper) **is** a (disguised) loop (at the R level, of course).

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Wed, Feb 4, 2015 at 11:49 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Aren't the levels of your example wrong? If the levels are
> levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do the
> job.
>
> unname(unlist(tapply(dat$D, dat$S, order)))
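As a rough illustration of that point, the sketch below compares tapply() with an explicit split() plus lapply(), which is conceptually what tapply() does for a single grouping factor. The names via_tapply and via_split are made up for this example, and the factor here uses default (alphabetical) levels for simplicity.

```r
dat <- data.frame(
  S = factor(c("a", "a", "b", "c", "c", "c")),
  D = c(5, 1, 3, 2, 3, 4)
)

# Grouped apply in one call
via_tapply <- tapply(dat$D, dat$S, order)

# The same work spelled out: split into groups, then loop over them
via_split <- lapply(split(dat$D, dat$S), order)

# Both give the same per-group results
same <- identical(unname(unlist(via_tapply)), unname(unlist(via_split)))
print(same)
```

Either way the function is called once per subject at the R level, so the gain over a hand-written loop is mostly in clarity rather than in avoiding iteration.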
Thanks, I was not aware of order(). I did deliberately mess up the order of S. The following example breaks your solution:

dat_2 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = c(5,3,1,3,2,4))

which should give the answer

c(2,2,1,1,1,3)

Your solution does indicate that sorting the data correctly before starting might solve the problem.

On Wed, 2015-02-04 at 19:49 +0000, Rui Barradas wrote:
> Aren't the levels of your example wrong? If the levels are
> levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do
> the job.
>
> unname(unlist(tapply(dat$D, dat$S, order)))
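One possible way to handle interleaved rows without sorting first is base R's ave() combined with rank(); this is a sketch of an alternative, not something proposed in the thread. ave() applies a function within each level of S and writes the results back into the original row positions, and rank() gives each value's position within its group (order() instead returns the permutation that would sort the group, which coincides with rank only in special cases such as already sorted groups).

```r
dat_2 <- data.frame(
  S = factor(c("a", "c", "a", "b", "c", "c")),
  D = c(5, 3, 1, 3, 2, 4)
)

# Rank D within each subject; results stay aligned with the rows of dat_2
dat_2$R <- ave(dat_2$D, dat_2$S, FUN = rank)

print(dat_2$R)  # 2 2 1 1 1 3
```

Row 5 (S = 'c', D = 2) is the earliest of subject c's three visits, so it is numbered 1. Two assumptions to keep in mind: rank() averages ties by default, so duplicate dates within a subject would give fractional visit numbers unless a different ties.method is chosen, and with a real Date column it is probably safest to pass as.numeric(D) to ave().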