Given a dataframe:
dat <- data.frame(S = factor(c(rep('a', 2), rep('b', 1), rep('c', 3)),
                             levels = c('b', 'a', 'c')),
                  D = c(5, 1, 3, 2, 3, 4))
where S is a subject identifier and D is a visit (actually a date in my
real dataset). I would like to generate another column giving the visit
number within each subject:
R = c(2, 1, 1, 1, 2, 3)
My current solution uses nested loops and is slow and ugly. I've looked
at by() but can't see how to keep the order of R correct.
Thanks,
Tom
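A minimal sketch of one base-R approach (not from the original thread), assuming D is numeric as in the example. ave() splits a vector by a grouping factor, applies FUN within each group and writes the results back to the original row positions, so the output lines up with the rows regardless of row order or level order. Using rank() as FUN gives the within-subject visit number.

```r
dat <- data.frame(S = factor(c(rep('a', 2), rep('b', 1), rep('c', 3)),
                             levels = c('b', 'a', 'c')),
                  D = c(5, 1, 3, 2, 3, 4))

# ave() ranks D separately within each subject and returns the ranks
# in the original row order, so no explicit loop or re-sorting is needed.
dat$R <- ave(dat$D, dat$S, FUN = rank)
dat$R   # 2 1 1 1 2 3

# rank() averages tied values by default (two visits with the same D
# would both get 1.5); ties.method = "first" numbers them in row order.
dat$R_first <- ave(dat$D, dat$S,
                   FUN = function(d) rank(d, ties.method = "first"))
```

If D holds Date values, as in the real dataset, it is probably safest to convert first, e.g. ave(as.numeric(dat$D), dat$S, FUN = rank), because ave() keeps the class of its first argument and would otherwise try to store the ranks as dates.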
Hello,
Aren't the levels of your example wrong? If the levels are
levels = c('a', 'b', 'c'), not c('b', 'a', 'c'), then the following will do
the job:
unname(unlist(tapply(dat$D, dat$S, order)))
Hope this helps,
Rui Barradas
(In reply to Tom Wright, 04-02-2015 19:34.)
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
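Two properties of the tapply() answer can be checked directly. order() returns the permutation that sorts its input, not the ranks; the two happen to coincide for the small groups in this example but differ in general. And unlist() concatenates the per-group results in level order, so the output only lines up with the rows when the data are already sorted by S in that order, which is why the levels c('b', 'a', 'c') break it. A short sketch using the question's dat; split()/unsplit() is swapped in here as one way to put each group's values back at its original rows:

```r
dat <- data.frame(S = factor(c(rep('a', 2), rep('b', 1), rep('c', 3)),
                             levels = c('b', 'a', 'c')),
                  D = c(5, 1, 3, 2, 3, 4))

# order() gives a sorting permutation, rank() gives ranks; they agree
# for inputs like c(5, 1) but not in general:
order(c(30, 10, 20))   # 2 3 1
rank(c(30, 10, 20))    # 3 1 2

# unlist(tapply(...)) concatenates the groups in level order (b, a, c),
# so the result is not aligned with the rows of dat:
unname(unlist(tapply(dat$D, dat$S, rank)))   # 1 2 1 1 2 3

# unsplit() writes each group's values back to that group's original rows:
unsplit(lapply(split(dat$D, dat$S), rank), dat$S)   # 2 1 1 1 2 3
```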
tapply() (of which by() is essentially a wrapper) **is** a (disguised) loop (at the R level, of course).
Cheers,
Bert
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
(In reply to Rui Barradas <ruipbarradas at sapo.pt>, Wed, Feb 4, 2015 at 11:49 AM.)
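To make Bert's point concrete, here is a sketch of roughly what a split-and-apply does, written as a single explicit loop over subjects. visit_number_loop is a hypothetical name, and this is not tapply()'s actual source. Because each subject's ranks are written back to that subject's row indices, R stays in the original row order without any nested loop.

```r
dat <- data.frame(S = factor(c(rep('a', 2), rep('b', 1), rep('c', 3)),
                             levels = c('b', 'a', 'c')),
                  D = c(5, 1, 3, 2, 3, 4))

# Hypothetical helper: one R-level loop over subjects (not nested),
# assigning each subject's within-subject ranks back by row index.
visit_number_loop <- function(D, S) {
  R <- numeric(length(D))
  for (lev in levels(S)) {
    idx <- which(S == lev)     # rows belonging to this subject
    R[idx] <- rank(D[idx])     # rank this subject's visits
  }
  R
}

visit_number_loop(dat$D, dat$S)   # 2 1 1 1 2 3
```

Each iteration scans the whole S vector with which(), so with many subjects a split()-based version will likely scale better; the point is only that one loop over subjects plus indexing avoids the nesting.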
Thanks, I was not aware of order().
I deliberately messed up the order of S. The following example breaks
your solution:
dat_2 <- data.frame(S = factor(c('a', 'c', 'a', 'b', 'c', 'c')),
                    D = c(5, 3, 1, 3, 2, 4))
which should give the answer c(2, 2, 1, 1, 1, 3).
Your solution does suggest that sorting the data by subject before
starting might solve the problem.
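Following up on the sorting idea, a minimal loop-free sketch (visit_number_sorted is a hypothetical name): order the rows by subject and then by visit, number the visits 1, 2, ... within each subject with sequence(), and scatter those numbers back to the original row positions. It assumes S is a factor (or something as.factor() can convert) and D is anything order() can sort, including Date values.

```r
dat_2 <- data.frame(S = factor(c('a', 'c', 'a', 'b', 'c', 'c')),
                    D = c(5, 3, 1, 3, 2, 4))

# Hypothetical helper: sort rows by subject, then by visit; number the
# visits 1..n within each subject; write the numbers back to the original
# row positions. No R-level loop is used.
visit_number_sorted <- function(D, S) {
  S <- as.factor(S)
  o <- order(S, D)                       # row indices in (subject, visit) order
  counts <- tabulate(S, nbins = nlevels(S))  # visits per subject, in level order
  R <- integer(length(D))
  R[o] <- sequence(counts)               # 1..n_i for each subject, then scatter back
  R
}

visit_number_sorted(dat_2$D, dat_2$S)   # 2 2 1 1 1 3
```

For dat_2, subject c's visits D = 3, 2, 4 (rows 2, 5, 6) are numbered 2, 1, 3, so the full result is c(2, 2, 1, 1, 1, 3). Tied visits within a subject keep their original row order, because order() uses a stable sort and leaves unresolved ties as they were.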
(In reply to Rui Barradas, Wed, 2015-02-04 at 19:49 +0000.)