Thanks, I was not aware of order().
I did deliberately mess up the order of S. The following example breaks
your solution:

dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
                  D=c(5,3,1,3,2,4))

which should give the answer c(2,2,1,1,2,3)

Your solution does indicate that sorting the data correctly before
starting might solve the problem.


On Wed, 2015-02-04 at 19:49 +0000, Rui Barradas wrote:
> Hello,
>
> Aren't the levels of your example wrong? If the levels are
> levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do
> the job.
>
> unname(unlist(tapply(dat$D, dat$S, order)))
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 04-02-2015 19:34, Tom Wright wrote:
> > Given a dataframe:
> > dat<-data.frame(S=factor(c('a','b','a','c','c','c'),levels=c('b','a','c')),
> >                 D=c(1,5,3,2,3,4))
> >
> > where S is a subject identifier and D a visit (actually a date in my
> > real dataset). I would like to generate another column giving the visit
> > number
> >
> > R=c(2,1,1,1,2,3)
> >
> > My current solution uses nested loops and is slow and ugly. I've looked
> > at by() but can't see how to keep the order of R correct.
> >
> > Thanks,
> > Tom
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
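A small sketch of why the tapply() approach depends on row order, using dat_2 from the message above. unlist(tapply(...)) returns the per-subject results concatenated in factor-level order (all 'a', then 'b', then 'c'), not in the order of the rows they were computed from. The names by_level and rows_by_level are only for this illustration.

```r
dat_2 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = c(5,3,1,3,2,4))

# Per-subject results, concatenated subject by subject in level order.
by_level <- unname(unlist(tapply(dat_2$D, dat_2$S, order)))

# The row numbers those results belong to, in the same level order.
rows_by_level <- unname(unlist(split(seq_len(nrow(dat_2)), dat_2$S)))

# Side by side: the i-th result belongs to row rows_by_level[i], not row i.
cbind(row = rows_by_level, value = by_level)
```

When the rows happen to be sorted by subject already, rows_by_level is just 1, 2, 3, ... and the mismatch disappears, which is why the approach can look correct on sorted data.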
A useful technique when it is easy to compute a vector from an ordered
data.frame but you need to do it for an unordered one is to compute the
order vector 'ord', compute the vector from df[ord,], and use
df[ord,...] <- vector to reorder the vector. In your case you could do:

  > dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
  +                   D=c(5,3,1,3,2,4))
  > ord <- with(dat_2, order(S, D)) # order by subject, break ties by date
  > dat_2$visitNo <- integer(nrow(dat_2)) # will fill this in next
  > dat_2$visitNo[ord] <- with(dat_2[ord,], ave(visitNo, S, FUN=seq_along))
  > dat_2
    S D visitNo
  1 a 5       2
  2 c 3       2
  3 a 1       1
  4 b 3       1
  5 c 2       1
  6 c 4       3

Now this is different from your answer, c(2,2,1,1,2,3). Which is correct?

You can also do the reordering of the result from the ordered dataset by
subscripting the right hand side with [order(ord)], but I find using [ord]
on left side easier to remember.

  > with(dat_2[ord,], ave(visitNo, S, FUN=seq_along))[order(ord)]
  [1] 2 2 1 1 1 3

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Feb 4, 2015 at 12:07 PM, Tom Wright <tom at maladmin.com> wrote:
> which should give the answer c(2,2,1,1,2,3)
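The sort-then-unsort steps above can be wrapped in a small function. This is only a sketch: visit_number and its argument names are hypothetical and do not come from the thread.

```r
# Hypothetical helper (not from the thread): number each subject's visits
# in date order and return the numbers in the original row order.
visit_number <- function(subject, when) {
  ord <- order(subject, when)        # sort by subject, break ties by date
  out <- integer(length(subject))
  # seq_along() numbers the rows within each subject of the sorted data;
  # assigning through [ord] puts each number back on its original row.
  out[ord] <- ave(seq_along(ord), subject[ord], FUN = seq_along)
  out
}

dat_2 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = c(5,3,1,3,2,4))
dat_2$visitNo <- visit_number(dat_2$S, dat_2$D)
```

Because ave() only sees an integer index and the subject column, the function does not care what class the date column has, as long as order() can sort it.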
How about?

  > ave(dat$D, dat$S, FUN=order)
  [1] 2 1 1 1 2 3
  > ave(dat_2$D, dat_2$S, FUN=order)
  [1] 2 2 1 1 1 3

Note, your answer for the second example is incorrect since row 2 (c, 3)
and row 5 (c, 2) are both assigned 2.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Tom Wright
Sent: Wednesday, February 4, 2015 2:08 PM
To: Rui Barradas
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Still trying to avoid loops

The following example breaks your solution
dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
                  D=c(5,3,1,3,2,4))

which should give the answer c(2,2,1,1,2,3)
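One caveat, offered as a sketch rather than as something raised in the thread: order() and rank() happen to agree on both examples above, but they are different functions in general. order() returns the positions that would sort the values, while rank() returns the position each value takes in the sorted sequence, which is what a visit number is. The data below (d, S, D, visit) is made up for illustration.

```r
d <- c(30, 10, 20)
order(d)                         # positions of the sorted values: 2 3 1
rank(d, ties.method = "first")   # position of each value when sorted: 3 1 2

# The same idea applied per subject, with rank() instead of order().
S <- factor(c('a', 'a', 'a', 'b'))
D <- c(30, 10, 20, 5)
visit <- ave(D, S, FUN = function(x) rank(x, ties.method = "first"))
```

With groups of one or two rows the two functions always coincide, so the difference only shows up once a subject has three or more visits in a suitably scrambled order.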
A potential problem with

  ave(dat_2$D, dat_2$S, FUN=order)

is that it will silently give the wrong answer or give an error if
dat_2$D is not numeric. E.g., if D is a Date vector we get

  > dat_3 <- dat_2[,1:2]
  > dat_3$D <- as.Date(paste0("2015-02-", dat_2$D))
  > with(dat_3, ave(D, S, FUN=order))
  Error in as.Date.numeric(value) : 'origin' must be supplied

Another problem is that it may take a lot more time than is required if
you have a lot of small groups in your data.

Both of those are avoided if you sort the entire dataset first and
'unsort' the results when putting them into the dataset.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Feb 4, 2015 at 12:53 PM, David L Carlson <dcarlson at tamu.edu> wrote:
> How about?
>
> > ave(dat_2$D, dat_2$S, FUN=order)
> [1] 2 2 1 1 1 3
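A minimal sketch of the sort-first approach applied to a Date column, assuming the same six rows as dat_2 with D read as days in February 2015. Here dat_3 is built directly rather than from dat_2 so that the block runs on its own.

```r
dat_3 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = as.Date(sprintf("2015-02-%02d", c(5,3,1,3,2,4))))

ord <- with(dat_3, order(S, D))        # order() sorts Date columns directly
dat_3$visitNo <- integer(nrow(dat_3))  # placeholder, filled in below
# ave() is only ever given the integer visitNo column and the factor S,
# so it never has to convert a result back to class Date.
dat_3$visitNo[ord] <- with(dat_3[ord, ], ave(visitNo, S, FUN = seq_along))
```

The Date column is used only inside order(), which is why the class problem described above does not arise.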
Of course you are correct, the second answer should be c(2,2,1,1,1,3).

Thanks everyone.

On Wed, 2015-02-04 at 20:53 +0000, David L Carlson wrote:
> Note, your answer for the second example is incorrect since row 2 (c, 3)
> and row 5 (c, 2) are both assigned 2.