Dear R-help list,

I've got a data set in long format - each subject can have several (varying in number) measurements, with each record representing one measurement. I want to assign a sequence number to each measurement, starting at 1 for a person's first measurement. I can do this with the by function, but there must be an easier way.

Here's my code - id is id number, age is the age of the person, and seq is the sequence variable that I've created. Thanks very much for the help.

david freedman, atlanta

ds=data.frame(list(id = c(1L, 1L, 1L, 1L, 8L, 8L, 16L, 16L, 16L,
16L, 16L, 19L, 32L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
79L, 79L, 80L, 80L, 80L, 80L, 85L, 86L, 96L, 96L, 96L, 103L,
103L, 106L, 106L, 106L, 106L, 106L, 106L, 106L, 140L, 140L, 144L,
144L, 144L, 144L, 144L, 144L, 144L, 146L, 146L, 146L, 146L, 160L,
160L, 160L, 160L, 160L, 160L, 164L, 164L, 176L, 176L, 176L, 176L,
176L, 176L, 176L, 176L, 181L, 190L, 192L, 192L, 192L, 192L, 192L,
192L, 197L, 197L, 197L, 224L, 224L, 224L, 229L, 232L, 232L, 232L,
232L, 232L, 232L, 232L, 249L, 249L), age = c(6.6054794521, 9.301369863,
22.638356164, 31.961670089, 17.15890411, 25.106091718, 8.197260274,
11.295890411, 14.191780822, 22.43394935, 28.6, 6.6794520548,
10.824657534, 10.479452055, 13.432876712, 15.408219178, 17.643835616,
19.268493151, 22.624657534, 26.139726027, 35.493497604, 37.6,
15.895890411, 23.351129363, 13.810958904, 16.783561644, 17.95890411,
22.430136986, 12.021902806, 14.904859685, 7.4219178082, 10.060273973,
15.802739726, 17.328767123, 31.028062971, 8.3945205479, 10.350684932,
13.783561644, 17.843835616, 21.816438356, 27.901437372, 34.3,
10.517808219, 18.18630137, 11.378082192, 14.794520548, 16.77260274,
23.101369863, 27.912328767, 34.316221766, 40.2, 8.6054794521,
11.561643836, 14.863013699, 17.835616438, 8.0219178082, 9, 9.9726027397,
10.690410959, 13.032876712, 30.138261465, 7.0602739726, 10.438356164,
8.9232876712, 9.9589041096, 10.915068493, 12.263013699, 14.257534247,
17.326027397, 18.454794521, 21.334246575, 45.190965092, 8.5643835616,
12.197260274, 15.405479452, 17.106849315, 27.843835616, 34.417522245,
39.9, 6.7890410959, 10.21369863, 15.857534247, 10.147945205,
13.473972603, 36.06844627, 17.331506849, 14.980821918, 15.939726027,
16.939726027, 17.619178082, 18.698630137, 37.084188912, 43.3,
7.7068493151, 10.726027397)))

head(ds,10)
x=with(ds,by(ds,list(id),FUN=function(dc)1:length(dc$age)))
x[1:20]
ds$seq=unlist(x)
head(ds,20)

--
View this message in context: http://www.nabble.com/sequence-number-for-%27long-format%27-tp23338043p23338043.html
Sent from the R help mailing list archive at Nabble.com.

Try this:

    ds$seq <- ave(ds$id, ds$id, FUN = seq_along)

On Fri, May 1, 2009 at 2:52 PM, David Freedman <3.14david at gmail.com> wrote:
> I want to assign a sequence number to each measurement, starting at 1 for a
> person's first measurement. I can do this with the by function, but there
> must be an easier way.

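A small sketch of how the ave() suggestion behaves may help. ave() splits its first argument by the grouping variable, applies FUN to each piece, and returns the results in the original row order, so the ids do not need to be sorted or contiguous. The data frame `toy` below is made up for illustration and is not the poster's `ds`.

```r
# Hypothetical toy data (not the poster's ds): ids deliberately interleaved
toy <- data.frame(id  = c(1L, 8L, 1L, 16L, 8L, 1L),
                  age = c(6.6, 17.2, 9.3, 8.2, 25.1, 22.6))

# seq_along is applied within each id group; ave() puts the results
# back in the original row order
toy$seq <- ave(toy$id, toy$id, FUN = seq_along)

print(toy)
```

The numbering follows row order within each id, not age. In the posted data the ages are already ascending within each id, so the two coincide; otherwise the data would presumably need to be sorted by id and age first.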
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of David Freedman
> Sent: Friday, May 01, 2009 11:52 AM
> To: r-help at r-project.org
> Subject: [R] sequence number for 'long format'
>
> I want to assign a sequence number to each measurement, starting at 1 for a
> person's first measurement. I can do this with the by function, but there
> must be an easier way.
>
> x=with(ds,by(ds,list(id),FUN=function(dc)1:length(dc$age))); x[1:20];
> ds$seq=unlist(x); head(ds,20)

If your data is sorted so that identical id values are always contiguous, you can replace the by() with

    sequence(rle(ds$id)$lengths)
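
As a rough illustration of the run-length approach: rle() reports the length of each run of consecutive equal values, and sequence() turns each length n into 1, 2, ..., n. The first vector below holds the first twelve id values from the original post; the vector `mixed` is made up to show why the contiguity condition matters.

```r
# First twelve id values from the original post (already contiguous)
id <- c(1L, 1L, 1L, 1L, 8L, 8L, 16L, 16L, 16L, 16L, 16L, 19L)

runs <- rle(id)                      # run lengths are 4, 2, 5, 1
seq_rle <- sequence(runs$lengths)
print(seq_rle)                       # 1 2 3 4 1 2 1 2 3 4 5 1

# While ids are contiguous this matches the ave() approach
seq_ave <- ave(id, id, FUN = seq_along)
print(all(seq_rle == seq_ave))       # TRUE

# Made-up non-contiguous ids: the run-length version restarts at every run,
# so the second occurrence of id 1 is numbered 1 again
mixed <- c(1L, 8L, 1L)
seq_mixed_rle <- sequence(rle(mixed)$lengths)
seq_mixed_ave <- ave(mixed, mixed, FUN = seq_along)
print(seq_mixed_rle)                 # 1 1 1
print(seq_mixed_ave)                 # 1 1 2
```

If the ids might not be contiguous, sorting by id first (or using ave()) avoids the restart.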