Rolf Turner
2015-Oct-24 23:02 UTC
[R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On 25/10/15 11:28, John Sorkin wrote:
> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
> appear in more than one row. For each row with a repeated ID, I want to add
> a number that gives the sequence number of the repeated ID number. The R
> code below demonstrates what I want to have, without any attempt to produce
> the result, as I have no idea how to accomplish my goal.
>
> line <- c(1,2,3,4,5,6,7,8,9,10)
> ID <- c(1,1,2,3,4,5,6,7,8,8)
> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
> cbind(line,ID)
> Seq <- c(1,2,1,1,1,1,1,1,1,2)
> cat("Sequence numbers within ID added to the data")
> cbind(line,ID,Seq)

I *think* that

    unlist(lapply(rle(ID)$lengths,seq_len))

gives what you want. At least it does for the given example.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
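A minimal sketch of the run-length suggestion above, applied to the data from the original question. It assumes, as in that example, that rows sharing an ID are adjacent; the column name passed to cbind() is only for display.

```r
# Data from the original question
ID <- c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8)

# rle() reports the length of each run of equal adjacent values;
# seq_len() then counts 1, 2, ... within each run.
run_lengths <- rle(ID)$lengths
Seq <- unlist(lapply(run_lengths, seq_len))

# Base R's sequence() performs the same expansion in one call.
Seq_alt <- sequence(run_lengths)

print(cbind(line = seq_along(ID), ID, Seq))
```

Because rle() only looks at adjacent values, an ID that reappears later in the vector starts a new run and is counted from 1 again.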
Bert Gunter
2015-Oct-24 23:33 UTC
[R] Add sequence numbers to lines with the same ID: How can this be accomplished?
Rolf's solution works for the situation where all duplicated values are contiguous, which may be what you need. However, I wondered how it could be done if this were not the case. Below is an answer. It is not as efficient or elegant as Rolf's solution for the contiguous case, I think; maybe someone will come up with something better. But I think it works. Here's an example with code:

> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> w
 [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> d <- 0+duplicated(w)
> for(x in unique(w)){
+   i <- w==x
+   d[i] <- 1+cumsum(d[i])
+ }
> d
 [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3

As always, corrections and/or improvements welcome.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll
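A rough sketch of why the contiguous and non-contiguous cases differ, using the same w as in Bert's example. The loop is wrapped in a function whose name, seq_within_id, is hypothetical and does not come from the thread; the run-length result is computed alongside it for comparison.

```r
# seq_within_id is a hypothetical helper name, not from the thread.
seq_within_id <- function(w) {
  d <- 0 + duplicated(w)        # 0 for a first occurrence, 1 for each repeat
  for (x in unique(w)) {
    i <- w == x                 # every position holding this value
    d[i] <- 1 + cumsum(d[i])    # 0, 1, 1, ... becomes 1, 2, 3, ...
  }
  d
}

w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

by_loop <- seq_within_id(w)
by_rle  <- unlist(lapply(rle(w)$lengths, seq_len))

# The two agree only where repeats of a value are adjacent.
print(rbind(w, by_loop, by_rle))
```

On this w the run-length version restarts at 1 whenever a value reappears after a gap (for example the second 3 in position 6), while the loop keeps counting across gaps.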
Charles C. Berry
2015-Oct-25 04:05 UTC
[R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On Sat, 24 Oct 2015, Bert Gunter wrote:
> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better.

The often underappreciated `ave' comes to mind. viz.,

    ave(w,w,FUN=seq_along)

and

    ave(ID,ID,FUN=seq_along)

agree with the results posted earlier in this thread (Bert's d and John's Seq).

Of course, ave(...) is just split/unsplit in disguise, further to our discussion of a month or two back.

Best,
Chuck

Charles C. Berry
Dept of Family Medicine & Public Health
cberry at ucsd edu
UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/
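A small sketch of the ave() suggestion and of the split/unsplit equivalence mentioned above, run on the two vectors already used in this thread. The variable names (d_ave, d_split, and so on) and the character vector ids are illustrative only.

```r
w  <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)
ID <- c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8)

# ave() applies FUN within each group and puts the results back
# in the original positions, so the IDs need not be contiguous.
d_ave   <- ave(w, w, FUN = seq_along)
Seq_ave <- ave(ID, ID, FUN = seq_along)

# The same computation spelled out with split() and unsplit().
g <- factor(w)
d_split <- unsplit(lapply(split(w, g), seq_along), g)

# For non-numeric IDs, passing seq_along(ids) as the first argument
# keeps the result numeric instead of coercing it to character.
ids <- c("a", "b", "a", "c", "b", "a")
Seq_chr <- ave(seq_along(ids), ids, FUN = seq_along)

print(rbind(w, d_ave, d_split))
```

The first argument of ave() determines the type of the result, which is why the character example counts over seq_along(ids) rather than over ids itself.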
Rolf Turner
2015-Oct-25 04:05 UTC
[R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On 25/10/15 12:33, Bert Gunter wrote:
> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case.
>
> As always, corrections and/or improvements welcome.

How about:

    o <- order(w)
    d <- unlist(lapply(rle(w[o])$lengths,seq_len))[order(o)]

Works for the given example. :-)

cheers,

Rolf
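A step-by-step sketch of the order-based idea above, on the same w. The intermediate names (sorted_w, runs) are illustrative. It relies on order() leaving tied values in their original relative order, which is the documented behaviour for a single sort key.

```r
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

# Step 1: sort so that equal values become adjacent.
o <- order(w)
sorted_w <- w[o]

# Step 2: count within each run of the sorted vector.
runs <- unlist(lapply(rle(sorted_w)$lengths, seq_len))

# Step 3: undo the sort. order(o) is the inverse permutation of o,
# so runs[order(o)] puts each count back at its original position.
d <- runs[order(o)]

# An equivalent way to undo the sort is assignment through o.
d_alt <- integer(length(w))
d_alt[o] <- runs

print(rbind(w, d))
```

Sorting first reduces the non-contiguous case to the contiguous one, at the cost of two calls to order().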