thr3ads.net - R help - [R] Add sequence numbers to lines with the same ID: How can this be accomplished? [Oct 2015]

Rolf Turner

2015-Oct-24 23:02 UTC

[R] Add sequence numbers to lines with the same ID: How can this be accomplished?

On 25/10/15 11:28, John Sorkin wrote:
> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
> appear in more than one row. For each row with a repeated ID, I want to add a
> number that gives the sequence number of the repeated ID number. The R code
> below demonstrates what I want to have, without any attempt to produce the
> result, as I have no idea how to accomplish my goal.
>
>
> line <- c(1,2,3,4,5,6,7,8,9,10)
> ID<-    c(1,1,2,3,4,5,6,7,8,8)
> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
> cbind(line,ID)
> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
> cat("Sequence numbers within ID added to the data")
> cbind(line,ID,Seq)

I *think* that

   unlist(lapply(rle(ID)$lengths,seq_len))

gives what you want.  At least it does for the given example.
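
To make the run-length idea concrete, here is a minimal sketch on the data from the question. rle() reports the length of each run of identical adjacent values, and seq_len() turns each length n into 1, 2, ..., n. The name `runs` is introduced here purely for illustration.

```r
# Data from the original question
line <- c(1,2,3,4,5,6,7,8,9,10)
ID   <- c(1,1,2,3,4,5,6,7,8,8)

# rle() describes ID as runs of identical adjacent values
runs <- rle(ID)
runs$lengths   # length of each run: 2 1 1 1 1 1 1 2

# Count 1..n within each run and flatten the list into one vector
Seq <- unlist(lapply(runs$lengths, seq_len))

# Attach the sequence numbers as a third column
cbind(line, ID, Seq)
```

Base R's sequence(runs$lengths) should give the same vector in a single call.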

cheers,

Rolf Turner

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

Bert Gunter

2015-Oct-24 23:33 UTC

Rolf's solution works for the situation where all duplicated values
are contiguous, which may be what you need. However, I wondered how it
could be done if this were not the case. Below is an answer. It is not
as efficient or elegant as Rolf's solution for the contiguous case I
think; maybe someone will come up with something better. But I think
it works. Here's an example with code:
> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
> w
 [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
> d <- 0+duplicated(w)
> for(x in unique(w)){
+   i <- w==x
+   d[i]<-1+ cumsum(d[i])
+
+ }
> d
 [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3

As always, corrections and/or improvements welcome.
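
To see why contiguity matters, one can run the run-length approach on w directly and set it beside the loop's result. This is only an illustrative sketch; `by_run` and `by_value` are names made up here.

```r
w <- c(1:5,3,1,2,7,8,5,5,5,2,3)

# Run-length approach: counting restarts at every new run, so a value
# that reappears later in the vector starts again at 1
by_run <- unlist(lapply(rle(w)$lengths, seq_len))

# Loop from this message: counts occurrences of each value
# wherever they appear in the vector
by_value <- 0 + duplicated(w)
for (x in unique(w)) {
  i <- w == x
  by_value[i] <- 1 + cumsum(by_value[i])
}

# Side-by-side comparison, one column per element of w
rbind(w, by_run, by_value)
```

The two rows agree only where a value's repeats are adjacent (the run of three 5s); for the scattered repeats of 1, 2 and 3 the run-based count stays at 1.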

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll



Charles C. Berry

2015-Oct-25 04:05 UTC

On Sat, 24 Oct 2015, Bert Gunter wrote:
> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better.

The often underappreciated `ave' comes to mind, viz.,

    ave(w,w,FUN=seq_along)

and

    ave(ID,ID,FUN=seq_along)

agree with the results below.

Of course, ave(...) is just split/unsplit in disguise, further to our
discussion of a month or two back.
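
For readers who want to check the equivalence, here is a small sketch using the two vectors from earlier in the thread. It uses only base R; `via_ave` and `via_split` are illustrative names, not anything from the original posts.

```r
w  <- c(1:5,3,1,2,7,8,5,5,5,2,3)
ID <- c(1,1,2,3,4,5,6,7,8,8)

# ave() applies FUN within each group and returns the results
# in the original order of the input
via_ave <- ave(w, w, FUN = seq_along)

# The same steps spelled out: split by value, number each piece,
# then put the pieces back in their original positions
via_split <- unsplit(lapply(split(w, w), seq_along), w)

all(via_ave == via_split)   # TRUE

# The original question's data, handled the same way
ave(ID, ID, FUN = seq_along)
```

Note that ave() keeps the storage type of its first argument, so the two results are compared with == rather than identical().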

Best,

Chuck
> But I think
> it works. Here's an example with code:
>
>> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> w
> [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> d <- 0+duplicated(w)
>> for(x in unique(w)){
> +   i <- w==x
> +   d[i]<-1+ cumsum(d[i])
> +
> + }
>> d
> [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
Charles C. Berry                 Dept of Family Medicine & Public Health
cberry at ucsd edu               UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/

Rolf Turner

2015-Oct-25 04:05 UTC

On 25/10/15 12:33, Bert Gunter wrote:
> Rolf's solution works for the situation where all duplicated values
> are contiguous, which may be what you need. However, I wondered how it
> could be done if this were not the case. Below is an answer. It is not
> as efficient or elegant as Rolf's solution for the contiguous case I
> think; maybe someone will come up with something better. But I think
> it works. Here's an example with code:
>
>> w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> w
>   [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> d <- 0+duplicated(w)
>> for(x in unique(w)){
> +   i <- w==x
> +   d[i]<-1+ cumsum(d[i])
> +
> + }
>> d
>   [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>
> As always, corrections and/or improvements welcome.

How about:

o <- order(w)
d <- unlist(lapply(rle(w[o])$lengths,seq_len))[order(o)]

Works for the given example. :-)
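
The trailing [order(o)] is the step that is easiest to miss, so here is the same computation broken into pieces. This is a sketch for illustration; `seqs` and `d2` are names introduced here. It relies on order() leaving ties in their original order, as its documentation describes, so that repeats are numbered by order of appearance.

```r
w <- c(1:5,3,1,2,7,8,5,5,5,2,3)

o <- order(w)    # permutation that sorts w
w[o]             # equal values are now adjacent

# Number the elements within each run of the sorted vector
seqs <- unlist(lapply(rle(w[o])$lengths, seq_len))

# order(o) is the inverse permutation: it puts each count back
# at the position its value had in w
d <- seqs[order(o)]

# Equivalent assignment form: write the counts straight into place
d2 <- integer(length(w))
d2[o] <- seqs

d
```

The assignment form avoids the second call to order(), which may matter for long vectors.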

cheers,

Rolf

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

