thr3ads.net - R help - [R] counting sets of consecutive integers in a vector [Jan 2015]


Mike Miller

2015-Jan-05 00:03 UTC

[R] counting sets of consecutive integers in a vector

I have a vector of sorted positive integer values (e.g., positive integers
after applying sort() and unique()).  For example, this:

c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the 
first value in every run of consecutive integer values, and (2) the 
corresponding number of consecutive values.  For example:

c(1:20) would become this...

1  20

...because there are 20 consecutive integers beginning with 1 and 
c(1,2,5,6,7,8,25,30,31,32,33) would become

1  2
5  4
25 1
30 4

What would be the best way to accomplish this?  Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

      [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I suppose that works well enough, but there may be a better way, and 
besides, I wouldn't want to deny anyone here the opportunity to solve a 
fun puzzle.  ;-)
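For readers wondering why the rle() attempt above works: inside a run of consecutive integers, each value and its position both go up by 1, so their difference stays constant, and it jumps whenever there is a gap. The sketch below only prints the intermediate values. It uses seq_along(v) in place of 1:length(v), which is a substitution made here (it behaves sensibly on a zero-length vector) and not part of the original post.

```r
v <- c(1,2,5,6,7,8,25,30,31,32,33)

# value minus position: constant inside a run, jumps at each gap
offset <- v - seq_along(v)
offset
# 0 0 2 2 2 2 18 22 22 22 22

# rle() collapses each constant stretch into one run
r <- rle(offset)
r$lengths
# 2 4 1 4

# index of the first element of each run
starts <- cumsum(c(1, head(r$lengths, -1)))
cbind(start = v[starts], length = r$lengths)
```

This relies on v being sorted with no duplicates, as stated in the question; otherwise two different runs could share the same offset.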

The use for this is that I will be doing repeated seeks of a binary file 
to extract data.  seek() gives the starting point and readBin(n=X) gives 
the number of bytes to read.  So when there are many consecutive variables 
to be read, I can multiply the X in n=X by that number instead of doing 
many different seek() calls.  (The data are in a transposed format where I 
read in every record for some variable as sequential elements.)  I'm 
probably not the first person to deal with this.
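A minimal sketch of the batching idea described above, under assumptions that are not in the original post: the file is a flat sequence of 8-byte doubles, each variable occupies n_rec of them, and variables are stored back to back in index order. The names read_runs and n_rec and the temporary file are hypothetical. Note that in readBin() the n argument counts items of the requested type, while size gives the bytes per item.

```r
# Hypothetical layout: variable k occupies items ((k-1)*n_rec + 1):(k*n_rec),
# each stored as an 8-byte double.
read_runs <- function(path, runs, n_rec, size = 8L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  out <- vector("list", nrow(runs))
  for (i in seq_len(nrow(runs))) {
    first <- runs[i, 1]   # first variable index in the run
    count <- runs[i, 2]   # number of consecutive variables
    # one seek per run instead of one per variable
    seek(con, where = (first - 1) * n_rec * size, origin = "start")
    out[[i]] <- readBin(con, what = "double", n = count * n_rec, size = size)
  }
  unlist(out)
}

# Toy file: 33 variables with 3 items each; every item holds its variable index
n_rec <- 3
path <- tempfile()
writeBin(as.double(rep(1:33, each = n_rec)), path)

v <- c(1,2,5,6,7,8,25,30,31,32,33)
starts <- c(1, which(diff(v) != 1) + 1)
runs <- cbind(v[starts], diff(c(starts, length(v) + 1)))
x <- read_runs(path, runs, n_rec)
```

With this toy file, x should contain each requested variable index repeated n_rec times, read with four seek() calls instead of eleven.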

Best,

Mike

-- 
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

Peter Alspach

2015-Jan-05 01:27 UTC


Tena koe Mike

An alternative, which is slightly faster:

  diffv <- diff(v)
  starts <- c(1, which(diffv!=1)+1)
  cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))

Peter Alspach
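The three lines above can be wrapped into a small helper. The sketch below is one possible packaging rather than Peter's own code: the function name consecutive_runs is made up here, the run lengths are computed with diff(c(starts, length(v) + 1)) as an equivalent shortcut for his second column, and the empty-input branch is an added assumption about the desired behaviour.

```r
consecutive_runs <- function(v) {
  # assumes v is sorted and has no duplicates, as in the original question
  if (length(v) == 0L) {
    return(cbind(start = numeric(0), length = integer(0)))
  }
  # positions where a new run begins: the first element and every gap
  starts <- c(1L, which(diff(v) != 1) + 1L)
  # run length = distance to the next start (or to one past the end)
  cbind(start = v[starts], length = diff(c(starts, length(v) + 1L)))
}

consecutive_runs(c(1,2,5,6,7,8,25,30,31,32,33))
consecutive_runs(1:20)
```

Appending length(v) + 1 as a sentinel start avoids treating the last run as a special case.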


jim holtman

2015-Jan-05 01:32 UTC


Here is another approach:

> v <- c(1,2,5,6,7,8,25,30,31,32,33)
>
> # split by differences != 1
> t(sapply(split(v, cumsum(c(1, diff(v)) != 1)), function(x){
+     c(value = x[1L], length = length(x))  # output first value and length
+ }))
  value length
0     1      2
1     5      4
2    25      1
3    30      4



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


jim holtman

2015-Jan-05 01:46 UTC


Here is a solution using data.table:

> require(data.table)
> x <- data.table(v, diff = cumsum(c(1, diff(v)) != 1))
> x
     v diff
 1:  1    0
 2:  2    0
 3:  5    1
 4:  6    1
 5:  7    1
 6:  8    1
 7: 25    2
 8: 30    3
 9: 31    3
10: 32    3
11: 33    3
> x[, list(value = v[1L], length = .N), key = 'diff']
   diff value length
1:    0     1      2
2:    1     5      4
3:    2    25      1
4:    3    30      4
> # get rid of 'diff' column
> x[, list(value = v[1L], length = .N), key = 'diff'][, -1, with = FALSE]
   value length
1:     1      2
2:     5      4
3:    25      1
4:    30      4


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Mike Miller

2015-Jan-05 03:53 UTC


Thanks, Peter.  Why not cbind your idea for the first column with my idea 
for the second column and get it done in one line?:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths )
M

      [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I find that pretty appealing and I'll probably stick with it.  It seems 
quite fast.  Here's an example:

# make fairly long vector
v <- sort(unique(round(100000*runif(100000))))
length(v)
[1] 63274

# time the procedure:
ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
    user  system elapsed
    0.03    0.00    0.03

dim(M)
[1] 23212     2

I probably won't be using vectors any longer than that, and this isn't the
kind of thing that I do over and over again, so that speed is excellent.
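A sketch of how the timing above could be reproduced with system.time(), which wraps the proc.time() bookkeeping, together with a check that the one-liner agrees with a version built from diff() alone. The seed is arbitrary and added here only for repeatability, and seq_along(v) is substituted for 1:length(v); timings depend on the machine, so none are shown.

```r
set.seed(1)  # arbitrary seed, not from the original post
v <- sort(unique(round(100000 * runif(100000))))

# one-liner from the post, timed with system.time()
system.time(
  M <- cbind(v[c(1, which(diff(v) != 1) + 1)],
             rle(v - seq_along(v))$lengths)
)

# cross-check against run lengths derived from diff() alone
starts <- c(1, which(diff(v) != 1) + 1)
M2 <- cbind(v[starts], diff(c(starts, length(v) + 1)))
all(M == M2)

# the run lengths must add up to the number of values
sum(M[, 2]) == length(v)
```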

Mike



