thr3ads.net - R help - [R] sample consecutive integers efficiently [Aug 2008]

If this information is useful, please help other people find it:
Share via:

Chris Oldmeadow

2008-Aug-28 06:05 UTC

[R] sample consecutive integers efficiently

Hi all,

I have some rough code to sample consecutive integers with length 
according to a vector of lengths

#sample space (representing positions)
pos<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

#sample lengths
lengths<-c(2,3,2)

 From these two vectors I need a vector of sampled positions.


the sampling is without replacement, making things tough as the sampled 
integers need to be consecutive. Im hoping somebody knows a faster way 
of doing it than I have. ATM its way to slow on large vectors.


samplePos<-function(l){
    start.pos<-sample(pos,1)
    end.pos<-start.pos+l-1
    posies<-start.pos:end.pos
    posies
}

s.start<-c()


newPos<-function(a){
    rp<-samplePos(a)
    #test sampled range is consecutive, if not resample
    if (length(rp) != rp[a]+1 -rp[1]){rp<-samplePos(a)}
    pos<-setdiff(pos,rp)
    rp[1]
}

newps<-c()
newps<-unlist(lapply(lengths,newPos))

I think the bottleneck may be on the setdiff() function - the sample 
space is quite large so I dont think there would be too many rejections.



Many thanks,
Chris

Charles C. Berry

2008-Aug-28 18:12 UTC

head link

[R] sample consecutive integers efficiently

On Thu, 28 Aug 2008, Chris Oldmeadow wrote:
> Hi all,
>
> I have some rough code to sample consecutive integers with length according
> to a vector of lengths
>
> #sample space (representing positions)
> pos<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
>
> #sample lengths
> lengths<-c(2,3,2)
>
> From these two vectors I need a vector of sampled positions.
>
>
> the sampling is without replacement, making things tough as the sampled 
> integers need to be consecutive. Im hoping somebody knows a faster way of 
> doing it than I have. ATM its way to slow on large vectors.
>
>
> samplePos<-function(l){
>    start.pos<-sample(pos,1)
>    end.pos<-start.pos+l-1
>    posies<-start.pos:end.pos
>    posies
> }
>
> s.start<-c()
>
>
> newPos<-function(a){
>    rp<-samplePos(a)
>    #test sampled range is consecutive, if not resample
>    if (length(rp) != rp[a]+1 -rp[1]){rp<-samplePos(a)}
>    pos<-setdiff(pos,rp)
>    rp[1]
> }
>
> newps<-c()
> newps<-unlist(lapply(lengths,newPos))
>
> I think the bottleneck may be on the setdiff() function - the sample space
is
> quite large so I dont think there would be too many rejections.

The bottleneck is in the formulation of the sampling scheme.

This is a simple combinatorics problem. There are 3360 possible values ( 
prod(16:14) ) for the start positions of the three elements, and you can 
form a bijection between 1:3360 and the individual samples. If the number 
of possible sample is small enough, it would be most efficient to sample 
from the corresponding integer vector and then translate it to the 
corresponding sample. For larger values where the number of possible 
samples become a challenge for 32-bit integer arithmetic, I expect this 
approach would be preferred:

Permute length ( pos ) - sum ( lengths ) + length( lengths ) distinct 
(consecutively labelled) elements:

 	elz <- sample( length ( pos ) - sum ( lengths ) + length( lengths ) )


Take the lengths of the original objects to be

 	z.lens <- rep( 1, length( elz ) )
 	z.lens[ seq(along = lengths ) ] <- lengths

(i.e. objects longer than 1 appear first)

Determine the start positions of the objects as if they were laid down 
consecutively according to the permutation:

 	start <- head( cumsum( c(0, z.lens[ elz ]) ) + 1 , -1 )

Find the start positions of just those with lengths greater than 1

 	gt.1 <- match( seq(along=lengths) , elz )

Report the start positions

 	start[ gt.1 ]

---

If length( pos ) is large, you can rewrite the above to simply sample 
the positions (in the ordering) of the objects with lengths greater than 
1. You will have to revise the calculation of start and gt.1 in that case.

HTH,

Chuck


  >>
>
> Many thanks,
> Chris
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

Chris Oldmeadow

2008-Aug-28 22:35 UTC

head link

[R] sample consecutive integers efficiently

Charles C. Berry wrote:> On Thu, 28 Aug 2008, Chris Oldmeadow wrote:
>
>> Hi all,
>>
>> I have some rough code to sample consecutive integers with length 
>> according to a vector of lengths
>>
>> #sample space (representing positions)
>> pos<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
>>
>> #sample lengths
>> lengths<-c(2,3,2)
>>
>> From these two vectors I need a vector of sampled positions.
>>
>>
>> the sampling is without replacement, making things tough as the 
>> sampled integers need to be consecutive. Im hoping somebody knows a 
>> faster way of doing it than I have. ATM its way to slow on large 
>> vectors.
>>
>>
>> samplePos<-function(l){
>>    start.pos<-sample(pos,1)
>>    end.pos<-start.pos+l-1
>>    posies<-start.pos:end.pos
>>    posies
>> }
>>
>> s.start<-c()
>>
>>
>> newPos<-function(a){
>>    rp<-samplePos(a)
>>    #test sampled range is consecutive, if not resample
>>    if (length(rp) != rp[a]+1 -rp[1]){rp<-samplePos(a)}
>>    pos<-setdiff(pos,rp)
>>    rp[1]
>> }
>>
>> newps<-c()
>> newps<-unlist(lapply(lengths,newPos))
>>
>> I think the bottleneck may be on the setdiff() function - the sample 
>> space is quite large so I dont think there would be too many
rejections.
>
>
> The bottleneck is in the formulation of the sampling scheme.
>
> This is a simple combinatorics problem. There are 3360 possible values 
> ( prod(16:14) ) for the start positions of the three elements, and you 
> can form a bijection between 1:3360 and the individual samples. If the 
> number of possible sample is small enough, it would be most efficient 
> to sample from the corresponding integer vector and then translate it 
> to the corresponding sample. For larger values where the number of 
> possible samples become a challenge for 32-bit integer arithmetic, I 
> expect this approach would be preferred:
>
> Permute length ( pos ) - sum ( lengths ) + length( lengths ) distinct 
> (consecutively labelled) elements:
>
>     elz <- sample( length ( pos ) - sum ( lengths ) + length( lengths )
)
>
>
> Take the lengths of the original objects to be
>
>     z.lens <- rep( 1, length( elz ) )
>     z.lens[ seq(along = lengths ) ] <- lengths
>
> (i.e. objects longer than 1 appear first)
>
> Determine the start positions of the objects as if they were laid down 
> consecutively according to the permutation:
>
>     start <- head( cumsum( c(0, z.lens[ elz ]) ) + 1 , -1 )
>
> Find the start positions of just those with lengths greater than 1
>
>     gt.1 <- match( seq(along=lengths) , elz )
>
> Report the start positions
>
>     start[ gt.1 ]
>
> ---
>
> If length( pos ) is large, you can rewrite the above to simply sample 
> the positions (in the ordering) of the objects with lengths greater 
> than 1. You will have to revise the calculation of start and gt.1 in 
> that case.
>
> HTH,
>
> Chuck
>
>
>  >
>>
>>
>> Many thanks,
>> Chris
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive 
> Medicine
> E mailto:cberry at tajo.ucsd.edu                UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 
> 92093-0901
>
>
>
Thanks! that worked perfectly.

Chris

Possibly Parallel Threads

Search for more maybe matching threads

R help - Aug 2008 - sample consecutive integers efficiently

[R] sample consecutive integers efficiently

[R] sample consecutive integers efficiently

[R] sample consecutive integers efficiently

Possibly Parallel Threads