On 9/18/2006 3:37 AM, Sean O'Riordain wrote:> Good morning,
> 
> I'm trying to concisely generate a single integer from 0 to n
> inclusive, where n might be of the order of hundreds of millions.
> This will however be used many times during the general procedure, so
> it must be "reasonably efficient" in both memory and time... (at
some
> later stage in the development I hope to go vectorized)
> 
> The examples I've found through searching RSiteSearch() relating to
> generating random integers say to use : sample(0:n, 1)
> However, when n is "large" this first generates a large sequence
0:n
> before taking a sample of one... this computer doesn't have the memory
> for that!
You don't need to give the whole vector:  just give n, and you'll get 
draws from 1:n.  The man page is clear on this.
So what you want is sample(n+1, 1) - 1.  (Use "replace=TRUE" if you
want
a sample bigger than 1, or you'll get sampling without
replacement.)> 
> When I look at the documentation for runif(n, min, max) it states that
> the generated numbers will be min <= x <= max.  Note the "<=
max"...
Actually it says that's the range for the uniform density.  It's silent 
on the range of the output.  But it's good defensive programming to 
assume that it's possible to get the endpoints.
> 
> How do I generate an x such that the probability of being (the
> integer) max is the same as any other integer from min (an integer) to
> max-1 (an integer) inclusive... My attempt is:
> 
> urand.int <- function(n,t) {
>   as.integer(runif(n,min=0, max=t+1-.Machine$double.eps))
> }
> # where I've included the parameter n to help testing...
Because of rounding error, t+1-.Machine$double.eps might be exactly 
equal to t+1.  I'd suggest using a rejection method if you need to use 
this approach:  but sample() is better in the cases where as.integer() 
will work.
Duncan Murdoch> 
> is floor() "better" than as.integer?
> 
> Is this correct?  Is the probability of the integer t the same as the
> integer 1 or 0 etc... I have done some rudimentary testing and this
> appears to work, but power being what it is, I can't see how to
> realistically test this hypothesis.
> 
> Or is there a a better way of doing this?
> 
> I'm trying to implement an algorithm which samples into an array,
> hence the need for an integer - and yes I know about sample() thanks!
> :-)
> 
> { incidentally, I was surprised to note that the maximum value
> returned by summary(integer_vector) is "pretty" and appears to be
> rounded up to a "nice round number", and is not necessarily the
same
> as max(integer_vector) where the value is large, i.e. of the order of
> say 50 million }
> 
> Is version etc relevant? (I'll want to be portable)
>> version               _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          3.1
> year           2006
> month          06
> day            01
> svn rev        38247
> language       R
> version.string Version 2.3.1 (2006-06-01)
> 
> Many thanks in advance for your help.
> Sean O'Riordain
> affiliation <- NULL
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.