thr3ads.net - R devel - [Rd] algorithm reference for sample() [Sep 2004]

If this information is useful, please help other people find it:
Share via:

Vadim Ogranovich

2004-Sep-24 02:49 UTC

[Rd] algorithm reference for sample()

Hi,
 
Don't know if it belongs to r-devel or r-help, but since I am planning
to alter some of R's internal code I am sending it here.
 
The existing implementation of the sample() function, when the optional
'prob' argument is given, is quite inefficient. The complexity is
O(sampleSize * universeSize), see ProbSampleReplace() and
ProbSampleNoReplace() in random.c. This makes the function impractical
for the vector sizes I use.  I want to re-code these functions and I
"think" I can come up with a more efficient algorithm. However before
I
go and reinvent the wheel I wonder if there is a published description
of an efficient sampling algorithm with user-specified probabilities?
 
Thanks,
Vadim

	[[alternative HTML version deleted]]

Martin Maechler

2004-Sep-24 10:02 UTC

head link

[Rd] algorithm reference for sample()

Hi Vadim,
>>>>> "Vadim" == Vadim Ogranovich
<vograno@evafunds.com>
>>>>>     on Thu, 23 Sep 2004 17:48:45 -0700 writes:
    Vadim> Hi, Don't know if it belongs to r-devel or r-help,
    Vadim> but since I am planning to alter some of R's internal code

i.e., you will propose a patch to the R sources eventually ?

    Vadim>  I am sending it here.

good choice.  Also, since you are talking about
internal (non-API) C code from R - which I would deem
inappropriate on R-help.

    Vadim> The existing implementation of the sample() function,
    Vadim> when the optional 'prob' argument is given, is quite
    Vadim> inefficient. The complexity is O(sampleSize *
    Vadim> universeSize), see ProbSampleReplace() and
    Vadim> ProbSampleNoReplace() in random.c. This makes the
    Vadim> function impractical for the vector sizes I use. 

I'm interested: What problem are you solving where sample() is
the bottleneck (rather than what you *do* with the sample ..)

    Vadim> I want to re-code these functions and I "think" I can
    Vadim> come up with a more efficient algorithm.

I agree. It's a kind of table lookup, that definitely can be
made faster e.g. by bisection ideas.

    Vadim> However before I go and reinvent the wheel I wonder if there
    Vadim> is a published description of an efficient sampling
    Vadim> algorithm with user-specified probabilities?

I've got some ideas, but maybe would first want to get a reply to the
current ones.

    Vadim> Thanks, Vadim

    Vadim> 	[[alternative HTML version deleted]]
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(you *did* read the posting guide or just the general
 instructions on http://www.R-project.org/mail.html  ?
)

Martin Maechler

Maybe Matching Threads

Search for more reasonably related threads

R devel - Sep 2004 - algorithm reference for sample()

[Rd] algorithm reference for sample()

[Rd] algorithm reference for sample()

Maybe Matching Threads