Hi all,
I am studying R's sample() function's behaviour when we are sampling
from a large vector with weights. For instance,
the following code measures the time it takes to do a weighted Bootstrap
sampling.
>logfile <- file("/home/jonqli/nortel/logfile.txt",
"at")
>cat("\n", file=logfile)
>cat(date(), file=logfile)
>UPPER <- 1e5
>tmp <- sample(UPPER, replace=T, prob=1:UPPER)
>cat(date(), file=logfile)
Here is a table for the time it takes in a Pentium III with 750 M
memory running Debian Linux.
UPPER = 1e6 TIME = 24600 seconds.
UPPER = 1e5 TIME = 218 seconds
UPPER = 1e4 TIME = 1~2 seconds
Clearly this is an algorithm with complexity of O(n^2). When I use
sample(UPPER) instead (no weights),
the complexity seems to go up in O(n).
UPPER = 1e6 TIME < 1 second
UPPER = 1e7 TIME = 7 seconds
UPPER = 1e8 TIME = 95 seconds
(UPPER = 1e9 didn't run because of memory allocation error).
Am I right in these observations? If I am right, are there ways to
speed up the weighted bootstrap
sampling to O(nlogn), for instance?
Thanks in advance!
--
Jonathan Q. Li, PhD
Agilent Technologies Laboratory
Palo Alto, California, USA
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._