Hello everybody,

I'm wondering whether it might be possible to speed up the following code:

Error <- rnorm(10000000, mean = 0, sd = 0.05)
estimate <- log(1.1) - Error
DCF_korrigiert <- 1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) *
    sqrt(2 * pi / (0.05^2)) *
    (1 - pnorm(0, (-estimate) / (0.05^2), sqrt(1 / (0.05^2)))))) - 1)

D <- 100000
Delta_ln <- rep(0, D)
for (i in 1:D)
  Delta_ln[i] <- log(mean(sample(DCF_korrigiert, 1000000, replace = TRUE)) / (1 / 0.10))

The for loop takes several hours even on a very fast machine (4 GHz, 8 GB RAM, Windows 2008 Server 64-bit). Does anybody have an idea how to speed up the loop?

Thanks for helping me.

Hans
Hans,

You could parallelize it with the multicore package. The only other thing I can think of is to use calls to .Internal(). But be vigilant, as this might not be good advice: ?.Internal warns that only true R wizards should even consider using the function. First an example with the .Internal() calls, then multicore. For me, the following reduces elapsed time by about 9% on Windows 7 and by about 20% on today's new Ubuntu Natty.

## Set number of replicates
n <- 10000

## Your example
set.seed(1)
time.one <- Sys.time()
Error <- rnorm(n, mean = 0, sd = 0.05)
estimate <- log(1.1) - Error
DCF_korrigiert <- 1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) *
    sqrt(2 * pi / (0.05^2)) *
    (1 - pnorm(0, (-estimate) / (0.05^2), sqrt(1 / (0.05^2)))))) - 1)
D <- n
Delta_ln <- rep(0, D)
for (i in 1:D)
  Delta_ln[i] <- log(mean(sample(DCF_korrigiert, D, replace = TRUE)) / (1 / 0.10))
time.one <- Sys.time() - time.one

## A few modifications with .Internal()
set.seed(1)
time.two <- Sys.time()
Error <- rnorm(n, mean = 0, sd = 0.05)
estimate <- log(1.1) - Error
DCF_korrigiert <- 1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) *
    sqrt(2 * pi / (0.05^2)) *
    (1 - pnorm(0, (-estimate) / (0.05^2), sqrt(1 / (0.05^2)))))) - 1)
D <- n
Delta_ln2 <- numeric(length = D)
Delta_ln2 <- vapply(Delta_ln2, function(x) {
  log(.Internal(mean(DCF_korrigiert[.Internal(
    sample(D, D, replace = TRUE, prob = NULL))])) / (1 / 0.10))
}, FUN.VALUE = 1)
time.two <- Sys.time() - time.two

## Compare
all.equal(Delta_ln, Delta_ln2)
time.one
time.two
as.numeric(time.two) / as.numeric(time.one)

Then you could parallelize it with multicore's parallel() function:

## Try multicore
require(multicore)
set.seed(1)
time.three <- Sys.time()
Error <- rnorm(n, mean = 0, sd = 0.05)
estimate <- log(1.1) - Error
DCF_korrigiert <- 1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) *
    sqrt(2 * pi / (0.05^2)) *
    (1 - pnorm(0, (-estimate) / (0.05^2), sqrt(1 / (0.05^2)))))) - 1)
D <- n / 2
Delta_ln3 <- numeric(length = D)
Delta_ln3.1 <- parallel(vapply(Delta_ln3, function(x) {
  log(.Internal(mean(DCF_korrigiert[.Internal(
    sample(D, D, replace = TRUE, prob = NULL))])) / (1 / 0.10))
}, FUN.VALUE = 1), mc.set.seed = TRUE)
Delta_ln3.2 <- parallel(vapply(Delta_ln3, function(x) {
  log(.Internal(mean(DCF_korrigiert[.Internal(
    sample(D, D, replace = TRUE, prob = NULL))])) / (1 / 0.10))
}, FUN.VALUE = 1), mc.set.seed = TRUE)
results <- collect(list(Delta_ln3.1, Delta_ln3.2))
names(results) <- NULL
Delta_ln3 <- do.call("append", results)
time.three <- Sys.time() - time.three

## Compare
## Results won't be equal, because parallel() handles
## the set.seed() randomization differently
all.equal(Delta_ln, Delta_ln3)
time.one
time.two
time.three
as.numeric(time.three) / as.numeric(time.one)

Combining parallel() with the .Internal() calls reduces the elapsed time by about 70% on Ubuntu Natty. multicore is not available for Windows, or at least not easily available for Windows. But maybe the true R wizards have better ideas.

Jeremy
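Since multicore does not run on Windows, here is a minimal sketch of the same split done with a socket cluster from the snow package instead. The two-worker cluster size is an assumption, it uses plain sample()/mean() rather than the .Internal() calls, and the worker RNG streams are not seeded reproducibly here:

## Sketch: Windows-friendly parallelization with snow.
## Assumes DCF_korrigiert and D exist as in the code above.
library(snow)
cl <- makeCluster(2, type = "SOCK")          # assumed: 2 local worker processes
clusterExport(cl, c("DCF_korrigiert", "D"))  # copy the data to each worker
Delta_ln4 <- parSapply(cl, 1:D, function(i)
  log(mean(DCF_korrigiert[sample(D, D, replace = TRUE)]) / (1 / 0.10)))
stopCluster(cl)

Because each worker is a separate R process, DCF_korrigiert is copied to every worker; with the full 10,000,000-element vector that costs about 80 MB per worker.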
Barth sent me a very good piece of code and I modified it a bit. Have a look:

Error <- rnorm(10000000, mean = 0, sd = 0.05)
estimate <- log(1 + 0.10) + Error

DCF_korrigiert <- 1 / (exp(1 / (exp(0.5 * (-estimate)^2 / (0.05^2)) *
    sqrt(2 * pi / (0.05^2)) *
    (1 - pnorm(0, (-estimate) / (0.05^2), sqrt(1 / (0.05^2)))))) - 1)
DCF_verzerrt <- 1 / (exp(estimate) - 1)

S <- 10000000   # total sample size
D <- 10000      # number of subsamples
Subset <- 10000 # number in each subsample
Select <- matrix(sample(S, D * Subset, replace = TRUE), nrow = Subset, ncol = D)

DCF_korrigiert_select <- matrix(DCF_korrigiert[Select], nrow = Subset, ncol = D)
Delta_ln <- log(colMeans(DCF_korrigiert_select, na.rm = TRUE) / (1 / 0.10))

The only problem I discovered is that R cannot index more than 2,147,483,647 elements, so the number of cells in the matrix is bounded by this limit. (R shows the maximum when you type .Machine$integer.max.) And if you want to save the workspace, the file for 10,000 by 10,000 becomes around 2 GB, compared to "just" 300 MB for the original.

So I cannot perform my previous bootstrap of 1,000,000 by 100,000. Nevertheless, 10,000 by 10,000 seems to be sufficient; I have to say it's amazing how fast the idea works.

Does anybody have a suggestion how to make it work for the 1,000,000 by 100,000 bootstrap?
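To see why the 1,000,000 by 100,000 bootstrap cannot be done as a single matrix, here is a quick check with the numbers from above (the byte arithmetic, assuming 8-byte doubles and 4-byte integers, is my own addition):

Subset <- 1000000; D <- 100000      # the sizes of the original bootstrap
Subset * D > .Machine$integer.max   # TRUE: 1e11 cells exceed the 2^31 - 1 element limit
10000 * 10000 * 8 / 2^30            # ~0.75 GiB: the 10,000 x 10,000 double matrix
10000 * 10000 * 4 / 2^30            # ~0.37 GiB: the integer index matrix Select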
On 29.04.2011 22:20, hck wrote:
> Barth sent me a very good piece of code and I modified it a bit. [...]
> Does anybody have a suggestion how to make it work for the 1,000,000 by
> 100,000 bootstrap?

Run it in several blocks of matrices with appropriate dimensions? This allows easy parallelization as well.

Uwe Ligges
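A minimal sketch of that block-wise idea, reusing hck's objects from above (the block size of 100 columns is an assumption; tune it to the available RAM):

## Sketch: split the D = 100,000 subsamples into blocks so that no
## single matrix exceeds the 2^31 - 1 element limit or the RAM.
## Assumes DCF_korrigiert has been computed as above.
S <- 10000000        # total sample size
D <- 100000          # number of subsamples wanted in total
Subset <- 1000000    # number in each subsample
block.size <- 100    # columns per block -- an assumption
n.blocks <- D / block.size

Delta_ln <- numeric(D)
for (b in 1:n.blocks) {
  Select <- matrix(sample(S, block.size * Subset, replace = TRUE),
                   nrow = Subset, ncol = block.size)
  cols <- ((b - 1) * block.size + 1):(b * block.size)
  Delta_ln[cols] <- log(colMeans(matrix(DCF_korrigiert[Select],
                                        nrow = Subset, ncol = block.size),
                                 na.rm = TRUE) / (1 / 0.10))
}

Since the blocks are independent, each iteration could also be handed to a separate worker, for example with snow as sketched earlier in the thread.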