Seeliger.Curt at epamail.epa.gov
2011-Apr-19 19:52 UTC
[R] doSMP package works better than perfect, at least sometimes.
Some might have noticed that REvolution Computing released the doSMP package to the general public about a month and a half ago; it allows multiple cores to be used for parallel computation in R. Some of our physical habitat calculations were taking an extraordinary amount of time to complete and required over-weekend runs, which prompted our interest in this package. What follows are the results of those tests. In brief, the toy test showed a plausible speed increase that depended on the number of workers (cores? threads?) used. Timing of our real-world application gave results that were better than perfect. In fact, they were staggeringly better than perfect. Maybe someone can suggest why. Also in brief, I'd like to quickly thank REvolution for providing us with this really great package.

These metrics are based on a for-loop construct that is difficult to vectorize, so a toy test was developed (code given below) which loops through simple sqrt() calculations in a way one might find in Burns' third circle of Hell. Short loops were used to provoke thrashing during processor assignment, and longer ones to simulate 'harder', more time-consuming tasks. The processing time of each set of tasks was measured for basic unvectorized for() looping, foreach() %do% looping, and foreach() %dopar% looping, using a 4-core Xeon PC running XP with 3.2 GB of RAM.

Using 3 'workers', the speedup from iterating with the foreach() %do% construct showed the expected amount of thrashing for small/easy calculations, with the internal overhead being overcome after roughly 10,000 total calculations. The speedup from SMP relative to the single-processor iteration showed it to start being worthwhile with as few as 10 groups, more or less regardless of group size. In the tables below, rows are the number of groups (g) and columns are the group size (n).

Speedup of foreach() %do% construct relative to basic for():
               n=1         n=10       n=100     n=1000
g=10       0.6000000    0.6250000   0.900000   1.010499
g=100      0.7230769    0.9333333   1.180000   1.231752
g=1000     0.7968750    1.8987730   3.564801   2.078614
g=10000    2.1724356   10.4700474   8.002192         NA

Speedup of foreach() %dopar% construct relative to foreach() %do% construct:
               n=1         n=10       n=100     n=1000
g=10       0.09803922   1.142857    1.875000   2.164773
g=100      0.94202899   1.363636    2.702703   2.689359
g=1000     0.81012658   1.429825    2.951413   2.602386
g=10000    0.87239919   1.182743    1.548661         NA

Using 7 'workers', the speedup from iterating with the foreach() %do% construct tracked the three-worker results less closely than expected, though thrashing was still evident when the number of calculations was small. The speedup from using multiple cores maxed out at around 5.5x, below the theoretically perfect 7x, and was not consistently high across conditions. I'm not sure whether this is system noise or some other constraint influencing the results.

Speedup of foreach() %do% construct relative to basic for():
               n=1         n=10       n=100     n=1000
g=10       0.400000     1.1111111   0.9210526   1.037190
g=100      0.650000     0.8831169   1.1215881   1.199677
g=1000     0.768116     1.7843360   3.5691298   2.051362
g=10000    1.981686     8.8194254   8.2673038         NA

Speedup of foreach() %dopar% construct relative to foreach() %do% construct:
               n=1         n=10       n=100     n=1000
g=10       0.8333333    1.285714    4.222222    3.751938
g=100      0.9523810    1.452830    5.302632    5.516474
g=1000     0.9409091    1.284257    3.123677    3.848393
g=10000    0.8640463    1.073046    1.609020          NA
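For reference, the construct being timed in every case above is the standard foreach() idiom with doSMP registered as the %dopar% backend. A minimal, self-contained sketch of that pattern follows (the complete toy test code is at the end of this message; the stopWorkers() call is the one piece not used in that code):

require(doSMP)                      # loads foreach as well
w <- startWorkers(workerCount=3)    # spawn worker processes
registerDoSMP(w)                    # make them the %dopar% backend

# Each group is handed to a worker; results are combined with rbind().
dd <- data.frame(k=rep(1:10, 5), x=runif(50))
ddSplit <- split(dd, dd$k)
res <- foreach(e=names(ddSplit), .combine=rbind) %dopar% {
    elem <- ddSplit[[e]]
    elem$y <- sqrt(elem$x)
    elem
}

stopWorkers(w)                      # shut the workers down when finished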
The real world test was to time our residual pool calculations for about 1200 channels (80-150 depths recorded in each) on the same machine using 7 'workers'. This had previously taken 32 hours and 2 minutes, judging by the timestamps of the intermediate files created during calculation. With doSMP the calculations took 7 minutes, and the results were identical. Nothing in the toy tests would have indicated we'd see these calculations sped up by a factor of 275 (32 hours 2 minutes is about 1,922 minutes, and 1,922 / 7 is roughly 275). Since 275 is much larger than 7, this is due to more than just making unused cores available, and I suspect internal compilation; a quick check of the docs does not support this conjecture, however. Does anyone have a better explanation?

Thanks for your input,
cur

ps - Thanks to Revolution for releasing this package. They occasionally get kicked for their closed-source add-on to R, but it's clear that their releases of packages like doSMP and foreach are important contributions to the community.

###### Toy test code follows: ######

# Toy SMP
memory.limit(3000)
require(doSMP)
require(reshape2)
getDoParWorkers()
w <- startWorkers(workerCount=3)
registerDoSMP(w)

# Time one (g, n) combination three times for each of the three looping
# methods, appending the timings to a CSV for later summarization.
timeSMP <- function(g, n)
# g = number of groups to process
# n = size of each group.
{
    for(rep in 1:3) {
        times <- NULL
        dd <- data.frame(k=rep(1:g, n), x=runif(g*n))
        ddSplit <- split(dd, dd$k)

        tt <- system.time({
            dd2 <- foreach(e=names(ddSplit), .combine=rbind) %dopar% {  # SMP
                elem <- ddSplit[[e]]
                for (i in 1:nrow(elem)) {
                    elem$y[i] <- sqrt(elem$x[i])
                }
                elem
            }
        })
        times <- rbind(times, as.data.frame(cbind(t(tt), g=g, n=n, method='SMPVectorized')))

        tt <- system.time({
            dd3 <- foreach(e=names(ddSplit), .combine=rbind) %do% {     # Single core
                elem <- ddSplit[[e]]
                for (i in 1:nrow(elem)) {
                    elem$y[i] <- sqrt(elem$x[i])
                }
                elem
            }
        })
        times <- rbind(times, as.data.frame(cbind(t(tt), g=g, n=n, method='1CoreVectorized')))

        dd4 <- NULL
        tt <- system.time({
            # loop through list elements
            for (e in names(ddSplit)) {
                elem <- ddSplit[[e]]
                for (i in 1:nrow(elem)) {
                    elem$y[i] <- sqrt(elem$x[i])
                }
                dd4 <- rbind(dd4, elem)
            }
        })
        times <- rbind(times, as.data.frame(cbind(t(tt), g=g, n=n, method='unvectorized')))

        write.table(times, file='c:/r/dosmpTest.csv', append=TRUE, row.names=FALSE, sep=',')
    } # end of repetition loop
}
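# Note on the CSV written above: write.table() is called with append=TRUE but
# without col.names=FALSE, so a header row gets appended on every call.  That
# is why summarizeTimes() below drops rows where user.self == 'user.self'
# before converting the columns back to numeric.  Each data row holds the
# transposed system.time() result plus the test parameters, roughly like this
# (values illustrative):
#   user.self, sys.self, elapsed, user.child, sys.child,  g, n, method
#   0.05,      0,        1.32,    NA,         NA,        10, 1, SMPVectorized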
summarizeTimes <- function(fname)
# Summarize timing results and display them.
{
    # read in results, format columns and make methods more 'variable-name friendly'.
    times <- read.csv(fname, stringsAsFactors=FALSE)
    times <- subset(times, user.self != 'user.self', select=-c(user.child, sys.child))
    times$user.self <- as.numeric(times$user.self)
    times$sys.self <- as.numeric(times$sys.self)
    times$elapsed <- as.numeric(times$elapsed)
    times$g <- as.numeric(times$g)
    times$n <- as.numeric(times$n)

    # Summarize: mean elapsed and mean user CPU time for each (g, n, method).
    stats <- merge(aggregate(list(meanElapsed=times$elapsed)
                            ,list(g=times$g, n=times$n, method=times$method)
                            ,mean, na.rm=TRUE
                            )
                  ,aggregate(list(meanSelf=times$user.self)
                            ,list(g=times$g, n=times$n, method=times$method)
                            ,mean, na.rm=TRUE
                            )
                  ,by=c('g','n','method')
                  )

    # transpose to wide
    mm <- melt(stats, id=c('g','n','method'))
    tstats <- dcast(mm, g + n ~ variable + method)
    tstats$speedup.elapsed1 <- tstats$meanElapsed_unvectorized / tstats$meanElapsed_1CoreVectorized
    tstats$speedup.elapsed3 <- tstats$meanElapsed_1CoreVectorized / tstats$meanElapsed_SMPVectorized

    speedupVectorizing <- dcast(tstats[c('g','n','speedup.elapsed1')], g~n, value_var='speedup.elapsed1')
    speedupSMP <- dcast(tstats[c('g','n','speedup.elapsed3')], g~n, value_var='speedup.elapsed3')

    return(list(vectoring=speedupVectorizing, smp=speedupSMP))
}

timeSMP(10,1)        # make it thrash as much as possible
timeSMP(100,1)
timeSMP(1000,1)
timeSMP(10000,1)
#timeSMP(100000,1)   # too much memory
#timeSMP(1000000,1)  # too much memory
timeSMP(10,10)
timeSMP(100,10)
timeSMP(1000,10)
timeSMP(10000,10)
timeSMP(10,100)
timeSMP(100,100)
timeSMP(1000,100)
timeSMP(10000,100)
timeSMP(10,1000)
timeSMP(100,1000)
timeSMP(1000,1000)
timeSMP(10000,1000)
timeSMP(10,10000)
timeSMP(100,10000)
# The following take up too much memory, even with a 3GB memory limit.
#timeSMP(1000,5000)
#timeSMP(5000,100)
#timeSMP(5000,1000)
#timeSMP(5000,5000)

summarizeTimes('c:/r/dosmpTest.csv')

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt@epa.gov
541/754-4638