Seeliger.Curt at epamail.epa.gov
2011-Apr-19 19:52 UTC
[R] doSMP package works better than perfect, at least sometimes.
Some might have noticed that REvolution Computing released the doSMP
package to the general public about a month and a half ago. The package
allows multiple cores to be used for parallel computation in R. Some of
our physical habitat calculations were taking an extraordinarily long
time to complete and required over-weekend runs, which prompted our
interest in the package. What follows are the results of our tests.
In brief, the toy test sped up the calculations by a plausible amount
that depended on the number of workers (cores? threads?) used. Timing our
real-world application gave results that were better than perfect; in
fact, they were staggeringly better than perfect, and maybe someone can
suggest why. I'd also like to thank REvolution for providing us with this
really great package.
These metrics are based on a for-loop construct that is difficult to
vectorize, so a toy test was developed (code given below) which loops
through simple sqrt() calculations in a way one might find in the third
circle of Burns' R Inferno. Short loops were used to cause thrashing
during worker assignment, and longer ones to simulate 'harder', more
time-consuming tasks. The processing time of each set of tasks was
measured for basic unvectorized for() looping, foreach() %do% looping,
and foreach() %dopar% looping, using a 4-core Xeon PC running XP with
3.2 GB RAM.
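
For anyone who hasn't tried the package yet, here is a minimal sketch of
the three constructs being compared (the full test code appears at the
end of this message; the variable names here are just illustrative):

    require(doSMP)      # startWorkers(), registerDoSMP(), stopWorkers()
    require(foreach)    # foreach(), %do%, %dopar%

    w <- startWorkers(workerCount=3)
    registerDoSMP(w)

    x <- runif(1000)

    # basic unvectorized for() looping
    y1 <- numeric(length(x))
    for (i in seq_along(x)) y1[i] <- sqrt(x[i])

    # foreach() %do%: the same iteration, run on a single core
    y2 <- foreach(xi=x, .combine=c) %do% sqrt(xi)

    # foreach() %dopar%: the same iteration, farmed out to the workers
    y3 <- foreach(xi=x, .combine=c) %dopar% sqrt(xi)

    stopWorkers(w)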
Using 3 'workers', iteration with the foreach() %do% construct showed the
expected amount of thrashing for small/easy calculations, with its
internal overhead being overcome after roughly 10,000 total calculations.
The speedup from SMP relative to single-processor iteration started to be
worthwhile with as few as 10 groups, regardless of the group size.
Speedup of foreach() %do% construct relative to basic for():

      n         g=1         g=10       g=100      g=1000
     10   0.6000000    0.6250000   0.900000    1.010499
    100   0.7230769    0.9333333   1.180000    1.231752
   1000   0.7968750    1.8987730   3.564801    2.078614
  10000   2.1724356   10.4700474   8.002192          NA
Speedup of foreach() %dopar% construct relative to foreach() %do% construct:

      n          g=1        g=10       g=100      g=1000
     10   0.09803922    1.142857    1.875000    2.164773
    100   0.94202899    1.363636    2.702703    2.689359
   1000   0.81012658    1.429825    2.951413    2.602386
  10000   0.87239919    1.182743    1.548661          NA
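
To be explicit about what these tables show: each cell is the mean
elapsed time of the slower construct divided by the mean elapsed time of
the faster one, averaged over three repetitions (see speedup.elapsed1 and
speedup.elapsed3 in summarizeTimes() below), i.e. schematically

    speedup <- mean(elapsed.baseline) / mean(elapsed.comparison)

So the 2.702703 entry at n=100, g=100 means the %dopar% version finished
in roughly 1/2.7 of the %do% version's elapsed time, and entries below 1
mean the construct under test was actually slower than its baseline.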
Using 7 'workers', the timings for the foreach() %do% construct were not
as close to the 3-worker results as expected (since %do% runs serially,
the two should be nearly identical), though thrashing was still evident
when the number of calculations was small. The speedup from using
multiple cores maxed out around 5.5x, below the theoretically perfect 7x,
and was not consistently high across conditions. Note that on a 4-core
machine, 7 workers could never deliver a full 7x anyway; getting past 4x
at all may reflect hyperthreading. I'm not sure whether the remaining
variability is system noise or some other constraint influencing the
results.
Speedup of foreach() %do% construct relative to basic for():

      n         g=1        g=10        g=100      g=1000
     10    0.400000   1.1111111    0.9210526    1.037190
    100    0.650000   0.8831169    1.1215881    1.199677
   1000    0.768116   1.7843360    3.5691298    2.051362
  10000    1.981686   8.8194254    8.2673038          NA
Speedup of foreach() %dopar% construct relative to foreach() %do% construct:

      n         g=1        g=10       g=100      g=1000
     10   0.8333333    1.285714    4.222222    3.751938
    100   0.9523810    1.452830    5.302632    5.516474
   1000   0.9409091    1.284257    3.123677    3.848393
  10000   0.8640463    1.073046    1.609020          NA
The real-world test was to time our residual pool calculations for about
1200 channels (80-150 depths recorded in each) on the same machine using
7 'workers'. This had previously taken 32 hours and 2 minutes, judging by
the timestamps of the intermediate files created during calculation. With
doSMP the calculations took 7 minutes, and the results were identical.
Nothing in the toy tests suggested we would see these calculations sped
up by a factor of 275 (32 hours 2 minutes is 1922 minutes, and 1922/7 is
roughly 275). Since 275 is much larger than 7, this is due to more than
just making unused cores available, and I suspect internal compilation,
but a quick check of the docs does not support that conjecture. Does
anyone have a better explanation?
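
One caveat on that comparison: the 32-hour baseline was read off file
timestamps rather than taken from a controlled timing, so some of that
elapsed time may not have been computation at all. A cleaner check would
time both paths directly, along these lines (the function names and
chans here are hypothetical stand-ins for our actual code and data):

    # Hypothetical sanity check of the serial baseline.
    tSerial <- system.time(resSerial <- calcResidualPools(chans))
    tSMP    <- system.time(resSMP    <- calcResidualPoolsSMP(chans))
    tSerial['elapsed'] / tSMP['elapsed']   # directly measured speedup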
Thanks for your input,
cur
ps - Thanks to REvolution for releasing this package. They occasionally
get kicked for their closed-source add-ons to R, but it's clear that
their releases of packages like doSMP and foreach are important
contributions to the community.
###### Toy test code follows:######
# Toy SMP
memory.limit(3000)   # Windows: allow R to use up to ~3 GB
require(doSMP)       # startWorkers(), registerDoSMP(), stopWorkers()
require(foreach)     # foreach(), %do%, %dopar%
require(reshape2)    # melt(), dcast() used in summarizeTimes()
w <- startWorkers(workerCount=3)   # 3 for the first run, 7 for the second
registerDoSMP(w)
getDoParWorkers()    # confirm how many workers are registered
timeSMP <- function(g, n)
# Time the three looping constructs on g groups of n values each,
# repeating each timing 3 times and appending the results to a csv.
# g = number of groups to process
# n = size of each group.
{
    for (r in 1:3) {        # three repetitions of each timing
        times <- NULL
        dd <- data.frame(k=rep(1:g, n), x=runif(g*n))
        ddSplit <- split(dd, dd$k)

        tt <- system.time({ # SMP: groups farmed out to the workers
            dd2 <- foreach(e=names(ddSplit), .combine=rbind) %dopar% {
                elem <- ddSplit[[e]]
                for (i in 1:nrow(elem)) {
                    elem$y[i] <- sqrt(elem$x[i])
                }
                elem
            }
        })
        times <- rbind(times,
            as.data.frame(cbind(t(tt), g=g, n=n, method='SMPVectorized')))

        tt <- system.time({ # single core: same iteration, run serially
            dd3 <- foreach(e=names(ddSplit), .combine=rbind) %do% {
                elem <- ddSplit[[e]]
                for (i in 1:nrow(elem)) {
                    elem$y[i] <- sqrt(elem$x[i])
                }
                elem
            }
        })
        times <- rbind(times,
            as.data.frame(cbind(t(tt), g=g, n=n, method='1CoreVectorized')))

        dd4 <- NULL
        tt <- system.time({ # basic for() loop through the list elements
            for (e in names(ddSplit)) {
                elem <- ddSplit[[e]]
                for (i in 1:nrow(elem)) {
                    elem$y[i] <- sqrt(elem$x[i])
                }
                dd4 <- rbind(dd4, elem)
            }
        })
        times <- rbind(times,
            as.data.frame(cbind(t(tt), g=g, n=n, method='unvectorized')))

        # append=TRUE writes a header row on every call; summarizeTimes()
        # filters those repeated headers out when reading the file back.
        write.table(times, file='c:/r/dosmpTest.csv', append=TRUE,
                    row.names=FALSE, sep=',')
    } # end of repetition loop
}
summarizeTimes <- function(fname)
# Summarize timing results and display them.
{
    # Read in results and format columns.  The csv contains a header row
    # for each append, so drop the repeated header rows before converting
    # the columns to numeric.
    times <- read.csv(fname, stringsAsFactors=FALSE)
    times <- subset(times, user.self != 'user.self',
                    select=-c(user.child, sys.child))
    times$user.self <- as.numeric(times$user.self)
    times$sys.self <- as.numeric(times$sys.self)
    times$elapsed <- as.numeric(times$elapsed)
    times$g <- as.numeric(times$g)
    times$n <- as.numeric(times$n)

    # Summarize mean elapsed and mean user times for each (g, n, method)
    stats <- merge(aggregate(list(meanElapsed=times$elapsed)
                            ,list(g=times$g, n=times$n, method=times$method)
                            ,mean, na.rm=TRUE
                            )
                  ,aggregate(list(meanSelf=times$user.self)
                            ,list(g=times$g, n=times$n, method=times$method)
                            ,mean, na.rm=TRUE
                            )
                  ,by=c('g','n','method')
                  )

    # Transpose to wide and compute the speedup ratios
    mm <- melt(stats, id.vars=c('g','n','method'))
    tstats <- dcast(mm, g + n ~ variable + method)
    tstats$speedup.elapsed1 <- tstats$meanElapsed_unvectorized /
                               tstats$meanElapsed_1CoreVectorized
    tstats$speedup.elapsed3 <- tstats$meanElapsed_1CoreVectorized /
                               tstats$meanElapsed_SMPVectorized

    speedupVectorizing <- dcast(tstats[c('g','n','speedup.elapsed1')], g~n,
                                value.var='speedup.elapsed1')
    speedupSMP <- dcast(tstats[c('g','n','speedup.elapsed3')], g~n,
                        value.var='speedup.elapsed3')
    return(list(vectorizing=speedupVectorizing, smp=speedupSMP))
}
timeSMP(10,1) # make it thrash as much as possible
timeSMP(100,1)
timeSMP(1000,1)
timeSMP(10000,1)
#timeSMP(100000,1) # too much memory
#timeSMP(1000000,1) # too much memory
timeSMP(10,10)
timeSMP(100,10)
timeSMP(1000,10)
timeSMP(10000,10)
timeSMP(10,100)
timeSMP(100,100)
timeSMP(1000,100)
timeSMP(10000,100)
timeSMP(10,1000)
timeSMP(100,1000)
timeSMP(1000,1000)
timeSMP(10000,1000)
timeSMP(10,10000)
timeSMP(100,10000)
# The following take up too much memory, even with a 3GB memory limit.
#timeSMP(1000,5000)
#timeSMP(5000,100)
#timeSMP(5000,1000)
#timeSMP(5000,5000)
summarizeTimes('c:/r/dosmpTest.csv')
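# Shut the workers down once the timings are done.
stopWorkers(w)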
--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt@epa.gov
541/754-4638