On 04/09/2010 08:52 AM, Peter Danenberg wrote:
> In principle, I'd like to be able to do something like this:
>
> sge.parLapply(seq(10), function(x) parLapply(seq(x), function(x) x^2))
I'm not sure that's such a good principle! It seems like it would be
hard to reason about which tasks are being executed, how many
processes there are, how load balancing works, etc. What about starting
with some complicated data structure that requires processing:
work <- lapply(seq(10), function(x) as.list(seq(x)))
then making a flat list of tasks that need to be done:
idx0 <- rep(seq_along(work), sapply(work, length))
idx1 <- unlist(lapply(work, seq_along))
tasks <- mapply(c, idx0, idx1, SIMPLIFY=FALSE)
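For a small input it is easy to see what these index vectors contain. A minimal sketch, assuming `work` is cut down to three outer elements (instead of seq(10)) so the output stays readable:

```r
# Small input (assumption): element i is list(1, ..., i)
work <- lapply(seq(3), function(x) as.list(seq(x)))

# Outer index, repeated once per inner element: 1 2 2 3 3 3
idx0 <- rep(seq_along(work), sapply(work, length))
# Position within each outer element: 1 1 2 1 2 3
idx1 <- unlist(lapply(work, seq_along))
# One task per leaf, each a c(outer, inner) index pair
tasks <- mapply(c, idx0, idx1, SIMPLIFY = FALSE)

print(idx0)
print(idx1)
str(tasks)
```

Every leaf of the nested structure becomes exactly one task, so the number of tasks is sum(sapply(work, length)).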
and then doing the actual work in an easily parallelizable lapply:
answers <- lapply(tasks, function(t, w) w[[ t ]]^2, work)
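Because each task is independent, the plain lapply can be swapped for any parallel lapply. The sketch below substitutes mclapply from R's parallel package (which ships with R from 2.14.0, i.e. after this thread, and is not Rsge); sge.parLapply would play the same role. The choice of two cores is an assumption, and since mclapply relies on forking it falls back to one core on Windows.

```r
library(parallel)

# Nested work structure and flat task list, built as above
work <- lapply(seq(10), function(x) as.list(seq(x)))
idx0 <- rep(seq_along(work), sapply(work, length))
idx1 <- unlist(lapply(work, seq_along))
tasks <- mapply(c, idx0, idx1, SIMPLIFY = FALSE)

# Each task needs only its index pair plus read access to 'work'
f <- function(t, w) w[[ t ]]^2

# Forking is unavailable on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

answers_seq <- lapply(tasks, f, work)
answers_par <- mclapply(tasks, f, work, mc.cores = cores)
```

Setting mc.preschedule = FALSE makes mclapply fork one job per task, which behaves more like dynamic load balancing at the cost of more forking overhead.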
(The idea here is that work[[ c(3, 2) ]] selects the third element of
the outer list, and then the second element of that element.) You could
transform the result back into the original form with:
result <- work
for (t in seq_along(tasks))
    result[[ tasks[[t]] ]] <- answers[[t]]
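The flatten and rebuild steps can be packaged as a pair of helpers. A minimal sketch, assuming exactly two levels of nesting as in `work`; the names flatten_tasks and unflatten are hypothetical, not part of Rsge or base R:

```r
# Hypothetical helper: list of lists -> flat list of c(outer, inner) pairs
flatten_tasks <- function(work) {
  idx0 <- rep(seq_along(work), sapply(work, length))
  idx1 <- unlist(lapply(work, seq_along))
  mapply(c, idx0, idx1, SIMPLIFY = FALSE)
}

# Hypothetical helper: write flat answers back into the nested shape
unflatten <- function(work, tasks, answers) {
  result <- work
  for (i in seq_along(tasks))
    result[[ tasks[[i]] ]] <- answers[[i]]
  result
}

work <- lapply(seq(10), function(x) as.list(seq(x)))
tasks <- flatten_tasks(work)
answers <- lapply(tasks, function(t, w) w[[ t ]]^2, work)
result <- unflatten(work, tasks, answers)

# Recursive indexing: work[[ c(3, 2) ]] is the same as work[[3]][[2]]
stopifnot(identical(work[[ c(3, 2) ]], work[[3]][[2]]))
```

One caveat with the [[<- assignment: if a task's answer is NULL, assigning it removes that element instead of storing NULL, so the rebuilt structure would come out shorter.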
Martin
>
> In practice, however, I have to resort to acrobatics like this:
>
> sge.options(sge.remove.files=FALSE)
> sge.options(sge.qsub.options='-cwd -V')
> sge.parLapply(seq(10),
>               function(x) {
>                 sge.options(sge.save.global=TRUE)
>                 sge.options(sge.remove.files=FALSE)
>                 sge.parLapply(seq(x),
>                               function(x) x^2,
>                               cluster=TRUE,
>                               debug=FALSE,
>                               trace=FALSE,
>                               file.prefix='Rsge_data',
>                               global.savelist=NULL,
>                               packages=NULL)
>               },
>               function.savelist=c('sge.parLapply', 'sge.parParApply',
>                                   'sge.options', 'sge.taskPrep'),
>               global.savelist=c('sge.parParApply', 'sge.globalPrep',
>                                 'global.savelist', 'sge.taskPrep',
>                                 'sge.checkNotNow',
>                                 'sge.get.jobid', 'sge.get.result',
>                                 'docall', 'enquote'),
>               packages=NULL)
>
> and I still get bizarre behavior: half of the results will be NULL,
> for instance; the other half, incomplete.
>
> Would non-trivial changes to Rsge be required to make something like
> this possible?
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793