Hello,

I am running large simulations, which unfortunately I can't really replicate here because the code is so extensive. I rely heavily on mclapply, but I realize that I'm losing data somewhere.

There are two worrisome symptoms:

1) I am getting 'NULL' as a return value for some (but not all) elements of the output when I use mclapply, but not if I use lapply

> tmp2[1:3] #output from lapply
[[1]]
10000076 10000077
      24       24

[[2]]
10000076 10000077
     119      119

[[3]]
10000076
      71

> tmp[1:3] #output from mclapply
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

2) I am not getting back a list the same length as my input vector I'm parallelizing over. i.e. a command like this:

tmp<-mclapply(x, FUN=myfunc, mc.cores=16)

gives me back a list tmp which is not the same length as x (and so I'm getting all kinds of errors)

This is extremely discouraging, because I've been using mclapply extensively at very many points on simulations that take a very long time to run, and now I'm wondering if what I'm getting is trustworthy. I don't think I could reasonably finish my results without mclapply, but I am thinking to cut it out except where it was absolutely necessary, time-wise. If anyone had any suggestions as to why this might be happening and how I can circumvent it (or test for it happening), I would greatly appreciate it.

Thanks,
Elizabeth Purdom

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] multicore_0.1-4       msm_1.0               gtools_2.6.2          graph_1.28.0          Rsamtools_1.2.3
[6] Biostrings_2.18.2     GenomicFeatures_1.2.3 GenomicRanges_1.2.3   IRanges_1.8.9

loaded via a namespace (and not attached):
[1] Biobase_2.10.0     biomaRt_2.6.0      BSgenome_1.18.3    DBI_0.2-5          mvtnorm_0.9-96     RCurl_1.5-0
[7] RSQLite_0.9-4      rtracklayer_1.10.6 splines_2.12.1     survival_2.36-2    tools_2.12.1       XML_3.2-0
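
One way to test for the two symptoms described above is to compare each result against its input right after the call. The following is only a sketch under stated assumptions: it uses the `parallel` package that ships with later R releases (on R 2.12.1 the same function came from `multicore`), and `check_mclapply`, the toy `x` and the toy `myfunc` are hypothetical stand-ins, not the poster's code. It also assumes that a legitimate result is never NULL.

```r
library(parallel)  # assumption: later R; on R 2.12.x mclapply came from 'multicore'

# Hypothetical stand-ins for the poster's 'x' and 'myfunc'.
x <- 1:20
myfunc <- function(i) i^2

# Report whether the result has the same length as the input, and which
# elements are NULL or carry a "try-error" (failed job).
check_mclapply <- function(res, x) {
  null_idx <- which(vapply(res, is.null, logical(1)))
  err_idx  <- which(vapply(res, function(r) inherits(r, "try-error"), logical(1)))
  list(length_ok = length(res) == length(x),
       null_idx  = null_idx,
       err_idx   = err_idx)
}

# Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

tmp <- mclapply(x, myfunc, mc.cores = cores)
chk <- check_mclapply(tmp, x)

if (!chk$length_ok || length(chk$null_idx) > 0 || length(chk$err_idx) > 0) {
  warning("mclapply result looks incomplete; inspect chk$null_idx and chk$err_idx")
}
```

Running a check like this after every mclapply call would at least show whether earlier results are affected, without having to give up the parallel runs.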
Hello,

I forgot to mention that I am looping over ~70K objects. If I run mclapply on the first 200, it's fine (i.e. it doesn't give NULL values); if I go up to 2K (or over all of them), then I start to see NULL values.

Also, the function I call uses the commands 'restrict', 'gaps' and 'width' from the Bioconductor package IRanges. I don't know what is under the hood of those functions in terms of what calls they make, but could that be a source of the problem? (I saw an earlier post regarding errors when a function used Java code, but I'm not getting an error like they did.)

Thanks,
Elizabeth

On 3/22/11 1:13 AM, Elizabeth Purdom wrote:
> I rely heavily on mclapply, but I realize that I'm losing data somewhere.
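
Since small inputs work and larger ones start returning NULL, one possible workaround is to feed mclapply modest chunks and redo any suspicious elements serially with lapply. This is a sketch only, not a diagnosis of the underlying cause: `chunked_mclapply` is a hypothetical helper, the toy `x` and `myfunc` are stand-ins, the `parallel` package is assumed (on R 2.12.1 it would be `multicore`), and a NULL or "try-error" element is assumed to mean the job failed.

```r
library(parallel)  # assumption: later R; on R 2.12.x mclapply came from 'multicore'

# Hypothetical stand-ins for the poster's 'x' and 'myfunc'.
x <- 1:50
myfunc <- function(i) i * 2

# Run mclapply chunk by chunk; recompute serially whatever looks lost.
chunked_mclapply <- function(x, fun, chunk_size = 200, cores = 2L) {
  idx    <- seq_along(x)
  chunks <- split(idx, ceiling(idx / chunk_size))
  out    <- vector("list", length(x))
  for (ch in chunks) {
    res <- mclapply(x[ch], fun, mc.cores = cores)
    if (length(res) != length(ch)) {
      # Wrong length: do not trust anything from this chunk.
      res <- lapply(x[ch], fun)
    } else {
      failed <- vapply(res,
                       function(r) is.null(r) || inherits(r, "try-error"),
                       logical(1))
      # If 'fun' can legitimately return NULL this reruns those elements
      # needlessly, which costs time but does not change the result.
      if (any(failed)) res[failed] <- lapply(x[ch][failed], fun)
    }
    out[ch] <- res
  }
  out
}

# Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- chunked_mclapply(x, myfunc, chunk_size = 20, cores = cores)
```

The chunk size of 200 in the default is taken from the size that was reported to work; whether it is a sensible value for other runs is untested.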
G'day Elizabeth,

For what it's worth, this is what I'd do were I in a position like yours:

I would put a condition near the end of myfunc that responds whenever there is an indication that a NULL is about to be returned into your main list. I'd make an additional list from those cases, which would also collect enough information to work out which values of x lead to that result. Then you'll be able to see which ones give the problem.

Try running mclapply on only those values and see if they all respond the same way. If they do not, something very strange is happening. But if they still behave the same way, then run with only a single value of x in your call to mclapply.

I find the browser() function to be almost indispensable when working out what's causing such problems, but to my knowledge it won't work when multiple cores are running in parallel. If you use a single value of x, you can go back to using that trusted method. You might also have to set mc.cores to 1, but I don't think so.

HTH

On Tue, 22-Mar-2011 at 01:13AM -0700, Elizabeth Purdom wrote:
> I rely heavily on mclapply, but I realize that I'm losing data somewhere.

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}                   Great minds discuss ideas
 _( Y )_                 Average minds discuss events
(:_~*~_:)                  Small minds discuss people
 (_)-(_)                              ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
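
The debugging approach suggested in this reply could look roughly like the sketch below: wrap the worker so that every result carries its input index and a flag for NULL or error, collect the suspect indices, then rerun only those serially, where browser() is usable. All names here (`traced`, `suspect`, the toy `myfunc` that returns NULL for multiples of 7) are hypothetical illustrations, and the `parallel` package is assumed in place of `multicore`.

```r
library(parallel)  # assumption: later R; on R 2.12.x mclapply came from 'multicore'

# Hypothetical worker that returns NULL for some inputs.
myfunc <- function(v) if (v %% 7 == 0) NULL else v + 1
x <- 1:30

# Wrapper: keep the input index alongside the value, and flag NULLs and
# errors instead of letting them pass silently into the main list.
traced <- function(i, data, f) {
  value <- tryCatch(f(data[[i]]), error = function(e) e)
  list(index    = i,
       value    = value,
       is_null  = is.null(value),
       is_error = inherits(value, "error"))
}

# Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(seq_along(x), traced, data = x, f = myfunc, mc.cores = cores)

# An element that is itself NULL means the wrapper's result never arrived.
suspect <- which(vapply(res,
                        function(r) is.null(r) || r$is_null || r$is_error,
                        logical(1)))

# Rerun only the suspect inputs serially; browser() or debug(myfunc)
# can be used at this point.
rerun <- lapply(x[suspect], myfunc)
```

If the serial rerun returns proper values for inputs that came back NULL in parallel, the worker itself is probably not at fault and the loss is happening in the parallel layer.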