Well, I'll be: that one does take 1 second. What I'm really trying to do is
process a bunch of (~300) relatively large CSV files simultaneously.
Something like this:
testFunc <- function(fileName) {
  print(fileName)
  # process_raw_data just reads in the CSV file and does some parsing/column
  # renaming, nothing magical
  df <- process_raw_data(fileName)
  return(df)
}
processed_data <- mclapply(list_of_files, testFunc, mc.cores=24)
I would expect to see 24 filenames printed when this starts, but I'm only
seeing 2, and htop is only showing 1 core as being used. At least I now
know that mclapply is doing something from the example above, but what
could be blocking?
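
One way to check what mclapply is actually doing is to record which process handled each task. What follows is a minimal sketch, not something taken from this thread: `probe`, `n_workers` and the task count are made-up names and values, and the `mcaffinity()` call assumes a reasonably recent R (it may return NULL on platforms without affinity support).

```r
library(parallel)

# Hypothetical diagnostic helper: note which process ran each task
# and when the task started.
probe <- function(i) {
  start <- Sys.time()
  Sys.sleep(0.2)  # stand-in for real work
  list(task = i, pid = Sys.getpid(), start = start)
}

n_workers <- 4  # illustrative value; use the mc.cores under test
res <- mclapply(1:8, probe, mc.cores = n_workers)

# Count the distinct child processes that did the work.
pids <- vapply(res, function(r) r$pid, integer(1))
cat("distinct worker PIDs:", length(unique(pids)), "\n")
print(table(pids))

# CPUs this R process is allowed to run on (NULL if unsupported).
print(mcaffinity())
```

If several distinct PIDs show up but htop still shows a single busy core, the affinity mask printed at the end may be worth a look: a mask listing only one CPU would confine every forked child to that core.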
ulimit -n is 1024 FYI
thanks for the help!
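
For the file-processing case, a self-contained sketch along the following lines might help separate "not forking" from "forking but failing". Everything in it is an assumption for illustration: the temporary CSV files, the stand-in body of `process_raw_data`, and `mc.cores = 3`. It uses `mc.preschedule = FALSE`, which forks one job per file instead of splitting the file list up front, and it checks the result for "try-error" entries, which is how mclapply reports a failed task.

```r
library(parallel)

# Create a few small CSV files so the sketch runs on its own.
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
for (i in 1:6) {
  write.csv(data.frame(id = 1:5, value = i * (1:5)),
            file.path(tmp_dir, sprintf("file%02d.csv", i)),
            row.names = FALSE)
}
list_of_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical stand-in for the real process_raw_data():
# read the file and rename one column.
process_raw_data <- function(fileName) {
  df <- read.csv(fileName)
  names(df)[names(df) == "value"] <- "measurement"
  df
}

testFunc <- function(fileName) {
  # message() writes to stderr; the PID shows which child ran the task.
  message(Sys.getpid(), ": ", fileName)
  process_raw_data(fileName)
}

processed_data <- mclapply(list_of_files, testFunc,
                           mc.cores = 3, mc.preschedule = FALSE)

# Failed tasks come back as objects of class "try-error".
failed <- vapply(processed_data, inherits, logical(1), what = "try-error")
cat("files read:", sum(!failed), "of", length(list_of_files), "\n")
```

With prescheduling switched off, at most `mc.cores` children run at once and a new one is forked as each finishes, which tends to suit files of uneven size.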
On Wed, Aug 28, 2013 at 8:00 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
> On 08/28/2013 06:29 AM, joe meiring wrote:
>
>> So what could possibly be causing this? Has anyone encountered this? This
>> is RedHat Enterprise 5.7.
>>
>
> Prof Ripley indicated that there's overhead to parallelization, and
> perhaps the benefit doesn't outweigh the overhead. Try my favorite example
>
> mclapply(1:24, function(i) Sys.sleep(1), mc.cores=24)
>
> which should take one second, instead of 24. At least this would rule out
> the idea that overhead outweighs computational gain.
>
> Martin
>
>
>>
>> On Tue, Aug 27, 2013 at 11:14 PM, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:
>>
>>> On 28/08/2013 06:54, joe meiring wrote:
>>>
>>>> This does speed up on an OS X install, so something must be wacky with
>>>> the Linux install. Any ideas as to what would cause this?
>>>>
>>>>
>>> It does on a Linux platform ('install' has nothing to do with this):
>>>
>>> > system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
>>>    user  system elapsed
>>>   4.095   0.036   4.140
>>>
>>> > system.time(x <- mclapply(test,function(x) loess.smooth(x,x),
>>> +                           mc.cores=24))
>>>    user  system elapsed
>>>   8.125   0.639   0.563
>>>
>>> What is odd is that no CPU time is being recorded in the original
>>> posting.
>>>
>>> That is about what I would expect: there is an overhead in forking 24
>>> processes and this example is too small to be realistic.
>>>
>>>
>>> On Tuesday, August 27, 2013 4:19:31 PM UTC-7, joe meiring wrote:
>>>>
>>>>
>>>>> I can't seem to get mclapply to use more than a single core. I have a
>>>>> 64 core server running Linux.
>>>>>
>>>>> For example:
>>>>>
>>>>> library(parallel)
>>>>>
>>>>> test <- lapply(1:100,function(x) rnorm(10000))
>>>>> system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
>>>>> system.time(x <- mclapply(test,function(x) loess.smooth(x,x),
>>>>>                           mc.cores=32))
>>>>>
>>>>> gives me:
>>>>>
>>>>> user system elapsed
>>>>> 0.000 0.000 7.441
>>>>> user system elapsed
>>>>> 0.000 0.000 8.868
>>>>>
>>>>> i.e. mclapply is taking longer than lapply(). What is going wrong here?
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>> --
>>> Brian D. Ripley, ripley@stats.ox.ac.uk
>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>>
>>> University of Oxford, Tel: +44 1865 272861 (self)
>>> 1 South Parks Road, +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>