thr3ads.net - R help - [R] multicore by(), like mclapply? [Oct 2011]

ivo welch

2011-Oct-10 17:41 UTC

[R] multicore by(), like mclapply?

dear r experts---Is there a multicore equivalent of by(), just like
mclapply() is the multicore equivalent of lapply()?

if not, is there a fast way to convert a data.table into a list based
on a column that lapply and mclapply can consume?

advice appreciated...as always.

regards,

/iaw
----
Ivo Welch (ivo.welch at gmail.com)

Joshua Wiley

2011-Oct-10 18:07 UTC

[R] multicore by(), like mclapply?

Hi Ivo,

My suggestion would be to only pass lapply (or mclapply) the indices.
That should be fast, subsetting with data table should also be fast,
and then you do whatever computations you will.  For example:

require(data.table)
DT <- data.table(x=rep(c("a","b","c"),each=3),
y=c(1,3,6), v=1:9)
setkey(DT, x)

lapply(as.character(unique(DT[,x])), function(i) DT[i])

The DT[i] object is the subset of the data table for key i.  You can
pass it to whatever function does the computations you need.
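
To tie this back to the multicore question, the same key-based loop could be handed to mclapply(). This is only a sketch under assumptions: it uses the parallel package bundled with current R (when this thread was written, mclapply() came from the multicore package), summarise_group is a made-up placeholder for the real per-group work, and the core count of 2 is arbitrary.

```r
library(data.table)
library(parallel)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = rep(c(1, 3, 6), times = 3), v = 1:9)
setkey(DT, x)

# Hypothetical per-group computation; replace with the real work.
summarise_group <- function(d) sum(d$v)

# mclapply() relies on forking, which Windows lacks, so run serially there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

keys <- unique(DT$x)
# DT[.(k)] is a keyed lookup of the rows whose x equals k.
res <- mclapply(keys, function(k) summarise_group(DT[.(k)]),
                mc.cores = n_cores)
names(res) <- keys
```

Each worker does one keyed lookup per group, so the cost of the approach depends on how fast that lookup is relative to the per-group computation.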

Hope this helps,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

Matthew Dowle

2011-Oct-10 18:14 UTC

[R] multicore by(), like mclapply?

Package plyr has .parallel.

Searching datatable-help for "multicore", say on Nabble here,

http://r.789695.n4.nabble.com/datatable-help-f2315188.html

yields three relevant posts and examples.

Please check wiki do's and don'ts to make sure you didn't
fall into one of those traps, though (we don't know data or task so
just guessing) :

http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table

HTH
Matthew


ivo welch

2011-Oct-10 18:54 UTC

[R] multicore by(), like mclapply?

hi josh---thx.  I had a different version of this, and discarded it
because I think it was very slow.  the reason is that on each
application, your version has to scan my (very long) data vector.  (I
have many thousand different cases, too.)  I presume that by() has one
scan through the vector that makes all splits.

regards,

/iaw
----
Ivo Welch (ivo.welch at gmail.com)





Thomas Lumley

2011-Oct-10 19:19 UTC

[R] multicore by(), like mclapply?

On Tue, Oct 11, 2011 at 7:54 AM, ivo welch <ivo.welch at gmail.com> wrote:
> hi josh---thx.  I had a different version of this, and discarded it
> because I think it was very slow.  the reason is that on each
> application, your version has to scan my (very long) data vector.  (I
> have many thousand different cases, too.)  I presume that by() has one
> scan through the vector that makes all splits.

by.data.frame() is basically a wrapper for tapply(), and the key line
in tapply() is
   ans <- lapply(split(X, group), FUN, ...)
which should be easy to adapt for mclapply.
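
One way that adaptation might look, as a rough sketch rather than a drop-in replacement for by(): the function name mcby, its arguments, and the default of 2 cores are all made up for illustration, and mclapply() is taken from the parallel package bundled with current R (it was in the multicore package when this was posted).

```r
library(parallel)

# Hypothetical helper: by()-like grouping over a data frame, with the
# per-group calls dispatched through mclapply() instead of lapply().
mcby <- function(data, INDICES, FUN, ..., mc.cores = 2L) {
  # Forking is unavailable on Windows, so run serially there.
  if (.Platform$OS.type == "windows") mc.cores <- 1L
  # split() builds all groups at once from the grouping vector.
  pieces <- split(data, INDICES, drop = TRUE)
  mclapply(pieces, FUN, ..., mc.cores = mc.cores)
}

df <- data.frame(g = rep(c("a", "b", "c"), each = 3), v = 1:9)
out <- mcby(df, df$g, function(d) mean(d$v))
```

Unlike by(), this returns a plain named list, one element per group, without the "by" class or its print method.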

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

Joshua Wiley

2011-Oct-10 20:14 UTC

[R] multicore by(), like mclapply?

I could be way off base here, but my concern about presplitting the data is
that you will have your data plus a second copy of your data, something
like a list where each element contains the portion of the data for that split.
Good speed-wise, bad memory-wise.  My hope with the technique I showed (again, I
may not have accomplished it) was to have, at any one time, only the original data
and a copy of the particular elements being worked with.  Of course, this is not
an issue if you have plenty of memory.
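
A possible middle ground between the two concerns, sketched here with made-up data and an arbitrary core count, is to split the row numbers rather than the rows themselves. The grouping column is still scanned once, but the list that results holds only integer indices, and each subset is built only when its group is processed. mclapply() is assumed to come from the parallel package bundled with current R.

```r
library(parallel)

df <- data.frame(g = rep(c("a", "b", "c"), each = 3), v = 1:9)

# One pass over the grouping column; only row numbers are stored per group.
idx <- split(seq_len(nrow(df)), df$g)

# Forking is unavailable on Windows, so run serially there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each call extracts just its own rows before computing on them.
res <- mclapply(idx, function(rows) sum(df$v[rows]), mc.cores = n_cores)
```

How much memory this actually saves depends on the size of each group and on how many workers run at once, so it is worth measuring on the real data.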

