On Fri, 14-Sep-2012 at 02:03PM -0400, Earl Brown wrote:
|> Hello R-helpers.
|>
|> I've tried to recreate a parallel version of tapply() and table()
|> using a combination of the parallel functions mclapply() and pvec()
|> and papply(), but haven't been successful. In the end, I'm trying
|> to get a cross tab of two vectors. I currently (can) use
|> tapply(..., sum) and table(), and even xtabs() and ftable(), but
|> with tens of millions of words and tens of thousands of files to
|> loop over, it takes a long time, like days.
|> Does anyone know of a parallel version of tapply(), table(),
|> xtabs(), or ftable()? Or has anyone created something that
|> approximates a parallel version of one of these functions?
Not sure I have much of an idea of what your cross tab would look
like, but from what I can ascertain, I think you could do something
along these lines:
1. Partition your data into as many chunks as you have processors
available.
2. Specify your tapply function as the function that mclapply
"apply"s to each tranche of the data.
3. Apply regular lapply (using one processor) to the list that
results from step 2 to get all the bits back together again and do
whatever summation is appropriate.
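A minimal sketch of those three steps, untested on anything like your
data. The vectors words and files are made-up stand-ins for your two
vectors, and n_cores is a placeholder for however many cores you have.
I've used Reduce() rather than lapply() for the last step, simply
because adding a list of tables together is a one-liner that way.
Note that mclapply() relies on forking, so mc.cores > 1 won't work on
Windows.

```r
library(parallel)

# Hypothetical toy data: one word token and one file id per observation.
set.seed(1)
words <- sample(c("casa", "perro", "gato"), 1000, replace = TRUE)
files <- sample(paste0("file", 1:4), 1000, replace = TRUE)

# Convert to factors first so that every chunk produces a table with
# the same rows and columns, even if a level is absent from that chunk.
words <- factor(words)
files <- factor(files)

# 1. Partition the observation indices into one chunk per core.
n_cores <- 2L  # assumption: adjust to your machine
chunks <- splitIndices(length(words), n_cores)

# 2. Cross-tabulate each chunk in parallel.
partial <- mclapply(chunks,
                    function(idx) table(words[idx], files[idx]),
                    mc.cores = n_cores)

# 3. Add the per-chunk tables together on a single processor.
total <- Reduce(`+`, partial)
```

The same pattern should carry over to tapply(..., sum), provided the
grouping variables are factors with all levels fixed beforehand and
empty cells (NA) are set to zero before the partial results are added.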
HTH
|> Thank you for your time and help. Earl Brown
|>
|> -----
|> Earl K. Brown, PhD
|> Assistant Professor of Spanish Linguistics
|> Department of Modern Languages
|> Kansas State University
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.