Hi folks,
I've attached a patch against svn trunk that improves the performance
of the serialize/unserialize interface for vector types. The current
implementation a) invokes the R_XDREncode operation separately for each
element of the vector, and b) uses a switch statement to determine the
stream type for each element. I've added R_XDREncodeVector/R_XDRDecodeVector
functions that process N elements at a time, and I've reorganized the
implementation so that the stream type is queried once per vector rather
than once per element.
In the microbenchmark below, I've observed a speedup of about 2.4x. In a
real-world benchmark that uses the serialization interface to make MPI
calls, I see about a 10% performance improvement.
Cheers,
--Michael
microbenchmark:
input <- matrix(1:100000000, 10000, 10000)
output <- serialize(input, NULL)
for(i in 1:10) { print(system.time(serialize(input, NULL))) }
for(i in 1:10) { print(system.time(unserialize(output))) }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: serialize-vector-performance.patch
Type: text/x-patch
Size: 10234 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20110928/47381102/attachment.bin>
Any thoughts? I haven't heard any feedback on this patch. Thanks!
--Michael

On Wed, Sep 28, 2011 at 3:10 PM, Michael Spiegel <michael.m.spiegel at gmail.com> wrote:
> [quoted original message snipped]