Hi folks,

I've attached a patch to the svn trunk that improves the performance of the serialize/unserialize interface for vector types. The current implementation: a) invokes the R_XDREncode operation for each element of the vector type, and b) uses a switch statement to determine the stream type for each element of the vector type. I've added R_XDREncodeVector/R_XDRDecodeVector functions that accept N elements at a time, and I've reorganized the implementation so that the stream type is not queried once per element.

In the following microbenchmark (below), I've observed performance improvements of about x2.4. In a real benchmark that is using the serialization interface to make MPI calls, I see about a 10% improvement in performance.

Cheers,
--Michael

microbenchmark:

input <- matrix(1:100000000, 10000, 10000)
output <- serialize(input, NULL)
for(i in 1:10) { print(system.time(serialize(input, NULL))) }
for(i in 1:10) { print(system.time(unserialize(output))) }

-------------- next part --------------
A non-text attachment was scrubbed...
Name: serialize-vector-performance.patch
Type: text/x-patch
Size: 10234 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20110928/47381102/attachment.bin>
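
For readers without the attachment, the idea described above can be sketched roughly as follows. This is not the code from the patch and does not use R's internal API: the names stream_format, encode_per_element and encode_vector are hypothetical stand-ins, and only 32-bit integers are handled. It contrasts a loop that re-checks the stream format for every element with a version that checks it once and then encodes the whole vector, writing integers in big-endian order as XDR does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stream formats, standing in for the stream types
 * the serializer has to distinguish. */
typedef enum { FMT_NATIVE, FMT_BIGENDIAN } stream_format;

/* Write one 32-bit integer in big-endian byte order (the order XDR uses). */
static void put_be32(unsigned char *out, int32_t v)
{
    uint32_t u = (uint32_t) v;
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* Encode a single element; the format switch runs on every call. */
static void encode_one(stream_format fmt, unsigned char *out, int32_t v)
{
    switch (fmt) {
    case FMT_NATIVE:    memcpy(out, &v, sizeof v); break;
    case FMT_BIGENDIAN: put_be32(out, v);          break;
    }
}

/* Per-element style: one call and one switch per element. */
void encode_per_element(stream_format fmt, unsigned char *out,
                        const int32_t *x, size_t n)
{
    assert(out != NULL && x != NULL);
    for (size_t i = 0; i < n; i++)
        encode_one(fmt, out + 4 * i, x[i]);
}

/* Vector style: the switch runs once, then all n elements are encoded. */
void encode_vector(stream_format fmt, unsigned char *out,
                   const int32_t *x, size_t n)
{
    assert(out != NULL && x != NULL);
    switch (fmt) {
    case FMT_NATIVE:
        memcpy(out, x, n * sizeof *x);   /* one bulk copy */
        break;
    case FMT_BIGENDIAN:
        for (size_t i = 0; i < n; i++)   /* tight loop, no per-element dispatch */
            put_be32(out + 4 * i, x[i]);
        break;
    }
}
```

Both functions produce the same bytes; the second only removes the repeated dispatch, which is the kind of overhead the patch is aimed at. How much that saves in practice depends on the compiler and the stream type, so the figures quoted above are the ones to go by.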
Any thoughts? I haven't heard any feedback on this patch.

Thanks!
--Michael

On Wed, Sep 28, 2011 at 3:10 PM, Michael Spiegel <michael.m.spiegel at gmail.com> wrote:
> Hi folks,
>
> I've attached a patch to the svn trunk that improves the performance
> of the serialize/unserialize interface for vector types. The current
> implementation: a) invokes the R_XDREncode operation for each element
> of the vector type, and b) uses a switch statement to determine the
> stream type for each element of the vector type. I've added
> R_XDREncodeVector/R_XDRDecodeVector functions that accept N elements
> at a time, and I've reorganized the implementation so that the stream
> type is not queried once per element.
>
> In the following microbenchmark (below), I've observed performance
> improvements of about x2.4. In a real benchmark that is using the
> serialization interface to make MPI calls, I see about a 10%
> improvement in performance.
>
> Cheers,
> --Michael
>
> microbenchmark:
>
> input <- matrix(1:100000000, 10000, 10000)
> output <- serialize(input, NULL)
> for(i in 1:10) { print(system.time(serialize(input, NULL))) }
> for(i in 1:10) { print(system.time(unserialize(output))) }
>