I'm exploring the latency overhead of parallel PSOCK workers and noticed that serializing/unserializing data back to the main R session is significantly slower on Linux than it is on Windows/macOS with similar hardware. Is there a reason for this difference, and is there a way to avoid the apparent additional Linux overhead?

I attempted to isolate the behavior with a test that simply returns an existing object from the worker back to the main R session.

    library(parallel)
    library(microbenchmark)
    gcinfo(TRUE)
    cl <- makeCluster(1)
    (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
    # x$time is stored in nanoseconds, so convert before labelling as microseconds
    plot(x$time / 1000, ylab = "microseconds")
    head(x$time / 1000, n = 10)

On Windows/macOS, the test runs in 300-500 microseconds depending on hardware. A few of the 1000 runs are an order of magnitude slower, but this can probably be attributed to garbage collection on the worker.

On Linux, the first 5 or so executions run at comparable speeds, but all subsequent executions are two orders of magnitude slower (~40 milliseconds).

I see this behavior across various platforms and hardware combinations:

    Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
    Linux Mint 19.3 (AMD Ryzen 7 1800X)
    Linux Mint 20 (AMD Ryzen 7 3700X)
    Windows 10 (AMD Ryzen 7 4800H)
    macOS 10.15.7 (Intel Core i7-8850H)
Simon Urbanek
2020-Nov-02 01:21 UTC
[Rd] parallel PSOCK connection latency is greater on Linux?
It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):

Unit: microseconds
                   expr      min       lq     mean  median       uq      max neval
 clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000

Exactly the same machine + R, but with TCP_NODELAY enabled in R_SockConnect():

Unit: microseconds
                   expr     min     lq     mean  median      uq      max neval
 clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000

Cheers,
Simon
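
For readers unfamiliar with the option: TCP_NODELAY disables Nagle's algorithm, so small writes go out immediately instead of waiting for earlier data to be acknowledged. The following is a minimal C sketch of setting it on a TCP socket, assuming a POSIX system. The helper name set_tcp_nodelay is hypothetical, and this is not the actual change made to R_SockConnect(), which is not shown in the thread.

```c
#include <assert.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* setsockopt, getsockopt */

/* Hypothetical helper (not taken from R's sources): disable Nagle's
 * algorithm on a TCP socket so that small writes are sent right away.
 * Returns 0 on success, or -1 on failure with errno set by setsockopt(). */
int set_tcp_nodelay(int fd)
{
    int on = 1;

    assert(fd >= 0);  /* caller must pass a valid socket descriptor */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```

A caller would typically invoke set_tcp_nodelay(fd) once, right after the socket is created or connected; the option then stays in effect for the lifetime of that socket.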
> On 2/11/2020, at 3:39 AM, Jeff <jeff at vtkellers.com> wrote:
>
> I'm exploring the latency overhead of parallel PSOCK workers and noticed
> that serializing/unserializing data back to the main R session is
> significantly slower on Linux than it is on Windows/macOS with similar
> hardware. Is there a reason for this difference, and is there a way to
> avoid the apparent additional Linux overhead?
>
> I attempted to isolate the behavior with a test that simply returns an
> existing object from the worker back to the main R session.
>
> library(parallel)
> library(microbenchmark)
> gcinfo(TRUE)
> cl <- makeCluster(1)
> (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
> plot(x$time / 1000, ylab = "microseconds")
> head(x$time / 1000, n = 10)
>
> On Windows/macOS, the test runs in 300-500 microseconds depending on
> hardware. A few of the 1000 runs are an order of magnitude slower, but
> this can probably be attributed to garbage collection on the worker.
>
> On Linux, the first 5 or so executions run at comparable speeds, but all
> subsequent executions are two orders of magnitude slower (~40 milliseconds).
>
> I see this behavior across various platforms and hardware combinations:
>
> Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
> Linux Mint 19.3 (AMD Ryzen 7 1800X)
> Linux Mint 20 (AMD Ryzen 7 3700X)
> Windows 10 (AMD Ryzen 7 4800H)
> macOS 10.15.7 (Intel Core i7-8850H)
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
Iñaki Úcar
2020-Nov-02 13:05 UTC
[Rd] parallel PSOCK connection latency is greater on Linux?
On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>
> It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):

How many network packets are generated with and without it? If there are many small writes and thus setting TCP_NODELAY causes many small packets to be sent, it might make more sense to set TCP_QUICKACK instead.

Iñaki

--
Iñaki Úcar
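
To make the TCP_QUICKACK suggestion concrete, here is a minimal C sketch, assuming Linux (the option is Linux-specific). The helper names set_tcp_quickack and recv_quickack are hypothetical and do not come from R's sources. According to tcp(7) the flag is not permanent, so this sketch re-arms it after every successful read; whether that is the right place to do so in R's socket code is an open assumption.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_QUICKACK (Linux only) */
#include <sys/socket.h>   /* setsockopt, recv */
#include <sys/types.h>    /* ssize_t */

/* Hypothetical helper: ask the kernel to acknowledge incoming data
 * immediately rather than delaying the ACK. Returns 0 on success and
 * -1 on failure; where the option does not exist, errno is ENOTSUP. */
int set_tcp_quickack(int fd)
{
    assert(fd >= 0);  /* caller must pass a valid socket descriptor */
#ifdef TCP_QUICKACK
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof on);
#else
    errno = ENOTSUP;
    return -1;
#endif
}

/* Hypothetical wrapper around recv(): because the kernel may leave
 * quickack mode on its own, re-enable it after each successful read.
 * A failure to re-arm is ignored so the read result is still returned. */
ssize_t recv_quickack(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n > 0)
        (void) set_tcp_quickack(fd);
    return n;
}
```

The design difference from TCP_NODELAY is where the option acts: TCP_NODELAY changes the sender (small segments are never held back), while TCP_QUICKACK changes the receiver (ACKs are not delayed), so the sender can still coalesce small writes.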