Iñaki Ucar
2020-Nov-02 13:05 UTC
[Rd] parallel PSOCK connection latency is greater on Linux?
On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>
> It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):

How many network packets are generated with and without it? If there are many small writes and thus setting TCP_NODELAY causes many small packets to be sent, it might make more sense to set TCP_QUICKACK instead.

Iñaki

> Unit: microseconds
>                    expr      min       lq     mean  median       uq      max neval
>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
>
> exactly the same machine + R but with TCP_NODELAY enabled in R_SockConnect():
>
> Unit: microseconds
>                    expr     min     lq     mean  median      uq      max neval
>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
>
> Cheers,
> Simon
>
> > On 2/11/2020, at 3:39 AM, Jeff <jeff at vtkellers.com> wrote:
> >
> > I'm exploring latency overhead of parallel PSOCK workers and noticed that serializing/unserializing data back to the main R session is significantly slower on Linux than it is on Windows/MacOS with similar hardware. Is there a reason for this difference and is there a way to avoid the apparent additional Linux overhead?
> >
> > I attempted to isolate the behavior with a test that simply returns an existing object from the worker back to the main R session.
> >
> > library(parallel)
> > library(microbenchmark)
> > gcinfo(TRUE)
> > cl <- makeCluster(1)
> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
> > plot(x$time, ylab = "microseconds")
> > head(x$time, n = 10)
> >
> > On Windows/MacOS, the test runs in 300-500 microseconds depending on hardware. A few of the 1000 runs are an order of magnitude slower but this can probably be attributed to garbage collection on the worker.
> >
> > On Linux, the first 5 or so executions run at comparable speeds but all subsequent executions are two orders of magnitude slower (~40 milliseconds).
> >
> > I see this behavior across various platforms and hardware combinations:
> >
> > Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
> > Linux Mint 19.3 (AMD Ryzen 7 1800X)
> > Linux Mint 20 (AMD Ryzen 7 3700X)
> > Windows 10 (AMD Ryzen 7 4800H)
> > MacOS 10.15.7 (Intel Core i7-8850H)
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Iñaki Úcar
Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they might determine what is best for their potentially latency- or throughput-sensitive application?

Best,
Jeff

On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar <iucar at fedoraproject.org> wrote:
> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>>
>> It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):
>
> How many network packets are generated with and without it? If there are many small writes and thus setting TCP_NODELAY causes many small packets to be sent, it might make more sense to set TCP_QUICKACK instead.
>
> Iñaki
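For readers who have not set this option before, here is a minimal C sketch of what enabling TCP_NODELAY on a socket looks like with the standard setsockopt() call. It is not the change made to R_SockConnect(); the helper names set_tcp_nodelay() and get_tcp_nodelay() are hypothetical and exist only for illustration. The roughly 44 ms median in the benchmark earlier in the thread is consistent with Nagle's algorithm waiting on a delayed ACK from the peer (Linux uses a minimum delayed-ACK timeout of about 40 ms), which is the stall this option avoids.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle's algorithm on an existing TCP socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
static int set_tcp_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: read the flag back.
 * Returns 1 if TCP_NODELAY is set, 0 if it is clear, -1 on error. */
static int get_tcp_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

The option is per socket, so in a client/server pair each side that performs small writes would need to set it on its own descriptor.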
Iñaki Ucar
2020-Nov-02 13:47 UTC
[Rd] parallel PSOCK connection latency is greater on Linux?
On Mon, 2 Nov 2020 at 14:29, Jeff <jeff at vtkellers.com> wrote:
>
> Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they might determine what is best for their potentially latency- or throughput-sensitive application?

I think it makes sense (with a sensible default). E.g., Julia does this [1-2].

[1] https://docs.julialang.org/en/v1/stdlib/Sockets/#Sockets.nagle
[2] https://docs.julialang.org/en/v1/stdlib/Sockets/#Sockets.quickack

-- 
Iñaki Úcar
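As a rough illustration of the second option mentioned above, the following C sketch requests immediate ACKs via TCP_QUICKACK. The helper name set_tcp_quickack() is hypothetical, and the fallback to ENOPROTOOPT on platforms without the constant is an assumption made for this sketch, not behavior taken from R or Julia.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to acknowledge incoming segments
 * immediately instead of delaying the ACK.
 * TCP_QUICKACK is Linux-specific, so the call is compiled only where the
 * constant exists; elsewhere the function fails with ENOPROTOOPT.
 * Returns 0 on success, -1 on failure. */
static int set_tcp_quickack(int fd)
{
#ifdef TCP_QUICKACK
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
#else
    (void) fd;              /* unused on platforms without the option */
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

According to the Linux tcp(7) manual page, this flag is not permanent: the kernel may leave quickack mode again during normal protocol processing, so code relying on it would likely need to re-apply it, for example after each read.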
Simon Urbanek
2020-Nov-04 01:06 UTC
[Rd] parallel PSOCK connection latency is greater on Linux?
I'm not sure the user would know ;). This is a very system-specific issue just because the Linux network stack behaves so differently from other OSes (for purely historical reasons). That makes it hard to abstract as a "feature" for the R sockets that are supposed to be platform-independent. At least TCP_NODELAY is actually part of POSIX so it is on better footing, and disabling delayed ACK is practically only useful to work around the other side having Nagle on, so I would expect it to be rarely used.

This is essentially an RFC since we don't have a mechanism for socket options (well, almost, there is timeout and blocking already...) and I don't think we want to expose low-level details so perhaps one idea would be to add something like delay=NA to socketConnection() in order to not touch (NA), enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other way we could infer the intention of the user to try to choose the right approach...

Cheers,
Simon

> On Nov 3, 2020, at 02:28, Jeff <jeff at vtkellers.com> wrote:
>
> Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they might determine what is best for their potentially latency- or throughput-sensitive application?
>
> Best,
> Jeff
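The three-state idea proposed above (leave the OS default alone, enable, or disable TCP_NODELAY) could be sketched at the C level roughly as follows. Everything here is hypothetical: the enum nodelay_opt, its constants, and apply_nodelay() are illustrative names, and the mapping follows the literal wording of the proposal (NA = do not touch, TRUE = enable TCP_NODELAY, FALSE = disable it). No such argument exists in socketConnection() as far as this thread shows.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical tri-state mirroring the proposed NA / TRUE / FALSE argument. */
enum nodelay_opt {
    NODELAY_KEEP = -1,  /* NA:    leave the OS default untouched */
    NODELAY_OFF  = 0,   /* FALSE: clear TCP_NODELAY (Nagle on)    */
    NODELAY_ON   = 1    /* TRUE:  set TCP_NODELAY (Nagle off)     */
};

/* Apply the requested setting to an existing TCP socket.
 * Returns 0 on success (including the "keep" case), -1 on failure. */
static int apply_nodelay(int fd, enum nodelay_opt opt)
{
    if (opt == NODELAY_KEEP)
        return 0;  /* nothing to do: the socket keeps its current setting */

    int value = (opt == NODELAY_ON);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, sizeof(value));
}
```

Keeping a distinct "do not touch" state means existing callers would see no behavior change unless they opt in, which matches the intent of defaulting the argument to NA.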