James Spottiswoode
2020-Feb-01 19:24 UTC
[R] How to parallelize a process called by a socket connection
Hi R Experts,

I'm using R version 3.4.3 running under Linux on an AWS EC2 instance. I have R code listening on a port for a socket connection; it passes incoming data to a function whose results are then sent back to the calling machine. Here's the function that listens for a socket connection:

    # define server function
    server <- function() {
      while (TRUE) {
        con <- socketConnection(host = "localhost", port = server_port, blocking = TRUE,
                                server = TRUE, open = "r+", timeout = 100000000)
        data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
        response <- check(data)
        if (!is.null(response)) writeLines(response, con)
      }
    }

The server function expects to receive a character string which is then passed to the function check(). check() is a large, complex routine which does text analysis and many other things, and returns a JSON string to be passed back to the calling machine.

This all works perfectly, except that while check() spends ~50 ms doing its work, no more requests can be received and processed. Therefore, if a new request comes in sooner than ~50 ms after the last one, it is not processed. I would like to parallelize this so that the box can run more than one check() process simultaneously. I'm familiar with several of the parallelization packages for R, but I cannot see how to integrate them with the socket-connection side of things.

Currently I have a kludge, which is a round-robin approach to the problem: I run 4 copies of the whole R code listening on 4 different ports, say P1, P2, P3, P4, and the calling machine issues calls in sequence to ports P1, P2, P3, P4, P1, ... etc. This mitigates, but doesn't solve, the problem (see the client-side sketch below).

Any advice would be greatly appreciated! Thanks.

James
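[For context, a minimal sketch of what the client side of that round-robin kludge might look like. The port numbers, the query() helper, and the example request string are hypothetical illustrations, not code from the actual setup.]

    # hypothetical client-side round-robin dispatcher over 4 server ports
    ports <- c(6011, 6012, 6013, 6014)   # P1..P4: made-up port numbers
    next_port <- 1L

    query <- function(msg) {
      port <- ports[next_port]
      next_port <<- next_port %% length(ports) + 1L   # advance round-robin index
      con <- socketConnection(host = "localhost", port = port,
                              blocking = TRUE, open = "r+")
      on.exit(close(con))
      writeLines(msg, con)          # send one request line
      readLines(con, 1L)            # read back one JSON response line
    }

    # usage: query("some text to analyze")

As described above, this only spreads requests over a fixed number of single-threaded listeners; if two requests land on the same port within ~50 ms, the second one is still not processed.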
Hervé Pagès
2020-Feb-02 00:07 UTC
[R] How to parallelize a process called by a socket connection
Seems like you've replied to an existing thread to ask a new question (your post gets buried deep inside the "How to extract or sort values from one column" thread in my Thunderbird). Unfortunately this means that a lot of people who might be able to help you will miss it.

H.

On 2/1/20 11:24, James Spottiswoode wrote:
> [original question quoted in full]

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Ivan Krylov
2020-Feb-02 09:12 UTC
[R] How to parallelize a process called by a socket connection
On Sat, 1 Feb 2020 11:24:51 -0800 James Spottiswoode <james at jsasoc.com> wrote:

> while(TRUE){
>   con <- socketConnection(host="localhost", port = server_port, blocking=TRUE,
>                           server=TRUE, open="r+", timeout = 100000000)
>   data <- readLines(con, 1L, skipNul = T, ok = T)
>   response <- check(data)
>   if (!is.null(response)) writeLines(response, con)
> }

> This all works perfectly except that while check() spends ~50ms doing
> its stuff no more requests can be received and processed.

This poses an interesting challenge. Normally, a single-threaded server would call listen(socket, backlog) [1] with backlog > 1, so that other clients attempting to connect() can wait in the queue while the server calls accept() in a loop and handles them one by one. Unfortunately, R socketConnection()s are single-client only, since R closes the listen()ing helper socket immediately after accept()ing the first client [2].

A quick and dirty hack would be to modify utils::make.socket (and use read.socket()/write.socket() instead of readLines()/writeLines()), omitting the check for "localhost" and the .Call(C_sockclose, tmp) at the end of the if(server) branch. (Instead, the socket called "tmp" should be kept and the code should loop on .Call(C_socklisten, tmp) to process all clients.) I cannot recommend this in good faith as a long-term solution, since all the APIs involved are private and should not be depended upon.

Some code from the svSocket package [3] could be repurposed to rely on the Tcl/Tk event loop to handle multiple clients on a single server socket, but implementing that would require knowledge of Tcl.

If you can afford to change the clients to use the HTTP protocol, you can use the httpuv package [4] to handle the connection management and HTTP request parsing for you.

Besides that, I don't see a way to handle multiple clients with a single server socket in R. I must be missing something.

--
Best regards,
Ivan

[1] https://beej.us/guide/bgnet/html/#listen
[2] https://github.com/wch/r-source/blob/07c17042d9e198319a425df726cc80545ae69812/src/modules/internet/sockconn.c#L77
[3] https://cran.r-project.org/package=svSocket
[4] https://cran.r-project.org/package=httpuv
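[To illustrate the httpuv suggestion, a minimal sketch of what an HTTP-based replacement for the server() loop might look like, assuming check() is the routine described earlier, clients are changed to POST their text over HTTP, and the usual Rook-style request object; the port number is an arbitrary choice, not something from the thread.]

    library(httpuv)

    # check() is assumed to be the existing text-analysis routine that
    # takes a character string and returns a JSON string (or NULL)
    app <- list(
      call = function(req) {
        # read the raw request body (Rook-style input stream) as a string
        data <- rawToChar(req$rook.input$read())
        response <- check(data)
        list(
          status  = 200L,
          headers = list("Content-Type" = "application/json"),
          body    = if (is.null(response)) "" else response
        )
      }
    )

    # runServer() blocks and services incoming requests in a loop;
    # port 8080 is an arbitrary example
    runServer("0.0.0.0", 8080, app)

Note that this still runs check() on the single R process, so requests are queued rather than processed in parallel; it avoids dropped requests, but truly running several check() calls at once would still require multiple worker processes (for example several such servers behind a load balancer).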