thr3ads.net - R devel - [Rd] Pre-compilation and server-side parallel execution [Dec 2006]

Erik van Zijst

2006-Dec-08 14:51 UTC

[Rd] Pre-compilation and server-side parallel execution

Folks,

My company operates a platform that distributes real-time financial data 
from exchanges to users. To extend our services I want to allow users to 
write and submit custom R scripts to our platform that operate on our 
streaming data to do real-time analysis.

We have thousands of users deploying scripts and each script is 
evaluated repeatedly when certain conditions in the stream apply. For 
example, a script could compute the NASDAQ100 index value each time one 
of its 100 constituents trades.

Scripts are typically small and execute quickly. Each script is 
registered once and then repeatedly evaluated with different parameters 
(possibly several times per second per script). In this context my 
biggest concern is scalability.

The evaluation engine is a pure server-side component without display 
abilities. An R-script is invoked with parameters and whatever it 
returns is sent to the user.

Ideally I'd need a C API to interact with the interpreter. I've looked 
at projects like R/Apache, RServe and RSJava for inspiration and came to 
the conclusion that all these projects work by forking multiple 
instances of the R-engine where each instance evaluates one script at a 
time.

As our service must evaluate many different scripts concurrently 
(isolated from one another), I have the following concerns:

1. Spawning a pool of engine instances for massive parallel execution is 
expensive, but might work with lots of memory.
2. R's native C-api 
[http://cran.r-project.org/doc/manuals/R-exts.html#The-R-API] does not 
separate parsing from evaluation. When the same script is evaluated 10 
times, it is also parsed 10 times.

I'm mostly concerned about the second issue. Our scripts are registered 
once and continuously evaluated. I want to avoid parsing the same script 
again each time it is evaluated. Does the engine recognize previously 
parsed scripts (like Oracle does for SQL queries)?

I'm interested to hear your thoughts on my concerns and whether you think 
R would work in this architecture.

kind regards,
Erik van Zijst
-- 
And on the seventh day, He exited from append mode.

Simon Urbanek

2006-Dec-08 22:28 UTC

On Dec 8, 2006, at 9:51 AM, Erik van Zijst wrote:
> 2. R's native C-api
> [http://cran.r-project.org/doc/manuals/R-exts.html#The-R-API] does  
> not separate parsing from evaluation.
Actually it does - see "R_ParseVector" and "eval". You're free to run
the parser once (or even construct the expression directly) and
evaluate it many times. (Also note that you can serialize the parsed
expression if desired).
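
A minimal R-level sketch of the parse-once, evaluate-many pattern described here, using parse() and eval() in place of the C-level R_ParseVector and eval. The script text, the parameter names price and weight, and the helper run_script are hypothetical, chosen only for illustration.

```r
# Hypothetical user script; `price` and `weight` are made-up parameter names.
script_text <- "sum(price * weight)"

# Parse once, at registration time.
parsed <- parse(text = script_text)

# Evaluate many times, each call in a fresh environment holding its parameters.
# run_script is an illustrative helper, not part of R.
run_script <- function(parsed, params) {
  env <- list2env(params, parent = globalenv())
  eval(parsed, envir = env)
}

r1 <- run_script(parsed, list(price = c(10, 20), weight = c(0.5, 0.5)))
r2 <- run_script(parsed, list(price = c(12, 18), weight = c(0.25, 0.75)))

# The parsed expression can also be serialized and restored later.
bytes <- serialize(parsed, connection = NULL)
restored <- unserialize(bytes)
r3 <- run_script(restored, list(price = c(10, 20), weight = c(0.5, 0.5)))
```

Giving each evaluation its own environment keeps one call's parameters from leaking into the next, although it does not by itself isolate scripts from side effects on the global environment.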

If your worries are really at this level, then you will have to  
create entirely your own solution, because the overhead of IPC will  
be way more than the time spent in the parser. Actually I'm wondering  
whether you checked it at all, because I'd almost certainly expect  
the evaluation to take way more time than the parsing step. If it  
does, I'd be inclined to think that you have rather a design problem.
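
One rough way to check how parsing and evaluation compare for a given script is to time each step separately. This is only a sketch: the script text, the input data and the repetition count n are assumptions, and the result will depend heavily on the actual script and machine.

```r
# Hypothetical script and inputs, for illustration only.
script_text <- "sum(price * weight)"
env <- list2env(list(price = runif(100), weight = rep(0.01, 100)))

parsed <- parse(text = script_text)
n <- 10000  # arbitrary repetition count

# Time n parses of the same text.
parse_time <- system.time(
  for (i in seq_len(n)) parse(text = script_text)
)[["elapsed"]]

# Time n evaluations of the already-parsed expression.
eval_time <- system.time(
  for (i in seq_len(n)) eval(parsed, envir = env)
)[["elapsed"]]

cat("parse:", parse_time, "s  eval:", eval_time, "s\n")
```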

Cheers,
Simon

Byron Ellis

2006-Dec-13 00:16 UTC

On 12/8/06, Erik van Zijst <r at erik.prutser.cx> wrote:
> 2. R's native C-api
> [http://cran.r-project.org/doc/manuals/R-exts.html#The-R-API] does not
> separate parsing from evaluation. When the same script is evaluated 10
> times, it is also parsed 10 times.
>
> I'm mostly concerned about the second issue. Our scripts are registered
> once and continuously evaluated. I want to avoid parsing the same script
> again each time it is evaluated. Does the engine recognize previously
> parsed scripts (like Oracle does for SQL queries)?
A database server is doing rather more than simply parsing a
query--it's also running a query planner to optimize execution and
quite possibly a number of other things so it behooves the DBMS to
cache that information whenever possible. The closest functional
equivalent in R would be wrapping everything in a function and then
serializing the resulting function somewhere.
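
A minimal sketch of that idea, assuming the user script is written as a function definition. The script text, the argument names and the use of a temporary .rds file as the "somewhere" are all illustrative assumptions.

```r
# Hypothetical script body wrapped in a function; argument names are made up.
script_text <- "function(price, weight) sum(price * weight)"

# Parse and evaluate once to obtain the function object.
index_fn <- eval(parse(text = script_text)[[1]], envir = globalenv())

# Serialize the function to a file (a temporary file here, as a stand-in
# for whatever store the platform would use).
path <- tempfile(fileext = ".rds")
saveRDS(index_fn, path)

# Later, possibly in another R process: restore it and call it with new inputs.
fn <- readRDS(path)
value <- fn(price = c(10, 20), weight = c(0.5, 0.5))

unlink(path)
```

Restoring the function skips the parse step, but each call still goes through the interpreter; nothing here corresponds to a query planner.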
