Thanks for the quick responses. As you both suggested, storing the packages
on a local drive is feasible, but it comes with a size restriction I wanted
to avoid. I'll keep this in mind as plan B.

@Hugh: 2. would impose even greater slowdowns and 4. is just not feasible.
However, 3. sounds interesting - how would this work in a Linux environment?

Thank you,
Mario

On Sun, 19 Jul 2020 at 20:11, Hugh Parsonage <hugh.parsonage at gmail.com> wrote:

> My advice would be to avoid the network in one of the following ways:
>
> 1. Store installed packages on your local drive
> 2. Copy the installed packages to a tempdir on your local drive each time
>    the script is executed
> 3. Keep an R session running in perpetuity and source the scripts within
>    that everlasting session
> 4. Rewrite your scripts to use base R only.
>
> I suspect this solution list is exhaustive.
>
> On Mon, 20 Jul 2020 at 1:50 am, Mario Annau <mario.annau at gmail.com> wrote:
>
>> Dear all,
>>
>> in our current setting we have our packages stored on a (rather slow)
>> network drive and need to invoke short R scripts (using Rscript) in a
>> timely manner. Most of the script's runtime is spent on package loading
>> via library() (or loadNamespace(), to be precise).
>>
>> Is there a way to cache the package namespaces as listed in
>> loadedNamespaces() and load them into memory before the script is
>> executed?
>>
>> My first simplistic attempt was to serialize the environment output
>> from loadNamespace() to a file and load it before the script is started.
>> However, loading the object automatically also loads all the referenced
>> namespaces (from the slow network share), which is undesirable for this
>> use case.
>>
>> Cheers,
>> Mario
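For reference, Hugh's options 1 and 2 both boil down to making a local copy of the installed library and putting it first on R's library search path. A minimal sketch (the directory layout below is a throwaway demo standing in for the real network share, which is an assumption; in practice `net_lib` would be the slow network library and an `rsync`-style incremental copy would be preferable to `file.copy()`):

```r
# Demo with throwaway directories so the sketch runs anywhere; in a real
# setup net_lib would be the (slow) network library path -- hypothetical.
net_lib   <- file.path(tempdir(), "netshare-lib")
local_lib <- file.path(tempdir(), "local-lib")
dir.create(file.path(net_lib, "somepkg"), recursive = TRUE)
dir.create(local_lib, recursive = TRUE)
writeLines("Package: somepkg", file.path(net_lib, "somepkg", "DESCRIPTION"))

# Copy the installed packages to the local drive (option 2).
file.copy(list.files(net_lib, full.names = TRUE), local_lib,
          recursive = TRUE)

# Make R search the local copy first, with the share as fallback;
# setting R_LIBS before invoking Rscript achieves the same thing.
.libPaths(c(local_lib, .libPaths()))
```

The same effect can be had without touching the script at all by exporting `R_LIBS=/path/to/local-lib` in the environment that launches `Rscript`.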
Mario,

On unix, if you use Rserve you can pre-load all packages in the server
(via the eval config directive or by running Rserve::run.Rserve() from a
session that has everything loaded), and all client connections will have
the packages already loaded and available* immediately. You could replace
the Rscript call with a very tiny Rserve client program which just calls
source(""). I can give you more details if you're interested.

Cheers,
Simon

* - there are some packages that are inherently incompatible with fork() -
e.g. you cannot fork a Java JVM or open connections.

> On Jul 20, 2020, at 6:47 AM, Mario Annau <mario.annau at gmail.com> wrote:
> [...]
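Simon's suggestion could look roughly like this (a sketch assuming the Rserve and RSclient packages are installed; the preloaded package and `myscript.R` are placeholders, not from the thread):

```r
## --- server side: start once and leave running ---------------------
library(Rserve)
library(data.table)   # hypothetical example: pre-load your heavy packages
run.Rserve()          # clients get fork()ed copies of this loaded session

## --- client side: replaces the per-invocation Rscript call ---------
library(RSclient)
con <- RS.connect()                 # default: localhost, port 6311
RS.eval(con, source("myscript.R"))  # runs with packages already loaded
RS.close(con)
```

Because each connection is a `fork()` of the server process, the namespaces are already in memory and the per-script startup cost drops to essentially the connection overhead.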
On 19 July 2020 at 20:47, Mario Annau wrote:
| On Sun, 19 Jul 2020 at 20:11, Hugh Parsonage <hugh.parsonage at gmail.com> wrote:
| > 3. Keep an R session running in perpetuity and source the scripts within
| > that everlasting session
| However, 3. sounds interesting - how would this work in a Linux environment?

You have had Rserve by Simon for close to 20 years. There isn't much in
terms of fancy docs, but it has been widely used. In essence, R runs
"headless" and you connect to it (think "telnet" or "ssh", but
programmatically), fire off requests, and get results with zero startup
latency. It is, however, more work to build the access layer.

And Rserve is also underneath RestRserve, which allows you to query a
running server via REST / modern web stack tech. (Think "plumber", but in
C++ and faster / more scalable.)

Lastly, there is Jeroen's OpenCPU.

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
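The RestRserve route Dirk mentions would expose the long-running session over HTTP. A minimal sketch, assuming RestRserve is installed (the `/run` route and its handler are made-up illustrations, not part of the thread):

```r
library(RestRserve)

# Pre-load heavy packages here, once, before the backend starts.

app <- Application$new()
app$add_get(path = "/run", FUN = function(request, response) {
  # Hypothetical handler: evaluate a script inside the warm session.
  source("myscript.R", local = TRUE)
  response$set_body("done")
})

backend <- BackendRserve$new()
backend$start(app, http_port = 8080)   # blocks; serves requests via Rserve
```

A client would then trigger a run with a plain `curl http://localhost:8080/run`, paying no package-loading cost per request.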
----- Original Message -----
> From: "Dirk Eddelbuettel" <edd at debian.org>
> To: "Mario Annau" <mario.annau at gmail.com>
> Cc: "r-devel at r-project.org" <r-devel at r-project.org>
> Sent: Sunday, July 19, 2020 10:09:24 PM
> Subject: Re: [Rd] Speed-up/Cache loadNamespace()
>
> [...]
>
> Lastly, there is Jeroen's OpenCPU.

Or... lastly, the R Service Bus, which has been used in production since
2010 and got a maintenance release (6.4.0) last week:
https://rservicebus.io/

For REST (both asynchronous and synchronous APIs are available), you can
start here: https://rservicebus.io/api/introduction/

Best,
Tobias