Dear all,

In our current setting we have our packages stored on a (rather slow) network drive, and we need to invoke short R scripts (using Rscript) in a timely manner. Most of each script's runtime is spent on package loading via library() (or loadNamespace(), to be precise).

Is there a way to cache the package namespaces listed in loadedNamespaces() and load them into memory before the script is executed?

My first, simplistic attempt was to serialize the environment returned by loadNamespace() to a file and load it before the script starts. However, loading the object automatically also loads all the referenced namespaces (from the slow network share), which is undesirable for this use case.

Cheers,
Mario
On 19/07/2020 11:50 a.m., Mario Annau wrote:
> Is there a way to cache the package namespaces as listed in
> loadedNamespaces() and load them into memory before the script is
> executed?

I don't think there is, but I doubt it would help much. loadNamespace() will be slow if loading the package is slow, and you can't avoid doing that once. (If you call loadNamespace() twice on the same package, the second call does nothing and is really quick.) I think the only saving you might get is the effort of merging various tables (e.g. the ones for dispatching S3 and S4 methods), and I wouldn't expect that to take a substantial amount of time.

One thing you could do is create a library on a faster drive and install the minimal set of packages there. Then, if that library comes first in .libPaths(), you'll never hit the slow network drive.

Duncan Murdoch
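Duncan's suggestion can be sketched as follows; the local library path is hypothetical, and the install command is shown as a comment since it needs R on the machine. Directories listed in the R_LIBS environment variable are prepended to .libPaths(), so packages installed there shadow the copies on the network share:

```shell
# Sketch, not a drop-in: the local library path is hypothetical.
LOCALLIB="$HOME/R/local-lib"
mkdir -p "$LOCALLIB"

# R prepends directories in R_LIBS to .libPaths(), so packages
# installed here are found before the slow network library.
export R_LIBS="$LOCALLIB"

# Install the minimal set of packages into it, e.g.:
#   Rscript -e 'install.packages("data.table", lib = Sys.getenv("R_LIBS"))'

echo "local library: $LOCALLIB"
```

After this, `Rscript myscript.R` resolves library() calls against the local library first and only falls back to the network share for packages not installed locally.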
My advice would be to avoid the network in one of the following ways:

1. Store installed packages on your local drive.
2. Copy the installed packages to a tempdir on your local drive each time the script is executed.
3. Keep an R session running in perpetuity and source the scripts within that everlasting session.
4. Rewrite your scripts to use base R only.

I suspect this solution list is exhaustive.

On Mon, 20 Jul 2020 at 1:50 am, Mario Annau <mario.annau at gmail.com> wrote:
> Is there a way to cache the package namespaces as listed in
> loadedNamespaces() and load them into memory before the script is
> executed?
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
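Option 2 above can be sketched with rsync; both paths here are made-up examples. Because rsync only transfers files that changed, repeat runs after the first full sync mostly pay for metadata checks rather than full copies:

```shell
# Sketch of option 2, with hypothetical paths: sync the network
# library into a local cache, then point R at the cache.
NETLIB="/mnt/netshare/R/library"          # assumed network library path
CACHE="${TMPDIR:-/tmp}/r-lib-cache"
mkdir -p "$CACHE"

# Shown as a comment since the network path is an assumption:
#   rsync -a "$NETLIB"/ "$CACHE"/

# Directories in R_LIBS come first in .libPaths().
export R_LIBS="$CACHE"
echo "using library cache: $CACHE"
```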
Thanks for the quick responses. As you both suggested, storing the packages on a local drive is feasible, but it comes with a size restriction I wanted to avoid. I'll keep this in mind as plan B.

@Hugh: 2. would impose even greater slowdowns, and 4. is just not feasible. However, 3. sounds interesting: how would this work in a Linux environment?

Thank you,
Mario

On Sun, 19 Jul 2020 at 20:11, Hugh Parsonage <hugh.parsonage at gmail.com> wrote:
> My advice would be to avoid the network in one of the following ways
>
> 1. Store installed packages on your local drive
> 2. Copy the installed packages to a tempdir on your local drive each time
>    the script is executed
> 3. Keep an R session running in perpetuity and source the scripts within
>    that everlasting session
> 4. Rewrite your scripts to use base R only.
>
> I suspect this solution list is exhaustive.
It's possible to run R (or a C parent process) as a background process via a named pipe, and then write script files to the named pipe. However, the details depend on which shell you use.

The last time I tried (which was a long time ago), I created a small C program to run R, read from the named pipe within C, then wrote its contents to R's standard input.

It might be possible to do it without the C program. I haven't checked.

On Mon, Jul 20, 2020 at 3:50 AM Mario Annau <mario.annau at gmail.com> wrote:
> Is there a way to cache the package namespaces as listed in
> loadedNamespaces() and load them into memory before the script is
> executed?
On Mon, Jul 20, 2020 at 9:15 AM Abby Spurdle <spurdle.a at gmail.com> wrote:
> It's possible to run R (or a c parent process) as a background process
> via a named pipe, and then write script files to the named pipe.
> However, the details depend on what shell you use.

I would use screen or tmux for this, if this is an R process that you want to interact with, and you want to keep it running after a SIGHUP.

Gabor

[...]
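The tmux variant of this could look like the dry-run sketch below; the session name "rworker" and the script path are made up here, and the run() wrapper only prints the commands so the sketch is safe to execute anywhere. Replace the printf with the bare command to actually run them:

```shell
# Dry-run sketch of keeping R alive in a detached tmux session.
# Session name "rworker" and myscript.R are hypothetical.
run() { printf '+ %s\n' "$*"; }   # swap for "$@" to really execute

run tmux new-session -d -s rworker "R --no-save"
run tmux send-keys -t rworker "source('myscript.R')" Enter
run tmux attach -t rworker        # interact with the session later
```

Because the R process lives inside the detached tmux session, it survives logout (SIGHUP), and each send-keys call reuses the already-loaded namespaces instead of paying the package-loading cost again.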
On 20/07/2020 at 10:15, Abby Spurdle wrote:
> It's possible to run R (or a c parent process) as a background process
> via a named pipe, and then write script files to the named pipe.
> However, the details depend on what shell you use.
>
> It might be possible to do it without the c program.
> Haven't checked.

For testing purposes, you can do:

- in shell 1:

    mkfifo rpipe
    exec 3>rpipe

  (Without this trick, Rscript will end after the first "echo" below, or at the end of your first script.)

- in shell 2:

    Rscript rpipe

- in shell 3:

    echo "print('hello')" > rpipe
    echo "print('hello again')" > rpipe

Then in shell 2 you will see the output:

    [1] "hello"
    [1] "hello again"

and so on. If your R scripts contain stop(), q('yes'), or any other error, it will end the Rscript process. A kind of watchdog can be set up for automatic relaunching if needed. Another way to stop the Rscript process is to kill the "exec 3>rpipe" one; you can find its PID with "fuser rpipe".

Best,
Serguei.

On Mon, Jul 20, 2020 at 3:50 AM Mario Annau <mario.annau at gmail.com> wrote:
>> Is there a way to cache the package namespaces as listed in
>> loadedNamespaces() and load them into memory before the script is
>> executed?

--
Serguei Sokol
Research Engineer, INRAE
Mathematics Unit
TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04
tel: +33 5 61 55 98 49
email: sokol at insa-toulouse.fr
http://www.toulouse-biotechnology-institute.fr/
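The three-shell recipe above can be condensed into one self-contained script; here a small shell loop stands in for the long-running Rscript process, so the sketch runs even where R is not installed. The key mechanics are the same: a held-open write end (fd 3) prevents the reader from seeing EOF between short-lived writers.

```shell
# Demo of the fifo pattern: a background reader stands in for Rscript.
set -e
dir=$(mktemp -d)
fifo="$dir/rpipe"
mkfifo "$fifo"

# Stand-in for: Rscript rpipe
# Echoes each command it receives until the write end closes.
while IFS= read -r line; do
    printf 'got: %s\n' "$line"
done < "$fifo" > "$dir/out" &
reader=$!

# The "exec 3>rpipe" trick: hold a write end open so the reader does
# not see EOF when each short-lived echo below exits.
exec 3>"$fifo"

echo "print('hello')" > "$fifo"
echo "print('hello again')" > "$fifo"

exec 3>&-          # close the held write end: the reader sees EOF
wait "$reader"
out=$(cat "$dir/out")
printf '%s\n' "$out"
```

Note the ordering: the reader is started before `exec 3>"$fifo"`, because opening a fifo for writing blocks until some process opens it for reading (and vice versa); the two opens rendezvous, after which the short-lived echo writers never block.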