Folks
I have come across an issue with gc() hogging the processor according to
Rprof.
Platform is Ubuntu 20.04 all up to date
R version 4.3.1
libraries: survival, MASS, gtools and openxlsx.
With default gc.auto options, the profiler notes the garbage collector
as self.pct 99.39%.
So I have tried switching it off using options(gc.auto=Inf) in the R
session before running my program using source().
This lowered self.pct to 99.36. Not much there.
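For reference, Rprof() can be told to account for collector time explicitly. A minimal, self-contained sketch (heavy_fn below is a hypothetical stand-in for the real program run via source()):

```r
## heavy_fn is a deliberately allocation-heavy stand-in for the real workload.
heavy_fn <- function(n = 2000) {
  out <- NULL
  for (i in seq_len(n)) out <- c(out, rnorm(100))  # grows a vector, forcing repeated reallocation
  length(out)
}

## gc.profiling = TRUE makes summaryRprof() report collector time as a
## separate "<GC>" entry instead of folding it into other functions.
Rprof("prof.out", gc.profiling = TRUE)
heavy_fn()
Rprof(NULL)
summaryRprof("prof.out")$by.self
```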
After some pondering, I added an options(gc.auto=Inf) at the beginning
of each function, not resetting it at exit, but expecting the offending
function(s) to plead guilty.
Not so, although it did lower the gc() time to 95.84%.
This was on a 16-core Threadripper 1950X box, so I was intending to use
the parallel library, but I tried it on my lowly Windows box that is
years old and got it down to 88.07%.
The only thing I can think of is that there are quite a lot of cases
where a function is generated on the fly as in:
eval(parse(t=paste("dprob <- function(x,l,s){",
    dist.functions[2,][dist.functions[1,]==distn],
    "(x,l,s)}", sep="")))
I haven't added the options to any of these.
The highest time used by any of my functions is 0.05% - the rest is
dominated by gc().
There may not be much point in parallelising the code until I can reduce
the garbage collection.
I am not short of memory and would like to disable gc fully, but despite
adding the option to all routines, I haven't managed to do this yet.
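As a diagnostic rather than a fix, gcinfo() can show how often collections actually fire during the run (a short sketch; the lapply line is just hypothetical allocation-heavy work):

```r
## gcinfo(TRUE) prints a line to the console each time the collector runs,
## which shows whether allocations in a given loop are triggering it.
old <- gcinfo(TRUE)                        # returns the previous setting
x <- lapply(1:5, function(i) rnorm(1e6))   # allocation-heavy work
gcinfo(old)                                # restore the original setting
```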
Can anyone advise me?
And why is the Linux version so much worse than Windows?
TIA
--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951
On Sun, 27 Aug 2023 19:54:23 +0100
John Logsdon <j.logsdon at quantex-research.com> wrote:

> Not so although it did lower the gc() time to 95.84%.
>
> This was on a 16 core Threadripper 1950X box so I was intending to
> use library parallel but I tried it on my lowly windows box that is
> years old and got it down to 88.07%.

Does the Windows box have the same version of R on it?

> The only thing I can think of is that there are quite a lot of cases
> where a function is generated on the fly as in:
>
> eval(parse(t=paste("dprob <-
> function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))

This isn't very idiomatic. If you need dprob to call the function named
in dist.functions[2,][dist.functions[1,]==distn], wouldn't it be easier
for R to assign that function straight to dprob?

dprob <- get(dist.functions[2,][dist.functions[1,]==distn])

This way, you avoid the need to parse the code, which is typically not
the fastest part of a programming language.

(Generally in R and other programming languages with recursive data
structures, storing variable names in other variables is not very
efficient. Why not put functions directly into a list?)

Rprof() samples the whole call stack. Can you find out which functions
result in a call to gc()? I haven't experimented with a wide sample of
R code, but I don't usually encounter gc() as a major entry in my
Rprof() outputs.

--
Best regards,
Ivan
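Ivan's list suggestion can be sketched like this (the distribution names and the distn value here are hypothetical, not taken from the original code):

```r
## Map distribution names directly to density functions in a named list,
## so no parse()/eval() or code generation is needed.
dist.functions <- list(
  norm   = function(x, l, s) dnorm(x, mean = l, sd = s),
  logis  = function(x, l, s) dlogis(x, location = l, scale = s),
  cauchy = function(x, l, s) dcauchy(x, location = l, scale = s)
)

distn <- "norm"                    # chosen at run time
dprob <- dist.functions[[distn]]   # plain list lookup, no parsing
dprob(0, 0, 1)                     # identical to dnorm(0)
```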