On Mon, 5 Sep 2016, Måns Magnusson wrote:
> Dear developers,
>
> I'm working with a lot of textual data in R and need to handle it batch
> by batch. The problem is that I read in batches of 10 000 documents and do
> some calculations that result in objects consuming quite a bit of memory
> (unigram, 2-gram and 3-gram counts). In every iteration a new object
> (~500 MB) is created (I can't control the size, so a new object needs to
> be created each iteration). These computations get slower with every
> iteration: the first iteration takes 7 sec, and after 30 iterations each
> iteration takes 20-30 minutes.
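>
> To make this concrete, here is a stripped-down sketch of the loop
> (compute_ngrams and the file layout are placeholders, not the actual code):
>
>     for (f in list.files("batches", full.names = TRUE)) {  # ~10 000 docs per file
>         docs <- readLines(f)
>         # building the unigram/2-gram/3-gram tables gives a new ~500 MB object
>         ngrams <- compute_ngrams(docs, n = 1:3)  # placeholder function
>         saveRDS(ngrams, file = paste0(f, ".ngrams.rds"))
>         rm(docs, ngrams)
>         gc()
>     }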
>
> I think I have localized the problem to R's memory handling and suspect
> that my approach is fragmenting the memory. If I do this batch handling
> from Bash, starting a new R session for each batch, it takes ~7 sec per
> batch, so the problem is not with the individual batches themselves. The
> garbage collector does not seem to handle this (potential) fragmentation.
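>
> In R terms, that Bash workaround amounts to launching one fresh Rscript
> process per batch; roughly (process_batch.R stands in for a script that
> handles a single file):
>
>     for (f in list.files("batches", full.names = TRUE)) {
>         # each batch gets a brand-new R session, which frees all memory on exit
>         system2("Rscript", c("process_batch.R", shQuote(f)))
>     }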
>
> Could the poor performance after a couple of iterations be caused by my
> fragmenting the memory? If so, is there a way to handle this within R,
> such as defragmenting the memory or restarting R from within R?
Highly unlikely. Fragmentation is rarely an issue on a 64-bit OS and
the symptoms would be different.
To get help with what is actually happening, please post a minimal
reproducible example, and please not in HTML.
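Something self-contained along these lines, adapted so that it actually
reproduces the growing per-iteration times you describe, would be a useful
starting point (the object sizes and iteration count here are only
placeholders):

    for (i in 1:50) {
        t <- system.time({
            # stand-in for one batch: build and discard a large character object
            x <- replicate(5e5, paste(sample(letters, 3), collapse = ""))
            rm(x)
        })
        cat(sprintf("iteration %3d: %.1f sec elapsed\n", i, t[["elapsed"]]))
    }
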
Best,
luke
>
> With kind regards
> Måns Magnusson
>
> PhD Student, Statistics, Linköping University.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu