Michael Braun
2007-May-25 23:12 UTC
[Rd] R scripts slowing down after repeated calls to compiled code
Thanks in advance to anyone who might be able to help me with this problem. I have not been able to find a reference to it in the documentation or in online sources, so I am turning to this group.

I am running R 2.4.1 under Red Hat Enterprise Linux 4, on an x86_64 platform (multi-core Intel Xeon processors, 3.6 GHz, 8 GB of RAM). I have some rather complicated code (so I won't attach it here), but it is an iterative algorithm that takes data in the form of an R list, passes the entire list to some compiled C code, converts list items to GSL matrices and vectors, performs its various operations, and sends the result back to R. That result is then sent back to the compiled code, and so on until some kind of convergence is reached (I won't bore the group with more details).

The problem is that every .Call to the compiled code runs slower than the one before (I print the iteration time using Sys.time() and difftime()). There is no logical reason for this (i.e., it is not a feature of the algorithm itself). I am using about 20% of my machine's available RAM (it's a large dataset and a memory-intensive algorithm), but there does not appear to be any swapping of memory to disk. I am sure that I am UNPROTECTing the SEXPs that I create, and I am freeing all of the GSL objects at the end of each function. The total RAM used does seem to increase, slowly but steadily, but the speed decrease occurs well before I come close to running out of RAM.

Also, it is not just the compiled call that slows down. EVERYTHING slows down, even calls that consist only of standard R functions. The time for each of these calls is roughly proportional to the time of the .Call to the C function.

Another observation is that when I terminate the algorithm, do a rm(list=ls()), and then a gc(), not all of the memory is returned to the OS. It is not until I terminate the R session that I get all of the memory back. In my C code, I am not doing anything to de-allocate the SEXPs I create, relying on the PROTECT/UNPROTECT mechanism instead (is this right?).

I spent most of the day thinking I had a memory leak, but that no longer appears to be the case. I tried using Rprof(), but that only gives me the aggregated relative time spent in each function (more than 80% of the time, it's in the .Call).

So I'm stuck. Can anyone help?

Thanks,

Michael

--
Michael Braun
Assistant Professor of Marketing
MIT Sloan School of Management
One Amherst St., E40-169
Cambridge, MA 02142
(617) 253-3436
braunm at mit.edu
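For readers unfamiliar with this setup, here is a minimal sketch of the kind of .Call routine being described: pull elements out of the R list, view them as GSL objects, and return a PROTECTed result. The entry-point name, list layout, and computation are invented for illustration; this is not the code from this thread.

    #include <R.h>
    #include <Rinternals.h>
    #include <gsl/gsl_matrix.h>
    #include <gsl/gsl_vector.h>
    #include <gsl/gsl_blas.h>

    /* Illustrative only: called from R as .Call("one_step", data), where
       data is list(X = <numeric matrix>, y = <numeric vector>). */
    SEXP one_step(SEXP data)
    {
        SEXP Xs = VECTOR_ELT(data, 0);               /* data$X */
        SEXP ys = VECTOR_ELT(data, 1);               /* data$y */
        int nr = nrows(Xs), nc = ncols(Xs);

        /* R stores matrices column-major; viewing the same memory row-major
           yields the transpose, so nothing is copied and nothing needs freeing. */
        gsl_matrix_view Xt = gsl_matrix_view_array(REAL(Xs), nc, nr);
        gsl_vector_view y  = gsl_vector_view_array(REAL(ys), nc);

        SEXP ans = PROTECT(allocVector(REALSXP, nr));
        gsl_vector_view out = gsl_vector_view_array(REAL(ans), nr);

        /* computes X %*% y; CblasTrans because Xt holds the transpose of X */
        gsl_blas_dgemv(CblasTrans, 1.0, &Xt.matrix, &y.vector, 0.0, &out.vector);

        UNPROTECT(1);   /* matched PROTECT: ans is about to be returned to R */
        return ans;
    }

In this view-based variant there are no gsl_matrix_alloc()/gsl_free() pairs to get wrong; the only allocation R has to track is the PROTECTed result. Code that copies into freshly allocated GSL objects, as described above, additionally has to free each of them before returning.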
Vladimir Dergachev
2007-May-25 23:29 UTC
[Rd] R scripts slowing down after repeated calls to compiled code
On Friday 25 May 2007 7:12 pm, Michael Braun wrote:

> Thanks in advance to anyone that might be able to help me with this
>
> Also, it is not just the compiled call that slows down. EVERYTHING
> slows down, even those that consist only of standard R functions. The
> time for each of these function calls is roughly proportional to the
> time of the .Call to the C function.
>
> Another observation is that when I terminate the algorithm, do a rm
> (list=ls()), and then a gc(), not all of the memory is returned to the
> OS. It is not until I terminate the R session that I get all of the
> memory back. In my C code, I am not doing anything to de-allocate the
> SEXP's I create, relying on the PROTECT/UNPROTECT mechanism instead (is
> this right?).
>
> I spent most of the day thinking I have a memory leak, but that no
> longer appears to be the case. I tried using Rprof(), but that only
> gives me the aggregated relative time spent in each function (more than
> 80% of the time, it's in the .Call).

One possibility is that you are somehow creating a lot of R objects (say by calling assign() or missing an UNPROTECT()) and this is slowing the garbage collector down. The garbage collector's running time grows with the number of objects you have; their total size does not have to be large.

Could you try printing the numbers from a gc() call and checking whether the number of allocated objects grows a lot?

best

Vladimir Dergachev

> So I'm stuck. Can anyone help?
>
> Thanks,
>
> Michael
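A sketch of the missing-UNPROTECT() failure mode mentioned above (function and variable names are invented; this is not code from this thread): a .Call routine that PROTECTs a per-iteration temporary and never releases it leaves that object on the protection stack, so the collector must treat it as live in every subsequent collection.

    #include <R.h>
    #include <Rinternals.h>

    /* Buggy version: the temporary is never popped off the protection
       stack, so the collector keeps treating it as live and the number of
       reachable objects grows with every call (R will typically also warn
       about a stack imbalance after the .Call). */
    SEXP leaky_step(SEXP x)
    {
        SEXP tmp = PROTECT(duplicate(x));   /* per-iteration scratch copy */
        /* ... work on tmp ... */
        return tmp;                         /* missing UNPROTECT(1) */
    }

    /* Correct version: every PROTECT is matched before returning. */
    SEXP balanced_step(SEXP x)
    {
        SEXP tmp = PROTECT(duplicate(x));
        /* ... work on tmp ... */
        UNPROTECT(1);
        return tmp;
    }

At the R level, the symptom of this kind of accumulation is the "used" Ncells/Vcells columns printed by gc() creeping upward from one iteration to the next, even when rm() shows no new visible objects.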
Dirk Eddelbuettel
2007-May-26 01:40 UTC
[Rd] R scripts slowing down after repeated calls to compiled code
On 25 May 2007 at 19:12, Michael Braun wrote:
| So I'm stuck. Can anyone help?

It sounds like a memory issue. Your memory may just be getting fragmented.

One tool that may help you find leaks is valgrind -- see the 'R Extensions' manual. I can also recommend visualisers like kcachegrind (part of KDE).

But it may not be a leak. I have found that R just doesn't cope well with many large memory allocations and releases -- I often loop over data requests that I subset and process. This drives my 'peak' memory use to 1.5 or 1.7 GB on 32-bit/multicore machines with 4 GB, 6 GB or 8 GB of RAM (but 32 bit means a hard 3 GB per-process limit), and I just can't loop over many such tasks. So I now use the littler frontend to script this, dump the processed chunks as Rdata files and later re-read the pieces. That works reliably.

So one thing you could try is to dump your data in a 'GSL-ready' format from R, quit R to take it out of the equation, and then see what happens if you run the iterations with only GSL and your own code.

Hth, Dirk

--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison
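A minimal sketch of that kind of R-free harness, assuming the matrix was dumped from R with writeBin(as.vector(X), "X.bin") (i.e. column-major doubles) and that its dimensions are known in advance; the file name, sizes, and loop body below are placeholders, not anything from this thread.

    #include <stdio.h>
    #include <stdlib.h>
    #include <gsl/gsl_matrix.h>

    #define NROW 1000   /* nrow(X) in R -- adjust to the real data */
    #define NCOL 50     /* ncol(X) in R */

    int main(void)
    {
        double *buf = malloc((size_t)NROW * NCOL * sizeof(double));
        FILE *f = fopen("X.bin", "rb");
        if (!f || fread(buf, sizeof(double), (size_t)NROW * NCOL, f)
                      != (size_t)NROW * NCOL) {
            fprintf(stderr, "could not read X.bin\n");
            return 1;
        }
        fclose(f);

        /* R writes column-major, so a row-major view of the same buffer
           is the transpose: Xt is NCOL x NROW. */
        gsl_matrix_view Xt = gsl_matrix_view_array(buf, NCOL, NROW);

        for (int iter = 0; iter < 100; iter++) {
            /* ... one iteration of the algorithm on &Xt.matrix ... */
        }

        free(buf);
        return 0;
    }

If the per-iteration time stays flat in a standalone run like this, the slowdown lives on the R side (garbage collection or heap fragmentation) rather than in the GSL code.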