Hi all,

I'm posting this here as it discusses an issue with an external C library. If it would be better in R-Help, then I'll repost.

I'm using an external library which I've written, which provides a large set of data (>500MB in a highly condensed format) and the tools to return values from the data. The functionality has been tested call by call and using valgrind and works fine, with no memory leaks. After retrieval, I process the data in R. A specific function is causing a problem that, judging by the symptoms, appears to be related to the garbage collector.

In the C code, a matrix is created using

    PROTECT(retVal = allocMatrix(INTSXP, x, y));

Values are written into this matrix using

    INTEGER(retVal)[translatedOffset]=z;

where "translatedOffset" is a conversion from a row/column pair to an offset as shown in R-exts.pdf.

The last two lines of the function are:

    UNPROTECT(1);
    return retVal;

The shared library was compiled with R CMD SHLIB and is called using .Call, which returns our completed SEXP object to R, where processing continues.

In R, we continue to process the data, replacing -1s with NAs (I couldn't find a way to do that in C that would make it back into R), sorting it, and trimming it. All of these operations are carried out on the original data.

If I carry out the processing step by step from the interpreter, everything is fine and the data comes out how I would expect. But when I run the R code to carry out those steps, every now and again (around one time in five) the returned data is garbage. I'm expecting to receive a bias per iteration that should be -5 <= bias <= 5, but for the garbaged data I'm getting results of the order of hundreds of thousands (e.g. -220627.7). If I call the routine which carries out the processing for one iteration from the interpreter, sometimes I get the correct data, and sometimes (with the same frequency) I get garbage.

There are two possibilities that I can envisage:

1) Race condition: R is starting to execute the R code after the .Call before the .Call has returned, thus the data is corrupted.
2) Garbage collector: the GC is collecting my data between the UNPROTECT(1); call and the assignment to an R variable.

The created matrices can be large (where x > 1000, y > 100000), but the garbage doesn't appear to be related to the size of the matrix.

Any ideas what steps I could take to proceed with this? Or other possibilities than those I've suggested? For reasons of confidentiality I'm unable to release test code, and the large dataset might make testing difficult.

Thanks in advance

--
Jon Senior <jon at restlesslemon.co.uk>
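On the aside about replacing -1s with NAs: R represents the integer NA as the smallest int value, which Rinternals.h exposes as NA_INTEGER, so the replacement can in principle be done on the C side before returning. The following is only a rough sketch in plain C so that it compiles without the R headers; the names NA_INT_STANDIN and mark_missing are hypothetical and not from the original post. In real .Call code the buffer would be INTEGER(retVal) and the constant would be NA_INTEGER.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in for R's NA_INTEGER, which is defined as INT_MIN.
   Hypothetical name; use NA_INTEGER from Rinternals.h in real code. */
#define NA_INT_STANDIN INT_MIN

/* Replace every occurrence of `sentinel` in buf[0..n-1] with the NA
   stand-in and return how many values were replaced. */
size_t mark_missing(int *buf, size_t n, int sentinel)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == sentinel) {
            buf[i] = NA_INT_STANDIN;
            replaced++;
        }
    }
    return replaced;
}
```

Called as mark_missing(INTEGER(retVal), (size_t) x * y, -1) just before UNPROTECT(1), something like this would presumably make the later R-side replacement unnecessary, assuming -1 is never a legitimate data value.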
Sklyar, Oleg (London)
2009-Jan-27 12:25 UTC
[Rd] Return values from .Call and garbage collection
- R is not multithreaded (or so it was), and thus a race condition cannot occur.
- I would think there is no call to the GC at the time the return value is assigned to a variable. The GC is only called within other R calls since, as mentioned above, R is not multithreaded.

The most likely issue is your code itself: out-of-range indexing, failure to initialise all elements of the allocated structure correctly, 1-based rather than 0-based indexing, use of other R variables for initialisation that should have been protected but were not, etc.

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com
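Two of the causes listed in the reply, out-of-range or 1-based indexing and cells that are never written, can be checked mechanically. Below is a minimal sketch in plain C, so it builds without the R headers; checked_offset and fill_all are hypothetical helper names, not part of R's API or of the original poster's code. It assumes 0-based row and column indices and the column-major layout R uses for matrices. Note that allocMatrix(INTSXP, ...) does not initialise the data, so any cell the writer skips holds whatever was in memory before.

```c
#include <limits.h>
#include <stddef.h>

/* Translate a 0-based (row, col) pair into a column-major offset.
   Stores the offset in *out and returns 0 on success; returns -1 and
   leaves *out untouched if the pair is outside the matrix. */
int checked_offset(size_t row, size_t col, size_t nrow, size_t ncol,
                   size_t *out)
{
    if (row >= nrow || col >= ncol)
        return -1;
    *out = row + col * nrow;
    return 0;
}

/* Give every cell a defined value before the real data is written,
   so that skipped cells are recognisable rather than random. */
void fill_all(int *buf, size_t nrow, size_t ncol, int value)
{
    size_t n = nrow * ncol;
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
}
```

In the .Call function, one might call fill_all(INTEGER(retVal), x, y, NA_INTEGER) right after the allocation and route every write through checked_offset, reporting a failure instead of writing. If the intermittent garbage then turns into NAs or reported failures, that would point at the indexing or initialisation rather than at the garbage collector.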