Apologies for any obtuseness in the following. We have been working on Version 2.0 of the randomSurvivalForest CRAN package and we're encountering a perplexing 'memory not mapped' segfault that we believe is "influenced" by GC.

We essentially have two R functions, rsf.default(...) and predict.rsf(...), and two corresponding entry points, rsfGrow(...) and rsfPredict(...), into our C library. These entry points are implemented via the .Call(...) interface. Inputs to the C code are vectors of integers and reals in the form of SEXP pointers, and the output of each .Call(...) is a SEXP list containing vectors of integers and reals.

rsf.default(...) grows a forest of binary trees given survival data, and predict.rsf(...) takes the forest output from rsf.default(...) and uses it to predict on a new data set.

Things go fine until we put the system under stress. We can grow repeatedly without issues, and predict repeatedly without issues, using a loop to stress the system. We detect no memory leaks, and C stack usage is stable.

However, when we grow and predict alternately within the same loop we encounter a segfault, randomly in the R functions. The segfault can occur after hundreds of iterations, but when gctorture is TRUE, the segfault usually occurs much sooner.

In the C code, we protect all incoming SEXP objects, though we don't believe that is necessary for function arguments. The output objects are of course protected, and all PROTECTs are balanced with UNPROTECT statements.

Within the C code, we manage our own memory using malloc(...) and free(...). We detect no memory leaks, and our experience has been that leaks are relatively easy to detect under stress, given the large memory footprint our data structures typically have. Stack usage reported by Cstack_info() is stable.

For clarity, pseudo-code for the trivial stress loop is as follows:

    formula = as.formula(Survrsf(time, status) ~ .)
    data(veteran, package = "randomSurvivalForest")

    for (i in 1:1000) {
      growObject = rsf.default(formula, veteran)
      predictObject = predict.rsf(growObject, veteran)
    }

On single iterations, we have carefully examined the output of each function for coherency. All vectors are initialized and populated with valid data. We can grow repeatedly or predict repeatedly. However, when the two functions are combined in the same loop, we consistently segfault with 'memory not mapped' in either R function, usually in some seemingly random and benign location. For example:

Growing using logrank, Iteration 253 ...
 *** caught segfault ***
address 0x7dbdda88, cause 'memory not mapped'

Traceback:
 1: as.vector(x[, i])
 2: as.data.frame.matrix(model.matrix(as.formula(paste("~ -1 +",
        paste(c(fNames[1:2], predTempNames), collapse = "+"))), data))
 3: as.data.frame(model.matrix(as.formula(paste("~ -1 +",
        paste(c(fNames[1:2], predTempNames), collapse = "+"))), data))
 4: rsf.default(formula = formula, data = dataSet, ntree, mtry, nodesize,
        splitrule = splitrule[j], importance = importance, forest = forest,
        do.trace = do.trace, proximity = proximity, ntime = ntime,
        seed = seed, add.noise = add.noise, predictorWt = predictorWt)
 5: rsf(formula = formula, data = dataSet, ntree, mtry, nodesize,
        splitrule = splitrule[j], importance = importance, forest = forest,
        do.trace = do.trace, proximity = proximity, ntime = ntime,
        seed = seed, add.noise = add.noise, predictorWt = predictorWt)
 6: eval.with.vis(expr, envir, enclos)
 7: eval.with.vis(ei, envir)
 8: source("stress.R")

The segfault above occurs in the grow phase, which does not depend on any output SEXP objects that might potentially be corrupt. However, the creation of SEXP objects (in the predict call) appears to be a necessary condition for failure.

We are wondering whether there is something fundamental missing in our understanding of the interaction between R and C via SEXP objects, memory allocation, persistency, and any garbage collection that may be occurring. Any comments would be greatly appreciated.

Our environment is as follows, though we have seen the same behaviour on an SGI Altix system, a Mac OS X (Intel) system, and with R 2.3.0:

    platform        powerpc-apple-darwin8.8.0
    arch            powerpc
    os              darwin8.8.0
    system          powerpc, darwin8.8.0
    status
    major           2
    minor           4.1
    year            2006
    month           12
    day             18
    svn rev         40228
    language        R
    version.string  R version 2.4.1 (2006-12-18)

--
ubk
ubk2101 at columbia.edu

Udaya B. Kogalur, Ph.D.
Kogalur Shear Corporation
5425 Nestleway Drive, Suite L1
Clemmons, NC 27012
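P.S. For reference, the general shape of one of our .Call entry points is sketched below. The names and the arithmetic are simplified placeholders rather than the actual rsfGrow code, but the protect/unprotect discipline is the same:

    #include <R.h>
    #include <Rinternals.h>

    /* Stripped-down sketch of the shape of our .Call() entry points.
     * The names and the "computation" are placeholders, not the actual
     * rsfGrow() code.  Inputs arrive as SEXP vectors; the result is a
     * list of freshly allocated integer and real vectors.             */
    SEXP rsfGrowSketch(SEXP timeSEXP, SEXP statusSEXP, SEXP ntreeSEXP)
    {
        int n     = LENGTH(timeSEXP);
        int ntree = INTEGER(ntreeSEXP)[0];
        int i, b;

        /* Outputs are allocated and protected before being populated. */
        SEXP ensemble  = PROTECT(allocVector(REALSXP, n));
        SEXP leafCount = PROTECT(allocVector(INTSXP, ntree));
        SEXP result    = PROTECT(allocVector(VECSXP, 2));

        for (i = 0; i < n; i++)
            REAL(ensemble)[i] = REAL(timeSEXP)[i] + INTEGER(statusSEXP)[i];  /* placeholder */
        for (b = 0; b < ntree; b++)
            INTEGER(leafCount)[b] = 1;                                       /* placeholder */

        SET_VECTOR_ELT(result, 0, ensemble);
        SET_VECTOR_ELT(result, 1, leafCount);

        UNPROTECT(3);   /* balanced with the three PROTECTs above */
        return result;
    }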
One possible reason for such problems is if you copy the pointers for, say, attributes, classes, or names, rather than duplicating them. With very few exceptions, mostly in classes, no two R objects of the sort you normally encounter/create/play with should share *any* part of their data structure. For example, such a problem can result if you assign the row names of the input to the output (even if both have the same row names). However, without the actual code, I can't tell.
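To make the kind of sharing I mean concrete, here is a toy sketch (hypothetical names, since I have not seen your code):

    #include <R.h>
    #include <Rinternals.h>

    /* Hypothetical illustration: attaching an attribute of the input
     * directly to the output makes the two objects share structure,
     * which is exactly the kind of sharing that can go wrong.        */
    SEXP attachNames(SEXP out, SEXP in)
    {
        /* Risky: out and in now share one and the same names object.
         *   setAttrib(out, R_NamesSymbol, getAttrib(in, R_NamesSymbol));
         *
         * Safer: give the output its own copy.                        */
        SEXP nm = PROTECT(duplicate(getAttrib(in, R_NamesSymbol)));
        setAttrib(out, R_NamesSymbol, nm);
        UNPROTECT(1);
        return out;
    }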
K. B. Udaya wrote:
> Apologies for any obtuseness in the following. We have been working
> on Version 2.0 of the randomSurvivalForest CRAN package and we're
> encountering a perplexing 'memory not mapped' segfault that we believe
> is "influenced" by GC.
[...]
On Thursday 01 February 2007 2:01 pm, Hin-Tak Leung wrote:
> One possible reason for such problems is if you copy the pointers
> for, say, attributes, classes, or names, rather than duplicating them.
[...]

Hmm, I thought that using setAttrib() would automatically increase the reference count, right?

In particular, I quite often use "pseudo-factor" string vectors, where the string objects are passed through a cache and reused when forming a string vector. The result is a true character() type, but with considerable memory savings. The downside is that the R reference count field is usually saturated.

best

Vladimir Dergachev
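P.S. A toy sketch of what I mean by reusing the string objects (made-up names, not my actual code):

    #include <R.h>
    #include <Rinternals.h>

    /* Toy sketch of the "pseudo-factor" idea: every element of the
     * character vector points at the same CHARSXP, so a long vector of
     * repeated labels costs little memory.                             */
    SEXP repeatedLabel(SEXP nSEXP)
    {
        int n = INTEGER(nSEXP)[0];
        int i;

        SEXP ans   = PROTECT(allocVector(STRSXP, n));
        SEXP label = PROTECT(mkChar("group A"));   /* one shared string object */

        for (i = 0; i < n; i++)
            SET_STRING_ELT(ans, i, label);         /* reuse it, do not re-create it */

        UNPROTECT(2);
        return ans;
    }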
K. B. Udaya wrote:
> Apologies for any obtuseness in the following. We have been working
> on Version 2.0 of the randomSurvivalForest CRAN package and we're
> encountering a perplexing 'memory not mapped' segfault that we believe
> is "influenced" by GC.
[...]
> We are wondering if there is something fundamentally missing in our
> understanding of the interaction between R and C via SEXP objects,
> memory allocation, persistency, and any potential garbage collection
> that may be occurring. Any comments would be greatly appreciated.
>
> Our environment is as follows, though we have seen the same behaviour
> on an SGI Altix system, a Mac OS X (Intel) system, and with R 2.3.0:

If you can run your code on Linux (x86, amd64, ppc32, or ppc64), then consider using valgrind for catching memory access problems. You would need to recompile R with debugging support (-g), and it would be best to compile without optimizations (although -O1 seems to be tolerated).

Running R within valgrind is then as simple as:

    R -d valgrind --vanilla < script.R

or even interactively with:

    R -d valgrind

Best,
Jeff

--
http://biostat.mc.vanderbilt.edu/JeffreyHorner