Johannes Huesing
2014-Oct-06 06:42 UTC
[R] R coredumps when calling functions from rms package
My wife is trying to run some bootstrap model validation. She is using code to the effect of library(rms) residual <- sasxport.get("~/R_boot/residual.xpt") c1.odat <- somers2(residual$risk.5, residual$caco)["C"] # c - statistic for original risk estimate in full sample # fitboot <- function(data) { logitfit= lrm ( ... some model ..., data=data) absrisk.boot <- data$risk.5 * exp(predict(logitfit,data=data, type='lp') - logitfit$coefficients["Intercept"]) ni <- improveProb(x1=data$risk.5, x2=absrisk.boot,y=data$caco) nri.boot <- ni[8] idi.boot <- ni[19] # a half-sentence in the documentation tells us that this NRI is calculated independently of risk limits c1.boot <- somers2(data$risk.5, data$caco)["C"] # c - statistic for absolute risk in the i.th sample # c2.boot <- somers2(absrisk.boot, data$caco)["C"] # c - statistic for new risk estimate in the i.th sample # deltac.boot <- c2.boot - c1.boot # delta-c absrisk.original <- residual$risk.5 * exp(predict(logitfit,data=residual, type='lp') - logitfit$coefficients["Intercept"]) ni.Odat = improveProb(x1=residual$risk.5, x2=absrisk.original, y=residual$caco) nri.odat = ni.Odat[8] idi.odat = ni.Odat[19] # a half-sentence in the documentation tells us that this NRI is calculated independently of risk limits c2.odat <- somers2(absrisk.original, residual$caco)["C"] # c - statistic for new risk score on original sample deltac.odat <- c2.odat - c1.odat # delta c full sample result.both <- unlist(c(nri.boot = nri.boot, idi.boot = idi.boot, deltac.boot = deltac.boot, nri.odat = nri.odat, idi.odat = idi.odat, deltac.odat = deltac.odat)) # combining results of NRI, IDI and C-statistics from full sample, putting the values to the data frame called 'result.both' result.both } set.seed(1111) anziter <- 100 boot.res <- matrix(nrow=anziter, ncol=6) for (i in 1:anziter) { boot.id <- sample(nrow(residual), nrow(residual), replace = TRUE) bootdat <- residual[boot.id, ] boot.res[i, ] <- fitboot(bootdat) } When run within Rgui under Windows, Rgui will terminate showing a dialog promising that Rgui is faulty and she will be notified once this is fixed. When run from Rscript within the Powershell, Rscript will terminate with a message where "Rgui" is replaced by "Rscript". When run on a different machine within Ubuntu (Trusty Tahr), R will complain that Hmisc is in a version not recent enough (which seems to be a problem with the current Ubuntu LTS distribution). After fixing it, R (run within Emacs) will terminate with a memory address error, offering a core dump. Tossing in some trace instructions will reveal that the problems seem to occur after entering the predict() or improveProb() function body. In fact, when run from ESS it will print a backtrace revealing that it was in the middle of predict(). Sometimes (even without altering the seed) R will terminate the script with the message "GC encountered a node (?) with an unknown SEXP type", but this has been observed only in the Rgui calls. Is there a way to trap these occurrences and continue the script, ignoring all runs that lead to unforeseen behaviour? (So far I have suggested to save() the whole matrix of intermediate results to a file after each run, and start the script as a cron job every five minutes, and compile all the results to one big array after sufficiently enough successful iterations.) -- Johannes H?sing There is something fascinating about science. One gets such wholesale returns of conjecture mailto:johannes at huesing.name from such a trifling investment of fact. http://derwisch.wikidot.com (Mark Twain, "Life on the Mississippi")