I am now getting the occasional complaint about survival routines that are not able to
handle big data. I looked in the manuals to try and update my understanding of max
vector size, max matrix, max data set, etc.; but it is either not there or I missed it
(the latter more likely). Is it still .Machine$integer.max for everything? Will that
change? Found where?

I am going to need to go through the survival package and put specific checks in front
of some or all of my .Call() statements, in order to give a sensible message whenever a
boundary is struck. A well-meaning person just posted a suggested "bug fix" to the
github source of one routine where my .C call allocates a scratch vector, suggesting
"resid = double(as.double(n) * nvar)" to prevent a "NA produced by integer overflow"
message, in the code below. A fix is obviously not quite that easy :-)

    resid <- .C(Ccoxscore, as.integer(n),
                as.integer(nvar),
                as.double(y),
                x=as.double(x),
                as.integer(newstrat),
                as.double(score),
                as.double(weights[ord]),
                as.integer(method=='efron'),
                resid= double(n*nvar),
                double(2*nvar))$resid

Terry T.
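[The failure mode, and why the suggested one-line fix does not suffice, can be reproduced at the R prompt; the values of `n` and `nvar` below are illustrative, not from any real data set:]

```r
n    <- 50000L
nvar <- 50000L

# The scratch-vector length overflows 32-bit integer arithmetic:
bad_len <- n * nvar            # NA, with a warning about integer overflow

# Computing the length in double avoids the NA, but the result still
# exceeds .Machine$integer.max, so .C() could not pass such a vector anyway:
ok_len <- as.double(n) * nvar  # 2.5e9
ok_len > .Machine$integer.max  # TRUE
```
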
On Tue, Oct 2, 2018 at 9:43 AM Therneau, Terry M., Ph.D. via R-devel
<r-devel at r-project.org> wrote:
>
> I am now getting the occasional complaint about survival routines that are not able to
> handle big data. I looked in the manuals to try and update my understanding of max
> vector size, max matrix, max data set, etc; but it is either not there or I missed it (the
> latter more likely). Is it still .Machine$integer.max for everything? Will that
> change? Found where?

FWIW, this is the reference I've decided to follow for matrixStats:

  "* For now, keep 2^31-1 limit on matrix rows and columns."

from Slide 5 in Luke Tierney's 'Some new developments for the R engine',
June 24, 2012 (http://homepage.stat.uiowa.edu/~luke/talks/purdue12.pdf).

/Henrik
Does this help a little?

https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Long-vectors

One thing I seem to remember but cannot find a reference for is that
long vectors can only be passed to .Call calls, not .C/.Fortran. I
remember rewriting .C() in my WGCNA package to .Call for this very
reason, but perhaps the restriction has been removed.

Peter

On Tue, Oct 2, 2018 at 9:43 AM Therneau, Terry M., Ph.D. via R-devel
<r-devel at r-project.org> wrote:
>
> I am now getting the occasional complaint about survival routines that are not able to
> handle big data. [...]
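[A check of the sort Terry describes, placed in front of a .C() call, might look like the sketch below; `check_dotC_length` is a hypothetical helper, not code from the survival package:]

```r
# Hypothetical guard to run before a .C() call that allocates an n-by-nvar
# scratch vector. The length is computed in double precision so the test
# itself cannot overflow.
check_dotC_length <- function(n, nvar) {
    len <- as.double(n) * as.double(nvar)
    if (len > .Machine$integer.max)
        stop(gettextf("result would have %.0f elements; .C() cannot pass vectors longer than %d",
                      len, .Machine$integer.max))
    as.integer(len)
}

check_dotC_length(1000, 5)   # 5000
```

(`check_dotC_length(50000, 50000)` would stop with a sensible message instead of the cryptic "NA produced by integer overflow".)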
That is indeed helpful; reading the sections around it largely answered my questions.
Rinternals.h has the definitions

    #define allocMatrix  Rf_allocMatrix
    SEXP Rf_allocMatrix(SEXPTYPE, int, int);

    #define allocVector  Rf_allocVector
    SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);

which answer the further question of what to expect inside C routines invoked by .Call:
vectors may be long, but each matrix dimension is still an int.

It looks like the internal C routines for coxph work on large matrices by pure
serendipity (nrow and ncol each less than 2^31, but with the product > 2^31), while
residuals.coxph fails with an allocation error on the same data. A slight change and it
could just as easily have led to a hard crash. Sigh... I'll need to do a complete code
review. I've been converting .C routines to .Call as convenient; this will force
conversion of many of the rest as a side effect (20 done, 23 to go). As a statistician
my overall response is "haven't they ever heard of sampling?" But as I said earlier, it
isn't just one user.

Terry T.

On 10/02/2018 12:22 PM, Peter Langfelder wrote:
> Does this help a little?
>
> https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Long-vectors
>
> One thing I seem to remember but cannot find a reference for is that
> long vectors can only be passed to .Call calls, not C/Fortran. [...]
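[The consequence of the (SEXPTYPE, int, int) signature of Rf_allocMatrix is visible from plain R as well; a small sketch, assuming the oversized dimension is rejected before any memory is requested:]

```r
# Each matrix dimension must fit in an int (at most 2^31 - 1), so this
# fails immediately with "invalid 'nrow' value", without allocating:
too_tall <- try(matrix(0, nrow = 2^31, ncol = 1), silent = TRUE)
inherits(too_tall, "try-error")   # TRUE

# By contrast, two dimensions each below 2^31 are individually legal even
# when their product exceeds 2^31 elements, which is how the coxph
# routines handled large matrices "by serendipity".
```
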