The Rprofmem facility is currently enabled only if the configuration option --enable-memory-profiling is used. However, the overhead of having it enabled is negligible when profiling is not actually being done, and can easily be made even smaller. So I think it ought to be enabled all the time. I've attached a patch doing this, which also makes a number of other improvements to Rprofmem, which are upward-compatible with the current version. First, it allows for the profiling reports to be printed to the terminal (with Rprintf) as well as or instead of going to a file. This is not only a convenience, but also provides more information when these reports are interspersed with other output, allowing the source of the memory allocations to be better determined. Second, it gives the option for the alloction reports to include the type of the vector and its length, not just the number of bytes allocated. Third, it allows for all vector allocations to be reported, not just those that are large enough to be done with malloc (this distinction doesn't seem important for most uses). It also allows for reports to be produced only when the vector allocated has at least some number of elements, rather than only providing a threshold based on number of bytes. This seems more natural if a user knows that they are dealing with some vectors of length 100000 and wants to see only those (or bigger ones). Also, if either the option for terminal output or for type and length details is used, allocation reports always end with a newline. For some reason, the present code doesn't write a newline if the call stack happens to be empty. (Though not documented, this is clearly deliberate, not a bug.) Though I can't see why one would want this, it is retained as the default behaviour for backward compatibility. Finally, I changed the printf format for values that are cast with (unsigned long) to %lu, rather than %ld, which I think is not correct. I think incorporating this in the upcoming 2.14.0 release would be useful. For instance, the following gives some useful insights into what R is doing with memory allocation: > Rprofmem("",terminal=TRUE,pages=FALSE,details=TRUE,nelem=10000) > f <- function (x) + { cat("Now in f\n"); + s <- sum(x); + cat("sum =",s,"\n"); + x[10] <- 1; + s <- sum(x); + cat("sum =",s,"\n") + x[20] <- 1; + s <- sum(x); + cat("sum =",s,"\n") + y <<- x + NULL + } > f(rep(2,10000)) Now in f RPROFMEM: 40040 (integer 10000):"f" RPROFMEM: 40040 (integer 10000):"f" RPROFMEM: 80040 (double 10000):"f" sum = 20000 RPROFMEM: 80040 (double 10000):"f" sum = 19999 sum = 19998 RPROFMEM: 80040 (double 10000):"f" NULL > y[1] <- 0 > Rprofmem("") You can see the details of my modifications from the output of help(Rprofmem) after applying the patch I have attached, which I've put below: Rprofmem package:utils R Documentation Enable Profiling of R's Memory Use Description: Enable or disable reporting of memory allocation in R. Usage: Rprofmem(filename = "Rprofmem.out", append = FALSE, threshold = 0, nelem = 0, terminal = FALSE, pages = TRUE, details = FALSE) Arguments: filename: The file to which reports of memory allocations are written, or 'NULL' or '""' if reports should not go to a file. append: logical: should the file be appended to rather than overwritten? threshold: numeric: only allocations of vectors with size larger than this number of bytes will be reported. nelem: numeric: only allocations of vectors with at least this many elements will be reported. terminal: logical: should reports be printed on the terminal (as well as possibly written to 'filename')? pages: logical: should allocation of pages for small vectors be reported, and reporting of individual small vector allocations suppressed? details: logical: should details of allocation be reported, rather than only the total number of bytes? Details: The profiler tracks memory allocations, some of which will be to previously used memory and will not increase the total memory use of R. Calling 'Rprofmem' with either 'terminal=TRUE' or with 'filename' something other than 'NULL' or '""' (or both) will enable profiling, with allocation reports going to one or both places. Reports to the terminal are preceded by "RPROFMEM:". Enabling profiling automatically disables any existing profiling to another or the same file or to the terminal. Calling 'Rprofmem' with 'terminal=FALSE' (the default) and 'filename' either 'NULL' or '""' will disable profiling. If 'pages=TRUE' (the default) allocations of individual vectors will be reported only if they are "large", and allocations of pages to hold small vectors will be reported. The size of a page of memory and the size over which a vector is "large" (and hence for which 'malloc' is used) are compile-time constants, by default 2000 and 128 bytes respectively. If 'pages=FALSE', allocations of all vectors with size over 'threshold' and number of elements at least 'nelem' are reported, and page allocations are not reported. A report of an allocation of a vector (to 'filename' and/or the terminal) will contain the number of bytes allocated and the names of functions in the call stack. If 'details=TRUE' (not the default), the type and length of the vector allocated will also be displayed (in parentheses) before the call stack. An allocation of a page for small vectors (when 'pages=TRUE') will result in a report consisting of "new page:" followed by the call stack. When 'terminal=TRUE' or 'details=TRUE', a newline is always written after each allocation report. For backward compatibility, this is otherwise not the case when the call stack is empty. Value: None Note: The memory profiler can be used at the same time as other R and C profilers. See Also: The R sampling profiler, 'Rprof' also collects memory information. 'tracemem' traces duplications of specific objects. The "Writing R Extensions" manual section on "Tidying and profiling R code" Examples: # Reports printed to the terminal, with details, for all vectors of # at least 10 elements. Rprofmem("", terminal=TRUE, pages=FALSE, details=TRUE, nelem=10) v <- numeric(10) v[3] <- 1 u <- v v[3] <- 2 Rprofmem("") ## Not run: # Reports go to a file. Rprofmem("Rprofmem.out", threshold=1000) example(glm) Rprofmem(NULL) noquote(readLines("Rprofmem.out", n=5)) ## End(Not run) -------------- next part -------------- Index: src/library/utils/R/Rprof.R ==================================================================--- src/library/utils/R/Rprof.R (revision 35) +++ src/library/utils/R/Rprof.R (working copy) @@ -21,8 +21,10 @@ invisible(.Internal(Rprof(filename, append, interval, memory.profiling))) } -Rprofmem <- function(filename = "Rprofmem.out", append = FALSE, threshold = 0) +Rprofmem <- function(filename = "Rprofmem.out", append = FALSE, threshold = 0, + nelem = 0, terminal = FALSE, pages = TRUE, details = FALSE) { if(is.null(filename)) filename <- "" - invisible(.Internal(Rprofmem(filename, append, as.double(threshold)))) + invisible(.Internal(Rprofmem(filename, append, as.double(threshold), + as.double(nelem), terminal, pages, details))) } Index: src/library/utils/man/Rprofmem.Rd ==================================================================--- src/library/utils/man/Rprofmem.Rd (revision 35) +++ src/library/utils/man/Rprofmem.Rd (working copy) @@ -10,35 +10,66 @@ Enable or disable reporting of memory allocation in R. } \usage{ -Rprofmem(filename = "Rprofmem.out", append = FALSE, threshold = 0) +Rprofmem(filename = "Rprofmem.out", append = FALSE, + threshold = 0, nelem = 0, + terminal = FALSE, pages = TRUE, details = FALSE) } \arguments{ - \item{filename}{The file to be used for recording the memory - allocations. Set to \code{NULL} or \code{""} to disable reporting. } - \item{append}{logical: should the file be over-written or appended to? } - \item{threshold}{numeric: allocations on R's "large vector" heap - larger than this number of bytes will be reported. - } + \item{filename}{The file to which reports of memory allocations are written, + or \code{NULL} or \code{""} if reports should not go to a file. } + \item{append}{logical: should the file be appended to rather than + overwritten?} + \item{threshold}{numeric: only allocations of vectors + with size larger than this number of bytes will be reported.} + \item{nelem}{numeric: only allocations of vectors + with at least this many elements will be reported.} + \item{terminal}{logical: should reports be printed on the terminal (as + well as possibly written to \code{filename})?} + \item{pages}{logical: should allocation of pages for small vectors + be reported, and reporting of individual small vector allocations + suppressed?} + \item{details}{logical: should details of allocation be reported, + rather than only the total number of bytes?} } \details{ - Enabling profiling automatically disables any existing profiling to - another or the same file. + The profiler tracks memory allocations, some of which will be to previously + used memory and will not increase the total memory use of R. - Profiling writes the call stack to the specified file every time - \code{malloc} is called to allocate a large vector object or to - allocate a page of memory for small objects. The size of a page of - memory and the size above which \code{malloc} is used for vectors are - compile-time constants, by default 2000 and 128 bytes respectively. + Calling \code{Rprofmem} with either \code{terminal=TRUE} or with + \code{filename} something other than \code{NULL} or \code{""} (or both) + will enable profiling, with allocation reports going to one or both + places. Reports to the terminal are preceded by "RPROFMEM:". + Enabling profiling automatically disables any + existing profiling to another or the same file or to the terminal. - The profiler tracks allocations, some of which will be to previously - used memory and will not increase the total memory use of R. + Calling \code{Rprofmem} with \code{terminal=FALSE} (the default) and + \code{filename} either \code{NULL} or \code{""} will disable profiling. + + If \code{pages=TRUE} (the default) allocations of individual + vectors will be reported only if they are "large", and allocations of + pages to hold small vectors will be reported. + The size of a page of memory and the size over which a vector is "large" + (and hence for which \code{malloc} is used) are compile-time constants, + by default 2000 and 128 bytes respectively. + + If \code{pages=FALSE}, allocations + of all vectors with size over \code{threshold} and number of elements + at least \code{nelem} are reported, and page allocations are not reported. + + A report of an allocation of a vector (to \code{filename} and/or the + terminal) will contain the number of bytes allocated and the names of + functions in the call stack. If \code{details=TRUE} (not the default), + the type and length of the vector allocated will also be displayed + (in parentheses) before the call stack. + + An allocation of a page for small vectors (when \code{pages=TRUE}) will + result in a report consisting of "new page:" followed by the call stack. + + When \code{terminal=TRUE} or \code{details=TRUE}, a newline is always + written after each allocation report. For backward compatibility, this + is otherwise not the case when the call stack is empty. } \note{ - The memory profiler slows down R even when not in use, and so is a - compile-time option. -#ifdef Windows - (It is enabled in a standard Windows build of \R.) -#endif The memory profiler can be used at the same time as other \R and C profilers. } \value{ @@ -53,8 +84,18 @@ The "Writing R Extensions" manual section on "Tidying and profiling R code" } -\examples{\dontrun{ -## not supported unless R is compiled to support it. +\examples{ +# Reports printed to the terminal, with details, for all vectors of +# at least 10 elements. +Rprofmem("", terminal=TRUE, pages=FALSE, details=TRUE, nelem=10) +v <- numeric(10) +v[3] <- 1 +u <- v +v[3] <- 2 +Rprofmem("") + +\dontrun{ +# Reports go to a file. Rprofmem("Rprofmem.out", threshold=1000) example(glm) Rprofmem(NULL) Index: src/main/names.c ==================================================================--- src/main/names.c (revision 35) +++ src/main/names.c (working copy) @@ -749,7 +749,7 @@ {"rowSums", do_colsum, 2, 11, 4, {PP_FUNCALL, PREC_FN, 0}}, {"rowMeans", do_colsum, 3, 11, 4, {PP_FUNCALL, PREC_FN, 0}}, {"Rprof", do_Rprof, 0, 11, 4, {PP_FUNCALL, PREC_FN, 0}}, -{"Rprofmem", do_Rprofmem, 0, 11, 3, {PP_FUNCALL, PREC_FN, 0}}, +{"Rprofmem", do_Rprofmem, 0, 11, 7, {PP_FUNCALL, PREC_FN, 0}}, {"tracemem", do_tracemem, 0, 1, 1, {PP_FUNCALL, PREC_FN, 0}}, {"retracemem", do_retracemem, 0, 201, -1, {PP_FUNCALL, PREC_FN, 0}}, {"untracemem", do_untracemem, 0, 101, 1, {PP_FUNCALL, PREC_FN, 0}}, Index: src/main/memory.c ==================================================================--- src/main/memory.c (revision 35) +++ src/main/memory.c (working copy) @@ -220,10 +220,17 @@ # define FORCE_GC 0 #endif -#ifdef R_MEMORY_PROFILING -static void R_ReportAllocation(R_size_t); +/* Declarations relating to Rprofmem */ + +static int R_IsMemReporting; +static int R_MemReportingToTerminal; +static int R_MemPagesReporting; +static int R_MemDetailsReporting; +static FILE *R_MemReportingOutfile; +static R_size_t R_MemReportingThreshold; +static R_len_t R_MemReportingNElem; +static void R_ReportAllocation (R_size_t, SEXPTYPE, R_len_t); static void R_ReportNewPage(); -#endif extern SEXP framenames; @@ -790,9 +797,7 @@ if (page == NULL) mem_err_malloc((R_size_t) R_PAGE_SIZE); } -#ifdef R_MEMORY_PROFILING - R_ReportNewPage(); -#endif + if (R_IsMemReporting) R_ReportNewPage(); page->next = R_GenHeap[node_class].pages; R_GenHeap[node_class].pages = page; R_GenHeap[node_class].PageCount++; @@ -2312,6 +2317,13 @@ } } + if (R_IsMemReporting) { + if (!R_MemPagesReporting + || size > 0 && node_class >= NUM_SMALL_NODE_CLASSES) + R_ReportAllocation (sizeof(SEXPREC_ALIGN) + size * sizeof(VECREC), + type, length); + } + /* save current R_VSize to roll back adjustment if malloc fails */ old_R_VSize = R_VSize; @@ -2351,9 +2363,6 @@ s = malloc(sizeof(SEXPREC_ALIGN) + size * sizeof(VECREC)); } if (s != NULL) success = TRUE; -#ifdef R_MEMORY_PROFILING - R_ReportAllocation(sizeof(SEXPREC_ALIGN) + size * sizeof(VECREC)); -#endif } if (! success) { double dsize = (double)size * sizeof(VECREC)/1024.0; @@ -3247,63 +3256,82 @@ int attribute_hidden (IS_CACHED)(SEXP x) { return IS_CACHED(x); } /*******************************************/ -/* Non-sampling memory use profiler - reports all large vector heap - allocations and all calls to GetNewPage */ +/* Non-sampling memory use profiler reports vector allocations and/or + calls to GetNewPage */ /*******************************************/ -#ifndef R_MEMORY_PROFILING - -SEXP attribute_hidden do_Rprofmem(SEXP call, SEXP op, SEXP args, SEXP rho) +static void R_OutputStackTrace (void) { - error(_("memory profiling is not available on this system")); - return R_NilValue; /* not reached */ -} + RCNTXT *cptr; + int newline; -#else -static int R_IsMemReporting; /* Rboolean more appropriate? */ -static FILE *R_MemReportingOutfile; -static R_size_t R_MemReportingThreshold; + newline = R_MemReportingToTerminal | R_MemDetailsReporting; -static void R_OutputStackTrace(FILE *file) -{ - int newline = 0; - RCNTXT *cptr; - for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) { if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN)) && TYPEOF(cptr->call) == LANGSXP) { SEXP fun = CAR(cptr->call); if (!newline) newline = 1; - fprintf(file, "\"%s\" ", - TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) : - "<Anonymous>"); + if (R_MemReportingOutfile != NULL) + fprintf(R_MemReportingOutfile, "\"%s\" ", + TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) : + "<Anonymous>"); + if (R_MemReportingToTerminal) + Rprintf("\"%s\" ", + TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) : + "<Anonymous>"); } } - if (newline) fprintf(file, "\n"); + + if (newline) { + if (R_MemReportingOutfile != NULL) + fprintf(R_MemReportingOutfile, "\n"); + if (R_MemReportingToTerminal) + Rprintf("\n"); + } } -static void R_ReportAllocation(R_size_t size) +static void R_ReportAllocation (R_size_t size, SEXPTYPE type, R_len_t length) { - if (R_IsMemReporting) { - if(size > R_MemReportingThreshold) { - fprintf(R_MemReportingOutfile, "%ld :", (unsigned long) size); - R_OutputStackTrace(R_MemReportingOutfile); - } + if (size > R_MemReportingThreshold && length >= R_MemReportingNElem) { + if (R_MemReportingOutfile != NULL) { + if (R_MemDetailsReporting) + fprintf (R_MemReportingOutfile, "%lu (%s %lu):", + (unsigned long) size, + type==intCHARSXP ? "char" : type2char(type), + (unsigned long) length); + else + fprintf (R_MemReportingOutfile, "%lu :", + (unsigned long) size); + } + if (R_MemReportingToTerminal) { + if (R_MemDetailsReporting) + Rprintf ("RPROFMEM: %lu (%s %lu):", + (unsigned long) size, + type==intCHARSXP ? "char" : type2char(type), + (unsigned long) length); + else + Rprintf ("RPROFMEM: %lu :", + (unsigned long) size); + } + R_OutputStackTrace(); } return; } static void R_ReportNewPage(void) { - if (R_IsMemReporting) { - fprintf(R_MemReportingOutfile, "new page:"); - R_OutputStackTrace(R_MemReportingOutfile); + if (R_MemPagesReporting) { + if (R_MemReportingOutfile != NULL) + fprintf(R_MemReportingOutfile, "new page:"); + if (R_MemReportingToTerminal) + Rprintf("RPROFMEM: new page:"); + R_OutputStackTrace(); } return; } -static void R_EndMemReporting() +static void R_EndMemReporting(void) { if(R_MemReportingOutfile != NULL) { /* does not fclose always flush? */ @@ -3315,39 +3343,66 @@ return; } -static void R_InitMemReporting(SEXP filename, int append, - R_size_t threshold) +static void R_InitMemReporting(SEXP filename, int append) { - if(R_MemReportingOutfile != NULL) R_EndMemReporting(); - R_MemReportingOutfile = RC_fopen(filename, append ? "a" : "w", TRUE); - if (R_MemReportingOutfile == NULL) - error(_("Rprofmem: cannot open output file '%s'"), filename); - R_MemReportingThreshold = threshold; + if (R_IsMemReporting) + R_EndMemReporting(); + + if (strlen(CHAR(filename)) > 0) { + R_MemReportingOutfile = RC_fopen(filename, append ? "a" : "w", TRUE); + if (R_MemReportingOutfile == NULL) + error(_("Rprofmem: cannot open output file '%s'"), filename); + } + else + R_MemReportingOutfile = NULL; + R_IsMemReporting = 1; + return; } SEXP attribute_hidden do_Rprofmem(SEXP call, SEXP op, SEXP args, SEXP rho) { - SEXP filename; - R_size_t threshold; + SEXP filename, ap; int append_mode; checkArity(op, args); - if (!isString(CAR(args)) || (LENGTH(CAR(args))) != 1) + + ap = args; + if (!isString(CAR(ap)) || (LENGTH(CAR(ap))) != 1) error(_("invalid '%s' argument"), "filename"); - append_mode = asLogical(CADR(args)); - filename = STRING_ELT(CAR(args), 0); - threshold = REAL(CADDR(args))[0]; - if (strlen(CHAR(filename))) - R_InitMemReporting(filename, append_mode, threshold); + filename = STRING_ELT(CAR(ap), 0); + + ap = CDR(ap); + append_mode = asLogical(CAR(ap)); + + ap = CDR(ap); + if (!isReal(CAR(ap)) || (LENGTH(CAR(ap))) != 1) + error(_("invalid '%s' argument"), "threshold"); + R_MemReportingThreshold = REAL(CAR(ap))[0]; + + ap = CDR(ap); + if (!isReal(CAR(ap)) || (LENGTH(CAR(ap))) != 1) + error(_("invalid '%s' argument"), "nelem"); + R_MemReportingNElem = REAL(CAR(ap))[0]; + + ap = CDR(ap); + R_MemReportingToTerminal = asLogical(CAR(ap)); + + ap = CDR(ap); + R_MemPagesReporting = asLogical(CAR(ap)); + + ap = CDR(ap); + R_MemDetailsReporting = asLogical(CAR(ap)); + + if (R_MemReportingToTerminal || strlen(CHAR(filename)) > 0) + R_InitMemReporting(filename, append_mode); else R_EndMemReporting(); + return R_NilValue; } -#endif /* R_MEMORY_PROFILING */ - /* RBufferUtils, moved from deparse.c */ #include "RBufferUtils.h" Index: doc/manual/R-exts.texi ==================================================================--- doc/manual/R-exts.texi (revision 35) +++ doc/manual/R-exts.texi (working copy) @@ -5273,8 +5273,8 @@ Measuring memory use in @R{} code is useful either when the code takes more memory than is conveniently available or when memory allocation and copying of objects is responsible for slow code. There are three -ways to profile memory use over time in @R{} code. All three require - at R{} to have been compiled with @option{--enable-memory-profiling}, +ways to profile memory use over time in @R{} code. All except @code{Rprofmem} +require @R{} to have been compiled with @option{--enable-memory-profiling}, which is not the default. All can be misleading, for different reasons.