Jonathan Greenberg
2013-Jun-20 14:45 UTC
[R] Determining the maximum memory usage of a function
Folks:

I apologize for the cross-posting between r-help and r-sig-hpc, but I figured this question was relevant to both lists. I'm writing a function to be applied to an input dataset that will be broken up into chunks, both for memory management reasons and for parallel execution. For a given function, I am trying to determine what the *maximum* memory usage during its execution is (the peak may fall not at the beginning or the end of the function, but somewhere in the middle), so I can "plan" for the chunk size (e.g. have a table of chunk size vs. max memory usage).

Is there a trick for determining this?

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
What I would do is use "memory.size()" to get the amount of memory being used (note that memory.size() is only meaningful on Windows). Call it at the beginning of the function to establish a baseline, and then at other points in the code to see the difference from the baseline, keeping track of the maximum difference. I am not sure that just reading the memory usage at the end would be sufficient, since there may be some garbage collection in between, or you might be creating some large objects and then deleting/reusing them. So take readings after large chunks of code to see what is happening.

On Thu, Jun 20, 2013 at 10:45 AM, Jonathan Greenberg <jgrn@illinois.edu> wrote:
> [original question quoted in full above; trimmed]
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
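The checkpointing idea above can be sketched portably with gc() in place of the Windows-only memory.size(); the helper names mem_now_mb and track_peak below are illustrative, not from the thread:

```r
# Current memory used by R, in Mb. gc() returns a matrix whose second
# column is the "(Mb)" used for Ncells (cons cells) and Vcells (vector heap).
mem_now_mb <- function() sum(gc()[, 2])

# Record a baseline, then call checkpoint() after large chunks of code;
# peak() reports the largest difference from the baseline seen so far.
track_peak <- function() {
  base <- mem_now_mb()
  peak <- 0
  checkpoint <- function() {
    d <- mem_now_mb() - base
    if (d > peak) peak <<- d
    invisible(d)
  }
  list(checkpoint = checkpoint, peak = function() peak)
}
```

Note that each call to mem_now_mb() itself triggers a garbage collection, so (as Jim warns) the readings only reflect live objects at each checkpoint, not short-lived temporaries between checkpoints.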
Prof Brian Ripley
2013-Jun-20 15:34 UTC
[R] Determining the maximum memory usage of a function
On 20/06/2013 15:45, Jonathan Greenberg wrote:
> [original question quoted in full above; trimmed]

Note that your subject line and the body of your message are different questions. You cannot determine the memory usage of any part of R, in particular not of a function's execution: objects are shared, and garbage collection happens asynchronously.

However, gc() is a good start. Call gc(reset = TRUE) before and gc() after your task, and you will see the maximum extra memory used by R in the interim. (This does not include memory malloc'ed by compiled code, which is much harder to measure as it gets re-used.) Note that calls to gc() do affect the usage, and the usage also depends on what had already been done in the session (as the trigger values adapt to usage).
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
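The gc(reset = TRUE) recipe above might be wrapped up as follows (a sketch; the function name peak_extra_mb is illustrative, the sixth column of the matrix gc() returns is assumed to be "max used" in Mb, and the figure excludes memory allocated by compiled code, as noted above):

```r
# Peak extra memory (in Mb) used by R while running task(), measured via
# R's "max used" counters. gc(reset = TRUE) resets those counters to the
# current usage; a later gc() reports the maximum reached since the reset.
peak_extra_mb <- function(task) {
  gc(reset = TRUE)          # reset the max-used counters to current usage
  base <- sum(gc()[, 6])    # baseline: "max used" (Mb) right after the reset
  task()
  sum(gc()[, 6]) - base     # peak extra Mb reached while task() ran
}
```

For the original chunk-size question, calling this with the same function on several chunk sizes would give the planned table of chunk size vs. max memory usage.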