Greetings,

I have a user who is running an R program on two different Linux systems. For the most part they are very similar in terms of hardware and 64-bit OS, yet they perform significantly differently. On one box the program peaks at over 20 GB of RAM, fluctuates around 15 GB, and the job finishes in a few hours. The second box has even more memory available to it, yet the exact same program with the exact same data set peaks at 7 GB of RAM, runs around 5 GB, and takes 3x longer to finish!

I did some research, and from what I can tell R should simply use as much memory as it needs on Linux, so most of the "help" I found online was Windows-related (e.g. --max-mem-size) and not very useful to me. I looked at the ulimits and everything looks correct (or at least comparable to the ulimits on the system that works correctly). I have also checked other tidbits here and there, but nothing seems to be of use. I also confirmed that a single user can allocate large quantities of memory (e.g. Matlab and SAS were both able to allocate 20+ GB of memory), so I don't think it is a user restriction imposed by the OS.

The only difference I have found between the two boxes that really stands out is that the system that works runs RHEL proper with R compiled from source, while the one that doesn't allocate all of the memory had R installed via the EPEL RPM on CentOS. Compiling R on the CentOS system is on the try-this list, but before I spend that time trying to compile I thought I would ask a few questions.

1) Does anyone know why I might be seeing this strange behavior? 5-7 GB of RAM is clearly over any 32-bit limitation, so I don't think it has anything to do with that. It could be a RHEL vs. CentOS thing, but that seems very strange to me.

2) When I compile from source to test this, is there a specific option I should pass to ensure R can use the maximum available memory?

Thank you.

Chris Stackpole
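A minimal sanity check along these lines can be run from inside R itself on each box (an illustrative sketch; the ~16 GB target is arbitrary and the messages are placeholders, not part of the original report):

## Confirm the R build itself is 64-bit, then try one large allocation.
R.version$arch            # expect "x86_64" on a 64-bit build
.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on 32-bit

## numeric() zero-fills, so this really commits roughly 16 GB
## (2e9 doubles * 8 bytes each).
x <- try(numeric(2e9), silent = TRUE)
if (inherits(x, "try-error")) {
  message("allocation failed -- suspect limits outside R (ulimit, cgroups) or the build")
} else {
  message("allocation succeeded")
  rm(x)
  gc()
}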
On 08/12/2013 10:18 AM, Stackpole, Chris wrote:
> [snip -- original message quoted in full above]

What does "ulimit -a" report on both of these machines?

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca
Tel: 416.864.5776  Fax: 416.864.3016
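For comparison, the limits the R process itself actually inherited can be queried from inside the same session, so that any differences introduced by a batch or login environment show up too (a sketch; ulimit is a shell builtin, so it is reached through the shell that system() spawns):

## The shell started by system() inherits the R process's limits, so this
## reports what R is actually running under, not what a fresh login shell has.
system("ulimit -a")     # all limits for the R process
system("ulimit -v")     # virtual memory limit in kB, or "unlimited"
system("ulimit -m")     # resident set size limit; often not enforced on Linux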
> From: Jack Challen [mailto:jack.challen at ocsl.co.uk]
> Sent: Wednesday, August 14, 2013 10:45 AM
> Subject: RE: Memory limit on Linux?
>
> (I'm replying from a horrific WebMail UI. I've attempted to maintain
> what I think is sensible quoting. Hopefully it reads ok.)

[snip]

> If all users are able to allocate that much RAM in a single process
> (e.g. "top" shows the process taking 20 GBytes) then it's very unlikely
> to be an OS- or user-specific restriction (there are exceptions to that
> if, e.g., the R job is submitted in a different shell environment [e.g.
> batch queuing]). Your ulimits look sensible to me.

That is what I thought as well. I was looking for a cgroup limitation or something similar that might be causing problems, but I don't see anything like that.

> > The only difference I have found between the two boxes that really
> > stands out is that the system that works runs RHEL proper and has R
> > compiled, but the one that doesn't allocate all of the memory was
> > installed via the EPEL RPM on CentOS. Compiling R on the CentOS system
> > is on the try-this list, but before I spend that time trying to compile
> > I thought I would ask a few questions.
>
> I would look there first. It seems (from the first quoted bit) that your
> problem is specific to that version of R on that machine, as Matlab can
> gobble up RAM happily. (I do have a very simple bit of C kicking about
> here, specifically for testing the memory allocation limit of a system,
> which you could have if you really wanted.)

Thanks for the offer; I may take you up on that. I am downloading the latest and greatest version of R right now for compiling purposes. If that doesn't work, I may try your program just to see what results it gives.

[snip]

> > 2) When I compile from source to test this, is there a specific option
> > I should pass to ensure max usage?
>
> Absolutely no idea, I'm afraid. There is an --enable-memory-profiling
> option, but I doubt that'll solve your problem and it'll probably just
> slow R down. I'd simply give compiling it a go.

I will report back to the list after I get R compiled. Thanks!
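If the rebuild is configured with --enable-memory-profiling, one thing that option buys (besides the slowdown mentioned above) is the ability to log large allocations from inside R, which can help show where the two machines diverge. A sketch, with the file name, threshold, and workload placeholder chosen arbitrarily:

## Requires an R built with --enable-memory-profiling; otherwise
## Rprofmem() stops with an error.
Rprofmem("allocations.out", threshold = 1024 * 1024)  # log allocations >= 1 MB

result <- someLongRunningAnalysis()   # placeholder for the user's real workload

Rprofmem(NULL)                        # stop logging
## Each line of allocations.out records the size of a large allocation
## and the call stack that triggered it.
readLines("allocations.out", n = 10)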
Greetings,

Just a follow-up on this problem. I am not sure exactly where the problem lies, but we now think the cause is the user's code and/or one of the CRAN packages it uses.

We have been getting fairly familiar with R recently, and we can allocate and load large datasets into 10+ GB of memory. One of our other users runs a program at the start of every week and claims he regularly gets 35+ GB of memory (indeed, when we tested it on this week's data set it was just over 30 GB). So it is clear that this is not a problem with R, the system, or any artificial limit that we can find.

So why is there a difference between one system and the other in terms of usage on what should be the exact same code? First, I am not convinced it really is the same dataset, even though that is the claim (I don't have access to verify, for various reasons). Second, he is using some packages from the CRAN repos. A few months ago we found an instance where a bad build of one library was behaving strangely; I reran the compile for that library and it straightened out. I am wondering if that is the case again. The user is looking into the package set now.

In short, we don't yet have a solution to this specific problem, but at least I know for certain it isn't the system or R. Now that I can take a solid stance on those facts, I have good ground to approach the user and politely say, "Let's look at how we might be able to improve your code."

Thanks to everyone who helped me debug this issue. I do appreciate it.

Chris Stackpole
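One way to narrow down whether the difference comes from the package builds or from the data is to compare both machines directly (a sketch; the package name and data file name are placeholders, not from the thread):

## Run on both machines and diff the output: R build details plus the
## exact versions of the packages the job loads.
sessionInfo()
packageVersion("somePackage")          # placeholder: repeat for each package the job uses

## Confirm both machines really see the same data: compare object sizes
## after loading (the file name is a placeholder).
load("weekly_dataset.RData")
sizes <- sapply(ls(), function(nm) as.numeric(object.size(get(nm))))
print(sort(sizes, decreasing = TRUE))  # bytes per object

gc()   # the "max used" column shows the session's peak memory so far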