Kévin Pemonon
2023-Aug-16 09:22 UTC
[R] R - Problem retrieving memory used after gc() using arrow library
Hello,

I'm using R version 4.1.3 on Windows 10 and I'm having a problem with memory usage.

I need to use the arrow and dplyr libraries in a program. When I compare the memory reported by the Windows Task Manager with the value returned by memory.size(max=F), the Task Manager figure is much larger: 243.5 MB (screenshot: <https://i.stack.imgur.com/nlWnL.png>) versus 75.77 MB from memory.size(max=F). This is despite the fact that I delete the objects I created with rm() and then call gc() to recover the memory they used.

Attached is the R code, with and without output, that I used to reproduce my problem.

Do you think this memory difference is normal? Could it be caused by the libraries used and/or by bad practices in using the R language? I'd like to understand why there's a difference in memory used between the Windows task manager and R's memory.size(max=F) function.

Thank you for your help, and I remain at your disposal for any further information you may require.

Best regards,

Attachments:
r_code_with_output.txt <https://stat.ethz.ch/pipermail/r-help/attachments/20230816/1e28644e/attachment.txt>
r_code.txt <https://stat.ethz.ch/pipermail/r-help/attachments/20230816/1e28644e/attachment-0001.txt>
Ivan Krylov
2023-Aug-16 10:57 UTC
[R] R - Problem retrieving memory used after gc() using arrow library
On Wed, 16 Aug 2023 11:22:00 +0200 Kévin Pemonon <kevinpemonon at gmail.com> wrote:

> I'd like to understand why there's a difference in memory used
> between the Windows task manager and R's memory.size(max=F) function.

When R was initially ported to Windows, the then-popular version of the system memory allocator was not a good fit for R. R allocates and frees lots of objects, sometimes small ones. The Windows 95-era allocator had backwards-compatibility obligations to lots of incorrectly-written programs that sometimes used memory after freeing it and poked at undocumented implementation details a lot [*].

I don't know the exact reasons (my family didn't even have a computer back then), but it seems that R couldn't make full use of the computer memory without bringing in its own memory allocator (a copy of Doug Lea's malloc). It is this particular allocator that memory.size() has access to.

Nowadays, there are many ways for a Windows process to have memory allocated for it, and not all of them are under control of R. Apache Arrow, being "a platform for in-memory data", probably allocates its own memory without asking R to do it.

Meanwhile, the Windows implementation of malloc() has improved, so R-4.2.0 got rid of its own copy (which also means no more memory.size()). You are welcome to trust the task manager.

-- 
Best regards,
Ivan

[*] See, e.g., the free bonus chapters to "The Old New Thing" by Raymond Chen:
https://www.informit.com/content/images/9780321440303/samplechapter/Chen_bonus_ch01.pdf
https://www.informit.com/content/images/9780321440303/samplechapter/Chen_bonus_ch02.pdf
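
To illustrate the point about memory that is not under R's control, here is a minimal sketch that prints R's own heap usage next to the bytes currently held by Arrow's default memory pool. The helper name report_memory() is hypothetical and is not part of the original attachments. The sketch assumes the arrow package exposes default_memory_pool() with a bytes_allocated field, as its documentation describes; if arrow is not installed, that value is simply reported as NA. Neither number has to match the Task Manager, since other allocations in the process are counted by neither.

```r
# Hypothetical helper (not from the original thread): compare what R's
# garbage collector accounts for with what Arrow's memory pool holds.
report_memory <- function() {
  g <- gc()                    # runs a collection and returns a usage matrix
  r_heap_mb <- sum(g[, 2])     # column 2 is the "(Mb)" figure next to "used"

  arrow_mb <- NA_real_
  # Assumption: arrow::default_memory_pool() and its bytes_allocated field
  # are available in the installed arrow version.
  if (requireNamespace("arrow", quietly = TRUE)) {
    pool <- arrow::default_memory_pool()
    arrow_mb <- as.numeric(pool$bytes_allocated) / 1024^2
  }

  c(r_heap_mb = r_heap_mb, arrow_pool_mb = arrow_mb)
}

print(report_memory())
```

Calling report_memory() before and after rm() and gc() should show whether the memory that remains is held by R's heap or by Arrow's pool.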