Dear list,

I'm sorry to keep coming back to this time and again, but this bug is still not fixed even though the root cause has been around for 2-3 years now. As the number of packages that depend on XML grows, I thought it deserves some wider attention.

I did my best to make reproducing the issue as easy as possible:

https://github.com/omegahat/XML/issues/4
http://goo.gl/aV17Lv

But as I'm not familiar with C, I'm out of ideas about what else to do. Duncan has been really dedicated and helpful so far, but unfortunately he seems to have too little time to really dig into this himself. So I thought I'd try to raise the attention of other developers who have the skills to fix this.

Apparently, the issue is caused by the way the memory consumed by the underlying C objects/pointers is released (or not released, for that matter).

I'd so much appreciate if someone could have a look at this. If I can be of any help whatsoever, please let me know!

Thanks and best regards,
Janko
On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson <janko.thyson at gmail.com> wrote:
> I'd so much appreciate if someone could have a look at this. If I can be of
> any help whatsoever, please let me know!

Your current code uses various functions from XML and rvest, so it is not a *minimal* reproducible example. Even if you are unfamiliar with C, you should be able to investigate exactly which function in the XML package you think has issues. Once you have found the problematic R function, inspect the source code or use debug() to see if you can narrow it down even further, preferably to a particular call to C.

Moreover, you should create a reproducible example that allows us (and you) to test whether this problem appears on other systems such as OSX or Linux. Development and debugging on Windows is very painful, so your Windows-only example is not too helpful. Making people use Windows is not a good strategy for getting help.

If the "leak" does not appear on other systems, it is likely a problem in the libxml2 Windows library on CRAN. In that case we can try to link against another build. On the other hand, if the problem does appear across systems, and you have provided a minimal reproducible example that pinpoints the problematic C function, we can help you review/debug the C code to see if/where some allocated object is not properly freed.
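The narrowing step described above could be approached with a small harness like the following. This is only a sketch: `track_memory` and its `probe` argument are hypothetical names, not part of XML or base R. The default probe reads R's own heap via `gc()`, which would not show memory held by a C library such as libxml2; for a C-level leak one would presumably pass a probe that reads the process size from the operating system instead.

```r
# Hypothetical helper: call `f` repeatedly and take a memory reading
# every `every` calls, so that a steady upward trend becomes visible.
# ASSUMPTION: the default `probe` sums the "(Mb)" used column of gc(),
# i.e. R-managed memory only. It does not see C-level allocations.
track_memory <- function(f, n = 1000, every = 100,
                         probe = function() sum(gc()[, 2])) {
  stopifnot(n >= every)
  checkpoints <- seq(every, n, by = every)
  readings <- numeric(length(checkpoints))
  k <- 0
  for (i in seq_len(n)) {
    f()
    if (i %% every == 0) {
      k <- k + 1
      readings[k] <- probe()
    }
  }
  data.frame(iteration = checkpoints, memory = readings)
}

# A base R call stands in here for the single suspected XML function.
res <- track_memory(function() nchar(paste(letters, collapse = "")),
                    n = 500, every = 100)
print(res)
```

Wrapping exactly one candidate function in `f` at a time, and repeating the run on another operating system, is one way to arrive at the minimal example asked for above.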
Duncan Temple Lang
2014-Dec-15 04:07 UTC
[Rd] Significant memory leak when using XML on Windows
Janko and I have been in touch. This is, I believe, a Windows-specific issue and a compilation issue. I could be wrong, but that is my impression from other reports.

When I have time (?! :-)), I will deal with it. Hopefully this will be very soon.

Thanks Janko.

D.
Thanks a lot for answering. Before I get into it, please note that everything below bears the big caption "Thanks for trying to help me at all".

1) Yeah, those examples: quite hard to satisfy everyone's needs ;-) While one side complained that my past examples regarding this issue were not informative enough, others didn't like the more elaborate version (as seems to be the case for you). I simply tried to make it as easy as possible for people to see what's actually going on, so they wouldn't have to program their own stuff for things like reading the actual memory consumed by the Rterm process etc. If you prefer plain vanilla, though, I guess this would be it:

    library(XML)  # provides xmlParse() and free()

    memoryLeak <- function(
      x = system.file("exampleData", "mtcars.xml", package = "XML"),
      n = 5000,
      free_doc = FALSE,
      rm_doc = FALSE,
      use_gc = FALSE
    ) {
      lapply(seq_len(n), function(ii) {
        doc <- xmlParse(x)
        if (free_doc) free(doc)  # release the C-level document
        if (rm_doc) rm(doc)      # drop the R reference
        if (use_gc) gc()         # force a garbage collection
        NULL
      })
    }

2) If I knew my way around OSX or Linux, I would be happy to go with your suggestions, but as I don't, that's unfortunately out of reach for me. IMO, a deeper level of cross-platform expertise should **not** be a general prerequisite for asking for help, even at r-devel (as opposed to r-help). However, AFAIK from past conversations with Duncan, the problem is indeed Windows-specific, as on all his non-Windows infrastructure (definitely Linux, possibly OSX) everything went fine.

3) The same goes for the level of expertise in C. After all, R is not C. I totally agree that the more programming languages one knows, the better. But again: I don't think that knowing your way around C should be a prerequisite for asking for help when an *R function* interfacing C causes trouble. Requiring this would sort of oppose R's nature/paradigm of being an awesome "top-level" interfacing language. But I'll try to narrow the problem down on a C level if I can help you with that.
4) Both Duncan and Hadley have suggested that libxml2 is indeed causing the problem. So trying to link against another build would possibly be a great way to start! How would I go about that?

Thanks if you should take the time to look into this further!

Janko
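To compare the effect of the three flags of memoryLeak() from point 1, one could run every combination and record the change in memory for each. The sketch below is an assumption-laden illustration: `run_flag_grid`, `leak_fun` and `probe` are hypothetical names, and the default probe only reads R's heap through `gc()`, so for a leak inside libxml2 it would have to be replaced by an OS-level reading of the Rterm process. A stand-in function is used so that the sketch runs without the XML package.

```r
# Hypothetical wrapper: `leak_fun` is expected to accept the arguments
# n, free_doc, rm_doc and use_gc, as memoryLeak() does.
# ASSUMPTION: the default `probe` reports R-managed memory in Mb only.
run_flag_grid <- function(leak_fun, n = 1000,
                          probe = function() sum(gc()[, 2])) {
  grid <- expand.grid(free_doc = c(FALSE, TRUE),
                      rm_doc   = c(FALSE, TRUE),
                      use_gc   = c(FALSE, TRUE))
  grid$growth <- NA_real_
  for (r in seq_len(nrow(grid))) {
    before <- probe()
    leak_fun(n = n,
             free_doc = grid$free_doc[r],
             rm_doc   = grid$rm_doc[r],
             use_gc   = grid$use_gc[r])
    grid$growth[r] <- probe() - before  # change in the reading for this run
  }
  grid
}

# Stand-in with the same arguments; it parses nothing and leaks nothing.
stub <- function(n, free_doc, rm_doc, use_gc) invisible(NULL)
res <- run_flag_grid(stub, n = 10)
print(res)
```

With the XML package loaded, `run_flag_grid(memoryLeak, n = 5000)` would exercise the real function; if growth stays high even with all three flags set to TRUE, that would be consistent with the memory being held below the R level.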