Andreas Kersting
2021-Mar-30 05:27 UTC
[Rd] custom allocators, Valgrind and uninitialized memory
Hi Simon, Yes, if this was acceptable on CRAN, I would agree that calling VALGRIND_MAKE_MEM_DEFINED() in my code would be sufficient. But since Tomas said, "So I think that your code using your custom allocator needs to initialize allocated memory to be correct. If your allocator initializes the memory, that is fine, but unnecessary.", I am not sure if it is acceptable. Regards, Andreas 2021-03-30 00:39 GMT+02:00 "Simon Urbanek" <simon.urbanek at R-project.org>:> Andres, > > correct me if I'm wrong, but the issue here is not initialisation but rather valgrind flagging. You simply have to call VALGRIND_MAKE_MEM_DEFINED() in your code after allocVector3() to declare that you have initialised the memory - or am I missing something? > > Cheers, > Simon > > > >> On 30/03/2021, at 9:18 AM, Andreas Kersting <r-devel at akersting.de> wrote: >> >> Hi Tomas, >> >> Thanks for sharing your view on this! I understand your point, but still I think that the current situation is somewhat unfortunate: >> >> I would argue that mmap() is a natural candidate to be used together with allocVector3(); it is even mentioned explicitly here: https://github.com/wch/r-source/blob/trunk/src/main/memory.c#L2575-L2576 >> >> However, when using a non-anonymous mapping, i.e. we want mmap() to initialize the memory e.g. from a file or a POSIX shared memory object, this means that we need to use MAP_FIXED in case we are obliged to initialize the memory AFTER allocVector3() returned it; at least I cannot think of a different way to achieve this. >> >> The use of MAP_FIXED >> - is discouraged (e.g. https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/mmap.2.html) >> - requires two calls to mmap(): (1) to obtain the (anonymous) memory to be handed out by the custom allocater and (2) to actually map the file "over" the just allocated vector (using MAP_FIXED), which will overwrite the vector header; hence, we need to first back it up to later restore it >> >> I have implemented my function using MAP_FIXED here: https://github.com/gfkse/bettermc/commit/f34c4f4c45c9ab11abe9b9e9b8b48064f128d731#diff-7098a5dde34efab163bbef27fe32f95c29e76236649479985d09c70100e4c737R278-R323 >> >> This solution, to me, is much more complicated and hacky than my previous one, which assumed it is OK to hand out already initialized memory directly from allocVector3(). >> >> Regards, >> Andreas >> >> >> 2021-03-29 10:41 GMT+02:00 "Tomas Kalibera" <tomas.kalibera at gmail.com>: >>> Hi Andreas, >>> On 3/26/21 8:48 PM, Andreas Kersting wrote: >>>> Hi Dirk, > > Sure, let me try to explain: > > CRAN ran the tests of my package using R which was configured > --with-valgrind-instrumentation > 0. Valgrind reported many errors > related to the use of supposedly uninitialized memory and the CRAN > team asked me to tackle these. > > These errors are false positives, because I pass a custom allocator > to allocVector3() which hands out memory which is already > initialized. However, this memory is explicitly marked for Valgrind > as uninitialized by allocVector3(), and I do not initialize it > subsequently, so Valgrind complains. > > Now I am asking if it is correct that allocVector3() marks memory as > uninitialized/undefined, even if it comes from a custom allocator. > This is because allocVector3() cannot know if the memory might > already by initialized. >>> I think the semantics of allocVector/allocVector3 should be the same regardless of whether custom allocators are used. The semantics of allocVector is to provide uninitialized memory (non-pointer types, Writing R Extensions 5.9.2). Therefore, it is the caller who needs to take care of initialization. This is also the semantics of "malloc" and Rallocators.h says "custom_alloc_t mem_alloc; /* malloc equivalent */". >>> >>> So I think that your code using your custom allocator needs to initialize allocated memory to be correct. If your allocator initializes the memory, that is fine, but unnecessary. >>> >>> So technically speaking, the valgrind reports are not false alarms. I think your call sites should initialize. >>> >>> Best >>> Tomas >>> >>> >>> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > >
Simon Urbanek
2021-Mar-30 08:03 UTC
[Rd] custom allocators, Valgrind and uninitialized memory
Andreas, What does any of this to do with CRAN? This not a the CRAN list - we're discussing the proper approach of using valgrind and R can only assume that the memory is uninitialised (since it cannot safely assume anything else) so it is up to you to declare the memory as initialised if you can guarantee that being true. I don't quite get your earlier response about allocating *after* the call since that makes no sense to me - the whole point of a custom allocator is to allow you to allocate the memory, so whether it is initialised or not is under your control - but that means also it is your responsibility to flag the state accordingly. Note, however, that this is not merely just true by the virtue of using mmap - the memory content is only valid (initialised) if you used mmap with previously initialised content. Again, entirely up to you to decide what the semantics are since you are the author of the custom allocator. Does that make sense? Cheers, Simon> On Mar 30, 2021, at 18:27, Andreas Kersting <r-devel at akersting.de> wrote: > > Hi Simon, > > Yes, if this was acceptable on CRAN, I would agree that calling VALGRIND_MAKE_MEM_DEFINED() in my code would be sufficient. > > But since Tomas said, "So I think that your code using your custom allocator needs to initialize allocated memory to be correct. If your allocator initializes the memory, that is fine, but unnecessary.", I am not sure if it is acceptable. > > Regards, > Andreas > > 2021-03-30 00:39 GMT+02:00 "Simon Urbanek" <simon.urbanek at R-project.org>: >> Andres, >> >> correct me if I'm wrong, but the issue here is not initialisation but rather valgrind flagging. You simply have to call VALGRIND_MAKE_MEM_DEFINED() in your code after allocVector3() to declare that you have initialised the memory - or am I missing something? >> >> Cheers, >> Simon >> >> >> >>> On 30/03/2021, at 9:18 AM, Andreas Kersting <r-devel at akersting.de> wrote: >>> >>> Hi Tomas, >>> >>> Thanks for sharing your view on this! I understand your point, but still I think that the current situation is somewhat unfortunate: >>> >>> I would argue that mmap() is a natural candidate to be used together with allocVector3(); it is even mentioned explicitly here: https://github.com/wch/r-source/blob/trunk/src/main/memory.c#L2575-L2576 >>> >>> However, when using a non-anonymous mapping, i.e. we want mmap() to initialize the memory e.g. from a file or a POSIX shared memory object, this means that we need to use MAP_FIXED in case we are obliged to initialize the memory AFTER allocVector3() returned it; at least I cannot think of a different way to achieve this. >>> >>> The use of MAP_FIXED >>> - is discouraged (e.g. https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/mmap.2.html) >>> - requires two calls to mmap(): (1) to obtain the (anonymous) memory to be handed out by the custom allocater and (2) to actually map the file "over" the just allocated vector (using MAP_FIXED), which will overwrite the vector header; hence, we need to first back it up to later restore it >>> >>> I have implemented my function using MAP_FIXED here: https://github.com/gfkse/bettermc/commit/f34c4f4c45c9ab11abe9b9e9b8b48064f128d731#diff-7098a5dde34efab163bbef27fe32f95c29e76236649479985d09c70100e4c737R278-R323 >>> >>> This solution, to me, is much more complicated and hacky than my previous one, which assumed it is OK to hand out already initialized memory directly from allocVector3(). >>> >>> Regards, >>> Andreas >>> >>> >>> 2021-03-29 10:41 GMT+02:00 "Tomas Kalibera" <tomas.kalibera at gmail.com>: >>>> Hi Andreas, >>>> On 3/26/21 8:48 PM, Andreas Kersting wrote: >>>>> Hi Dirk, > > Sure, let me try to explain: > > CRAN ran the tests of my package using R which was configured > --with-valgrind-instrumentation > 0. Valgrind reported many errors > related to the use of supposedly uninitialized memory and the CRAN > team asked me to tackle these. > > These errors are false positives, because I pass a custom allocator > to allocVector3() which hands out memory which is already > initialized. However, this memory is explicitly marked for Valgrind > as uninitialized by allocVector3(), and I do not initialize it > subsequently, so Valgrind complains. > > Now I am asking if it is correct that allocVector3() marks memory as > uninitialized/undefined, even if it comes from a custom allocator. > This is because allocVector3() cannot know if the memory might > already by initialized. >>>> I think the semantics of allocVector/allocVector3 should be the same regardless of whether custom allocators are used. The semantics of allocVector is to provide uninitialized memory (non-pointer types, Writing R Extensions 5.9.2). Therefore, it is the caller who needs to take care of initialization. This is also the semantics of "malloc" and Rallocators.h says "custom_alloc_t mem_alloc; /* malloc equivalent */". >>>> >>>> So I think that your code using your custom allocator needs to initialize allocated memory to be correct. If your allocator initializes the memory, that is fine, but unnecessary. >>>> >>>> So technically speaking, the valgrind reports are not false alarms. I think your call sites should initialize. >>>> >>>> Best >>>> Tomas >>>> >>>> >>>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> >> > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
Andreas Kersting
2021-Mar-30 08:33 UTC
[Rd] custom allocators, Valgrind and uninitialized memory
Hi Simon, hi Tomas, Let me try to wrap up this discussion: - "What does any of this to do with CRAN?" Not much, I agree. It is just that this whole issue arose because the CRAN team asked me to fix the use of uninitialized memory as reported by Valgrind. Sorry for mixing things up here. - "I don't quite get your earlier response about allocating *after* the call since that makes no sense to me" I was talking about *initializing* after the call as originally suggested by Tomas and - as I wrote - I also do not like my proposal involving MAP_FIXED. - bottom line allocVector3() is correctly marking memory as uninitialized because it cannot safely assume otherwise. It is ok for a custom allocator to return already initialized memory and inform Valgrind about this fact. I hope, this summarizes it well. Thanks for your time and support, Tomas and Simon. Very much appreciated! Regards, Andreas 2021-03-30 10:03 GMT+02:00 "Simon Urbanek" <simon.urbanek at R-project.org>:> Andreas, > > What does any of this to do with CRAN? This not a the CRAN list - we're discussing the proper approach of using valgrind and R can only assume that the memory is uninitialised (since it cannot safely assume anything else) so it is up to you to declare the memory as initialised if you can guarantee that being true. I don't quite get your earlier response about allocating *after* the call since that makes no sense to me - the whole point of a custom allocator is to allow you to allocate the memory, so whether it is initialised or not is under your control - but that means also it is your responsibility to flag the state accordingly. Note, however, that this is not merely just true by the virtue of using mmap - the memory content is only valid (initialised) if you used mmap with previously initialised content. Again, entirely up to you to decide what the semantics are since you are the author of the custom allocator. Does that make sense? > > Cheers, > Simon > > > >> On Mar 30, 2021, at 18:27, Andreas Kersting <r-devel at akersting.de> wrote: >> >> Hi Simon, >> >> Yes, if this was acceptable on CRAN, I would agree that calling VALGRIND_MAKE_MEM_DEFINED() in my code would be sufficient. >> >> But since Tomas said, "So I think that your code using your custom allocator needs to initialize allocated memory to be correct. If your allocator initializes the memory, that is fine, but unnecessary.", I am not sure if it is acceptable. >> >> Regards, >> Andreas >> >> 2021-03-30 00:39 GMT+02:00 "Simon Urbanek" <simon.urbanek at R-project.org>: >>> Andres, >>> >>> correct me if I'm wrong, but the issue here is not initialisation but rather valgrind flagging. You simply have to call VALGRIND_MAKE_MEM_DEFINED() in your code after allocVector3() to declare that you have initialised the memory - or am I missing something? >>> >>> Cheers, >>> Simon >>> >>> >>> >>>> On 30/03/2021, at 9:18 AM, Andreas Kersting <r-devel at akersting.de> wrote: >>>> >>>> Hi Tomas, >>>> >>>> Thanks for sharing your view on this! I understand your point, but still I think that the current situation is somewhat unfortunate: >>>> >>>> I would argue that mmap() is a natural candidate to be used together with allocVector3(); it is even mentioned explicitly here: https://github.com/wch/r-source/blob/trunk/src/main/memory.c#L2575-L2576 >>>> >>>> However, when using a non-anonymous mapping, i.e. we want mmap() to initialize the memory e.g. from a file or a POSIX shared memory object, this means that we need to use MAP_FIXED in case we are obliged to initialize the memory AFTER allocVector3() returned it; at least I cannot think of a different way to achieve this. >>>> >>>> The use of MAP_FIXED >>>> - is discouraged (e.g. https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/mmap.2.html) >>>> - requires two calls to mmap(): (1) to obtain the (anonymous) memory to be handed out by the custom allocater and (2) to actually map the file "over" the just allocated vector (using MAP_FIXED), which will overwrite the vector header; hence, we need to first back it up to later restore it >>>> >>>> I have implemented my function using MAP_FIXED here: https://github.com/gfkse/bettermc/commit/f34c4f4c45c9ab11abe9b9e9b8b48064f128d731#diff-7098a5dde34efab163bbef27fe32f95c29e76236649479985d09c70100e4c737R278-R323 >>>> >>>> This solution, to me, is much more complicated and hacky than my previous one, which assumed it is OK to hand out already initialized memory directly from allocVector3(). >>>> >>>> Regards, >>>> Andreas >>>> >>>> >>>> 2021-03-29 10:41 GMT+02:00 "Tomas Kalibera" <tomas.kalibera at gmail.com>: >>>>> Hi Andreas, >>>>> On 3/26/21 8:48 PM, Andreas Kersting wrote: >>>>>> Hi Dirk, > > Sure, let me try to explain: > > CRAN ran the tests of my package using R which was configured > --with-valgrind-instrumentation > 0. Valgrind reported many errors > related to the use of supposedly uninitialized memory and the CRAN > team asked me to tackle these. > > These errors are false positives, because I pass a custom allocator > to allocVector3() which hands out memory which is already > initialized. However, this memory is explicitly marked for Valgrind > as uninitialized by allocVector3(), and I do not initialize it > subsequently, so Valgrind complains. > > Now I am asking if it is correct that allocVector3() marks memory as > uninitialized/undefined, even if it comes from a custom allocator. > This is because allocVector3() cannot know if the memory might > already by initialized. >>>>> I think the semantics of allocVector/allocVector3 should be the same regardless of whether custom allocators are used. The semantics of allocVector is to provide uninitialized memory (non-pointer types, Writing R Extensions 5.9.2). Therefore, it is the caller who needs to take care of initialization. This is also the semantics of "malloc" and Rallocators.h says "custom_alloc_t mem_alloc; /* malloc equivalent */". >>>>> >>>>> So I think that your code using your custom allocator needs to initialize allocated memory to be correct. If your allocator initializes the memory, that is fine, but unnecessary. >>>>> >>>>> So technically speaking, the valgrind reports are not false alarms. I think your call sites should initialize. >>>>> >>>>> Best >>>>> Tomas >>>>> >>>>> >>>>> >>>> ______________________________________________ >>>> R-devel at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>>> >>> >>> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > >