thr3ads.net - llvm dev - [llvm-dev] Eliminating global memory roots (or not) to help leak checkers [Apr 2021]

If this information is useful, please help other people find it:
Share via:

Chris Lattner via llvm-dev

2021-Apr-22 23:28 UTC

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

> On Apr 19, 2021, at 5:24 AM, James Y Knight <jyknight at google.com>
wrote:
> 
> There is no problem (no leaks) in the code that users wrote, so adding code
annotations (sanitizer suppression file, or attributes) is not a good solution
to this issue. The problem is that this optimization introduces a
"leak" (from the point of view of the leak checker), which wasn't
there before. And in practice, this seems to cause a large number of false
positives.
I think that “from the point of view of the leak checker” is the key thing
there.

Code that this triggers *is* leaking memory, it was just silenced because the
leak was spuriously reachable from a global.  Global variables aren’t a
preferred way to silence leak detectors, they have other ways to do so :)
> On Apr 19, 2021, at 4:52 PM, Sterling Augustine <saugustine at
google.com> wrote:
> 
> There may be other ways to disable leak checkers, but they put a burden on
users that was not there before. Further, users who try leak checking with and
without optimization will get different answers. The bug report will read:
"clang at -O2 makes my program leak". And, as James notes, whether or
not you need to suppress the leak depends on whether or not the optimizer does
away with the variable. Subtle changes to the code that have nothing to do with
memory allocation will appear to add or fix leaks. That is not something I would
care to explain or document.
I can see that concern, but this isn’t a battle that we can win: optimizations
in general can expose leaks.

IMO, If someone doesn’t want the global to be removed, they should mark it
volatile.  If they do want it removable, then they can use leak detector
features to silence the warning.
> On Apr 20, 2021, at 9:12 AM, Sterling Augustine <saugustine at
google.com> wrote:
> 
> In order to understand how much benefit this change gives to code size, I
built clang and related files with and without the patch, both with
CMAKE_BUILD_TYPE=Release.
> 
> clang itself gets about 0.4% smaller (from 145217124 to 144631078)
> lld gets about 1.85% smaller (from 101129181 to 99243810 bytes)
> 
> Spot checking a few other random targets, in general, it seems that the
benefits depend heavily on the coding style, but roughly, bigger the binary, the
less benefit this brings.
> 
> I suspect that a more aggressive pass as described by Philip Reames could
capture a significant part of the benefit without sacrificing functionality. But
that would require a more detailed study to be sure.
A ~2% reduction in code size is a huge win.  I agree with your comment about it
being different with different coding styles.  I suspect that this is the sort
of thing that will pay particularly for high abstraction code bases.

I don’t see why we would punish general code to make “code that is leaking where
formerly not detected, and where users don’t want to mark the root as volatile”.
This seems really unprincipled to me, and a slippery slope we can’t go down.

-Chris

pawel k. via llvm-dev

2021-Apr-22 23:37 UTC

head link

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

Hello,
Seems like long and interesting thread. I would love to learn what problem
were facing here. Maybe im ignorant enough to try to tackle it.

Famous quote: I could make great violin player, i never tried it.

Haha __attribute(immortal) on global variable.

Stupid man from poland would say its volatile keyword abuse but maybe minor
and what do i know.

Can someone briefly describe key problematic code, what component fails and
how do we want to workaround it? I could study pros and cons and maybe come
up with set of proposals.

Best regards,
Pawel Kunio


pt., 23.04.2021, 01:29 użytkownik Chris Lattner via llvm-dev <
llvm-dev at lists.llvm.org> napisał:
>
>
> > On Apr 19, 2021, at 5:24 AM, James Y Knight <jyknight at
google.com> wrote:
> >
> > There is no problem (no leaks) in the code that users wrote, so adding
> code annotations (sanitizer suppression file, or attributes) is not a good
> solution to this issue. The problem is that this optimization introduces a
> "leak" (from the point of view of the leak checker), which
wasn't there
> before. And in practice, this seems to cause a large number of false
> positives.
>
> I think that “from the point of view of the leak checker” is the key thing
> there.
>
> Code that this triggers *is* leaking memory, it was just silenced because
> the leak was spuriously reachable from a global.  Global variables aren’t a
> preferred way to silence leak detectors, they have other ways to do so :)
>
> > On Apr 19, 2021, at 4:52 PM, Sterling Augustine <saugustine at
google.com>
> wrote:
> >
> > There may be other ways to disable leak checkers, but they put a
burden
> on users that was not there before. Further, users who try leak checking
> with and without optimization will get different answers. The bug report
> will read: "clang at -O2 makes my program leak". And, as James
notes,
> whether or not you need to suppress the leak depends on whether or not the
> optimizer does away with the variable. Subtle changes to the code that have
> nothing to do with memory allocation will appear to add or fix leaks. That
> is not something I would care to explain or document.
>
> I can see that concern, but this isn’t a battle that we can win:
> optimizations in general can expose leaks.
>
> IMO, If someone doesn’t want the global to be removed, they should mark it
> volatile.  If they do want it removable, then they can use leak detector
> features to silence the warning.
>
> > On Apr 20, 2021, at 9:12 AM, Sterling Augustine <saugustine at
google.com>
> wrote:
> >
> > In order to understand how much benefit this change gives to code
size,
> I built clang and related files with and without the patch, both with
> CMAKE_BUILD_TYPE=Release.
> >
> > clang itself gets about 0.4% smaller (from 145217124 to 144631078)
> > lld gets about 1.85% smaller (from 101129181 to 99243810 bytes)
> >
> > Spot checking a few other random targets, in general, it seems that
the
> benefits depend heavily on the coding style, but roughly, bigger the
> binary, the less benefit this brings.
> >
> > I suspect that a more aggressive pass as described by Philip Reames
> could capture a significant part of the benefit without sacrificing
> functionality. But that would require a more detailed study to be sure.
>
> A ~2% reduction in code size is a huge win.  I agree with your comment
> about it being different with different coding styles.  I suspect that this
> is the sort of thing that will pay particularly for high abstraction code
> bases.
>
> I don’t see why we would punish general code to make “code that is leaking
> where formerly not detected, and where users don’t want to mark the root as
> volatile”.  This seems really unprincipled to me, and a slippery slope we
> can’t go down.
>
> -Chris
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210423/dd5d9d4c/attachment.html>

David Blaikie via llvm-dev

2021-Apr-22 23:40 UTC

head link

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

On Thu, Apr 22, 2021 at 4:28 PM Chris Lattner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
>
>
> > On Apr 19, 2021, at 5:24 AM, James Y Knight <jyknight at
google.com> wrote:
> >
> > There is no problem (no leaks) in the code that users wrote, so adding
code annotations (sanitizer suppression file, or attributes) is not a good
solution to this issue. The problem is that this optimization introduces a
"leak" (from the point of view of the leak checker), which wasn't
there before. And in practice, this seems to cause a large number of false
positives.
>
> I think that “from the point of view of the leak checker” is the key thing
there.
>
> Code that this triggers *is* leaking memory, it was just silenced because
the leak was spuriously reachable from a global.  Global variables aren’t a
preferred way to silence leak detectors, they have other ways to do so :)
"spuriously" reachable is quite questionable here. This way of writing
code is to make the allocation quite intentionally reachable from a
global.

For instance, one of the ways to disable global destruction: (
https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
)

"If all else fails, you can create an object dynamically and never
delete it by using a function-local static pointer or reference (e.g.,
static const auto& impl = *new T(args...);)."

(all else fails in a large enough codebase rather often... )
> > On Apr 19, 2021, at 4:52 PM, Sterling Augustine <saugustine at
google.com> wrote:
> >
> > There may be other ways to disable leak checkers, but they put a
burden on users that was not there before. Further, users who try leak checking
with and without optimization will get different answers. The bug report will
read: "clang at -O2 makes my program leak". And, as James notes,
whether or not you need to suppress the leak depends on whether or not the
optimizer does away with the variable. Subtle changes to the code that have
nothing to do with memory allocation will appear to add or fix leaks. That is
not something I would care to explain or document.
>
> I can see that concern, but this isn’t a battle that we can win:
optimizations in general can expose leaks.
>
> IMO, If someone doesn’t want the global to be removed, they should mark it
volatile.  If they do want it removable, then they can use leak detector
features to silence the warning.
>
> > On Apr 20, 2021, at 9:12 AM, Sterling Augustine <saugustine at
google.com> wrote:
> >
> > In order to understand how much benefit this change gives to code
size, I built clang and related files with and without the patch, both with
CMAKE_BUILD_TYPE=Release.
> >
> > clang itself gets about 0.4% smaller (from 145217124 to 144631078)
> > lld gets about 1.85% smaller (from 101129181 to 99243810 bytes)
> >
> > Spot checking a few other random targets, in general, it seems that
the benefits depend heavily on the coding style, but roughly, bigger the binary,
the less benefit this brings.
> >
> > I suspect that a more aggressive pass as described by Philip Reames
could capture a significant part of the benefit without sacrificing
functionality. But that would require a more detailed study to be sure.
>
> A ~2% reduction in code size is a huge win.  I agree with your comment
about it being different with different coding styles.  I suspect that this is
the sort of thing that will pay particularly for high abstraction code bases.
>
> I don’t see why we would punish general code to make “code that is leaking
where formerly not detected, and where users don’t want to mark the root as
volatile”.  This seems really unprincipled to me, and a slippery slope we can’t
go down.
I don't think it's as general or open ended as that - it's a fairly
specific carveout that's been used pretty broadly I think.

So - any interest in a flag to control this behavior, if that's what
it comes to? It's one that a fair bit of software relies on.

- Dave

Sterling Augustine via llvm-dev

2021-Apr-23 00:44 UTC

head link

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

Have any of the proponents of the patch tried lsan against a reasonable
large code base both with and with out it?

On Thu, Apr 22, 2021 at 4:28 PM Chris Lattner <clattner at nondot.org>
wrote:
>
>
> > On Apr 19, 2021, at 5:24 AM, James Y Knight <jyknight at
google.com> wrote:
> >
> > There is no problem (no leaks) in the code that users wrote, so adding
> code annotations (sanitizer suppression file, or attributes) is not a good
> solution to this issue. The problem is that this optimization introduces a
> "leak" (from the point of view of the leak checker), which
wasn't there
> before. And in practice, this seems to cause a large number of false
> positives.
>
> I think that “from the point of view of the leak checker” is the key thing
> there.
>
> Code that this triggers *is* leaking memory, it was just silenced because
> the leak was spuriously reachable from a global.  Global variables aren’t a
> preferred way to silence leak detectors, they have other ways to do so :)
>
> > On Apr 19, 2021, at 4:52 PM, Sterling Augustine <saugustine at
google.com>
> wrote:
> >
> > There may be other ways to disable leak checkers, but they put a
burden
> on users that was not there before. Further, users who try leak checking
> with and without optimization will get different answers. The bug report
> will read: "clang at -O2 makes my program leak". And, as James
notes,
> whether or not you need to suppress the leak depends on whether or not the
> optimizer does away with the variable. Subtle changes to the code that have
> nothing to do with memory allocation will appear to add or fix leaks. That
> is not something I would care to explain or document.
>
> I can see that concern, but this isn’t a battle that we can win:
> optimizations in general can expose leaks.
>
> IMO, If someone doesn’t want the global to be removed, they should mark it
> volatile.  If they do want it removable, then they can use leak detector
> features to silence the warning.
>
> > On Apr 20, 2021, at 9:12 AM, Sterling Augustine <saugustine at
google.com>
> wrote:
> >
> > In order to understand how much benefit this change gives to code
size,
> I built clang and related files with and without the patch, both with
> CMAKE_BUILD_TYPE=Release.
> >
> > clang itself gets about 0.4% smaller (from 145217124 to 144631078)
> > lld gets about 1.85% smaller (from 101129181 to 99243810 bytes)
> >
> > Spot checking a few other random targets, in general, it seems that
the
> benefits depend heavily on the coding style, but roughly, bigger the
> binary, the less benefit this brings.
> >
> > I suspect that a more aggressive pass as described by Philip Reames
> could capture a significant part of the benefit without sacrificing
> functionality. But that would require a more detailed study to be sure.
>
> A ~2% reduction in code size is a huge win.  I agree with your comment
> about it being different with different coding styles.  I suspect that this
> is the sort of thing that will pay particularly for high abstraction code
> bases.
>
> I don’t see why we would punish general code to make “code that is leaking
> where formerly not detected, and where users don’t want to mark the root as
> volatile”.  This seems really unprincipled to me, and a slippery slope we
> can’t go down.
>
> -Chris
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210422/22afe5a5/attachment.html>

James Y Knight via llvm-dev

2021-Apr-23 14:18 UTC

head link

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

On Thu, Apr 22, 2021, 7:28 PM Chris Lattner <clattner at nondot.org>
wrote:
>
>
> > On Apr 19, 2021, at 5:24 AM, James Y Knight <jyknight at
google.com> wrote:
> >
> > There is no problem (no leaks) in the code that users wrote, so adding
> code annotations (sanitizer suppression file, or attributes) is not a good
> solution to this issue. The problem is that this optimization introduces a
> "leak" (from the point of view of the leak checker), which
wasn't there
> before. And in practice, this seems to cause a large number of false
> positives.
>
> I think that “from the point of view of the leak checker” is the key thing
> there.
>
> Code that this triggers *is* leaking memory

No, you've got it exactly backwards. "From the point of view of the
leak
checker", there is a leak, but in actually, there is not.

I'm afraid you're still arguing from mistaken assumptions. As I've
already
mentioned, reachable memory at program exit is not a leak. That's the
definition of "leak" which is always used by leak checkers. (This is
not
anything new, it's been how leak checkers work for decades, and how they
must work.)

Therefore, C++ code that allocates memory and assigns it to a global is not
a leak, and it's _still_ not a leak even if it so happens in some
instantiation of the program that all of the users of the global have been
removed by the optimizer.

The code is correct and it's not leaking memory, but with this change, the
leak checker is unable to determine that.

It was just silenced because the leak was spuriously reachable from
a> global.  Global variables aren’t a preferred way to silence leak detectors,
> they have other ways to do so :)

Memory reachable from a global is not a spurious reachability, it is actual
reachability. And, the purpose of assigning a value to a global variable in
the source code isn't to silence the leak checker, it is to make the object
accessible to other code. (People writing code normally aren't and
shouldn't be thinking about leak checking.)

> On Apr 19, 2021, at 4:52 PM, Sterling Augustine <saugustine at
google.com>
> wrote:
> >
> > There may be other ways to disable leak checkers, but they put a
burden
> on users that was not there before. Further, users who try leak checking
> with and without optimization will get different answers. The bug report
> will read: "clang at -O2 makes my program leak". And, as James
notes,
> whether or not you need to suppress the leak depends on whether or not the
> optimizer does away with the variable. Subtle changes to the code that have
> nothing to do with memory allocation will appear to add or fix leaks. That
> is not something I would care to explain or document.
>
> I can see that concern, but this isn’t a battle that we can win:
> optimizations in general can expose leaks.
>
The word "expose" is invalid here -- that implies that the code is
buggy
but that the leak checker was previously unable to detect the bug, and now
does. But that is not the case at hand. You maybe could say, instead "I can
see that concern, but this isn’t a battle that we can win: optimizations in
general can cause random false positives in the leak checker." (But, in
practice it was pretty much "won" for the last 9 years.)

IMO, If someone doesn’t want the global to be removed, they should mark
it> volatile.  If they do want it removable, then they can use leak detector
> features to silence the warning.
> On Apr 20, 2021, at 9:12 AM, Sterling Augustine <saugustine at
google.com>
> wrote:
> >
> > In order to understand how much benefit this change gives to code
size,
> I built clang and related files with and without the patch, both with
> CMAKE_BUILD_TYPE=Release.
> >
> > clang itself gets about 0.4% smaller (from 145217124 to 144631078)
> > lld gets about 1.85% smaller (from 101129181 to 99243810 bytes)
> >
> > Spot checking a few other random targets, in general, it seems that
the
> benefits depend heavily on the coding style, but roughly, bigger the
> binary, the less benefit this brings.
> >
> > I suspect that a more aggressive pass as described by Philip Reames
> could capture a significant part of the benefit without sacrificing
> functionality. But that would require a more detailed study to be sure.
>
> A ~2% reduction in code size is a huge win.  I agree with your comment
> about it being different with different coding styles.  I suspect that this
> is the sort of thing that will pay particularly for high abstraction code
> bases.
>
> I don’t see why we would punish general code to make “code that is leaking
> where formerly not detected, and where users don’t want to mark the root as
> volatile”.  This seems really unprincipled to me, and a slippery slope we
> can’t go down.
>
In the way you have restated the issue here, there is no benefit to the
current behavior, but that's only because of the mistaken assumptions. You
have redefined "leak", and are assuming that the problem is buggy
software,
whose users are upset that valid bugs are found which were not previously
found. But that's simply not the case we're dealing with. The code is
correct (non-leaking), and it's a regression in leak checker functionality
if we start forcing users to add manual annotations as a workaround.

I don't know what the right thing to do here is. But I'm quite sure we
cannot arrive at a good decision until everyone can at least get on the
same page about what the purpose of a leak checker is. I would hope that
there's a path that makes everyone satisfied, but if not, the disagreement
needs to be based on relative priority of use cases and engineering
trade-offs, not whether the problem EXISTS.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210423/e86c7224/attachment.html>

llvm dev - Apr 2021 - Eliminating global memory roots (or not) to help leak checkers

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers

[llvm-dev] Eliminating global memory roots (or not) to help leak checkers