thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Identifying objects within BumpPtrAllocator. [Aug 2018]

Artem Dergachev via llvm-dev

2018-Aug-29 00:14 UTC

[llvm-dev] Identifying objects within BumpPtrAllocator.

In various debug dumps (e.g., Clang's -ast-dump), various objects (e.g.,
Stmts and Decls in that -ast-dump) are identified by pointers. It's very
reliable in the sense that no two objects would ever have the same
pointer at the same time, but it's unpleasant that pointers change
across runs. Having deterministic identifiers instead of pointers would
aid debugging: imagine a conditional break by object identifier that has
not yet been constructed, or simply trying to align two debug dumps of
different kinds from different runs together. Additionally, pointers are
hard to read and memorize; it's hard to notice the difference between
0x7f80a28325e0 and 0x7f80a28325a0, especially when they're a few screens
apart.

Hence the idea: why don't we print the offset into the allocator's
memory slab instead of a pointer? We use BumpPtrAllocator all over the
place, which boils down to a set of slabs on which all objects are
placed in the order in which they are allocated. It is easy for the
allocator to identify if a pointer belongs to that allocator, and if so,
determine which slab it belongs to and at what offset the object is in
that slab. Therefore it is possible to identify the object by its (slab
index, offset) pair. E.g., "TypedefDecl 0:528" (you already memorized it)
instead of "TypedefDecl 0x7f80a28325e0". This could be applied to all
sorts of objects that live in BumpPtrAllocators.
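
A minimal sketch of the (slab index, offset) idea, assuming a toy allocator
rather than LLVM's BumpPtrAllocator. ToyBumpAllocator, identify and describe
are hypothetical names invented for illustration; the fixed slab size and the
missing oversized-allocation path are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy allocator for illustration; not LLVM's BumpPtrAllocator.
class ToyBumpAllocator {
public:
  explicit ToyBumpAllocator(std::size_t SlabSize = 4096)
      : SlabSize(SlabSize) {}

  // Bump-allocate Size bytes; start a new slab when the current one is full.
  void *allocate(std::size_t Size, std::size_t Align = 8) {
    assert(Size <= SlabSize && "toy allocator has no oversized-slab path");
    std::size_t Start = (Used + Align - 1) / Align * Align;
    if (Slabs.empty() || Start + Size > SlabSize) {
      Slabs.push_back(std::make_unique<unsigned char[]>(SlabSize));
      Start = 0;
    }
    Used = Start + Size;
    return Slabs.back().get() + Start;
  }

  // Return (slab index, offset) if Ptr points into one of our slabs.
  std::optional<std::pair<std::size_t, std::size_t>>
  identify(const void *Ptr) const {
    auto Addr = reinterpret_cast<std::uintptr_t>(Ptr);
    for (std::size_t I = 0; I < Slabs.size(); ++I) {
      auto Base = reinterpret_cast<std::uintptr_t>(Slabs[I].get());
      if (Addr >= Base && Addr < Base + SlabSize)
        return std::make_pair(I, static_cast<std::size_t>(Addr - Base));
    }
    return std::nullopt; // not owned by this allocator
  }

private:
  std::size_t SlabSize;
  std::size_t Used = 0; // bytes used in the most recent slab
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;
};

// Format an identifier such as "0:528"; no extra per-object storage needed.
std::string describe(const ToyBumpAllocator &A, const void *Ptr) {
  if (auto Id = A.identify(Ptr))
    return std::to_string(Id->first) + ":" + std::to_string(Id->second);
  return "<unknown>";
}
```

With 64-byte slabs, allocating 16, 16 and 48 bytes in that order would be
described as 0:0, 0:16 and 1:0, and the same allocation sequence gives the
same identifiers on every run.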

In order to compute such an identifier, we only need access to the object
and to the allocator. No additional memory is used to store the
identifier. It would also be persistent across runs as long as the same
objects are allocated in the same order, which is, I suspect, often the
case.

One of the downsides of this identifier is that it's not going to be the 
same on different machines, because the same data structure may require 
different amounts of memory on different hosts. So it wouldn't 
necessarily help understanding a dump that the user sent you. But it 
still seems to be better than pointers.

Should we go ahead and try to implement it?

George Karpenkov via llvm-dev

2018-Aug-29 00:22 UTC

[llvm-dev] [cfe-dev] Identifying objects within BumpPtrAllocator.

Patch available at https://reviews.llvm.org/D51393

I would really love to see this in the static analyzer, but I think all other
dumping facilities could greatly benefit as well.

Matthias Braun via llvm-dev

2018-Aug-29 00:35 UTC

[llvm-dev] [cfe-dev] Identifying objects within BumpPtrAllocator.

This is a great idea!

I personally also wouldn't mind going further in debug builds and actually
creating and storing sequential IDs with the objects, taking the small memory
hit for improved debuggability. The `PersistentId` field in SelectionDAG works
that way and has helped make the output more readable IMO.

- Matthias
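
A rough sketch of the stored sequential ID alternative, under the assumption
that the ID exists only when NDEBUG is not defined. ToyNode, ToyGraph, DebugId
and label are hypothetical names for illustration and do not reflect how
SelectionDAG is actually implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical node type; the ID field costs memory only in debug builds.
struct ToyNode {
  std::string Opcode;
#ifndef NDEBUG
  unsigned DebugId = 0;
#endif
};

// Hypothetical owner that hands out IDs in creation order.
class ToyGraph {
public:
  ToyNode makeNode(std::string Opcode) {
    assert(!Opcode.empty() && "node needs an opcode name");
    ToyNode N;
    N.Opcode = std::move(Opcode);
#ifndef NDEBUG
    N.DebugId = NextDebugId++;
#endif
    return N;
  }

private:
#ifndef NDEBUG
  unsigned NextDebugId = 0;
#endif
};

// Print "n<ID>: <opcode>" in debug builds and just the opcode otherwise.
std::string label(const ToyNode &N) {
#ifndef NDEBUG
  return "n" + std::to_string(N.DebugId) + ": " + N.Opcode;
#else
  return N.Opcode;
#endif
}
```

Unlike an offset into a slab, an ID of this kind does not depend on object
sizes, so it would be the same on different hosts as long as the creation
order is the same.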

Richard Smith via llvm-dev

2018-Aug-29 01:16 UTC

[llvm-dev] [cfe-dev] Identifying objects within BumpPtrAllocator.

On Tue, 28 Aug 2018, 17:14 Artem Dergachev via cfe-dev, <
cfe-dev at lists.llvm.org> wrote:
> Hence the idea: why don't we print the offset into the allocator's
> memory slab instead of a pointer?

Make this "as well as" rather than "instead of" and it sounds great to me.
When debugging, it's useful to be able to dump a large complex object, find
the piece you want, grab its address and start accessing it directly.

(For the pointer stability problem, at least on Linux you can turn off
ASLR. When running under gdb, that's typically done for you, and you can do
it manually with setarch. But it would be nice to have an easier way to
identify objects than a long, essentially meaningless address.)


David Blaikie via llvm-dev

2018-Aug-29 18:54 UTC

[llvm-dev] [cfe-dev] Identifying objects within BumpPtrAllocator.

Mostly what Richard said.

One thing I'd be a bit careful of - these numbers may still not be stable
in some small number of cases (e.g., if objects are created based on the
iteration order of a pointer-based hashing container - which may still be
valid if that ordering doesn't leak into the output of the program). So
this might provide a slightly false sense of security & make those minority
cases more painful - but perhaps they're rare enough that it's worth the
tradeoff.
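
To illustrate the caveat: iterating a container keyed on pointer values visits
elements in an order that depends on those values, so anything allocated
during that loop can land at different offsets from run to run. A minimal
sketch of one way to get a run-independent order, assuming each object has
some stable key to sort on; Item, Name and inStableOrder are hypothetical
names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical object type with a stable, address-independent key.
struct Item {
  std::string Name;
};

// Copy the pointer set into a vector and sort by the stable key, so that
// later allocations driven by this order do not depend on pointer values.
std::vector<const Item *>
inStableOrder(const std::unordered_set<const Item *> &Set) {
  std::vector<const Item *> Out(Set.begin(), Set.end());
  std::sort(Out.begin(), Out.end(), [](const Item *A, const Item *B) {
    assert(A && B && "set must not contain null pointers");
    return A->Name < B->Name;
  });
  return Out;
}
```

Iterating the returned vector instead of the set keeps the allocation order,
and therefore the (slab index, offset) identifiers, independent of where the
keys happen to live in memory.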

(& as Richard said - debuggers will tend to disable ASLR anyway, making it
relatively easy to work with)

