thr3ads.net - llvm dev - [llvm-dev] The best way of generating a good representation for an array with header? [Nov 2019]

If this information is useful, please help other people find it:
Share via:

Christoffer Lernö via llvm-dev

2019-Nov-12 22:35 UTC

[llvm-dev] The best way of generating a good representation for an array with header?

The advantages:

1. A pointer to the struct offset can be converted to a pointer without any
cost.
2. A nullpointer to a stretchy buffer can be treated as a zero length array.
Consequently no actual struct allocation is needed to represent a zero length
array.
3. A reference to the array is the same size as to a pointer.
4. It can be converted to and back from an pointer without losing any
information about the size & capacity.

The downsides are what we discuss. But it looks like I have to accept that I can
only represent it as a pointer with unknown length in DWARF then?

Best Regards,

Christoffer
> On 12 Nov 2019, at 21:49, David Blaikie <dblaikie at gmail.com>
wrote:
> 
> On Tue, Nov 12, 2019 at 12:44 PM Christoffer Lernö via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
> Yes, we’re actually viewing the struct at an offset.
> 
> So basically it’s a struct like this:
> 
> typedef struct {
>   uint32_t size;
>   uint32_t capacity;
>   int array[0];
> } Foo;
> 
> The whole thing is malloc:ed with extra bytes at the end, and capacity is
set to that same number of extra bytes.
> 
> What’s then passed around is actually the int pointer at an offset:
&(foo->array)
> 
> Using the that pointer we can obviously in a simple way recover the pointer
to the struct, but can it be done so that LLVM and DWARF can identify the
pointer as a pointer to a struct member for a certain struct?
> 
> std::vector is as far as I know wrapping a pointer or two.
> 
> The advantage of a stretchy buffer is that its length is recoverable even
if stored as a pointer.
> 
> What's the advantage compared to a pointer to the struct, rather than a
pointer to the array? (a pointer to this first element of the array would still
have to be tagged differently from a pointer to an arbitrary int (either a
singular int or an int somewhere in the array) to indicate that you can
backtrack to find the length - so it's not like you get to generalize all
int pointers) - I wouldn't expect (but don't know that much) that the
extra constant offset on array indexing would be particularly
expensive/observable?
> 
> But yeah, I think you'd probably have some trouble getting DWARF
consumers to handle the idea that the parameter type to a function is more than
the type itself, or that pointers to that type actually point into the middle of
the object instead of the start.
> 
> Not insurmountable, but seems a bit expensive/complicated to try to make
that work - but don't know what your other constraints/data are.
>  
> It’s also incredibly thin, only taking up the same size as a pointer – as
opposed to std::vector which is likely 2 pointers long.
> 
> 
> Best Regards,
> 
> Christoffer
> 
> > Date: Tue, 12 Nov 2019 11:34:42 -0800
> > From: David Blaikie via llvm-dev <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>
> > 
> > the pointer points to the first element, and you walk backwards from
there
> > to find the header details about the bounds/etc?
> > 
> > In any case - I'd look at something like C++'s std::vector,
which is a
> > variable length array, and model your situation similarly. I doubt
there's
> > anything in particular you'll want to/be able to teach the
optimizations
> > about your situation (nothing especially special that they know about
> > std::vector-like things either, that I know of - they maybe can deduce
> > certain things about how the bounds relate, and they certainly can
optimize
> > a lot of std::vector usage) & debug info would probably look like
> > std::vector, in that it'd be a custom type, etc. Though if my
guess above
> > was right about using prefix data to describe the bounds - that might
be
> > hard to model in DWARF & you might be better off not being
"tricky" like
> > that & modelling this closer to something that you could have
written in C
> > or C++ more naturally.
> > 
> > On Tue, Nov 12, 2019 at 4:14 AM Christoffer Lernö via llvm-dev <
> > llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
> > 
> >> I’m considering building in variable arrays by implementing them
as a
> >> stretchy buffer, that is a single allocation with header +
elements with
> >> the pointer passed around pointing to the first element. (Example:
> >>
https://www.gamasutra.com/blogs/NiklasGray/20180109/312683/Minimalist_container_library_in_C_part_1.php
<https://www.gamasutra.com/blogs/NiklasGray/20180109/312683/Minimalist_container_library_in_C_part_1.php>
> >> )
> >> 
> >> Is there a good way to represent this in LLVM? I mean both in
terms of
> >> helping the optimizer passes understand how the layout works and
to make
> >> sure the debug info looks ok.
> >> 
> >> 
> >> Best regards,
> >> Christoffer
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191112/cbf2f15e/attachment.html>

David Blaikie via llvm-dev

2019-Nov-13 01:33 UTC

head link

[llvm-dev] The best way of generating a good representation for an array with header?

On Tue, Nov 12, 2019 at 2:36 PM Christoffer Lernö <christoffer at
aegik.com>
wrote:
> The advantages:
>
> 1. A pointer to the struct offset can be converted to a pointer without
> any cost.
> 2. A nullpointer to a stretchy buffer can be treated as a zero length
> array. Consequently no actual struct allocation is needed to represent a
> zero length array.
>
(2) could probably be done as well with the other representation (with the
prefix data representation you'd still have to special case the null test
before going backwards from the pointer to find the size - because you
wouldn't want to go backwards from null and try to read bytes there to find
the size).

> 3. A reference to the array is the same size as to a pointer.
>
(3) Would be true with either representation I'm picturing. (pointing to
the start of the struct you've described, rather tahn pointing to the
trailing array and walking backwards to find the rest)

> 4. It can be converted to and back from an pointer without losing any
> information about the size & capacity.
>
Converting back you have to know it's the start of an array, though, right?
(& you could still do that in the other representation - by subtraction,
but yes, wouldn't be free/zero-cost)

> The downsides are what we discuss. But it looks like I have to accept that
> I can only represent it as a pointer with unknown length in DWARF then?
>
I imagine it'd be difficult to describe the calling convention for passing
this array? Are you going to have instances of this on the stack and passed
by value? or only ever individually dynamically allocated & pointed to from
other places?

If they were individually dynamically allocated & only ever pointed to the
trailing array - the DWARF for that'd be pretty simple - basically just a
special pointer type. The pretty printer for it would know it was allowed
to walk backwards to find some extra things (as would any member functions,
if needed, be implemented in terms of walking the poniter back).

- dave

>
> Best Regards,
>
> Christoffer
>
> On 12 Nov 2019, at 21:49, David Blaikie <dblaikie at gmail.com>
wrote:
>
> On Tue, Nov 12, 2019 at 12:44 PM Christoffer Lernö via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Yes, we’re actually viewing the struct at an offset.
>>
>> So basically it’s a struct like this:
>>
>> typedef struct {
>>   uint32_t size;
>>   uint32_t capacity;
>>   int array[0];
>> } Foo;
>>
>> The whole thing is malloc:ed with extra bytes at the end, and capacity
is
>> set to that same number of extra bytes.
>>
>> What’s then passed around is actually the int pointer at an offset:
>> &(foo->array)
>>
>> Using the that pointer we can obviously in a simple way recover the
>> pointer to the struct, but can it be done so that LLVM and DWARF can
>> identify the pointer as a pointer to a struct member for a certain
struct?
>>
>> std::vector is as far as I know wrapping a pointer or two.
>>
>> The advantage of a stretchy buffer is that its length is recoverable
even
>> if stored as a pointer.
>
>
> What's the advantage compared to a pointer to the struct, rather than a
> pointer to the array? (a pointer to this first element of the array would
> still have to be tagged differently from a pointer to an arbitrary int
> (either a singular int or an int somewhere in the array) to indicate that
> you can backtrack to find the length - so it's not like you get to
> generalize all int pointers) - I wouldn't expect (but don't know
that much)
> that the extra constant offset on array indexing would be particularly
> expensive/observable?
>
> But yeah, I think you'd probably have some trouble getting DWARF
consumers
> to handle the idea that the parameter type to a function is more than the
> type itself, or that pointers to that type actually point into the middle
> of the object instead of the start.
>
> Not insurmountable, but seems a bit expensive/complicated to try to make
> that work - but don't know what your other constraints/data are.
>
>
>> It’s also incredibly thin, only taking up the same size as a pointer –
as
>> opposed to std::vector which is likely 2 pointers long.
>>
>>
>> Best Regards,
>>
>> Christoffer
>>
>> > Date: Tue, 12 Nov 2019 11:34:42 -0800
>> > From: David Blaikie via llvm-dev <llvm-dev at
lists.llvm.org>
>> >
>> > the pointer points to the first element, and you walk backwards
from
>> there
>> > to find the header details about the bounds/etc?
>> >
>> > In any case - I'd look at something like C++'s
std::vector, which is a
>> > variable length array, and model your situation similarly. I doubt
>> there's
>> > anything in particular you'll want to/be able to teach the
optimizations
>> > about your situation (nothing especially special that they know
about
>> > std::vector-like things either, that I know of - they maybe can
deduce
>> > certain things about how the bounds relate, and they certainly can
>> optimize
>> > a lot of std::vector usage) & debug info would probably look
like
>> > std::vector, in that it'd be a custom type, etc. Though if my
guess
>> above
>> > was right about using prefix data to describe the bounds - that
might be
>> > hard to model in DWARF & you might be better off not being
"tricky" like
>> > that & modelling this closer to something that you could have
written
>> in C
>> > or C++ more naturally.
>> >
>> > On Tue, Nov 12, 2019 at 4:14 AM Christoffer Lernö via llvm-dev
<
>> > llvm-dev at lists.llvm.org> wrote:
>> >
>> >> I’m considering building in variable arrays by implementing
them as a
>> >> stretchy buffer, that is a single allocation with header +
elements
>> with
>> >> the pointer passed around pointing to the first element.
(Example:
>> >>
>>
https://www.gamasutra.com/blogs/NiklasGray/20180109/312683/Minimalist_container_library_in_C_part_1.php
>> >> )
>> >>
>> >> Is there a good way to represent this in LLVM? I mean both in
terms of
>> >> helping the optimizer passes understand how the layout works
and to
>> make
>> >> sure the debug info looks ok.
>> >>
>> >>
>> >> Best regards,
>> >> Christoffer
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191112/837e47ce/attachment.html>

Christoffer Lernö via llvm-dev

2019-Nov-13 08:42 UTC

head link

[llvm-dev] The best way of generating a good representation for an array with header?

> On 13 Nov 2019, at 02:33, David Blaikie <dblaikie at gmail.com>
wrote:
> On Tue, Nov 12, 2019 at 2:36 PM Christoffer Lernö <christoffer at
aegik.com <mailto:christoffer at aegik.com>> wrote:
> The advantages:
> 
> 1. A pointer to the struct offset can be converted to a pointer without any
cost.
> 2. A nullpointer to a stretchy buffer can be treated as a zero length
array. Consequently no actual struct allocation is needed to represent a zero
length array.
> 
> (2) could probably be done as well with the other representation (with the
prefix data representation you'd still have to special case the null test
before going backwards from the pointer to find the size - because you
wouldn't want to go backwards from null and try to read bytes there to find
the size).
But that runs into (1).

In the stretchy buffer, converting to a pointer is essentially a NOP.

For the case you suggest, with the solution in (2) would be the following:

1. Is it NULL? If so return NULL.
2. Otherwise return current pointer + size of header as the pointer to the data.

There is then the question of what happens to indexing.

If we do the above conversion to pointer then we are doing an addition
comparison and add.

For deref we can optimize away the NULL check for indexing though (assuming UB
to index an array out of bounds), for x86 I think the add is included in
indexing, but for ARM it’s not.

It might be a small cost to pay.
>  
> 3. A reference to the array is the same size as to a pointer.
> 
> (3) Would be true with either representation I'm picturing. (pointing
to the start of the struct you've described, rather tahn pointing to the
trailing array and walking backwards to find the rest)
Compared to implementations like std::vector I meant.>  
> 4. It can be converted to and back from an pointer without losing any
information about the size & capacity.
> 
> Converting back you have to know it's the start of an array, though,
right? (& you could still do that in the other representation - by
subtraction, but yes, wouldn't be free/zero-cost)
Here I’m again mostly thinking about std::vector.
> The downsides are what we discuss. But it looks like I have to accept that
I can only represent it as a pointer with unknown length in DWARF then?
> 
> I imagine it'd be difficult to describe the calling convention for
passing this array? Are you going to have instances of this on the stack and
passed by value? or only ever individually dynamically allocated & pointed
to from other places?
The array is always passed as reference, but could potentially be put on the
stack but would still always be used as a reference.> 
> If they were individually dynamically allocated & only ever pointed to
the trailing array - the DWARF for that'd be pretty simple - basically just
a special pointer type. The pretty printer for it would know it was allowed to
walk backwards to find some extra things (as would any member functions, if
needed, be implemented in terms of walking the poniter back).
Alright, so then maybe this is solvable after all? Is there any docs on how to
do this the proper way in LLVM? And perhaps also where to patch this into a fork
of LLDB?

Best Regards,

Christoffer
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191113/9866a9dc/attachment.html>

llvm dev - Nov 2019 - The best way of generating a good representation for an array with header?

[llvm-dev] The best way of generating a good representation for an array with header?

[llvm-dev] The best way of generating a good representation for an array with header?

[llvm-dev] The best way of generating a good representation for an array with header?