thr3ads.net - llvm dev - [llvm-dev] RFC: Should SmallVectors be smaller? [Jun 2018]


Duncan P. N. Exon Smith via llvm-dev

2018-Jun-22 04:16 UTC

[llvm-dev] RFC: Should SmallVectors be smaller?

> On Jun 21, 2018, at 18:38, Chris Lattner <clattner at nondot.org> wrote:
>
>> On Jun 21, 2018, at 9:52 AM, Duncan P. N. Exon Smith via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> I've been curious for a while whether SmallVectors have the right
>> speed/memory tradeoff.  It would be straightforward to shave off a couple of
>> pointers (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users could
>> afford to test for small-mode vs. large-mode.
>
> Something like this could definitely work, but most smallvectors are on the
> stack.  They are intentionally used when sizeof(smallvector) isn’t important, so
> I don’t think this optimization will pay off.

For better or worse (mostly worse), there are a ton of SmallVector fields in
data structures, including some even nested inside other SmallVectors (e.g., see
the cleanup in r235229).  Often these data structures are heap-allocated.

> Out of curiosity, what brings this up?

I've noticed that Clang is using more stack recently (we're seeing more
crashes from template recursion; it seems the template recursion limit needs to
shrink), and somehow that train of thought led to this.

I share your skepticism that it will help stack usage much, but
SmallVector/SmallVectorImpl is so ubiquitous, it could help the heap a bit.  And
if it doesn’t hurt runtime performance in practice, there’s no reason to fork
the data structure.

If no one has measured before I might try it some time.

> -Chris
> 
> 
>> 
>> The current scheme works out to something like this:
>> ```
>> template <class T, size_t SmallCapacity>
>> struct SmallVector {
>>   T *BeginX, *EndX, *CapacityX; // begin, end, and end-of-allocation pointers
>>   T Small[SmallCapacity];       // inline storage used in small mode
>>
>>   bool isSmall() const { return BeginX == Small; }
>>   T *begin() { return BeginX; }
>>   T *end() { return EndX; }
>>   size_t size() const { return EndX - BeginX; }
>>   size_t capacity() const { return CapacityX - BeginX; }
>> };
>> ```
>> 
>> In the past I used something more like:
>> ```
>> template <class T, size_t SmallCapacity>
>> struct SmallVector2 {
>>   unsigned Size;
>>   unsigned Capacity;
>>   union {
>>     T Small[SmallCapacity]; // inline storage used in small mode
>>     T *Large;               // heap allocation used in large mode
>>   };
>>
>>   // Or a bit shaved off of Capacity.
>>   bool isSmall() const { return Capacity == SmallCapacity; }
>>   T *begin() { return isSmall() ? Small : Large; }
>>   T *end() { return begin() + Size; }
>>   size_t size() const { return Size; }
>>   size_t capacity() const { return Capacity; }
>> };
>> ```
>> 
>> I'm curious whether this scheme would really be slower in practice (as a
>> complete replacement for `SmallVector` in ADT).  I wonder, has anyone
>> profiled something like this before?  If so, in what context?  On what
>> workloads?
>> 
>> Duncan
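
For a rough sense of the sizes being discussed, the two layouts quoted above can be compared with `sizeof`. This is a minimal sketch, not LLVM code: the names `ThreePointerLayout`, `UnionLayout`, `LayoutA`, `LayoutB`, and `BytesSaved` are hypothetical, it assumes a trivially constructible element type, and the numbers depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>

// Layout A (hypothetical name): three pointers plus inline storage,
// mirroring the "current scheme" quoted above.
template <class T, std::size_t SmallCapacity>
struct ThreePointerLayout {
  T *BeginX, *EndX, *CapacityX;
  T Small[SmallCapacity];
};

// Layout B (hypothetical name): two 32-bit counters plus a union of the
// inline storage and the heap pointer, mirroring SmallVector2 above.
template <class T, std::size_t SmallCapacity>
struct UnionLayout {
  unsigned Size;
  unsigned Capacity;
  union {
    T Small[SmallCapacity];
    T *Large;
  };
};

// Compare both layouts for four inline pointers.
using LayoutA = ThreePointerLayout<void *, 4>;
using LayoutB = UnionLayout<void *, 4>;
constexpr std::size_t BytesSaved = sizeof(LayoutA) - sizeof(LayoutB);
```

On a typical 64-bit target (8-byte pointers, 4-byte `unsigned`) `BytesSaved` works out to 16, matching the "2 pointers/16B" figure; on a typical 32-bit target it is 4.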

Reid Kleckner via llvm-dev

2018-Jun-22 22:18 UTC

[llvm-dev] RFC: Should SmallVectors be smaller?

On Thu, Jun 21, 2018 at 9:16 PM Duncan P. N. Exon Smith via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Jun 21, 2018, at 18:38, Chris Lattner <clattner at nondot.org> wrote:
>
>
>
> On Jun 21, 2018, at 9:52 AM, Duncan P. N. Exon Smith via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> I've been curious for a while whether SmallVectors have the right
> speed/memory tradeoff.  It would be straightforward to shave off a couple
> of pointers (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users
> could afford to test for small-mode vs. large-mode.
>
>
> Something like this could definitely work, but most smallvectors are on
> the stack.  They are intentionally used when sizeof(smallvector) isn’t
> important, so I don’t think this optimization will pay off.
>
>
> For better or worse (mostly worse), there are a ton of SmallVector fields
> in data structures, including some even nested inside other SmallVectors
> (e.g., see the cleanup in r235229).  Often these data structures are
> heap-allocated.
>
Yes, this is a huge problem. We seriously overuse SmallVector. I think in
CodeViewDebug.cpp we had a DenseMap of a struct which had a SmallVector of
structs that contained SmallVectors. It was silly.

> Out of curiosity, what brings this up?
>
> I've noticed that Clang is using more stack recently (we're seeing more
> crashes from template recursion; it seems the template recursion limit
> needs to shrink), and somehow that train of thought led to this.
>
> I share your skepticism that it will help stack usage much, but
> SmallVector/SmallVectorImpl is so ubiquitous, it could help the heap a
> bit.  And if it doesn’t hurt runtime performance in practice, there’s no
> reason to fork the data structure.
>
> If no one has measured before I might try it some time.
>
I think it's important to keep begin(), end(), and indexing operations
branchless, so I'm not sure this pointer union is the best idea. I haven't
profiled, but that's my intuition. If you wanted to limit all our vectors
to 4 billion elements to save a pointer, I'd probably be fine with that.
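
The alternative Reid mentions, capping size and capacity at 32 bits while keeping a real begin pointer, might look like the sketch below. It is illustrative only: `CompactVector` and `CompactPtrVec` are hypothetical names, growth and copying are omitted, and it is not presented as LLVM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: one pointer plus 32-bit size and capacity.
// begin(), end(), and indexing never test for small mode.
// Copying is not handled here: a copy's BeginX would still point at the
// source object's inline storage.
template <class T, unsigned SmallCapacity>
struct CompactVector {
  T *BeginX = Small;                      // points at Small until growth
  std::uint32_t Size = 0;                 // at most about 4 billion elements
  std::uint32_t Capacity = SmallCapacity;
  T Small[SmallCapacity];                 // inline storage

  bool isSmall() const { return BeginX == Small; }
  T *begin() { return BeginX; }           // no branch
  T *end() { return BeginX + Size; }      // no branch
  T &operator[](std::size_t I) { return BeginX[I]; }
  std::size_t size() const { return Size; }
  std::size_t capacity() const { return Capacity; }
};

using CompactPtrVec = CompactVector<void *, 4>;
```

With 8-byte pointers this header is 16 bytes instead of 24, which is the "save a pointer" case; the union is what would add the branch.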

I think we might be better off just reducing the pre-allocation size of
most of our SmallVectors across LLVM and Clang. They're all wild guesses,
never profiled. Especially for vectors of relatively "large" elements, the
pre-allocation optimization just doesn't make that much sense. I'd go as
far as to suggest providing a default SmallVector N value of something like
`sizeof(void*) * 3 / sizeof(T)`, i.e. by default, every SmallVector is at
most 6 pointers big.
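
As a worked example of that formula, the sketch below computes the suggested default for a few element types. `defaultInlineElements` and `Big` are hypothetical names used only for illustration, and the total of six pointers assumes the three-pointer header of the current layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the number of inline elements that keeps the inline
// storage within three pointers' worth of bytes.
template <class T>
constexpr std::size_t defaultInlineElements() {
  return sizeof(void *) * 3 / sizeof(T);
}

// An element type larger than three pointers gets no inline storage.
struct Big {
  void *Fields[4];
};
```

Pointer-sized elements get three inline slots, `char` gets `3 * sizeof(void *)`, and anything larger than three pointers, such as `Big`, rounds down to zero.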

---

Relatedly, there's a lot of work that can be done to tune DenseMap. When
the key and value pair is relatively large, we waste a lot of space on
empty table slots.

Duncan P. N. Exon Smith via llvm-dev

2018-Jun-23 16:11 UTC

[llvm-dev] RFC: Should SmallVectors be smaller?

> On Jun 22, 2018, at 15:18, Reid Kleckner <rnk at google.com> wrote:
>
> On Thu, Jun 21, 2018 at 9:16 PM Duncan P. N. Exon Smith via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>> Out of curiosity, what brings this up?
>>
>> I've noticed that Clang is using more stack recently (we're seeing
>> more crashes from template recursion; it seems the template recursion limit
>> needs to shrink), and somehow that train of thought led to this.
>>
>> I share your skepticism that it will help stack usage much, but
>> SmallVector/SmallVectorImpl is so ubiquitous, it could help the heap a bit.
>> And if it doesn’t hurt runtime performance in practice, there’s no reason
>> to fork the data structure.
>>
>> If no one has measured before I might try it some time.
>
> I think it's important to keep begin(), end(), and indexing operations
> branchless, so I'm not sure this pointer union is the best idea. I
> haven't profiled, but that's my intuition. If you wanted to limit all
> our vectors to 4 billion elements to save a pointer, I'd probably be fine
> with that.

Good point, there are two separable changes here and only the union part is
likely to have compile-time slowdowns.  I threw together
https://reviews.llvm.org/D48518 (currently building with ASan to run
check-llvm) and the surely uncontroversial https://reviews.llvm.org/D48516.

> I think we might be better off just reducing the pre-allocation size of
> most of our SmallVectors across LLVM and Clang. They're all wild guesses,
> never profiled. Especially for vectors of relatively "large" elements,
> the pre-allocation optimization just doesn't make that much sense. I'd
> go as far as to suggest providing a default SmallVector N value of something
> like `sizeof(void*) * 3 / sizeof(T)`, i.e. by default, every SmallVector is at
> most 6 pointers big.

Interesting idea... and then audit current instances to drop the size argument.

Note that a SmallVector with N value of 0 takes the same storage as an N value
of 1, so very large sizeof(T) would still use more than 6 pointers.  The cause
is that SmallVectorTemplateCommon stores the first element so that it can detect
small mode by comparing BeginX against &FirstEl.  The fix would be to shave
a bit off of capacity (dropping max capacity to 2B)... likely reasonable.
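
To illustrate the N = 0 point, here is a simplified model of the layout described above, followed by one way a flag bit could replace the `&FirstEl` comparison. All names (`CommonWithFirstEl`, `VecStorage`, `FlaggedHeader`, `Big`) are hypothetical, and the bit-field packing is what mainstream compilers do rather than something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model: the common base always holds the first inline element,
// so small mode can be detected by comparing BeginX against &FirstEl.
template <class T>
struct CommonWithFirstEl {
  T *BeginX, *EndX, *CapacityX;
  T FirstEl;
  bool isSmall() const { return BeginX == &FirstEl; }
};

// Storage for the remaining N - 1 inline elements.
template <class T, std::size_t N>
struct VecStorage : CommonWithFirstEl<T> {
  T Rest[N - 1];
};
// N == 1 and N == 0 add nothing, so both still pay for FirstEl.
template <class T> struct VecStorage<T, 1> : CommonWithFirstEl<T> {};
template <class T> struct VecStorage<T, 0> : CommonWithFirstEl<T> {};

// Possible alternative: record small mode in one bit taken from capacity,
// which caps capacity at 2^31 - 1 (about 2 billion) elements.
template <class T>
struct FlaggedHeader {
  T *BeginX;
  std::uint32_t Size;
  std::uint32_t Capacity : 31;
  std::uint32_t IsSmall : 1;
};

struct Big {
  void *Fields[4];
};
using ZeroInline = VecStorage<Big, 0>;
using OneInline = VecStorage<Big, 1>;
using ZeroInlineFlagged = FlaggedHeader<Big>;
```

In this model `ZeroInline` and `OneInline` are both seven pointers wide for the four-pointer `Big`, while the flagged header carries no element at all.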

If we're going to audit anyway, I wonder if forking names would make sense. 
E.g., the current thing would be less tempting to use in data structures if it
were called StackVector.  But that wouldn't be a fun change to roll out
across sub-projects.
> ---
>
> Relatedly, there's a lot of work that can be done to tune DenseMap. When
> the key and value pair is relatively large, we waste a lot of space on
> empty table slots.

Chris Lattner via llvm-dev

2018-Jun-23 17:11 UTC

[llvm-dev] RFC: Should SmallVectors be smaller?

On Jun 21, 2018, at 9:16 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> Something like this could definitely work, but most smallvectors are on
>> the stack.  They are intentionally used when sizeof(smallvector) isn’t
>> important, so I don’t think this optimization will pay off.
>
> For better or worse (mostly worse), there are a ton of SmallVector fields
> in data structures, including some even nested inside other SmallVectors
> (e.g., see the cleanup in r235229).  Often these data structures are
> heap-allocated.

Egads, that seems like a big issue, and (most of the time, but perhaps not
always) a misuse of SmallVector.  It seems that this should be fixed, instead of
changing smallvector.

>> Out of curiosity, what brings this up?
>
> I've noticed that Clang is using more stack recently (we're seeing
> more crashes from template recursion; it seems the template recursion limit
> needs to shrink), and somehow that train of thought led to this.
>
> I share your skepticism that it will help stack usage much, but
> SmallVector/SmallVectorImpl is so ubiquitous, it could help the heap a bit.
> And if it doesn’t hurt runtime performance in practice, there’s no reason
> to fork the data structure.
>
> If no one has measured before I might try it some time.

I’m not familiar with the modern structure of the code, but is there any chance
these algorithms can be reworked to be iterative instead of recursive?
Shrinking individual frames may buy some time, but isn’t really a fix.

-Chris
