thr3ads.net - llvm dev - [llvm-dev] (RFC) Encoding code duplication factor in discriminator [Nov 2016]

Robinson, Paul via llvm-dev

2016-Oct-27 21:53 UTC

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

It looks like the example doesn't use the encoding described in the text?

Assume that the discriminator is a uint32. The traditional discriminator is less
than 256, so let's take 8 bits for it. For the duplication factor (type 1
duplication), we assume the maximum unroll_factor * vectorize_factor is less
than 256, thus 8 bits for it. For the unique number (type 2 duplication), we
assume code is duplicated at most 32 times, thus 5 bits for it. Overall, we still
have 11 free bits left in the discriminator encoding.
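
The allocation described above can be sketched as follows. This is only an illustration: it assumes the three fields are packed from the low bits upward in the order traditional discriminator, duplication factor, unique number, and the names (pack_discriminator, free_bits, the *_BITS constants) are hypothetical, not LLVM APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths taken from the text: 8 + 8 + 5 = 21 bits in use. */
enum { BASE_BITS = 8, DUP_BITS = 8, COPY_BITS = 5 };

/* Pack the three fields into a 32-bit discriminator, lowest field first.
   The field order is an assumption made for this sketch. */
static uint32_t pack_discriminator(uint32_t base, uint32_t dup_factor,
                                   uint32_t copy_id)
{
    assert(base < (1u << BASE_BITS));
    assert(dup_factor < (1u << DUP_BITS));
    assert(copy_id < (1u << COPY_BITS));
    return base
         | (dup_factor << BASE_BITS)
         | (copy_id << (BASE_BITS + DUP_BITS));
}

/* Bits of the uint32 that the three fields leave unused. */
static int free_bits(void)
{
    return 32 - (BASE_BITS + DUP_BITS + COPY_BITS);
}
```

Under these assumptions, a duplication factor of 4 with a zero traditional discriminator packs to 0x400, and unique numbers 1 through 3 pack to 0x10000, 0x20000, and 0x30000.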

Let's take the original source as an example. After loop unrolling and
peeling, the code may look like:

for (i = 0; i < (N & ~3); i += 4) {
  foo();  // discriminator: 0x40
  foo();  // discriminator: 0x40
  foo();  // discriminator: 0x40
  foo();  // discriminator: 0x40
}
if (i++ < N) {
  foo();   // discriminator: 0x100
  if (i++ < N) {
    foo(); // discriminator: 0x200
    if (i++ < N) {
      foo();  // discriminator: 0x300
    }
  }
}

If we allocate 8 bits to "traditional" discriminators, then 0x40 falls
into that range, so I'd think the calls to foo() inside the loop should be
using 0x400 to encode the unroll factor.  Note this requires 2 bytes for ULEB128
instead of 1.
And if we allocate another 8 bits to the unroll factor, then the trailing calls
should use 0x10000, 0x20000, 0x30000.  These will require 3 bytes for ULEB128
instead of 2.
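
The byte counts quoted above follow from ULEB128 carrying 7 payload bits per byte. A minimal encoder sketch makes the arithmetic checkable; the function name uleb128_encode is hypothetical and the caller is assumed to supply a buffer of at least 10 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value as ULEB128 into out and return the number of bytes written.
   Each byte holds 7 payload bits; the high bit marks "more bytes follow".
   A 64-bit value needs at most 10 bytes. */
static size_t uleb128_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;   /* continuation bit */
        out[n++] = byte;
    } while (value != 0);
    return n;
}
```

With this encoder, 0x40 takes 1 byte, 0x400 and 0x300 take 2 bytes, and 0x10000 through 0x30000 take 3 bytes.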

I think it would be fine to allocate only 4 bits to "traditional"
discriminators. You need them to be unique across the same source location,
but not across different source locations, and 16 independent basic blocks for
the same source location seems like plenty to me (though I haven't looked at a
lot of cases with discriminators).  This would allow you to use 0x40 to encode
the unroll factor in this example.  If you still want to allocate 8 bits for
the unroll factor, then the trailing calls would use 0x1000, 0x2000, 0x3000, so
as long as you had no more than 3 trailing calls you could still encode the
discriminator in a 2-byte ULEB128.
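
To compare the two allocations side by side, here is a small sketch with the field widths as parameters. The struct and function names are hypothetical, and the field order (traditional discriminator lowest, then unroll factor, then the trailing-copy number) is the same assumption used in the discussion above.

```c
#include <stdint.h>

/* Widths of the two low fields; the copy number sits above both. */
struct disc_layout {
    unsigned base_bits;   /* "traditional" discriminator */
    unsigned dup_bits;    /* unroll (duplication) factor */
};

/* Discriminator value for a given unroll factor, other fields zero. */
static uint32_t encode_dup(struct disc_layout l, uint32_t dup_factor)
{
    return dup_factor << l.base_bits;
}

/* Discriminator value for the n-th trailing copy, other fields zero. */
static uint32_t encode_copy(struct disc_layout l, uint32_t copy_id)
{
    return copy_id << (l.base_bits + l.dup_bits);
}

/* A value fits in a 2-byte ULEB128 when it needs at most 14 bits. */
static int fits_in_two_uleb_bytes(uint32_t value)
{
    return value < (1u << 14);
}
```

With a {4, 8} layout, an unroll factor of 4 encodes as 0x40 and copies 1 to 3 encode as 0x1000 to 0x3000, all within 2 bytes; a fourth copy would be 0x4000 and spill into a third byte. With the {8, 8} layout the same inputs give 0x400 and 0x10000 to 0x30000.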
--paulr

Robinson, Paul via llvm-dev

2016-Nov-01 17:16 UTC

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

The largest discriminator is 779 (coming from 471.omnetpp, which has a
significant amount of EH code).

779 distinct blocks coming from a single source location?  That's
astounding.

Or something like:
high bits   ---------->  low bits
EEEEEEEECCCCCFFDDD CFFFDDD CCFFFDD

So the lower 7 bits should be able to cover the 85th percentile and the lower
14 bits should be able to cover the 99th percentile.
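
One way to read the bit diagram above is sketched below. Several things here are assumptions rather than statements from the thread: D, F, and C are taken to be the traditional discriminator, the duplication factor, and the copy number; each lower 7-bit group is taken to hold the lower-order bits of its field; and the E bits are left at zero because their meaning is not given here. The function names are hypothetical.

```c
#include <stdint.h>

/* Spread 8-bit D, F, C fields over the layout
     EEEEEEEE CCCCC FF DDD | C FFF DDD | CC FFF DD   (high -> low)
   so that small values land in the low 7-bit ULEB128 groups. */
static uint32_t interleave(uint32_t d, uint32_t f, uint32_t c)
{
    return  (d & 0x3)                    /* D[1:0] -> bits 0-1   */
          | ((f & 0x7) << 2)             /* F[2:0] -> bits 2-4   */
          | ((c & 0x3) << 5)             /* C[1:0] -> bits 5-6   */
          | (((d >> 2) & 0x7) << 7)      /* D[4:2] -> bits 7-9   */
          | (((f >> 3) & 0x7) << 10)     /* F[5:3] -> bits 10-12 */
          | (((c >> 2) & 0x1) << 13)     /* C[2]   -> bit 13     */
          | (((d >> 5) & 0x7) << 14)     /* D[7:5] -> bits 14-16 */
          | (((f >> 6) & 0x3) << 17)     /* F[7:6] -> bits 17-18 */
          | (((c >> 3) & 0x1f) << 19);   /* C[7:3] -> bits 19-23 */
}

/* Inverse operations: gather each field back out of the packed value. */
static uint32_t extract_d(uint32_t v)
{
    return (v & 0x3) | (((v >> 7) & 0x7) << 2) | (((v >> 14) & 0x7) << 5);
}

static uint32_t extract_f(uint32_t v)
{
    return ((v >> 2) & 0x7) | (((v >> 10) & 0x7) << 3)
         | (((v >> 17) & 0x3) << 6);
}

static uint32_t extract_c(uint32_t v)
{
    return ((v >> 5) & 0x3) | (((v >> 13) & 0x1) << 2)
         | (((v >> 19) & 0x1f) << 3);
}
```

Under this reading, any combination with D < 4, F < 8, and C < 4 stays below 128 and so fits in a single ULEB128 byte, while D < 32, F < 64, and C < 8 stays within 14 bits and so fits in two.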

Having a scheme for compact representation for the vast majority of cases is
great, and will really help keep the size of the section under control.  Did you
have a plan for the degenerate cases where one of these elements (D/F/C) exceeds
the specified capacity?  You already have one such case, because 779 does not
fit in 8 bits.
Thanks,
--paulr
