thr3ads.net - llvm dev - [llvm-dev] Accelerating TLI getLibFunc lookups [Apr 2019]

Philip Reames via llvm-dev

2019-Apr-24 00:50 UTC

[llvm-dev] Accelerating TLI getLibFunc lookups

TLDR: Figuring out whether a declaration is a TLI LibFunc is slow.  We
hammer that path in CGP.  I'm proposing storing the ID of a TLI LibFunc
in the same IntID field in Function we use for IntrinsicID to make that
fast.

Looking into a compile time issue during codegen (LLC) for a large IR
file, I came across something interesting.  Due to the presence of a
very large number of intrinsics in the particular example, we were
spending almost 30% of time in CodeGenPrep::optimizeCallInst, and within
that, almost all of it in the FortifiedLibCallSimplifier.  Now, since
the IR file in question has no fortified libcalls, that seemed a bit odd.

Looking into it, it turns out that figuring out that an arbitrary direct
call is *not* a call to a LibCall requires the same full name normalization
and table lookup that a successful one does.  We could simply make the
lookup itself faster - it looks like we could probably tablegen a near
optimal character switch lookup table - but that still leaves us with
the normalization.  We could cache the lookup, but then we have an
analysis invalidation problem for all users of TLI.  Not unsolvable, but
not fun if we have a better option. 
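
To make the cost concrete, here is a minimal, self-contained sketch of the lookup shape described above: normalize the name, then binary search a sorted name table. It is a simplified model rather than the actual TLI code; `LibFuncId`, `Table`, `normalize`, and `lookupLibFunc` are hypothetical names, and the four-entry table stands in for the real, much longer list.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the LibFunc enum.
enum class LibFuncId { memcpy_chk, memset_chk, strcpy_chk, strlen };

struct Entry {
  std::string_view Name;
  LibFuncId Id;
};

// Must stay sorted by name so that binary search is valid.
constexpr std::array<Entry, 4> Table{{
    {"__memcpy_chk", LibFuncId::memcpy_chk},
    {"__memset_chk", LibFuncId::memset_chk},
    {"__strcpy_chk", LibFuncId::strcpy_chk},
    {"strlen", LibFuncId::strlen},
}};

// Assumed normalization: reject empty names and names with embedded NULs,
// and strip a leading '\x01' mangling-escape byte.
std::optional<std::string_view> normalize(std::string_view Name) {
  if (Name.empty() || Name.find('\0') != std::string_view::npos)
    return std::nullopt;
  if (Name.front() == '\x01')
    Name.remove_prefix(1);
  return Name;
}

// A miss pays for the normalization and the whole search, same as a hit.
std::optional<LibFuncId> lookupLibFunc(std::string_view Name) {
  std::optional<std::string_view> Clean = normalize(Name);
  if (!Clean)
    return std::nullopt;
  auto It = std::lower_bound(
      Table.begin(), Table.end(), *Clean,
      [](const Entry &E, std::string_view N) { return E.Name < N; });
  if (It != Table.end() && It->Name == *Clean)
    return It->Id;
  return std::nullopt;
}
```

In this model, a call to something like `llvm.dbg.value` walks the same path as `strlen` before the lookup can report that it is not a library function.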

Instead, I noticed that we have no overlap between intrinsics, and
target library functions.  Assuming we're happy with that, and don't see
that changing in the future, that gives us an opportunity.  We could
cache the libfunc ID into the Function itself, just like we do for
intrinsics today.

What would that look like in practice you ask?

  * We'd move the definition of LibFunc into
    include/IR/TargetLibraryFunctions.h/def (only the enum, not the rest
    of TLI)
  * We'd change the IntID field in GlobalValue to be a union of
    IntrinsicID and LibFunc.
  * We'd change Function::getIntrinsicID to check the HasLLVMReservedName
    flag (already existing), and return Intrinsic::not_intrinsic value if
    not set.
  * We'd add a corresponding getLibFuncID, and isLibFunc function to
    Function.
  * We'd modify recalculateIntrinsicID to compute the libfunc enum as well.
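
The steps above can be sketched as a small standalone model. This is not a patch against LLVM: the `Function` class here is a toy, the two lookup helpers are hypothetical stand-ins for the real name lookups, the toy intrinsic names are matched exactly, and the `not_libfunc` sentinel is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

// Toy enums; the sentinels with value 0 are assumptions of this sketch.
enum class IntrinsicId : std::uint32_t { not_intrinsic = 0, memcpy, dbg_value };
enum class LibFuncId : std::uint32_t { not_libfunc = 0, strlen, memcpy_chk };

// Hypothetical stand-ins for the slow name-based lookups.
IntrinsicId lookupIntrinsic(std::string_view Name) {
  if (Name == "llvm.memcpy") return IntrinsicId::memcpy;
  if (Name == "llvm.dbg.value") return IntrinsicId::dbg_value;
  return IntrinsicId::not_intrinsic;
}

LibFuncId lookupLibFunc(std::string_view Name) {
  if (Name == "strlen") return LibFuncId::strlen;
  if (Name == "__memcpy_chk") return LibFuncId::memcpy_chk;
  return LibFuncId::not_libfunc;
}

class Function {
  std::string Name;
  std::uint32_t IntID = 0;          // shared storage: intrinsic ID or libfunc ID
  bool HasLLVMReservedName = false; // says which of the two IntID holds

  // Runs on construction and on every rename, so queries are a field read.
  void recalculateID() {
    HasLLVMReservedName = Name.rfind("llvm.", 0) == 0;
    if (HasLLVMReservedName) {
      // The scheme relies on intrinsic and libfunc names never overlapping.
      assert(lookupLibFunc(Name) == LibFuncId::not_libfunc &&
             "intrinsic and libfunc names must not overlap");
      IntID = static_cast<std::uint32_t>(lookupIntrinsic(Name));
    } else {
      IntID = static_cast<std::uint32_t>(lookupLibFunc(Name));
    }
  }

public:
  explicit Function(std::string N) : Name(std::move(N)) { recalculateID(); }

  void setName(std::string N) {
    Name = std::move(N);
    recalculateID();
  }

  IntrinsicId getIntrinsicID() const {
    return HasLLVMReservedName ? static_cast<IntrinsicId>(IntID)
                               : IntrinsicId::not_intrinsic;
  }

  LibFuncId getLibFuncID() const {
    return HasLLVMReservedName ? LibFuncId::not_libfunc
                               : static_cast<LibFuncId>(IntID);
  }

  bool isLibFunc() const { return getLibFuncID() != LibFuncId::not_libfunc; }
};
```

The name lookup moves to construction and renaming, which is where the "slightly slower" cost discussed below would come from; `isLibFunc` itself no longer touches the name.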

The tradeoff is that function construction and renaming would become
slightly slower, but determining whether a function was a library
function would become fast.  We could also populate the value lazily,
but that seems like complexity with little benefit. 

Thoughts?  Objections?  Better ideas?

If folks are on board with this, I'm happy to prepare a patch.

Philip

Björn Pettersson A via llvm-dev

2019-Apr-24 14:55 UTC

So if we know that there are no intrinsic calls being simplified
by the FortifiedLibCallSimplifier, then I guess that we could
early out and return false inside the if

  IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
  if (II) {
     ...
  }

that is before the FortifiedLibCallSimplifier simplifications.

That would at least reduce the amount of time spent in the
FortifiedLibCallSimplifier trying to look up intrinsics.

And if there are some intrinsics that should be dealt with by
the FortifiedLibCallSimplifier we could make sure we fall through
by adding some cases to the switch inside that if statement.

Such a solution is of course not as general as the one you are
suggesting, but it might be a simple solution if the problem
is that we try to look up intrinsic calls inside the
FortifiedLibCallSimplifier.

/Björn
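
The early-out suggested in the message above can be modeled with a few lines of standalone C++. This is only an illustration of the control flow, not CodeGenPrepare itself: `Call`, `FortifiedSimplifier`, and the `Lookups` counter are hypothetical, and the counter stands in for the expensive name lookup.

```cpp
#include <string_view>

// Toy call site: just a callee name and whether it is an intrinsic call.
struct Call {
  std::string_view CalleeName;
  bool IsIntrinsic;
};

// Toy simplifier that counts how often the expensive lookup would run.
struct FortifiedSimplifier {
  int Lookups = 0;

  bool tryOptimize(const Call &C) {
    ++Lookups; // stands in for name normalization plus table search
    return C.CalleeName == "__memcpy_chk";
  }
};

bool optimizeCallInst(const Call &C, FortifiedSimplifier &Simplifier) {
  if (C.IsIntrinsic) {
    // Intrinsic-specific handling would go here. Returning afterwards means
    // intrinsic calls never reach the libcall lookup at all.
    return false;
  }
  return Simplifier.tryOptimize(C);
}
```

With this shape, a module full of intrinsic calls performs no lookups, while every non-intrinsic call still pays for one.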

Philip Reames via llvm-dev

2019-Apr-24 15:42 UTC

On 4/24/19 7:55 AM, Björn Pettersson A wrote:
> So if we know that there are no intrinsic calls being simplified
> by the FortifiedLibCallSimplifier, then I guess that we could
> early out and return false inside the if
>
>   IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
>   if (II) {
>      ...
>   }
>
> that is before the FortifiedLibCallSimplifier simplifications.
>
> That would at least reduce the amount of time spent in the
> FortifiedLibCallSimplifier trying to look up intrinsics.
>
> And if there are some intrinsics that should be dealt with by
> the FortifiedLibCallSimplifier we could make sure we fall through
> by adding some cases to the switch inside that if statement.
>
> Such a solution is of course not as general as the one you are
> suggesting, but it might be a simple solution if the problem
> is that we try to look up intrinsic calls inside the
> FortifiedLibCallSimplifier.
>
> /Björn

Your framing does work for this particular use case. This is actually
how I figured out what was going on.  But while CGP is the codepath
which showed up hot in this example, I'm sure there are others.
Filtering within getLibFunc is also a possibility, which is slightly
more general.  But both variants leave open the same problem for a
module which is heavy on non-intrinsic non-libfunc calls.  The advantage
of my proposed scheme is that all calls are treated equally.
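
As a rough illustration of the "filtering within getLibFunc" variant and its limit, the sketch below rejects `llvm.`-prefixed names before doing any real work. `LibFuncTable`, its two-entry name set, and the `FullLookups` counter are hypothetical and exist only to show which calls still pay for a full lookup.

```cpp
#include <set>
#include <string_view>

struct LibFuncTable {
  // Toy name set; the string literals have static storage, so the views
  // stay valid for the lifetime of the program.
  std::set<std::string_view> Names{"__memcpy_chk", "strlen"};
  int FullLookups = 0; // counts calls that got past the cheap filter

  bool isLibFunc(std::string_view Name) {
    // Cheap filter: intrinsic names can be rejected by prefix alone.
    if (Name.substr(0, 5) == "llvm.")
      return false;
    ++FullLookups; // everything else still pays for the full lookup
    return Names.find(Name) != Names.end();
  }
};
```

An ordinary function such as a hypothetical `my_helper` still goes through the full lookup here, which is the remaining problem for modules heavy on non-intrinsic, non-libfunc calls.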
