Philip Reames via llvm-dev
2019-Apr-24 00:50 UTC
[llvm-dev] Accelerating TLI getLibFunc lookups
TLDR: Figuring out whether a declaration is a TLI LibFunc is slow. We hammer that path in CGP. I'm proposing storing the ID of a TLI LibFunc in the same IntID field in Function we use for IntrinsicID to make that fast.

Looking into a compile time issue during codegen (LLC) for a large IR file, I came across something interesting. Due to the presence of a very large number of intrinsics in the particular example, we were spending almost 30% of time in CodeGenPrep::optimizeCallInst, and within that, almost all of it in the FortifiedLibCallSimplifier. Since the IR file in question has no fortified libcalls, that seemed a bit odd.

Looking into it, it turns out that figuring out that an arbitrary direct call is *not* a call to a LibCall requires the same full name normalization and table lookup that a successful one does. We could simply make the lookup itself faster (it looks like we could probably tablegen a near optimal character switch lookup table), but that still leaves us with the normalization. We could cache the lookup, but then we have an analysis invalidation problem for all users of TLI. Not unsolvable, but not fun if we have a better option.

Instead, I noticed that we have no overlap between intrinsics and target library functions. Assuming we're happy with that, and don't see that changing in the future, that gives us an opportunity. We could cache the libfunc ID in the Function itself, just like we do for intrinsics today.

What would that look like in practice, you ask?

* We'd move the definition of LibFunc into include/IR/TargetLibraryFunctions.h/def (only the enum, not the rest of TLI).
* We'd change the IntID field in GlobalValue to be a union of IntrinsicID and LibFunc.
* We'd change Function::getIntrinsicID to check the HasLLVMReservedName flag (already existing), and return Intrinsic::not_intrinsic if it is not set.
* We'd add corresponding getLibFuncID and isLibFunc functions to Function.
* We'd modify recalculateIntrinsicID to compute the libfunc enum as well.

The tradeoff is that function construction and renaming would become slightly slower, but determining whether a function is a library function would become fast. We could also populate the value lazily, but that seems like complexity with little benefit.

Thoughts? Objections? Better ideas?

If folks are on board with this, I'm happy to prepare a patch.

Philip
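A minimal sketch of the caching idea in the list above, written against toy stand-ins rather than the real LLVM classes. ToyFunction, ToyLibFunc, ToyIntrinsic, and the two lookup helpers are hypothetical names invented for illustration. The sketch assumes C++17 and leaves out everything the real TLI lookup does beyond a plain name match (name normalization, prototype and availability checks).

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Toy stand-ins; none of these are the real LLVM declarations.
enum class ToyLibFunc : unsigned { NotLibFunc = 0, Memcpy, Strlen, MemcpyChk };
enum class ToyIntrinsic : unsigned { NotIntrinsic = 0, Memcpy, Ctpop };

// Slow path: a name-based table lookup, paid only when the name changes.
inline ToyLibFunc lookupLibFuncByName(std::string_view Name) {
  static const std::unordered_map<std::string_view, ToyLibFunc> Table = {
      {"memcpy", ToyLibFunc::Memcpy},
      {"strlen", ToyLibFunc::Strlen},
      {"__memcpy_chk", ToyLibFunc::MemcpyChk}};
  auto It = Table.find(Name);
  return It == Table.end() ? ToyLibFunc::NotLibFunc : It->second;
}

inline ToyIntrinsic lookupIntrinsicByName(std::string_view Name) {
  if (Name == "llvm.memcpy") return ToyIntrinsic::Memcpy;
  if (Name == "llvm.ctpop") return ToyIntrinsic::Ctpop;
  return ToyIntrinsic::NotIntrinsic;
}

class ToyFunction {
  std::string Name;
  unsigned IntID = 0;               // shared slot: intrinsic ID or libfunc ID
  bool HasLLVMReservedName = false; // selects how IntID is interpreted

  // Recompute the cached ID; runs on construction and on every rename.
  void recalculateID() {
    HasLLVMReservedName = Name.rfind("llvm.", 0) == 0; // starts with "llvm."
    IntID = HasLLVMReservedName
                ? static_cast<unsigned>(lookupIntrinsicByName(Name))
                : static_cast<unsigned>(lookupLibFuncByName(Name));
    // The scheme relies on intrinsics and library functions not overlapping.
    assert(!(HasLLVMReservedName && isLibFunc()));
  }

public:
  explicit ToyFunction(std::string N) : Name(std::move(N)) { recalculateID(); }

  void setName(std::string N) {
    Name = std::move(N);
    recalculateID();
  }

  ToyIntrinsic getIntrinsicID() const {
    return HasLLVMReservedName ? static_cast<ToyIntrinsic>(IntID)
                               : ToyIntrinsic::NotIntrinsic;
  }

  ToyLibFunc getLibFuncID() const {
    return HasLLVMReservedName ? ToyLibFunc::NotLibFunc
                               : static_cast<ToyLibFunc>(IntID);
  }

  // Fast query: a flag test and an integer compare, no string work.
  bool isLibFunc() const { return getLibFuncID() != ToyLibFunc::NotLibFunc; }
};
```

In this toy model the string lookup cost moves into the constructor and setName, which mirrors the tradeoff described above: construction and renaming get slightly slower, while isLibFunc becomes a constant-time check.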
Björn Pettersson A via llvm-dev
2019-Apr-24 14:55 UTC
[llvm-dev] Accelerating TLI getLibFunc lookups
So if we know that there are no intrinsic calls being simplified by the FortifiedLibCallSimplifier, then I guess that we could early out and return false inside the if

    IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
    if (II) {
      ...
    }

that comes before the FortifiedLibCallSimplifier simplifications.

That would at least reduce the amount of time spent in the FortifiedLibCallSimplifier trying to look up intrinsics.

And if there are some intrinsics that should be dealt with by the FortifiedLibCallSimplifier, we could make sure we fall through by adding some cases to the switch inside that if statement.

Such a solution is of course not as general as the one you are suggesting, but it might be a simple solution if the problem is that we try to look up intrinsic calls inside the FortifiedLibCallSimplifier.

/Björn
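A rough sketch of the early-out described above, using toy types in place of CallInst, IntrinsicInst, and the FortifiedLibCallSimplifier. ToyCall, ToySimplifier, optimizeCall, and demoEarlyOut are hypothetical names, and the IsIntrinsic flag stands in for the dyn_cast<IntrinsicInst> check succeeding. The Lookups counter exists only to make the saved work visible.

```cpp
#include <cassert>
#include <string>

struct ToyCall {
  std::string CalleeName;
  bool IsIntrinsic; // stands in for dyn_cast<IntrinsicInst>(CI) succeeding
};

struct ToySimplifier {
  unsigned Lookups = 0; // number of name-based lookups performed

  // Stand-in for the expensive path: every call that reaches it pays a lookup.
  bool trySimplify(const ToyCall &C) {
    ++Lookups;
    return C.CalleeName == "__memcpy_chk";
  }
};

inline bool optimizeCall(const ToyCall &C, ToySimplifier &S) {
  if (C.IsIntrinsic) {
    // Intrinsic-specific handling would sit here. Any intrinsic that should
    // still reach the simplifier would need an explicit fall-through case.
    return false; // early out before the name lookup
  }
  return S.trySimplify(C);
}

// Usage: intrinsic calls skip the lookup, every other call still pays for it.
inline void demoEarlyOut() {
  ToySimplifier S;
  ToyCall Intrinsic{"llvm.ctpop", true};
  ToyCall Plain{"my_helper", false};
  optimizeCall(Intrinsic, S);
  assert(S.Lookups == 0);
  optimizeCall(Plain, S);
  assert(S.Lookups == 1);
}
```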
Philip Reames via llvm-dev
2019-Apr-24 15:42 UTC
[llvm-dev] Accelerating TLI getLibFunc lookups
On 4/24/19 7:55 AM, Björn Pettersson A wrote:
> So if we know that there are no intrinsic calls being simplified
> by the FortifiedLibCallSimplifier, then I guess that we could
> early out and return false inside the if
>
>     IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
>     if (II) {
>       ...
>     }
>
> that comes before the FortifiedLibCallSimplifier simplifications.
>
> That would at least reduce the amount of time spent in the
> FortifiedLibCallSimplifier trying to look up intrinsics.
>
> And if there are some intrinsics that should be dealt with by
> the FortifiedLibCallSimplifier, we could make sure we fall through
> by adding some cases to the switch inside that if statement.
>
> Such a solution is of course not as general as the one you are
> suggesting, but it might be a simple solution if the problem
> is that we try to look up intrinsic calls inside the
> FortifiedLibCallSimplifier.

Your framing does work for this particular use case. This is actually how I figured out what was going on. But while CGP is the codepath which showed up hot in this example, I'm sure there are others.

Filtering within getLibFunc is also a possibility, and is slightly more general. But both variants leave the same problem open for a module which is heavy on calls that are neither intrinsics nor libfuncs. The advantage of my proposed scheme is that all calls are treated equally.

> /Björn
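A small sketch of the "filter within getLibFunc" variant mentioned above, again with toy stand-ins. ToyTLI, its getLibFunc signature, and demoFilter are hypothetical and do not match the real TargetLibraryInfo interface. The prefix test assumes, as the thread does, that names in the reserved llvm. namespace are never library functions. C++17 is assumed.

```cpp
#include <cassert>
#include <string_view>
#include <unordered_set>

struct ToyTLI {
  unsigned TableLookups = 0; // counts how often the slow path runs

  bool getLibFunc(std::string_view Name) {
    // Cheap filter: reject reserved "llvm." names before any table work.
    if (Name.substr(0, 5) == "llvm.")
      return false;
    ++TableLookups; // slow path: stands in for normalization + table lookup
    static const std::unordered_set<std::string_view> Known = {
        "memcpy", "strlen", "__memcpy_chk"};
    return Known.count(Name) != 0;
  }
};

// Usage: the filter helps intrinsic names only; an ordinary non-libfunc
// name such as "my_helper" still takes the slow path on every query.
inline void demoFilter() {
  ToyTLI T;
  T.getLibFunc("llvm.ctpop");
  assert(T.TableLookups == 0);
  T.getLibFunc("my_helper");
  T.getLibFunc("my_helper");
  assert(T.TableLookups == 2);
}
```

The demo shows the limitation raised in the reply: repeated queries for ordinary non-library names are not helped by the filter, whereas an ID cached on the function would answer them without a lookup.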