Hongtao Yu via llvm-dev
2021-Jun-09 20:35 UTC
[llvm-dev] [EXTERNAL] Re: exponential code explosion during inlining
There were some offline discussions about implementing the global cap based on an interprocedural loop-graph (ILG) based working-set analysis. David, I’m wondering if you could share the progress of the ILG work. Thanks.

From: Joseph Tremoulet <jotrem at microsoft.com>
Date: Wednesday, June 9, 2021 at 1:00 PM
To: Hongtao Yu <hoy at fb.com>, Xinliang David Li <xinliangli at gmail.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>, Wenlei He <wenlei at fb.com>
Subject: RE: [EXTERNAL] Re: [llvm-dev] exponential code explosion during inlining

Yes, a global cap would certainly be a big enough hammer to prevent the code explosion I’m seeing. I didn’t follow the review closely enough to get a sense; is that likely to land?

From: Hongtao Yu <hoy at fb.com>
Sent: Wednesday, June 9, 2021 1:14 PM
To: Joseph Tremoulet <jotrem at microsoft.com>; Xinliang David Li <xinliangli at gmail.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>
Subject: Re: [EXTERNAL] Re: [llvm-dev] exponential code explosion during inlining

The existing implementation has a per-callsite size cap. For a callsite, if the inlining could cause a potential size growth exceeding the given cap (3000 for hot callsites, 375 for warm callsites, 45 for cold callsites, IIRC), the inlining will not be done. In https://reviews.llvm.org/D98481, we were thinking about enforcing a global size cap per inliner run. Given the current callsite being considered for inlining, if the size growth would push the inliner over the given cap, the current inlining and all subsequent inlining will be skipped.
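For concreteness, the two-level check being discussed could be sketched roughly as below. This is a hypothetical illustration, not LLVM's actual API: the struct and function names are made up, the per-callsite caps are the IIRC values quoted above, and the global cap value is an arbitrary placeholder.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch of a per-callsite cap plus a D98481-style global
// budget: once the running growth exceeds the global cap, the current
// inline and all subsequent inlines in this inliner run are skipped.
struct InlinerBudget {
  int TotalGrowth = 0;    // size added by inlining so far in this run
  int GlobalCap;          // assumed per-inliner-run cap (illustrative)
  bool Exhausted = false; // once true, all later candidates are rejected

  explicit InlinerBudget(int Cap) : GlobalCap(Cap) {}

  // Per-callsite caps by hotness, as quoted in the thread (IIRC values).
  static int siteCap(const char *Hotness) {
    if (std::strcmp(Hotness, "hot") == 0)
      return 3000;
    if (std::strcmp(Hotness, "warm") == 0)
      return 375;
    return 45; // cold
  }

  bool shouldInline(int GrowthEstimate, const char *Hotness) {
    if (Exhausted)
      return false; // global cap already blown: skip everything after
    if (GrowthEstimate > siteCap(Hotness))
      return false; // existing per-callsite check
    if (TotalGrowth + GrowthEstimate > GlobalCap) {
      Exhausted = true; // skip this and all subsequent inlining
      return false;
    }
    TotalGrowth += GrowthEstimate;
    return true;
  }
};
```

The key behavioral difference from the per-callsite check alone is the sticky `Exhausted` flag: a single oversized candidate does not merely get skipped, it ends inlining for the rest of the run, which is what bounds total code growth.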
Could this approach solve your problem?

Thanks,
Hongtao

From: Joseph Tremoulet <jotrem at microsoft.com>
Date: Wednesday, June 9, 2021 at 9:42 AM
To: Xinliang David Li <xinliangli at gmail.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>, Hongtao Yu <hoy at fb.com>, Wenlei He <wenlei at fb.com>
Subject: RE: [EXTERNAL] Re: [llvm-dev] exponential code explosion during inlining

Related, yes. I see the conversation there leading toward having a cap on inlinee size (because, at each level, there is a single huge function, which will be inlined twice into the single function in the next level). I was mistakenly under the impression that our existing threshold effectively imposed a size cap (and therefore effectively imposed a fan-out cap), so one avenue I was considering was limiting the depth of transitive inlines through calls copied from inlinees in already-converged SCCs. But adding the depth cap could make cases such as the one I’m hitting look a lot like the case that https://reviews.llvm.org/D98481 describes.
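The depth-capping avenue mentioned above might be sketched as follows. Again this is purely illustrative: the struct, the `MaxConvergedDepth` constant, and the predicate name are all hypothetical, not an existing LLVM mechanism or flag.

```cpp
#include <cassert>

// Hypothetical sketch: tag each callsite with whether it was copied in
// from an inlinee belonging to an already-converged SCC, and how many
// levels of transitive inlining produced this copy. Refuse to inline
// through such callsites past an assumed depth cap.
struct CallSiteInfo {
  bool FromConvergedSCC; // copied from an inlinee in a converged SCC?
  int TransitiveDepth;   // levels of inlining that created this copy
};

constexpr int MaxConvergedDepth = 4; // illustrative cap, not a real flag

bool allowTransitiveInline(const CallSiteInfo &CS) {
  if (!CS.FromConvergedSCC)
    return true; // normal processing within the current SCC is unaffected
  // Cut off the exponential chase through the converged SCC's call graph.
  return CS.TransitiveDepth < MaxConvergedDepth;
}
```

A fixed depth cap bounds the blow-up to a constant number of levels, but as the rest of the thread notes, it does nothing about a single huge function being duplicated within each level, which is why neither cap alone looks sufficient.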
So I think that neither the size cap nor the depth cap alone would be sufficient…

From: Xinliang David Li <xinliangli at gmail.com>
Sent: Tuesday, June 8, 2021 8:03 PM
To: Joseph Tremoulet <jotrem at microsoft.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Hongtao Yu <hoy at fb.com>; Wenlei He <wenlei at fb.com>
Subject: [EXTERNAL] Re: [llvm-dev] exponential code explosion during inlining

May be related to https://reviews.llvm.org/D98481

David

On Tue, Jun 8, 2021 at 8:43 AM Joseph Tremoulet via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi,

I filed a bug (PR50485 [1]) a couple of weeks ago for some pathological behavior we’ve hit in the inliner, but there hasn’t been any reply on the bug, so I figured I’d broaden the audience and ask about the issue here.

The problem is partially a phase-ordering issue: we run cleanup optimizations after inlining an SCC that reduce the size of functions in the SCC, so a given call foo -> bar that looks unprofitable due to bar’s size while processing the SCC containing foo and bar may suddenly look profitable, due to bar’s reduced size, when we later consider a copy of that same callsite created by inlining foo into some function in a later SCC. This interacts badly with the inliner’s approach of relying on local heuristics to eventually converge (rather than limiting itself with some global budget).
I’ll copy the comment explaining that approach here:

// We use a single common worklist for calls across the entire SCC. We
// process these in-order and append new calls introduced during inlining to
// the end.
//
// Note that this particular order of processing is actually critical to
// avoid very bad behaviors. Consider *highly connected* call graphs where
// each function contains a small amount of code and a couple of calls to
// other functions. Because the LLVM inliner is fundamentally a bottom-up
// inliner, it can handle gracefully the fact that these all appear to be
// reasonable inlining candidates as it will flatten things until they become
// too big to inline, and then move on and flatten another batch.
//
// However, when processing call edges *within* an SCC we cannot rely on this
// bottom-up behavior. As a consequence, with heavily connected *SCCs* of
// functions we can end up incrementally inlining N calls into each of
// N functions because each incremental inlining decision looks good and we
// don't have a topological ordering to prevent explosions.
//
// To compensate for this, we don't process transitive edges made immediate
// by inlining until we've done one pass of inlining across the entire SCC.
// Large, highly connected SCCs still lead to some amount of code bloat in
// this model, but it is uniformly spread across all the functions in the SCC
// and eventually they all become too large to inline, rather than
// incrementally making a single function grow in a super linear fashion.

The problem in a nutshell is that “eventually they all become too large to inline” is true while inlining is happening in their SCC, but then the cleanup makes them small again, and so the inliner goes nuts chasing all the now-profitable paths through the highly connected SCC when considering them as transitive inlines into a subsequent SCC. I’d love some thoughts on how we might best go about addressing this.
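The worklist discipline that comment describes can be sketched in miniature as below. The names are illustrative stand-ins (LLVM's real inliner works over call graph nodes, not strings); the point is only the ordering: one shared queue for the whole SCC, processed front to back, with calls exposed by inlining appended at the end rather than recursed into immediately.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical miniature of the SCC inlining worklist: every original
// call edge gets one pass before any transitive edge made immediate by
// inlining is even considered, because new edges go to the *end*.
std::vector<std::string>
processSCC(std::deque<std::string> Worklist,
           const std::function<std::vector<std::string>(const std::string &)>
               &InlineAndGetNewCalls) {
  std::vector<std::string> Order; // order in which callsites were visited
  while (!Worklist.empty()) {
    std::string CS = Worklist.front();
    Worklist.pop_front();
    Order.push_back(CS);
    // Transitive callsites exposed by this inline are deferred to the
    // back of the queue, not processed depth-first.
    for (const std::string &NewCS : InlineAndGetNewCalls(CS))
      Worklist.push_back(NewCS);
  }
  return Order;
}
```

The failure mode in this thread is that the "eventually too large to inline" backstop this ordering relies on is invalidated by post-SCC cleanup, so the deferral alone no longer bounds growth once copies of these edges reappear in a later SCC.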
I could imagine trying to address it as a phase-ordering issue by running cleanup at the start of inlining an SCC; in the cases where we’ve hit this, the cleanup hasn’t actually depended on inlining to expose the opportunities, it just happened to first get cleaned up immediately post-inlining. I could also imagine trying to address it by limiting transitive inlines at callsites created by inlining functions from already-converged SCCs, which we could either do wholesale (if we’re expecting them to be too big to inline at this point, that shouldn’t be a big loss, right?) or just by capping their depth, say, to cut off exponential explosion.

Thanks,
-Joseph

[1] - 50485 – Exponential code explosion during inlining: https://bugs.llvm.org/show_bug.cgi?id=50485

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Xinliang David Li via llvm-dev
2021-Jun-09 20:41 UTC
[llvm-dev] [EXTERNAL] Re: exponential code explosion during inlining
On Wed, Jun 9, 2021 at 1:35 PM Hongtao Yu <hoy at fb.com> wrote:

> There were some offline discussions about implementing the global cap
> based on an interprocedural loop-graph (ILG) based working set analysis.
> David, I’m wondering if you could share the progress of the ILG work.
> Thanks.

The ILG-based cap will be used for performance tuning, and it won't be global but per code-reuse region. To handle pathological code growth, a different/simpler capping mechanism is probably better.