Hal Finkel via llvm-dev
2016-Feb-27 04:15 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
----- Original Message -----> From: "Chandler Carruth" <chandlerc at google.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Philip Reames" > <listmail at philipreames.com>, "Duncan P. N. Exon Smith" > <dexonsmith at apple.com>, "Xinliang David Li" <xinliangli at gmail.com>, > "Sanjoy Das" <sanjoy at playingwithpointers.com> > Sent: Friday, February 26, 2016 10:04:44 PM > Subject: Re: [llvm-dev] Possible soundness issue with > available_externally (split from "RFC: Add guard intrinsics")> On Fri, Feb 26, 2016 at 7:59 PM Hal Finkel < hfinkel at anl.gov > wrote:> > > From: "Chandler Carruth" < chandlerc at google.com > > > > > > > To: "Hal Finkel" < hfinkel at anl.gov > > > > > > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org >, "Philip Reames" < > > > listmail at philipreames.com >, "Duncan P. N. Exon Smith" < > > > dexonsmith at apple.com >, "Xinliang David Li" < > > > xinliangli at gmail.com > > > >, "Sanjoy Das" < sanjoy at playingwithpointers.com > > > >> > > Sent: Friday, February 26, 2016 9:41:23 PM > > > > > > Subject: Re: [llvm-dev] Possible soundness issue with > > > available_externally (split from "RFC: Add guard intrinsics") > > >> > > On Fri, Feb 26, 2016 at 7:38 PM Hal Finkel < hfinkel at anl.gov > > > > wrote: > > >> > > > > From: "Chandler Carruth" < chandlerc at google.com > > > > > > > > > > > > > > > > To: "Hal Finkel" < hfinkel at anl.gov > > > > > > > > > > >> > > > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org >, "Philip Reames" < > > > > > listmail at philipreames.com >, "Duncan P. N. Exon Smith" < > > > > > dexonsmith at apple.com >, "Xinliang David Li" < > > > > > xinliangli at gmail.com > > > > > >, "Sanjoy Das" < sanjoy at playingwithpointers.com > > > > > > > > > > > > > > > > Sent: Friday, February 26, 2016 9:33:55 PM > > > > > > > > > > > > > > > Subject: Re: [llvm-dev] Possible soundness issue with > > > > > available_externally (split from "RFC: Add guard intrinsics") > > > > > > > > > >> > > > > On Fri, Feb 26, 2016 at 7:26 PM Hal Finkel < hfinkel at anl.gov > > > > > > > > > > > wrote: > > > > > > > > > >> > > > > > > From: "Chandler Carruth" < chandlerc at google.com > > > > > > > > > > > > > > > > > > > > > > > To: "Hal Finkel" < hfinkel at anl.gov >, "Sanjoy Das" < > > > > > > > sanjoy at playingwithpointers.com > > > > > > > > > > > > > > > > > > > > > > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org >, "Philip > > > > > > > Reames" > > > > > > > < > > > > > > > listmail at philipreames.com >, "Duncan P. N. Exon Smith" > > > > > > > > > > > > > > > > > > > > > > < dexonsmith at apple.com >, "Xinliang David Li" < > > > > > > > xinliangli at gmail.com > > > > > > > > > > > > > > > > > > > > > > > Sent: Friday, February 26, 2016 9:01:48 PM > > > > > > > > > > > > > > > > > > > > > > Subject: Re: [llvm-dev] Possible soundness issue with > > > > > > > available_externally (split from "RFC: Add guard > > > > > > > intrinsics") > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this will have a much higher cost than my > > > > > > > proposal > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > constrain how we deduce function attributes (which still > > > > > > > fixes > > > > > > > > > > > > > > > > > > > > > > Sanjoy's latest example). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Specifically, I think this will force us to constrain far > > > > > > > too > > > > > > > many > > > > > > > > > > > > > > > > > > > > > > transformations for the sake of code size in functions > > > > > > > that > > > > > > > we > > > > > > > won't > > > > > > > > > > > > > > > > > > > > > > inline. Even if we were never going to deduce function > > > > > > > attributes > > > > > > > > > > > > > > > > > > > > > > for anything in the function (because its big and reads > > > > > > > and > > > > > > > writes > > > > > > > > > > > > > > > > > > > > > > everything), we'll still have to constrain our > > > > > > > transformations > > > > > > > just > > > > > > > > > > > > > > > > > > > > > > because we *might* later deduce a function attribute that > > > > > > > triggers > > > > > > > > > > > > > > > > > > > > > > these kinds of bugs. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Essentially, you're proposing to limit intraprocedural > > > > > > > optimization > > > > > > > > > > > > > > > > > > > > > > to when we can successfully to interprocedural > > > > > > > optimization > > > > > > > > > > > > > > > > > > > > > > ("privatization"), where I'm suggesting we limit > > > > > > > interprocedural > > > > > > > > > > > > > > > > > > > > > > optimization to leave intraprocedural optimization > > > > > > > unconstrained. > > > > > > > > > > > > > > > > > > > > > > Given the ratio of our optimizations (almost all are > > > > > > > intra, > > > > > > > very > > > > > > > few > > > > > > > > > > > > > > > > > > > > > > are inter), I'm much more comfortable with the latter. > > > > > > > > > > > > > > >> > > > > > This is a good point; we can certainly (easily) delay the > > > > > > privatization decision until we modify any IPA-level > > > > > > function > > > > > > information (at which point we can either reject the > > > > > > attribute > > > > > > change (when optimizing for code size), or keep it locally > > > > > > (when > > > > > > optimizing for speed). Ideally, you'd want to delay this > > > > > > even > > > > > > further (until you knew the attribute information was > > > > > > used), > > > > > > but > > > > > > I'm > > > > > > not sure that's practical. > > > > > > > > > > > > > > >> > > > > > Actually, what if instead of actually privatizing, we moved > > > > > > the > > > > > > function into a different comdat section named after some > > > > > > hash > > > > > > of > > > > > > the function body? That way, if all versions are actually > > > > > > optimized > > > > > > identically, we'll still only end up with one copy in the > > > > > > final > > > > > > executable. If this is technically possible, it seems like > > > > > > the > > > > > > best > > > > > > kind of solution. > > > > > > > > > > > > > > >> > > > > This is how I want to do a revamped function merging anyways > > > > > and > > > > > it > > > > > would fall out naturally of that. > > > > > > > > > > > > > > Excellent, so let's fix this at the same time we implement this > > > > function merging (so that we don't have performance regressions > > > > in > > > > an intermediate state). This will also allow us to have uniform > > > > logic across different optimization levels, which is obviously > > > > preferable. > > > > > >> > > I am *extremely* uncomfortable waiting to fix this until merging > > > stuff is in place and we add privatization heuristics to our IPO > > > passes. Those might be years away. > > > > > I'm *extremely* uncomfortable fixing this at all unless it can be > > done without causing performance regressions. The underlying basic > > use case (linking together code compiled with different > > optimization > > levels), is certainly something I'd like to work properly, but is > > definitely a far lower priority than optimized code quality and > > size. >> Ok, I prioritize this differently: it is absolutely critical. Every > binary I have includes files that fit this description. Every single > binary.> I think that we will discover subtle miscompiles every time we try to > make function attributes more powerful until this is fixed. I think > this is absolutely critical to fix.> I'm fine if you want an option to enable unsafe behavior here, but I > think we have to have an option that provides correct results and I > frankly think it should be the default.I'm fine with this (i.e. there must be an option to turn it off).> > Furthermore, you agree that there is a technical solution that > > satisfies these requirements (placing functions into their own > > hashed comdat sections), > > No, I think that is a significant overstatement. I think that this > would require a pretty sizable amount of work to get the size > regression to be tolerable. I don't know how much work, nor when it > could be done.Why would there be size regressions? Are there section alignment padding concerns?> > and I don't see why this is not relatively-straightforward to > > implement, and so if we want to fix this bug, we should implement > > it. In the common case (where everything is optimized at the same > > level), I don't see why it has any additional overhead. We should > > be > > able to privatize aggressively under this scheme. > > I don't have any confidence in our ability to perfectly merge > privatized routines. Maybe you do, but I genuinely do not how to do > it with sufficient reliability today.> I have ideas about how to get a *good* approximation that I think > will recover much of the penalty. But I think it would still need a > size threshold....Can you explain your concerns about reliability? My idea is that we have a kind of delayed-name comdat section attribute that says that the name of the comdat section into which the function will be placed will be based on the hash of the contents (which should probably be the pre-isel IR-level contexts plus perhaps the other state that might affect backend optimization). Maybe just using the MI-level function contents will work too. It seems like this should be "perfect". I don't see the circumstances why function definitions optimized under identical optimization settings should differ in this sense. Doing further, since our optimizations should be deterministic, we might even be able to just hash the optimization level information and other relevant attributes (which could affect TTI), and then be done with it. Thanks again, Hal> > -Hal >> > > I think we should pretty immediately: > > >> > > 1) Make our IPO passes conservatively correc > > > > > > 2) Leave comments about how to add privatization, but explain the > > > code size cost incurred by it > > >> > > I have no idea if anyone is even working on privatization (I'm > > > not) > > > or function merging (no ETA at all, its really far down my > > > queue). > > > I > > > think we should decouple all these pieces. Once things are > > > correct, > > > folks can add a very size-conservative privatization > > > transformation > > > to IPO routines if we don't have merging, or a fairly aggressive > > > one > > > if we do have merging. And if we add merging later we can re-tune > > > the privatization. > > >> > > -Chandler > > >> > > > Thanks again, > > > > > > > > > > Hal > > > > > >> > > > -- > > > > > >> > > > Hal Finkel > > > > > > > > > > Assistant Computational Scientist > > > > > > > > > > Leadership Computing Facility > > > > > > > > > > Argonne National Laboratory > > > > > >> > -- >> > Hal Finkel > > > Assistant Computational Scientist > > > Leadership Computing Facility > > > Argonne National Laboratory >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160226/5ebbe960/attachment.html>
Chandler Carruth via llvm-dev
2016-Feb-27 04:30 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
On Fri, Feb 26, 2016 at 8:15 PM Hal Finkel <hfinkel at anl.gov> wrote:> Can you explain your concerns about reliability? My idea is that we have a > kind of delayed-name comdat section attribute that says that the name of > the comdat section into which the function will be placed will be based on > the hash of the contents (which should probably be the pre-isel IR-level > contexts plus perhaps the other state that might affect backend > optimization). Maybe just using the MI-level function contents will work > too. It seems like this should be "perfect". I don't see the circumstances > why function definitions optimized under identical optimization settings > should differ in this sense. >Ok, let's consider Sanjoy's latest post. I think we could end up with this routine in N different modules with slightly different amounts of information in each. Each of these in turn could cause us to deduce a slightly different set of attributes, and thus compute a different hash. Now, certainly, there are a small number of variants here. Two, maybe three. But this could in the worst case cause us to have two copies of a *huge* number of functions. My concern here is not that this is a *theoretically* hard problem to solve. I think there are good techniques to handle this. My concern is that it will be a substantial amount of work and require a reasonable amount of testing and experimentation. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160227/b8ea8c56/attachment.html>
Xinliang David Li via llvm-dev
2016-Feb-27 04:40 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
On Fri, Feb 26, 2016 at 8:30 PM, Chandler Carruth <chandlerc at google.com> wrote:> On Fri, Feb 26, 2016 at 8:15 PM Hal Finkel <hfinkel at anl.gov> wrote: > >> Can you explain your concerns about reliability? My idea is that we have >> a kind of delayed-name comdat section attribute that says that the name of >> the comdat section into which the function will be placed will be based on >> the hash of the contents (which should probably be the pre-isel IR-level >> contexts plus perhaps the other state that might affect backend >> optimization). Maybe just using the MI-level function contents will work >> too. It seems like this should be "perfect". I don't see the circumstances >> why function definitions optimized under identical optimization settings >> should differ in this sense. >> > > Ok, let's consider Sanjoy's latest post. > > I think we could end up with this routine in N different modules with > slightly different amounts of information in each. Each of these in turn > could cause us to deduce a slightly different set of attributes, and thus > compute a different hash. > > Now, certainly, there are a small number of variants here. Two, maybe > three. But this could in the worst case cause us to have two copies of a > *huge* number of functions. > > My concern here is not that this is a *theoretically* hard problem to > solve. I think there are good techniques to handle this. My concern is that > it will be a substantial amount of work and require a reasonable amount of > testing and experimentation. >Such data won't be too hard to find out. It can be conservatively estimated by looking at the final binary code for each COMDAT to see how many 'clones's are needed due to different code gen -- the result is the upper bound as different code gen does not mean different deduced function attributes. With PGO, the decision process will also become much simpler. David -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160226/0631ad32/attachment.html>
Hal Finkel via llvm-dev
2016-Feb-27 04:44 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
----- Original Message -----> From: "Chandler Carruth" <chandlerc at google.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Philip Reames" <listmail at philipreames.com>, "Duncan P. N. Exon Smith" > <dexonsmith at apple.com>, "Xinliang David Li" <xinliangli at gmail.com>, "Sanjoy Das" <sanjoy at playingwithpointers.com> > Sent: Friday, February 26, 2016 10:30:13 PM > Subject: Re: [llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics") > > > > > On Fri, Feb 26, 2016 at 8:15 PM Hal Finkel < hfinkel at anl.gov > wrote: > > > > Can you explain your concerns about reliability? My idea is that we > have a kind of delayed-name comdat section attribute that says that > the name of the comdat section into which the function will be > placed will be based on the hash of the contents (which should > probably be the pre-isel IR-level contexts plus perhaps the other > state that might affect backend optimization). Maybe just using the > MI-level function contents will work too. It seems like this should > be "perfect". I don't see the circumstances why function definitions > optimized under identical optimization settings should differ in > this sense. > > > > Ok, let's consider Sanjoy's latest post. >Indeed, a great example. His point is well taken, this is not strictly related to optimization level.> > I think we could end up with this routine in N different modules with > slightly different amounts of information in each. Each of these in > turn could cause us to deduce a slightly different set of > attributes, and thus compute a different hash. > > Now, certainly, there are a small number of variants here. Two, maybe > three. But this could in the worst case cause us to have two copies > of a *huge* number of functions. > > My concern here is not that this is a *theoretically* hard problem to > solve. I think there are good techniques to handle this.It seems like, by definition, the information necessary to control the number of variants is not locally available. Controlling this by some global setting (opt for size vs. opt for speed) seems easy. Otherwise, I don't understand what you're thinking. Thanks again, Hal> My concern > is that it will be a substantial amount of work and require a > reasonable amount of testing and experimentation.-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Maybe Matching Threads
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")