On Wed, Aug 31, 2016 at 12:23:34PM -0700, Justin Lebar via llvm-dev wrote:
> > Some optimizations that are related to a single thread could be done without needing to know the actual memory scope.
>
> Right, it's clear to me that there exist optimizations that you cannot
> do if we model these ops as target-specific intrinsics.
>
> But what I think Mehdi and I were trying to get at is: How much of a
> problem is this in practice?  Are there real-world programs that
> suffer because we miss these optimizations?  If so, how much?
>
> The reason I'm asking this question is, there's a real cost to adding
> complexity in LLVM.  Everyone in the project is going to pay that
> cost, forever (or at least, until we remove the feature :).  So I want
> to try to evaluate whether paying that cost is actually worthwhile,
> as compared to the simple alternative (i.e., intrinsics).  Given the
> tepid response to this proposal, I'm sort of thinking that now may not
> be the time to start paying this cost.  (We can always revisit this in
> the future.)  But I remain open to being convinced.
>

I think the cost of adding this information to the IR is really low.
There is already a synchronization scope field present on LLVM atomic
instructions, and it is already encoded as 32 bits, so it is possible
to represent the additional scopes using the existing bitcode format.
Optimization passes are already aware of this synchronization scope
field, so they know how to preserve it when transforming the IR.

The primary goal here is to pass synchronization scope information from
the frontend to the backend.  We already have a mechanism for doing
this, so why not use it?  That seems like the lowest-cost option to me.

-Tom

> As a point of comparison, we have a rule of thumb that we'll add an
> optimization that increases compilation time by x% if we have a
> benchmark that is sped up by at least x%.  Similarly here, I'd want to
> weigh the added complexity against the improvements to user code.
>
> -Justin
>
> On Tue, Aug 23, 2016 at 2:28 PM, Tye, Tony via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >> Since the scope is “opaque” and target specific, can you elaborate what
> >> kind of generic optimization can be performed?
> >
> > Some optimizations that are related to a single thread could be done without
> > needing to know the actual memory scope.  For example, an atomic acquire can
> > restrict reordering of memory operations after it, but allow reordering of
> > memory operations (except another atomic acquire) before it, regardless of
> > the memory scope.
> >
> > Thanks,
> > -Tony
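For concreteness, a minimal sketch in the textual IR of the time of the existing mechanism Tom refers to: atomic instructions already carry an optional synchronization scope next to their ordering, with the cross-thread (system) scope as the default and "singlethread" as the only other value today. Any scopes beyond these two are what the proposal would add; they are not existing syntax.

    ; default scope: ordered with respect to all other threads in the system
    %v0 = load atomic i32, i32* %p seq_cst, align 4

    ; single-thread scope: only ordered with respect to other code running
    ; in the same thread, e.g. signal handlers
    %v1 = load atomic i32, i32* %p singlethread seq_cst, align 4

Both forms are already carried through the optimizer and bitcode; the open question in this thread is what additional, target-defined scope values would mean to generic passes.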
On 09/01/2016 08:52 AM, Tom Stellard via llvm-dev wrote:
> On Wed, Aug 31, 2016 at 12:23:34PM -0700, Justin Lebar via llvm-dev wrote:
>>> Some optimizations that are related to a single thread could be done without needing to know the actual memory scope.
>> Right, it's clear to me that there exist optimizations that you cannot
>> do if we model these ops as target-specific intrinsics.
>>
>> But what I think Mehdi and I were trying to get at is: How much of a
>> problem is this in practice?  Are there real-world programs that
>> suffer because we miss these optimizations?  If so, how much?
>>
>> The reason I'm asking this question is, there's a real cost to adding
>> complexity in LLVM.  Everyone in the project is going to pay that
>> cost, forever (or at least, until we remove the feature :).  So I want
>> to try to evaluate whether paying that cost is actually worthwhile,
>> as compared to the simple alternative (i.e., intrinsics).  Given the
>> tepid response to this proposal, I'm sort of thinking that now may not
>> be the time to start paying this cost.  (We can always revisit this in
>> the future.)  But I remain open to being convinced.
>>
> I think the cost of adding this information to the IR is really low.
> There is already a synchronization scope field present on LLVM atomic
> instructions, and it is already encoded as 32 bits, so it is possible
> to represent the additional scopes using the existing bitcode format.
> Optimization passes are already aware of this synchronization scope
> field, so they know how to preserve it when transforming the IR.

I disagree with this assessment.  Atomics are an area where additional
complexity has a *substantial* conceptual cost.  I also question whether
the single_thread scope is actually respected throughout the optimizer
in practice.

I view the request to change the IR as a fairly big ask.  In particular,
I'm really nervous about what the exact optimization semantics of such
scopes would be.  Depending on how that was defined, this could be
anything from fairly straightforward to outright messy.  In particular,
if there are optimizations which are legal for only some subset of
scopes (or some subset of pairs of scopes?), I'd really like to see a
clear definition of how those are specified.

(p.s. Is there a current patch with an updated LangRef for the proposal
being discussed?  I've lost track of it.)

Let me give an example proposal just to illustrate my point.  This isn't
really a counter-proposal per se, just me thinking out loud.

Say we added 32 distinct concurrency domains.  One of them is used for
"single_thread".  One is used for "everything else".  The remaining 30
are defined in a target-specific manner, with the exception that they
can't overlap with each other or with the two predefined ones.  The
effect of a given atomic operation with respect to each concurrency
domain could be defined in terms of a 32-bit mask.  If a bit is set, the
operation is ordered (according to the separately stated ordering) with
respect to that domain.  If not, it is explicitly unordered w.r.t. that
domain.  A memory operation would be tagged with the memory domains with
which it might interact.

The key bit here is that I can describe transformations in terms of
these abstract domains without knowing anything about how the frontend
might be using such a domain or how the backend might lower it.
In particular, if I have the sequence:

    %v = load i64, i64* %p atomic scope {domain3 only}
    fence seq_cst scope={domain1 only}
    %v2 = load i64, i64* %p atomic scope {domain3 only}

I can tell that the two loads aren't ordered with respect to the fence
and that I can do load forwarding here.

In general, an IR extension needs to be well defined, general enough to
be used by multiple distinct users, and fairly battle-tested design-wise.
We're not completely afraid of having to remove bad ideas from the IR,
but we really try to avoid adding things until they're fairly proven.
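For contrast, staying with the same hypothetical domain-mask syntax as Philip's example above (this is his sketch, not existing IR), consider the case where the fence does name the loads' domain:

    %v  = load i64, i64* %p atomic scope {domain3 only}
    fence seq_cst scope={domain1, domain3}
    %v2 = load i64, i64* %p atomic scope {domain3 only}

Here the fence's domain mask intersects the loads' mask, so the loads are ordered with respect to the fence and the second load can no longer be blindly forwarded from the first.  The legality question reduces to a simple mask intersection test, independent of what the target means by each domain.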
> On Sep 2, 2016, at 5:52 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 09/01/2016 08:52 AM, Tom Stellard via llvm-dev wrote:
>> On Wed, Aug 31, 2016 at 12:23:34PM -0700, Justin Lebar via llvm-dev wrote:
>>>> Some optimizations that are related to a single thread could be done without needing to know the actual memory scope.
>>> Right, it's clear to me that there exist optimizations that you cannot
>>> do if we model these ops as target-specific intrinsics.
>>>
>>> But what I think Mehdi and I were trying to get at is: How much of a
>>> problem is this in practice?  Are there real-world programs that
>>> suffer because we miss these optimizations?  If so, how much?
>>>
>>> The reason I'm asking this question is, there's a real cost to adding
>>> complexity in LLVM.  Everyone in the project is going to pay that
>>> cost, forever (or at least, until we remove the feature :).  So I want
>>> to try to evaluate whether paying that cost is actually worthwhile,
>>> as compared to the simple alternative (i.e., intrinsics).  Given the
>>> tepid response to this proposal, I'm sort of thinking that now may not
>>> be the time to start paying this cost.  (We can always revisit this in
>>> the future.)  But I remain open to being convinced.
>>>
>> I think the cost of adding this information to the IR is really low.
>> There is already a synchronization scope field present on LLVM atomic
>> instructions, and it is already encoded as 32 bits, so it is possible
>> to represent the additional scopes using the existing bitcode format.
>> Optimization passes are already aware of this synchronization scope
>> field, so they know how to preserve it when transforming the IR.
>
> I disagree with this assessment.  Atomics are an area where additional complexity has a *substantial* conceptual cost.  I also question whether the single_thread scope is actually respected throughout the optimizer in practice.
>
> I view the request to change the IR as a fairly big ask.  In particular, I'm really nervous about what the exact optimization semantics of such scopes would be.  Depending on how that was defined, this could be anything from fairly straightforward to outright messy.  In particular, if there are optimizations which are legal for only some subset of scopes (or some subset of pairs of scopes?), I'd really like to see a clear definition of how those are specified.
>
> (p.s. Is there a current patch with an updated LangRef for the proposal being discussed?  I've lost track of it.)

Here is the patch: https://reviews.llvm.org/D21723

> Let me give an example proposal just to illustrate my point.  This isn't really a counter-proposal per se, just me thinking out loud.
>
> Say we added 32 distinct concurrency domains.  One of them is used for "single_thread".  One is used for "everything else".  The remaining 30 are defined in a target-specific manner, with the exception that they can't overlap with each other or with the two predefined ones.  The effect of a given atomic operation with respect to each concurrency domain could be defined in terms of a 32-bit mask.  If a bit is set, the operation is ordered (according to the separately stated ordering) with respect to that domain.  If not, it is explicitly unordered w.r.t. that domain.  A memory operation would be tagged with the memory domains with which it might interact.
> The key bit here is that I can describe transformations in terms of these abstract domains without knowing anything about how the frontend might be using such a domain or how the backend might lower it.  In particular, if I have the sequence:
>
>     %v = load i64, i64* %p atomic scope {domain3 only}
>     fence seq_cst scope={domain1 only}
>     %v2 = load i64, i64* %p atomic scope {domain3 only}
>
> I can tell that the two loads aren't ordered with respect to the fence and that I can do load forwarding here.

I see the current proposal as a stripped-down version of what you
describe: the optimizer can reason about operations inside a single
scope, but can’t assume anything cross-scope (they may or may not
interact with each other).

What you describe seems like always having non-overlapping domains (from
the optimizer’s point of view), and requiring the frontend to express any
overlap by attaching a “list” of domains that an atomic operation
interacts with.

I hope I make sense :)

Best,

— Mehdi
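To make the contrast concrete, here is an illustrative sketch of the two encodings.  Both syntaxes are hypothetical, and the scope name "workgroup" is only an example of a target-defined value, not something taken from the patch.

    ; (a) one opaque, target-defined scope per operation, roughly what the
    ;     current proposal suggests: passes may reason within a single scope,
    ;     but must be conservative across any pair of distinct scopes.
    %v = load atomic i32, i32* %p syncscope("workgroup") acquire, align 4

    ; (b) an explicit domain list (mask) per operation, as in Philip's sketch:
    ;     overlap is visible to generic passes, so ordering questions reduce
    ;     to whether two operations' domain sets intersect.
    %w = load atomic i32, i32* %p scope {domain2, domain3} acquire, align 4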