> On Jan 21, 2015, at 3:02 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > > On Wed, Jan 21, 2015 at 2:43 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com>> wrote: > The first thing that springs to mind is that I don’t trust the backend to get this right. I don’t think it will understand when an i32 load/store would have been preferable to a float one or vice versa. I have no evidence of this, but given how strongly typed tablegen is, I don’t think it can make a good choice here. > > I don't think tablegen is relevant to making a good choice here. This only comes up when we have a load which is only ever stored. See below, I'll come back to this after clarifying... > > > So I think we probably need to teach the backend how to undo whatever canonical form we choose if it has a reason to. And the best long term solution is for tablegen to have sized load/stores, not typed ones. > > One (potentially expensive) way to choose the canonical form here is to look at the users of the load and see what type works best. If we load an i32, but bit cast and do an fp operation on it, then a float load was best. If we just load it then store, then in theory either type works. > > We actually already do this. =] I tought instcombine to do this recently. The way this works is as follows: > > If we find a load with a single bitcast of its value to some other type, we try to load that type instead. We rely on the fact that if there is in fact a single type that the load is used as, some other part of LLVM will fold the redundant bitcasts. I can easily handle redundant bitcasts if you like. > > If we find a store of a bitcasted value, we try to store the original value instead. > > This works really well *except* for the case when the only (transitive) users of the loaded value are themselves stores. In that case, there is no "truth" we can rely on from operational semantics. We need to just pick a consistent answer IMO. Integers seem like the right consistent answer, and this isn't the only place LLVM does this -- we also lower a small memcpy as an integer load and store.Yeah, thinking about this more, integers do seem like the right answer. If a backend wanted to do something special then its up to them to handle it. For example, x86 might load balance issue ports by turning an i32 load/store in to SSE insert/extract, although i cannot imagine any time this would actually be a good idea.> > As for the backend, I agree I don't trust them to do anything clever here, but I think you may be missing how clever they would need to be. =D The only way this matters is that, for example, you have a store-to-load forwarding unit in your CPU that only works within a register class, and thus a stored integer will fail to get forwarded in the CPU to a load of a floating point value at the same address. If LLVM is ever able to forward this in the IR, it should re-write all the types to match whatever operation we have. > > I don't think any backend today (or in the foreseeable future) would have such smarts. But if CPUs are actually impacted by this kind of thing (and I have no evidence that they are), we might be getting lucky some of the time. > > Personally, I don't think this is compelling enough to delay making a change because i have test cases where we are getting *unlucky* today, and picking a canonical form will either directly fix them or make it much easier to fix them.Sounds good to me. Integers it is then. Pete -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/0caf5044/attachment.html>
On Wed, Jan 21, 2015 at 3:06 PM, Pete Cooper <peter_cooper at apple.com> wrote:> Sounds good to me. Integers it is then.FYI, thanks, I'm just going to commit this then. It seems we're all in essential agreement. We can revert it and take a more cautious approach if something terrible happens. =] -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/155383e0/attachment.html>
> On Jan 21, 2015, at 5:32 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > > On Wed, Jan 21, 2015 at 3:06 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com>> wrote: > Sounds good to me. Integers it is then. > > FYI, thanks, I'm just going to commit this then. It seems we're all in essential agreement. We can revert it and take a more cautious approach if something terrible happens. =]No problem :) I don’t expect any issues, but then i don’t know the quirks of all of our targets so it’ll be fun to see what happens. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/d377e09c/attachment.html>
So this change did indeed have an effect! :) I’m seeing regressions in a number of benchmarks mainly due to a host of extra bitcasts that get introduced. Here’s the problem I’m seeing in a nutshell: 1) There is a Phi with input type double 2) Polly demotes the phi into a load/store of type double 3) InstCombine canonicalizes the load/store to use i64 instead of double 4) SROA removes the load/store & inserts a phi back in, using i64 as the type. Inserts bitcast to get to double. 5) The bitcast sticks around and eventually get translated into FMOVs (for AArch64 at least). The function findCommonType() in SROA.cpp is used to obtain the type that should be used for the new alloca that SROA wants to create. It’s decision process is essentially – if all loads/stores of alloca are the same, use that type; else use the corresponding integer type. This causes bitcasts to be inserted in a number of places, most all of which stick around. I’ve copied a reduced version of an instance of the problem below. I’m looking for comments on what others think is the right solution here. Make SROA more intelligent about picking the type? The code is below with all unnecessary code removed for easy consumption. Daniel Before Polly – Prepare code for polly we have code that looks like: while.cond473: ; preds = %while.cond473.outer78, %while.body475 %p_j_x452.0 = phi double [ %105, %while.body475 ], [ %p_j_x452.0.ph82, %while.cond473.outer78 ] while.body475: ; preds = %while.cond473 %sub480 = fsub fast double %64, %p_j_x452.0 %105 = load double* %x485, align 8, !tbaa !25 After Polly – Prepare code for polly we have: while.cond473: ; preds = %while.cond473.outer78, %while.body475 %p_j_x452.0.reload = load double* %p_j_x452.0.reg2mem while.body475: ; preds = %while.cond473 %sub480 = fsub fast double %64, %p_j_x452.0.reload %110 = load double* %x485, align 8, !tbaa !25 store double %110, double* %p_j_x452.0.reg2mem After Combine redundant instructions : while.cond473: ; preds = %while.cond473.outer78, %while.body475 %p_j_x452.0.reload = load double* %p_j_x452.0.reg2mem, align 8 while.body475: ; preds = %while.cond473 %sub480 = fsub fast double %74, %p_j_x452.0.reload %x485 = getelementptr inbounds %struct.CompAtom* %15, i64 %idxprom482, i32 0, i32 0 %194 = bitcast double* %x485 to i64* %195 = load i64* %194, align 8, !tbaa !25 %200 = bitcast double* %p_j_x452.0.reg2mem to i64* store i64 %195, i64* %200, align 8 After SROA : while.cond473: ; preds = %while.cond473.outer78, %while.body475 %p_j_x452.0.reg2mem.sroa.0.0.p_j_x452.0.reload362 = phi i64 [ %p_j_x452.0.ph73.reg2mem.sroa.0.0.load368, %while.cond473.outer78 ], [ %178, %while.body475 ] %173 = bitcast i64 %p_j_x452.0.reg2mem.sroa.0.0.p_j_x452.0.reload362 to double while.body475: ; preds = %while.cond473 %sub480 = fsub fast double %78, %173 %x485 = getelementptr inbounds %struct.CompAtom* %15, i64 %idxprom482, i32 0, i32 0 %177 = bitcast double* %x485 to i64* %178 = load i64* %177, align 8, !tbaa !25 From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth Sent: Wednesday, January 21, 2015 8:32 PM To: Pete Cooper Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] RFC: Missing canonicalization in LLVM On Wed, Jan 21, 2015 at 3:06 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com> > wrote: Sounds good to me. Integers it is then. FYI, thanks, I'm just going to commit this then. It seems we're all in essential agreement. We can revert it and take a more cautious approach if something terrible happens. =] -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150421/efc76068/attachment.html>