Steve Montgomery
2012-Aug-14 15:43 UTC
[LLVMdev] Load serialisation during selection DAG building
I looked into those patches but I don't think they will help in my situation because my problems occur during instruction selection rather than scheduling. A simple and concrete example is a pattern like: [(set GR:$dst (add GR:$src (nvload addr:$mem)))] where nvload matches a load provided that isVolatile() is false. If the selection DAG looks like: | | LD1 LD2 ^ ^ | | \ / add ^ | \ / ST and the chain like: LD1 LD2 ^ ^ | | \ / TokenFactor ^ | ST then the add instruction is selected. However, if operand 1 of the add is a volatile load, then the chain looks like: LD1 ^ | LD2Volatile | ST In this case the add instruction cannot be selected. The operator is commutative, so instruction selection matches the non-volatile load against the operand 1 but then fails because to select this instruction would mean that the volatile load into a register (to match operand 0) would occur before the non-volatile load that's folded into the instruction. I can get this instruction to be selected by changing the way loads are built into the selection DAG, i.e. making the chain: LD1 LD2Volatile ^ ^ | | \ / TokenFactor ^ | ST Given what the LRM says about volatile accesses, this seems valid but I take Hal's point about there being code that might not be expecting such a chain. It's a matter of changing a few lines in SelectionDAGBuilder.cpp to achieve what I want so I think I'll go ahead and see what happens. If Hal or anyone else has any further thoughts on the potential problems with doing so then I'd be glad to know. Thanks Steve Montgomery On 13 Aug 2012, at 20:09, Hal Finkel wrote:> Steve, > > I had created a patch last year that does something similar to what you > describe for regular loads and stores using aliasing information. I > think that the last message in the thread was: > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120402/140299.html > > This approach has worked for me, but it is not the preferred solution > going forward. The preferred solution is to keep the "critical > chain" as is (partly because there are places in some of the backend > lowering code that assume it exists in its current form), and just > update the scheduler to be more intelligent about how it forms the > dependency graph. This has since been developed for the new MI > scheduling framework by Sergei Larin, and was committed in r156842 > (last message in the discussion thread was > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142659.html) > > I would recommend trying something like Sergei's solution first, and > fall back to trying to play with the critical chain only if that can't > or won't work. > > -Hal > > On Mon, 13 Aug 2012 11:29:18 +0100 > Steve Montgomery <stephen.montgomery3 at btinternet.com> wrote: > >> I've got a question about how SelectionDAGBuilder treats loads. >> >> The LLVM Language Reference Manual explicitly states that the order >> of volatile operations may be changed relative to non-volatile >> operations. However, when the SelectionDAGBuilder in LLVM 3.1 >> encounters a volatile load, it flushes all pending loads and then >> chains the volatile load onto them meaning that the volatile load >> must be scheduled after those loads. While this behaviour isn't >> wrong, it seems to reduce the scope for efficient instruction >> selection. >> >> Is there any reason not to modify the behaviour of >> SelectionDAGBuilder::visitLoad() so that volatile loads don't cause >> SelectionDAGBuilder::getRoot() to be called. Instead, they can be >> chained together with the head of the chain being stored in >> PendingLoads. Then when something else calls >> SelectionDAGBuilder::getRoot(), the chain of volatile loads is >> TokenFactored together with the non-volatile loads. I've tried this >> out and it seems to do what I want but as I'm fairly inexperienced >> with LLVM, I'm not sure whether there's something else preventing >> this strategy from working. >> >> The reason I noticed this is because I have been developing a >> back-end for a target in which some instructions are implemented as >> pseudos which will be expanded into a pair of instructions. Each of >> the two operands of the pseudo can be a load but as the expansion >> accesses the right operand twice, I don't want to match a volatile >> load for this operand. For example, in: >> >> %0 = load i16* @y, align 2, !tbaa !0 >> %1 = load volatile i16* @x, align 2, !tbaa !0 >> %add = add i16 %0, %1 >> >> the volatile load is sequenced after the non-volatile load which >> means that the non-volatile load can't match the left operand of the >> add because this would create a scheduling cycle. This means both >> loads are selected as load instructions, resulting in use of an extra >> register and an extra instruction. If I change the behaviour of >> SelectionDAGBuilder::visitLoad() as described then I get just two >> instructions. _______________________________________________ LLVM >> Developers mailing list LLVMdev at cs.uiuc.edu >> http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > Hal Finkel > Postdoctoral Appointee > Leadership Computing Facility > Argonne National Laboratory-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120814/54546950/attachment.html>
Steve Montgomery
2012-Aug-14 21:05 UTC
[LLVMdev] Load serialisation during selection DAG building
Further to my earlier question, I'm perhaps a bit confused about memory serialisation. The following example, compiled using clang for the MSP430: target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:16:32-n8:16" target triple = "msp430-??-??" @y = common global i16 0, align 2 @x = common global i16 0, align 2 define void @f() nounwind { entry: %0 = load i16* @y, align 2, !tbaa !0 %1 = load volatile i16* @x, align 2, !tbaa !0 %add = add i16 %1, %0 store i16 %add, i16* @y, align 2, !tbaa !0 ret void } has a chain store->load volatile->load. I thought this meant that the load volatile had to occur _after_ the load but the MSP430 backend selects the ADD16mm instruction for which I suspect the order of operand access isn't specified. So, does the chain mean "no earlier than" rather than "later than"? On 14 Aug 2012, at 16:43, Steve Montgomery wrote:> I looked into those patches but I don't think they will help in my situation because my problems occur during instruction selection rather than scheduling. > > A simple and concrete example is a pattern like: > > [(set GR:$dst (add GR:$src (nvload addr:$mem)))] > > where nvload matches a load provided that isVolatile() is false. > > If the selection DAG looks like: > > | | > LD1 LD2 > ^ ^ > | | > \ / > add > ^ > | > \ / > ST > > and the chain like: > > LD1 LD2 > ^ ^ > | | > \ / > TokenFactor > ^ > | > ST > > then the add instruction is selected. However, if operand 1 of the add is a volatile load, then the chain looks like: > > LD1 > ^ > | > LD2Volatile > | > ST > > In this case the add instruction cannot be selected. The operator is commutative, so instruction selection matches the non-volatile load against the operand 1 but then fails because to select this instruction would mean that the volatile load into a register (to match operand 0) would occur before the non-volatile load that's folded into the instruction. > > I can get this instruction to be selected by changing the way loads are built into the selection DAG, i.e. making the chain: > > LD1 LD2Volatile > ^ ^ > | | > \ / > TokenFactor > ^ > | > ST > > Given what the LRM says about volatile accesses, this seems valid but I take Hal's point about there being code that might not be expecting such a chain. > > It's a matter of changing a few lines in SelectionDAGBuilder.cpp to achieve what I want so I think I'll go ahead and see what happens. > > If Hal or anyone else has any further thoughts on the potential problems with doing so then I'd be glad to know. > > Thanks > > Steve Montgomery > > On 13 Aug 2012, at 20:09, Hal Finkel wrote: > >> Steve, >> >> I had created a patch last year that does something similar to what you >> describe for regular loads and stores using aliasing information. I >> think that the last message in the thread was: >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120402/140299.html >> >> This approach has worked for me, but it is not the preferred solution >> going forward. The preferred solution is to keep the "critical >> chain" as is (partly because there are places in some of the backend >> lowering code that assume it exists in its current form), and just >> update the scheduler to be more intelligent about how it forms the >> dependency graph. This has since been developed for the new MI >> scheduling framework by Sergei Larin, and was committed in r156842 >> (last message in the discussion thread was >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142659.html) >> >> I would recommend trying something like Sergei's solution first, and >> fall back to trying to play with the critical chain only if that can't >> or won't work. >> >> -Hal >> >> On Mon, 13 Aug 2012 11:29:18 +0100 >> Steve Montgomery <stephen.montgomery3 at btinternet.com> wrote: >> >>> I've got a question about how SelectionDAGBuilder treats loads. >>> >>> The LLVM Language Reference Manual explicitly states that the order >>> of volatile operations may be changed relative to non-volatile >>> operations. However, when the SelectionDAGBuilder in LLVM 3.1 >>> encounters a volatile load, it flushes all pending loads and then >>> chains the volatile load onto them meaning that the volatile load >>> must be scheduled after those loads. While this behaviour isn't >>> wrong, it seems to reduce the scope for efficient instruction >>> selection. >>> >>> Is there any reason not to modify the behaviour of >>> SelectionDAGBuilder::visitLoad() so that volatile loads don't cause >>> SelectionDAGBuilder::getRoot() to be called. Instead, they can be >>> chained together with the head of the chain being stored in >>> PendingLoads. Then when something else calls >>> SelectionDAGBuilder::getRoot(), the chain of volatile loads is >>> TokenFactored together with the non-volatile loads. I've tried this >>> out and it seems to do what I want but as I'm fairly inexperienced >>> with LLVM, I'm not sure whether there's something else preventing >>> this strategy from working. >>> >>> The reason I noticed this is because I have been developing a >>> back-end for a target in which some instructions are implemented as >>> pseudos which will be expanded into a pair of instructions. Each of >>> the two operands of the pseudo can be a load but as the expansion >>> accesses the right operand twice, I don't want to match a volatile >>> load for this operand. For example, in: >>> >>> %0 = load i16* @y, align 2, !tbaa !0 >>> %1 = load volatile i16* @x, align 2, !tbaa !0 >>> %add = add i16 %0, %1 >>> >>> the volatile load is sequenced after the non-volatile load which >>> means that the non-volatile load can't match the left operand of the >>> add because this would create a scheduling cycle. This means both >>> loads are selected as load instructions, resulting in use of an extra >>> register and an extra instruction. If I change the behaviour of >>> SelectionDAGBuilder::visitLoad() as described then I get just two >>> instructions. _______________________________________________ LLVM >>> Developers mailing list LLVMdev at cs.uiuc.edu >>> http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> -- >> Hal Finkel >> Postdoctoral Appointee >> Leadership Computing Facility >> Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120814/d4a78ffc/attachment.html>
Dan Gohman
2012-Aug-14 21:22 UTC
[LLVMdev] Load serialisation during selection DAG building
On Aug 14, 2012, at 2:05 PM, Steve Montgomery <stephen.montgomery3 at btinternet.com> wrote:> Further to my earlier question, I'm perhaps a bit confused about memory serialisation. The following example, compiled using clang for the MSP430: > > target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:16:32-n8:16" > target triple = "msp430-??-??" > > @y = common global i16 0, align 2 > @x = common global i16 0, align 2 > > define void @f() nounwind { > entry: > %0 = load i16* @y, align 2, !tbaa !0 > %1 = load volatile i16* @x, align 2, !tbaa !0 > %add = add i16 %1, %0 > store i16 %add, i16* @y, align 2, !tbaa !0 > ret void > } > > has a chain store->load volatile->load. I thought this meant that the load volatile had to occur _after_ the load but the MSP430 backend selects the ADD16mm instruction for which I suspect the order of operand access isn't specified. So, does the chain mean "no earlier than" rather than "later than"?No, a chain is supposed to mean "later than". It sounds like MSP430 is bending the rules here. Dan
Possibly Parallel Threads
- [LLVMdev] Load serialisation during selection DAG building
- [LLVMdev] Load serialisation during selection DAG building
- [LLVMdev] Load serialisation during selection DAG building
- [LLVMdev] Load serialisation during selection DAG building
- [LLVMdev] Load serialisation during selection DAG building