Jordy Potman
2012-Jun-12 14:20 UTC
[LLVMdev] Latency of true depency of store followed by aliased load in ScheduleDAGInstrs
Hi all, I have a question regarding the latency of the true dependency of a store followed by an aliased load in ScheduleDAGInstrs. The latency seems to depend on the store and load being volatile or not as can be seen in the post-RA-sched debug output of the attached ARM example: $ llc -O3 -debug-only=post-RA-sched store_load_latency_test.ll ... SU(2): STRi12 %R2<kill>, %R0<kill>, 0, pred:14, pred:%noreg; mem:Volatile ST4[%p1](tbaa=!"int") # preds left : 1 # succs left : 2 # rdefs left : 0 Latency : 1 Depth : 2 Height : 0 Predecessors: val SU(1): Latency=1 Reg=%R2 Successors: antiSU(3): Latency=0 ch SU(3): Latency=0 SU(3): %R0<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:Volatile LD4[%p2](tbaa=!"int") # preds left : 2 # succs left : 1 # rdefs left : 0 Latency : 1 Depth : 2 Height : 0 Predecessors: antiSU(2): Latency=0 ch SU(2): Latency=0 Successors: val SU(4294967295): Latency=2 ... SU(2): STRi12 %R2<kill>, %R0<kill>, 0, pred:14, pred:%noreg; mem:ST4[%p1](tbaa=!"int") # preds left : 1 # succs left : 2 # rdefs left : 0 Latency : 1 Depth : 2 Height : 0 Predecessors: val SU(1): Latency=1 Reg=%R2 Successors: antiSU(3): Latency=0 ch SU(3): Latency=1 SU(3): %R0<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:LD4[%p2](tbaa=!"int") # preds left : 2 # succs left : 1 # rdefs left : 0 Latency : 1 Depth : 3 Height : 0 Predecessors: antiSU(2): Latency=0 ch SU(2): Latency=1 Successors: val SU(4294967295): Latency=2 ... So in the volatile case the latency of the chain dependency is 0, while in the non volatile case it is 1. I am using ScheduleDAGInstrs in a scheduler for a VLIW target and in the volatile case the load gets incorrectly scheduled in the same cycle as the store. Is ScheduleDAGInstrs incorrect in the volatile case or shouldn't I rely on the latency being non zero for getting a correct schedule? Best regards, Jordy Potman -------------- next part -------------- ; ModuleID = 'store_load_latency_test.c' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32" target triple = "armv4t--" define i32 @f1(i32* nocapture %p1, i32* nocapture %p2) nounwind { entry: store volatile i32 65540, i32* %p1, align 4, !tbaa !0 %0 = load volatile i32* %p2, align 4, !tbaa !0 ret i32 %0 } define i32 @f2(i32* nocapture %p1, i32* nocapture %p2) nounwind { entry: store i32 65540, i32* %p1, align 4, !tbaa !0 %0 = load i32* %p2, align 4, !tbaa !0 ret i32 %0 } !0 = metadata !{metadata !"int", metadata !1} !1 = metadata !{metadata !"omnipotent char", metadata !2} !2 = metadata !{metadata !"Simple C/C++ TBAA"}
Andrew Trick
2012-Jun-13 02:45 UTC
[LLVMdev] Latency of true depency of store followed by aliased load in ScheduleDAGInstrs
On Jun 12, 2012, at 7:20 AM, Jordy Potman <jordy.potman at recoresystems.com> wrote:> > So in the volatile case the latency of the chain dependency is 0, while > in the non volatile case it is 1. > > I am using ScheduleDAGInstrs in a scheduler for a VLIW target and in the > volatile case the load gets incorrectly scheduled in the same cycle as > the store. Is ScheduleDAGInstrs incorrect in the volatile case or > shouldn't I rely on the latency being non zero for getting a correct > schedule?I don't like the inconsistency. There's no reason for it other than sloppy implementation. I tried to clean this up in r158380. Memory dependence latency should now be conservative in this respect. That said, this was really meant to be a heuristic, and a fairly unimportant one at that, so I never paid much attention to the inconsistency. Whether memory dependencies can be bundled in the same group is cpu specific. Register anti-dependencies are also zero cycle. Do you bundle those in the same group? What about register output dependencies? They can result from dead register defs. You may need to add your own checks to determine valid instruction groups. For example, you may not want to allow any DAG dependencies at all within the group. If the basic ScheduleHazardRecognizer is insufficient, you could try the approach that the Hexagon target uses. One of the engineers who works on that backend should be able to explain it better. -Andy
Jordy Potman
2012-Jun-14 15:20 UTC
[LLVMdev] Latency of true depency of store followed by aliased load in ScheduleDAGInstrs
On Tue, 2012-06-12 at 19:45 -0700, Andrew Trick wrote:> I don't like the inconsistency. There's no reason for it other than sloppy implementation. I tried to clean this up in r158380. Memory dependence latency should now be conservative in this respect. > > That said, this was really meant to be a heuristic, and a fairly unimportant one at that, so I never paid much attention to the inconsistency. Whether memory dependencies can be bundled in the same group is cpu specific. Register anti-dependencies are also zero cycle. Do you bundle those in the same group? What about register output dependencies? They can result from dead register defs. You may need to add your own checks to determine valid instruction groups. For example, you may not want to allow any DAG dependencies at all within the group. If the basic ScheduleHazardRecognizer is insufficient, you could try the approach that the Hexagon target uses. One of the engineers who works on that backend should be able to explain it better. > > -AndyThanks for cleaning up the inconsistency. Register anti-dependencies can be bundled in the same group on our target and we treat dead register defs specially in our scheduler so I guess that is why we did not run into issues before. I now understand that checking explicitly for allowed/not-allowed DAG dependencies within a group is a more robust approach, so we'll start doing that. Thanks, Jordy
Reasonably Related Threads
- [LLVMdev] Latency of true depency of store followed by aliased load in ScheduleDAGInstrs
- [LLVMdev] Assert in live update from MI scheduler.
- [LLVMdev] Assert in live update from MI scheduler.
- [LLVMdev] Scheduling question (memory dependency)
- [LLVMdev] Scheduling question (memory dependency)