Hi, I got an optimization problem (O1, O2) regarding memory barrier “dmb ishld” I find in the test/CodeGen/AArch64/intrinsics-memory-barrier.ll , it’s stated that memory access around DMB should not be reordered, but when compiling the Linux kernel, I found load/store in static inline void hlist_add_before_rcu(struct hlist_node *n, struct hlist_node *next) { n->pprev = next->pprev; n->next = next; rcu_assign_pointer(hlist_pprev_rcu(n), n); next->pprev = &n->next; } can reordered, and causes kernel crash. f94006a8 ldr x8, [x21,#8] f9000275 str x21, [x19] d5033abf dmb ishst f9400669 ldr x9, [x19,#8] f9000668 str x8, [x19,#8] <==== reordered str f9000133 str x19, [x9] f90006b3 str x19, [x21,#8] It should be: f94006a8 ldr x8, [x21,#8] f9000668 str x8, [x19,#8] f9000275 str x21, [x19] d5033abf dmb ishst f9400669 ldr x9, [x19,#8] f9000133 str x19, [x9] f90006b3 str x19, [x21,#8] I guess it’s because "dmb ishst" is not checked/tested? Any quick way to fix this? Thanks, Chengyu
What is dmb ishst? hehe .. But tackling these problems one-by-one do not seem the quickest way to test our idea; try -O0?> static inline void hlist_add_before_rcu(struct hlist_node *n, > struct hlist_node *next) > { > n->pprev = next->pprev; > n->next = next;barrier();> rcu_assign_pointer(hlist_pprev_rcu(n), n);barrier();> next->pprev = &n->next; > } > > can reordered, and causes kernel crash. > > f94006a8 ldr x8, [x21,#8] > f9000275 str x21, [x19] > d5033abf dmb ishst > f9400669 ldr x9, [x19,#8] > f9000668 str x8, [x19,#8] <==== reordered str > f9000133 str x19, [x9] > f90006b3 str x19, [x21,#8] > > It should be: > > f94006a8 ldr x8, [x21,#8] > f9000668 str x8, [x19,#8] > f9000275 str x21, [x19] > d5033abf dmb ishst > f9400669 ldr x9, [x19,#8] > f9000133 str x19, [x9] > f90006b3 str x19, [x21,#8] > > I guess it’s because "dmb ishst" is not checked/tested? Any quick way to fix this? > > Thanks, > Chengyu
Sorry, should be "dmb ishst"> On Dec 9, 2014, at 2:40 PM, Chengyu Song <csong84 at gatech.edu> wrote: > > Hi, > > I got an optimization problem (O1, O2) regarding memory barrier “dmb ishld” > > I find in the test/CodeGen/AArch64/intrinsics-memory-barrier.ll , it’s stated that memory access around DMB should not be reordered, but when compiling the Linux kernel, I found load/store in > > static inline void hlist_add_before_rcu(struct hlist_node *n, > struct hlist_node *next) > { > n->pprev = next->pprev; > n->next = next; > rcu_assign_pointer(hlist_pprev_rcu(n), n); > next->pprev = &n->next; > } > > can reordered, and causes kernel crash. > > f94006a8 ldr x8, [x21,#8] > f9000275 str x21, [x19] > d5033abf dmb ishst > f9400669 ldr x9, [x19,#8] > f9000668 str x8, [x19,#8] <==== reordered str > f9000133 str x19, [x9] > f90006b3 str x19, [x21,#8] > > It should be: > > f94006a8 ldr x8, [x21,#8] > f9000668 str x8, [x19,#8] > f9000275 str x21, [x19] > d5033abf dmb ishst > f9400669 ldr x9, [x19,#8] > f9000133 str x19, [x9] > f90006b3 str x19, [x21,#8] > > I guess it’s because "dmb ishst" is not checked/tested? Any quick way to fix this? > > Thanks, > Chengyu
Hi Chengyu, On 9 December 2014 at 11:40, Chengyu Song <csong84 at gatech.edu> wrote:> I guess it’s because "dmb ishst" is not checked/tested? Any quick way to fix this?That barrier should be treated the same as any other dmb by LLVM. What I'm most confused about is why it's there in the first place. From what I can tell, the rcu_assign_pointer function should map to a native store-release operation (stlr). Also, isn't the reordering wrong just from a data-dependency point of view? It looks like it moves the hlist_pprev_rcu(n) access before the n->pprev assignment. Could be because of an aliasing violation, or it could be LLVM. Do you have preprocessed source or LLVM IR handy? (You can get a .i file from "clang -save-temps" for example). Cheers. Tim.
Hi Tim,> That barrier should be treated the same as any other dmb by LLVM. What > I'm most confused about is why it's there in the first place. From > what I can tell, the rcu_assign_pointer function should map to a > native store-release operation (stlr).I guess it's because we disabled the integrated-asm, so it won't complain when compiling the kernel.> Also, isn't the reordering wrong just from a data-dependency point of > view? It looks like it moves the hlist_pprev_rcu(n) access before the > n->pprev assignment. Could be because of an aliasing violation, or it > could be LLVM.The problem is explained below, the reordering causes accessing to uninitialized data. f94006a8 ldr x8, [x21,#8] f9000275 str x21, [x19] d5033abf dmb ishst f9400669 ldr x9, [x19,#8] <==== uninitialized data f9000668 str x8, [x19,#8] <==== initialization f9000133 str x19, [x9] <==== accessing uninitialized address f90006b3 str x19, [x21,#8]> Do you have preprocessed source or LLVM IR handy? (You can get a .i > file from "clang -save-temps" for example).The IR looks OK to me, the order is correct. I think the optimization is on machine code. The code is part of the insert_leaf_info function in net/ipv4/fib_trie.c %37 = getelementptr inbounds %struct.leaf_info* %new, i64 0, i32 0, !dbg !10245 %38 = getelementptr inbounds %struct.leaf_info* %li.0.lcssa, i64 0, i32 0, !dbg !10245 tail call void @llvm.dbg.value(metadata !{%struct.hlist_node* %37}, i64 0, metadata !10246, metadata !6594), !dbg !10247 tail call void @llvm.dbg.value(metadata !{%struct.hlist_node* %38}, i64 0, metadata !10248, metadata !6594), !dbg !10249 %39 = getelementptr inbounds %struct.leaf_info* %li.0.lcssa, i64 0, i32 0, i32 1, !dbg !10250 %40 = load %struct.hlist_node*** %39, align 8, !dbg !10250 %41 = getelementptr inbounds %struct.leaf_info* %new, i64 0, i32 0, i32 1, !dbg !10250 store %struct.hlist_node** %40, %struct.hlist_node*** %41, align 8, !dbg !10250 %42 = getelementptr inbounds %struct.leaf_info* %new, i64 0, i32 0, i32 0, !dbg !10251 store %struct.hlist_node* %38, %struct.hlist_node** %42, align 8, !dbg !10251 tail call void asm sideeffect "dmb ishst", "~{memory}"() #5, !dbg !10252, !srcloc !10255 %43 = load %struct.hlist_node*** %41, align 8, !dbg !10252 store %struct.hlist_node* %37, %struct.hlist_node** %43, align 8, !dbg !10252 store %struct.hlist_node** %42, %struct.hlist_node*** %39, align 8, !dbg !10256 br label %hlist_add_head_rcu.exit Thanks, Chengyu