Chandler Carruth
2013-Jan-20 21:56 UTC
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this seems like an easy and simple deficiency in the IR intrinsic for memcpy. See below. On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote:> define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), > align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), > align 2 >Just so we're on the same page, the "regression" (which I claim is actually a bug-fix) are the loads here being volatile in addition to the stores. The stores require it, but the loads get it. The problem is in how clang codegen aggregate copies when the source or> the destination are volatile : it uses the memcpy intrinsic, setting the > volatile parameter. However, the LLVM IR reference manual specifies that > "The detailed access behavior is not very cleanly specified and it is > unwise to depend on it". The new SROA implementation is conservatively > correct, but behaves differently. > > I believe we should either : > - rework methods AggExprEmitter::EmitCopy and > CodeGenFunction::EmitAggregateCopy (from > clang/lib/CodeGen/CGExprAgg.cpp) to codegen differently the copy when > either the source or the destination is volatile, >I think this is bad...> - specify cleanly the memcpy intrinsic with respect to volatile behaviour >I agree that this is what we should do. I took a conservative approach when implementing SROA w.r.t. volatile memcpy, and I think we should codify those semantics: with a single is-volatile bit both the loads and stores which the memcpy decomposes into will have volatile semantics. I think we should extend the memcpy intrinsic to support two volatile flags and have that overload have the flags refer to the loads and stores respectively. We should then auto-upgrade the existing memcpy intrinsics into the more precise form by replicating the single volatile flag. This will conservatively preserve semantics. As part of adding the new intrinsic, everywhere in LLVM that examines the two flags should be audited and should use the individual flag for load vs. store volatility. This will change SROA so that it can emit the loads as non-volatile, and only the stores will be volatile. Finally we switch Clang over to emit the new intrinsic structure with more precise information, and your test case should go back to normal. As part of this, we should also clarify that a volatile memcpy guarantees that each byte loaded is loaded exactly once, and each byte stored is stored exactly once, but that these operations are free to be batched into 4-byte, 8-byte, or any other width of loads and stores, and there is no guarantee about the interleaving of these loads and stores even when both are volatile. -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130120/ff274d19/attachment.html>
Arnaud de Grandmaison
2013-Jan-21 10:15 UTC
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
On 01/20/2013 10:56 PM, Chandler Carruth wrote:> I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this > seems like an easy and simple deficiency in the IR intrinsic for > memcpy. See below. > > On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison > <arnaud.allarddegrandmaison at parrot.com > <mailto:arnaud.allarddegrandmaison at parrot.com>> wrote: > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to > i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to > i16*), align 2 > > > Just so we're on the same page, the "regression" (which I claim is > actually a bug-fix) are the loads here being volatile in addition to > the stores. The stores require it, but the loads get it.We are on the same page there. And although this shows as a performance regression with my targets, I agree this is a bug fix, especially given the actual semantics of the memcpy intrinsic.> > The problem is in how clang codegen aggregate copies when the > source or > the destination are volatile : it uses the memcpy intrinsic, > setting the > volatile parameter. However, the LLVM IR reference manual > specifies that > "The detailed access behavior is not very cleanly specified and it is > unwise to depend on it". The new SROA implementation is conservatively > correct, but behaves differently. > > I believe we should either : > - rework methods AggExprEmitter::EmitCopy and > CodeGenFunction::EmitAggregateCopy (from > clang/lib/CodeGen/CGExprAgg.cpp) to codegen differently the copy when > either the source or the destination is volatile, > > > I think this is bad... > > > - specify cleanly the memcpy intrinsic with respect to volatile > behaviour > > > I agree that this is what we should do. > > I took a conservative approach when implementing SROA w.r.t. volatile > memcpy, and I think we should codify those semantics: with a single > is-volatile bit both the loads and stores which the memcpy decomposes > into will have volatile semantics. > > I think we should extend the memcpy intrinsic to support two volatile > flags and have that overload have the flags refer to the loads and > stores respectively. We should then auto-upgrade the existing memcpy > intrinsics into the more precise form by replicating the single > volatile flag. This will conservatively preserve semantics. As part of > adding the new intrinsic, everywhere in LLVM that examines the two > flags should be audited and should use the individual flag for load > vs. store volatility. This will change SROA so that it can emit the > loads as non-volatile, and only the stores will be volatile. Finally > we switch Clang over to emit the new intrinsic structure with more > precise information, and your test case should go back to normal. > > As part of this, we should also clarify that a volatile memcpy > guarantees that each byte loaded is loaded exactly once, and each byte > stored is stored exactly once, but that these operations are free to > be batched into 4-byte, 8-byte, or any other width of loads and > stores, and there is no guarantee about the interleaving of these > loads and stores even when both are volatile. > > -ChandlerI am volunteering to implement this. I have to finish my code porting to llvm-3.2 first ; expect some patches / discussions by the end of the week. Cheers, -- Arnaud de Grandmaison -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130121/8f94815b/attachment.html>
Chandler Carruth
2013-Jan-21 10:30 UTC
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
On Mon, Jan 21, 2013 at 2:15 AM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote:> I am volunteering to implement this. I have to finish my code porting to > llvm-3.2 first ; expect some patches / discussions by the end of the week. >Awesome! Thanks for the analysis and the help getting the nice performance back. -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130121/8bd0f887/attachment.html>
Apparently Analagous Threads
- [LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
- [LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
- [LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
- [LLVMdev] Weird volatile propagation ?
- [LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics