Arnaud A. de Grandmaison
2013-Jan-28 16:54 UTC
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Hi All, In the language reference manual, the access behavior of the memcpy, memmove and memset intrinsics is not well defined with respect to the volatile flag. The LRM even states that "it is unwise to depend on it". This forces optimization passes to be conservatively correct and prevent optimizations. A very simple example of this is : $ cat test.c #include <stdint.h> struct R { uint16_t a; uint16_t b; }; volatile struct R * const addr = (volatile struct R *) 416; void test(uint16_t a) { struct R r = { a, 1 }; *addr = r; } $ clang -O2 -o - -emit-llvm -S -c test.c ; ModuleID = 'test.c' target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct.R = type { i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } In the generated ir, the loads are marked as volatile. In this case, this is due to the new SROA being conservatively correct. We should however expect the test function to look like this much simpler one (which was actually the case with llvm <= 3.1) : define void @test(i16 zeroext %a) nounwind uwtable { store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2 store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2 ret void } I propose to specify the volatile access behavior for those intrinsics : instead of one flag, they should have 2 independant volatile flags, one for destination accesses, the second for the source accesses. If there is general agreement, I plan to proceed with the following steps : 1. Specify the access behavior (this patch). 2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into the more precise form by replicating the single volatile flag and rework the MemIntrinsic hierarchy to provide (is|set)SrcVolatile(), (is|set)DestVolatile() and implement (set|is)Volatile in terms of the former 2 methods. This will conservatively preserve semantics. No functional change so far. Commit 1 & 2. 3. Audit all uses of isVolatile() / setVolatile() and move them to the more precise form. From this point, more aggressive / precise optimizations can happen. Commit 3. 4. Teach clang to use the new form. 5. Optionally remove the old interface form,as there should be no in-tree users left. This would however be an API change breaking external code. Cheers, -- Arnaud de Grandmaison -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch Type: text/x-patch Size: 7380 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130128/0d3ef5ec/attachment.bin>
Andrew Trick
2013-Jan-29 18:30 UTC
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
I can't think of a better way to do this, so I think it's ok. I also submitted a complementary patch on llvm-commits clarifying volatile semantics. -Andy On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison <arnaud.allarddegrandmaison at parrot.com> wrote:> Hi All, > > In the language reference manual, the access behavior of the memcpy, > memmove and memset intrinsics is not well defined with respect to the > volatile flag. The LRM even states that "it is unwise to depend on it". > This forces optimization passes to be conservatively correct and prevent > optimizations. > > A very simple example of this is : > > $ cat test.c > > #include <stdint.h> > > struct R { > uint16_t a; > uint16_t b; > }; > > volatile struct R * const addr = (volatile struct R *) 416; > > void test(uint16_t a) > { > struct R r = { a, 1 }; > *addr = r; > } > > $ clang -O2 -o - -emit-llvm -S -c test.c > ; ModuleID = 'test.c' > target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" > target triple = "x86_64-unknown-linux-gnu" > > %struct.R = type { i16, i16 } > > @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 > ret void > } > > In the generated ir, the loads are marked as volatile. In this case, > this is due to the new SROA being conservatively correct. We should > however expect the test function to look like this much simpler one > (which was actually the case with llvm <= 3.1) : > > define void @test(i16 zeroext %a) nounwind uwtable { > store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2 > store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2 > ret void > } > > > I propose to specify the volatile access behavior for those intrinsics : > instead of one flag, they should have 2 independant volatile flags, one > for destination accesses, the second for the source accesses. > > If there is general agreement, I plan to proceed with the following steps : > 1. Specify the access behavior (this patch). > 2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into > the more precise form by replicating the single volatile flag and rework > the MemIntrinsic hierarchy to provide (is|set)SrcVolatile(), > (is|set)DestVolatile() and implement (set|is)Volatile in terms of the > former 2 methods. This will conservatively preserve semantics. No > functional change so far. Commit 1 & 2. > 3. Audit all uses of isVolatile() / setVolatile() and move them to the > more precise form. From this point, more aggressive / precise > optimizations can happen. Commit 3. > 4. Teach clang to use the new form. > 5. Optionally remove the old interface form,as there should be no > in-tree users left. This would however be an API change breaking > external code. > > Cheers, > > -- > Arnaud de Grandmaison > > <0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch>_______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Chandler Carruth
2013-Jan-29 19:15 UTC
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
LGTM as well. I'm particularly happy to specify that there is no guarantee about the width of loads and/or stores resulting, as I think that's important for frontends to not depend on (or to request a change if they need to depend on it). Thanks Andy far the additional clarification, that helps pin down things even more. On Tue, Jan 29, 2013 at 10:30 AM, Andrew Trick <atrick at apple.com> wrote:> I can't think of a better way to do this, so I think it's ok. > > I also submitted a complementary patch on llvm-commits clarifying > volatile semantics. > > -Andy > > On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison < > arnaud.allarddegrandmaison at parrot.com> wrote: > > > Hi All, > > > > In the language reference manual, the access behavior of the memcpy, > > memmove and memset intrinsics is not well defined with respect to the > > volatile flag. The LRM even states that "it is unwise to depend on it". > > This forces optimization passes to be conservatively correct and prevent > > optimizations. > > > > A very simple example of this is : > > > > $ cat test.c > > > > #include <stdint.h> > > > > struct R { > > uint16_t a; > > uint16_t b; > > }; > > > > volatile struct R * const addr = (volatile struct R *) 416; > > > > void test(uint16_t a) > > { > > struct R r = { a, 1 }; > > *addr = r; > > } > > > > $ clang -O2 -o - -emit-llvm -S -c test.c > > ; ModuleID = 'test.c' > > target datalayout > "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" > > target triple = "x86_64-unknown-linux-gnu" > > > > %struct.R = type { i16, i16 } > > > > @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 > > > > define void @test(i16 zeroext %a) nounwind uwtable { > > %r.sroa.0 = alloca i16, align 2 > > %r.sroa.1 = alloca i16, align 2 > > store i16 %a, i16* %r.sroa.0, align 2 > > store i16 1, i16* %r.sroa.1, align 2 > > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), > align 32 > > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), > align 2 > > ret void > > } > > > > In the generated ir, the loads are marked as volatile. In this case, > > this is due to the new SROA being conservatively correct. We should > > however expect the test function to look like this much simpler one > > (which was actually the case with llvm <= 3.1) : > > > > define void @test(i16 zeroext %a) nounwind uwtable { > > store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2 > > store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2 > > ret void > > } > > > > > > I propose to specify the volatile access behavior for those intrinsics : > > instead of one flag, they should have 2 independant volatile flags, one > > for destination accesses, the second for the source accesses. > > > > If there is general agreement, I plan to proceed with the following > steps : > > 1. Specify the access behavior (this patch). > > 2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into > > the more precise form by replicating the single volatile flag and rework > > the MemIntrinsic hierarchy to provide (is|set)SrcVolatile(), > > (is|set)DestVolatile() and implement (set|is)Volatile in terms of the > > former 2 methods. This will conservatively preserve semantics. No > > functional change so far. Commit 1 & 2. > > 3. Audit all uses of isVolatile() / setVolatile() and move them to the > > more precise form. From this point, more aggressive / precise > > optimizations can happen. Commit 3. > > 4. Teach clang to use the new form. > > 5. Optionally remove the old interface form,as there should be no > > in-tree users left. This would however be an API change breaking > > external code. > > > > Cheers, > > > > -- > > Arnaud de Grandmaison > > > > > <0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch>_______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130129/618d6535/attachment.html>
Arnaud A. de Grandmaison
2013-Jan-31 17:13 UTC
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Thanks Andy and Chandler, After specifying the volatile access behaviour, the second step was to autoupgrade the memmove/memcpy intrinsics, and implement (is|set)Volatile in terms of (is|set)(Src|Dest)Volatile, with no functional change. 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch is the one you already reviewed, unaltered. 0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch implements this second step. It requires clang tests to be patched with clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch as the codegen'd memcpy / memmove will use the new format. Finally, 0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch is purely mechanical : it upgrades all llvm tests (but auto_upgrade_intrinsics :)) to use the new format for memcpy/memmove. There are no functional changes, "make check-llvm check-clang" pass after each patch is applied. When those patches are accepted, I will commit them and start propagating the new interface in the codebase. Cheers, -- Arnaud A. de Grandmaison On 01/28/2013 05:54 PM, Arnaud A. de Grandmaison wrote:> Hi All, > > In the language reference manual, the access behavior of the memcpy, > memmove and memset intrinsics is not well defined with respect to the > volatile flag. The LRM even states that "it is unwise to depend on it". > This forces optimization passes to be conservatively correct and prevent > optimizations. > > A very simple example of this is : > > $ cat test.c > > #include <stdint.h> > > struct R { > uint16_t a; > uint16_t b; > }; > > volatile struct R * const addr = (volatile struct R *) 416; > > void test(uint16_t a) > { > struct R r = { a, 1 }; > *addr = r; > } > > $ clang -O2 -o - -emit-llvm -S -c test.c > ; ModuleID = 'test.c' > target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" > target triple = "x86_64-unknown-linux-gnu" > > %struct.R = type { i16, i16 } > > @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 > ret void > } > > In the generated ir, the loads are marked as volatile. In this case, > this is due to the new SROA being conservatively correct. We should > however expect the test function to look like this much simpler one > (which was actually the case with llvm <= 3.1) : > > define void @test(i16 zeroext %a) nounwind uwtable { > store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2 > store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2 > ret void > } > > > I propose to specify the volatile access behavior for those intrinsics : > instead of one flag, they should have 2 independant volatile flags, one > for destination accesses, the second for the source accesses. > > If there is general agreement, I plan to proceed with the following steps : > 1. Specify the access behavior (this patch). > 2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into > the more precise form by replicating the single volatile flag and rework > the MemIntrinsic hierarchy to provide (is|set)SrcVolatile(), > (is|set)DestVolatile() and implement (set|is)Volatile in terms of the > former 2 methods. This will conservatively preserve semantics. No > functional change so far. Commit 1 & 2. > 3. Audit all uses of isVolatile() / setVolatile() and move them to the > more precise form. From this point, more aggressive / precise > optimizations can happen. Commit 3. > 4. Teach clang to use the new form. > 5. Optionally remove the old interface form,as there should be no > in-tree users left. This would however be an API change breaking > external code. > > Cheers, >-- Arnaud A. de Grandmaison -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch Type: text/x-patch Size: 7384 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130131/7ff817e8/attachment.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch Type: text/x-patch Size: 22660 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130131/7ff817e8/attachment-0001.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch Type: text/x-patch Size: 181861 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130131/7ff817e8/attachment-0002.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch Type: text/x-patch Size: 22178 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130131/7ff817e8/attachment-0003.bin>
Arnaud A. de Grandmaison
2013-Feb-03 22:48 UTC
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Same patches as before, but 0002-memcpy has been updated to put the (is|set)SrcVolatile methods to where they logically belong : MemTransferInst. This makes (is|set)Volatile methods look a bit ugly to keep compatibility with existing behaviour, but they will hopefully disappear when all users have moved to the new interface --- in the next series of patches. I plan to give a try to phabricator for the next patches, but could not for the actual ones as they touch llvm and clang. Cheers, -- Arnaud A. de Grandmaison On 01/31/2013 06:13 PM, Arnaud A. de Grandmaison wrote:> Thanks Andy and Chandler, > > After specifying the volatile access behaviour, the second step was to > autoupgrade the memmove/memcpy intrinsics, and implement > (is|set)Volatile in terms of (is|set)(Src|Dest)Volatile, with no > functional change. > > 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch is the > one you already reviewed, unaltered. > > 0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch > implements this second step. It requires clang tests to be patched with > clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch as > the codegen'd memcpy / memmove will use the new format. > > Finally, 0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch > is purely mechanical : it upgrades all llvm tests (but > auto_upgrade_intrinsics :)) to use the new format for memcpy/memmove. > > There are no functional changes, "make check-llvm check-clang" pass > after each patch is applied. > > When those patches are accepted, I will commit them and start > propagating the new interface in the codebase. > > Cheers, > -- > Arnaud A. de Grandmaison > > On 01/28/2013 05:54 PM, Arnaud A. de Grandmaison wrote: >> Hi All, >> >> In the language reference manual, the access behavior of the memcpy, >> memmove and memset intrinsics is not well defined with respect to the >> volatile flag. The LRM even states that "it is unwise to depend on it". >> This forces optimization passes to be conservatively correct and prevent >> optimizations. >> >> A very simple example of this is : >> >> $ cat test.c >> >> #include <stdint.h> >> >> struct R { >> uint16_t a; >> uint16_t b; >> }; >> >> volatile struct R * const addr = (volatile struct R *) 416; >> >> void test(uint16_t a) >> { >> struct R r = { a, 1 }; >> *addr = r; >> } >> >> $ clang -O2 -o - -emit-llvm -S -c test.c >> ; ModuleID = 'test.c' >> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" >> target triple = "x86_64-unknown-linux-gnu" >> >> %struct.R = type { i16, i16 } >> >> @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 >> >> define void @test(i16 zeroext %a) nounwind uwtable { >> %r.sroa.0 = alloca i16, align 2 >> %r.sroa.1 = alloca i16, align 2 >> store i16 %a, i16* %r.sroa.0, align 2 >> store i16 1, i16* %r.sroa.1, align 2 >> %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 >> store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 >> %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 >> store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 >> ret void >> } >> >> In the generated ir, the loads are marked as volatile. In this case, >> this is due to the new SROA being conservatively correct. We should >> however expect the test function to look like this much simpler one >> (which was actually the case with llvm <= 3.1) : >> >> define void @test(i16 zeroext %a) nounwind uwtable { >> store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2 >> store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2 >> ret void >> } >> >> >> I propose to specify the volatile access behavior for those intrinsics : >> instead of one flag, they should have 2 independant volatile flags, one >> for destination accesses, the second for the source accesses. >> >> If there is general agreement, I plan to proceed with the following steps : >> 1. Specify the access behavior (this patch). >> 2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into >> the more precise form by replicating the single volatile flag and rework >> the MemIntrinsic hierarchy to provide (is|set)SrcVolatile(), >> (is|set)DestVolatile() and implement (set|is)Volatile in terms of the >> former 2 methods. This will conservatively preserve semantics. No >> functional change so far. Commit 1 & 2. >> 3. Audit all uses of isVolatile() / setVolatile() and move them to the >> more precise form. From this point, more aggressive / precise >> optimizations can happen. Commit 3. >> 4. Teach clang to use the new form. >> 5. Optionally remove the old interface form,as there should be no >> in-tree users left. This would however be an API change breaking >> external code. >> >> Cheers, >> >-- Arnaud A. de Grandmaison -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch Type: text/x-patch Size: 7384 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130203/e04b4d41/attachment.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch Type: text/x-patch Size: 22970 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130203/e04b4d41/attachment-0001.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch.bz2 Type: application/x-bzip Size: 21263 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130203/e04b4d41/attachment-0002.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch Type: text/x-patch Size: 22178 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130203/e04b4d41/attachment-0003.bin>
Apparently Analagous Threads
- [LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
- [LLVMdev] Weird volatile propagation ?
- [LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
- [LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
- [LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics