Krzysztof Parzyszek via llvm-dev
2015-Nov-10 19:22 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote:> On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote: >> I have a customer testcase where SROA splits a volatile memcpy and we end up >> generating bad code[1]. While this looks like a bug, simply preventing SROA >> from splitting volatile memory intrinsics causes basictest.ll for SROA to >> fail. Not only that, but it also seems like handling of volatile memory >> transfers was done with some intent. > > There is no such thing as a volatile memcpy or memset in standard ISO C, > so what exactly are you doing and why do you expect it to work that way?The motivating example has an aggregate copy where the aggregate is volatile, followed by a store to one of its members. (This does not have anything to do with devices.) SROA expanded this into a series of volatile loads and stores, which cannot be coalesced back into fewer instructions. This is clearly worse than doing the copy and then the member overwrite. --- test.c --- typedef struct { volatile unsigned int value; } atomic_word_t; typedef union { struct { unsigned char state; unsigned char priority; }; atomic_word_t atomic; unsigned int full; } mystruct_t; mystruct_t a; unsigned int foo(void) { mystruct_t x; mystruct_t y; x.full = a.atomic.value; y = x; y.priority = 7; return y.full; } -------------- -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Joerg Sonnenberger via llvm-dev
2015-Nov-10 19:29 UTC
[llvm-dev] SROA and volatile memcpy/memset
On Tue, Nov 10, 2015 at 01:22:57PM -0600, Krzysztof Parzyszek via llvm-dev wrote:> On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote: > >On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via llvm-dev wrote: > >>I have a customer testcase where SROA splits a volatile memcpy and we end up > >>generating bad code[1]. While this looks like a bug, simply preventing SROA > >>from splitting volatile memory intrinsics causes basictest.ll for SROA to > >>fail. Not only that, but it also seems like handling of volatile memory > >>transfers was done with some intent. > > > >There is no such thing as a volatile memcpy or memset in standard ISO C, > >so what exactly are you doing and why do you expect it to work that way? > > The motivating example has an aggregate copy where the aggregate is > volatile, followed by a store to one of its members. (This does not have > anything to do with devices.) SROA expanded this into a series of volatile > loads and stores, which cannot be coalesced back into fewer instructions. > This is clearly worse than doing the copy and then the member overwrite.But a copy and overwrite would violate the volatile rules? Joerg
Krzysztof Parzyszek via llvm-dev
2015-Nov-10 19:35 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/10/2015 1:29 PM, Joerg Sonnenberger via llvm-dev wrote:> > But a copy and overwrite would violate the volatile rules?The copy and overwrite were both in the original code. SROA tries to optimize the overwrite away, but in doing so, it breaks up the memcpy into several instructions. Now, because they are all volatile, we can't do much to optimize them. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
----- Original Message -----> From: "Krzysztof Parzyszek via llvm-dev" <llvm-dev at lists.llvm.org> > To: llvm-dev at lists.llvm.org > Sent: Tuesday, November 10, 2015 1:22:57 PM > Subject: Re: [llvm-dev] SROA and volatile memcpy/memset > > On 11/10/2015 1:07 PM, Joerg Sonnenberger via llvm-dev wrote: > > On Tue, Nov 10, 2015 at 10:41:06AM -0600, Krzysztof Parzyszek via > > llvm-dev wrote: > >> I have a customer testcase where SROA splits a volatile memcpy and > >> we end up > >> generating bad code[1]. While this looks like a bug, simply > >> preventing SROA > >> from splitting volatile memory intrinsics causes basictest.ll for > >> SROA to > >> fail. Not only that, but it also seems like handling of volatile > >> memory > >> transfers was done with some intent. > > > > There is no such thing as a volatile memcpy or memset in standard > > ISO C, > > so what exactly are you doing and why do you expect it to work that > > way? > > The motivating example has an aggregate copy where the aggregate is > volatile, followed by a store to one of its members. (This does not > have > anything to do with devices.) SROA expanded this into a series of > volatile loads and stores, which cannot be coalesced back into fewer > instructions. This is clearly worse than doing the copy and then the > member overwrite. > > --- test.c --- > typedef struct { > volatile unsigned int value; > } atomic_word_t; > > typedef union { > struct { > unsigned char state; > unsigned char priority; > }; > atomic_word_t atomic; > unsigned int full; > } mystruct_t; > > > mystruct_t a; > > unsigned int foo(void) { > mystruct_t x; > mystruct_t y; > > x.full = a.atomic.value; > y = x; > y.priority = 7; > > return y.full; > } > --------------SROA seems to be doing a number of things here. What about if we prevented SROA from generating multiple slices splitting volatile accesses? There might be a significant difference between that and something like this test (test/Transforms/SROA/basictest.ll): define i32 @test6() { ; CHECK-LABEL: @test6( ; CHECK: alloca i32 ; CHECK-NEXT: store volatile i32 ; CHECK-NEXT: load i32, i32* ; CHECK-NEXT: ret i32 entry: %a = alloca [4 x i8] %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0 call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true) %iptr = bitcast i8* %ptr to i32* %val = load i32, i32* %iptr ret i32 %val } -Hal> > > -Krzysztof > > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by The Linux Foundation > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:00 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 8:53 AM, Hal Finkel wrote:> > SROA seems to be doing a number of things here. What about if we prevented SROA from generating multiple slices splitting volatile accesses? There might be a significant difference between that and something like this test (test/Transforms/SROA/basictest.ll): > > define i32 @test6() { > ; CHECK-LABEL: @test6( > ; CHECK: alloca i32 > ; CHECK-NEXT: store volatile i32 > ; CHECK-NEXT: load i32, i32* > ; CHECK-NEXT: ret i32 > > entry: > %a = alloca [4 x i8] > %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0 > call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true) > %iptr = bitcast i8* %ptr to i32* > %val = load i32, i32* %iptr > ret i32 %val > } >Yes, that would work. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation