Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:00 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 8:53 AM, Hal Finkel wrote:
> SROA seems to be doing a number of things here. What about if we
> prevented SROA from generating multiple slices splitting volatile
> accesses? There might be a significant difference between that and
> something like this test (test/Transforms/SROA/basictest.ll):
>
> define i32 @test6() {
> ; CHECK-LABEL: @test6(
> ; CHECK: alloca i32
> ; CHECK-NEXT: store volatile i32
> ; CHECK-NEXT: load i32, i32*
> ; CHECK-NEXT: ret i32
>
> entry:
>   %a = alloca [4 x i8]
>   %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0
>   call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true)
>   %iptr = bitcast i8* %ptr to i32*
>   %val = load i32, i32* %iptr
>   ret i32 %val
> }

Yes, that would work.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Chandler Carruth via llvm-dev
2015-Nov-11 15:28 UTC
[llvm-dev] SROA and volatile memcpy/memset
So, here is the model that LLVM is using: a volatile memcpy is lowered
to a loop of loads and stores of indeterminate width. As such,
splitting a memcpy is always valid.

If we want a very specific load and store width for volatile accesses,
I think that the frontend should generate concrete loads and stores of
a type with that width. Ultimately, memcpy is a pretty bad model for
*specific* width accesses; it is best at handling indeterminately sized
accesses, which is exactly what doesn't make sense for device-backed
volatile accesses.

On Wed, Nov 11, 2015, 10:00 Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> On 11/11/2015 8:53 AM, Hal Finkel wrote:
> > SROA seems to be doing a number of things here. What about if we
> > prevented SROA from generating multiple slices splitting volatile
> > accesses? There might be a significant difference between that and
> > something like this test (test/Transforms/SROA/basictest.ll):
> >
> > define i32 @test6() {
> > ; CHECK-LABEL: @test6(
> > ; CHECK: alloca i32
> > ; CHECK-NEXT: store volatile i32
> > ; CHECK-NEXT: load i32, i32*
> > ; CHECK-NEXT: ret i32
> >
> > entry:
> >   %a = alloca [4 x i8]
> >   %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0
> >   call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true)
> >   %iptr = bitcast i8* %ptr to i32*
> >   %val = load i32, i32* %iptr
> >   ret i32 %val
> > }
>
> Yes, that would work.
>
> -Krzysztof
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
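[To make the model above concrete, here is a minimal IR sketch contrasting the two forms Chandler describes; the function names are illustrative, not from the thread. The intrinsic signature follows the era's convention already visible in Hal's test (length, alignment, isvolatile). In the first form the i1 true operand marks the copy volatile, but the width of the underlying accesses is still left to the lowering, so splitting is legal; in the second, the frontend pins the width to i32 with explicit volatile operations, which later passes may neither widen nor narrow.]

declare void @llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)

; Volatile memcpy: volatile, but of indeterminate access width.
define void @copy_indeterminate(i8* %dst, i8* %src) {
entry:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src, i32 4, i32 4, i1 true)
  ret void
}

; Frontend-pinned width: exactly one volatile i32 load and one
; volatile i32 store.
define void @copy_exact_i32(i32* %dst, i32* %src) {
entry:
  %val = load volatile i32, i32* %src, align 4
  store volatile i32 %val, i32* %dst, align 4
  ret void
}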
Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:34 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> So, here is the model that LLVM is using: a volatile memcpy is lowered
> to a loop of loads and stores of indeterminate width. As such,
> splitting a memcpy is always valid.
>
> If we want a very specific load and store width for volatile accesses,
> I think that the frontend should generate concrete loads and stores of
> a type with that width. Ultimately, memcpy is a pretty bad model for
> *specific* width accesses; it is best at handling indeterminately sized
> accesses, which is exactly what doesn't make sense for device-backed
> volatile accesses.

Yeah, the remark about devices I made in my post was a result of a
"last-minute" thought to add some rationale. It doesn't actually apply
to SROA, since there are no devices that are mapped to the stack, which
is what SROA is interested in.

The concern with the testcase I attached is really about performance.
Would it be reasonable to control the splitting in SROA via TTI?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
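[For reference, the performance concern can be pictured as a hypothetical before/after, not output taken from the thread: splitting the 4-byte volatile memset from @test6 into byte slices turns one wide volatile store into four narrow ones, and because the narrow stores are volatile, later passes cannot re-merge them, so the split is effectively final. The TTI hook mentioned above remains a proposal, not an existing interface.]

; Unsplit form: one wide volatile store covering all four bytes
; (707406378 = 0x2A2A2A2A, i.e. four bytes of 42).
define void @memset_wide(i32* %p) {
entry:
  store volatile i32 707406378, i32* %p, align 4
  ret void
}

; Split form: four narrow volatile stores, which a target preferring
; wide volatile accesses would want to avoid.
define void @memset_split(i8* %p) {
entry:
  store volatile i8 42, i8* %p, align 4
  %p1 = getelementptr i8, i8* %p, i32 1
  store volatile i8 42, i8* %p1, align 1
  %p2 = getelementptr i8, i8* %p, i32 2
  store volatile i8 42, i8* %p2, align 2
  %p3 = getelementptr i8, i8* %p, i32 3
  store volatile i8 42, i8* %p3, align 1
  ret void
}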