Keno Fischer via llvm-dev
2017-May-16 17:37 UTC
[llvm-dev] Which pass should be propagating memory copies
Consider the following IR example:

define void @simple([4 x double] *%ptr, i64 %idx) {
  %stack = alloca [4 x double]
  %ptri8 = bitcast [4 x double] *%ptr to i8*
  %stacki8 = bitcast [4 x double] *%stack to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8 *%stacki8, i8 *%ptri8, i32 32, i32 0, i1 0)
  %dataptr = getelementptr inbounds [4 x double], [4 x double] *%stack, i32 0, i64 %idx
  store double 0.0, double *%dataptr
  call void @llvm.memcpy.p0i8.p0i8.i32(i8 *%ptri8, i8 *%stacki8, i32 32, i32 0, i1 0)
  ret void
}

I would like to see this optimized to just a single store (into %ptr). Right now, even at -O3, that doesn't happen. My frontend guarantees that idx is always in bounds for the allocation, but I do think the transformation should be valid regardless, because accessing beyond the bounds of the alloca should be undefined behavior.

Now, my question is: which pass should be responsible for doing this? SROA? DSE? GVN? A new pass just for this kind of thing? Maybe some pass already does this and is just not in the default pipeline? Any hints would be much appreciated.

Thanks,
Keno
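For readers who prefer source-level code, the requested rewrite can be sketched in C++. This is only an illustration of the intended equivalence, assuming the store is meant to hit the stack copy (copy in, modify the copy, copy back). The names `Vec4`, `simple_roundtrip`, and `simple_direct` are made up for this sketch and do not come from the thread or from LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

using Vec4 = std::array<double, 4>;  // stands in for [4 x double]

// Round-trip form: copy to a stack temporary, modify it, copy it back.
void simple_roundtrip(Vec4 &ptr, std::size_t idx) {
  assert(idx < ptr.size());  // models the frontend's in-bounds guarantee
  Vec4 stack;
  std::memcpy(stack.data(), ptr.data(), sizeof(Vec4));
  stack[idx] = 0.0;
  std::memcpy(ptr.data(), stack.data(), sizeof(Vec4));
}

// Desired form: a single store through the original pointer.
void simple_direct(Vec4 &ptr, std::size_t idx) {
  assert(idx < ptr.size());
  ptr[idx] = 0.0;
}
```

For every in-bounds index, both functions leave the array in the same state, which is why the round trip through the temporary is redundant.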
Daniel Neilson via llvm-dev
2017-May-16 17:50 UTC
[llvm-dev] Which pass should be propagating memory copies
The InstCombine transform does exactly what you want. Take a look at InstCombiner::SimplifyMemTransfer in lib/Transforms/InstCombine/InstCombineCalls.cpp.

With the align parameter on your memcpy being zero, you are likely hitting the first conditional in that function:

  if (CopyAlign < MinAlign) {
    MI->setAlignment(ConstantInt::get(MI->getAlignmentType(), MinAlign, false));
    return MI;
  }

Arguably, instcombine shouldn't bail on trying to simplify the memcpy just because it could update the alignment on the call...

-Daniel
Keno Fischer via llvm-dev
2017-May-16 17:56 UTC
[llvm-dev] Which pass should be propagating memory copies
Hi Daniel,

As far as I can tell, that handles turning small memcpys into store instructions. What I'm looking for is something that can simplify (copy to stack) -> (modify stack) -> (copy back to heap) into a single heap modification.

Keno
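To make the distinction concrete, here is a rough C++ picture of the local rewrite Keno attributes to InstCombine: one small, fixed-size memcpy becomes a single load followed by a single store. The function names are hypothetical and this is not LLVM code. As far as I recall, the 2017-era transform only applied to power-of-two sizes of at most 8 bytes, so treat that detail as an assumption; the 32-byte copies in the example above would not qualify either way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A small fixed-size copy expressed as a memcpy call.
void copy8_memcpy(std::uint64_t *dst, const std::uint64_t *src) {
  assert(dst && src);
  std::memcpy(dst, src, sizeof(std::uint64_t));
}

// The same copy expressed as one 64-bit load and one 64-bit store.
void copy8_load_store(std::uint64_t *dst, const std::uint64_t *src) {
  assert(dst && src);
  std::uint64_t tmp = *src;  // load
  *dst = tmp;                // store
}
```

That rewrite looks at a single memcpy in isolation. Removing the copy-in / modify / copy-back sequence needs reasoning across all three memory operations, which is why it falls to a different kind of pass.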
Davide Italiano via llvm-dev
2017-May-16 18:16 UTC
[llvm-dev] Which pass should be propagating memory copies
On Tue, May 16, 2017 at 10:37 AM, Keno Fischer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Now, my question is which pass should be responsible
> for doing this? SROA? DSE? GVN? A new pass just to do this kind of thing?

This seems like a GVN job to me.

--
Davide

"There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare
Keno Fischer via llvm-dev
2017-May-16 19:28 UTC
[llvm-dev] Which pass should be propagating memory copies
On Tue, May 16, 2017 at 2:16 PM, Davide Italiano <davide at freebsd.org> wrote:
> This seems like a GVN job to me.

Cool. If I wanted to try to implement this in NewGVN, any hints on how to start?