Jakub (Kuba) Kuderski via llvm-dev
2018-Jan-31 17:36 UTC
[llvm-dev] llvm.memcpy for struct copy
Hi Ma,

> how can I transform the llvm.memcpy into data move loop IR and eliminate
> the bitcast instruction ?

I'm not sure why you are concerned about memcpy and bitcasts, but if you call MCpyInst->getSource() and MCpyInst->getDest() it will look through casts and give you the 'true' source/destination.

If you want to get rid of memcpy altogether, you can take a look at this pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .

Best,
Kuba

On Tue, Jan 30, 2018 at 3:22 AM, ma jun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Craig
> Thank you very much!
>
> 2018-01-30 16:11 GMT+08:00 Craig Topper <craig.topper at gmail.com>:
>
>> The pointers must always be i8*; the alignment is independent and is
>> controlled by the attributes on the arguments in the call to memcpy.
>>
>> ~Craig
>>
>> On Mon, Jan 29, 2018 at 11:45 PM, ma jun <jun.parser at gmail.com> wrote:
>>
>>> Hi
>>>
>>> 2018-01-30 15:36 GMT+08:00 ma jun <jun.parser at gmail.com>:
>>>
>>>> Hi
>>>> Thanks!
>>>> So for this example
>>>> void foo(X &src, X &dst) {
>>>>   dst = src;
>>>> }
>>>> and the IR:
>>>>
>>>> define void @foo(X&, X&)(%struct.X* dereferenceable(8), %struct.X* dereferenceable(8)) #0 {
>>>>   %3 = alloca %struct.X*, align 8
>>>>   %4 = alloca %struct.X*, align 8
>>>>   store %struct.X* %0, %struct.X** %3, align 8
>>>>   store %struct.X* %1, %struct.X** %4, align 8
>>>>   %5 = load %struct.X*, %struct.X** %3, align 8
>>>>   %6 = load %struct.X*, %struct.X** %4, align 8
>>>>   %7 = bitcast %struct.X* %6 to i8*
>>>>   %8 = bitcast %struct.X* %5 to i8*
>>>>   call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %7, i8* align 4 %8, i64 8, i1 false)
>>>>
>>>
>>> Also, since dst and src are 4-byte aligned, can we use the IR below:
>>>
>>> %7 = bitcast %struct.X* %6 to i32*
>>> %8 = bitcast %struct.X* %5 to i32*
>>> call void @llvm.memcpy.p0i32.p0i32.i64(i32* align 4 %7, i32* align 4 %8, i64 8, i1 false)
>>>
>>>>   ret void
>>>> }
>>>>
>>>> How can I transform the llvm.memcpy into data move loop IR and
>>>> eliminate the bitcast instruction?
>>>>
>>>> Regards
>>>> Jun
>>>>
>>>> 2018-01-30 15:24 GMT+08:00 Craig Topper <craig.topper at gmail.com>:
>>>>
>>>>> The i8 type in the pointers doesn't matter a whole lot. There's a long
>>>>> term plan to remove the type from all pointers in llvm IR.
>>>>>
>>>>> Yes, clang will use memcpy for struct copies. You can see example IR
>>>>> here https://godbolt.org/g/8gQ18m. You'll see that the struct
>>>>> pointers are bitcasted to i8* before the call.
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Jan 29, 2018 at 11:12 PM, ma jun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi all
>>>>>> I'm new here, and I have some questions about the llvm.memcpy
>>>>>> intrinsic.
>>>>>> Why does the llvm.memcpy intrinsic only support i8* for its first two
>>>>>> arguments? Does clang also transform struct copies into llvm.memcpy?
>>>>>> What does the IR look like?
>>>>>> Thanks!
>>>>>>
>>>>>> Regards
>>>>>> Jun
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Jakub Kuderski
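To make the "data move loop" idea from the question concrete, here is a minimal C++ sketch of what an 8-byte, 4-byte-aligned copy looks like once it is written as explicit 32-bit loads and stores instead of one memcpy over the whole struct. It is only an illustration at the source level, not what PromoteMemcpy or any LLVM pass emits. The struct layout (two 32-bit integers) and the names `copy_words` and `copy_x` are assumptions; the thread only shows that `%struct.X` is 8 bytes with 4-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the thread's %struct.X: 8 bytes, 4-byte aligned.
struct X {
  std::int32_t a;
  std::int32_t b;
};
static_assert(sizeof(X) == 8, "sketch assumes an 8-byte struct");

// Copy `bytes` bytes from src to dst in 4-byte units, mirroring a loop of
// i32 loads and i32 stores. `bytes` must be a multiple of 4.
inline void copy_words(void *dst, const void *src, std::size_t bytes) {
  assert(bytes % sizeof(std::uint32_t) == 0 && "size must be a multiple of 4");
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t off = 0; off < bytes; off += sizeof(std::uint32_t)) {
    std::uint32_t word;
    std::memcpy(&word, s + off, sizeof word);  // stands in for one 32-bit load
    std::memcpy(d + off, &word, sizeof word);  // stands in for one 32-bit store
  }
}

// Same effect as `dst = src` for X, expressed as a word-by-word copy.
inline void copy_x(const X &src, X &dst) { copy_words(&dst, &src, sizeof(X)); }
```

Calling `copy_x(a, b)` leaves `b` with the same field values as `a`. The 4-byte `std::memcpy` calls are used only to keep the sketch free of aliasing problems; compilers typically turn each one into a single load or store.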
Hi Jakub,

Thanks. I saw that the pass contains this code:

    auto *BufferTy = dyn_cast<StructType>(SrcPtrTy->getPointerElementType());
    if (!BufferTy)
      return false;

Can other types, such as i32 or float, also use this pass to eliminate memcpy?

Regards
Jun

2018-02-01 1:36 GMT+08:00 Jakub (Kuba) Kuderski <kubakuderski at gmail.com>:
> If you want to get rid of memcpy altogether, you can take a look at this
> pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .
On 31 Jan 2018, at 17:36, Jakub (Kuba) Kuderski via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> If you want to get rid of memcpy altogether, you can take a look at this pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .

There are at least four different places in LLVM where memcpy intrinsics are expanded to either sequences of instructions or calls:

- InstCombine does it for very small memcpys (with a broken heuristic).
- PromoteMemCpy does it mostly to expose other optimisation opportunities.
- SelectionDAG does it (though in a pretty terrible way, because it can’t create new basic blocks and so can’t emit small loops).
- Some back ends do it in cooperation with SelectionDAG to provide their own implementation.

Whether you want a memcpy intrinsic or a sequence of loads and stores depends a little bit on what optimisation you’re doing next: some work better treating individual fields separately, some prefer to have a blob of memory that they can treat as a single entity.

It’s also worth noting that LLVM’s handling of padding in structure fields is particularly bad. LLVM IR has two kinds of struct: packed and non-packed. The documentation doesn’t make it clear whether non-packed structs have padding at the end (and clang assumes that it doesn’t, some of the time). Non-packed structs do have padding in between fields for alignment. When lowering from C (or a language needing to support a C ABI), you sometimes end up with padding fields inserted by the front end. Optimisers have no way of distinguishing these fields from non-padding fields and so we only get rid of them if SROA extracts them and finds that they have no side-effect-free consumers. In contrast, the padding between fields in non-packed structs disappears as soon as SROA runs. This can lead to violations of C semantics, where padding fields should not change (because C defines bitwise comparisons on structs using memcmp). This can lead to subtly different behaviour in C code depending on the target ABI (we’ve seen cases where trailing padding is copied in one ABI but not in another, depending solely on pointer size).

David
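The padding point can be illustrated at the source level with a small C++ sketch. It shows two things under stated assumptions: a field-by-field copy leaves padding bytes untouched, so a later memcmp can see a difference that a whole-object copy would not produce, and the amount of trailing padding can depend on pointer size. The types `S` and `T` and all function names are hypothetical, and the copies are done on raw byte buffers so that the result does not depend on what a compiler chooses to do with padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example types; neither appears in the thread.
struct S {
  char c;  // on common ABIs followed by padding so that `i` is aligned
  int i;
};

struct T {
  void *p;
  int i;  // on common 64-bit ABIs followed by trailing padding
};

// Bytes of S that belong to no field.
constexpr std::size_t padding_in_S = sizeof(S) - sizeof(char) - sizeof(int);

// Bytes of T after its last field.
constexpr std::size_t trailing_padding_in_T =
    sizeof(T) - offsetof(T, i) - sizeof(int);

// Copy only the bytes occupied by the fields of S between two raw buffers.
inline void copy_fields(unsigned char *dst, const unsigned char *src) {
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst + offsetof(S, c), src + offsetof(S, c), sizeof(char));
  std::memcpy(dst + offsetof(S, i), src + offsetof(S, i), sizeof(int));
}

// True if a field-by-field copy makes dst byte-identical to src.
inline bool fieldwise_copy_is_bitwise_equal() {
  unsigned char src[sizeof(S)], dst[sizeof(S)];
  std::memset(src, 0xAA, sizeof src);  // source bytes, padding included
  std::memset(dst, 0x55, sizeof dst);  // different bytes in the destination
  copy_fields(dst, src);
  return std::memcmp(dst, src, sizeof(S)) == 0;
}

// True if copying all sizeof(S) bytes makes dst byte-identical to src.
inline bool whole_copy_is_bitwise_equal() {
  unsigned char src[sizeof(S)], dst[sizeof(S)];
  std::memset(src, 0xAA, sizeof src);
  std::memset(dst, 0x55, sizeof dst);
  std::memcpy(dst, src, sizeof(S));
  return std::memcmp(dst, src, sizeof(S)) == 0;
}
```

The whole-object copy always compares equal, while the field-by-field copy compares equal only when `padding_in_S` is zero. `trailing_padding_in_T` is typically 4 where pointers are 8 bytes and 0 where they are 4 bytes, which is the kind of ABI dependence described above.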
Hi David,

Thanks a lot, that makes it much clearer!

Regards
Jun

2018-02-01 18:03 GMT+08:00 David Chisnall <David.Chisnall at cl.cam.ac.uk>:
> Whether you want a memcpy intrinsic or a sequence of loads and stores
> depends a little bit on what optimisation you’re doing next - some work
> better treating individual fields separately, some prefer to have a blob of
> memory that they can treat as a single entity.
On 2/1/2018 2:03 AM, David Chisnall via llvm-dev wrote:
> In contrast, the padding between fields in non-packed structs
> disappears as soon as SROA runs. This can lead to violations of C
> semantics, where padding fields should not change (because C defines
> bitwise comparisons on structs using memcmp). This can lead to subtly
> different behaviour in C code depending on the target ABI (we’ve seen
> cases where trailing padding is copied in one ABI but not in another,
> depending solely on pointer size).

The IR type of an alloca isn't supposed to affect the semantics; it's just a sizeof(type) block of bytes. We haven't always gotten this right in the past, but it should work correctly on trunk, as far as I know. If you have an IR testcase where this still doesn't work correctly, please file a bug.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project