Jakub (Kuba) Kuderski via llvm-dev
2018-Jan-31 17:36 UTC
[llvm-dev] llvm.memcpy for struct copy
Hi Ma,

> how can I transform the llvm.memcpy into data move loop IR and eliminate
> the bitcast instruction ?

I'm not sure why you are concerned about memcpy and bitcasts, but if you call MCpyInst->getSource() and MCpyInst->getDest() it will look through casts and give you the 'true' source/destination.

If you want to get rid of memcpy altogether, you can take a look at this pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .

Best,
Kuba

On Tue, Jan 30, 2018 at 3:22 AM, ma jun via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi Craig
> Thank you very much !
>
> 2018-01-30 16:11 GMT+08:00 Craig Topper <craig.topper at gmail.com>:
>
>> The pointers must always be i8* the alignment is independent and is
>> controlled by the attributes on the arguments in the call to memcpy.
>>
>> ~Craig
>>
>> On Mon, Jan 29, 2018 at 11:45 PM, ma jun <jun.parser at gmail.com> wrote:
>>
>>> Hi
>>>
>>> 2018-01-30 15:36 GMT+08:00 ma jun <jun.parser at gmail.com>:
>>>
>>>> Hi
>>>> Thanks !
>>>> so for this example
>>>> void foo(X &src, X &dst) {
>>>> dst = src;
>>>> }
>>>> and the IR:
>>>>
>>>> define void @foo(X&, X&)(%struct.X* dereferenceable(8), %struct.X* dereferenceable(8)) #0 {
>>>> %3 = alloca %struct.X*, align 8
>>>> %4 = alloca %struct.X*, align 8
>>>> store %struct.X* %0, %struct.X** %3, align 8
>>>> store %struct.X* %1, %struct.X** %4, align 8
>>>> %5 = load %struct.X*, %struct.X** %3, align 8
>>>> %6 = load %struct.X*, %struct.X** %4, align 8
>>>> %7 = bitcast %struct.X* %6 to i8*
>>>> %8 = bitcast %struct.X* %5 to i8*
>>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %7, i8* align 4 %8, i64 8, i1 false)
>>>>
>>>
>>> also since the dst and src are 4 byte align , can we use the IR below:
>>>
>>> %7 = bitcast %struct.X* %6 to i32*
>>> %8 = bitcast %struct.X* %5 to i32*
>>> call void @llvm.memcpy.p0i32.p0i32.i64(i32* align 4 %7, i32* align 4 %8, i64 8, i1 false)
>>>
>>>> ret void
>>>> }
>>>>
>>>> how can I transform the llvm.memcpy into data move loop IR and
>>>> eliminate the bitcast instruction ?
>>>>
>>>> Regards
>>>> Jun
>>>>
>>>> 2018-01-30 15:24 GMT+08:00 Craig Topper <craig.topper at gmail.com>:
>>>>
>>>>> The i8 type in the pointers doesn't matter a whole lot. There's a long
>>>>> term plan to remove the type from all pointers in llvm IR.
>>>>>
>>>>> Yes, clang will use memcpy for struct copies. You can see example IR
>>>>> here https://godbolt.org/g/8gQ18m. You'll see that the struct
>>>>> pointers are bitcasted to i8* before the call.
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Jan 29, 2018 at 11:12 PM, ma jun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi all
>>>>>> I'm new here, and I have some question about llvm.memcpy
>>>>>> intrinsic.
>>>>>> why does llvm.memcpy intrinsic only support i8* for first two
>>>>>> arguments? and does clang will also transform struct copy into llvm.memcpy
>>>>>> ? what format does IR looks like?
>>>>>> Thanks !
>>>>>>
>>>>>> Regards
>>>>>> Jun

--
Jakub Kuderski
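As a source-level illustration of the question above (replacing one opaque memcpy with a loop of typed loads and stores), here is a minimal C++ sketch. It is not the PromoteMemcpy pass and does not use the LLVM API. The thread never shows the fields of X, so the two 32-bit fields are an assumption chosen to match the 8-byte size and 4-byte alignment in the quoted IR, and the function names copy_with_memcpy and copy_with_word_loop are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout: stand-in for the thread's struct X (8 bytes, 4-byte aligned).
struct X {
    std::int32_t a;
    std::int32_t b;
};
static_assert(sizeof(X) == 8, "sketch assumes an 8-byte struct");

// Roughly what the front end emits: one opaque copy of sizeof(X) bytes.
void copy_with_memcpy(X &dst, const X &src) {
    assert(&dst != &src && "std::memcpy requires non-overlapping objects");
    std::memcpy(&dst, &src, sizeof(X));
}

// A "data move loop": the same bytes, moved one 32-bit word per iteration.
void copy_with_word_loop(X &dst, const X &src) {
    constexpr std::size_t kWords = sizeof(X) / sizeof(std::uint32_t);
    const unsigned char *s = reinterpret_cast<const unsigned char *>(&src);
    unsigned char *d = reinterpret_cast<unsigned char *>(&dst);
    for (std::size_t i = 0; i < kWords; ++i) {
        std::uint32_t word;
        // A word-sized std::memcpy is the portable way to spell a 4-byte
        // load and a 4-byte store without violating aliasing rules.
        std::memcpy(&word, s + i * sizeof(word), sizeof(word)); // load
        std::memcpy(d + i * sizeof(word), &word, sizeof(word)); // store
    }
}
```

Both functions leave dst with the same bytes as src. Which form is preferable depends on the passes that run afterwards, so treat the loop as one possible expansion, not the canonical one.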
Hi Jakub,

Thanks. I looked at the pass and saw this code:

    auto *BufferTy =
        dyn_cast<StructType>(SrcPtrTy->getPointerElementType());
    if (!BufferTy)
      return false;

Can types other than structs, such as i32 or float, also use this pass to eliminate the memcpy?

Regards,
Jun
On 31 Jan 2018, at 17:36, Jakub (Kuba) Kuderski via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> If you want to get rid of memcpy altogether, you can take a look at this pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .

There are at least four different places in LLVM where memcpy intrinsics are expanded to either sequences of instructions or calls:

- InstCombine does it for very small memcpys (with a broken heuristic).
- PromoteMemCpy does it mostly to expose other optimisation opportunities.
- SelectionDAG does it (though in a pretty terrible way, because it can’t create new basic blocks and so can’t emit small loops).
- Some back ends do it in cooperation with SelectionDAG to provide their own implementation.

Whether you want a memcpy intrinsic or a sequence of loads and stores depends a little bit on what optimisation you’re doing next: some work better treating individual fields separately, some prefer to have a blob of memory that they can treat as a single entity.

It’s also worth noting that LLVM’s handling of padding in structure fields is particularly bad. LLVM IR has two kinds of struct: packed and non-packed. The documentation doesn’t make it clear whether non-packed structs have padding at the end (and clang assumes that it doesn’t, some of the time). Non-packed structs do have padding in between fields for alignment. When lowering from C (or a language needing to support a C ABI), you sometimes end up with padding fields inserted by the front end. Optimisers have no way of distinguishing these fields from non-padding fields and so we only get rid of them if SROA extracts them and finds that they have no side-effect-free consumers. In contrast, the padding between fields in non-packed structs disappears as soon as SROA runs. This can lead to violations of C semantics, where padding fields should not change (because C defines bitwise comparisons on structs using memcmp). This can lead to subtly different behaviour in C code depending on the target ABI (we’ve seen cases where trailing padding is copied in one ABI but not in another, depending solely on pointer size).

David
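The padding concern can be made concrete with a small C++ sketch. It is only a source-level analogy for what a field-wise expansion does and makes no claim about what any particular LLVM pass emits. The struct Padded and the helpers copy_whole, copy_fields and has_padding are hypothetical names. The sketch works on raw byte buffers so that every byte, padding included, has a defined value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct; on typical ABIs padding bytes follow `tag`
// so that `value` is suitably aligned.
struct Padded {
    char tag;
    int value;
};

// Copy the whole object representation, padding included (memcpy-style).
void copy_whole(unsigned char *dst, const unsigned char *src) {
    std::memcpy(dst, src, sizeof(Padded));
}

// Copy only the bytes that belong to named fields (load/store-style).
// Any padding bytes in dst keep whatever value they had before.
void copy_fields(unsigned char *dst, const unsigned char *src) {
    std::memcpy(dst + offsetof(Padded, tag),
                src + offsetof(Padded, tag), sizeof(char));
    std::memcpy(dst + offsetof(Padded, value),
                src + offsetof(Padded, value), sizeof(int));
}

// True when the struct contains bytes not covered by any field.
constexpr bool has_padding() {
    return sizeof(Padded) > sizeof(char) + sizeof(int);
}
```

If the source buffer is filled with one byte pattern and the destination with another, memcmp reports the two buffers equal after copy_whole. After copy_fields it reports a difference whenever has_padding() is true, which is the kind of observable divergence described above.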
Hi David,

Thanks a lot, that makes it much clearer!

Regards,
Jun
On 2/1/2018 2:03 AM, David Chisnall via llvm-dev wrote:
> In contrast, the padding between fields in non-packed structs
> disappears as soon as SROA runs. This can lead to violations of C
> semantics, where padding fields should not change (because C defines
> bitwise comparisons on structs using memcmp). This can lead to subtly
> different behaviour in C code depending on the target ABI (we’ve seen
> cases where trailing padding is copied in one ABI but not in another,
> depending solely on pointer size).

The IR type of an alloca isn't supposed to affect the semantics; it's just a sizeof(type) block of bytes. We haven't always gotten this right in the past, but it should work correctly on trunk, as far as I know. If you have an IR testcase where this still doesn't work correctly, please file a bug.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project