[llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics. [Aug 2015]


Lang Hames via llvm-dev

2015-Aug-19 01:04 UTC

[llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hi All,

I'd like to float two changes to the llvm.memcpy / llvm.memmove intrinsics.


(1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy intrinsic.

When set to '1' (the auto-upgrade default), this argument would indicate
that the source and destination arguments may perfectly alias (otherwise
they must not alias at all - memcpy prohibits partial overlap). While the C
standard says that memcpy's arguments can't alias at all, perfect aliasing
works in practice, and clang currently relies on this behavior: it emits
llvm.memcpys for aggregate copies, despite the possibility of
self-assignment.
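Self-assignment is easy to reach from ordinary C. The simple-assignment rules (C11 6.5.16.1) allow the source and destination of an assignment to overlap exactly, but not partially, which mirrors the semantics described above. A minimal sketch; `struct Point3` and `assign_point` are illustrative names, and whether clang emits an llvm.memcpy for this particular assignment depends on the type size, target, and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* A small aggregate; assigning one of these is an aggregate copy. */
struct Point3 {
    double x, y, z;
};

/* Plain struct assignment. clang typically emits this as an llvm.memcpy of
 * sizeof(struct Point3) bytes, yet nothing stops a caller from passing the
 * same object for dst and src (exact overlap, i.e. perfect aliasing). */
void assign_point(struct Point3 *dst, const struct Point3 *src) {
    assert(dst != NULL && src != NULL);
    *dst = *src;
}
```

Calling `assign_point(&p, &p)` is valid C and must leave `p` unchanged, so whatever copy clang emits for it has to tolerate perfect aliasing.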

Going forward, llvm.memcpy calls emitted for aggregate copies would have
mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy (including
lowerings from memcpy calls) would have mayPerfectlyAlias set to '0'.
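The three overlap cases the proposed flag distinguishes can be spelled out as a small C sketch. The enum and the functions `classify_overlap` and `copy_is_legal` are hypothetical names that encode the proposed llvm.memcpy semantics (not the C library's), and addresses are passed as plain integers so the comparisons are well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How the ranges [dst, dst+n) and [src, src+n) relate. */
enum overlap_kind {
    OVERLAP_NONE,    /* disjoint: legal for any llvm.memcpy */
    OVERLAP_PERFECT, /* dst == src: legal only if mayPerfectlyAlias is 1 */
    OVERLAP_PARTIAL  /* partial overlap: never legal for llvm.memcpy */
};

enum overlap_kind classify_overlap(uintptr_t dst, uintptr_t src, size_t n) {
    assert(n > 0);
    if (dst == src)
        return OVERLAP_PERFECT;
    uintptr_t lo = dst < src ? dst : src;
    uintptr_t hi = dst < src ? src : dst;
    /* The higher range starts inside the lower one => partial overlap. */
    return (hi - lo < n) ? OVERLAP_PARTIAL : OVERLAP_NONE;
}

/* Would a copy with this overlap be allowed under the proposed flag? */
int copy_is_legal(enum overlap_kind k, int may_perfectly_alias) {
    switch (k) {
    case OVERLAP_NONE:    return 1;
    case OVERLAP_PERFECT: return may_perfectly_alias != 0;
    case OVERLAP_PARTIAL: return 0;
    }
    return 0;
}
```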

This change is motivated by poor optimization for small memcpys on targets
with strict alignment requirements. When a user writes a small, unaligned
memcpy, we may transform it into an unaligned load/store pair in instcombine
(see InstCombine::SimplifyMemTransfer), which is then broken up into an
unwieldy series of smaller loads and stores during legalization. I have a
fix for this issue which tags the pointers for unaligned load/store pairs
with noalias metadata, allowing CodeGen to produce better code during
legalization, but it's not safe to apply while clang is emitting memcpys
with pointers that may perfectly alias. If the 'mayPerfectlyAlias' flag
were introduced, I could inspect it and add the noalias tag only if
mayPerfectlyAlias is '0'.
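For context, the kind of user code described here is the usual fixed-size memcpy idiom for unaligned access. A minimal C sketch; the helper names are hypothetical, and whether instcombine actually forms a single unaligned load/store pair depends on the LLVM version and target. In both helpers one side of the copy is a local variable, so the pointers can never alias, which is the mayPerfectlyAlias = '0' case where the noalias tagging would be safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address. On strict-
 * alignment targets the resulting unaligned load may be split into
 * several smaller loads during legalization. */
uint32_t load_u32_unaligned(const unsigned char *p) {
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v); /* small, fixed-size, unaligned memcpy */
    return v;
}

/* Write a 32-bit value to a possibly misaligned address. */
void store_u32_unaligned(unsigned char *p, uint32_t v) {
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```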

Note: We could also achieve the desired effect by adding a new intrinsic
(llvm.structcpy?) with semantics that match the current llvm.memcpy ones
(i.e. perfect-aliasing or non-aliasing, but no partial), and then reclaim
llvm.memcpy for non-aliasing pointers only. I floated this idea with David
Majnemer on IRC and he suggested that adding a flag to llvm.memcpy might be
less disruptive and easier to maintain - thanks for the suggestion David!



(2) Allow different source and destination alignments on both llvm.memcpy /
llvm.memmove.

Since I'm talking about changes to llvm.memcpy anyway, a few people asked
me to float this one. Having separate alignments for the source and
destination pointers may allow us to generate better code when one of the
pointers has a higher alignment.

The auto-upgrade for this would be to set both source and destination
alignment to the original 'align' value.
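A minimal sketch of the proposed auto-upgrade, using hypothetical C record types (`old_memtransfer_align`, `new_memtransfer_align`) and a hypothetical `upgrade_align` function rather than LLVM's real classes. For the sketch only, it also assumes that an alignment is a nonzero power of two.

```c
#include <assert.h>

/* Hypothetical stand-ins for the alignment operands of a memcpy/memmove
 * call; these are NOT LLVM's actual data structures. */
struct old_memtransfer_align {
    unsigned align;     /* one alignment that must hold for both pointers */
};

struct new_memtransfer_align {
    unsigned dst_align; /* alignment guaranteed for the destination */
    unsigned src_align; /* alignment guaranteed for the source */
};

/* Auto-upgrade as described above: the single old value becomes both the
 * destination and the source alignment, so no guarantee is gained or lost. */
struct new_memtransfer_align upgrade_align(struct old_memtransfer_align old) {
    /* Assumption of this sketch: alignments are nonzero powers of two. */
    assert(old.align != 0 && (old.align & (old.align - 1)) == 0);
    struct new_memtransfer_align upgraded = { old.align, old.align };
    return upgraded;
}
```

Once the fields are separate, a later pass could raise `dst_align` alone without having to prove anything about the source.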



Any thoughts?

Cheers,
Lang.

Pete Cooper via llvm-dev

2015-Aug-19 16:35 UTC


[llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hey Lang

> On Aug 18, 2015, at 6:04 PM, Lang Hames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi All,
> 
> I'd like to float two changes to the llvm.memcpy / llvm.memmove intrinsics.
> 
> 
> (1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy intrinsic.
> 
> When set to '1' (the auto-upgrade default), this argument would indicate
> that the source and destination arguments may perfectly alias (otherwise
> they must not alias at all - memcpy prohibits partial overlap). While the C
> standard says that memcpy's arguments can't alias at all, perfect aliasing
> works in practice, and clang currently relies on this behavior: it emits
> llvm.memcpys for aggregate copies, despite the possibility of
> self-assignment.
> 
> Going forward, llvm.memcpy calls emitted for aggregate copies would have
> mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy (including
> lowerings from memcpy calls) would have mayPerfectlyAlias set to '0'.
> 
> This change is motivated by poor optimization for small memcpys on targets
> with strict alignment requirements. When a user writes a small, unaligned
> memcpy we may transform it into an unaligned load/store pair in instcombine
> (See InstCombine::SimplifyMemTransfer), which is then broken up into an
> unwieldy series of smaller loads and stores during legalization. I have a
> fix for this issue which tags the pointers for unaligned load/store pairs
> with noalias metadata allowing CodeGen to produce better code during
> legalization, but it's not safe to apply while clang is emitting memcpys
> with pointers that may perfectly alias. If the 'mayPerfectlyAlias' flag
> were introduced, I could inspect that and add the noalias tag only if
> mayPerfectlyAlias is '0'.
> 
> Note: We could also achieve the desired effect by adding a new intrinsic
> (llvm.structcpy?) with semantics that match the current llvm.memcpy ones
> (i.e. perfect-aliasing or non-aliasing, but no partial), and then reclaim
> llvm.memcpy for non-aliasing pointers only. I floated this idea with David
> Majnemer on IRC and he suggested that adding a flag to llvm.memcpy might be
> less disruptive and easier to maintain - thanks for the suggestion David!
> 
> 
> 
> (2) Allow different source and destination alignments on both llvm.memcpy /
> llvm.memmove.
> 
> Since I'm talking about changes to llvm.memcpy anyway, a few people asked
> me to float this one. Having separate alignments for the source and
> destination pointers may allow us to generate better code when one of the
> pointers has a higher alignment.
> 
> The auto-upgrade for this would be to set both source and destination
> alignment to the original 'align' value.

FWIW, I have a patch for this lying around.  I can dig it up.  I use alignment
attributes to do it as there’s no need for alignment to be its own argument any
more.

Cheers,
Pete
>
> 
> 
> Any thoughts?
> 
> Cheers,
> Lang.
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Philip Reames via llvm-dev

2015-Aug-19 17:14 UTC


[llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

On 08/19/2015 09:35 AM, Pete Cooper via llvm-dev wrote:
> Hey Lang
>> On Aug 18, 2015, at 6:04 PM, Lang Hames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi All,
>>
>> I'd like to float two changes to the llvm.memcpy / llvm.memmove intrinsics.
>>
>>
>> (1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy intrinsic.
>>
>> When set to '1' (the auto-upgrade default), this argument would indicate
>> that the source and destination arguments may perfectly alias (otherwise
>> they must not alias at all - memcpy prohibits partial overlap). While the C
>> standard says that memcpy's arguments can't alias at all, perfect aliasing
>> works in practice, and clang currently relies on this behavior: it emits
>> llvm.memcpys for aggregate copies, despite the possibility of
>> self-assignment.
>>
>> Going forward, llvm.memcpy calls emitted for aggregate copies would have
>> mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy (including
>> lowerings from memcpy calls) would have mayPerfectlyAlias set to '0'.
>>
>> This change is motivated by poor optimization for small memcpys on targets
>> with strict alignment requirements. When a user writes a small, unaligned
>> memcpy we may transform it into an unaligned load/store pair in instcombine
>> (See InstCombine::SimplifyMemTransfer), which is then broken up into an
>> unwieldy series of smaller loads and stores during legalization. I have a
>> fix for this issue which tags the pointers for unaligned load/store pairs
>> with noalias metadata allowing CodeGen to produce better code during
>> legalization, but it's not safe to apply while clang is emitting memcpys
>> with pointers that may perfectly alias. If the 'mayPerfectlyAlias' flag
>> were introduced, I could inspect that and add the noalias tag only if
>> mayPerfectlyAlias is '0'.
>>
>> Note: We could also achieve the desired effect by adding a new intrinsic
>> (llvm.structcpy?) with semantics that match the current llvm.memcpy ones
>> (i.e. perfect-aliasing or non-aliasing, but no partial), and then reclaim
>> llvm.memcpy for non-aliasing pointers only. I floated this idea with David
>> Majnemer on IRC and he suggested that adding a flag to llvm.memcpy might be
>> less disruptive and easier to maintain - thanks for the suggestion David!

Given there's a semantically conservative interpretation and a more
optimistic one, this really sounds like a case for metadata, not another
argument to the function.  Our memcpy could keep its current semantics,
and we could add a piece of metadata which says none of the arguments to
the call alias.

Actually, can't we already get this interpretation by marking both
argument pointers as noalias?  Doesn't that require that they don't
overlap at all?  I think we just need the ability to specify noalias at
the call site for each argument.  I don't know if that's been tried, but
it should work in theory.  There are some issues with control dependence
of call-site attributes, though, that we'd need to watch out for/fix.
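The "both pointers noalias means no overlap at all" reading can be illustrated at the source level with C99 `restrict`, which clang lowers to the `noalias` parameter attribute. This is only an analogy: the suggestion above concerns noalias on call-site arguments, which C cannot express, and `copy_bytes_noalias` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Both parameters are restrict-qualified (noalias in LLVM IR). Because
 * every byte modified through dst must not also be accessed through src,
 * calling this with dst == src and n > 0 is undefined behavior: even
 * perfect aliasing is ruled out, not just partial overlap. */
void copy_bytes_noalias(unsigned char *restrict dst,
                        const unsigned char *restrict src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

That is stricter than mayPerfectlyAlias = '1', which is why the aggregate-copy memcpys clang emits could not carry noalias on both arguments.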
>>
>>
>> (2) Allow different source and destination alignments on both llvm.memcpy /
>> llvm.memmove.
>>
>> Since I'm talking about changes to llvm.memcpy anyway, a few people asked
>> me to float this one. Having separate alignments for the source and
>> destination pointers may allow us to generate better code when one of the
>> pointers has a higher alignment.
>>
>> The auto-upgrade for this would be to set both source and destination
alignment to the original 'align' value.
> FWIW, I have a patch for this lying around.  I can dig it up.  I use
> alignment attributes to do it as there’s no need for alignment to be its own
> argument any more.

This would be a nice cleanup in general.  +1

>
> Cheers,
> Pete
>>
>>
>> Any thoughts?
>>
>> Cheers,
>> Lang.
>>

Hal Finkel via llvm-dev

2015-Aug-19 18:47 UTC


[llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

----- Original Message -----
> From: "Lang Hames" <lhames at gmail.com>
> To: "LLVM Developers Mailing List" <llvm-dev at lists.llvm.org>
> Cc: "Chandler Carruth" <chandlerc at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>,
> "David Majnemer" <david.majnemer at gmail.com>, "John McCall" <rjmccall at apple.com>,
> "Jim Grosbach" <grosbach at apple.com>
> Sent: Tuesday, August 18, 2015 8:04:48 PM
> Subject: [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.
> 
> 
> Hi All,
> 
> 
> I'd like to float two changes to the llvm.memcpy / llvm.memmove
> intrinsics.
> 
> 
> 
> 
> (1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy
> intrinsic.
> 
> 
> When set to '1' (the auto-upgrade default), this argument would
> indicate that the source and destination arguments may perfectly
> alias (otherwise they must not alias at all - memcpy prohibits
> partial overlap). While the C standard says that memcpy's arguments
> can't alias at all, perfect aliasing works in practice, and clang
> currently relies on this behavior: it emits llvm.memcpys for
> aggregate copies, despite the possibility of self-assignment.
> 
> 
> Going forward, llvm.memcpy calls emitted for aggregate copies would
> have mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy
> (including lowerings from memcpy calls) would have mayPerfectlyAlias
> set to '0'.
> 
> 
> This change is motivated by poor optimization for small memcpys on
> targets with strict alignment requirements. When a user writes a
> small, unaligned memcpy we may transform it into an unaligned
> load/store pair in instcombine (See
> InstCombine::SimplifyMemTransfer), which is then broken up into an
> unwieldy series of smaller loads and stores during legalization. I
> have a fix for this issue which tags the pointers for unaligned
> load/store pairs with noalias metadata allowing CodeGen to produce
> better code during legalization, but it's not safe to apply while
> clang is emitting memcpys with pointers that may perfectly alias. If
> the 'mayPerfectlyAlias' flag were introduced, I could inspect that
> and add the noalias tag only if mayPerfectlyAlias is '0'.
> 
> 
> 
> Note: We could also achieve the desired effect by adding a new
> intrinsic (llvm.structcpy?) with semantics that match the current
> llvm.memcpy ones (i.e. perfect-aliasing or non-aliasing, but no
> partial), and then reclaim llvm.memcpy for non-aliasing pointers
> only. I floated this idea with David Majnemer on IRC and he
> suggested that adding a flag to llvm.memcpy might be less disruptive
> and easier to maintain - thanks for the suggestion David!
> 
> 
> (2) Allow different source and destination alignments on both
> llvm.memcpy / llvm.memmove.
> 
> 
> Since I'm talking about changes to llvm.memcpy anyway, a few people
> asked me to float this one. Having separate alignments for the
> source and destination pointers may allow us to generate better code
> when one of the pointers has a higher alignment.
> 
> The auto-upgrade for this would be to set both source and destination
> alignment to the original 'align' value.
As one of the people who asked for this, let me add: We currently have code
which upgrades the alignment on memcpy intrinsics (because of alignment
attributes, assumptions, etc.), and this is useful for making memcpy expand into
vector instructions when the source/destination are suitably aligned. It would
be useful for this to happen on some targets even if only the source or
destination could be upgraded (aligned stores but underaligned loads might still
be a win, for example). Currently we can't do this because we can only
represent a single alignment. Because we aggressively form memcpy as part of
idiom recognition, and emit them in frontends, this comes up more than it would
from source-level memcpy calls alone.
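The loss Hal describes can be made concrete with two proven alignments: suppose, hypothetically, the destination is known to be 16-byte aligned (say, from an alignment attribute) while the source is only known to be 4-byte aligned. A minimal C sketch; `single_align_for` and `can_use_aligned_vector_stores` are hypothetical helpers, not LLVM APIs.

```c
#include <assert.h>

static int is_pow2(unsigned a) { return a != 0 && (a & (a - 1)) == 0; }

/* Today: one alignment operand must be valid for BOTH pointers, so the
 * most an alignment-upgrading pass can record is the smaller of the two
 * proven alignments. */
unsigned single_align_for(unsigned dst_known, unsigned src_known) {
    assert(is_pow2(dst_known) && is_pow2(src_known));
    return dst_known < src_known ? dst_known : src_known;
}

/* Hypothetical lowering check: may the expansion use aligned vector stores
 * of vec_bytes bytes? Only the destination alignment matters here. */
int can_use_aligned_vector_stores(unsigned dst_align, unsigned vec_bytes) {
    assert(is_pow2(vec_bytes));
    return dst_align >= vec_bytes;
}
```

With those numbers the single operand records 4, so `can_use_aligned_vector_stores(single_align_for(16, 4), 16)` is false; with a separate destination alignment of 16 the same check would succeed.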

Thus, I agree with John (and Lang): so long as we're fooling with the memcpy
intrinsic's signature, we should do this too.
> 
> 
> Any thoughts?
I'm strongly in favor of both pieces.

 -Hal
> 
> 
> Cheers,
> Lang.
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
