thr3ads.net - llvm dev - [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control. [Apr 2021]

James Y Knight via llvm-dev

2021-Apr-16 21:35 UTC

[llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

On Thu, Apr 15, 2021, 6:03 PM via llvm-dev <llvm-dev at lists.llvm.org>
wrote:
> Reid, I’m not clear why anyone would want to “power down” the
> alignment-aware optimizations?  How does that benefit anyone?  For example…
>
>
>
> Let’s postulate a target that has only non-trapping load/store
> instructions; maybe they go faster on aligned addresses, but don’t trap on
> unaligned addresses.  It has been a few decades but I think VAX worked this
> way.
>
> Would you insist we should power-down the alignment-aware optimizations
> for this target?  Just because the hardware couldn’t require aligned data?
> I hope not.
>
> The conclusion must be, then, that there is no relationship between the
> existence of trapping/non-trapping instruction behavior for a given target,
> and how the frontend and middle-end should behave.
>
>
>
> Therefore, we can’t insist on the front-end slapping “align 1” on
> everything just because the target doesn’t trap a non-aligned load.

Certainly, it's entirely valid for a target to not trap on an unaligned
load. We have many such targets. A target trapping on misaligned loads
isn't a required feature. (If users want to reliably diagnose misalignment
bugs, -fsanitize=alignment is the way to do so.)

> Therefore, the choice of trapping/non-trapping instruction behavior in the
> X86 target specifically, has no necessary relationship to how alignment is
> thought of in the front-end/middle-end.

If the proposal here had been: "We should switch X86 from using movaps
(alignment-checking) to movups (non-alignment-checking), because movups has
a smaller encoding size (or is faster to execute on new microarchitectures,
or ...)", there'd be no problem.

But, that is *not* what's being proposed here. This proposal is to switch
to movups as a workaround for software that has undefined behavior due to
misaligned objects. That is misguided, because the proposed change does not
fix such code! That the movaps instruction traps in such programs is like a
proverbial "canary in a coal mine". It's a result of your program
containing alignment-related UB. Removing the canary prevents you from
having a dead canary, but it doesn't prevent the mine from exploding.

I have the feeling folks aren't understanding what exactly I'm talking
about w.r.t. alignment-related breakage. There are at least three things LLVM
can do with alignment information today.
1. Most obviously, it allows generation of hardware load instructions that
require a certain alignment (MOVAPS on X86, LDM on ARM, etc.).
2. It enables known-bits analysis on pointers: "ptr & 0x3" is optimized to
0 if ptr is known to have alignment >= 4. Example: `int foo(int& x) {
return ((uintptr_t)&x) & 0x3; }`
3. It can assist with alias analysis: if both addr1 and addr2 have align 8,
then a 4-byte load from (addr1 + 0) cannot possibly alias a 4-byte load
from (addr2 + 4). This is true even without TBAA, and even if we know nothing
else about the relationship between addr1 and addr2. (I don't have an
example of this -- it looks like LLVM may not be doing as good a job here
as it could, but I definitely recall reading code which purported to
implement this.)
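
Points 2 and 3 can be made concrete with a small sketch. The helper names below (low_two_bits, ranges_overlap, aligned8_case_can_alias) are hypothetical and are not LLVM APIs; the code only spells out the arithmetic an optimizer is entitled to rely on once it has been told a pointer is aligned, assuming a typical target where alignof(int) is 4.

```cpp
#include <cassert>
#include <cstdint>

// Point 2: for a properly aligned int the low two address bits are zero,
// so a compiler may fold this whole function to "return 0".
// If the reference was bound to a misaligned object, that is already UB,
// and the folded result would no longer match the real address.
inline std::uintptr_t low_two_bits(const int& x) {
    return reinterpret_cast<std::uintptr_t>(&x) & 0x3;
}

// Half-open byte ranges [a, a+n) and [b, b+m) overlap iff each one starts
// before the other ends.
constexpr bool ranges_overlap(std::uint64_t a, std::uint64_t n,
                              std::uint64_t b, std::uint64_t m) {
    return a < b + m && b < a + n;
}

// Point 3: exhaustively check small 8-byte-aligned addresses. A 4-byte
// access at addr1 + 0 never overlaps a 4-byte access at addr2 + 4.
// Returns true only if a counterexample is found.
inline bool aligned8_case_can_alias(std::uint64_t limit) {
    for (std::uint64_t addr1 = 0; addr1 < limit; addr1 += 8)
        for (std::uint64_t addr2 = 0; addr2 < limit; addr2 += 8)
            if (ranges_overlap(addr1, 4, addr2 + 4, 4))
                return true;
    return false;
}
```

The no-alias conclusion depends entirely on the alignment claim being true: with addr1 = 1 (not 8-byte aligned) and addr2 = 0, the ranges [1, 5) and [4, 8) do overlap. Swapping the trapping instruction for a non-trapping one leaves both assumptions in force.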

The initial proposal only addresses the first issue, leaving users who
depend on this in an extremely precarious position -- liable to be
broken by any future optimization improvement.


> *From:* Craig Topper <craig.topper at gmail.com>
> *Sent:* Thursday, April 15, 2021 4:51 PM
> *To:* Reid Kleckner <rnk at google.com>
> *Cc:* Robinson, Paul <paul.robinson at sony.com>; Luo, Yuanke <
> yuanke.luo at intel.com>; Liu, Chen3 <chen3.liu at intel.com>;
> Maslov, Sergey V <sergey.v.maslov at intel.com>; llvm-dev
> <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx
> machine with option control.
>
>
>
> What if we didn't use aligned instructions by default like what PS4 did.
> And then had a command line option that would "enable alignment exceptions"
> if someone wants them. Maybe that option should also disable memory folding
> since memory folding never checks alignment with AVX? Do other targets that
> have vectors have alignment exceptions like this? We're not obligated to
> emit code that detects alignment errors. And we already don't if the load
> gets folded. It seems the problem with the current proposal is that once
> you have the exception, setting a flag to make it go away is the wrong
> response.
>
>
>
> ~Craig
>
>
>
>
>
> On Thu, Apr 15, 2021 at 1:10 PM Reid Kleckner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Right, I get that this doesn't match what you are doing for PS4, and it
> doesn't match what Chen3 Liu proposed. To James's point, the
> -fmax-type-align flag is more principled because it powers down all the
> other LLVM optimizations that assume aligned pointers have zeros in the low
> bits.
>
>
>
> As for how to handle explicit alignment attributes that don't come from
> type information, maybe we could revisit that behavior, or conditionalize
> it with a flag. I just mean to say that there is prior art for this
> direction. We should continue in the direction of a complete solution from
> the frontend, rather than adding a workaround in the backend.
>
>
>
> On Thu, Apr 15, 2021 at 11:54 AM <paul.robinson at sony.com> wrote:
>
> | This sounds like the -fmax-type-align flag:
>
>
>
> Well, no, at least not for the PS4 case.  In our case, the type had an
> alignment attribute but the caller didn’t make sure the allocated memory
> was aligned properly.  The -fmax-type-align flag explicitly doesn’t do
> anything in that case, if I’m reading it correctly.  (Yes, it’s a bug.
> Yes, sanitizers or other testing could have found it.  No, there is no
> opportunity to do any of the things that would have fixed it correctly.)
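
The situation described here, a type that carries an alignment attribute while the memory handed to it is not actually aligned, can be sketched as follows. Vec8, is_aligned_for and make_probe are hypothetical names used only for illustration; the check inspects addresses and never creates an object at the misaligned location, so the sketch itself stays free of UB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical over-aligned type, standing in for a type that carries an
// explicit alignment attribute.
struct alignas(32) Vec8 {
    float lanes[8];
};

// True if p is suitably aligned to hold a T. Only the address is
// inspected; no T* is formed and the memory is never touched.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Two addresses inside one static buffer: the start, which is aligned for
// Vec8, and the start plus 4 bytes, which is not.
struct Probe {
    const void* good;
    const void* bad;
};

inline Probe make_probe() {
    alignas(Vec8) static unsigned char storage[2 * sizeof(Vec8)];
    return Probe{storage, storage + 4};
}
```

A caller that places a Vec8 at an address like the second one has the bug described above: the compiler still believes every Vec8 is 32-byte aligned, regardless of which move instruction the backend later selects.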
>
>
>
> Really what we did was effectively this:  Pretend X86 doesn’t have a
> VMOVAPS opcode.  That’s all.  Nothing about memory/operand alignment
> attributes was modified, IR is unchanged.  Pretend that one machine opcode
> is missing.  Can’t possibly affect anything about IR optimizations,
> *maybe* something post-ISel would be different but even that is hard to
> imagine.  (As best I can remember, the only test updates we had to make
> were to change things like “vmovaps” to “vmov{{u|a}}ps” and done.)  It’s
> like we did s/movaps/movups/g on the assembly output.
>
>
>
> I still can’t say I think it should be appropriate to do upstream -- no real
> info yet on Intel’s problem case -- but I hope this explains why the bigger
> hammer (i.e., get Clang involved) doesn’t seem necessary or appropriate.
>
> --paulr
>
>
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of* Reid
> Kleckner via llvm-dev
> *Sent:* Thursday, April 15, 2021 12:59 PM
> *To:* James Y Knight <jyknight at google.com>
> *Cc:* llvm-dev at lists.llvm.org; Liu, Chen3 <chen3.liu at intel.com>; Luo,
> Yuanke <yuanke.luo at intel.com>; Maslov, Sergey V
> <sergey.v.maslov at intel.com>
> *Subject:* Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx
> machine with option control.
>
>
>
> On Wed, Apr 14, 2021 at 11:58 AM James Y Knight via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> What I suspect you *actually* want here is an option to tell Clang not to
> infer load/store alignments based on object types or alignment attributes
> -- instead treating everything as being potentially aligned to 1 unless the
> allocation is seen (e.g. global/local variables). Clang would still need to
> use the usual alignment computation for variable definitions and structure
> layout, but not memory operations. If clang emits "load ... align 1"
> instructions in LLVM IR, the right thing would then happen in the X86
> backend automatically.
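
For comparison, source code can already ask for a load that carries no alignment assumption, without any new flag. The sketch below uses the well-known memcpy idiom; load_u32_unaligned and round_trip_at_odd_offset are hypothetical helper names. Compilers such as Clang typically lower the memcpy to a single load marked align 1, though the exact IR depends on the version and optimization level.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an address with no alignment guarantee.
// memcpy from a byte pointer makes no alignment assumption, so the
// compiler cannot use alignof(std::uint32_t) to justify an aligned load.
inline std::uint32_t load_u32_unaligned(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Stores v at an odd offset in a byte buffer and reads it back through
// the unaligned loader; returns true if the value survives the round trip.
inline bool round_trip_at_odd_offset(std::uint32_t v) {
    unsigned char buf[8] = {};
    std::memcpy(buf + 1, &v, sizeof v);
    return load_u32_unaligned(buf + 1) == v;
}
```

This only helps code that can still be changed; it is shown to illustrate what "align 1" means at the source level, not as a fix for binaries that have already shipped.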
>
>
>
> This sounds like the -fmax-type-align flag:
>
> https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
>
> Explicit alignment attributes are still honored, so some aligned vector
> instructions may be generated. However, the documentation describes
> essentially this exact use case.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Wang, Pengfei via llvm-dev

2021-Apr-17 02:23 UTC

*   If the proposal here had been: "We should switch X86 from using movaps
(alignment-checking) to movups (non-alignment-checking), because movups has a
smaller encoding size (or is faster to execute on new microarchitectures, or
...)", there'd be no problem. … This proposal is to switch to movups as a
workaround for software that has undefined behavior due to misaligned objects.

I think we can consider movaps a limitation of legacy microarchitectures,
which get better performance for aligned memory loads/stores. It no longer
needs to be a feature on new microarchitectures, i.e. movups is faster to
execute on new microarchitectures when the address is aligned.

*   The initial proposal only addresses the first issue, leaving users who
depend on this in an extremely precarious position -- liable to be broken by
any future optimization improvement.

If users depend on alignment exceptions, they should explicitly use a
proposed option like “-exception-on-unalginedmem”, which would not only keep
using movaps but also block the existing memory folding. Does that make more
sense?

Thanks
Pengfei


via llvm-dev

2021-Apr-19 14:47 UTC

> If the proposal here had been: "We should switch X86 from using
> movaps (alignment-checking) to movups (non-alignment-checking),
> because movups has a smaller encoding size (or is faster to
> execute on new microarchitectures, or ...)", there'd be no problem.
>
> But, that is not what's being proposed here. This proposal is to
> switch to movups as a workaround for software that has undefined
> behavior due to misaligned objects. That is misguided, because
> the proposed change does not fix such code! That the movaps
> instruction traps in such programs is like a proverbial "canary
> in a coal mine". It's a result of your program containing
> alignment-related UB. Removing the canary prevents you from
> having a dead canary, but it doesn't prevent the mine from
> exploding.

Hi James,

It's apparent from your reply that you misunderstand one thing:
The mine has *already* exploded.

I still don't know exactly what Intel is facing, but at Sony we
have games already shipped that CANNOT BE FIXED because they are
embedded in DVD.  It is literally physically impossible to fix the
buggy software, and we have a moral contract with users that their
games will continue to run on all future releases of the console.

I understand your goal is to find and fix bugs in software that is
still under development and CAN be fixed.  I fully endorse that 
goal.  However, that is not the situation that Sony has, and likely
not what Intel has.  Your proposal will NOT solve our problem.

HTH,
--paulr
