thr3ads.net - llvm dev - [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control. [Apr 2021]

via llvm-dev

2021-Apr-19 20:42 UTC

[llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

We might still not be fully understanding one another, because this:
    "so that you can compile code with under-aligned objects, and have it
    work as the author expected it to"
sounds like you’re expecting us to recompile the client code that creates the
under-aligned objects. That is literally not possible. If you do understand
that part, great, it’s just not obvious to me from how you’re phrasing things.

I (still) don’t know what Intel is facing.  For Sony’s problem, we would be much
more likely to try to do something specific to the APIs that are being abused,
rather than something draconian like eliminating alignment requirements for
everyone.  But of course we have a solution that works for us, so there’s that
much more inertia to overcome.
--paulr

From: James Y Knight <jyknight at google.com>
Sent: Monday, April 19, 2021 2:30 PM
To: Robinson, Paul <paul.robinson at sony.com>
Cc: Luo, Yuanke <yuanke.luo at intel.com>; Roman Lebedev <lebedev.ri at gmail.com>;
Liu, Chen3 <chen3.liu at intel.com>; llvm-dev <llvm-dev at lists.llvm.org>;
Maslov, Sergey V <sergey.v.maslov at intel.com>; daniel.towner at intel.com
Subject: Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.


I understand your goal is to find and fix bugs in software that is
still under development and CAN be fixed.  I fully endorse that
goal.  However, that is not the situation that Sony has, and likely
not what Intel has.  Your proposal will NOT solve our problem.

No, that's not it at all! I'm afraid you've totally misunderstood my
concern.

My goal is that if we add a compiler feature to address this problem -- so that
you can compile code with under-aligned objects, and have it work as the author
expected it to --  that the feature reliably addresses the problem, and makes
such code no longer exhibit Undefined Behavior. The proposed backend change does
not accomplish that, but we can implement a feature which will.

As Reid said, -fmax-type-align=N appears to be almost that feature, and
something like this little patch (along with documentation update) may be all
that's needed (but this is totally untested).

diff --git clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.cpp
index b23d995683bf..3aef166a690e 100644
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -6280,8 +6280,7 @@ CharUnits CodeGenModule::getNaturalTypeAlignment(QualType T,
   // Cap to the global maximum type alignment unless the alignment
   // was somehow explicit on the type.
   if (unsigned MaxAlign = getLangOpts().MaxTypeAlign) {
-    if (Alignment.getQuantity() > MaxAlign &&
-        !getContext().isAlignmentRequired(T))
+    if (Alignment.getQuantity() > MaxAlign)
       Alignment = CharUnits::fromQuantity(MaxAlign);
   }
   return Alignment;
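
For readers unfamiliar with what a lowered type alignment looks like at source level, here is a minimal sketch using the GCC/Clang `aligned` attribute on a typedef. It is not the -fmax-type-align patch above and makes no claim about how that flag behaves. The names v4f_u and add_one are hypothetical, and the attributes are compiler extensions rather than standard C++.

```cpp
#include <cassert>

// GCC/Clang extension (assumption: one of those compilers is in use).
// A vector of four floats whose required alignment is lowered to 1 byte.
// may_alias lets the type overlay a plain float array without breaking
// strict-aliasing rules.
typedef float v4f_u __attribute__((vector_size(16), aligned(1), may_alias));

// Hypothetical example function: adds 1.0f to each of the four lanes.
// The pointee type carries the reduced alignment, so the dereference
// does not promise the compiler a 16-byte-aligned address.
void add_one(v4f_u* data) {
    const v4f_u one = {1.0f, 1.0f, 1.0f, 1.0f};
    *data = *data + one;
}
```

Because the reduced alignment is part of the pointee type, the plain `*data` syntax stays usable, and on x86 the compiler would be expected to choose an unaligned move for the access.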


Luo, Yuanke via llvm-dev

2021-Apr-20 02:30 UTC


[llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

I collected the following feedback and requirements from an Intel customer.

Our software runs in an embedded environment and is processing buffers which are
unaligned. Sometimes this misalignment is simply because the buffer allocation
is beyond the immediate control of our software, but it can also be because we
are processing blocks of data which are not multiples of the vector size (e.g.,
6, 12 or 24). We can’t just fix our buffers to make them aligned. Our code is
complicated and we support multiple instruction sets operating using the same
algorithms by using templated code. For example:

template<typename DVEC_TYPE>
void doSomething(DVEC_TYPE* data)
{
  // Trivial example – reality would be something much more substantial,
  // possibly with loops or other function calls.
  *data += 1.0f;
}

Note that we use dvec to help us abstract the ISA, but other similar header-only
vector overloading libraries also exist.

We would then instantiate our function above multiple times for each ISA or data
type we care about:

// Scalar type useful for debugging algorithm and doing basic testing
template void doSomething<float>(float* data);
// Different AVX widths
template void doSomething<F32vec8>(F32vec8* data);
template void doSomething<F32vec16>(F32vec16* data);
// Different element type
template void doSomething<I32vec16>(I32vec16* data);

The functions are sufficiently large that we don’t want to have to write a
different version for each ISA. We know that the incoming data may be
mis-aligned and that accessing it directly is UB, so we could modify our code to
explicitly handle misalignment. Something like:

template<typename DVEC_TYPE>
void doSomething(DVEC_TYPE* data)
{
  DVEC_TYPE t;
  loadu(t, data);
  t += 1.0f;
  storeu(data, t);
}

The code has become more verbose and less readable (maintainable, debuggable,
etc.), and it no longer works with plain scalar types, which don’t have
loadu/storeu defined, unless we start defining overloaded helper functions.
Also, if 'data' pointed at an array, we’d have to throw some pointer arithmetic
into the mix, rather than just using plain 'data[IDX]' syntax. We can certainly
write code which copes with the misalignment explicitly, but it just ends up
becoming messy. Or we could leverage the hardware to manage this misalignment
for us, by letting the compiler emit the movups instruction instead of movaps.
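
The "overloaded helper functions" route mentioned above can be sketched in portable C++ with std::memcpy, so that one template serves both scalar and vector instantiations. This is only an illustration under stated assumptions: load_unaligned, store_unaligned and Vec4 are hypothetical names, Vec4 is a stand-in for a dvec class such as F32vec8, and the function takes void* instead of DVEC_TYPE* so that no under-aligned typed pointer is ever formed.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: read a T from an address that may not satisfy alignof(T).
template <typename T>
T load_unaligned(const void* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}

// Hypothetical helper: write a T to an address that may not satisfy alignof(T).
template <typename T>
void store_unaligned(void* dst, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    std::memcpy(dst, &value, sizeof(T));
}

// Stand-in for a vector class (assumption): four floats, 16-byte aligned.
struct alignas(16) Vec4 {
    float lane[4];
    Vec4& operator+=(float x) {
        for (float& l : lane) l += x;
        return *this;
    }
};

// Same body for scalar and vector types; data may be under-aligned.
template <typename T>
void doSomething(void* data) {
    T t = load_unaligned<T>(data);
    t += 1.0f;
    store_unaligned(data, t);
}
```

Compilers commonly lower a fixed-size memcpy like this to a single unaligned load or store, though that is an optimization rather than a guarantee. The sketch also shows the cost described above: every access goes through a helper instead of plain 'data[IDX]' syntax.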

Until now we have only been using the Intel Compiler, so we have written our
code to use ICC’s unaligned operations and hardware support to make our code
cleaner. We are looking at porting our code to LLVM, but LLVM is making this
harder than it needs to be.

Thanks
Yuanke

