Hi Richard,

Thanks for working on this. We should probably move this discussion to llvm-dev because it is not strictly related to the patch review anymore.

The code below is not representative of general C/C++ code. Usually only domain-specific languages (such as OpenCL) contain vector instructions. The LLVM pass manager configuration (the pass manager builder) is designed for C/C++ compilers, not for DSLs. People who use LLVM for other compilation flows (such as GPU compilers or other languages) create their own optimization pipeline. I am in favor of adding the scalarizer pass so that people who build LLVM-based JITs and compilers can use it. However, I am against adding this pass by default to the pass manager builder. I understand that there are cases where scalarizing early in the pipeline is better, but I don’t think that it’s worth the added complexity. Every target has a different set of quirks, and we try very hard to avoid adding target-specific passes at the IR level. SelectionDAG is not going away soon, and the SD replacement will also have a scalarizing pass - the overall architecture is not going to change. There are always optimization phase-ordering problems in the compiler, and at the end of the day we need to come up with an optimization pipeline that works for most programs that we care about. I still think that scalarizing in SD is a reasonable solution for C/C++.

Thanks,
Nadav

On Nov 13, 2013, at 2:03 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> Nadav Rotem <nrotem at apple.com> writes:
>> I think that it is a good idea to have a scalarizer pass for people who
>> want to build llvm-based compilers, but I don’t think that this pass
>> should be a part of the default pass manager. Targets that want to
>> scalarize the code should do it as part of instruction selection (just
>> declare the types as illegal). Why do you want to control scalarization
>> from the target? IMHO scalarization is only useful in undoing
>> domain-specific input IR.
>
> The problem is that instruction selection is so late that the scalar
> operations don't get optimised very much. The only pass that runs after
> type legalisation and still understands the function at an operational
> level is DAGCombiner, which is only block-local.
>
> Take for example something like:
>
>   typedef unsigned int V4SI __attribute__ ((vector_size (16)));
>   void foo (V4SI *vec, unsigned int n, unsigned int x)
>   {
>     V4SI factor = { x, 2, 4, 8 };
>     for (unsigned i = 0; i < n; ++i)
>       vec[i] *= factor;
>   }
>
> Without the Scalarizer pass, this multiplication remains a vector
> multiplication between variables until after type legalisation.
> It is then split into four scalar multiplications between variables,
> which we select as multiplications rather than shifts. With the
> Scalarizer pass, we get one multiplication and three shifts.
>
> You could argue that in this case, the target-specific CodeGen code
> should be prepared to rewrite multiplications as shifts as a result
> of later (CodeGen) constant propagation, but that isn't as easy for
> more complicated chains of operations.
>
> This wasn't a motivation, but: I believe there's a long-term plan
> to move away from SelectionDAG-based instruction selection. I was
> hoping that doing scalarisation at the IR level would help with that.
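For illustration (this sketch is not part of the original thread), here is roughly what the loop above might look like once the vector multiply has been scalarised at the IR level and the constant lanes strength-reduced, expressed back in C/C++ terms. The function name and the flat-pointer indexing are made up for the sketch; actual Scalarizer output is LLVM IR, not C.

    // Hypothetical, hand-written illustration of "one multiplication and
    // three shifts": the variable lane keeps its multiply, while the
    // constant lanes 2, 4 and 8 become shifts.  Not compiler output.
    void foo_scalarised(unsigned int *vec, unsigned int n, unsigned int x)
    {
      for (unsigned i = 0; i < n; ++i) {
        unsigned int *elt = &vec[4 * i];   // element i of the V4SI array
        elt[0] = elt[0] * x;               // variable operand: still a mul
        elt[1] = elt[1] << 1;              // * 2 -> shift
        elt[2] = elt[2] << 2;              // * 4 -> shift
        elt[3] = elt[3] << 3;              // * 8 -> shift
      }
    }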
Nadav Rotem <nrotem at apple.com> writes:
> Hi Richard,
>
> Thanks for working on this. We should probably move this discussion to
> llvm-dev because it is not strictly related to the patch review
> anymore.

OK, I removed phabricator and llvm-commits.

> The code below is not representative of general C/C++ code. Usually only
> domain-specific languages (such as OpenCL) contain vector instructions.
> The LLVM pass manager configuration (the pass manager builder) is
> designed for C/C++ compilers, not for DSLs. People who use LLVM for
> other compilation flows (such as GPU compilers or other languages)
> create their own optimization pipeline. I am in favor of adding the
> scalarizer pass so that people who build LLVM-based JITs and compilers
> can use it. However, I am against adding this pass by default to the
> pass manager builder. I understand that there are cases where
> scalarizing early in the pipeline is better, but I don’t think that it’s
> worth the added complexity. Every target has a different set of quirks,
> and we try very hard to avoid adding target-specific passes at the IR
> level. SelectionDAG is not going away soon, and the SD replacement will
> also have a scalarizing pass - the overall architecture is not going to
> change. There are always optimization phase-ordering problems in the
> compiler, and at the end of the day we need to come up with an
> optimization pipeline that works for most programs that we care about.
> I still think that scalarizing in SD is a reasonable solution for C/C++.

I don't understand the basis for the last statement though. Do you mean
that you think most cases produce better code if scalarised at the SD
stage rather than at the IR level? Could you give an example?

If the idea is to have a clean separation of concerns between the front
end and LLVM, then it seems like there are two obvious approaches:

(a) make it the front end's responsibility to only generate vector widths
    that the target can handle. There should then be no need for vector
    type legalisation (as opposed to operation legalisation).

(b) make LLVM handle vectors of all widths, which is the current
    situation.

If we stick with (b) then I think LLVM should try to handle those vectors
as efficiently as possible. The argument instead seems to be for:

(c) have code of last resort to handle vectors of all widths, but do not
    try to optimise the resulting scalar operations as much as code that
    was scalar to begin with.

If the front end is generating vector widths for which the target has no
native support, and if the front end cares about the performance of that
vector code, it should explicitly run the Scalarizer pass itself. AIUI,
it would also be the front end's responsibility to identify which targets
have support for which vector widths and which would instead use
scalarisation. That seems to be a less clean interface.

E.g. as things stand today, llvmpipe is able to do everything it needs to
do with generic IR. Porting it to a new target is a trivial change of a
few lines[*]. This seems like a good endorsement of the current
interface. But the interface would be less clean if llvmpipe (and any
other front end that cares) had to duplicate target knowledge that LLVM
already has.

[*] There are optimisations to use intrinsics for certain targets, but
    they aren't needed for correctness. Some of them might not be needed
    at all with recent versions of LLVM.

The C example I gave was deliberately small and artificial to show the
point.
But you can go quite a long way with the generic vector extensions to C
and C++, just like llvmpipe can use generic IR to do everything it needs
to do.

I think your point is that we should never run the Scalarizer pass for
clang, so it shouldn't be added by the pass manager. But regardless of
whether the example code is typical, it seems reasonable to expect
"foo * 4" to be implemented as a shift. To have it implemented as a
multiplication even at -O3 seems like a deficiency.

Even if you think it's unusual for C and C++ to have pre-vectorised code,
I think it's even more unusual for all vector input code to be cold. So
if we have vector code as input, I think we should try to optimise it as
best we can, whether it comes from C, C++, or a domain-specific front end.

As I said in the phabricator comments, the vectorisation passes convert
scalar code to vector code based on target-specific knowledge. I don't
see why it's a bad thing to also convert vector code to scalar code based
on target-specific knowledge.

Thanks,
Richard
Hi Richard,

Thanks for working on this. Comments below.

> I don't understand the basis for the last statement though. Do you mean
> that you think most cases produce better code if scalarised at the SD
> stage rather than at the IR level? Could you give an example?

You presented an example that shows that scalarizing vectors allows
further optimizations. But I don’t think that this example represents the
kind of problems that we run into in general C++ code. We currently
consider vector legalization a codegen problem. LLVM is designed this way
to handle certain kinds of programs. Other users of LLVM (such as OpenCL
JITs) do scalarize early in the optimization pipeline, because their
problem domain presents lots of vectors that need to be legalized. I am
very supportive of adding the new scalarization pass, but I don’t want
you to add it to the PassManagerBuilder, because the PMB is designed for
static C compilers that don’t have this problem. Are you interested in
improving code generation for C++ programs or for programs from another
domain?

Thanks,
Nadav
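To make the distinction discussed above concrete, here is a minimal sketch (not from the original thread) of the kind of opt-in both sides describe: a front end or JIT whose input is vector-heavy schedules the Scalarizer early in its own pipeline instead of relying on the PassManagerBuilder. It assumes only that the pass is exposed as createScalarizerPass(), as in the patch under review; the function name, the companion passes, and the use of the legacy pass manager of that era are illustrative choices, not anything prescribed in the thread.

    // Hypothetical pipeline for a non-C front end (e.g. an OpenCL-style JIT).
    #include "llvm/IR/Module.h"
    #include "llvm/PassManager.h"
    #include "llvm/Transforms/Scalar.h"

    using namespace llvm;

    static void runCustomPipeline(Module &M) {
      FunctionPassManager FPM(&M);
      // Scalarise vector operations early so the usual scalar optimisations
      // (instcombine, early CSE, ...) get to see and clean up the results.
      FPM.add(createScalarizerPass());
      FPM.add(createInstructionCombiningPass());
      FPM.add(createEarlyCSEPass());

      FPM.doInitialization();
      for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
        FPM.run(*F);
      FPM.doFinalization();
    }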