thr3ads.net - llvm dev - [LLVMdev] [PATCH] Add a Scalarize pass [Nov 2013]


Richard Sandiford

2013-Nov-15 11:26 UTC

[LLVMdev] [PATCH] Add a Scalarize pass

Nadav Rotem <nrotem at apple.com> writes:
> On Nov 14, 2013, at 2:32 PM, Richard Sandiford
> <rsandifo at linux.vnet.ibm.com> wrote:
>> Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes:
>>> Are you worried that adding it to PMB will increase compile time?
>>> The pass exits very early for any target that doesn't opt in to doing
>>> scalarisation at the IR level, without even looking at the function.
>> 
>> As an alternative, adding Scalarizer and InstCombine passes to
>> SystemZPassConfig::addIRPasses() would probably give me most of the
>> benefit without affecting the PMB.  Scalarizer itself would then not
>> test TargetTransformInfo at all, at least in the initial version,
>> and the scalarisation would still logically be done by codegen.
>> Would that be OK?
>
> I actually prefer that the Scalarizer would not touch TTI at all because
> I view scalarization as a canonicalization phase for DSLs, much like SROA
> breaks structs.

That's what Pekka is thinking of using it for, but it wasn't the reason
I wrote it.  The original motivation was llvmpipe, which is a rasteriser
rather than a DSL compiler.  The motivation wasn't to canonicalise,
it was to do the same thing that codegen currently does, but in a better
place from an optimisation perspective.

You said in an earlier message:

  Other users of LLVM (such as OpenCL JITs) do scalarize early in the
  optimization pipeline because the problem-domain presents lots of
  vectors that need to be legalized.

But:

(a) Scalarising and revectorising only makes sense if the vectorisation
    is done with the target in mind.  If going from scalar code to vector
    code can depend on the target, why shouldn't the same be true in the
    other direction, for targets without vector support?

(b) The situation you describe isn't the one that applies to llvmpipe.
    In llvmpipe the vectors are nice, known widths that are under the
    driver's own control.  We certainly don't want to scalarise and
    revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX.
    The original code is already well vectorised for those targets.
    (And also for ARM NEON I expect.)

    In the llvmpipe case, codegen's type legaliser already makes a good
    decision about what to scalarise and what not to scalarise, without
    any help from llvmpipe.  The problem I'm trying to solve is that
    codegen is too late to get the benefit of other IR optimisations.

    So in my case I do not want to _change_ the decision about which
    vectors get scalarised and how.  I just want to do it earlier.
    It would be a shame if that meant that llvmpipe had to duplicate
    exactly the decisions that codegen makes wrt scalarisation,
    since codegen can easily make those decisions available through
    TargetTransformInfo.

That's why I thought using TTI in the Scalarizer was a good thing
in principle, at least as an option.

SystemZ is a simple case because there is no vector support.  But take MIPS
(which is often a good example when it comes to complicated possibilities :-)).
It has at least four separate vector extensions:

  - <2 x float> support from the MIPS V floating-point extensions,
    carried over to MIPS 32/64.

  - <8 x i8> and <4 x i16> support from the optional MDMX extension,
    now deprecated but used on older chips like the SB-1 and (in a
    modified form) the VR5400.

  - Processor-specific vector extensions for the Loongson range.

  - The new MSA ASE.

That's a lot of possibilities.  Maybe the LLVM port will never support
Loongson and MDMX (almost certain for the latter), but the point is that
even if it did support them, the current codegen interface would make the
right decisions about which of the llvmpipe vectors should be scalarised
and how.

If Scalarizer is an all-or-nothing pass then it cannot make as good a
decision for llvmpipe IR, where we don't expect to revectorise the result.
Obviously the current pass is all-or-nothing anyway, but I tried to
structure it so that it would be easy to make per-type decisions in
the future, based on the TargetTransformInfo.

I realise I'm not going to convince you, and I'm going to make the
change anyway.  I still think it's the wrong direction though.

Thanks,
Richard

Nadav Rotem

2013-Nov-15 17:13 UTC

Hi Richard, 

The discussion on llvmpipe is irrelevant.  llvmpipe has its own pass manager and
optimization pipeline; it is not a C compiler.

Nadav 

Richard Sandiford

2013-Nov-15 17:18 UTC

Nadav Rotem <nrotem at apple.com> writes:
> The discussion on llvmpipe is irrelevant.  llvmpipe has its own pass
> manager and optimization pipe, it is not a C compiler.

Note that this reply was about whether TargetTransformInfo should be
used in Scalarizer, not whether Scalarizer should be in PMB.  I was
trying to explain why I thought that not testing TargetTransformInfo in
Scalarizer would make the pass less useful for llvmpipe's optimisation pipe.

Thanks,
Richard
