Nadav Rotem <nrotem at apple.com> writes:
> On Nov 14, 2013, at 2:32 PM, Richard Sandiford
> <rsandifo at linux.vnet.ibm.com> wrote:
>> Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes:
>>> Are you worried that adding it to PMB will increase compile time?
>>> The pass exits very early for any target that doesn't opt-in to doing
>>> scalarisation at the IR level, without even looking at the function.
>>
>> As an alternative, adding Scalarizer and InstCombine passes to
>> SystemZPassConfig::addIRPasses() would probably give me most of the
>> benefit without affecting the PMB. Scalarizer itself would then not
>> test TargetTransformInfo at all, at least in the initial version,
>> and the scalarisation would still logically be done by codegen.
>> Would that be OK?
>
> I actually prefer that the Scalarizer would not touch TTI at all because
> I view scalarization a canonicalization phase for DSLs, much like SROA
> breaks structs.

That's what Pekka is thinking of using it for, but it wasn't the reason
I wrote it. The original motivation was llvmpipe, which is a rasteriser
rather than a DSL compiler. The motivation wasn't to canonicalise,
it was to do the same thing that codegen currently does, but in a better
place from an optimisation perspective.

You said in an earlier message:

  Other users of LLVM (such as OpenCL JITs) do scalarize early in the
  optimization pipeline because the problem-domain presents lots of
  vectors that needs to be legalized.

But:

(a) Scalarising and revectorising only makes sense if the vectorisation
    is done with the target in mind. If going from scalar code to vector
    code can depend on the target, why shouldn't the same be true in the
    other direction, for targets without vector support?

(b) The situation you describe isn't the one that applies to llvmpipe.
    In llvmpipe the vectors are nice, known widths that are under the
    driver's own control. We certainly don't want to scalarise and
    revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX.
    The original code is already well vectorised for those targets.
    (And also for ARM NEON I expect.)

In the llvmpipe case, codegen's type legaliser already makes a good
decision about what to scalarise and what not to scalarise, without
any help from llvmpipe. The problem I'm trying to solve is that
codegen is too late to get the benefit of other IR optimisations.

So in my case I do not want to _change_ the decision about which
vectors get scalarised and how. I just want to do it earlier.
It would be a shame if that meant that llvmpipe had to duplicate
exactly the decisions that codegen makes wrt scalarisation,
since codegen can easily make those decisions available through
TargetTransformInfo.

That's why I thought using TTI in the Scalarizer was a good thing
in principle, at least as an option.

SystemZ is a simple case because there is no vector support. But take MIPS
(which is often a good example when it comes to complicated possibilities :-)).
It has at least four separate vector extensions:

- <2 x float> support from the MIPS V floating-point extensions,
  carried over to MIPS 32/64.

- <8 x i8> and <4 x i16> support from the optional MDMX extension,
  now deprecated but used on older chips like the SB-1 and (in a
  modified form) the VR5400.

- Processor-specific vector extensions for the Loongson range.

- The new MSA ASE.

That's a lot of possibilities. Maybe the LLVM port will never support
Loongson and MDMX (almost certain for the latter), but the point is that
even if it did support them, the current codegen interface would make the
right decisions about which of the llvmpipe vectors should be scalarised
and how.

If Scalarizer is an all-or-nothing pass then it cannot make as good a
decision for llvmpipe IR, where we don't expect to revectorise the result.
Obviously the current pass is all-or-nothing anyway, but I tried to
structure it so that it would be easy to make per-type decisions in
the future, based on the TargetTransformInfo.

I realise I'm not going to convince you, and I'm going to make the
change anyway. I still think it's the wrong direction though.

Thanks,
Richard
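Richard's per-type argument can be made concrete in a short piece of self-contained C++. This is a toy model under stated assumptions. It is not the Scalarizer and not LLVM's TargetTransformInfo API. VecType, Decision, TargetVectorInfo::decide and decideAllOrNothing are hypothetical names. The "legaliser" is deliberately simplified to three outcomes: keep a legal type, split a wider vector into the widest legal vector of the same element type that divides it evenly, or scalarise. The MIPS configuration used below hypothetically assumes that both the MIPS V <2 x float> support and the MDMX <8 x i8>/<4 x i16> support are enabled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR vector type such as <4 x float>.
struct VecType {
  std::string elt;   // element type, e.g. "float", "i8", "i16"
  unsigned numElts;  // number of lanes
};

// What a per-type policy might do with one vector type.
enum class Action { Keep, Split, Scalarise };

struct Decision {
  Action action;
  unsigned pieces;  // 1 for Keep, sub-vectors for Split, scalars for Scalarise
};

// Hypothetical stand-in for the information a target could expose
// (in LLVM terms, through TargetTransformInfo): the vector types its
// codegen handles natively.
struct TargetVectorInfo {
  std::vector<VecType> legal;

  // Simplified legaliser: keep legal types, split a wider vector into the
  // widest legal vector of the same element type that divides it evenly,
  // and scalarise everything else. Real type legalisation can also widen
  // and promote; this sketch does not model that.
  Decision decide(const VecType &T) const {
    assert(T.numElts > 0 && "a vector has at least one lane");
    unsigned best = 0;
    for (const VecType &L : legal)
      if (L.elt == T.elt && T.numElts % L.numElts == 0 && L.numElts > best)
        best = L.numElts;
    if (best == T.numElts)
      return {Action::Keep, 1};
    if (best > 1)
      return {Action::Split, T.numElts / best};
    return {Action::Scalarise, T.numElts};
  }
};

// The all-or-nothing alternative: scalarise every vector regardless of target.
Decision decideAllOrNothing(const VecType &T) {
  return {Action::Scalarise, T.numElts};
}
```

SystemZ has no legal vector types, so every type is scalarised and the two policies agree. With the assumed MIPS configuration (legal types <2 x float>, <8 x i8> and <4 x i16>) they diverge. <2 x float> and <8 x i8> stay whole, <4 x float> is split into two <2 x float> halves, and only a type such as <4 x i32> is fully scalarised. That per-type distinction is what an all-or-nothing pass cannot express.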
Hi Richard,

The discussion on llvmpipe is irrelevant. llvmpipe has its own pass
manager and optimization pipe, it is not a C compiler.

Nadav

On Nov 15, 2013, at 3:26 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> [...]
Nadav Rotem <nrotem at apple.com> writes:
> The discussion on llvmpipe is irrelevant. llvmpipe has its own pass
> manager and optimization pipe, it is not a C compiler.

Note that my earlier reply was about whether TargetTransformInfo should
be used in Scalarizer, not whether Scalarizer should be in PMB. I was
trying to explain why I thought that not testing TargetTransformInfo in
Scalarizer would make the pass less useful for llvmpipe's optimisation pipe.

Thanks,
Richard
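Richard's last message separates two questions. The first is where Scalarizer is scheduled: in PassManagerBuilder (PMB), or from a target hook such as SystemZPassConfig::addIRPasses(). The second is whether the pass itself consults TargetTransformInfo. The compile-time question at the top of the thread concerns the PMB placement. There, the pass is scheduled for every target and "exits very early for any target that doesn't opt-in to doing scalarisation at the IR level, without even looking at the function". Below is a minimal C++ sketch of that early exit. Function, TargetHooks, ScalarizerStats and runOnFunction are hypothetical stand-ins, not LLVM's API. The boolean opt-in flag is an assumption and does not model how a real target would express the opt-in.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are not LLVM's classes.
struct Function {
  std::string name;
  unsigned vectorOps;  // vector instructions remaining in the body
};

struct TargetHooks {
  bool scalariseAtIR;  // hypothetical opt-in flag for IR-level scalarisation
};

struct ScalarizerStats {
  unsigned functionsScanned = 0;
  unsigned opsScalarised = 0;
};

// Per-function entry point of a pass that a shared pipeline schedules for
// every target. The opt-in test comes first, so on targets that have not
// opted in the function returns before any instruction of F is read.
bool runOnFunction(const TargetHooks &TH, Function &F, ScalarizerStats &S) {
  if (!TH.scalariseAtIR)
    return false;  // early exit: F is neither inspected nor modified
  ++S.functionsScanned;
  S.opsScalarised += F.vectorOps;
  bool changed = F.vectorOps != 0;
  F.vectorOps = 0;  // placeholder for splitting each vector op into scalar ops
  return changed;
}
```

With the flag clear, every call returns before the body is touched and the functions keep their vector operations. With the flag set, every function is scanned. Scheduling the pass from the target's own IR-pass hook, the alternative raised for SystemZ, would make the opt-in test unnecessary for that target. Neither placement settles whether the pass should consult TargetTransformInfo, which is the point Richard says his reply was about.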