Hi Richard,

Thanks for working on this. We should probably move this discussion to llvm-dev because it is not strictly related to the patch review anymore.

The code below is not representative of general C/C++ code. Usually only domain-specific languages (such as OpenCL) contain vector instructions. The LLVM pass manager configuration (the pass manager builder) is designed for C/C++ compilers, not for DSLs. People who use LLVM for other compilation flows (such as GPU compilers or other languages) create their own optimization pipeline. I am in favor of adding the scalarizer pass so that people who build LLVM-based JITs and compilers can use it. However, I am against adding this pass by default to the pass manager builder. I understand that there are cases where scalarizing early in the pipeline is better, but I don’t think that it’s worth the added complexity. Every target has a different set of quirks, and we try very hard to avoid adding target-specific passes at the IR level. SelectionDAG is not going away soon, and the SD replacement will also have a scalarizing pass - the overall architecture is not going to change. There are always optimization phase-ordering problems in the compiler, and at the end of the day we need to come up with an optimization pipeline that works for most programs that we care about. I still think that scalarizing in SD is a reasonable solution for C/C++.

Thanks,
Nadav

On Nov 13, 2013, at 2:03 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> Nadav Rotem <nrotem at apple.com> writes:
>> I think that it is a good idea to have a scalarizer pass for people who
>> want to build llvm-based compilers, but I don’t think that this pass
>> should be a part of the default pass manager. Targets that want to
>> scalarize the code should do it as part of instruction selection (just
>> declare the types as illegal). Why do you want to control scalarization
>> from the target? IMHO scalarization is only useful in undoing
>> domain-specific input IR.
>
> The problem is that instruction selection is so late that the scalar
> operations don't get optimised very much. The only pass that runs after
> type legalisation and still understands the function at an operational
> level is DAGCombiner, which is only block-local.
>
> Take for example something like:
>
>   typedef unsigned int V4SI __attribute__ ((vector_size (16)));
>   void foo (V4SI *vec, unsigned int n, unsigned int x)
>   {
>     V4SI factor = { x, 2, 4, 8 };
>     for (unsigned i = 0; i < n; ++i)
>       vec[i] *= factor;
>   }
>
> Without the Scalarizer pass, this multiplication remains a vector
> multiplication between variables until after type legalisation.
> It is then split into four scalar multiplications between variables,
> which we select as multiplications rather than shifts. With the
> Scalarizer pass, we get one multiplication and three shifts.
>
> You could argue that in this case, the target-specific CodeGen code
> should be prepared to rewrite multiplications as shifts as a result
> of later (CodeGen) constant propagation, but that isn't as easy for
> more complicated chains of operations.
>
> This wasn't a motivation, but: I believe there's a long-term plan
> to move away from SelectionDAG-based instruction selection. I was
> hoping that doing scalarisation at the IR level would help with that.
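For illustration (this sketch is not part of the original thread), here is roughly what the loop above might look like once the vector multiply has been scalarised at the IR level and the constant lanes strength-reduced, expressed back in C/C++ terms. The function name and the flat-pointer indexing are made up for the sketch; actual Scalarizer output is LLVM IR, not C.

    // Hypothetical, hand-written illustration of "one multiplication and
    // three shifts": the variable lane keeps its multiply, while the
    // constant lanes 2, 4 and 8 become shifts.  Not compiler output.
    void foo_scalarised(unsigned int *vec, unsigned int n, unsigned int x)
    {
      for (unsigned i = 0; i < n; ++i) {
        unsigned int *elt = &vec[4 * i];   // element i of the V4SI array
        elt[0] = elt[0] * x;               // variable operand: still a mul
        elt[1] = elt[1] << 1;              // * 2 -> shift
        elt[2] = elt[2] << 2;              // * 4 -> shift
        elt[3] = elt[3] << 3;              // * 8 -> shift
      }
    }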
Nadav Rotem <nrotem at apple.com> writes:
> Hi Richard,
>
> Thanks for working on this. We should probably move this discussion to
> llvm-dev because it is not strictly related to the patch review
> anymore.

OK, I removed phabricator and llvm-commits.

> The code below is not representative of general C/C++ code. Usually only
> domain-specific languages (such as OpenCL) contain vector instructions.
> The LLVM pass manager configuration (the pass manager builder) is
> designed for C/C++ compilers, not for DSLs. People who use LLVM for
> other compilation flows (such as GPU compilers or other languages)
> create their own optimization pipeline. I am in favor of adding the
> scalarizer pass so that people who build LLVM-based JITs and compilers
> can use it. However, I am against adding this pass by default to the
> pass manager builder. I understand that there are cases where
> scalarizing early in the pipeline is better, but I don’t think that it’s
> worth the added complexity. Every target has a different set of quirks,
> and we try very hard to avoid adding target-specific passes at the IR
> level. SelectionDAG is not going away soon, and the SD replacement will
> also have a scalarizing pass - the overall architecture is not going to
> change. There are always optimization phase-ordering problems in the
> compiler, and at the end of the day we need to come up with an
> optimization pipeline that works for most programs that we care about.
> I still think that scalarizing in SD is a reasonable solution for C/C++.

I don't understand the basis for the last statement though. Do you mean
that you think most cases produce better code if scalarised at the SD
stage rather than at the IR level? Could you give an example?

If the idea is to have a clean separation of concerns between the front
end and LLVM, then it seems like there are two obvious approaches:

(a) make it the front end's responsibility to only generate vector widths
    that the target can handle. There should then be no need for vector
    type legalisation (as opposed to operation legalisation).

(b) make LLVM handle vectors of all widths, which is the current
    situation.

If we stick with (b) then I think LLVM should try to handle those vectors
as efficiently as possible. The argument instead seems to be for:

(c) have code of last resort to handle vectors of all widths, but do not
    try to optimise the resulting scalar operations as much as code that
    was scalar to begin with.

If the front end is generating vector widths for which the target has no
native support, and if the front end cares about the performance of that
vector code, it should explicitly run the Scalarizer pass itself. AIUI,
it would also be the front end's responsibility to identify which targets
have support for which vector widths and which would instead use
scalarisation. That seems to be a less clean interface.

E.g. as things stand today, llvmpipe is able to do everything it needs to
do with generic IR. Porting it to a new target is a trivial change of a
few lines[*]. This seems like a good endorsement of the current
interface. But the interface would be less clean if llvmpipe (and any
other front end that cares) had to duplicate target knowledge that LLVM
already has.

[*] There are optimisations to use intrinsics for certain targets, but
    they aren't needed for correctness. Some of them might not be needed
    at all with recent versions of LLVM.

The C example I gave was deliberately small and artificial to show the
point.
But you can go quite a long way with the generic vector extensions to C
and C++, just like llvmpipe can use generic IR to do everything it needs
to do.

I think your point is that we should never run the Scalarizer pass for
clang, so it shouldn't be added by the pass manager. But regardless of
whether the example code is typical, it seems reasonable to expect
"foo * 4" to be implemented as a shift. To have it implemented as a
multiplication even at -O3 seems like a deficiency.

Even if you think it's unusual for C and C++ to have pre-vectorised code,
I think it's even more unusual for all vector input code to be cold. So
if we have vector code as input, I think we should try to optimise it as
best we can, whether it comes from C, C++, or a domain-specific front end.

As I said in the phabricator comments, the vectorisation passes convert
scalar code to vector code based on target-specific knowledge. I don't
see why it's a bad thing to also convert vector code to scalar code based
on target-specific knowledge.

Thanks,
Richard
Hi Richard,

Thanks for working on this. Comments below.

> I don't understand the basis for the last statement though. Do you mean
> that you think most cases produce better code if scalarised at the SD
> stage rather than at the IR level? Could you give an example?

You presented an example that shows that scalarizing vectors allows
further optimizations. But I don’t think that this example represents the
kind of problems that we run into in general C++ code. We currently
consider vector legalization a codegen problem. LLVM is designed this way
to handle certain kinds of programs. Other users of LLVM (such as OpenCL
JITs) do scalarize early in the optimization pipeline, because their
problem domain presents lots of vectors that need to be legalized. I am
very supportive of adding the new scalarization pass, but I don’t want
you to add it to the PassManagerBuilder, because the PMB is designed for
static C compilers that don’t have this problem. Are you interested in
improving code generation for C++ programs or for programs from another
domain?

Thanks,
Nadav
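To make the distinction discussed above concrete, here is a minimal sketch (not from the original thread) of the kind of opt-in both sides describe: a front end or JIT whose input is vector-heavy schedules the Scalarizer early in its own pipeline instead of relying on the PassManagerBuilder. It assumes only that the pass is exposed as createScalarizerPass(), as in the patch under review; the function name, the companion passes, and the use of the legacy pass manager of that era are illustrative choices, not anything prescribed in the thread.

    // Hypothetical pipeline for a non-C front end (e.g. an OpenCL-style JIT).
    #include "llvm/IR/Module.h"
    #include "llvm/PassManager.h"
    #include "llvm/Transforms/Scalar.h"

    using namespace llvm;

    static void runCustomPipeline(Module &M) {
      FunctionPassManager FPM(&M);
      // Scalarise vector operations early so the usual scalar optimisations
      // (instcombine, early CSE, ...) get to see and clean up the results.
      FPM.add(createScalarizerPass());
      FPM.add(createInstructionCombiningPass());
      FPM.add(createEarlyCSEPass());

      FPM.doInitialization();
      for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
        FPM.run(*F);
      FPM.doFinalization();
    }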