On Wed, Apr 24, 2013 at 5:26 PM, Chris Lattner <clattner at apple.com> wrote:

> On Apr 24, 2013, at 5:01 PM, Dan Gohman <dan433584 at gmail.com> wrote:
>> In the spirit of the (long-term) intent to migrate away from the
>> SelectionDAG framework, it is desirable to implement legalization
>> passes as discrete passes. Attached is a patch which implements the
>> beginning of a new type legalization pass, to help motivate discussion.
>
> This is a great discussion to have.
>
>> Is LLVM IR the right level for this?
>
> IMO, no, definitely not.
>
>> The main alternative approach that's been discussed is to do FastISel
>> to a target-independent opcode set on MachineInstrs, and then do
>> legalization and ultimately the last phase of instruction selection
>> proper after that. The most obvious advantage of using LLVM IR for
>> legalization is that it's (currently) more developer-friendly. The
>> most obvious advantage of using MachineInstrs is that they would make
>> it easier to do low-level manipulations. Also, doing legalization on
>> MachineInstrs would mean avoiding having LLVM-IR-level optimization
>> passes which lower the IR, which has historically been a design goal
>> of LLVM.
>
> I think that you (in the rest of your email) identify a number of
> specific problems with using LLVM IR for legalization. These are a lot
> of specific issues caused by the fact that LLVM IR is intentionally not
> trying to model machine issues. I'm sure you *could* try to make this
> work by introducing a bunch of new intrinsics into LLVM IR which would
> model the union of the SelectionDAG ISD nodes along with the
> target-specific X86ISD nodes. However, at this point, you have only
> modeled the operations and haven't modeled the proper type system.

I don't wish to argue about this, and am fine following your suggestion.
However, I would like to understand your reasons better.

I don't think the type system is really the issue. The only thing
SelectionDAG's type system has which LLVM IR's lacks, and which is
useful here, is "untyped", and that's a special-purpose thing that we
can probably handle in other ways.

You and others are right that there could be a fair number of new
intrinsics, especially considering all the X86ISD ones and all the rest.
Is this a significant concern for you? Targets already have large
numbers of target-specific intrinsics; would adding a relatively
moderate number of new intrinsics really be a problem?

There's also the problem of keeping callers and callees consistent, and
it's indeed quite a nuisance, but it need not be a show-stopper.

> LLVM IR is just not the right level for this. You seem to think it is
> better than MachineInstrs because of developer friendliness, but it
> isn't clear to me that LLVM IR with the additions you're talking about
> would actually be friendly anymore :-)

As I see it, people working in codegen are going to have to deal with
lots of codegen-y instructions regardless of whether we call them
instructions or intrinsics. Is it really better one way or the other?

> Personally, I think that the right representation for legalization is
> MachineInstrs supplemented with a type system that allows MVTs as well
> as register classes. If you are seriously interested in pushing
> forward on this, we should probably discuss it in person, or over beer
> at the next social or something.

Ok.

Dan
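For illustration only, here is a minimal sketch (not Dan's posted patch;
all names below are invented for this example) of what an IR-level
type-legalization step could look like: expanding an i64 add into i32
halves for a hypothetical 32-bit-only target, written against LLVM's C++
IRBuilder API. Note that the result still has to be reassembled into an
i64 to keep the surrounding IR type-correct, which hints at the
caller/callee consistency problem discussed above.

    // Sketch: expand a 64-bit add into 32-bit halves with an explicit
    // carry, entirely at the LLVM IR level. Invented example, not the
    // code from the patch under discussion.
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    static void expandAddI64(BinaryOperator *Add) {
      IRBuilder<> B(Add);                 // insert before the old add
      Type *I32 = B.getInt32Ty();
      Type *I64 = Add->getType();

      // Split each operand into low/high i32 halves.
      Value *L = Add->getOperand(0), *R = Add->getOperand(1);
      Value *LLo = B.CreateTrunc(L, I32), *RLo = B.CreateTrunc(R, I32);
      Value *LHi = B.CreateTrunc(B.CreateLShr(L, 32), I32);
      Value *RHi = B.CreateTrunc(B.CreateLShr(R, 32), I32);

      // Add the low halves; there is an unsigned carry iff the sum
      // wrapped around below one of its addends.
      Value *Lo = B.CreateAdd(LLo, RLo);
      Value *Carry = B.CreateZExt(B.CreateICmpULT(Lo, LLo), I32);
      Value *Hi = B.CreateAdd(B.CreateAdd(LHi, RHi), Carry);

      // Reassemble an i64 so existing uses keep type-checking; a real
      // pass would rewrite the uses (or use target intrinsics) instead.
      Value *Res = B.CreateOr(B.CreateZExt(Lo, I64),
                              B.CreateShl(B.CreateZExt(Hi, I64), 32));
      Add->replaceAllUsesWith(Res);
      Add->eraseFromParent();
    }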
.....

> I don't wish to argue about this, and am fine following your
> suggestion. However, I would like to understand your reasons better.

What would be the plan as far as incrementally achieving this alternate
implementation?

Why has "avoiding having LLVM-IR-level optimization passes which lower
the IR" historically been a design goal of LLVM?
James Courtier-Dutton
2013-Apr-26 07:50 UTC
[LLVMdev] Proposal for new Legalization framework
I don't know if it helps at all, but another option might be some sort
of target CPU modelling. I know TableGen does a lot of this, but a
decompiler called boomerang (http://boomerang.sourceforge.net/) has some
interesting infrastructure in the area of CPU modelling. It uses a CPU
modelling language from which it automatically generates the source code
that assembles/disassembles instructions. It might not be suitable, but
I think it would be worth having a look at boomerang to see if its CPU
modelling could be used in LLVM.
On Apr 25, 2013, at 2:00 PM, Reed Kotler <rkotler at mips.com> wrote:

> .....
>
>> I don't wish to argue about this, and am fine following your
>> suggestion. However, I would like to understand your reasons better.
>
> What would be the plan as far as incrementally achieving this
> alternate implementation?
>
> Why has "avoiding having LLVM-IR-level optimization passes which lower
> the IR" historically been a design goal of LLVM?

We obviously need an incremental migration plan. That said, personally,
I would prefer to figure out what the right destination is before we
start trying to discuss how to get there. I don't think that any other
approach makes much sense.

-Chris
Dan, and anyone else interested,

I am not sure if this has been discussed before, but I do have a case
where the following logic fails to work:

lib/Analysis/ConstantFolding.cpp:

    static Constant *ConstantFoldBinaryFP(double (*NativeFP)(double, double),
                                          double V, double W, Type *Ty) {
      sys::llvm_fenv_clearexcept();
      V = NativeFP(V, W);
      if (sys::llvm_fenv_testexcept()) {
        sys::llvm_fenv_clearexcept();
        return 0;
      }
      ...

This fragment seems to assume that host and target behave in exactly the
same way with regard to FP exception handling. In some ways I understand
it, but on some cross-compilation platforms this is not always true. In
the case of Hexagon, for example, our FP math handling is apparently
more precise than the "stock" one on an x86 host. A specific (but not
the best) example would be computing sqrtf(1.000001): the result is 1
with FE_INEXACT set. My current Linux x86 host fails the inexact part,
resulting in wrong code being emitted.

Once again, my question is not about this specific example, but rather
about the assumption of identical behavior on completely different
systems. What if my target's "objective" is to exceed IEEE precision?
...and I happen to have a set of tests to verify that I do :)

Thank you for any comment.

Sergei

---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
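A small standalone probe of the host-side behavior Sergei describes (a
sketch; which exception flags a given host libm raises for a given input
is exactly the platform-dependent behavior in question). It performs the
same clear/test sequence used by the constant folder:

    // fenv_probe.cpp: check whether the host libm sets FE_INEXACT for
    // sqrt(1.000001f). Build with: c++ fenv_probe.cpp -lm
    #include <cfenv>
    #include <cmath>
    #include <cstdio>

    int main() {
      volatile float x = 1.000001f;   // volatile: defeat compile-time
                                      // constant folding of the sqrt
      std::feclearexcept(FE_ALL_EXCEPT);
      float r = std::sqrt(x);
      bool inexact = std::fetestexcept(FE_INEXACT) != 0;
      std::printf("sqrt(1.000001f) = %.9g, FE_INEXACT %s\n",
                  (double)r, inexact ? "set" : "not set");
      return 0;
    }

If the host reports different flags than the target's own math library
would, the folder can fold (or refuse to fold) in cases where the target
would have decided otherwise.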
On Apr 25, 2013, at 8:58 AM, Dan Gohman <dan433584 at gmail.com> wrote:

>> I think that you (in the rest of your email) identify a number of
>> specific problems with using LLVM IR for legalization. These are a
>> lot of specific issues caused by the fact that LLVM IR is
>> intentionally not trying to model machine issues. I'm sure you
>> *could* try to make this work by introducing a bunch of new
>> intrinsics into LLVM IR which would model the union of the
>> SelectionDAG ISD nodes along with the target-specific X86ISD nodes.
>> However, at this point, you have only modeled the operations and
>> haven't modeled the proper type system.
>
> I don't wish to argue about this, and am fine following your
> suggestion. However, I would like to understand your reasons better.

Sure, I'm happy to explain. I apologize if I came across overly strong
about this. This is something that has come up many times before.

> I don't think the type system is really the issue. The only thing
> SelectionDAG's type system has which LLVM IR's lacks, and which is
> useful here, is "untyped", and that's a special-purpose thing that we
> can probably handle in other ways.

That's definitely a fair criticism. In my (often crazy) mind, I'd like
to solve a few problems in SelectionDAG that are not just an aspect of
the DAG representation.

One specific problem area with SelectionDAG (ignoring the DAG) is that
various steps (legalization, isel, etc.) want to introduce
target-specific operations that *require* a specific register class.
The only way to model that in SelectionDAG is by picking an MVT that
happens to align with it, and hoping that the right thing happens
downstream. It would be much better if SelectionDAG (and its
replacement) could represent register classes directly in its type
system. However, this is a really, really bad idea for LLVM IR, for
hopefully obvious reasons.

> You and others are right that there could be a fair number of new
> intrinsics, especially considering all the X86ISD ones and all the
> rest. Is this a significant concern for you?

No, I'm not specifically concerned with the number of intrinsics.

> Targets already have large numbers of target-specific intrinsics;
> would adding a relatively moderate number of new intrinsics really be
> a problem?

No.

> There's also the problem of keeping callers and callees consistent,
> and it's indeed quite a nuisance, but it need not be a show-stopper.

I consider this to be one (really important!) example of an invariant
that would have to be violated to make this plan happen. I think that
(in order to make this really work) we'd have to add a non-SSA LLVM IR,
potentially multiple return results, subregs, etc. I think it is a
really bad idea to make LLVM IR more complicated and worse to work with
for the benefit of codegen.

>> LLVM IR is just not the right level for this. You seem to think it
>> is better than MachineInstrs because of developer friendliness, but
>> it isn't clear to me that LLVM IR with the additions you're talking
>> about would actually be friendly anymore :-)
>
> As I see it, people working in codegen are going to have to deal with
> lots of codegen-y instructions regardless of whether we call them
> instructions or intrinsics. Is it really better one way or the other?

The number of intrinsics is not a strong concern for me.

-Chris
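To make the register-class point concrete, here is a fragment in the
style of an LLVM target's lowering setup (the addRegisterClass calls
mirror real X86 code; X86ISD::SOME_NODE is a hypothetical placeholder,
and the fragment is illustrative rather than standalone). The MVT is the
only handle the DAG has on the register class:

    // In the target's TargetLowering constructor: binding an MVT to a
    // register class is the only way to say where values of that type
    // must live.
    addRegisterClass(MVT::v4f32, &X86::VR128RegClass);
    addRegisterClass(MVT::v2f64, &X86::VR128RegClass);

    // Later, a target-specific node can only "request" VR128 by
    // choosing one of those MVTs and hoping downstream passes keep the
    // value there. (X86ISD::SOME_NODE is a made-up opcode.)
    SDValue N = DAG.getNode(X86ISD::SOME_NODE, DL, MVT::v4f32, Op0, Op1);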
Hi Sergei,

The degree to which LLVM actually makes any guarantees about IEEE
arithmetic precision is ambiguous. LangRef, for one, doesn't even
mention it (it mentions formats, but nothing else). The de-facto way of
interpreting holes in LangRef is to consider how the IR is used by clang
and follow the path up into the C and/or C++ standards, and then work
from there. C describes a binding to IEC 60559, but it is optional, and
clang doesn't opt in. C++ doesn't even have the option. So from an
official perspective, it's not clear that you have any basis to
complain ;-).

I mention all this not to dismiss your concern, but to put it in
context. Right or wrong, much of the C/C++ software world is not that
keenly concerned with these matters. This includes LLVM in some
respects. The folding of floating-point library routines which you
point out in LLVM is one example of this.

One idea for addressing this would be to teach LLVM's TargetLibraryInfo
to carry information about how precise the target's library functions
are. Then, you could either implement soft-float versions of the
affected library functions within LLVM itself, or you could disable
folding for those functions which are not precise enough on the host
(in non-fast-math mode).

Another idea would be to convince the LLVM community that LLVM
shouldn't constant-fold floating-point library functions at all (in
non-fast-math mode). I think you could make a reasonable argument for
this. There are ways to do this without losing much optimization --
such expressions are still constant, after all, so they can be hoisted
out of any loop. They could even be hoisted out to main if you want.

It's also worth noting that this problem predates the implementation of
fast-math mode in LLVM's optimizer. Now that fast-math mode is
available, it may be easier to convince people to make the
non-fast-math mode more conservative. I don't know that everyone will
accept this, but it's worth considering.

Dan

On Fri, Apr 26, 2013 at 12:44 PM, Sergei Larin <slarin at codeaurora.org> wrote:

> Once again, my question is not about this specific example, but rather
> about the assumption of identical behavior on completely different
> systems. What if my target's "objective" is to exceed IEEE precision?
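A sketch of Dan's first idea, expressed as a guard in the folder itself.
Note that TargetLibraryInfo records only the availability of library
functions, not their accuracy, so the isLibFuncExact query below is an
invented extension, not an existing API:

    // Sketch only: refuse to fold when the target promises stricter
    // libm semantics (values and FP exception flags) than the host can
    // reproduce, e.g. Hexagon's sqrtf vs. an x86 host.
    static Constant *ConstantFoldBinaryFP(double (*NativeFP)(double, double),
                                          double V, double W, Type *Ty,
                                          const TargetLibraryInfo *TLI,
                                          LibFunc::Func F) {
      if (TLI && !TLI->isLibFuncExact(F))  // hypothetical precision query
        return 0;

      sys::llvm_fenv_clearexcept();
      V = NativeFP(V, W);
      if (sys::llvm_fenv_testexcept()) {
        sys::llvm_fenv_clearexcept();
        return 0;                          // host raised an FP exception
      }
      // ... construct and return the folded ConstantFP, as before ...
    }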