Hal Finkel
2012-Jun-29 03:54 UTC
[LLVMdev] Removing the separation between opt and codegen?
Hello,

One important next step in turning LLVM into a first-class autovectorizing compiler will be to incorporate target information into the vectorization logic. To really make good decisions regarding what is profitable to vectorize, and how that vectorization should be done, it will be important for the vectorization pass(es) to understand the underlying target capabilities. The same will hold true for various kinds of loop iteration-space transformations.

As I recall, Chris suggested to me some months ago the following work-around: allow optimization passes to access target lowering info only when it is available. Specifically this means that only for frontends (like clang) that link in both the optimization passes and codegen, we would provide some mechanism for providing a TLI instance to the optimization passes.

While I think this could certainly be made to work, it seems suboptimal. It would mean that 'opt' could no longer perform the same level of optimization as 'clang' with equivalent inputs. That being the case, I think that over time 'opt' would simply fall out of use.

My general question is this: What do we gain by keeping a strict separation between the (mostly-target-independent) optimization layer and the codegen layer?

To partially answer my own question, I can think of one advantage: It keeps us from being lazy. Specifically, it forces us to keep a single canonical expression form that is handed to the backends. This eases the maintenance burden by forcing a certain amount of generality into the whole system and by limiting target-specific variants of the canonical expression forms. This makes it harder to break things in odd ways with seemingly-innocuous changes.

I fear, however, that this leads to a system which is generally good, but not great on any particular target. Furthermore, it is sometimes very difficult or impossible for the backends to undo bad decisions made by the target-independent optimization layer.

I think it is time to reconsider this separation and make optimization a truly target-dependent process where needed. Obviously, we should not make target-dependent decisions where they're not necessary, and we should introduce appropriate abstraction layers to characterize target differences. Nevertheless, the most efficient and maintainable way to provide target information to the optimization passes will be to provide that information directly from the backend code (and associated tablegen files).

I would like to hear other opinions on this.

Thanks again,
Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
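The work-around described in this message (target information that is only sometimes available to an optimization pass) can be sketched roughly as follows. This is a minimal illustration, not LLVM's actual API: `TargetInfo`, its `vectorRegisterBits` field, and `chooseVectorWidth` are hypothetical names invented for the example. A null pointer stands for the case where the driver did not link in a backend.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for target lowering information.
// This is NOT LLVM's TargetLowering interface; it only carries one fact.
struct TargetInfo {
    unsigned vectorRegisterBits;  // width of one SIMD register, e.g. 128
};

// Decide how many elements of width `elementBits` to pack into one vector.
// `target` may be null, which models a driver that has no backend linked in.
// In that case the pass cannot judge profitability, so it returns 1
// (leave the code scalar) instead of guessing.
unsigned chooseVectorWidth(const TargetInfo* target, unsigned elementBits) {
    assert(elementBits > 0 && "element size must be non-zero");
    if (target == nullptr || target->vectorRegisterBits < elementBits)
        return 1;
    return target->vectorRegisterBits / elementBits;
}
```

With a 128-bit target and 32-bit elements this returns 4; with no target it returns 1. That difference is exactly the gap between a driver that has target information and one that does not.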
Das, Dibyendu
2012-Jun-29 05:14 UTC
[LLVMdev] Removing the separation between opt and codegen?
Hal-

I generally agree with what you are saying here. Based on my recent experience working on a partial simdizer (not LLVM), I found that even deciding which instructions to group for good simdization requires some knowledge of the underlying target. Let's take an instruction like haddps, which adds adjacent pairs of components within its vector operands. Whether such an instruction is supported by the target does impact your simdization choice. Furthermore, the cost of haddps may also decide how and where to simdize. Hence the simdization-choice phase, which should (theoretically) be fairly target-independent, needs to have some knowledge of the target. Whether this can be abstracted away in some form can be discussed.

-Dibyendu
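The point about haddps can be illustrated with a toy cost comparison for summing the lanes of one vector register. Everything here is an assumption made for the example: `TargetCosts`, `ReductionPlan`, and `chooseReduction` are hypothetical names, and any cost numbers a caller supplies are placeholders rather than measured latencies. The sketch assumes that each halving step needs either one horizontal add or one shuffle plus one ordinary add.

```cpp
#include <cassert>

// Hypothetical per-target cost table; values are illustrative placeholders.
struct TargetCosts {
    bool hasHorizontalAdd;   // true when an haddps-like instruction exists
    unsigned horizontalAdd;  // cost of one horizontal-add instruction
    unsigned shuffle;        // cost of one shuffle
    unsigned verticalAdd;    // cost of one ordinary lane-wise vector add
};

enum class ReductionPlan { HorizontalAdd, ShuffleAndAdd };

// Number of halving steps needed to sum `lanes` elements of one register.
unsigned reductionSteps(unsigned lanes) {
    assert(lanes >= 2 && (lanes & (lanes - 1)) == 0 &&
           "lanes must be a power of two");
    unsigned steps = 0;
    while (lanes > 1) {
        lanes /= 2;
        ++steps;
    }
    return steps;
}

// Pick the cheaper way to reduce a vector to a single sum on this target.
ReductionPlan chooseReduction(const TargetCosts& t, unsigned lanes) {
    const unsigned steps = reductionSteps(lanes);
    if (!t.hasHorizontalAdd)
        return ReductionPlan::ShuffleAndAdd;  // the instruction is unavailable
    const unsigned haddCost = steps * t.horizontalAdd;
    const unsigned shuffleCost = steps * (t.shuffle + t.verticalAdd);
    return haddCost < shuffleCost ? ReductionPlan::HorizontalAdd
                                  : ReductionPlan::ShuffleAndAdd;
}
```

Both inputs to the decision come from the target: whether the instruction exists at all, and how expensive it is relative to the alternative.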
Rotem, Nadav
2012-Jun-29 07:05 UTC
[LLVMdev] Removing the separation between opt and codegen?
>> seems suboptimal. It would mean that 'opt' could no longer perform the same level of optimization as 'clang' with equivalent inputs. That being the case, I think that over time 'opt' would simply fall out of use. My general question is this: What do we gain by keeping a strict separation between the (mostly-target-independent) optimization layer and the codegen layer?

I was under the impression that opt would also be able to benefit from the added capabilities. After all, opt is just a driver, and can be taught to understand the 'march' and 'mcpu' flags. I agree that it is important to allow opt to access the TLI, for two reasons:

1. We use opt to test our code.
2. Vectorizers may want to service domain-specific languages, which may not necessarily use clang.

>> I fear, however, that this leads to a system which is generally good, but not great on any particular target. Furthermore, it is sometimes very difficult or impossible for the backends to undo bad decisions made by the target-independent optimization layer.

Yes, but this is a general compiler problem. Early optimizations have no knowledge of how they affect later stages. For example, we don't consider register pressure when we inline a function. The problem is even more severe with vectorizing compilers. One problem that I mentioned in the past was that on 64-bit systems, 32-bit scalars are promoted to 64-bit numbers. Later on, the vectorizer attempts to vectorize this value, but it is much more difficult to vectorize i64 values. For example, array indices which were i32 values are now vectors of i64s, which can't be used for scatter/gather operations (which use i32 indices).

>> Nevertheless, the most efficient and maintainable way to provide target information to the optimization passes will be to provide that information directly from the backend code (and associated tablegen files).
>> I would like to hear other opinions on this.
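The index-promotion problem raised in this reply can be made concrete with a small sketch. It assumes a hypothetical target description, `GatherInfo`, whose gather instruction accepts exactly one index element width. The struct, its fields, and both functions are invented for illustration and do not correspond to any LLVM interface.

```cpp
#include <cassert>

// Hypothetical description of a target's gather support; illustrative only.
struct GatherInfo {
    unsigned vectorRegisterBits;  // width of one SIMD register, e.g. 256
    unsigned gatherIndexBits;     // index element width the gather accepts
};

// How many indices of width `indexBits` fit into one vector register.
unsigned indexLanes(const GatherInfo& t, unsigned indexBits) {
    assert(indexBits > 0 && indexBits <= t.vectorRegisterBits);
    return t.vectorRegisterBits / indexBits;
}

// Under this sketch's assumption, an index vector can feed the gather
// directly only when its element width matches what the instruction accepts.
bool canGatherDirectly(const GatherInfo& t, unsigned indexBits) {
    return indexBits == t.gatherIndexBits;
}
```

For a 256-bit register with 32-bit gather indices, promoting the indices from i32 to i64 halves the lane count from 8 to 4 and makes the index vector unusable without narrowing it again. An early, target-unaware promotion cannot see either cost.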