Stefanos Baziotis via llvm-dev
2020-Jul-07 19:11 UTC
[llvm-dev] How to get information about data dependencies?
> Just to be clear, you don't think it has potential because it's been
> designed into an inner-loop corner and would take extensive rewriting to
> handle OLV?

First of all, if one wants to do dependence analysis for OLV (which is what I'm currently working on), they may get away with extending LAA, or writing their own OLV checker. That said, no, it doesn't seem that the current LAA can help.

But the fact that we can't currently support OLV is not the core problem for me. We may do it eventually, but if the implementation is still hacky, then it would still be bad, I think. The core problem is that there's no clear theoretical foundation to support it. The code seems to me like a set of "Oh, we needed to handle this case so we added an `if` there". To me, that is not the correct way to go, and it's pretty much the opposite of DA. (Again, I don't mean to criticize LAA implementors - they probably had their reasons and I guess they know way more than me.)

> it certainly seems like extending DA is the way to go

Maybe, but maybe not. Maybe a good theory for run-time checks ends up being different from DA's ideas (this is partly what I have to do this summer). In any case, extending

> but I'd like to hear from the current vectorizer
> maintainers because I don't have enough knowledge to make an informed
> judgment.

Me too!

> There's the VPlan infrastructure which I have not heard much about for
> several months. What is going on with that? Yes, that's a vector
> codegen issue but it may be useful to have a more complete picture of
> how all this works or will work.

Sorry, I don't know much about VPlan. I'm involved in the RV (https://github.com/cdl-saarland/rv)

> Note that when the development of LAA started it also did static checks
> only, even though DA already existed in the code base

Interesting, thanks.

> Thanks for sharing your analysis.

No problem :)

> I am not sure if that is an entirely fair characterization of LAA. LAA is
> being used by the vectorizer (and other passes) in production for a few
> years now. None of the in-tree users of DA seem to be enabled by default
> and therefore LAA probably has an order of magnitude more testing, bug
> fixes & tuning.

No argument there. The fact that it is used doesn't necessarily mean it's clean, nor that it has some strong theory supporting it.

> DA’s implementation might be cleaner, but as mentioned earlier, DA
> handles only a small subset of things LAA handles and hence I am not sure
> comparing the code-complexity is too helpful.

DA does not handle just a small subset of LAA's checks, unless I'm missing something. It handles way more when it comes to static checking. I think that comparing code complexity is important. DA is about double the size of LAA, yet it's way more understandable. And I don't think the reason is that it does something more trivial. Rather, it's based on a clear paper and implements it clearly.

> IMO a lot of LAA complexity comes from things DA does not handle, in
> particular runtime check generation.

I agree.

> LAA also analyses & processes a whole loop whereas DA only checks
> dependences between 2 memory accesses, as well as decides whether it is
> profitable to generate runtime checks.

It processes innermost loops only, and I'm not sure that handling a whole loop rather than independent accesses is a good path. For LAA's usage it's necessary, but it creates a form of coupling (and complexity).

> There is definitely potential for improving the structure & organization
> of LAA, as well as improving the documentation. Happy to collaborate on
> that.

Are we really sure of that? Personally, I was thinking of submitting a patch but I'm not sure it is worth the effort. However, I'm glad to hear that you're happy to collaborate. :) We can talk about that more if you want.

> I am not convinced it makes sense to add runtime check generating to DA
> directly, because I don’t think the static dependence checks really need
> to be strongly coupled with runtime-check generation.

I agree with the latter, maybe with the former. In any case, I'd like to see the current DA stay as it is, and move the discussion to "what is the future of run-time checks", whether that is extending LAA, DA or something else completely.

> To clarify, LAA does static checks and only generates runtime checks if it
> cannot prove that the dependence is safe for vectorization statically.
> Granted, the static checks mostly boil down to distance computations on
> SCEV expressions, but for the current use cases it seems to work well
> enough.

Yes, sorry for not stressing that LAA does static checks too. As I said though, in my understanding they're very weak, although still enough for its usage. And I agree that LAA's capabilities are probably enough for innermost loop vectorization. The important thing, I believe, is the future.

> It might be feasible to use DA for the static checks in LAA. That might
> help for a few multi-dimensional cases, but in practice generating proper
> runtime-checks for multi-dimensional cases is probably more important, due
> to aliasing issues.

Well... I tried that and it doesn't seem to be very useful unfortunately. The C/C++ way that arrays are defined is probably why DA is not that useful. Namely that a row can alias with another row in 2D arrays. The theory behind DA is quite powerful if we knew that they don't alias. Right now, it just gives up. I don't think that LAA can handle multi-dimensional cases either though, nor do I have a good idea about how to do it myself (in or out of LAA / DA).

Best,
Stefanos
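
The row-aliasing concern raised in this message can be made concrete with a small sketch. It assumes a row-major 2D array that has been flattened to a single linear offset, `row * cols + col`, which is one common way such accesses reach an analysis; the helper names `linearIndex` and `sameElement` are hypothetical and not part of LLVM. Because nothing bounds the column subscript, two different subscript pairs can name the same element, so an analysis that reasons per dimension cannot assume distinct rows are disjoint.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: linear offset of element (row, col) in a row-major
// array with `cols` columns.
std::size_t linearIndex(std::size_t row, std::size_t col, std::size_t cols) {
  assert(cols > 0 && "array must have at least one column");
  return row * cols + col;
}

// True when two subscript pairs map to the same linear offset, i.e. they
// would touch the same memory location in the flattened array.
bool sameElement(std::size_t r1, std::size_t c1,
                 std::size_t r2, std::size_t c2, std::size_t cols) {
  return linearIndex(r1, c1, cols) == linearIndex(r2, c2, cols);
}
```

With 4 columns, the pair (0, 4) runs one past the end of row 0 and lands on (1, 0), so the two rows "alias" unless the analysis can prove the column subscript stays below 4.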
Florian Hahn via llvm-dev
2020-Jul-07 21:48 UTC
[llvm-dev] How to get information about data dependencies?
Hi,

> On Jul 7, 2020, at 18:37, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> > Ah, that's important information I didn't have. Thank you!
>
> No problem, glad to help!
>
> To the rest of your thoughts, I certainly agree. One interesting question is why LAA
> didn't use DA at all. Other than that, note that LAA is quite specialized, namely for
> loop vectorization. Actually, it's even more specific. For innermost loop vectorization.
> That affects the design. It might have been easier to create this specialized tool than
> extending a general one (if that was a good path to follow is another topic).
>
> > But yet they are intimately related in that the kind of information you
> > want to know statically and dynamically is the same. I wonder what it
> > would take to extend DA to generate runtime checks if it can't prove
> > independence.
>
> Indeed, but again, IMHO unifying them is neither easy nor does it make sense.
> They do fundamentally the same thing but their directions are very different.
>
> So, I see two options:
>
> a) As you said
>
> > I wonder what it would take to extend DA to generate runtime checks if it can't prove independence.
>
> Personally, I see potential but neither do I know what it would take. Since this is something that I'm
> currently thinking of, I would be more than interested to discuss it extensively.
>
> In any case, I would strongly prefer that we don't follow the LAA path, since I don't think it has potential
> anyway. I think that we should try to find a way to extend it that is also based on strong theoretical foundation
> and maintains the high quality of code.

I am not sure if that is an entirely fair characterization of LAA. LAA is being used by the vectorizer (and other passes) in production for a few years now. None of the in-tree users of DA seem to be enabled by default and therefore LAA probably has an order of magnitude more testing, bug fixes & tuning.

DA’s implementation might be cleaner, but as mentioned earlier, DA handles only a small subset of things LAA handles and hence I am not sure comparing the code-complexity is too helpful.
IMO a lot of LAA complexity comes from things DA does not handle, in particular runtime check generation. LAA also analyses & processes a whole loop whereas DA only checks dependences between 2 memory accesses, as well as decides whether it is profitable to generate runtime checks.

There is definitely potential for improving the structure & organization of LAA, as well as improving the documentation. Happy to collaborate on that.

I am not convinced it makes sense to add runtime check generating to DA directly, because I don’t think the static dependence checks really need to be strongly coupled with runtime-check generation.

> b) Extend LAA to do static checks
>
> The question here is though: Why do that? As I said, it doesn't seem to have potential and I believe that people
> working on vectorizers (either LLVM's current one or external like e.g. RV and VPlan) don't do either.

To clarify, LAA does static checks and only generates runtime checks if it cannot prove that the dependence is safe for vectorization statically. Granted, the static checks mostly boil down to distance computations on SCEV expressions, but for the current use cases it seems to work well enough.

It might be feasible to use DA for the static checks in LAA. That might help for a few multi-dimensional cases, but in practice generating proper runtime-checks for multi-dimensional cases is probably more important, due to aliasing issues.

Cheers,
Florian
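
As an illustration of the "prove it statically, otherwise emit a runtime check" idea discussed in this message, here is a minimal hand-written sketch of what a runtime overlap check guarding a loop could look like. It is not LAA's code or output: `rangesDisjoint` and `addOne` are hypothetical names, and the check shown is the simplest conservative form (two equally sized ranges that must not overlap at all).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not LAA's code): true when the n-element ranges
// starting at a and b share no byte.
bool rangesDisjoint(const int *a, const int *b, std::size_t n) {
  // Compare addresses as integers; relational comparison of pointers into
  // different arrays is unspecified in C++.
  const auto aLo = reinterpret_cast<std::uintptr_t>(a);
  const auto bLo = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t bytes = n * sizeof(int);
  return aLo + bytes <= bLo || bLo + bytes <= aLo;
}

// Computes dst[i] = src[i] + 1 and reports whether the checked path ran.
bool addOne(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
  if (rangesDisjoint(dst, src, n)) {
    // No cross-iteration dependence is possible here, so a compiler would
    // be free to execute these iterations in vector-sized groups.
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] + 1;
    return true;
  }
  // The ranges may overlap: keep the original scalar iteration order.
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] + 1;
  return false;
}
```

Calling `addOne(buf + 1, buf, 4)` takes the fallback path, because each iteration reads the value written by the previous one; that is exactly the kind of dependence the check has to rule out before the loop body may be reordered.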
David Greene via llvm-dev
2020-Jul-15 19:17 UTC
[llvm-dev] How to get information about data dependencies?
Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Well... I tried that and it doesn't seem to be very useful
> unfortunately. The C/C++ way that arrays are defined is probably why
> DA is not that useful. Namely that a row can alias with another row in
> 2D arrays. The theory behind DA is quite powerful if we knew that they
> don't alias. Right now, it just gives up.

Note that the situation is very different in flang, where good dependence analysis becomes much more critical.

-David