Roman Lebedev via llvm-dev
2018-May-23 18:53 UTC
[llvm-dev] RFC: should CVP always narrow the width of lshr?
Hi. In https://reviews.llvm.org/D46760, Bixia Zheng notes that given the pattern like [1] %za = zext i32 %a to i64 %udiv = udiv i64 %za, <C> %urem = urem i64 %za, <C> %uadd = add i64 %udiv, %urem ret i64 %uadd the CVP will reduce the width of udiv/urem, without considerations for the new instructions, or whether the new width is legal for the target. That was implemented in https://reviews.llvm.org/D44102 As Sanjay Patel notes in https://reviews.llvm.org/D47113#1106601,> We overlooked these questions in the specific case of div/rem > (D44102) because we assumed that narrower div/rem are always > better for analysis and codegen <cut>Now, there is a caveat. If <C> is constant, and is a power-of-two, instcombine will transform udiv to lshr, and urem to and. And CVP pass does not narrow the width of those opcodes. And instcombine won't narrow them either, because the zext has multiple uses / that would create more instructions, which is illegal for instcombine. So [1] may or may not be handled properly... There are multiple solutions, and more than one solution can be implemented, i suppose. 1. https://reviews.llvm.org/D46760 Somehow relax instcombine, to check for this very specific pattern. I don't think we want that. 2. Extend aggressive-instcombine to narrow a sequence of arbitrary binops (and maybe even more than binops). This is probably the most powerful, but i'm not quite sure how it would look.. 3. Teach CVP to always narrow the width of lshr. https://reviews.llvm.org/D47113 As Sanjay Patel notes in https://reviews.llvm.org/D47113#1106601, > <cut> but I doubt that we can extend that reasoning to all binops > on all targets. This will fight with transforms in instcombine > (see canEvaluateSExtd / canEvaluateZExtd). But, this is already sufficient to handle the [1]. Thus, as suggested, i'm sending this RFC. What do people think about "CVP should always narrow lshr" approach? Or is that likely to be too bad, and some more concrete approach, like the aggressive-instcombine extension is highly preferable? Roman.