Paul C. Anagnostopoulos via llvm-dev
2020-Nov-13 12:22 UTC
[llvm-dev] Musings on the TableGen -emit-dag-isel backend
Your suggestion for two passes is indeed my plan if simply using 3-byte sizes is not acceptable. I don't want to duplicate all the logic in a second length-calculating function, so I would just have special logic for the three matching operators with children and use the existing function for the rest, passing a null output stream. Or I could conditionalize all the output on another function parameter so it isn't done at all. I'm not convinced that anyone would notice a 4% increase in the size of the matching table, but I don't think that's my call. I have plenty to do while waiting for more comments. ;-)
Nicolai Hähnle via llvm-dev
2020-Nov-13 12:53 UTC
[llvm-dev] Musings on the TableGen -emit-dag-isel backend
On 13.11.20 13:22, Paul C. Anagnostopoulos via llvm-dev wrote:> Your suggestion for two passes is indeed my plan if simply using 3-byte sizes is not acceptable. I don't want to duplicate all the logic in a second length-calculating function, so I would just have special logic for the three matching operators with children and use the existing function for the rest, passing a null output stream. Or I could conditionalize all the output on another function parameter so it isn't done at all. > > I'm not convinced that anyone would notice a 4% increase in the size of the matching table, but I don't think that's my call.It's not necessarily about the total size, but cache pressure is an issue. That said, if we are seriously thinking about the performance of the byte code, perhaps some of these opcodes should be reconsidered at a higher level anyway. For example: The overall bytecode always begins with an OPC_SwitchOpcode implemented as a linear list of cases, often hundreds of them (depending on the target). A binary search over a jump table would be *much* better for those. Cheers, Nicolai> I have plenty to do while waiting for more comments. ;-) > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Lerne, wie die Welt wirklich ist, Aber vergiss niemals, wie sie sein sollte.
Paul C. Anagnostopoulos via llvm-dev
2020-Nov-13 13:43 UTC
[llvm-dev] Musings on the TableGen -emit-dag-isel backend
I've only just begun looking at the compiler's use of the matching table. You are correct about the outer SwitchOpcode: The AMDGPU target has 212 cases. The X86 has 452. However, there is a cache for the case offsets in the SelectionDAGIsel class. At 11/13/2020 07:53 AM, Nicolai Hähnle wrote:>On 13.11.20 13:22, Paul C. Anagnostopoulos via llvm-dev wrote: >>Your suggestion for two passes is indeed my plan if simply using 3-byte sizes is not acceptable. I don't want to duplicate all the logic in a second length-calculating function, so I would just have special logic for the three matching operators with children and use the existing function for the rest, passing a null output stream. Or I could conditionalize all the output on another function parameter so it isn't done at all. >>I'm not convinced that anyone would notice a 4% increase in the size of the matching table, but I don't think that's my call. > >It's not necessarily about the total size, but cache pressure is an issue. > >That said, if we are seriously thinking about the performance of the byte code, perhaps some of these opcodes should be reconsidered at a higher level anyway. > >For example: The overall bytecode always begins with an OPC_SwitchOpcode implemented as a linear list of cases, often hundreds of them (depending on the target). A binary search over a jump table would be *much* better for those. > >Cheers, >Nicolai
Paul C. Anagnostopoulos via llvm-dev
2020-Nov-13 14:55 UTC
[llvm-dev] Musings on the TableGen -emit-dag-isel backend
Would it make sense for TableGen to generate the outer OPC_SwitchOpcode offset table? At 11/13/2020 07:53 AM, Nicolai Hähnle wrote:>That said, if we are seriously thinking about the performance of the byte code, perhaps some of these opcodes should be reconsidered at a higher level anyway. > >For example: The overall bytecode always begins with an OPC_SwitchOpcode implemented as a linear list of cases, often hundreds of them (depending on the target). A binary search over a jump table would be *much* better for those. > >Cheers, >Nicolai