Jon Chesterfield via llvm-dev
2021-Mar-24 13:58 UTC
[llvm-dev] Function specialisation pass
> Date: Tue, 23 Mar 2021 19:44:49 +0000
> From: Sjoerd Meijer via llvm-dev <llvm-dev at lists.llvm.org>
> To: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] Function specialisation pass
>
> I am interested in adding a function specialisation(*) pass to LLVM ...
>
> Both previous attempts were parked at approximately the same point: the
> transformation was implemented but the cost-model to control compile-times
> and code-size was lacking ...

This sounds right. The transform is fairly mechanical: clone the function, replace uses of the argument being specialised on with the known value, and redirect the call sites that passed that value. Great to see there's already work in that direction.

I'd be delighted to contribute to the implementation effort. It may even qualify as a legitimate thing to do during work hours - it would let the amdgpu openmp runtime back off on its enthusiasm for inlining everything. Some initial thoughts below. Thanks!

Eliding call overhead and specialising on known arguments are currently bundled together as 'inlining', which also has a challenging cost model. If we can reliably specialise, call sites that are presently inlined no longer need to be. I'm sure the end point, with specialisation and inlining working in harmony, would be better than what we have now in compile time, code size, and code quality. There's a lot of work to get the heuristics and the interaction with inlining right, though.

I'd suggest an intermediate step of specialising based on user annotations, kicking that problem down the road:

  void example(int x, __attribute__((bikeshed)) int y) { ... big function ... }

where this means a call site with a compile-time-known value for y, say 42, gets specialised without reference to heuristics into a call to:

  void example.carefully-named.42(int x) { constexpr int y = 42; ... }

We still have to get the machinery right - caching previous specialisations, mapping multiple specialisations down to the same end call, care with naming, maybe teaching ThinLTO about it, and so forth. However, we postpone the problem of heuristically determining which arguments and call sites are profitable.

The big motivating examples for me are things that take function pointers (qsort style) and functions containing a large switch on one of the arguments, where that argument is often known at the call site. An in-tree example of the latter, using 'the trick' from partial evaluation to get the same end result, is in llvm/openmp/libomptarget/plugins/amdgpu/impl/msgpack.h. The entry point is:

  template <typename F>
  const unsigned char *handle_msgpack(byte_range bytes, F f);

It's going to switch on the first byte in the byte_range, but the function is too large to inline. So it is specialised, not with a convenient attribute, but by introducing something like:

  template <typename F, msgpack::type ty>
  const unsigned char *func(byte_range bytes, F f) {
    switch (ty) {}
  }

  template <typename F>
  const unsigned char *handle_msgpack(byte_range bytes, F f) {
    auto ty = bytes[0];
    switch (ty) {
    case 0: return func<F, 0>(...);
    case 1: return func<F, 1>(...);
    }
  }

This is, alas, a slightly more complicated example than `void example(int x, __attribute__((bikeshed)) int y)`, but I'm optimistic that explicitly annotating a parameter as 'when this is known at compile time, specialise on it' would still let me simplify that code. Function pointer interfaces that are only called with a couple of different pointers may be a more obvious win. Like openmp target regions.
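
To make the function pointer case a bit more concrete, here is a rough source-level sketch of what specialising on a known comparator could look like. The names (sort_ints, ascending, sort_ints_spec_ascending) are made up for illustration; the real pass would operate on IR and choose its own mangled clone names.

  #include <algorithm>
  #include <cstddef>

  // Original: the comparator arrives as a function pointer, so the calls to
  // it inside the sort are indirect and hard for the inliner to see through.
  void sort_ints(int *data, std::size_t n, bool (*cmp)(int, int)) {
    std::sort(data, data + n, cmp);
  }

  bool ascending(int a, int b) { return a < b; }

  // Conceptual result of specialising sort_ints on cmp == ascending: a clone
  // in which every use of cmp is replaced by the known callee, turning the
  // indirect calls into direct, inlinable ones.
  void sort_ints_spec_ascending(int *data, std::size_t n) {
    std::sort(data, data + n, ascending);
  }

  void caller(int *data, std::size_t n) {
    // The pass would also rewrite call sites that passed the known pointer:
    //   sort_ints(data, n, ascending);  // before
    sort_ints_spec_ascending(data, n);   // after
  }

The payoff is that the inliner then only has to look at the small comparator inside the clone, rather than being asked to inline the whole sort to get the same effect.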
Sjoerd Meijer via llvm-dev
Re: [llvm-dev] Function specialisation pass

Hello Jon,

Thanks for your reply - very interesting, and I really like your suggestion! I am still exploring and doing some more initial investigation, now also by experimenting with D93838 <https://reviews.llvm.org/D93838>, but I am starting to believe there is a credible way forward to get this in and enabled by default at some point. A first intermediate step would therefore be to get the pass in but have it off by default, as that is very convenient for testing and experimenting.

Like I said, I really like your suggestion: if we can drive function specialisation with an attribute, that sounds ideal to me and also justifies having the infrastructure to support it. We can let it be driven by the attribute first, and then work on cost-modelling and enabling it by default.

Also happy to hear you would like to contribute. I will discuss with the authors of D93838 whether they can pick it up, but otherwise I will do that myself soon. If, for example, you would like to contribute your attribute proposal or the other things you mentioned, that would be absolutely fantastic, and of course I would be happy to help out with reviews.

Cheers,
Sjoerd.