Janek Van Oirschot via llvm-dev
2021-Apr-13 12:28 UTC
[llvm-dev] Loop pragma for hardware loops
Hey all,

I'm looking to extend the current clang loop pragmas to also support hardware loops and allow a user to insert (or completely disable) hardware loop intrinsics on a per-loop basis. One of the questions I have regarding this is how to go about incorporating the different hardware loop intrinsics in the pragma. A few options we came up with:

1. The pragma incorporates which intrinsic to use for a loop:

    #pragma loop hwloop(set_loop_i32)

   or

    #pragma loop hwloop(/*LivesInReg=*/ true, /*AddTestGuard=*/ true, /*NumBits=*/ 32)

2. The pragma adds some target-specific info (string?) to use in the hwloop TTI hook/new hwloop TTI hook:

    #pragma loop hwloop(target="bdnz") // PPC example
    #pragma loop hwloop(target="bdz") // PPC example

   or

    #pragma loop hwloop(max-count=42, ...)

Option 1 requires the user to know about LLVM's hardware loop internals, so I'm leaning more towards option 2, as users are more likely to be aware of target-specific information (such as PPC's bdnz/bdz). These are just some options we came up with; we would love to hear about other (better) options, if any.

Kind regards,
Janek van Oirschot
Sjoerd Meijer via llvm-dev
2021-Apr-13 13:15 UTC
[llvm-dev] Loop pragma for hardware loops
Hello Janek,

It looks like you would like to steer which hardware loop form is generated by providing very detailed target information in a pragma, but I think a more typical use case for pragmas is to override the cost model or a transformation threshold/argument. In this case, I would have guessed that the new pragma takes precedence over TTI's isHardwareLoopProfitable hook, and so I would have expected something as simple as "hwloop(enable|disable)" initially.

If you would like to bring a hardware loop into a more efficient form, then I think that's mainly the responsibility of the hardware loop pass or a backend pass (see e.g. the ARM backend passes). I think option 1 is a non-starter as it exposes all sorts of internals that we don't want to expose, for various reasons, so option 2 looks a lot better but is still very specific.

Cheers,
Sjoerd.
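To make the "pragma takes precedence over the cost model" idea concrete, here is a minimal sketch of that decision in isolation. It is only an illustration under stated assumptions: the HwLoopHint enum and the shouldUseHardwareLoop helper are hypothetical names, not existing clang or LLVM APIs, and the boolean parameter merely stands in for whatever TTI's isHardwareLoopProfitable hook would answer for the loop. Legality checks are out of scope here.

```cpp
// Hypothetical per-loop hint, as it might be derived from a
// "hwloop(enable|disable)" pragma. Not an existing clang/LLVM type.
enum class HwLoopHint { Unspecified, Enable, Disable };

// Hypothetical helper modelling only the profitability decision:
// an explicit hint wins, otherwise fall back to the cost model.
// `costModelProfitable` stands in for the target's answer from
// TTI's isHardwareLoopProfitable hook.
bool shouldUseHardwareLoop(HwLoopHint hint, bool costModelProfitable) {
  switch (hint) {
  case HwLoopHint::Enable:
    return true;  // user override: treat the loop as profitable
  case HwLoopHint::Disable:
    return false; // user override: never form a hardware loop
  case HwLoopHint::Unspecified:
    break;        // no pragma: defer to the cost model
  }
  return costModelProfitable;
}
```

With this shape, the pragma carries no target-specific detail at all; choosing the best hardware loop form would stay with the hardware loop pass or the backend.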