Richard Diamond via llvm-dev
2015-Nov-02 23:57 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
Hey all,

I'd like to propose a new intrinsic for use in preventing optimizations from deleting IR due to constant propagation, dead code elimination, etc.

# Background/Motivation

In Rust we have a crate called `test` which provides a function, `black_box`, which is designed to be a no-op function that prevents constprop, die, etc. from interfering with tests/benchmarks but otherwise doesn't negatively affect resulting machine code quality. `test` currently implements this function by using inline asm, which marks a pointer to the argument as used by the assembly.

At the IR level, this creates an alloca, stores its argument to it, calls the no-op inline asm with the alloca pointer, and then returns a load of the alloca. Obviously, `mem2reg` would normally optimize this sort of pattern away; however, the deliberate use of the no-op asm prevents other desirable optimizations (such as the aforementioned `mem2reg` pass) a little too well.

Existing and upcoming virtual ISA targets also don't have this luxury (PNaCl/JS and WebAssembly, respectively). For these kinds of targets, Rust's `test` currently forbids inlining of `black_box`, which crudely achieves the same effect. This is undesirable for any target because of the associated call overhead.

The IR for `test::black_box::<i32>` is currently (it gets inlined, as desired, so I've omitted the function signature):

```llvm
%dummy.i = alloca i32, align 4
%2 = bitcast i32* %dummy.i to i8*
call void @llvm.lifetime.start(i64 4, i8* %2) #1
; Here, the value operand was the original argument to `test::black_box::<i32>`
store i32 2, i32* %dummy.i, align 4
call void asm "", "r,~{dirflag},~{fpsr},~{flags}"(i32* %dummy.i) #1, !srcloc !0
%3 = load i32, i32* %dummy.i, align 4
call void @llvm.lifetime.end(i64 4, i8* %2) #1
```

This could be better.

# Solution

Add a new intrinsic, called `llvm.blackbox`, which accepts a value of any type and returns a value of the same type. As with many other intrinsics, this intrinsic shall remain unknown to all optimizations, before and during codegen. Specifically, this intrinsic should prevent all optimizations which operate by assuming properties of the value passed to the intrinsic. Once the last optimization pass (of any kind) is finished, every call can be replaced with its argument (RAUW).

Table-gen def:

```tablegen
def int_blackbox : Intrinsic<[llvm_any_ty], [LLVMMatchType<0>]>;
```

Thus, using the previous example, `%3` would become:

```llvm
%3 = call i32 @llvm.blackbox.i32(i32 2)
```

Thoughts and suggestions welcome.

Thanks,
Richard Diamond
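For readers more familiar with C than with LLVM IR, the pointer-based pattern described in the Background section can be sketched roughly as below. This is only an illustration and not the code Rust's `test` crate uses: the name `black_box_i32` is hypothetical, GCC/Clang extended asm is assumed, and the `"memory"` clobber is an addition (the IR shown in the post has no such clobber) so that a C compiler has to assume the slot may have been modified.

```c
#include <stdint.h>

/* Hypothetical C analogue of the pointer-based black box:
 * spill the value to a stack slot, pass the slot's address to an empty
 * inline asm statement, then reload the value from the slot.
 * Assumes GCC or Clang extended asm syntax. */
static inline int32_t black_box_i32(int32_t value)
{
    int32_t dummy = value;                            /* alloca + store */
    __asm__ volatile("" : : "r"(&dummy) : "memory");  /* no-op asm that "uses" the pointer */
    return dummy;                                     /* load */
}
```

Because the address of `dummy` escapes into the asm statement, the slot generally has to live in memory, which is the cost the proposal is trying to avoid.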
Sanjoy Das via llvm-dev
2015-Nov-03 01:19 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
Why does this need to be an intrinsic (as opposed to generic "unknown function" to llvm)?

Secondly, have you looked into a volatile store / load to an alloca? That should work with PNaCl and WebAssembly.

E.g.

    define i32 @blackbox(i32 %arg) {
     entry:
      %p = alloca i32
      store volatile i32 10, i32* %p  ;; or store %arg
      %v = load volatile i32, i32* %p
      ret i32 %v
    }

-- Sanjoy
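The volatile store/load suggestion has a direct C counterpart, sketched here under the assumption that a `volatile` local maps to the volatile `alloca` accesses in the IR above. The function name `black_box_volatile_i32` is hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the volatile alloca approach:
 * the compiler must perform both the store and the load, so it cannot
 * forward the known value across them. */
static inline int32_t black_box_volatile_i32(int32_t value)
{
    volatile int32_t slot = value;  /* volatile store to a stack slot */
    return slot;                    /* volatile load from the same slot */
}
```

This needs no inline asm, so it should also be usable on targets that lack it, at the price of a guaranteed round trip through memory.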
Richard Diamond via llvm-dev
2015-Nov-03 01:23 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
On Mon, Nov 2, 2015 at 7:19 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> Why does this need to be an intrinsic (as opposed to generic "unknown
> function" to llvm)?
>
> Secondly, have you looked into a volatile store / load to an alloca? That
> should work with PNaCl and WebAssembly.
>
> E.g.
>
> define i32 @blackbox(i32 %arg) {
>  entry:
>   %p = alloca i32
>   store volatile i32 10, i32* %p  ;; or store %arg
>   %v = load volatile i32, i32* %p
>   ret i32 %v
> }

That volatility would have a negative performance impact.

Richard Diamond
Krzysztof Parzyszek via llvm-dev
2015-Nov-03 13:29 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
On 11/2/2015 5:57 PM, Richard Diamond via llvm-dev wrote:

> Add a new intrinsic, called `llvm.blackbox`, which accepts a value of
> any type and returns a value of the same type. As with many other
> intrinsics, this intrinsic shall remain unknown to all optimizations,
> before and during codegen. Specifically, this intrinsic should prevent
> all optimizations which operate by assuming properties of the value
> passed to the intrinsic. Once the last optimization pass (of any kind)
> is finished, all calls can be RAUW its argument.

This would not prevent dead code elimination from removing it. The intrinsic would need to have some sort of a side-effect in order to be preserved in all cases. Are you concerned about cases where the user of the intrinsic is dead?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
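The case raised here, where nothing uses the result, can be illustrated with a small C sketch. It assumes GCC/Clang extended asm, where a `volatile` asm statement is treated as having side effects and is therefore not removed as dead code. The names `consume_i32` and `sum_squares` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical "sink": consumes a value without producing a result.
 * The volatile asm acts as the side effect that keeps it alive. */
static inline void consume_i32(int32_t value)
{
    __asm__ volatile("" : : "r"(value) : "memory");
}

/* Computes a sum whose only user is the sink, then returns the
 * iteration count. Without the sink, `total` would be dead and the
 * loop could be deleted entirely. */
int32_t sum_squares(int32_t n)
{
    int32_t total = 0;
    for (int32_t i = 0; i < n; ++i)
        total += i * i;
    consume_i32(total);  /* keeps the computation of `total` live */
    return n;
}
```

An intrinsic that only returns its argument, with no side effect attached, would not give this guarantee once its own result is unused.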
Owen Anderson via llvm-dev
2015-Nov-03 19:18 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
To add on to what Danny and Krzysztof have said, this proposal doesn’t make a lot of sense to me. You want this intrinsic to inhibit (some) optimizations, but you simultaneously want it not to have a performance impact. Those are contradictory goals.

Worse, the proposal doesn’t specify what optimizations should/should not be allowed for this intrinsic, since apparently you want at least some applied. Is CSE allowed? DCE? PRE?

-Owen
Richard Diamond via llvm-dev
2015-Nov-03 20:48 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
On Tue, Nov 3, 2015 at 1:18 PM, Owen Anderson <resistor at mac.com> wrote:

> To add on to what Danny and Krzysztof have said, this proposal doesn’t
> make a lot of sense to me. You want this intrinsic to inhibit (some)
> optimizations, but you simultaneously want it not to have a performance
> impact. Those are contradictory goals. Worse, the proposal doesn’t
> specify what optimizations should/should not be allowed for this intrinsic,
> since apparently you want at least some applied. Is CSE allowed? DCE? PRE?

I apologize for the confusion. I don't think the goals are contradictory. We're talking about code the developer *specifically* doesn't want optimized away, but otherwise doesn't care about what optimization transforms are employed. So yes, I want it to inhibit some optimizations, but without otherwise having a performance impact outside of the obviously prevented optimizations.

PRE would be fine, as long as the expression in question doesn't make a call to this intrinsic.
Diego Novillo via llvm-dev
2015-Nov-03 20:50 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
I don't see how this is any different from volatile markers on loads/stores or memory barriers or several other optimizer blocking devices. They generally end up crippling the optimizers without much added benefit.

Would it be possible to stop the code motion you want to block by explicitly exposing data dependencies? Or simply disabling some optimizations with pragmas?

Diego.
Richard Diamond via llvm-dev
2015-Nov-06 16:31 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
On Tue, Nov 3, 2015 at 2:50 PM, Diego Novillo <dnovillo at google.com> wrote:

> I don't see how this is any different from volatile markers on
> loads/stores or memory barriers or several other optimizer blocking
> devices. They generally end up crippling the optimizers without much added
> benefit.

Volatile must touch memory (right?). Memory is slow.

> Would it be possible to stop the code motion you want to block by
> explicitly exposing data dependencies? Or simply disabling some
> optimizations with pragmas?

Code motion would be fine in theory, though as proposed, this intrinsic would prevent it (because, as far as I'm aware, there isn't an attribute that forbids dead code removal but still permits reordering). Rust doesn't have pragmas, and besides, that would also affect the whole module (or the whole crate, to use Rust's vernacular), whereas this intrinsic would be used in a much more targeted manner (i.e. at the SSA value level) by the developer and leave the rest of the module unmolested.
Jeroen Dobbelaere via llvm-dev
2015-Nov-10 10:04 UTC
[llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
Hi Richard,

why don't you use an inline assembly that returns your argument in a register?

For example:
---
int foo(int a, int b)
{
  int c=a+b+10;
  __asm__ volatile ("":"=r"(c):"0"(c):"memory");
  return c+20;
}
---

results in: (Note that the +10 and +20 were not combined)
---
foo:                                    # @foo
        .cfi_startproc
# BB#0:
        leal    10(%rdi,%rsi), %eax
        #APP
        #NO_APP
        addl    $20, %eax
        retq
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc
---

At llvm-ir level, it looks like:
---
define i32 @foo(i32 %a, i32 %b) #0 {
  %1 = add i32 %a, 10
  %2 = add i32 %1, %b
  %3 = tail call i32 asm sideeffect "", "=r,0,~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %2) #1, !srcloc !1
  %4 = add nsw i32 %3, 20
  ret i32 %4
}
---

Greetings,

Jeroen Dobbelaere
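The register-based asm from this message can be wrapped in a reusable helper so that it reads like Rust's `black_box`. The sketch below assumes GCC/Clang extended asm; the helper name `black_box_reg_i32` is hypothetical, while `foo` mirrors the example in the message.

```c
#include <stdint.h>

/* Hypothetical helper around the register-constrained asm: the value is
 * passed in and returned in the same register ("0" ties the input to
 * output operand 0), so no stack slot is required. */
static inline int32_t black_box_reg_i32(int32_t value)
{
    __asm__ volatile("" : "=r"(value) : "0"(value) : "memory");
    return value;
}

/* Same computation as foo in the message: the +10 and +20 cannot be
 * folded together because the asm result is opaque to the optimizer. */
int32_t foo(int32_t a, int32_t b)
{
    int32_t c = black_box_reg_i32(a + b + 10);
    return c + 20;
}
```

Unlike the pointer-based and volatile variants, this form keeps the value in a register, though it still relies on inline asm and so does not address targets without it.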