James Y Knight via llvm-dev
2019-Jun-27 20:28 UTC
[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints
I think this is fine, except that it stops at the point where things
actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the
error blocks? A callbr can specify requirements on where its outputs live.
So, what if two callbr, in different branches of code, specify _different_
constraints for the same output, and list the same block as a possible
error successor? How can the resulting phi be codegened?

It'd sure be a whole lot easier to not have the values valid on the
secondary exit blocks. Can you present examples where preserving the values
on the branches is a requirement? (I feel like I've seen some before,
but it'd be good to be reminded).

E.g., imagine code like this:

<<
entry:
  br i1 %cmp, label %true, label %false
true:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X"
       (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
false:
  %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X"
       (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]

error:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>

Normally, if a common register cannot be found to use across relevant
block transitions, the compiler can simply fall back on storing values on
the stack. But, that's not possible with callbr, since the location is fixed
by the asm, and no code can be inserted after the values are written, before
the branch (as both value writes and the branch are inside the asm blob). So
what can be done, in that case?

One thing you might be able to do is to duplicate the error block so you
have a different target for every callbr, but I'd consider that an invalid
transform (because the address of the block is potentially being used as a
value in the asm too).

Another thing you could perhaps do is reify the source-block-number as an
actual value -- storing a "1" before the callbr in true, and storing a "2"
before the callbr in "false".
Then conditional-branch based on that...but that's real ugly...

On Thu, Jun 27, 2019 at 3:18 PM Nick Desaulniers via cfe-dev <cfe-dev at lists.llvm.org> wrote:

> + CBL mailing list
>
> On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <isanbard at gmail.com> wrote:
>
>> [Adding the correct cfe-dev mailing list address.]
>>
>> On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <isanbard at gmail.com> wrote:
>>
>>> Now that ASM goto support has landed, Nick Desaulniers and I wrote up a
>>> document describing how to expand clang's implementation of ASM goto to
>>> support output constraints. The work *should* be straight-forward, but
>>> as always will need to be verified to work. Below is a copy of our
>>> whitepaper. Please take a look and offer any comments you have.
>>>
>>> Share and enjoy!
>>> -bw
>>>
>>> Overview
>>>
>>> Support for asm goto
>>> <https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html> with output
>>> constraints is a feature that the Linux community is interested in having.
>>> Adding this new feature should give Clang a higher profile in the Linux
>>> community:
>>>
>>> - It demonstrates the Clang community's commitment to supporting Linux.
>>> - Developers are likely to adopt it on their own, which means they
>>>   will need to use Clang in some fashion, either as a complete replacement
>>>   for or in addition to GCC.
>>>
>>> Current state
>>>
>>> Clang's implementation of asm goto converts this code:
>>>
>>> int vogon(unsigned a, unsigned b) {
>>>   asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);
>>>   return a + b;
>>>
>>> error:
>>>   return -1;
>>> }
>>>
>>> into the following LLVM IR:
>>>
>>> define i32 @vogon(i32 %a, i32 %b) {
>>> entry:
>>>   callbr void asm sideeffect "poetry $0, $1", "r,r,X"
>>>       (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))
>>>       to label %asm.fallthrough [label %return]
>>>
>>> asm.fallthrough:
>>>   %add = add i32 %b, %a
>>>   br label %return
>>>
>>> return:
>>>   %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]
>>>   ret i32 %retval.0
>>> }
>>>
>>> Our proposal won't change LLVM's current behavior–i.e. a callbr without
>>> a return value will act in the same way as the current implementation.
>>>
>>> Proposal
>>>
>>> GCC restricts asm goto from having output constraints due to
>>> limitations in its internal representation–i.e. GCC's control transfer
>>> instructions cannot have outputs. For example:
>>>
>>> int vogon(int a, int b) {
>>>   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
>>>   return a + b;
>>>
>>> error:
>>>   return -1;
>>> }
>>>
>>> currently fails to compile in GCC with the following error:
>>>
>>> <source>: In function 'vogon':
>>> <source>:2:29: error: expected ':' before string constant
>>>     2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
>>>       |                              ^~~~~
>>>       |                              :
>>>
>>> ToT Clang matches GCC's behavior:
>>>
>>> <source>:2:30: error: 'asm goto' cannot have output constraints
>>>   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
>>>
>>> However, LLVM doesn't restrict control transfer instructions from having
>>> outputs (e.g. the invoke instruction
>>> <https://llvm.org/docs/LangRef.html#invoke-instruction>).
>>> We propose changing LLVM's callbr instruction
>>> <https://llvm.org/docs/LangRef.html#callbr-instruction> to allow return
>>> values, similar to how LLVM's implementation of inline assembly (via the
>>> call instruction <https://llvm.org/docs/LangRef.html#call-instruction>)
>>> allows return values. Since there can potentially be zero to many output
>>> constraints, callbr would now return an aggregate which contains an
>>> element for each output constraint. These values would then be extracted
>>> via extractvalue. With our proposal, the above C example will be
>>> converted to LLVM IR like this:
>>>
>>> define i32 @vogon(i32 %a, i32 %b) {
>>> entry:
>>>   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
>>>       (i8* blockaddress(@vogon, %error))
>>>       to label %asm.fallthrough [label %error]
>>>
>>> asm.fallthrough:
>>>   %asmresult.a = extractvalue { i32, i32 } %0, 0
>>>   %asmresult.b = extractvalue { i32, i32 } %0, 1
>>>   %result = add i32 %asmresult.a, %asmresult.b
>>>   ret i32 %result
>>>
>>> error:
>>>   ret i32 -1
>>> }
>>>
>>> Note that unlike the invoke instruction, callbr's return values are
>>> assumed valid on all branches. The assumption is that the programmer
>>> knows what their inline assembly is doing and where its output constraints
>>> are valid. If the value isn't valid on a particular branch but is used
>>> there anyway, then the result is a poison value. (Also, if a callbr's
>>> return values affect a branch, it will be handled similarly to the
>>> invoke instruction's implementation.)
>>> Here's an example of how this would work:
>>>
>>> int vogon(int a, int b) {
>>>   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
>>>   if (a == 42)
>>>     return 42 * b;
>>>   return a + b;
>>>
>>> error:
>>>   return b - 42;
>>> }
>>>
>>> generates the following LLVM IR:
>>>
>>> define i32 @vogon(i32 %a, i32 %b) {
>>> entry:
>>>   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
>>>       (i8* blockaddress(@vogon, %error))
>>>       to label %asm.fallthrough [label %error]
>>>
>>> asm.fallthrough:
>>>   %asmresult.a = extractvalue { i32, i32 } %0, 0
>>>   %tobool = icmp eq i32 %asmresult.a, 42
>>>   br i1 %tobool, label %if.true, label %if.false
>>>
>>> if.true:
>>>   %asmresult.b = extractvalue { i32, i32 } %0, 1
>>>   %mul = mul i32 42, %asmresult.b
>>>   ret i32 %mul
>>>
>>> if.false:
>>>   %asmresult.a.1 = extractvalue { i32, i32 } %0, 0
>>>   %asmresult.b.1 = extractvalue { i32, i32 } %0, 1
>>>   %result = add i32 %asmresult.a.1, %asmresult.b.1
>>>   ret i32 %result
>>>
>>> error:
>>>   %asmresult.b.error = extractvalue { i32, i32 } %0, 1
>>>   %error.result = sub i32 %asmresult.b.error, 42
>>>   ret i32 %error.result
>>> }
>>>
>>> Implementation
>>>
>>> Because LLVM's invoke instruction is a terminating instruction that may
>>> have return values, we can use it as a template for callbr's changes.
>>> The new functionality lies mostly in modifying Clang's front-end. In
>>> particular, we need to do the following:
>>>
>>> - Remove all error checks restricting asm goto from returning values,
>>>   and
>>> - Generate the extractvalue instructions on callbr's branches.
>>>
>>> LLVM's middle- and back-ends need to be audited to ensure there are no
>>> restrictions on callbr returning a value. We expect all passes to Just
>>> Work™ without modifications, but of course this will be verified.
>
> --
> Thanks,
> ~Nick Desaulniers
Bill Wendling via llvm-dev
2019-Jun-27 20:44 UTC
[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at google.com> wrote:

> I think this is fine, except that it stops at the point where things
> actually start to get interesting and tricky.
>
> How will you actually handle the flow of values from the callbr into the
> error blocks? A callbr can specify requirements on where its outputs live.
> So, what if two callbr, in different branches of code, specify _different_
> constraints for the same output, and list the same block as a possible
> error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows
what they're doing". Perhaps I'm being too cavalier here? My concern, if
you want to call it that, is that we not be too restrictive on the new
behavior. For example, the "asm goto" may set a register to an error value
(made up on the spot; may not be a common use). But, if there's no real
reason to have the value be valid on the abnormal path, then sure we can
declare that it's not valid on the abnormal path.

> It'd sure be a whole lot easier to not have the values valid on the
> secondary exit blocks. Can you present examples where preserving the values
> on the branches is a requirement? (I feel like I've seen some before,
> but it'd be good to be reminded).
>
> E.g., imagine code like this:
>
> <<
> entry:
>   br i1 %cmp, label %true, label %false
> true:
>   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X"
>        (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
> false:
>   %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X"
>        (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
>
> error:
>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
> >>
>
> Normally, if a common register cannot be found to use across relevant
> block transitions, it can simply fall back on storing values on the stack.
> But, that's not possible with callbr, since the location is fixed by the
> asm, and no code can be inserted after the values are written, before the
> branch (as both value writes and the branch are inside the asm blob). So
> what can be done, in that case?
>
> One thing you might be able to do is to duplicate the error block so you
> have a different target for every callbr, but I'd consider that an invalid
> transform (because the address of the block is potentially being used as a
> value in the asm too).
>
> Another thing you could perhaps do is reify the source-block-number as an
> actual value -- storing a "1" before the callbr in true, and storing a "2"
> before the callbr in "false". Then conditional-branch based on that...but
> that's real ugly...
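
The last workaround quoted above (reifying the source block number) can be sketched in plain C. This is only a model under stated assumptions, not real codegen: `regs[]` stands in for machine registers, `asm_poetry` and `asm_poetry2` are hypothetical stand-ins for the two callbr instructions, and the `fail` flag stands in for the asm blob choosing to jump to the error label. None of these names come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: regs[] plays the role of machine registers. */
enum { R8 = 8, R9 = 9, R10 = 10, R11 = 11, NUM_REGS = 16 };

typedef struct { int32_t a, b; } Pair;

static int32_t regs[NUM_REGS];

/* Stand-in for "poetry $0, $1" with outputs pinned to r8/r9.
 * A nonzero return models the asm jumping to the error label. */
static int asm_poetry(int fail) {
    regs[R8] = 1;
    regs[R9] = 2;
    return fail;
}

/* Stand-in for "poetry2 $0, $1" with outputs pinned to r10/r11. */
static int asm_poetry2(int fail) {
    regs[R10] = 30;
    regs[R11] = 40;
    return fail;
}

/* Returns the pair seen by the shared error block, or {0, 0} when
 * control falls through instead. */
Pair vogon_model(int cmp, int fail) {
    int source_block = 0;   /* reified "which callbr did we come from" */
    Pair vals = {0, 0};

    if (cmp) {
        source_block = 1;   /* stored before the callbr in %true */
        if (asm_poetry(fail)) goto error;
    } else {
        source_block = 2;   /* stored before the callbr in %false */
        if (asm_poetry2(fail)) goto error;
    }
    return vals;            /* asm.fallthrough */

error:
    /* A branch on the stored tag takes the place of the phi: each arm
     * reads the registers that its own callbr was constrained to. */
    switch (source_block) {
    case 1: vals.a = regs[R8];  vals.b = regs[R9];  break;
    case 2: vals.a = regs[R10]; vals.b = regs[R11]; break;
    default: assert(0 && "unknown source block");
    }
    return vals;
}
```

In this model, `vogon_model(1, 1)` reads the r8/r9 pair and `vogon_model(0, 1)` reads the r10/r11 pair, even though both reach the same error label. The cost is the extra store before every callbr and the extra branch in the error block, which is the ugliness the quoted text refers to.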
Bill Wendling via llvm-dev
2019-Jun-28 19:00 UTC
[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints
On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at gmail.com> wrote:

> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at google.com> wrote:
>
>> I think this is fine, except that it stops at the point where things
>> actually start to get interesting and tricky.
>>
>> How will you actually handle the flow of values from the callbr into the
>> error blocks? A callbr can specify requirements on where its outputs live.
>> So, what if two callbr, in different branches of code, specify _different_
>> constraints for the same output, and list the same block as a possible
>> error successor? How can the resulting phi be codegened?
>
> This is where I fall back on the statement about how "the programmer
> knows what they're doing". Perhaps I'm being too cavalier here? My concern,
> if you want to call it that, is that we don't be too restrictive on the new
> behavior. For example, the "asm goto" may set a register to an error value
> (made up on the spot; may not be a common use). But, if there's no real
> reason to have the value be valid on the abnormal path, then sure we can
> declare that it's not valid on the abnormal path.

I think I should explain my "programmer knows what they're doing"
statement a bit better. I'm specifically referring to inline asm here. The
more general "callbr" case may still need to be considered (see Reid's
reply).

When a programmer uses inline asm, they're implicitly telling the compiler
that they *do* know what they're doing (I know this is common knowledge,
but I wanted to reiterate it). In particular, either they need to reference
an instruction not readily available from the compiler (e.g. "cpuid") or
the compiler isn't able to give them the needed performance in a critical
section. I'm extending this sentiment to callbr with output constraints.
Let's take your example below and write it as "normal" asm statements, one
on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle
it? Whatever's done today for "normal" inline asm is what I *think* should
be the behavior for the inline asm callbr variant.

If this doesn't seem sensible (and I realize that I may be thinking of an
"in a perfect world" scenario), then we'll need to come up with a more
sensible solution, which may be to disallow the values on the error block
until we can think of a better way to handle them.

-bw

>> It'd sure be a whole lot easier to not have the values valid on the
>> secondary exit blocks. Can you present examples where preserving the values
>> on the branches is a requirement? (I feel like I've seen some before,
>> but it'd be good to be reminded).
>>
>> E.g., imagine code like this:
>>
>> <<
>> entry:
>>   br i1 %cmp, label %true, label %false
>> true:
>>   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X"
>>        (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
>> false:
>>   %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X"
>>        (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
>>
>> error:
>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>> >>
>>
>> Normally, if a common register cannot be found to use across relevant
>> block transitions, it can simply fall back on storing values on the stack.
>> But, that's not possible with callbr, since the location is fixed by the
>> asm, and no code can be inserted after the values are written, before the
>> branch (as both value writes and the branch are inside the asm blob). So
>> what can be done, in that case?
>>
>> One thing you might be able to do is to duplicate the error block so you
>> have a different target for every callbr, but I'd consider that an invalid
>> transform (because the address of the block is potentially being used as a
>> value in the asm too).
>>
>> Another thing you could perhaps do is reify the source-block-number as an
>> actual value -- storing a "1" before the callbr in true, and storing a "2"
>> before the callbr in "false". Then conditional-branch based on that...but
>> that's real ugly...
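
For contrast with the callbr case, here is a rough C model of the "normal" asm version of the example. It assumes only what the quoted text states: with a plain `call asm` followed by a separate `br`, there is a point after the values are written and before the branch where copies to a common location can be placed. The names `regs`, `slot`, `plain_asm_poetry`, and `plain_asm_poetry2` are hypothetical, and this is a sketch of the idea rather than a description of what any back-end actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: regs[] plays the role of machine registers. */
enum { R8 = 8, R9 = 9, R10 = 10, R11 = 11, NUM_REGS = 16 };

static int32_t regs[NUM_REGS];

/* Stand-ins for the two plain (non-branching) asm statements. */
static void plain_asm_poetry(void)  { regs[R8] = 1;   regs[R9] = 2;   }
static void plain_asm_poetry2(void) { regs[R10] = 30; regs[R11] = 40; }

/* Models the if/true/false/end example and returns the sum of the two
 * values that the phi in %end would select. */
int32_t end_block_sum(int cmp) {
    int32_t slot[2];   /* common location, e.g. a stack slot */

    if (cmp) {
        plain_asm_poetry();
        /* Copies placed after the asm and before "br label %end". */
        slot[0] = regs[R8];
        slot[1] = regs[R9];
    } else {
        plain_asm_poetry2();
        slot[0] = regs[R10];
        slot[1] = regs[R11];
    }

    /* end: the phi reduces to a read of the common location. */
    return slot[0] + slot[1];
}
```

The copies in each arm are exactly what has no home on a callbr's indirect edge, since the value writes and the jump both happen inside the asm blob. That difference is why the plain-asm behavior may not carry over directly.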