thr3ads.net - llvm dev - [LLVMdev] Upstreaming PNaCl's IR simplification passes [Mar 2014]


James Courtier-Dutton

2014-Mar-06 00:53 UTC

[LLVMdev] Upstreaming PNaCl's IR simplification passes

>
> Just in case it gets lost in my longer reply, I want to emphasize that if
> these will be used to simplify the in-tree backends and those backend
> maintainers are on board, then I am *totally* in favor of this going into
> the tree. My concerns are heavily based on the fact that as proposed, none
> of that seems likely to happen.
>
>
> Framing the problem differently, what I see is this:
> PNaCl (and, by implication elsewhere in the thread, Emscripten and a
> hypothetical new C backend or MSIL backend) are basically backends that
> don't go through the SelectionDAG mechanism and largely bypass the current
> backend logic that legalizes IR for a backend. The problem is that basically
> all the targets in the LLVM tree use SelectionDAG and associated mechanisms.
> Arguably the NVPTX backend might benefit from such an approach (since it
> ultimately needs to allocate virtual registers), but I've never developed
> any backends, so I don't know what the tradeoffs are there. In any case,
> it's extremely unlikely that code which is only useful for IR-based backends
> instead of SelectionDAG-based backends could be useful for any in-tree
> targets.
>
> In that frame of model, though, I do see a potential compromise: instead of
> proposing virtual clones of what is essentially IR legalization for an
> IR-based backend, why not attempt to generalize the current legalization
> logic to work for IR-based backends instead of only SelectionDAG-based
> backends?
>
I agree that it should be a lot easier to create a backend.
See the comment below describing a virtual machine architecture:
a complete CPU definition in 54 lines of text!
For those interested, the 54 lines are taken from GCHQ.
I think it could be useful as an example backend template.

I would also welcome more IR passes in the LLVM tree, even if the
current backends don't use them; I can probably reuse them in my project.
That should be fine so long as there are tests for them and an interested
party claims "MAINTAINER" for each one, similar to the Linux kernel
MAINTAINER model. Anyone can submit a device driver to the Linux kernel
and it will go into upstream mainline, so long as a "MAINTAINER" is
identified. If the MAINTAINER becomes absent for a period of time and no
one else claims the driver, it is removed again.
A single LLVM IR pass is a relatively unobtrusive piece of the LLVM code base.
How would adding a new LLVM IR pass to the upstream LLVM code base
cause problems for the core MAINTAINERS?

Kind Regards

James



 exec: function()
  {
    // virtual machine architecture
    // ++++++++++++++++++++++++++++
    //
    // segmented memory model with 16-byte segment size (notation seg:offset)
    //
    // 4 general-purpose registers (r0-r3)
    // 2 segment registers (cs, ds equiv. to r4, r5)
    // 1 flags register (fl)
    //
    // instruction encoding
    // ++++++++++++++++++++
    //
    //           byte 1               byte 2 (optional)
    // bits      [ 7 6 5 4 3 2 1 0 ]  [ 7 6 5 4 3 2 1 0 ]
    // opcode      - - -
    // mod               -
    // operand1            - - - -
    // operand2                         - - - - - - - -
    //
    // operand1 is always a register index
    // operand2 is optional, depending upon the instruction set specified below
    // the value of mod alters the meaning of any operand2
    //   0: operand2 = reg ix
    //   1: operand2 = fixed immediate value or target segment
    //      (depending on instruction)
    //
    // instruction set
    // +++++++++++++++
    //
    // Notes:
    //   * r1, r2 => operand 1 is register 1, operand 2 is register 2
    //   * movr r1, r2 => move contents of register r2 into register r1
    //
    // opcode | instruction | operands (mod 0) | operands (mod 1)
    // -------+-------------+------------------+-----------------
    // 0x00   | jmp         | r1               | r2:r1
    // 0x01   | movr        | r1, r2           | rx,   imm
    // 0x02   | movm        | r1, [ds:r2]      | [ds:r1], r2
    // 0x03   | add         | r1, r2           | r1,   imm
    // 0x04   | xor         | r1, r2           | r1,   imm
    // 0x05   | cmp         | r1, r2           | r1,   imm
    // 0x06   | jmpe        | r1               | r2:r1
    // 0x07   | hlt         | N/A              | N/A
    //
    // flags
    // +++++
    //
    // cmp r1, r2 instruction results in:
    //   r1 == r2 => fl = 0
    //   r1 < r2  => fl = 0xff
    //   r1 > r2  => fl = 1
    //
    // jmpe r1
    //   => if (fl == 0) jmp r1
    //      else nop

    throw "VM.exec not yet implemented";
  }
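To make the encoding and flag rules above concrete, here is a minimal JavaScript sketch of a decoder and single-step interpreter for this VM. It is not part of the quoted definition (whose `exec` is deliberately left unimplemented), and it fills the gaps with assumptions: registers and memory cells are 8-bit, an instruction pointer `ip` is paired with `cs` for fetching code, `seg:offset` maps to `seg * 16 + offset`, and byte 2 is present for every instruction except `hlt` and the mod-0 forms of `jmp`/`jmpe`. The helper names `decode`, `hasOperand2`, `step`, and `run` are hypothetical.

```javascript
// Sketch only. ASSUMPTIONS (not in the quoted spec): 8-bit registers and memory,
// an `ip` register fetched from cs:ip, seg:offset = seg * 16 + offset, and a
// second byte for every instruction except hlt and mod-0 jmp/jmpe.
const CS = 4; // cs is equivalent to r4
const DS = 5; // ds is equivalent to r5

// Split byte 1 into opcode (bits 7-5), mod (bit 4) and operand1 (bits 3-0).
function decode(byte1) {
  return { opcode: (byte1 >> 5) & 0x7, mod: (byte1 >> 4) & 0x1, op1: byte1 & 0xf };
}

// Whether the instruction carries operand2 in byte 2 (assumed rule).
function hasOperand2(opcode, mod) {
  if (opcode === 0x07) return false; // hlt
  if (opcode === 0x00 || opcode === 0x06) return mod === 1; // jmp, jmpe
  return true;
}

// Execute one instruction. Returns false once hlt has executed.
function step(cpu, mem) {
  const addr = (seg, off) => (seg * 16 + off) % mem.length;
  const { opcode, mod, op1 } = decode(mem[addr(cpu.r[CS], cpu.ip)]);
  const wide = hasOperand2(opcode, mod);
  const op2 = wide ? mem[addr(cpu.r[CS], cpu.ip + 1)] : 0;
  cpu.ip = (cpu.ip + (wide ? 2 : 1)) & 0xff; // advance before any jump
  const r = cpu.r;
  const src = mod === 0 ? r[op2] : op2; // mod 0: register index, mod 1: immediate

  switch (opcode) {
    case 0x06: // jmpe: behaves like jmp only when fl == 0, else nop
      if (cpu.fl !== 0) break;
    // falls through
    case 0x00: { // jmp r1 (mod 0) or jmp imm:r1 (mod 1, imm = target segment)
      const target = r[op1];
      if (mod === 1) r[CS] = op2;
      cpu.ip = target;
      break;
    }
    case 0x01: // movr
      r[op1] = src;
      break;
    case 0x02: // movm: per the table, operand2 stays a register in both forms
      if (mod === 0) r[op1] = mem[addr(r[DS], r[op2])]; // r1 = [ds:r2]
      else mem[addr(r[DS], r[op1])] = r[op2]; // [ds:r1] = r2
      break;
    case 0x03: // add, wrapping at 8 bits
      r[op1] = (r[op1] + src) & 0xff;
      break;
    case 0x04: // xor
      r[op1] = (r[op1] ^ src) & 0xff;
      break;
    case 0x05: // cmp: fl = 0 if equal, 0xff if less, 1 if greater
      cpu.fl = r[op1] === src ? 0 : r[op1] < src ? 0xff : 1;
      break;
    case 0x07: // hlt
      return false;
  }
  return true;
}

// Run until hlt; the step limit guards against programs that never halt.
function run(cpu, mem, maxSteps = 100000) {
  for (let i = 0; i < maxSteps; i++) {
    if (!step(cpu, mem)) return true;
  }
  return false;
}
```

For example, under these assumptions the bytes `0x30 0x05 0x70 0x03 0xb0 0x08 0xe0` encode `movr r0, 5; add r0, 3; cmp r0, 8; hlt`, and running them from `cs:ip = 0:0` leaves `r0 = 8` and `fl = 0`.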

David Sehr

2014-Mar-06 18:23 UTC


[LLVMdev] Upstreaming PNaCl's IR simplification passes

All,

Thanks for the insights and thoughtful suggestions offered here. If I
may, I'd like to summarize the discussion so far, add a few small points,
and suggest a step forward.

First, the central objections to placing these passes in-tree seem to
center on additional complexity in the code base, lack of testing, and
so on. As several have pointed out, the complexity of the passes
themselves is very low, and in all cases they are intended to *reduce* the
complexity of the IR surface exposed to consumers of the bitcode. We
believe, as others on this list have expressed, that a smaller IR surface
to understand and implement enables new users of the LLVM infrastructure.

Second, there are teams already using LLVM today for non-traditional
applications outside the tree, in products that their respective
companies support. For instance, LLVM is already being used in efforts
to make native code on the web a reality. To plug the idea we're all
pursuing there: bringing native code to the web increases the reach of
developers (reminder: there are now over a billion web browser users out
there, and the number is growing fast) and has demonstrated advantages
for deployment and update. To name one of the projects pursuing this,
Mozilla's Emscripten effort has a JavaScript backend in very active
development, and Alon has stated here that he would like to upstream his
work eventually. And of course, my own team has been using LLVM since the
beginning and has released PNaCl, based on LLVM bitcode, as a feature
available in Chrome.

Third, an objection was raised that, if these simplifications are useful,
they should be used by in-tree backends. As someone noted, some of the
transformations are more like legalizations for backends, and I would
imagine that they would be more useful for some backends than for others
(e.g., GEP simplifications may be a better match for a RISC architecture
like MIPS than for x86). I propose we get specific about the passes and
see which backend maintainers think they might profit from using them.
But even if other in-tree backends don't want to use the passes, there
seems to be a class of backend, not yet in the tree, that does want to
use similar transformations. And simplifying the IR in these ways will
produce more such users.
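As a rough illustration of what a GEP simplification computes, the sketch below folds a getelementptr-style index list into a single byte offset from the base pointer; after such a rewrite the address is just a base plus explicit multiply/add arithmetic. This is a toy model, not PNaCl's pass or LLVM's API: the type descriptors and the helpers `sizeOf` and `gepOffset` are hypothetical, and struct fields are assumed to be packed with no alignment padding, unlike a real DataLayout.

```javascript
// Hypothetical toy type descriptors (not LLVM's API). Sizes are in bytes, and
// struct fields are assumed to be packed with no alignment padding.
function sizeOf(ty) {
  switch (ty.kind) {
    case "int":
      return ty.bytes;
    case "array":
      return ty.count * sizeOf(ty.elem);
    case "struct":
      return ty.fields.reduce((sum, f) => sum + sizeOf(f), 0);
    default:
      throw new Error("unknown type kind: " + ty.kind);
  }
}

// Fold a GEP-style index list into one byte offset from the base pointer.
function gepOffset(ty, indices) {
  let offset = indices[0] * sizeOf(ty); // first index steps over whole objects
  let cur = ty;
  for (const idx of indices.slice(1)) {
    if (cur.kind === "array") {
      offset += idx * sizeOf(cur.elem); // scale the index by the element size
      cur = cur.elem;
    } else if (cur.kind === "struct") {
      for (let i = 0; i < idx; i++) offset += sizeOf(cur.fields[i]); // skip earlier fields
      cur = cur.fields[idx];
    } else {
      throw new Error("cannot index into " + cur.kind);
    }
  }
  return offset;
}
```

For a struct `{ i32, [4 x i16] }`, the index list `[0, 1, 2]` folds to `0 * 12 + 4 + 2 * 2 = 8` bytes.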

Fourth, an objection came up that these passes were similar to ones that
used to be in LLVM, but were removed.  Sometimes ideas just have their
right time, and we seem to have found more users than just the C backend.

Last, some have objected that the passes bake in C/C++/ELF runtime library
conventions, such as init/fini/ctor/dtor processing and libc
startup/teardown. The passes Mark described are not monolithic, and we
would be happy to share any or all of them individually, leaving out
those whose transformations aren't deemed interesting.

Now to the steps ahead: I propose Mark sends some patches for the simpler
passes out to let you, our esteemed colleagues, discuss them concretely.
As several have noted, these patches could be interesting to them, and it
seems reasonable to pass them along here for the potential benefit of those
folks. Also as Alon noted, it would be nice to have this underway before he
comes back with his backend work. Your thoughts are respectfully solicited.

Cheers,

David

On Wed, Mar 5, 2014 at 4:53 PM, James Courtier-Dutton <james.dutton at gmail.com> wrote:
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chris Lattner

2014-Mar-06 23:50 UTC


[LLVMdev] Upstreaming PNaCl's IR simplification passes

On Mar 6, 2014, at 10:23 AM, David Sehr <sehr at google.com> wrote:
> Now to the steps ahead: I propose Mark sends some patches for the simpler
> passes out to let you, our esteemed colleagues, discuss them concretely.
> As several have noted, these patches could be interesting to them, and it
> seems reasonable to pass them along here for the potential benefit of those
> folks. Also as Alon noted, it would be nice to have this underway before he
> comes back with his backend work. Your thoughts are respectfully solicited.

I agree; I don't think an abstract discussion is useful here. Let's talk
details.

One point, though: I don't understand the claims that this will make it
easier to write a backend. The LLVM backend infrastructure already handles the
lowering of constantexprs and other constructs for a target. AFAIK, these
sorts of lowering passes would only help someone not using our target.
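For readers weighing this point, here is a toy sketch of what an IR-level constant-expression expansion does: nested constant expressions are flattened into a sequence of ordinary instructions, so a consumer that bypasses the backend's own lowering never sees nested constantexprs. The object representation, the helpers `flatten` and `expandConstantExpr`, and the untyped pseudo-IR strings they emit are all hypothetical; this is neither LLVM's API nor PNaCl's actual pass.

```javascript
// Hypothetical toy representation (not LLVM's API): a constant expression is a
// leaf ({ kind: "int", value } or { kind: "global", name }) or an operation
// ({ kind: "op", opcode, operands }) whose operands may themselves be nested.

// Flatten `expr` into `out`, returning the name of the value that holds it.
function flatten(expr, out, counter) {
  if (expr.kind === "int") return String(expr.value);
  if (expr.kind === "global") return "@" + expr.name;
  // Expand operands first so their results are defined before use.
  const args = expr.operands.map((e) => flatten(e, out, counter));
  const name = "%t" + counter.n++;
  out.push(`${name} = ${expr.opcode} ${args.join(", ")}`);
  return name;
}

// Entry point: returns the final value name plus the emitted instructions.
function expandConstantExpr(expr) {
  const instructions = [];
  const result = flatten(expr, instructions, { n: 0 });
  return { result, instructions };
}
```

For instance, the nested expression `add(ptrtoint(@g), 4)` becomes the two instructions `%t0 = ptrtoint @g` and `%t1 = add %t0, 4`, with `%t1` holding the result.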

Supporting PNaCl and Emscripten is still a worthwhile goal if the maintenance
complexity is balanced right; I just have never understood the point about
simplifying target descriptions.

-Chris
