thr3ads.net - llvm dev - [LLVMdev] RFC: Machine Level IR text-based serialization format [Apr 2015]

Alex L

2015-Apr-28 16:56 UTC

[LLVMdev] RFC: Machine Level IR text-based serialization format

Hi all,


I would like to propose a text-based, human-readable format for serializing
the machine level IR. The major goal of this format is to allow LLVM to save
the machine level IR after any code generation pass and then to load it again
and continue running passes on it. The primary use case of this format is to
make testing the code generation passes easier, by allowing developers to
write tests that load the IR, invoke just a specific code gen pass, and then
inspect the output of that pass by checking the printed IR.



The proposed format has a number of key features:
- It stores the machine level IR and the optional LLVM IR in one text file.
- The connections between the machine level IR and the LLVM IR are preserved.
- The format uses a YAML based container for most of the data structures. The
  LLVM IR is embedded in the YAML container.
- The format also uses a new, text-based syntax to serialize the machine
  instructions. The instructions are embedded in YAML.


This is an incomplete example of a YAML file containing the LLVM IR, the
machine level IR and the instructions:


---
ir: |
  define i32 @fact(i32 %n) {
    %1 = alloca i32, align 4
    store i32 %n, i32* %1, align 4
    %2 = load i32, i32* %1, align 4
    %3 = icmp eq i32 %2, 0
    br i1 %3, label %10, label %4

  ; <label>:4                                       ; preds = %0
    %5 = load i32, i32* %1, align 4
    %6 = sub nsw i32 %5, 1
    %7 = call i32 @fact(i32 %6)
    %8 = load i32, i32* %1, align 4
    %9 = mul nsw i32 %7, %8
    br label %10

  ; <label>:10                                      ; preds = %0, %4
    %11 = phi i32 [ %9, %4 ], [ 1, %0 ]
    ret i32 %11
  }

...
---
number:          0
name:            fact
alignment:       4
regInfo:
  ....
frameInfo:
  ....
body:
  - bb:              0
    llbb:            '%0'
    successors:      [ 'bb#2', 'bb#1' ]
    liveIns:         [ '%edi' ]
    instructions:
      - 'push64r undef %rax, %rsp, %rsp'
      - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'
      - ....
        ....
  - bb:              1
    llbb:            '%4'
    successors:      [ 'bb#2' ]
    instructions:
      - '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg'
      - ....
        ....
  - ....
    ....
...


The example above shows a YAML file with two YAML documents (delimited by `---`
and `...`) containing the LLVM IR and the machine function information for the
function `fact`.
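
As a rough illustration of the two-document layout, here is a minimal Python
sketch that splits such a stream on the `---` and `...` markers. It is not how
LLVM reads the file; the function name `split_yaml_documents` and the
shortened sample text are made up for this example, and a real implementation
would presumably rely on a proper YAML parser.

```python
def split_yaml_documents(text):
    """Split a multi-document stream on '---' (start) and '...' (end) markers."""
    documents, current, inside = [], [], False
    for line in text.splitlines():
        marker = line.rstrip()
        if marker == "---":
            # Tolerate a document that was not closed with '...'.
            if inside and current:
                documents.append("\n".join(current))
            current, inside = [], True
        elif marker == "..." and inside:
            documents.append("\n".join(current))
            current, inside = [], False
        elif inside:
            current.append(line)
    if inside and current:
        documents.append("\n".join(current))
    return documents


# Shortened stand-in for the example above (the IR body is abbreviated).
sample = """---
ir: |
  define i32 @fact(i32 %n) {
    ret i32 %n
  }
...
---
number:          0
name:            fact
...
"""

docs = split_yaml_documents(sample)
for index, doc in enumerate(docs):
    print(f"document {index}: first line is {doc.splitlines()[0]!r}")
```

Only unindented marker lines are treated as delimiters, so the indented `....`
placeholders in the example are left alone.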



When a specific format is chosen, I'll start with patches that serialize the
embedded LLVM IR. Then I'll add support for things like machine functions and
machine basic blocks, and I think that an intrusive implementation will work
best for data structures like these. After that I will continue adding support
for serialization of the remaining data structures.



Thanks for reading through the proposal. What are your thoughts about
this format?

Krzysztof Parzyszek

2015-Apr-28 17:09 UTC


Looks good.  How are you planning to "assemble" the MI-level YAML 
description into the actual in-memory IR?

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Quentin Colombet

2015-Apr-28 17:14 UTC


Hi Alex,

Thanks for working on this.

Personally I would rather not have to write YAML inputs but instead rely on
what the machine dumps look like. That being said, I can live with YAML :).

More importantly, how do you plan to report syntax errors to the users?
Things like invalid instruction, invalid registers, etc.?
What about unallocated code, i.e., virtual registers, invalid SSA form, etc.?

Cheers,
Q.

Hal Finkel

2015-Apr-28 17:15 UTC


----- Original Message -----
> From: "Alex L" <arphaman at gmail.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, April 28, 2015 11:56:42 AM
> Subject: [LLVMdev] RFC: Machine Level IR text-based serialization
> format
> [...]
>     instructions:
>       - 'push64r undef %rax, %rsp, %rsp'
>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'

Hi Alex,

I think this looks promising. What are the 1 and 4 above? How are you proposing
to serialize operand flags (dead, etc.)?

-Hal 
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 

Alex L

2015-Apr-28 17:46 UTC


2015-04-28 10:09 GMT-07:00 Krzysztof Parzyszek <kparzysz at codeaurora.org>:
> Looks good.  How are you planning to "assemble" the MI-level YAML
> description into the actual in-memory IR?

I plan on developing a parser for the new text format for the machine
instructions. This parser will parse instructions, operands and memory
operands, and it will run after the machine function and the embedded LLVM IR
are parsed, so that the references to the basic blocks, constant pools, frame
indices, etc. can be resolved immediately. Each string literal in a list of
instructions in a machine basic block will be parsed using this parser, and
the results will then be assembled together into a list of instructions for
that basic block.

I hope that answers your question,
Alex.
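
The plan described in this reply can be sketched in Python as follows. This is
only an illustration under stated assumptions: the helper names
(`parse_instruction`, `parse_block`, `resolve_successors`) and the dictionary
layout are hypothetical, the splitting is far simpler than a real parser for
the proposed syntax would be, and only the instruction strings and `bb#N`
names come from the example in the proposal.

```python
def parse_instruction(text):
    """Split '<defs> = <opcode> <operands>' into its three parts."""
    defs = []
    if " = " in text:
        lhs, text = text.split(" = ", 1)
        defs = [d.strip() for d in lhs.split(",")]
    opcode, _, rest = text.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    return {"opcode": opcode, "defs": defs, "operands": operands}


def parse_block(instruction_strings):
    """Parse every string literal of one block into an instruction list."""
    return [parse_instruction(s) for s in instruction_strings]


def resolve_successors(blocks):
    """Resolve 'bb#N' names once all blocks are known (second phase)."""
    by_name = {f"bb#{block['bb']}": block for block in blocks}
    for block in blocks:
        block["successor_blocks"] = [by_name[n] for n in block["successors"]]


# Block skeletons as they might look after the YAML container is read.
blocks = [
    {"bb": 0, "successors": ["bb#2", "bb#1"],
     "instructions": ["push64r undef %rax, %rsp, %rsp",
                      "mov32mr %rsp, 1, %noreg, 4, %noreg, %edi"]},
    {"bb": 1, "successors": ["bb#2"],
     "instructions": ["%edi = mov32rm %rsp, 1, %noreg, 4, %noreg"]},
    {"bb": 2, "successors": [], "instructions": []},
]

for block in blocks:
    block["parsed"] = parse_block(block["instructions"])
resolve_successors(blocks)
print(blocks[1]["parsed"][0])
```

Running the instruction parser only after every block exists is what lets a
reference such as `bb#2` be looked up immediately instead of being patched up
later.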


Alex L

2015-Apr-28 18:00 UTC


2015-04-28 10:14 GMT-07:00 Quentin Colombet <qcolombet at apple.com>:
> Hi Alex,
>
> Thanks for working on this.
>
> Personally I would rather not have to write YAML inputs but instead rely
> on what the machine dumps look like. That being said, I can live with
> YAML :).
>
> More importantly, how do you plan to report syntax errors to the users?
> Things like invalid instruction, invalid registers, etc.?
> What about unallocated code, i.e., virtual registers, invalid SSA form,
> etc.?
>
> Cheers,
> Q.
>
Thanks,

Unfortunately, the machine dumps are quite incomplete (and tricky to parse
too!), and thus some sort of new syntax has to be developed.
I think that a YAML based container is a good candidate for this purpose,
as it has a structured format that represents things like machine functions,
frame information, register information, target specific machine function
details, etc. in a clear and readable way.

I haven't thought about error reporting that much, as I've been mostly
working on developing the syntax and making sure that all the data structures
can be represented by it. But I believe that the errors that crop up in an
invalid machine instruction syntax, like invalid basic block references,
invalid instructions, etc., can be reported quite well, and I can rely on
already existing error reporting facilities in LLVM to help me. The more
structural errors, like missing attributes, will be handled by the YAML parser
automatically, and I might extend it to provide better/more specific error
messages. And I think that it's possible to use the machine verifier to catch
the other errors that you've mentioned.

Alex
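
To make the error-reporting idea a little more concrete, here is a small
Python sketch of a diagnostic that points at the offending token inside an
instruction string. Everything in it is an assumption made for illustration:
the `MIParseError` class, the `check_registers` helper, and the tiny
`KNOWN_REGISTERS` set are hypothetical and are not LLVM's error reporting
facilities.

```python
import re


class MIParseError(Exception):
    """Carries a message plus a caret line that marks the error column."""

    def __init__(self, message, text, column):
        super().__init__(f"{message}\n  {text}\n  {' ' * column}^")
        self.column = column


# Hypothetical subset of register names, just enough for the example.
KNOWN_REGISTERS = {"%rax", "%rsp", "%edi", "%eflags", "%noreg"}


def check_registers(text):
    """Raise MIParseError at the first register name that is not known."""
    for match in re.finditer(r"%\w+", text):
        if match.group() not in KNOWN_REGISTERS:
            raise MIParseError(
                f"unknown register '{match.group()}'", text, match.start())


check_registers("push64r undef %rax, %rsp, %rsp")  # accepted silently

try:
    check_registers("%edi = mov32rm %rpx, 1, %noreg, 4, %noreg")
except MIParseError as error:
    print(error)
```

Keeping the column with the error means a test author sees which operand of
the string was rejected, not just which list entry.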



Bevin Hansson

2015-Apr-28 18:09 UTC


On 2015-04-28 19:14, Quentin Colombet wrote:
> Personally I would rather not have to write YAML inputs but instead
> rely on what the machine dumps look like. That being said, I can
> live with YAML :).
> 
YAML is what is suggested in the FIXME for the textual Machine IR, so
that might be the motivation behind Alex's choice.

I sort of agree that it could be better to go with a "proprietary"
format based off of the dumps. This means that a dedicated Machine
IR parser could be implemented for the purposes of library users who
want to open the files. I also think that the dumps are much easier
to diff and read.

There are parts of the suggested YAML format that seem to require some
parsing anyway, like the instruction strings. If YAML is going to be used,
I think it would be better to let the instructions be encoded in YAML
instead of leaving them as a string, if that makes sense.

/ Bevin

Alex L

2015-Apr-28 18:18 UTC


2015-04-28 10:15 GMT-07:00 Hal Finkel <hfinkel at anl.gov>:
> [...]
>     instructions:
>       - 'push64r undef %rax, %rsp, %rsp'
>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'
>
> Hi Alex,
>
> I think this looks promising. What are the 1 and 4 above? How are you
> proposing to serialize operand flags (dead, etc.)?
>
>  -Hal
>
Hi Hal,

The 1 and 4 above are constants that are specific to x86 memory addressing;
I believe they basically compute the address RSP + 1 * 0 + 4.
I haven't settled on a final version of the operand flags (for registers)
syntax, but at the moment I'm thinking of something like this:
- The IsDef flag is implied by the use of the register before the '=',
  unless it's implicit.
- TiedTo and IsEarlyClobber aren't serialized, as they are defined by
  the instruction description. (I believe that's true in all cases, but I'm
  not 100% sure).
- IsUndef, IsImp, IsKill, IsDead, IsInternalRead, IsDebug - keywords like
  'implicit', 'undef', 'kill', 'dead' are used before the register, e.g.
  'undef %rax', 'implicit-def kill %eflags'.
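
The keyword-before-register idea in the list above could be read back roughly
like this. It is a sketch of a syntax that is explicitly not final: the helper
name `parse_register_operand` is hypothetical, and the keyword set contains
only the spellings that appear in this message.

```python
# Keywords taken from the examples in this message; the final set may differ.
FLAG_KEYWORDS = {"implicit", "implicit-def", "undef", "kill", "dead"}


def parse_register_operand(text):
    """Split 'flag flag %reg' into the register name and its flag keywords."""
    *flags, register = text.split()
    unknown = [flag for flag in flags if flag not in FLAG_KEYWORDS]
    if unknown:
        raise ValueError(f"unknown operand flag(s): {', '.join(unknown)}")
    if not register.startswith("%"):
        raise ValueError(f"expected a register, got '{register}'")
    return {"register": register, "flags": set(flags)}


for operand in ("undef %rax", "implicit-def kill %eflags", "%rsp"):
    parsed = parse_register_operand(operand)
    print(parsed["register"], sorted(parsed["flags"]))
```

A register on the left of the '=' would get its IsDef flag from position
rather than from a keyword, so the sketch does not model it.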

I don't have a syntax for the SubReg_TargetFlags at the moment.

Alex
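
The address arithmetic mentioned at the top of this reply can be spelled out
with a short Python sketch. It assumes the five memory operands are ordered
base, scale, index, displacement, segment, which is how the x86 backend lays
them out as far as I know; the function name `effective_address` and the
register value used below are made up, and the segment operand is ignored.

```python
def effective_address(operands, registers):
    """Compute base + scale * index + displacement for an x86 memory operand."""
    base, scale, index, displacement, _segment = operands

    def value(register):
        # '%noreg' means "no register here" and contributes nothing.
        return 0 if register == "%noreg" else registers[register]

    return value(base) + int(scale) * value(index) + int(displacement)


# The memory operand from 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'.
memory_operand = ["%rsp", "1", "%noreg", "4", "%noreg"]
address = effective_address(memory_operand, {"%rsp": 0x1000})
print(hex(address))
```

With `%noreg` as the index, the scale of 1 multiplies zero, which matches the
"RSP + 1 * 0 + 4" reading given above.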


Adrian Prantl

2015-Apr-30 19:54 UTC


I’m really looking forward to this; it will be extremely useful for testing the
debug info backend.
For debug nodes referenced via DBG_VALUE intrinsics, it looks like they could
just point to the corresponding nodes in the optional IR.
Are there any plans to represent metadata such as the DebugLoc(ations) attached
to the machine instructions?

-- adrian

Alex L

2015-Apr-30 20:28 UTC


2015-04-30 12:54 GMT-07:00 Adrian Prantl <aprantl at apple.com>:
> I’m really looking forward to this; it will be extremely useful for
> testing the debug info backend.
> For debug nodes referenced via DBG_VALUE intrinsics, it looks like they
> could just point to the corresponding nodes in the optional IR.
> Are there any plans to represent metadata such as the DebugLoc(ations)
> attached to the machine instructions?
>
> -- adrian

Yes, the debug location that's attached to the machine instruction will be
serialized as well. I will describe how when I send out the patch that
serializes it.

Alex.