thr3ads.net - llvm dev - [LLVMdev] RFC: Machine Level IR text-based serialization format [Apr 2015]


Alex L

2015-Apr-28 18:18 UTC

[LLVMdev] RFC: Machine Level IR text-based serialization format

2015-04-28 10:15 GMT-07:00 Hal Finkel <hfinkel at anl.gov>:
>
> ------------------------------
>
> From: "Alex L" <arphaman at gmail.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, April 28, 2015 11:56:42 AM
> Subject: [LLVMdev] RFC: Machine Level IR text-based serialization format
>
> Hi all,
>
> I would like to propose a text-based, human-readable format that will be
> used to serialize the machine level IR. The major goal of this format is to
> allow LLVM to save the machine level IR after any code generation pass and
> then to load it again and continue running passes on the machine level IR.
> The primary use case of this format is to make testing the code generation
> passes easier, by allowing the developers to write tests that load the IR,
> invoke just a specific code gen pass, and then inspect the output of that
> pass by checking the printed out IR.
>
> The proposed format has a number of key features:
>
> - It stores the machine level IR and the optional LLVM IR in one text file.
> - The connections between the machine level IR and the LLVM IR are preserved.
> - The format uses a YAML based container for most of the data structures.
>   The LLVM IR is embedded in the YAML container.
> - The format also uses a new, text-based syntax to serialize the machine
>   instructions. The instructions are embedded in YAML.
>
> This is an incomplete example of a YAML file containing the LLVM IR, the
> machine level IR and the instructions:
>
> ---
> ir: |
>   define i32 @fact(i32 %n) {
>     %1 = alloca i32, align 4
>     store i32 %n, i32* %1, align 4
>     %2 = load i32, i32* %1, align 4
>     %3 = icmp eq i32 %2, 0
>     br i1 %3, label %10, label %4
>
>   ; <label>:4                                       ; preds = %0
>     %5 = load i32, i32* %1, align 4
>     %6 = sub nsw i32 %5, 1
>     %7 = call i32 @fact(i32 %6)
>     %8 = load i32, i32* %1, align 4
>     %9 = mul nsw i32 %7, %8
>     br label %10
>
>   ; <label>:10                                      ; preds = %0, %4
>     %11 = phi i32 [ %9, %4 ], [ 1, %0 ]
>     ret i32 %11
>   }
>
> ...
> ---
> number:          0
> name:            fact
> alignment:       4
> regInfo:
>   ....
> frameInfo:
>   ....
> body:
>   - bb:              0
>     llbb:            '%0'
>     successors:      [ 'bb#2', 'bb#1' ]
>     liveIns:         [ '%edi' ]
>     instructions:
>       - 'push64r undef %rax, %rsp, %rsp'
>       - 'mov32mr %rsp, 1, %noreg, 4, %noreg, %edi'
>
> Hi Alex,
>
> I think this looks promising. What are the 1 and 4 above? How are you
> proposing to serialize operand flags (dead, etc.)?
>
>  -Hal
>
Hi Hal,

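The 1 and 4 above are constants that are specific to x86 memory addressing:
the memory reference is written as base, scale, index, displacement, segment,
so '%rsp, 1, %noreg, 4, %noreg' means scale 1 and displacement 4 with no index
register. I believe this basically computes the address RSP + 1 * 0 + 4.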
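To make the address arithmetic concrete, here is a small Python sketch (not part of the proposal). It relies on LLVM's x86 convention of five machine operands per memory reference (base, scale, index, displacement, segment), as in the operand list '%rsp, 1, %noreg, 4, %noreg' from the example. The function name and the register-value dictionary are hypothetical, and the segment operand is simply ignored (a flat address space is assumed).

```python
def effective_address(operands, regs):
    """Evaluate an x86 memory reference given as its five machine operands.

    operands: [base, scale, index, displacement, segment] as strings,
              e.g. ['%rsp', '1', '%noreg', '4', '%noreg']
    regs:     hypothetical map from register names (without '%') to integer values
    """
    base, scale, index, disp, _segment = operands  # segment ignored: flat memory assumed

    def reg_value(name):
        # '%noreg' marks an unused register slot and contributes 0.
        return 0 if name == "%noreg" else regs[name.lstrip("%")]

    return reg_value(base) + int(scale) * reg_value(index) + int(disp)


ops = [s.strip() for s in "%rsp, 1, %noreg, 4, %noreg".split(",")]
print(effective_address(ops, {"rsp": 0x1000}))  # RSP + 1 * 0 + 4, prints 4100
```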
I haven't settled on a final version of the operand flags syntax (for
registers), but at the moment I'm thinking of something like this:
- The IsDef flag is implied by the use of the register before the '=',
  unless it's implicit.
- TiedTo and IsEarlyClobber aren't serialized, as they are defined by the
  instruction description. (I believe that's true in all cases, but I'm not
  100% sure.)
- IsUndef, IsImp, IsKill, IsDead, IsInternalRead, IsDebug: keywords like
  'implicit', 'undef', 'kill' and 'dead' are used before the register, e.g.
  'undef %rax', 'implicit-def kill %eflags'.
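The flag rules above can be illustrated with a minimal Python parser sketch. It assumes the tentative syntax described here: registers before '=' are definitions, flag keywords precede the register they apply to, and operands are comma-separated. The keyword set is limited to the ones mentioned ('implicit', 'implicit-def', 'undef', 'kill', 'dead'); the class and function names are hypothetical and nothing here reflects LLVM's actual parser.

```python
from dataclasses import dataclass, field

# Flag keywords taken from the message above; the final list is still open.
FLAG_KEYWORDS = {"implicit", "implicit-def", "undef", "kill", "dead"}


@dataclass
class Operand:
    text: str                                  # register such as '%rax', or an immediate such as '4'
    is_def: bool = False                       # IsDef: written before '=' or marked 'implicit-def'
    flags: set = field(default_factory=set)    # keywords written before the register


@dataclass
class Instr:
    opcode: str
    operands: list                             # explicit defs first, then the remaining operands


def parse_operand(token, is_def=False):
    """Parse e.g. 'implicit-def kill %eflags' into an Operand."""
    words = token.split()
    flags = set(words[:-1])
    unknown = flags - FLAG_KEYWORDS
    if unknown:
        raise ValueError(f"unknown operand flag(s): {sorted(unknown)}")
    return Operand(words[-1], is_def or "implicit-def" in flags, flags)


def parse_instr(line):
    """Parse '<defs> = <opcode> <operands>' or '<opcode> <operands>'."""
    defs = []
    if "=" in line:
        lhs, line = line.split("=", 1)
        defs = [parse_operand(t, is_def=True) for t in lhs.split(",") if t.strip()]
    opcode, _, rest = line.strip().partition(" ")
    uses = [parse_operand(t) for t in rest.split(",") if t.strip()]
    return Instr(opcode, defs + uses)


instr = parse_instr("%edi = mov32rm %rsp, 1, %noreg, 4, %noreg")
print(instr.opcode, [op.text for op in instr.operands if op.is_def])  # prints: mov32rm ['%edi']
```

Rejecting unknown keywords, rather than silently treating them as register names, keeps typos in hand-written tests from being misread.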

I don't have a syntax for the SubReg_TargetFlags at the moment.

Alex

>
>       - ....
>         ....
>   - bb:              1
>     llbb:            '%4'
>     successors:      [ 'bb#2' ]
>     instructions:
>       - '%edi = mov32rm %rsp, 1, %noreg, 4, %noreg'
>       - ....
>         ....
>   - ....
>     ....
> ...
>
> The example above shows a YAML file with two YAML documents (delimited by
> `---` and `...`) containing the LLVM IR and the machine function information
> for the function `fact`.
>
> When a specific format is chosen, I'll start with patches that serialize the
> embedded LLVM IR. Then I'll add support for things like machine functions and
> machine basic blocks, and I think that an intrusive implementation will work
> best for data structures like these. After that I will continue adding
> support for serialization of the remaining data structures.
>
> Thanks for reading through the proposal. What are your thoughts about this
> format?
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
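The proposal puts the LLVM IR and the machine function into two YAML documents in one file, delimited by `---` and `...`. The Python sketch below shows that layout by splitting such a stream naively on those marker lines. It is hypothetical and not how the proposal is implemented: it assumes the markers stand alone at column 0 and ignores the rest of YAML, so a real implementation should use a full YAML parser. The IR body in the sample stream is replaced by a placeholder.

```python
def split_yaml_documents(text):
    """Naively split a YAML stream into documents on '---' / '...' lines.

    Assumes the markers stand alone at column 0, so indented content such as
    the embedded LLVM IR block scalar is never mistaken for a marker.
    """
    docs, current = [], None
    for line in text.splitlines():
        if line.rstrip() == "---":          # start of a document (ends any open one)
            if current is not None:
                docs.append("\n".join(current))
            current = []
        elif line.rstrip() == "...":        # explicit end of a document
            if current is not None:
                docs.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:                 # stream ended without '...'
        docs.append("\n".join(current))
    return docs


STREAM = """\
---
ir: |
  define i32 @fact(i32 %n) {
    ret i32 1
  }
...
---
name:            fact
alignment:       4
...
"""

ir_doc, mf_doc = split_yaml_documents(STREAM)
print(mf_doc.splitlines()[0].split())  # prints: ['name:', 'fact']
```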

Bevin Hansson

2015-Apr-28 21:08 UTC


[LLVMdev] RFC: Machine Level IR text-based serialization format

On 2015-04-28 20:18, Alex L wrote:
> 2015-04-28 10:15 GMT-07:00 Hal Finkel <hfinkel at anl.gov>:
> Hi Alex,
>
> I think this looks promising. What are the 1 and 4 above? How are you
> proposing to serialize operand flags (dead, etc.)?
>
>  -Hal
>
>
> Hi Hal,
>
> The 1 and 4 above are constants that are specific to x86 memory addressing,
> I believe they basically compute the address RSP + 1 * 0 + 4.
> I haven't settled on a final version of the operand flags (for registers)
> syntax, but at the moment I'm thinking of something like this:
> - The IsDef flag is implied by the use of the register before the '=',
>   unless it's implicit.
> - TiedTo and IsEarlyClobber aren't serialized, as they are defined by the
>   instruction description. (I believe that's true in all cases, but I'm not
>   100% sure.)
> - IsUndef, IsImp, IsKill, IsDead, IsInternalRead, IsDebug: keywords like
>   'implicit', 'undef', 'kill' and 'dead' are used before the register, e.g.
>   'undef %rax', 'implicit-def kill %eflags'.
>
> I don't have a syntax for the SubReg_TargetFlags at the moment.
> 
Since the instruction format is partially based on the machine dump format,
you could use something similar to that, like '%reg:subreg'.

On a tangential note, IIRC the machine dumps store the virtual register
information (register class, mainly) in-band at the end of the instruction.
Based on the format you described, I'm assuming this is what would be stored
out-of-band in 'regInfo'.

/ Bevin
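Bevin's two suggestions can be illustrated with a short Python sketch: the sub-register index is written after a colon, and the register class lives in a separate 'regInfo' table instead of at the end of every instruction. The names '%vreg1', 'sub_32bit' and 'GR64' are only illustrative (sub_32bit and GR64 are x86 names in LLVM, but no format was agreed in this thread), and the dictionary shape of 'regInfo' is an assumption.

```python
def parse_reg_ref(token):
    """Split a '%reg:subreg' reference into (register, subregister index).

    The subregister index is None when the ':subreg' suffix is absent.
    """
    reg, sep, subreg = token.partition(":")
    if sep and not subreg:
        raise ValueError(f"missing subregister index in {token!r}")
    return reg, (subreg if sep else None)


# Hypothetical out-of-band 'regInfo' entry: virtual register -> register class.
REG_INFO = {"%vreg1": "GR64"}

reg, sub = parse_reg_ref("%vreg1:sub_32bit")
print(reg, sub, REG_INFO[reg])  # prints: %vreg1 sub_32bit GR64
```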

Matthias Braun

2015-Apr-28 23:26 UTC


[LLVMdev] RFC: Machine Level IR text-based serialization format

To get this out first: I'd love to have a way to serialize machine-IR! I
often spend a lot of time trying to create .ll files in a way that the
machine-IR still looks a certain way when it finally hits the relevant passes in
codegen. It would be so much easier to just specify the machine IR immediately
before the pass I'm interested in.

For that use case it is worth keeping the following things in mind:
- Please try to keep the output of the various dump functions, esp.
  MachineInstr::dump(), MachineOperand::dump() and MachineBasicBlock::dump(),
  as close as possible to the format you use for serializing. It would be
  unnecessarily confusing if what the dump()s show while I debug differed from
  what I can read in a text file. Having said that, you don't necessarily have
  to change your serialization format to be like the dump() functions; you may
  just as well adjust the dump() functions. Just avoid them being different
  without reason. I can also imagine that the serialization shows a bit less
  information in cases where the information is obvious in a serialization
  context but not when dump()ing a piece in isolation.
- Design the format in a way that makes it easy for humans to create it. If the
  only way to produce these files reliably is by dumping existing machine-IR, I
  will have a hard time designing minimal and easy-to-understand test cases. By
  that I mostly mean the possibility of leaving out information that can be
  inferred or guessed, so the resulting test is compact and shows what it is
  about. Just looking at your example below, there is a lot of information that
  is redundant or could be filled in by sensible defaults: the function
  "number", the basic block number, the predecessors and successors of a basic
  block, and maybe even the LLVM IR (though CodeGen probably doesn't allow
  leaving it out at the moment).
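As a sketch of how some of these defaults could be filled in, the Python function below assigns block numbers from the block's position in 'body' and derives predecessor lists from the 'successors' lists, using the field names and 'bb#N' references from the proposed YAML. It is hypothetical; inferring successors as well would require inspecting terminator instructions, which it does not attempt. The sample input is modelled on the example, whose first two blocks branch to bb#1 and bb#2.

```python
def fill_defaults(blocks):
    """Fill in block numbers and predecessor lists for a list of block dicts.

    Each dict may carry 'bb' (block number) and 'successors' (['bb#N', ...]),
    mirroring the fields in the proposed YAML. Missing numbers default to the
    block's position; predecessors are derived from the successor lists.
    """
    for position, block in enumerate(blocks):
        block.setdefault("bb", position)
    preds = {block["bb"]: [] for block in blocks}
    for block in blocks:
        for succ in block.get("successors", []):
            if not succ.startswith("bb#"):
                raise ValueError(f"unexpected successor reference {succ!r}")
            preds[int(succ[len("bb#"):])].append(block["bb"])
    for block in blocks:
        block["predecessors"] = preds[block["bb"]]
    return blocks


# Blocks modelled on the example, with block numbers omitted.
body = [{"successors": ["bb#2", "bb#1"]}, {"successors": ["bb#2"]}, {}]
for block in fill_defaults(body):
    print(block["bb"], block["predecessors"])
# prints:
# 0 []
# 1 [0]
# 2 [0, 1]
```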

- Matthias
> On Apr 28, 2015, at 2:08 PM, Bevin Hansson <bevinh at sics.se> wrote:
>
> On 2015-04-28 20:18, Alex L wrote:
>> 2015-04-28 10:15 GMT-07:00 Hal Finkel <hfinkel at anl.gov>:
>> Hi Alex,
>> I think this looks promising. What are the 1 and 4 above? How are you
>> proposing to serialize operand flags (dead, etc.)?
>> -Hal
>> Hi Hal,
>> The 1 and 4 above are constants that are specific to x86 memory addressing,
>> I believe they basically compute the address RSP + 1 * 0 + 4.
>> I haven't settled on a final version of the operand flags (for registers)
>> syntax, but at the moment I'm thinking of something like this:
>> - The IsDef flag is implied by the use of the register before the '=',
>>   unless it's implicit.
>> - TiedTo and IsEarlyClobber aren't serialized, as they are defined by the
>>   instruction description. (I believe that's true in all cases, but I'm not
>>   100% sure.)
>> - IsUndef, IsImp, IsKill, IsDead, IsInternalRead, IsDebug: keywords like
>>   'implicit', 'undef', 'kill' and 'dead' are used before the register, e.g.
>>   'undef %rax', 'implicit-def kill %eflags'.
>> I don't have a syntax for the SubReg_TargetFlags at the moment.
>
> Since the instruction format is partially based on the machine dump format,
> you could use something similar to that, like '%reg:subreg'.
>
> On a tangential note, IIRC the machine dumps store the virtual register
> information (register class, mainly) in-band at the end of the instruction.
> Based on the format you described, I'm assuming this is what would be stored
> out-of-band in 'regInfo'.
>
> / Bevin
