thr3ads.net - llvm dev - [LLVMdev] RFC: Machine Level IR text-based serialization format [Apr 2015]

Matthias Braun

2015-Apr-28 23:26 UTC

[LLVMdev] RFC: Machine Level IR text-based serialization format

To get this out first: I'd love to have a way to serialize machine-IR! I
often spend a lot of time trying to create .ll files in a way that the
machine-IR still looks a certain way when it finally hits the relevant passes in
codegen. It would be so much easier to just specify the machine IR immediately
before the pass I'm interested in.

For that use case it is worth keeping the following things in mind:
- Please try to keep the output of the various dump functions, esp.
MachineInstr::dump(), MachineOperand::dump(), MachineBasicBlock::dump() as close
as possible to the format you use for serializing. It would be unnecessarily
confusing if the dump()s I see while debugging differed from what I can read in a
text file. Having said that, you don't necessarily have to change your
serialization format to be like the dump() functions; you may just as well
adjust the dump() functions - just avoid them being different without reason. I
can also imagine the serialization showing a bit less information in cases
where that information is obvious in a serialization context but not when
dump()ing a piece in isolation.
- Design the format in a way that makes it easy for humans to create it. If the
only way to produce these files reliably is by dumping existing machine-ir I
will have a hard time designing minimal and easy to understand testcases. By
that I mean mostly the possibility to leave out information that can be inferred
or guessed, so the resulting test is compact and shows what it is about. Just
looking at your example below there is a lot of information that is redundant or
which could be filled in by sensible defaults: the function "number",
the basic block number, predecessors and successors of a basic block, maybe
allowing to leave out the llvm IR (though that probably is not allowed by
CodeGen at the moment).
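
As an illustration of the kind of inference meant here, the following is a minimal Python sketch (not LLVM code) that derives the predecessor lists of basic blocks from their successor lists alone, so a hand-written test would only need to state the successors. The function name `infer_predecessors` and the dictionary layout are assumptions made for this example; they do not come from the proposal.

```python
def infer_predecessors(successors):
    """Derive predecessor lists from successor lists.

    `successors` maps a block name to the list of block names it can
    branch to. This layout is hypothetical, chosen for illustration.
    """
    predecessors = {name: [] for name in successors}
    for name, succs in successors.items():
        for succ in succs:
            if succ not in predecessors:
                raise ValueError(f"block {name!r} names unknown successor {succ!r}")
            # Record each edge once, even if it is listed twice.
            if name not in predecessors[succ]:
                predecessors[succ].append(name)
    return predecessors


# A small loop: entry -> loop, loop -> loop or exit.
preds = infer_predecessors({
    "entry": ["loop"],
    "loop": ["loop", "exit"],
    "exit": [],
})
```

Because every edge is already stated on the successor side, writing the predecessors out by hand adds nothing except a chance for the two lists to disagree.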

- Matthias
> On Apr 28, 2015, at 2:08 PM, Bevin Hansson <bevinh at sics.se> wrote:
> 
> On 2015-04-28 20:18, Alex L wrote:
>> 2015-04-28 10:15 GMT-07:00 Hal Finkel <hfinkel at anl.gov>:
>> Hi Alex,
>> I think this looks promising. What are the 1 and 4 above? How are you
>> proposing to serialize operand flags (dead, etc.)?
>> -Hal
>> Hi Hal,
>> The 1 and 4 above are constants that are specific to x86 memory addressing,
>> I believe they basically compute the address RSP + 1 * 0 + 4.
>> I haven't settled on a final version of the operand flags (for registers)
>> syntax, but at the moment I'm thinking of something like this:
>> - The IsDef flag is implied by the use of the register before the '=',
>> unless it's implicit.
>> - TiedTo and IsEarlyClobber aren't serialized, as they are defined by
>> the instruction description. (I believe that's true in all cases, but I'm
>> not 100% sure).
>> - IsUndef, IsImp, IsKill, IsDead, IsInternalRead, IsDebug - keywords like
>> 'implicit', 'undef', 'kill', 'dead' are used before the register e.g.
>> 'undef %rax', 'implicit-def kill %eflags'.
>> I don't have a syntax for the SubReg_TargetFlags at the moment.
> 
> Since the instruction format is partially based on the machine dump format,
> you could use something similar to that, like '%reg:subreg'.
> 
> On a tangential note, IIRC the machine dumps store the virtual register
> information (register class, mainly) in-band at the end of the instruction.
> Based on the format you described, I'm assuming this is what would be stored
> out-of-band in 'regInfo'.
> 
> / Bevin
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
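
To make the operand-flag syntax quoted above more concrete, here is a rough Python sketch of how a register operand such as 'undef %rax' or 'implicit-def kill %eflags' could be split into flags and a register name. It is only an illustration under assumptions: the keyword set is limited to the keywords that appear in the quoted examples (the thread says the syntax was not final), the '%reg:subreg' form follows Bevin's suggestion, and `parse_register_operand` is a hypothetical name, not an LLVM API.

```python
# Keywords taken from the examples in the thread; the real set was not settled.
FLAG_KEYWORDS = {"implicit", "implicit-def", "undef", "kill", "dead"}


def parse_register_operand(text, before_equals=False):
    """Parse e.g. 'implicit-def kill %eflags' into its parts.

    `before_equals` says whether the operand appeared left of the '=',
    which per the proposal implies a (non-implicit) definition.
    """
    *flag_tokens, reg_token = text.split()
    if not reg_token.startswith("%"):
        raise ValueError(f"expected a register like '%rax', got {reg_token!r}")
    unknown = [tok for tok in flag_tokens if tok not in FLAG_KEYWORDS]
    if unknown:
        raise ValueError(f"unknown operand flags: {unknown}")
    # Optional sub-register index, written as '%reg:subreg'.
    name, _, subreg = reg_token[1:].partition(":")
    flags = set(flag_tokens)
    return {
        "reg": name,
        "subreg": subreg or None,
        "flags": flags,
        "is_def": before_equals or "implicit-def" in flags,
    }


use = parse_register_operand("undef %rax")
clobber = parse_register_operand("implicit-def kill %eflags")
sub = parse_register_operand("%reg:subreg", before_equals=True)
```

Note that TiedTo and IsEarlyClobber do not appear anywhere in the sketch, matching the idea that they come from the instruction description rather than from the text.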

Alex L

2015-Apr-29 00:13 UTC

[LLVMdev] RFC: Machine Level IR text-based serialization format

2015-04-28 16:26 GMT-07:00 Matthias Braun <matze at braunis.de>:
> To get this out first: I'd love to have a way to serialize machine-IR! I
> often spend a lot of time trying to create .ll files in a way that the
> machine-IR still looks a certain way when it finally hits the relevant
> passes in codegen. It would be so much easier to just specify the machine
> IR immediately before the pass I'm interested in.
>
> For that use case it is worth keeping the following things in mind:
> - Please try to keep the output of the various dump functions, esp.
> MachineInstr::dump(), MachineOperand::dump(), MachineBasicBlock::dump() as
> close as possible to the format you use for serializing. It would be
> unnecessarily confusing if the dump()s I see while debugging differed from
> what I can read in a text file. Having said that, you don't necessarily have
> to change your serialization format to be like the dump() functions; you may
> just as well adjust the dump() functions - just avoid them being different
> without reason. I can also imagine the serialization showing a bit less
> information in cases where that information is obvious in a
> serialization context but not when dump()ing a piece in isolation.
>
Ideally the new syntax would replace the existing print/dump syntax. The
new syntax will omit certain information when it can be inferred (e.g. the
TiedTo and IsEarlyClobber attributes for register operands that I mentioned
earlier in this thread), so maybe we could have some sort of verbose dumping
option where absolutely everything is dumped.
The syntax does try to stay similar to the current format, but at the
same time it tries to be friendlier to both parsers and humans.

> - Design the format in a way that makes it easy for humans to create it.
> If the only way to produce these files reliably is by dumping existing
> machine-ir I will have a hard time designing minimal and easy to understand
> testcases. By that I mean mostly the possibility to leave out information
> that can be inferred or guessed, so the resulting test is compact and shows
> what it is about. Just looking at your example below there is a lot of
> information that is redundant or which could be filled in by sensible
> defaults: the function "number", the basic block number, predecessors and
> successors of a basic block, maybe allowing to leave out the llvm IR
> (though that probably is not allowed by CodeGen at the moment).

I agree, one of my goals is to make it minimal and leave out things where it
makes sense to do so. I plan on making a lot of the YAML attributes optional,
so that the user won't necessarily have to specify them; the parser will set
them to predetermined default values or try to infer them. I will present my
plans for the optional attributes of each data structure when I send out the
patches that serialize it, so feel free to check them out.

The LLVM IR is optional by the way, but a lot of passes will probably crash
if you don't include it ;)
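
The optional-attribute idea can be sketched as follows in Python. This is only a rough model of "set predetermined defaults or infer them", not the planned parser: the attribute names (`name`, `number`, `instructions`, `isLandingPad`) and the helper `fill_block_defaults` are assumptions invented for the example, and the blocks are shown as already-parsed dictionaries rather than YAML text.

```python
import copy

# Hypothetical defaults for attributes a test author may leave out.
BLOCK_DEFAULTS = {"instructions": [], "isLandingPad": False}


def fill_block_defaults(blocks):
    """Return new block dicts with omitted attributes filled in.

    `blocks` is a list of dicts in layout order. A missing 'number'
    is inferred from the block's position; other missing attributes
    take the values in BLOCK_DEFAULTS.
    """
    filled_blocks = []
    for index, block in enumerate(blocks):
        if "name" not in block:
            raise ValueError(f"block at position {index} has no name")
        # Deep-copy so blocks never share one mutable default list.
        filled = copy.deepcopy(BLOCK_DEFAULTS)
        filled["number"] = index
        filled.update(block)  # explicit attributes win over defaults
        filled_blocks.append(filled)
    return filled_blocks


function_blocks = fill_block_defaults([
    {"name": "entry"},
    {"name": "exit", "number": 7, "instructions": ["RETQ"]},
])
```

Here the first block is written with just a name and picks up everything else, while the second keeps the values its author spelled out.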


Krzysztof Parzyszek

2015-Apr-29 13:40 UTC

[LLVMdev] RFC: Machine Level IR text-based serialization format

On 4/28/2015 7:13 PM, Alex L wrote:
>
> 2015-04-28 16:26 GMT-07:00 Matthias Braun <matze at braunis.de>:
>
>     For that use case it is worth keeping the following things in mind:
>     - Please try to keep the output of the various dump functions, esp.
>     MachineInstr::dump(), MachineOperand::dump(),
>     MachineBasicBlock::dump() as close as possible to the format you use
>     for serializing.
> [...]
>
> Ideally the new syntax would replace the existing print/dump syntax. The
> new syntax will lead to certain missing information when
> this information can be inferred (e.g. the TiedTo and IsEarlyClobber
> attributes for register operands that I mentioned earlier in this thread),
> so maybe we could have some sort of verbose dumping option where
> absolutely everything is dumped.

I think that the new syntax is less readable than the current format of 
the "dump" functions, and in the long term it would be better to have 
something more human-friendly.  However, using YAML has the advantage 
that it's easier to parse it than the direct output of "dump" and so it
will take less time to implement a YAML-based solution.  My concern is
that you may run out of time to complete this and the file format is not 
the most important thing in this project.  Getting it to work, if only 
as a proof of concept, would be very helpful to everyone.  Coming up 
with a fancier grammar and implementing a parser for it could be done 
later on top of the initial implementation.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation
