thr3ads.net - llvm dev - [LLVMdev] Proposal: extended MDString syntax [Jun 2013]


Eric Christopher

2013-Jun-26 23:18 UTC

[LLVMdev] Proposal: extended MDString syntax

On Wed, Jun 26, 2013 at 3:59 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On Jun 26, 2013, at 3:51 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> Can you suggest an alternative solution? Can you describe why you don't
> think metadata is the right container? This alone isn't really helpful at
> moving us toward something that there has been widespread agreement LLVM
> needs.
>
>
> Hi Chandler,
>
> Sure, we can talk about serializing MF.  But the discussion should focus on
> serializing MF, and not multi-line metadata support, which is only one of
> the possible solutions.  I understand the problem that Dan mentioned (that
> MF references IR), and I am sure that there are other problems that he did
> not mention. I would be happy to hear more about other solutions that you
> considered and other problems that you ran into.  Have you considered using
> a new format that embeds LLVM-IR ?
>
(Note, this is the first I've heard of this plan and just figured it out
myself)

So inverting it so that MI contains LLVM IR instead of the other way
around? Then we'd need a serialization format for MI that happened to
include a way of serializing LLVM IR within. From a quick "hey, this
seems reasonable" the idea of embedding the MI into the IR rather than
the other way around seems to make sense since we already have
code to serialize the IR.

The only other idea I've seen was an intern project that really didn't
go very far a few years ago of using *AML (one of them, I can't recall
which). I think Bob had some idea of finishing the project, but I'm
not sure where it's going.

Do you have any other ideas or some ideas as to why you'd prefer one
direction rather than the other?

-eric

Nadav Rotem

2013-Jun-26 23:25 UTC


On Jun 26, 2013, at 4:18 PM, Eric Christopher <echristo at gmail.com> wrote:
> (Note, this is the first I've heard of this plan and just figured it
> out myself)

Yes, this is also the first time I heard about this and I haven’t had a chance
to think about this problem too deeply.
> 
> So inverting it so that MI contains LLVM IR instead of the other way
> around? Then we'd need a serialization format for MI that happened to
> include a way of serializing LLVM IR within. From a quick "hey, this
> seems reasonable" the idea of embedding the MI into the IR rather than
> the other way around seems to make sense since we already have
> code to serialize the IR.
> 
> The only other idea I've seen was an intern project that really didn't
> go very far a few years ago of using *AML (one of them, I can't recall
> which). I think Bob had some idea of finishing the project, but I'm
> not sure where it's going.
> 
> Do you have any other ideas or some ideas as to why you'd prefer one
> direction rather than the other?
> 
> -eric

I think that the two alternatives that are obvious are for the MF to contain the
IR, or for the IR to contain the MF.  Alternatively, they can live in parallel
and the MF may reference the IR.  I am not sure what is the right approach here,
but my gut feeling is that metadata is not necessarily the right container for
MF.

Bob Wilson

2013-Jun-26 23:29 UTC


On Jun 26, 2013, at 4:18 PM, Eric Christopher <echristo at gmail.com> wrote:
> On Wed, Jun 26, 2013 at 3:59 PM, Nadav Rotem <nrotem at apple.com> wrote:
>> 
>> On Jun 26, 2013, at 3:51 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>> Can you suggest an alternative solution? Can you describe why you don't
>> think metadata is the right container? This alone isn't really helpful at
>> moving us toward something that there has been widespread agreement LLVM
>> needs.
>> 
>> 
>> Hi Chandler,
>> 
>> Sure, we can talk about serializing MF.  But the discussion should focus on
>> serializing MF, and not multi-line metadata support, which is only one of
>> the possible solutions.  I understand the problem that Dan mentioned (that
>> MF references IR), and I am sure that there are other problems that he did
>> not mention. I would be happy to hear more about other solutions that you
>> considered and other problems that you ran into.  Have you considered using
>> a new format that embeds LLVM-IR ?
>> 
> 
> (Note, this is the first I've heard of this plan and just figured it
> out myself)
> 
> So inverting it so that MI contains LLVM IR instead of the other way
> around? Then we'd need a serialization format for MI that happened to
> include a way of serializing LLVM IR within. From a quick "hey, this
> seems reasonable" the idea of embedding the MI into the IR rather than
> the other way around seems to make sense since we already have
> code to serialize the IR.
> 
> The only other idea I've seen was an intern project that really didn't
> go very far a few years ago of using *AML (one of them, I can't recall
> which). I think Bob had some idea of finishing the project, but I'm
> not sure where it's going.
> 
> Do you have any other ideas or some ideas as to why you'd prefer one
> direction rather than the other?

Bin Zeng worked on a project as an intern last summer to serialize machine
functions to YAML.  At the time, we were unable to commit it to trunk because we
were waiting for Nick's yamlio work to get committed.  I've still got
his patches and plan to commit them whenever I get a chance.  I was also
considering having another intern pick up that project where it left off.

The approach is perhaps similar to what Dan is proposing, just flipped around.
In one scheme, the top-level container is YAML and the IR is embedded within it
along with the machine function stuff.  In the other, the IR is the top-level
container and the machine functions are embedded as metadata.  I prefer the YAML
approach.
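
As a rough illustration of the two schemes, here is a minimal Python sketch
that models each container as a plain data class and converts between them.
The class names, the "mf.N" metadata keys, and the choice to raise an error
when no IR is present are all assumptions made for this sketch; nothing here
reflects Bin's patches or any actual LLVM data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class YamlTopLevel:
    """Scheme 1 (hypothetical model): a YAML-style document is the outer container."""
    machine_functions: List[str]
    ir_module: Optional[str] = None  # embedded IR text, if any


@dataclass
class IRTopLevel:
    """Scheme 2 (hypothetical model): the IR module is the outer container."""
    ir_module: str
    # Made-up metadata name -> serialized machine function text.
    metadata: Dict[str, str] = field(default_factory=dict)


def to_ir_top_level(doc: YamlTopLevel) -> IRTopLevel:
    """Flip scheme 1 into scheme 2.

    This sketch refuses to convert when there is no IR module to hang the
    metadata on; a real design could pick a different policy.
    """
    if doc.ir_module is None:
        raise ValueError("scheme 2 needs an IR module to attach metadata to")
    md = {f"mf.{i}": mf for i, mf in enumerate(doc.machine_functions)}
    return IRTopLevel(ir_module=doc.ir_module, metadata=md)


def to_yaml_top_level(mod: IRTopLevel) -> YamlTopLevel:
    """Flip scheme 2 into scheme 1, keeping the metadata insertion order."""
    return YamlTopLevel(
        machine_functions=list(mod.metadata.values()),
        ir_module=mod.ir_module,
    )


doc = YamlTopLevel(
    machine_functions=["BB#0: derived from LLVM BB %entry"],
    ir_module="define void @f() { ret void }",
)
round_tripped = to_yaml_top_level(to_ir_top_level(doc))
```

In this toy model the two layouts carry the same information whenever an IR
module is present, which is consistent with the approaches being described as
similar, just flipped around.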

I'd be glad to reprioritize contributing the rest of Bin's patches to
make those available sooner rather than later.  The more interesting part, with
either scheme, is how to represent the machine functions.  We definitely want
something that is readable but still easy to parse.

Eric Christopher

2013-Jun-26 23:30 UTC


>
>
> I think that the two alternatives that are obvious are for the MF to contain
> the IR, or for the IR to contain the MF.  Alternatively, they can live in
> parallel and the MF may reference the IR.  I am not sure what is the right
> approach here, but my gut feeling is that metadata is not necessarily the
> right container for MF.

Off the cuff, I'd think that IR containing MF seems most reasonable, and
using metadata to contain it seems good from two perspectives:

a) it already exists,
b) oddly enough, the fact that we could get rid of the metadata and still
have a valid module/compilation unit seems like it might be interestingly
useful, though I'm not sure what uses there are off the top of my head.

That said, I really have no preference either way, just idle
speculation. Probably similar to you since we've both not thought
deeply upon it :)

The MDString stuff does seem like it might be useful in general if
we'd like to have that though.

-eric

Eric Christopher

2013-Jun-26 23:33 UTC


>
> Bin Zeng worked on a project as an intern last summer to serialize machine
> functions to YAML.  At the time, we were unable to commit it to trunk because
> we were waiting for Nick's yamlio work to get committed.  I've still got
> his patches and plan to commit them whenever I get a chance.  I was also
> considering having another intern pick up that project where it left off.
>
> The approach is perhaps similar to what Dan is proposing, just flipped
> around.  In one scheme, the top-level container is YAML and the IR is embedded
> within it along with the machine function stuff.  In the other, the IR is the
> top-level container and the machine functions are embedded as metadata.  I
> prefer the YAML approach.
>

Any reason? I remember the project, of course, but didn't really have
a good feel for any of the design decisions other than "hey, there's
this yaml thing". That said, I don't believe I was in on the design
discussion in the first place.

> I'd be glad to reprioritize contributing the rest of Bin's patches
> to make those available sooner rather than later.  The more interesting part,
> with either scheme, is how to represent the machine functions.  We definitely
> want something that is readable but still easy to parse.

At least posting them with some description and a design for how it
works and the tradeoffs would be good. Then they'd be out there to
look at and discuss.

-eric

Jakob Stoklund Olesen

2013-Jun-27 16:50 UTC


On Jun 26, 2013, at 4:18 PM, Eric Christopher <echristo at gmail.com> wrote:
> So inverting it so that MI contains LLVM IR instead of the other way
> around? Then we'd need a serialization format for MI that happened to
> include a way of serializing LLVM IR within. From a quick "hey, this
> seems reasonable" the idea of embedding the MI into the IR rather than
> the other way around seems to make sense since we already have
> code to serialize the IR.

I’d suggest something based on YAML which would allow you to include IR verbatim
just by indenting it.

The IR module should be optional when serializing MI. The back-pointers from MI
to IR are not required, and I can imagine many useful test cases that won’t need
them.

module: |
  define void @linkit(i8* %source) #0 {
  entry:
    %.b243 = load i1* @Pflag, align 1
    %cond = select i1 %.b243, i32 (i8*, %struct.stat.6.13.20.64*)* @lstat, i32 (i8*, %struct.stat.6.13.20.64*)* @stat
    %call = call signext i32 %cond(i8* %source, %struct.stat.6.13.20.64* undef) #2
    ret void
  }
  @Pflag = external unnamed_addr global i1
  declare signext i32 @lstat(i8* nocapture, %struct.stat.6.13.20.64* nocapture) #1
  declare signext i32 @stat(i8* nocapture, %struct.stat.6.13.20.64* nocapture) #1

mi: |
  BB#0: derived from LLVM BB %entry
      Live Ins: %I0
          %O6<def> = SAVEri %O6, -176
          %I1<def> = SETHIi <ga:@Pflag>[TF=3]
          %I1<def> = ADDri %I1<kill>, <ga:@Pflag>[TF=4]
          %I1<def> = SLLXri %I1<kill>, 12
          %I2<def> = LDUBri %I1<kill>, <ga:@Pflag>[TF=5]; mem:LD1[@Pflag]
          %I1<def> = SETHIi <ga:@stat>[TF=3]
          %I1<def> = ADDri %I1<kill>, <ga:@stat>[TF=4]
          %I1<def> = SLLXri %I1<kill>, 12
          %I1<def> = ADDri %I1<kill>, <ga:@stat>[TF=5]
          %I3<def> = SETHIi <ga:@lstat>[TF=3]
          %I3<def> = ADDri %I3<kill>, <ga:@lstat>[TF=4]
          %I3<def> = SLLXri %I3<kill>, 12
          %I3<def> = ADDri %I3<kill>, <ga:@lstat>[TF=5]
          CMPri %I2<kill>, 0, %ICC<imp-def>
          %I1<def,tied2> = MOVXCCrr %I3<kill>, %I1<kill,tied0>, 9, %ICC<imp-use,kill>
          JMPLrr %I1<kill>, %G0, %O0<kill>, %O1<undef>, %O0<imp-def,dead>, %O1<imp-def,dead>, %ICC<imp-def,dead>, %O6<imp-use>, ...
          %O0<def> = ORrr %G0, %I0<kill>
          RET 8
          %G0<def> = RESTORErr %G0, %G0

We could also use more YAML structure to represent MI functions and basic
blocks, if needed.
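
To make the "just indent it" idea concrete, here is a small Python sketch of
how a tool might split such a file into its sections. It is not a YAML parser:
it only understands top-level "key: |" headers followed by an indented block,
and the function name split_sections is invented for this example. It also
treats the module section as optional, in line with the point above that MI
test cases need not carry IR.

```python
import textwrap
from typing import Dict, List


def split_sections(text: str) -> Dict[str, str]:
    """Split top-level 'key: |' literal blocks into {key: dedented body}.

    Only a tiny subset of YAML is handled: every unindented line must be
    a 'key: |' header, and everything indented below it is the body.
    """
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            key, sep, rest = line.partition(":")
            if not sep or rest.strip() != "|":
                raise ValueError(f"unsupported top-level line: {line!r}")
            current = key.strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        elif line.strip():
            raise ValueError("indented text before any section header")
    # Remove the common indentation so the embedded text reads as it
    # would in a standalone file.
    return {
        key: textwrap.dedent("\n".join(body)).strip("\n") + "\n"
        for key, body in sections.items()
    }


DOC = """\
module: |
  define void @f() {
  entry:
    ret void
  }

mi: |
  BB#0: derived from LLVM BB %entry
      RET 8
"""

sections = split_sections(DOC)
mi_only = split_sections("mi: |\n  RET 8\n")  # no IR module at all
```

Because the IR body is recovered by removing the common indentation, the
embedded text could be handed to the existing IR parser unchanged, which is
the main attraction of this layout.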

Thanks,
/jakob

Chandler Carruth

2013-Jun-27 17:12 UTC


On Thu, Jun 27, 2013 at 9:50 AM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> On Jun 26, 2013, at 4:18 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> > So inverting it so that MI contains LLVM IR instead of the other way
> > around? Then we'd need a serialization format for MI that happened to
> > include a way of serializing LLVM IR within. From a quick "hey, this
> > seems reasonable" the idea of embedding the MI into the IR rather than
> > the other way around seems to make sense since we already have
> > code to serialize the IR.
>
> I’d suggest something based on YAML which would allow you to include IR
> verbatim just by indenting it.
>
We can also use YAML embedded inside IR, potentially using the string
syntax Dan proposed or any number of other embedding mechanisms.
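
For context on why an embedding mechanism matters, the Python sketch below
shows what a multi-line payload looks like when squeezed into an ordinary
single-line metadata string. It assumes the usual LLVM IR string escaping, in
which any byte can be written as a backslash followed by two hex digits; the
helper names are made up for this example, and Dan's proposed extended syntax
itself is not shown in this thread, so it is not modelled here.

```python
def escape_mdstring(text: str) -> str:
    """Encode text as a single-line !"..." metadata string literal.

    Assumption: printable ASCII other than '"' and '\\' is kept as-is and
    every other byte becomes a backslash plus two hex digits.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if 0x20 <= byte < 0x7F and ch not in '"\\':
            out.append(ch)
        else:
            out.append(f"\\{byte:02X}")
    return '!"' + "".join(out) + '"'


def unescape_mdstring(literal: str) -> str:
    """Inverse of escape_mdstring for literals produced by it."""
    body = literal[2:-1]  # drop the leading !" and the trailing "
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            out.append(int(body[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(body[i]))
            i += 1
    return out.decode("utf-8")


mi_text = "BB#0:\n\tRET 8\n"
literal = escape_mdstring(mi_text)
```

Here literal holds the whole payload on one line, with each newline turned
into \0A and the tab into \09. That is workable for a tool but hard to read
or edit by hand in a test case, which is the readability concern behind a
friendlier multi-line string syntax.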

I like using YAML to represent the somewhat arbitrary data structures of MI
so that we don't spend a lot of time inventing clever syntax for something
that has much more limited uses than the actual IR. I haven't heard anyone
really object to it.

However, I do think it's an open question as to whether to embed IR in an MI
container, or MI in an IR container. A few observations:

- No one has pointed out any really fundamental *problems* with any of the
approaches. I think both approaches can be made to work with reasonable
amounts of effort, and neither has really fundamental design problems.

- Different use cases will be more or less easy to write in different
forms. For example, Jakob's point:
> The IR module should be optional when serializing MI. The back-pointers
> from MI to IR are not required, and I can imagine many useful test cases
> that won’t need them.

I've heard Dan and others say exactly the opposite -- that MI should be
optional. I suspect that some test cases are more MI focused, and some are
less. But I don't see either being optional as a hard prerequisite.


So, here is my concrete suggestion: if all of these approaches seem to work
and there aren't huge downsides but only reasonable tradeoffs, let the
folks writing the patches make the decision. At the moment that appears to
be Dan and maybe Bob. Is there a reason to not let them pick the design
they want to make forward progress with and run with it? I think that will
be much more productive and get us back to the important part: testing
MI-level passes.