thr3ads.net - llvm dev - [LLVMdev] How to reduce the footprint of MDNodes? (About the comment you made at BOF LTO) [Nov 2013]

David Blaikie

2013-Nov-13 00:31 UTC

[LLVMdev] How to reduce the footprint of MDNodes? (About the comment you made at BOF LTO)

On Tue, Nov 12, 2013 at 4:19 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Tue, Nov 12, 2013 at 4:14 PM, Chris Lattner <clattner at apple.com> wrote:
>
>> I'm moderately opposed to just encoding these in a string format. I think
>> we can do something substantially better both for space, time, and
>> readability. Fundamentally, there is no reason for the original metadata
>> node you describe to not *encode* its operands into a dense bit-packed blob
>> of memory. We can still expose APIs that manipulate them as separate
>> entities, and have the AsmPrinter and AsmParser map back and forth with
>> nice human-readable forms. But even a simple varint encoding will be both
>> smaller and faster than ascii.
>>
>>
>> I guess you could make it work, but would that actually be simpler than
>> what is proposed?  If it is denser, how much denser would it have to be to
>> justify the complexity?
>>
>
> I don't think it would be more complex than a string encoding. At least,
> I'm not imagining we want to be super clever here.
>
> I could even imagine doing a versioned giant bitfield and using the
> version to handle auto-upgrade...
>
>
>>
>> Just to be clear, I still want the nice format (much like your proposed
>> format, but maybe with the numbers outside of the "s) in the textual IR, I
>> just think we should use a more direct and efficient in-memory encoding
>> (and in-bitcode encoding if that isn't already suitably dense).
>>
>>
>> Where would the encoding schema be specified?
>>
>
> Same question applies to a string encoding. We have to define the schema
> somewhere clearly. I'm just lobbying for the textual IR and the APIs to
> both operate directly on N fields, and just make the memory representation
> dense.
>
The difference here is that debug info parsing code would know the schema
externally - so the metadata itself wouldn't have to be self-describing or
typed in any way. Just a flat series of bytes of a fixed size would be
sufficient. (then leaving out the fields that refer to other IR constructs
such as functions, variables, etc)
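
As a rough illustration of the "flat series of bytes of a fixed size" idea, here is a minimal C++ sketch. The record layout, the field names (Line, Column, Tag) and the helper names are all hypothetical and chosen only for this example; none of it is LLVM API. The point it tries to show is that the blob carries no type or field information, because the reader already knows the offsets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical schema. It lives in the reader's code, not in the blob.
struct PackedFields {
  std::uint32_t Line;
  std::uint32_t Column;
  std::uint16_t Tag;
};

// 4 + 4 + 2 bytes, with no padding and no per-field type tags.
constexpr std::size_t kBlobSize = 10;
using Blob = std::array<std::uint8_t, kBlobSize>;

// Store the low Width bytes of V in little-endian order.
inline void storeLE(std::uint8_t *Dst, std::uint64_t V, std::size_t Width) {
  for (std::size_t I = 0; I < Width; ++I)
    Dst[I] = static_cast<std::uint8_t>(V >> (8 * I));
}

// Load Width little-endian bytes into an integer.
inline std::uint64_t loadLE(const std::uint8_t *Src, std::size_t Width) {
  std::uint64_t V = 0;
  for (std::size_t I = 0; I < Width; ++I)
    V |= static_cast<std::uint64_t>(Src[I]) << (8 * I);
  return V;
}

// Writer and reader agree on the offsets 0, 4 and 8 out of band.
inline Blob pack(const PackedFields &F) {
  Blob B{};
  storeLE(B.data() + 0, F.Line, 4);
  storeLE(B.data() + 4, F.Column, 4);
  storeLE(B.data() + 8, F.Tag, 2);
  return B;
}

inline PackedFields unpack(const Blob &B) {
  PackedFields F;
  F.Line = static_cast<std::uint32_t>(loadLE(B.data() + 0, 4));
  F.Column = static_cast<std::uint32_t>(loadLE(B.data() + 4, 4));
  F.Tag = static_cast<std::uint16_t>(loadLE(B.data() + 8, 2));
  return F;
}
```

Fields that refer to other IR constructs (functions, variables and so on) are deliberately left out of the sketch, as the message suggests; they would still need some other kind of reference.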

But if we could make general metadata generally more compact that'd be nice
too and maybe sufficient/instead/not worth the added complexity in debug
info code of pulling out fields in the debug info handling code.
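
For a sense of how the varint encoding mentioned in the quoted text above compares with ASCII, here is a small self-contained C++ sketch of an unsigned LEB128-style varint (7 payload bits per byte, high bit set on every byte except the last). The function names are made up for this example and this is not how LLVM implements anything; it is only meant to make the size argument concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append V to Out, 7 bits at a time, least significant group first.
inline void encodeVarint(std::uint64_t V, std::vector<std::uint8_t> &Out) {
  do {
    std::uint8_t Byte = static_cast<std::uint8_t>(V & 0x7f);
    V >>= 7;
    if (V != 0)
      Byte |= 0x80; // continuation bit: more bytes follow
    Out.push_back(Byte);
  } while (V != 0);
}

// Decode one varint starting at Pos and advance Pos past it.
inline std::uint64_t decodeVarint(const std::vector<std::uint8_t> &In,
                                  std::size_t &Pos) {
  std::uint64_t V = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Pos < In.size() && "truncated varint");
    assert(Shift <= 63 && "varint too long for 64 bits");
    std::uint8_t Byte = In[Pos++];
    V |= static_cast<std::uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return V;
    Shift += 7;
  }
}

// The string-encoding alternative: comma-separated decimal text.
inline std::string toAscii(const std::vector<std::uint64_t> &Vals) {
  std::string S;
  for (std::size_t I = 0; I < Vals.size(); ++I) {
    if (I != 0)
      S += ',';
    S += std::to_string(Vals[I]);
  }
  return S;
}
```

With the three values 300, 1 and 70000, this encoding takes 6 bytes, while the comma-separated decimal text "300,1,70000" takes 11 characters. Real metadata operands would of course have a different distribution of values, so the ratio here is only indicative.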

>
>
>>
>> Note that there are simple things that can be done to make MDNodes more
>> efficient in common cases.  The CallbackVH is only necessary when pointing
>> to Value*’s that are not MDNode/MDString, and
>> Constants-other-than-GlobalValue.  If we make MDNode detect when it has
>> “all-immortal” operands (like most debug info nodes) then we could just
>> store Value*’s directly.  This would be a completely invisible
>> implementation improvement, but would not provide the same level of
>> improvement as the “flatten into strings” approach.  The two are quite
>> complementary.
>>
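
The "all-immortal operands" check described in the quoted paragraph above can be sketched as a toy model. This is not LLVM code: the `OperandKind` enumerators, `isImmortal` and `chooseStorage` are hypothetical names, and the classification assumes one reading of the quoted rule, namely that MDNode, MDString and constants other than GlobalValue never need a tracking handle.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the kinds of values a metadata operand may point to.
// Illustrative only; this is not LLVM's real class hierarchy.
enum class OperandKind {
  MDNode,
  MDString,
  GlobalValue,   // functions and global variables, which can be erased
  OtherConstant, // constants other than GlobalValue
  Instruction    // can be deleted or replaced during optimization
};

// Assumed reading of the quoted rule: these kinds are never deleted
// out from under the node, so a plain pointer to them is safe.
inline bool isImmortal(OperandKind K) {
  return K == OperandKind::MDNode || K == OperandKind::MDString ||
         K == OperandKind::OtherConstant;
}

enum class Storage {
  PlainPointers,  // just store Value* directly
  TrackingHandles // fall back to a callback handle per operand
};

// A node may use the cheap representation only if every operand is immortal.
inline Storage chooseStorage(const std::vector<OperandKind> &Operands) {
  for (OperandKind K : Operands)
    if (!isImmortal(K))
      return Storage::TrackingHandles;
  return Storage::PlainPointers;
}
```

A node whose operands are only other metadata and plain constants would take the cheap path in this model, while a single reference to a function or instruction forces the tracked representation.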
>
> Yea, I'd rather go for at least a bit more dense than that, but maybe we
> should do this step-by-step.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Chandler Carruth

2013-Nov-13 00:36 UTC

[LLVMdev] How to reduce the footprint of MDNodes? (About the comment you made at BOF LTO)

On Tue, Nov 12, 2013 at 4:31 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> Where would the encoding schema be specified?
>>>
>>
>> Same question applies to a string encoding. We have to define the schema
>> somewhere clearly. I'm just lobbying for the textual IR and the APIs to
>> both operate directly on N fields, and just make the memory representation
>> dense.
>>
>
> The difference here is that debug info parsing code would know the schema
> externally - so the metadata itself wouldn't have to be self-describing or
> typed in any way. Just a flat series of bytes of a fixed size would be
> sufficient. (then leaving out the fields that refer to other IR constructs
> such as functions, variables, etc)
>
> But if we could make general metadata generally more compact that'd be
> nice too and maybe sufficient/instead/not worth the added complexity in
> debug info code of pulling out fields in the debug info handling code.
>
If it makes it possible for humans to read, author, and adjust debug info
test cases, that would also be worth it IMO. I'm really unsatisfied by the
reliance on C-code-in-comments to figure out what on earth the debug info
came from.

David Blaikie

2013-Nov-13 00:51 UTC

[LLVMdev] How to reduce the footprint of MDNodes? (About the comment you made at BOF LTO)

On Tue, Nov 12, 2013 at 4:36 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Tue, Nov 12, 2013 at 4:31 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>>> Where would the encoding schema be specified?
>>>>
>>>
>>> Same question applies to a string encoding. We have to define the schema
>>> somewhere clearly. I'm just lobbying for the textual IR and the APIs to
>>> both operate directly on N fields, and just make the memory representation
>>> dense.
>>>
>>
>> The difference here is that debug info parsing code would know the schema
>> externally - so the metadata itself wouldn't have to be self-describing or
>> typed in any way. Just a flat series of bytes of a fixed size would be
>> sufficient. (then leaving out the fields that refer to other IR constructs
>> such as functions, variables, etc)
>>
>> But if we could make general metadata generally more compact that'd be
>> nice too and maybe sufficient/instead/not worth the added complexity in
>> debug info code of pulling out fields in the debug info handling code.
>>
>
> If it makes it possible for humans to read, author, and adjust debug info
> test cases, that would also be worth it IMO. I'm really unsatisfied by the
> reliance on C-code-in-comments to figure out what on earth the debug info
> came from.
>
Fair point - it's a worthy goal that I had only half considered in this
context, and it may well be worth pursuing.

But it may be at odds - increasing/changing the schema to the point of
human readability may not be the most terse/memory-efficient representation
for the actual compiler (I'm thinking of things like actually having
self-describing textual representation with the names of fields, etc - but
maybe it doesn't have to go that far).

Perhaps there's some middle ground, but it might be difficult to justify
avoiding compile time improvements for the sake of editable/hand-authorable
debug info test cases... we'll have to toss it around a bit more to see
what shakes out in this space, I suspect.
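
One possible middle ground, sketched here purely as an illustration, is to keep the in-memory form packed and let a printer and parser map it to and from readable text, along the lines of the AsmPrinter/AsmParser split raised earlier in the thread. The `!{line: N, column: M}` syntax, the `SourceLoc` struct and all function names below are invented for this example and do not correspond to LLVM's textual IR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical two-field record used only for this sketch.
struct SourceLoc {
  std::uint32_t Line;
  std::uint32_t Column;
};

// Dense in-memory form: both fields in one 64-bit word.
inline std::uint64_t packLoc(SourceLoc L) {
  return (static_cast<std::uint64_t>(L.Column) << 32) | L.Line;
}

inline SourceLoc unpackLoc(std::uint64_t Packed) {
  return {static_cast<std::uint32_t>(Packed & 0xffffffffu),
          static_cast<std::uint32_t>(Packed >> 32)};
}

// Printer: packed word to self-describing text with field names.
inline std::string printLoc(std::uint64_t Packed) {
  SourceLoc L = unpackLoc(Packed);
  return "!{line: " + std::to_string(L.Line) +
         ", column: " + std::to_string(L.Column) + "}";
}

// Parser: text back to the packed word. Returns false on malformed input.
inline bool parseLoc(const std::string &Text, std::uint64_t &Packed) {
  unsigned Line = 0, Column = 0;
  int Consumed = 0;
  // %n records how many characters were consumed, so trailing junk is rejected.
  int Matched = std::sscanf(Text.c_str(), "!{line: %u, column: %u}%n",
                            &Line, &Column, &Consumed);
  if (Matched != 2 || Consumed != static_cast<int>(Text.size()))
    return false;
  Packed = packLoc(SourceLoc{static_cast<std::uint32_t>(Line),
                             static_cast<std::uint32_t>(Column)});
  return true;
}
```

In this toy version the field names exist only in the printer and parser, so the readable form costs nothing in memory; whether that scales to real debug info schemas is exactly the open question in the message above.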

Sean Silva

2013-Nov-13 02:43 UTC

[LLVMdev] How to reduce the footprint of MDNodes? (About the comment you made at BOF LTO)

On Tue, Nov 12, 2013 at 7:36 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Tue, Nov 12, 2013 at 4:31 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>>> Where would the encoding schema be specified?
>>>>
>>>
>>> Same question applies to a string encoding. We have to define the schema
>>> somewhere clearly. I'm just lobbying for the textual IR and the APIs to
>>> both operate directly on N fields, and just make the memory representation
>>> dense.
>>>
>>
>> The difference here is that debug info parsing code would know the schema
>> externally - so the metadata itself wouldn't have to be self-describing or
>> typed in any way. Just a flat series of bytes of a fixed size would be
>> sufficient. (then leaving out the fields that refer to other IR constructs
>> such as functions, variables, etc)
>>
>> But if we could make general metadata generally more compact that'd be
>> nice too and maybe sufficient/instead/not worth the added complexity in
>> debug info code of pulling out fields in the debug info handling code.
>>
>
> If it makes it possible for humans to read, author, and adjust debug info
> test cases, that would also be worth it IMO. I'm really unsatisfied by the
> reliance on C-code-in-comments to figure out what on earth the debug info
> came from.
>
+1. This has flustered me every time I see it.

-- Sean Silva
