thr3ads.net - llvm dev - [LLVMdev] [lld] contentHash in the Reader ? [May 2013]


Michael Spencer

2013-May-08 05:38 UTC

[LLVMdev] [lld] contentHash in the Reader ?

On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:
> Shankar,
>
> Do you mean add a method like:
>
>      virtual unsigned contentHash() const = 0;
>
> or maybe:
>
>      virtual llvm::hash_code contentHash() const = 0
>
> to lld::DefinedAtom?  That seems good to me.   We just need to figure out
> what should happen with atoms not intended to be merged.  Should the method
> assert?  In the case where we want there to be a hash available, is it
> computed lazily?
>
> Regarding the NativeReader/NativeWriter if the resolver is using the hash,
> then it would make sense to add the hash to the file format so reading
> native format is faster.
>
> -Nick
>
I'd rather we use a crypto hash so we don't have to compare content at all.

- Michael Spencer

>
> On May 7, 2013, at 4:43 PM, Shankar Easwaran wrote:
> > Can we add an atomContentHash for DefinedAtoms when the atoms are being
> > created? This can essentially speed up comparisons of atoms, especially for
> >
> > * ICF (Identical code folding)
> > * Section groups (to identify similar sections)
> >
> > Not sure where else this would help. This would essentially be used only
> > by the Reader and the Resolver.
> >
> > There would be no change to the NativeReader/NativeWriter.
> >
> > Thanks
> >
> > Shankar Easwaran
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by the Linux Foundation
> >
>
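
To make the two open questions above concrete (what happens for atoms not intended to be merged, and whether the hash is computed lazily), here is a minimal sketch. SketchAtom, hasContentHash() and computeCount() are hypothetical names invented for illustration and are not part of lld; the 64-bit FNV-1a function is used only because it is short, and it is not a crypto hash.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for lld::DefinedAtom; not the real lld class.
class SketchAtom {
public:
  SketchAtom(std::string content, bool mergeable)
      : content_(std::move(content)), mergeable_(mergeable) {}

  // Callers are expected to check this before asking for the hash.
  bool hasContentHash() const { return mergeable_; }

  // Asserts for atoms not intended to be merged (one of the options raised
  // in the thread). The hash is computed on first use and then cached.
  std::uint64_t contentHash() const {
    assert(mergeable_ && "contentHash() called on a non-mergeable atom");
    if (!cached_) {
      hash_ = fnv1a(content_);
      cached_ = true;
      ++computeCount_;
    }
    return hash_;
  }

  // Exposed only so the lazy behaviour can be observed in this sketch.
  int computeCount() const { return computeCount_; }

private:
  // 64-bit FNV-1a over the content bytes.
  static std::uint64_t fnv1a(const std::string &bytes) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
      h ^= c;
      h *= 1099511628211ULL;
    }
    return h;
  }

  std::string content_;
  bool mergeable_;
  mutable bool cached_ = false;
  mutable std::uint64_t hash_ = 0;
  mutable int computeCount_ = 0;
};
```

With this shape, two mergeable atoms with equal content return equal hashes, and calling contentHash() repeatedly on one atom hashes its bytes only once. Returning an optional value instead of asserting would be the other obvious design.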

Shankar Easwaran

2013-May-08 15:04 UTC

On 5/8/2013 12:38 AM, Michael Spencer wrote:
> On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>> Shankar,
>>
>> Do you mean add a method like:
>>
>>       virtual unsigned contentHash() const = 0;
>>
>> or maybe:
>>
>>       virtual llvm::hash_code contentHash() const = 0
>>

We could use a crypto hash too, with a function prototype that looks
like:

      virtual lld::crypto::sha256 contentHash() const = 0

>> to lld::DefinedAtom?  That seems good to me.  We just need to figure out
>> what should happen with atoms not intended to be merged.  Should the method
>> assert?  In the case where we want there to be a hash available, is it
>> computed lazily?

I was thinking that we could use this even for 'typeCode' atoms, which
could be merged if they have the same content too.
This is a snip from a bug report for binutils ld:

<----snip--------->
Identical code folding (icf) is currently implemented in GOLD.
In our C++ applications it is very effective in reducing the size of libraries
in presence of templates and "machine-generated" code where functions differ
essentially only by the type of some input pointer.
<----snip--------->

>>
>> Regarding the NativeReader/NativeWriter if the resolver is using the hash,
>> then it would make sense to add the hash to the file format so reading
>> native format is faster.
>>
>> -Nick
>>
> I'd rather we use a crypto hash so we don't have to compare content at all.

Did you mean sha256/md5 or something similar?

>
> - Michael Spencer
>
>
>> On May 7, 2013, at 4:43 PM, Shankar Easwaran wrote:
>>> Can we add an atomContentHash for DefinedAtoms when the atoms are being
>>> created? This can essentially speed up comparisons of atoms, especially for
>>>
>>> * ICF (Identical code folding)
>>> * Section groups (to identify similar sections)
>>>
>>> Not sure where else this would help. This would essentially be used only
>>> by the Reader and the Resolver.
>>>
>>> There would be no change to the NativeReader/NativeWriter.
>>>
>>> Thanks
>>>
>>> Shankar Easwaran
>>>
>>> --
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> hosted by the Linux Foundation
>>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the
Linux Foundation

Rui Ueyama

2013-May-08 16:03 UTC

On Thu, May 9, 2013 at 12:04 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
> On 5/8/2013 12:38 AM, Michael Spencer wrote:
>
>> On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:
>>
>>> Shankar,
>>>
>>> Do you mean add a method like:
>>>
>>>       virtual unsigned contentHash() const = 0;
>>>
>>> or maybe:
>>>
>>>       virtual llvm::hash_code contentHash() const = 0
>>>
> We could use a crypto hash too with the function prototype that looks
> like :-
>
>      virtual lld::crypto::sha256 contentHash() const = 0

I'd use SHA128 or MD5 as the linker does not handle hostile input. I think
as long as it's collision free, it should suffice.
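
As a rough illustration of what "collision free" means for non-hostile input, the birthday bound estimates the chance that any two of n distinct inputs share a digest of a given width. The sketch below assumes the hash behaves like a uniform random function; collisionProbability is a hypothetical helper name, not an LLVM or lld API.

```cpp
#include <algorithm>
#include <cmath>

// Birthday-bound estimate: p ~= n^2 / 2^(bits + 1), clamped to 1.0.
// Assumes the digest is uniformly distributed over all 2^bits values,
// which is an idealisation of any real hash function.
inline double collisionProbability(double n, int bits) {
  // std::ldexp(x, e) computes x * 2^e without overflowing an integer type.
  return std::min(1.0, std::ldexp(n * n, -(bits + 1)));
}
```

For one million atoms, collisionProbability(1e6, 128) is roughly 1.5e-27, so an accidental collision in a 128-bit digest is not a practical concern. For a 32-bit value such as unsigned, the estimate saturates at 1.0, which is why a short hash still needs a content comparison behind it.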

Nick Kledzik

2013-May-08 19:45 UTC

On May 7, 2013, at 10:38 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:
>> Shankar,
>>
>> Do you mean add a method like:
>>
>>      virtual unsigned contentHash() const = 0;
>>
>> or maybe:
>>
>>      virtual llvm::hash_code contentHash() const = 0
>>
>> to lld::DefinedAtom?  That seems good to me.  We just need to figure out
>> what should happen with atoms not intended to be merged.  Should the method
>> assert?  In the case where we want there to be a hash available, is it
>> computed lazily?
>>
>> Regarding the NativeReader/NativeWriter if the resolver is using the hash,
>> then it would make sense to add the hash to the file format so reading
>> native format is faster.
>>
>> -Nick
>>
> I'd rather we use a crypto hash so we don't have to compare content at all.

The crypto hashes work well if the atom content is const data (e.g. c-string or
other literals), since you just point the hash function at the range of bytes in
the constant data.  Where it gets messier is if you are trying to coalesce
non-leaf functions or non-const data because it is not just the content bytes
that need to be compared but also all the references must somehow be
incorporated into the hash. For example, two functions have the exact same
instruction bytes, but one calls foo and one calls bar.

-Nick
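
The foo/bar example above can be sketched as follows: hashing only the instruction bytes makes the two functions look identical, while folding the reference targets into the hash keeps them apart. SketchFunction, bytesOnlyHash and hashWithReferences are hypothetical names for illustration, references are reduced to plain target names, and FNV-1a stands in for whatever hash would really be used.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified atom: raw content bytes plus the names of the
// atoms it references. Real references also carry kind, offset and addend.
struct SketchFunction {
  std::string bytes;
  std::vector<std::string> targets;
};

// Feeds one field into a running 64-bit FNV-1a state.
inline std::uint64_t fnv1aMix(std::uint64_t h, const std::string &s) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  // Also mix in the field length so that field boundaries affect the result.
  h ^= static_cast<std::uint64_t>(s.size());
  h *= 1099511628211ULL;
  return h;
}

// Hash of the content bytes alone; adequate for const data such as literals.
inline std::uint64_t bytesOnlyHash(const SketchFunction &f) {
  return fnv1aMix(14695981039346656037ULL, f.bytes);
}

// Hash of the content bytes followed by every reference target, in order.
inline std::uint64_t hashWithReferences(const SketchFunction &f) {
  std::uint64_t h = bytesOnlyHash(f);
  for (const std::string &t : f.targets)
    h = fnv1aMix(h, t);
  return h;
}
```

Two functions with the same bytes where one calls foo and the other calls bar get the same bytesOnlyHash but, in practice, different hashWithReferences values. Hashing target names only works once the names are final; if targets can themselves be folded, the hash would have to be recomputed afterwards.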


Shankar Easwaran

2013-May-08 19:54 UTC

On 5/8/2013 2:45 PM, Nick Kledzik wrote:
>> I'd rather we use a crypto hash so we don't have to compare content at all.
>
> The crypto hashes work well if the atom content is const data (e.g. c-string
> or other literals), since you just point the hash function at the range of
> bytes in the constant data.  Where it gets messier is if you are trying to
> coalesce non-leaf functions or non-const data because it is not just the
> content bytes that need to be compared but also all the references must
> somehow be incorporated into the hash. For example, two functions have the
> exact same instruction bytes, but one calls foo and one calls bar.
>

I was thinking that we just do ICF for leaf functions only. Non-leaf
functions can fold only if all the references end up calling the same
targets, isn't that so? (Which could result from templated code?)

Thanks

Shankar Easwaran
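
The idea above (fold leaf functions first, then fold callers whose references end up at the same targets) can be sketched as an iterate-until-stable pass. This is an illustration under simplifying assumptions, not how lld or gold implement ICF: SketchFn and foldIdentical are hypothetical names, targets are plain indices, and the grouping key is the full content rather than a hash.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified function atom; targets index into the same vector.
struct SketchFn {
  std::string bytes;
  std::vector<std::size_t> targets;
};

// Returns rep, where rep[i] is the index of the function that i folds into
// (rep[i] == i for functions that are kept).
inline std::vector<std::size_t> foldIdentical(const std::vector<SketchFn> &fns) {
  std::vector<std::size_t> rep(fns.size());
  for (std::size_t i = 0; i < fns.size(); ++i)
    rep[i] = i;

  // Follows fold links until a kept function is reached.
  auto find = [&rep](std::size_t x) {
    while (rep[x] != x)
      x = rep[x];
    return x;
  };

  typedef std::pair<std::string, std::vector<std::size_t> > Key;
  bool changed = true;
  while (changed) {  // each productive round folds at least one function
    changed = false;
    std::map<Key, std::size_t> seen;
    for (std::size_t i = 0; i < fns.size(); ++i) {
      if (rep[i] != i)
        continue;  // already folded in an earlier round
      // Rewrite each target to the function it currently folds into.
      std::vector<std::size_t> canon;
      for (std::size_t t : fns[i].targets)
        canon.push_back(find(t));
      auto result = seen.emplace(Key(fns[i].bytes, canon), i);
      if (!result.second) {  // same bytes and same resolved targets
        rep[i] = result.first->second;
        changed = true;
      }
    }
  }
  for (std::size_t i = 0; i < fns.size(); ++i)
    rep[i] = find(i);
  return rep;
}
```

Given two callers with identical bytes that call two identical leaf functions, the first round folds the leaves and the second round folds the callers, since their references now resolve to the same target.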

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the
Linux Foundation
