On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:

> Shankar,
>
> Do you mean add a method like:
>
>   virtual unsigned contentHash() const = 0;
>
> or maybe:
>
>   virtual llvm::hash_code contentHash() const = 0;
>
> to lld::DefinedAtom? That seems good to me. We just need to figure out
> what should happen with atoms not intended to be merged. Should the method
> assert? In the case where we want there to be a hash available, is it
> computed lazily?
>
> Regarding the NativeReader/NativeWriter, if the resolver is using the hash,
> then it would make sense to add the hash to the file format so reading
> native format is faster.
>
> -Nick

I'd rather we use a crypto hash so we don't have to compare content at all.

- Michael Spencer

> On May 7, 2013, at 4:43 PM, Shankar Easwaran wrote:
> > Can we add an atomContentHash for DefinedAtoms when the atoms are being
> > created? This can essentially speed up comparisons of atoms, especially for
> >
> > * ICF (Identical code folding)
> > * Section groups (to identify similar sections)
> >
> > Not sure where else this would help. This would essentially be used only
> > by the Reader and the Resolver.
> >
> > There would be no change to the NativeReader/NativeWriter.
> >
> > Thanks
> >
> > Shankar Easwaran
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by the Linux Foundation
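One way to read the "computed lazily" option is sketched below in C++. It is only an illustration: `AtomSketch` and its members are hypothetical names, not lld's API, and 64-bit FNV-1a is a placeholder for whichever hash (llvm::hash_code, a crypto digest, or something else) ends up being chosen. What should happen for atoms that are not intended to be merged is left open here, as it is in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atom class; not lld's actual DefinedAtom.
class AtomSketch {
public:
  explicit AtomSketch(std::vector<std::uint8_t> bytes)
      : content_(std::move(bytes)) {}

  // Computes the hash on first use and caches it for later calls.
  std::uint64_t contentHash() const {
    if (!hashValid_) {
      hash_ = fnv1a(content_);
      hashValid_ = true;
    }
    return hash_;
  }

private:
  // 64-bit FNV-1a: a simple non-cryptographic placeholder hash.
  static std::uint64_t fnv1a(const std::vector<std::uint8_t> &bytes) {
    std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
    for (std::uint8_t b : bytes) {
      h ^= b;
      h *= 1099511628211ULL; // FNV prime
    }
    return h;
  }

  std::vector<std::uint8_t> content_;
  mutable std::uint64_t hash_ = 0;    // cached value, valid once hashValid_ is set
  mutable bool hashValid_ = false;
};
```

With this shape, atoms whose hash is never requested pay nothing, at the cost of two extra members per atom. The cache is not thread-safe as written.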
On 5/8/2013 12:38 AM, Michael Spencer wrote:

> On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>> Shankar,
>>
>> Do you mean add a method like:
>>
>>   virtual unsigned contentHash() const = 0;
>>
>> or maybe:
>>
>>   virtual llvm::hash_code contentHash() const = 0;

We could use a crypto hash too, with a function prototype that looks like:

  virtual lld::crypto::sha256 contentHash() const = 0;

>> to lld::DefinedAtom? That seems good to me. We just need to figure out
>> what should happen with atoms not intended to be merged. Should the method
>> assert? In the case where we want there to be a hash available, is it
>> computed lazily?

I was thinking that we could use this even for 'typeCode' atoms, which could also be merged if they have the same content.

This is a snip from a bug report for binutils ld:

<----snip--------->
Identical code folding (icf) is currently implemented in GOLD. In our C++ applications it is very effective in reducing the size of libraries in presence of templates and "machine-generated" code where functions differ essentially only by the type of some input pointer.
<----snip--------->

>> Regarding the NativeReader/NativeWriter, if the resolver is using the hash,
>> then it would make sense to add the hash to the file format so reading
>> native format is faster.
>>
>> -Nick
>
> I'd rather we use a crypto hash so we don't have to compare content at all.

Did you mean a sha256/md5 or something similar?

> - Michael Spencer

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
On Thu, May 9, 2013 at 12:04 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:

> On 5/8/2013 12:38 AM, Michael Spencer wrote:
>
>> On Tue, May 7, 2013 at 10:08 PM, Nick Kledzik <kledzik at apple.com> wrote:
>>
>>> Shankar,
>>>
>>> Do you mean add a method like:
>>>
>>>   virtual unsigned contentHash() const = 0;
>>>
>>> or maybe:
>>>
>>>   virtual llvm::hash_code contentHash() const = 0
>
> We could use a crypto hash too with the function prototype that looks
> like :-
>
>   virtual lld::crypto::sha256 contentHash() const = 0

I'd use SHA128 or MD5 as the linker does not handle hostile input. I think as long as it's collision free, it should suffice.
On May 7, 2013, at 10:38 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:

> I'd rather we use a crypto hash so we don't have to compare content at all.

The crypto hashes work well if the atom content is const data (e.g. c-string or other literals), since you just point the hash function at the range of bytes in the constant data. Where it gets messier is if you are trying to coalesce non-leaf functions or non-const data, because it is not just the content bytes that need to be compared: all the references must somehow be incorporated into the hash as well. For example, two functions have the exact same instruction bytes, but one calls foo and one calls bar.

-Nick
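The foo/bar case can be made concrete with a small C++ sketch. Everything in it is an assumption for illustration: `Atom`, `Reference`, `targetId`, and both hash functions are hypothetical and not taken from lld, and FNV-1a is a placeholder for whatever hash is actually picked. It shows a content-only hash treating the two callers as equal, and a hash that also mixes in each reference telling them apart.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a reference (fixup) inside an atom.
struct Reference {
  std::uint32_t offset;   // byte offset of the fixup within the content
  std::uint32_t targetId; // assumed stable id of the referenced atom
};

// Hypothetical model of an atom: raw bytes plus outgoing references.
struct Atom {
  std::vector<std::uint8_t> content;
  std::vector<Reference> refs;
};

const std::uint64_t kFnvBasis = 14695981039346656037ULL;
const std::uint64_t kFnvPrime = 1099511628211ULL;

// One FNV-1a step: fold a single byte into the running hash.
inline std::uint64_t mixByte(std::uint64_t h, std::uint8_t b) {
  return (h ^ b) * kFnvPrime;
}

// Fold a 32-bit value in, least significant byte first, so the result
// does not depend on host endianness.
inline std::uint64_t mixU32(std::uint64_t h, std::uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8)
    h = mixByte(h, static_cast<std::uint8_t>(v >> shift));
  return h;
}

// Hashes only the instruction or data bytes.
std::uint64_t contentOnlyHash(const Atom &a) {
  std::uint64_t h = kFnvBasis;
  for (std::uint8_t b : a.content)
    h = mixByte(h, b);
  return h;
}

// Hashes the bytes and then every reference (offset and target).
std::uint64_t contentAndRefsHash(const Atom &a) {
  std::uint64_t h = contentOnlyHash(a);
  for (const Reference &r : a.refs) {
    h = mixU32(h, r.offset);
    h = mixU32(h, r.targetId);
  }
  return h;
}
```

The sketch assumes each target already has a stable identity. If targets can themselves be folded, their ids would have to be canonicalized first, which this example does not attempt.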
On 5/8/2013 2:45 PM, Nick Kledzik wrote:

>> I'd rather we use a crypto hash so we don't have to compare content at all.
>
> The crypto hashes work well if the atom content is const data (e.g. c-string or other literals), since you just point the hash function at the range of bytes in the constant data. Where it gets messier is if you are trying to coalesce non-leaf functions or non-const data because it is not just the content bytes that need to be compared but also all the references must somehow be incorporated into the hash. For example, two functions have the exact same instruction bytes, but one calls foo and one calls bar.

I was thinking that we would do ICF for leaf functions only. Non-leaf functions can fold only if all their references end up calling the same targets, can't they? (Which could result from templated code?)

Thanks

Shankar Easwaran
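The folding rule described here (leaf functions fold on identical bytes, non-leaf functions only when their call targets also match) can be sketched roughly as follows in C++. `Func`, `callTargets`, and `foldIdentical` are hypothetical names, and the sketch compares full keys in a `std::map` instead of hashing, purely to keep the rule itself visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of a function atom.
struct Func {
  std::vector<std::uint8_t> code;          // instruction bytes
  std::vector<std::uint32_t> callTargets;  // assumed target ids; empty for a leaf
};

// Returns rep, where rep[i] is the index of the first function that is
// identical to function i in both code bytes and call targets.
std::vector<std::size_t> foldIdentical(const std::vector<Func> &funcs) {
  typedef std::pair<std::vector<std::uint8_t>, std::vector<std::uint32_t> > Key;
  std::map<Key, std::size_t> seen;
  std::vector<std::size_t> rep(funcs.size());
  for (std::size_t i = 0; i < funcs.size(); ++i) {
    Key key(funcs[i].code, funcs[i].callTargets);
    // emplace keeps the existing entry if the key was seen before, so
    // the stored index is always the earliest matching function.
    rep[i] = seen.emplace(std::move(key), i).first->second;
  }
  return rep;
}
```

A leaf function has an empty `callTargets`, so for leaves the key reduces to the code bytes alone. Two non-leaf functions with the same bytes but different targets get different keys and stay separate.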