thr3ads.net - llvm dev - [llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef [May 2020]

If this information is useful, please help other people find it:
Share via:

Shoaib Meenai via llvm-dev

2020-May-18 20:14 UTC

[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef

I was looking at the SymbolTable code in LLD COFF and ELF recently, and I’m
confused by the use of CachedHashStringRef.

From what I understand, a CachedHashStringRef combines a StringRef with a
computed hash. There’s no caching going on in the CachedHashStringRef itself;
that is, if you construct CachedHashStringRef("foo"), and then
construct a second CachedHashStringRef("foo") again later, you'll
compute the hash for "foo" twice [1]. Instead, once you've
constructed a CachedHashStringRef from a StringRef, you can pass that around
instead of the StringRef to avoid needing to recompute the hash for it.

LLD COFF's symbol table structure is a DenseMap<CachedHashStringRef,
Symbol *> named symMap [2]. (ELF's is similar, except it maps to a vector
index instead of a Symbol * directly for symbol order determinism reasons.) The
only accesses to symMap are either iterating over its values or doing a lookup.
In the two cases where a map lookup is done [3][4], the input to the function
doing the lookup is just a StringRef, and a CachedHashStringRef is constructed
to perform the lookup. What's the advantage of using a CachedHashStringRef
in that case, as opposed to just having a DenseMap<StringRef, Symbol *>
directly? With that, the DenseMap would be performing the hash computation for
each lookup operation, but the CachedHashStringRef construction we have right
now is doing the same hash computation anyway, so I don't understand the
benefit of using it here.

[1]
https://github.com/llvm/llvm-project/blob/47a0e9f49b903aa4ef821d2c7a679a145ee983f9/llvm/include/llvm/ADT/CachedHashString.h#L35-L36
[2]
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.h#L129
[3]
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L458
[4]
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L727

Nikita Popov via llvm-dev

2020-May-18 20:19 UTC

head link

[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef

On Mon, May 18, 2020 at 10:14 PM Shoaib Meenai via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I was looking at the SymbolTable code in LLD COFF and ELF recently, and
> I’m confused by the use of CachedHashStringRef.
>
> From what I understand, a CachedHashStringRef combines a StringRef with a
> computed hash. There’s no caching going on in the CachedHashStringRef
> itself; that is, if you construct CachedHashStringRef("foo"), and
then
> construct a second CachedHashStringRef("foo") again later,
you'll compute
> the hash for "foo" twice [1]. Instead, once you've
constructed a
> CachedHashStringRef from a StringRef, you can pass that around instead of
> the StringRef to avoid needing to recompute the hash for it.
>
> LLD COFF's symbol table structure is a DenseMap<CachedHashStringRef,
> Symbol *> named symMap [2]. (ELF's is similar, except it maps to a
vector
> index instead of a Symbol * directly for symbol order determinism reasons.)
> The only accesses to symMap are either iterating over its values or doing a
> lookup. In the two cases where a map lookup is done [3][4], the input to
> the function doing the lookup is just a StringRef, and a
> CachedHashStringRef is constructed to perform the lookup. What's the
> advantage of using a CachedHashStringRef in that case, as opposed to just
> having a DenseMap<StringRef, Symbol *> directly? With that, the
DenseMap
> would be performing the hash computation for each lookup operation, but the
> CachedHashStringRef construction we have right now is doing the same hash
> computation anyway, so I don't understand the benefit of using it here.
>
> [1]
>
https://github.com/llvm/llvm-project/blob/47a0e9f49b903aa4ef821d2c7a679a145ee983f9/llvm/include/llvm/ADT/CachedHashString.h#L35-L36
> [2]
>
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.h#L129
> [3]
>
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L458
> [4]
>
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L727
>
Not familiar with this code, but a possible reason might be to avoid
recomputing string hashes when the DenseMap is resized.

Nikita
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200518/47ad1c31/attachment.html>

Shoaib Meenai via llvm-dev

2020-May-18 21:01 UTC

head link

[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef

Got it, thanks. I didn’t realize the DenseMap would be recomputing those on a
resize instead of caching them internally (although I haven’t though
particularly hard about whether that internal caching would even be feasible in
the general case).

From: Nikita Popov <nikita.ppv at gmail.com>
Date: Monday, May 18, 2020 at 1:19 PM
To: Shoaib Meenai <smeenai at fb.com>
Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Understanding LLD's SymbolTable's use of
CachedHashStringRef

On Mon, May 18, 2020 at 10:14 PM Shoaib Meenai via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
I was looking at the SymbolTable code in LLD COFF and ELF recently, and I’m
confused by the use of CachedHashStringRef.

From what I understand, a CachedHashStringRef combines a StringRef with a
computed hash. There’s no caching going on in the CachedHashStringRef itself;
that is, if you construct CachedHashStringRef("foo"), and then
construct a second CachedHashStringRef("foo") again later, you'll
compute the hash for "foo" twice [1]. Instead, once you've
constructed a CachedHashStringRef from a StringRef, you can pass that around
instead of the StringRef to avoid needing to recompute the hash for it.

LLD COFF's symbol table structure is a DenseMap<CachedHashStringRef,
Symbol *> named symMap [2]. (ELF's is similar, except it maps to a vector
index instead of a Symbol * directly for symbol order determinism reasons.) The
only accesses to symMap are either iterating over its values or doing a lookup.
In the two cases where a map lookup is done [3][4], the input to the function
doing the lookup is just a StringRef, and a CachedHashStringRef is constructed
to perform the lookup. What's the advantage of using a CachedHashStringRef
in that case, as opposed to just having a DenseMap<StringRef, Symbol *>
directly? With that, the DenseMap would be performing the hash computation for
each lookup operation, but the CachedHashStringRef construction we have right
now is doing the same hash computation anyway, so I don't understand the
benefit of using it here.

[1]
https://github.com/llvm/llvm-project/blob/47a0e9f49b903aa4ef821d2c7a679a145ee983f9/llvm/include/llvm/ADT/CachedHashString.h#L35-L36
[2]
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.h#L129
[3]
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L458
[4]
https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L727

Not familiar with this code, but a possible reason might be to avoid recomputing
string hashes when the DenseMap is resized.

Nikita
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200518/660db17e/attachment.html>

llvm dev - May 2020 - Understanding LLD's SymbolTable's use of CachedHashStringRef

[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef

[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef

[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef