Shoaib Meenai via llvm-dev
2020-May-18 20:14 UTC
[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef
I was looking at the SymbolTable code in LLD COFF and ELF recently, and I’m confused by the use of CachedHashStringRef. From what I understand, a CachedHashStringRef combines a StringRef with a computed hash. There’s no caching going on in the CachedHashStringRef itself; that is, if you construct CachedHashStringRef("foo"), and then construct a second CachedHashStringRef("foo") again later, you'll compute the hash for "foo" twice [1]. Instead, once you've constructed a CachedHashStringRef from a StringRef, you can pass that around instead of the StringRef to avoid needing to recompute the hash for it. LLD COFF's symbol table structure is a DenseMap<CachedHashStringRef, Symbol *> named symMap [2]. (ELF's is similar, except it maps to a vector index instead of a Symbol * directly for symbol order determinism reasons.) The only accesses to symMap are either iterating over its values or doing a lookup. In the two cases where a map lookup is done [3][4], the input to the function doing the lookup is just a StringRef, and a CachedHashStringRef is constructed to perform the lookup. What's the advantage of using a CachedHashStringRef in that case, as opposed to just having a DenseMap<StringRef, Symbol *> directly? With that, the DenseMap would be performing the hash computation for each lookup operation, but the CachedHashStringRef construction we have right now is doing the same hash computation anyway, so I don't understand the benefit of using it here. [1] https://github.com/llvm/llvm-project/blob/47a0e9f49b903aa4ef821d2c7a679a145ee983f9/llvm/include/llvm/ADT/CachedHashString.h#L35-L36 [2] https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.h#L129 [3] https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L458 [4] https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L727
Nikita Popov via llvm-dev
2020-May-18 20:19 UTC
[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef
On Mon, May 18, 2020 at 10:14 PM Shoaib Meenai via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I was looking at the SymbolTable code in LLD COFF and ELF recently, and > I’m confused by the use of CachedHashStringRef. > > From what I understand, a CachedHashStringRef combines a StringRef with a > computed hash. There’s no caching going on in the CachedHashStringRef > itself; that is, if you construct CachedHashStringRef("foo"), and then > construct a second CachedHashStringRef("foo") again later, you'll compute > the hash for "foo" twice [1]. Instead, once you've constructed a > CachedHashStringRef from a StringRef, you can pass that around instead of > the StringRef to avoid needing to recompute the hash for it. > > LLD COFF's symbol table structure is a DenseMap<CachedHashStringRef, > Symbol *> named symMap [2]. (ELF's is similar, except it maps to a vector > index instead of a Symbol * directly for symbol order determinism reasons.) > The only accesses to symMap are either iterating over its values or doing a > lookup. In the two cases where a map lookup is done [3][4], the input to > the function doing the lookup is just a StringRef, and a > CachedHashStringRef is constructed to perform the lookup. What's the > advantage of using a CachedHashStringRef in that case, as opposed to just > having a DenseMap<StringRef, Symbol *> directly? With that, the DenseMap > would be performing the hash computation for each lookup operation, but the > CachedHashStringRef construction we have right now is doing the same hash > computation anyway, so I don't understand the benefit of using it here. > > [1] > https://github.com/llvm/llvm-project/blob/47a0e9f49b903aa4ef821d2c7a679a145ee983f9/llvm/include/llvm/ADT/CachedHashString.h#L35-L36 > [2] > https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.h#L129 > [3] > https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L458 > [4] > https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L727 >Not familiar with this code, but a possible reason might be to avoid recomputing string hashes when the DenseMap is resized. Nikita -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200518/47ad1c31/attachment.html>
Shoaib Meenai via llvm-dev
2020-May-18 21:01 UTC
[llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef
Got it, thanks. I didn’t realize the DenseMap would be recomputing those on a resize instead of caching them internally (although I haven’t though particularly hard about whether that internal caching would even be feasible in the general case). From: Nikita Popov <nikita.ppv at gmail.com> Date: Monday, May 18, 2020 at 1:19 PM To: Shoaib Meenai <smeenai at fb.com> Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Understanding LLD's SymbolTable's use of CachedHashStringRef On Mon, May 18, 2020 at 10:14 PM Shoaib Meenai via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: I was looking at the SymbolTable code in LLD COFF and ELF recently, and I’m confused by the use of CachedHashStringRef. From what I understand, a CachedHashStringRef combines a StringRef with a computed hash. There’s no caching going on in the CachedHashStringRef itself; that is, if you construct CachedHashStringRef("foo"), and then construct a second CachedHashStringRef("foo") again later, you'll compute the hash for "foo" twice [1]. Instead, once you've constructed a CachedHashStringRef from a StringRef, you can pass that around instead of the StringRef to avoid needing to recompute the hash for it. LLD COFF's symbol table structure is a DenseMap<CachedHashStringRef, Symbol *> named symMap [2]. (ELF's is similar, except it maps to a vector index instead of a Symbol * directly for symbol order determinism reasons.) The only accesses to symMap are either iterating over its values or doing a lookup. In the two cases where a map lookup is done [3][4], the input to the function doing the lookup is just a StringRef, and a CachedHashStringRef is constructed to perform the lookup. What's the advantage of using a CachedHashStringRef in that case, as opposed to just having a DenseMap<StringRef, Symbol *> directly? With that, the DenseMap would be performing the hash computation for each lookup operation, but the CachedHashStringRef construction we have right now is doing the same hash computation anyway, so I don't understand the benefit of using it here. [1] https://github.com/llvm/llvm-project/blob/47a0e9f49b903aa4ef821d2c7a679a145ee983f9/llvm/include/llvm/ADT/CachedHashString.h#L35-L36 [2] https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.h#L129 [3] https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L458 [4] https://github.com/llvm/llvm-project/blob/3f5f8f39734e88c797b003d4a0002b2eaef1ac17/lld/COFF/SymbolTable.cpp#L727 Not familiar with this code, but a possible reason might be to avoid recomputing string hashes when the DenseMap is resized. Nikita -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200518/660db17e/attachment.html>