Paul C. Anagnostopoulos via llvm-dev
2020-Dec-01 23:45 UTC
[llvm-dev] Trying to use unordered_map
Here are some statistics for those of you who like statistics. Before embarking on a project to speed up access to record fields in TableGen, I thought I'd collect a little data. Number of records built from the AMDGPU .td files: 55,539 Number of fields in those records: 2,877,918 Average number of fields per record: 52 So the average field lookup sequential scan is about 25 iterations Number of field lookups performed by DAG ISel emitter: 6,294,426 Number of iterations = 6,294,426 x 25 = 157,000,000 (approx.) How long does each iteration take? Can it be more than 10 instructions? It's 7 instructions on the X86. So perhaps about 3 ns.? (I may be off here.) Time saved if the field access could be cut to 0 ns.: 472,000,000 ns. Next project!
Not exactly. A modern CPU frequently has three cache levels with each subsequent larger cache level having a slower access. Then there is memory beyond cache, and disk beyond memory with disk having a cache for frequently used disk data. In addition to where the data may be in this hierarchy is the way the data is obtained which is in blocks of a certain size. This link gives a rough idea of how long it takes to access various data sources. http://norvig.com/21-days.html#answers Beyond this is how the container is used. For example, using an unordered_map into pointers to another data location where the data is actually stored provides another path for delay. To actually know if a particular container selection and design will work better will likely require benchmarking the alternatives under realistic conditions. Along this line I had thought that NVME SSDs were substantially faster than SATA SSDs and then saw some careful benchmarking that only showed that NVME SSDs were marginally faster which explained their marginal difference in price. Neil Nelson On 12/1/20 4:45 PM, Paul C. Anagnostopoulos via llvm-dev wrote:> Here are some statistics for those of you who like statistics. > > Before embarking on a project to speed up access to record fields in TableGen, I thought I'd collect a little data. > > Number of records built from the AMDGPU .td files: 55,539 > Number of fields in those records: 2,877,918 > Average number of fields per record: 52 > So the average field lookup sequential scan is about 25 iterations > > Number of field lookups performed by DAG ISel emitter: 6,294,426 > Number of iterations = 6,294,426 x 25 = 157,000,000 (approx.) > > How long does each iteration take? Can it be more than 10 instructions? It's 7 instructions on the X86. So perhaps about 3 ns.? (I may be off here.) > > Time saved if the field access could be cut to 0 ns.: 472,000,000 ns. > > Next project! > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201201/20535f1c/attachment.html>