On Mon, Mar 16, 2015 at 11:00 PM, David Blaikie <dblaikie at gmail.com> wrote:> > > On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org> > wrote: >> >> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org> >> wrote: >> > >> > Shankar's parallel for per-se didn't introduce any performance benefit >> > (or regression). >> > If the change I propose is safe, I would like to see Shankar's change >> > in (and this on top of it). >> > I have other related changes coming next, but I would like to tackle >> > them one at a time. >> > >> >> Here's an update. >> >> After http://reviews.llvm.org/D8372 , I updated the profiling data. >> >> https://people.freebsd.org/~davide/llvm/lld-03162015.svg >> It seems now 85% of CPU time is spent inside >> FileArchive::buildTableOfContents(). >> In particular, 35% of the samples are spent inserting into >> unordered_map, so there's maybe something we can do differently there >> (e.g. , Rui's proposal of a concurrent map doesn't seem that bad). > > > Anyone tried a DenseMap instead of an unordered_map? If you need pointer > validity to the elements, a DenseMap with unique_ptrs rather than direct > values could be an option. Chandler's usual argument here is that walking > the map is cheap with high locality (as in a DenseMap) even if the nodes > themselves involve indirection. Could be worth an experiment. >I did now. It actually makes things slower for the aforementioned workload (linking clang). It was worth trying though. Patch, in case somebody wants to try at home: https://people.freebsd.org/~davide/llvm/densemap_membermap.diff Patched: real 1m27.849s user 2m47.373s sys 0m16.370s real 1m29.583s user 2m47.771s sys 0m16.816s real 1m25.956s user 2m43.397s sys 0m15.254s real 1m29.380s user 2m47.618s sys 0m15.386s real 1m25.426s user 2m43.388s sys 0m16.961s Unpatched: real 1m26.872s user 2m46.999s sys 0m16.540s real 1m28.187s user 2m47.084s sys 0m17.149s real 1m24.814s user 2m43.311s sys 0m16.979s real 1m25.011s user 2m43.184s sys 0m16.975s real 1m25.536s user 2m44.577s sys 0m16.784s -- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare
With a 4 second difference between best and worst runs in both cases, and only 0.2 second difference between the best for the two different cases (user time), I don't think you can make any conclusion that one is better than the other! They are very very similar. On Wed, Mar 18, 2015 at 5:36 PM, Davide Italiano <davide at freebsd.org> wrote:> On Mon, Mar 16, 2015 at 11:00 PM, David Blaikie <dblaikie at gmail.com> > wrote: > > > > > > On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org> > > wrote: > >> > >> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org> > >> wrote: > >> > > >> > Shankar's parallel for per-se didn't introduce any performance benefit > >> > (or regression). > >> > If the change I propose is safe, I would like to see Shankar's change > >> > in (and this on top of it). > >> > I have other related changes coming next, but I would like to tackle > >> > them one at a time. > >> > > >> > >> Here's an update. > >> > >> After http://reviews.llvm.org/D8372 , I updated the profiling data. > >> > >> https://people.freebsd.org/~davide/llvm/lld-03162015.svg > >> It seems now 85% of CPU time is spent inside > >> FileArchive::buildTableOfContents(). > >> In particular, 35% of the samples are spent inserting into > >> unordered_map, so there's maybe something we can do differently there > >> (e.g. , Rui's proposal of a concurrent map doesn't seem that bad). > > > > > > Anyone tried a DenseMap instead of an unordered_map? If you need pointer > > validity to the elements, a DenseMap with unique_ptrs rather than direct > > values could be an option. Chandler's usual argument here is that walking > > the map is cheap with high locality (as in a DenseMap) even if the nodes > > themselves involve indirection. Could be worth an experiment. > > > > I did now. It actually makes things slower for the aforementioned > workload (linking clang). It was worth trying though. > > Patch, in case somebody wants to try at home: > https://people.freebsd.org/~davide/llvm/densemap_membermap.diff > > Patched: > real 1m27.849s user 2m47.373s sys 0m16.370s > real 1m29.583s user 2m47.771s sys 0m16.816s > real 1m25.956s user 2m43.397s sys 0m15.254s > real 1m29.380s user 2m47.618s sys 0m15.386s > real 1m25.426s user 2m43.388s sys 0m16.961s > > Unpatched: > real 1m26.872s user 2m46.999s sys 0m16.540s > real 1m28.187s user 2m47.084s sys 0m17.149s > real 1m24.814s user 2m43.311s sys 0m16.979s > real 1m25.011s user 2m43.184s sys 0m16.975s > real 1m25.536s user 2m44.577s sys 0m16.784s > > -- > Davide > > "There are no solved problems; there are only problems that are more > or less solved" -- Henri Poincare > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150318/1ce1dda1/attachment.html>
On Tue, Mar 17, 2015 at 10:20 PM, Bruce Hoult <bruce at hoult.org> wrote:> With a 4 second difference between best and worst runs in both cases, and > only 0.2 second difference between the best for the two different cases > (user time), I don't think you can make any conclusion that one is better > than the other! They are very very similar. >I wasn't (unfortunately) able to control that variance :( Reading the results again, I tend to agree though that the difference might be noise. I also think that the change per se doesn't introduce any benefit so i'm less inclined to keep going for this route. I don't exclude this might have some utility though, that's why I attached the patch to the mail, in case somebody wants to play with it a bit more. -- Davide
On Tue, Mar 17, 2015 at 9:36 PM, Davide Italiano <davide at freebsd.org> wrote:> On Mon, Mar 16, 2015 at 11:00 PM, David Blaikie <dblaikie at gmail.com> > wrote: > > > > > > On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org> > > wrote: > >> > >> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org> > >> wrote: > >> > > >> > Shankar's parallel for per-se didn't introduce any performance benefit > >> > (or regression). > >> > If the change I propose is safe, I would like to see Shankar's change > >> > in (and this on top of it). > >> > I have other related changes coming next, but I would like to tackle > >> > them one at a time. > >> > > >> > >> Here's an update. > >> > >> After http://reviews.llvm.org/D8372 , I updated the profiling data. > >> > >> https://people.freebsd.org/~davide/llvm/lld-03162015.svg > >> It seems now 85% of CPU time is spent inside > >> FileArchive::buildTableOfContents(). > >> In particular, 35% of the samples are spent inserting into > >> unordered_map, so there's maybe something we can do differently there > >> (e.g. , Rui's proposal of a concurrent map doesn't seem that bad). > > > > > > Anyone tried a DenseMap instead of an unordered_map? If you need pointer > > validity to the elements, a DenseMap with unique_ptrs rather than direct > > values could be an option. Chandler's usual argument here is that walking > > the map is cheap with high locality (as in a DenseMap) even if the nodes > > themselves involve indirection. Could be worth an experiment. > > > > I did now. It actually makes things slower for the aforementioned > workload (linking clang). It was worth trying though. > > Patch, in case somebody wants to try at home: > https://people.freebsd.org/~davide/llvm/densemap_membermap.diffFYI we have StringMap which is specialized for strings. Also I'm sort of amazed that StringMap is using HashString (bernstein) instead of the fairly sophisticated hash functionality we have in ADT/Hashing.h -- Sean Silva> > > Patched: > real 1m27.849s user 2m47.373s sys 0m16.370s > real 1m29.583s user 2m47.771s sys 0m16.816s > real 1m25.956s user 2m43.397s sys 0m15.254s > real 1m29.380s user 2m47.618s sys 0m15.386s > real 1m25.426s user 2m43.388s sys 0m16.961s > > Unpatched: > real 1m26.872s user 2m46.999s sys 0m16.540s > real 1m28.187s user 2m47.084s sys 0m17.149s > real 1m24.814s user 2m43.311s sys 0m16.979s > real 1m25.011s user 2m43.184s sys 0m16.975s > real 1m25.536s user 2m44.577s sys 0m16.784s > > -- > Davide > > "There are no solved problems; there are only problems that are more > or less solved" -- Henri Poincare > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150318/d085dbbc/attachment.html>
On Wed, Mar 18, 2015 at 9:35 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Tue, Mar 17, 2015 at 9:36 PM, Davide Italiano <davide at freebsd.org> wrote: >> >> On Mon, Mar 16, 2015 at 11:00 PM, David Blaikie <dblaikie at gmail.com> >> wrote: >> > >> > >> > On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org> >> > wrote: >> >> >> >> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org> >> >> wrote: >> >> > >> >> > Shankar's parallel for per-se didn't introduce any performance >> >> > benefit >> >> > (or regression). >> >> > If the change I propose is safe, I would like to see Shankar's change >> >> > in (and this on top of it). >> >> > I have other related changes coming next, but I would like to tackle >> >> > them one at a time. >> >> > >> >> >> >> Here's an update. >> >> >> >> After http://reviews.llvm.org/D8372 , I updated the profiling data. >> >> >> >> https://people.freebsd.org/~davide/llvm/lld-03162015.svg >> >> It seems now 85% of CPU time is spent inside >> >> FileArchive::buildTableOfContents(). >> >> In particular, 35% of the samples are spent inserting into >> >> unordered_map, so there's maybe something we can do differently there >> >> (e.g. , Rui's proposal of a concurrent map doesn't seem that bad). >> > >> > >> > Anyone tried a DenseMap instead of an unordered_map? If you need pointer >> > validity to the elements, a DenseMap with unique_ptrs rather than direct >> > values could be an option. Chandler's usual argument here is that >> > walking >> > the map is cheap with high locality (as in a DenseMap) even if the nodes >> > themselves involve indirection. Could be worth an experiment. >> > >> >> I did now. It actually makes things slower for the aforementioned >> workload (linking clang). It was worth trying though. >> >> Patch, in case somebody wants to try at home: >> https://people.freebsd.org/~davide/llvm/densemap_membermap.diff > > > FYI we have StringMap which is specialized for strings. Also I'm sort of > amazed that StringMap is using HashString (bernstein) instead of the fairly > sophisticated hash functionality we have in ADT/Hashing.hI tried using the new hashing back when Hashing.h was written. The result was that there were fewer collisions, but the hash function was significantly slower. Result was a net regression in runtime. - Ben> >> >> >> >> Patched: >> real 1m27.849s user 2m47.373s sys 0m16.370s >> real 1m29.583s user 2m47.771s sys 0m16.816s >> real 1m25.956s user 2m43.397s sys 0m15.254s >> real 1m29.380s user 2m47.618s sys 0m15.386s >> real 1m25.426s user 2m43.388s sys 0m16.961s >> >> Unpatched: >> real 1m26.872s user 2m46.999s sys 0m16.540s >> real 1m28.187s user 2m47.084s sys 0m17.149s >> real 1m24.814s user 2m43.311s sys 0m16.979s >> real 1m25.011s user 2m43.184s sys 0m16.975s >> real 1m25.536s user 2m44.577s sys 0m16.784s >> >> -- >> Davide >> >> "There are no solved problems; there are only problems that are more >> or less solved" -- Henri Poincare >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
> Le 18 mars 2015 à 21:35, Sean Silva <chisophugis at gmail.com> a écrit : > > > > On Tue, Mar 17, 2015 at 9:36 PM, Davide Italiano <davide at freebsd.org <mailto:davide at freebsd.org>> wrote: > On Mon, Mar 16, 2015 at 11:00 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote: > > > > > > On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org <mailto:davide at freebsd.org>> > > wrote: > >> > >> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org <mailto:davide at freebsd.org>> > >> wrote: > >> > > >> > Shankar's parallel for per-se didn't introduce any performance benefit > >> > (or regression). > >> > If the change I propose is safe, I would like to see Shankar's change > >> > in (and this on top of it). > >> > I have other related changes coming next, but I would like to tackle > >> > them one at a time. > >> > > >> > >> Here's an update. > >> > >> After http://reviews.llvm.org/D8372 <http://reviews.llvm.org/D8372> , I updated the profiling data. > >> > >> https://people.freebsd.org/~davide/llvm/lld-03162015.svg <https://people.freebsd.org/~davide/llvm/lld-03162015.svg> > >> It seems now 85% of CPU time is spent inside > >> FileArchive::buildTableOfContents(). > >> In particular, 35% of the samples are spent inserting into > >> unordered_map, so there's maybe something we can do differently there > >> (e.g. , Rui's proposal of a concurrent map doesn't seem that bad). > > > > > > Anyone tried a DenseMap instead of an unordered_map? If you need pointer > > validity to the elements, a DenseMap with unique_ptrs rather than direct > > values could be an option. Chandler's usual argument here is that walking > > the map is cheap with high locality (as in a DenseMap) even if the nodes > > themselves involve indirection. Could be worth an experiment. > > > > I did now. It actually makes things slower for the aforementioned > workload (linking clang). It was worth trying though. > > Patch, in case somebody wants to try at home: > https://people.freebsd.org/~davide/llvm/densemap_membermap.diff <https://people.freebsd.org/~davide/llvm/densemap_membermap.diff> > > FYI we have StringMap which is specialized for strings. Also I'm sort of amazed that StringMap is using HashString (bernstein) instead of the fairly sophisticated hash functionality we have in ADT/Hashing.h >While StringMap is very useful, in lld we rarely need to have a copy of the StringRef keys that we can avoid by using DenseMap<StringRef>. I don’t know it this is a significant memory of processor optimization though. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150318/c6a7b025/attachment.html>