Sean Silva via llvm-dev
2015-Dec-12 00:48 UTC
[llvm-dev] Memory utilization problems in profile reader
On Wed, Dec 9, 2015 at 12:14 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Can you extract the relevant part of the heap profile data? How large is
> the sample profile data fed to the compiler?
>
> The indexed format profile size for clang is <100MB. The InstrProfRecord
> for each function is read, used and discarded one at a time, so there
> should not be a problem like the one described.

If I'm reading the code right, we are also doing O(number of hash table keys) memory allocation in the indexed reader here:
http://llvm.org/docs/doxygen/html/classllvm_1_1InstrProfReaderIndex.html#acc49fd2c0a8c8dfc3e29b01e09869af7

That seems unnecessary. (It seems to be used for value profiling for some reason?)

-- Sean Silva

> David
>
> On Wed, Dec 9, 2015 at 7:52 AM, Diego Novillo via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> I've been experimenting with profiled bootstraps using sample profiles.
>> Initially, I made stage2 build stage3 while running under Perf. This
>> produced a 20 GB profile that took too long to convert to LLVM and used
>> ~30 GB of RAM. So, I decided that this was not going to be very useful for
>> general usage.
>>
>> I then changed the bootstrap to run each individual compile under Perf
>> instead. This produced ~2,200 profiles, each of which took up to 1 minute
>> to convert, and then they all had to be merged into a single profile. I
>> didn't like that either.
>>
>> Since all compiles are more or less the same in terms of what the
>> compiler does, I decided to take the 10 biggest profiles and merge
>> those. That seemed to work. This resulted in a 21 MB profile that I could
>> use as input to -fprofile-sample-use.
>>
>> I started stage 3 of the bootstrap and left it running. I noticed it was
>> slow, so I thought "we'll need to speed things up". The build never
>> finished. Instead, ninja crashed my machine.
>>
>> It turns out that each clang invocation was growing to 4 GB of RSS. All
>> that memory is being allocated by the profile reader
>> (https://drive.google.com/file/d/0B9lq1VKvmXKFQVp1cGtZM2RSdWc/view?usp=sharing).
>>
>> So, heads up: we need to trim it down. Perhaps by loading only one
>> function profile at a time, using it, and then actively discarding it. Or
>> simply by being better at flushing the reader data structures as they are
>> used during annotation. I'll be sending patches about this in the coming
>> days.
>>
>> It's likely that the sample reader is doing something silly here.
>> Duncan, Justin, do you remember any issues like this one with
>> instrumentation? I'll be trying a similar experiment with it after I'm
>> done with the biggest issues in the sampler.
>>
>> Thanks. Diego.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151211/b939c433/attachment.html>
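Diego's suggestion (load one function profile at a time, use it, discard it) and David's description of how each InstrProfRecord is handled contrast two reader designs. The C++ sketch below is a toy illustration of that contrast, not LLVM's reader: the text record format `name:c1,c2,...` and the names `FunctionProfile`, `parseRecord`, `EagerProfileReader` and `LazyProfileReader` are all hypothetical. The eager reader parses and keeps every record up front; the lazy reader keeps only a small name-to-offset index and parses a record when asked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-function profile: a flat list of counters.
struct FunctionProfile {
  std::vector<uint64_t> Counts;
};

// Parses one record of the hypothetical form "name:c1,c2,..." into Out
// and returns the function name.
std::string parseRecord(std::string_view Rec, FunctionProfile &Out) {
  std::size_t Colon = Rec.find(':');
  if (Colon == std::string_view::npos)
    return std::string(Rec);
  std::stringstream SS{std::string(Rec.substr(Colon + 1))};
  std::string Tok;
  while (std::getline(SS, Tok, ','))
    if (!Tok.empty())
      Out.Counts.push_back(std::stoull(Tok));
  return std::string(Rec.substr(0, Colon));
}

// Eager design: every record is parsed and kept for the reader's whole
// lifetime, so memory grows with the size of the entire profile.
class EagerProfileReader {
public:
  explicit EagerProfileReader(const std::string &Buf) {
    std::stringstream SS(Buf);
    std::string Line;
    while (std::getline(SS, Line)) {
      if (Line.empty())
        continue;
      FunctionProfile P;
      std::string Name = parseRecord(Line, P);
      Profiles[Name] = std::move(P);
    }
  }
  const FunctionProfile *get(const std::string &Name) const {
    auto It = Profiles.find(Name);
    return It == Profiles.end() ? nullptr : &It->second;
  }

private:
  std::unordered_map<std::string, FunctionProfile> Profiles;
};

// Lazy design: only a name -> (offset, length) index is built; a record is
// parsed on demand and handed to the caller, who drops it when done.
class LazyProfileReader {
public:
  explicit LazyProfileReader(std::string Buf) : Buffer(std::move(Buf)) {
    std::size_t Start = 0;
    while (Start < Buffer.size()) {
      std::size_t End = Buffer.find('\n', Start);
      if (End == std::string::npos)
        End = Buffer.size();
      if (End > Start) {
        std::string_view Rec(Buffer.data() + Start, End - Start);
        std::string Name(Rec.substr(0, Rec.find(':')));
        Index[Name] = {Start, End - Start};
      }
      Start = End + 1;
    }
  }
  // The returned profile is owned by the caller and freed when it goes
  // out of scope; the reader keeps no parsed records.
  std::optional<FunctionProfile> load(const std::string &Name) const {
    auto It = Index.find(Name);
    if (It == Index.end())
      return std::nullopt;
    FunctionProfile P;
    std::string_view All(Buffer);
    parseRecord(All.substr(It->second.first, It->second.second), P);
    return P;
  }

private:
  std::string Buffer; // stands in for a memory-mapped profile file
  std::unordered_map<std::string, std::pair<std::size_t, std::size_t>> Index;
};
```

A caller annotating one function at a time would call `load()`, use the counts, and let the result go out of scope before moving on, so the reader's own footprint is the raw buffer plus the index rather than every parsed record. Note that the lazy index is still proportional to the number of functions; it is just much smaller per entry.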
Xinliang David Li via llvm-dev
2015-Dec-12 03:19 UTC
[llvm-dev] Memory utilization problems in profile reader
On Fri, Dec 11, 2015 at 4:48 PM, Sean Silva <chisophugis at gmail.com> wrote:

> If I'm reading the code right, we are also doing O(keys of the hash table)
> memory allocation in the indexed reader here:
> http://llvm.org/docs/doxygen/html/classllvm_1_1InstrProfReaderIndex.html#acc49fd2c0a8c8dfc3e29b01e09869af7
> ?
> That seems unnecessary. (it seems to be used for value profiling stuff for
> some reason?)

It is for value profiling: it is used to convert on-disk callee target
values (stored as MD5 hashes) to unique string pointers when a function
record's VP data is read from memory. I will check its memory overhead at
some point. As a matter of fact, this translation is not strictly needed. I
actually wanted to get rid of it but have not found the time yet; it is on
my TODO list.

David
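To make the value-profiling point concrete, here is a hypothetical C++ sketch; it is not LLVM's `InstrProfReaderIndex` or symbol-table code. FNV-1a stands in for the MD5-based name hash David mentions, and `hashName`, `ValueSiteRecord`, `EagerSymtab` and `countForCallee` are invented names. `EagerSymtab` builds one hash-to-name entry per function before any record is read, which is the O(number of keys) cost Sean points out. `countForCallee` shows one way a consumer could skip the translation: hash only the callee it cares about and compare hashes directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in for the MD5-derived 64-bit name hash discussed in the thread.
// FNV-1a is used only to keep the sketch self-contained.
uint64_t hashName(std::string_view S) {
  uint64_t H = 14695981039346656037ULL; // FNV-1a offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL; // FNV-1a prime
  }
  return H;
}

// Hypothetical value-profile data for one call site as read from disk:
// callee targets are stored as hashes, not names.
struct ValueSiteRecord {
  std::vector<std::pair<uint64_t, uint64_t>> Targets; // (callee hash, count)
};

// Eager translation: one entry per function name in the profile, built up
// front whether or not any record ever consults it.
class EagerSymtab {
public:
  explicit EagerSymtab(const std::vector<std::string> &Names) {
    for (const std::string &N : Names) {
      // Pointers to unordered_set elements stay valid across rehashing.
      const std::string *Unique = &*Storage.insert(N).first;
      HashToName.emplace(hashName(N), Unique);
    }
  }
  EagerSymtab(const EagerSymtab &) = delete; // map points into Storage
  EagerSymtab &operator=(const EagerSymtab &) = delete;

  const std::string *lookup(uint64_t H) const {
    auto It = HashToName.find(H);
    return It == HashToName.end() ? nullptr : It->second;
  }
  std::size_t size() const { return HashToName.size(); }

private:
  std::unordered_set<std::string> Storage;
  std::unordered_map<uint64_t, const std::string *> HashToName;
};

// Translation-free alternative: hash only the callee of interest and
// compare against the stored hashes, so no per-key table is built.
uint64_t countForCallee(const ValueSiteRecord &R, std::string_view Callee) {
  const uint64_t H = hashName(Callee);
  uint64_t Total = 0;
  for (const auto &[TargetHash, Count] : R.Targets)
    if (TargetHash == H)
      Total += Count;
  return Total;
}
```

The trade-off sketched here: the eager table pays its cost once per compile regardless of how many value-profile sites are queried, while the hash-comparison approach pays nothing up front but can only answer "was this particular callee a target?" rather than "what are the names of all targets?".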
Sean Silva via llvm-dev
2015-Dec-14 22:31 UTC
[llvm-dev] Memory utilization problems in profile reader
On Fri, Dec 11, 2015 at 7:19 PM, Xinliang David Li <xinliangli at gmail.com> wrote:

> It is for value profiling -- it is used to convert on-disk callee target
> value (in md5) to unique string pointer when the function record's VP data
> is read from memory. I will check its memory overhead at some point. This
> (the translation) is not strictly needed as a matter of fact (which I
> actually wanted to get rid of, but did not find time to do yet -- it is on
> my TODO list).

Thanks. Good to know it is on your radar.

-- Sean Silva