Xinliang David Li via llvm-dev
2015-Sep-05 05:11 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com> > wrote: >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com> wrote: >> > >> > >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <davidxl at google.com> >> > wrote: >> >> >> >> > >> >> > I think it is reasonable to simply replace the key we currently use >> >> > with >> >> > MD5(key) for getting a size reduction. In practice for my use cases, >> >> > I >> >> > have >> >> > not observed any of the issues you mentioned under "Large size of >> >> > overhead >> >> > can limit the usability of PGO greatly", but I can understand that >> >> > some >> >> > of >> >> > these issues could become problems in Google's use case. I would >> >> > personally >> >> > prefer to keep the existing behavior as the default (see below), and >> >> > have >> >> > MD5(key) as an option. >> >> >> >> The problem is that this requires profile format changes. >> > >> > >> > Why? AFAIK the "function name" is just an arbitrary string. Using s or >> > MD5(s) shouldn't matter. Of course, the user will need to pass >> > consistent >> > flags to clang. >> >> The raw format for 64bit target can be made 'unchanged', but not for >> the 32bit raw format -- the nameptr field is only 32bit. >> >> The indexed format can not be made the same -- The ondisk profile >> record layout totally changes. The key field changes from a blob of >> chars into an 64bit integer. > > > An MD5 sum cannot be represented as a blob of chars?yes -- it is fixed length (8byte) blob which may include null byte in the middle.> > Or to say it another way, suppose that Itanium mangling required as a final > step to replace the string with its md5 sum in hex. Therefore all symbol > names are "small". My understanding is that this is effectively all your > patch is doing.The key type before the change is StringRef, while the after the key type is uint64_t. Are you suggesting treating uint64_t md5 sum key as a string of 8 bytes or storing md5 has in text form which will double the size? In the raw format, md5 sum key can be an embedded field in the prf_data variable instead of as different var referenced by prf_data.> > If this is not the case, you should show your current patch so that we can > discuss things concretely.It is not. See above about the difference.> >> >> > >> > >> >> >> >> It will be >> >> very messy to support multiple formats in instr-codegen and >> >> instr-runtime. For compatibility concerns, the reader is taught to >> >> support previous format, but the changes there are isolated (also >> >> expected to be removed in the future). >> >> >> >> > >> >> > My primary concern is that if the function name are not kept at all >> >> > stages, >> >> > then it becomes difficult to analyze the profile data in a standalone >> >> > way. >> >> > Many times, I have used `llvm-profdata show -all-functions >> >> > foo.profdata` >> >> > on >> >> > the resulting profile data and then imported that data into >> >> > Mathematica >> >> > for >> >> > analysis. >> >> >> >> This is certainly a very valid use case. >> >> >> >> >My understanding of your proposal is that `llvm-profdata show >> >> > -all-functions foo.profdata` will not show the actual function names >> >> > but >> >> > instead MD5 hashes, >> >> >> >> Yes. >> >> >> >> To support your use case, there are two solutions: >> >> >> >> 1) user can add -fcoverage-mapping option in the build >> >> 2) introduce a new option -fprofile-instr-names that force the >> >> emission of the name sections in the .o file. This is similar to 1), >> >> but no covmap section is needed. >> >> >> >> llvm-profdata tool will be taught to read the name section and attach >> >> function names to the profile records. >> > >> > >> > Needing to pass the executable to llvm-profdata would cause deployment >> > issues for my customers in practice. >> >> Why? The deployment needs to pass the profile data anyway right? > > > Yes, but not the executable. > > The PGO training run is likely being run by a gameplay tester > (non-programmer). In general the binary will not be lying around as a loose > file anywhere, it will be part of a full package of the binary+assets (think > like what will end up on a bluray disc). A game's binary *completely > useless* without the assets, so except locally on a programmer's machine > while they iterate/debug, there is no reason for a binary to ever exist as a > standalone file. > > I'm not saying that needing the binary is insurmountable in any particular > scenario. Just that it will cause a strict increase in the number of issues > to deploying PGO.Your concern is acknowledged.> > These are much bigger "compatibility concerns" for me than for newer > toolchains to accept the old format. For a change in format I can easily > tell my users to replace an exe with a newer one and that is all they need > to do and it takes 10 seconds, guaranteed. A workflow change is potentially > a massive disruption and guaranteed to take more than 10 seconds to fix > (perhaps hours or days).ok.> > >> >> This >> is no different from llvm-cov usage model. > > > In practice, getting the performance of PGO is a higher priority for my > users, so we should not assume that llvm-cov is being used.Glad to hear that :) thanks, David> > -- Sean Silva > >> >> >> David >> >> >> >> > >> >> >> >> >> >> Note that with 1) or 2), the user can still benefit from the reduced >> >> profile size. >> > >> > >> > Let me reiterate that the size of the profile is not a problem I have >> > observed in practice (nor have I heard of this being a problem in >> > practice >> > until this thread). Therefore I'm skeptical of any changes to our >> > default >> > behavior or any new requirements that are not opt-in. >> > >> > -- Sean Silva >> > >> >> >> >> >> >> thanks, >> >> >> >> David >> >> >> >> >> >> >> >> >> >> >which will make it more difficult for me to do this kind >> >> > of analysis (would require using nm on the original binary, hashing >> >> > everything, etc.). >> >> > >> >> > btw, feel free to attach the patch even if it in a rough state. It >> >> > can >> >> > still >> >> > help to clarify the proposal and be a good talking point. >> >> > Fine-grained >> >> > patch >> >> > review for caring about the rough parts will happen on llvm-commits; >> >> > the >> >> > rough parts will not distract the discussion here on llvm-dev. >> >> > >> >> > -- Sean Silva >> >> > >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> David >> >> >> _______________________________________________ >> >> >> LLVM Developers mailing list >> >> >> llvm-dev at lists.llvm.org >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > >> >> > >> > >> > > >
Sean Silva via llvm-dev
2015-Sep-05 06:03 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at google.com> wrote:> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote: > > > > > > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com> > > wrote: > >> > >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com> > wrote: > >> > > >> > > >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <davidxl at google.com > > > >> > wrote: > >> >> > >> >> > > >> >> > I think it is reasonable to simply replace the key we currently use > >> >> > with > >> >> > MD5(key) for getting a size reduction. In practice for my use > cases, > >> >> > I > >> >> > have > >> >> > not observed any of the issues you mentioned under "Large size of > >> >> > overhead > >> >> > can limit the usability of PGO greatly", but I can understand that > >> >> > some > >> >> > of > >> >> > these issues could become problems in Google's use case. I would > >> >> > personally > >> >> > prefer to keep the existing behavior as the default (see below), > and > >> >> > have > >> >> > MD5(key) as an option. > >> >> > >> >> The problem is that this requires profile format changes. > >> > > >> > > >> > Why? AFAIK the "function name" is just an arbitrary string. Using s or > >> > MD5(s) shouldn't matter. Of course, the user will need to pass > >> > consistent > >> > flags to clang. > >> > >> The raw format for 64bit target can be made 'unchanged', but not for > >> the 32bit raw format -- the nameptr field is only 32bit. > >> > >> The indexed format can not be made the same -- The ondisk profile > >> record layout totally changes. The key field changes from a blob of > >> chars into an 64bit integer. > > > > > > An MD5 sum cannot be represented as a blob of chars? > > yes -- it is fixed length (8byte) blob which may include null byte in > the middle. >For reference, MD5 sum is 16 bytes (128-bit): https://en.wikipedia.org/wiki/MD5> > > > > Or to say it another way, suppose that Itanium mangling required as a > final > > step to replace the string with its md5 sum in hex. Therefore all symbol > > names are "small". My understanding is that this is effectively all your > > patch is doing. > > The key type before the change is StringRef, while the after the key > type is uint64_t. Are you suggesting treating uint64_t md5 sum key as > a string of 8 bytes or storing md5 has in text form which will double > the size? >How much does this change the benefit? If most of the benefit is avoiding extraordinarily long mangled names then it may be sufficient. With IR-level instrumentation like Rong is pursuing the size may be reduced sufficiently that we do not need the optimization proposed in this thread. For example, Rong found >2x size reduction on Google's C++ benchmarks, which I assume are representative of the extremely large Google binaries that are causing the problems addressed by your proposal in this thread. The measurements you mention for Clang in this thread provide similar size reductions, so Rong's approach may be sufficient (especially because functions with extremely large mangled names tend to be small inline functions in header-only template libraries). Of the points you mention in "Large size of overhead can limit the usability of PGO greatly", many of the issues are hard limits that prevent the use of PGO. Do you have a lower bound on how much the size of the PGO data must be reduced in order to overcome the hard limits? Obviously LLVM must be able to support the extremely large binaries in your configuration (otherwise what use is LLVM as a compiler ;) My questions are primarily aimed at establishing which tradeoffs are acceptable for supporting this (both for LLVM and for you guys). Btw, for us, the issue of PGO data size is not completely immaterial but is very different from your use case. For us, the primary issue is the additional memory use at run time, since PS4 games usually use "all" available memory. We had a problem with UBSan where the large amount of memory required for storing the UBSan diagnostic data at runtime required the game programmers to manually change their memory map to make room. +Filipe, do you remember how much memory UBSan was using that caused a problem? -- Sean Silva> > In the raw format, md5 sum key can be an embedded field in the > prf_data variable instead of as different var referenced by prf_data. > > > > > If this is not the case, you should show your current patch so that we > can > > discuss things concretely. > > It is not. See above about the difference. > > > > >> > >> > > >> > > >> >> > >> >> It will be > >> >> very messy to support multiple formats in instr-codegen and > >> >> instr-runtime. For compatibility concerns, the reader is taught to > >> >> support previous format, but the changes there are isolated (also > >> >> expected to be removed in the future). > >> >> > >> >> > > >> >> > My primary concern is that if the function name are not kept at all > >> >> > stages, > >> >> > then it becomes difficult to analyze the profile data in a > standalone > >> >> > way. > >> >> > Many times, I have used `llvm-profdata show -all-functions > >> >> > foo.profdata` > >> >> > on > >> >> > the resulting profile data and then imported that data into > >> >> > Mathematica > >> >> > for > >> >> > analysis. > >> >> > >> >> This is certainly a very valid use case. > >> >> > >> >> >My understanding of your proposal is that `llvm-profdata show > >> >> > -all-functions foo.profdata` will not show the actual function > names > >> >> > but > >> >> > instead MD5 hashes, > >> >> > >> >> Yes. > >> >> > >> >> To support your use case, there are two solutions: > >> >> > >> >> 1) user can add -fcoverage-mapping option in the build > >> >> 2) introduce a new option -fprofile-instr-names that force the > >> >> emission of the name sections in the .o file. This is similar to 1), > >> >> but no covmap section is needed. > >> >> > >> >> llvm-profdata tool will be taught to read the name section and > attach > >> >> function names to the profile records. > >> > > >> > > >> > Needing to pass the executable to llvm-profdata would cause deployment > >> > issues for my customers in practice. > >> > >> Why? The deployment needs to pass the profile data anyway right? > > > > > > Yes, but not the executable. > > > > The PGO training run is likely being run by a gameplay tester > > (non-programmer). In general the binary will not be lying around as a > loose > > file anywhere, it will be part of a full package of the binary+assets > (think > > like what will end up on a bluray disc). A game's binary *completely > > useless* without the assets, so except locally on a programmer's machine > > while they iterate/debug, there is no reason for a binary to ever exist > as a > > standalone file. > > > > I'm not saying that needing the binary is insurmountable in any > particular > > scenario. Just that it will cause a strict increase in the number of > issues > > to deploying PGO. > > Your concern is acknowledged. > > > > > These are much bigger "compatibility concerns" for me than for newer > > toolchains to accept the old format. For a change in format I can easily > > tell my users to replace an exe with a newer one and that is all they > need > > to do and it takes 10 seconds, guaranteed. A workflow change is > potentially > > a massive disruption and guaranteed to take more than 10 seconds to fix > > (perhaps hours or days). > > ok. > > > > > > >> > >> This > >> is no different from llvm-cov usage model. > > > > > > In practice, getting the performance of PGO is a higher priority for my > > users, so we should not assume that llvm-cov is being used. > > Glad to hear that :) > > thanks, > > David > > > > > -- Sean Silva > > > >> > >> > >> David > >> > >> > >> > >> > > >> >> > >> >> > >> >> Note that with 1) or 2), the user can still benefit from the reduced > >> >> profile size. > >> > > >> > > >> > Let me reiterate that the size of the profile is not a problem I have > >> > observed in practice (nor have I heard of this being a problem in > >> > practice > >> > until this thread). Therefore I'm skeptical of any changes to our > >> > default > >> > behavior or any new requirements that are not opt-in. > >> > > >> > -- Sean Silva > >> > > >> >> > >> >> > >> >> thanks, > >> >> > >> >> David > >> >> > >> >> > >> >> > >> >> > >> >> >which will make it more difficult for me to do this kind > >> >> > of analysis (would require using nm on the original binary, hashing > >> >> > everything, etc.). > >> >> > > >> >> > btw, feel free to attach the patch even if it in a rough state. It > >> >> > can > >> >> > still > >> >> > help to clarify the proposal and be a good talking point. > >> >> > Fine-grained > >> >> > patch > >> >> > review for caring about the rough parts will happen on > llvm-commits; > >> >> > the > >> >> > rough parts will not distract the discussion here on llvm-dev. > >> >> > > >> >> > -- Sean Silva > >> >> > > >> >> >> > >> >> >> > >> >> >> thanks, > >> >> >> > >> >> >> David > >> >> >> _______________________________________________ > >> >> >> LLVM Developers mailing list > >> >> >> llvm-dev at lists.llvm.org > >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >> >> > > >> >> > > >> > > >> > > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150904/74dc3e58/attachment.html>
Sean Silva via llvm-dev
2015-Sep-05 06:17 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at google.com> > wrote: > >> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote: >> > >> > >> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com> >> > wrote: >> >> >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com> >> wrote: >> >> > >> >> > >> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li < >> davidxl at google.com> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think it is reasonable to simply replace the key we currently >> use >> >> >> > with >> >> >> > MD5(key) for getting a size reduction. In practice for my use >> cases, >> >> >> > I >> >> >> > have >> >> >> > not observed any of the issues you mentioned under "Large size of >> >> >> > overhead >> >> >> > can limit the usability of PGO greatly", but I can understand that >> >> >> > some >> >> >> > of >> >> >> > these issues could become problems in Google's use case. I would >> >> >> > personally >> >> >> > prefer to keep the existing behavior as the default (see below), >> and >> >> >> > have >> >> >> > MD5(key) as an option. >> >> >> >> >> >> The problem is that this requires profile format changes. >> >> > >> >> > >> >> > Why? AFAIK the "function name" is just an arbitrary string. Using s >> or >> >> > MD5(s) shouldn't matter. Of course, the user will need to pass >> >> > consistent >> >> > flags to clang. >> >> >> >> The raw format for 64bit target can be made 'unchanged', but not for >> >> the 32bit raw format -- the nameptr field is only 32bit. >> >> >> >> The indexed format can not be made the same -- The ondisk profile >> >> record layout totally changes. The key field changes from a blob of >> >> chars into an 64bit integer. >> > >> > >> > An MD5 sum cannot be represented as a blob of chars? >> >> yes -- it is fixed length (8byte) blob which may include null byte in >> the middle. >> > > For reference, MD5 sum is 16 bytes (128-bit): > https://en.wikipedia.org/wiki/MD5 > > >> >> > >> > Or to say it another way, suppose that Itanium mangling required as a >> final >> > step to replace the string with its md5 sum in hex. Therefore all symbol >> > names are "small". My understanding is that this is effectively all your >> > patch is doing. >> >> The key type before the change is StringRef, while the after the key >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as >> a string of 8 bytes or storing md5 has in text form which will double >> the size? >> > > How much does this change the benefit? If most of the benefit is avoiding > extraordinarily long mangled names then it may be sufficient. > > With IR-level instrumentation like Rong is pursuing the size may be > reduced sufficiently that we do not need the optimization proposed in this > thread. For example, Rong found >2x size reduction >To clarify, I mean the size of the PGO data, not the binary size. Column (2) in section 3.5 of Rong's RFC to be precise. -- Sean Silva> on Google's C++ benchmarks, which I assume are representative of the > extremely large Google binaries that are causing the problems addressed by > your proposal in this thread. The measurements you mention for Clang in > this thread provide similar size reductions, so Rong's approach may be > sufficient (especially because functions with extremely large mangled names > tend to be small inline functions in header-only template libraries). > > Of the points you mention in "Large size of overhead can limit the > usability of PGO greatly", many of the issues are hard limits that prevent > the use of PGO. Do you have a lower bound on how much the size of the PGO > data must be reduced in order to overcome the hard limits? > > Obviously LLVM must be able to support the extremely large binaries in > your configuration (otherwise what use is LLVM as a compiler ;) My > questions are primarily aimed at establishing which tradeoffs are > acceptable for supporting this (both for LLVM and for you guys). > > Btw, for us, the issue of PGO data size is not completely immaterial but > is very different from your use case. For us, the primary issue is the > additional memory use at run time, since PS4 games usually use "all" > available memory. We had a problem with UBSan where the large amount of > memory required for storing the UBSan diagnostic data at runtime required > the game programmers to manually change their memory map to make room. > +Filipe, do you remember how much memory UBSan was using that caused a > problem? > > -- Sean Silva > > >> >> In the raw format, md5 sum key can be an embedded field in the >> prf_data variable instead of as different var referenced by prf_data. >> >> > >> > If this is not the case, you should show your current patch so that we >> can >> > discuss things concretely. >> >> It is not. See above about the difference. >> >> > >> >> >> >> > >> >> > >> >> >> >> >> >> It will be >> >> >> very messy to support multiple formats in instr-codegen and >> >> >> instr-runtime. For compatibility concerns, the reader is taught to >> >> >> support previous format, but the changes there are isolated (also >> >> >> expected to be removed in the future). >> >> >> >> >> >> > >> >> >> > My primary concern is that if the function name are not kept at >> all >> >> >> > stages, >> >> >> > then it becomes difficult to analyze the profile data in a >> standalone >> >> >> > way. >> >> >> > Many times, I have used `llvm-profdata show -all-functions >> >> >> > foo.profdata` >> >> >> > on >> >> >> > the resulting profile data and then imported that data into >> >> >> > Mathematica >> >> >> > for >> >> >> > analysis. >> >> >> >> >> >> This is certainly a very valid use case. >> >> >> >> >> >> >My understanding of your proposal is that `llvm-profdata show >> >> >> > -all-functions foo.profdata` will not show the actual function >> names >> >> >> > but >> >> >> > instead MD5 hashes, >> >> >> >> >> >> Yes. >> >> >> >> >> >> To support your use case, there are two solutions: >> >> >> >> >> >> 1) user can add -fcoverage-mapping option in the build >> >> >> 2) introduce a new option -fprofile-instr-names that force the >> >> >> emission of the name sections in the .o file. This is similar to 1), >> >> >> but no covmap section is needed. >> >> >> >> >> >> llvm-profdata tool will be taught to read the name section and >> attach >> >> >> function names to the profile records. >> >> > >> >> > >> >> > Needing to pass the executable to llvm-profdata would cause >> deployment >> >> > issues for my customers in practice. >> >> >> >> Why? The deployment needs to pass the profile data anyway right? >> > >> > >> > Yes, but not the executable. >> > >> > The PGO training run is likely being run by a gameplay tester >> > (non-programmer). In general the binary will not be lying around as a >> loose >> > file anywhere, it will be part of a full package of the binary+assets >> (think >> > like what will end up on a bluray disc). A game's binary *completely >> > useless* without the assets, so except locally on a programmer's machine >> > while they iterate/debug, there is no reason for a binary to ever exist >> as a >> > standalone file. >> > >> > I'm not saying that needing the binary is insurmountable in any >> particular >> > scenario. Just that it will cause a strict increase in the number of >> issues >> > to deploying PGO. >> >> Your concern is acknowledged. >> >> > >> > These are much bigger "compatibility concerns" for me than for newer >> > toolchains to accept the old format. For a change in format I can easily >> > tell my users to replace an exe with a newer one and that is all they >> need >> > to do and it takes 10 seconds, guaranteed. A workflow change is >> potentially >> > a massive disruption and guaranteed to take more than 10 seconds to fix >> > (perhaps hours or days). >> >> ok. >> >> > >> > >> >> >> >> This >> >> is no different from llvm-cov usage model. >> > >> > >> > In practice, getting the performance of PGO is a higher priority for my >> > users, so we should not assume that llvm-cov is being used. >> >> Glad to hear that :) >> >> thanks, >> >> David >> >> > >> > -- Sean Silva >> > >> >> >> >> >> >> David >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> Note that with 1) or 2), the user can still benefit from the reduced >> >> >> profile size. >> >> > >> >> > >> >> > Let me reiterate that the size of the profile is not a problem I have >> >> > observed in practice (nor have I heard of this being a problem in >> >> > practice >> >> > until this thread). Therefore I'm skeptical of any changes to our >> >> > default >> >> > behavior or any new requirements that are not opt-in. >> >> > >> >> > -- Sean Silva >> >> > >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> David >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >which will make it more difficult for me to do this kind >> >> >> > of analysis (would require using nm on the original binary, >> hashing >> >> >> > everything, etc.). >> >> >> > >> >> >> > btw, feel free to attach the patch even if it in a rough state. It >> >> >> > can >> >> >> > still >> >> >> > help to clarify the proposal and be a good talking point. >> >> >> > Fine-grained >> >> >> > patch >> >> >> > review for caring about the rough parts will happen on >> llvm-commits; >> >> >> > the >> >> >> > rough parts will not distract the discussion here on llvm-dev. >> >> >> > >> >> >> > -- Sean Silva >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> >> >> David >> >> >> >> _______________________________________________ >> >> >> >> LLVM Developers mailing list >> >> >> >> llvm-dev at lists.llvm.org >> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> > >> >> >> > >> >> > >> >> > >> > >> > >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150904/4a1b93ce/attachment.html>
Xinliang David Li via llvm-dev
2015-Sep-05 06:44 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at google.com> > wrote: >> >> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote: >> > >> > >> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com> >> > wrote: >> >> >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com> >> >> wrote: >> >> > >> >> > >> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li >> >> > <davidxl at google.com> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think it is reasonable to simply replace the key we currently >> >> >> > use >> >> >> > with >> >> >> > MD5(key) for getting a size reduction. In practice for my use >> >> >> > cases, >> >> >> > I >> >> >> > have >> >> >> > not observed any of the issues you mentioned under "Large size of >> >> >> > overhead >> >> >> > can limit the usability of PGO greatly", but I can understand that >> >> >> > some >> >> >> > of >> >> >> > these issues could become problems in Google's use case. I would >> >> >> > personally >> >> >> > prefer to keep the existing behavior as the default (see below), >> >> >> > and >> >> >> > have >> >> >> > MD5(key) as an option. >> >> >> >> >> >> The problem is that this requires profile format changes. >> >> > >> >> > >> >> > Why? AFAIK the "function name" is just an arbitrary string. Using s >> >> > or >> >> > MD5(s) shouldn't matter. Of course, the user will need to pass >> >> > consistent >> >> > flags to clang. >> >> >> >> The raw format for 64bit target can be made 'unchanged', but not for >> >> the 32bit raw format -- the nameptr field is only 32bit. >> >> >> >> The indexed format can not be made the same -- The ondisk profile >> >> record layout totally changes. The key field changes from a blob of >> >> chars into an 64bit integer. >> > >> > >> > An MD5 sum cannot be represented as a blob of chars? >> >> yes -- it is fixed length (8byte) blob which may include null byte in >> the middle. > > > For reference, MD5 sum is 16 bytes (128-bit): > https://en.wikipedia.org/wiki/MD5yes, LLVM's MD5 hash only takes the lower 64bit.> >> >> >> > >> > Or to say it another way, suppose that Itanium mangling required as a >> > final >> > step to replace the string with its md5 sum in hex. Therefore all symbol >> > names are "small". My understanding is that this is effectively all your >> > patch is doing. >> >> The key type before the change is StringRef, while the after the key >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as >> a string of 8 bytes or storing md5 has in text form which will double >> the size? > > > How much does this change the benefit? If most of the benefit is avoiding > extraordinarily long mangled names then it may be sufficient. > > With IR-level instrumentation like Rong is pursuing the size may be reduced > sufficiently that we do not need the optimization proposed in this thread. > For example, Rong found >2x size reduction on Google's C++ benchmarks, which > I assume are representative of the extremely large Google binaries that are > causing the problems addressed by your proposal in this thread. The > measurements you mention for Clang in this thread provide similar size > reductions, so Rong's approach may be sufficient (especially because > functions with extremely large mangled names tend to be small inline > functions in header-only template libraries).Late instrumentation helps many cases. In some cases (as shown in SPEC), the reduction in size is not as large. Reducing PGO overhead will lower the bar for its adoption.> > Of the points you mention in "Large size of overhead can limit the usability > of PGO greatly", many of the issues are hard limits that prevent the use of > PGO. Do you have a lower bound on how much the size of the PGO data must be > reduced in order to overcome the hard limits?This is a static view: Think about the situation where application size is ever increasing; also think about situation where we want to collect more types of profile data. Think about situation where user want to run pgo binaries on small devices with tiny memory/storage ..> > Obviously LLVM must be able to support the extremely large binaries in your > configuration (otherwise what use is LLVM as a compiler ;) My questions are > primarily aimed at establishing which tradeoffs are acceptable for > supporting this (both for LLVM and for you guys).As I said, with the modified proposal (after getting your feedback), no PGO users in LLVM land is going to lose anything/functionality. The end result will be net win for general users of LLVM (even though your customers don't care about it), not just 'us' as you have mentioned many times.> > Btw, for us, the issue of PGO data size is not completely immaterial but is > very different from your use case. For us, the primary issue is the > additional memory use at run time, since PS4 games usually use "all" > available memory. We had a problem with UBSan where the large amount of > memory required for storing the UBSan diagnostic data at runtime required > the game programmers to manually change their memory map to make room. > +Filipe, do you remember how much memory UBSan was using that caused a > problem? >My proposal does help reducing rodata size significantly. David> -- Sean Silva > >> >> >> In the raw format, md5 sum key can be an embedded field in the >> prf_data variable instead of as different var referenced by prf_data. >> >> > >> > If this is not the case, you should show your current patch so that we >> > can >> > discuss things concretely. >> >> It is not. See above about the difference. >> >> > >> >> >> >> > >> >> > >> >> >> >> >> >> It will be >> >> >> very messy to support multiple formats in instr-codegen and >> >> >> instr-runtime. For compatibility concerns, the reader is taught to >> >> >> support previous format, but the changes there are isolated (also >> >> >> expected to be removed in the future). >> >> >> >> >> >> > >> >> >> > My primary concern is that if the function name are not kept at >> >> >> > all >> >> >> > stages, >> >> >> > then it becomes difficult to analyze the profile data in a >> >> >> > standalone >> >> >> > way. >> >> >> > Many times, I have used `llvm-profdata show -all-functions >> >> >> > foo.profdata` >> >> >> > on >> >> >> > the resulting profile data and then imported that data into >> >> >> > Mathematica >> >> >> > for >> >> >> > analysis. >> >> >> >> >> >> This is certainly a very valid use case. >> >> >> >> >> >> >My understanding of your proposal is that `llvm-profdata show >> >> >> > -all-functions foo.profdata` will not show the actual function >> >> >> > names >> >> >> > but >> >> >> > instead MD5 hashes, >> >> >> >> >> >> Yes. >> >> >> >> >> >> To support your use case, there are two solutions: >> >> >> >> >> >> 1) user can add -fcoverage-mapping option in the build >> >> >> 2) introduce a new option -fprofile-instr-names that force the >> >> >> emission of the name sections in the .o file. This is similar to 1), >> >> >> but no covmap section is needed. >> >> >> >> >> >> llvm-profdata tool will be taught to read the name section and >> >> >> attach >> >> >> function names to the profile records. >> >> > >> >> > >> >> > Needing to pass the executable to llvm-profdata would cause >> >> > deployment >> >> > issues for my customers in practice. >> >> >> >> Why? The deployment needs to pass the profile data anyway right? >> > >> > >> > Yes, but not the executable. >> > >> > The PGO training run is likely being run by a gameplay tester >> > (non-programmer). In general the binary will not be lying around as a >> > loose >> > file anywhere, it will be part of a full package of the binary+assets >> > (think >> > like what will end up on a bluray disc). A game's binary *completely >> > useless* without the assets, so except locally on a programmer's machine >> > while they iterate/debug, there is no reason for a binary to ever exist >> > as a >> > standalone file. >> > >> > I'm not saying that needing the binary is insurmountable in any >> > particular >> > scenario. Just that it will cause a strict increase in the number of >> > issues >> > to deploying PGO. >> >> Your concern is acknowledged. >> >> > >> > These are much bigger "compatibility concerns" for me than for newer >> > toolchains to accept the old format. For a change in format I can easily >> > tell my users to replace an exe with a newer one and that is all they >> > need >> > to do and it takes 10 seconds, guaranteed. A workflow change is >> > potentially >> > a massive disruption and guaranteed to take more than 10 seconds to fix >> > (perhaps hours or days). >> >> ok. >> >> > >> > >> >> >> >> This >> >> is no different from llvm-cov usage model. >> > >> > >> > In practice, getting the performance of PGO is a higher priority for my >> > users, so we should not assume that llvm-cov is being used. >> >> Glad to hear that :) >> >> thanks, >> >> David >> >> > >> > -- Sean Silva >> > >> >> >> >> >> >> David >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> Note that with 1) or 2), the user can still benefit from the reduced >> >> >> profile size. >> >> > >> >> > >> >> > Let me reiterate that the size of the profile is not a problem I have >> >> > observed in practice (nor have I heard of this being a problem in >> >> > practice >> >> > until this thread). Therefore I'm skeptical of any changes to our >> >> > default >> >> > behavior or any new requirements that are not opt-in. >> >> > >> >> > -- Sean Silva >> >> > >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> David >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >which will make it more difficult for me to do this kind >> >> >> > of analysis (would require using nm on the original binary, >> >> >> > hashing >> >> >> > everything, etc.). >> >> >> > >> >> >> > btw, feel free to attach the patch even if it in a rough state. It >> >> >> > can >> >> >> > still >> >> >> > help to clarify the proposal and be a good talking point. >> >> >> > Fine-grained >> >> >> > patch >> >> >> > review for caring about the rough parts will happen on >> >> >> > llvm-commits; >> >> >> > the >> >> >> > rough parts will not distract the discussion here on llvm-dev. >> >> >> > >> >> >> > -- Sean Silva >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> >> >> David >> >> >> >> _______________________________________________ >> >> >> >> LLVM Developers mailing list >> >> >> >> llvm-dev at lists.llvm.org >> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> > >> >> >> > >> >> > >> >> > >> > >> > > >
Filipe Cabecinhas via llvm-dev
2015-Sep-05 18:23 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
Hi Sean, Xinliang, I don't have the numbers for the UBSan problem we had (the binary was much bigger than clang/opt and made from many more files). But here's how it changes in clang/opt (OS X): -rwxr-xr-x 1 49M Aug 10 23:15 build-asserts/bin/clang-3.8 -rwxr-xr-x 1 164M Aug 10 21:15 build-debug/bin/clang-3.8 -rwxr-xr-x 1 370M Sep 5 10:24 build-ubsan/bin/clang-3.8 -rwxr-xr-x 1 44M Sep 4 20:43 build/bin/clang-3.8 -rwxr-xr-x 1 25M Aug 10 21:03 build-asserts/bin/opt -rwxr-xr-x 1 67M Aug 10 21:16 build-debug/bin/opt -rwxr-xr-x 1 156M Sep 5 10:24 build-ubsan/bin/opt -rwxr-xr-x 1 22M Sep 4 20:45 build/bin/opt build and build-ubsan are the same, except for -DLLVM_USE_SANITIZER=Undefined build and build-asserts are the same, except for -DLLVM_ENABLE_ASSERTIONS=ON build and build-debug are the same except for -DCMAKE_BUILD_TYPE=Debug (IIRC) As you can see, a release build grows to more than 7.5x just by enabling UBSan! This is more than double the debug build! Each check will add a few bytes (depending on the check, I think the median is more than 32B (just recollection, didn't go and double-check)) For every file which has checks, the full file path is in the binary as a string. (I want to, at least, make this (configurably) smaller, but will check to make sure there's savings to be had, first) This is not about runtime memory, but binary size. On our platform, game teams have to account for binary size too. Filipe On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at google.com> > wrote: > >> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote: >> > >> > >> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com> >> > wrote: >> >> >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com> >> wrote: >> >> > >> >> > >> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li < >> davidxl at google.com> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think it is reasonable to simply replace the key we currently >> use >> >> >> > with >> >> >> > MD5(key) for getting a size reduction. In practice for my use >> cases, >> >> >> > I >> >> >> > have >> >> >> > not observed any of the issues you mentioned under "Large size of >> >> >> > overhead >> >> >> > can limit the usability of PGO greatly", but I can understand that >> >> >> > some >> >> >> > of >> >> >> > these issues could become problems in Google's use case. I would >> >> >> > personally >> >> >> > prefer to keep the existing behavior as the default (see below), >> and >> >> >> > have >> >> >> > MD5(key) as an option. >> >> >> >> >> >> The problem is that this requires profile format changes. >> >> > >> >> > >> >> > Why? AFAIK the "function name" is just an arbitrary string. Using s >> or >> >> > MD5(s) shouldn't matter. Of course, the user will need to pass >> >> > consistent >> >> > flags to clang. >> >> >> >> The raw format for 64bit target can be made 'unchanged', but not for >> >> the 32bit raw format -- the nameptr field is only 32bit. >> >> >> >> The indexed format can not be made the same -- The ondisk profile >> >> record layout totally changes. The key field changes from a blob of >> >> chars into an 64bit integer. >> > >> > >> > An MD5 sum cannot be represented as a blob of chars? >> >> yes -- it is fixed length (8byte) blob which may include null byte in >> the middle. >> > > For reference, MD5 sum is 16 bytes (128-bit): > https://en.wikipedia.org/wiki/MD5 > > >> >> > >> > Or to say it another way, suppose that Itanium mangling required as a >> final >> > step to replace the string with its md5 sum in hex. Therefore all symbol >> > names are "small". My understanding is that this is effectively all your >> > patch is doing. >> >> The key type before the change is StringRef, while the after the key >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as >> a string of 8 bytes or storing md5 has in text form which will double >> the size? >> > > How much does this change the benefit? If most of the benefit is avoiding > extraordinarily long mangled names then it may be sufficient. > > With IR-level instrumentation like Rong is pursuing the size may be > reduced sufficiently that we do not need the optimization proposed in this > thread. For example, Rong found >2x size reduction on Google's C++ > benchmarks, which I assume are representative of the extremely large Google > binaries that are causing the problems addressed by your proposal in this > thread. The measurements you mention for Clang in this thread provide > similar size reductions, so Rong's approach may be sufficient (especially > because functions with extremely large mangled names tend to be small > inline functions in header-only template libraries). > > Of the points you mention in "Large size of overhead can limit the > usability of PGO greatly", many of the issues are hard limits that prevent > the use of PGO. Do you have a lower bound on how much the size of the PGO > data must be reduced in order to overcome the hard limits? > > Obviously LLVM must be able to support the extremely large binaries in > your configuration (otherwise what use is LLVM as a compiler ;) My > questions are primarily aimed at establishing which tradeoffs are > acceptable for supporting this (both for LLVM and for you guys). > > Btw, for us, the issue of PGO data size is not completely immaterial but > is very different from your use case. For us, the primary issue is the > additional memory use at run time, since PS4 games usually use "all" > available memory. We had a problem with UBSan where the large amount of > memory required for storing the UBSan diagnostic data at runtime required > the game programmers to manually change their memory map to make room. > +Filipe, do you remember how much memory UBSan was using that caused a > problem? > > -- Sean Silva > > >> >> In the raw format, md5 sum key can be an embedded field in the >> prf_data variable instead of as different var referenced by prf_data. >> >> > >> > If this is not the case, you should show your current patch so that we >> can >> > discuss things concretely. >> >> It is not. See above about the difference. >> >> > >> >> >> >> > >> >> > >> >> >> >> >> >> It will be >> >> >> very messy to support multiple formats in instr-codegen and >> >> >> instr-runtime. For compatibility concerns, the reader is taught to >> >> >> support previous format, but the changes there are isolated (also >> >> >> expected to be removed in the future). >> >> >> >> >> >> > >> >> >> > My primary concern is that if the function name are not kept at >> all >> >> >> > stages, >> >> >> > then it becomes difficult to analyze the profile data in a >> standalone >> >> >> > way. >> >> >> > Many times, I have used `llvm-profdata show -all-functions >> >> >> > foo.profdata` >> >> >> > on >> >> >> > the resulting profile data and then imported that data into >> >> >> > Mathematica >> >> >> > for >> >> >> > analysis. >> >> >> >> >> >> This is certainly a very valid use case. >> >> >> >> >> >> >My understanding of your proposal is that `llvm-profdata show >> >> >> > -all-functions foo.profdata` will not show the actual function >> names >> >> >> > but >> >> >> > instead MD5 hashes, >> >> >> >> >> >> Yes. >> >> >> >> >> >> To support your use case, there are two solutions: >> >> >> >> >> >> 1) user can add -fcoverage-mapping option in the build >> >> >> 2) introduce a new option -fprofile-instr-names that force the >> >> >> emission of the name sections in the .o file. This is similar to 1), >> >> >> but no covmap section is needed. >> >> >> >> >> >> llvm-profdata tool will be taught to read the name section and >> attach >> >> >> function names to the profile records. >> >> > >> >> > >> >> > Needing to pass the executable to llvm-profdata would cause >> deployment >> >> > issues for my customers in practice. >> >> >> >> Why? The deployment needs to pass the profile data anyway right? >> > >> > >> > Yes, but not the executable. >> > >> > The PGO training run is likely being run by a gameplay tester >> > (non-programmer). In general the binary will not be lying around as a >> loose >> > file anywhere, it will be part of a full package of the binary+assets >> (think >> > like what will end up on a bluray disc). A game's binary *completely >> > useless* without the assets, so except locally on a programmer's machine >> > while they iterate/debug, there is no reason for a binary to ever exist >> as a >> > standalone file. >> > >> > I'm not saying that needing the binary is insurmountable in any >> particular >> > scenario. Just that it will cause a strict increase in the number of >> issues >> > to deploying PGO. >> >> Your concern is acknowledged. >> >> > >> > These are much bigger "compatibility concerns" for me than for newer >> > toolchains to accept the old format. For a change in format I can easily >> > tell my users to replace an exe with a newer one and that is all they >> need >> > to do and it takes 10 seconds, guaranteed. A workflow change is >> potentially >> > a massive disruption and guaranteed to take more than 10 seconds to fix >> > (perhaps hours or days). >> >> ok. >> >> > >> > >> >> >> >> This >> >> is no different from llvm-cov usage model. >> > >> > >> > In practice, getting the performance of PGO is a higher priority for my >> > users, so we should not assume that llvm-cov is being used. >> >> Glad to hear that :) >> >> thanks, >> >> David >> >> > >> > -- Sean Silva >> > >> >> >> >> >> >> David >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> Note that with 1) or 2), the user can still benefit from the reduced >> >> >> profile size. >> >> > >> >> > >> >> > Let me reiterate that the size of the profile is not a problem I have >> >> > observed in practice (nor have I heard of this being a problem in >> >> > practice >> >> > until this thread). Therefore I'm skeptical of any changes to our >> >> > default >> >> > behavior or any new requirements that are not opt-in. >> >> > >> >> > -- Sean Silva >> >> > >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> David >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >which will make it more difficult for me to do this kind >> >> >> > of analysis (would require using nm on the original binary, >> hashing >> >> >> > everything, etc.). >> >> >> > >> >> >> > btw, feel free to attach the patch even if it in a rough state. It >> >> >> > can >> >> >> > still >> >> >> > help to clarify the proposal and be a good talking point. >> >> >> > Fine-grained >> >> >> > patch >> >> >> > review for caring about the rough parts will happen on >> llvm-commits; >> >> >> > the >> >> >> > rough parts will not distract the discussion here on llvm-dev. >> >> >> > >> >> >> > -- Sean Silva >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> >> >> David >> >> >> >> _______________________________________________ >> >> >> >> LLVM Developers mailing list >> >> >> >> llvm-dev at lists.llvm.org >> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> > >> >> >> > >> >> > >> >> > >> > >> > >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150905/589bbca9/attachment.html>
Xinliang David Li via llvm-dev
2015-Sep-08 05:38 UTC
[llvm-dev] RFC: Reducing Instr PGO size overhead
>> The key type before the change is StringRef, while the after the key >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as >> a string of 8 bytes or storing md5 has in text form which will double >> the size? > > > How much does this change the benefit? If most of the benefit is avoiding > extraordinarily long mangled names then it may be sufficient.I implemented the same functionality using the approach suggested by you. 1) There is no format change required for raw/indexed profile data 2) when a user option is specified, md5hash string is used as the function key instead of of the raw function name In addition to that, the option specified is also passed to the runtime and emitted into profile data. This allows different variants of the profile data encoded in the same format to be automatically recognized by the client (clang -fprofile-use, llvm-profdata etc) without the need to specify the option again. There are some size increase, but increase seems to be in the range that is acceptable. Using clang as the test case. The indexed profile size is now 33.6 M (compared with 29.5M which is the optimal case). The raw profile size is 54.1M (compared with 40.1M from previous patch). David> > With IR-level instrumentation like Rong is pursuing the size may be reduced > sufficiently that we do not need the optimization proposed in this thread. > For example, Rong found >2x size reduction on Google's C++ benchmarks, which > I assume are representative of the extremely large Google binaries that are > causing the problems addressed by your proposal in this thread. The > measurements you mention for Clang in this thread provide similar size > reductions, so Rong's approach may be sufficient (especially because > functions with extremely large mangled names tend to be small inline > functions in header-only template libraries). > > Of the points you mention in "Large size of overhead can limit the usability > of PGO greatly", many of the issues are hard limits that prevent the use of > PGO. Do you have a lower bound on how much the size of the PGO data must be > reduced in order to overcome the hard limits? > > Obviously LLVM must be able to support the extremely large binaries in your > configuration (otherwise what use is LLVM as a compiler ;) My questions are > primarily aimed at establishing which tradeoffs are acceptable for > supporting this (both for LLVM and for you guys). > > Btw, for us, the issue of PGO data size is not completely immaterial but is > very different from your use case. For us, the primary issue is the > additional memory use at run time, since PS4 games usually use "all" > available memory. We had a problem with UBSan where the large amount of > memory required for storing the UBSan diagnostic data at runtime required > the game programmers to manually change their memory map to make room. > +Filipe, do you remember how much memory UBSan was using that caused a > problem? > > -- Sean Silva > >> >> >> In the raw format, md5 sum key can be an embedded field in the >> prf_data variable instead of as different var referenced by prf_data. >> >> > >> > If this is not the case, you should show your current patch so that we >> > can >> > discuss things concretely. >> >> It is not. See above about the difference. >> >> > >> >> >> >> > >> >> > >> >> >> >> >> >> It will be >> >> >> very messy to support multiple formats in instr-codegen and >> >> >> instr-runtime. For compatibility concerns, the reader is taught to >> >> >> support previous format, but the changes there are isolated (also >> >> >> expected to be removed in the future). >> >> >> >> >> >> > >> >> >> > My primary concern is that if the function name are not kept at >> >> >> > all >> >> >> > stages, >> >> >> > then it becomes difficult to analyze the profile data in a >> >> >> > standalone >> >> >> > way. >> >> >> > Many times, I have used `llvm-profdata show -all-functions >> >> >> > foo.profdata` >> >> >> > on >> >> >> > the resulting profile data and then imported that data into >> >> >> > Mathematica >> >> >> > for >> >> >> > analysis. >> >> >> >> >> >> This is certainly a very valid use case. >> >> >> >> >> >> >My understanding of your proposal is that `llvm-profdata show >> >> >> > -all-functions foo.profdata` will not show the actual function >> >> >> > names >> >> >> > but >> >> >> > instead MD5 hashes, >> >> >> >> >> >> Yes. >> >> >> >> >> >> To support your use case, there are two solutions: >> >> >> >> >> >> 1) user can add -fcoverage-mapping option in the build >> >> >> 2) introduce a new option -fprofile-instr-names that force the >> >> >> emission of the name sections in the .o file. This is similar to 1), >> >> >> but no covmap section is needed. >> >> >> >> >> >> llvm-profdata tool will be taught to read the name section and >> >> >> attach >> >> >> function names to the profile records. >> >> > >> >> > >> >> > Needing to pass the executable to llvm-profdata would cause >> >> > deployment >> >> > issues for my customers in practice. >> >> >> >> Why? The deployment needs to pass the profile data anyway right? >> > >> > >> > Yes, but not the executable. >> > >> > The PGO training run is likely being run by a gameplay tester >> > (non-programmer). In general the binary will not be lying around as a >> > loose >> > file anywhere, it will be part of a full package of the binary+assets >> > (think >> > like what will end up on a bluray disc). A game's binary *completely >> > useless* without the assets, so except locally on a programmer's machine >> > while they iterate/debug, there is no reason for a binary to ever exist >> > as a >> > standalone file. >> > >> > I'm not saying that needing the binary is insurmountable in any >> > particular >> > scenario. Just that it will cause a strict increase in the number of >> > issues >> > to deploying PGO. >> >> Your concern is acknowledged. >> >> > >> > These are much bigger "compatibility concerns" for me than for newer >> > toolchains to accept the old format. For a change in format I can easily >> > tell my users to replace an exe with a newer one and that is all they >> > need >> > to do and it takes 10 seconds, guaranteed. A workflow change is >> > potentially >> > a massive disruption and guaranteed to take more than 10 seconds to fix >> > (perhaps hours or days). >> >> ok. >> >> > >> > >> >> >> >> This >> >> is no different from llvm-cov usage model. >> > >> > >> > In practice, getting the performance of PGO is a higher priority for my >> > users, so we should not assume that llvm-cov is being used. >> >> Glad to hear that :) >> >> thanks, >> >> David >> >> > >> > -- Sean Silva >> > >> >> >> >> >> >> David >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> Note that with 1) or 2), the user can still benefit from the reduced >> >> >> profile size. >> >> > >> >> > >> >> > Let me reiterate that the size of the profile is not a problem I have >> >> > observed in practice (nor have I heard of this being a problem in >> >> > practice >> >> > until this thread). Therefore I'm skeptical of any changes to our >> >> > default >> >> > behavior or any new requirements that are not opt-in. >> >> > >> >> > -- Sean Silva >> >> > >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> David >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >which will make it more difficult for me to do this kind >> >> >> > of analysis (would require using nm on the original binary, >> >> >> > hashing >> >> >> > everything, etc.). >> >> >> > >> >> >> > btw, feel free to attach the patch even if it in a rough state. It >> >> >> > can >> >> >> > still >> >> >> > help to clarify the proposal and be a good talking point. >> >> >> > Fine-grained >> >> >> > patch >> >> >> > review for caring about the rough parts will happen on >> >> >> > llvm-commits; >> >> >> > the >> >> >> > rough parts will not distract the discussion here on llvm-dev. >> >> >> > >> >> >> > -- Sean Silva >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> thanks, >> >> >> >> >> >> >> >> David >> >> >> >> _______________________________________________ >> >> >> >> LLVM Developers mailing list >> >> >> >> llvm-dev at lists.llvm.org >> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> > >> >> >> > >> >> > >> >> > >> > >> > > >