Justin, do you have more concerns on keeping value_kind? If there is still disagreement, can we agree to move on with it for now ? After the initial version of the patches checked in, we can do more serious testings with large apps and revisit this if there are problems discovered. thanks, David On Mon, Jun 15, 2015 at 10:47 PM, Xinliang David Li <davidxl at google.com> wrote:>> But since the consumer is the frontend, and it knows which counts are >> which, it knows the kind, no? I don't understand how storing the kind >> info helps or hurts - it's just redundant. > > Yes, the frontend consumer knows, but the raw reader does not. It is > natural to do those processing when processing the raw profile data > (for instance when function pointer to name mapping still exists). > >> >>>> >>>> The other thing we should do is store which profiling options are >>>> enabled, in both formats. When we specify profile-instr-use it'll be >>>> less error prone if we can detect whether or not this includes indirect >>>> call profiling without checking other options, and it's probably a good >>>> idea for "llvm-profdata merge" to disallow merging profiles that are >>>> gathering different sets of data. A bitfield seems suitable for this. >>> >>> For value profiling, allowing profile-gen and profile-use passes using >>> different options is a useful feature. Consider the following >>> scenarios: >>> 1) collect the same profile data once with all kinds of value profile >>> data collected. The exact same profile data can be used in performance >>> experiments with different kinds of value profiling enabled/disabled >> >> This use case is pretty easy to accomodate for with the model where the >> profile stores its type. We could autodetect the type to decide how to >> interpret the profile, and if more specific profile flags are set simply >> ignore some of the data. The thing that the file format storing the type >> gives us is that it's harder to misinterpret the data by supplying the >> mismatching flags between reading and writing. > > Yes, this is achievable with some extra effort -- for instance by > collecting the profile sites of all kinds regardless of the flags. > After profiling matching, simply drop selected sites according to the > flags. > > This adds complexity of the code. With value kinds encoded in profile > data, the handling is much simpler -- each value profiler only needs > to deal with its own kind. > >> >>> 2) work around profile-use/value transformation bug selectively for >>> some file without the need to change anything in instrumentation pass >> >> I'm not sure I understand what you mean here. > > Similar to the above, but for correctness. > >> >>> Besides, with the latest patch, the value_kind is not recorded in each >>> profile value thus the overhead is minimized. >> >> The overhead is low, sure, but the code complexity of dealing with the >> multiple kinds in this way is quite real. > > I actually believe having value-kind can help reduce code complexity > instead of the other way around. What is the extra complexity? > > thanks, > > David > > >>Since I'm not convinced we're >> getting real benefit from storing the kind, I don't believe this >> trade-off is worth it. >> >>> thanks, >>> >>> David
> Justin, do you have more concerns on keeping value_kind?Hi Justin, The next CL's according to the merge plan all depend on the value_kind. Therefore both http://reviews.llvm.org/D10471 and http://reviews.llvm.org/D10674 introduce it. Could you please review these CL's? As David stated, we need to bring the value_kind discussion to finalization w/ a decision. Since the original IC profiling patches to today it's been over two months and we're still not arrived at a conclusion. -Betul> If there is still disagreement, can we agree to move on with it for > now ? After the initial version of the patches checked in, we can do > more serious testings with large apps and revisit this if there are > problems discovered. > > thanks, > > David > > > > On Mon, Jun 15, 2015 at 10:47 PM, Xinliang David Li <davidxl at google.com> > wrote: >>> But since the consumer is the frontend, and it knows which counts are >>> which, it knows the kind, no? I don't understand how storing the kind >>> info helps or hurts - it's just redundant. >> >> Yes, the frontend consumer knows, but the raw reader does not. It is >> natural to do those processing when processing the raw profile data >> (for instance when function pointer to name mapping still exists). >> >>> >>>>> >>>>> The other thing we should do is store which profiling options are >>>>> enabled, in both formats. When we specify profile-instr-use it'll be >>>>> less error prone if we can detect whether or not this includes >>>>> indirect >>>>> call profiling without checking other options, and it's probably a >>>>> good >>>>> idea for "llvm-profdata merge" to disallow merging profiles that are >>>>> gathering different sets of data. A bitfield seems suitable for this. >>>> >>>> For value profiling, allowing profile-gen and profile-use passes using >>>> different options is a useful feature. Consider the following >>>> scenarios: >>>> 1) collect the same profile data once with all kinds of value profile >>>> data collected. The exact same profile data can be used in >>>> performance >>>> experiments with different kinds of value profiling >>>> enabled/disabled >>> >>> This use case is pretty easy to accomodate for with the model where the >>> profile stores its type. We could autodetect the type to decide how to >>> interpret the profile, and if more specific profile flags are set >>> simply >>> ignore some of the data. The thing that the file format storing the >>> type >>> gives us is that it's harder to misinterpret the data by supplying the >>> mismatching flags between reading and writing. >> >> Yes, this is achievable with some extra effort -- for instance by >> collecting the profile sites of all kinds regardless of the flags. >> After profiling matching, simply drop selected sites according to the >> flags. >> >> This adds complexity of the code. With value kinds encoded in profile >> data, the handling is much simpler -- each value profiler only needs >> to deal with its own kind. >> >>> >>>> 2) work around profile-use/value transformation bug selectively for >>>> some file without the need to change anything in instrumentation >>>> pass >>> >>> I'm not sure I understand what you mean here. >> >> Similar to the above, but for correctness. >> >>> >>>> Besides, with the latest patch, the value_kind is not recorded in each >>>> profile value thus the overhead is minimized. >>> >>> The overhead is low, sure, but the code complexity of dealing with the >>> multiple kinds in this way is quite real. >> >> I actually believe having value-kind can help reduce code complexity >> instead of the other way around. What is the extra complexity? >> >> thanks, >> >> David >> >> >>>Since I'm not convinced we're >>> getting real benefit from storing the kind, I don't believe this >>> trade-off is worth it. >>> >>>> thanks, >>>> >>>> David >
Ping? -----Original Message----- From: Betul Buyukkurt [mailto:betulb at codeaurora.org] Sent: Thursday, June 25, 2015 11:47 AM To: Xinliang David Li Cc: Justin Bogner; Betul Buyukkurt; llvmdev Subject: Re: [LLVMdev] IC profiling infrastructure> Justin, do you have more concerns on keeping value_kind?Hi Justin, The next CL's according to the merge plan all depend on the value_kind. Therefore both http://reviews.llvm.org/D10471 and http://reviews.llvm.org/D10674 introduce it. Could you please review these CL's? As David stated, we need to bring the value_kind discussion to finalization w/ a decision. Since the original IC profiling patches to today it's been over two months and we're still not arrived at a conclusion. -Betul> If there is still disagreement, can we agree to move on with it for > now ? After the initial version of the patches checked in, we can do > more serious testings with large apps and revisit this if there are > problems discovered. > > thanks, > > David > > > > On Mon, Jun 15, 2015 at 10:47 PM, Xinliang David Li > <davidxl at google.com> > wrote: >>> But since the consumer is the frontend, and it knows which counts >>> are which, it knows the kind, no? I don't understand how storing the >>> kind info helps or hurts - it's just redundant. >> >> Yes, the frontend consumer knows, but the raw reader does not. It is >> natural to do those processing when processing the raw profile data >> (for instance when function pointer to name mapping still exists). >> >>> >>>>> >>>>> The other thing we should do is store which profiling options are >>>>> enabled, in both formats. When we specify profile-instr-use it'll >>>>> be less error prone if we can detect whether or not this includes >>>>> indirect call profiling without checking other options, and it's >>>>> probably a good idea for "llvm-profdata merge" to disallow merging >>>>> profiles that are gathering different sets of data. A bitfield >>>>> seems suitable for this. >>>> >>>> For value profiling, allowing profile-gen and profile-use passes >>>> using different options is a useful feature. Consider the following >>>> scenarios: >>>> 1) collect the same profile data once with all kinds of value profile >>>> data collected. The exact same profile data can be used in >>>> performance >>>> experiments with different kinds of value profiling >>>> enabled/disabled >>> >>> This use case is pretty easy to accomodate for with the model where >>> the profile stores its type. We could autodetect the type to decide >>> how to interpret the profile, and if more specific profile flags are >>> set simply ignore some of the data. The thing that the file format >>> storing the type gives us is that it's harder to misinterpret the >>> data by supplying the mismatching flags between reading and writing. >> >> Yes, this is achievable with some extra effort -- for instance by >> collecting the profile sites of all kinds regardless of the flags. >> After profiling matching, simply drop selected sites according to the >> flags. >> >> This adds complexity of the code. With value kinds encoded in profile >> data, the handling is much simpler -- each value profiler only needs >> to deal with its own kind. >> >>> >>>> 2) work around profile-use/value transformation bug selectively for >>>> some file without the need to change anything in instrumentation >>>> pass >>> >>> I'm not sure I understand what you mean here. >> >> Similar to the above, but for correctness. >> >>> >>>> Besides, with the latest patch, the value_kind is not recorded in >>>> each profile value thus the overhead is minimized. >>> >>> The overhead is low, sure, but the code complexity of dealing with >>> the multiple kinds in this way is quite real. >> >> I actually believe having value-kind can help reduce code complexity >> instead of the other way around. What is the extra complexity? >> >> thanks, >> >> David >> >> >>>Since I'm not convinced we're >>> getting real benefit from storing the kind, I don't believe this >>>trade-off is worth it. >>> >>>> thanks, >>>> >>>> David >
I still have some issues with what we're doing with value_kind here. Let me summarize the arguments I've heard for including the value_kind in various places in this set of changes: 1. We would like to be able to enable multiple types of value profiling at the same time. 1a. We would like to be able to selectively use only particular value kinds from a particular profile. 1b. We would like to be able to combine profiles with different sets of profiling options enabled. 2. We will need to know the value kind in order to parse the raw profile data. 3. We'd like to avoid needing to change the .profdata format when we add new value kinds, since it needs to stay backwards compatible. 4. The frontend work is simpler if the value kind is explicitly encoded. There are a couple of these points that certainly need to be addressed, and some that are a little bit less clear. A. We do need to handle (1) in some way, but the issue can be stated a little bit more generally. If there are optional parts of the profile data, we definitely need to know which options are enabled. B. (1a) is only an issue if you believe (4). This might be true or it might not, and I don't think we can make an informed decision until we're actually using multiple value kinds. For now there's just a bunch of code that's effectively useless - "if value kind is 2, do nothing". I'm opposed to adding this kind of practically unreachable code "just in case". It adds a kind of complexity and premature generality that, in my experience, will have to be completely re-done when we get to actually generalizing the code. C. (1b) is an interesting one. Is this actually a useful feature? If we want to combine multiple profiles with different subsets of profiling turned on, then we definitely need the information recorded somewhere. D. I have issues with (2). If the different value kinds actually have different structures and need to be read in / interpreted differently, then writing the support for them now simply won't work. We're reading them in as if they're interpreted in the exact same way as the indirect call profiling - if they aren't, then we have to rewrite this code when we add a new one anyway. What's the point of writing code that won't actually work when we try to use it? E. There is value in dealing with (3) as long as we're confident that we can get it right. If we're pretty confident that the value profile data structure we're encoding in the .profdata file is going to work for multiple value profile kinds, then it makes sense to allow multiple of them and encode which kinds they are. David, is "InstrProfValueRecord" in Betul's current patches going to work to record the other types of value profiling information you're planning on looking at? I'd also like to point out that the instrprof parts of LLVM try to be relatively frontend agnostic. We should strive to keep it such that the frontend is the piece that decides what a particular profiling kind means if possible, as long as the structure and way the data is laid out is suitable. This doesn't affect the current work very much, but it's a good idea to keep it in mind. So, with all of that said, I have a couple of suggestions on how to move forward. - I don't think we should store the kind in the raw profile yet. Let's keep this simple, since it's easy to change and will be easier to get right when we're implementing or experimenting with a second kind (based on (D)). - Assuming the answer to my question in (E) is "yes", let's go ahead and include a value kind for the value records in the .profdata format. I don't really like the current approach of an array of std::vectors, because I think we can do this more simply. I'll bring that up on the review thread. - In clang, we should error if we see value kinds we don't understand yet. I think that this addresses the short term concerns without forcing us to write unreachable/untestable code that may or may not work for the longer term concerns. It minimally handles (A ) in that the data that is there implies which features are being used, puts off (B) without making it harder to deal with when the time comes, allows us to implement (C) without much trouble if we want it, avoids doing work that could cause problems with (D), and handles (E) since avoiding it would mean more work later. What do you think? Xinliang David Li <davidxl at google.com> writes:> Justin, do you have more concerns on keeping value_kind? > > If there is still disagreement, can we agree to move on with it for > now ? After the initial version of the patches checked in, we can do > more serious testings with large apps and revisit this if there are > problems discovered. > > thanks, > > David > > > > On Mon, Jun 15, 2015 at 10:47 PM, Xinliang David Li <davidxl at google.com> wrote: >>> But since the consumer is the frontend, and it knows which counts are >>> which, it knows the kind, no? I don't understand how storing the kind >>> info helps or hurts - it's just redundant. >> >> Yes, the frontend consumer knows, but the raw reader does not. It is >> natural to do those processing when processing the raw profile data >> (for instance when function pointer to name mapping still exists). >> >>> >>>>> >>>>> The other thing we should do is store which profiling options are >>>>> enabled, in both formats. When we specify profile-instr-use it'll be >>>>> less error prone if we can detect whether or not this includes indirect >>>>> call profiling without checking other options, and it's probably a good >>>>> idea for "llvm-profdata merge" to disallow merging profiles that are >>>>> gathering different sets of data. A bitfield seems suitable for this. >>>> >>>> For value profiling, allowing profile-gen and profile-use passes using >>>> different options is a useful feature. Consider the following >>>> scenarios: >>>> 1) collect the same profile data once with all kinds of value profile >>>> data collected. The exact same profile data can be used in performance >>>> experiments with different kinds of value profiling enabled/disabled >>> >>> This use case is pretty easy to accomodate for with the model where the >>> profile stores its type. We could autodetect the type to decide how to >>> interpret the profile, and if more specific profile flags are set simply >>> ignore some of the data. The thing that the file format storing the type >>> gives us is that it's harder to misinterpret the data by supplying the >>> mismatching flags between reading and writing. >> >> Yes, this is achievable with some extra effort -- for instance by >> collecting the profile sites of all kinds regardless of the flags. >> After profiling matching, simply drop selected sites according to the >> flags. >> >> This adds complexity of the code. With value kinds encoded in profile >> data, the handling is much simpler -- each value profiler only needs >> to deal with its own kind. >> >>> >>>> 2) work around profile-use/value transformation bug selectively for >>>> some file without the need to change anything in instrumentation pass >>> >>> I'm not sure I understand what you mean here. >> >> Similar to the above, but for correctness. >> >>> >>>> Besides, with the latest patch, the value_kind is not recorded in each >>>> profile value thus the overhead is minimized. >>> >>> The overhead is low, sure, but the code complexity of dealing with the >>> multiple kinds in this way is quite real. >> >> I actually believe having value-kind can help reduce code complexity >> instead of the other way around. What is the extra complexity? >> >> thanks, >> >> David >> >> >>>Since I'm not convinced we're >>> getting real benefit from storing the kind, I don't believe this >>> trade-off is worth it. >>> >>>> thanks, >>>> >>>> David
Justin, thanks for the reply. I would like to point out that value_kind is actually not really stored in the raw profile data (nor do we intend to do so), other than the minimal information about per value kind NumOfSites info in per-function profile header. Basically the data is organized per value kind instead of stored intermixed. That is about it. More replies below. On Mon, Jun 29, 2015 at 11:53 AM, Justin Bogner <mail at justinbogner.com> wrote:> I still have some issues with what we're doing with value_kind here. Let > me summarize the arguments I've heard for including the value_kind in > various places in this set of changes: > > 1. We would like to be able to enable multiple types of value profiling > at the same time. > > 1a. We would like to be able to selectively use only particular value > kinds from a particular profile. > > 1b. We would like to be able to combine profiles with different sets of > profiling options enabled. >yes.> 2. We will need to know the value kind in order to parse the raw profile > data. >Yes -- but we don't need to store value kind in order to do that -- we simply need to organize the profile data more structurally.> 3. We'd like to avoid needing to change the .profdata format when we add > new value kinds, since it needs to stay backwards compatible. > > 4. The frontend work is simpler if the value kind is explicitly encoded. > > There are a couple of these points that certainly need to be addressed, > and some that are a little bit less clear. > > A. We do need to handle (1) in some way, but the issue can be stated a > little bit more generally. If there are optional parts of the profile > data, we definitely need to know which options are enabled. > > B. (1a) is only an issue if you believe (4). This might be true or it might > not, and I don't think we can make an informed decision until we're > actually using multiple value kinds. For now there's just a bunch of > code that's effectively useless - "if value kind is 2, do nothing". I'm > opposed to adding this kind of practically unreachable code "just in > case". It adds a kind of complexity and premature generality that, in my > experience, will have to be completely re-done when we get to actually > generalizing the code.In terms of value profiling code, there is no need for the check you mentioned "if value kind is ..." at all, but 'attached' to the codegen code for language construct of interests, for instance, ... CodeGenFunction::EmitCall(...) { if (enable_indirect_profile) { ....profileIndirectCall(); // does both instrumentation and profile annotation depending on the context. } ... } Organizing the profile data according value profile kind allows profile data to be independently loaded and used: .. CodeGenPGO::assignRegionCounters (...) { if (enable_indirect_profile) { loadValueProfile(VK_indirect_call); } if (enable_xxx_profile) { loadXXXProfile(VK_xxx); } } if we end up with inter-mixing all value kind profile data, we will have to load all value profile data into memory regardless of the option specified. It also makes the profile data uses really hard -- how do we easily match the value profile site with the profile data counter index if profile kinds are selectively enabled? Besides, in the profile-use pass, each value kind's in-memory format of profile data may be slightly different, so the loader will have to do different things ..> > C. (1b) is an interesting one. Is this actually a useful feature? If we > want to combine multiple profiles with different subsets of profiling > turned on, then we definitely need the information recorded > somewhere.I think it is a good flexibility to have.> > D. I have issues with (2). If the different value kinds actually have > different structures and need to be read in / interpreted > differently, then writing the support for them now simply won't > work.We choose to have very compact form representing the raw value data for all value kinds, but to support efficient use of value profile of different kinds, the raw profile reader has to do something different here. For instance, indirect call profile data needs the translation of the raw address which other value kinds do not care. I think this is something we can do without.>We're reading them in as if they're interpreted in the exact > same way as the indirect call profiling - if they aren't, then we > have to rewrite this code when we add a new one anyway. What's the > point of writing code that won't actually work when we try to use it?I don't think this would be a problem as adding a new kind requires client/use side change too, otherwise we won't be able to use it.> > E. There is value in dealing with (3) as long as we're confident that we > can get it right. If we're pretty confident that the value profile > data structure we're encoding in the .profdata file is going to work > for multiple value profile kinds, then it makes sense to allow > multiple of them and encode which kinds they are. David, is > "InstrProfValueRecord" in Betul's current patches going to work to > record the other types of value profiling information you're planning > on looking at?The raw profile format is certainly good for other kinds. InstrProfValueRecord is mostly there (for instance overloading the Name field for other purpose, but this is minor and can be done as follow up when support of other kinds are added), so I am ok to leave it as it is now.> > I'd also like to point out that the instrprof parts of LLVM try to be > relatively frontend agnostic. We should strive to keep it such that the > frontend is the piece that decides what a particular profiling kind > means if possible, as long as the structure and way the data is laid out > is suitable. This doesn't affect the current work very much, but it's a > good idea to keep it in mind.yes. See the above -- FE code does not really need to care about the format.> > So, with all of that said, I have a couple of suggestions on how to move > forward. > > - I don't think we should store the kind in the raw profile yet. Let's > keep this simple, since it's easy to change and will be easier to get > right when we're implementing or experimenting with a second kind > (based on (D)). >See my reply above -- we don't really store the value kind in raw profile, but we need to organize the profile data according to the kind in order to process (read) and use (profile-use) them efficiently.> - Assuming the answer to my question in (E) is "yes", let's go ahead and > include a value kind for the value records in the .profdata format.yes.>I > don't really like the current approach of an array of std::vectors, > because I think we can do this more simply. I'll bring that up on the > review thread. > > - In clang, we should error if we see value kinds we don't understand > yet.yes.> > I think that this addresses the short term concerns without forcing us > to write unreachable/untestable code that may or may not work for the > longer term concerns. It minimally handles (A ) in that the data that is > there implies which features are being used, puts off (B) without making > it harder to deal with when the time comes, allows us to implement (C) > without much trouble if we want it, avoids doing work that could cause > problems with (D), and handles (E) since avoiding it would mean more > work later. > > What do you think?See my reply above. For now, I think the things we need to organize the value profile data in raw profile according to kind (not storing) -- I think it will handle all the cases where you have concerns. thanks, David> > Xinliang David Li <davidxl at google.com> writes: >> Justin, do you have more concerns on keeping value_kind? >> >> If there is still disagreement, can we agree to move on with it for >> now ? After the initial version of the patches checked in, we can do >> more serious testings with large apps and revisit this if there are >> problems discovered. >> >> thanks, >> >> David >> >> >> >> On Mon, Jun 15, 2015 at 10:47 PM, Xinliang David Li <davidxl at google.com> wrote: >>>> But since the consumer is the frontend, and it knows which counts are >>>> which, it knows the kind, no? I don't understand how storing the kind >>>> info helps or hurts - it's just redundant. >>> >>> Yes, the frontend consumer knows, but the raw reader does not. It is >>> natural to do those processing when processing the raw profile data >>> (for instance when function pointer to name mapping still exists). >>> >>>> >>>>>> >>>>>> The other thing we should do is store which profiling options are >>>>>> enabled, in both formats. When we specify profile-instr-use it'll be >>>>>> less error prone if we can detect whether or not this includes indirect >>>>>> call profiling without checking other options, and it's probably a good >>>>>> idea for "llvm-profdata merge" to disallow merging profiles that are >>>>>> gathering different sets of data. A bitfield seems suitable for this. >>>>> >>>>> For value profiling, allowing profile-gen and profile-use passes using >>>>> different options is a useful feature. Consider the following >>>>> scenarios: >>>>> 1) collect the same profile data once with all kinds of value profile >>>>> data collected. The exact same profile data can be used in performance >>>>> experiments with different kinds of value profiling enabled/disabled >>>> >>>> This use case is pretty easy to accomodate for with the model where the >>>> profile stores its type. We could autodetect the type to decide how to >>>> interpret the profile, and if more specific profile flags are set simply >>>> ignore some of the data. The thing that the file format storing the type >>>> gives us is that it's harder to misinterpret the data by supplying the >>>> mismatching flags between reading and writing. >>> >>> Yes, this is achievable with some extra effort -- for instance by >>> collecting the profile sites of all kinds regardless of the flags. >>> After profiling matching, simply drop selected sites according to the >>> flags. >>> >>> This adds complexity of the code. With value kinds encoded in profile >>> data, the handling is much simpler -- each value profiler only needs >>> to deal with its own kind. >>> >>>> >>>>> 2) work around profile-use/value transformation bug selectively for >>>>> some file without the need to change anything in instrumentation pass >>>> >>>> I'm not sure I understand what you mean here. >>> >>> Similar to the above, but for correctness. >>> >>>> >>>>> Besides, with the latest patch, the value_kind is not recorded in each >>>>> profile value thus the overhead is minimized. >>>> >>>> The overhead is low, sure, but the code complexity of dealing with the >>>> multiple kinds in this way is quite real. >>> >>> I actually believe having value-kind can help reduce code complexity >>> instead of the other way around. What is the extra complexity? >>> >>> thanks, >>> >>> David >>> >>> >>>>Since I'm not convinced we're >>>> getting real benefit from storing the kind, I don't believe this >>>> trade-off is worth it. >>>> >>>>> thanks, >>>>> >>>>> David