Xinliang David Li <davidxl at google.com> writes:> On Mon, Jun 15, 2015 at 5:35 PM, Justin Bogner <mail at justinbogner.com> wrote: >> "Betul Buyukkurt" <betulb at codeaurora.org> writes: >>>> - We don't need to store the value profiling kind in the data at all. >>>> The frontend knows which invocations of the intrinsic are for each kind >>>> implicitly, much like it knows the difference between a counter for an >>>> "if" and a "for" apart implicitly. Just store one set of profiling data >>>> and let the frontend sort it out. >>> >>> I think so too. However, value_kind should stay around until the kinds of >>> expressions whose values are computed are included in the hash >>> computation. Otherwise, if a value-profiled expression is to be removed >>> from source, then all the rest of the values would get attached to the MD >>> fields of wrong types. Currently hash checking verifies if the counter >>> assigned regions in the source has been changed across profile consumption >>> runs. >> >> This isn't a good reason to keep value_kind - the function hash not >> reflecting these kinds of changes is a problem regardless of whether or >> not we have multiple types. We should just add hash computations for >> these indirect call counters. As long as we only add this when indirect >> profiling is turned on it won't invalidate old hashes so it's trivial to >> stay backwards compatible. > > I think there are other better reasons why keeping value kind is > desirable. Different kinds of value profile data needs different > 'preprocessing' during reading. For instance, indirect call target > needs to translate target address into function name. For length data, > we may want to compute average length; for data address, we may want > to compute the min-alignment for each site. It will be hard to do so > with out value kind info.But since the consumer is the frontend, and it knows which counts are which, it knows the kind, no? I don't understand how storing the kind info helps or hurts - it's just redundant.>> >> The other thing we should do is store which profiling options are >> enabled, in both formats. When we specify profile-instr-use it'll be >> less error prone if we can detect whether or not this includes indirect >> call profiling without checking other options, and it's probably a good >> idea for "llvm-profdata merge" to disallow merging profiles that are >> gathering different sets of data. A bitfield seems suitable for this. > > For value profiling, allowing profile-gen and profile-use passes using > different options is a useful feature. Consider the following > scenarios: > 1) collect the same profile data once with all kinds of value profile > data collected. The exact same profile data can be used in performance > experiments with different kinds of value profiling enabled/disabledThis use case is pretty easy to accomodate for with the model where the profile stores its type. We could autodetect the type to decide how to interpret the profile, and if more specific profile flags are set simply ignore some of the data. The thing that the file format storing the type gives us is that it's harder to misinterpret the data by supplying the mismatching flags between reading and writing.> 2) work around profile-use/value transformation bug selectively for > some file without the need to change anything in instrumentation passI'm not sure I understand what you mean here.> Besides, with the latest patch, the value_kind is not recorded in each > profile value thus the overhead is minimized.The overhead is low, sure, but the code complexity of dealing with the multiple kinds in this way is quite real. Since I'm not convinced we're getting real benefit from storing the kind, I don't believe this trade-off is worth it.> thanks, > > David
> Xinliang David Li <davidxl at google.com> writes: >> On Mon, Jun 15, 2015 at 5:35 PM, Justin Bogner <mail at justinbogner.com> >> wrote: >>> "Betul Buyukkurt" <betulb at codeaurora.org> writes: >>>>> - We don't need to store the value profiling kind in the data at all. >>>>> The frontend knows which invocations of the intrinsic are for each >>>>> kind >>>>> implicitly, much like it knows the difference between a counter for >>>>> an >>>>> "if" and a "for" apart implicitly. Just store one set of profiling >>>>> data >>>>> and let the frontend sort it out. >>>> >>>> I think so too. However, value_kind should stay around until the kinds >>>> of >>>> expressions whose values are computed are included in the hash >>>> computation. Otherwise, if a value-profiled expression is to be >>>> removed >>>> from source, then all the rest of the values would get attached to the >>>> MD >>>> fields of wrong types. Currently hash checking verifies if the counter >>>> assigned regions in the source has been changed across profile >>>> consumption >>>> runs. >>> >>> This isn't a good reason to keep value_kind - the function hash not >>> reflecting these kinds of changes is a problem regardless of whether or >>> not we have multiple types. We should just add hash computations for >>> these indirect call counters. As long as we only add this when indirect >>> profiling is turned on it won't invalidate old hashes so it's trivial >>> to >>> stay backwards compatible. >> >> I think there are other better reasons why keeping value kind is >> desirable. Different kinds of value profile data needs different >> 'preprocessing' during reading. For instance, indirect call target >> needs to translate target address into function name. For length data, >> we may want to compute average length; for data address, we may want >> to compute the min-alignment for each site. It will be hard to do so >> with out value kind info. > > But since the consumer is the frontend, and it knows which counts are > which, it knows the kind, no? I don't understand how storing the kind > info helps or hurts - it's just redundant. > >>> >>> The other thing we should do is store which profiling options are >>> enabled, in both formats. When we specify profile-instr-use it'll be >>> less error prone if we can detect whether or not this includes indirect >>> call profiling without checking other options, and it's probably a good >>> idea for "llvm-profdata merge" to disallow merging profiles that are >>> gathering different sets of data. A bitfield seems suitable for this. >> >> For value profiling, allowing profile-gen and profile-use passes using >> different options is a useful feature. Consider the following >> scenarios: >> 1) collect the same profile data once with all kinds of value profile >> data collected. The exact same profile data can be used in >> performance >> experiments with different kinds of value profiling enabled/disabled > > This use case is pretty easy to accomodate for with the model where the > profile stores its type. We could autodetect the type to decide how to > interpret the profile, and if more specific profile flags are set simply > ignore some of the data. The thing that the file format storing the type > gives us is that it's harder to misinterpret the data by supplying the > mismatching flags between reading and writing.How would you plan to merge profile files where one profile is collected w/ one value_kind turned on with another file where another value_kind is turned on. This might be a valid use case. The value_kinds at this time are not stored per raw value in the raw profile files. The newer is design is extremely compact. Having the value_kind is extremely useful when doing the merges and when processing (i.e. derivation of function names from target addresses) data differently per value_kind. You may read the design discussions in the past threads.>> 2) work around profile-use/value transformation bug selectively for >> some file without the need to change anything in instrumentation pass > > I'm not sure I understand what you mean here. > >> Besides, with the latest patch, the value_kind is not recorded in each >> profile value thus the overhead is minimized. > > The overhead is low, sure, but the code complexity of dealing with the > multiple kinds in this way is quite real. Since I'm not convinced we're > getting real benefit from storing the kind, I don't believe this > trade-off is worth it. > >> thanks, >> >> David >
> But since the consumer is the frontend, and it knows which counts are > which, it knows the kind, no? I don't understand how storing the kind > info helps or hurts - it's just redundant.Yes, the frontend consumer knows, but the raw reader does not. It is natural to do those processing when processing the raw profile data (for instance when function pointer to name mapping still exists).> >>> >>> The other thing we should do is store which profiling options are >>> enabled, in both formats. When we specify profile-instr-use it'll be >>> less error prone if we can detect whether or not this includes indirect >>> call profiling without checking other options, and it's probably a good >>> idea for "llvm-profdata merge" to disallow merging profiles that are >>> gathering different sets of data. A bitfield seems suitable for this. >> >> For value profiling, allowing profile-gen and profile-use passes using >> different options is a useful feature. Consider the following >> scenarios: >> 1) collect the same profile data once with all kinds of value profile >> data collected. The exact same profile data can be used in performance >> experiments with different kinds of value profiling enabled/disabled > > This use case is pretty easy to accomodate for with the model where the > profile stores its type. We could autodetect the type to decide how to > interpret the profile, and if more specific profile flags are set simply > ignore some of the data. The thing that the file format storing the type > gives us is that it's harder to misinterpret the data by supplying the > mismatching flags between reading and writing.Yes, this is achievable with some extra effort -- for instance by collecting the profile sites of all kinds regardless of the flags. After profiling matching, simply drop selected sites according to the flags. This adds complexity of the code. With value kinds encoded in profile data, the handling is much simpler -- each value profiler only needs to deal with its own kind.> >> 2) work around profile-use/value transformation bug selectively for >> some file without the need to change anything in instrumentation pass > > I'm not sure I understand what you mean here.Similar to the above, but for correctness.> >> Besides, with the latest patch, the value_kind is not recorded in each >> profile value thus the overhead is minimized. > > The overhead is low, sure, but the code complexity of dealing with the > multiple kinds in this way is quite real.I actually believe having value-kind can help reduce code complexity instead of the other way around. What is the extra complexity? thanks, David>Since I'm not convinced we're > getting real benefit from storing the kind, I don't believe this > trade-off is worth it. > >> thanks, >> >> David
Justin, do you have more concerns on keeping value_kind? If there is still disagreement, can we agree to move on with it for now ? After the initial version of the patches checked in, we can do more serious testings with large apps and revisit this if there are problems discovered. thanks, David On Mon, Jun 15, 2015 at 10:47 PM, Xinliang David Li <davidxl at google.com> wrote:>> But since the consumer is the frontend, and it knows which counts are >> which, it knows the kind, no? I don't understand how storing the kind >> info helps or hurts - it's just redundant. > > Yes, the frontend consumer knows, but the raw reader does not. It is > natural to do those processing when processing the raw profile data > (for instance when function pointer to name mapping still exists). > >> >>>> >>>> The other thing we should do is store which profiling options are >>>> enabled, in both formats. When we specify profile-instr-use it'll be >>>> less error prone if we can detect whether or not this includes indirect >>>> call profiling without checking other options, and it's probably a good >>>> idea for "llvm-profdata merge" to disallow merging profiles that are >>>> gathering different sets of data. A bitfield seems suitable for this. >>> >>> For value profiling, allowing profile-gen and profile-use passes using >>> different options is a useful feature. Consider the following >>> scenarios: >>> 1) collect the same profile data once with all kinds of value profile >>> data collected. The exact same profile data can be used in performance >>> experiments with different kinds of value profiling enabled/disabled >> >> This use case is pretty easy to accomodate for with the model where the >> profile stores its type. We could autodetect the type to decide how to >> interpret the profile, and if more specific profile flags are set simply >> ignore some of the data. The thing that the file format storing the type >> gives us is that it's harder to misinterpret the data by supplying the >> mismatching flags between reading and writing. > > Yes, this is achievable with some extra effort -- for instance by > collecting the profile sites of all kinds regardless of the flags. > After profiling matching, simply drop selected sites according to the > flags. > > This adds complexity of the code. With value kinds encoded in profile > data, the handling is much simpler -- each value profiler only needs > to deal with its own kind. > >> >>> 2) work around profile-use/value transformation bug selectively for >>> some file without the need to change anything in instrumentation pass >> >> I'm not sure I understand what you mean here. > > Similar to the above, but for correctness. > >> >>> Besides, with the latest patch, the value_kind is not recorded in each >>> profile value thus the overhead is minimized. >> >> The overhead is low, sure, but the code complexity of dealing with the >> multiple kinds in this way is quite real. > > I actually believe having value-kind can help reduce code complexity > instead of the other way around. What is the extra complexity? > > thanks, > > David > > >>Since I'm not convinced we're >> getting real benefit from storing the kind, I don't believe this >> trade-off is worth it. >> >>> thanks, >>> >>> David