Than McIntosh via llvm-dev
2021-Oct-04 15:52 UTC
[llvm-dev] RFC: A binary serialization format for MemProf
>> I don't think the gc compiler even involves llvm as it is written in Go.

Correct.

>> I'm not personally very familiar with Go compiler toolchains and their
>> roadmaps, but Than can probably comment.

I don't see any reason why something similar to what Teresa and Snehasish
are proposing couldn't be implemented for the Go gc-based toolchain (with a
significant amount of effort) -- from my reading it looks fairly language
independent.

True, as previously pointed out, the gc-based Go toolchain currently
doesn't support ASAN and lacks any sort of PGO/FDO capability, but this is
not written in stone. FDO support, along with improving the compiler back
end to exploit profile data (via inlining, basic block layout, etc.) is
something that could be added if need be. Go's priorities have simply been
different from those of C/C++.

> IMHO, there is an intrinsic value of data formats being unified among
> different toolchains -- as very well demonstrated by DWARF

Comparison with DWARF seems a bit odd here. I agree that unified formats
can be useful, but I would point out that there is a great deal of
administrative overhead associated with standards like DWARF (committee
meetings, heavyweight processes for reaching consensus on new features,
release cycles measured in years, etc.).

Go (for example) uses its own object file format, as opposed to using an
existing standard format (e.g. ELF or PE/COFF). The ability to modify and
evolve the object file format is a huge enabler when it comes to rolling
out new features. It was a key element in the last two big Go projects
I've worked on; had we been stuck with an existing object file format, the
work would have been much more difficult.

Than

On Mon, Oct 4, 2021 at 10:55 AM Teresa Johnson <tejohnson at google.com> wrote:

> +Than McIntosh <thanm at google.com> again to comment on the gc question
> below.
>
> On Mon, Oct 4, 2021 at 2:38 AM Andrey Bokhanko <andreybokhanko at gmail.com>
> wrote:
>
>> Thanks Teresa and others for the clarification!
>>
>> On Fri, Oct 1, 2021 at 8:32 PM Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>> I was going to respond similarly, and add a note that it isn't clear
>>> that gollvm (LLVM-based Go compiler) supports either PGO or the sanitizers,
>>> so that may be more difficult than Rust which does. As Snehasish notes, we
>>> are focused on C/C++, but this will all be done in the LLVM IR level and
>>> should be language independent in theory.
>>>
>>
>> Let me note that I specifically meant gc (Google's standard Go compiler),
>> not gollvm. IMHO, there is an intrinsic value of data formats being unified
>> among different toolchains -- as very well demonstrated by DWARF.
>>
>> (Yes, I'm aware that gc doesn't support even ages-long instruction
>> profiling. One of the reasons is the apparent lack of implemented
>> optimizations that can directly benefit from profiling. In case of memory
>> profiling, the use case is clear. Also, given that BOLT helps Go a lot (up
>> to +20% speed-up on our internal tests), I expect the same for memory
>> profiling, which will warrant extending gc capabilities to use MemProf
>> format.)
>>
>
> I don't think the gc compiler even involves llvm as it is written in Go.
> So that's definitely outside the scope of our work. I'm not personally very
> familiar with Go compiler toolchains and their roadmaps, but Than can
> probably comment.
>
> Teresa
>
>
>> Yours,
>> Andrey
>>
>>
>>> Teresa
>>>
>>> On Fri, Oct 1, 2021 at 10:25 AM Snehasish Kumar <snehasishk at google.com>
>>> wrote:
>>>
>>>> Hi Andrey,
>>>>
>>>> The serialization format is language independent, though our focus is
>>>> C/C++. Note that our instrumentation is based on the LLVM sanitizer
>>>> infrastructure and should work for Rust (supports building with sanitizers
>>>> [1]). We have not considered using the data profile for non-C/C++ codes.
>>>>
>>>> Regards,
>>>> Snehasish
>>>>
>>>> [1]
>>>> https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html
>>>>
>>>> On Fri, Oct 1, 2021 at 9:14 AM Andrey Bokhanko
>>>> <andreybokhanko at gmail.com> wrote:
>>>>
>>>>> Hi Snehasish, David and Teresa,
>>>>>
>>>>> I'm really glad to see the steady progress in this area!
>>>>>
>>>>> It looks like the format is pretty much language independent
>>>>> (correct?) -- so it can be applied not only to C/C++, but other
>>>>> languages (Rust) and even toolchains (Go) as well? If you have already
>>>>> considered using data profile for non-C/C++, may I kindly ask you to
>>>>> share your thoughts on this?
>>>>>
>>>>> Yours,
>>>>> Andrey
>>>>> ==
>>>>> Advanced Software Technology Lab
>>>>> Huawei
>>>>>
>>>>> On Thu, Sep 30, 2021 at 1:17 AM Snehasish Kumar <snehasishk at google.com>
>>>>> wrote:
>>>>> >
>>>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com |
Snehasish Kumar via llvm-dev
2021-Oct-05 00:37 UTC
[llvm-dev] RFC: A binary serialization format for MemProf
Hi Hongtao,

> How are recursive allocation contexts stored? Wondering if there’s any
> recursive compression performed. For example, a tree-based construction
> algorithm may create tree nodes recursively. Is each tree node object
> modeled by its unique dynamic context?

There is no special handling of recursive calling contexts; we store the
entire unique dynamic calling context as the identifier.

> Will the contexts of a leaf function be merged during compilation when the
> leaf function is not inlined? If so, where does the merging happen?

During compilation, each allocation site may be annotated with one or more
heap allocation info blocks, each identified by a unique dynamic calling
context. We will not merge heap profile information across unique contexts,
as one of our immediate goals is to distinguish between hot and cold
allocation contexts. The mechanism to distinguish the allocation contexts
involves cloning or parameterization, and Teresa will present the details in
an upcoming RFC.

On Mon, Oct 4, 2021 at 8:53 AM Than McIntosh <thanm at google.com> wrote:

>
> >>I don't think the gc compiler even involves llvm as it is written in Go.
>
> Correct.
>
> >>I'm not personally very familiar with Go compiler toolchains and their
> roadmaps, but Than can probably comment.
>
> I don't see any reason why something similar to what Teresa and Snehasish
> are proposing couldn't be implemented for the Go gc-based toolchain (with a
> significant amount of effort)-- from my reading it looks fairly language
> independent.
>
> True, as previously pointed out, the gc-based Go toolchain currently
> doesn't support ASAN and lacks any sort of PGO/FDO capability, but this is
> not written in stone. FDO support, along with improving the compiler back
> end to exploit profile data (via inlining, basic block layout, etc) is
> something that could be added if need be. Go's priorities have simply been
> different from those of C/C++.
>
> >IMHO, there is an intrinsic value of data formats being unified among
> different toolchains -- as very well demonstrated by DWARF
>
> Comparison with DWARF seems a bit odd here. I agree that unified formats
> can be useful, but I would point out that there is a great deal of
> administrative overhead associated with standards like DWARF (committee
> meetings, heavyweight processes for reaching consensus on new features,
> release cycles measured in years, etc).
>
> Go (for example) uses its own object file format, as opposed to using an
> existing standard format (e.g. ELF or PE/COFF). The ability to modify and
> evolve the object file format is a huge enabler when it comes to rolling
> out new features. It was a key element in the last two big Go projects
> I've worked on; had we been stuck with an existing object file format, the
> work would have been much more difficult.
>
> Than
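
Snehasish's answer about recursive contexts can be illustrated with a
minimal sketch. This is not the MemProf implementation (which lives in
LLVM's C++ code); it is a toy Python model that only assumes what the reply
states: the full dynamic calling context is the identifier, and info blocks
are not merged across contexts. The names HeapProfile, AllocInfo and
build_tree, and the 32-byte node size, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AllocInfo:
    """Hypothetical per-context record; a real info block holds more fields."""
    count: int = 0
    total_bytes: int = 0


class HeapProfile:
    """Toy profile keyed by the entire dynamic calling context."""

    def __init__(self):
        # Key: tuple of frames, outermost caller first, allocating frame last.
        self.by_context = defaultdict(AllocInfo)

    def record(self, context, size):
        # No recursion compression: the whole stack is the identifier.
        info = self.by_context[tuple(context)]
        info.count += 1
        info.total_bytes += size

    def blocks_for_site(self, site):
        """Return every info block whose allocating (last) frame is `site`,
        kept separate per context rather than merged."""
        return {ctx: info for ctx, info in self.by_context.items()
                if ctx[-1] == site}


def build_tree(profile, depth, stack=("main",)):
    """Recursively 'allocate' a full binary tree, recording each node."""
    stack = stack + ("build_tree",)
    profile.record(stack, size=32)  # assumed node size, illustration only
    if depth > 0:
        build_tree(profile, depth - 1, stack)  # left child
        build_tree(profile, depth - 1, stack)  # right child


if __name__ == "__main__":
    profile = HeapProfile()
    build_tree(profile, depth=2)
    for ctx, info in profile.blocks_for_site("build_tree").items():
        print(" -> ".join(ctx), info.count, info.total_bytes)
```

In this model a tree of depth 2 yields three contexts for the single
allocation site, one per recursion depth. Frames here are bare function
names, so left and right children at the same depth share a context; a
profiler that keys on call-site addresses would presumably separate them
as well.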