Charles Saternos via llvm-dev
2017-Jun-09 19:58 UTC
[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format
OK, I tested the hotness, and it works.> I'd ask Charles to take it over. I think it just needs a test case and anupdate to the usage message. Sure - I've added the message and a quick test. The patch is here: https://reviews.llvm.org/D34063 Thanks, Charles On Thu, Jun 8, 2017 at 8:01 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:> I'd ask Charles to take it over. I think it just needs a test case and an > update to the usage message. > > Peter > > On Thu, Jun 8, 2017 at 4:55 PM, Teresa Johnson via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Great! For the hotness, try creating a small test case with a very hot >> loop that iterates many times. Let me know if you are still having trouble. >> While the llvm-dis serialization is being discussed, I suppose at the very >> least this can go in with the rest of the existing YAML summary dumping and >> get emitted from llvm-lto2 using the patch Peter attached. Peter - do you >> want to add that to llvm-lto2 so that we have a way of dumping the existing >> YAML supported summary info to stdout, or would you rather have Charles >> take that one over and submit it (probably just needs a test case). >> >> Teresa >> >> On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos < >> charles.saternos at gmail.com> wrote: >> >>> Hey Teresa, >>> >>> I've updated the YAML to include the names and GUIDs for all >>> functions/global vars/aliases. I've also added the hotness info to the >>> output, but for some reason, none of my tests when running with FDO gave >>> anything besides Unknown. I'll be looking more into this tomorrow. >>> >>> Here's the current format: >>> >>> > ../build/bin/llvm-lto2 dump-summary b.o >>> --- >>> NamedGlobalValueMap: >>> : >>> - GUID: 3762489268811518743 >>> Kind: GlobalVar >>> Linkage: PrivateLinkage >>> NotEligibleToImport: true >>> Live: false >>> cold: >>> - GUID: 11668175513417606517 >>> Kind: Function >>> Linkage: ExternalLinkage >>> NotEligibleToImport: true >>> Live: false >>> InstCount: 5 >>> Calls: >>> - Name: puts >>> GUID: 8979701042202144121 >>> Hotness: Unknown >>> fib: >>> - GUID: 8667248078361406812 >>> Kind: Function >>> Linkage: ExternalLinkage >>> NotEligibleToImport: true >>> Live: false >>> InstCount: 26 >>> Calls: >>> - Name: fib >>> GUID: 8667248078361406812 >>> Hotness: Unknown >>> hot: >>> - GUID: 10177652421713147431 >>> Kind: Function >>> Linkage: ExternalLinkage >>> NotEligibleToImport: true >>> Live: false >>> InstCount: 14 >>> Calls: >>> - Name: fib >>> GUID: 8667248078361406812 >>> Hotness: Unknown >>> - Name: printf >>> GUID: 7383291119112528047 >>> Hotness: Unknown >>> llvm.used: >>> - GUID: 15665353970260777610 >>> Kind: GlobalVar >>> Linkage: AppendingLinkage >>> NotEligibleToImport: true >>> Live: true >>> TypeIdMap: >>> WithGlobalValueDeadStripping: false >>> ... >>> >>> Thanks, >>> Charles >>> >>> >>> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com> >>> wrote: >>> >>>> >>>> >>>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos < >>>> charles.saternos at gmail.com> wrote: >>>> >>>>> Alright, now it outputs YAML in the following format: >>>>> >>>>> --- >>>>> NamedGlobalValueMap: >>>>> X: >>>>> - Kind: GlobalVar >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: false >>>>> Live: false >>>>> a: >>>>> - Kind: Alias >>>>> Linkage: WeakAnyLinkage >>>>> NotEligibleToImport: false >>>>> Live: false >>>>> AliaseeGUID: 1881667236089500162 >>>>> afun: >>>>> - Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: false >>>>> Live: false >>>>> InstCount: 2 >>>>> testtest: >>>>> - Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: false >>>>> Live: false >>>>> InstCount: 2 >>>>> Calls: >>>>> - Function: 14471680721094503013 >>>>> TypeIdMap: >>>>> WithGlobalValueDeadStripping: false >>>>> ... >>>>> >>>>> Any thoughts on the new format? >>>>> >>>> >>>> Thanks, Charles. The main improvement I think we would want is to >>>> output value names instead of the GUID. Can you build up a map from GUID -> >>>> name ahead of time and use those like you were for your initial patch? >>>> Actually, I also think it would be useful to emit both the GUID and the >>>> name, since the combined index will eventually only have the GUID, so this >>>> would give a mapping to use for at least the visual inspection of the >>>> combined index. >>>> >>>> Also, would be good to see an example with FDO, to make sure the >>>> hotness info of the calls is emitted. >>>> >>>> Teresa >>>> >>>> >>>>> Thanks, >>>>> Charles >>>>> >>>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com> >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev < >>>>>>>> llvm-dev at lists.llvm.org>: >>>>>>>> >>>>>>>>> I know there's been a bunch of discussion here already, but I was >>>>>>>>> wondering if perhaps someone (probably Teresa? Peter?) could: >>>>>>>>> >>>>>>>>> 1) summarize the current state >>>>>>>>> 2) describe the end-goal >>>>>>>>> 3) describe what steps (& how this patch relates) are planned to >>>>>>>>> get to (2) >>>>>>>>> >>>>>>>>> My naive thoughts, not being intimately familiar with any of this: >>>>>>>>> Usually bitcode and textual IR support go in together or around the same >>>>>>>>> time, and designed that way from the start (take r211920 for examaple, >>>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to >>>>>>>>> have been an oversight in the implementation of IR summaries (is that an >>>>>>>>> accurate representation/statement?) >>>>>>>>> >>>>>>>> >>>>>>>> More or less: it was not an oversight. >>>>>>>> The summaries are not really part of the IR, it is more like an >>>>>>>> "analysis result" that is serialized. It can always be recomputed from the >>>>>>>> IR. This aspect makes it quite "special", it is the only analysis result >>>>>>>> that I know of that we serialize. >>>>>>>> >>>>>>> >>>>>>> The use list work seems pretty similar in some ways (granted, can't >>>>>>> be recomputed to match, hence the desire to serialize it for test case >>>>>>> implementation). >>>>>>> >>>>>> >>>>>> I see use-list as a leaky implementation detail of the IR that we >>>>>> serialized because it impact the processing of the IR. >>>>>> >>>>>> Summaries are more like serializing the CFG for example. >>>>>> >>>>>> >>>>>>> But it looks like the same is true here to a degree - there are test >>>>>>> cases that exercise the summary handling, so they want summaries for input >>>>>>> (for now, I think, I've seen test cases that run another LLVM tool to >>>>>>> insert/create a summary to then feed that back in for a test), or to test >>>>>>> that the resulting summary is correct. >>>>>>> >>>>>> >>>>>> We have cases were we want summaries as an input and check a combined >>>>>> summary as an output, and for these having the YAML representation will be >>>>>> useful (we didn't have it before). >>>>>> >>>>>> >>>>>>> >>>>>>> Can summaries be standalone? I thought they could (that'd be ideal >>>>>>> for the distributed situation - only the summary needs to go to the 'thin >>>>>>> link' step, I think? (currently maybe only the debug info is stripped for >>>>>>> that - but ideally other unused IR wouldn't be shipped there as well, I >>>>>>> would think) >>>>>>> >>>>>> >>>>>> Yes conceptually they can be standalone. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> & now there's an effort to correct that. >>>>>>>>> >>>>>>>> >>>>>>>> The main motivation here, I believe, is more to help dev to have >>>>>>>> human readable/understandable dump for ThinLTO bitcodes. Having to inspect >>>>>>>> separately summaries is a pain. >>>>>>>> >>>>>>> >>>>>>> Not sure I quite follow - inspect separately? >>>>>>> >>>>>> >>>>>> llvm-dis does not display summaries today, so you can't just use >>>>>> llvm-dis like a "regular" flow. >>>>>> >>>>>> >>>>>>> How are they inspected today? >>>>>>> >>>>>> >>>>>> llvm-bcanalyzer? And now the YAML dump as well. >>>>>> >>>>>> >>>>>>> & also, I think there are test cases that want to/are currently >>>>>>> testing summary input but do so somewhat awkwardly by using another tool to >>>>>>> produce the summary first. Ideally the test case would have the summary >>>>>>> written in to start, I would think, if that's a codepath worth testing? >>>>>>> >>>>>> >>>>>> The IR already contains all the information, so why repeating it? >>>>>> This makes the test case harder to maintain, in the vast majority, I expect >>>>>> that if a test needs IR then it shouldn't need to include a summary as well >>>>>> (and vice-versa). >>>>>> >>>>>> In the majority of test we have we want to check if the importing >>>>>> does what it is supposed to do, and if the linkage are correctly adjusted. >>>>>> With a YAML (or other) serialization for the summaries this could indeed >>>>>> been done purely with summaries, without any IR involved. >>>>>> >>>>>> -- >>>>>> Mehdi >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> - Dave >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Mehdi >>>>>>>> >>>>>>>> So it seems like that would start with a discussion of what the >>>>>>>>> right end-state would be: What the syntax in textual IR should be, then >>>>>>>>> implementing it. I can understand implementing such a thing in steps - it's >>>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on >>>>>>>>> either side seems fine - implementing the emission first (hidden behind a >>>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing >>>>>>>>> first (no need to hide it behind any flags - manually written examples can >>>>>>>>> be used as input tests). >>>>>>>>> >>>>>>>>> (& it sounds like there's some partially implemented functionality >>>>>>>>> using a YAML format that was intended to address how some test cases could >>>>>>>>> be written? & this might be a good basis for the syntax - but seems to me >>>>>>>>> like it might be a bit disjointed/out of place in the textual IR format >>>>>>>>> that's not otherwise YAML-based?) >>>>>>>>> >>>>>>>>> - Dave >>>>>>>>> >>>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev < >>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>> >>>>>>>>>> Hey all, >>>>>>>>>> >>>>>>>>>> Below is the proposed format for the dump of the ThinLTO module >>>>>>>>>> summary in the llvm-dis utility: >>>>>>>>>> >>>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll >>>>>>>>>> ; ModuleID = '2.o' >>>>>>>>>> source_filename = "2.ll" >>>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" >>>>>>>>>> target triple = "x86_64-unknown-linux-gnu" >>>>>>>>>> >>>>>>>>>> @X = constant i32 42, section "foo", align 4 >>>>>>>>>> >>>>>>>>>> @a = weak alias i32, i32* @X >>>>>>>>>> >>>>>>>>>> define void @afun() { >>>>>>>>>> %1 = load i32, i32* @a >>>>>>>>>> ret void >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> define void @testtest() { >>>>>>>>>> tail call void @boop() >>>>>>>>>> ret void >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> declare void @boop() >>>>>>>>>> >>>>>>>>>> ; Module summary: >>>>>>>>>> ; testtest (External linkage) >>>>>>>>>> ; Function (2 instructions) >>>>>>>>>> ; Calls: boop >>>>>>>>>> ; X (External linkage) >>>>>>>>>> ; Global Variable >>>>>>>>>> ; afun (External linkage) >>>>>>>>>> ; Function (2 instructions) >>>>>>>>>> ; Refs: >>>>>>>>>> ; a >>>>>>>>>> ; a (Weak any linkage) >>>>>>>>>> ; Alias (aliasee X) >>>>>>>>>> >>>>>>>>>> I've implemented the above format in the llvm-dis utility, since >>>>>>>>>> there currently isn't really a way of getting ThinLTO summaries in a >>>>>>>>>> human-readable format. >>>>>>>>>> >>>>>>>>>> Let me know what you think of this format, and what information >>>>>>>>>> you think should be added/removed. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Charles >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers mailing list >>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> LLVM Developers mailing list >>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>> >>>>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>>> 408-460-2413 <(408)%20460-2413> >>>> >>> >>> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | >> 408-460-2413 <(408)%20460-2413> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > > > -- > -- > Peter >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170609/2db5579b/attachment.html>
Charles Saternos via llvm-dev
2017-Jul-17 13:11 UTC
[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format
Hey @chandlerc and @dblaikie, Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add dump-summary command to llvm-lto2 tool"? Thanks, Charles On Fri, Jun 9, 2017 at 3:58 PM, Charles Saternos <charles.saternos at gmail.com> wrote:> OK, I tested the hotness, and it works. > > > I'd ask Charles to take it over. I think it just needs a test case and > an update to the usage message. > > Sure - I've added the message and a quick test. The patch is here: > https://reviews.llvm.org/D34063 > > Thanks, > Charles > > On Thu, Jun 8, 2017 at 8:01 PM, Peter Collingbourne <peter at pcc.me.uk> > wrote: > >> I'd ask Charles to take it over. I think it just needs a test case and an >> update to the usage message. >> >> Peter >> >> On Thu, Jun 8, 2017 at 4:55 PM, Teresa Johnson via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Great! For the hotness, try creating a small test case with a very hot >>> loop that iterates many times. Let me know if you are still having trouble. >>> While the llvm-dis serialization is being discussed, I suppose at the very >>> least this can go in with the rest of the existing YAML summary dumping and >>> get emitted from llvm-lto2 using the patch Peter attached. Peter - do you >>> want to add that to llvm-lto2 so that we have a way of dumping the existing >>> YAML supported summary info to stdout, or would you rather have Charles >>> take that one over and submit it (probably just needs a test case). >>> >>> Teresa >>> >>> On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos < >>> charles.saternos at gmail.com> wrote: >>> >>>> Hey Teresa, >>>> >>>> I've updated the YAML to include the names and GUIDs for all >>>> functions/global vars/aliases. I've also added the hotness info to the >>>> output, but for some reason, none of my tests when running with FDO gave >>>> anything besides Unknown. I'll be looking more into this tomorrow. >>>> >>>> Here's the current format: >>>> >>>> > ../build/bin/llvm-lto2 dump-summary b.o >>>> --- >>>> NamedGlobalValueMap: >>>> : >>>> - GUID: 3762489268811518743 >>>> Kind: GlobalVar >>>> Linkage: PrivateLinkage >>>> NotEligibleToImport: true >>>> Live: false >>>> cold: >>>> - GUID: 11668175513417606517 >>>> Kind: Function >>>> Linkage: ExternalLinkage >>>> NotEligibleToImport: true >>>> Live: false >>>> InstCount: 5 >>>> Calls: >>>> - Name: puts >>>> GUID: 8979701042202144121 >>>> Hotness: Unknown >>>> fib: >>>> - GUID: 8667248078361406812 >>>> Kind: Function >>>> Linkage: ExternalLinkage >>>> NotEligibleToImport: true >>>> Live: false >>>> InstCount: 26 >>>> Calls: >>>> - Name: fib >>>> GUID: 8667248078361406812 >>>> Hotness: Unknown >>>> hot: >>>> - GUID: 10177652421713147431 >>>> Kind: Function >>>> Linkage: ExternalLinkage >>>> NotEligibleToImport: true >>>> Live: false >>>> InstCount: 14 >>>> Calls: >>>> - Name: fib >>>> GUID: 8667248078361406812 >>>> Hotness: Unknown >>>> - Name: printf >>>> GUID: 7383291119112528047 >>>> Hotness: Unknown >>>> llvm.used: >>>> - GUID: 15665353970260777610 >>>> Kind: GlobalVar >>>> Linkage: AppendingLinkage >>>> NotEligibleToImport: true >>>> Live: true >>>> TypeIdMap: >>>> WithGlobalValueDeadStripping: false >>>> ... >>>> >>>> Thanks, >>>> Charles >>>> >>>> >>>> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos < >>>>> charles.saternos at gmail.com> wrote: >>>>> >>>>>> Alright, now it outputs YAML in the following format: >>>>>> >>>>>> --- >>>>>> NamedGlobalValueMap: >>>>>> X: >>>>>> - Kind: GlobalVar >>>>>> Linkage: ExternalLinkage >>>>>> NotEligibleToImport: false >>>>>> Live: false >>>>>> a: >>>>>> - Kind: Alias >>>>>> Linkage: WeakAnyLinkage >>>>>> NotEligibleToImport: false >>>>>> Live: false >>>>>> AliaseeGUID: 1881667236089500162 >>>>>> afun: >>>>>> - Kind: Function >>>>>> Linkage: ExternalLinkage >>>>>> NotEligibleToImport: false >>>>>> Live: false >>>>>> InstCount: 2 >>>>>> testtest: >>>>>> - Kind: Function >>>>>> Linkage: ExternalLinkage >>>>>> NotEligibleToImport: false >>>>>> Live: false >>>>>> InstCount: 2 >>>>>> Calls: >>>>>> - Function: 14471680721094503013 >>>>>> TypeIdMap: >>>>>> WithGlobalValueDeadStripping: false >>>>>> ... >>>>>> >>>>>> Any thoughts on the new format? >>>>>> >>>>> >>>>> Thanks, Charles. The main improvement I think we would want is to >>>>> output value names instead of the GUID. Can you build up a map from GUID -> >>>>> name ahead of time and use those like you were for your initial patch? >>>>> Actually, I also think it would be useful to emit both the GUID and the >>>>> name, since the combined index will eventually only have the GUID, so this >>>>> would give a mapping to use for at least the visual inspection of the >>>>> combined index. >>>>> >>>>> Also, would be good to see an example with FDO, to make sure the >>>>> hotness info of the calls is emitted. >>>>> >>>>> Teresa >>>>> >>>>> >>>>>> Thanks, >>>>>> Charles >>>>>> >>>>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev < >>>>>>>>> llvm-dev at lists.llvm.org>: >>>>>>>>> >>>>>>>>>> I know there's been a bunch of discussion here already, but I was >>>>>>>>>> wondering if perhaps someone (probably Teresa? Peter?) could: >>>>>>>>>> >>>>>>>>>> 1) summarize the current state >>>>>>>>>> 2) describe the end-goal >>>>>>>>>> 3) describe what steps (& how this patch relates) are planned to >>>>>>>>>> get to (2) >>>>>>>>>> >>>>>>>>>> My naive thoughts, not being intimately familiar with any of >>>>>>>>>> this: Usually bitcode and textual IR support go in together or around the >>>>>>>>>> same time, and designed that way from the start (take r211920 for examaple, >>>>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to >>>>>>>>>> have been an oversight in the implementation of IR summaries (is that an >>>>>>>>>> accurate representation/statement?) >>>>>>>>>> >>>>>>>>> >>>>>>>>> More or less: it was not an oversight. >>>>>>>>> The summaries are not really part of the IR, it is more like an >>>>>>>>> "analysis result" that is serialized. It can always be recomputed from the >>>>>>>>> IR. This aspect makes it quite "special", it is the only analysis result >>>>>>>>> that I know of that we serialize. >>>>>>>>> >>>>>>>> >>>>>>>> The use list work seems pretty similar in some ways (granted, can't >>>>>>>> be recomputed to match, hence the desire to serialize it for test case >>>>>>>> implementation). >>>>>>>> >>>>>>> >>>>>>> I see use-list as a leaky implementation detail of the IR that we >>>>>>> serialized because it impact the processing of the IR. >>>>>>> >>>>>>> Summaries are more like serializing the CFG for example. >>>>>>> >>>>>>> >>>>>>>> But it looks like the same is true here to a degree - there are >>>>>>>> test cases that exercise the summary handling, so they want summaries for >>>>>>>> input (for now, I think, I've seen test cases that run another LLVM tool to >>>>>>>> insert/create a summary to then feed that back in for a test), or to test >>>>>>>> that the resulting summary is correct. >>>>>>>> >>>>>>> >>>>>>> We have cases were we want summaries as an input and check a >>>>>>> combined summary as an output, and for these having the YAML representation >>>>>>> will be useful (we didn't have it before). >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Can summaries be standalone? I thought they could (that'd be ideal >>>>>>>> for the distributed situation - only the summary needs to go to the 'thin >>>>>>>> link' step, I think? (currently maybe only the debug info is stripped for >>>>>>>> that - but ideally other unused IR wouldn't be shipped there as well, I >>>>>>>> would think) >>>>>>>> >>>>>>> >>>>>>> Yes conceptually they can be standalone. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> & now there's an effort to correct that. >>>>>>>>>> >>>>>>>>> >>>>>>>>> The main motivation here, I believe, is more to help dev to have >>>>>>>>> human readable/understandable dump for ThinLTO bitcodes. Having to inspect >>>>>>>>> separately summaries is a pain. >>>>>>>>> >>>>>>>> >>>>>>>> Not sure I quite follow - inspect separately? >>>>>>>> >>>>>>> >>>>>>> llvm-dis does not display summaries today, so you can't just use >>>>>>> llvm-dis like a "regular" flow. >>>>>>> >>>>>>> >>>>>>>> How are they inspected today? >>>>>>>> >>>>>>> >>>>>>> llvm-bcanalyzer? And now the YAML dump as well. >>>>>>> >>>>>>> >>>>>>>> & also, I think there are test cases that want to/are currently >>>>>>>> testing summary input but do so somewhat awkwardly by using another tool to >>>>>>>> produce the summary first. Ideally the test case would have the summary >>>>>>>> written in to start, I would think, if that's a codepath worth testing? >>>>>>>> >>>>>>> >>>>>>> The IR already contains all the information, so why repeating it? >>>>>>> This makes the test case harder to maintain, in the vast majority, I expect >>>>>>> that if a test needs IR then it shouldn't need to include a summary as well >>>>>>> (and vice-versa). >>>>>>> >>>>>>> In the majority of test we have we want to check if the importing >>>>>>> does what it is supposed to do, and if the linkage are correctly adjusted. >>>>>>> With a YAML (or other) serialization for the summaries this could indeed >>>>>>> been done purely with summaries, without any IR involved. >>>>>>> >>>>>>> -- >>>>>>> Mehdi >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> - Dave >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Mehdi >>>>>>>>> >>>>>>>>> So it seems like that would start with a discussion of what the >>>>>>>>>> right end-state would be: What the syntax in textual IR should be, then >>>>>>>>>> implementing it. I can understand implementing such a thing in steps - it's >>>>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on >>>>>>>>>> either side seems fine - implementing the emission first (hidden behind a >>>>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing >>>>>>>>>> first (no need to hide it behind any flags - manually written examples can >>>>>>>>>> be used as input tests). >>>>>>>>>> >>>>>>>>>> (& it sounds like there's some partially implemented >>>>>>>>>> functionality using a YAML format that was intended to address how some >>>>>>>>>> test cases could be written? & this might be a good basis for the syntax - >>>>>>>>>> but seems to me like it might be a bit disjointed/out of place in the >>>>>>>>>> textual IR format that's not otherwise YAML-based?) >>>>>>>>>> >>>>>>>>>> - Dave >>>>>>>>>> >>>>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev < >>>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>>> >>>>>>>>>>> Hey all, >>>>>>>>>>> >>>>>>>>>>> Below is the proposed format for the dump of the ThinLTO module >>>>>>>>>>> summary in the llvm-dis utility: >>>>>>>>>>> >>>>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll >>>>>>>>>>> ; ModuleID = '2.o' >>>>>>>>>>> source_filename = "2.ll" >>>>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" >>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu" >>>>>>>>>>> >>>>>>>>>>> @X = constant i32 42, section "foo", align 4 >>>>>>>>>>> >>>>>>>>>>> @a = weak alias i32, i32* @X >>>>>>>>>>> >>>>>>>>>>> define void @afun() { >>>>>>>>>>> %1 = load i32, i32* @a >>>>>>>>>>> ret void >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> define void @testtest() { >>>>>>>>>>> tail call void @boop() >>>>>>>>>>> ret void >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> declare void @boop() >>>>>>>>>>> >>>>>>>>>>> ; Module summary: >>>>>>>>>>> ; testtest (External linkage) >>>>>>>>>>> ; Function (2 instructions) >>>>>>>>>>> ; Calls: boop >>>>>>>>>>> ; X (External linkage) >>>>>>>>>>> ; Global Variable >>>>>>>>>>> ; afun (External linkage) >>>>>>>>>>> ; Function (2 instructions) >>>>>>>>>>> ; Refs: >>>>>>>>>>> ; a >>>>>>>>>>> ; a (Weak any linkage) >>>>>>>>>>> ; Alias (aliasee X) >>>>>>>>>>> >>>>>>>>>>> I've implemented the above format in the llvm-dis utility, since >>>>>>>>>>> there currently isn't really a way of getting ThinLTO summaries in a >>>>>>>>>>> human-readable format. >>>>>>>>>>> >>>>>>>>>>> Let me know what you think of this format, and what information >>>>>>>>>>> you think should be added/removed. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Charles >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers mailing list >>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>>>> 408-460-2413 <(408)%20460-2413> >>>>> >>>> >>>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>> 408-460-2413 <(408)%20460-2413> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >> >> >> -- >> -- >> Peter >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170717/5de3e17e/attachment-0001.html>
Charles Saternos via llvm-dev
2017-Jul-17 13:12 UTC
[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format
btw, the YAML format is specified in this patch: https://reviews.llvm.org/D34063 (you can see it in the test case for the patch). On Mon, Jul 17, 2017 at 9:11 AM, Charles Saternos < charles.saternos at gmail.com> wrote:> Hey @chandlerc and @dblaikie, > > Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add > dump-summary command to llvm-lto2 tool"? > > Thanks, > Charles > > On Fri, Jun 9, 2017 at 3:58 PM, Charles Saternos < > charles.saternos at gmail.com> wrote: > >> OK, I tested the hotness, and it works. >> >> > I'd ask Charles to take it over. I think it just needs a test case and >> an update to the usage message. >> >> Sure - I've added the message and a quick test. The patch is here: >> https://reviews.llvm.org/D34063 >> >> Thanks, >> Charles >> >> On Thu, Jun 8, 2017 at 8:01 PM, Peter Collingbourne <peter at pcc.me.uk> >> wrote: >> >>> I'd ask Charles to take it over. I think it just needs a test case and >>> an update to the usage message. >>> >>> Peter >>> >>> On Thu, Jun 8, 2017 at 4:55 PM, Teresa Johnson via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> Great! For the hotness, try creating a small test case with a very hot >>>> loop that iterates many times. Let me know if you are still having trouble. >>>> While the llvm-dis serialization is being discussed, I suppose at the very >>>> least this can go in with the rest of the existing YAML summary dumping and >>>> get emitted from llvm-lto2 using the patch Peter attached. Peter - do you >>>> want to add that to llvm-lto2 so that we have a way of dumping the existing >>>> YAML supported summary info to stdout, or would you rather have Charles >>>> take that one over and submit it (probably just needs a test case). >>>> >>>> Teresa >>>> >>>> On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos < >>>> charles.saternos at gmail.com> wrote: >>>> >>>>> Hey Teresa, >>>>> >>>>> I've updated the YAML to include the names and GUIDs for all >>>>> functions/global vars/aliases. I've also added the hotness info to the >>>>> output, but for some reason, none of my tests when running with FDO gave >>>>> anything besides Unknown. I'll be looking more into this tomorrow. >>>>> >>>>> Here's the current format: >>>>> >>>>> > ../build/bin/llvm-lto2 dump-summary b.o >>>>> --- >>>>> NamedGlobalValueMap: >>>>> : >>>>> - GUID: 3762489268811518743 >>>>> Kind: GlobalVar >>>>> Linkage: PrivateLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> cold: >>>>> - GUID: 11668175513417606517 >>>>> Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> InstCount: 5 >>>>> Calls: >>>>> - Name: puts >>>>> GUID: 8979701042202144121 >>>>> Hotness: Unknown >>>>> fib: >>>>> - GUID: 8667248078361406812 >>>>> Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> InstCount: 26 >>>>> Calls: >>>>> - Name: fib >>>>> GUID: 8667248078361406812 >>>>> Hotness: Unknown >>>>> hot: >>>>> - GUID: 10177652421713147431 >>>>> Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> InstCount: 14 >>>>> Calls: >>>>> - Name: fib >>>>> GUID: 8667248078361406812 >>>>> Hotness: Unknown >>>>> - Name: printf >>>>> GUID: 7383291119112528047 >>>>> Hotness: Unknown >>>>> llvm.used: >>>>> - GUID: 15665353970260777610 >>>>> Kind: GlobalVar >>>>> Linkage: AppendingLinkage >>>>> NotEligibleToImport: true >>>>> Live: true >>>>> TypeIdMap: >>>>> WithGlobalValueDeadStripping: false >>>>> ... >>>>> >>>>> Thanks, >>>>> Charles >>>>> >>>>> >>>>> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com> >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos < >>>>>> charles.saternos at gmail.com> wrote: >>>>>> >>>>>>> Alright, now it outputs YAML in the following format: >>>>>>> >>>>>>> --- >>>>>>> NamedGlobalValueMap: >>>>>>> X: >>>>>>> - Kind: GlobalVar >>>>>>> Linkage: ExternalLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> a: >>>>>>> - Kind: Alias >>>>>>> Linkage: WeakAnyLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> AliaseeGUID: 1881667236089500162 >>>>>>> afun: >>>>>>> - Kind: Function >>>>>>> Linkage: ExternalLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> InstCount: 2 >>>>>>> testtest: >>>>>>> - Kind: Function >>>>>>> Linkage: ExternalLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> InstCount: 2 >>>>>>> Calls: >>>>>>> - Function: 14471680721094503013 >>>>>>> TypeIdMap: >>>>>>> WithGlobalValueDeadStripping: false >>>>>>> ... >>>>>>> >>>>>>> Any thoughts on the new format? >>>>>>> >>>>>> >>>>>> Thanks, Charles. The main improvement I think we would want is to >>>>>> output value names instead of the GUID. Can you build up a map from GUID -> >>>>>> name ahead of time and use those like you were for your initial patch? >>>>>> Actually, I also think it would be useful to emit both the GUID and the >>>>>> name, since the combined index will eventually only have the GUID, so this >>>>>> would give a mapping to use for at least the visual inspection of the >>>>>> combined index. >>>>>> >>>>>> Also, would be good to see an example with FDO, to make sure the >>>>>> hotness info of the calls is emitted. >>>>>> >>>>>> Teresa >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Charles >>>>>>> >>>>>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev < >>>>>>>>>> llvm-dev at lists.llvm.org>: >>>>>>>>>> >>>>>>>>>>> I know there's been a bunch of discussion here already, but I >>>>>>>>>>> was wondering if perhaps someone (probably Teresa? Peter?) could: >>>>>>>>>>> >>>>>>>>>>> 1) summarize the current state >>>>>>>>>>> 2) describe the end-goal >>>>>>>>>>> 3) describe what steps (& how this patch relates) are planned to >>>>>>>>>>> get to (2) >>>>>>>>>>> >>>>>>>>>>> My naive thoughts, not being intimately familiar with any of >>>>>>>>>>> this: Usually bitcode and textual IR support go in together or around the >>>>>>>>>>> same time, and designed that way from the start (take r211920 for examaple, >>>>>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to >>>>>>>>>>> have been an oversight in the implementation of IR summaries (is that an >>>>>>>>>>> accurate representation/statement?) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> More or less: it was not an oversight. >>>>>>>>>> The summaries are not really part of the IR, it is more like an >>>>>>>>>> "analysis result" that is serialized. It can always be recomputed from the >>>>>>>>>> IR. This aspect makes it quite "special", it is the only analysis result >>>>>>>>>> that I know of that we serialize. >>>>>>>>>> >>>>>>>>> >>>>>>>>> The use list work seems pretty similar in some ways (granted, >>>>>>>>> can't be recomputed to match, hence the desire to serialize it for test >>>>>>>>> case implementation). >>>>>>>>> >>>>>>>> >>>>>>>> I see use-list as a leaky implementation detail of the IR that we >>>>>>>> serialized because it impact the processing of the IR. >>>>>>>> >>>>>>>> Summaries are more like serializing the CFG for example. >>>>>>>> >>>>>>>> >>>>>>>>> But it looks like the same is true here to a degree - there are >>>>>>>>> test cases that exercise the summary handling, so they want summaries for >>>>>>>>> input (for now, I think, I've seen test cases that run another LLVM tool to >>>>>>>>> insert/create a summary to then feed that back in for a test), or to test >>>>>>>>> that the resulting summary is correct. >>>>>>>>> >>>>>>>> >>>>>>>> We have cases were we want summaries as an input and check a >>>>>>>> combined summary as an output, and for these having the YAML representation >>>>>>>> will be useful (we didn't have it before). >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Can summaries be standalone? I thought they could (that'd be ideal >>>>>>>>> for the distributed situation - only the summary needs to go to the 'thin >>>>>>>>> link' step, I think? (currently maybe only the debug info is stripped for >>>>>>>>> that - but ideally other unused IR wouldn't be shipped there as well, I >>>>>>>>> would think) >>>>>>>>> >>>>>>>> >>>>>>>> Yes conceptually they can be standalone. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> & now there's an effort to correct that. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The main motivation here, I believe, is more to help dev to have >>>>>>>>>> human readable/understandable dump for ThinLTO bitcodes. Having to inspect >>>>>>>>>> separately summaries is a pain. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Not sure I quite follow - inspect separately? >>>>>>>>> >>>>>>>> >>>>>>>> llvm-dis does not display summaries today, so you can't just use >>>>>>>> llvm-dis like a "regular" flow. >>>>>>>> >>>>>>>> >>>>>>>>> How are they inspected today? >>>>>>>>> >>>>>>>> >>>>>>>> llvm-bcanalyzer? And now the YAML dump as well. >>>>>>>> >>>>>>>> >>>>>>>>> & also, I think there are test cases that want to/are currently >>>>>>>>> testing summary input but do so somewhat awkwardly by using another tool to >>>>>>>>> produce the summary first. Ideally the test case would have the summary >>>>>>>>> written in to start, I would think, if that's a codepath worth testing? >>>>>>>>> >>>>>>>> >>>>>>>> The IR already contains all the information, so why repeating it? >>>>>>>> This makes the test case harder to maintain, in the vast majority, I expect >>>>>>>> that if a test needs IR then it shouldn't need to include a summary as well >>>>>>>> (and vice-versa). >>>>>>>> >>>>>>>> In the majority of test we have we want to check if the importing >>>>>>>> does what it is supposed to do, and if the linkage are correctly adjusted. >>>>>>>> With a YAML (or other) serialization for the summaries this could indeed >>>>>>>> been done purely with summaries, without any IR involved. >>>>>>>> >>>>>>>> -- >>>>>>>> Mehdi >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> - Dave >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Mehdi >>>>>>>>>> >>>>>>>>>> So it seems like that would start with a discussion of what the >>>>>>>>>>> right end-state would be: What the syntax in textual IR should be, then >>>>>>>>>>> implementing it. I can understand implementing such a thing in steps - it's >>>>>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on >>>>>>>>>>> either side seems fine - implementing the emission first (hidden behind a >>>>>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing >>>>>>>>>>> first (no need to hide it behind any flags - manually written examples can >>>>>>>>>>> be used as input tests). >>>>>>>>>>> >>>>>>>>>>> (& it sounds like there's some partially implemented >>>>>>>>>>> functionality using a YAML format that was intended to address how some >>>>>>>>>>> test cases could be written? & this might be a good basis for the syntax - >>>>>>>>>>> but seems to me like it might be a bit disjointed/out of place in the >>>>>>>>>>> textual IR format that's not otherwise YAML-based?) >>>>>>>>>>> >>>>>>>>>>> - Dave >>>>>>>>>>> >>>>>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev < >>>>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hey all, >>>>>>>>>>>> >>>>>>>>>>>> Below is the proposed format for the dump of the ThinLTO module >>>>>>>>>>>> summary in the llvm-dis utility: >>>>>>>>>>>> >>>>>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll >>>>>>>>>>>> ; ModuleID = '2.o' >>>>>>>>>>>> source_filename = "2.ll" >>>>>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" >>>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu" >>>>>>>>>>>> >>>>>>>>>>>> @X = constant i32 42, section "foo", align 4 >>>>>>>>>>>> >>>>>>>>>>>> @a = weak alias i32, i32* @X >>>>>>>>>>>> >>>>>>>>>>>> define void @afun() { >>>>>>>>>>>> %1 = load i32, i32* @a >>>>>>>>>>>> ret void >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> define void @testtest() { >>>>>>>>>>>> tail call void @boop() >>>>>>>>>>>> ret void >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> declare void @boop() >>>>>>>>>>>> >>>>>>>>>>>> ; Module summary: >>>>>>>>>>>> ; testtest (External linkage) >>>>>>>>>>>> ; Function (2 instructions) >>>>>>>>>>>> ; Calls: boop >>>>>>>>>>>> ; X (External linkage) >>>>>>>>>>>> ; Global Variable >>>>>>>>>>>> ; afun (External linkage) >>>>>>>>>>>> ; Function (2 instructions) >>>>>>>>>>>> ; Refs: >>>>>>>>>>>> ; a >>>>>>>>>>>> ; a (Weak any linkage) >>>>>>>>>>>> ; Alias (aliasee X) >>>>>>>>>>>> >>>>>>>>>>>> I've implemented the above format in the llvm-dis utility, >>>>>>>>>>>> since there currently isn't really a way of getting ThinLTO summaries in a >>>>>>>>>>>> human-readable format. >>>>>>>>>>>> >>>>>>>>>>>> Let me know what you think of this format, and what information >>>>>>>>>>>> you think should be added/removed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Charles >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>>>>> 408-460-2413 <(408)%20460-2413> >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>>> 408-460-2413 <(408)%20460-2413> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >>> >>> >>> -- >>> -- >>> Peter >>> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170717/f07e0cfd/attachment.html>
David Blaikie via llvm-dev
2017-Jul-17 23:49 UTC
[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format
On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hey @chandlerc and @dblaikie, > > Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add > dump-summary command to llvm-lto2 tool"? >Sorry you've kind of got stuck in the middle of this - but I'm still hoping to hear/understand the pushback on implementing this as a first class .ll construct with serialization and deserialization support. I think Peter mentioned he didn't think this was the right path forward in the long term? If that's the case, I'd like to understand that/reach that conclusion for the project now rather than treating this as a stop-gap with some idea that in the future someone might implement full serialization support (when it's been over a year already, and other stop gaps have been implemented (the yaml input support) already). & if a .ll construct with serialization/deserialization is the path forward, understanding the motivation for a something other than going straight for that would be helpful - usually bitcode features come with .ll support from day 1, not a year later. I'm not clear on what would make this feature an exception/more expensive to do this for (& why it would be worth deferring that work, and what/when that work will be motivated/done) - Dave> > Thanks, > Charles > > On Fri, Jun 9, 2017 at 3:58 PM, Charles Saternos < > charles.saternos at gmail.com> wrote: > >> OK, I tested the hotness, and it works. >> >> > I'd ask Charles to take it over. I think it just needs a test case and >> an update to the usage message. >> >> Sure - I've added the message and a quick test. The patch is here: >> https://reviews.llvm.org/D34063 >> >> Thanks, >> Charles >> >> On Thu, Jun 8, 2017 at 8:01 PM, Peter Collingbourne <peter at pcc.me.uk> >> wrote: >> >>> I'd ask Charles to take it over. I think it just needs a test case and >>> an update to the usage message. >>> >>> Peter >>> >>> On Thu, Jun 8, 2017 at 4:55 PM, Teresa Johnson via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> Great! For the hotness, try creating a small test case with a very hot >>>> loop that iterates many times. Let me know if you are still having trouble. >>>> While the llvm-dis serialization is being discussed, I suppose at the very >>>> least this can go in with the rest of the existing YAML summary dumping and >>>> get emitted from llvm-lto2 using the patch Peter attached. Peter - do you >>>> want to add that to llvm-lto2 so that we have a way of dumping the existing >>>> YAML supported summary info to stdout, or would you rather have Charles >>>> take that one over and submit it (probably just needs a test case). >>>> >>>> Teresa >>>> >>>> On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos < >>>> charles.saternos at gmail.com> wrote: >>>> >>>>> Hey Teresa, >>>>> >>>>> I've updated the YAML to include the names and GUIDs for all >>>>> functions/global vars/aliases. I've also added the hotness info to the >>>>> output, but for some reason, none of my tests when running with FDO gave >>>>> anything besides Unknown. I'll be looking more into this tomorrow. >>>>> >>>>> Here's the current format: >>>>> >>>>> > ../build/bin/llvm-lto2 dump-summary b.o >>>>> --- >>>>> NamedGlobalValueMap: >>>>> : >>>>> - GUID: 3762489268811518743 >>>>> Kind: GlobalVar >>>>> Linkage: PrivateLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> cold: >>>>> - GUID: 11668175513417606517 >>>>> Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> InstCount: 5 >>>>> Calls: >>>>> - Name: puts >>>>> GUID: 8979701042202144121 >>>>> Hotness: Unknown >>>>> fib: >>>>> - GUID: 8667248078361406812 >>>>> Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> InstCount: 26 >>>>> Calls: >>>>> - Name: fib >>>>> GUID: 8667248078361406812 >>>>> Hotness: Unknown >>>>> hot: >>>>> - GUID: 10177652421713147431 >>>>> Kind: Function >>>>> Linkage: ExternalLinkage >>>>> NotEligibleToImport: true >>>>> Live: false >>>>> InstCount: 14 >>>>> Calls: >>>>> - Name: fib >>>>> GUID: 8667248078361406812 >>>>> Hotness: Unknown >>>>> - Name: printf >>>>> GUID: 7383291119112528047 >>>>> Hotness: Unknown >>>>> llvm.used: >>>>> - GUID: 15665353970260777610 >>>>> Kind: GlobalVar >>>>> Linkage: AppendingLinkage >>>>> NotEligibleToImport: true >>>>> Live: true >>>>> TypeIdMap: >>>>> WithGlobalValueDeadStripping: false >>>>> ... >>>>> >>>>> Thanks, >>>>> Charles >>>>> >>>>> >>>>> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com> >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos < >>>>>> charles.saternos at gmail.com> wrote: >>>>>> >>>>>>> Alright, now it outputs YAML in the following format: >>>>>>> >>>>>>> --- >>>>>>> NamedGlobalValueMap: >>>>>>> X: >>>>>>> - Kind: GlobalVar >>>>>>> Linkage: ExternalLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> a: >>>>>>> - Kind: Alias >>>>>>> Linkage: WeakAnyLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> AliaseeGUID: 1881667236089500162 >>>>>>> afun: >>>>>>> - Kind: Function >>>>>>> Linkage: ExternalLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> InstCount: 2 >>>>>>> testtest: >>>>>>> - Kind: Function >>>>>>> Linkage: ExternalLinkage >>>>>>> NotEligibleToImport: false >>>>>>> Live: false >>>>>>> InstCount: 2 >>>>>>> Calls: >>>>>>> - Function: 14471680721094503013 >>>>>>> TypeIdMap: >>>>>>> WithGlobalValueDeadStripping: false >>>>>>> ... >>>>>>> >>>>>>> Any thoughts on the new format? >>>>>>> >>>>>> >>>>>> Thanks, Charles. The main improvement I think we would want is to >>>>>> output value names instead of the GUID. Can you build up a map from GUID -> >>>>>> name ahead of time and use those like you were for your initial patch? >>>>>> Actually, I also think it would be useful to emit both the GUID and the >>>>>> name, since the combined index will eventually only have the GUID, so this >>>>>> would give a mapping to use for at least the visual inspection of the >>>>>> combined index. >>>>>> >>>>>> Also, would be good to see an example with FDO, to make sure the >>>>>> hotness info of the calls is emitted. >>>>>> >>>>>> Teresa >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Charles >>>>>>> >>>>>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev < >>>>>>>>>> llvm-dev at lists.llvm.org>: >>>>>>>>>> >>>>>>>>>>> I know there's been a bunch of discussion here already, but I >>>>>>>>>>> was wondering if perhaps someone (probably Teresa? Peter?) could: >>>>>>>>>>> >>>>>>>>>>> 1) summarize the current state >>>>>>>>>>> 2) describe the end-goal >>>>>>>>>>> 3) describe what steps (& how this patch relates) are planned to >>>>>>>>>>> get to (2) >>>>>>>>>>> >>>>>>>>>>> My naive thoughts, not being intimately familiar with any of >>>>>>>>>>> this: Usually bitcode and textual IR support go in together or around the >>>>>>>>>>> same time, and designed that way from the start (take r211920 for examaple, >>>>>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to >>>>>>>>>>> have been an oversight in the implementation of IR summaries (is that an >>>>>>>>>>> accurate representation/statement?) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> More or less: it was not an oversight. >>>>>>>>>> The summaries are not really part of the IR, it is more like an >>>>>>>>>> "analysis result" that is serialized. It can always be recomputed from the >>>>>>>>>> IR. This aspect makes it quite "special", it is the only analysis result >>>>>>>>>> that I know of that we serialize. >>>>>>>>>> >>>>>>>>> >>>>>>>>> The use list work seems pretty similar in some ways (granted, >>>>>>>>> can't be recomputed to match, hence the desire to serialize it for test >>>>>>>>> case implementation). >>>>>>>>> >>>>>>>> >>>>>>>> I see use-list as a leaky implementation detail of the IR that we >>>>>>>> serialized because it impact the processing of the IR. >>>>>>>> >>>>>>>> Summaries are more like serializing the CFG for example. >>>>>>>> >>>>>>>> >>>>>>>>> But it looks like the same is true here to a degree - there are >>>>>>>>> test cases that exercise the summary handling, so they want summaries for >>>>>>>>> input (for now, I think, I've seen test cases that run another LLVM tool to >>>>>>>>> insert/create a summary to then feed that back in for a test), or to test >>>>>>>>> that the resulting summary is correct. >>>>>>>>> >>>>>>>> >>>>>>>> We have cases were we want summaries as an input and check a >>>>>>>> combined summary as an output, and for these having the YAML representation >>>>>>>> will be useful (we didn't have it before). >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Can summaries be standalone? I thought they could (that'd be ideal >>>>>>>>> for the distributed situation - only the summary needs to go to the 'thin >>>>>>>>> link' step, I think? (currently maybe only the debug info is stripped for >>>>>>>>> that - but ideally other unused IR wouldn't be shipped there as well, I >>>>>>>>> would think) >>>>>>>>> >>>>>>>> >>>>>>>> Yes conceptually they can be standalone. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> & now there's an effort to correct that. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The main motivation here, I believe, is more to help dev to have >>>>>>>>>> human readable/understandable dump for ThinLTO bitcodes. Having to inspect >>>>>>>>>> separately summaries is a pain. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Not sure I quite follow - inspect separately? >>>>>>>>> >>>>>>>> >>>>>>>> llvm-dis does not display summaries today, so you can't just use >>>>>>>> llvm-dis like a "regular" flow. >>>>>>>> >>>>>>>> >>>>>>>>> How are they inspected today? >>>>>>>>> >>>>>>>> >>>>>>>> llvm-bcanalyzer? And now the YAML dump as well. >>>>>>>> >>>>>>>> >>>>>>>>> & also, I think there are test cases that want to/are currently >>>>>>>>> testing summary input but do so somewhat awkwardly by using another tool to >>>>>>>>> produce the summary first. Ideally the test case would have the summary >>>>>>>>> written in to start, I would think, if that's a codepath worth testing? >>>>>>>>> >>>>>>>> >>>>>>>> The IR already contains all the information, so why repeating it? >>>>>>>> This makes the test case harder to maintain, in the vast majority, I expect >>>>>>>> that if a test needs IR then it shouldn't need to include a summary as well >>>>>>>> (and vice-versa). >>>>>>>> >>>>>>>> In the majority of test we have we want to check if the importing >>>>>>>> does what it is supposed to do, and if the linkage are correctly adjusted. >>>>>>>> With a YAML (or other) serialization for the summaries this could indeed >>>>>>>> been done purely with summaries, without any IR involved. >>>>>>>> >>>>>>>> -- >>>>>>>> Mehdi >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> - Dave >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Mehdi >>>>>>>>>> >>>>>>>>>> So it seems like that would start with a discussion of what the >>>>>>>>>>> right end-state would be: What the syntax in textual IR should be, then >>>>>>>>>>> implementing it. I can understand implementing such a thing in steps - it's >>>>>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on >>>>>>>>>>> either side seems fine - implementing the emission first (hidden behind a >>>>>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing >>>>>>>>>>> first (no need to hide it behind any flags - manually written examples can >>>>>>>>>>> be used as input tests). >>>>>>>>>>> >>>>>>>>>>> (& it sounds like there's some partially implemented >>>>>>>>>>> functionality using a YAML format that was intended to address how some >>>>>>>>>>> test cases could be written? & this might be a good basis for the syntax - >>>>>>>>>>> but seems to me like it might be a bit disjointed/out of place in the >>>>>>>>>>> textual IR format that's not otherwise YAML-based?) >>>>>>>>>>> >>>>>>>>>>> - Dave >>>>>>>>>>> >>>>>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev < >>>>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hey all, >>>>>>>>>>>> >>>>>>>>>>>> Below is the proposed format for the dump of the ThinLTO module >>>>>>>>>>>> summary in the llvm-dis utility: >>>>>>>>>>>> >>>>>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll >>>>>>>>>>>> ; ModuleID = '2.o' >>>>>>>>>>>> source_filename = "2.ll" >>>>>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" >>>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu" >>>>>>>>>>>> >>>>>>>>>>>> @X = constant i32 42, section "foo", align 4 >>>>>>>>>>>> >>>>>>>>>>>> @a = weak alias i32, i32* @X >>>>>>>>>>>> >>>>>>>>>>>> define void @afun() { >>>>>>>>>>>> %1 = load i32, i32* @a >>>>>>>>>>>> ret void >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> define void @testtest() { >>>>>>>>>>>> tail call void @boop() >>>>>>>>>>>> ret void >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> declare void @boop() >>>>>>>>>>>> >>>>>>>>>>>> ; Module summary: >>>>>>>>>>>> ; testtest (External linkage) >>>>>>>>>>>> ; Function (2 instructions) >>>>>>>>>>>> ; Calls: boop >>>>>>>>>>>> ; X (External linkage) >>>>>>>>>>>> ; Global Variable >>>>>>>>>>>> ; afun (External linkage) >>>>>>>>>>>> ; Function (2 instructions) >>>>>>>>>>>> ; Refs: >>>>>>>>>>>> ; a >>>>>>>>>>>> ; a (Weak any linkage) >>>>>>>>>>>> ; Alias (aliasee X) >>>>>>>>>>>> >>>>>>>>>>>> I've implemented the above format in the llvm-dis utility, >>>>>>>>>>>> since there currently isn't really a way of getting ThinLTO summaries in a >>>>>>>>>>>> human-readable format. >>>>>>>>>>>> >>>>>>>>>>>> Let me know what you think of this format, and what information >>>>>>>>>>>> you think should be added/removed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Charles >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>>>>> 408-460-2413 <(408)%20460-2413> >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Teresa Johnson | Software Engineer | tejohnson at google.com | >>>> 408-460-2413 <(408)%20460-2413> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >>> >>> >>> -- >>> -- >>> Peter >>> >> >> > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170717/1b3bb3f5/attachment.html>