Hello,

I have a question regarding PGO instrumented BBs (I use IR-level instrumentation).

It seems that instrumented BBs do not match between the two compilations for profile-gen and profile-use in some cases. Here is an example from SPECcpu 2006 lbm (a simple case consisting of just two modules).
In the first compilation, we have 5 instrumentation points for the main function as follows:

$ opt -pgo-instr-gen -instrprof _all_combined.bc -o _all_combined_inst.bc -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021 after CFGMST
Number of Basic Blocks: 10
BB: FakeNode Index=0
BB: if.then Index=5
BB: for.body Index=4
BB: for.body.lr.ph Index=3
BB: entry Index=1
BB: for.inc Index=8
BB: if.then5 Index=7
BB: if.end Index=6
BB: for.end Index=2
BB: for.end.loopexit Index=9
Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
Edge 0: 8-->4 c W=247031
Edge 1: 6-->8 c W=159375
Edge 2: 4-->6 *c W=127500
Edge 3: 1-->2 c W=4500
Edge 4: 4-->5 W=127
Edge 5: 5-->6 * W=127
Edge 6: 6-->7 W=95
Edge 7: 7-->8 * W=95
Edge 8: 0-->1 W=12
Edge 9: 2-->0 * W=12
Edge 10: 3-->4 W=8
Edge 11: 9-->2 W=8
Edge 12: 1-->3 W=7
Edge 13: 8-->9 * W=7
Split critical edge: 4 --> 6
Adding Instrumentation in BB Name=for.body.if.end_crit_edge
Adding Instrumentation in BB Name=if.then
Adding Instrumentation in BB Name=if.then5
Adding Instrumentation in BB Name=for.end
Adding Instrumentation in BB Name=for.end.loopexit

After a training run, we get profile data for the main function as follows, but these count values are put into incorrect BBs in the second compilation.
Block counts: [0, 300, 4, 1, 1]

$ opt -analyze -pgo-instr-use _all_combined.bc -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021 after CFGMST
Number of Basic Blocks: 10
BB: FakeNode Index=0
BB: for.body.lr.ph Index=3
BB: if.end Index=6
BB: entry Index=1
BB: if.then Index=5
BB: for.body Index=4
BB: for.end.loopexit Index=9
BB: for.inc Index=8
BB: if.then5 Index=7
BB: for.end Index=2
Number of Edges: 14 (*: Instrument, C: CriticalEdge, -: Removed)
Edge 0: 8-->4 c W=247031
Edge 1: 6-->8 c W=159375
Edge 2: 4-->6 *c W=127500
Edge 3: 1-->2 c W=127058
Edge 4: 0-->1 W=135
Edge 5: 2-->0 * W=135
Edge 6: 4-->5 W=127
Edge 7: 5-->6 * W=127
Edge 8: 6-->7 W=95
Edge 9: 7-->8 * W=95
Edge 10: 3-->4 W=8
Edge 11: 9-->2 W=8
Edge 12: 1-->3 W=7
Edge 13: 8-->9 * W=7
5 counts
0: 0
1: 300
2: 4
3: 1
4: 1
SUM = 306
Split critical edge: 4 --> 6
Setting BB Name=for.body.if.end_crit_edge with CountValue=0
Setting BB Name=for.end with CountValue=300
Setting BB Name=if.then with CountValue=4
Setting BB Name=if.then5 with CountValue=1
Setting BB Name=for.end.loopexit with CountValue=1

The CountValue 300 should go to the BB=if.then (Index 5), not for.end (Index 2). Because of this incorrect setting, the entry count of the main function is set to 300 instead of 1 (after populating the count values).
The reason for this problem is that CFGMST edges are ordered in a different way due to different weight values (edges 0 --> 1 and 2 --> 0 get W=12 in the first compilation, while they get W=135 in the second compilation). The weight values are computed based on block frequency info and branch probability info, but somehow they produce different values between the two compilations.

How can we assume that CFGMST is constructed in the same way between the two compilations so that we can always set profile results into correct basic blocks?

Thank you,
--Toshio
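The mismatch described in the message above can be reproduced with a small model. The sketch below is not LLVM's CFGMST code. It assumes only what the dumps suggest: edges are visited in descending weight order, an edge that would close a cycle is left out of the spanning tree and gets a counter, and counter slots are numbered in that visit order. The names `instrumented_edges`, `counter_layout`, and `BLOCK_FOR_EDGE` are hypothetical, and the edge-to-block mapping is copied from the "Adding Instrumentation" lines rather than computed.

```python
# Edges of main() as (src, dst, weight), copied from the profile-gen dump.
GEN_EDGES = [
    (8, 4, 247031), (6, 8, 159375), (4, 6, 127500), (1, 2, 4500),
    (4, 5, 127), (5, 6, 127), (6, 7, 95), (7, 8, 95),
    (0, 1, 12), (2, 0, 12), (3, 4, 8), (9, 2, 8), (1, 3, 7), (8, 9, 7),
]

# The profile-use dump differs from the profile-gen dump in these weights only.
USE_OVERRIDES = {(1, 2): 127058, (0, 1): 135, (2, 0): 135}
USE_EDGES = [(s, d, USE_OVERRIDES.get((s, d), w)) for s, d, w in GEN_EDGES]

# Block that holds the counter for each instrumented edge (taken from the dump).
BLOCK_FOR_EDGE = {
    (4, 6): "for.body.if.end_crit_edge",
    (5, 6): "if.then",
    (7, 8): "if.then5",
    (2, 0): "for.end",
    (8, 9): "for.end.loopexit",
}


def instrumented_edges(edges):
    """Return the edges left out of the maximum spanning tree, in visit order."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    left_out = []
    # sorted() is stable, so edges with equal weight keep their input order.
    for src, dst, _weight in sorted(edges, key=lambda e: -e[2]):
        a, b = find(src), find(dst)
        if a == b:
            left_out.append((src, dst))  # closes a cycle: needs a counter
        else:
            parent[a] = b  # joins two components: part of the tree
    return left_out


def counter_layout(edges):
    """Map counter slot i to the block that owns it."""
    return [BLOCK_FOR_EDGE[e] for e in instrumented_edges(edges)]


counts = [0, 300, 4, 1, 1]
gen_layout = counter_layout(GEN_EDGES)
use_layout = counter_layout(USE_EDGES)
for slot, count in enumerate(counts):
    print(f"slot {slot}: count={count} written by {gen_layout[slot]}, "
          f"read as {use_layout[slot]}")
```

Under these assumptions, slot 1 is written by if.then during the training run but read back as for.end in the second compilation, which matches the "Setting BB" lines in the dump.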
On Mon, Mar 21, 2016 at 7:19 PM, Toshio Suganuma via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
> The reason for this problem is that CFGMST edges are ordered in a
> different way due to different weight values (edges 0 --> 1 and 2 --> 0 get
> W=12 in the first compilation, while they get W=135 in the second
> compilation). The weight values are computed based on block frequency info
> and branch probability info, but somehow they produce different values
> between the two compilations.
>

Different BFI produced for otherwise identical compilation is a bug we should fix (can cause other problems too). Can you file a bug about it?

thanks,

David

>
> How can we assume that CFGMST is constructed in the same way between the
> two compilations so that we can always set profile results into correct
> basic blocks?
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
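When preparing a report like the one requested here, it can help to diff the edge weights printed by the two runs. The following is a minimal sketch, assuming the `Edge N: a-->b ... W=n` line format shown in the archived dumps (the real debug output may be spaced differently); `parse_weights` and `changed_weights` are hypothetical helper names, not part of any LLVM tool.

```python
import re

# Edge lines copied from the profile-gen and profile-use dumps in this thread.
GEN_DUMP = """\
Edge 0: 8-->4 c W=247031
Edge 1: 6-->8 c W=159375
Edge 2: 4-->6 *c W=127500
Edge 3: 1-->2 c W=4500
Edge 4: 4-->5 W=127
Edge 5: 5-->6 * W=127
Edge 6: 6-->7 W=95
Edge 7: 7-->8 * W=95
Edge 8: 0-->1 W=12
Edge 9: 2-->0 * W=12
Edge 10: 3-->4 W=8
Edge 11: 9-->2 W=8
Edge 12: 1-->3 W=7
Edge 13: 8-->9 * W=7
"""

USE_DUMP = """\
Edge 0: 8-->4 c W=247031
Edge 1: 6-->8 c W=159375
Edge 2: 4-->6 *c W=127500
Edge 3: 1-->2 c W=127058
Edge 4: 0-->1 W=135
Edge 5: 2-->0 * W=135
Edge 6: 4-->5 W=127
Edge 7: 5-->6 * W=127
Edge 8: 6-->7 W=95
Edge 9: 7-->8 * W=95
Edge 10: 3-->4 W=8
Edge 11: 9-->2 W=8
Edge 12: 1-->3 W=7
Edge 13: 8-->9 * W=7
"""

# Matches "Edge <n>: <src>--><dst> [flags] W=<weight>"; flags are optional.
EDGE_RE = re.compile(r"Edge\s+\d+:\s+(\d+)-->(\d+)\s+[*cC-]*\s*W=(\d+)")


def parse_weights(dump):
    """Return {(src, dst): weight} for every edge line found in the dump."""
    return {(int(s), int(d)): int(w) for s, d, w in EDGE_RE.findall(dump)}


def changed_weights(gen_dump, use_dump):
    """Return {(src, dst): (gen_weight, use_weight)} for edges that differ."""
    gen, use = parse_weights(gen_dump), parse_weights(use_dump)
    return {e: (gen[e], use[e]) for e in gen if e in use and gen[e] != use[e]}


diff = changed_weights(GEN_DUMP, USE_DUMP)
for (src, dst), (w_gen, w_use) in sorted(diff.items()):
    print(f"{src}-->{dst}: gen W={w_gen}, use W={w_use}")
```

On the two dumps in this thread it reports three edges: 0-->1 and 2-->0 (12 vs 135), and also 1-->2 (4500 vs 127058).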
Hi David,

Thank you.
I just submitted a bug report 27024 (PGO instrumentation profile data is not reflected in correct basic blocks).

Thank you,
--Toshio

From: Xinliang David Li <xinliangli at gmail.com>
To: Toshio Suganuma/Japan/IBM at IBMJP
Cc: llvm-dev <llvm-dev at lists.llvm.org>, Rong Xu <xur at google.com>
Date: 2016/03/22 12:04
Subject: Re: [llvm-dev] Instrumented BB in PGO

> Different BFI produced for otherwise identical compilation is a bug we
> should fix (can cause other problems too). Can you file a bug about it?
Thank you. I have assigned the bug to xur at .

David

On Mon, Mar 21, 2016 at 10:24 PM, Toshio Suganuma <SUGANUMA at jp.ibm.com> wrote:

> Hi David,
>
> Thank you.
> I just submitted a bug report 27024 (PGO instrumentation profile data is
> not reflected in correct basic blocks).
>
> Thank you,
> --Toshio