Snehasish Kumar via llvm-dev
2016-Mar-16 21:03 UTC
[llvm-dev] GSoC Proposal : Path Profiling Support
Hi David,

> Are the data below all collected when only one function is picked for
> instrumentation?

Yes, here is a list of the benchmarks and selected functions.

+---------------+----------------------------------------------------------------------------------------------+
| blks          | _Z19BlkSchlsEqEuroNoDivfffffif                                                               |
| bodytrack     | _ZN17ImageMeasurements11InsideErrorERK17ProjectedCylinderRK11BinaryImageRiS6_                |
| bzip2         | BZ2_compressBlock                                                                            |
| ferret        | image_segment                                                                                |
| fluidanimate  | _Z13ComputeForcesv                                                                           |
| freqmine      | _Z32FPArray_conditional_pattern_baseIhEiP7FP_treeiiT_                                        |
| gcc           | bitmap_operation                                                                             |
| hmmer         | P7Viterbi                                                                                    |
| lbm           | LBM_performStreamCollide                                                                     |
| mcf           | price_out_impl                                                                               |
| mcf2000       | price_out_impl                                                                               |
| namd          | _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded                            |
| povray        | _ZN3povL24All_Sphere_IntersectionsEPNS_13Object_StructEPNS_10Ray_StructEPNS_13istack_structE |
| sjeng         | gen                                                                                          |
| soplex        | _ZN6soplex9CLUFactor16vSolveUrightNoNZEPdS1_Piid                                             |
| sphinx        | vector_gautbl_eval_logs3                                                                     |
| streamcluster | _Z5pgainlP6PointsdPliP17pthread_barrier_t                                                    |
| swaptions     | _Z21HJM_Swaption_BlockingPddddddiidS_PS_llii                                                 |
| h264ref       | dct_luma_16x16                                                                               |
+---------------+----------------------------------------------------------------------------------------------+

> Do you have data when such manual selection is not done?

At the moment, I do not.

> thanks,
>
> David
>
>> numpaths      = Number of possible paths
>> epp+compile   = Time taken to compute encoding, insert instrumentation and
>>                 compile to executable
>> compile       = Time taken to compile to executable
>> execpaths     = Number of paths dynamically executed
>> epp-exec-time = Execution time with instrumentation
>> exec-time     = Normal execution time
>> epp-bin-size  = Size of instrumented binary in bytes
>> bin-size      = Size of binary
>> ** size of shared library in bytes = 598042
>>
>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+
>> | benchmark     | numpaths | epp+compile | compile   | execpaths | epp-exec-time | exec-time | epp-bin-size | bin-size |
>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+
>> | blks          | 2        | 0m1.036s    | 0m1.008s  | 2         | 0m3.643s      | 0m3.205s  | 155931       | 155459   |
>> | bodytrack     | 29       | 0m4.907s    | 0m4.881s  | 5         | 0m14.786s     | 0m1.943s  | 2125256      | 2124224  |
>> | bzip2         | 60       | 0m1.274s    | 0m1.268s  | 3         | 0m9.441s      | 0m9.624s  | 259125       | 258477   |
>> | ferret        | 360921   | 0m26.208s   | 0m26.102s | 40        | 0m10.342s     | 0m6.224s  | 8342571      | 8338588  |
>> | fluidanimate  | 384117   | 0m0.895s    | 0m0.869s  | 88        | 0m56.631s     | 0m1.294s  | 202702       | 197878   |
>> | freqmine      | 45       | 0m1.220s    | 0m1.214s  | 18        | 0m22.150s     | 0m5.515s  | 278615       | 277656   |
>> | gcc           | 6026     | 0m31.941s   | 0m31.327s | 125       | 1m30.139s     | 0m36.601s | 6991413      | 6991245  |
>> | hmmer         | 1882     | 0m3.193s    | 0m3.232s  | 65        | 0m58.911s     | 0m2.474s  | 744510       | 742806   |
>> | mcf           | 230      | 0m0.838s    | 0m0.830s  | 10        | 0m11.097s     | 0m3.074s  | 162680       | 161736   |
>> | mcf2000       | 1155     | 0m0.859s    | 0m0.853s  | 26        | 0m24.169s     | 0m4.625s  | 166092       | 165213   |
>> | povray        | 17       | 0m8.543s    | 0m8.552s  | 4         | 9m24.562s     | 5m39.295s | 2388152      | 2387960  |
>> | sjeng         | 158740   | 0m1.648s    | 0m1.637s  | 280       | 0m20.786s     | 0m5.229s  | 368841       | 368009   |
>> | soplex        | 30       | 0m4.849s    | 0m4.848s  | 24        | 7m28.151s     | 4m10.813s | 1244775      | 1242063  |
>> | sphinx        | 26       | 0m2.212s    | 0m2.198s  | 5         | 1m36.291s     | 0m13.811s | 543534       | 543358   |
>> | streamcluster | 21121728 | 0m0.947s    | 0m0.908s  | 33        | 0m50.212s     | 0m5.986s  | 191981       | 185438   |
>> | swaptions     | 20655    | 0m0.965s    | 0m0.950s  | 13        | 0m0.263s      | 0m0.178s  | 193841       | 184274   |
>> | h264ref       | 24130    | 0m4.278s    | 0m4.272s  | 76        | 3m26.701s     | 3m4.461s  | 816660       | 812396   |
>> | lbm           | 8        | 0m0.824s    | 0m0.815s  | 5         | 6m29.685s     | 1m39.180s | 150871       | 150327   |
>> | namd          | 59598954 | 0m4.124s    | 0m4.139s  | 43        | 18m36.447s    | 6m50.288s | 925863       | 925271   |
>> +---------------+----------+-------------+-----------+-----------+---------------+-----------+--------------+----------+
>>
>> > > Open Issues :
>> > > + Update PathProfileInfo on CFG transformations ?
>>
>> > Could you clarify what this means?
>>
>> Changing the control flow graph of a routine may invalidate collected path
>> profiles. For example, splitting a block with an unconditional branch does
>> not change the profile, but introducing a conditional branch invalidates the
>> profile. The issue I would like to address is which transformations should
>> we allow as safe transformations and how should we update the internal path
>> profile data structures if we allow this at all.
>>
>> > > + Verify with PGOEdge info ?
>>
>> > Ditto.
>>
>> Verification with PGOEdge info implies that the edge frequencies derived
>> from path profiles and via instrprof should be equal.
>>
>> > > + Handle setjmp, longjmp, early program termination, noreturn calls
>>
>> > How do you handle indirect calls?
>>
>> No special handling of indirect calls as path profiles are
>> intra-procedural and control returns to same basic block
>> after call in the general case. For the above mentioned cases, control may
>> not return.
>>
>> Regards,
>> Snehasish
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
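To make the "numpaths" column and the PGOEdge verification idea concrete, here is a minimal Python sketch. It assumes the encoding follows the classic Ball-Larus numbering on an acyclic CFG, which the thread does not confirm; the function names, the toy CFG and the path counts are all hypothetical. Loops, which Ball-Larus handles by breaking back edges, are not modelled.

```python
from collections import defaultdict


def ball_larus_numbering(succs, entry):
    """Assign per-edge increments on an acyclic CFG (assumed Ball-Larus scheme).

    succs maps a block to an ordered list of distinct successors; a block
    with no successors is an exit. Returns (num_paths, increments) where
    num_paths[b] counts paths from b to an exit and increments[(u, v)] is
    the value added to the path register when edge u->v is taken.
    """
    num_paths = {}
    increments = {}

    def visit(block):
        if block in num_paths:
            return num_paths[block]
        out = succs.get(block, [])
        if not out:
            num_paths[block] = 1
            return 1
        total = 0
        for s in out:
            increments[(block, s)] = total  # IDs through s start at `total`
            total += visit(s)
        num_paths[block] = total
        return total

    visit(entry)
    return num_paths, increments


def decode_path(path_id, succs, increments, entry):
    """Recover the block sequence that produced path_id."""
    path, block, remaining = [entry], entry, path_id
    while succs.get(block):
        # The taken edge is the one with the largest increment <= remaining.
        candidates = [s for s in succs[block]
                      if increments[(block, s)] <= remaining]
        nxt = max(candidates, key=lambda s: increments[(block, s)])
        remaining -= increments[(block, nxt)]
        block = nxt
        path.append(block)
    return path


def edge_counts(path_counts, succs, increments, entry):
    """Derive edge frequencies by summing counts of paths crossing each edge."""
    counts = defaultdict(int)
    for path_id, n in path_counts.items():
        blocks = decode_path(path_id, succs, increments, entry)
        for u, v in zip(blocks, blocks[1:]):
            counts[(u, v)] += n
    return dict(counts)


# Hypothetical CFG: two diamonds in sequence, so four possible paths.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
       "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}
num_paths, increments = ball_larus_numbering(cfg, "A")

# Hypothetical profile: path 0 ran 10 times, path 3 ran 5 times.
edges = edge_counts({0: 10, 3: 5}, cfg, increments, "A")
print(num_paths["A"], decode_path(3, cfg, increments, "A"), edges)
```

The derived per-edge totals are what one would compare against instrprof edge counters; any mismatch would indicate a bug in one of the two profilers or an unhandled early exit.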
Snehasish Kumar via llvm-dev
2016-Mar-21 22:07 UTC
[llvm-dev] GSoC Proposal : Path Profiling Support
Hi,

I am pinging to find out whether there is any interest in mentoring this
proposal for GSoC this year. I've submitted a draft via the GSoC website.

David, Vedant, it would be great if I could get some advice on refining
the goals and particulars of the implementation.
The version we use internally is not performance oriented and will
require refactoring.
Here is a link to the draft document [1].

Thanks,
Snehasish

[1] https://docs.google.com/document/d/18i9FvD7FSqX6tNEXb83gzc0EC_STeS3bWOVf167sFWk/edit?usp=sharing
Vedant Kumar via llvm-dev
2016-Mar-22 07:04 UTC
[llvm-dev] GSoC Proposal : Path Profiling Support
I've added some inline comments to your proposal doc. It would be nice to
get feedback from potential users of path profile information, like people
who work on GVN.

Some general notes:

- It would be nice to measure the performance variation between programs
  optimized with edge-based profiles vs. with path profiles.

- I'd be curious to see comparisons with the instrprof framework (compile
  time, exec time, profile size, binary size). This would establish a
  concrete target to beat.

> David, Vedant it would be great if I could get some advice on refining
> the goals and particulars of the implementation.
> The version we use internally is not performance oriented and will
> require refactoring.
> Here is a link to the draft document [1].
>
> [1] https://docs.google.com/document/d/18i9FvD7FSqX6tNEXb83gzc0EC_STeS3bWOVf167sFWk/edit?usp=sharing
>
>>>>>> Open Issues :
>>>>>> + Update PathProfileInfo on CFG transformations ?
>>>>
>>>>> Could you clarify what this means?
>>>>
>>>> Changing the control flow graph of a routine may invalidate collected path
>>>> profiles. For example, splitting a block with an unconditional branch does
>>>> not change the profile, but introducing a conditional branch invalidates the
>>>> profile. The issue I would like to address is which transformations should
>>>> we allow as safe transformations and how should we update the internal path
>>>> profile data structures if we allow this at all.

It seems that existing optimization passes can make changes that could
invalidate a path profile. Is it expensive to detect these changes and
invalidate the affected portions of the profile?

vedant
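One cheap signal for the invalidation question can be sketched in Python: compare the number of acyclic paths before and after a transformation. This is only an illustration, not something proposed in the thread; the helper names and the toy CFGs are hypothetical. A changed path count means the path ID space has changed, while an unchanged count is necessary but not sufficient for the profile to remain valid (for example, reordering successors keeps the count but may renumber paths).

```python
def count_paths(succs, entry):
    """Count entry-to-exit paths in an acyclic CFG given as block -> successors."""
    memo = {}

    def visit(block):
        if block not in memo:
            out = succs.get(block, [])
            memo[block] = 1 if not out else sum(visit(s) for s in out)
        return memo[block]

    return visit(entry)


def id_space_changed(before, after, entry):
    """True when the transformation definitely changed the set of path IDs."""
    return count_paths(before, entry) != count_paths(after, entry)


# Hypothetical function with a single diamond: two paths.
original = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Block B split into B and B2, joined by an unconditional branch.
split = {"A": ["B", "C"], "B": ["B2"], "B2": ["D"], "C": ["D"], "D": []}

# A conditional branch introduced inside B.
branched = {"A": ["B", "C"], "B": ["B1", "B2"],
            "B1": ["D"], "B2": ["D"], "C": ["D"], "D": []}

print(id_space_changed(original, split, "A"))
print(id_space_changed(original, branched, "A"))
```

The two cases mirror the examples quoted above: the block split leaves the path count untouched, whereas the new conditional branch adds a path that the collected profile has no counter for.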
Xinliang David Li via llvm-dev
2016-Mar-22 23:23 UTC
[llvm-dev] GSoC Proposal : Path Profiling Support
Hi Snehasish,

thanks for writing up the proposal.

As it stands today, path profiling still has a serious scalability issue
that prevents it from being usable by any optimization passes that may
benefit from it. On the other hand, a sampling-based approach can still be
promising. For instance, LBR data, together with a static CFG constructed
from the binary, can potentially be used to form path(let) samples, which
is the area our intern will explore this summer. It will be interesting to
see how the sampling-based approach matches up against the
instrumentation-based method in detecting hot paths.

Independent of the method used in generating path profile data, your
proposed work on the path profile info representation and query APIs can
be shared.

thanks,

David
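The runtime cost behind the scalability concern can be read directly off the benchmark table posted earlier in the thread. The following Python sketch computes the slowdown (epp-exec-time divided by exec-time) for a handful of rows copied from that table; the helper name `parse_time` and the choice of rows are illustrative only.

```python
import re


def parse_time(text):
    """Convert a `time`-style string such as '18m36.447s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (epp-exec-time, exec-time) pairs taken from the table in the thread.
rows = {
    "bzip2": ("0m9.441s", "0m9.624s"),
    "bodytrack": ("0m14.786s", "0m1.943s"),
    "fluidanimate": ("0m56.631s", "0m1.294s"),
    "namd": ("18m36.447s", "6m50.288s"),
}

slowdown = {name: parse_time(epp) / parse_time(base)
            for name, (epp, base) in rows.items()}

for name, factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name:<13} {factor:6.2f}x")
```

Even with a single instrumented function per benchmark, the ratio ranges from roughly parity (bzip2) to more than a 40x slowdown (fluidanimate), which gives a rough sense of the overhead an instrumentation-based scheme would need to reduce.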