Abe Skolnik via llvm-dev
2016-Sep-29 17:59 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
Dear all,

As part of working on making test-suite less demanding of exact FP results so my FP-contraction patch can go back into trunk and stay there, today I analyzed "MultiSource/Benchmarks/VersaBench/beamformer". I found that the raw output from that program is 2789780 bytes [i.e. ~2.7 _megabytes_] of floating-point text, which IMO is too much to put into a patch -- or at least a _civilized_ patch. ;-)

As a result, I wrote the below Python program, which I think should deal with the problem fairly well, or at least is a good first attempt at doing so and can be improved later. For the "MultiSource/Benchmarks/VersaBench/beamformer" test, using the same CLI arguments as in the test-suite script, passing the output of the test through the below script gives me the following results.

Vanilla compiler, i.e. without FP-contraction patch
---------------------------------------------------
286720
9178782.5878

Compiler WITH FP-contraction patch
----------------------------------
286720
9178782.58444

I think this output format should be acceptable for further processing by "fpcmp", since the number of FPs read will always be an integer. At least with absolute tolerances, as long as abs_tol < 1, I think it should be OK. Relative tolerances have me a little nervous; is there a way to tell "fpcmp" that integers that are output as such -- i.e. "13", not "13.0" -- are to be evaluated without any regard to tolerance? Or maybe it already does it that way? If not, then perhaps it's easy to add that feature and have it be controlled by a flag, e.g. "--integers-ignore-tolerances", with that flag off by default for backwards compatibility.

If "fpcmp" currently cannot ignore tolerances for integers and cannot easily be extended to be able to do so, then I propose that the two calculated results go to separate outputs [2 files?] to be tested separately: the e.g. "output_count" file to be tested against its reference _exactly_, and the e.g. "output_sum" file to be tested against its reference with a positive tolerance. I think I'd be able to make the Python code do that without anybody else's help other than e.g. "yes, do that".

My apologies for not uploading the script yet; the LLVM Phabricator seems to have been down over the past few minutes.

Regards,

Abe

test-suite/tools/count_and_sum_floats.py
----------------------------------------
#!/usr/bin/python

import math, sys

try_for_more = True

count = 0
total = 0.0

while try_for_more:
    line = sys.stdin.readline()
    if line:
        # split() handles ASCII horizontal tabs as well as ASCII spaces
        split_line = line.split()
        as_floats = [float(x) for x in split_line]
        for the_float in as_floats:
            # skip infinities and NaNs so they cannot poison the sum
            if not (math.isinf(the_float) or math.isnan(the_float)):
                count = count + 1
                total = total + the_float
    else:
        # readline() returns an empty string at EOF
        try_for_more = False  # cleaner than "break", I suppose

print(count)
print(total)
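If "fpcmp" cannot be taught to treat bare integers exactly, the two-file variant described above is straightforward; a minimal sketch might look like the following, where the output file names "output_count" and "output_sum" are only illustrative and not an existing test-suite convention:

#!/usr/bin/python
# Sketch only: write the count and the sum to two separate files, so that
# "output_count" can be compared against its reference exactly and
# "output_sum" can be compared with a positive tolerance.
# The output file names are placeholders, not an established convention.

import math, sys

count = 0
total = 0.0

for line in sys.stdin:
    for token in line.split():
        value = float(token)
        if not (math.isinf(value) or math.isnan(value)):
            count += 1
            total += value

with open("output_count", "w") as count_file:
    count_file.write("%d\n" % count)

with open("output_sum", "w") as sum_file:
    sum_file.write("%s\n" % repr(total))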
Sebastian Pop via llvm-dev
2016-Sep-29 18:21 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
Cumulating errors is a bad idea. As others have suggested, please prepare a patch that disables fp-contract on those testcases.

Thanks,
Sebastian
Renato Golin via llvm-dev
2016-Sep-29 19:05 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
On 29 September 2016 at 19:21, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> Cumulating errors is a bad idea.
> As others have suggested, please prepare a patch that disables
> fp-contract on those testcases.

No, please, let's not disable things just because they fail.

If the test is not meaningful or if the results are not good, let's just change the test in a meaningful way that can work with any FP optimisation without changing meaning. If it does change meaning, it's a bug and we *want* to catch it.

cheers,
--renato
Renato Golin via llvm-dev
2016-Sep-29 21:25 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
On 29 September 2016 at 18:59, Abe Skolnik <a.skolnik at samsung.com> wrote:
> As part of working on making test-suite less demanding of exact FP results
> so my FP-contraction patch can go back into trunk and stay there, today I
> analyzed "MultiSource/Benchmarks/VersaBench/beamformer". I found that the
> raw output from that program is 2789780 bytes [i.e. ~2.7 _megabytes_] of
> floating-point text, which IMO is too much to put into a patch -- or at
> least a _civilized_ patch. ;-)

Not to mention having to debug the whole thing every time it breaks. :S

How I "fixed" it in the past was to do some heuristics like you did, while still trying to keep the meaning.

I think the idea of having a "number of results" is good, but I also think you can separate the 300k values into logical groups, maybe adding them up.

Of course, the more you do to the results, the larger the rounding errors will be, and the less meaning the results will have.

I don't know much about this specific benchmark, but if it has some kind of internal aggregation values, could you dump those instead?

> As a result, I wrote the below Python program, which I think should deal
> with the problem fairly well, or at least is a good first attempt at doing
> so and can be improved later.

The Python script is a good prototype for what you want, but I'd rather change the source code of the benchmark to print less, more valuable, stuff.

The more you print, the more your run will be tied to stdio and less to what you wanted to benchmark in the first place.

> Vanilla compiler, i.e. without FP-contraction patch
> ---------------------------------------------------
> 286720
> 9178782.5878
>
> Compiler WITH FP-contraction patch
> ----------------------------------
> 286720
> 9178782.58444

This looks like a small enough change to me, given the amount of precision you're losing. But it'd be better to make sure the result has at least some meaning related to the benchmark.

> If "fpcmp" currently cannot ignore tolerances for integers and cannot easily
> be extended to be able to do so, then I propose that the two calculated
> results go to separate outputs [2 files?] to be tested separately.

That'd be fine, too.

cheers,
--renato
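One way to read the "logical groups" suggestion is to emit one partial sum per fixed-size chunk of values instead of a single global sum. A minimal sketch follows; the chunk size is arbitrary, and a grouping that follows the benchmark's own structure would carry more meaning:

#!/usr/bin/python
# Sketch of group-wise aggregation: print the total number of values, then
# one partial sum per chunk.  CHUNK_SIZE is arbitrary; ideally the groups
# would match the benchmark's own structure (e.g. one sum per output frame).

import math, sys

CHUNK_SIZE = 4096
values = []

for line in sys.stdin:
    for token in line.split():
        value = float(token)
        if not (math.isinf(value) or math.isnan(value)):
            values.append(value)

print(len(values))
for start in range(0, len(values), CHUNK_SIZE):
    print(sum(values[start:start + CHUNK_SIZE]))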
Sebastian Pop via llvm-dev
2016-Sep-29 21:33 UTC
[llvm-dev] [cfe-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
On Thu, Sep 29, 2016 at 4:25 PM, Renato Golin via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> This looks like a small enough change to me, given the amount of
> precision you're losing. But it'd be better to make sure the result
> has at least some meaning related to the benchmark.

Summing up errors cancels them out: it makes no sense to do this. You may be missing errors that are an order of magnitude off in both directions and still end up with an overall small delta after reduction.

And by the way, what does "small enough change" mean? Based on what metrics? Is there a way to quantify what is an acceptable error margin?

There is a lot of information that is thrown away with the sum reduction.

Sebastian
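To make the cancellation argument concrete: per-element errors of opposite signs can disappear in a global sum, so a small difference between summed outputs does not bound the per-element differences. A rough comparison sketch, where the two file arguments are placeholders for a reference output and a test output, could look like this:

#!/usr/bin/python
# Sketch: compare two whitespace-separated float dumps element by element.
# The difference of the sums can be tiny even when individual values differ
# by much more, because positive and negative errors cancel in the reduction.

import sys

def read_floats(path):
    with open(path) as f:
        return [float(token) for line in f for token in line.split()]

reference = read_floats(sys.argv[1])
candidate = read_floats(sys.argv[2])
assert len(reference) == len(candidate), "outputs have different lengths"

per_element = [abs(a - b) for a, b in zip(reference, candidate)]
print("difference of sums:        %g" % abs(sum(reference) - sum(candidate)))
print("largest per-element error: %g" % max(per_element))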