Abe Skolnik via llvm-dev
2016-Sep-29 17:59 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
Dear all,

As part of working on making test-suite less demanding of exact FP results so my FP-contraction patch can go back into trunk and stay there, today I analyzed "MultiSource/Benchmarks/VersaBench/beamformer". I found that the raw output from that program is 2789780 bytes [i.e. ~2.7 _megabytes_] of floating-point text, which IMO is too much to put into a patch -- or at least a _civilized_ patch. ;-)

As a result, I wrote the below Python program, which I think should deal with the problem fairly well, or at least is a good first attempt at doing so and can be improved later. For the "MultiSource/Benchmarks/VersaBench/beamformer" test, using the same CLI arguments as in the test-suite script, passing the output of the test through the below script gives me the following results.

Vanilla compiler, i.e. without FP-contraction patch
---------------------------------------------------
286720
9178782.5878

Compiler WITH FP-contraction patch
----------------------------------
286720
9178782.58444

I think this output format should be acceptable for further processing by "fpcmp", since the number of FPs read will always be an integer. At least with absolute tolerances, as long as abs_tol < 1 I think it should be OK. Relative tolerances have me a little nervous; is there a way to tell "fpcmp" that integers that are output as such -- i.e. "13", not "13.0" -- are to be evaluated without any regard to tolerance? Or maybe it already does it that way? If not, then perhaps it's easy to add that feature and have it be controlled by a flag, e.g. "--integers-ignore-tolerances", and have that flag be off by default for backwards compatibility.

If "fpcmp" currently cannot ignore tolerances for integers and cannot easily be extended to be able to do so, then I propose that the two calculated results go to separate outputs [2 files?] to be tested separately: the e.g. "output_count" file to be tested against its reference _exactly_, and the e.g. "output_sum" file to be tested against its reference with a positive tolerance. I think I'd be able to make the Python code do that without anybody else's help other than e.g. "yes, do that".

My apologies for not uploading the script yet; the LLVM Phabricator seems to be down over the past few minutes.

Regards,

Abe

test-suite/tools/count_and_sum_floats.py
----------------------------------------
#!/usr/bin/python

import math, sys

try_for_more = True

count = 0
total = 0.0

while try_for_more:
    line = sys.stdin.readline()
    if line:
        split_line = line.split()  # handles ASCII horizontal tabs as well as ASCII spaces
        as_floats = [float(x) for x in split_line]
        for the_float in as_floats:
            if not (math.isinf(the_float) or math.isnan(the_float)):
                count = count + 1
                total = total + the_float
    else:
        try_for_more = False  # cleaner than "break", I suppose

print(count)
print(total)
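[Editor's note: the two-file split Abe proposes could look like the following sketch. The helper function and the output-file names are illustrative only; they are not part of the posted script or of the test-suite.]

```python
import math


def count_and_sum(lines):
    """Count and sum every finite float found in an iterable of text lines."""
    count = 0
    total = 0.0
    for line in lines:
        for token in line.split():  # split() handles tabs as well as spaces
            value = float(token)
            if not (math.isinf(value) or math.isnan(value)):
                count += 1
                total += value
    return count, total


if __name__ == "__main__":
    import sys

    count, total = count_and_sum(sys.stdin)
    # "output_count" would be diffed against its reference exactly;
    # "output_sum" would be compared by fpcmp with a positive tolerance.
    with open("output_count", "w") as f:
        f.write("%d\n" % count)
    with open("output_sum", "w") as f:
        f.write("%s\n" % repr(total))
```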
Sebastian Pop via llvm-dev
2016-Sep-29 18:21 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
Cumulating errors is a bad idea. As others have suggested, please prepare a patch that disables fp-contract on those testcases.

Thanks,
Sebastian

On Thu, Sep 29, 2016 at 12:59 PM, Abe Skolnik via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [Abe's original message, quoted in full above, elided here.]
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Renato Golin via llvm-dev
2016-Sep-29 19:05 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
On 29 September 2016 at 19:21, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> Cumulating errors is a bad idea.
> As others have suggested, please prepare a patch that disables
> fp-contract on those testcases.

No, please, let's not disable things just because they fail.

If the test is not meaningful or if the results are not good, let's just change the test in a meaningful way that can work with any FP optimisation without changing meaning. If it does change meaning, it's a bug and we *want* to catch it.

cheers,
--renato
Renato Golin via llvm-dev
2016-Sep-29 21:25 UTC
[llvm-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
On 29 September 2016 at 18:59, Abe Skolnik <a.skolnik at samsung.com> wrote:
> As part of working on making test-suite less demanding of exact FP results
> so my FP-contraction patch can go back into trunk and stay there, today I
> analyzed "MultiSource/Benchmarks/VersaBench/beamformer". I found that the
> raw output from that program is 2789780 bytes [i.e. ~2.7 _megabytes_] of
> floating-point text, which IMO is too much to put into a patch -- or at
> least a _civilized_ patch. ;-)

Not to mention having to debug the whole thing every time it breaks. :S

How I "fixed" it in the past was to do some heuristics like you did, while still trying to keep the meaning.

I think the idea of having a "number of results" is good, but I also think you can separate the 300k values into logical groups, maybe adding them up.

Of course, the more you do to the results, the higher the rounding errors will be, and the less meaning the results will have.

I don't know much about this specific benchmark, but if it has some kind of internal aggregation values, can you dump those instead?

> As a result, I wrote the below Python program, which I think should deal
> with the problem fairly well, or at least is a good first attempt at doing
> so and can be improved later.

The Python script is a good prototype for what you want, but I'd rather change the source code of the benchmark to print less, more valuable, stuff.

The more you print, the more your run will be tied to stdio and less to what you wanted to benchmark in the first place.

> Vanilla compiler, i.e. without FP-contraction patch
> ---------------------------------------------------
> 286720
> 9178782.5878
>
> Compiler WITH FP-contraction patch
> ----------------------------------
> 286720
> 9178782.58444

This looks like a small enough change to me, given the amount of precision you're losing. But it'd be better to make sure the result has at least some meaning related to the benchmark.

> If "fpcmp" currently cannot ignore tolerances for integers and cannot easily
> be extended to be able to do so, then I propose that the two calculated
> results go to separate outputs [2 files?] to be tested separately.

That'd be fine, too.

cheers,
--renato
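[Editor's note: Renato's grouping suggestion — reducing the ~300k values to a handful of partial sums instead of one grand total — could be sketched like this. The group count is arbitrary, chosen here only for illustration.]

```python
def grouped_sums(values, groups=16):
    """Reduce a long list of floats to `groups` partial sums.

    A localized numeric difference then perturbs one partial sum
    instead of being diluted in (or cancelled within) a single
    whole-run total."""
    size = (len(values) + groups - 1) // groups  # ceiling division
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]
```

Each partial sum would then be compared by fpcmp with the usual tolerance, which gives the comparison some locality at essentially no cost in output size.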
Sebastian Pop via llvm-dev
2016-Sep-29 21:33 UTC
[llvm-dev] [cfe-dev] a proposed script to help with test-suite programs that output _lots_ of FP numbers
On Thu, Sep 29, 2016 at 4:25 PM, Renato Golin via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> [earlier discussion, quoted in full above, elided here]
>
> This looks like a small enough change to me, given the amount of
> precision you're losing. But it'd be better to make sure the result
> has at least some meaning related to the benchmark.

Summing up errors cancels them out: it makes no sense to do this. You may be missing errors that are an order of magnitude off in both directions and still end up with an overall small delta after reduction.

And by the way, what does "small enough change" mean? Based on what metrics? Is there a way to quantify what is an acceptable error margin? There is a lot of information that is thrown away with the sum reduction.

Sebastian
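[Editor's note: Sebastian's cancellation concern can be demonstrated with a toy example — the numbers below are illustrative, not taken from the benchmark. Two outputs can disagree wildly element by element and still produce identical sums.]

```python
reference = [100.0, 200.0, 300.0]
# Hypothetical bad output: two elements are off by 1000 in opposite
# directions, so the errors cancel inside a sum reduction...
broken = [1100.0, -800.0, 300.0]

# ...and a sum-based comparison sees no difference at all,
assert sum(broken) == sum(reference)

# while an element-wise comparison catches a huge error:
max_abs_err = max(abs(a - b) for a, b in zip(broken, reference))
assert max_abs_err == 1000.0
```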