Ok, I got the benchmark to work on test-suite, but it's not printing
details for each run (or execution wouldn't work). I had to comment
out the printf lines, but nothing more than that.

I'm not sure how individual timings would have to be extracted, but
the program produces its output via a text file, which can be used for
comparison. Also, it does check the results and reports whether they
were as expected (I'm not sure yet how that's calculated in detail).
Nevertheless, it should be good to have this test, at least to make
sure we're not breaking floating-point loops with vectorization in the
future.

Attached is a tarball with the contents of LivermoreLoops, to be
included inside test-suite/SingleSource/Benchmarks. Daniel, can I just
add this to the SVN repository, or are there other things that need to
be done as well? It might need some care to fully use the testing
infrastructure, though.

cheers,
--renato

On 5 November 2012 22:12, Nadav Rotem <nrotem at apple.com> wrote:
> That would be great!
>
> On Nov 5, 2012, at 2:11 PM, Renato Golin <rengolin at systemcall.org> wrote:
>
>> On 5 November 2012 17:41, Nadav Rotem <nrotem at apple.com> wrote:
>>> 1. We do not allow reductions on floating-point types. We should allow them when unsafe-math is used.
>>> 2. All of the arrays are located in a struct. At the moment we don't detect that these arrays are disjoint, and this prevents vectorization.
>>
>> Indeed, they look like simple changes. If no one is dying to get them
>> working, I suggest I try these first.
>>
>> I'll first get the tests running in the test-suite, then I'll try to
>> vectorize them.
>>
>> --
>> cheers,
>> --renato
>>
>> http://systemcall.org/

--
cheers,
--renato

http://systemcall.org/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: LivermoreLoops.tar.gz
Type: application/x-gzip
Size: 17555 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121106/59966d20/attachment.bin>
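As a side note on the two blockers Nadav lists in the quoted message, here is a
minimal sketch of the loop shapes involved, assuming the classic Livermore
kernel formulations rather than the exact code in the tarball (the struct
layout and the names are illustrative only):

    /* The benchmark keeps its arrays inside one global struct, so the
     * vectorizer has to prove the members don't alias (point 2). */
    struct loop_data {
        double x[1001];
        double z[1001];
    } d;

    /* Kernel-3-style inner product: a floating-point reduction, which
     * needs unsafe-math/reassociation to be vectorized (point 1). */
    double inner_product(int n) {
        double q = 0.0;
        for (int k = 0; k < n; k++)
            q += d.z[k] * d.x[k];
        return q;
    }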
Hey Renato,

Cool, glad you got it working.

There is very primitive support for tests that generate multiple output
results, but I would rather not use those facilities. Is it possible instead
to refactor the tests so that each binary corresponds to one test? For
example, look at how Hal went about integrating TSVC:

  http://llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/TSVC/

It isn't particularly pretty, but it fits well with the other parts of the
test-suite infrastructure, and it probably works out nicer in practice when
tests fail (i.e., you don't want to be staring at a broken bitcode file with
24 kernels in one function).

Other things that I would *like* before integrating it:
 - Rip out the CPU ID stuff, this isn't useful and adds messiness.
 - Have the test just produce output that can be compared, instead of
   including its own check routines.
 - Have the tests run for fixed iterations, instead of doing their own
   adaptive run.
 - Produce reference output files, so it works with USE_REFERENCE_OUTPUT=1.

The kernels themselves are really trivial, so it would be ideal if the
benchmark were split up into one test per file, with minimal other stuff in
each test beyond setup and output.

 - Daniel
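To make the one-test-per-file shape concrete, here is a rough sketch of what a
single-kernel test could look like, with a fixed iteration count and plain
output instead of self-checking; all sizes, constants, and names are
illustrative and not taken from the tarball:

    #include <stdio.h>

    #define N    1000
    #define ITER 100000   /* fixed run count, no adaptive timing */

    static double x[N + 12], y[N], z[N + 12];

    int main(void) {
        /* Deterministic setup. */
        for (int k = 0; k < N + 12; k++) {
            x[k] = 0.0;
            z[k] = (k + 1) * 0.001;
        }
        for (int k = 0; k < N; k++)
            y[k] = (k + 1) * 0.0001;

        /* Kernel 1 (hydro fragment), run a fixed number of times. */
        double q = 0.0001, r = 0.001, t = 0.01;
        for (int it = 0; it < ITER; it++)
            for (int k = 0; k < N; k++)
                x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);

        /* Print a checksum instead of self-validating; this output
         * becomes the reference file for the test. */
        double sum = 0.0;
        for (int k = 0; k < N; k++)
            sum += x[k];
        printf("kernel1 checksum: %.6e\n", sum);
        return 0;
    }

The printed checksum would then be captured once as the reference output for
that test.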
On 6 November 2012 22:28, Daniel Dunbar <daniel at zuster.org> wrote:
> Is it possible instead to refactor the tests so that each binary corresponds
> to one test? For example, look at how Hal went about integrating TSVC:

It should be possible. I'll have to understand better what the preamble does,
to make sure I'm not stripping out important stuff, but also what to copy
into each kernel's initialization.

Also, I don't know how the timing functions behave across platforms. I'd have
to implement a decent enough platform-independent timing system to factor out
the initialization step.

> Other things that I would *like* before integrating it:
>  - Rip out the CPU ID stuff, this isn't useful and adds messiness.

Absolutely, that is meaningless.

> - Have the test just produce output that can be compared, instead of
> including its own check routines

I can make it print the numbers in order; is that good enough for the
comparison routines?

If I got it right, the tests self-validate the results, so at least we know
they executed correctly in the end. I can make them print "OK" or "FAIL"
either way, along with some numbers reporting the timing.

> - Have the tests run for fixed iterations, instead of doing their own
> adaptive run

Yes, that's rubbish. That was needed to compare results based on CPU-specific
features, but we don't need that.

> - Produce reference output files, so it works with USE_REFERENCE_OUTPUT=1

Is this a simple diff, or do you compare the numerical results by
value ± stdev?

cheers,
--renato
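On the timing point, a possible sketch of a simple wall-clock helper, assuming
a POSIX gettimeofday() is available (a non-POSIX build would need its own
branch, omitted here):

    #include <sys/time.h>

    /* Wall-clock seconds since the epoch, with microsecond resolution. */
    static double wall_seconds(void) {
        struct timeval tv;
        gettimeofday(&tv, 0);
        return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
    }

    /* Usage sketch: time only the kernel body, not the setup.
     *   double t0 = wall_seconds();
     *   run_kernel();
     *   double elapsed = wall_seconds() - t0;
     */

Timing only the kernel body this way keeps the initialization out of the
measured region, independently of whatever the original adaptive timing code
did.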