Fernando Magno Quintao Pereira via llvm-dev
2020-Feb-22 20:30 UTC
[llvm-dev] The AnghaBench collection of compilable programs
Hi Florian,

we thought about using UIUC, like in LLVM. Do you guys know if that could be a problem, given that we are mining the functions from github?

> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.

We did not think about it, actually. But we would be happy to do it, if the community accepts it.

Regards,

Fernando

On Sat, Feb 22, 2020 at 5:16 PM Florian Hahn <florian_hahn at apple.com> wrote:
>
> Hi Fernando,
>
> That sounds like a very useful resource to improve testing and also to get easier access to good stress tests (e.g., quite a few very large functions have proven to surface compile-time problems in some backend passes).
>
> From a quick look at the website I couldn’t find under which license the code is published. That may be a problem for some users.
>
> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
>
> Cheers,
> Florian
>
>> On 22 Feb 2020, at 14:56, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Dear LLVMers,
>>
>> we, at UFMG, have been building a large collection of compilable benchmarks. Today, we have one million C files, mined from open-source repositories, that compile into LLVM bytecodes (and from there to object files). To ensure compilation, we perform type inference on the C programs. Type inference lets us replace missing dependencies.
>>
>> The benchmarks are available at: http://cuda.dcc.ufmg.br/angha/
>>
>> We have a technical report describing the construction of this collection: http://lac.dcc.ufmg.br/pubs/TechReports/LaC_TechReport012020.pdf
>>
>> Many things can be done with so many LLVM bytecodes. A few examples follow below:
>>
>> * We can autotune compilers. We have trained YaCoS, a tool used to find good optimization sequences. The objective function is code size. We find the best optimization sequence for each program in the database. To compile an unknown program, we take the closest program in the database and apply the same optimization sequence. Results are good: we can improve on clang -Oz by almost 10% in MiBench, for instance.
>>
>> * We can perform many types of explorations on real-world code. For instance, we have found that 95.4% of all the interference graphs of these programs, even in machine code (no phi-functions and lots of pre-colored registers), are chordal.
>>
>> * We can check how well different tools are doing on real-world code. For instance, we can use these benchmarks to check how many programs can be analyzed by Ultimate Buchi Automizer (https://ultimate.informatik.uni-freiburg.de/downloads/BuchiAutomizer/). This is a tool that tries to prove termination or infinite execution for some programs.
>>
>> * We can check how many programs can be compiled into FPGAs by different high-level synthesis tools. We have tried LegUp and Vivado, for instance.
>>
>> * Our webpage contains a search box, so that you can get the programs closest to a given input program. Currently, we measure program distance as the Euclidean distance between Namolaru feature vectors.
>>
>> We do not currently provide inputs for those programs. It is possible to execute the so-called "leaf functions", i.e., functions that do not call other routines. We have thousands of them. However, we do not guarantee the absence of undefined behavior during execution.
>>
>> Regards,
>>
>> Fernando
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
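[Editor's note: the "type inference lets us replace missing dependencies" step in the announcement can be pictured with a small, entirely hypothetical example. A function is mined without the header that defined its types, so a minimal type is synthesized from the way the function uses it. The names below are invented for illustration; the technical report describes AnghaBench's actual reconstruction.]

```c
#include <stddef.h>

/* Synthesized dependency (hypothetical): the header defining `node_t`
 * was not mined, but the function below only dereferences `n->next`
 * and compares `n->val` to an int, so a minimal struct with those two
 * fields is enough to make the file compile on its own. */
typedef struct node {
    int val;
    struct node *next;
} node_t;

/* The mined function itself, now compilable in isolation. */
int count_matches(node_t *n, int key) {
    int count = 0;
    while (n != NULL) {
        if (n->val == key)
            count++;
        n = n->next;
    }
    return count;
}
```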
Florian Hahn via llvm-dev
2020-Feb-22 20:53 UTC
[llvm-dev] The AnghaBench collection of compilable programs
> On 22 Feb 2020, at 20:30, Fernando Magno Quintao Pereira <pronesto at gmail.com> wrote:
>
> Hi Florian,
>
> we thought about using UIUC, like in LLVM. Do you guys know if that could be a problem, given that we are mining the functions from github?

If I understand your approach correctly, I think the question will be quite tricky to answer. I am not a lawyer and cannot help there, sorry!

>> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
>
> We did not think about it, actually. But we would be happy to do it, if the community accepts it.

IIUC, the mined benchmarks would fit quite well and should not be too hard to integrate (as external). But it would probably be good to have the license question answered first; otherwise it might limit their practical usefulness.

Cheers,
Florian
Chris Lattner via llvm-dev
2020-Feb-28 05:21 UTC
[llvm-dev] The AnghaBench collection of compilable programs
Hi Fernando,

My understanding is that LLVM’s test-suite is under a weird mix of different licenses. So long as you preserve the original licenses (and only include ones with reasonable licenses), it should be possible, I think.

-Chris

> On Feb 22, 2020, at 12:30 PM, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Florian,
>
> we thought about using UIUC, like in LLVM. Do you guys know if that could be a problem, given that we are mining the functions from github?
>
>> Have you thought about integrating the benchmarks as external tests into LLVM’s test-suite? That would make it very easy to play around with.
>
> We did not think about it, actually. But we would be happy to do it, if the community accepts it.
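[Editor's note: the autotuning scheme in the announcement (reuse the optimization sequence of the database program nearest to the input under Euclidean distance on feature vectors) boils down to a nearest-neighbor lookup. A minimal sketch follows; the vector length and function names are invented for illustration, and the real feature extraction follows Namolaru et al. as described in the report.]

```c
#include <stddef.h>

#define NUM_FEATURES 8 /* hypothetical feature-vector length */

/* Squared Euclidean distance between two feature vectors; sqrt() is
 * monotonic, so it can be skipped when only the ranking matters. */
static double sq_distance(const double *a, const double *b) {
    double d = 0.0;
    for (size_t i = 0; i < NUM_FEATURES; i++) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

/* Index of the database program whose feature vector is closest to
 * the query; the optimization sequence recorded for that program
 * would then be reused for the unknown input. */
size_t closest_program(const double db[][NUM_FEATURES], size_t n,
                       const double *query) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (sq_distance(db[i], query) < sq_distance(db[best], query))
            best = i;
    return best;
}
```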
Fernando Magno Quintao Pereira via llvm-dev
2020-Feb-28 11:21 UTC
[llvm-dev] The AnghaBench collection of compilable programs
Thank you for the feedback, Chris and Florian. We will start updating the benchmarks with the licenses from the original repositories they came from. Once we update the individual benchmarks, we will try to make them available as an external test in LLVM.

Regards,

Fernando

On Fri, Feb 28, 2020 at 2:21 AM Chris Lattner <clattner at nondot.org> wrote:
>
> Hi Fernando,
>
> My understanding is that LLVM’s test-suite is under a weird mix of different licenses. So long as you preserve the original licenses (and only include ones with reasonable licenses), it should be possible, I think.
>
> -Chris
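[Editor's note: the "leaf functions" mentioned in the announcement, i.e., functions that call no other routine, are the ones that can be executed in isolation, since no callee needs to be reconstructed. The function below is a hypothetical example of that shape, not an actual AnghaBench sample.]

```c
/* A hypothetical mined leaf function: it performs no calls, so a
 * small driver can run it directly with arbitrary inputs. */
unsigned char reverse_bits(unsigned char x) {
    unsigned char r = 0;
    for (int i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (x & 1u));
        x >>= 1;
    }
    return r;
}
```

As the thread notes, running mined functions this way carries no guarantee about undefined behavior: inputs a driver invents may fall outside what the original program would ever pass.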