Hi,

I see that you are looking for new programs for the testsuite, as described in 'Compile programs with the LLVM compiler' and 'Adding programs to the llvm testsuite' on llvm.org/OpenProjects.

My favourite "C source code" is ClamAV (www.clamav.net), and I would like to get it included in the testsuite.

This mail is kind of long, but please bear with me, as I want to clarify how best to integrate ClamAV into the LLVM testsuite's build system.

Why include it?

It can be useful for finding regressions and new bugs; it has already uncovered a few bugs in llvm's cbe, llc, and optimizers that I've reported through bugzilla (and they've mostly been fixed very fast! Thanks!). ClamAV was also the "victim" of a bug in gcc 4.1.0's optimizer [see point 9].

It can also be useful for testing new and existing optimizations. There aren't any significant differences in its performance when compiled by different compilers (gcc, icc, llvm-gcc), so I hope LLVM's optimizers can (in the future) make it faster ;)

I had a quick look at the build infrastructure, and there are some issues with getting it to work for programs that use autoconf (such as ClamAV), since AFAICT testsuite programs aren't allowed to run configure (details below).

Building issues aside, there are some more questions:
* ClamAV is GPL (but it includes BSD and LGPL parts); is that OK for the testsuite?
* What version should be used? Latest stable, or latest svn? [In any case I'll wait until the next stable is published; it should be happening *very soon*.]
* What happens if you find bugs that also cause it to fail under gcc (unlikely)? [I would prefer an entry on ClamAV's bugzilla then, with something in its subject saying it is llvm-testsuite related.]
* What happens if it fails only under llvm-gcc/llc/clang, not because of a bug in llvm but because of portability issues in the source code (unlikely)? I would prefer a ClamAV bugzilla entry here too; ClamAV is meant to be "portable" :)

Also, after I have set it up in the llvm testsuite, is there an easy way to run clang on it? Currently I have to hack autoconf-generated makefiles if I want to test clang on it.

1. I've manually run configure and kept the generated clamav-config.h. It mostly contains HAVE_* macros for headers, which should all be available on a POSIX system, so from this perspective it shouldn't be a problem for llvm's build farm. However, there are some target-specific macros:

#define C_LINUX 1
#define FPU_WORDS_BIGENDIAN 0
#define WORDS_BIGENDIAN 0

There are also SIZEOF_INT, SIZEOF_LONG, ..., but those are only used if the system doesn't have a proper <stdint.h>. I'm also not sure about this one (see the shim sketch after point 2 below):

/* ctime_r takes 2 arguments */
#define HAVE_CTIME_R_2 1

What OS and CPU do the machines on llvm's build farm have? We could try a config.h that works on Linux (or Mac OS X) and apply it to all of them, though there might be (non-obvious) failures.

Any solutions for getting these macros defined in the LLVM testsuite build (especially the bigendian macro)?

2. AFAICT the llvm-testsuite build doesn't support a program that is built from multiple subdirectories. libclamav has its source split across several subdirectories; gathering them into one also requires changing #include directives that use relative paths. There are also files with the same name in different subdirs, so I have to rename them to subdir_filename and update the #includes accordingly.

I have done this manually, and it works (native, llc, and cbe all work). I could hack together some perl script to do this automatically, or is there a better solution?
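A possible shim for the HAVE_CTIME_R_2 question in point 1. This is an untested sketch: the cli_ctime wrapper name is made up, and treating "HAVE_CTIME_R_2 undefined" as meaning a Solaris-style 3-argument ctime_r is an assumption, not something the ClamAV sources guarantee.

#include <time.h>

/* Hypothetical wrapper giving callers a single signature regardless of
 * which ctime_r variant the platform provides; buf must hold at least
 * 26 bytes, as POSIX requires for ctime_r. */
static char *cli_ctime(const time_t *t, char *buf, int buflen)
{
#ifdef HAVE_CTIME_R_2
    (void)buflen;                    /* POSIX 2-argument form */
    return ctime_r(t, buf);
#else
    return ctime_r(t, buf, buflen);  /* assumed 3-argument (Solaris) form */
#endif
}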
3. Comparing output: I've written a small script that compares the --debug output. It needs some adjustments, because the --debug output contains memory addresses that obviously don't match up between runs. There isn't much else to compare besides the --debug output (other than ClamAV reporting that no virus was found), but that can be a fairly good test.

4. What is the input data? ClamAV is fast :) It needs a lot of input data if you want to get reasonable timings out of it (tens or hundreds of MB). Scanning many small files will be I/O bound, and mostly useless as a benchmark (though still useful for testing compiler/optimization correctness).

So I was thinking of using some large files already available in the testsuite (oggenc has one), and then maybe pointing it at the last *stable* build of LLVM. Or finding some files that are scanned slowly but don't require lots of disk I/O (an archive of highly compressible data, with the ratio/size limits disabled). You won't be able to benchmark ClamAV in a "real world" scenario, though, since that would involve having it scan malware, and I'm sure you don't want that on your build farm.

You could give it random data to scan, but it needs to be reproducible, so scanning /dev/random or /bin of the current LLVM tree is not a good choice ;) (A seeded-generator sketch follows at the end of this message.)

There's also the problem of eliminating the initial disk I/O time from the benchmark; maybe rerun it 3 times automatically, or something like that?

5. Library dependencies. It needs zlib; everything else is optional (bzip2, gmp, ...). I think I can reasonably assume zlib is available on all systems where the testsuite is run.

6. Sample output using 126 MB of data as input:

$ make TEST=nightly report
....
Program  | GCCAS  Bytecode LLC compile LLC-BETA compile JIT codegen | GCC   CBE   LLC   LLC-BETA JIT | GCC/CBE GCC/LLC GCC/LLC-BETA LLC/LLC-BETA
clamscan | 7.0729 2074308  *           *                *           | 17.48 17.55 18.81 *        *   | 1.00    0.93    n/a          n/a

7. ClamAV is multithreaded. If you're interested in testing whether llvm-generated code works when multithreaded (I don't see why it wouldn't, but we're talking about a testsuite), you'd need to start the daemon (running it as an unprivileged user is just fine) and then connect to it. Is it possible to tell the testsuite build system to do this?

8. Code coverage. Testing all of ClamAV's code with llvm is ... problematic. Unless you create files for every packer/archiver known to ClamAV, there will likely be source files that are compiled in but never exercised during the testsuite run. You can still test that those files compile, but that's it.

9. Configure tests. Configure has 3 tests that check for gcc bugs known to break ClamAV (2 of which you already have, since they are in gcc's testsuite too). Should these be added as separate "programs" in the llvm testsuite?

Thoughts?

Best regards,
Edwin
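A minimal sketch of the seeded generator mentioned in point 4. It hand-rolls an LCG instead of calling rand(), so every platform and libc produces identical bytes; the seed, LCG constants, output filename, and the 126 MB size are all arbitrary choices here, not anything the testsuite prescribes.

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("scan-input.bin", "wb");
    unsigned long state = 12345;                 /* fixed seed => reproducible */
    unsigned long i, n = 126UL * 1024 * 1024;    /* 126 MB of output */

    if (!f)
        return 1;
    for (i = 0; i < n; i++) {
        state = state * 1103515245UL + 12345UL;  /* classic LCG step */
        if (putc((int)((state >> 16) & 0xff), f) == EOF)
            return 1;
    }
    return fclose(f) != 0;
}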
We always welcome more tests. But it looks like there are two issues here.

1. The autoconf requirement. Is it possible to get one configuration working without the need for autoconf?
2. GPL license. Chris?

Evan
Evan Cheng wrote:
> We always welcome more tests. But it looks like there are two issues here.
>
> 1. The autoconf requirement. Is it possible to get one configuration working without the need for autoconf?

I could make a clamav-config.h that should work when compiled with llvm-gcc. Can I assume <endian.h> exists on all your platforms? [Or how else can I detect endianness using only macros from headers?] I've seen a Makefile with an if on $(ENDIAN); can I use that to pass -DWORDS_BIGENDIAN=... to the compiler? Or I could create a config.h that assumes the platform is big-endian (assuming little-endian would SIGBUS on Sparc).

Thoughts?

Thanks,
Edwin
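For reference, a rough sketch of what header-only detection could look like. It is untested: <endian.h> is glibc-specific, and which macros each buildfarm compiler predefines is an assumption here.

#if defined(__linux__)
/* glibc's <endian.h> defines __BYTE_ORDER / __BIG_ENDIAN */
# include <endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define WORDS_BIGENDIAN 1
# else
#  define WORDS_BIGENDIAN 0
# endif
#elif defined(__BIG_ENDIAN__) || defined(__sparc__) || defined(__ppc__) || \
      defined(__powerpc__)
# define WORDS_BIGENDIAN 1     /* Darwin/ppc and Sparc are big-endian */
#else
# define WORDS_BIGENDIAN 0     /* assume little-endian otherwise */
#endif
/* FPU_WORDS_BIGENDIAN could then default to WORDS_BIGENDIAN; old
 * ARM/FPA double layout is the known exception. */

If the testsuite makefiles already export that $(ENDIAN) variable, passing -DWORDS_BIGENDIAN=... on the command line, as suggested above, might be simpler and more robust than this guessing.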
On Mon, 17 Dec 2007, Evan Cheng wrote:
> We always welcome more tests. But it looks like there are two issues here.
>
> 1. The autoconf requirement. Is it possible to get one configuration working without the need for autoconf?

One way to do this is to add a "cut down" version of the app to the test suite.

> 2. GPL license. Chris?

Any open source license that allows unrestricted redistribution is fine in llvm-test.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/