Hi,

I see that you are looking for new programs for the testsuite, as
described in 'Compile programs with the LLVM compiler' and 'Adding
programs to the llvm testsuite' on llvm.org/OpenProjects.

My favourite "C source code" is ClamAV (www.clamav.net), and I would
like to get it included in the testsuite.

This mail is kind of long, but please bear with me, as I want to clarify
how best to integrate ClamAV into the LLVM testsuite's build system.

Why include it?

It can be useful for finding regressions or new bugs; it has already
uncovered a few bugs in llvm's cbe, llc, and optimizers that I've
reported through bugzilla (and they've mostly been fixed very fast!
Thanks!). ClamAV was also the "victim" of a bug in gcc 4.1.0's
optimizer [see 9)].

It can be useful for testing new/existing optimizations. There aren't
any significant differences in its performance when compiled by
different compilers (gcc, icc, llvm-gcc), so I hope LLVM's optimizers
can (in the future) make it faster ;)

I had a quick look at the build infrastructure, and there are some
issues with getting it to work with programs that use autoconf (such as
ClamAV), since AFAICT testsuites aren't allowed to run configure (listed
below).

Building issues aside, there are some more questions:
* ClamAV is GPL (but it includes BSD and LGPL parts); is that OK for the
  testsuite?
* What version to use? Latest stable, or latest svn?
  [In any case I'll wait till the next stable is published; it should be
  happening *very soon*]
* What happens if you find bugs that also cause it to fail under gcc
  (unlikely)? [I would prefer to get an entry on clamav's bugzilla then,
  with something in its subject saying it is llvm-testsuite related]
* What happens if it only fails under llvm-gcc/llc/clang, ... and it is
  not due to a bug in llvm, but because of portability issues in the
  source code (unlikely)? I would prefer a clamav bugzilla here too;
  clamav is meant to be "portable" :)

Also, after I have set it up in the llvm testsuite, is there an easy way
to run clang on it? Currently I have to hack autoconf-generated
makefiles if I want to test clang on it.

1. I've manually run configure and generated a clamav-config.h.
This usually just contains HAVE_* macros for headers, which should all
be available on a POSIX system, so it shouldn't be a problem from this
perspective for llvm's build farm.
However, there are some target-specific macros:
#define C_LINUX 1
#define FPU_WORDS_BIGENDIAN 0
#define WORDS_BIGENDIAN 0
Also SIZEOF_INT, SIZEOF_LONG, ... but they are only used if the system
doesn't have a proper <stdint.h>.
Also not sure of this:
/* ctime_r takes 2 arguments */
#define HAVE_CTIME_R_2 1

What OS and CPU do the machines on llvm's build farm have? We could try
a config.h that works on Linux (or MacOSX) and try to apply that to all,
though there might be (non-obvious) failures.

Any solutions to having these macros defined in the LLVM testsuite
build? (especially for the bigendian macro)

2. AFAICT the llvm-testsuite build doesn't support a program that is
built from multiple subdirectories.
libclamav has its source split into multiple subdirectories; gathering
those into one also requires changing the #include directives that have
relative paths.
I also get files with the same name but from different subdirs, so I
have to rename them to subdir_filename, and do that in the #include
directives too.

I have done this manually, and it works (native, llc, cbe work).
I could hack together some perl script to do this automatically, or is
there a better solution?

3. Comparing output
I've written a small script that compares the --debug output. It needs
some adjustments, since I also get memory addresses in the --debug
output that obviously don't match up between runs.
There isn't anything else to compare besides the --debug output (besides
ClamAV saying no virus found), and that can be a fairly good test.

4. What is the input data?
ClamAV is fast :)
It needs a lot of input data if you want to get reasonable timings out
of it (tens or hundreds of MB).
Scanning multiple small files will be I/O bound, and it'd be mostly
useless as a benchmark (though still useful for testing
compiler/optimization correctness).

So I was thinking of using some large files already available in the
testsuite (oggenc has one), and then maybe pointing it at the last
*stable* build of LLVM. Or find some files that are scanned slowly, but
that don't require lots of disk I/O (an archive, with ratio/size limits
disabled, with highly compressible data).
You won't be able to benchmark clamav in a "real world" scenario though,
since that'd involve making it scan malware, and I'm sure you don't want
that on your build farm.

You could give it random data to scan, but you'll need it to be
reproducible, so scanning /dev/random, or /bin of the current LLVM tree,
is not a good choice ;)

There's also the problem of eliminating the initial disk I/O time from
the benchmark; rerun it 3 times automatically, or something like that?

5. Library dependencies
It needs zlib; all the rest is optional (bzip2, gmp, ...). I think I can
reasonably assume zlib is available on all systems where the testsuite
is run.

6. Sample output when using 126 MB of data as input:

$ make TEST=nightly report
....
Program  | GCCAS   Bytecode  LLC compile  LLC-BETA compile  JIT codegen | GCC    CBE    LLC    LLC-BETA  JIT | GCC/CBE  GCC/LLC  GCC/LLC-BETA  LLC/LLC-BETA
clamscan | 7.0729  2074308   *            *                 *           | 17.48  17.55  18.81  *         *   | 1.00     0.93     n/a           n/a

7. ClamAV is multithreaded
If you're interested in testing whether llvm-generated code works when
multithreaded (I don't see why it wouldn't, but we're talking about a
testsuite), you'd need to start the daemon (as an unprivileged user is
just fine), and then connect to it.
Is it possible to tell the testsuite build system to do this?

8. Code coverage
Testing all of clamav's code with llvm is ... problematic. Unless you
create files with every packer/archiver known to clamav, it is likely
there will be files that are compiled in but never used during the
testsuite run. You can still test that these files compile, but that's
it.

9. Configure tests
Configure has 3 tests that check for gcc bugs known to break ClamAV (2
of which you already have, since those are in gcc's testsuite too). Add
them as separate "programs" to run in the llvm testsuite?

Thoughts?

Best regards,
Edwin
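The comparison described in item 3 above could be sketched roughly as follows. This is not the script mentioned in the mail (which isn't shown); it is a minimal illustration that assumes the addresses in the --debug output appear as 0x-prefixed hexadecimal tokens, which is a guess about the format. The names mask_addresses and debug_lines_match are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing every token of the
 * form 0x<hex digits> with the fixed placeholder "0xADDR".
 * ASSUMPTION: run-dependent addresses are printed as 0x-prefixed hex.
 * Output is truncated (and always NUL-terminated) if `out` is too small. */
static void mask_addresses(const char *in, char *out, size_t out_size)
{
    static const char placeholder[] = "0xADDR";
    size_t o = 0;

    assert(in != NULL && out != NULL && out_size > 0);

    while (*in != '\0' && o + 1 < out_size) {
        if (in[0] == '0' && (in[1] == 'x' || in[1] == 'X') &&
            isxdigit((unsigned char)in[2])) {
            /* Skip the whole hex token in the input... */
            in += 2;
            while (isxdigit((unsigned char)*in))
                in++;
            /* ...and emit the placeholder instead. */
            for (const char *p = placeholder; *p != '\0' && o + 1 < out_size; p++)
                out[o++] = *p;
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
}

/* Two debug lines "match" if they are identical once addresses are masked.
 * Lines longer than the local buffers are compared on their truncated form. */
static int debug_lines_match(const char *a, const char *b)
{
    char masked_a[1024], masked_b[1024];

    mask_addresses(a, masked_a, sizeof masked_a);
    mask_addresses(b, masked_b, sizeof masked_b);
    return strcmp(masked_a, masked_b) == 0;
}
```

A driver would read the reference output and the new output line by line and report the first pair for which debug_lines_match returns 0. Masking rather than deleting the addresses keeps the rest of each line aligned, so a genuine difference still shows up at the same position.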
We always welcome more tests. But it looks like there are two issues
here.

1. The autoconf requirement. Is it possible to get one configuration
working without the need for autoconf?
2. GPL license. Chris?

Evan

On Dec 14, 2007, at 12:30 PM, Török Edwin wrote:
> [...]
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Evan Cheng wrote:
> We always welcome more tests. But it looks like there are two issues
> here.
>
> 1. The autoconf requirement. Is it possible to get one configuration
> working without the need for autoconf?

I could make a clamav-config.h that should work if compiled with
llvm-gcc.
Can I assume <endian.h> exists on all your platforms? [Or how else can I
detect endianness using only macros from headers?]

I've seen a Makefile that has an "if $(ENDIAN)" test; can I use that to
pass -DWORDS_BIGENDIAN=... to the compiler?

Or I can create a config.h that assumes the platform is big-endian
(assuming little-endian would SIGBUS on Sparc).

Thoughts?

Thanks,
Edwin
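One way the endianness question could be handled in a hand-written config header is sketched below. It is only an illustration under stated assumptions: it lets a -DWORDS_BIGENDIAN=... passed on the command line win, and otherwise relies on the __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ macros that modern GCC and clang predefine. Those macros are newer than the compilers discussed in this thread, so this is not what was available to llvm-gcc at the time. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A value passed by the build system (-DWORDS_BIGENDIAN=0 or =1) wins.
 * ASSUMPTION: otherwise the compiler predefines __BYTE_ORDER__ and
 * __ORDER_BIG_ENDIAN__ (true for modern GCC and clang). */
#ifndef WORDS_BIGENDIAN
#  if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
#    if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#      define WORDS_BIGENDIAN 1
#    else
#      define WORDS_BIGENDIAN 0
#    endif
#  else
#    error "cannot determine byte order; pass -DWORDS_BIGENDIAN=0 or =1"
#  endif
#endif

/* Runtime probe: returns 1 if the first byte in memory of a 32-bit
 * integer is its most significant byte, 0 otherwise. */
static int runtime_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Hypothetical startup check: abort early if the compile-time setting
 * disagrees with what the machine actually does. */
static void check_byte_order(void)
{
    assert(runtime_is_big_endian() == WORDS_BIGENDIAN);
}
```

Calling check_byte_order() once at program start turns a wrong hand-maintained setting into an immediate, obvious failure instead of a silent miscomparison later in the test run.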
On Mon, 17 Dec 2007, Evan Cheng wrote:
> We always welcome more tests. But it looks like there are two issues
> here.
>
> 1. The autoconf requirement. Is it possible to get one configuration
> working without the need for autoconf?

One way to do this is to add a "cut down" version of the app to the test
suite.

> 2. GPL license. Chris?

Any open source license that allows unrestricted redistribution is fine
in llvm-test.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/