Jeffrey Yasskin wrote:
> See Owen's email about docs for the 2.6 release, but it's really not
> that hard to keep up with trunk. I recently merged trunk LLVM into
> Unladen Swallow, and the changes I needed to make are at
> http://code.google.com/p/unladen-swallow/source/detail?r=724.

Thanks Jeffrey, that was really very helpful! I have Pure working with
both the LLVM 2.6 release branch and the trunk now.

One thing I noticed is that writing LLVM assembler code (print()
methods) seems to be horribly slow now (some 4-5 times slower than in
LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
uses those methods to produce output code which then gets fed into
llvmc.

Is this a known problem? Will it be fixed by the 2.6 release?

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr.Graef at t-online.de, ag at muwiinfa.geschichte.uni-mainz.de
WWW: http://www.musikinformatik.uni-mainz.de/ag
On Aug 23, 2009, at 4:29 PM, Albert Graef <Dr.Graef at t-online.de> wrote:
>
> One thing I noticed is that writing LLVM assembler code (print()
> methods) seems to be horribly slow now (some 4-5 times slower than in
> LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
> uses those methods to produce output code which then gets fed into
> llvmc.
>
> Is this a known problem?

Are you printing to stderr or errs()? If so, be aware that it's no
longer buffered, so it isn't well suited for bulk output. Stdout and
normal file output are still buffered though.

Otherwise, no. Can you describe how you're doing the printing, and
anything else that might be relevant?

> Will it be fixed by the 2.6 release

It depends on the specifics. Output file performance is important.

Dan
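To make the buffering point concrete, here is a minimal sketch of why an
unbuffered stream is a poor fit for bulk output. It does not use LLVM at
all: CountingSink, BufferedWriter and count_writes are hypothetical names
invented for this illustration, and the sink merely counts how often the
(normally expensive) underlying write would be invoked. It is not how
raw_ostream is implemented, only a model of the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a file descriptor: counts how many times the
// underlying write is invoked and how many bytes it receives in total.
struct CountingSink {
    std::size_t writes = 0;
    std::size_t bytes = 0;
    void write(const char *, std::size_t n) { ++writes; bytes += n; }
};

// Hypothetical minimal buffered writer. A capacity of 0 means "unbuffered":
// every append is passed straight through to the sink.
class BufferedWriter {
public:
    BufferedWriter(CountingSink &sink, std::size_t capacity)
        : sink_(sink), capacity_(capacity) {}
    ~BufferedWriter() { flush(); }

    void append(const std::string &s) {
        if (capacity_ == 0) { sink_.write(s.data(), s.size()); return; }
        // Flush first if the new text would not fit into the buffer.
        if (buf_.size() + s.size() > capacity_) flush();
        // Text larger than the whole buffer bypasses it.
        if (s.size() > capacity_) { sink_.write(s.data(), s.size()); return; }
        buf_ += s;
    }

    void flush() {
        if (!buf_.empty()) {
            sink_.write(buf_.data(), buf_.size());
            buf_.clear();
        }
    }

private:
    CountingSink &sink_;
    std::size_t capacity_;
    std::string buf_;
};

// Emits `lines` short lines through a writer with the given capacity and
// returns the number of underlying writes that resulted.
std::size_t count_writes(std::size_t lines, std::size_t capacity) {
    const std::string line = "  ret i32 0\n";
    CountingSink sink;
    {
        BufferedWriter out(sink, capacity);
        for (std::size_t i = 0; i < lines; ++i) out.append(line);
    }  // destructor flushes whatever is still buffered
    assert(sink.bytes == lines * line.size());  // no output was lost
    return sink.writes;
}
```

Calling count_writes(1000, 0) models the unbuffered case and yields one
underlying write per line, while count_writes(1000, 4096) collapses the
same output into a handful of writes. On a real file descriptor each of
those writes is a system call, which is where the cost comes from.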
Dan Gohman wrote:
> Are you printing to stderr or errs()? If so, be aware that it's no
> longer buffered, so it isn't well suited for bulk output. Stdout and
> normal file output are still buffered though.

I'm using raw_fd_ostream. It gets initialized like this:

    string error;
    llvm::raw_ostream *codep = file_target?
      new llvm::raw_fd_ostream(target.c_str(), error,
                               llvm::raw_fd_ostream::F_Force):
      new llvm::raw_stdout_ostream();
    if (!error.empty()) {
      std::cerr << "Error opening " << target << '\n';
      exit(1);
    }
    llvm::raw_ostream &code = *codep;

(Yeah, it's a bit convoluted, since I'm allowing output to either stdout
or a disk file here.)

Later on I then iterate over the global variables and functions of my
module, decide which ones to keep, and output code for these using
something like f->print(code).

Anything wrong with that?

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr.Graef at t-online.de, ag at muwiinfa.geschichte.uni-mainz.de
WWW: http://www.musikinformatik.uni-mainz.de/ag
Albert Graef
2009-Aug-24 19:36 UTC
[LLVMdev] asmwriting times (was Re: LLVMContext: Suggestions for API Changes)
Albert Graef wrote:
> One thing I noticed is that writing LLVM assembler code (print()
> methods) seems to be horribly slow now (some 4-5 times slower than in
> LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
> uses those methods to produce output code which then gets fed into llvmc.

Let me follow up with some concrete figures. Unfortunately, I don't have
a minimal C++ example, but the effect is easy to reproduce with Pure
0.31 from http://pure-lang.googlecode.com/ and the attached little Pure
script. (I'm quite sure that this is not a bug in the Pure interpreter,
as exactly the same code runs more than an order of magnitude faster
with LLVM 2.5 than with LLVM 2.6/2.7svn.)

The given figures are user CPU times in seconds, as given by time(1), so
they are not really that accurate, but the effect is so prominent that
this really doesn't matter. (See below for the details on how I obtained
these figures.)

          LLVM 2.5    LLVM 2.6    LLVM 2.7(svn)

execute   1.752s      2.272s      2.256s
compile   2.316s      24.458s     24.834s

codegen   0.564s      22.186s     22.578s
(compile minus execute)

To measure the asmwriting times, I first ran the script without
generating LLVM assembler output code ("execute", pure -x hello.pure)
and then again with LLVM assembler output enabled ("compile", pure -o
hello.ll -c -x hello.pure). The difference between the two figures
("codegen") gives a rough estimate of the net asmwriting times. (That's
really all that pure -c does; at the end of script execution it takes
the IR that's already in memory and just spits it out by iterating over
the global variables and the functions of the IR module and using the
appropriate print() methods.)

The resulting LLVM assembler file hello.ll was some 5.3 MB for LLVM
2.6/2.7svn (4.4 MB for LLVM 2.5; the assembler programs are exactly the
same, though, the size differences are apparently due to formatting
changes in LLVM 2.6/2.7svn). Note that the code size is quite large
because the function definitions compiled from Pure's prelude are all
included.

The tests were performed on an AMD Phenom 9950 4x2.6GHz with 4GB RAM
running Linux x86-64 2.6.27. The following configure options were used
to compile LLVM (all versions) and Pure 0.31, respectively:

LLVM: --enable-optimized --disable-assertions --disable-expensive-checks
--enable-targets=host-only --enable-pic

Pure: --enable-release

So the effect is actually much *more* prominent than I first made it out
to be. This is just one data point, of course, but I get an easily
noticeable slowdown with every Pure script I tried. In fact it's so much
slower that I consider it unusable.

I'm at a loss here. I'd have to debug the LLVM asmwriter code to see
where exactly the bottleneck is. I haven't done that yet, but I ruled
out an issue with the raw_ostream buffer sizes by trying different sizes
from 256 bytes up to 64K; it doesn't change the results very much.

So my question to fellow frontend developers is: Has anyone else seen
this? Does anyone know a workaround?

Any help will be greatly appreciated.

TIA,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr.Graef at t-online.de, ag at muwiinfa.geschichte.uni-mainz.de
WWW: http://www.musikinformatik.uni-mainz.de/ag

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hello.pure
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090824/7f34f61f/attachment.ksh>
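The "codegen" row is plain arithmetic on the two measured rows, and the
same arithmetic gives the relative slowdown. A small sketch of that
calculation follows, using only the figures reported in this message.
The names Timing, codegen_seconds, slowdown and reported_timings are
made up for the illustration and are not part of Pure or LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One column of the timing table: user CPU seconds as reported by time(1).
struct Timing {
    std::string version;
    double execute_s;  // pure -x hello.pure
    double compile_s;  // pure -o hello.ll -c -x hello.pure
};

// Net asmwriting time: the "compile" run minus the "execute" run.
double codegen_seconds(const Timing &t) {
    assert(t.compile_s >= t.execute_s);
    return t.compile_s - t.execute_s;
}

// How many times slower asmwriting is in `t` than in `baseline`.
double slowdown(const Timing &baseline, const Timing &t) {
    return codegen_seconds(t) / codegen_seconds(baseline);
}

// The three measurements from the table, in column order.
std::vector<Timing> reported_timings() {
    return {
        {"LLVM 2.5", 1.752, 2.316},
        {"LLVM 2.6", 2.272, 24.458},
        {"LLVM 2.7(svn)", 2.256, 24.834},
    };
}
```

With LLVM 2.5 as the baseline, slowdown() comes out at roughly 39 for
LLVM 2.6 and roughly 40 for LLVM 2.7(svn), which is why the effect is
described as much more prominent than the initial estimate of 4-5 times.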
Dan Gohman
2009-Aug-24 20:32 UTC
[LLVMdev] asmwriting times (was Re: LLVMContext: Suggestions for API Changes)
Before we get too far into this, I'd like to point out that there's a
ready solution for the problem of the AsmPrinter being slow: Bitcode. If
you want IR reading and writing to be fast, you should consider bitcode
files rather than assembly (text) files anyway. Bitcode is smaller and
faster. And the API is similar, so it's usually easy to change from
assembly to bitcode.

That said, I've done testing of the AsmPrinter performance myself and
seen only moderate slowdowns due to the formatting changes; nothing of
the magnitude you're describing.

I'm hoping to try out Pure to see if I can reproduce what you're seeing.
As a first step, would it be possible for you to use
strace -etrace=write to determine if buffering is somehow not happening?

One other question that occurs to me: is Pure dumping the whole Module
at once, or is it manually writing out the IR in pieces?

Dan

On Aug 24, 2009, at 12:36 PM, Albert Graef wrote:
> Albert Graef wrote:
>> One thing I noticed is that writing LLVM assembler code (print()
>> methods) seems to be horribly slow now (some 4-5 times slower than in
>> LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
>> uses those methods to produce output code which then gets fed into
>> llvmc.
>
> Let me follow up with some concrete figures. Unfortunately, I don't
> have a minimal C++ example, but the effect is easy to reproduce with
> Pure 0.31 from http://pure-lang.googlecode.com/ and the attached
> little Pure script. (I'm quite sure that this is not a bug in the Pure
> interpreter, as exactly the same code runs more than an order of
> magnitude faster with LLVM 2.5 than with LLVM 2.6/2.7svn.)
>
> The given figures are user CPU times in seconds, as given by time(1),
> so they are not really that accurate, but the effect is so prominent
> that this really doesn't matter. (See below for the details on how I
> obtained these figures.)
>
>           LLVM 2.5    LLVM 2.6    LLVM 2.7(svn)
>
> execute   1.752s      2.272s      2.256s
> compile   2.316s      24.458s     24.834s
>
> codegen   0.564s      22.186s     22.578s
> (compile minus execute)
>
> To measure the asmwriting times, I first ran the script without
> generating LLVM assembler output code ("execute", pure -x hello.pure)
> and then again with LLVM assembler output enabled ("compile", pure -o
> hello.ll -c -x hello.pure). The difference between the two figures
> ("codegen") gives a rough estimate of the net asmwriting times.
> (That's really all that pure -c does; at the end of script execution
> it takes the IR that's already in memory and just spits it out by
> iterating over the global variables and the functions of the IR module
> and using the appropriate print() methods.)
>
> The resulting LLVM assembler file hello.ll was some 5.3 MB for LLVM
> 2.6/2.7svn (4.4 MB for LLVM 2.5; the assembler programs are exactly
> the same, though, the size differences are apparently due to
> formatting changes in LLVM 2.6/2.7svn). Note that the code size is
> quite large because the function definitions compiled from Pure's
> prelude are all included.
>
> The tests were performed on an AMD Phenom 9950 4x2.6GHz with 4GB RAM
> running Linux x86-64 2.6.27. The following configure options were used
> to compile LLVM (all versions) and Pure 0.31, respectively:
>
> LLVM: --enable-optimized --disable-assertions
> --disable-expensive-checks --enable-targets=host-only --enable-pic
>
> Pure: --enable-release
>
> So the effect is actually much *more* prominent than I first made it
> out to be. This is just one data point, of course, but I get an easily
> noticeable slowdown with every Pure script I tried. In fact it's so
> much slower that I consider it unusable.
>
> I'm at a loss here. I'd have to debug the LLVM asmwriter code to see
> where exactly the bottleneck is. I haven't done that yet, but I ruled
> out an issue with the raw_ostream buffer sizes by trying different
> sizes from 256 bytes up to 64K; it doesn't change the results very
> much.
>
> So my question to fellow frontend developers is: Has anyone else seen
> this? Does anyone know a workaround?
>
> Any help will be greatly appreciated.
>
> TIA,
> Albert
>
> --
> Dr. Albert Gr"af
> Dept. of Music-Informatics, University of Mainz, Germany
> Email: Dr.Graef at t-online.de, ag at muwiinfa.geschichte.uni-mainz.de
> WWW: http://www.musikinformatik.uni-mainz.de/ag
> using system;
>
> fact n = if n>0 then n*fact (n-1) else 1;
>
> main n = do puts ["Hello, world!", str (map fact (1..n))];
>
> const n = if argc>1 then sscanf (argv!1) "%d" else 10;
> main n;
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev